The best marketing initiatives are doomed to fail unless you’ve got some strong data analysis going on in the background, and a bad marketing campaign suffers even more without a good look at the numbers. Think about it: unless you know exactly what’s going on, you won’t be able to:
1) Effectively measure success
2) Find places for improvement
3) Realize things are going south and fix the issues
I think most marketers these days agree that collecting, interpreting, and reacting to data is the foundation without which their efforts go largely in vain. That said, it’s shocking how poorly informed most marketers are about the proper way to interpret the information they see. You can blame public schools that don’t teach strong stats-based classes, blame the under-emphasis placed on math skills in humanities-based major programs, or just blame people for sucking at math; I prefer to find solutions. So, here are five incredibly common and easily fixed mistakes most marketers are making right now, and how to fix them:
1) Relying on Tired Cliches – Correlation != Causation
Yes, we all know that correlation does not imply causation. We know this because people with an elementary grasp of statistics grab on to this mantra like the last life-vest on the Titanic and hang on for dear life until they eventually freeze to death in the icy cold waters. Because that’s what happens if you grab on to this terribly misunderstood cliche and use it to inform your marketing decisions.
Yes, correlation does not imply causation, and should not be taken as proof of X causing Y. It DOES, however, imply a relationship, and a relationship implies that you had damn well better look into things to see what’s going on. Searching for correlation should be the very first tool you pull out of your analysis toolbox every time you get a new data set. Not because you’re going to base your 6-month budget on correlation alone, but because identifying and isolating relationships gives you an idea of where you should focus your more involved investigation. Without checking for correlation, you can spend hours, days, or weeks running experiments that don’t get you any closer to your marketing goals.
The Fix: Looking for correlation is to your web data analysis what finding a corner is to assembling a jigsaw puzzle: it lays out some constraints for where you should focus later research. If you notice a pattern and it can be shown to be strongly or moderately correlated (r >= .3 or so), you don’t have proof of causality or a true relationship, but you have a strong POTENTIAL for one. Once you isolate the correlated data, you should conduct a rigorous (or as rigorous as is possible on the web) experiment to determine whether the two (or more) variables are REALLY related, and what causes a change in what. This will focus your testing and analysis, and limit the amount of time and money you waste chasing after results.
2) Data Analysis by Leaps and Bounds – Jumping to Conclusions
Don’t base your final analysis and actions on half-analyzed data. Just don’t. Don’t just look for a pattern and call it a day. This may seem like a contradiction of the last point, but it isn’t if you read both closely. In fact, this is the polar opposite of the last point, and as with most things being at either extreme is very very bad. Too many marketers without a strong background in statistics look for patterns, invariably find some, and then file their final report and call it a day. “Well, we posted three blog entries last week and bounce rates went up! We need to post less often!” This is wrong. Very wrong. Every time you say something like that, Avinash Kaushik punches a kitten. Won’t someone please think of the kittens?
The problem is that many marketers, or at least those without a strong stat/math/science background simply don’t know what to do next. It’s fairly easy to identify a pattern. Unfortunately, it’s just as easy to identify a pattern that either doesn’t actually exist or doesn’t mean anything. It’s even easier to identify a pattern, create some meaning out of whole cloth, build a half-assed action plan around that half-assed analysis, and be out of the office by 4:30.
The Fix: First of all, learn how to calculate Pearson’s correlation coefficient, which measures the strength of the linear relationship between two sets of data. The formula is:
r = (n(Σxy)-(Σx)(Σy))/√([nΣx²-(Σx)²][nΣy²-(Σy)²])
r, the correlation coefficient, is going to be a number between -1 and 1. A value of 1 means the numbers are perfectly positively correlated (as x goes up, y goes up in exact proportion), -1 means they are perfectly negatively correlated, and 0 means there is no linear relationship at all. You should not be worried about r values between -.3 and .3; there’s not enough of a relationship there to matter. Once you find something with a significant correlation, your work isn’t done. You need to go and test for causality. Before you can claim that your math means ANYTHING, no matter how strong the correlation, you need to test it to make sure it isn’t simply a coincidence. Don’t even think of composing your final report unless you at least recommend an experiment to test causality. For the kittens.
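Here’s that formula translated directly into a short Python function, with two sanity checks on made-up data:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient, computed exactly as the formula
    above: r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2)(n*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))

# Perfectly positively correlated data -> r = 1.0
print(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]))   # 1.0
# Perfectly negatively correlated data -> r = -1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))       # -1.0
```

Most spreadsheet and analytics tools compute this for you (it’s `CORREL` in Excel), but working it out by hand once is the fastest way to understand what the number does and doesn’t say.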
3) Enough Isn’t Enough!
A lot of novice marketers (and way too many veterans of the field) have difficulty with quantity in their analysis. The problem is one of sample size: it’s very tempting to just look at the last couple of data points and declare a trend. This may be fine and dandy for talking heads, pundits, and gurus who live carefree by the motto “Three points is a trend”. Unfortunately, this isn’t how things work in the real world, and it certainly shouldn’t be how you run your analysis.
Any kind of statistics depends heavily on the idea that you need a minimum number of data points in order to draw conclusions that are meaningful, that is: it can be said with a reasonable amount of certainty that the results are representative of the population at large. Essentially, the fewer data points in your sample, the more likely your results are to be a fluke and not representative of the real world. This is closely tied to the concept of confidence intervals. The other thing you want to make sure of is that any results or conclusions you draw are the way they are because of a legitimate pattern, and not completely by chance. This is called significance, and it is an incredibly important concept in statistics, particularly for determining whether specific events actually matter.
The Fix: You need to familiarize yourself with the ideas of sample size, significance, and confidence intervals before trying to draw any conclusions from any piece of data. Why? To keep from making boneheaded mistakes. The long and the short of it is that you need to stay away from the tendency to draw trend lines through too few data points. A week-long traffic spike to articles about cats COULD mean that you need to write more articles about cats. Or, much more likely, it could be a complete fluke that fades a week from next Monday. A conversion optimization that tints your site bright pink could be preferred by 6 customers out of 10, but unless your site gets fewer than 100 visitors a month, that sample size is so small as to be completely meaningless. It’s just as likely that your site was visited by 6 weirdos who are obsessed with pink as it is that your customers really like the new color scheme. Read up on sound experimental design, or better yet use one of the great testing tools that take human error out of experimental design and implementation.
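To see just how meaningless 6 out of 10 is, here’s a quick back-of-the-envelope check in Python: a one-sided binomial tail probability for the pink-site example (the numbers are the made-up ones from above):

```python
from math import comb

def binom_p_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): how often we'd see a result
    at least this extreme if visitors actually had no preference."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 6 of 10 visitors "preferred" the pink variant. If the true preference
# were a coin flip, how often would we see 6 or more just by chance?
p_value = binom_p_at_least(6, 10)
print(round(p_value, 3))   # 0.377 -- nowhere near any significance threshold
```

A result that chance alone produces 38% of the time tells you exactly nothing. Run the same calculation with 600 of 1,000 visitors and the tail probability collapses to effectively zero: the effect size is identical, only the sample size changed.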
4) If it’s Not A, it Must Be B!
Repeat after me: You do not EVER prove the null hypothesis. Now back up a second: what is a null hypothesis? In any analysis or experiment, the optimal method for determining anything is to set up a hypothesis. So, let’s say you have an eCommerce site, and you want to know if changing the color of the “Buy Now” button from yellow to red will increase conversions. In order to test this in a rigorous and meaningful way, you can’t simply state your hypothesis, you have to formulate a null hypothesis – a default statement that captures the essence of you being wrong. So in our case, the null hypothesis could be something like: “Changing the ‘Buy Now’ button color will have no effect on conversion rates”(Note to stat-heads: this is intentionally bad). This is the default assumption. So you run a split test, and after thousands of visits, you check the data. Much to your surprise, you realize that there is no statistically significant difference between the two groups of visitors.
TA DA! You have now proven that button color makes no difference, right? Well, no. No you haven’t. All you’ve proven is that the data doesn’t support rejecting the null hypothesis. It’s a technical difference, but it’s important to note. Why? Because calling it a day limits your future testing. For starters, the null hypothesis is terrible. It’s way too vague, and precludes the possibility that maybe red just wasn’t the optimal color. This highlights the importance of being very clear and very specific in your testing. The bigger danger in thinking that you have proven, or disproven, the null hypothesis is a mental one. It blinds you to the possibility that maybe your experiment was bad, and the results are not to be trusted. Now, I am not suggesting that you become paranoid and reject result after result. What I am saying is that just because you ran an experiment once and got a particular result doesn’t mean that you shelve that line of investigation and move on.
The Fix: Understand that your results, even experimental results, are possibly incorrect. If it’s technically and financially feasible, I strongly recommend re-running old experiments occasionally. It is not beyond the realm of possibility that you just got really unlucky and got a weird batch of subjects. When you claim proof, or disproof, you create a mental block that pushes your thinking down a specific corridor, one that is incredibly difficult to escape from. Especially when you present your findings to decision makers who are NOT analysts, and who will latch on to results as if they came from the mouth of god himself. Just don’t do it. Give yourself enough wiggle room that you have the freedom to change your answer later without having to patiently explain to your boss why you gave him a contradictory answer a year ago.
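As an illustration, here’s a rough Python sketch of the standard two-proportion z-test you’d run on a split test like the button experiment above, showing what "failing to reject" actually looks like. All the visit and conversion counts are invented:

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates,
    using the standard pooled-proportion formulation. Returns (z, p)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p

# Hypothetical split test: yellow button converts 200/5000, red 212/5000.
z, p = two_proportion_z(200, 5000, 212, 5000)
print(round(p, 2))
# p is around 0.55: we FAIL to reject "no effect". That is NOT proof that
# color doesn't matter -- only that THIS test didn't detect a difference.
```

Note what the large p-value licenses you to say: "this experiment didn’t find a difference", not "there is no difference". The effect could be real but too small for this sample, the experiment could be flawed, or the batch of visitors could be weird.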
5) I’m an Analyst, and If I’m Not Analyzing, I’m Not Working!
Over-interpretation of results is probably the biggest no-no you can commit as an analyst. It’s also one of the easiest mistakes to make, because that’s how the general public is used to seeing statistics presented. I blame poor journalism. How many times have you read pop-sci coverage of a clinical study where the headline blared something ludicrous like: “Eating raw lobster cures cancer!” You get excited, read the news story, and in the very last paragraph, you learn that what the study ACTUALLY found is that a certain tribe of Pacific Islanders who eat raw lobster regularly have a much lower incidence of eyebrow cancer. Well, that’s what analysts across the country do EVERY SINGLE DAY, without the slightest bit of shame. In fact, sometimes it’s much, much worse.
Granted, over-interpreting a study/experiment/data-set is less a specific error, and more some combination of one or more of the other problems we’ve talked about here, but it’s still important to mention on its own. Too many analysts feel like they need to make bold proclamations on a daily basis in order to justify their paycheck, and this is not only hurting their companies and clients, but the very field of data analysis. It also makes life hell for the employees of people who read analyst reports and take them as gospel. I once had a boss who read an article years ago about green being the best color for website buttons. He insisted that EVERY BUTTON EVER needed to be green. Not only did it clash with the company’s color scheme and make the website absolutely hideous, it also ruined any potential benefits of using green as a call-out color since ALL buttons were green. This could have easily been avoided had the article writer simply made a point to emphasize that this was ONE result in ONE situation that did not necessarily apply to everything at all times and in all contexts.
Bloggers and gurus are especially bad about this. In their effort to write headlines that capture eyeballs and links, they are incentivized to contort data to say things it has no business saying. Don’t help them!
The Fix: You are a data scientist; act like one. Couch your analysis in the terms of scientific inquiry. Things are not certain. Data from one sample doesn’t necessarily apply to another sample or the population at large (“Our sweater store converts GREAT in Minnesota. California has way more people, so we’ll sell even more sweaters if we start selling there!”). A trend line, no matter how validly arrived at, can’t be extended to infinity (“We made $100 selling pogs two years ago, $1,000 last year, and $10,000 this year. In 5 years our revenue will be $1,000,000,000!”). Carefully consider the practical significance of any analysis, not just the statistical significance (“Performing this complicated, technical change to the website that will cost thousands will increase our conversion rate by 1%! It’s statistically significant, so we HAVE to do it!”). Remember, you are a scientician! Try to act like one.
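For fun, the pog projection from the parenthetical above works out in a few lines of Python, just to show how mechanically a "perfect" trend produces an absurd number:

```python
# The pog-revenue joke made concrete: three data points fit a flawless
# 10x-per-year growth trend, and blindly extending it five more years
# "predicts" a billion dollars of pog revenue.
revenue = [100, 1_000, 10_000]        # two years ago, last year, this year
growth = revenue[1] / revenue[0]      # 10x per year -- a "perfect" trend
projection = revenue[-1] * growth**5  # extend the trend 5 more years
print(f"${projection:,.0f}")          # $1,000,000,000 -- obviously absurd
```

The math is impeccable and the conclusion is nonsense, which is exactly the point: extrapolation is a modeling choice, not a fact, and it deserves the same skepticism as every other claim in your report.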