Why Most Startups Are Doing Data-Driven Decision Making Wrong (fforward.ai)
49 points by camwest on May 16, 2023 | hide | past | favorite | 27 comments


This article is really weakened by its failure to resolve the initial example.

What was it about the initial A/B test that made your 10% improvement in conversion rate not statistically significant?

Would applying a simple statistical test to your decision making process have resulted in a different conclusion?

There might be a deeper, useful insight in answering those questions, but as it stands now I really don't see anything valuable in this article. If anything it leads me to believe that the author is susceptible to p-hacking (see: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3204791)


>Notice how ChatGPT didn’t put its thumb in the air and see which way the wind was blowing? Using a proper tool for analysis not only saves time but also adds precision. This example shows how statistical testing is no joke. Don’t rely on gut feeling or surface-level data.

Hold on - you can't just say ChatGPT "adds precision" with nothing to back it up! Acting as if ChatGPT is capable of thinking about your results better than you seems extremely unwise.

The real way that companies are doing Data-Driven Decisions wrong is by only measuring one thing without correctly accounting for other factors. For example, a website might find that having a pop-up that says "SUBSCRIBE TO OUR MAILING LIST!!!" increases mailing list subscriptions, while forgetting the effects that has on other parts of the business, like the website being slower, customers being less happy, etc.


Good point. The ChatGPT example was really only meant as a fun way to run a statistical-significance test.

You're totally correct in saying that forgetting the effects on other parts of the business is where a lot of the analysis needs to be directed.


Throwing ChatGPT into it distracts from the overall point you're trying to make.


Maybe ChatGPT wrote the article, and threw itself in.


This is a pretty thin article - as far as I can tell the entire article boils down to "apply a statistical test to A/B test results"?? The four-step process laid out doesn't magically protect you from confirmation bias; for example, there are ample opportunities for it to manifest while "identifying the ideal data".

The toy example provided is actually a perfect example of why tools (like ChatGPT) can't protect you from fucking this up. Why is the view count so different between versions A and B? Was the test set up as a 60-40 split, or had you intended it to be closer to 50-50? Does the massive increase in conversion rate pass the smell test? If anything seems suspicious, you probably should do some further investigation...
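To make the smell test concrete: a chi-squared test on a contingency table of conversions vs. non-conversions is one standard way to check results like these. A minimal sketch with made-up numbers (the 60-40 split and conversion counts below are hypothetical, not taken from the article):

```python
from scipy.stats import chi2_contingency

# Hypothetical data in the spirit of the toy example:
# version A got 6000 views with 300 conversions (5%),
# version B got 4000 views with 260 conversions (6.5%)
# - note the uneven split questioned above.
conversions = [300, 260]
views = [6000, 4000]

# Rows: versions A and B; columns: converted vs. did not convert.
table = [[c, v - c] for c, v in zip(conversions, views)]

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```

A small p-value here says the difference is unlikely to be chance, but it says nothing about whether the split itself was botched, which is exactly the point.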


It is often said that we have two (or more) decision making centers in the human body. Mainly, one is in the head and one is in the stomach, in the gut. There is a good case to be made that many of our decisions happen instinctively, intuitively, subconsciously, in advance and under the hood, and our brains come up with logical justifications after the fact and tell us what the decision was.

Given any model, a researcher usually doesn't just run it once and accept the output (unless they happen to agree with the output of the first run). If they get an "unexpected" result, they will either "reassess", meaning that they genuinely have learned something new, or "develop the model", re-tweaking the model structure and parameters until they get an "expected" output.

Most of the value in these data-driven exercises doesn't seem to be deciding between A and B. Any executive can do that (and ask the data scientists to justify the decisions after the fact, as others have said). It's in finding C, which you never even considered and which takes you off in a new direction. For that to happen, the human mind and the organization behind it need to be flexible. And this generation of generative AI certainly seems capable of providing or helping to reach novel insights. Or is it? How can we go off the beaten track using a tool which is built on top of following the track as much as possible?

I like what you said lower down in your comments about looking at data: "Update your gut with an updated map of the state". That's a great description, and I think it's exactly what you want from any data tool, AI or otherwise.


Can you say more about how your example copy change experiment had a confirmation bias? I’m guessing the issue was it was inconsequential and you happened to see an uptick in the initial hours but it would have evened out over time. But hearing more about that would complete the case study.


Yeah exactly, it was some sort of seasonality effect, or just plain randomness.


But you showed an example which was significant. That confuses your point somewhat.

It’s also not clear why you think “most” startups are not doing statistical significance tests on their A/B test results.


I'm not sure I understand your point. Are you saying that in the absence of statistical significance, you should ignore the data you do have?

I tend to agree that data-driven decision making is often deeply confounded and more association-based than causation-based, but I'm not really sure statistical significance is the right way to solve that in every case.


Not the OP, but anecdotally I've seen firms/teams that intrinsically lack data opine that they need data to make any decision which is uncomfortable.

Data comes in a variety of forms - many of which are intrinsically expensive to acquire. It's ok to acknowledge that you don't have sufficient data, and won't be able to acquire sufficient data in a reasonable time/budget - and that you need to use judgement.


I've been seeing a number of blog posts and things like this that query ChatGPT as way to appeal to authority.

I really don't like it.


The post could be more informative, but I 100% agree with the sentiment. Confirmation bias and "rain making" are prevalent in the tech industry.

"Rationality is not just about knowing facts. It's about knowing which facts are the most relevant." - Epictetus


I used to know a guy with a math PhD who worked as a data scientist at a hedge fund in NYC (a big one). When I asked him how his profession works, he said the management folks who employ him often come to a business conclusion first and then ask him to make the data support that conclusion.

Given that he was paid stupid money, I would be inclined to do the same.


> management folks that employ him often come to a business conclusion and then ask him to make that data support that conclusion

This is also how a lot of finance (especially valuation) and all of management consulting work.


As someone who has worked in finance, I agree.


It's very hard to do anything in a data driven way. Where do you start? Where do you stop?

Is it signups that matter? Or DAUs? Or MAUs? Or (and it actually is this one) revenue?

But how do you decide the relationship between each of these? Does an easier on-board mean that you get less "sticky" users? How will you tell?


Yeah.

Constructing experiments is hard, and interpreting data is even harder. PhDs, whose job it is to do these things, who typically have something on the order of a decade of education and even more years of practice, mess these things up. It happens all the time.

That's not even getting into the fact that many times in data-driven decision-making, an experiment is constructed to motivate a particular decision. If you have a preferred outcome, it's very easy to deliberately or unconsciously put your hand on the scale. Even without deliberate sabotage, there are many methods to apply to a particular dataset, and if you apply enough tools, sure enough one will support your assertion.

More often than not, the result is that data-driven decision-making is more like an elaborate Ouija board that says more about the people constructing the experiments than about what they purport to verify.


Agreed it's super hard. I think that's why you need to be careful about looking at data until you've got your hypothesis. Too much data and you'll overfit or introduce bias. I think of it more as a way to update your gut with an updated map of the state.


I haven't done this in a while, but somewhere in my notes I have a formula to determine sample size based upon power and the effect we want to detect.

After we hit that size we would analyze the results. If they were ambiguous we just picked the one we subjectively felt was the better choice.
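For reference, the usual two-proportion approximation is probably close to the formula in those notes. A sketch (this is the standard textbook version, not necessarily the exact one the commenter used):

```python
from scipy.stats import norm

def sample_size_per_arm(p_base, mde, alpha=0.05, power=0.8):
    """Approximate per-arm sample size to detect an absolute lift of
    `mde` over a baseline conversion rate `p_base` (two-sided test)."""
    p_alt = p_base + mde
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for significance level
    z_beta = norm.ppf(power)            # critical value for desired power
    variance = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    n = (z_alpha + z_beta) ** 2 * variance / mde ** 2
    return int(n) + 1

# e.g. detecting a 2-point absolute lift over a 5% baseline,
# at alpha = 0.05 and 80% power
print(sample_size_per_arm(0.05, 0.02))
```

Running the test past that pre-committed size (rather than peeking and stopping early) is what makes the subsequent analysis honest.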


I'm guessing you would have had an idea of what the right answer was going to be before you ran the analysis. How often was your guess wrong? How often did the data tell you to do something that you were uncomfortable with? Did you ever use your taste or judgement to go against what the algorithm said you should do?


We had guesses, but nothing particularly informed. My recollection is that we were right more often than not in our guesses, but it's been a while. We only ran experiments on changes we were willing to make. We never went against conclusive results due to judgement.

We also used data a bit less formally. For example, we had something similar to Google analytics' "funnel" feature of how people arrived at some of our pages, and we noticed a usage pattern that indicated we were missing a feature. When we added the feature directly it was one of our most used features.


You mean saying "we do data-driven decision making", and then the procurement manager agrees to a multi-million-dollar CRM investment because the head of sales says so.


Is a Chi-squared statistical test really appropriate in this situation? I’ve never seen one used in this way; I would’ve used a normal distribution to test the hypothesis. At least, that’s what my college statistics memory tells me to do.


They are closely related. If X is distributed normal with mean 0 and variance 1, then X^2 is distributed Chi-squared with degrees of freedom 1.


For Data-Driven Decisions, you must first have the correct model of the universe (or at least the subset of the universe that you are interested in) into which you can feed the data.

A vast majority of orgs that claim to be data-driven don't have the correct model.



