How to Use Causal Inference In Day-to-Day Analytical Work (Part 1 of 2)

Analysts and data scientists operating in the business world are awash in observational data. This is data that’s generated in the course of the operations of the business. This is in contrast to experimental data, where subjects are randomly assigned to different treatment groups, and outcomes are recorded and analyzed (think randomized clinical trials or AB tests).

Experimental data can be expensive or, in some cases, impossible/unethical to collect (e.g., assigning people to smoking vs non-smoking groups). Observational data, on the other hand, are very cheap since they are generated as a side effect of business operations.

Given this cheap abundance of observational data, it is no surprise that ‘interrogating’ this data is a staple of everyday analytical work. And one of the most common interrogation techniques is comparing groups of ‘subjects’ — customers, employees, products, … — on important metrics.

Shoppers who used a “free shipping for orders over $50” coupon spent 14% more than shoppers who didn’t use the coupon.

Products in the front of the store were bought 12% more often than products in the back of the store.

Customers who buy from multiple channels spend 30% more annually than customers who buy from a single channel.

Sales reps in the Western region delivered 9% higher bookings-per-rep than reps in the Eastern region.

Comparisons are very useful and give us insight into how the system (i.e. the business, the organization, the customer base) really works.

And these insights, in turn, suggest things we can do — interventions — to improve outcomes we care about.

Customers who buy from multiple channels spend 30% more annually than customers who buy from a single channel.

30% is a lot! If we could entice single-channel shoppers to buy from a different channel the next time around (perhaps by sending them a coupon that only works for that new channel), maybe they will spend 30% more the following year?

Products in the front of the store were bought 12% more often than products in the back of the store.

Wow! So if we move weakly-selling products from the back of the store to the front, maybe their sales will increase by 12%?

These interventions may have the desired effect if the data on which the original comparison was calculated is experimental (e.g., if a random subset of products had been assigned to the front of the store and we compared their performance to the ones in the back).

But if our data is observational — some products were selected by the retailer to be in the front of the store for business reasons; given a set of channels, some customers self-selected to use a single channel, others used multiple channels— you have to be careful.

Why?

Because comparisons calculated from observational data may not be real. They may NOT be a reflection of how your business really works and acting on them may get you into trouble.

How can we tell if a comparison is trustworthy?  Read the rest of the post on Medium to learn how.

Share/Bookmark