I was recently on a call with a potential retail client and they voiced their suspicion that the weather had an impact on their sales. "Totally," I thought, "people behave differently in response to different weather. Sure." It certainly wasn't their prime focus, more of a curiosity, but I still wondered how I'd go about investigating this (and producing a meaningful answer, with a quick turnaround).
Aggregated sales data would be readily accessible from their POS, and daily weather data is also available, by location, from BOM (hourly would be better, so we could know if weather events happened before or during operating hours, but I'm keeping things simple for now).
Obviously, it's a statistics question and the math nerd in me is keen to get stuck into it. But, of course, before we go there, I have to ponder the "so what?" question:
What would be their return for their investment in such an investigation?
What business actions could this knowledge inform?
Maybe being aware that they were up for a busier day might allow them to schedule an additional staff member. Conversely, for a quiet day, they might put out fewer perishable "widgets" into their display.
The effect of weather (we'll focus on rain, for now) is just one aspect of their broader general understanding of what causes their sales to fluctuate. Each "bit" of information guides the optimization of their business and reduces "surprises". The immediate value of this particular "bit" would really depend upon the effect size discovered. So a quick, low-budget investigation would be the way to go.
I went ahead and generated some dummy data, for us to run through a pretend (but still realistic) investigation.
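For anyone who wants to play along, here is a minimal sketch of how dummy data like this could be put together. To be clear, this is not my original generator: the function name `make_dummy_sales`, the baseline of 300 sales, the 30% rain rate, the weekday effects and the noise level are all assumptions picked for illustration.

```python
import numpy as np

def make_dummy_sales(n_days=365, seed=0):
    """Return (day_index, weekday, rainy, sales) arrays for a made-up shop."""
    rng = np.random.default_rng(seed)
    day_index = np.arange(n_days)
    weekday = day_index % 7                  # 0 = Monday ... 6 = Sunday
    rainy = rng.random(n_days) < 0.3         # assumed: rain on ~30% of days
    # Assumed effect sizes, chosen only for illustration
    weekday_effect = np.array([0, 20, 50, 40, 60, 80, 30])
    sales = (
        300                                  # baseline: a dry Monday at day 0
        + 0.05 * day_index                   # slow upward trend
        + weekday_effect[weekday]            # each weekday behaves differently
        + 55 * rainy                         # rain lifts sales ...
        - 100 * (rainy & (weekday == 4))     # ... except on Fridays
        + rng.normal(0, 40, n_days)          # day-to-day noise
    )
    return day_index, weekday, rainy, sales

day_index, weekday, rainy, sales = make_dummy_sales()
print(sales[:5].round(1))
```

Fixing the seed keeps the "investigation" reproducible, which is handy when you want the plots and the regression tables to agree with each other.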
Let's keep our focus pretty simple at first. In the plot below, each day is a dot.
The "line of best fit" shows a very slight upward trend. We can quantify this trend with a simple linear regression of sales on the day index.
Over the course of a year, average daily sales have increased by about 20 (0.0539 * 365 ≈ 19.7). Nice one!
But this upward trend could just be due to there being more or fewer rainy days (and the corresponding impact on customers), rather than all your hard work!
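As a rough sketch of that trend calculation: fit a straight line to sales against the day index and multiply the daily slope by 365. The data here is made up on the spot (the baseline, slope and noise level are assumptions), so the fitted slope will not match the 0.0539 from my table.

```python
import numpy as np

# Made-up data: an assumed trend of 0.05 sales per day plus noise
rng = np.random.default_rng(1)
day_index = np.arange(365)
sales = 300 + 0.05 * day_index + rng.normal(0, 40, 365)

# np.polyfit returns the highest-degree coefficient first: slope, then intercept
slope, intercept = np.polyfit(day_index, sales, deg=1)
yearly_change = slope * 365
print(f"slope per day: {slope:.4f}, change over a year: {yearly_change:.1f}")

# The same arithmetic with the slope quoted in the text
print(round(0.0539 * 365, 1))  # 19.7
```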
It's difficult to say if there's a clear difference between rainy and non-rainy days, in the chart above. In the chart below, the rainy distribution certainly seems slightly shifted to the right (indicating more sales).
The summary below shows us that a difference is certainly present.
On average, rainy days are better for sales. We can detect if this is "statistically significant" or not by another simple regression.
Yes, it is (p.value < 0.05).
If we were going to guess the number of sales, our results would certainly be more accurate if we included our rainy day feature. The trend (from the day_index) has also stayed significant, so we can say that the increasing trend is independent of the rate of rainy days.
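Here is a hedged sketch of what that regression is doing under the hood, using plain NumPy rather than a stats package. The helper `ols` and the made-up data (an assumed +35 sales on rainy days) are mine for illustration only. Instead of a p-value it prints a t-statistic; in a sample this size, an absolute value above roughly 1.96 corresponds to p < 0.05.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: coefficients and their standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]            # residual degrees of freedom
    sigma2 = resid @ resid / dof             # estimated noise variance
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, std_err

# Made-up data: trend plus an assumed +35 sales on rainy days
rng = np.random.default_rng(2)
n = 365
day_index = np.arange(n)
rainy = rng.random(n) < 0.3
sales = 300 + 0.05 * day_index + 35 * rainy + rng.normal(0, 40, n)

# Design matrix: intercept, trend, and a 0/1 rainy-day column
X = np.column_stack([np.ones(n), day_index, rainy])
beta, std_err = ols(X, sales)
t_stat = beta / std_err
for name, b, se, t in zip(["intercept", "day_index", "rainy"], beta, std_err, t_stat):
    print(f"{name:>10}: estimate={b:8.3f}  std.error={se:7.3f}  t={t:6.2f}")
```

Because the trend and the rain column sit in the same model, each coefficient is estimated while holding the other fixed, which is what lets us separate "more rainy days" from "genuine growth".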
It looks like rain adds 34.5 extra sales to a day. We can use the standard error to qualify that statement a bit: a 95% confidence interval for the rain effect runs from about 23 to 46 extra sales (34.5 +/- 1.96*5.71).
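The interval arithmetic is easy to check by hand. This small sketch uses the unrounded estimate of 34.5 and the standard error of 5.71 quoted above, with 1.96 as the usual normal-approximation multiplier for 95%.

```python
estimate = 34.5    # rain coefficient from the regression
std_error = 5.71   # its standard error
z = 1.96           # normal-approximation multiplier for a 95% interval

margin = z * std_error
lower, upper = estimate - margin, estimate + margin
print(f"{lower:.1f} to {upper:.1f}")  # 23.3 to 45.7
```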
But this "rain addition" might be mitigated by some other factor that we already know, such as the day of the week.
We can see a pretty consistent pattern. The days behave differently to each other and, within the days, we can see higher-value rainy days, except on Fridays.
The following regression shows us the impact.
For these figures, our base estimation is a non-rainy Monday (at the start of our timeline). We can see that the rain still makes a "significant" difference, even given the day of the week and how far along the timeline we are.
A Tuesday usually has 21.9 more sales than a Monday.
A Wednesday is normally 48.7 more than a Monday, etc...
...and, whichever day it happens, rain usually adds ~41.2 extra sales.
So now, our guess at the impact of rain has changed from 34.5 to 41.2 extra sales.
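To make the "base estimation is a non-rainy Monday" idea concrete, here is a sketch of how the weekday columns can be built: one 0/1 column for each day except Monday, so every weekday coefficient reads as "extra sales compared with a Monday". The data and effect sizes are again assumptions for illustration, not my original figures.

```python
import numpy as np

# Made-up data: 52 full weeks with assumed weekday and rain effects
rng = np.random.default_rng(3)
n = 364
day_index = np.arange(n)
weekday = day_index % 7                      # 0 = Monday ... 6 = Sunday
rainy = rng.random(n) < 0.3
weekday_effect = np.array([0, 20, 50, 40, 60, 80, 30])
sales = (300 + 0.05 * day_index + weekday_effect[weekday]
         + 40 * rainy + rng.normal(0, 40, n))

# One 0/1 column per weekday except Monday, which becomes the baseline
weekday_dummies = (weekday[:, None] == np.arange(1, 7)).astype(float)
X = np.column_stack([np.ones(n), day_index, rainy, weekday_dummies])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)

names = ["intercept", "day_index", "rainy", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
for name, b in zip(names, beta):
    print(f"{name:>10}: {b:8.2f}")
```

Leaving Monday out is what avoids a redundant column: with an intercept and all seven weekday columns, the model could not tell the baseline apart from the day effects.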
But the above model does not take into account the fact that on Fridays, rain has a different impact. We need to add in a weekday/rain "interaction".
Here we can see the significant terms in our model.
The day-of-the-week coefficients have changed slightly, and we can see that rain now adds 56.2 sales for any day that it occurs on, except for Fridays, where the interaction term of about -100 means rain causes sales to go down by around 43.8 (56.2 - 100 = -43.8).
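An interaction is just one more column: it is 1 only when it is both raining and a Friday. The sketch below is a simplification that adds only the rain-on-Friday term (rather than an interaction for every weekday), on made-up data with assumed effects of +55 for rain and -100 for rain on a Friday. The net Friday effect is the sum of the two coefficients.

```python
import numpy as np

# Made-up data: two years, with rain helping every day except Friday
rng = np.random.default_rng(4)
n = 728
day_index = np.arange(n)
weekday = day_index % 7                      # 0 = Monday ... 4 = Friday
rainy = rng.random(n) < 0.3
rainy_friday = rainy & (weekday == 4)        # the interaction column
weekday_effect = np.array([0, 20, 50, 40, 60, 80, 30])
sales = (300 + 0.05 * day_index + weekday_effect[weekday]
         + 55 * rainy - 100 * rainy_friday + rng.normal(0, 30, n))

weekday_dummies = (weekday[:, None] == np.arange(1, 7)).astype(float)
X = np.column_stack([np.ones(n), day_index, rainy, rainy_friday, weekday_dummies])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)

rain_effect = beta[2]                        # rain on any non-Friday
friday_rain_effect = beta[2] + beta[3]       # rain on a Friday: main effect + interaction
print(f"rain (most days): {rain_effect:.1f}")
print(f"rain (Fridays):   {friday_rain_effect:.1f}")

# The same sum with the figures quoted in the text
print(round(56.2 - 100, 1))  # -43.8
```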
So, this represents the level of sophistication that I would be comfortable presenting back to a client. This also represents the entirety of the effects within the "dummy" sales data (because I put them in there, myself, when I composed it).
There are more sophisticated models and more interesting "features" out there, to investigate:
- Does the amount of rain (or when it occurs) make a difference?
- Do successive days of rain have an impact?
- Will a busy day affect sales in the following day?
And also, the "what if" questions to ask, such as "What if we call in an additional staff member for rainy days? What are the flow-on effects of this?"
The good thing about simple models is that you can very quickly and (relatively) easily get to a clear sense of the effect sizes you are dealing with.
From there, it's easier to determine if increased investigation could possibly result in more profitable business decisions and, thus, be worth the additional investment.
Oftentimes, there are other aspects of the business that also require a similar, fundamental level of visibility.
For those curious, my code can be found below.