How do you deal with whales in LTV calculations and predictions? From a statistical point of view, they are extreme cases or outliers. Do you recommend including them in the calculations even though they significantly distort the result? What do you suggest when more than 50% of revenue comes from whales?
That is a very relevant topic in F2P games. Of course, it depends on the definition of a “whale”.
Yes, most of the revenue often comes from these big spenders. Consider, for example, that 2% of players make a purchase. In a game monetized only through in-app purchases, that means 100% of revenue comes from just 2% of the player base. Of these 2%, most payers buy only one offer for a few dollars (usually a starter offer). Only a small fraction of them spend significantly more.
This leads to an extremely skewed distribution of revenue per player, so we cannot treat it as we would in basic statistics and discard whales as outliers. If you exclude whales from your LTV calculation, you will significantly underestimate your overall LTV.
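To make this concrete, here is a minimal Python sketch on a hypothetical cohort. All numbers are invented for illustration, and the $500 whale threshold is an assumption, since the definition of a whale varies from game to game. Because 98% of players spend nothing, both quartiles of revenue per player are 0, so a textbook outlier rule (Tukey fences) flags every single payer, not just the whales. Dropping only the whales still cuts the measured LTV by more than half.

```python
import statistics

# Hypothetical cohort (illustrative numbers, not real game data):
# 10,000 players, 2% payers, most of them buying a cheap starter offer.
N_PLAYERS = 10_000
PAYER_TIERS = [  # (spend per payer in USD, number of payers)
    (2.99, 180),   # starter-offer buyers
    (50.0, 18),    # mid spenders
    (1000.0, 2),   # whales
]
WHALE_THRESHOLD = 500.0  # assumed whale definition for this sketch

# One revenue value per player; non-payers contribute 0.0.
revenues = [spend for spend, count in PAYER_TIERS for _ in range(count)]
revenues += [0.0] * (N_PLAYERS - len(revenues))


def ltv(values):
    """Realized LTV as average revenue per player."""
    return statistics.fmean(values)


# Textbook outlier rule (Tukey fences): flag values above Q3 + 1.5 * IQR.
q1, _, q3 = statistics.quantiles(revenues, n=4)
upper_fence = q3 + 1.5 * (q3 - q1)
flagged = sum(1 for r in revenues if r > upper_fence)

# Treating whales as outliers: drop them and recompute LTV.
no_whales = [r for r in revenues if r < WHALE_THRESHOLD]
whale_share = sum(r for r in revenues if r >= WHALE_THRESHOLD) / sum(revenues)

print(f"Upper fence: {upper_fence}, players flagged as outliers: {flagged}")
print(f"LTV with whales:    {ltv(revenues):.4f}")
print(f"LTV without whales: {ltv(no_whales):.4f}")
print(f"Revenue share of whales: {whale_share:.1%}")
```

In this made-up cohort, two players generate well over half of all revenue, which is exactly the situation described in the question.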
Every time we calculate LTV, we need to be very careful about variance. When predicting the LTV of cohorts, we need to make sure that each cohort has a reasonable sample size of spending players. A cohort can have 10,000 players, but if only a handful of them are payers (the players who contribute to LTV), we can expect a very large error variance. This also shows up as big, irregular “jumps” in cohort LTV curves. What you want is a smooth curve that shows the growth rate over time. You can use simulation methods (e.g., Monte Carlo simulation or bootstrapping) to check how robust your measurement is.
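A minimal sketch of the bootstrapping check, using only the Python standard library: resample players with replacement, recompute the cohort's average revenue per player each time, and read a percentile confidence interval off the resampled means. The two cohorts, their spend values, the seed, and the 95% level are hypothetical choices for illustration. Both cohorts have 2,000 players, but the one with only 5 payers should show a far wider interval relative to its LTV than the one with 100 payers.

```python
import random
import statistics


def bootstrap_ltv_ci(revenues, n_resamples=1000, alpha=0.05, seed=42):
    """Percentile-bootstrap confidence interval for cohort LTV.

    revenues: one value per player (0.0 for non-payers).
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)  # fixed seed for reproducible results
    n = len(revenues)
    boot_means = []
    for _ in range(n_resamples):
        # Resample players with replacement, keeping the cohort size fixed.
        sample = rng.choices(revenues, k=n)
        boot_means.append(sum(sample) / n)
    boot_means.sort()
    lower = boot_means[round(alpha / 2 * n_resamples)]
    upper = boot_means[round((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(revenues), lower, upper


# Hypothetical cohorts of 2,000 players each (illustrative numbers only).
PAYER_SPENDS = [2.99, 2.99, 4.99, 19.99, 499.99]
few_payers = PAYER_SPENDS + [0.0] * (2000 - 5)          # 5 payers
many_payers = PAYER_SPENDS * 20 + [0.0] * (2000 - 100)  # 100 payers

results = {}
for name, cohort in [("5 payers", few_payers), ("100 payers", many_payers)]:
    point, lower, upper = bootstrap_ltv_ci(cohort)
    results[name] = (point, lower, upper)
    rel_width = (upper - lower) / point  # CI width relative to the estimate
    print(f"{name}: LTV {point:.3f}, 95% CI [{lower:.3f}, {upper:.3f}], "
          f"relative width {rel_width:.2f}")
```

The percentile interval is the simplest bootstrap variant; it is used here for readability, not because it is the most accurate choice for heavily skewed revenue data.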
This is an issue only for games that monetize mainly through in-app purchases. If you monetize more through ads, your LTV curves should be smoother and your predictions more accurate, because a much higher percentage of players contributes to your revenue.
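The difference between the two monetization models can be checked with a small Monte Carlo simulation. The sketch below compares two hypothetical models with invented parameters: an IAP model where 2% of players pay a heavy-tailed (lognormal) amount, and an ads model where 60% of players each generate a small, tightly distributed amount. It simulates many cohorts of 2,000 players and reports the coefficient of variation (standard deviation divided by mean) of cohort LTV, a scale-free measure of how much LTV jumps from cohort to cohort.

```python
import random
import statistics

COHORT_SIZE = 2_000  # players per simulated cohort (assumption)
N_COHORTS = 200      # number of simulated cohorts (assumption)


def iap_player_revenue(rng):
    """Hypothetical IAP model: 2% convert, spend is heavy-tailed (lognormal)."""
    if rng.random() < 0.02:
        return rng.lognormvariate(1.5, 1.5)
    return 0.0


def ads_player_revenue(rng):
    """Hypothetical ads model: 60% watch ads, each earning a small, narrow amount."""
    if rng.random() < 0.60:
        return rng.lognormvariate(-3.0, 0.5)
    return 0.0


def cohort_ltv_cv(player_revenue, seed):
    """Coefficient of variation of cohort LTV across simulated cohorts."""
    rng = random.Random(seed)  # fixed seed for reproducible results
    ltvs = [
        statistics.fmean(player_revenue(rng) for _ in range(COHORT_SIZE))
        for _ in range(N_COHORTS)
    ]
    return statistics.stdev(ltvs) / statistics.fmean(ltvs)


cv_iap = cohort_ltv_cv(iap_player_revenue, seed=1)
cv_ads = cohort_ltv_cv(ads_player_revenue, seed=1)
print(f"IAP model, CV of cohort LTV: {cv_iap:.3f}")
print(f"Ads model, CV of cohort LTV: {cv_ads:.3f}")
```

With these parameters the IAP cohorts vary many times more than the ad cohorts of the same size, which is what makes their LTV curves jumpier and their predictions less reliable.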
Viktor Gregor, Senior Data Scientist