I have always wondered how Apple orders apps when you search for a keyword. It makes sense that Apple would rank apps by downloads or revenue, but is there anything else?
I was playing around with the iTunes API and wrote a small tool with it (as mentioned here). I then built a set of features (in the machine learning sense), fitted a regression on them, and ran a t-test on each coefficient to see which features are associated with ranking. The goal was to test the hypothesis that these features drive app ranking:
Features:
- Whether the app is universal (iPhone and iPad)
- Minimum age permitted to use the app (age rating)
- Rating count for current version
- Release Date (number of seconds)
- Size of app
- Number of languages supported by app
- Release date of current version
- Total rating count
- Average rating for current version (5 stars, 4 stars, …)
- Average rating total
- Minimum version of iOS supported
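As a rough sketch of how the distinct features above could be pulled from the iTunes Search API, the snippet below builds a search URL and turns one result into a numeric feature vector. The field names are the ones the Search API returns as far as I know, but the mapping from the list above to those fields, the helper names (`build_search_url`, `fetch_results`, `extract_features`), the way the iOS version is turned into a number, and the sample values are all assumptions made for illustration, not necessarily what the original script does. Treating the order of the returned results as the ranking is also an assumption.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://itunes.apple.com/search"


def build_search_url(term, limit=200, country="us"):
    """Build a Search API URL for iOS apps matching `term`."""
    query = urlencode({"term": term, "entity": "software",
                       "country": country, "limit": limit})
    return f"{SEARCH_URL}?{query}"


def fetch_results(term):
    """Download the result list (network call, not run in this sketch)."""
    with urlopen(build_search_url(term)) as resp:
        return json.load(resp)["results"]


def to_seconds(iso_date):
    """Convert a date such as '2016-11-01T00:00:00Z' to Unix seconds."""
    dt = datetime.strptime(iso_date, "%Y-%m-%dT%H:%M:%SZ")
    return dt.replace(tzinfo=timezone.utc).timestamp()


def extract_features(app):
    """Map one result dict to 11 numbers (assumed order: the list above)."""
    # Assumption: keep only major.minor of the iOS version, e.g. "9.3.1" -> 9.3
    min_os = float(".".join(app.get("minimumOsVersion", "0").split(".")[:2]))
    return [
        1.0 if "iosUniversal" in app.get("features", []) else 0.0,
        float(app.get("contentAdvisoryRating", "4+").rstrip("+")),
        float(app.get("userRatingCountForCurrentVersion", 0)),
        to_seconds(app["releaseDate"]),
        float(app.get("fileSizeBytes", 0)),
        float(len(app.get("languageCodesISO2A", []))),
        to_seconds(app["currentVersionReleaseDate"]),
        float(app.get("userRatingCount", 0)),
        float(app.get("averageUserRatingForCurrentVersion", 0)),
        float(app.get("averageUserRating", 0)),
        min_os,
    ]


# Made-up sample record, only to show the shape of the output.
sample = {
    "features": ["iosUniversal"],
    "contentAdvisoryRating": "4+",
    "userRatingCountForCurrentVersion": 12,
    "releaseDate": "2016-01-01T00:00:00Z",
    "fileSizeBytes": "1000",
    "languageCodesISO2A": ["EN", "FR"],
    "currentVersionReleaseDate": "2016-11-01T00:00:00Z",
    "userRatingCount": 340,
    "averageUserRatingForCurrentVersion": 4.5,
    "averageUserRating": 4.0,
    "minimumOsVersion": "8.0",
}
features = extract_features(sample)
print(features)
```

Calling `fetch_results("slideshow")` and mapping `extract_features` over the results would give one row per app, with the position in the result list as the value to explain.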
I then ran the analysis on the 200 apps returned for the search term “slideshow”. Here are the results:
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.815
Model:                            OLS   Adj. R-squared:                  0.804
Method:                 Least Squares   F-statistic:                     75.60
Date:                Sat, 26 Nov 2016   Prob (F-statistic):           4.15e-63
Time:                        23:03:01   Log-Likelihood:                -1064.2
No. Observations:                 200   AIC:                             2150.
Df Residuals:                     189   BIC:                             2187.
Df Model:                          11
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -4.8211      8.774     -0.549      0.583       -22.129    12.487
x2            -1.1637      1.725     -0.675      0.501        -4.566     2.239
x3            -0.0048      0.004     -1.214      0.226        -0.013     0.003
x4         -9.449e-08   7.26e-08     -1.302      0.195     -2.38e-07  4.87e-08
x5         -2.732e-08   6.49e-08     -0.421      0.674     -1.55e-07  1.01e-07
x6            -1.0004      0.644     -1.554      0.122        -2.270     0.269
x7          2.485e-07   7.35e-08      3.378      0.001      1.03e-07  3.94e-07
x8            -0.0009      0.000     -2.464      0.015        -0.002    -0.000
x9            -6.4069      3.369     -1.901      0.059       -13.053     0.240
x10           -5.0835      3.873     -1.313      0.191       -12.724     2.557
x11          -10.5053      3.112     -3.376      0.001       -16.643    -4.367
==============================================================================
Omnibus:                        9.896   Durbin-Watson:                   0.563
Prob(Omnibus):                  0.007   Jarque-Bera (JB):                4.634
Skew:                          -0.094   Prob(JB):                       0.0985
Kurtosis:                       2.279   Cond. No.                     4.92e+09
==============================================================================
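To make the coef, std err, and t columns less of a black box, here is a minimal sketch that computes them by hand with NumPy. The table has the layout of a statsmodels OLS summary, but this is not the original script: the function name `ols_t_stats` is made up, and the data is synthetic, generated only so the example runs. The table reports 11 model degrees of freedom and 189 residual degrees of freedom for 200 observations, which suggests no intercept column was added, so the sketch leaves one out as well.

```python
import numpy as np


def ols_t_stats(X, y):
    """Least-squares fit of y = X @ beta (no intercept).

    Returns (beta, std_err, t), one entry per column of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Residual variance with n - k degrees of freedom ("Df Residuals").
    sigma2 = resid @ resid / (n - k)
    # Covariance matrix of the coefficient estimates.
    cov = sigma2 * np.linalg.inv(X.T @ X)
    std_err = np.sqrt(np.diag(cov))
    # The t statistic is the coefficient divided by its standard error.
    return beta, std_err, beta / std_err


# Synthetic stand-in data (NOT the App Store data): only column 0 drives y.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 5.0 * X[:, 0] + rng.normal(size=n)

beta, std_err, t = ols_t_stats(X, y)
for i in range(X.shape[1]):
    print(f"x{i + 1}: coef={beta[i]:.4f} std err={std_err[i]:.4f} t={t[i]:.3f}")
```

With about 190 residual degrees of freedom, a coefficient is significant at the 5% level when |t| is larger than roughly 1.97, which is what the P>|t| column encodes.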
As the table above shows, most of the features do not pass the t-test. Only four have small p-values: x7, x8, and x11 fall below 0.05, and x9 is borderline at 0.059. These four are:
- Release date of current version
- Total rating count
- Average rating for current version (5 stars, 4 stars, …)
- Minimum version of iOS supported
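The selection step can be reproduced directly from the P>|t| column of the first table. The sketch below filters the reported p-values at two thresholds; the post does not say which significance level was used, so both `alpha` values and the helper name `significant` are assumptions.

```python
# P>|t| values copied from the 11-feature regression table.
p_values = {
    "x1": 0.583, "x2": 0.501, "x3": 0.226, "x4": 0.195,
    "x5": 0.674, "x6": 0.122, "x7": 0.001, "x8": 0.015,
    "x9": 0.059, "x10": 0.191, "x11": 0.001,
}


def significant(p_values, alpha):
    """Return the regressors whose p-value is below alpha."""
    return [name for name, p in p_values.items() if p < alpha]


print(significant(p_values, 0.05))  # ['x7', 'x8', 'x11']
print(significant(p_values, 0.10))  # ['x7', 'x8', 'x9', 'x11']
```

At the usual 5% level only three regressors survive; the fourth one needs the looser 10% level to be kept.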
Running the regression again with only these four features gives:
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.807
Model:                            OLS   Adj. R-squared:                  0.803
Method:                 Least Squares   F-statistic:                     204.4
Date:                Sat, 26 Nov 2016   Prob (F-statistic):           9.47e-69
Time:                        23:07:07   Log-Likelihood:                -1068.5
No. Observations:                 200   AIC:                             2145.
Df Residuals:                     196   BIC:                             2158.
Df Model:                           4
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1          1.516e-07   1.41e-08     10.744      0.000      1.24e-07  1.79e-07
x2            -0.0011      0.000     -3.541      0.000        -0.002    -0.001
x3           -10.3560      2.382     -4.347      0.000       -15.054    -5.658
x4           -12.0864      2.766     -4.370      0.000       -17.541    -6.631
==============================================================================
Omnibus:                       22.168   Durbin-Watson:                   0.479
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                7.048
Skew:                          -0.088   Prob(JB):                       0.0295
Kurtosis:                       2.097   Cond. No.                     1.11e+09
==============================================================================
As the table above shows, all four features are significant for app ranking. That means apps with the following properties show up higher:
- Apps that were updated more recently
- Apps with more ratings across all versions
- Apps rated higher (e.g. 5 stars) for the current version
- Apps whose minimum supported iOS version is more recent
The script is available on GitHub.
Hope that helps 🙂