App Store Statistical Analysis

I have always wondered how Apple orders apps when you search for a keyword. It makes sense that Apple would rank apps by downloads or revenue, but is there anything else?

I was playing around with the iTunes API and wrote a small tool with it (as mentioned here). This time I built some features (in the machine-learning sense) from the API data and ran t-tests to see which of them are correlated with ranking. The goal was to test the hypothesis that these features drive app ranking:

 

Features:

  • Whether the app is universal (iPhone and iPad)
  • Minimum age permitted to use the app (rating)
  • Rating count for the current version
  • Release date (number of seconds)
  • Size of the app
  • Number of languages supported by the app
  • Release date of the current version
  • Total rating count
  • Average rating for the current version (5 stars, 4 stars, …)
  • Average rating across all versions
  • Minimum version of iOS supported
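
As a rough sketch (not the script linked at the end of this post), one search result could be turned into a numeric vector with one value per distinct feature in the list above. The JSON field names are the ones the iTunes Search API is understood to return for software results; treat them, and the helper names `epoch_seconds`, `major_minor` and `app_features`, as assumptions made for illustration.

```python
from datetime import datetime, timezone


def epoch_seconds(iso_utc):
    """Convert a timestamp like '2016-11-26T08:00:00Z' to seconds since 1970."""
    dt = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ")
    return dt.replace(tzinfo=timezone.utc).timestamp()


def major_minor(version):
    """'9.3.1' -> 9.3 (keeps only the first two version components)."""
    return float(".".join(version.split(".")[:2]))


def app_features(app):
    """Build the feature vector for one app dict (field names are assumed)."""
    return [
        1.0 if "iosUniversal" in app.get("features", []) else 0.0,  # universal app
        float(app["contentAdvisoryRating"].rstrip("+")),            # minimum age, e.g. '4+'
        float(app.get("userRatingCountForCurrentVersion", 0)),      # rating count, current version
        epoch_seconds(app["releaseDate"]),                          # release date in seconds
        float(app["fileSizeBytes"]),                                # size of the app
        float(len(app.get("languageCodesISO2A", []))),              # number of languages
        epoch_seconds(app["currentVersionReleaseDate"]),            # current version release date
        float(app.get("userRatingCount", 0)),                       # total rating count
        float(app.get("averageUserRatingForCurrentVersion", 0.0)),  # average rating, current version
        float(app.get("averageUserRating", 0.0)),                   # average rating, all versions
        major_minor(app["minimumOsVersion"]),                       # minimum iOS version
    ]
```

The search results themselves would come from a GET request to `https://itunes.apple.com/search` with parameters along the lines of `term=slideshow`, `entity=software` and `limit=200`; stacking `app_features(app)` for each result gives the feature matrix.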

 

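Then I fit an ordinary least squares (OLS) regression of ranking on these features, using the data for the 200 apps returned by a search for “slideshow”, and looked at the t-test for each coefficient. Here are the results: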

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.815
Model:                            OLS   Adj. R-squared:                  0.804
Method:                 Least Squares   F-statistic:                     75.60
Date:                Sat, 26 Nov 2016   Prob (F-statistic):           4.15e-63
Time:                        23:03:01   Log-Likelihood:                -1064.2
No. Observations:                 200   AIC:                             2150.
Df Residuals:                     189   BIC:                             2187.
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1            -4.8211      8.774     -0.549      0.583       -22.129    12.487
x2            -1.1637      1.725     -0.675      0.501        -4.566     2.239
x3            -0.0048      0.004     -1.214      0.226        -0.013     0.003
x4         -9.449e-08   7.26e-08     -1.302      0.195     -2.38e-07  4.87e-08
x5         -2.732e-08   6.49e-08     -0.421      0.674     -1.55e-07  1.01e-07
x6            -1.0004      0.644     -1.554      0.122        -2.270     0.269
x7          2.485e-07   7.35e-08      3.378      0.001      1.03e-07  3.94e-07
x8            -0.0009      0.000     -2.464      0.015        -0.002    -0.000
x9            -6.4069      3.369     -1.901      0.059       -13.053     0.240
x10           -5.0835      3.873     -1.313      0.191       -12.724     2.557
x11          -10.5053      3.112     -3.376      0.001       -16.643    -4.367
==============================================================================
Omnibus:                        9.896   Durbin-Watson:                   0.563
Prob(Omnibus):                  0.007   Jarque-Bera (JB):                4.634
Skew:                          -0.094   Prob(JB):                       0.0985
Kurtosis:                       2.279   Cond. No.                     4.92e+09
==============================================================================
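
To show where the `t` column comes from, here is a minimal NumPy sketch of the calculation behind a table like this one: least-squares coefficients, their standard errors, and t = coef / std err. It is not the script used for this post, and the function name `ols_t_stats` is made up for illustration. Like the summary above, which has no `const` row and 200 − 11 = 189 residual degrees of freedom, it does not add an intercept column.

```python
import numpy as np


def ols_t_stats(X, y):
    """Return (coef, std_err, t) for an OLS fit of y on X, without an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y                      # least-squares coefficients
    resid = y - X @ coef
    sigma2 = (resid @ resid) / (n - k)            # residual variance, df = n - k
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))  # standard error per coefficient
    return coef, std_err, coef / std_err


# Synthetic demo data (not the App Store data): y depends on the first column only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(size=200)
coef, std_err, t = ols_t_stats(X, y)
print(np.round(t, 2))
```

At this sample size, |t| above roughly 2 corresponds to p < 0.05. The large condition number in the summary hints that the features sit on very different scales (epoch seconds next to star ratings), so standardizing the columns first would probably make the matrix inversion more stable.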

As the P>|t| column above shows, most of the features do not pass the t-test. Only four do, three with p < 0.05 (x7, x8, x11) and one borderline at p = 0.059 (x9):

  • Release date of current version
  • Total rating count
  • Average rating for current version (5 stars, 4 stars, …)
  • Minimum version of iOS supported
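
The selection step can be sketched as a simple filter on the P>|t| column. The p-values below are copied from the first summary; the thresholds 0.05 and 0.10 are conventional choices assumed here, since the post does not state which cutoff was used.

```python
# p-values copied from the P>|t| column of the first regression summary
P_VALUES = {
    "x1": 0.583, "x2": 0.501, "x3": 0.226, "x4": 0.195, "x5": 0.674,
    "x6": 0.122, "x7": 0.001, "x8": 0.015, "x9": 0.059, "x10": 0.191,
    "x11": 0.001,
}


def significant(p_values, alpha):
    """Names of coefficients whose p-value is below the chosen threshold."""
    return [name for name, p in p_values.items() if p < alpha]


print(significant(P_VALUES, 0.05))  # ['x7', 'x8', 'x11']
print(significant(P_VALUES, 0.10))  # ['x7', 'x8', 'x9', 'x11']
```

At the 0.05 level three coefficients pass; relaxing the threshold to 0.10 gives the four features kept for the second regression.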

Re-running the regression with only these four features gives:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.807
Model:                            OLS   Adj. R-squared:                  0.803
Method:                 Least Squares   F-statistic:                     204.4
Date:                Sat, 26 Nov 2016   Prob (F-statistic):           9.47e-69
Time:                        23:07:07   Log-Likelihood:                -1068.5
No. Observations:                 200   AIC:                             2145.
Df Residuals:                     196   BIC:                             2158.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1          1.516e-07   1.41e-08     10.744      0.000      1.24e-07  1.79e-07
x2            -0.0011      0.000     -3.541      0.000        -0.002    -0.001
x3           -10.3560      2.382     -4.347      0.000       -15.054    -5.658
x4           -12.0864      2.766     -4.370      0.000       -17.541    -6.631
==============================================================================
Omnibus:                       22.168   Durbin-Watson:                   0.479
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                7.048
Skew:                          -0.088   Prob(JB):                       0.0295
Kurtosis:                       2.097   Cond. No.                     1.11e+09
==============================================================================

 

As you can see in the table above, all four features are significant for app ranking. That suggests that, for this search at least, an app with the following properties tends to show up higher:

  • It has the most recent update
  • It has the most ratings across all versions
  • It is rated higher (e.g. 5 stars) for the current version
  • It supports the latest iOS

 

The script is available on GitHub.

Hope that helps 🙂
