python - OLS using statsmodels.formula.api versus statsmodels.api


Can anyone explain the difference between ols in statsmodels.formula.api and OLS in statsmodels.api?

Using the Advertising data from the ISLR text, I ran an OLS regression with both and got different results. I then compared them with scikit-learn's LinearRegression.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    import statsmodels.api as sm
    from sklearn.linear_model import LinearRegression

    # Raw string so the backslashes in the Windows path are not read as escapes
    df = pd.read_csv(r"c:\...\advertising.csv")

    x1 = df.loc[:, ['tv']]
    y1 = df.loc[:, ['sales']]

    # Formula interface: an intercept is included automatically
    print("statsmodel.formula.api method")
    model1 = smf.ols(formula='sales ~ tv', data=df).fit()
    print(model1.params)

    # Array interface: x1 is used as-is, with no constant column
    print("\nstatsmodel.api method")
    model2 = sm.OLS(y1, x1)
    results = model2.fit()
    print(results.params)

    # scikit-learn fits an intercept by default
    print("\nsci-kit learn method")
    model3 = LinearRegression()
    model3.fit(x1, y1)
    print(model3.coef_)
    print(model3.intercept_)

The output is as follows:

    statsmodel.formula.api method
    Intercept    7.032594
    tv           0.047537
    dtype: float64

    statsmodel.api method
    tv    0.08325
    dtype: float64

    sci-kit learn method
    [[ 0.04753664]]
    [ 7.03259355]

The statsmodels.api method returns a different parameter for tv than the statsmodels.formula.api and scikit-learn methods do.

What kind of OLS algorithm is statsmodels.api running that would produce a different result? Does anyone have a link to documentation that could help answer this question?

The difference is due to the presence or absence of an intercept:

  • In statsmodels.formula.api, as in the R approach, a constant is automatically added to your data and an intercept is fitted.
  • In statsmodels.api, you have to add the constant yourself (see the statsmodels documentation). Try using add_constant from statsmodels.api:

    x1 = sm.add_constant(x1) 
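
To see why the missing constant changes the slope, here is a minimal NumPy-only sketch. It uses synthetic data made up for illustration (not the Advertising data), and the variable names are hypothetical. It fits the same data twice by least squares: once with a single-column design matrix, which forces the line through the origin, and once with a leading column of ones, which is what add_constant prepends by default.

```python
import numpy as np

# Synthetic data, invented for this illustration only:
# y = 7 + 0.05 * x plus a little Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 300, size=200)
y = 7.0 + 0.05 * x + rng.normal(0, 1.0, size=200)

# Design matrix without a constant: one column, so the fitted line
# is forced through the origin.
X_no_const = x.reshape(-1, 1)
beta_no_const = np.linalg.lstsq(X_no_const, y, rcond=None)[0]

# Design matrix with a leading column of ones, mirroring what
# sm.add_constant does by default.
X_const = np.column_stack([np.ones_like(x), x])
beta_const = np.linalg.lstsq(X_const, y, rcond=None)[0]

print("slope without constant:", beta_no_const[0])
print("intercept with constant:", beta_const[0])
print("slope with constant:", beta_const[1])
```

Without the constant, the single slope has to absorb the offset that the intercept would otherwise carry, so it comes out larger than the true slope. That is the same pattern as the 0.08325 versus 0.047537 in the question.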
