python - OLS using statsmodels.formula.api versus statsmodels.api
Can anyone explain to me the difference between ols in statsmodels.formula.api and OLS in statsmodels.api?
Using the Advertising data from the ISLR text, I ran an OLS regression with both and got different results. I then compared them with scikit-learn's LinearRegression.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    import statsmodels.api as sm
    from sklearn.linear_model import LinearRegression

    # Raw string so the backslashes in the (elided) Windows path are not treated as escapes
    df = pd.read_csv(r"c:\...\advertising.csv")

    x1 = df.loc[:, ['tv']]
    y1 = df.loc[:, ['sales']]

    # Fit via the formula interface
    print("statsmodel.formula.api method")
    model1 = smf.ols(formula='sales ~ tv', data=df).fit()
    print(model1.params)

    # Fit via the array interface
    print("\nstatsmodel.api method")
    model2 = sm.OLS(y1, x1)
    results = model2.fit()
    print(results.params)

    # Fit via scikit-learn
    print("\nsci-kit learn method")
    model3 = LinearRegression()
    model3.fit(x1, y1)
    print(model3.coef_)
    print(model3.intercept_)
The output is as follows:
    statsmodel.formula.api method
    Intercept    7.032594
    tv           0.047537
    dtype: float64

    statsmodel.api method
    tv    0.08325
    dtype: float64

    sci-kit learn method
    [[ 0.04753664]]
    [ 7.03259355]
The statsmodels.api method returns a different coefficient for tv than the statsmodels.formula.api and scikit-learn methods do.
What kind of OLS algorithm is statsmodels.api running that would produce a different result? Does anyone have a link to documentation that could help answer this question?
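The difference is due to the presence or absence of an intercept:
- In statsmodels.formula.api, as in the R approach, a constant is automatically added to your data and an intercept is fitted.
- In statsmodels.api, you have to add the constant yourself (see the statsmodels documentation). Without it, the model is fitted through the origin, which is why the tv coefficient comes out different.
Try using add_constant from statsmodels.api:

    x1 = sm.add_constant(x1)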