python - Is Pandas 0.16.1 groupby().apply() method applying function more than once to the same group?
I have noticed that, in some cases with Pandas 0.16.1, the function passed to apply() on a groupby() is applied more than once to one or more of the output groups. Here is a reproduction:
    In [1]: from pandas import DataFrame, Series
            df2 = DataFrame({"a": ["alpha", "alpha", "alpha", "beta", "beta", "beta", "beta", "gamma"]})
            df2["b"] = Series([i for i in range(0, len(df2))])
            df2
    Out[1]:
           a  b
    0  alpha  0
    1  alpha  1
    2  alpha  2
    3   beta  3
    4   beta  4
    5   beta  5
    6   beta  6
    7  gamma  7

    In [2]: def my_func(df):
                print(df.index)

    In [3]: df2.groupby("a").apply(my_func)
    Out[3]:
    Int64Index([0, 1, 2], dtype='int64')
    Int64Index([0, 1, 2], dtype='int64')
    Int64Index([3, 4, 5, 6], dtype='int64')
    Int64Index([7], dtype='int64')

Notice that the [0, 1, 2] index appears twice in the output. This seems to indicate that the function is applied to the alpha group twice.
This is not a huge issue, since it's good practice for these functions to be idempotent in the first place. However, if the functions are costly in terms of runtime (think big regression runs, etc.), it can be more of a problem.
Am I using the API incorrectly and/or misinterpreting the output, or is there a possible issue here?
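According to the doc (http://pandas.pydata.org/pandas-docs/dev/generated/pandas.dataframe.apply.html):
In the current implementation, apply calls func twice on the first column/row to decide whether it can take a fast or slow code path.
The same appears to hold for groupby().apply(), where the function is called twice on the first group, which would explain why the alpha index is printed twice. So you are not misusing the API; the extra call only matters if the function has side effects or is expensive.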
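If the extra call is a concern because the function is costly, one possible workaround is to iterate over the groups yourself, since iterating over a groupby object yields each (name, group) pair exactly once. The following is only a minimal sketch: it assumes pandas is imported as pd, the calls list is a hypothetical helper used to record invocations, and my_func returns a simple sum purely for illustration.

```python
import pandas as pd

df2 = pd.DataFrame(
    {"a": ["alpha", "alpha", "alpha", "beta", "beta", "beta", "beta", "gamma"]}
)
df2["b"] = range(len(df2))

# Hypothetical bookkeeping: record which index each call received.
calls = []


def my_func(group):
    """Stand-in for an expensive function; here it just sums column b."""
    calls.append(group.index.tolist())
    return int(group["b"].sum())


# Iterating over the groupby yields each group once, so my_func
# runs exactly once per group regardless of apply()'s internals.
results = {name: my_func(group) for name, group in df2.groupby("a")}

print(calls)
print(results)
```

The trade-off is that you have to assemble the result yourself (here as a plain dict) instead of getting the combined object that apply() would build.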