Python: how to add a percentage to grouped data?
I am learning pandas and am struggling with how data is organized in the module.
I have followed the tutorials, but the docs do not seem to cover a basic task: computing the percentage of occurrences of a state ('color') within bins ('site'). The code below shows what I have and what I want to do:
```python
import random

import pandas as pd

# Example of the first few entries generated below:
# [('site2', 'red'), ('site3', 'red'), ('site1', 'yellow'), ...
sites = ['site1', 'site2', 'site3']
colors = ['red', 'blue', 'yellow']

d = []
for i in range(0, 100):
    s = (
        sites[random.randint(0, 2)],
        colors[random.randint(0, 2)],
    )
    d.append(s)

df = pd.DataFrame(d)
df.columns = ['site', 'color']

grouped = df.groupby(['site', 'color'])
p = grouped.size()

# Whole group
print(p)
# Number of instances of 'blue' in 'site2'
print(p['site2']['blue'])
# Total number of instances of 'site2'
print(p['site2'].sum())
```
The output is as expected ("for a given site, show the number of events of a specific color"):
```
site   color 
site1  blue      16
       red       11
       yellow     6
site2  blue       9
       red       12
       yellow    12
site3  blue      11
       red        7
       yellow    16
dtype: int64
9
33
```
What I am trying to achieve is to generate a new column in the grouped data holding the percentage of a given color for a given site. In practical terms, for the example above:
```
site1  blue    16  48.5
       red     11  33.3
       yellow   6  18.2
site2  blue     9  27.3
(...)
```
I have the numbers needed for the calculation (the last two outputs in the example), but I do not know how to loop through the groups and add the calculated percentages.
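For reference, the loop described above can be written directly. This is a minimal sketch on a small hypothetical fixed dataset (not the random one from the question): it iterates over the per-site groups of the size Series and divides each count by that site's total. It works, but the vectorized division in the answer below is the idiomatic pandas route.

```python
import pandas as pd

# Small, fixed dataset (hypothetical, for illustration) instead of random data.
df = pd.DataFrame({
    "site": ["site1", "site1", "site1", "site2", "site2", "site3"],
    "color": ["blue", "blue", "red", "red", "yellow", "blue"],
})

p = df.groupby(["site", "color"]).size()

# Explicit loop: for each site, divide every color count by that site's total.
percentages = {}
for _, counts in p.groupby(level="site"):
    total = counts.sum()
    # Each sub-Series keeps the full (site, color) index.
    for (site, color), n in counts.items():
        percentages[(site, color)] = n / total * 100

# A dict with tuple keys becomes a Series with a MultiIndex.
pct = pd.Series(percentages)
pct.index.names = ["site", "color"]
print(pct)
```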
p = grouped.size() is of type Series. Can it somehow be enriched with the calculated percentages?
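One way to "enrich" the result is to turn the Series into a DataFrame and add the percentage as a second column. A minimal sketch, assuming a small hypothetical fixed dataset rather than the random one above; `groupby(level="site")[...].transform("sum")` puts each site's total on every (site, color) row, so the division lines up row for row.

```python
import pandas as pd

# Hypothetical fixed data standing in for the random sample in the question.
df = pd.DataFrame({
    "site": ["site1", "site1", "site1", "site2", "site2", "site2", "site2"],
    "color": ["blue", "blue", "red", "blue", "red", "red", "yellow"],
})

# Turn the Series of group sizes into a DataFrame with a named 'count' column.
counts = df.groupby(["site", "color"]).size().to_frame("count")

# Per-row site total, aligned to the (site, color) index.
site_totals = counts.groupby(level="site")["count"].transform("sum")

# Add the percentage as a new column next to the counts.
counts["percent"] = (counts["count"] / site_totals * 100).round(1)
print(counts)
```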
This can be calculated by dividing the group sizes by their sum over the first level of the index:
```
In [38]: grouped.size() / grouped.size().groupby(level=0).sum() * 100  # older pandas: .sum(level=0)
Out[38]:
site   color 
site1  blue      25.714286
       red       45.714286
       yellow    28.571429
site2  blue      32.432432
       red       43.243243
       yellow    24.324324
site3  blue      32.142857
       red       39.285714
       yellow    28.571429
dtype: float64
```
Of course, the output above differs from yours because the input values are random.
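To get the same sample on every run (for example, to compare results with someone else), the generator can be seeded. A minimal sketch; the seed value 0 is an arbitrary choice, and `random.choice` is swapped in for the `sites[random.randint(0, 2)]` indexing, which draws uniformly in the same way but consumes the generator differently, so the sample will not match the original code's for the same seed.

```python
import random

import pandas as pd

random.seed(0)  # arbitrary fixed seed: the same "random" sample on every run

sites = ["site1", "site2", "site3"]
colors = ["red", "blue", "yellow"]

# random.choice picks one element uniformly, like sites[random.randint(0, 2)].
d = [(random.choice(sites), random.choice(colors)) for _ in range(100)]
df = pd.DataFrame(d, columns=["site", "color"])

p = df.groupby(["site", "color"]).size()
pct = p / p.groupby(level="site").sum() * 100
print(pct)
```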
Edit:
It is more readable to pass the name of the level you wish to sum by:
```
In [46]: grouped.size() / grouped.size().groupby(level='site').sum() * 100  # older pandas: .sum(level='site')
Out[46]:
site   color 
site1  blue      25.714286
       red       45.714286
       yellow    28.571429
site2  blue      32.432432
       red       43.243243
       yellow    24.324324
site3  blue      32.142857
       red       39.285714
       yellow    28.571429
dtype: float64
```
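pandas can also produce the shares without a separate division. `SeriesGroupBy.value_counts(normalize=True)` divides each count by its group's total, and `pd.crosstab(..., normalize="index")` gives the same figures in a wide site-by-color table. A minimal sketch on hypothetical fixed data; `value_counts` orders colors by frequency within each site, so `sort_index()` restores the (site, color) order.

```python
import pandas as pd

# Hypothetical fixed data standing in for the random sample.
df = pd.DataFrame({
    "site": ["site1", "site1", "site1", "site2", "site2"],
    "color": ["blue", "blue", "red", "red", "yellow"],
})

# Long layout: share of each color within its site, as a percentage.
pct = df.groupby("site")["color"].value_counts(normalize=True).mul(100).sort_index()
print(pct)

# Wide layout: one row per site, one column per color (missing combinations become 0).
wide = pd.crosstab(df["site"], df["color"], normalize="index") * 100
print(wide)
```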