python - How to add a percentage to grouped data?


I am learning pandas and am struggling with how data is organized in the module.

I followed a tutorial and the docs to handle a basic task: computing the percentage of occurrences of a state ('color') within bins ('site'). The code below clarifies what I have and what I want to do:

    import pandas as pd
    import random

    # example of the first few entries generated below:
    # [('site2', 'red'), ('site3', 'red'), ('site1', 'yellow'), ...
    sites = ['site1', 'site2', 'site3']
    colors = ['red', 'blue', 'yellow']
    d = []
    for i in range(0, 100):
        s = (
            sites[random.randint(0, 2)],
            colors[random.randint(0, 2)],
        )
        d.append(s)

    df = pd.DataFrame(d)
    df.columns = ['site', 'color']

    grouped = df.groupby(['site', 'color'])
    p = grouped.size()

    # the whole group
    print(p)
    # number of instances of 'blue' in 'site2'
    print(p['site2']['blue'])
    # total number of instances in 'site2'
    print(p['site2'].sum())

The output is as expected ("for a given site, show the number of events of a specific color"):

    site   color
    site1  blue      16
           red       11
           yellow     6
    site2  blue       9
           red       12
           yellow    12
    site3  blue      11
           red        7
           yellow    16
    dtype: int64
    9
    33

What I am trying to achieve is to generate a new column in the grouped data with the percentage of a given color for a given site. In practical terms, for the example above:

    site1  blue      16 48.4
           red       11 33.3
           yellow     6 18.2
    site2  blue       9 27.3
    (...)

I have the numbers needed for the calculation (the last two outputs of the example), but I do not know how to loop through the group to add the calculated percentages.

p = grouped.size() is of type Series. Can I somehow enrich it with the calculated percentages?

This can be calculated by dividing the size by its sum over the first level of the index:

    In [38]: grouped.size() / grouped.size().sum(level=0) * 100
    Out[38]:
    site   color
    site1  blue      25.714286
           red       45.714286
           yellow    28.571429
    site2  blue      32.432432
           red       43.243243
           yellow    24.324324
    site3  blue      32.142857
           red       39.285714
           yellow    28.571429
    dtype: float64

Of course, the output above is different from yours because the input values are random.
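In recent pandas releases the `level` argument of `Series.sum` is no longer accepted (as far as I can tell it was deprecated in 1.3 and removed in 2.0), so the expression above may raise a `TypeError` there. Below is a minimal sketch of the same calculation for a newer pandas. It assumes the counts printed in the question rather than random data, and the names `counts`, `site_totals` and `percent` are only illustrative:

```python
import pandas as pd

# Counts copied from the question's printed output, so the result is reproducible.
counts = pd.Series(
    [16, 11, 6, 9, 12, 12, 11, 7, 16],
    index=pd.MultiIndex.from_product(
        [['site1', 'site2', 'site3'], ['blue', 'red', 'yellow']],
        names=['site', 'color'],
    ),
)

# Total per site, broadcast back onto every (site, color) row.
site_totals = counts.groupby(level='site').transform('sum')

# Share of each color within its site, in percent.
percent = counts / site_totals * 100
print(percent.round(1))
```

Using `transform('sum')` keeps the original two-level index, so the division lines up row by row without relying on index alignment between a MultiIndex and a single-level index.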

Edit

It's more readable to pass the name of the level you wish to sum by:

    In [46]: grouped.size() / grouped.size().sum(level='site') * 100
    Out[46]:
    site   color
    site1  blue      25.714286
           red       45.714286
           yellow    28.571429
    site2  blue      32.432432
           red       43.243243
           yellow    24.324324
    site3  blue      32.142857
           red       39.285714
           yellow    28.571429
    dtype: float64
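The question asks for the percentage as a new column next to the count, not as a separate Series. One possible way to get that layout is sketched below. It assumes a subset of the counts from the question (site1 and site2 only), and the names `rows`, `table`, `count` and `percent` are hypothetical:

```python
import pandas as pd

# Raw rows rebuilt from the counts in the question, so the result is reproducible.
rows = (
    [('site1', 'blue')] * 16 + [('site1', 'red')] * 11 + [('site1', 'yellow')] * 6
    + [('site2', 'blue')] * 9 + [('site2', 'red')] * 12 + [('site2', 'yellow')] * 12
)
df = pd.DataFrame(rows, columns=['site', 'color'])

# One row per (site, color), with the group size as a regular column.
table = df.groupby(['site', 'color']).size().to_frame('count')

# Divide each count by its site's total and attach the result as a new column.
site_totals = table.groupby(level='site')['count'].transform('sum')
table['percent'] = table['count'] / site_totals * 100
print(table.round(1))
```

Converting the Series to a DataFrame with `to_frame` is what makes room for the second column; the percentage itself is computed the same way as in the answer, by dividing by the per-site total.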
