numpy - Python Pandas DataFrame If Index Contains Any String Values, Apply Label, Else Apply Different Label -
i have dataframe outputs following table. note 'search term' index.
search term impressions clicks cost converted clicks american brewing 286446 104862 8034.18 6831 american brewing supplies 165235 64764 3916.48 4106 brewing supplies 123598 8131 6941.87 278 wine bottles 272969 7438 4944.7 194 www americanbrewing com 2782 1163 227.17 120 home brewing 216138 3744 3468.24 110 wine making 147985 6602 5024.54 108
if 'search term' (index) contains 'american brewing'
or 'americanbrewing'
, apply label 'brand'
, else apply 'non-brand'
column header label
.
search term impressions clicks cost converted clicks label american brewing 286446 104862 8034.18 6831 brand american brewing supplies 165235 64764 3916.48 4106 brand brewing supplies 123598 8131 6941.87 278 non-brand wine bottles 272969 7438 4944.7 194 non-brand www americanbrewing com 2782 1163 227.17 120 brand home brewing 216138 3744 3468.24 110 non-brand wine making 147985 6602 5024.54 108 non-brand
i have seen many examples on stackoverflow this:
df['label'] = df[df['somecolumn'].str.contains('american brewing|americanbrewing')]
but doesn't work because 'somecolumn'
df.index
, when try like:
df['label'] = df[df.index.str.contains('american brewing|americanbrewing')]
i error attributeerror: 'index' object has no attribute 'str'
i saw examples of using np.where
looks promising still run same problem because 'search term'
not column, it's index
.
df['label'] = np.where(df['search term'].str.contains('american brewing|americanbrewing', 'brand', 'non-brand')
here's full code:
import pandas pd import numpy np brand_terms = ['american brewing', 'americanbrewing'] data = pd.read_csv(r'sqr.csv', encoding='cp1252') df = pd.dataframe(data) df['search term'] = df['search term'].replace(r'[^\w&\' ]', '', regex=true) df['cost'] = df['cost'].replace(r'[^\d\.]', '', regex=true).astype('float') #print(df.dtypes) grouped = df.groupby('search term') result = grouped[['impressions', 'clicks', 'cost', 'converted clicks']].sum() result = result.sort(['converted clicks','cost'], ascending=false) #this doesn't work result['label'] = result.where(result['search term'].str.contains('|'.join(brand_terms), 'brand', 'non-brand')) result.to_csv('sqr_aggregate.csv')
how output label
column in result
dataframe
based on if search term
(index) contains of several possible string values? true
, apply brand
, else, apply non-brand
label
column.
if don't want reset index, here's 1 way it.
you convert index
series
, apply transformations.
in [16]: np.where(pd.series(df.index).str.contains('american brewing|americanbrewing'), 'brand', 'non-brand') out[16]: array(['brand', 'brand', 'non-brand', 'non-brand', 'brand', 'non-brand', 'non-brand'], dtype='|s9')
Comments
Post a Comment