numpy - Python Pandas DataFrame If Index Contains Any String Values, Apply Label, Else Apply Different Label -

August 15, 2010

i have dataframe outputs following table. note 'search term' index.

search term                 impressions clicks  cost     converted clicks american brewing            286446      104862  8034.18  6831 american brewing supplies   165235      64764   3916.48  4106 brewing supplies            123598      8131    6941.87  278 wine bottles                272969      7438    4944.7   194 www americanbrewing com     2782        1163    227.17   120 home brewing                216138      3744    3468.24  110 wine making                 147985      6602    5024.54  108

if 'search term' (index) contains 'american brewing' or 'americanbrewing', apply label 'brand', else apply 'non-brand' column header label.

search term                 impressions clicks  cost     converted clicks    label american brewing            286446      104862  8034.18  6831                brand american brewing supplies   165235      64764   3916.48  4106                brand brewing supplies            123598      8131    6941.87  278                 non-brand wine bottles                272969      7438    4944.7   194                 non-brand www americanbrewing com     2782        1163    227.17   120                 brand home brewing                216138      3744    3468.24  110                 non-brand wine making                 147985      6602    5024.54  108                 non-brand

i have seen many examples on stackoverflow this:

df['label'] = df[df['somecolumn'].str.contains('american brewing|americanbrewing')]

but doesn't work because 'somecolumn' df.index , when try like:

df['label'] = df[df.index.str.contains('american brewing|americanbrewing')]

i error attributeerror: 'index' object has no attribute 'str'

i saw examples of using np.where looks promising still run same problem because 'search term' not column, it's index.

df['label'] = np.where(df['search term'].str.contains('american brewing|americanbrewing', 'brand', 'non-brand')

here's full code:

import pandas pd import numpy np  brand_terms = ['american brewing', 'americanbrewing']  data = pd.read_csv(r'sqr.csv', encoding='cp1252')  df = pd.dataframe(data) df['search term'] = df['search term'].replace(r'[^\w&\' ]', '', regex=true) df['cost'] = df['cost'].replace(r'[^\d\.]', '', regex=true).astype('float') #print(df.dtypes) grouped = df.groupby('search term') result = grouped[['impressions', 'clicks', 'cost', 'converted clicks']].sum() result = result.sort(['converted clicks','cost'], ascending=false)  #this doesn't work result['label'] = result.where(result['search term'].str.contains('|'.join(brand_terms), 'brand', 'non-brand'))  result.to_csv('sqr_aggregate.csv')

how output label column in result dataframe based on if search term (index) contains of several possible string values? true, apply brand, else, apply non-brand label column.

if don't want reset index, here's 1 way it.

you convert index series , apply transformations.

in [16]: np.where(pd.series(df.index).str.contains('american brewing|americanbrewing'),                   'brand', 'non-brand') out[16]: array(['brand', 'brand', 'non-brand', 'non-brand', 'brand', 'non-brand',        'non-brand'],       dtype='|s9')

Search This Blog

Script

numpy - Python Pandas DataFrame If Index Contains Any String Values, Apply Label, Else Apply Different Label -

Comments

Post a Comment

Popular posts from this blog

javascript - Bootstrap Popover: iOS Safari strange behaviour -

Magento/PHP - Get phones on all members in a customer group -

spring cloud - How to configure SpringCloud Eureka instance to point to https on non standard port -