Python - Read specific lines in a text file based on a condition
Problem statement:
I have the file below.
name | date | count
john | 201406 | 1
john | 201410 | 2
mary | 201409 | 180
mary | 201410 | 154
mary | 201411 | 157
mary | 201412 | 153
mary | 201501 | 223
mary | 201502 | 166
mary | 201503 | 163
mary | 201504 | 169
mary | 201505 | 157
tara | 201505 | 2

The file shows count data for 3 people (john, mary and tara) over a couple of months. I want to analyze the data and come up with a status tag for each person, i.e. active, inactive or new.
A person is active if they have an entry for 201505 and for other previous months (e.g. mary).
A person is inactive if they do not have an entry for 201505 (e.g. john).
A person is new if they have only 1 entry, and it is for 201505 (e.g. tara).
Furthermore, if a person is active, I need the median of their last 5 counts. For example, for mary, the mean would be ((157 + 169 + 163 + 166 + 223) / 5).
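Note that the statement above asks for a median but writes the example as a mean, and the two give different numbers for mary. The snippet below is only a small sketch to make the difference concrete. The helper names `mean` and `median` are my own, and the code avoids the `statistics` module because that is not available in Python 2.7.

```python
# Mary's five most recent counts from the sample data (201501 to 201505).
last_five = [223, 166, 163, 169, 157]

def mean(values):
    # Arithmetic mean; float() avoids integer division on Python 2.7.
    return sum(values) / float(len(values))

def median(values):
    # Middle value of the sorted list (average of the two middle values
    # when the list has an even number of items).
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2.0

mary_mean = mean(last_five)
mary_median = median(last_five)
print(mary_mean)    # 175.6
print(mary_median)  # 166.0
```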
Question:
I am trying to understand how I should read the file in Python 2.7 in order to fulfill these requirements. I started with the following, but I am not sure how to get the previous entries (i.e. the previous lines in the file) for a particular person.
    for line in data:
        col = line.split('\t')
        name = col[0]
        date = col[1]
        count = col[2]
Answer:

    import pandas as pd

    # Assumes you have a CSV-format file with columns name, date, count
    df = pd.read_csv('input_csv.csv')

    names = {}
    for name, subdf in df.groupby('name'):
        if name not in names:
            names[name] = {}
        if (subdf['date'] == 201505).any():
            if subdf['count'].count() == 1:
                names[name]['status'] = 'new'
            else:
                names[name]['status'] = 'active'
                # tail() returns the last 5 rows by default
                names[name]['last5median'] = subdf['count'].tail().median()
        else:
            names[name]['status'] = 'inactive'

    >>> names
    {'john': {'status': 'inactive'}, 'mary': {'last5median': 166.0, 'status': 'active'}, 'tara': {'status': 'new'}}
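If pandas is not an option, the same idea can probably be done with the standard library alone: collect every (date, count) pair per name in a dictionary first, then classify each person once the whole file has been read. This is only a sketch under a few assumptions of mine: the file is delimited with `|` as displayed in the question, the latest month (`LATEST`) is known in advance, and the names `classify`, `median` and `SAMPLE` are made up for illustration. It should run on both Python 2.7 and 3.

```python
from collections import defaultdict

# Sample rows from the question; assumed to be pipe-delimited with a header.
SAMPLE = """name | date | count
john | 201406 | 1
john | 201410 | 2
mary | 201409 | 180
mary | 201410 | 154
mary | 201411 | 157
mary | 201412 | 153
mary | 201501 | 223
mary | 201502 | 166
mary | 201503 | 163
mary | 201504 | 169
mary | 201505 | 157
tara | 201505 | 2
"""

LATEST = '201505'  # assumption: the most recent month is known up front

def median(values):
    # Middle value of the sorted list (mean of the two middle values if even).
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2.0

def classify(lines, latest=LATEST):
    # Map each name to its list of (date, count) tuples.
    entries = defaultdict(list)
    for line in lines:
        cols = [c.strip() for c in line.split('|')]
        if len(cols) != 3 or cols[0] == 'name':
            continue  # skip blank or malformed lines and the header
        name, date, count = cols
        entries[name].append((date, int(count)))

    result = {}
    for name, rows in entries.items():
        dates = [d for d, _ in rows]
        if latest not in dates:
            result[name] = {'status': 'inactive'}
        elif len(rows) == 1:
            result[name] = {'status': 'new'}
        else:
            rows.sort()  # YYYYMM strings sort chronologically
            last5 = [c for _, c in rows[-5:]]
            result[name] = {'status': 'active', 'last5median': median(last5)}
    return result

statuses = classify(SAMPLE.splitlines())
for name in sorted(statuses):
    print('%s: %s' % (name, statuses[name]))
```

To use it on a real file, something like `classify(open('input.txt'))` should work, where `input.txt` is a placeholder path. Reading the whole file before classifying is what gives access to a person's earlier lines, which was the sticking point in the original loop.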