Python - pandas DataFrame: check specific columns for the same values
Is there a way to check and sum specific DataFrame columns for the same values?
For example, in the following DataFrame:

    column name
    1, 2, 3, 4, 5
    -------------
    a, g, h, t, j
    b, a, o, a, g
    c, j, w, e, q
    d, b, d, q, i

when comparing columns 1 and 2, the number of values that are the same is 2 (a and b).
Thanks
You can use isin and sum to achieve this:
    In [96]:
    import pandas as pd
    import io
    t="""1, 2, 3, 4, 5
    a, g, h, t, j
    b, a, o, a, g
    c, j, w, e, q
    d, b, d, q, i"""
    # a regex separator needs the python parsing engine
    df = pd.read_csv(io.StringIO(t), sep=r',\s+', engine='python')
    df
    Out[96]:
       1  2  3  4  5
    0  a  g  h  t  j
    1  b  a  o  a  g
    2  c  j  w  e  q
    3  d  b  d  q  i

    In [100]:
    df['1'].isin(df['2']).sum()
    Out[100]:
    2
isin produces a boolean Series, and calling sum on a boolean Series converts True and False to 1 and 0 respectively:
    In [101]:
    df['1'].isin(df['2'])
    Out[101]:
    0     True
    1     True
    2    False
    3    False
    Name: 1, dtype: bool
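One point worth spelling out: isin tests membership anywhere in the other column, not equality at the same row position. The sketch below rebuilds the example data by hand and contrasts the two; `count_shared` is a hypothetical helper name introduced only for illustration, not something from pandas.

```python
import pandas as pd

# Same data as the example above, built directly instead of parsed from text.
df = pd.DataFrame({
    "1": ["a", "b", "c", "d"],
    "2": ["g", "a", "j", "b"],
    "3": ["h", "o", "w", "d"],
    "4": ["t", "a", "e", "q"],
    "5": ["j", "g", "q", "i"],
})


def count_shared(frame, left, right):
    """Hypothetical helper: count values of `left` that occur anywhere in `right`."""
    return int(frame[left].isin(frame[right]).sum())


# Membership count, as in the answer: 'a' and 'b' both occur in column '2'.
shared_1_2 = count_shared(df, "1", "2")

# Row-by-row equality is a different question: no row has the same
# value in columns '1' and '2'.
same_position = int((df["1"] == df["2"]).sum())

# The distinct shared values themselves, ignoring duplicates and positions.
distinct_shared = set(df["1"]) & set(df["2"])

print(shared_1_2, same_position, sorted(distinct_shared))
```

If column 1 contained repeated values, the isin count would include each repeat, while the set intersection would count each distinct value once.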
Edit
To check and count the number of values that are present in all the columns of interest, the following should work. Note that for your dataset there are no values that are present in all columns:
    In [123]:
    df.loc[:, :'4'].apply(lambda x: x.isin(df['1'])).all(axis=1).sum()
    Out[123]:
    0

Breaking the above down to show what each step is doing:

    In [124]:
    df.loc[:, :'4'].apply(lambda x: x.isin(df['1']))
    Out[124]:
           1      2      3      4
    0   True  False  False  False
    1   True   True  False   True
    2   True  False  False  False
    3   True   True   True  False

    In [125]:
    df.loc[:, :'4'].apply(lambda x: x.isin(df['1'])).all(axis=1)
    Out[125]:
    0    False
    1    False
    2    False
    3    False
    dtype: bool
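The steps above test, row by row, whether each cell belongs to column 1. A slightly different reading of the question is "how many values of column 1 occur somewhere in every other column of interest". The following is a minimal sketch of that alternative, assuming the same example data; the names `others`, `per_column`, and `in_all` are illustrative only.

```python
import pandas as pd

# Same data as the example above, built directly instead of parsed from text.
df = pd.DataFrame({
    "1": ["a", "b", "c", "d"],
    "2": ["g", "a", "j", "b"],
    "3": ["h", "o", "w", "d"],
    "4": ["t", "a", "e", "q"],
    "5": ["j", "g", "q", "i"],
})

# Columns to compare against column '1' (an assumption for this sketch).
others = ["2", "3", "4"]

# How many values of column '1' occur in each other column, taken one at a time.
per_column = {col: int(df["1"].isin(df[col]).sum()) for col in others}

# Start with every row marked True, then keep only the values of
# column '1' that are found in each of the other columns in turn.
in_all = pd.Series(True, index=df.index)
for col in others:
    in_all &= df["1"].isin(df[col])

in_all_count = int(in_all.sum())

print(per_column, in_all_count)
```

For this dataset both readings agree that no value is present in all of the columns, even though column 1 shares at least one value with each of them individually.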