performance - Python: Efficient split column in pandas DF
Suppose I have a DataFrame that contains a column of the form

0    a.1
1    a.2
2    b.3
3    4.c
and suppose I want to split the column on '.' and keep the element after the '.'. The naive way is:

for i in range(len(tbl)):
    tbl['column_name'].iloc[i] = tbl['column_name'].iloc[i].split('.', 1)[1]
This works. However, it's slow for large tables. Does anyone have an idea how to speed up this process? I can use new columns in the DataFrame and am not restricted to changing the source column (as I reuse it in the example). Thanks!
pandas has string methods that do such things efficiently without loops (which kill performance). In this case, you can use .str.split:
>>> import pandas as pd
>>> df = pd.DataFrame({'a': ['a.1', 'a.2', 'b.3', 'c.4']})
>>> df
     a
0  a.1
1  a.2
2  b.3
3  c.4
>>> df.a.str.split('.').apply(pd.Series)
   0  1
0  a  1
1  a  2
2  b  3
3  c  4
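If only the part after the '.' is needed, as in the question, a variation on the same idea might look like the sketch below. It assumes the question's `tbl` and `column_name` placeholders; the new column names `after_dot` and `before_dot` are made up for illustration. It splits on the first '.' only (mirroring the `split('.', 1)` in the loop) and picks the second piece with `.str[1]`.

```python
import pandas as pd

# Hypothetical table mirroring the question's data; 'column_name' is the
# question's placeholder, not a real column from the asker's table.
tbl = pd.DataFrame({"column_name": ["a.1", "a.2", "b.3", "4.c"]})

# Split on the first '.' only and keep the piece after it.
# No Python-level loop over rows is needed.
tbl["after_dot"] = tbl["column_name"].str.split(".", n=1).str[1]

# Alternative: expand=True returns both pieces as separate columns.
parts = tbl["column_name"].str.split(".", n=1, expand=True)
parts.columns = ["before_dot", "after_dot"]  # illustrative names

print(tbl)
print(parts)
```

Writing to a new column also avoids the chained `tbl['column_name'].iloc[i] = ...` assignment used in the loop. Rows that contain no '.' should come out as missing values in `after_dot` rather than raising an error, so it may be worth checking for those if the data is not uniform.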