python - Pandas set value in dataframe, multithreading - loc vs. set_value
I'm seeing strange behavior when updating a DataFrame in a multi-threaded environment. I'm updating it cell by cell, using a lock so that only one process accesses the DataFrame at a time. It's part of a large application, but in a nutshell this is what's going on, where df is a DataFrame within a large class (self):
    def update_data(self, idx):
        # Set column 'a' to the mean of columns 'b' and 'c' for this row.
        self.update_cell(idx, 'a', 0.5*(self.df.at[idx,'b']+self.df.at[idx,'c']))
        print(self.df.at[idx,'a'])
        print(self.df.loc[idx,'a'])

    def update_cell(self, idx, col, value):
        self.lock.acquire()
        # Only one of the three versions below is used at a time.
        # version 1:
        self.df.loc[idx,col] = value
        # version 2:
        self.df.at[idx,col] = value
        # version 3:
        self.df.set_value(idx,col,value)
        self.lock.release()
Now, no matter which version I use, the first print statement works and gives the right value. The second print statement fails (returns pandas.np.nan) in all cases except version 1. It looks like only version 1 really updates the DataFrame.
Any thoughts? Thanks.
Answering my own question: using pandas 0.17, this is no longer an issue and things work as expected. I believe I was using pandas 0.14 at the time of posting the initial question. FWIW, .at is markedly faster than .loc, which ends up being a significant improvement.
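For anyone who wants to check this on their own pandas version, here is a minimal sketch of the same pattern. The class name `Table`, the `worker` function, the row count, and the thread count are all made up for illustration and are not from the original application. Two assumptions differ from the question: the reads also happen under the lock (hence the `RLock`), and only `.at` and `.loc` are compared, because `set_value` was removed in pandas 1.0.

```python
import threading
import timeit

import numpy as np
import pandas as pd


class Table:
    """Hypothetical stand-in for the large class in the question."""

    def __init__(self, n_rows):
        # RLock so update_data can hold the lock while calling update_cell.
        self.lock = threading.RLock()
        self.df = pd.DataFrame({
            'a': np.full(n_rows, np.nan),
            'b': np.arange(n_rows, dtype=float),
            'c': np.ones(n_rows),
        })

    def update_cell(self, idx, col, value):
        # The with-block releases the lock even if the assignment raises.
        with self.lock:
            self.df.at[idx, col] = value

    def update_data(self, idx):
        # Assumption of this sketch: read and write under the same lock.
        with self.lock:
            value = 0.5 * (self.df.at[idx, 'b'] + self.df.at[idx, 'c'])
            self.update_cell(idx, 'a', value)


N_ROWS = 200
N_THREADS = 4
table = Table(N_ROWS)


def worker(rows):
    for idx in rows:
        table.update_data(idx)


# Each thread updates its own interleaved subset of the rows.
threads = [
    threading.Thread(target=worker, args=(range(k, N_ROWS, N_THREADS),))
    for k in range(N_THREADS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Read every cell back through both accessors and compare.
at_values = [table.df.at[i, 'a'] for i in table.df.index]
loc_values = [table.df.loc[i, 'a'] for i in table.df.index]
accessors_agree = at_values == loc_values
any_missing = bool(table.df['a'].isna().any())
print('at and loc agree:', accessors_agree, '| NaN left:', any_missing)

# Rough scalar-read timing; absolute numbers depend on machine and version.
t_at = timeit.timeit(lambda: table.df.at[0, 'a'], number=2000)
t_loc = timeit.timeit(lambda: table.df.loc[0, 'a'], number=2000)
print('at: %.4fs  loc: %.4fs' % (t_at, t_loc))
```

If the two accessors ever disagree, or NaN values remain after the threads have joined, that would point to the behavior described in the question. The timing lines only give a rough feel for the .at versus .loc difference mentioned above.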