Python Pandas 0.13: normalizing dates and selecting times from a DataFrame with certain attributes
I'm using the pandas library that came with Anaconda, with Python 2.7.9.
My question is two-fold.
I have several data sets that have a date and a time field. Unfortunately, the instrument that created them did not label the dates consistently: they are in dd/mm/yyyy format, but the instrument seemingly at random left off the leading zeroes of the month and day on half of the dates. pandas has had trouble reading them correctly (from Excel files). Since the dataset starts on April 10th, it keeps starting at 2014-10-04, has unconverted dates in between (whenever the day goes above 12), and then reads them as yyyy-mm-dd again whenever that interpretation is possible for the input date. Is there a way to force pandas to read these dates correctly, then concatenate the date and time fields and use the result as the index, instead of assigning row numbers? I tried to create and insert a converter function for the date field to format the dates correctly, but for some reason it was applied after pandas had already read the date incorrectly, so the dates still came out wrong.
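A minimal sketch of one way to do this, assuming the date column reaches pandas as plain text (for example a CSV export read with `dtype=str`); the column names `date`, `time` and `value` and the sample rows are hypothetical. Instead of a converter, it joins the two text fields and parses them with an explicit day-first format, so pandas never has to guess the month/day order. If the Excel cells were already stored as real dates with day and month swapped, the damage may have happened before pandas reads them, and this sketch would not undo it.

```python
import pandas as pd

# Hypothetical sample: day-first dates, some missing leading zeros,
# held as plain strings (e.g. read_csv(..., dtype=str)).
raw = pd.DataFrame({
    "date": ["10/04/2014", "10/4/2014", "13/4/2014", "1/05/2014"],
    "time": ["09:00", "09:10", "09:20", "09:30"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# An explicit day-first format removes the month/day ambiguity;
# strptime-style %d and %m also accept values without a leading zero.
stamp = pd.to_datetime(raw["date"] + " " + raw["time"], format="%d/%m/%Y %H:%M")

# Use the combined timestamp as the index instead of the default 0..n-1 numbering.
data = raw.drop(["date", "time"], axis=1)
data.index = pd.DatetimeIndex(stamp, name="timestamp")
print(data)
```

The explicit `format` is the key design choice here: without it, pandas infers the order per value or from the first value, which is how April 10th can end up as October 4th.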
Since I want to index these data as a time series, I've been creating a date/time range and setting it as the DataFrame's index, which worked fine. Except that in one data set there are 2 days of data where the instrument apparently started taking data at a frequency of one sample per minute instead of one sample every 10 minutes. Is there a way to assign the index and force it to keep only the matching records? Failing that, I've been attempting to query the DataFrame for rows whose minute ends in 0, or to delete the other records, with no success at all. I have no idea what to do here.
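A sketch of the "keep only the matching records" idea, assuming each row already carries its real timestamp as the index (for example from combining the date and time fields) rather than having a `date_range` of the same length assigned positionally, since positional assignment is exactly what goes wrong once extra 1-minute rows exist. The sample data is hypothetical. `reindex` onto the intended 10-minute grid keeps only rows whose timestamps fall on the grid.

```python
import pandas as pd

# Hypothetical data: regular 10-minute samples plus a burst of 1-minute samples.
idx = pd.DatetimeIndex(["2014-04-10 00:00", "2014-04-10 00:10",
                        "2014-04-10 00:11", "2014-04-10 00:12",
                        "2014-04-10 00:20", "2014-04-10 00:30"])
data = pd.DataFrame({"value": [1.0, 2.0, 2.1, 2.2, 3.0, 4.0]}, index=idx)

# The regular grid the series is supposed to follow.
grid = pd.date_range(data.index[0], data.index[-1], freq="10min")

# reindex keeps only rows whose timestamp is on the grid: the extra
# 1-minute rows are dropped, and any grid point with no sample becomes NaN.
regular = data.reindex(grid)
print(regular)
```

Note that `reindex` needs the existing index to be free of duplicate timestamps.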
Among other things, I've tried:
In [168]: ddata = ddata[str(ddata[' time'])[:5].endswith('0')]
Traceback (most recent call last):
  File "c:\users\tom\anaconda\lib\site-packages\ipython\core\interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-156-098b3e02871f>", line 1, in <module>
    ddata = ddata[str(ddata[' time'])[:5].endswith('0')]
  File "c:\users\tom\anaconda\lib\site-packages\pandas\core\frame.py", line 1678, in __getitem__
    return self._getitem_column(key)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\core\frame.py", line 1685, in _getitem_column
    return self._get_item_cache(key)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\core\generic.py", line 1052, in _get_item_cache
    values = self._data.get(item)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\core\internals.py", line 2565, in
    loc = self.items.get_loc(item)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\core\index.py", line 1181, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "index.pyx", line 129, in pandas.index.IndexEngine.get_loc (pandas\index.c:3656)
  File "index.pyx", line 149, in pandas.index.IndexEngine.get_loc (pandas\index.c:3534)
  File "hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11911)
  File "hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11864)
KeyError: False

In [169]: ddata1 = ddata.query('time[4] == 0')
Traceback (most recent call last):
  File "c:\users\tom\anaconda\lib\site-packages\ipython\core\interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-166-48cd98cf78bd>", line 1, in <module>
    ddata1 = ddata.query('time[4] == 0')
  File "c:\users\tom\anaconda\lib\site-packages\pandas\core\frame.py", line 1816, in query
    res = self.eval(expr, **kwargs)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\core\frame.py", line 1868, in eval
    return _eval(expr, **kwargs)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\eval.py", line 235, in eval
    ret = eng_inst.evaluate()
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\engines.py", line 69, in evaluate
    self.result_type, self.aligned_axes = _align(self.expr.terms)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\align.py", line 136, in _align
    typ, axes = _align_core(terms)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\align.py", line 54, in wrapper
    return _result_type_many(*term_values), None
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\common.py", line 17, in _result_type_many
    return np.result_type(*arrays_and_dtypes)
TypeError: data type not understood

In [170]: ddata1 = ddata.query('str(time)[4] == 0')
Traceback (most recent call last):
  File "c:\users\tom\anaconda\lib\site-packages\ipython\core\interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-167-452d91f45daf>", line 1, in <module>
    ddata1 = ddata.query('str(time)[4] == 0')
  File "c:\users\tom\anaconda\lib\site-packages\pandas\core\frame.py", line 1816, in query
    res = self.eval(expr, **kwargs)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\core\frame.py", line 1868, in eval
    return _eval(expr, **kwargs)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\eval.py", line 230, in eval
    truediv=truediv)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 635, in __init__
    self.terms = self.parse()
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 652, in parse
    return self._visitor.visit(self.expr)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 314, in visit
    return visitor(node, **kwargs)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 320, in visit_module
    return self.visit(expr, **kwargs)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 314, in visit
    return visitor(node, **kwargs)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 323, in visit_expr
    return self.visit(node.value, **kwargs)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 314, in visit
    return visitor(node, **kwargs)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 560, in visit_compare
    return self.visit(binop)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 314, in visit
    return visitor(node, **kwargs)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 404, in visit_binop
    op, op_class, left, right = self._possibly_transform_eq_ne(node)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 355, in _possibly_transform_eq_ne
    left = self.visit(node.left, side='left')
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 314, in visit
    return visitor(node, **kwargs)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 440, in visit_subscript
    value = self.visit(node.value)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 314, in visit
    return visitor(node, **kwargs)
  File "c:\users\tom\anaconda\lib\site-packages\pandas\computation\expr.py", line 205, in f
    "implemented".format(node_name))
NotImplementedError: 'Call' nodes are not implemented
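For reference, a minimal sketch of why the first attempt fails and a vectorized alternative, assuming the `' time'` column holds zero-padded `"HH:MM:SS"` strings (the sample frame is hypothetical). `str()` applied to a whole Series returns one string, its printed representation, so `.endswith('0')` yields a single bool, and `ddata[False]` is then treated as a lookup for a column named `False`, hence `KeyError: False`. The `query()` attempts run into a related limit: `query` parses its own expression language, and (at least in the version shown) it rejects function calls such as `str(...)`, which is what the `'Call' nodes` error reports.

```python
import pandas as pd

# Hypothetical frame mirroring the question: a ' time' column (note the
# leading space) holding zero-padded "HH:MM:SS" strings.
ddata = pd.DataFrame({" time": ["09:00:00", "09:01:00", "09:10:00", "09:11:00"],
                      "value": [1, 2, 3, 4]})

# str() of a whole Series is ONE string (its repr), so this is a single
# plain bool, not a per-row mask; ddata[single] would raise KeyError: False.
single = str(ddata[" time"])[:5].endswith("0")

# Vectorized version via the .str accessor: character 4 of "HH:MM:SS"
# is the last digit of the minutes, so this keeps every 10th minute.
mask = ddata[" time"].astype(str).str[4] == "0"
every_ten = ddata[mask]
print(every_ten)
```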
I tried it on the CSV you linked, and it seems to work for me:

df.date = pd.to_datetime(df.date)
df.date.head()

Out[972]:
0   2014-05-31
1   2014-05-31
2   2014-05-31
3   2014-05-31
4   2014-05-31
Name: date, dtype: datetime64[ns]

For the second part of your question, you can slice the DataFrame like this:
df[df.time.map(lambda x: x.minute % 10 == 0)]
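The slice above relies on each `time` value having a `.minute` attribute, i.e. holding `datetime.time` or timestamp objects rather than strings. A small sketch under that assumption, with hypothetical data; it also shows the vectorized equivalent once the timestamps have been made the index (a `DatetimeIndex` exposes `.minute` directly). The date string `"2014-04-10 "` used to build the index is an arbitrary placeholder.

```python
import datetime as dt

import pandas as pd

# Hypothetical data: a 'time' column of datetime.time objects, with a burst
# of 1-minute samples between regular 10-minute ones.
df = pd.DataFrame({
    "time": [dt.time(9, 0), dt.time(9, 1), dt.time(9, 2), dt.time(9, 10), dt.time(9, 20)],
    "value": [1.0, 1.1, 1.2, 2.0, 3.0],
})

# The answer's approach: test each value's minute attribute.
by_column = df[df.time.map(lambda x: x.minute % 10 == 0)]

# Same test when the timestamps are the index (placeholder date prepended).
ts = df.set_index(pd.to_datetime("2014-04-10 " + df.time.astype(str)))
by_index = ts[ts.index.minute % 10 == 0]
print(by_index)
```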