Sorting - Python Spark: sort elements based on value

May 15, 2014

I am new to Python and Spark and I need some help. Thanks in advance!

So here it goes. I have this piece of script:

from datetime import datetime
from pyspark import SparkContext

def getnormalizeddate(dateofcl):
        # the result is in [0, 1]
        dot = datetime.now()
        od = datetime.strptime("jan 01 2010", "%b %d %Y")

        return (float((dateofcl - od).days) / float((dot - od).days))

def addition(a, b):
        a1 = a
        b1 = b
        # normalize any operand that is still a datetime
        if not isinstance(a, float):
                a1 = getnormalizeddate(a)
        if not isinstance(b, float):
                b1 = getnormalizeddate(b)

        return float(a1 + b1)

def debugfunction(x):
        print("x[0]: " + str(type(x[0])))
        print("x[1]: " + str(type(x[1])) + " --> " + str(x[1]))
        return x[1]


if __name__ == '__main__':
        sc = SparkContext("local", "file scores")

        textfile = sc.textFile("/data/spark/file.csv")
        # print("number of lines: " + str(textfile.count()))

        test1 = textfile.map(lambda line: line.split(";"))
        # result of this:
        # [u'01', u'01', u'add', u'filename', u'path', u'1', u'info', u'info2', u'info3', u'sep 24 2014']

        test2 = test1.map(lambda line: (line[3], datetime.strptime(line[len(line)-1], "%b %d %Y")))

        test6 = test2.reduceByKey(addition)
        # print(test6)
        test6.persist()

        result = sorted(test6.collect(), key=debugfunction)

This ends with the following error:

Traceback (most recent call last):
  File "/data/spark/script.py", line 40, in <module>
    result=sorted(test6.collect(), key=lambda x:x[1])
TypeError: can't compare datetime.datetime to float
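The error means that at least one value in the collected list is still a datetime while the others are floats. A minimal pure-Python reproduction, using made-up sample data (an assumption, not the real output of test6.collect()), shows the failure and a quick way to list the offending keys:

```python
from datetime import datetime

# made-up (file, value) pairs: one value was never converted to a float
collected = [
    ("file1", 0.95),
    ("file4", datetime(2014, 9, 24)),
    ("file7", 0.98),
]

# keys whose value is not a float, i.e. the ones sorted() cannot compare
bad = [key for key, value in collected if not isinstance(value, float)]
print(bad)  # ['file4']

try:
    result = sorted(collected, key=lambda kv: kv[1])
except TypeError as exc:
    # Python 3 words the message differently from Python 2, but it is the same error
    result = None
    print("TypeError:", exc)
```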

For info, test6.collect() gives the following content:

[(u'file1', 0.95606060606060606),
 (u'file2', 0.91515151515151516),
 (u'file3', 0.8797979797979798),
 (u'file4', 0.0),
 (u'file5', 0.94696969696969702),
 (u'file6', 0.95606060606060606),
 (u'file7', 0.98131313131313136),
 (u'file8', 0.86161616161616161)]

I want to sort based on the float value (not the key). How should I proceed?

Thank you, guys.

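For those who might be interested, I found the problem. I was reducing by key first and only converting the dates to floats inside addition, while combining the items contained in the list of values. Some of the files are unique, so they are not affected by the reduction: reduceByKey never calls addition for a key that has a single value, so those entries still hold a date instead of a float.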
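The behavior described above can be reproduced without Spark. This is a minimal sketch assuming a single partition; `reduce_by_key` is a hypothetical helper that only emulates the relevant part of `RDD.reduceByKey`, and the fixed `NOW` date is an assumption that replaces `datetime.now()` so the run is reproducible:

```python
from collections import defaultdict
from datetime import datetime
from functools import reduce

BASE = datetime(2010, 1, 1)
NOW = datetime(2015, 1, 1)  # fixed "now" (assumption) so the run is reproducible


def normalize(d):
    # same formula as getnormalizeddate, with a fixed end date instead of now()
    return (d - BASE).days / (NOW - BASE).days


def addition(a, b):
    # mirrors the question's function: normalize datetime operands, then add
    if not isinstance(a, float):
        a = normalize(a)
    if not isinstance(b, float):
        b = normalize(b)
    return a + b


def reduce_by_key(pairs, func):
    """Hypothetical single-partition stand-in for RDD.reduceByKey."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # reduce() on a one-element list returns that element without calling func,
    # so keys that occur only once keep their original (datetime) value
    return {key: reduce(func, values) for key, values in groups.items()}


pairs = [
    ("file1", datetime(2014, 9, 24)),
    ("file1", datetime(2013, 1, 1)),
    ("file2", datetime(2014, 5, 15)),  # unique key: addition is never applied
]
reduced = reduce_by_key(pairs, addition)
print({key: type(value).__name__ for key, value in reduced.items()})
# {'file1': 'float', 'file2': 'datetime'}
```

Spark's real reduceByKey behaves the same way for a unique key: the first value for each key is kept as-is and the function is only applied when two values have to be merged.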

What I did instead:

test2 = test1.map(lambda line: (line[3], line[len(line)-1])).mapValues(lambda d: getnormalizeddate(datetime.strptime(d, "%b %d %Y")))

That makes pairs of (file, float).
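Here is a minimal pure-Python sketch of that map step, runnable without Spark. The sample lines are made up in the question's ";"-separated layout, and the fixed `NOW` date is an assumption standing in for `datetime.now()`:

```python
from datetime import datetime

BASE = datetime(2010, 1, 1)  # same origin as getnormalizeddate
NOW = datetime(2015, 1, 1)   # fixed "now" (assumption) so results are reproducible


def normalized(date_string):
    # parse the last CSV field, then map it into [0, 1] relative to BASE..NOW
    d = datetime.strptime(date_string, "%b %d %Y")
    return (d - BASE).days / (NOW - BASE).days


# made-up sample lines in the question's layout
lines = [
    "01;01;add;file1;path;1;info;info2;info3;Sep 24 2014",
    "01;02;add;file2;path;1;info;info2;info3;Jan 01 2010",
]

# plays the role of textfile.map(split) followed by the (file, float) map step
pairs = [(f[3], normalized(f[-1])) for f in (line.split(";") for line in lines)]
print(pairs)
```

Because every value is a float before any reduction happens, it no longer matters whether a key is unique.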

Only then do I reduce by key.

Finally, this step

result = sorted(test6.collect(), key=lambda x: x[1])

gives me the sorting I was looking for.
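Using the (file, score) pairs from the question as sample input, sorting by the value is a plain `sorted` call on the collected list; `operator.itemgetter(1)` is equivalent to `lambda x: x[1]`:

```python
from operator import itemgetter

# the pairs shown in the question (u'' prefixes dropped)
collected = [
    ("file1", 0.95606060606060606),
    ("file2", 0.91515151515151516),
    ("file3", 0.8797979797979798),
    ("file4", 0.0),
    ("file5", 0.94696969696969702),
    ("file6", 0.95606060606060606),
    ("file7", 0.98131313131313136),
    ("file8", 0.86161616161616161),
]

# sort on the float value, not the key; ties (file1/file6) keep their input order
ascending = sorted(collected, key=itemgetter(1))
descending = sorted(collected, key=itemgetter(1), reverse=True)

print([name for name, _ in ascending])
# ['file4', 'file8', 'file3', 'file2', 'file5', 'file1', 'file6', 'file7']
```

This collects everything to the driver first, which is fine for small results. For larger data, PySpark's `RDD.sortBy(lambda kv: kv[1])` (Spark 1.1 and later) should sort on the cluster instead, and `takeOrdered(n, key=lambda kv: -kv[1])` returns only the top n entries.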

I hope this helps!
