multithreading - Scala parallel frequency calculation using aggregate doesn't work -
i'm learning scala working exercises book "scala impatient". please see following question , answer , code. i'd know if answer correct. code doesn't work (all frequencies 1). where's bug?
q10: harry hacker reads file string , wants use parallel collection update letter frequencies concurrently on portions of string. uses following code:
val frequencies = new scala.collection.mutable.hashmap[char, int] (c <- str.par) frequencies(c) = frequencies.getorelse(c, 0) + 1why terrible idea? how can parallelize computation?
my answer: not idea because if 2 threads concurrently updating same frequency, result undefined.
my code:
def parfrequency(str: string) = {   str.par.aggregate(map[char, int]())((m, c) => { m + (c -> (m.getorelse(c, 0) + 1)) }, _ ++ _) } unit test:
"method parfrequency" should "return frequency of each character in string" in {   val freq = parfrequency("harry hacker")    freq should have size 8    freq('h') should be(2) // fails   freq('a') should be(2)   freq('r') should be(3)   freq('y') should be(1)   freq(' ') should be(1)   freq('c') should be(1)   freq('k') should be(1)   freq('e') should be(1) } edit: after reading this thread, updated code. test works if ran alone, fails if ran suite.
def parfrequency(str: string) = {   val freq = immutablehashmap[char, int]()   str.par.aggregate(freq)((_, c) => immutablehashmap(c -> 1), (m1, m2) => m1.merged(m2)({       case ((k, v1), (_, v2)) => (k, v1 + v2)   })) } edit 2: see solution below.
++  not combine values of identical keys. when merge maps, (for shared keys) 1 of values (which in case 1), not sum of values. 
this works:
def parfrequency(str: string) = {   str.par.aggregate(map[char, int]())((m, c) => { m + (c -> (m.getorelse(c, 0) + 1)) },   (a,b) => b.foldleft(a){case (acc, (k,v))=> acc updated (k, acc.getorelse(k,0) + v) }) }   val freq = parfrequency("harry hacker") //> map(e -> 1, y -> 1, -> 2,   -> 1, c -> 1, h -> 2, r -> 3, k -> 1) the foldleft iterates on 1 of maps, updating other map key/values found.
Comments
Post a Comment