join - How do I remove rows of an RDD whose key is not in another RDD?


Let's say I have a pair RDD, students (id, name). I would like to keep only the rows whose id is in another RDD, activestudents (id).

The solution I have is to create a pair RDD from activestudents, (id, id), and then join it with students.
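For reference, here is a minimal sketch of that join approach, assuming the PySpark RDD API (the question does not name a language). The function name `keep_active_by_join` and the parameter name `active_students` are hypothetical; `students` is assumed to be an RDD of (id, name) pairs and `active_students` an RDD of bare ids.

```python
def keep_active_by_join(students, active_students):
    """Keep only the (id, name) pairs whose id appears in active_students.

    students:        RDD of (id, name) pairs
    active_students: RDD of ids (assumed here to contain no duplicates)
    """
    # Turn each id into an (id, id) pair so it can be joined on the key.
    active_pairs = active_students.map(lambda sid: (sid, sid))
    # An inner join drops students with no matching key. Each joined row is
    # (id, (name, id)), so keep only the name from the value tuple.
    return students.join(active_pairs).mapValues(lambda pair: pair[0])
```

If the id RDD might contain duplicates, calling `.distinct()` on it first would avoid getting one output row per duplicate match.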

Is there a more elegant way of doing this?

That's a pretty good solution to start with. If activestudents is small enough, you can collect the ids into a map and then filter by id presence (this avoids having to do a shuffle).
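A minimal sketch of the collect-and-filter idea, again assuming the PySpark RDD API. The function name `keep_active_by_filter` is hypothetical, `sc` is assumed to be an existing SparkContext, and two details are my own additions rather than part of the answer: a set is used in place of a map, and the set is wrapped in a broadcast variable.

```python
def keep_active_by_filter(sc, students, active_students):
    """Keep only the (id, name) pairs whose id appears in active_students.

    sc:              an existing SparkContext (assumed)
    students:        RDD of (id, name) pairs
    active_students: RDD of ids, assumed small enough to fit on the driver
    """
    # Pull the small id list to the driver and build a set for fast lookups.
    active_ids = set(active_students.collect())
    # Broadcasting is optional; it ships one read-only copy to each executor
    # instead of serializing the set with every task.
    active_bc = sc.broadcast(active_ids)
    # filter is a narrow transformation, so no shuffle is needed.
    return students.filter(lambda kv: kv[0] in active_bc.value)
```

This only makes sense while the id set fits comfortably in memory on the driver and on each executor; beyond that, the join is the safer choice.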

