join - How do I remove rows of an RDD whose key is not in another RDD?


Let's say I have a pair RDD, students, of (id, name). I want to keep only the rows whose id appears in another RDD, activeStudents, of (id).

The solution I have is to create a pair RDD from activeStudents, of (id, id), and join it with students.
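A minimal sketch of that join, assuming PySpark and RDDs shaped as described. The function name `keep_active_by_join` and the snake_case parameter names are made up for illustration, and the ids are paired with `None` rather than with themselves since the second value is never used:

```python
def keep_active_by_join(students, active_students):
    """Return the (id, name) pairs of `students` whose id is in `active_students`.

    `students` is an RDD of (id, name); `active_students` is an RDD of bare ids.
    """
    # Turn each bare id into an (id, None) pair so it can take part in a join.
    # Assumption: ids in active_students are unique; a duplicated id would
    # duplicate the matching student row in the result.
    keyed = active_students.map(lambda student_id: (student_id, None))
    # The inner join yields (id, (name, None)); keep only the name.
    return students.join(keyed).mapValues(lambda pair: pair[0])
```

With a SparkContext `sc`, calling `keep_active_by_join(sc.parallelize([(1, "Ann"), (2, "Bob"), (3, "Cy")]), sc.parallelize([1, 3])).collect()` should return the pairs for ids 1 and 3, in no guaranteed order.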

Is there a more elegant way of doing this?

That's a pretty good solution to start with. If the set of active students is small enough, you can collect the ids into a map and then filter students by id presence (this avoids having to do a shuffle).
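A rough sketch of the collect-and-filter idea, assuming PySpark. The function name `keep_active_by_broadcast` is hypothetical, a set stands in for the map since only membership is tested, and wrapping the collected ids in a broadcast variable is an addition here, not something the answer spells out:

```python
def keep_active_by_broadcast(sc, students, active_students):
    """Return the (id, name) pairs of `students` whose id is in `active_students`.

    `sc` is the SparkContext; `students` is an RDD of (id, name);
    `active_students` is an RDD of bare ids.
    """
    # Assumption: the active ids fit in memory on the driver and on every executor.
    # collect() pulls them to the driver; broadcast() ships one read-only copy
    # to each executor instead of one copy per task.
    active_ids = sc.broadcast(set(active_students.collect()))
    # filter is a narrow transformation: each partition is checked locally,
    # so students is never shuffled.
    return students.filter(lambda kv: kv[0] in active_ids.value)
```

If the id set is too large to collect, the join remains the safer choice.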

