Python - DataFrame or sqlctx (SQLContext) generated "Trying to call a package" error
I am using Spark 1.3.1. In PySpark, I have created a DataFrame from an RDD and registered the schema:

    datalen = sqlctx.createDataFrame(myrdd, ["id", "size"])
    datalen.registerTempTable("tbl")
At this point everything is fine and I can run "select" queries against "tbl", for example "select size from tbl where id='abc'".
Then, in a Python function, I define:

    def getsize(id):
        total = sqlctx.sql("select size from tbl where id='" + id + "'")
        return total.take(1)[0].size
At this point there is still no problem; I can call getsize("ab") and it returns a value.
The problem occurs when I invoke getsize from within an RDD. I have an RDD named data of (key, value) pairs, and when I do

    data.map(lambda x: (x[0], getsize("ab")))
this generates the error:

    py4j.protocol.Py4JError: Trying to call a package
any idea?
Spark doesn't support nested actions or transformations, and the SQLContext is not accessible outside the driver, so what you're doing here cannot work. It is not entirely clear what you want, but most likely a simple join, either on RDDs or DataFrames, should do the trick.
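A minimal sketch of both workarounds, assuming the Spark 1.3-era names from the question (`sqlctx`, `datalen`, `data`); since no live SparkContext is available here, the Spark calls are shown as comments and the driver-side dict lookup is simulated in plain Python so the logic is checkable:

```python
# Why the original fails: getsize() closes over sqlctx, and Spark ships that
# closure to the executors, where the SQLContext (a driver-only Py4J gateway
# object) does not exist -- hence "Trying to call a package".
#
# Workaround 1: collect the lookup table to the driver once, then map with a
# plain dict (fine when tbl is small):
#
#   rows = sqlctx.sql("select id, size from tbl").collect()   # driver-side
#   sizes = {r.id: r.size for r in rows}
#   data.map(lambda x: (x[0], sizes.get(x[0])))
#
# Workaround 2: express the lookup as a join, entirely inside Spark:
#
#   sizes_rdd = datalen.rdd.map(lambda r: (r.id, r.size))
#   data.join(sizes_rdd)          # yields (key, (value, size)) pairs
#
# Pure-Python simulation of workaround 1 with made-up sample data:
sizes = {"abc": 10, "ab": 7}                  # stands in for the collected tbl
data = [("abc", "v1"), ("ab", "v2")]          # stands in for the (key, value) RDD
result = [(k, sizes.get(k)) for k, _ in data]
```

The dict in workaround 1 is captured by value in the lambda, so each executor gets its own copy; no Py4J object crosses the driver/executor boundary.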