python - DataFrame query via sqlctx (SQLContext) generates "Trying to call a package" error
I am using Spark 1.3.1. In PySpark, I have created a DataFrame from an RDD and registered its schema:

datalen = sqlctx.createDataFrame(myrdd, ["id", "size"])
datalen.registerTempTable("tbl")

At this point everything is fine and I can run "select" queries against "tbl", for example "select size from tbl where id='abc'".
Then, in a Python function, I define:

def getsize(id):
    total = sqlctx.sql("select size from tbl where id='" + id + "'")
    return total.take(1)[0].size

At this point there is still no problem: I can call getsize("ab") and it returns a value.
The problem occurs when I invoke getsize within an RDD. I have an RDD named data consisting of (key, value) pairs, and when I do

data.map(lambda x: (x[0], getsize("ab")))

the generated error is
py4j.protocol.Py4JError: Trying to call a package
Any idea?
Spark doesn't support nested actions or transformations, and the SQLContext is not accessible outside the driver, so what you're doing here cannot work. It is not entirely clear what you want, but it looks like a simple join, either on RDDs or on DataFrames, should do the trick.