Google Cloud Dataflow: Can't parse proto using TextIO.Read


Here's my code:

PCollection<MyProto> pCollection = p.apply(
    TextIO.Read.from("gs://my_bucket/*")
        .withCoder(Proto2Coder.of(MyProto.class)));

But it fails with this error:

Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).

The file, when downloaded locally, parses fine.

I've tried the same thing using StringUtf8Coder and ByteArrayCoder, no dice.

Any help? Should I not be using TextIO? What other options do I have?

TextIO splits the file into lines and applies the coder to each line. Naturally, this doesn't work for formats that are not line-based. I suppose your files contain a single serialized proto each, correct? In that case you have 2 options:

  • Create your own Source and Reader classes (see the generic documentation on creating sources and sinks) by subclassing FileBasedSource.
  • Treat the act of processing the files as a ParDo: create an in-memory PCollection containing the filenames to process (using Create.of()), pipe it through a ParDo that takes a filename and parses that file as a protobuf, and pipe the result into the rest of your pipeline.
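To see why line splitting breaks a binary proto, here is a small plain-Java illustration. It is only a sketch: `LineSplitDemo` and `splitOnNewline` are made-up names, and the splitting is a simplified stand-in for what a line-based reader does, not TextIO's actual implementation. The sample bytes are a hand-encoded message with field 1 (length-delimited) set to "hi"; the tag byte for that field is (1 << 3) | 2 = 0x0A, which is the same byte as '\n'.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

class LineSplitDemo {
    // Splits raw bytes on '\n' (0x0A), roughly what a line-based reader does.
    static List<byte[]> splitOnNewline(byte[] data) {
        List<byte[]> chunks = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (byte b : data) {
            if (b == '\n') {
                // The newline ends the current chunk and is itself dropped.
                chunks.add(current.toByteArray());
                current.reset();
            } else {
                current.write(b);
            }
        }
        if (current.size() > 0) {
            chunks.add(current.toByteArray());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Hand-encoded protobuf bytes: tag 0x0A, length 2, then "hi".
        byte[] message = {0x0A, 0x02, 'h', 'i'};
        List<byte[]> chunks = splitOnNewline(message);
        System.out.println("chunks: " + chunks.size());
        for (byte[] chunk : chunks) {
            System.out.println("  length " + chunk.length);
        }
    }
}
```

The four-byte message comes back as two chunks, an empty one and a three-byte one, and neither is the original message. The second chunk now starts with 0x02, which a protobuf parser would read as a tag with field number 0, the kind of input that leads to an "invalid tag (zero)" error.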

Of the two options, the second is easier, but the first will work better if you have a lot of files.
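The shape of the second option can be sketched in plain Java, without the Dataflow SDK, so treat it as an outline of the idea rather than pipeline code. All names here (`WholeFileParse`, `parseWholeFile`, `parseAll`, `writeTempFile`) are hypothetical. The list of filenames plays the role of the PCollection built with Create.of(), `parseWholeFile` plays the role of the ParDo body, and the `parser` function stands in for the call that parses MyProto from bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class WholeFileParse {
    // Reads ONE file completely and hands all of its bytes to the parser,
    // so the message is never cut at newline bytes.
    static <T> T parseWholeFile(String filename, Function<byte[], T> parser) {
        try {
            return parser.apply(Files.readAllBytes(Paths.get(filename)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for "collection of filenames -> per-element parse step".
    static <T> List<T> parseAll(List<String> filenames, Function<byte[], T> parser) {
        List<T> results = new ArrayList<>();
        for (String name : filenames) {
            results.add(parseWholeFile(name, parser));
        }
        return results;
    }

    // Demo-only helper: writes bytes to a temp file and returns its path.
    static String writeTempFile(byte[] contents) {
        try {
            return Files.write(Files.createTempFile("proto", ".bin"), contents).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real pipeline the local `Files` calls would not reach gs:// paths, so the ParDo would need to open each file through a GCS-capable API instead. The point of the sketch is only that each element is a filename and each file is read and parsed as a whole.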

