Dataflow streaming - does it fit our use case?


We've been using Dataflow in batch mode for a while now. However, we can't seem to find much info on streaming mode.

We have the following use case:

  • Data/events are being streamed in real time into BigQuery
  • We need to transform/clean/denormalize the data before the business can analyze it

Now, of course we could use Dataflow in batch mode, take chunks of data from BigQuery (based on timestamps), and transform/clean/denormalize it that way.

But that's a bit of a messy approach, because the data is being streamed in real time, and it could get really gnarly working out which data needs to be worked on. It sounds brittle too.
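To make the brittleness concrete, here is a minimal sketch of the timestamp-chunking approach, assuming rows are hypothetical `(event_timestamp, payload)` pairs and that each batch run remembers a high-water mark. If an event with an older timestamp is streamed in after a chunk has already been processed, the next chunk never looks at it. `next_chunk` is a hypothetical helper, not a Dataflow or BigQuery API.

```python
from typing import List, Tuple

# Hypothetical table rows as (event_timestamp, payload) pairs.
Row = Tuple[int, str]


def next_chunk(rows: List[Row], after_ts: int, up_to_ts: int) -> List[str]:
    """Select one batch 'chunk': rows with after_ts < timestamp <= up_to_ts."""
    return [payload for ts, payload in rows if after_ts < ts <= up_to_ts]


table: List[Row] = [(10, "a"), (20, "b")]
first = next_chunk(table, 0, 30)    # processes "a" and "b"; high-water mark becomes 30

# A delayed event with an older timestamp is streamed in after the first run.
table.append((25, "late"))
second = next_chunk(table, 30, 60)  # range (30, 60] never revisits ts=25

print(first, second)  # ['a', 'b'] []
```

Avoiding the miss means re-scanning overlapping ranges and de-duplicating, which is exactly the bookkeeping the question wants to avoid.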

It would be great if we could transform/clean/denormalize in Dataflow, and then write to BigQuery as the data is streaming in.

Is this what Dataflow streaming is intended for? If so, what data sources can Dataflow read from in streaming mode?

Yes, this is a very reasonable use case for streaming mode. Currently we support reading from Cloud Pub/Sub via the PubsubIO source, and additional sources are in the works. Output can be written to BigQuery via the BigQueryIO sink. The PCollection docs cover the distinction between bounded and unbounded sources/sinks, as well as the concrete implementations currently available.
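In the Pub/Sub to BigQuery shape described above, the transform/clean/denormalize step is essentially a per-element function applied to each incoming message (in a Dataflow pipeline it would sit in a ParDo between the PubsubIO source and the BigQueryIO sink). The sketch below shows only that function in plain Python, not the Dataflow SDK. It assumes JSON message payloads with hypothetical fields `customer_id`, `amount` and `ts`, and a hypothetical in-memory `CUSTOMERS` lookup standing in for a side input.

```python
import json
from typing import Dict, Optional

# Hypothetical dimension data used for denormalization. In a real pipeline
# this might come from a side input; here it is a plain dict.
CUSTOMERS: Dict[str, Dict[str, str]] = {
    "c1": {"name": "Acme Corp", "region": "EMEA"},
    "c2": {"name": "Globex", "region": "APAC"},
}


def to_bigquery_row(payload: bytes) -> Optional[dict]:
    """Parse, clean and denormalize one message payload (hypothetical schema).

    Returns a dict shaped like a BigQuery row, or None if the event is dropped.
    """
    try:
        event = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # malformed message: drop it (or route it to a dead-letter table)

    customer_id = str(event.get("customer_id", "")).strip()  # clean whitespace
    amount = event.get("amount")
    if not customer_id or isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return None  # required fields missing or of the wrong type

    customer = CUSTOMERS.get(customer_id, {})
    return {
        "event_time": event.get("ts"),
        "customer_id": customer_id,
        "customer_name": customer.get("name"),  # denormalized from the lookup
        "region": customer.get("region"),       # denormalized from the lookup
        "amount": round(float(amount), 2),      # normalized numeric value
    }


row = to_bigquery_row(b'{"customer_id": " c1 ", "amount": 10.5, "ts": "2015-06-01T12:00:00Z"}')
print(row["customer_name"], row["region"], row["amount"])  # Acme Corp EMEA 10.5
```

Keeping this logic a pure function of one element means the same code works whether the input PCollection is bounded (batch) or unbounded (streaming).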

As for the apparent lack of streaming-specific documentation: the majority of the unified model applies to both batch and streaming, so there is no streaming-specific section. That said, I'd recommend looking over the windowing and triggers sections of the PCollection docs, as they are particularly applicable when dealing with unbounded PCollections.
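Windowing matters for unbounded PCollections because a stream never "ends", so any grouping has to be scoped to a finite slice of event time. A minimal plain-Python sketch of fixed-window assignment, assuming integer event timestamps in seconds; `fixed_window` and `group_into_fixed_windows` are hypothetical helpers that only model the idea, and they ignore watermarks, triggers and late data, which is what the triggers docs cover.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Window = Tuple[int, int]  # [start, end) in event-time seconds


def fixed_window(ts: int, size: int) -> Window:
    """Return the fixed window of length `size` that contains event time `ts`."""
    start = ts - (ts % size)
    return (start, start + size)


def group_into_fixed_windows(events: Iterable[Tuple[int, str]], size: int) -> Dict[Window, List[str]]:
    """Assign each (timestamp, value) event to its fixed window and group values."""
    windows: Dict[Window, List[str]] = defaultdict(list)
    for ts, value in events:
        windows[fixed_window(ts, size)].append(value)
    return dict(windows)


events = [(5, "a"), (59, "b"), (60, "c"), (130, "d")]
print(group_into_fixed_windows(events, 60))
# {(0, 60): ['a', 'b'], (60, 120): ['c'], (120, 180): ['d']}
```

Window assignment uses the event's own timestamp, not arrival time, so an out-of-order event still lands in the correct window; deciding *when* to emit each window's result is the job of triggers.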

