logging - What's the proper way to log big data to organize and store it with Hadoop, and query it using Hive?


So I have apps on different platforms sending logging data to my server. It's a Node server that accepts a payload of log entries, saves them to their respective log files (as write stream buffers, so it is fast), and creates a new log file whenever one fills up.

The way I'm storing my logs is one file per "endpoint", and each log file consists of space-separated values that correspond to metrics. For example, a player event log structure might look like this:

timestamp user mediatype event

And a log entry might look like this:

1433421453 bob iphone play
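
To make the format concrete, here is a minimal Python sketch of how one such line maps onto the four fields above. The names `PlayerEvent` and `parse_entry` are hypothetical helpers introduced only for illustration, and the sketch assumes no field value ever contains a space.

```python
from typing import NamedTuple


class PlayerEvent(NamedTuple):
    timestamp: int  # Unix epoch seconds
    user: str
    mediatype: str
    event: str


def parse_entry(line: str) -> PlayerEvent:
    """Split one space-separated log line into named fields."""
    # Unpacking raises ValueError if the line does not have exactly 4 fields.
    timestamp, user, mediatype, event = line.split()
    return PlayerEvent(int(timestamp), user, mediatype, event)


entry = parse_entry("1433421453 bob iphone play")
print(entry.user, entry.event)  # prints "bob play"
```

If a value could contain spaces (a media title, say), a delimiter such as a tab would be a safer choice than a space.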

Based on my reading of the documentation, I think this is a good format for Hadoop. The way I think this works is that I store these logs on my server, then run a cron job that periodically moves these files to S3. From S3, I could use those logs as a source for a Hadoop cluster using Amazon's EMR. From there, I could query it with Hive.

Does this approach make sense? Are there flaws in my logic? How should I be saving/moving these files around for Amazon's EMR? Do I need to concatenate all my log files into one giant one?

Also, what if I add a metric to a log in the future? Will that mess up all my previous data?

I realize I have a lot of questions; that's because I'm new to big data and need a solution. Thank you for your time, I appreciate it.

If you have a large volume of log dumps that changes periodically, the approach you laid out makes sense. Using EMRFS, you can directly process the logs from S3 (which you probably know).

As you 'append' new log events to Hive, part files will be produced. So you don't have to concatenate them ahead of loading them into Hive.

(On day 0, the logs are in some delimited form, loaded into Hive, and part files are produced as a result of various transformations. On subsequent cycles, new events/logs are appended to those part files.)
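
The cron step that ships files to S3 could be sketched roughly as below. This is only an illustration under stated assumptions: the key layout `logs/<endpoint>/dt=<date>/...` is a hypothetical choice (it mirrors the `column=value` directory naming Hive uses for partitions), the directory is assumed to hold only log files the server has already closed, and `upload` is a placeholder for whatever S3 client call you actually use.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, List


def s3_key_for(log_file: Path, endpoint: str, day: str) -> str:
    """Build a hypothetical key like 'logs/player/dt=2015-06-04/player.1.log'."""
    return f"logs/{endpoint}/dt={day}/{log_file.name}"


def ship_closed_logs(log_dir: Path, endpoint: str,
                     upload: Callable[[Path, str], None]) -> List[str]:
    """Upload every *.log file in log_dir and return the keys that were used.

    `upload` is injected so the sketch stays client-agnostic; it receives the
    local path and the destination key.
    """
    keys = []
    for log_file in sorted(log_dir.glob("*.log")):
        # Derive the date folder from the file's last-modified time (UTC).
        mtime = datetime.fromtimestamp(log_file.stat().st_mtime, tz=timezone.utc)
        key = s3_key_for(log_file, endpoint, mtime.strftime("%Y-%m-%d"))
        upload(log_file, key)
        keys.append(key)
    return keys
```

Grouping files under one prefix per day keeps each cron run's output separate and needs no concatenation step on the server.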

Adding new fields on an ongoing basis is a challenge. You can create new data structures/sets and Hive tables and join them, but the joins are going to be slow. So you may want to define fillers/placeholders in your schema.

If you are going to receive streams of logs (lots of small log files/events) and need to run near-real-time analytics, have a look at Kinesis.

(Also test drive Impala. It is faster.)

Just my 2c.
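
One way to picture the filler/placeholder idea is to have the writer always emit a fixed number of columns, with the spare slots left empty until a new metric needs one. The Python sketch below is an assumption-laden illustration, not a prescribed method: the column names `filler1`/`filler2` and the helper `format_entry` are hypothetical, and `\N` is used because, as far as I know, it is Hive's default NULL marker for delimited text files.

```python
from typing import List, Optional, Sequence

# Hypothetical schema: four real columns plus two reserved filler slots.
COLUMNS = ["timestamp", "user", "mediatype", "event", "filler1", "filler2"]
NULL_TOKEN = r"\N"  # assumed NULL marker for Hive delimited text


def format_entry(values: Sequence[Optional[str]]) -> str:
    """Pad a record to the full column count and join it with spaces."""
    if len(values) > len(COLUMNS):
        raise ValueError("more values than columns in the schema")
    missing = len(COLUMNS) - len(values)
    padded: List[Optional[str]] = list(values) + [None] * missing
    return " ".join(NULL_TOKEN if v is None else v for v in padded)


print(format_entry(["1433421453", "bob", "iphone", "play"]))
# prints: 1433421453 bob iphone play \N \N
```

When a new metric shows up later, it can take over the first unused filler slot, so old and new rows keep the same column count and no join is needed.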

