How to automate Amazon AWS EC2 for scraping


Hi, I'd like to set up multiple Amazon EC2 instances to scrape data from arbitrary sites. The way I imagine it is to set up one Amazon instance as a master that programmatically sets up the other instances to scrape. Right now I have PHP scripts that can scrape the way I want them to. How can I set up the master server to...

1) create the other EC2 instances

2) communicate between the master server and the slave servers

You could build this yourself: have the master launch worker instances when needed, send them scrape requests, and terminate them when they are no longer needed, then write all of the orchestration code and try to make it highly available. That's not the best way to do this. Instead, you should take advantage of AWS features.

You could use a combination of SQS and Auto Scaling groups. The master instance adds scrape requests to an SQS queue, and an Auto Scaling group triggered on the SQS queue depth launches new worker instances. This automates launching workers (scrapers) when the workload is high and terminating them when the workload is low. Each worker instance pulls a scrape request from the SQS queue, does the scraping work, and repeats.
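The queue flow described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names, the JSON message format ({"url": ...}) and the `scrape` callback are all assumptions made for the example. The `sqs_client` argument is expected to behave like a boto3 SQS client, whose `send_message`, `receive_message` and `delete_message` calls are used here.

```python
import json


def enqueue_scrape_requests(sqs_client, queue_url, urls):
    """Master side: put one scrape request per URL on the queue."""
    for url in urls:
        sqs_client.send_message(
            QueueUrl=queue_url,
            # Assumed message format: a small JSON object holding the URL.
            MessageBody=json.dumps({"url": url}),
        )


def process_one_batch(sqs_client, queue_url, scrape):
    """Worker side: pull up to 10 requests, scrape each, then delete it.

    Returns the number of messages handled in this batch.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling, to avoid hammering an empty queue
    )
    messages = response.get("Messages", [])
    for message in messages:
        request = json.loads(message["Body"])
        scrape(request["url"])
        # Delete only after the scrape returned, so a crashed worker's
        # message becomes visible again and another worker can retry it.
        sqs_client.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
    return len(messages)
```

On a real worker you would create the client with `boto3.client("sqs")` and call `process_one_batch` in a loop. The Auto Scaling group then only has to watch the queue depth; the workers themselves need no knowledge of the master.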

Another way is to use AWS Lambda. You can trigger Lambda functions from SQS or SNS. Have the master add scrape requests to an SQS queue, or have it publish the requests to an SNS topic, and drive a web-scraper Lambda function (written in JavaScript) from that queue or topic.
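As a rough sketch of what such a function might look like: the answer mentions JavaScript, but this example uses Python, which Lambda also supports. The `scrape` placeholder, the helper name `extract_payload` and the {"url": ...} message format are assumptions made for illustration. The event shapes follow what Lambda delivers for the two triggers: SQS records carry the message text in `body`, while SNS records carry it in `Sns.Message`.

```python
import json


def scrape(url):
    """Hypothetical placeholder for the real scraping logic."""
    return {"url": url, "status": "scraped"}


def extract_payload(record):
    """Return the raw message text from an SNS or SQS event record."""
    if "Sns" in record:
        return record["Sns"]["Message"]  # SNS-triggered invocation
    return record["body"]                # SQS-triggered invocation


def handler(event, context):
    """Lambda entry point: scrape every request contained in the event."""
    results = []
    for record in event.get("Records", []):
        # Assumed message format: a JSON object with a "url" field.
        request = json.loads(extract_payload(record))
        results.append(scrape(request["url"]))
    return results
```

With this approach there are no worker instances to launch or terminate: each batch of messages triggers an invocation, and AWS handles the scaling.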

Personally, I would investigate the Lambda route first.

