Part II: How we run Spark/Sqoop in production
In the last post, we described our legacy infrastructure and event processing code, along with the key design decisions we made as we architected our new data infrastructure. In this post, we'll discuss some of the operational details involved in deploying these systems in production, and some lessons we've learned along the way. Here, we'll cover two topics:
- Running Spark jobs
- Running Sqoop imports
Spark in Production
As we described in our last post, we elected to run our own Spark + CDH installation on top of EC2 nodes.
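On a self-managed Spark + CDH cluster like this, jobs are typically launched with `spark-submit` against the cluster's YARN resource manager. The sketch below is illustrative only, assuming a YARN deployment (standard in CDH) and hypothetical application, class, and resource values; the actual job names and sizing are not from the original post.

```shell
# Hypothetical example: submitting a Spark application to a
# CDH-managed YARN cluster. All names and resource settings
# are placeholders, not values from the post.
spark-submit \
  --master yarn \                      # run against the cluster's YARN RM
  --deploy-mode cluster \              # driver runs on a cluster node, not the submitter
  --class com.example.EventProcessor \ # hypothetical main class
  --num-executors 10 \                 # example sizing; tune per workload
  --executor-memory 4g \
  --executor-cores 2 \
  event-processor-assembly.jar \       # hypothetical application jar
  --input s3://example-bucket/events/  # hypothetical job argument
```

Running in `cluster` deploy mode keeps the driver on the cluster itself, so the submitting machine (for example, a scheduler node) does not need to stay up for the lifetime of the job.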