LinuxCon North America | CloudOpen North America
Wednesday, August 20 • 11:15am - 12:05pm
Introducing Apache Spark for Distributed Analytics - Will Benton, Red Hat

Apache Spark is a compute engine for parallel and distributed computing. Spark is resilient to machine failures because each computation records its dependencies (its lineage) back to a collection on stable storage, so any lost intermediate result can be recomputed at any time. Spark is also fast, because it allows these intermediate results to be cached in cluster memory instead of being recomputed or re-read from storage. Finally, Spark offers a productive programming model built on a general, powerful abstraction that supports a wide range of analytical and query tasks.
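To make the lineage-plus-caching idea concrete, here is a toy, single-machine model in Python. `ToyDataset`, `evict`, and the `computations` counter are hypothetical names for illustration only; this is not Spark's API, and it ignores partitioning, distribution, and scheduling. Each derived dataset remembers its parent and the transformation applied to it, so a lost cached result can be rebuilt by replaying that lineage from the source list, which stands in for stable storage.

```python
from typing import Callable, List, Optional


class ToyDataset:
    """Toy model of a lineage-tracked, cacheable dataset (hypothetical, not Spark's API)."""

    def __init__(self, source: Optional[List[int]] = None,
                 parent: Optional["ToyDataset"] = None,
                 fn: Optional[Callable[[List[int]], List[int]]] = None):
        self.source = source      # data on "stable storage" (root datasets only)
        self.parent = parent      # dataset this one was derived from
        self.fn = fn              # transformation applied to the parent's data
        self.cached = False       # whether results should be kept in memory
        self._memory: Optional[List[int]] = None
        self.computations = 0     # how many times this dataset was (re)computed

    def map(self, f: Callable[[int], int]) -> "ToyDataset":
        return ToyDataset(parent=self, fn=lambda xs: [f(x) for x in xs])

    def filter(self, p: Callable[[int], bool]) -> "ToyDataset":
        return ToyDataset(parent=self, fn=lambda xs: [x for x in xs if p(x)])

    def cache(self) -> "ToyDataset":
        self.cached = True
        return self

    def evict(self) -> None:
        """Simulate losing the in-memory copy, e.g. after a machine failure."""
        self._memory = None

    def collect(self) -> List[int]:
        if self._memory is not None:
            return list(self._memory)          # served from the in-memory cache
        if self.parent is None:
            result = list(self.source or [])   # read from "stable storage"
        else:
            # Recompute from lineage: get the parent's data, then reapply fn.
            result = self.fn(self.parent.collect())
        self.computations += 1
        if self.cached:
            self._memory = result
        return list(result)


raw = ToyDataset(source=list(range(10)))
evens_squared = raw.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()

first = evens_squared.collect()   # computed from lineage, then cached
second = evens_squared.collect()  # served from memory; no recomputation
evens_squared.evict()             # simulate losing the cached copy
third = evens_squared.collect()   # rebuilt by replaying the lineage
```

In real Spark the analogous pieces are RDD transformations such as `map` and `filter`, the `cache()` method, and actions such as `collect()`; Spark tracks lineage per partition, so only the partitions that were actually lost need to be recomputed.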

In this talk, I'll provide a general introduction to Spark. We'll discuss Spark's fundamental abstraction, the resilient distributed dataset (RDD), and examine Spark's rich standard libraries for machine learning, structured queries, graph computation, and stream processing. We'll close with a case study showing how Spark made it easy for me to make sense of some real-world data.

Will Benton

William Benton works on distributed computing technologies at Red Hat; his recent efforts include working with the Fedora Big Data SIG as a packager and sponsor and contributing to the Spark project. His professional expertise includes research and development in the areas of static...

Wednesday August 20, 2014 11:15am - 12:05pm CDT