To Learn More or Register: LinuxCon North America | CloudOpen North America
Wednesday, August 20 • 11:15am - 12:05pm
Introducing Apache Spark for Distributed Analytics - Will Benton, Red Hat

Apache Spark is a compute engine for parallel and distributed computing. Spark is resilient to machine failures because each computation encodes its dependencies back to a collection on stable storage, so any intermediate result can be reproduced at any time. However, Spark is also fast because it allows these intermediate results to be cached in cluster memory. Spark also presents a productive programming model with a general, powerful abstraction that supports a wide range of analytical and query tasks.
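The two ideas in this abstract, recomputing a result from its recorded dependencies and caching intermediate results in memory, can be illustrated with a small single-process sketch. This is a toy model in plain Python, not Spark's API: the class `ToyDataset` and its methods `from_stable_storage` and `lose_cached_data` are hypothetical names invented for illustration, and an in-memory list stands in for a collection on stable storage.

```python
class ToyDataset:
    """Toy stand-in for a dataset that remembers how it was derived.

    Hypothetical illustration only; this is not Spark's API.
    """

    def __init__(self, parent=None, transform=None, source=None):
        self.parent = parent          # dataset this one was derived from
        self.transform = transform    # function applied to the parent's rows
        self.source = source          # rows on "stable storage" (root only)
        self._cache_enabled = False
        self._cached = None
        self.compute_count = 0        # how many times rows were recomputed

    @classmethod
    def from_stable_storage(cls, records):
        # Assumption: a plain list plays the role of durable storage.
        return cls(source=list(records))

    def map(self, fn):
        # Record the dependency; nothing is computed yet.
        return ToyDataset(parent=self,
                          transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return ToyDataset(parent=self,
                          transform=lambda rows: [r for r in rows if pred(r)])

    def cache(self):
        # Ask for the result to be kept in memory after first computation.
        self._cache_enabled = True
        return self

    def lose_cached_data(self):
        # Simulate a machine failure that drops the in-memory copy.
        self._cached = None

    def collect(self):
        if self._cached is not None:
            return list(self._cached)
        if self.parent is None:
            rows = list(self.source)
        else:
            # Walk the recorded dependencies back toward stable storage.
            rows = self.transform(self.parent.collect())
        self.compute_count += 1
        if self._cache_enabled:
            self._cached = rows
        return list(rows)


base = ToyDataset.from_stable_storage(range(10))
squares = base.map(lambda x: x * x).cache()
evens = squares.filter(lambda x: x % 2 == 0)

first = evens.collect()      # computes squares once and caches them
second = evens.collect()     # reuses the cached squares
squares.lose_cached_data()   # simulated failure
third = evens.collect()      # squares are rebuilt from their dependencies
```

In this sketch the second `collect()` reuses the cached intermediate result, while the third rebuilds it from the recorded dependencies after the simulated loss, so all three calls return the same rows.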

In this talk, I'll provide a general introduction to Spark. We'll discuss the fundamental abstraction of Spark, the resilient distributed dataset, and examine Spark's rich standard libraries for machine learning, structured query, graph computations, and stream processing. We'll close with a case study showing how Spark made it easy for me to make sense of some real-world data.

Speakers
Will Benton

William Benton works on distributed computing technologies at Red Hat; his recent efforts include working with the Fedora Big Data SIG as a packager and sponsor and contributing to the Spark project. His professional expertise includes research and development in the areas of static program analysis, managed language runtimes, logic databases, cluster management, and music technology. Benton holds a PhD in computer sciences from the University...


Wednesday August 20, 2014 11:15am - 12:05pm
Colorado
