Spark also makes it possible to write code more quickly as you have over 80 high-level operators at your disposal. Last year, Spark took over Hadoop by completing the 100 TB Daytona GraySort contest 3x faster on one tenth the number of machines and it also became the fastest open source engine for sorting a petabyte. Spark lets you run programs up to 100x faster in memory, or 10x faster on disk, than Hadoop. Spark provides a faster and more general data processing platform. It has a thriving open-source community and is the most active Apache project at the moment. Spark is an Apache project advertised as “lightning fast cluster computing”. It contains information from the Apache Spark website as well as the book Learning Spark - Lightning-Fast Big Data Analysis. This article provides an introduction to Spark including use cases and examples. Indeed, Spark is a technology well worth taking note of and learning about. According to the Spark FAQ, the largest known cluster has over 8000 nodes. Today, Spark is being adopted by major players like Amazon, eBay, and Yahoo! Many organizations run Spark on clusters with thousands of nodes. I highly recommend it for any aspiring Spark developers looking for a place to get started. This turned out to be a great way to get further introduced to Spark concepts and programming. ![]() Some time later, I did a fun data science project trying to predict survival on the Titanic. I first heard of Spark in late 2013 when I became interested in Scala, the language in which Spark is written.
0 Comments
Leave a Reply. |