Availability: Open source
Written in: Java, Scala, Python, R
Available in: Java, Scala, Python, R, SQL
It can run some workloads up to 100 times faster than Apache Hadoop MapReduce, largely because its big data processing engine keeps data in memory, and it includes built-in machine learning capabilities.
It comprises a set of libraries and Spark Core.
The core is the distributed execution engine, and the Java, Scala, and Python APIs offer a platform for developing distributed ETL applications.
Additional libraries, built on top of the core, support diverse workloads such as streaming, SQL, and machine learning.
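To make the idea of a distributed execution engine running an ETL job more concrete, here is a minimal sketch in plain Python. It does not use Spark: a thread pool stands in for the cluster's workers, and the names partition, transform_partition, merge_totals, and run_etl are hypothetical, chosen only for illustration. The input format ("name,amount" lines) is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(records, num_partitions):
    """Split the records into chunks, one per (pretend) worker."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def transform_partition(chunk):
    """Parse 'name,amount' lines, skip malformed ones, and sum amounts per name."""
    totals = {}
    for line in chunk:
        parts = line.split(",")
        if len(parts) != 2:
            continue  # not a two-field record
        name, amount = parts[0].strip(), parts[1].strip()
        try:
            value = float(amount)
        except ValueError:
            continue  # amount is not numeric
        totals[name] = totals.get(name, 0.0) + value
    return totals


def merge_totals(left, right):
    """Combine two partial results into one dictionary of totals."""
    merged = dict(left)
    for name, value in right.items():
        merged[name] = merged.get(name, 0.0) + value
    return merged


def run_etl(records, num_partitions=3):
    """Extract (partition), transform in parallel, then load into one result."""
    chunks = partition(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(transform_partition, chunks))
    return reduce(merge_totals, partials, {})


raw = ["alice,10", "bob,5", "alice,2.5", "broken line", "bob,1", "carol,x"]
result = run_etl(raw)
print(result)
```

The design point is that each partition is transformed independently and only the small partial results are merged at the end, which is the general shape of work that an engine like Spark Core distributes across machines.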
It includes MLlib, which provides a set of machine learning algorithms for classification, regression, collaborative filtering, clustering, and dimensionality reduction.
The ML pipeline package in Spark models a typical machine learning workflow and provides abstractions such as Transformer, Estimator, Pipeline, and Parameters.
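The relationship between Transformer, Estimator, and Pipeline can be sketched in plain Python. This is a simplified illustration of the pattern, not Spark's actual API: the classes below are hypothetical stand-ins that work on lists of dictionaries instead of DataFrames, and plain constructor arguments take the place of Spark's parameter objects.

```python
class Transformer:
    """Turns one dataset into another without learning anything."""

    def transform(self, rows):
        raise NotImplementedError


class Estimator:
    """Learns from a dataset and returns a fitted Transformer (a model)."""

    def fit(self, rows):
        raise NotImplementedError


class Lowercase(Transformer):
    """Stateless stage: lowercases a string column."""

    def __init__(self, column):
        self.column = column

    def transform(self, rows):
        return [{**r, self.column: r[self.column].lower()} for r in rows]


class MeanCenterModel(Transformer):
    """Fitted stage: subtracts a previously learned mean from a column."""

    def __init__(self, column, mean):
        self.column = column
        self.mean = mean

    def transform(self, rows):
        return [{**r, self.column: r[self.column] - self.mean} for r in rows]


class MeanCenter(Estimator):
    """Learns the mean of a numeric column from the training data."""

    def __init__(self, column):
        self.column = column

    def fit(self, rows):
        mean = sum(r[self.column] for r in rows) / len(rows)
        return MeanCenterModel(self.column, mean)


class PipelineModel(Transformer):
    """A fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Chains stages; fitting it fits each Estimator on the data seen so far."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            fitted.append(model)
            rows = model.transform(rows)  # feed transformed data to the next stage
        return PipelineModel(fitted)


train = [{"label": "A", "x": 1.0}, {"label": "B", "x": 3.0}]
model = Pipeline([Lowercase("label"), MeanCenter("x")]).fit(train)
scored = model.transform([{"label": "C", "x": 5.0}])
print(scored)
```

Fitting the pipeline once yields a single model object that replays the same preprocessing on new data, which is the main reason for bundling the stages together.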
Speed
Supports Multiple Languages
Advanced analytics