Spark is a unified engine for large-scale distributed data processing, running on premises in data centers or in the cloud. It keeps intermediate computations in memory instead of writing them to disk between steps, which makes it much faster than Hadoop MapReduce for many workloads, especially iterative ones. Spark also ships libraries with composable APIs for machine learning (MLlib), SQL (Spark SQL), stream processing (Structured Streaming), and graph processing (GraphX).
This note covers a selective subset of the basic concepts, with some emphasis on streaming. It aims to give an overview rather than cover every API or implementation detail.
Side - A case study of the Delta Lake architecture, which uses Spark