An experimental survey of popular Big Data frameworks (Hadoop, Spark, Storm, Samza, and Flink), presented at Les journées Platformes in Lyon, France. The study categorizes these frameworks, compares their features, and evaluates them in batch and stream modes using the K-means, WordCount, PageRank, and ETL workloads. It investigates scalability, data partitioning, the impact of the cluster manager and of configuration parameters, and resource consumption, and it derives best practices for using Big Data frameworks.
An Experimental Survey on Big Data Frameworks
W. Inoubli, S. Aridhi, H. Mezni, M. Maddouri, E. Mephu Nguifo
Friday 30 August 2024, Les journées Platformes, Lyon, France
Categorization of popular Big Data frameworks
Feature | Hadoop | Spark | Storm | Flink | Samza
Data format | Key-value | RDD, DataFrame, DStream | Key-value | Key-value, DataStream | Events
Programming mode | Batch | Batch and Stream | Stream | Batch and Stream | Stream
Data sources | HDFS | HDFS, DBMS and Kafka | HDFS, HBase and Kafka | Kafka, Kinesis, message queues, socket streams | Kafka
Programming model | Map and Reduce | Transformation and Action | Topology | Transformation | MapReduce
Programming languages | Java | Java, Scala and Python | Java | Java | Java
Cluster manager | YARN, Mesos | YARN, Mesos, Standalone | Zookeeper | YARN, Standalone, Mesos | YARN
Comments | Stores large data in HDFS | Gives several APIs to develop interactive applications | Suitable for real-time applications | An extension of MapReduce with graph methods | Based on Hadoop and Kafka
Experimental protocol (1)
Batch mode evaluation
Experimental environment: Galactica
Workloads: K-means, WordCount and PageRank
Frameworks: Hadoop (MapReduce), Spark and Flink
Features: scalability, configuration parameters
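The WordCount workload named in the batch protocol can be illustrated in the Map and Reduce style. The following is a minimal single-process sketch in plain Python, not the benchmark code used in the study; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the partial counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an iterable of lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, used only to exercise the sketch.
    print(word_count(["big data frameworks", "big data survey"]))
```

In a real framework the map and reduce steps run on different nodes and the shuffle moves data across the network, which is where scalability and configuration parameters start to matter.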
Experimental protocol (2)
Stream mode evaluation
Workload: ETL
Frameworks: Storm, Spark, Samza and Flink
Features: number of processed events
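The stream protocol measures the number of processed events on an ETL workload. As a rough illustration of what such a pipeline does, here is a minimal sketch using Python generators. It is an assumption-laden toy, not the study's workload: the JSON event layout, the `user` and `value` fields, and all function names are hypothetical.

```python
import json


def extract(raw_lines):
    """Extract: parse each raw JSON line, skipping malformed records."""
    for line in raw_lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(events):
    """Transform: keep events that carry a 'user' field and normalise it."""
    for event in events:
        if "user" in event:
            yield {
                "user": str(event["user"]).strip().lower(),
                "value": event.get("value", 0),
            }


def load(events, sink):
    """Load: append each event to the sink and count what was processed."""
    processed = 0
    for event in events:
        sink.append(event)
        processed += 1
    return processed


def run_etl(raw_lines):
    """Chain the three stages; events flow through one at a time."""
    sink = []
    processed = load(transform(extract(raw_lines)), sink)
    return processed, sink
```

Because the stages are generators, each event passes through extract, transform and load before the next one is read, which loosely mirrors record-at-a-time stream processing. The returned counter corresponds to the "number of processed events" feature.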
Experimental study
• Scalability
• Data partitioning
• Impact of the cluster manager
• Impact of some configuration parameters
• Resource consumption
Conclusion
• An overview of Big Data frameworks (Hadoop, Spark, Storm, Samza and Flink)
• Published in Future Generation Computer Systems
• We have identified the features of each framework
• An experimental study of the surveyed frameworks
• Best practices
Thank you for your attention
Any questions or recommendations?