
Experimental Survey on Big Data Frameworks and Comparative Study at Platforms 2024

This experimental survey, presented in Lyon, France, covers five popular Big Data frameworks: Hadoop, Spark, Storm, Samza, and Flink. It categorizes the frameworks, analyzes their features, and evaluates them in batch mode (k-means, WordCount, and PageRank workloads) and in stream mode (an ETL workload). The experiments examine scalability, data partitioning, the impact of the cluster manager and of configuration parameters, and resource consumption. The study closes with best practices for using Big Data frameworks.


Presentation Transcript


  1. An Experimental Survey on Big Data Frameworks
     W. Inoubli, S. Aridhi, H. Mezni, M. Maddouri, E. Mephu Nguifo
     Friday, 30 August 2024, Les journées Platformes, Lyon, France

  2. Categorization of popular Big Data frameworks

     A comparative study of popular Big Data frameworks:

     | Feature | Hadoop | Spark | Storm | Flink | Samza |
     |---|---|---|---|---|---|
     | Data format | Key-value | RDD, DataFrame, DStream | Key-value | Key-value, DataStream | Events |
     | Processing mode | Batch | Batch and stream | Stream | Batch and stream | Stream |
     | Data sources | HDFS | HDFS, DBMS and Kafka | HDFS, HBase and Kafka | Kafka, Kinesis, message queues, socket streams | Kafka |
     | Programming model | Map and Reduce | Transformation and Action | Topology | Transformation | MapReduce |
     | Programming languages | Java | Java, Scala and Python | Java | Java | Java |
     | Cluster manager | YARN, Mesos | YARN, Mesos, Standalone | Zookeeper | YARN, Standalone, Mesos | YARN |
     | Comments | Stores large data in HDFS | Gives several APIs to develop interactive applications | Suitable for real-time applications | An extension of MapReduce with graph methods | Based on Hadoop and Kafka |
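To make the "Map and Reduce" programming model from the categorization concrete, here is a minimal WordCount sketch in plain Python. It does not use the Hadoop API or any other framework; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only mirror the three conceptual stages of the model on key-value pairs.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) key-value pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data frameworks", "big data survey"]
    print(word_count(sample))
```

In a real deployment the map and reduce calls would run in parallel on different nodes and the shuffle would move data across the network; this single-process version only illustrates the data flow.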

  3. Experimental protocol (1): Batch mode evaluation
     • Experimental environment: Galactica
     • Workloads: k-means, WordCount and PageRank
     • Frameworks: Hadoop (MapReduce), Spark and Flink
     • Features: scalability, configuration parameters
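PageRank is the iterative workload in the batch evaluation. The slides do not show the implementation that was benchmarked, so the following is only a single-machine sketch of the standard power-iteration formulation. The damping factor of 0.85 and the fixed iteration count are assumptions, and the graph is assumed to list every node as a key.

```python
def pagerank(graph, damping=0.85, iterations=20):
    """Compute PageRank by power iteration.

    graph maps each node to the list of nodes it links to; every link
    target is assumed to also appear as a key of graph.
    """
    nodes = list(graph)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_ranks = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                # Split this node's damped rank evenly among its out-links.
                share = damping * ranks[node] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # Dangling node: spread its damped rank over all nodes.
                share = damping * ranks[node] / n
                for target in nodes:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


if __name__ == "__main__":
    cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
    print(pagerank(cycle))
```

Each iteration reads the whole graph again, which is why iterative workloads like this one are sensitive to whether a framework keeps intermediate data in memory or writes it back to disk between rounds.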

  4. Experimental protocol (2): Stream mode evaluation
     • Workload: ETL
     • Frameworks: Storm, Spark, Samza and Flink
     • Features: number of processed events
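The stream evaluation measures the number of events processed by an ETL workload. The slides do not describe the event schema or the pipeline itself, so the sketch below is purely illustrative: the JSON events with `user` and `value` fields, the in-memory list used as a sink, and the function names are all assumptions. It chains extract, transform, and load steps as Python generators and returns the processed-event count.

```python
import json


def extract(raw_events):
    """Extract: parse each raw JSON string, skipping records that fail to parse."""
    for raw in raw_events:
        try:
            yield json.loads(raw)
        except json.JSONDecodeError:
            continue


def transform(events):
    """Transform: keep well-formed events and normalize their fields."""
    for event in events:
        if isinstance(event, dict) and "user" in event and "value" in event:
            yield {"user": str(event["user"]).lower(), "value": float(event["value"])}


def load(events, sink):
    """Load: append each event to the sink and return how many were processed."""
    processed = 0
    for event in events:
        sink.append(event)
        processed += 1
    return processed


if __name__ == "__main__":
    raw = [
        '{"user": "Ann", "value": 3}',
        "not json",
        '{"user": "Bob"}',
        '{"user": "CY", "value": "2.5"}',
    ]
    sink = []
    count = load(transform(extract(raw)), sink)
    print(count, sink)
```

Because the stages are generators, events flow through one at a time rather than in a batch, which loosely mirrors how a stream processor handles records as they arrive.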

  5. Experimental study
     • Scalability
     • Data partitioning
     • Impact of the cluster manager
     • Impact of some configuration parameters
     • Resource consumption

  6. Conclusion
     • An overview of Big Data frameworks (Hadoop, Spark, Storm, Samza and Flink)
     • Published in Future Generation Computer Systems
     • We have identified the features of each framework
     • An experimental evaluation of these frameworks
     • Best practices

  7. Thank you for your attention. Any questions or recommendations?
