
Spark Scala online training in Hyderabad

Spark is an Apache project that builds on Hadoop-based big data processing. Apache Spark is a lightning-fast cluster computing technology. It is based on Hadoop MapReduce and extends the MapReduce model so that it can be used efficiently for a number of different kinds of computation. The main features of Spark are: 1) speed, 2) support for multiple languages, 3) advanced analytics.

naveennunna

Presentation Transcript


  1. Spark Scala online training in Hyderabad

     Module - I. Introduction to Big Data Hadoop and Spark
         What is Big Data?
         What are the challenges for processing big data?
         What technologies support big data?
         3V's of Big Data and Growing
         What is Hadoop?
         Why Hadoop and its Use cases
         History of Hadoop
         Different Ecosystems of Hadoop
         Advantages and Disadvantages of Hadoop
         Real Life Use Cases
         MapReduce limitations
         Spark History
         Spark Architecture
         Spark and Hadoop Advantages
         Benefits of Spark + Hadoop
         Introduction to Spark Eco-system

     Module - II. HDFS (Hadoop Distributed File System)
         HDFS architecture
         Features of HDFS
         Where does it fit and where doesn't it fit?
         HDFS daemons and their functionalities
         Name Node and its functionality
         Data Node and its functionality
         Secondary Name Node and its functionality
         Data Storage in HDFS
         Introduction about Blocks
         Data replication
         Accessing HDFS
         CLI (Command Line Interface) and admin commands
         Java Based Approach
         Hadoop Administration
         Hadoop Configuration Files
         Configuring Hadoop Domains
         Precedence of Hadoop Configuration
         Diving into Hadoop Configuration
         Scheduler
         Rack Awareness
         Cluster Administration Utilities
         Rebalancing HDFS Data
         Copy Large amount of data from HDFS
         FSImage and Edit.log file

     Module - III. EcoSystems - Installations
         Single node, Pseudo-distribution and Multinode Cluster
         Hadoop Installation
         Hive Installation
         Sqoop Installation
         Spark Installation
         Cassandra Installation
         VMware Installation
         Ubuntu Installation
         Kafka Installation
         Zookeeper Installation
         MongoDB Installation
         Zeppelin Installation
         Python Installation
         Java Installation
         Scala Installation
         R Installation
         Eclipse Installation
         SBT Installation
         Maven Installation
         HBase
         Intro to HBase
         Intro to NoSQL database
         Sparse and dense Concept in RDBMS
         Intro to columnar/column oriented database
         Core architecture of HBase
         Why HBase?
         HDFS vs HBase
         Intro to Regions, Region server and HMaster
         Limitations of HBase
         Integration with Hive and HBase
         HBase commands
         Use cases for HBase

     Module - IV. Introduction to Scala
         Scala foundation
         Features of Scala
         Setup Spark and Scala on Ubuntu and Windows OS
         Install IDEs for Scala
         Run Scala Codes on Scala Shell
         Understanding Data types in Scala
         Implementing Lazy Values
         Control Structures
         Looping Structures
         Functions
         Procedures
         Collections
         Loop Statements
         Arrays and Array Buffers
         Maps, Tuples and Lists

     Module - V. Object Oriented Programming in Scala
         Implementing Classes
         Implementing Getter & Setter
         Object & Object Private Fields
         Implementing Nested Classes
         Using Auxiliary Constructor
         Primary Constructor
         Companion Object
         Apply Method
         Understanding Packages
         Override Methods
         Type Checking
         Access Modifier
         Casting
         Abstract Classes
         Extractors
         Exception Handling

     Module - VI. Functional Programming in Scala
         Understanding Functional programming in Scala
         Implementing Traits
         Layered Traits
         Rich Traits
         Call By Name Function
         Function with Named Arguments
         Function With Variable Argument
         Recursion Function
         Default Parameter Values
         Nested Functions
         Anonymous Functions
         Partially Applied Function
         Higher Order Functions
         Closures and Currying
         Performing File Processing

     Module - VII. Introduction to Data Analysis with Spark
         What is Apache Spark
         A Unified Stack - Spark Core, Spark SQL, Spark Streaming, MLlib, GraphX, Cluster Manager
         Basic operations on Shell
         Spark Java projects
         Spark Context and Spark Properties
         Persistence in Spark
         HDFS data from Spark

     Module - VIII. Working with Resilient Distributed DataSets (RDD)
         What are Spark RDDs
         How RDDs make Spark a feature rich framework
         Transformations, action and persistence
         Lazy operations and fault tolerance
         Load data and create RDD
         Persist RDD in memory or disk
         Pair operations and key-value
         Spark Hadoop Integration
         Hands on and core concepts of map() transformation
         Hands on and core concepts of filter() transformation
         Hands on and core concepts of flatMap() transformation
         Compare map and flatMap transformation
         Understanding RDD
         Loading data into RDD
         Scala RDD, Paired RDD, Double RDD & General RDD Functions
         Implementing HadoopRDD, Filtered RDD, Joined RDD
         Transformations, Actions and Shared Variables
         Spark Operations on YARN
         Sequence File Processing
         Partitioner and its role in Performance improvement
         Difference between Map Reduce Key-Value pair and RDD Key-Value pair
         RDD Lineage
         Garbage Collector and Memory Management
         Working with Key-Value Paired RDD
         RDD Partitions
         Partitioning of File-based RDDs
         HDFS and Data Locality
         All Methods of Transformations and Actions (Every RDD Method will get covered)

     Module - IX. Loading and Saving The Data
         File Formats
         Text Files
         JSON
         Comma-Separated Values and Tab-Separated Values
         Sequence Files
         Object Files
         Parquet Files
         Hadoop Input and Output Formats
         File Compression
         Filesystems
         Local/Regular FS
         HDFS
         Structured Data with Spark SQL
         Apache Hive
         JSON
         Connectivity with Databases
         Java Database Connectivity
         Connectivity with Cassandra
         Connectivity with MongoDB

     Module - X. Running On A Cluster
         Introduction of a Cluster Manager
         Spark Runtime Architecture
         YARN
         Mesos
         Amazon
         The Driver
         Executors
         Cluster Manager
         Launching a Program
         Deploying Applications with spark-submit
         Cluster Managers
         Standalone Cluster Manager
         Hadoop YARN
         Apache Mesos
         Amazon EC2
         Which Cluster Manager to Use?

     Module - XI. Spark SQL
         What is Spark SQL
         Features and Data flow
         Spark SQL architecture and components
         Hive and Spark together
         Data frames and loading data
         Hive Queries through Spark
         Various DDL and DML operations
         Caching
         Loading and Saving Data
         Apache Hive
         Parquet
         JSON
         From RDDs
         JDBC/ODBC Server
         Working with Beeline
         Long-Lived Tables and Queries
         User-Defined Functions
         Spark SQL UDFs
         Hive UDFs
         Catalyst Optimizer
         Various Execution Plans
         Joins (SQL & Core)
         DataFrames
         DataSets

     Module - XII. Spark Streaming
         Introduction to Spark Streaming
         Need for stream analytics
         Comparison with Storm and S4
         Real time data processing using streaming
         Fault tolerance and check pointing
         Stateful Stream Processing
         DStream and window operations
         Spark Stream execution flow
         Connection to various source systems
         Performance optimizations in Spark
         Spark Streaming Overview - Example: Streaming Word Count
         Other Streaming Operations
         Sliding Window Operation
         Developing Spark Streaming Applications
         Architecture and Abstraction
         Transformations
         Stateless Transformations
         Stateful Transformations
         Output Operations
         Input Sources
         Core Sources
         Additional Sources
         Multiple Sources and Cluster Sizing
         24/7 Operation
         Checkpointing
         Driver Fault Tolerance
         Worker Fault Tolerance
         Receiver Fault Tolerance
         Processing Guarantees
         Streaming UI
         Performance Considerations
         Batch and Window Sizes
         Word Count Socket Streaming and Twitter Example
         Level of Parallelism
         Garbage Collection and Memory Usage

     Module - XIII. Advanced Spark Programming
         Kafka with Spark
         Cassandra with Spark
         Zeppelin with Spark
         Spark with Python
         Spark with Java
         Accumulators
         Accumulators and Fault Tolerance
         Custom Accumulators
         Broadcast Variables
         Optimizing Broadcasts
         Working on a Per-Partition Basis
         Piping to External Programs
         Numeric RDD Operations
         Optimizing and Performance Tuning
         Optimizing Garbage Collection
         Optimizing Level of Parallelism
         Understanding the future of optimization - project Tungsten
         Putting Spark into Production
         Spark In-Depth Use Cases
         Apache Spark Developer Cheat Sheet
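
As a rough illustration of the "Compare map and flatMap transformation" topic listed under Module VIII, the sketch below uses plain Scala collections rather than Spark RDDs, so it runs without a cluster. The sample lines and the object name MapVsFlatMap are made up for illustration and do not come from the course. Spark's RDD API has map, flatMap and filter methods with the same names, but this is only a sketch of the idea, not the course's own material.

```scala
// Sketch with plain Scala collections; the data and names are hypothetical.
object MapVsFlatMap {
  val lines: List[String] = List("spark scala", "big data hadoop")

  // map: exactly one output element per input element
  // (here, one list of words for each line).
  val mapped: List[List[String]] = lines.map(_.split(" ").toList)

  // flatMap: each input may produce zero or more outputs,
  // and the results are flattened into a single list of words.
  val flattened: List[String] = lines.flatMap(_.split(" ").toList)

  // filter: keeps only the elements that satisfy the predicate.
  val longWords: List[String] = flattened.filter(_.length > 4)

  def main(args: Array[String]): Unit = {
    println(mapped)
    println(flattened)
    println(longWords)
  }
}
```

With map, the result keeps the shape of the input (two lines in, two word lists out), while flatMap yields one flat list of five words, which is usually what a word count needs.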
