An Introduction to Apache Pig - PowerPoint PPT Presentation

semtechs
Presentation Transcript

  1. Apache Pig
     • What is it?
     • How does it work?
     • Why use it?
     • Pig Latin Data Types
     • Pig Latin Maths
     • Pig Latin Example
     www.semtech-solutions.co.nz info@semtech-solutions.co.nz

  2. Pig – What is it?
     • A high level language
     • Used to analyse large data sets
     • Used to create MapReduce jobs
     • Abstracts definition of jobs
     • Uses Pig Latin to define jobs
     • Less code needed
     • Compiles to MapReduce code
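As a rough illustration of "compiles to MapReduce code", Pig's EXPLAIN statement prints the plans it builds for an alias. This is a minimal sketch; the file name `input.txt` and the alias names are assumptions made for the example, not part of the original slides.

```pig
-- 'input.txt' is a hypothetical input file with one record per line
lines = LOAD 'input.txt' AS (line:chararray);
grouped = GROUP lines BY line;
counts = FOREACH grouped GENERATE group AS line, COUNT(lines) AS n;

-- Print the logical, physical and MapReduce execution plans for 'counts'
EXPLAIN counts;
```

The three short statements above stand in for what would otherwise be hand-written mapper and reducer classes.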

  3. Pig – How does it work?
     • Three ways to use it
       • Grunt – Pig's interactive shell
       • Write Pig Latin in a script file
       • Embed Pig commands in another language
     • Run modes
       • Local mode – single machine
       • Hadoop mode – runs on a Hadoop/MapReduce cluster
     • Creates MapReduce code automatically
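The script-file route and the two run modes can be sketched as follows. The script name `wordlen.pig` and the input file `input.txt` are hypothetical; the `-x local` and `-x mapreduce` options are the standard way to select the execution mode from the command line.

```pig
-- wordlen.pig (hypothetical script name)
-- Run on a single machine:   pig -x local wordlen.pig
-- Run on a Hadoop cluster:   pig -x mapreduce wordlen.pig
-- Or start Grunt with `pig -x local` and type the statements below interactively.

lines = LOAD 'input.txt' AS (line:chararray);

-- SIZE returns the number of characters in a chararray
lengths = FOREACH lines GENERATE line, SIZE(line) AS len;

DUMP lengths;
```

The same statements work unchanged in either mode; only the location of the input data and the way the job is executed differ.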

  4. Pig – Why use it?
     • It is quicker (less code to write)
     • It is data omnivorous
     • It is easy to learn
     • It is widely used
     • Minor performance loss compared to native code
     • It can be extended via user defined functions (UDFs)
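Extending Pig with a UDF might look roughly like this. The jar `myudfs.jar`, the function `myudfs.UPPER` and the file `names.txt` are all hypothetical placeholders for code and data the user would supply; only REGISTER and the call syntax are standard Pig Latin.

```pig
-- myudfs.jar and myudfs.UPPER are hypothetical: a Java UDF written and packaged by the user
REGISTER myudfs.jar;

names = LOAD 'names.txt' AS (name:chararray);

-- Call the UDF by its fully qualified name, like a built-in function
upper_names = FOREACH names GENERATE myudfs.UPPER(name);

DUMP upper_names;
```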

  5. Pig Latin Data Types
     • int
     • long
     • float
     • double
     • chararray
     • bytearray
     • tuple
     • bag
     • map
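The types listed above typically appear in the schema of a LOAD statement. The following is a sketch only; the file `records.txt` and every field name are invented for illustration.

```pig
-- 'records.txt' and all field names are hypothetical
records = LOAD 'records.txt' AS (
    id:int,
    visits:long,
    score:float,
    ratio:double,
    name:chararray,
    raw:bytearray,
    pos:tuple(x:int, y:int),           -- an ordered set of fields
    tags:bag{t:tuple(tag:chararray)},  -- a collection of tuples
    attrs:map[]                        -- key/value pairs with chararray keys
);

-- Print the schema Pig has assigned to the relation
DESCRIBE records;
```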

  6. Pig Latin Maths
     Some of the built-in maths functions:
     • ABS
     • CEIL
     • EXP
     • FLOOR
     • LOG
     • ROUND
     • SIN
     • TAN
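A short sketch of how these functions are applied inside a FOREACH. The input file `numbers.txt` (one number per line) and the alias names are assumptions for the example.

```pig
-- 'numbers.txt' is a hypothetical file with one numeric value per line
nums = LOAD 'numbers.txt' AS (x:double);

-- Apply several built-in maths functions to each value
results = FOREACH nums GENERATE
    x,
    ABS(x)   AS abs_x,
    CEIL(x)  AS ceil_x,
    FLOOR(x) AS floor_x,
    ROUND(x) AS round_x,
    EXP(x)   AS exp_x,
    LOG(x)   AS log_x,
    SIN(x)   AS sin_x,
    TAN(x)   AS tan_x;

DUMP results;
```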

  7. Pig Latin Example
     Example borrowed from Wikipedia:

         input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);

         -- Extract words from each line and put them into a pig bag
         -- datatype, then flatten the bag to get one word on each row
         words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

         -- filter out any words that are just white spaces
         filtered_words = FILTER words BY word MATCHES '\\w+';

         -- create a group for each word
         word_groups = GROUP filtered_words BY word;

         -- count the entries in each group
         word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;

         -- order the records by count
         ordered_word_count = ORDER word_count BY count DESC;
         STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';

  8. Contact Us
     • Feel free to contact us at
       • www.semtech-solutions.co.nz
       • info@semtech-solutions.co.nz
     • We offer IT project consultancy
     • We are happy to hear about your problems
     • You can pay for just the hours you need to solve your problems