
Large-Scale Data Processing / Cloud Computing, Lecture 2 – "Hello World" in Hadoop

彭波 (Peng Bo), School of Electronics Engineering and Computer Science, Peking University. 7/3/2014. http://net.pku.edu.cn/~course/cs402/. Jimmy Lin, University of Maryland. SEWMGroup.


Presentation Transcript


  1. Large-Scale Data Processing / Cloud Computing, Lecture 2 – "Hello World" in Hadoop • 彭波 (Peng Bo), School of Electronics Engineering and Computer Science, Peking University • 7/3/2014 • http://net.pku.edu.cn/~course/cs402/ • Jimmy Lin, University of Maryland • SEWMGroup • This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details

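  2. CodeLab1 • Difficulties encountered • Not familiar with Java! • Setting up the development and runtime environment? (eclipse, hadoop) • The code in the guide fails to compile? • Errors at run time? • ...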

  3. Student reports • "It seems the code given in the PDF does not work, while the code behind the 'source code here' link does... but the results I get differ from those in the PDF..." • "The method setInputPath(Path) is undefined for the type JobConf WordCount/src WordCount.java line 21 1404272734726 310. No idea why." • "It won't compile, please help." • "FileInputPath cannot be resolved; FileOutputPath cannot be resolved. What is going on here?" • "Exception in thread "main" java.io.IOException: Cannot run program "chmod": CreateProcess error=2, ????????? This is the error I get when I run it."

  4. Java Programming for C/C++ Developers

  5. Historical background • The C programming language • early 1970s • UNIX • The C++ programming language • early 1980s • object-oriented • a wide variety of application programming • The Java programming language • early 1990s • originally for consumer electronic devices • enterprise application development

  6. Java SDK • Software Development Kit • a group of command-line tools and packages that you will need to write and run Java programs • base classes (Library)

  7. Working with the SDK • Factorial • input: a value as a command-line argument • output: factorial of that number OR exception • Java Specification • a source file that defines a public class must have exactly the same name as that class, plus the .java extension
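A minimal sketch of the Factorial program described on this slide may help. The class and method names are assumptions chosen for illustration (the original slide's code is not in the transcript), and `long` overflows for inputs above 20.

```java
// Factorial.java: the file name matches the class name.
class Factorial {
    // Computes n! iteratively; rejects negative input with an exception.
    static long factorial(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative: " + n);
        }
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    public static void main(String[] args) {
        // args[0] is the first command-line argument, not the program name.
        // Integer.parseInt throws NumberFormatException on non-numeric input.
        int n = Integer.parseInt(args[0]);
        System.out.println(n + "! = " + factorial(n));
    }
}
```

With the SDK tools, `javac Factorial.java` compiles the file and `java Factorial 5` runs it, printing `5! = 120`.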

  8. Execution Environment

  9. Primitive data types • char • 16 bits • Unicode character set • escape sequences

  10. Primitive data types • integer types • signed • exact size

  11. Primitive data types • The floating-point types • IEEE 754 floating-point values

  12. Primitive data types • The boolean types • true, false
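The fixed sizes of the primitive types can be checked from the wrapper classes in `java.lang`. The following is a small sketch; the class name `PrimitiveSizes` is an assumption for illustration.

```java
class PrimitiveSizes {
    // Returns the width in bits of Java's char type (fixed by the language).
    static int charBits() {
        return Character.SIZE;
    }

    public static void main(String[] args) {
        // Unlike C/C++, these sizes are the same on every platform.
        System.out.println("byte:  " + Byte.SIZE + " bits");
        System.out.println("short: " + Short.SIZE + " bits");
        System.out.println("char:  " + charBits() + " bits");
        System.out.println("int:   " + Integer.SIZE + " bits");
        System.out.println("long:  " + Long.SIZE + " bits");

        char letter = '\u0041';   // Unicode escape sequence for 'A'
        char tab = '\t';          // C-style escape sequence
        double ratio = 1.0 / 3.0; // IEEE 754 double precision
        boolean done = false;     // boolean holds only true or false, never an int
        System.out.println(letter + "" + tab + ratio + " " + done);
    }
}
```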

  13. Operators • + is overloaded • If you use the + operator with a String and another operand that is not a String, the other operand is converted into a String
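A short sketch of how the overloaded `+` behaves; `PlusOperator` and `label` are hypothetical names. Evaluation runs left to right, so the position of the first String operand matters.

```java
class PlusOperator {
    // The int operand is converted to a String before concatenation.
    static String label(int count) {
        return "count = " + count;
    }

    public static void main(String[] args) {
        System.out.println(label(3));    // the int 3 becomes "3"
        System.out.println(1 + 2 + "x"); // 1 + 2 is added first, then "x" is appended
        System.out.println("x" + 1 + 2); // "x" + 1 is a String, so 2 is appended as text
    }
}
```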

  14. C/C++ functions versus Java methods • In Java terminology, functions are called methods. • Methods can only be declared as members of a class; you can't define a method outside of a Java class

  15. Arrays • objects, so they are created using the new operator • scores.length gives the number of elements • the bracket characters ([ ]) that are used to indicate arrays are bound to the array type, not the array name • an out-of-range index throws java.lang.ArrayIndexOutOfBoundsException
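The array points above can be sketched as follows. The `scores` name comes from the slide; `ArrayDemo` and `average` are assumptions for illustration.

```java
class ArrayDemo {
    // Averages the elements; scores.length is a field, not a method call.
    static double average(int[] scores) {
        int sum = 0;
        for (int i = 0; i < scores.length; i++) {
            sum += scores[i];
        }
        return (double) sum / scores.length;
    }

    public static void main(String[] args) {
        int[] scores = new int[3]; // brackets belong to the type; elements start at 0
        scores[0] = 90;
        scores[1] = 80;
        scores[2] = 70;
        System.out.println(average(scores));

        try {
            scores[3] = 100; // index 3 is out of range for length 3
        } catch (ArrayIndexOutOfBoundsException e) {
            // Java checks bounds at run time instead of corrupting memory.
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```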

  16. Strings • objects of the String class • String objects are immutable • identical string literals share a single String object • String class has a rich interface
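The following sketch illustrates immutability and shared literals; the class and method names are hypothetical. It also shows why `equals()` rather than `==` is the usual way to compare string contents.

```java
class StringDemo {
    // Identical literals refer to one shared String object.
    static boolean literalsShareObject() {
        String a = "hadoop";
        String b = "hadoop";
        return a == b;
    }

    // new String(...) creates a distinct object with equal contents.
    static boolean newStringIsDistinctButEqual() {
        String a = "hadoop";
        String b = new String("hadoop");
        return a != b && a.equals(b);
    }

    public static void main(String[] args) {
        String s = "hadoop";
        String upper = s.toUpperCase(); // returns a new String; s itself is unchanged
        System.out.println(s + " " + upper);
        System.out.println(s.length() + " " + s.substring(0, 3) + " " + s.indexOf("oo"));
    }
}
```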

  17. Strings

  18. The main() method • a strict signature: public static void main(String[] args) • the first element in the array is the first argument, not the name of the program

  19. Other differences • Pointers: • Java references are pointers to Java objects • cannot be incremented or decremented • no address-of operator • Global variables • no way to declare global variables (or methods) • no struct, union, or typedef (enum was added in Java 5) • Freely placed methods • Garbage collection • no malloc() and free()
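A brief sketch of reference semantics and garbage collection, using hypothetical names (`ReferenceDemo`, `append`). Assigning a reference copies the "pointer", not the object, and there is no explicit deallocation.

```java
class ReferenceDemo {
    // The reference is passed by value, so the caller's object is modified.
    static void append(StringBuilder sb) {
        sb.append(" world");
    }

    public static void main(String[] args) {
        StringBuilder a = new StringBuilder("hello"); // allocated with new, no malloc()
        StringBuilder b = a;   // copies the reference; a and b name the same object
        append(b);
        System.out.println(a); // prints "hello world"

        a = null;
        b = null; // the object is now unreachable; the garbage collector may reclaim it
    }
}
```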

  20. Defining a Java class

  21. Defining a Java class • Each member carries its own access modifier such as public or private (there are no C++-style access sections) • You don't use semicolons (;) after the closing braces in class and method definitions. • The main() method is a member of the class • You call the constructor using the new keyword
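Since the class definition shown on the slide is not part of the transcript, here is a minimal stand-in that follows the four points above. `Counter` is a hypothetical class, not the one from the lecture.

```java
class Counter {
    private int count;            // each member has its own modifier

    public Counter(int start) {   // constructor: same name as the class, no return type
        count = start;
    }

    public void increment() {
        count++;
    }

    public int get() {
        return count;
    }

    // main() is an ordinary static member of the class.
    public static void main(String[] args) {
        Counter c = new Counter(10); // the constructor is invoked through new
        c.increment();
        System.out.println(c.get()); // prints 11
    }
} // no semicolon after the closing brace
```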

  22. access modifiers

  23. access modifiers • public • private • protected • package access

  24. Inheritance • extends • super()

  25. Overloading and overriding
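The two slides above (inheritance, then overloading and overriding) can be illustrated together. `Shape` and `Square` are hypothetical classes used only for this sketch.

```java
class Shape {
    private final String name;

    Shape(String name) {
        this.name = name;
    }

    double area() {
        return 0.0;
    }

    String describe() {
        return name + " with area " + area();
    }

    // Overloading: same method name, different parameter list, same class.
    String describe(String prefix) {
        return prefix + describe();
    }
}

class Square extends Shape {
    private final double side;

    Square(double side) {
        super("square"); // calls the superclass constructor; must come first
        this.side = side;
    }

    // Overriding: same signature as the superclass method, chosen at run time.
    @Override
    double area() {
        return side * side;
    }
}
```

Calling `describe()` on a `Shape` variable that refers to a `Square` uses `Square.area()`, because Java methods are virtual by default.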

  26. The Object class • All Java classes are ultimately subclasses of class Object • a centrally rooted class hierarchy • usage • toString() • define data structures that take objects of class Object, so they can hold any Java object (vs. C++ templates)
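A small sketch of both usages, with hypothetical names (`Point`, `ObjectDemo`, `joinAll`): overriding `toString()`, and a routine that accepts `Object` so it can hold any Java object.

```java
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Overrides Object.toString(), which is used whenever a Point is printed.
    @Override
    public String toString() {
        return "(" + x + ", " + y + ")";
    }
}

class ObjectDemo {
    // Accepts any mix of objects because every class extends Object.
    static String joinAll(Object[] items) {
        StringBuilder sb = new StringBuilder();
        for (Object item : items) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(item); // append(Object) calls the item's toString()
        }
        return sb.toString();
    }
}
```

For example, `ObjectDemo.joinAll(new Object[] {new Point(1, 2), "text", 42})` returns `"(1, 2) text 42"`; the `int` 42 is boxed into an `Integer` first.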

  27. Interfaces • All interfaces are implicitly abstract • All members of an interface are implicitly public • All fields defined in an interface are implicitly static and final • A Java class can extend only one class, but it can implement any number of interfaces • Best practice for polymorphism
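The interface rules above can be seen in a short sketch. `Greeter`, `Counted`, and `PoliteGreeter` are hypothetical names.

```java
interface Greeter {
    String PREFIX = "Hello, ";  // implicitly public static final
    String greet(String name);  // implicitly public abstract
}

interface Counted {
    int calls();
}

// One class may implement any number of interfaces.
class PoliteGreeter implements Greeter, Counted {
    private int calls;

    @Override
    public String greet(String name) { // must be declared public
        calls++;
        return PREFIX + name;
    }

    @Override
    public int calls() {
        return calls;
    }
}
```

Code that depends only on the `Greeter` type works with any implementation, which is the polymorphism practice the slide recommends.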

  28. more on objects • Inner classes and inner interfaces • Anonymous classes and objects
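As a rough illustration of both ideas, the hypothetical `Playlist` class below contains a named inner class and returns an object of an anonymous class.

```java
import java.util.Comparator;
import java.util.Iterator;

class Playlist {
    private final String[] songs;

    Playlist(String... songs) {
        this.songs = songs;
    }

    // Inner class: each Cursor belongs to one Playlist and can read its private fields.
    class Cursor implements Iterator<String> {
        private int next = 0;

        @Override
        public boolean hasNext() {
            return next < songs.length;
        }

        @Override
        public String next() {
            return songs[next++];
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    Iterator<String> cursor() {
        return new Cursor();
    }

    // Anonymous class: defined and instantiated in a single expression.
    Comparator<String> byLength() {
        return new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        };
    }
}
```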

  29. Using the Library (Java API) • in the Java API, classes are grouped into packages • you have already been using a class from a default package, java.lang, when calling System.out.println() • import java.util.ArrayList; or use the fully qualified name: java.util.ArrayList<xx> list = ....

  30. Data Structures • java.util.* • java generics
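A short sketch of `java.util` collections with generics, covering both the imported and the fully qualified form mentioned on the previous slide. `GenericsDemo`, `nonEmpty`, and `last` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsDemo {
    // List<String> can only hold Strings; no casts are needed when reading.
    static List<String> nonEmpty(String[] raw) {
        List<String> out = new ArrayList<String>();
        for (String s : raw) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // A generic method: T is inferred from the argument at each call site.
    // Assumes the list is not empty.
    static <T> T last(List<T> items) {
        return items.get(items.size() - 1);
    }

    public static void main(String[] args) {
        List<String> words = nonEmpty(new String[] {"map", "", "reduce"});
        System.out.println(words + " last=" + last(words));

        // Fully qualified name: works without an import statement.
        java.util.HashMap<String, Integer> counts = new java.util.HashMap<String, Integer>();
        counts.put("map", 1);
        System.out.println(counts.get("map"));
    }
}
```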

  31. Deploying your application • A Java program is a bunch of classes. • A JAR file is a Java Archive • create a manifest.txt that states which class has the main() method • Main-Class: MyApp • use the jar tool to package all class files and manifest.txt • $ jar -cvmf manifest.txt app.jar *.class • $ java -jar app.jar

  32. Package • put your classes in packages • java.util, java.net, java.text .... • preface your package with your reverse domain name • set up a matching directory structure

  33. References • "Java programming for C/C++ developers" • "Head First Java"

  34. "Hello World" in Hadoop

  35. What is MapReduce? • Programming model for expressing distributed computations at a massive scale • Execution framework for organizing and performing such computations • Open-source implementation called Hadoop
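To make the programming model concrete, here is an in-memory word count written in plain Java (Java 8 or later is assumed). It does not use Hadoop at all; `MiniMapReduce` and its methods are hypothetical names, and the single-process "shuffle" only imitates what the execution framework does across machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MiniMapReduce {
    // map: one input line -> a list of (word, 1) pairs
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // reduce: (word, all values emitted for that word) -> total count
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // "Shuffle": group every emitted value by its key, sorted by key.
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                      .add(pair.getValue());
            }
        }
        // Apply reduce once per distinct key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }
}
```

The programmer supplies only `map` and `reduce`; grouping, distribution, and fault handling are the framework's job, which is the split the slide describes.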

  36. Brief History of Hadoop • Hadoop was created by Doug Cutting, the creator of Apache Lucene/Nutch • 2003: Google published the GFS paper • 2004: Google published the MapReduce paper • 2005: Nutch ported to MapReduce/HDFS • 2006: Cutting joined Yahoo! • January 2008: Hadoop became a top-level project at Apache • February 2008: Hadoop ran on a 10,000-core cluster

  37. Hadoop Release

  38. New MapReduce API • favors abstract classes over interfaces • new API in org.apache.hadoop.mapreduce, old API in org.apache.hadoop.mapred • new Context class unifies the roles of JobConf, OutputCollector, and Reporter • job control goes through the new Job class rather than JobClient • reduce() receives its values differently • new: java.lang.Iterable, for (VALUEIN value : values) { ... } • old: java.util.Iterator, hasNext(), next()
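The difference in how reduce() consumes its values can be shown without Hadoop. The sketch below uses plain `Integer` values as a stand-in for Hadoop's writable types, and `ReduceLoops` is a hypothetical class; only the loop shapes correspond to the two APIs.

```java
import java.util.Iterator;

class ReduceLoops {
    // Old-API style: the values arrive as an Iterator and are drained explicitly.
    static int sumOldStyle(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // New-API style: the values arrive as an Iterable, so for-each works.
    static int sumNewStyle(Iterable<Integer> values) {
        int sum = 0;
        for (Integer value : values) {
            sum += value;
        }
        return sum;
    }
}
```

Mixing the two styles (for example, a for-each loop over an `Iterator`) does not compile, which may explain some of the errors students hit when combining code written for different API versions.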

  39. Hadoop Streaming & Pipes • Streaming • support any programming language, even shell scripts • uses standard input and output to communicate with the map and reduce code • Pipes • C++ interface to Hadoop MapReduce • uses sockets as the communication channel
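Because Streaming talks to the map and reduce code through standard input and output, a mapper can be any program that reads lines and prints tab-separated key/value lines. The following word-count mapper is a sketch of that contract; the class name `StreamingMapper` is an assumption, and Java is used here only to stay consistent with the rest of the lecture.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

class StreamingMapper {
    // Turns one input line into "word<TAB>1" output lines, one per word.
    static String mapLine(String line) {
        StringBuilder out = new StringBuilder();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.append(word).append('\t').append(1).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Input records arrive on stdin; key/value pairs are written to stdout.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.print(mapLine(line));
        }
    }
}
```

The same program can be tested locally by piping a text file into it, with no cluster involved.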

  40. Hadoop Command • docs in distribution • api • tutorial • hadoop • -conf xxx

  41. Changping Cluster • 28 nodes, each with 12 cores / 48 GB RAM / 10 TB disk • Namenode/JobTracker server: changping11 • ip: 222.29.134.11 • hdfs port: 9000 • mapreduce port: 9001

  42. How to use ChangpingCluster • 1. Add a host name mapping • Windows: edit the file C:\WINDOWS\system32\drivers\etc\hosts • Linux: edit /etc/hosts • add the following line: 222.29.134.11 changping11 • otherwise running a job reports a name resolution error

  43. How to use ChangpingCluster • 2. Identity settings • 1) Write all output files under the "/cs402/YourName" directory • in code: FileOutputFormat.setOutputPath(conf, new Path("/cs402/YourName")); • 2) In the Mapred Location, set hadoop.job.ugi = YourName, cs402 • the user name must match the name in the file path above • the group name must be cs402 • or set it directly in the driver program: • Configuration conf = new Configuration(); • conf.set("hadoop.job.ugi", "YourName,cs402");

  44. References • Tom White, Hadoop: The Definitive Guide, 3rd ed., O'Reilly, May 2012.

  45. Q&A
