Recitation for BigData

Recitation for BigData: HW1 Preview and Java Review
Jay Gu, Jan 10

Outline
• HW1 preview
• Review of Java basics
• An example of gradient descent for linear regression in Java

HW1 Preview
Work on a dataset of roughly 1 million instances.
• Warm-up exercise
• Stochastic gradient descent (SGD) for logistic regression
• SGD with a hashing kernel
• Extra credit: personalized logistic regression
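To preview the two SGD tasks, here is a minimal sketch of logistic regression trained by SGD on hashed token features (the "hashing trick"). It is not the HW1 solution: it assumes each instance is a list of string tokens (for example "query:123") with a 0/1 label, hashes each token into a fixed-size weight array with String.hashCode, and uses a constant learning rate. The class name, bucket count, and learning rate are illustrative assumptions.

```java
import java.util.List;

/** Hypothetical sketch: logistic regression trained by SGD on hashed token features. */
class HashedLogisticSGD {
    private final double[] w;   // one weight per hash bucket
    private double bias = 0.0;  // intercept w0
    private final double eta;   // constant learning rate (assumption)

    HashedLogisticSGD(int numBuckets, double eta) {
        this.w = new double[numBuckets];
        this.eta = eta;
    }

    /** Map a token to a bucket index in [0, w.length); different tokens may collide. */
    private int bucket(String token) {
        return Math.floorMod(token.hashCode(), w.length);
    }

    /** Predicted probability that the label is 1, treating every token as a feature with value 1. */
    double predict(List<String> tokens) {
        double z = bias;
        for (String t : tokens) z += w[bucket(t)];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One SGD step on the log-loss for a single example with label y in {0, 1}. */
    void update(List<String> tokens, int y) {
        double err = y - predict(tokens);  // gradient of the log-likelihood w.r.t. z
        bias += eta * err;
        for (String t : tokens) w[bucket(t)] += eta * err;  // only the touched buckets change
    }

    public static void main(String[] args) {
        HashedLogisticSGD model = new HashedLogisticSGD(1 << 10, 0.1);
        List<String> pos = List.of("query:cheap", "query:flights");
        List<String> neg = List.of("query:weather");
        for (int epoch = 0; epoch < 100; epoch++) {
            model.update(pos, 1);
            model.update(neg, 0);
        }
        System.out.printf("p(pos)=%.3f p(neg)=%.3f%n", model.predict(pos), model.predict(neg));
    }
}
```

Hashing fixes the memory footprint at the number of buckets regardless of vocabulary size, at the cost of occasional collisions between unrelated tokens.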

Starter Code
• A class for parsing the input file and iterating over the dataset:

    Dataset dataset = new Dataset(your_path, is_training, size);
    while (dataset.hasNext()) {
        DataInstance d = dataset.next();
        // … some action on d …
    }
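An iterator of this shape lets a program stream a large file one record at a time instead of loading all ~1 million instances into memory. The sketch below is a hypothetical stand-in, not the starter code's Dataset: the name LineDataset and its one-line lookahead are assumptions, and it yields raw lines rather than parsed DataInstance objects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.NoSuchElementException;

/** Hypothetical streaming reader exposing a hasNext()/next() interface over lines. */
class LineDataset {
    private final BufferedReader reader;
    private String nextLine;  // one-line lookahead; null means end of input

    LineDataset(Reader source) {
        this.reader = new BufferedReader(source);
        advance();
    }

    /** Read the next line into the lookahead buffer. */
    private void advance() {
        try {
            nextLine = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean hasNext() {
        return nextLine != null;
    }

    String next() {
        if (nextLine == null) throw new NoSuchElementException();
        String current = nextLine;
        advance();
        return current;
    }

    public static void main(String[] args) {
        LineDataset dataset = new LineDataset(new StringReader("1 10 2\n0 5 1\n"));
        int count = 0;
        while (dataset.hasNext()) {
            String line = dataset.next();  // a real loader would parse this into a DataInstance
            count++;
        }
        System.out.println(count + " lines");  // prints "2 lines"
    }
}
```

For a real file, the Reader could come from `Files.newBufferedReader(Paths.get(path))`; only one line is held in memory at a time.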

Starter Code

    public class DataInstance {
        int clicks;       // number of clicks, -1 if it is testing data.
        int impressions;  // number of impressions, -1 if it is testing data.

        // Feature of the session
        int depth;        // depth of the session.
        int[] query;      // List of token ids in the query field

        // Feature of the ad
        // ….

        // Feature of the user
        // ….
    }

Starter Code

    import java.util.Map;

    public class Weights {
        double w0;
        /*
         * query.get(123) will return the weight for the feature:
         * "token 123 in the query field".
         */
        Map<Integer, Double> query;
        Map<Integer, Double> title;
        Map<Integer, Double> keyword;
        Map<Integer, Double> description;
        double wPosition;
        double wDepth;
        double wAge;
        double wGender;
    }
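Because the per-field weights live in maps, a prediction only touches the tokens that actually occur in an instance. A minimal sketch, assuming a token missing from a map has weight 0 and showing only the intercept plus the query field; the helper name sparseDot is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class SparseScore {
    /** Sum of the weights of the given token ids; tokens absent from the map count as 0. */
    static double sparseDot(Map<Integer, Double> weights, int[] tokens) {
        double sum = 0.0;
        for (int token : tokens) {
            sum += weights.getOrDefault(token, 0.0);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, Double> query = new HashMap<>();
        query.put(123, 0.5);   // weight of "token 123 in the query field"
        query.put(7, -0.25);
        double w0 = 0.25;
        int[] queryTokens = {123, 7, 999};  // token 999 has no weight yet
        double score = w0 + sparseDot(query, queryTokens);
        System.out.println(score);  // prints 0.5
    }
}
```

The cost of a prediction is proportional to the number of tokens in the instance, not to the size of the vocabulary.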

BigData Is Often Sparse
Be as lazy as you can: update only when necessary.

Avoid O(d): Sparse and Lazy Updates
• Although the feature-space dimension d is huge, each data point has only a few tokens.
• Update only the weights that change.
• But regularization, taken literally, must be applied to all d weights at every step.
• So delay and batch the regularization.
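One common way to delay the regularization is to record, for each weight, the step through which it is up to date, and to apply all the skipped shrinkage in one multiplication when that weight is next touched. This is a sketch under stated assumptions, not necessarily the scheme HW1 expects: it assumes L2 regularization with a constant learning rate, so each step multiplies every weight by (1 - eta*lambda) before the gradient step; with a decaying learning rate the catch-up factor becomes a product of per-step factors. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: SGD with L2 shrinkage applied lazily to sparse weights. */
class LazyL2 {
    private final Map<Integer, Double> w = new HashMap<>();
    private final Map<Integer, Integer> lastStep = new HashMap<>();  // step through which w_j is current
    private final double eta;    // constant learning rate (assumption)
    private final double decay;  // per-step shrink factor (1 - eta * lambda)
    private int t = 0;           // number of completed SGD steps

    LazyL2(double eta, double lambda) {
        this.eta = eta;
        this.decay = 1.0 - eta * lambda;
    }

    /** Apply the shrinkage w_j missed while untouched, then return it. Unseen weights are 0. */
    double currentWeight(int j) {
        if (!w.containsKey(j)) return 0.0;  // shrinking 0 leaves 0, so nothing to catch up
        int missed = t - lastStep.get(j);
        double wj = w.get(j) * Math.pow(decay, missed);
        w.put(j, wj);
        lastStep.put(j, t);
        return wj;
    }

    /**
     * One SGD step w_j <- (1 - eta*lambda) * w_j - eta * g_j for the features in grad.
     * The caller computes grad from currentWeight(...) values. Every other weight only
     * needs shrinkage, which is deferred until currentWeight touches it again.
     */
    void applyStep(Map<Integer, Double> grad) {
        for (Map.Entry<Integer, Double> e : grad.entrySet()) {
            int j = e.getKey();
            double wj = currentWeight(j);  // now current through step t
            w.put(j, decay * wj - eta * e.getValue());
            lastStep.put(j, t + 1);
        }
        t++;
    }

    public static void main(String[] args) {
        LazyL2 model = new LazyL2(0.1, 0.5);  // shrink factor 0.95 per step
        model.applyStep(Map.of(0, -1.0));     // w_0 becomes 0.1 after step 1
        for (int s = 0; s < 3; s++) model.applyStep(Map.of(1, -1.0));  // w_0 is not touched
        System.out.println(model.currentWeight(0));  // 0.1 * 0.95^3, about 0.0857
    }
}
```

Each step costs time proportional to the number of active features rather than d. Before saving or evaluating the final model, call currentWeight on every stored key so that all weights receive their outstanding shrinkage.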

Java Review
• Language: class, object, variable, method
• Data structures: Java Collections
  • Array
  • List: ArrayList
  • Map: HashMap
Not required but good to know: interfaces, inheritance, access modifiers, I/O, …

Class

    public class DataInstance {
        // Members or fields
        // Feature of the session
        int[] query;
        // …
        // Feature of the ad
        int[] title;
        // …

        // Constructor
        DataInstance(String line, … ) {
            // parse the line, and set the fields
        }

        // Method
        public void print() {
            System.out.println("title: ");
            for (int token : title)
                System.out.print(token + "\t");
        }
    }

Object
• Create an object: DataInstance data = new DataInstance(line, …);
• Read a field: int clicks = data.clicks;
• Call a method: data.print();
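A self-contained version of the class and object examples, for experimenting outside the starter code. The input format here is a hypothetical one invented for illustration ("clicks impressions depth" followed by comma-separated query token ids); the real HW1 file format may differ, and SimpleInstance is not the starter-code class.

```java
/** Hypothetical, simplified DataInstance; assumes lines like "1 3 2 12,7,45". */
class SimpleInstance {
    int clicks;       // number of clicks
    int impressions;  // number of impressions
    int depth;        // depth of the session
    int[] query;      // token ids in the query field

    /** Constructor: parse one line and set the fields. */
    SimpleInstance(String line) {
        String[] fields = line.trim().split("\\s+");
        clicks = Integer.parseInt(fields[0]);
        impressions = Integer.parseInt(fields[1]);
        depth = Integer.parseInt(fields[2]);
        String[] tokens = fields[3].split(",");
        query = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) query[i] = Integer.parseInt(tokens[i]);
    }

    /** Method: print the query tokens, tab-separated. */
    void print() {
        System.out.print("query: ");
        for (int token : query) System.out.print(token + "\t");
        System.out.println();
    }

    public static void main(String[] args) {
        SimpleInstance data = new SimpleInstance("1 3 2 12,7,45");  // create an object
        int clicks = data.clicks;                                   // read a field
        data.print();                                               // call a method
        System.out.println("clicks = " + clicks);
    }
}
```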

Collections
• Array, e.g. int[] tokens, double[] weights: fixed length, most compact.
• ArrayList, e.g. ArrayList<DataInstance>: grows dynamically; when full, its backing array is enlarged geometrically (OpenJDK grows it by about 1.5x).
• HashMap, e.g. HashMap<K, V>: expected constant-time key-value lookup; grows dynamically, uses more memory.
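A short sketch of the three containers in the roles they typically play here: a fixed-size double[] for dense values, an ArrayList that grows as items are appended, and a HashMap keyed by token id. The token-counting task and the name countTokens are illustrative choices, not part of the homework.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CollectionsDemo {
    /** HashMap: expected constant-time lookup by key; here, token id -> number of occurrences. */
    static Map<Integer, Integer> countTokens(List<int[]> queries) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int[] q : queries) {
            for (int token : q) counts.merge(token, 1, Integer::sum);  // insert 1 or add 1
        }
        return counts;
    }

    public static void main(String[] args) {
        // Array: length fixed at creation; elements default to 0.0
        double[] weights = new double[4];
        weights[2] = 1.5;

        // ArrayList: grows automatically as elements are appended
        List<int[]> queries = new ArrayList<>();
        queries.add(new int[] {12, 7});
        queries.add(new int[] {7, 45, 7});

        Map<Integer, Integer> counts = countTokens(queries);
        System.out.println(weights.length + " " + queries.size() + " " + counts.get(7));  // prints "4 2 3"
    }
}
```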

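Variables
• “Everything” in Java is an object,
• except for primitive types such as int and double.
• Every object variable holds a reference (pointer) to the object.
• Java passes arguments by value: for an object argument, the value copied is the reference, so a method can modify the object but cannot make the caller's variable point to a different object.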
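A small demonstration of what pass-by-value means for primitives and for object references. The method names are illustrative.

```java
import java.util.Arrays;

class PassByValueDemo {
    /** Mutates the array that the copied reference points to: the caller sees the change. */
    static void scale(double[] w, double factor) {
        for (int i = 0; i < w.length; i++) w[i] *= factor;
    }

    /** Rebinds only the local copy of the reference: the caller's variable is unaffected. */
    static void replace(double[] w) {
        w = new double[] {42.0};
    }

    /** Primitives are copied: incrementing x does not change the caller's int. */
    static void increment(int x) {
        x++;
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 2.0};
        int count = 5;
        scale(weights, 2.0);  // weights is now {2.0, 4.0}
        replace(weights);     // weights is still {2.0, 4.0}
        increment(count);     // count is still 5
        System.out.println(Arrays.toString(weights) + " " + count);  // prints "[2.0, 4.0] 5"
    }
}
```

This is why passing a Weights object or a weight array into an update method is enough for the method to modify the model in place.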

Example: SGD for linear regression • Demo
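The demo code itself is not part of the slides. As a stand-in, here is a minimal sketch of SGD for least-squares linear regression on dense features, minimizing 0.5 * (w·x + b - y)^2 one randomly sampled example at a time. The synthetic data, learning rate, epoch count, and seeds are assumptions chosen for illustration.

```java
import java.util.Random;

class LinearRegressionSGD {
    /** Fit y ~ w.x + b by SGD on squared loss; returns {w_1, ..., w_d, b}. */
    static double[] fit(double[][] X, double[] y, double eta, int epochs, long seed) {
        int d = X[0].length;
        double[] w = new double[d];
        double b = 0.0;
        Random rng = new Random(seed);
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int n = 0; n < X.length; n++) {
                int i = rng.nextInt(X.length);  // sample one example uniformly, with replacement
                double pred = b;
                for (int j = 0; j < d; j++) pred += w[j] * X[i][j];
                double err = pred - y[i];       // derivative of 0.5 * (pred - y)^2 w.r.t. pred
                for (int j = 0; j < d; j++) w[j] -= eta * err * X[i][j];
                b -= eta * err;
            }
        }
        double[] params = new double[d + 1];
        System.arraycopy(w, 0, params, 0, d);
        params[d] = b;
        return params;
    }

    public static void main(String[] args) {
        // Noise-free synthetic data from y = 2*x1 - 3*x2 + 1
        Random rng = new Random(0);
        int n = 200;
        double[][] X = new double[n][2];
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            X[i][0] = rng.nextDouble();
            X[i][1] = rng.nextDouble();
            y[i] = 2 * X[i][0] - 3 * X[i][1] + 1;
        }
        double[] params = fit(X, y, 0.1, 200, 42);
        System.out.printf("w1=%.3f w2=%.3f b=%.3f%n", params[0], params[1], params[2]);
    }
}
```

The learning rate must be small relative to the squared norm of the inputs (here every x lies in [0, 1]^2), otherwise the updates overshoot and diverge.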
