
Recitation for BigData

Recitation for BigData: HW1 preview and Java review. Jay Gu, Jan 10. Outline: HW1 preview; review of Java basics; an example of gradient descent for linear regression in Java. HW1 preview (on ~1 million size data): warm-up exercise; Stochastic Gradient Descent for Logistic Regression.


Presentation Transcript


  1. Recitation for BigData HW1 preview and Java Review Jay Gu Jan 10

  2. Outline • HW1 preview • Review of Java basics • An example of gradient descent for linear regression in Java

  3. HW1 Preview On ~1 million size data. • Warm up exercise • Stochastic Gradient Descent for Logistic Regression • SGD with Hashing Kernel • Extra credit: Personalized Logistic Regression
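To make the second and third items concrete, here is a minimal sketch of SGD for logistic regression over hashed features. It is not the homework solution or the starter code: the class name `HashedLogisticSketch`, the `bucket` hash, and the single `field` argument are assumptions made for illustration, and it uses the plain hashing trick without a sign hash.

```java
/** Sketch: logistic regression trained by SGD, with features hashed into a fixed-size array. */
class HashedLogisticSketch {
    final double[] weights; // one weight per hash bucket
    double bias = 0.0;
    final double stepSize;

    HashedLogisticSketch(int numBuckets, double stepSize) {
        this.weights = new double[numBuckets];
        this.stepSize = stepSize;
    }

    /** Maps a (field, token id) pair to a bucket; this simple hash is an arbitrary choice. */
    int bucket(int field, int token) {
        return Math.floorMod(31 * field + token, weights.length);
    }

    /** Predicted probability of a click for one example with binary token features. */
    double predict(int field, int[] tokens) {
        double z = bias;
        for (int t : tokens) {
            z += weights[bucket(field, t)];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One SGD step on the log-likelihood; label is 0 or 1. */
    void update(int field, int[] tokens, int label) {
        double error = label - predict(field, tokens);
        bias += stepSize * error;
        for (int t : tokens) {
            weights[bucket(field, t)] += stepSize * error;
        }
    }
}
```

Because every token lands in one of `numBuckets` slots, memory stays fixed no matter how many distinct tokens appear; the price is that colliding tokens share a weight.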

  4. Starter Code • A class for parsing the input file and iterating over the dataset.

      Dataset dataset = new Dataset(your_path, is_training, size);
      while (dataset.hasNext()) {
          DataInstance d = dataset.next();
          … some action on d …
      }

  5. Starter Code

      public class DataInstance {
          int clicks;      // number of clicks, -1 if it is testing data
          int impressions; // number of impressions, -1 if it is testing data

          // Features of the session
          int depth;   // depth of the session
          int[] query; // list of token ids in the query field

          // Features of the ad
          ….

          // Features of the user
          ….
      }

  6. Starter Code

      public class Weights {
          double w0;

          /*
           * query.get(123) will return the weight for the feature:
           * "token 123 in the query field".
           */
          Map<Integer, Double> query;
          Map<Integer, Double> title;
          Map<Integer, Double> keyword;
          Map<Integer, Double> description;

          double wPosition;
          double wDepth;
          double wAge;
          double wGender;
      }

  7. BigData is often sparse Be as lazy as you can … Update only when necessary…

  8. Avoid O(d): Sparse and lazy update • Although the feature space d is huge, each data point only has a few tokens. • Only update what is changed. • But even so, regularization should be applied to all d weights at each step. • Delay and batch the regularization.
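One way to delay and batch the regularization, sketched under assumptions: L2 regularization with a constant step size, so each skipped step shrinks a weight by the same factor `(1 - stepSize * lambda)`. The class `LazyL2Sketch` and its method names are hypothetical and are not part of the starter code.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: SGD with L2 shrinkage applied lazily, only to weights that are touched. */
class LazyL2Sketch {
    final Map<Integer, Double> weights = new HashMap<>();
    // Step at which each weight was last brought up to date.
    final Map<Integer, Integer> lastStep = new HashMap<>();
    final double stepSize;
    final double lambda;
    int step = 0;

    LazyL2Sketch(double stepSize, double lambda) {
        this.stepSize = stepSize;
        this.lambda = lambda;
    }

    /** Applies, in one batch, all the shrink steps this weight has skipped. */
    double catchUp(int token) {
        double w = weights.getOrDefault(token, 0.0);
        int skipped = step - lastStep.getOrDefault(token, step);
        w *= Math.pow(1.0 - stepSize * lambda, skipped);
        weights.put(token, w);
        lastStep.put(token, step);
        return w;
    }

    /**
     * One SGD step that touches only the tokens in this example.
     * gradient is the loss gradient for each active weight, taken to be the
     * same for every active token (as with binary features).
     */
    void update(int[] tokens, double gradient) {
        step++;
        for (int t : tokens) {
            double w = catchUp(t);
            weights.put(t, w - stepSize * gradient);
        }
    }
}
```

Each update costs time proportional to the number of tokens in the example rather than to d. Before reading a weight after training, call `catchUp` on it so the pending shrinkage is applied.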

  9. Java Review
      • Language: class, object, variable, method
      • Data structures: Java Collections
          • Array
          • List: ArrayList
          • Map: HashMap
      Not required but good to know: interface, inheritance, access modifiers, I/O, …

  10. Class

      public class DataInstance {
          // Members (fields)
          // Feature of the session
          int[] query ….
          // Feature of the ad
          int[] title …

          // Constructor
          DataInstance(String line, … ) {
              // parse the line and set the fields
          }

          // Method
          public void print() {
              System.out.println("title: ");
              for (int token : title)
                  System.out.print(token + "\t");
          }
      }

  11. Object • DataInstance data = new DataInstance(line, …); • int clicks = data.clicks; • data.print();

  12. Collections
      • Array (int[] tokens, double[] weights): fixed length, most compact.
      • ArrayList (ArrayList<DataInstance>): grows dynamically; when the backing array is full it is replaced by a larger one (about 1.5 times the old capacity in OpenJDK).
      • HashMap (HashMap<K, V>): expected constant-time key-value lookup; grows dynamically and uses more memory.
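A small sketch of the three collections working together. The class `CollectionsSketch` and its methods are made up for illustration; only standard `java.util` calls are used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: an array as input, a HashMap for counts, an ArrayList for results. */
class CollectionsSketch {
    /** Counts how often each token id occurs in a fixed-length array. */
    static Map<Integer, Integer> countTokens(int[] tokens) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int t : tokens) {
            counts.merge(t, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    /** Collects the token ids that occur more than once, in order of first appearance. */
    static List<Integer> repeated(int[] tokens) {
        Map<Integer, Integer> counts = countTokens(tokens);
        List<Integer> result = new ArrayList<>(); // grows as elements are added
        for (int t : tokens) {
            if (counts.get(t) > 1 && !result.contains(t)) {
                result.add(t);
            }
        }
        return result;
    }
}
```

The array suits data whose length is known up front, the list suits results whose size is not known in advance, and the map gives lookup by token id without scanning.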

  13. Variables • "Everything" in Java is an object • Except for primitive types, e.g. int, double • All object variables are references (pointers) to the object • Methods receive their arguments by value; for an object argument, the value that is copied is the reference
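The last two points are easy to mix up, so here is a short sketch. The class `PassByValueSketch` and its method names are hypothetical.

```java
/** Sketch: what a method can and cannot change in its caller. */
class PassByValueSketch {
    /** Changes only the local copy of the primitive. */
    static void increment(int x) {
        x = x + 1;
    }

    /** Follows the copied reference and mutates the caller's array. */
    static void setFirst(double[] w) {
        w[0] = 1.0;
    }

    /** Rebinds only the local copy of the reference; the caller's array is untouched. */
    static void reassign(double[] w) {
        w = new double[] {9.0};
    }

    public static void main(String[] args) {
        int n = 0;
        increment(n);
        double[] weights = {0.0};
        setFirst(weights);
        reassign(weights);
        System.out.println(n + " " + weights[0]); // prints "0 1.0"
    }
}
```

This is why a method can update a weight array or a HashMap passed to it in place, but cannot replace the caller's variable with a new object.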

  14. Example: SGD for linear regression • Demo
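The demo itself is not part of this transcript. As a stand-in, here is a minimal sketch of SGD for linear regression with squared-error loss. The class name `LinearRegressionSgdSketch`, the `fit` signature, and the toy data are assumptions, not the code shown in the recitation.

```java
import java.util.Random;

/** Sketch: fits y ≈ w0 + w1*x1 + ... + wd*xd by stochastic gradient descent. */
class LinearRegressionSgdSketch {
    /** Returns {w0, w1, ..., wd}; each step uses one randomly chosen example. */
    static double[] fit(double[][] xs, double[] ys, double stepSize, int epochs, long seed) {
        int d = xs[0].length;
        double[] w = new double[d + 1]; // w[0] is the intercept
        Random rng = new Random(seed);
        for (int e = 0; e < epochs; e++) {
            for (int k = 0; k < xs.length; k++) {
                int i = rng.nextInt(xs.length);
                double prediction = w[0];
                for (int j = 0; j < d; j++) {
                    prediction += w[j + 1] * xs[i][j];
                }
                // Gradient of 0.5 * (y - prediction)^2 with respect to w_j is -(error) * x_j.
                double error = ys[i] - prediction;
                w[0] += stepSize * error;
                for (int j = 0; j < d; j++) {
                    w[j + 1] += stepSize * error * xs[i][j];
                }
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Toy data generated from y = 2x + 1 with no noise.
        double[][] xs = {{0.0}, {0.5}, {1.0}, {1.5}, {2.0}};
        double[] ys = {1.0, 2.0, 3.0, 4.0, 5.0};
        double[] w = fit(xs, ys, 0.05, 2000, 42L);
        System.out.println("intercept=" + w[0] + " slope=" + w[1]);
    }
}
```

The step size here is constant and was picked by hand for the toy data; on real data it usually needs tuning or a decay schedule.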
