
An Automatic Advertisement/Topic Modeling and Recommending System



Presentation Transcript


  1. An Automatic Advertisement/Topic Modeling and Recommending System Yi Hou, Center for Clinical Investigation (CCI), EECS Department, Case Western Reserve University

  2. Review • Words matter! • Motivation • There is no systematic, automatic ad/topic categorizing system, so advertisers have nowhere to specify a category. • Social network platforms are popular: revenue from Facebook advertising shot up 191 percent year-over-year in the first quarter of 2014.

  3. Tasks • 1) Given all the ads/topics, build a word network in which two words share an edge iff they co-occur in at least one ad/topic, and the edge weight is the number of ads/topics in which they occur together. • Small world, power-law distribution • 2) Given a word network, build a taxonomy T • Modularity-based clustering • Top 20 TF-IDF keywords (due to the vocabulary issue) • Empirical network analysis • 3) Given a user's current texting information, e.g. their most recent few Tweets/posts (we set this value to 10 here), build a ranking model R: each ad is ranked by R and the top 10 ads are returned to the user.

  4. Data Source • Data crawling • Twitter's streaming API • Ruby gem "twitterstream" • Acquired application-only authentication tokens • Set up a listening point recording global Tweets • Kept only 5 categories of ads/topics by keyword filtering: Car/Dating/Education/Grocery/Hiring • Manually collected data (experimented on) • Same 5 categories: Car/Dating/Education/Grocery/Hiring.

  5. Method • Data Preprocessing • Build word network • Build topic taxonomy • ADs/topic ranking

  6. Data Preprocessing • Remove stop words • Such as "is", "are", "when", ... • List from the Stanford NLP lab • Stemming • Reducing inflected words to their stem, base, or root form • Used the Porter stemmer at http://text-processing.com/demo/stem/ • e.g. "stemming" → "stem" • Result • Original: "I like data mining. It is awesome." • New: "I like data mine It awesom"
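Concretely, a minimal Python sketch of this step. NLTK's English stop-word list and PorterStemmer are stand-ins for the Stanford list and the text-processing.com demo, so the kept tokens may differ slightly from the slide's example:

```python
# Minimal preprocessing sketch; NLTK's stop words and PorterStemmer are
# assumptions standing in for the tools named on the slide.
import re

from nltk.corpus import stopwords   # requires nltk.download('stopwords')
from nltk.stem import PorterStemmer

STOP_WORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(text: str) -> list[str]:
    # Lowercase and tokenize on letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Drop stop words ("is", "are", "when", ...), stem what remains.
    return [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("I like data mining. It is awesome."))
# -> ['like', 'data', 'mine', 'awesom']
```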

  7. Data Visualization • In total, 1,104 unique words, shown as a word cloud.

  8. Build Word Network • Co-occurrence matrix of words • Co-occurrence counts serve as the similarity measure for word pairs • The co-occurrence matrix serves as our adjacency matrix • The co-occurrence count serves as the edge weight • Coded in C++ (a Python sketch follows) • # of nodes: 1104 • # of edges: 18972
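The authors' implementation is in C++; the following is an equivalent Python sketch, with a hypothetical `docs` list standing in for the preprocessed ads/topics:

```python
# Sketch of the word-network construction. `docs` is hypothetical sample
# data: one preprocessed token list per ad/topic.
from collections import Counter
from itertools import combinations

docs = [
    ["chevy", "ford", "deal"],
    ["date", "single"],
    ["chevy", "deal"],
]

# Edge (u, v) exists iff u and v co-occur in at least one ad/topic;
# its weight counts the ads/topics in which they occur together.
cooccur = Counter()
for tokens in docs:
    for u, v in combinations(sorted(set(tokens)), 2):
        cooccur[(u, v)] += 1

print(cooccur[("chevy", "deal")])  # -> 2
```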

  9. Build Topic Taxonomy • Modularity-based community finding • The algorithm exhaustively searches the graph to maximize the modularity measure • Heavily connected components signify the topic models • Each cluster/topic is described by its top-K highest-TF-IDF keywords; a sketch of this labeling step follows.
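A sketch of the keyword-labeling step. scikit-learn's TfidfVectorizer and the `cluster_texts` sample are assumptions for illustration, not the authors' implementation:

```python
# Label each cluster with its top-K TF-IDF keywords (K = 20 on the slides).
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical input: one string per cluster, concatenating the ads/topics
# whose words fell into that cluster.
cluster_texts = [
    "chevy ford deal truck engine",
    "date single love match",
    "hire join career apply",
]

vec = TfidfVectorizer()
tfidf = vec.fit_transform(cluster_texts)        # clusters x vocabulary
terms = vec.get_feature_names_out()

K = 20
for i in range(tfidf.shape[0]):
    row = tfidf[i].toarray().ravel()
    top = row.argsort()[::-1][:K]               # indices of top-K scores
    keywords = [terms[j] for j in top if row[j] > 0]
    print(f"cluster {i}:", ", ".join(keywords))
```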

  10. Modularity-Based Finding • Modularity • One measure of the structure of networks or graphs • A measure of the goodness of a division of a network into sub-clusters • Q represents the measure of goodness • C represents the set of clusters • e_ij stands for the number of edges between clusters i and j • m represents the total number of edges • Reference: • 1. Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre, "Fast unfolding of communities in large networks", Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P10008
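The formula itself was a figure on the original slide and did not survive the transcript; in the notation above, the standard Newman-Girvan modularity reads:

```latex
% Standard Newman-Girvan modularity in the slide's notation:
%   e_{ii} = number of edges inside cluster i,  m = total number of edges,
%   a_i    = 2 e_{ii} + \sum_{j \ne i} e_{ij}   (total degree of cluster i).
Q = \sum_{i \in C} \left[ \frac{e_{ii}}{m} - \left(\frac{a_i}{2m}\right)^{2} \right]
```

Intuitively, each term compares the observed fraction of edges inside cluster i against the fraction expected if edges were wired at random with the same node degrees.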

  11. How the Algorithm Works • Start with every vertex initialized as its own isolated cluster; • Successively join the pair of clusters with the greatest modularity increase ∆Q; • Stop when joining any two clusters would result in ∆Q ≤ 0. A sketch of this procedure follows.
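A sketch of this greedy agglomeration using networkx. Its `greedy_modularity_communities` (the Clauset-Newman-Moore algorithm) follows exactly this scheme; the cited Blondel et al. paper is the closely related Louvain method:

```python
# Greedy modularity maximization on the co-occurrence network:
# singleton clusters, merge the pair with the largest dQ, stop at dQ <= 0.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
for (u, v), w in cooccur.items():   # `cooccur` from the earlier sketch
    G.add_edge(u, v, weight=w)

clusters = greedy_modularity_communities(G, weight="weight")
print(len(clusters), "clusters")    # the slides report 13 on the full data
```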

  12. Clustering • We found 13 clusters: • Visualized: different clusters with different colors.

  13. Clustering • We found 13 clusters. • Why not 5? • If we zoom in on two of the clusters, the yellow one and the blue one, we can see that they both belong to grocery. • So modularity-based clustering actually categorizes words at a finer granularity (it divided grocery into food, electronics, ...).

  14. Clustering • Percentage distribution of 13 clusters:

  15. Clustering • Top 20 TF-IDF keywords in each cluster. • Intuitively: • Cluster 1: chevy, ford → cars • Cluster 2: date, single → dating • Cluster 3: lunch, friend → social (new) • Cluster 4: hire, join → hiring • ... • We observed well-defined clusters. • We observed new categories.

  16. Empirical Network Analysis • Property definitions: • Diameter d: the largest geodesic distance in the (connected) network. • Shortest path l_{u,v}: the shortest path between two nodes u and v in the network. • Average shortest path l_network: the shortest-path length averaged over all pairs of nodes in the network. • Power-law distribution: the node-degree distribution follows a power law, at least asymptotically. • Small-world property: two conditions must hold, 1) a high clustering coefficient (compared to the Erdos-Renyi model) and 2) a low average shortest path, meaning the typical distance l between two random nodes grows proportionally to the logarithm of the number of nodes N in the network: l_network ∝ ln(N).

  17. Empirical Network Analysis • Clustering coefficient definitions: • 1) Global clustering coefficient: C = 3·N_t / N_c, where • N_t: number of triangles in the graph • N_c: number of connected node triples in the graph • 2) Local clustering coefficient of node i: C_i = n_c / (n_i(n_i − 1)) for a directed graph, C_i = 2·n_c / (n_i(n_i − 1)) for an undirected graph, where • n_i: number of direct neighbors of node i • n_c: number of direct connections between i's direct neighbors • The network-level coefficient is C_i averaged over all nodes. • Reference: "Social network analysis", Lada Adamic, University of Michigan

  18. Empirical Network Analysis • In our experiment we use the local clustering-coefficient definition (for undirected graphs); here are the statistics of the experiments. • The network satisfies the small-world property! • Recall the small-world property: 1) a high clustering coefficient (compared to the Erdos-Renyi model) and 2) a low average shortest path, with l_network ∝ ln(N). A sketch of this check follows.
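A networkx sketch of the small-world check on the word network `G` built earlier (the slides' network has 1104 nodes and 18972 edges):

```python
# Small-world check: high clustering vs. an Erdos-Renyi graph of the same
# size, and average shortest path on the order of ln(N).
import math

import networkx as nx

N, E = G.number_of_nodes(), G.number_of_edges()

C = nx.average_clustering(G)    # local coefficient, averaged over all nodes

# Average shortest path is defined on the (giant) connected component.
giant = G.subgraph(max(nx.connected_components(G), key=len))
L = nx.average_shortest_path_length(giant)

# Erdos-Renyi baseline with the same number of nodes and edges.
ER = nx.gnm_random_graph(N, E, seed=0)
C_er = nx.average_clustering(ER)

print(f"C = {C:.3f} vs. Erdos-Renyi C = {C_er:.3f}")
print(f"L = {L:.2f} vs. ln(N) = {math.log(N):.2f}")
# Small world if C >> C_er and L stays on the order of ln(N).
```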

  19. Network Diameter • Betweenness centrality, closeness centrality, and eccentricity (plots shown on the slide). • Reference: 1. Ulrik Brandes, "A Faster Algorithm for Betweenness Centrality", Journal of Mathematical Sociology 25(2):163-177, 2001

  20. Power Law Distribution • Degree distribution: in-degree and out-degree (plots shown on the slide).

  21. Power Law Distribution • Degree distribution: in-degree and out-degree (additional plots shown on the slide).

  22. Power Law Distribution • The high-degree nodes satisfy a power-law distribution. • The low-degree nodes do not. • This is due to the limited data: 1104 words in total. A sketch of the tail fit follows.
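One way to check the power-law claim is to fit a line to the log-log degree distribution of `G`; the tail threshold (k ≥ 10) below is an arbitrary choice for illustration, since per the slide only the high-degree tail is expected to fit:

```python
# Fit the high-degree tail of the degree distribution: P(k) ~ k**slope.
from collections import Counter

import numpy as np

counts = Counter(d for _, d in G.degree())
k = np.array(sorted(counts))                    # observed degree values
pk = np.array([counts[d] for d in k], float)
pk /= pk.sum()                                  # empirical P(k)

tail = k >= 10                                  # hypothetical cutoff
slope, _ = np.polyfit(np.log(k[tail]), np.log(pk[tail]), 1)
print(f"power-law exponent ~ {-slope:.2f}")
```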

  23. Continuing Work: Ranking • FB ranking: assign a weight to each feature. • YouTube, by contrast, added randomness to increase recall at the cost of precision. • Reference: • 1. James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, Dasarathi Sampath: "The YouTube video recommendation system", RecSys 2010: 293-296

  24. Continuing Work: Ranking • Our ranking is a combination of FB news-feed ranking and YouTube ranking: • We use cosine similarity to measure which topic cluster the user is most interested in. • We generate the top 8 ads/topics with the FB-style ranking algorithm. • We add two more ads/topics at random. • This broadens the predictions (increases recall) at the cost of precision. A sketch of this hybrid follows.
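A sketch of the hybrid ranking. All input names (`user_vector`, `cluster_vectors`, `ads_by_cluster`, `ad_scores`) are hypothetical stand-ins for illustration:

```python
# Hybrid ranking: cosine similarity picks the user's topic cluster, a
# feature-weighted (FB-style) score ranks its ads, and two random ads are
# mixed in (YouTube-style) for recall.
import random

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(user_vector, cluster_vectors, ads_by_cluster, ad_scores):
    # 1) Cluster the user is most interested in: cosine similarity between
    #    the user's recent-post vector and each cluster's word vector.
    best = max(cluster_vectors,
               key=lambda c: cosine(user_vector, cluster_vectors[c]))
    # 2) Top 8 ads/topics in that cluster by the FB-style weighted score
    #    (ad_scores maps every ad to its score).
    ranked = sorted(ads_by_cluster[best], key=ad_scores.get, reverse=True)
    picks = ranked[:8]
    # 3) Two random ads from the other clusters: more recall, less precision.
    others = [a for c, ads in ads_by_cluster.items() if c != best for a in ads]
    picks += random.sample(others, min(2, len(others)))
    return picks
```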

  25. Limitations and Future Work • We will evaluate the system on a larger-scale dataset. • Since we don't have real feedback data, e.g. the performance (CTR) of each ad/topic, we need to generate it from a Gaussian model.

  26. Thank you! Questions?
