
Thank you Prof. Dr. Gerhard Boerner!


Stephen, Thomas, Houjun, Me, Robert Jing

Large Scale Statistics in Internet Behaviors

Hongguang Bi

Greetingland, LLC

Los Angeles, CA

Chapter 1: Internet and WWW (history, how it works)

Chapter 2: Internet User Behaviors & Privacy

Chapter 3: Online Advertising (geo, contextual and behavior targeting, real-time bidding, yield management)

Chapter 4: Collecting User Information (what and how)

Chapter 1: Internet and WWW

Cosmology: Nature defines physical laws

Internet: Humans define the laws (specifically, protocols)

Cosmology: Real World

Internet: Information World, or Virtual World

Cosmology: photons, electrons, neutrinos… (monad? Leibniz)

Internet: bit

Cosmology: particles => stars => galaxies => clusters etc.

Internet: bits => bytes or integers => words => pages & emails

Cosmology: millions of galaxies detected => billions

Internet: millions to billions of users

Cosmology: goal=> structures, statistics of galaxies

Internet: goal=>behaviors, statistics of users

Information Age: Web and Email

WWW: March 1989, Tim Berners-Lee

HTTP 0.9: 1991; HTTP 1.0: 1996; HTTP 1.1: June 1999, RFC 2616

Mailbox Protocol: 1971

SMTP: 1982, RFC 821

Later developments: UUCP, sendmail

HTTP: How the Web Works

A cookie is the only way a server can store data in the user's browser.

How does it work?

  • User sends a request

    • URL address

    • Browser (Firefox, IE, mobile, etc.)

    • Language, referring page, etc.

    • Cookies

  • Web server responds

    • Message body

    • Message size, modified time, etc.

    • Server information

    • Cookies to set

Client: sends a request without a cookie.

Server: responds with a "Set-Cookie" header containing some information.

Client: sends the next request with a "Cookie" header containing the SAME information.

A cookie is bound to a specific server, and one server can set multiple cookies.
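The exchange above can be sketched with Python's standard http.cookies module. This is only an illustration: the cookie name uid, the value abc123, and both function names are made up for the example, and headers are modeled as plain dictionaries rather than a real network connection.

```python
from http.cookies import SimpleCookie


def server_response_headers(request_headers):
    """Server side: set a cookie only when the request carries none."""
    received = SimpleCookie(request_headers.get("Cookie", ""))
    if "uid" in received:
        return {}  # the client already presented its cookie
    outgoing = SimpleCookie()
    outgoing["uid"] = "abc123"  # hypothetical value chosen by the server
    return {"Set-Cookie": outgoing["uid"].OutputString()}


def client_next_request(response_headers):
    """Client side: echo back the SAME information in a Cookie header."""
    jar = SimpleCookie(response_headers.get("Set-Cookie", ""))
    pairs = [f"{name}={morsel.value}" for name, morsel in jar.items()]
    return {"Cookie": "; ".join(pairs)}


first_response = server_response_headers({})           # request without cookie
second_request = client_next_request(first_response)   # request with cookie
second_response = server_response_headers(second_request)
```

A real browser also stores the cookie per server and sends it back only to that server, which is the binding described above; this sketch leaves that out.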

Chapter 2: User Behaviors & Privacy

  • 1 billion Internet users: a few hundred million in Europe, 100M in US, China

  • The IPv4 address space is full: 2^32 ≈ 4.3 billion addresses

  • Google gets 80 billion views every day, e.g. one Internet user visits about 1 Google page every day (search, email, ads)

  • The Internet brings new economics, lifestyles, and social phenomena, e.g. online shopping, social networks (Facebook), newspapers and publishing, US elections

  • For the first time in history, human beings might lose their privacy: their social activities can be tracked, studied, and finally manipulated by powerful players such as the US government or Google.


  • Currently: "Tracking case", Apple & Google

    "Information is transmitted securely to the Apple iAd server via a cellular network connection or Wi-Fi Internet connection," explained a letter Apple sent to US Rep. Edward Markey, D-Mass., on July 12 in response to his request for information. "The latitude/longitude coordinates are converted immediately by the server to a five-digit ZIP code."

  • 2008 "Suicide" case, MySpace

  • On the technical side, the credit card industry has successfully built tools that have tracked user behavior for 20 years!

What kind of Private Information?

  • You may lose, unprotected

    • Demographic information, e.g. age, gender, income, household

    • Via ISP or cellular service provider, social network sites, other free services

  • You definitely expose

    • Geographic information (via IP)

    • OS and browser, such as PC, Linux, iPhone

    • Language

  • You may lose, protected by laws

    • Your name, identity cards (credit card, SSN, driver's license, etc.)

    • Via online shopping sites, government/university service sites, credit report sites, dating sites, etc.

    • In practice, can still be stolen => viruses, spyware, break-ins

Chapter 4: Collect User Information

  • Existing Techniques

    • Relational database

    • Moving averages

    • Artificial neural network

  • User Profile

    • Uniquely identified by an anonymous ID

    • The ID is tracked with a cookie and saved permanently on disk

    • Every ID has a profile consisting of geographic information, demographic information, interests, shopping history, and recent behavior types (or audiences) => any information valuable to advertisers
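As a rough illustration of the profile idea, the sketch below keeps one record per anonymous ID in an in-memory dictionary. The class, the field names, and the profile_for helper are hypothetical, and a real system would persist profiles on disk as described above rather than hold them in memory.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """One profile per anonymous ID; field names are illustrative only."""
    geo: str = ""
    demographics: dict = field(default_factory=dict)
    interests: set = field(default_factory=set)
    shopping_history: list = field(default_factory=list)
    recent_audiences: list = field(default_factory=list)


profiles = {}  # anonymous ID -> UserProfile


def profile_for(cookie_id=None):
    """Return (anonymous_id, profile); mint a new ID for unknown visitors."""
    if cookie_id is None or cookie_id not in profiles:
        cookie_id = uuid.uuid4().hex  # random ID, carries no personal data
        profiles[cookie_id] = UserProfile()
    return cookie_id, profiles[cookie_id]


# First visit: no cookie yet, so a new anonymous ID is created.
visitor_id, profile = profile_for()
profile.interests.add("travel")

# Later visit: the cookie returns the same ID, so the same profile is found.
same_id, same_profile = profile_for(visitor_id)
```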

Relational Database

  • A database consists of many "normalized" tables

  • A table consists of a primary key and multiple values

  • One table can have many keys to search on

    ResearchGroup: group_id, name, description, head

    Member: member_id, group_id, name, type (professor, postdoc, student), status (current, left)

    Left: left_id, member_id, when, where
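The three tables above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the column types, the index name member_by_group, and all inserted rows are invented for illustration. Left, when, and where are SQL keywords, so they are quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE ResearchGroup (
    group_id INTEGER PRIMARY KEY, name TEXT, description TEXT, head TEXT);
CREATE TABLE Member (
    member_id INTEGER PRIMARY KEY,
    group_id INTEGER REFERENCES ResearchGroup(group_id),
    name TEXT, type TEXT, status TEXT);
CREATE TABLE "Left" (
    left_id INTEGER PRIMARY KEY,
    member_id INTEGER REFERENCES Member(member_id),
    "when" TEXT, "where" TEXT);
-- A second search key on Member, besides its primary key.
CREATE INDEX member_by_group ON Member(group_id);
""")

# Placeholder rows, not real data.
conn.execute(
    "INSERT INTO ResearchGroup VALUES (1, 'Group A', 'placeholder', 'Head A')")
conn.executemany("INSERT INTO Member VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "Member X", "student", "current"),
    (2, 1, "Member Y", "postdoc", "left"),
])
conn.execute("""INSERT INTO "Left" VALUES (1, 2, '2010', 'Somewhere')""")

# Who left group 1, and where did they go?
rows = conn.execute("""
    SELECT m.name, l."where"
    FROM Member AS m
    JOIN "Left" AS l ON l.member_id = m.member_id
    WHERE m.group_id = ?
""", (1,)).fetchall()
```

Keeping departures in their own table, linked by member_id, is what "normalized" means here: each fact is stored once and recombined with a join.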

Moving Average

A simplified time-series analysis tool

  • A new value is an average of the last N detections, with weights that decay over time.
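One possible reading of this definition is a weighted average in which each older detection counts for a fixed fraction of the next newer one. The function name and the decay factor of 0.5 are assumptions for the sketch, since the slide does not say how the weights decay.

```python
def decayed_moving_average(values, n, decay=0.5):
    """Weighted average of the last n values; older values weigh less.

    The newest value has weight 1, the one before it `decay`,
    then decay**2, and so on (geometric decay is an assumption).
    """
    window = values[-n:]
    if not window:
        raise ValueError("need at least one value")
    # Ages run from oldest (largest age) to newest (age 0).
    weights = [decay ** age for age in range(len(window) - 1, -1, -1)]
    weighted_sum = sum(w * v for w, v in zip(weights, window))
    return weighted_sum / sum(weights)


# Last 3 of [1, 2, 3, 4] are [2, 3, 4] with weights [0.25, 0.5, 1].
smoothed = decayed_moving_average([1, 2, 3, 4], n=3)
```

Here the result is (0.5 + 1.5 + 4) / 1.75 = 24/7, which sits closer to the newest value than the plain average of 3 does.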

Artificial Neural Network

Machine learning


3,5 => 15

4,6 => 24

9,8 => 72



6,7 => 41

Neurons work in parallel => very fast

Chapter 5: Online Advertising

The Good side of tracking

The system we are developing

The Good side of user tracking

Current Challenges

  • The server processes 10,000 requests per second

  • For each request, update the user profile with 100 attributes

  • Pick one from 100 possible advertiser candidates

  • 10^8 decisions per second

  • 100 million impressions per day
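One way to arrive at the 10^8 figure is to multiply the three numbers above, treating every attribute of every candidate as one decision. That reading is an assumption, as are the variable names in the short check below.

```python
requests_per_second = 10_000
attributes_per_profile = 100
candidate_advertisers = 100

# Assumed reading: one decision per (request, attribute, candidate) triple.
decisions_per_second = (
    requests_per_second * attributes_per_profile * candidate_advertisers
)

# For scale: 100 million impressions spread evenly over one day.
impressions_per_day = 100_000_000
seconds_per_day = 24 * 60 * 60
average_impressions_per_second = impressions_per_day / seconds_per_day
```

The daily figure averages to roughly 1,157 impressions per second, so under this reading the 10,000 requests per second looks like a peak load or includes requests that do not end in an impression.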

In the Future

  • Statistics => dynamic: finding rules, clustering analysis, time-series analysis

  • Instant changes of behavior, e.g. shopping intention

  • How behaviors are affected by the environment: social effect, "friend-recommendation" effect