Essential Google - PowerPoint PPT Presentation
Presentation Transcript

Essential Google

by Zhongyuan Wang


Outline

  • Motivation & Goals

  • Problems

  • Solution: BigTable

  • File System vs Database

  • Google’s Database: Google Base

  • Conclusion & Outlook

  • References



Motivation

  • Lots of (semi-)structured data at Google

    • URLs:

      • Contents, crawl metadata, links, anchors, pagerank,…

    • Per-user data:

      • User preference settings, recent queries/search results, …

    • Geographic locations:

      • Physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …

  • Scale is large

    • Billions of URLs, many versions/page (~20K/version)

    • Hundreds of millions of users, thousands of q/sec

    • 100TB+ of satellite image data


Goals

  • Want asynchronous processes to be continuously updating different pieces of data

    • Want access to most current data at any time

  • Need to support:

    • Very high read/write rates (millions of ops per second)

    • Efficient scans over all or interesting subsets of data

    • Efficient joins of large one-to-one and one-to-many datasets

  • Often want to examine data changes over time

    • E.g. Contents of a web page over multiple crawls



Why not just use a commercial DB?

  • Scale is too large for most commercial databases

  • Even if it weren't, cost would be very high

    • Building internally means system can be applied across many projects for low incremental cost

  • Low-level storage optimizations help performance significantly

    • Much harder to do when running on top of a database layer

However, it’s fun and challenging to build large-scale systems :)



BigTable Overview

Hundreds of machines may go down every day

  • Distributed multi-level map

    • With an interesting data model

  • Fault-tolerant, persistent

  • Scalable

    • Thousands of servers

    • Terabytes of in-memory data

    • Petabytes of disk-based data

    • Millions of reads/writes per second, efficient scans

  • Self-managing

    • Servers can be added/removed dynamically

    • Servers adjust to load imbalance


Background: Building Blocks

Building blocks:

  • Google File System (GFS): Raw storage

  • Scheduler: schedules jobs onto machines

  • Lock service: distributed lock manager

    • Also can reliably hold tiny files (100s of bytes) w/ high availability

  • MapReduce: simplified large-scale data processing

    BigTable uses of building blocks:

  • GFS: stores persistent state

  • Scheduler: schedules jobs involved in BigTable serving

  • Lock service: master election, location bootstrapping

  • MapReduce: often used to read/write BigTable data


Google File System (GFS)

[Figure: GFS architecture - clients ask the GFS master for metadata, then read/write chunks (C0, C1, C3, …) directly from chunkservers; each chunk is replicated on several chunkservers]

  • Master manages metadata

  • Data transfers happen directly between clients/chunkservers

  • Files broken into chunks (typically 64 MB)

  • Chunks replicated across three machines for safety
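The chunking and triple-replication described above can be sketched as follows. This is a toy illustration, not GFS's actual placement policy: the function names are hypothetical, and real GFS placement also weighs rack diversity and disk utilization.

```python
import itertools

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as on the slide
REPLICAS = 3

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Break a file into fixed-size chunks, GFS-style."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def place_replicas(num_chunks: int, chunkservers: list, replicas: int = REPLICAS):
    """Assign each chunk to `replicas` distinct chunkservers, round-robin.
    (Illustrative only: real GFS placement is smarter than this.)"""
    placement = {}
    ring = itertools.cycle(range(len(chunkservers)))
    for chunk_id in range(num_chunks):
        start = next(ring)
        placement[chunk_id] = [chunkservers[(start + k) % len(chunkservers)]
                               for k in range(replicas)]
    return placement
```

With four chunkservers, each chunk lands on three distinct machines, so the loss of any one machine leaves two copies of every chunk.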


MapReduce: Easy-to-use Cycles

A typical MapReduce computation processes many terabytes of data on thousands of machines

Many Google problems: “Process lots of data to produce other data”

  • Many kinds of inputs:

  • Want to use easily hundreds or thousands of CPUs

  • MapReduce: framework that provides (for certain classes of problems):

    • Automatic & efficient parallelization/distribution

    • Fault-tolerance, I/O scheduling, status/monitoring

    • User writes Map and Reduce functions

  • Heavily used: ~3000 jobs, 1000s of machine days each day

    BigTable can be input and/or output for MapReduce computations
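The "user writes Map and Reduce functions" split can be sketched with a toy, single-process word count. The names here are hypothetical, and the framework stand-in runs in one process, whereas the real system distributes the map, shuffle, and reduce phases across thousands of machines.

```python
from collections import defaultdict

def map_fn(doc: str):
    # User-written Map: emit (key, value) pairs for one input record.
    for word in doc.split():
        yield (word, 1)

def reduce_fn(key, values):
    # User-written Reduce: combine all values emitted for one key.
    return sum(values)

def mapreduce(inputs, map_fn, reduce_fn):
    """Toy in-process stand-in for the framework's parallel pipeline."""
    groups = defaultdict(list)
    for doc in inputs:                 # map phase
        for k, v in map_fn(doc):
            groups[k].append(v)        # shuffle: group values by key
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}  # reduce phase

counts = mapreduce(["the cat", "the dog"], map_fn, reduce_fn)
# counts == {"the": 2, "cat": 1, "dog": 1}
```

Everything the framework provides on the slide (parallelization, fault tolerance, I/O scheduling) lives inside `mapreduce`; the user only supplies the two small functions.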


Basic Data Model of BigTable

[Figure: one BigTable cell - row "www.cnn.com", column "contents", holding timestamped versions t1, t2, t3 of "<html>…"]

  • Distributed multi-dimensional sparse map

    (row, column, timestamp) → cell contents

  • Good match for most of Google’s applications
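The (row, column, timestamp) → contents map above can be modeled with nested dicts. This is a minimal sketch, assuming an in-memory store; the class and method names are made up and are not the Bigtable API.

```python
class ToyBigtable:
    """Sparse (row, column, timestamp) -> contents map, per the slide's model."""
    def __init__(self):
        # {row: {column: [(timestamp, value), ...] sorted newest-first}}
        self.cells = {}

    def put(self, row, column, timestamp, value):
        versions = self.cells.setdefault(row, {}).setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(reverse=True)     # keep the newest version first

    def get(self, row, column, timestamp=None):
        """Newest value at or before `timestamp` (newest overall if None)."""
        for ts, value in self.cells.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

t = ToyBigtable()
t.put("www.cnn.com", "contents", 1, "<html>v1")
t.put("www.cnn.com", "contents", 3, "<html>v3")
```

Because cells keep multiple timestamped versions, examining "contents of a web page over multiple crawls" is just a read at an older timestamp.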



Tablets

  • Large tables broken into tablets at row boundaries

    • Tablet holds contiguous range of rows

      • Clients can often choose row keys to achieve locality

    • Aim for ~100MB to 200MB of data per tablet

  • Serving machine responsible for ~100 tablets

    • Fast recovery:

      • 100 machines each pick up 1 tablet from failed machine

    • Fine-grained load balancing

      • Migrate tablets away from overloaded machine

      • Master makes load-balancing decisions
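The row-boundary splitting above can be sketched as a greedy packer over sorted (row key, size) pairs. The sizes and the packing rule are illustrative assumptions; the real system splits a tablet once it grows past the target, rather than packing up front.

```python
def split_into_tablets(rows, target_bytes):
    """Greedily pack sorted (row_key, size_bytes) pairs into tablets,
    splitting only at row boundaries as the slide describes."""
    tablets, current, current_size = [], [], 0
    for key, size in rows:
        if current and current_size + size > target_bytes:
            tablets.append(current)        # close the tablet at a row boundary
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        tablets.append(current)
    return tablets

# Rows are kept sorted by key, so pages of one site end up adjacent -
# that is the locality clients get by choosing row keys carefully.
rows = [("aaa.com", 60), ("cnn.com", 80),
        ("cnn.com/sports.html", 50), ("zuppa.com/menu.html", 40)]
tablets = split_into_tablets(rows, target_bytes=100)
```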


Tablets & Splitting

[Figure: a table with “contents” and “language” columns, rows ordered aaa.com, …, cnn.com (EN, “<html>…”), cnn.com/sports.html, …, website.com, …, zuppa.com/menu.html, divided into tablets at row boundaries]

Tablets & Splitting (after a split)

[Figure: the same table after a tablet grows too large - it is divided at a row boundary, here between yahoo.com/kids.html and yahoo.com/kids.html?D]


Locating Tablets

[Figure: lookup hierarchy - a pointer to the META0 location, the META0 tablet, META1 tablets, and the actual tablet in table T; with aggressive prefetching and caching, most ops go right to the proper machine]

  • Google’s approach: 3-level hierarchical lookup scheme for tablets

    • Location is ip:port of relevant server

    • 1st level: bootstrapped from lock server, points to owner of META0

    • 2nd level: uses META0 data to find owner of appropriate META1 tablet

    • 3rd level: META1 table holds locations of tablets of all other tables
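The three levels can be modeled as nested range lookups plus a client-side cache. Everything here is a toy stand-in: the addresses, key ranges, and class names are invented for illustration, and real metadata tablets hold far richer state.

```python
import bisect

class MetaTablet:
    """Toy metadata tablet: maps an end row key -> ip:port of the server
    holding the tablet that covers keys up to that end key."""
    def __init__(self, entries):
        self.locations = entries            # {end_key: "ip:port"}
        self.end_keys = sorted(entries)

    def lookup(self, row_key):
        i = bisect.bisect_left(self.end_keys, row_key)
        return self.locations[self.end_keys[i]]

# Level 1: the lock service bootstraps the META0 location.
meta0_location = "10.0.0.1:9000"
# Level 2: META0 points at META1 tablets.
meta0 = MetaTablet({"m": "10.0.0.2:9000", "\xff": "10.0.0.3:9000"})
# Level 3: META1 tablets point at the actual data tablets.
meta1_tablets = {
    "10.0.0.2:9000": MetaTablet({"g": "10.0.1.1:9000", "m": "10.0.1.2:9000"}),
    "10.0.0.3:9000": MetaTablet({"\xff": "10.0.1.3:9000"}),
}

cache = {}  # client-side cache: most ops skip the hierarchy entirely

def locate(row_key):
    if row_key in cache:
        return cache[row_key]
    meta1_loc = meta0.lookup(row_key)                      # ask META0's owner
    tablet_loc = meta1_tablets[meta1_loc].lookup(row_key)  # ask a META1 owner
    cache[row_key] = tablet_loc
    return tablet_loc
```

After the first lookup for a key range is cached, subsequent operations go straight to the right tablet server, which is why prefetching and caching make the common case a single hop.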



Shared Logs

  • Designed for 1M tablets, 1000s of tablet servers

    • 1M logs being simultaneously written performs badly

  • Solution: shared logs

    • Write log file per tablet server instead of per tablet

      • Updates for many tablets co-mingled in same file

    • Start new log chunks every so often (64MB)

  • Problem: during recovery, server needs to read log data to apply mutations for a tablet

    • Lots of wasted I/O if lots of machines need to read data for many tablets from same log chunk



Shared Log Recovery

Recovery:

  • Servers inform master of log chunks they need to read

  • Master aggregates and orchestrates sorting of needed chunks

    • Assigns log chunks to be sorted to different tablet servers

    • Servers sort chunks by tablet, write sorted data to local disk

  • Other tablet servers ask master which servers have sorted chunks they need

  • Tablet servers issue direct RPCs to peer tablet servers to read sorted data for their tablets
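The key sorting step can be sketched as a group-by over the commingled entries of one shared log chunk. The (sequence, tablet, mutation) tuple layout is an assumption for illustration, not the real log format.

```python
from collections import defaultdict

def sort_log_chunk(chunk):
    """Group one shared log chunk's entries by tablet, preserving log order,
    so a recovering server reads only the tablets it needs."""
    by_tablet = defaultdict(list)
    for seq, tablet, mutation in chunk:   # entries for many tablets commingled
        by_tablet[tablet].append((seq, mutation))
    return dict(by_tablet)

# Updates for many tablets were interleaved in one tablet server's log:
chunk = [(1, "T1", "put a"), (2, "T2", "put b"), (3, "T1", "del a")]
sorted_chunk = sort_log_chunk(chunk)
```

Without this step, every recovering server would scan the whole chunk for the few entries it cares about; after it, each server fetches only its own tablet's run of mutations.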



BMDiff

  • Bentley, McIlroy DCC ’99: “Data Compression Using Long Common Strings”

  • Input: dictionary * source

  • Output: sequence of

    • COPY: <x> bytes from offset <y>

    • LITERAL: <literal text>

  • Store hash at every 32-byte aligned boundary in

    • Dictionary

    • Source processed so far

  • For every new source byte

    • Compute incremental hash of last 32 bytes

    • Lookup in hash table

    • On hit, expand match forwards & backwards and emit COPY

  • Encode: ~100MB/s, Decode: ~1000MB/s
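A heavily simplified sketch of the scheme above. To stay short, the raw 32-byte block stands in for its hash, matches only expand forwards, and there is no incremental (rolling) hashing or preloaded dictionary, all of which real BMDiff has.

```python
B = 32  # block size: the slide stores a hash at every 32-byte aligned boundary

def bm_compress(src: bytes):
    """Emit ('LIT', bytes) and ('COPY', offset, length) ops over one input."""
    table = {}                    # 32-byte block -> earliest aligned offset
    ops, i, lit_start = [], 0, 0
    while i + B <= len(src):
        block = src[i:i + B]
        j = table.get(block)
        if j is not None and j + B <= i:
            length = B            # expand the match forwards
            while (i + length < len(src) and j + length < i
                   and src[j + length] == src[i + length]):
                length += 1
            if lit_start < i:
                ops.append(("LIT", src[lit_start:i]))  # flush pending literal
            ops.append(("COPY", j, length))
            i += length
            lit_start = i
        else:
            if i % B == 0:        # only 32-byte aligned boundaries are indexed
                table.setdefault(block, i)
            i += 1
    if lit_start < len(src):
        ops.append(("LIT", src[lit_start:]))
    return ops

def bm_decompress(ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "LIT":
            out += op[1]
        else:
            _, off, length = op   # copy from already-reconstructed output
            out += out[off:off + length]
    return bytes(out)
```

Indexing only aligned boundaries keeps the table small, yet any repeat of 2×B bytes or more is guaranteed to contain an aligned block and so gets found, which is why the scheme is good at very long common strings.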



Zippy

  • LZW-like: Store hash of last four bytes in 16K entry table

  • For every input byte:

    • Compute hash of last four bytes

    • Lookup in table

    • Emit COPY or LITERAL

  • Differences from BMDiff:

    • Much smaller compression window (local repetitions)

    • Hash table is not associative

    • Careful encoding of COPY/LITERAL tags and lengths

  • Sloppy but fast:

    Algorithm   % remaining   Encoding    Decoding
    Gzip        13.4%         21 MB/s     118 MB/s
    LZW         20.5%         135 MB/s    410 MB/s
    Zippy       22.2%         172 MB/s    409 MB/s
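The direct-mapped table idea can be sketched as below. The hash multiplier is an arbitrary stand-in, and real Zippy's careful COPY/LITERAL tag and length encoding is not modeled; ops stay as Python tuples.

```python
TABLE_BITS = 14                     # 16K-entry table, as on the slide
TABLE_SIZE = 1 << TABLE_BITS

def h4(four: bytes) -> int:
    # Hash the last four bytes into a table slot (multiplier is arbitrary).
    x = int.from_bytes(four, "little")
    return ((x * 0x1E35A7BD) & 0xFFFFFFFF) >> (32 - TABLE_BITS)

def zippy_compress(src: bytes):
    """Direct-mapped (non-associative) match table: collisions overwrite."""
    table = [-1] * TABLE_SIZE       # slot -> last position hashed there
    ops, i, lit_start = [], 0, 0
    while i + 4 <= len(src):
        four = src[i:i + 4]
        slot = h4(four)
        j, table[slot] = table[slot], i          # read old entry, overwrite
        if 0 <= j and j + 4 <= i and src[j:j + 4] == four:
            length = 4              # expand the match forwards
            while (i + length < len(src) and j + length < i
                   and src[j + length] == src[i + length]):
                length += 1
            if lit_start < i:
                ops.append(("LIT", src[lit_start:i]))
            ops.append(("COPY", j, length))
            i += length
            lit_start = i
        else:
            i += 1
    if lit_start < len(src):
        ops.append(("LIT", src[lit_start:]))
    return ops

def zippy_decompress(ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "LIT":
            out += op[1]
        else:
            _, off, length = op
            out += out[off:off + length]
    return bytes(out)
```

Because the table is not associative, a hash collision silently forgets the older position: that is the "sloppy" trade-off that buys encoding speed, caught here by re-checking `src[j:j+4] == four` before emitting a COPY.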


System Structure

[Figure: a Bigtable cell - the Bigtable client calls Open() through the Bigtable client library and then reads/writes directly from Bigtable tablet servers (each serves data); the Bigtable master performs metadata ops and load balancing; underneath, the Cluster Scheduling Master handles failover and monitoring, GFS holds tablet data and logs, and the lock service holds metadata and handles master election]



Disadvantages of a File-Processing System

  • Data redundancy and inconsistency

  • Difficulty in accessing data

  • Data isolation

  • Integrity problems

  • Atomicity problems

  • Concurrent-access anomalies

  • Security problems

BigTable solves these problems

Fast, fast and fast!!!


Is Google File System Impeccable?

  • The answer is obviously “NO”!

  • From An Introduction to Database System:

    “Data structure is the fundamental difference between database and file system ”

  • A bold prediction:

Google File System will be replaced in five years!



What Does Google Say?

  • Google Base is Google's database into which you can add all types of content. We'll host your content and make it searchable online for free.

  • Examples of items you can find in Google Base

    -Description of your party planning service

    -Articles on current events from your website

    -Listing of your used car for sale

    -Database of protein structures

  • Google Base will complement existing methods such as Google web crawl and Google Sitemaps


Google Base & Semantic Web

  • Will Google Base rival eBay and Amazon? Or more?

  • The chicken or the egg: which comes first?

  • Anyone, from large companies to website owners and individuals, can use Google Base to submit their content in the form of data items

  • Google Base: part of the Semantic Web?


How will Google Base develop?

  • There are several possibilities

    -A successful E-commerce website?

    -A successful online community?

    -A typical semantic web?

    -Suddenly disappear one day?

  • We can only wait and see



2006 Priorities

Priority: Search Quality & End User Traffic
  Investments: R&D and CapEx to improve search and develop new products
  Key Objectives: Increased Traffic, Increased Ad Revenue

Priority: Quality of Ads as Perceived by End Users
  Investments: R&D and CapEx to improve ad relevance and advertiser tools
  Key Objectives: Increased Ad Revenue

Priority: Growth of AdSense
  Investments: Building New Products and Services for Publishers of Information (R&D, CapEx and S&M)
  Key Objectives: More Content for Users, More Ways to Monetize

Priority: Growing our Overall Partnerships
  Investments: Wider Product Distribution (S&M and Partnerships / TAC)
  Key Objectives: More Traffic for Advertisers

Priority: Building Systems and Infrastructure of a Global $100B Company
  Investments: CapEx, R&D and G&A to develop world class systems and processes
  Key Objectives: Continue to Implement Controls and Risk Mgmt Processes



References

  • Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung: The Google File System

  • Jeffrey Dean and Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters

  • Jeff Dean: BigTable: A System for Distributed Structured Storage

  • http://labs.google.com/

  • http://labs.google.com/papers/

  • http://googlebase.blogspot.com/

  • Google Base and Semantic Web: http://blog.donews.com/sayonly/archive/2005/10/28/605465.aspx

  • Database System Concepts (Fourth Edition)

  • An Introduction to Database System (Third Edition)


