Future directions in computer science
Download
1 / 72

Future Directions in Computer Science - PowerPoint PPT Presentation


  • 409 Views
  • Updated On :

Future Directions in Computer Science. John Hopcroft Cornell University Ithaca, New York. Time of change. There is a fundamental revolution taking place that is changing all aspects of our lives.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Future Directions in Computer Science' - sarah


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Future directions in computer science l.jpg

Future Directions in Computer Science

John Hopcroft

Cornell University

Ithaca, New York


Time of change l.jpg
Time of change

  • There is a fundamental revolution taking place that is changing all aspects of our lives.

  • Those individuals who recognize this and position themselves for the future will benefit enormously.


Drivers of change in computer science l.jpg
Drivers of change in Computer Science

  • Computers are becoming ubiquitous

  • Speed sufficient for word processing, email, chat and spreadsheets

  • Merging of computing and communications

  • Data available in digital form

  • Devices being networked


Computer science departments are beginning to develop courses that cover underlying theory for l.jpg
Computer Science departments are beginning to develop courses that cover underlying theory for

  • Random graphs

  • Phase transitions

  • Giant components

  • Spectral analysis

  • Small world phenomena

  • Grown graphs


What is the theory needed to support the future l.jpg
What is the theory needed to support the future? courses that cover underlying theory for

  • Search

  • Networks and sensors

  • Large amounts of noisy, high dimensional data

  • Integration of systems and sources of information


Internet queries are changing l.jpg

Today courses that cover underlying theory for

Autos

Graph theory

Colleges, universities

Computer science

Tomorrow

Which car should I buy?

Construct an annotated bibliography on graph theory

Where should I go to college?

How did the field of CS develop?

Internet queries are changing


What car should i buy l.jpg
What car should I buy? courses that cover underlying theory for

  • List of makes

    • Cost

    • Reliability

    • Fuel economy

    • Crash safety

  • Pertinent articles

    • Consumer reports

    • Car and driver


Where should i go to college l.jpg
Where should I go to college? courses that cover underlying theory for

  • List of factors that might influence choice

    • Cost

    • Geographic location

    • Size

    • Type of institution

  • Metrics

    • Ranking of programs

    • Student faculty ratios

    • Graduates from your high school/neighborhood


How did field develop l.jpg
How did field develop? courses that cover underlying theory for

  • From ISI database create set of important papers in field?

  • Extract important authors

    • author of key paper

    • author of several important papers

    • thesis advisor of Ph.D.’s student(s) who is (are) important author(s)

  • Extract key institutions

    • institution of important author(s)

    • Ph.D. granting institution of important authors

  • Output

    • Directed graph of key papers

    • Flow of Ph.D.’s geographically with time

    • Flow of ideas


  • Theory to support new directions l.jpg

    Large graphs courses that cover underlying theory for

    Spectral analysis

    High dimensions and dimension reduction

    Clustering

    Collaborative filtering

    Extracting signal from noise

    Theory to supportnew directions


    Graph theory of the 50 s l.jpg
    Graph Theory of the 50’s courses that cover underlying theory for

    Theorem: A graph is planar if it does not contain a Kuratowski subgraph as a contraction.


    Theory of large graphs l.jpg
    Theory of Large Graphs courses that cover underlying theory for

    Large graphs

    • Billion vertices

    • Exact edges present not critical

      Theoretical basis for study of large graphs

    • Maybe theory of graph generation

    • Invariant to small changes in definition

    • Must be able to prove basic theorems


    Erd s renyi l.jpg
    Erdös-Renyi courses that cover underlying theory for

    • n vertices

    • each of n2 potential edges is present with independent probability

    N

    n

    pn (1-p)N-n

    numberofvertices

    vertex degree

    binomial degree distribution


    Generative models for graphs l.jpg
    Generative models for graphs courses that cover underlying theory for

    • Vertices and edges added at each unit of time

    • Rule to determine where to place edges

      • Uniform probability

      • Preferential attachment - gives rise to power law degree distributions


    Slide23 l.jpg

    Preferential attachment gives rise to the power law degree distribution common in many graphs

    Number

    of

    vertices

    Vertex degree


    Slide24 l.jpg

    Protein interactions distribution common in many graphs

    2730 proteins in data base

    3602 interactions between proteins

    Science 1999 July 30; 285:751-753


    Giant component l.jpg
    Giant Component distribution common in many graphs

    • Create n isolated vertices

    • Add Edges randomly one by one

    • Compute number of connected components


    Giant component26 l.jpg
    Giant Component distribution common in many graphs


    Giant component27 l.jpg
    Giant Component distribution common in many graphs


    Slide28 l.jpg

    Tubes distribution common in many graphs

    In

    44 million

    SCC

    56 million nodes

    Out

    44 million

    Tendrils

    Tendrils

    Disconnected components

    Source: almaden.ibm.com


    Phase transitions l.jpg
    Phase transitions distribution common in many graphs

    • G(n,p)

      • Emergence of cycle

      • Giant component

      • Connected graph

    • N(p)

      • Emergence of arithmetic sequence

    • CNF satisfiability

      • Fix number of variables, increase number of clauses


    Access to information smart technology l.jpg

    document distribution common in many graphs

    Access to InformationSMART Technology

    aardvark 0

    abacus 0

    .

    .

    .

    antitrust 42

    .

    .

    .

    CEO 17

    .

    .

    .

    microsoft 61

    .

    .

    .

    windows 14

    wine 0

    wing 0

    winner 3

    winter 0

    .

    .

    .

    zoo 0

    zoology 0

    Zurich 0


    Locating relevant documents l.jpg
    Locating relevant documents distribution common in many graphs

    Query: Where can I get information on gates?

    2,060,000 hits

    Bill Gates 593,000

    Gates county 177,000

    baby gates 170,000

    gates of heaven 169,000

    automatic gates 83,000

    fences and gates 43,000

    Boolean gates 19,000


    Clustering documents l.jpg
    Clustering documents distribution common in many graphs

    cluster documents

    refine cluster

    gates

    Gates

    County

    Bill Gates

    automatic

    gates

    Boolean

    gates

    microsoft

    windows

    antitrust


    Refinement of another type books l.jpg

    mathematics distribution common in many graphs

    physics

    chemistry

    astronomy

    children’s books

    textbooks

    reference books

    general population

    Refinement of another type: books


    Slide34 l.jpg

    Volume of cube is one in all dimensions distribution common in many graphs

    Volume of sphere goes to zero

    High Dimensions

    Intuition from two and three dimensions not valid for high dimension


    Slide35 l.jpg

    1 distribution common in many graphs

    Unit sphere

    Unit square

    2 Dimensions


    Slide36 l.jpg

    4 Dimensions distribution common in many graphs


    Slide37 l.jpg

    1 distribution common in many graphs

    d Dimensions




    High dimensional data is inherently unstable l.jpg
    High dimensional data is inherently unstable dimensional space

    • Given n random points in d dimensional space essentially all n2 distances are equal.


    Gaussian distribution l.jpg
    Gaussian distribution dimensional space

    Probability mass concentrated between dotted lines


    Gaussian in high dimensions l.jpg
    Gaussian in high dimensions dimensional space


    Two gaussians l.jpg
    Two Gaussians dimensional space


    Distance between two random points from same gaussian l.jpg
    Distance between two random points from same Gaussian dimensional space

    • Points on thin annulus of radius

    • Approximate by sphere of radius

    • Average distance between two points is

      (Place one pt at N. Pole other at random. Almost surely second point near the equator.)


    Expected distance between pts from two gaussians separated by l.jpg

    Expected distance between pts from two Gaussians separated by δ

    Rotate axis so first point is perpendicular to screen

    All other points close to plane of screen






    Dimension reduction l.jpg
    Dimension reduction by

    • Project points onto subspace containing centers of Gaussians

    • Reduce dimension from d to k, the number of Gaussians


    Slide54 l.jpg

    Centers retain separation by

    Average distance between points reduced

    by


    Can separate gaussians provided l.jpg
    Can separate Gaussians provided by

    > some constant involving k and γ independent of the dimension


    Slide56 l.jpg



    Slide58 l.jpg

    15% by

    restart


    Slide59 l.jpg

    15% by

    restart

    Restart yields strongly connected graph


    Suppose you wish to increase the page rank of vertex v l.jpg
    Suppose you wish to increase the page rank of vertex v by

    • Capture restart

      web farm

    • Capture random walk

      small cycles


    Capture restart l.jpg
    Capture restart by

    • Buy 20,000 url’s and capture restart

    • Can be countered by small restart value

    • Small restart increases web rank of page that captures random walk by small cycles.


    Slide62 l.jpg

    Capture random walk by

    0.85

    1

    1

    0.15 restart

    0.66

    0.1restart

    0.66

    1

    0.66

    1.56

    0.23 restart


    Slide63 l.jpg

    Y=0.66 by

    0.1restart

    0.56

    0.66

    1

    X=1.56

    0.66

    0.23restart

    X=1+0.85*y

    Y=0.85*x/2


    Slide64 l.jpg

    If one loop increases Pagerank from 1 to 1.56 why not add many self loops?

    Maximum increase in Pagerank is 6.67


    Slide65 l.jpg

    Discovery time – time to first reach a vertex by random walk from uniform start

    S

    Cannot lower discovery time of any page in S below minimum already in S


    Slide66 l.jpg

    Why not replace Pagerank by discovery time? walk from uniform start

    No efficient algorithm for discovery time

    DiscoveryTime(v)

    remove edges out of v

    calculate Pagerank(v) in modified graph



    Information is important l.jpg
    Information is important is not statistically detectable

    • When a customer makes a purchase what else is he likely to buy?

      • Camera

        • Memory card

        • Batteries

        • Carrying case

        • Etc.

  • Knowing what a customer is likely to buy is important information.


  • Slide69 l.jpg


    Detecting trends before they become obvious l.jpg
    Detecting trends before they become obvious a web site?

    • Is some category of customer changing their buying habits?

      • Purchases, travel destination, vacations

    • Is there some new trend in the stock market?

    • How do we detect changes in a large database over time?


    Identifying changing patterns in a large data set l.jpg
    Identifying Changing Patterns a web site?in a Large Data Set

    • How soon can one detect a change in patterns in a large volume of information?

    • How large must a change be in order to distinguish it from random fluctuations?


    Conclusions l.jpg
    Conclusions a web site?

    • We are in an exciting time of change.

    • Information technology is a big driver of that change.

    • The computer science theory of the last thirty years needs to be extended to cover the next thirty years.


    ad