
Scaling in the Geography

of US Computer Science

Rui Carvalho and Michael Batty

University College London

[email protected] [email protected]

http://www.casa.ucl.ac.uk/

Thanks: Michael Gastner (SFI), Isaac Councill (PSU), Chris Brunsdon (Leicester), Ben Gimpert (UCL)


Motivation

  • Why Geography?

    • Scientists: who can I collaborate with in my city/country?

    • Funding Agencies: where are new research centres emerging? Is regional distribution of funds optimal?

    • Scientometrics: distinguish between J. Smith (PSU) and J. Smith (UCL);

  • Preprint server challenges:

    • [USA] NIH-funded investigators are required to submit to PubMed their papers within 1 year of publication (effective May 2, 2005);

    • [UK] Wellcome Trust-funded papers will in future have to be placed in a central public archive within six months of publication;

  • Data mining challenges:

    • Processing large databases promises to uncover knowledge hidden in the mass of available data;

    • It can dramatically speed up results formerly reached solely by human effort, and provide new results that humans could not have reached unaided;

  • Statistical Challenges:

    • Conventional wisdom holds that (geographical) spatial point processes have characteristic scales...

    • Yet most “real world” phenomena are far from equilibrium.

PNAS, 6 April 2004


Plan

  • Open Archives Datasets:

    • Citeseer (Computer Science);

    • arXiv.org (mainly Physics, but also Maths and CS)

  • Geographical Datasets:

    • The US Census Bureau makes geocoding datasets available on the web, but Europe lacks a unified ‘open-access’ database;

  • Plan:

    • Extract ZIP codes from authors’ addresses;

    • Map research centres geographically;

  • Questions about the research centres:

    • How productive are they?

    • Are there non-trivial spatial structures at a geographical scale?
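The ZIP-extraction step of the plan can be sketched as follows. This is a minimal illustration, not the parser used for the talk: the regular expression, the function name `extract_zip`, and the sample addresses are all assumptions. It looks for a two-letter state abbreviation followed by a five-digit ZIP code (with an optional ZIP+4 suffix).

```python
import re

# Assumed pattern: two-letter state abbreviation, then a 5-digit ZIP,
# optionally followed by a ZIP+4 suffix such as "-4307".
ZIP_RE = re.compile(r"\b([A-Z]{2})[ ,]+(\d{5})(?:-\d{4})?\b")

def extract_zip(address):
    """Return the 5-digit ZIP code in a US-style address, or None."""
    match = ZIP_RE.search(address)
    return match.group(2) if match else None

# Hypothetical sample addresses, for illustration only.
addresses = [
    "Carnegie Mellon University, Pittsburgh, PA 15213",
    "MIT, Cambridge, MA 02139-4307, USA",
    "University College London, London WC1E 6BT, UK",
]
zips = [extract_zip(a) for a in addresses]
```

Requiring the state abbreviation in front of the digits is a deliberate choice: it avoids picking up other five-digit numbers, such as street numbers or PO boxes, at the cost of missing addresses that omit the state.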


Can Statistical Physics Help?


What is Citeseer?

  • Founded by Steve Lawrence and C. Lee Giles in 1997 (NEC);

  • Now at Penn State http://citeseer.ist.psu.edu/

  • Archive of computer science research papers harvested from the web and submitted by users;

  • Currently (Dec 2005) contains over 730,000 documents;

  • Citeseer was developed as a model for Autonomous Citation Indexing, i.e. citation indexes are created automatically;

  • Can search content in postscript and PDF files.


Data Collecting and Parsing

  • Citeseer metadata:

    • 525,055 computer science research papers;

    • 399,757 (76.14%) of which are unique;

    • 103,172 (25.81%) of the unique papers have one or more US authors;

    • 2,975 different ZIP codes in the unique papers belong to the US conterminous states (48 states, plus the District of Columbia);

  • 5 most productive ZIP codes:

    • Count: 3950 Zip: 15213 Carnegie Mellon Univ, Pittsburgh, PA;

    • Count: 3403 Zip: 02139 MIT, Cambridge, MA;

    • Count: 2954 Zip: 94305 Stanford Univ, CA;

    • Count: 2691 Zip: 94720 Univ California at Berkeley, CA;

    • Count: 2309 Zip: 61801 Univ Illinois at Urbana-Champaign, IL
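A ranking like the one above can be produced with a simple tally. The sketch below assumes each paper contributes at most one count to each ZIP code appearing among its authors' addresses; the slides do not say whether papers or author occurrences were counted, and the toy `papers` list is invented for illustration.

```python
from collections import Counter

# Each hypothetical paper is represented by the set of ZIP codes
# found in its author addresses.
papers = [
    {"15213", "02139"},
    {"15213"},
    {"94305"},
    {"15213", "94305"},
]

# Tally one count per (paper, ZIP) pair and rank by frequency.
counts = Counter(z for paper in papers for z in paper)
top = counts.most_common(2)
```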


Q1: How productive are the research centres?


Q2: Non-trivial spatial structures?


The Geography of Citeseer


Cartograms

Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, 7499-7504 (2004)

Density-equalizing map projections: Diffusion-based algorithm and applications

Michael T. Gastner and M. E. J. Newman, Geocomputation 2005 (to appear)


Spatial Point Processes

  • Moments:

    • First moment: ρ, expected number of points per unit area;

    • Second moment: Ripley’s K function. ρK(r) is the expected number of further points within distance r of a typical point.

      • For a Poisson process, K(r) = πr²;

  • But neither the first nor the second moment gives a feel for the way in which the spatial distribution changes within an area.
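As a rough illustration of the second moment, the sketch below estimates Ripley's K from a point pattern by counting ordered pairs closer than r and normalising by the area and the number of pairs. It applies no edge correction, so it underestimates K near the boundary; the function name `ripley_k_naive` and the test patterns are assumptions. For a homogeneous Poisson process in the plane the theoretical value is K(r) = πr², which the last lines compute for comparison.

```python
import numpy as np

def ripley_k_naive(points, r, area):
    """Naive estimate of Ripley's K(r), without edge correction."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-pairs
    pairs = (dist <= r).sum()       # ordered pairs (i, j), i != j
    return area * pairs / (n * (n - 1))

# Four corners of the unit square: every pair is within distance 1.5.
corners = [[0, 0], [1, 0], [0, 1], [1, 1]]
k_corners = ripley_k_naive(corners, 1.5, 1.0)

# Uniform random points in the unit square, compared with pi * r^2.
rng = np.random.default_rng(0)
uniform_pts = rng.random((500, 2))
k_uniform = ripley_k_naive(uniform_pts, 0.05, 1.0)
k_poisson = np.pi * 0.05 ** 2
```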


The Two-Point Correlation Function

  • The two-point correlation function g(r), defined through dP = ρ² g(r) dV(x1) dV(x2),

    describes the probability of finding a point in volume dV(x1) and another point in dV(x2) at distance r = |x1 − x2|;

  • For a Poisson process, g(r) = 1;

  • Edge corrections (Ripley’s Weights): take a circle centred on point x passing through another point y. If the circle lies entirely within the domain, D, the point is counted once. If a proportion p(x,y) of the circle lies within D, the point is counted as 1/p points.
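The edge-correction weight can be illustrated numerically. The sketch below assumes a rectangular domain and interprets p as the fraction of the circle's circumference lying inside it, estimated by sampling points along the circle. The real study area (the conterminous US) is an irregular polygon, so this is only a simplified stand-in, and the function names are hypothetical.

```python
import math

def circle_fraction_inside(x, y, r, width=1.0, height=1.0, samples=3600):
    """Fraction of the circle of radius r centred at (x, y) that lies
    inside the rectangle [0, width] x [0, height], by angular sampling."""
    inside = 0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        px = x + r * math.cos(theta)
        py = y + r * math.sin(theta)
        if 0.0 <= px <= width and 0.0 <= py <= height:
            inside += 1
    return inside / samples

def ripley_weight(x, y, r, **domain):
    """Weight 1/p given to a neighbour at distance r from (x, y)."""
    p = circle_fraction_inside(x, y, r, **domain)
    if p == 0.0:
        raise ValueError("circle lies entirely outside the domain")
    return 1.0 / p

w_interior = ripley_weight(0.5, 0.5, 0.1)  # circle fully inside
w_edge = ripley_weight(0.0, 0.5, 0.1)      # centre on the left edge
```

A point in the interior keeps weight 1, while a point on a straight edge has about half of its circle outside the domain and so counts roughly twice.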


Computation of the Two-Point Correlation Function

Intersection with border gives more than one polygon

Geographical range at which the two-point correlation function can be approximated by a power-law
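One simple way to check whether g(r) is approximately a power law over a range of distances is a least-squares line fit in log-log coordinates. The sketch below does this on synthetic data; it is a diagnostic under stated assumptions, not necessarily the fitting procedure used for the talk, and the function name, the range limits, and the synthetic exponent are all invented for illustration.

```python
import numpy as np

def fit_power_law(r, g, r_min, r_max):
    """Fit g(r) ~ C * r**(-gamma) for r_min <= r <= r_max.

    Returns (gamma, C) from a straight-line fit of log g against log r.
    """
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    mask = (r >= r_min) & (r <= r_max) & (g > 0)
    slope, intercept = np.polyfit(np.log(r[mask]), np.log(g[mask]), 1)
    return -slope, np.exp(intercept)

# Synthetic correlation function with a known exponent.
r = np.logspace(0, 3, 50)
g = 5.0 * r ** -1.2
gamma, prefactor = fit_power_law(r, g, r_min=10.0, r_max=500.0)
```

Restricting the fit to an explicit range matters because a power law usually holds only between a lower and an upper cut-off.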


Two-Point Correlation Function


Speculation: knowledge diffusion?


Speculation: Universality?


To find out more

  • http://www.casa.ucl.ac.uk/

  • Spatially Embedded Complex Systems Engineering (SECSE):

    http://www.secse.net/

    members: UCL, Leeds, Southampton, Sussex

  • [email protected] [email protected]


Plot of state R&D expenditure (NSF) vs population


Poisson Point Process

  • We say that a spatial process is completely random iff:

    • The number of events in any planar region A with area |A| follows a Poisson distribution with mean λ |A|, where λ is the density of points;

    • For any two disjoint regions A and B, the random variables N(A) and N(B) are independent.
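The two defining properties suggest a direct way to simulate a completely random pattern on a rectangle: draw the number of points from a Poisson distribution with mean λ|A|, then place that many points independently and uniformly. The sketch below does this with NumPy; the function name and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def simulate_poisson(lam, width, height, rng):
    """Sample a homogeneous Poisson point process of density lam
    on the rectangle [0, width] x [0, height]."""
    n = rng.poisson(lam * width * height)  # N(A) ~ Poisson(lam * |A|)
    xs = rng.uniform(0.0, width, size=n)   # uniform, independent positions
    ys = rng.uniform(0.0, height, size=n)
    return np.column_stack([xs, ys])

rng = np.random.default_rng(42)
pts = simulate_poisson(lam=100.0, width=2.0, height=1.0, rng=rng)
```

Such a simulated pattern is a convenient null model: statistics computed on the real research-centre locations can be compared with the same statistics on the simulated points.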

