Scaling in the Geography

of US Computer Science

Rui Carvalho and Michael Batty

University College London

[email protected] [email protected]

http://www.casa.ucl.ac.uk/

Thanks: Michael Gastner (SFI), Isaac Councill (PSU), Chris Brunsdon (Leicester), Ben Gimpert (UCL)


Motivation

  • Why Geography?

    • Scientists: who can I collaborate with in my city/country?

    • Funding Agencies: where are new research centres emerging? Is regional distribution of funds optimal?

    • Scientometrics: distinguish between J. Smith (PSU) and J. Smith (UCL);

  • Preprint server challenges:

    • [USA] NIH-funded investigators are required to submit their papers to PubMed within 1 year of publication (effective May 2, 2005);

    • [UK] Wellcome Trust-funded papers will in future have to be placed in a central public archive within six months of publication;

  • Data mining challenges:

    • Processing large databases promises to uncover knowledge hidden in the mass of available data;

    • It can dramatically speed up achievements formerly reached solely by human effort, and provide new results that humans could not have reached unaided;

  • Statistical Challenges:

    • Conventional wisdom holds that (geographical) spatial point processes have characteristic scales...

    • Yet most “real world” phenomena are far from equilibrium.

PNAS, 6 April 2004


Plan

  • Open Archives Datasets:

    • Citeseer (Computer Science);

    • arXiv.org (mainly Physics, but also Maths and CS)

  • Geographical Datasets:

    • The US Census Bureau makes geocoding datasets available on the web, but Europe lacks a unified ‘open-access’ database;

  • Plan:

    • Extract ZIP codes from authors’ addresses;

    • Map research centres geographically;

  • Questions about the research centres:

    • How productive are they?

    • Are there non-trivial spatial structures at a geographical scale?
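
The ZIP-extraction step of the plan can be sketched as follows. This is only an illustration: the slides do not say how addresses were parsed, so the regular expression and the function name `extract_zip` are assumptions, and real affiliation strings would need more cleaning.

```python
import re
from typing import Optional

# Assumed pattern: a two-letter state abbreviation followed by a 5-digit ZIP,
# with an optional ZIP+4 suffix. This is illustrative, not the authors' parser.
ZIP_PATTERN = re.compile(r"\b([A-Z]{2})[ ,]+(\d{5})(?:-\d{4})?\b")

def extract_zip(address: str) -> Optional[str]:
    """Return the last 5-digit ZIP code that follows a state abbreviation."""
    matches = ZIP_PATTERN.findall(address)
    # findall yields (state, zip) tuples; keep the ZIP of the last match,
    # since the postal code usually closes a US address line.
    return matches[-1][1] if matches else None

print(extract_zip("Carnegie Mellon University, Pittsburgh, PA 15213"))
```

Requiring a state abbreviation in front of the digits is a deliberate choice here: it avoids picking up street numbers or PO box numbers, at the cost of missing addresses that omit the state.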


Can Statistical Physics Help?


What is Citeseer?

  • Founded by Steve Lawrence and C. Lee Giles in 1997 (NEC);

  • Now at Penn State http://citeseer.ist.psu.edu/

  • Archive of computer science research papers harvested from the web and submitted by users;

  • Currently (Dec 2005) contains over 730,000 documents;

  • Citeseer was developed as a model for Autonomous Citation Indexing, i.e. citation indexes are created automatically;

  • Can search content in PostScript and PDF files.


Data Collecting and Parsing

  • Citeseer metadata:

    • 525,055 computer science research papers;

    • 399,757 (76.14%) of which are unique;

    • 103,172 (25.81%) of the unique papers have one or more US authors;

    • 2,975 different ZIP codes in the unique papers belong to the conterminous United States (48 states plus the District of Columbia);

  • 5 most productive ZIP codes:

    • Count: 3950 Zip: 15213 Carnegie Mellon Univ, Pittsburgh, PA;

    • Count: 3403 Zip: 02139 MIT, Cambridge, MA;

    • Count: 2954 Zip: 94305 Stanford Univ, CA;

    • Count: 2691 Zip: 94720 Univ California at Berkeley, CA;

    • Count: 2309 Zip: 61801 Univ Illinois at Urbana-Champaign, IL
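
The percentages and the ranking above can be reproduced with a few lines. This is a minimal sketch: the helper names are hypothetical, and counting a paper once per distinct ZIP code is an assumption, since the slides do not state the counting rule.

```python
from collections import Counter

def percentage(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)

def top_zip_codes(paper_zips, k=5):
    """Rank ZIP codes by number of papers.

    `paper_zips` holds one list of ZIP codes per paper. Assumption: a paper
    counts once per distinct ZIP code, however many authors share that ZIP.
    """
    counts = Counter()
    for zips in paper_zips:
        counts.update(set(zips))
    return counts.most_common(k)

print(percentage(399_757, 525_055))   # share of unique papers
print(percentage(103_172, 399_757))   # share of unique papers with US authors
```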





Cartograms

Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, 7499-7504 (2004)

Density-equalizing map projections: Diffusion-based algorithm and applications

Michael T. Gastner and M. E. J. Newman, Geocomputation 2005 (to appear)



Spatial Point Processes

  • Moments:

    • First moment: ρ, expected number of points per unit area;

    • Second moment: Ripley’s K function. ρK(r) is the expected number of further points within distance r of a given point.

      • For a Poisson process, K(r) = πr²;

  • But neither the first nor the second moment gives a feel for the way in which the spatial distribution changes within an area.
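
To make the second moment concrete, here is a naive estimator of Ripley's function for a planar point pattern. It is a sketch under stated assumptions: it uses the common form (area / n²) × (number of ordered pairs closer than r), applies no edge correction, and the function name `ripley_k` is made up for illustration.

```python
import numpy as np

def ripley_k(points, radii, area):
    """Naive estimate of Ripley's K(r), without edge correction.

    points: (n, 2) array-like of planar coordinates
    radii:  iterable of distances r at which to evaluate K
    area:   area of the observation window
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # All pairwise distances; O(n^2) memory, fine for a few thousand points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[~np.eye(n, dtype=bool)]  # ordered pairs with i != j
    density = n / area                        # estimate of the first moment
    return np.array([(pair_dist <= r).sum() / (n * density) for r in radii])

corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(ripley_k(corners, [0.5, 1.0, 1.5], area=1.0))
```

Comparing the estimate with πr² shows whether points are more clustered (larger values) or more regular (smaller values) than a Poisson process at scale r. Without edge correction the estimate is biased low for points near the border of the window.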


The Two-Point Correlation Function

  • The two-point correlation function g(r) describes the probability of finding a point in volume dV(x1) and another point in dV(x2) at distance r = |x1 − x2|; with ρ the density of points, this probability is ρ² g(r) dV(x1) dV(x2);

  • For a Poisson process g(r)=1;

  • Edge corrections (Ripley’s weights): take a circle centred on point x and passing through another point y. If the circle lies entirely within the domain D, the point y is counted once. If only a proportion p(x,y) of the circle lies within D, the point is counted as 1/p(x,y) points.
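
The edge-correction weight can be approximated numerically by sampling the circle. The sketch below assumes a rectangular domain for simplicity, whereas the study region in the talk is bounded by the US border; the function name `ripley_weight` and the `samples` parameter are illustrative.

```python
import math

def ripley_weight(x, y, domain, samples=3600):
    """Approximate Ripley's weight 1/p(x, y).

    p(x, y) is the fraction of the circle centred on x and passing through y
    that lies inside the domain. Assumption: the domain is an axis-aligned
    rectangle given as (xmin, ymin, xmax, ymax).
    """
    xmin, ymin, xmax, ymax = domain
    r = math.hypot(y[0] - x[0], y[1] - x[1])
    inside = 0
    for k in range(samples):
        # Sample the circumference at evenly spaced angles.
        theta = 2.0 * math.pi * (k + 0.5) / samples
        px = x[0] + r * math.cos(theta)
        py = x[1] + r * math.sin(theta)
        if xmin <= px <= xmax and ymin <= py <= ymax:
            inside += 1
    if inside == 0:
        raise ValueError("no sampled point of the circle lies inside the domain")
    return samples / inside

square = (0.0, 0.0, 10.0, 10.0)
print(ripley_weight((5.0, 5.0), (6.0, 5.0), square))  # circle fully inside
print(ripley_weight((0.0, 5.0), (1.0, 5.0), square))  # centre on the border
```

For a polygonal border the same idea applies with a point-in-polygon test in place of the rectangle check, or with an exact computation of the arcs that fall inside the domain.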


Computation of the Two-Point Correlation Function

Intersection with border gives more than one polygon

Geographical range at which the two-point correlation function can be approximated by a power law
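
One simple way to check a power-law range is a straight-line fit in log-log coordinates. The slides do not say how the range or the exponent were determined, so the following is only a sketch; the function name `fit_power_law`, the form g(r) ≈ c·r^(−γ), and the choice of ordinary least squares are assumptions.

```python
import numpy as np

def fit_power_law(r, g, r_min, r_max):
    """Fit g(r) ~ c * r**(-gamma) on r_min <= r <= r_max.

    Uses ordinary least squares on (log r, log g); returns (gamma, c).
    """
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    # Keep only the chosen range and strictly positive values (log needs g > 0).
    mask = (r >= r_min) & (r <= r_max) & (g > 0)
    slope, intercept = np.polyfit(np.log(r[mask]), np.log(g[mask]), 1)
    return -slope, np.exp(intercept)

r = np.logspace(0, 3, 50)
gamma, c = fit_power_law(r, 5.0 * r ** -1.2, r_min=10, r_max=500)
print(gamma, c)
```

A log-log fit is easy to read off a plot but can be sensitive to the chosen range and to noisy bins, so the fitted exponent is best reported together with the range over which it holds.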





To find out more

  • http://www.casa.ucl.ac.uk/

  • Spatially Embedded Complex Systems Engineering (SECSE):

    http://www.secse.net/

    members: UCL, Leeds, Southampton, Sussex

  • [email protected] [email protected]



Poisson Point Process

  • We say that a spatial process is completely random iff:

    • The number of events in any planar region A with area |A| follows a Poisson distribution with mean λ |A|, where λ is the density of points;

    • For any two disjoint regions A and B, the random variables N(A) and N(B) are independent.
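
The two conditions above translate directly into a simulation recipe: draw the number of events from a Poisson distribution with mean λ|A|, then place them independently and uniformly in A. A minimal sketch, assuming a rectangular region and a hypothetical function name:

```python
import numpy as np

def simulate_poisson_process(lam, width, height, seed=None):
    """Simulate a completely random pattern on [0, width] x [0, height].

    lam is the density of points (expected number per unit area).
    Returns an (n, 2) array of coordinates; n itself is random.
    """
    rng = np.random.default_rng(seed)
    # Number of events: Poisson with mean lam * |A|.
    n = rng.poisson(lam * width * height)
    # Given n, the events are independent and uniform over the region.
    xs = rng.uniform(0.0, width, size=n)
    ys = rng.uniform(0.0, height, size=n)
    return np.column_stack([xs, ys])

pattern = simulate_poisson_process(lam=2.0, width=10.0, height=5.0, seed=0)
print(pattern.shape)
```

Patterns generated this way serve as the null model: for them K(r) should stay close to πr² and g(r) close to 1, up to sampling noise and edge effects.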

