Designing Linkage between Patents and Business Registers: the Italian Experience. Daniela Ichim, Giulio Perani, Giovanni Seri Italian National Statistical Institute (Istat) {ichim,perani,seri}@istat.it EESW European Establishment Statistics Workshop 2011. Neuchatel, 12 – 14 September 2011.

    Presentation Transcript
    1. Designing Linkage between Patents and Business Registers: the Italian Experience Daniela Ichim, Giulio Perani, Giovanni Seri Italian National Statistical Institute (Istat) {ichim,perani,seri}@istat.it EESW European Establishment Statistics Workshop 2011 Neuchatel, 12 – 14 September 2011

    2. Outline
       • Project description
       • Data sets
       • Linkage approach
       • Pre-processing of the input files
       • Choice of the matching variables
       • Choice of the similarity function
       • Creation of the search space of link candidate pairs
       • Choice of the decision model and selection of unique links
       • Record linkage evaluation
       • Preliminary results
       • Future work

    3. Project description
       • Aim: profiling the Italian patenting enterprises
       • Linking economic data and technological information on patenting enterprises in order to identify the key drivers of patenting propensity
       • Evaluating the economic impact of the patenting activity
       • Identifying and collecting additional information on enterprises to be surveyed as R&D performers
       • Investigating specific sub-populations of enterprises (e.g. biotech enterprises)

    4. Project description
       • Source of data: PATSTAT - EPO Worldwide Patent Statistical Database
       • Target data: applicants based in Italy
       • Period: patent applications from 1985 to 2010
       • Subject classification criterion:
         A) individuals
         B) establishments
            - Business enterprises
            - Public institutions
            - Non profit institutions
            - Universities

    5. Data sets: patents
       PATSTAT (1)
       • Applications: 299769
       • Application number (by year)
       • International Patent Classification (IPC) code (each application can be classified under several IPC codes)
       PATSTAT (2)
       • Applications: 72034
       • Application number (by year)
       • Applicant name
       • Applicant code
       • Postal/Zip Code
       • Applicant Country (=IT)
       Additional information to be retrieved from the above database:
       • Year of first/last application by applicant
       • Number of patent applications filed by applicant
       • Region of residence of the applicants
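The additional per-applicant information listed above (year of first and last application, number of applications filed) can be derived by a simple aggregation. The following is a minimal sketch, assuming the application records are available as (applicant code, year) pairs with one pair per application; the function name and the sample values are hypothetical and do not come from PATSTAT.

```python
from collections import defaultdict


def summarise_applicants(applications):
    """Aggregate (applicant_code, year) pairs into per-applicant figures.

    Assumption: each pair corresponds to exactly one patent application.
    """
    years_by_applicant = defaultdict(list)
    for applicant_code, year in applications:
        years_by_applicant[applicant_code].append(year)
    return {
        code: {
            "first_year": min(years),
            "last_year": max(years),
            "n_applications": len(years),
        }
        for code, years in years_by_applicant.items()
    }


# Made-up records, for illustration only.
sample = [("A1", 1999), ("A1", 2004), ("B7", 2001)]
summary = summarise_applicants(sample)
```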

    6. Data sets: enterprises
       Italian business register: ASIA (Archivio Statistico Imprese Attive). It is the frame for Istat surveys, built as a logical and physical combination of data from both surveys and administrative sources (Tax Register, Register of Enterprises and Local Units, Social Security Register, Work Accident Insurance Register, Register of the Electric Power Board).
       ASIA variables:
       • Enterprise identification number
       • Enterprise name
       • Postal/Zip Code
       • NACE code
       • Address, municipality, province, region
       • Legal form
       • Fiscal code
       • Enterprise size variables: number of employees, turnover
       ASIA 1998-2008 (size 2008 ~ 4.5 million records)

    7. Data sets: linkage output
       • Shared variables: Name, Postal/Zip Code
       • Enterprise identification number
       • Applicant identification number
       • Surveys

    8. Pre-processing of the input files
       Standardisation:
       • Accents
       • Symbols & special characters
       • Double spaces
       • Dots (e.g. L.T.D. becomes LTD) and punctuation
       • Known abbreviations (about 150 ways to say “in short”)
       • Most frequent words (more than 1000 and 100)
       • Lower/upper case
       • Deduplication of words
       • Known legal forms (reduced to 6 main categories)
       • Universities/public administrations dropped
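Several of the standardisation steps above can be sketched in a few lines. This is an illustrative sketch only, not the procedure actually used at Istat: the two lookup tables are tiny hypothetical stand-ins (the slides mention about 150 abbreviations and 6 legal-form categories but do not list them), the handling of most frequent words is left out, and the alphabetical ordering of words is an assumption about how the name is later used as a matching variable.

```python
import re
import unicodedata

# Hypothetical lookup tables, for illustration only.
ABBREVIATIONS = {"FLLI": "FRATELLI", "COSTR": "COSTRUZIONI"}
LEGAL_FORMS = {"SPA", "SRL", "SNC", "SAS", "LTD"}


def standardise_name(raw):
    """Return (standardised name, legal form or None) for a raw name."""
    # Strip accents: decompose characters, then drop combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Upper case, then drop dots so that "S.P.A." becomes "SPA".
    text = text.upper().replace(".", "")
    # Replace remaining symbols and punctuation with spaces.
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)
    # split() also collapses double spaces.
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    legal_form = next((word for word in words if word in LEGAL_FORMS), None)
    # Deduplicate words and sort them alphabetically.
    name_words = sorted({word for word in words if word not in LEGAL_FORMS})
    return " ".join(name_words), legal_form
```

For instance, `standardise_name("F.lli  Rossi & Rossi S.p.A.")` yields the pair `("FRATELLI ROSSI", "SPA")` under these toy tables.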

    9. Choice of the matching variables
       • Standardised name, in upper case and with words in alphabetical order
       • Postal/Zip code
       • Legal form

    10. Search space reduction
        Patent applicants: Establishments (Enterprises) - Individuals
        • Several words in a name (OK only for enterprises, not for individuals)
        • Individuals: the standardised applicant name does not contain
          - a legal form
          - a name not included in the database of Italian first names (“List of italian first names”*)
          - special terms: “enterprise”, “construction”, “hotel”, “systems”, “group”, … (63 values)
        * (http://www.nomix.it/nomi-italiani-maschili-e-femminili.php)
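The rule for separating individuals from enterprises can be illustrated as follows. This is a sketch under one plausible reading of the slide, which is an assumption: an applicant is treated as an individual when the standardised name contains no legal form, no special term, and at least one known first name. The three word sets are small hypothetical samples; the real lists (63 special terms, the full list of Italian first names) are not reproduced in the slides.

```python
# Hypothetical sample lists, for illustration only.
LEGAL_FORMS = {"SPA", "SRL", "SNC", "SAS"}
SPECIAL_TERMS = {"ENTERPRISE", "CONSTRUCTION", "HOTEL", "SYSTEMS", "GROUP"}
FIRST_NAMES = {"MARIO", "GIULIA", "LUCA", "ANNA"}


def looks_like_individual(std_name):
    """Assumed rule: no legal form, no special term, one known first name."""
    words = set(std_name.split())
    if words & LEGAL_FORMS:
        return False
    if words & SPECIAL_TERMS:
        return False
    return bool(words & FIRST_NAMES)
```

Under this reading, "MARIO ROSSI" is flagged as an individual, while "ROSSI SRL" and "HOTEL MARIO" are kept among the enterprises.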

    11. Search space reduction
        • Blocking by year of application (reduces only the size of the patent applicants archive: ineffective)
        • Blocking by Postal/Zip Code-Region (ineffective)
        • Partition of ASIA 2008 (more than 10 employees, 1 employee with legal form)
        • ASIA 2007-1998 (recursively removing the enterprises included in most recent ASIA archives)
        • R&D survey frame (as a subset of ASIA archive)
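The recursive removal of enterprises already present in more recent ASIA archives amounts to a sequence of set differences, walking from the newest year to the oldest. A minimal sketch, assuming each yearly archive is reduced to a set of enterprise identification numbers; the function name and the sample identifiers are hypothetical.

```python
def partition_by_most_recent_year(registers):
    """Map each year to the enterprises not found in any later archive.

    registers: dict mapping year -> set of enterprise identifiers.
    """
    seen = set()
    partitions = {}
    # Walk from the most recent archive back to the oldest one.
    for year in sorted(registers, reverse=True):
        partitions[year] = registers[year] - seen
        seen |= registers[year]
    return partitions


# Made-up identifiers, for illustration only.
registers = {2008: {1, 2, 3}, 2007: {2, 3, 4}, 2006: {4, 5}}
partitions = partition_by_most_recent_year(registers)
```

Each enterprise then appears in exactly one partition, the one for the latest year in which it is registered, so older archives only contribute units that could not have been matched in newer ones.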

    12. Search space reduction
        Neighbourhoods of words: the set of ASIA enterprises having at least one word in common with the patent applicant name
        Huge number of small problems!
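The neighbourhood of a patent applicant can be computed efficiently with an inverted index from words to enterprises. The sketch below assumes names have already been standardised and are split on whitespace; the function names and the sample register are hypothetical.

```python
from collections import defaultdict


def build_word_index(enterprises):
    """Map each word to the set of enterprise ids whose name contains it.

    enterprises: dict mapping enterprise id -> standardised name.
    """
    index = defaultdict(set)
    for ent_id, name in enterprises.items():
        for word in name.split():
            index[word].add(ent_id)
    return index


def neighbourhood(applicant_name, index):
    """Enterprises sharing at least one word with the applicant name."""
    candidates = set()
    for word in applicant_name.split():
        candidates |= index.get(word, set())
    return candidates


# Made-up register, for illustration only.
enterprises = {1: "ACME MECCANICA", 2: "MECCANICA ROSSI", 3: "BIANCHI TESSILE"}
index = build_word_index(enterprises)
```

With this toy register, `neighbourhood("MECCANICA ROSSI", index)` returns enterprises 1 and 2, and an applicant with no word in common with any enterprise gets an empty neighbourhood.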

    13. Search space reduction
        Neighbourhoods of words:
        Hypothesis:
        - at least one word of a name is registered in the same way in both registers
        Problems:
        - very short words (1-2 letters) generate huge neighbourhoods
        - very common words generate huge neighbourhoods
        - names without neighbourhood
        - not applicable in a probabilistic approach
        * 23338 patent applicants ~ ASIA 2008 (10+ employees)

    14. Preliminary results
        • Still under expert clerical check (~hundreds)
        • No Duplicated Enterprises code

    15. Preliminary results
        Patent applicants by year: lost and found (black and red)

    16. Preliminary results
        Patenting enterprises in ASIA 2008 by economic activity (NACE 2007)
        The 5 most frequent NACE divisions

    17. Future Work
        • Methods
          - Neighbourhood based on similarity instead of equality
          - Probabilistic approach (using the R&D survey frame)
        • Units
          - Names containing only 2-letter words
          - Individuals (names without legal form)
          - List of companies’ owners and partners
          - List of University Professors/Researchers
          - No neighbourhood names
        • Analyses
          - Produce analytical evidence on specific technological areas (e.g. Biotech) using IPC codes
          - Overall classification of patent applicants
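A neighbourhood based on similarity instead of equality could look like the sketch below. The slides do not say which similarity function is intended; the use of `difflib.SequenceMatcher` from the Python standard library, the 0.8 threshold, and the function names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similar(word_a, word_b, threshold=0.8):
    """True when the two words are similar enough (assumed measure)."""
    return SequenceMatcher(None, word_a, word_b).ratio() >= threshold


def similarity_neighbourhood(applicant_name, enterprises, threshold=0.8):
    """Enterprises having at least one word similar to an applicant word.

    enterprises: dict mapping enterprise id -> standardised name.
    """
    applicant_words = applicant_name.split()
    return {
        ent_id
        for ent_id, name in enterprises.items()
        if any(
            similar(a_word, e_word, threshold)
            for a_word in applicant_words
            for e_word in name.split()
        )
    }


# Made-up register, for illustration only.
enterprises = {1: "MECCANICA ROSSI", 2: "BIANCHI TESSILE"}
```

Here a misspelt applicant name such as "MECANICA VERDI" still reaches enterprise 1, which an equality-based neighbourhood would miss. This brute-force comparison scans every enterprise, so on a register of millions of records it would need an index or a blocking step first.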

    18. Thank you for your attention!