
Implications for ethnic analysis of research surveys and administrative datasets

Using Intelligent Systems to infer ethnicity from names: implications for ethnic analysis of research surveys and administrative datasets. Richard Webber, Visiting Professor, Department of Geography, UCL / OriginsInfo. Oxford, 18 July 2006.



Presentation Transcript


  1. Using Intelligent Systems to infer ethnicity from names: Implications for ethnic analysis of research surveys and administrative datasets. Richard Webber, Visiting Professor, Department of Geography, UCL / OriginsInfo. Oxford, 18 July 2006

  2. Name and address Daniele Ceccomori 16a Broadlands Road London N6 4AN

  3. Intelligent Inferences Italy Central Italy Young Daniele Ceccomori 16a Broadlands Road London N6 4AN Male 1880 - 1939 Subdivided house Global Connections

  4. Monica / Stage Age Daniele Ceccomori 16a Broadlands Road London N6 4AN

  5. Mosaic / Acorn Daniele Ceccomori 16a Broadlands Road London N6 4AN Global Connections

  6. Nam Pehchan / Sangra / Origins Italy Central Italy Daniele Ceccomori 16a Broadlands Road London N6 4AN

  7. ‘Ethnic’ coding systems • Typically based on the surname element of the name • Typically targeting a single or regional group • South Asians (UK) • Hispanics (US) • East Asians (US) • Calibrated against a reference database • Optimised for a specific geographical area • May be delivered using a bespoke software application

  8. ‘Onomastic’ coding systems • Categories based on common naming practices • Based on interconnection between personal and family names • May or may not correspond to ethnic, linguistic, religious or cultural categorisations • No external validation • Global coverage and territorial independence

  9. Example of Onomastic coding : Brent North Electoral Register

  10. Example of Onomastic analysis : Computer Studies graduates, City University, 2006

  11. Optimising ‘onomastic’ classifications Origins - Australia

  12. ‘Origins’ • Onomastic coding system • Base files • UK / Ireland / Australia • France • Spain • Italy • Netherlands • Romania • Norway • Number of unique names • Family names 620,000 • Personal names 200,000 • Number of categories • 195 onomastic types • 13 onomastic groups

  13. Key Origins Groups European English Celtic Western European Hispanic Nordic Eastern European Jewish and Armenian Rest of the World African Muslim South Asian Sikh East Asian Japanese

  14. General Strategy • Access universal file

  15. General Strategy • Access universal file • Personal name • Family name • Geodemographic cluster • Regional division

  16. General Strategy • Access universal file • Personal name • Family name • Geodemographic cluster • Regional division • Rank names by frequency • Names of known origin (eg Peter, John, Patel, Rees) • Names of unknown or unclear origin

  17. General Strategy • Access universal file • Personal name • Family name • Geodemographic cluster • Regional division • Rank names by frequency • Names of known origin (eg Peter, John, Smith, Rees) • Names of unknown or unclear origins • Unknown or unclear names • Triage • Text mining • Geodemographic profiling • Regional analysis

  18. Naming practices as indicators of cultural integration

  19. Triage example : Lorcan

  20. Triage example : Ourania • UK total : 88 • 11.4% have British family name • 27.3% have Greek Orthodox family name • 61.4% have family names that have not been classified • Greek Orthodox total : 24 • 62.5% have Greek family names • 37.5% have Greek Cypriot family names
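
The arithmetic behind the Ourania slide can be sketched as follows. This is a minimal illustration, not the Origins implementation: the head counts (10, 24, 54 and 15, 9) are back-calculated from the percentages on the slide, the function name `triage` is hypothetical, and the decision rule (take the most common group among holders whose family name is already classified) is an assumption, since the slide shows only the breakdown.

```python
from collections import Counter

# Family-name groups of the 88 UK holders of the personal name "Ourania",
# rebuilt from the slide's percentages (10 + 24 + 54 = 88).
holders = ["British"] * 10 + ["Greek Orthodox"] * 24 + ["unclassified"] * 54

# Second level: the 24 Greek Orthodox holders (15 + 9 = 24).
greek_orthodox = ["Greek"] * 15 + ["Greek Cypriot"] * 9


def triage(groups, unclassified="unclassified"):
    """Return percentage shares per group and a suggested group."""
    counts = Counter(groups)
    total = sum(counts.values())
    shares = {g: round(100 * n / total, 1) for g, n in counts.items()}
    # Assumed rule (not stated on the slide): suggest the most common
    # group among holders whose family name is already classified.
    classified = {g: n for g, n in counts.items() if g != unclassified}
    suggestion = max(classified, key=classified.get) if classified else None
    return shares, suggestion


shares, suggestion = triage(holders)
sub_shares, sub_suggestion = triage(greek_orthodox)

print(shares, suggestion)
print(sub_shares, sub_suggestion)
```

Under that assumed rule the personal name would be routed to Greek Orthodox at the first level and to Greek at the second; a real system would presumably also weigh the 61.4% of holders whose family names are still unclassified.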

  21. Text mining to find Nigerians

  22. Italy : names occurring only in South Tyrol

  23. Netherlands : names exclusive to multi-cultural geodemographic clusters

  24. General Strategy • Access universal file • Personal name • Family name • Geodemographic cluster • Regional division • Rank names by frequency • Names of known origin (eg Peter, John, Smith, Rees) • Names of unknown or unclear origins • Unknown or unclear names • Triage • Text mining • Geodemographic profiling • Regional analysis • Extend to other countries • Compare frequencies between countries
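
The "rank names by frequency" step, and the split into names of known origin and names sent on to triage, might look like the sketch below. Everything except the four example names from the slide is hypothetical: the toy register, the placeholder name "Exampleton", and the function `rank_and_split` are illustrative only and do not describe the actual Origins reference files.

```python
from collections import Counter

# Hypothetical toy register with one family name per person (not real data).
register = ["Smith", "Rees", "Smith", "Ceccomori", "Smith", "Rees", "Exampleton"]

# Names whose origin is treated as settled; the examples are from the slide.
known_origin = {"Peter", "John", "Smith", "Rees"}


def rank_and_split(names, known):
    """Rank names by frequency, then separate known from unclear names."""
    ranked = Counter(names).most_common()  # most frequent first
    known_names = [(name, n) for name, n in ranked if name in known]
    unclear = [(name, n) for name, n in ranked if name not in known]
    return known_names, unclear


known_names, unclear = rank_and_split(register, known_origin)

print(known_names)  # names already classified, most frequent first
print(unclear)      # candidates for triage, text mining or profiling
```

Working through the list in frequency order means the most common names are settled first, so each classified name covers as many records as possible before the long tail of rare names is examined.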

  25. Personal names : Spain, Italy, UK

  26. Section of current family name reference file • Current family name total 562,558

  27. Arbitration and the use of confidence scores

  28. Validation : Comparing census with name based classifications

  29. Validation : Gender

  30. Research Questions • The size of different minority groups • Their regional dispersion • Their degree of residential integration • Their success in improving their social position • The growth or decline in their numbers

  31. ‘Bird’ names

  32. River endings ‘-bourne’ ‘-burn’

  33. Patronymic endings ‘-son’ ‘-s’

  34. Electors with ‘Welsh’ surnames

  35. Destinations of Irish migrants

  36. Destinations of Cornish migrants

  37. Destinations of ‘Cornish’ migrants in Australia

  38. Destinations of Cornish migrants in the US

  39. US : Destinations of migrants from Cornwall (above) and Devon (below)

  40. Turkish migrants in Greater London

  41. Ethiopian migrants in Greater London

  42. West London postcodes where Hindus and Sikhs are the majority community

  43. A contemporary ethnic map of London

  44. Evaluating economic and social integration

  45. Residential segregation : selected Local Authorities

  46. Neighbourhood segregation in Blackburn

  47. Neighbourhood segregation in Leicester

  48. Residential segregation : selected communities

  49. Using Intelligent Systems to infer ethnicity from names: Implications for ethnic analysis of research surveys and administrative datasets. Richard Webber, Visiting Professor, Department of Geography, UCL / OriginsInfo. Oxford, 18 July 2006
