Ontologically-based Searching for Jobs in Linguistics

Deryle Lonsdale

[email protected]

Funded by:


The BYU Data Extraction Group

  • Group of faculty (5) and students (15) from CS, Linguistics, SOAIS

  • Goal: ontology-based data extraction

  • NSF funding: CISE/IIS/IDM TIDIE

  • Website: www.deg.byu.edu/

    • Papers, presentations

    • Tools

    • Demos



Overview

  • Ontology-based extraction

  • Building knowledge sources

  • Jobs in linguistics (Sproat)

  • Putting it all together

  • Some sample results


Ontologies and IE

[Figure with labels: Source, Target]


Document-based IE


Conceptual modeling (OSM)

[Figure: OSM diagram of the car-ads model. Object sets: Car, Year, Make, Model, Mileage, Price, PhoneNr, Feature, Extension. Relationship sets: "has", "is for". Participation constraints shown: 0..1, 0..*, 1..*.]


Recognition and Extraction

Car   Year  Make    Model      Mileage  Price  PhoneNr
0001  1989  Subaru  SW                  $1900  (336)835-8597
0002  1998          Elantra                    (336)526-5444
0003  1994  HONDA   ACCORD EX  100K            (336)526-1081

Car   Feature
0001  Auto
0001  AC
0002  Black
0002  4 door
0002  tinted windows
0002  Auto
0002  pb
0002  ps
0002  cruise
0002  am/fm
0002  cassette stereo
0002  a/c
0003  Auto
0003  jade green
0003  gold


Car-Ads Ontology (textual)

Car [->object];
Car [0..1] has Year [1..*];
Car [0..1] has Make [1..*];
Car [0..1] has Model [1..*];
Car [0..1] has Mileage [1..*];
Car [0..*] has Feature [1..*];
Car [0..1] has Price [1..*];
PhoneNr [1..*] is for Car [0..*];
PhoneNr [0..1] has Extension [1..*];
Year matches [4]
constant {extract "\d{2}";
    context "([^\$\d]|^)[4-9]\d[^\d]";
    substitute "^" -> "19"; },
End;
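
The relationship declarations above share a regular shape (object set, participation constraint, relationship name, object set, participation constraint), so they can be read into a simple structure. The following is a minimal Python sketch, not the group's own parser; the `Relationship` type, the `parse_relationship` function, and the regular expression are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Relationship(NamedTuple):
    """One declaration such as: Car [0..1] has Year [1..*];  (hypothetical type)."""
    source: str
    source_constraint: str
    name: str
    target: str
    target_constraint: str


# Assumed shape: <ObjectSet> [<constraint>] <relationship name> <ObjectSet> [<constraint>];
DECLARATION = re.compile(
    r"^(\w+)\s*\[([^\]]+)\]\s+(.+?)\s+(\w+)\s*\[([^\]]+)\];$"
)


def parse_relationship(line: str) -> Relationship:
    """Split one relationship declaration into its five parts."""
    match = DECLARATION.match(line.strip())
    if match is None:
        raise ValueError(f"not a relationship declaration: {line!r}")
    return Relationship(*match.groups())


if __name__ == "__main__":
    for text in ("Car [0..1] has Year [1..*];", "PhoneNr [1..*] is for Car [0..*];"):
        print(parse_relationship(text))
```

Lines that are not binary relationships, such as `Car [->object];`, are rejected with a `ValueError` in this sketch rather than interpreted.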


The data-frame library

  • Low-level patterns implemented as regular expressions

  • Match items such as email addresses, phone numbers, names, etc.

    Mileage matches [8]
    constant { extract "\b[1-9]\d{0,2}k"; substitute "[kK]" -> "000"; },
    { extract "[1-9]\d{0,2}?,\d{3}";
      context "[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]"; substitute "," -> "";},
    { extract "[1-9]\d{0,2}?,\d{3}";
      context "(mileage\:\s*)[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]"; substitute "," -> "";},
    { extract "[1-9]\d{3,6}";
      context "[^\$\d][1-9]\d{3,6}\s*mi(\.|\b|les\b)";},
    { extract "[1-9]\d{3,6}";
      context "(mileage\:\s*)[^\$\d][1-9]\d{3,6}\b";};
    keyword "\bmiles\b", "\bmi\.", "\bmi\b", "\bmileage\b";
    end;
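
One way to read an extract/context/substitute rule is: find each stretch of text that matches the context, pull the extract pattern out of that stretch, then normalize the value with the substitution. The Python sketch below follows that reading. It is an assumption about the semantics, not the group's implementation: the `Rule` class and `apply_rule` function are hypothetical names, matching is assumed to be case-insensitive, and the `keyword` clauses are not modeled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    """One extract/context/substitute triple (hypothetical representation)."""
    extract: str
    context: Optional[str] = None
    substitute: Optional[Tuple[str, str]] = None  # (pattern, replacement)


def apply_rule(rule: Rule, text: str) -> List[str]:
    """Return the normalized values that the rule finds in the text."""
    if rule.context is None:
        regions = [text]  # no context given: search the whole text
    else:
        regions = [m.group(0) for m in re.finditer(rule.context, text, re.IGNORECASE)]
    values = []
    for region in regions:
        for match in re.finditer(rule.extract, region, re.IGNORECASE):
            value = match.group(0)
            if rule.substitute is not None:
                pattern, replacement = rule.substitute
                value = re.sub(pattern, replacement, value)
            values.append(value)
    return values


# Rules transcribed from the Year and Mileage data frames.
YEAR = Rule(r"\d{2}", r"([^\$\d]|^)[4-9]\d[^\d]", ("^", "19"))
MILEAGE_K = Rule(r"\b[1-9]\d{0,2}k", None, ("[kK]", "000"))
MILEAGE_COMMA = Rule(
    r"[1-9]\d{0,2}?,\d{3}", r"[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]", (",", "")
)

if __name__ == "__main__":
    print(apply_rule(YEAR, "'89 Subaru SW"))
    print(apply_rule(MILEAGE_K, "ACCORD EX, 100K, gold"))
    print(apply_rule(MILEAGE_COMMA, "only 68,000 miles, $4,500 obo"))
```

In the last call the context pattern is what keeps the price out of the result: a value preceded by `$` never matches `[^\$\d]`, so only the mileage figure is extracted.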


Lexicons

  • Repositories of enumerable classes of lexical information

  • FirstNames, LastNames, USstates, ProvoOremApts, CarMakes, Drugs, CampGroundFeats, etc.
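
A lexicon of this kind can be matched against text by compiling its entries into one alternation. The Python sketch below is an illustration only: `compile_lexicon` and `find_entries` are hypothetical helpers, and the sample entries are invented stand-ins for the contents of a lexicon such as CarMakes.

```python
import re
from typing import Iterable, List, Pattern


def compile_lexicon(entries: Iterable[str]) -> Pattern[str]:
    """Build one case-insensitive pattern from the lexicon entries.

    Longer entries are listed first so that a multiword entry such as
    "speech recognition" is preferred over its prefix "speech".
    """
    ordered = sorted(set(entries), key=lambda entry: (-len(entry), entry))
    alternation = "|".join(re.escape(entry) for entry in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def find_entries(pattern: Pattern[str], text: str) -> List[str]:
    """Return every lexicon hit in the text, in order of appearance."""
    return [match.group(0) for match in pattern.finditer(text)]


if __name__ == "__main__":
    car_makes = compile_lexicon(["Honda", "Subaru", "Hyundai"])  # invented sample entries
    print(find_entries(car_makes, "1994 HONDA ACCORD EX 100K"))
```

The word-boundary anchors assume that entries begin and end with word characters; entries with leading or trailing punctuation would need different anchoring.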


Accessing the output

  • Extracted information is stored in a relational database

  • Results can be queried using SQL

  • Wide range of views is possible
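
As an illustration of querying extracted records with SQL, the Python sketch below loads the sample car-ad rows shown earlier into an in-memory SQLite database. The schema is an assumption: the table names `Car` and `CarFeature`, the `CarId` column, and the helper functions are hypothetical, and only some of the extracted features are loaded.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Load a few extracted car-ad rows into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Car (
            CarId TEXT PRIMARY KEY, Year INTEGER, Make TEXT, Model TEXT,
            Mileage TEXT, Price TEXT, PhoneNr TEXT
        );
        CREATE TABLE CarFeature (CarId TEXT, Feature TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO Car VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("0001", 1989, "Subaru", "SW", None, "$1900", "(336)835-8597"),
            ("0002", 1998, None, "Elantra", None, None, "(336)526-5444"),
            ("0003", 1994, "HONDA", "ACCORD EX", "100K", None, "(336)526-1081"),
        ],
    )
    conn.executemany(
        "INSERT INTO CarFeature VALUES (?, ?)",
        [("0001", "Auto"), ("0001", "AC"), ("0002", "Auto"),
         ("0002", "cruise"), ("0003", "Auto"), ("0003", "gold")],
    )
    return conn


def cars_with_feature_before(conn: sqlite3.Connection, feature: str, year: int):
    """Cars that have the given feature and a model year earlier than `year`."""
    query = """
        SELECT c.CarId, c.Year, c.Model
        FROM Car AS c JOIN CarFeature AS f ON f.CarId = c.CarId
        WHERE f.Feature = ? AND c.Year < ?
        ORDER BY c.CarId
    """
    return conn.execute(query, (feature, year)).fetchall()


if __name__ == "__main__":
    print(cars_with_feature_before(build_demo_db(), "Auto", 1995))
```

Fields that were not found in an ad are stored as NULL here, so a missing value stays distinguishable from an empty string.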


Finding jobs in linguistics

  • Linguistlist.org, LSA

  • Email distribution lists (Corpora, Langage Naturel, CAAL/ACLA, etc.)

  • Usual commercial sites (monster.com, flipdog.com, dice.com)

  • Word-of-mouth sources


Sproat’s analysis

  • Random sample (224/2250) of LinguistList postings, 1994-2001

  • Development vs. research, academic vs. industrial

  • Linguists are most often (approx. 80% of the time) offered development jobs

  • Linguists are hired more for specific tasks (e.g. grammar, lexicon development) than for general research-oriented tasks (e.g. creating new technological approaches)


The banner years

Year        Academia  Industry  % Industry
1994        27        2         7%
1995        45        5         10%
1996        52        3         5%
1997        48        3         6%
1998        57        3         5%
1999        56        14        20%
2000        55        43        39%
2001 (mid)  22        10        31%

  • Dramatic rise in 1999, 2000

  • Steep drop-off since 2001

  • Rising demand for technical, computational skills
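
The "% Industry" column can be recomputed from the two count columns, assuming it means industry postings as a share of academic plus industry postings. That definition is an assumption, as are the names `COUNTS` and `industry_share` in this short Python sketch.

```python
# (year, academia, industry) as listed in the table above
COUNTS = [
    ("1994", 27, 2),
    ("1995", 45, 5),
    ("1996", 52, 3),
    ("1997", 48, 3),
    ("1998", 57, 3),
    ("1999", 56, 14),
    ("2000", 55, 43),
    ("2001 (mid)", 22, 10),
]


def industry_share(academia: int, industry: int) -> float:
    """Industry postings as a percentage of all postings for the year."""
    return 100.0 * industry / (academia + industry)


if __name__ == "__main__":
    for year, academia, industry in COUNTS:
        print(f"{year:<11}{industry_share(academia, industry):5.1f}%")
```

Under this definition, seven of the eight rows round to the percentages shown. The 2000 row works out to about 44% rather than 39%, so one of the figures in that row may have been mistranscribed.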


Linguistic jobs ontology

  • Why?

    • user-specifiable constraints

  • Somewhat closely follows existing ontologies (e.g. jobs, software)


Data frames and lexicons

  • Language names

    • Ethnologue

  • (sub)fields of linguistics

    • Linguistlist.org

  • Tools, toolkits

  • Software components, programming languages

  • Linguistics-related job titles

  • Activities

  • Responsibilities

  • Country names


The corpus

  • 3727 postings (LinguistList, Corpora, LN, WoM):

    1998 541

    1999 575

    2000 871

    2001 952

    2002 788

  • Some noise (non-English, factored, program descriptions, attachments, etc.)

  • Semi-automatic edits (boilerplate, publicity blurbs about institutions, etc.)


Sample output

  • Here


Observations

  • 270 postings don’t contain linguist* (!)

  • Demand for knowledge of English equals that for all other languages combined (G, F, S, J, C)

  • Computer/computational background required for almost 1/3 (1116)

  • Noticeable amount of headhunting, particularly in Seattle, DC areas
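
A count like the one for linguist* can be obtained with a wildcard-style pattern over the posting texts. The Python sketch below reads linguist* as "any word beginning with linguist, in any letter case", which is an assumption; the function name and the sample postings are invented for illustration.

```python
import re
from typing import Iterable, List

# Assumed reading of the wildcard: a word that starts with "linguist".
LINGUIST = re.compile(r"\blinguist\w*", re.IGNORECASE)


def postings_without_linguist(postings: Iterable[str]) -> List[str]:
    """Return the postings in which no word starts with 'linguist'."""
    return [text for text in postings if LINGUIST.search(text) is None]


if __name__ == "__main__":
    sample = [  # invented postings, not taken from the corpus
        "Computational Linguist wanted for lexicon development",
        "Speech recognition engineer, C++ required",
        "Tenure-track position in applied linguistics",
    ]
    print(len(postings_without_linguist(sample)))
```

Because of the leading word boundary, compounds such as "psycholinguistics" are not counted as matches under this reading.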


Programming languages


Popular subfields


Subfields (another perspective)


An engineering discipline?

  • 160 linguistics jobs ending in “engineer”

  • Software development cycle

    • research e., software design e.

    • development e., software e.

    • software quality e., linguistic test e., linguistic quality e.

    • linguistic support e., user experience e.

    • presales e., technical sales e.

  • Specific subfields

    • web site e.

    • speech e., voice recognition e., speech recognition application e., ASR tuning e., audio e.

    • dialog e.

    • tools e.

    • AI e., NLP e.

    • knowledge e.

    • linguist e., natural language e.

    • staff e.

    • human factors e., user interface e.


Paradigms


Other observations

  • Often a job title is not even listed (!)

  • More i18n (internationalization) of data frames (e.g. email, phone numbers)

  • Great need for (preferably hierarchical) lexical repositories related to linguistics

    • job titles

    • theoretical frameworks, subfields

    • typical linguist job activities

    • linguistic research/development venues

