Meeting Information Extraction from Meeting Announcement in Korean. Kyoungryol Kim. Table of Contents. Introduction Motivation Goal Problem Definition The Proposed Method Problem Modeling / Checklist Overall Architecture Normalization Process. Introduction. Motivation.


Meeting Information Extraction from Meeting Announcement in Korean

Kyoungryol Kim

Table of Contents
  • Introduction
    • Motivation
    • Goal
    • Problem Definition
  • The Proposed Method
    • Problem Modeling / Checklist
    • Overall Architecture
      • Normalization Process
Motivation
  • Every day we receive many meeting announcements
    • Conference, seminar, workshop, meeting, appointment…
    • Meeting announcements account for about 17% (30,201 out of 183,022) of the emails in the Enron Email Dataset*.
  • Smartphone era
    • Many people manage their schedules with an online calendar on a smartphone, e.g. Google Calendar
    • But typing on a touch-screen keyboard is difficult and error-prone.

* Enron Email Dataset, August 21, 2009 version, http://www.cs.cmu.edu/~enron/

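Goal
  • Extract schedule information from a meeting announcement and update the calendar with it automatically.

Meeting Announcement (translated from Korean)

As the hot weather sets in, we are holding the regular team leaders' meeting to review 유니브캐스트's first half of the year and to plan operations for the second half.

Date: July 19 (Sat), 2 p.m.

Place: 민들레영토

Directions to 민들레영토

As shown on the map, come out of Exit 8 of 명동역 and walk straight along the row of shops; it is on the first floor of the YMCA building.

Meeting Announcement → Extract → Update (calendar)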

Problem Definition

Finding the meeting location is divided into two parts:

  • Finding locations in the text for each predefined type of complexity.
  • Named entity disambiguation on the locations found.

Example announcement (translated from Korean):

As the hot weather sets in, we are holding the regular team leaders' meeting to review 유니브캐스트's first half of the year and to plan operations for the second half.

Date: July 19 (Sat), 2 p.m.

Place: 민들레영토

Basic agenda

- Explanation of the delay in paying production support funds
- Fund adjustment operating plan
- Forming the autumn workshop preparation committee
- Other (team leaders, please propose any other items you would like put on the agenda)

Directions to 민들레영토

As shown on the map, come out of Exit 8 of 명동역 and walk straight along the row of shops; it is on the first floor of the YMCA building.

For your reference.

The two steps applied to this announcement:

1. Finding target locations
2. Disambiguation

Problem Modeling

From Meeting Announcement Text to Location on the Map:

  • Extract location strings
    • 1. How to extract location strings?
  • Extract address information and limit the boundary
    • 2. How to extract address information?
  • Search the location from the DB
    • 3. What kind of DB can we use?
    • 4. How to manipulate the query?
  • Search the location from external resources
    • 5. What kind of external resources can we use?
  • Disambiguation among found locations
    • 6. What are the measures to find the desired location?
  • Location on the Map
    • 7. How to represent the location?

Problem List (1/2)
  • How to extract location strings from the given text?
  • How to extract address information from location strings?
  • To search the location, what kind of database can we use?
  • To search the location, how to manipulate the query?
  • To search the location, what kind of external resources can we use?
  • What are the measures to find desired locations among candidates?
  • How to represent the location?
Problem List (2/2) - Reorganized
  • How to extract location strings from the given text?
  • How to extract address information from location strings?
    • How to check whether address information is included or not?
    • How to construct a database that can limit the boundary of an address?
    • What if the boundary refers to several regions?
  • To search the location, what resources can we use?
    • Internal database: How to construct the internal database?
    • External resources: What external resources are available?
  • To search the location, how to manipulate the query?
  • What are the measures to find desired locations among candidates?
  • How to represent the location?
    • To store the location in the DB
    • To represent the location on the map
Problem Checklist : (6/6)

How to represent the location?

  • To store the location in the DB
    • Uses the OpenStreetMap representation
    • Node / Way / Relation
  • To represent the location on the map
    • WGS84 (standard): (latitude, longitude [, altitude])
Representation of Meeting Location
  • Follows the basic representation of the Node in OpenStreetMap to represent a location.
    • Regard the meeting location as a Point-of-Interest
    • Variable attributes (key-value pairs): http://wiki.openstreetmap.org/wiki/Map_Features
      • used_as_meeting_location=true
      • search_query=user's query (comma separated)
    • Meeting locations can be imported to an OSM server (interoperability)

<node id="850918486" lat="37.4936384" lon="127.0137745" user="cyana" uid="74529" visible="true"
      version="3" changeset="5478335" timestamp="2010-08-13T02:26:19Z">
  <tag k="name" v="교대(Gyodae)"/>
  <tag k="name:en" v="Gyodae"/>
  <tag k="name:ko_rm" v="Gyodae"/>
  <tag k="railway" v="station"/>
</node>

<node id="368738707" lat="37.4990100" lon="127.0275800"
      user="cyana" uid="74529" visible="true"
      version="2" changeset="4370541" timestamp="2010-04-09T08:09:50Z">
  <tag k="amenity" v="dentist"/>
  <tag k="name" v="미소드림치과 (Misodeurim Dental Clinic)"/>
  <tag k="name:en" v="Misodeurim Dental Clinic"/>
  <tag k="name:ko" v="미소드림치과"/>
  <tag k="name:ko_rm" v="Misodeurimchigwa"/>
  <tag k="ncat" v="치과"/>
</node>
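To make the node representation concrete, here is a minimal Python sketch that reads one of the node elements above and attaches the two extra key-value tags from the slide. The helper names (`parse_node`, `mark_as_meeting_location`) and the sample user query are assumptions made for illustration; the slides do not say how the author's system reads or writes nodes.

```python
import xml.etree.ElementTree as ET

# Sample node copied from the slide.
NODE_XML = """
<node id="850918486" lat="37.4936384" lon="127.0137745" user="cyana" uid="74529" visible="true"
      version="3" changeset="5478335" timestamp="2010-08-13T02:26:19Z">
  <tag k="name" v="교대(Gyodae)"/>
  <tag k="name:en" v="Gyodae"/>
  <tag k="name:ko_rm" v="Gyodae"/>
  <tag k="railway" v="station"/>
</node>
"""


def parse_node(xml_text):
    """Return the id, WGS84 coordinates and tags of a single OSM <node>."""
    node = ET.fromstring(xml_text.strip())
    tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
    return {
        "id": int(node.get("id")),
        "lat": float(node.get("lat")),
        "lon": float(node.get("lon")),
        "tags": tags,
    }


def mark_as_meeting_location(node, user_query):
    """Add the two key-value attributes listed on the slide (hypothetical helper)."""
    node["tags"]["used_as_meeting_location"] = "true"
    node["tags"]["search_query"] = user_query
    return node


# "Gyodae station" is a made-up user query for the example.
poi = mark_as_meeting_location(parse_node(NODE_XML), "Gyodae station")
print(poi["lat"], poi["lon"], poi["tags"]["name:en"])
```

Because the extra attributes are ordinary OSM tags, a node marked this way keeps the same shape as any other node, which is what makes the import to an OSM server plausible.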

Database tables (diagram): node, changeset, node_tag, changeset_tag, bounds

Example : bounds
  • Bounds information is constructed using the Google Maps API
    • The closed world is the South Korea area (it could possibly be expanded)
Overall Architecture

  • Testing System
    • Input Document
    • Finding Target Locations
      • Location NER
      • Relation-type Classification
    • Normalization
    • Disambiguation
    • OUTPUT
  • Training System
    • Document Annotation
    • Adding Document to Corpus
    • Training Corpus
    • Corpus Expansion
    • Train Models
    • Expand Gazetteer
  • Resources
    • Personal Information
    • Gazetteer
    • Trained Models (CRFs, SVMs)
    • OpenAPI Map Services

Normalization Process

Normalization

Input Query: 프란치스코교육회관2층

Pre-Processing:
  • Remove HTML tags / URLs / ㈜
  • Replace (), [], {} with a space

    {
      "query": {
        "full": "프란치스코교육회관2층"
      }
    }
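The pre-processing rules can be sketched with a few regular expressions. This is only an illustration: the exact patterns, the final whitespace collapse, and the function names `preprocess` and `to_query` are assumptions, not the author's implementation.

```python
import re


def preprocess(text):
    """Clean a raw location string before normalization."""
    text = re.sub(r"<[^>]+>", " ", text)        # remove HTML tags
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = text.replace("㈜", " ")              # remove the ㈜ (corporation) symbol
    text = re.sub(r"[()\[\]{}]", " ", text)     # replace (), [], {} with a space
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace (assumed extra step)


def to_query(text):
    """Wrap the cleaned string in the query structure shown on the slide."""
    return {"query": {"full": preprocess(text)}}


print(to_query("<b>프란치스코교육회관2층</b> (http://example.com)"))
```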

Split the Query into 2 parts: Main Part / Extra Part
  • Main: chunks that include the main location information.
  • Extra: chunks that include floor/room information.

    {
      "query": {
        "full": "프란치스코교육회관2층",
        "main": "프란치스코교육회관",
        "extra": "2층"
      }
    }
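One way to sketch the main/extra split is a single trailing-pattern match. The pattern below assumes that floor/room information appears at the end of the query as digits followed by 층 (floor) or 호 (room); the slides do not give the actual rule, so treat `EXTRA_RE` and `split_query` as hypothetical.

```python
import re

# Assumed pattern: one or more trailing "<digits>층" (floor) or "<digits>호" (room) chunks.
EXTRA_RE = re.compile(r"((?:\s*\d+\s*(?:층|호))+)\s*$")


def split_query(full):
    """Split a location query into its main part and floor/room extra part."""
    match = EXTRA_RE.search(full)
    if match:
        main = full[:match.start()].strip()
        extra = match.group(1).strip()
    else:
        main, extra = full.strip(), ""
    return {"query": {"full": full, "main": main, "extra": extra}}


for q in ["프란치스코교육회관2층",
          "서울시 강남구 삼성동 159-1 무역회관 2001호",
          "소공동 코리아나 호텔"]:
    print(split_query(q)["query"])
```

Requiring digits before 층 or 호 keeps a name such as 호텔 from being mistaken for room information.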

Normalization: Extract Address Information

1. If the query doesn't have address information: search the databases and APIs directly, without boundary limitation.

has Address info?

1) Chunk the main query on spaces.

2) Iterate over the chunks and check whether

- the chunk ends with “-시”, “-시/-구/-군”, “-동/-가/-면/-읍”, or “-리”,

- the chunk starts with a value from the DB's 시/구/동/리 columns,

and store the matched column and value.

3) If address information is included, check whether a house number (번지) follows it:

[0-9]+, [0-9]+\-[0-9]+, [0-9]+번지, [0-9]+\-[0-9]+번지

- If a house number is included, geocode right away.

- If there is no house number, get the bounds for that region from the DB.
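The three steps above can be sketched as follows. The tiny `ADDRESS_DB` dictionary is a hypothetical stand-in for the DB's 시/구/동/리 columns, requiring both the suffix check and the DB prefix check is an assumption (the slide lists the two checks without saying how they combine), and the returned `action` labels are invented for the example.

```python
import re

# Suffixes per address level, in the order listed on the slide.
LEVEL_SUFFIXES = [
    ("si", ("시",)),
    ("gu", ("시", "구", "군")),
    ("dong", ("동", "가", "면", "읍")),
    ("ri", ("리",)),
]

# Hypothetical stand-in for the DB's 시/구/동/리 columns.
ADDRESS_DB = {
    "si": {"서울"},
    "gu": {"강남", "중"},
    "dong": {"삼성", "소공"},
    "ri": set(),
}

# The four house-number patterns from the slide, merged into one expression.
HOUSE_NO_RE = re.compile(r"^[0-9]+(-[0-9]+)?(번지)?$")


def extract_address_info(main_query):
    """Find address chunks and an optional house number in the main query."""
    found = {}        # level -> matched chunk
    house_no = None
    for chunk in main_query.split():                      # step 1: chunk on spaces
        if found and house_no is None and HOUSE_NO_RE.match(chunk):
            house_no = chunk                              # step 3: house number after an address
            continue
        for level, suffixes in LEVEL_SUFFIXES:            # step 2: suffix + DB check
            if level in found:
                continue
            in_db = any(chunk.startswith(v) for v in ADDRESS_DB[level])
            if chunk.endswith(suffixes) and in_db:
                found[level] = chunk
                break
    if not found:
        action = "search_without_bounds"
    elif house_no:
        action = "geocode"
    else:
        action = "lookup_bounds"
    return {"address": found, "house_no": house_no, "action": action}


print(extract_address_info("서울시 강남구 삼성동 159-1 무역회관"))
print(extract_address_info("소공동 코리아나 호텔"))
```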

Decision flow:
  • Has address info? No: search without boundary limitation.
  • Has address info? Yes: does it include a house no.?
    • Yes: Geocoding by Query.
    • No: Get Bounds info from Address (SW, NE) from the Bounds DB.

Example without address information:

    {
      "query": {
        "full": "프란치스코교육회관2층",
        "main": "프란치스코교육회관",
        "extra": "2층"
      }
    }

Example with a house number (geocoded directly):

    {
      "query": {
        "full": "서울시 강남구 삼성동 159-1 무역회관 2001호",
        "main": "서울시 강남구 삼성동 159-1 무역회관",
        "extra": "2001호"
      },
      "found_locations": [
        {
          "title": "대한민국 서울특별시 강남구 삼성동 159-1",
          "administrative_address": "대한민국 서울특별시 강남구 삼성동 159-1",
          "geometry_location": {
            "lat": 37.5103598,
            "lng": 127.0611803
          }
        }
      ]
    }

Example without a house number (bounds limited):

    {
      "query": {
        "full": "소공동 코리아나 호텔",
        "main": "소공동 코리아나 호텔",
        "extra": ""
      },
      "limited_bound": {
        "name": "대한민국 서울특별시 중구 소공동",
        "southwest": { "lat": 37.4346000, "lng": 126.7968000 },
        "northeast": { "lat": 37.6956000, "lng": 127.1823000 }
      }
    }

Normalization: Extract Address Information

2. If the query has address information (with a house number): geocode the address information and return. (Disambiguation finished)


Normalization: Extract Address Information

3. If the query has address information (no house number): get the bound information and search for the location within the bound.
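Searching within the bound can be illustrated with a simple bounding-box test on WGS84 coordinates. The `limited_bound` values are the ones shown in the slides' sample output; the two candidates and the function name `in_bounds` are made up for the example, and the comparison assumes a box that does not cross the 180° meridian, which holds for South Korea.

```python
def in_bounds(lat, lng, bound):
    """Return True if (lat, lng) lies inside the southwest/northeast box."""
    sw, ne = bound["southwest"], bound["northeast"]
    return sw["lat"] <= lat <= ne["lat"] and sw["lng"] <= lng <= ne["lng"]


# Bounds copied from the slides' sample output.
limited_bound = {
    "name": "대한민국 서울특별시 중구 소공동",
    "southwest": {"lat": 37.4346000, "lng": 126.7968000},
    "northeast": {"lat": 37.6956000, "lng": 127.1823000},
}

# Hypothetical candidates returned by an unrestricted search.
candidates = [
    {"title": "candidate A (hypothetical)", "lat": 37.5650, "lng": 126.9800},
    {"title": "candidate B (hypothetical)", "lat": 35.1796, "lng": 129.0756},
]

inside = [c for c in candidates if in_bounds(c["lat"], c["lng"], limited_bound)]
print([c["title"] for c in inside])
```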


Normalization: Local Search

Input:

    {
      "query": {
        "full": "소공동 코리아나 호텔",
        "main": "소공동 코리아나 호텔",
        "extra": ""
      },
      "limited_bound": {
        "name": "대한민국 서울특별시 중구 소공동",
        "southwest": { "lat": 37.4346000, "lng": 126.7968000 },
        "northeast": { "lat": 37.6956000, "lng": 127.1823000 }
      }
    }

  • Find Candidate Locations / Geocoding, using:
    • User Meeting Location DB (Priority 1)
    • SWRC Meeting Location DB (Priority 2)
    • Open API (OpenStreetMap, Naver) (Priority 3)
    • WMS
  • Coordinate Conversion: KTM -> WGS84
  • Remove Duplicated Addresses

Output:

    {
      "query": {
        "full": "소공동 코리아나 호텔",
        "main": "소공동 코리아나 호텔",
        "extra": ""
      },
      "limited_bound": {
        "name": "대한민국 서울특별시 중구 소공동",
        "southwest": { "lat": 37.4346000, "lng": 126.7968000 },
        "northeast": { "lat": 37.6956000, "lng": 127.1823000 }
      },
      "found_locations": [
        {
          "query": "밀레니엄 힐튼 서울",
          "title": "밀레니엄 힐튼 서울",
          "administrative_address": "대한민국 서울특별시 중구 태평로1가 61-1",
          "geometry_location": {
            "lat": 37.5103598,
            "lng": 127.0611803
          }
        },
        { ..... }
      ]
    }
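The local search step can be sketched as a priority-ordered lookup followed by address de-duplication. Everything here is illustrative: the three sources are in-memory stubs rather than real database or Open API calls, querying every source (rather than stopping at the first hit) is an assumption, and `find_candidates` and `remove_duplicate_addresses` are hypothetical names.

```python
def find_candidates(query, sources):
    """Query every source; a lower priority number is searched first."""
    found = []
    for priority, name, search in sorted(sources, key=lambda s: s[0]):
        for loc in search(query):
            found.append(dict(loc, source=name, priority=priority))
    return found


def remove_duplicate_addresses(locations):
    """Keep the first (highest-priority) entry for each administrative address."""
    seen, unique = set(), []
    for loc in locations:
        key = loc["administrative_address"]
        if key not in seen:
            seen.add(key)
            unique.append(loc)
    return unique


# Sample rows reused from the disambiguation slide.
HILTON = {"title": "서울힐튼호텔",
          "administrative_address": "대한민국 서울특별시 중구 남대문로5가 395"}
MOTEL = {"title": "밀레니엄모텔",
         "administrative_address": "대한민국 광주광역시 북구 오룡동 1114-1"}

# Stub sources (hypothetical); each takes a query and returns a list of locations.
sources = [
    (3, "open_api", lambda q: [HILTON, MOTEL]),
    (1, "user_db", lambda q: []),
    (2, "swrc_db", lambda q: [HILTON]),
]

candidates = remove_duplicate_addresses(find_candidates("밀레니엄", sources))
for c in candidates:
    print(c["priority"], c["source"], c["title"])
```

Sorting before collecting means that when two sources return the same address, the entry from the higher-priority source is the one that survives de-duplication.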


Disambiguation

Original Query: 밀레니엄 힐튼 서울

Title | Query | Address
동강밀레니엄래프팅 | 밀레니엄 | 대한민국 강원도 영월군 영월읍 거운리 547-1
밀레니엄피시방서현점 | 밀레니엄 | 대한민국 경기도 성남시 분당구 서현동 307
밀레니엄모텔 | 밀레니엄 | 대한민국 광주광역시 북구 오룡동 1114-1
서울힐튼호텔 | 밀레니엄 힐튼 서울 | 대한민국 서울특별시 중구 남대문로5가 395

  • Disambiguation measures
    • Number of matched characters: query vs. title, query vs. original query, query vs. address
    • (Can also be used) Semantic type / personal annotation DB / distance between location and landmark
    • Personal address book / search history / GPS log
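As a rough sketch of the matched-character measure, the snippet below scores each candidate from the table by summing three character overlaps: query vs. title, query vs. original query, and query vs. address. Counting overlap as a multiset intersection of characters (spaces ignored) and summing the three counts with equal weight are assumptions; the slides name the measure but do not define it.

```python
from collections import Counter


def matched_chars(a, b):
    """Number of characters shared by a and b (multiset overlap, spaces ignored)."""
    ca = Counter(a.replace(" ", ""))
    cb = Counter(b.replace(" ", ""))
    return sum((ca & cb).values())


def score(candidate, original_query):
    """Sum of the three overlaps named on the slide (equal weights assumed)."""
    q = candidate["query"]
    return (matched_chars(q, candidate["title"])
            + matched_chars(q, original_query)
            + matched_chars(q, candidate["address"]))


ORIGINAL_QUERY = "밀레니엄 힐튼 서울"

# Candidate rows from the table on this slide.
CANDIDATES = [
    {"title": "동강밀레니엄래프팅", "query": "밀레니엄",
     "address": "대한민국 강원도 영월군 영월읍 거운리 547-1"},
    {"title": "밀레니엄피시방서현점", "query": "밀레니엄",
     "address": "대한민국 경기도 성남시 분당구 서현동 307"},
    {"title": "밀레니엄모텔", "query": "밀레니엄",
     "address": "대한민국 광주광역시 북구 오룡동 1114-1"},
    {"title": "서울힐튼호텔", "query": "밀레니엄 힐튼 서울",
     "address": "대한민국 서울특별시 중구 남대문로5가 395"},
]

best = max(CANDIDATES, key=lambda c: score(c, ORIGINAL_QUERY))
print(best["title"], score(best, ORIGINAL_QUERY))
```

Under these assumptions the candidate found with the full query scores highest, because its query overlaps more with the original query, its title, and its Seoul address than the shortened 밀레니엄 query does for the other rows.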


서울힐튼호텔: 대한민국 서울특별시 중구 남대문로5가 395 (36.3414225, 127.3914705) (Hotel)
