LT4eL - WP1: Setting the scene
WP leader: UAIC, Univ. Al. I. Cuza of Iasi, Faculty of Computer Science

Dan Cristea, Corina Forăscu, Dan Tufiş, Ionuţ Pistol, Diana Trandabăţ, Adrian Iftene
Contact: dcristea@info.uaic.ro

Utrecht Review Meeting, February 1, 2007



Objectives
  • inventory and classification of existing tools necessary for the development of the relevant functionalities (e.g. keyword extractor, glossary candidate detector);
  • collection and normalization of the learning material related to the use of the computer in education (Humanities, Social Sciences);
  • investigation of IPR issues;
  • adoption of relevant standards for linguistic annotation of learning objects;
  • dissemination of the results through a Web portal
Partners in WP1
  • Utrecht University (UU), The Netherlands
  • University of Hamburg (UHH), Germany
  • University of Lisbon (FFCUL), Portugal
  • Charles University Prague (CUP), Czech Republic
  • Institute for Parallel Processing, Bulgarian Academy of Sciences (IPP-BAS), Bulgaria
  • University of Tübingen (UTU), Germany
  • Institute of Computer Science, Polish Academy of Sciences (ICS-PAS), Poland
  • Zürich University of Applied Sciences Winterthur (ZHW), Switzerland
  • University of Malta (UOM), Malta
[Diagram: LT4eL architecture. Labelled components: LMS; User Profile; Ling. Processor (Lemmatizer, POS, Partial Parser); Ontology; Crosslingual Retrieval; one lexicon per language (EN, GE, RO, PT, PL, CZ, BG, DT, MT); Convertor 1 and Convertor 2; user documents (PDF, DOC, HTML, SCORM, XML); Documents SCORM; Documents HTML; Pseudo-Struct.; Basic XML; Ling. Annot XML; Metadata (Keywords); Glossary; Repository.]

The Portal
  • A working space:
    • Repository for resources, tools, deliverables
    • Exchange information among participants
    • Statistics
  • Hosted by UAIC:
    • January 2007: 1.15 Gb (without realTimeStat, searchForm, upload/updateForm)
  • Address: http://consilr.info.uaic.ro/uploads_lt4el
    • Username: guestLt4eL
    • Passwd: elearning

Demo version on CD

O1. Collection of language resources and tools (1)
  • Inventory and classification of existing tools (http://consilr.info.uaic.ro/uploads_lt4el/tools/all.php?) relevant to:
    • the integration of language technology resources in eLearning (WP2)
    • the integration of semantic knowledge (WP3)
O1. Collection of language resources and tools (2)
  • Inventory and classification of existing language resources
    • corpora and frequency lists: http://consilr.info.uaic.ro/uploads_lt4el/menu/all.php
    • lexica: http://www.let.uu.nl/lt4el/wiki/index.php/Lexica_Joint_Table
O2. Collection of LOs: the portal

Uploads, updates & real-time statistics at http://consilr.info.uaic.ro/uploads_lt4el/

Criteria (→ attributes):

  • Subdomains relevant for beginners in IST & e-learning → Domain
  • Multilingualism → Language
  • Medium-sized documents → Number of words
  • IPR clear → IPR
  • Uniformity in topics → keywords selected initially
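
The criteria-to-attribute mapping above can be pictured as one small metadata record per learning object. The sketch below is illustrative only: the class name, the field names and the word-count bounds used for "medium sized" are assumptions, not the portal's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class LearningObject:
    """Hypothetical metadata record; fields mirror the attributes on the slide."""
    title: str
    domain: str            # e.g. "1.2 e-Learning, e-Marketing"
    language: str          # e.g. "RO", "EN"
    number_of_words: int
    ipr: str               # e.g. "free", "restricted", "for research"
    keywords: list[str] = field(default_factory=list)


# Assumed bounds for "medium sized"; the slides give no figures.
MIN_WORDS, MAX_WORDS = 1_000, 20_000


def is_eligible(lo: LearningObject) -> bool:
    """Check the two criteria that can be tested mechanically."""
    medium_sized = MIN_WORDS <= lo.number_of_words <= MAX_WORDS
    ipr_clear = lo.ipr != ""
    return medium_sized and ipr_clear


sample = LearningObject("Intro to e-mail", "1.1.3 Basic skills", "EN", 4_500, "free", ["e-mail"])
```

Calling `is_eligible(sample)` accepts the sample record; a document with an empty IPR field or a word count outside the assumed bounds would be rejected.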
Collection of LOs: domains

1. Use of computers in education, with sub-domains:
   1.1 Teaching academic skills, with sub-domains:
       1.1.1 Academic skills
       1.1.2 Relevant computer skills for the above tasks (MS Word, Excel, PowerPoint, LaTeX, Web pages, XML)
       1.1.3 Basic skills (use of computer for beginners) (chats, e-mail, Internet)
   1.2 e-Learning, e-Marketing
   1.3 The I*Teach document (Leonardo project, http://i-teach.fmi.uni-sofia.bg/)
   1.4 Impact of use of computers in society
   1.5 Studies about use of computers in schools / high schools
   1.6 Impact of e-Learning on education
2. Calimera documents (parallel corpus developed in the Calimera FP5 project, http://www.calimera.org/)

Collection of LOs: annotation layers
  • Initial documents: doc, pdf, html, txt → Base-XML
  • Linguistic annotation: tokens, POS, lemma, chunks → WP2 XML format (LT4ELAna.dtd)
  • Keywords, definitions and ontology links annotations
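
To make the second layer concrete, here is a minimal sketch of how tokens with POS and lemma information might be serialized as XML. The element and attribute names (`s`, `tok`, `pos`, `lemma`) and the tag values are placeholders chosen for illustration; the real structure is defined by LT4ELAna.dtd and may differ.

```python
import xml.etree.ElementTree as ET


def annotate(sentence):
    """Build a sentence element from (form, pos, lemma) triples.

    Element and attribute names are illustrative, not taken from LT4ELAna.dtd.
    """
    s = ET.Element("s")
    for form, pos, lemma in sentence:
        tok = ET.SubElement(s, "tok", {"pos": pos, "lemma": lemma})
        tok.text = form
    return s


triples = [("Computers", "NNS", "computer"), ("help", "VBP", "help")]
xml_text = ET.tostring(annotate(triples), encoding="unicode")
print(xml_text)
```

Keyword, definition and ontology-link annotations (the third layer) could then be added as further elements or attributes on top of this token layer.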
Level 1 conversions

[Diagram: source formats (doc, pdf, latex, other) are converted to html or plain text and then to Base-XML. Step highlighted on this slide: doc → html.]

Level 1 conversions: doc → html (UTF-8)
  • MS Office: Save As html
  • OpenOffice Writer SXC/ODT: Save As html
Level 1 conversions

[Diagram: source formats (doc, pdf, latex, other) are converted to html or plain text and then to Base-XML. Step highlighted on this slide: pdf → html.]

Level 1 conversions: pdf → html (UTF-8)

1. Adobe on-line conversion tool

2. pdfbox (Windows)

3. pdftohtml (Linux)

4. OpenOffice

5. Adobe Acrobat Professional
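
For batch work, the doc → html and pdf → html steps can be scripted instead of done by hand. The sketch below only builds the command lines. It assumes LibreOffice's `soffice` and Poppler's `pdftohtml` are installed; the dispatch table and function name are hypothetical, and the slides do not say how the project actually invoked these tools.

```python
from pathlib import Path

# Assumed command templates; both tools must be installed separately.
COMMANDS = {
    ".doc": lambda src, out: ["soffice", "--headless", "--convert-to", "html",
                              "--outdir", str(out), str(src)],
    ".pdf": lambda src, out: ["pdftohtml", "-enc", "UTF-8", "-noframes",
                              str(src), str(out / src.stem)],
}


def to_html_command(src: Path, outdir: Path) -> list[str]:
    """Return the command that would convert src to HTML inside outdir."""
    try:
        build = COMMANDS[src.suffix.lower()]
    except KeyError:
        raise ValueError(f"no Level 1 converter registered for {src.suffix!r}")
    return build(src, outdir)
```

The returned list can be passed to `subprocess.run(..., check=True)`. The encoding of the resulting HTML should still be checked against the UTF-8 requirement, particularly for the `soffice` route.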

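Level 1 conversions

[Diagram: source formats (doc, pdf, latex, other) are converted to html or plain text and then to Base-XML. Step highlighted on this slide: the Base-XML convertor (html / plain text → Base-XML).]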

Level 1 conversions: html → Base-XML
  • The UAIC Java converter
    • keeps all the tags possibly useful (fixed)
    • produces a log of all the removed tags/data
  • The CUP html2xml.pl converter
    • tags kept according to a DTD
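
The behaviour described for the UAIC converter (a fixed set of kept tags plus a log of what was removed) can be illustrated with a short filter. This is a sketch in Python using the standard `html.parser` module, not the project's Java code; the whitelist in `KEEP` is an assumption.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Assumed whitelist of structural tags worth keeping.
KEEP = {"h1", "h2", "h3", "p", "ul", "ol", "li", "table", "tr", "td"}


class BaseXMLFilter(HTMLParser):
    """Keep whitelisted tags, drop the rest, and log what was dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # pieces of the filtered document
        self.removed = []  # log of removed tag names

    def handle_starttag(self, tag, attrs):
        if tag in KEEP:
            self.out.append(f"<{tag}>")  # attributes are dropped in this sketch
        else:
            self.removed.append(tag)

    def handle_endtag(self, tag):
        if tag in KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, <, > for XML


def html_to_base_xml(html: str):
    f = BaseXMLFilter()
    f.feed(html)
    f.close()
    return "".join(f.out), f.removed


xml, log = html_to_base_xml("<p>Hello <font color='red'>world</font> &amp; all</p>")
# xml -> "<p>Hello world &amp; all</p>", log -> ["font"]
```

The sketch keeps the text inside removed tags and does not repair badly nested HTML, so a real converter would need an extra well-formedness pass. A DTD-driven variant, as in the CUP converter, would derive `KEEP` from the DTD rather than hard-coding it.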
Collection of LOs: second level

[Diagram: language-specific tools (morpho, tok, pos, lemma, NP; combined tok-pos-lemma) produce the WP2 XML format.]

Collection of LOs: second level

[Diagram: same layers as on the previous slide (morpho, tok, pos, lemma, NP; tok-pos-lemma), with scripts producing the WP2 XML format.]

Collection of LOs: KW extractor

[Diagram: the KW extractor takes Level 2 input (WP2 XML format); Level 3 holds Man KD XML and Auto KD XML.]

Collection of LOs: KW extractor

[Diagram: same as the previous slide, with a KW extractor evaluation step added next to Man KD XML and Auto KD XML.]
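
One common way to evaluate an extractor of this kind is to compare the automatically annotated keywords with the manual ones using precision, recall and F1. The slides do not state which measure the project used, so the sketch below is an assumption, and the sample keyword lists are invented for illustration.

```python
def evaluate_keywords(manual, automatic):
    """Return (precision, recall, f1) of automatic keywords against manual ones."""
    gold = {k.lower() for k in manual}
    pred = {k.lower() for k in automatic}
    hits = len(gold & pred)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Invented example: two of four automatic keywords match the three manual ones.
p, r, f = evaluate_keywords(["e-learning", "LMS", "ontology"],
                            ["LMS", "ontology", "portal", "XML"])
```

Matching here is exact after lower-casing; a real evaluation might instead compare lemmas or allow partial matches on multi-word keywords.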

Collection of LOs: third level

[Diagram: Man KD XML (incl. km.xml, dm.xml) and Auto KD XML (incl. akw, adef), with the def extractor.]

Legend:
  • akw: automatically annotated keywords
  • adef: automatically annotated definitions
  • km.xml: manually annotated keywords
  • dm.xml: manually annotated definitions

Collection of LOs: third level

[Diagram: same as the previous slide, with a def extractor evaluation step added.]

Open issues
  • Convertors
    • Tables, figures, page look…
  • IPRs
    • Clarify the IPR status
      • authors & EU + national legislation
    • Define IPR categories for LOs:
      • usage (free, restricted, for research...)
WP1 over time

[Timeline: markers for Beginning of project, D1.1, Official end of WP1, Evaluation and Now; dates shown are December 05, February 06 and May 06.]

  • Structure & functionalities to the portal
  • BaseXML convertors
  • new LOs

Initial collection on Portal

  • Levels 2&3 additions
  • new tools
  • grammars
  • guides, docs
  • ontology, TermLex
Proposal: the hierarchy seen as a processing environment

[Diagram: formats and annotation layers arranged over Level 1, Level 2 and Level 3. Node labels: doc, pdf, latex, html, other, txt, axml, sxml, tok, morpho, pos, lemma, NP, tpl, wp2xml, akw, adef.]
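
Seen as a processing environment, the hierarchy behaves like a graph of formats connected by tools: given an uploaded document and a target annotation level, the portal would chain the necessary steps. The sketch below encodes only the steps named on the earlier slides. The format identifiers, tool labels and the function `plan` are hypothetical shorthand, not part of the proposal itself.

```python
from collections import deque

# Edges taken from the conversion and annotation slides; labels are shorthand.
STEPS = {
    "doc": [("html", "Save As html")],
    "pdf": [("html", "pdf-to-html tool")],
    "html": [("base-xml", "Base-XML convertor")],
    "base-xml": [("wp2xml", "tok-pos-lemma tools")],
    "wp2xml": [("auto-kd-xml", "KW and def extractors")],
}


def plan(source: str, target: str):
    """Breadth-first search for the shortest chain of processing steps."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for nxt, tool in STEPS.get(fmt, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [tool]))
    return None  # no route between the two formats


route = plan("pdf", "wp2xml")
# route -> ["pdf-to-html tool", "Base-XML convertor", "tok-pos-lemma tools"]
```

Adding a new converter or language-specific tool then amounts to adding an edge to `STEPS`, which is one way to read the claim that the repository could become a processing environment.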
Conclusions
  • LOs, resources and tools collected
  • Initially: portal seen as a repository
  • Now: portal potentially integrated with the LMS as a processing environment