Rencontres TEI Council Lyon 2009 - PowerPoint PPT Presentation

Presentation Transcript

  1. Rencontres TEI Council Lyon 2009
     Serge Heiden, ICAR Laboratory / Lyon University, slh@ens-lsh.fr
     Council, ENS-LSH, Lyon (France), 1 April 2009

  2. Context (1/2)
     • Project objective (2007-2009): to develop an open-source software platform for Textometry analysis of textual data
     • Partners:
       • Univ. of Lyon (lead) [Weblex]
       • Univ. of Nice [Hyperbase]
       • Univ. of Franche-Comté [Diatag]
       • Univ. of Paris 3 [Lexico]
       • Univ. of Oxford [Xaira]
       • Univ. of Montréal [Sato]
     • Web sites:
       • http://textometrie.ens-lsh.fr (project site)
       • http://textometrie.sourceforge.net (dev site)
     • And others:
       • Univ. of Chicago [PhiloLogic]

  3. Context (2/2)
     • Textometry methodology
     • TEI encoded and NLP enriched textual data analysis
     • Qualitative data analysis
       • Deep text search engine, KWIC concordances
       • Hypertextual data rendering and navigation
     • Quantitative data analysis
       • Factorial analysis, classification, specificity
       • N-gram analysis, cooccurrence, collocation, burst
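As an illustration of the KWIC (keyword in context) concordances mentioned on this slide, here is a minimal Python sketch. It is not taken from the project's software: the function name `kwic`, the whitespace tokenization and the `width` parameter are all assumptions made for the example.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence.

    Hypothetical helper: a real concordancer would work on an indexed
    corpus rather than on a plain list of tokens.
    """
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the rug"
    for left, key, right in kwic(text.split(), "sat", width=2):
        # Right-align the left context so the keywords line up in a column.
        print(f"{left:>12} [{key}] {right}")
```

Aligning the keyword in a central column is what lets a reader compare the contexts of a word at a glance, which is the qualitative side of the methodology; counting what appears in those contexts leads to the quantitative side (cooccurrence, collocation).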

  4. TEI Role and Usage
     • Open-source contract between data and software
     • Textometry point of view for data input from TEI:
       • Textual dimensions (main language, secondary language, cited text, out of text - comments, notes, titles…) <index>
       • Lexical units (words, phrases…) and their properties (pos, lemma…) <w>
       • Contextual units (sentence, verse, chapter, text…) and their properties (language, number, domain, genre…) <s>
       • Contrasts between units
       • Structural units (navigation: physical - page, logical) <pb/>
       • References (unit coordinates based on their properties)
       • Rendering (device, segmentation, style)
       • Alignment (between two corpora)
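The following Python sketch shows one way the lexical, contextual and structural units listed above could be read from a TEI fragment. It is an assumption-laden illustration, not the platform's import code: the sample document, the function name `extract_words` and the use of `pos`, `lemma` and `n` attributes are chosen only to mirror the properties named on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the attribute names mirror the slide's "(pos, lemma...)".
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <pb n="1"/>
    <s n="1"><w pos="DET" lemma="le">Le</w> <w pos="NOUN" lemma="chat">chat</w></s>
    <pb n="2"/>
    <s n="2"><w pos="VERB" lemma="dormir">dort</w></s>
  </body>
</text>"""


def extract_words(xml_text):
    """List each <w> with its properties and the page and sentence it falls in."""
    root = ET.fromstring(xml_text)
    rows = []
    page = None      # value of @n on the last <pb/> seen
    sentence = None  # value of @n on the last <s> seen (simplification:
                     # assumes every <w> sits inside an <s>)
    for el in root.iter():            # iter() walks in document order
        tag = el.tag.split("}")[-1]   # drop the namespace prefix
        if tag == "pb":
            page = el.get("n")
        elif tag == "s":
            sentence = el.get("n")
        elif tag == "w":
            rows.append({
                "form": el.text,
                "pos": el.get("pos"),
                "lemma": el.get("lemma"),
                "sentence": sentence,
                "page": page,
            })
    return rows


if __name__ == "__main__":
    for row in extract_words(SAMPLE):
        print(row)
```

Each row carries the word's own properties together with the coordinates of the units that contain it, which is the kind of information the "References" item refers to.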

  5. Discussion (1/2): Textometry related TEI element types (BFM: A. Lavrentiev)
     • Tokenize words (segment + value)
       • >= : expan|note|name|s
       • = : w|abbr|num
       • < : c|ex
     • Segment sentences (segment + value)
       • > : TEI|text|front|body|div|head|trailer|p|ab|sp|speaker|list
       • >~ : q|quote|item
     • Transversal:
       • ~ : choice|corr|sic|add|del|reg|orig|foreign|hi|title|supplied|subst|damage|pb|lb|milestone|gap
     • Meta: note, teiHeader
     • Primary linguistic content of a text: index?
     • NLP results: specify stand-off
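One possible reading of the word-level categories above is sketched here in Python. The interpretation is an assumption, not something the slide states: `=` is taken to mean "the element is exactly one word", `<` "the element is part of a word", and `>=` "the element contains one or more words". The element sets are transcribed from the slide; the function names and the sample are hypothetical, and the sample carries no namespace to keep the tag names short.

```python
import xml.etree.ElementTree as ET

# Element sets transcribed from the slide; the meaning given to each
# symbol is an assumption made for this sketch.
CONTAINS_WORDS = {"expan", "note", "name", "s"}  # ">="
IS_WORD = {"w", "abbr", "num"}                   # "="
PART_OF_WORD = {"c", "ex"}                       # "<"


def word_relation(tag):
    """Return the slide's symbol for a tag, or None if it is not listed."""
    if tag in IS_WORD:
        return "="
    if tag in PART_OF_WORD:
        return "<"
    if tag in CONTAINS_WORDS:
        return ">="
    return None


def tokenize(el):
    """Collect word forms under el, treating '=' elements as single tokens."""
    tokens = []
    if word_relation(el.tag) == "=":
        # The whole element is one word; nested parts such as <ex> are merged.
        form = "".join(el.itertext()).strip()
        if form:
            tokens.append(form)
    else:
        if el.text:
            tokens.extend(el.text.split())
        for child in el:
            tokens.extend(tokenize(child))
            if child.tail:
                tokens.extend(child.tail.split())
    return tokens


if __name__ == "__main__":
    sample = ET.fromstring("<s>He paid <num>3</num> <w>d<ex>eniers</ex></w> today</s>")
    print(tokenize(sample))
```

Text that is not wrapped in a word-level element is split on whitespace here, which is only a placeholder for a real tokenizer.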

  6. Discussion (2/2): Software related information
     • Bind software parameters to TEI texts
       • meta.xml file of the ODT format
       • corpus_parameters.xml of Xaira software
     • => external pointer in teiHeader (like image or audio files)?