Some Thoughts on HPC in Natural Language Engineering

Steven Bird

University of Melbourne &

University of Pennsylvania

Sponsorship

Natural Language Engineering: Integrating Parallel and Parametric Processing

Victorian Partnership for Advanced Computing Expertise Grant EPPNME092.2003

NLE Application Areas

  • Spoken dialogue systems

  • Cross-language information retrieval

  • Word-sense disambiguation

  • Multi-document summarisation

  • Natural language database interfaces

  • Information Extraction

  • Information Retrieval

  • Authoring Tools

  • Language Analysis

  • Language Understanding

  • Knowledge Representation

  • Knowledge Discovery

  • Spoken Language Input

  • Written Language Input

  • Natural Language Generation

  • Spoken Output

  • Multilinguality

  • Multimodality

  • Discourse and Dialogue

Some NLE Applications in detail

  • Information extraction from broadcast news

    • Tokenization, alignment, entity detection, coreference resolution, semantic mapping

  • Spoken language dialogue systems (SLDS)

    • Speech recognition, parsing, user modelling, discourse management, generation, synthesis

  • Language analysis

    • Interlinear text annotation, lexicon development, morphosyntactic grammar development

Meta Activities

  • Discovery

    • What tools work with data in format X?

    • What lexical resources exist for language Y?

  • Reuse

    • Diverse implementation frameworks

    • Component integration, wrapping, etc.

  • Training and evaluation

    • Parametric and parallel processing

    • Comparing systems running on the same data

    • Gold standard vs theory comparison

    • Analyzing interaction logs
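
Parametric processing in training and evaluation can be sketched as a sweep: run the same component under every combination of parameter settings, dispatch the runs in parallel, and score each one against a gold standard. This is a hypothetical example, not the project's method: the component is a toy capitalisation-based name tagger, and the nine tokens and their gold labels are invented. A `ThreadPoolExecutor` keeps the sketch portable; for CPU-bound components a `ProcessPoolExecutor` or a grid scheduler would be the natural swap.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Invented toy data: tokens with gold-standard "is a name" labels.
TOKENS = ["The", "report", "from", "Melbourne", "cites", "Bird", "and", "Liberman", "."]
GOLD = [False, False, False, True, False, True, False, True, False]


def tag_names(tokens, min_length, skip_first):
    """Toy component: a capitalised token of at least min_length chars is tagged as a name."""
    tags = []
    for i, tok in enumerate(tokens):
        is_name = tok[:1].isupper() and len(tok) >= min_length
        if skip_first and i == 0:
            is_name = False  # sentence-initial capitals are ambiguous
        tags.append(is_name)
    return tags


def evaluate(params):
    """Run one parameter setting on the shared data and score token accuracy against GOLD."""
    min_length, skip_first = params
    predicted = tag_names(TOKENS, min_length, skip_first)
    correct = sum(p == g for p, g in zip(predicted, GOLD))
    return params, correct / len(GOLD)


# Parametric sweep: every combination of the two (hypothetical) parameters.
grid = list(product([1, 3, 5], [False, True]))
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order, so results follow the grid order.
    results = dict(pool.map(evaluate, grid))

best = max(results, key=results.get)
print(best, results[best])
```

Because every parameter setting is effectively a different system run on the same data, the same loop also serves for comparing systems against one gold standard.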

Learn about NLE

  • This department hosts a mirror of the ACL Anthology, the Association for Computational Linguistics' digital archive

  • 50k pages, 40 years



  • Common components, different arrangements

    • Multiple components for doing the same task

  • Most NLE components convert between information types

    • Parser: from strings to trees

    • ASR: from speech to text

    • Summariser: from text to selected text

  • But:

    • Many processes benefit from other information sources (e.g. exploiting intonation in input)

    • Input and output can be aligned

    • Solution: multilayer annotations
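
The view of components as converters between information types, reused in different arrangements, can be sketched as typed stages chained into a pipeline. This is a minimal illustration only: the `Component` class, the type labels and the toy summariser, tokenizer and parser are hypothetical stand-ins, not real NLE tools.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Component:
    """Hypothetical NLE component: converts one information type into another."""
    name: str
    input_type: str   # illustrative labels such as "text", "tokens", "tree"
    output_type: str
    run: Callable[[Any], Any]


def compose(components: List[Component]) -> Component:
    """Chain components, checking that each output type feeds the next input type."""
    if not components:
        raise ValueError("a pipeline needs at least one component")
    for left, right in zip(components, components[1:]):
        if left.output_type != right.input_type:
            raise TypeError(
                f"{left.name} produces {left.output_type!r}, "
                f"but {right.name} expects {right.input_type!r}"
            )

    def run_all(data: Any) -> Any:
        # Pass the data through each component in order.
        for component in components:
            data = component.run(data)
        return data

    return Component(
        name=" -> ".join(c.name for c in components),
        input_type=components[0].input_type,
        output_type=components[-1].output_type,
        run=run_all,
    )


# Toy stand-ins (assumptions): keep the first sentence, split on spaces, wrap in a flat tree.
summariser = Component("summariser", "text", "text", lambda s: s.split(".")[0] + ".")
tokenizer = Component("tokenizer", "text", "tokens", lambda s: s.split())
parser = Component("parser", "tokens", "tree", lambda toks: ("S", toks))

pipeline = compose([summariser, tokenizer, parser])
print(pipeline.name)
print(pipeline.run("Dogs bark. Cats sleep."))
```

Because each stage declares its input and output types, an incompatible arrangement (a parser feeding a tokenizer, say) is rejected when the pipeline is assembled rather than partway through a long run.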

Annotation Graphs

  • Labelled digraphs with timestamped nodes
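
A minimal in-memory sketch of such a graph follows. It assumes string node identifiers, time offsets in seconds, and a `layer` field on each arc to separate annotation types; these are choices made for this illustration, not the AGTK API. The sketch also lets a node's timestamp be left unset. Word and phone arcs share boundary nodes, so both layers are aligned to one timeline; the timings are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Arc:
    start: str   # source node id
    end: str     # target node id
    layer: str   # annotation layer, e.g. "word" or "phone"
    label: str


@dataclass
class AnnotationGraph:
    """Sketch of a labelled digraph whose nodes may carry time offsets (seconds)."""
    times: Dict[str, Optional[float]] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, time: Optional[float] = None) -> None:
        self.times[node_id] = time

    def add_arc(self, start: str, end: str, layer: str, label: str) -> None:
        if start not in self.times or end not in self.times:
            raise KeyError("both endpoints must already be nodes")
        self.arcs.append(Arc(start, end, layer, label))

    def overlapping(self, layer: str, t0: float, t1: float) -> List[str]:
        """Labels on `layer` whose timed span overlaps the interval [t0, t1)."""
        hits = []
        for arc in self.arcs:
            s, e = self.times[arc.start], self.times[arc.end]
            if arc.layer != layer or s is None or e is None:
                continue  # skip other layers and arcs without timestamps
            if s < t1 and e > t0:
                hits.append(arc.label)
        return hits


# Invented timings for the words "she had" and their phones; the layers share nodes.
ag = AnnotationGraph()
for node_id, t in [("n0", 0.0), ("n1", 0.15), ("n2", 0.3), ("n3", 0.38), ("n4", 0.45), ("n5", 0.5)]:
    ag.add_node(node_id, t)
ag.add_arc("n0", "n2", "word", "she")
ag.add_arc("n2", "n5", "word", "had")
for start, end, phone in [("n0", "n1", "sh"), ("n1", "n2", "iy"), ("n2", "n3", "hh"),
                          ("n3", "n4", "ae"), ("n4", "n5", "d")]:
    ag.add_arc(start, end, "phone", phone)

print(ag.overlapping("phone", 0.3, 0.5))  # phones aligned with "had": ['hh', 'ae', 'd']
```

Querying the phone layer over the time span of the word "had" returns the phones aligned with it, the kind of cross-layer lookup that multilayer annotation makes possible.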

Annotation Graphs: complex example

  • AGTK: Annotation Graph Toolkit

    • library, applications


NLE and Grids

  • NLE Applications

    • typically constructed out of numerous components

    • each component responsible for a specialised task

    • executed against large data sets

  • To use grids in NLE:

    • subscribe to a model which allows automated discovery of data and components

    • flexible design of applications, coordination of execution, storage of results

  • Ideally:

    • view grid as a commodity, hidden from application developers
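
Automated discovery of components can be sketched as a metadata registry plus a search over it. Assuming, as a simplification, that each component advertises exactly one input format and one output format, a breadth-first search finds the shortest chain of components that converts data from one format to another, and a simple lookup answers questions like "What tools work with data in format X?". The registry entries and format names below are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical registry: component name -> (input format, output format).
REGISTRY: Dict[str, Tuple[str, str]] = {
    "asr": ("audio", "transcript"),
    "tokenizer": ("transcript", "tokens"),
    "entity_detector": ("tokens", "entities"),
    "parser": ("tokens", "trees"),
    "semantic_mapper": ("trees", "logical_form"),
}


def find_components(input_format: str) -> List[str]:
    """Discovery: which registered components accept data in this format?"""
    return sorted(name for name, (src, _) in REGISTRY.items() if src == input_format)


def plan_pipeline(source: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest component chain from source to target format."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == target:
            return chain
        for name in find_components(fmt):
            out = REGISTRY[name][1]
            if out not in seen:
                seen.add(out)
                queue.append((out, chain + [name]))
    return None  # no chain of registered components reaches the target


print(find_components("tokens"))
print(plan_pipeline("audio", "logical_form"))
```

In a grid setting the registry would be populated from shared metadata rather than hard-coded, which is what lets the grid stay hidden from application developers.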

Architectural Components

  • Data

    • Language resources for analysis

    • e.g. Switchboard: 2400 annotated telephone conversations (26 CDs)

  • Software Components

    • minimal individual functional units

      • e.g. Annotation Server, Alignment, ASR, Data Source Packaging, Format Conversion, Text Annotation, Lexicon Server, Semantic Mapping

    • common interface specification

  • Metadata Repositories

    • Dublin Core Application Profile for NLE resources

  • Application

    • data + components + processing instructions

    • declarative specification in XML

  • Grid Service

    • computational and storage resources for application execution
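
An application declared as data + components + processing instructions in XML might look like the following. The element and attribute names (`application`, `data`, `component`, `param`, `output`) and the paths are invented for this sketch, not a published schema. The Python side uses only the standard library's `xml.etree.ElementTree` to read the declaration into an ordered list of steps that a grid service could then schedule; the scheduling itself is out of scope here.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarative application spec: data + components + processing instructions.
SPEC = """
<application name="broadcast-news-ie">
  <data href="corpus/broadcast-news" format="transcript"/>
  <component name="tokenizer"/>
  <component name="entity_detector">
    <param name="threshold" value="0.5"/>
  </component>
  <output store="results/entities"/>
</application>
"""


def load_application(xml_text):
    """Parse the spec into plain Python values; component order is the execution order."""
    root = ET.fromstring(xml_text.strip())
    data = root.find("data")
    steps = []
    for comp in root.findall("component"):
        # Parameters stay as strings; a real system would validate and convert them.
        params = {p.get("name"): p.get("value") for p in comp.findall("param")}
        steps.append((comp.get("name"), params))
    return {
        "name": root.get("name"),
        "data": data.get("href"),
        "format": data.get("format"),
        "steps": steps,
        "store": root.find("output").get("store"),
    }


app = load_application(SPEC)
for step, params in app["steps"]:
    print(step, params)
```

Keeping the specification declarative means the same components can be rearranged, or rerun with different parameters, by editing the XML rather than the code.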


  • Natural Language Engineering

    • interesting test case for grid services

    • many mature component technologies

    • applications that are both data and processor intensive

    • applications for building the multilingual information society of the future...