
Beespace Prototype Design Meeting Entity Recognition

Beespace Prototype Design Meeting: Entity Recognition. Jing Jiang, 09/28/2005. Entity Recognition in Prototype V1. Target entities: gene names. Supervised learning: LingPipe (word trigram and tag bigram model). Training data: BioCreative (manually annotated).





Presentation Transcript


  1. Beespace Prototype Design Meeting: Entity Recognition. Jing Jiang, 09/28/2005

  2. Entity Recognition in Prototype V1 • Target entities: gene names • Supervised learning: LingPipe (word trigram and tag bigram model) • Training data: • BioCreative (manually annotated) • Drosophila (generated from gene lists)

  3. Sample Results • http://sifaka.cs.uiuc.edu/jiang4/Beespace

  4. Performance • Some gene names without explicit mention of “gene” can be captured • E.g., “glutathione S-transferase” • Problems • Gene-like phrases, e.g., “China 2”, “13.8” • Mismatch of gene name boundaries and noun phrase boundaries, e.g., “nicotinic” in “nicotinic pathway”

  5. V2 -- Entity Types • Annotation guideline for BioCreative • Guideline for Beespace? • Ontology? (GENIA ontology) • What to tag? • Genes and proteins • Family of genes • Gene descriptions • Entity boundaries and noun phrase boundaries • Tag only noun phrases that refer to genes or tag any occurrence of a gene name inside a noun phrase?

  6. Sample Sentences • A dose-dependent transactivation of human hARE-mediated chloramphenicol acetyltransferase (cat) gene expression was observed upon treatments of the Hepa-1 transfectants with TPA, a known inducer, as well as with CAPE. • In the present study, we identified its preferred binding sequence as 5'-CCCTATCGATCG-ATCTCTACCT-3' and characterized its DNA-binding properties using truncated Mblk-1 mutants.

  7. Sample Sentences (cont.) • At least two kinds of nicotinic receptors seem to be involved in honeybee memory, an alpha-bungarotoxin-sensitive and an alpha-bungarotoxin-insensitive receptor. • The involvement of nicotinic pathways in memory formation and retrieval processes was tested by injecting…

  8. Sample Sentences (cont.) • We report the cloning of a honeybee CSP gene called ASP3c, as well as the structural and functional characterization of the encoded protein. • Naturally occurring variation in npr-1, a gene encoding a putative receptor for an NPY-like molecule, causes variation in feeding behaviour.

  9. Sample Sentences (cont.) • The gene encoding ZENK, an EARLY IMMEDIATE GENE well known in other learning and memory contexts, has figured prominently in molecular songbird research thus far. • This is because frequent contacts of these types cause an increase in the expression of the gene encoding a glucocorticoid receptor in the hippocampus, and…

  10. Training Data • Dictionary • Rules/guidelines • Bootstrapping • Cross-domain training • Can training data in other domains (fly, human, etc.) still be useful?
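The "Dictionary" option, and the V1 idea of generating training data from gene lists, can be illustrated with a minimal sketch. It is not the procedure used in the prototype. It assumes a plain list of gene names and labels each token B-GENE, I-GENE, or O by longest dictionary match. The function names `tokenize` and `bio_tag`, the tag names, and the tiny `gene_list` are hypothetical and chosen for illustration only; the gene names are taken from the sample sentences.

```python
import re


def tokenize(text):
    # Word-like tokens (internal hyphens kept, e.g. "npr-1") and single punctuation marks.
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]", text)


def bio_tag(tokens, gene_names):
    """Label tokens B-GENE / I-GENE / O using case-insensitive longest dictionary match."""
    # Each dictionary entry is stored as a tuple of lowercased tokens.
    entries = {tuple(tokenize(name.lower())) for name in gene_names}
    entries.discard(())
    max_len = max((len(e) for e in entries), default=0)

    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so multi-word names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                matched = n
                break
        if matched:
            tags[i] = "B-GENE"
            for j in range(i + 1, i + matched):
                tags[j] = "I-GENE"
            i += matched
        else:
            i += 1
    return tags


# Hypothetical gene list; a real one would come from a curated resource.
gene_list = ["npr-1", "ASP3c", "glutathione S-transferase"]

sentence = "Naturally occurring variation in npr-1, a gene encoding a putative receptor"
tokens = tokenize(sentence)
tags = bio_tag(tokens, gene_list)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

A labeler like this tags every occurrence of a listed name, whether or not the surrounding noun phrase refers to a gene, so it implicitly takes one side of the boundary question raised on slide 5. It also cannot tell gene names from identical ordinary words or numbers, which is one way noise such as "China 2" or "13.8" could enter automatically generated training data.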
