Biomedical articles per year

Questions of biomedical experts Yes/No question “Are there any DNMT3 proteins present in plants?” Exact Answer “Yes” Ideal Answer • “Yes. The plant DOMAINS REARRANGED METHYLTRANSFERASE2 (DRM2) is a homolog of the mammalian de novo methyltransferase DNMT3. DRM2 contains a novel arrangement of the motifs required for DNA methyltransferase catalytic activity.”

Questions of biomedical experts Factoid question “What is the methyl donor of DNA (cytosine-5)-methyltransferases?” Exact Answer “S-adenosyl-L-methionine” Ideal Answer • “S-adenosyl-L-methionine (AdoMet, SAM) is the methyl donor of DNA (cytosine-5)-methyltransferases. DNA (cytosine-5)-methyltransferases catalyze the transfer of a methyl group from S-adenosyl-L-methionine to the C-5 position of cytosine residues in DNA.”

Questions of biomedical experts (III) List question “Which species may be used for the biotechnological production of itaconic acid?” Exact Answer “Aspergillus terreus”, “Aspergillus niger”, “Ustilago maydis” Ideal Answer • “In 1955, the production of itaconic acid was firstly described for Ustilago maydis. Some Aspergillus species, like A. itaconicus and A. terreus, show the ability to synthesize this organic acid and A. terreus can secrete significant amounts to the media. Itaconic acid is mainly supplied by biotechnological processes with the fungus Aspergillus terreus. Cloning of the cadA gene into the citric acid producing fungus A. niger showed that it is possible to produce itaconic acid also in a different host organism.”

Questions of biomedical experts (III) Summary question “How do histone methyltransferases cause histone modification?” Exact Answer - Ideal Answer • “Histone methyltransferases (HMTs) are responsible for the site-specific addition of covalent modifications on the histone tails, which serve as markers for the recruitment of chromatin organization complexes. There are two major types of HMTs: histone-lysine N-Methyltransferases and histone-arginine N-methyltransferases. The former methylate specific lysine (K) residues such as 4, 9, 27, 36, and 79 on histone H3 and residue 20 on histone H4. The latter methylate arginine (R) residues such as 2, 8, 17, and 26 on histone H3 and residue 3 on histone H4. Depending on what residue is modified and the degree of methylation (mono-, di- and tri-methylation), lysine methylation of histones is linked to either transcriptionally active or silent chromatin.”

Finding relevant snippets

Not only texts: ontologies, linked data, …

Information from structured data List question “Which forms of cancer is the Tpl2 gene associated with?” Related RDF triple • Subject: http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseases/3003 (lung cancer) • Predicate: http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/associatedGene • Object: http://www4.wiwiss.fu-berlin.de/diseasome/resource/genes/TPL2 Related concepts • http://www.disease-ontology.org/api/metadata/DOID:162 (cancer) • http://www.uniprot.org/uniprot/M3K8_RAT (TPL2 synonym)
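
As a minimal sketch of how the triple above can support a list answer, the snippet below stores it as a plain (subject, predicate, object) record and looks up every subject linked to the TPL2 gene. The `Triple` type and the `subjects_for` helper are hypothetical names introduced for illustration only; they are not part of any BioASQ tool, and a real system would more likely query a triple store.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF statement; a hypothetical container used only in this sketch."""
    subject: str
    predicate: str
    obj: str


# Common URI prefix of the Diseasome resources quoted on the slide.
DISEASOME = "http://www4.wiwiss.fu-berlin.de/diseasome/resource/"

# The single triple shown on the slide (lung cancer is associated with TPL2).
triples = [
    Triple(
        DISEASOME + "diseases/3003",
        DISEASOME + "diseasome/associatedGene",
        DISEASOME + "genes/TPL2",
    ),
]


def subjects_for(store: List[Triple], predicate: str, obj: str) -> List[str]:
    """Return the subjects of all triples matching the given predicate and object."""
    return [t.subject for t in store if t.predicate == predicate and t.obj == obj]


diseases = subjects_for(
    triples,
    DISEASOME + "diseasome/associatedGene",
    DISEASOME + "genes/TPL2",
)
print(diseases)
```

Each returned subject URI is a candidate item for the list answer; mapping the URI to a readable label such as "lung cancer" is a separate step not shown here.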

BioASQ Vision • Make sure this knowledge is used to the benefit of patients • Need to make it accessible to biomedical experts • Search is not effective enough • Push research in automated answering of questions • A challenge for such systems can achieve a multiplying effect

What is BioASQ? A challenge funded by the European Union (FP7). • Task a: Hierarchical text classification • Organizers distribute new unclassified PubMed articles. • Participants assign MeSH terms to the articles. • Evaluation based on annotations of PubMed curators. • Task b: IR, QA, summarization, … • Organizers distribute English biomedical questions. • Participants provide: relevant articles, snippets, concepts, triples, “exact” answers, “ideal” answers. • Evaluation: both automatic (GMAP, MRR, ROUGE etc.) and manual (by biomedical experts).
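
To make the automatic retrieval measures mentioned above more concrete, here is a rough sketch of average precision, MAP and GMAP over ranked lists of document identifiers. It follows the textbook definitions; the official BioASQ evaluation may differ in details such as list cut-offs and normalization, and the `epsilon` smoothing value is an assumption chosen for illustration.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def average_precision(ranked: Sequence[str], relevant: Iterable[str]) -> float:
    """Mean of the precision values at each rank where a relevant item appears."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant_set:
            hits += 1
            total += hits / rank
    return total / len(relevant_set)


Run = Tuple[Sequence[str], Iterable[str]]  # (ranked system output, gold items)


def mean_average_precision(runs: List[Run]) -> float:
    """Arithmetic mean of average precision over all questions."""
    return sum(average_precision(r, g) for r, g in runs) / len(runs)


def geometric_map(runs: List[Run], epsilon: float = 0.01) -> float:
    """Geometric mean of average precision; epsilon (assumed) avoids log(0)."""
    logs = [math.log(average_precision(r, g) + epsilon) for r, g in runs]
    return math.exp(sum(logs) / len(logs))


# Two toy questions with made-up document identifiers.
runs = [
    (["d1", "d2", "d3"], ["d1", "d3"]),  # relevant items at ranks 1 and 3
    (["d4"], ["d5"]),                    # nothing relevant retrieved
]
map_score = mean_average_precision(runs)
gmap_score = geometric_map(runs)
print(map_score, gmap_score)
```

Because the geometric mean is pulled down sharply by questions with near-zero average precision, GMAP rewards systems that do reasonably well on every question rather than very well on a few.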

The challenge Task a Task b

Behind the scenes

BioASQ Platform

Datasets Task b data contain gold articles, snippets, concepts, triples, “exact” and “ideal” answers prepared by biomedical experts from around Europe.

Data sources They include both text and structured info. • PubMed abstracts, PubMed Central articles, MeSH. • Gene Ontology, UniProt, Jochem, Disease Ontology.

Annotation: questions and queries

Annotation: snippets

Annotation: answers

Assessment: relevance of material

Assessment: information in answers

BioASQ social network

Oracle

Two cycles 2013 Schedule The official challenge is over, but… • Task a continues to run each week. • An oracle for task b will be available soon. • Oracles will remain available. • Third cycle is being designed… 2014 Schedule March 2013 June 2013 August 2013 September 2013 February 2014 March 2014 May 2014 September 2014

Challenge participants so far

Challenge participants in each cycle

Evaluation measures • Task a: Hierarchical text classification • Flat measures for multi-label classification: Accuracy, MiF, MaF, EBF • Hierarchical measures: LCA-F (new), HF • Task b: IR, QA, summarization, … • Phase A: standard IR measures: mean precision, mean recall, mean F-measure, MAP (used for winner selection), G-MAP • Phase B: • ‘Exact answers’ (based on type): accuracy (yes/no), strict/lenient accuracy, MRR (factoid), mean F-measure (list) • ‘Ideal answers’: manual scores from the experts (Readability, Repetition, Information Precision and Recall), plus ROUGE
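
The Phase B ‘exact answer’ measures listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the official evaluation code: answers are compared by exact string match after lowercasing, a factoid system returns a ranked candidate list, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def yes_no_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of yes/no questions answered correctly."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def factoid_scores(
    predicted_lists: List[List[str]], gold_answers: List[List[str]]
) -> Tuple[float, float, float]:
    """Return (strict accuracy, lenient accuracy, MRR) over factoid questions.

    Strict: the top-ranked candidate is correct.
    Lenient: any returned candidate is correct.
    MRR: mean of 1/rank of the first correct candidate (0 if none).
    """
    strict = lenient = rr_sum = 0.0
    for candidates, gold in zip(predicted_lists, gold_answers):
        gold_set = {g.lower() for g in gold}
        ranks = [i for i, c in enumerate(candidates, start=1) if c.lower() in gold_set]
        if ranks:
            lenient += 1
            rr_sum += 1 / ranks[0]
            if ranks[0] == 1:
                strict += 1
    n = len(gold_answers)
    return strict / n, lenient / n, rr_sum / n


def list_f_measure(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """F-measure of one list answer against the gold list."""
    pred_set = {p.lower() for p in predicted}
    gold_set = {g.lower() for g in gold}
    tp = len(pred_set & gold_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical system outputs scored against the gold answers from the slides.
factoid = factoid_scores(
    [["SAM", "S-adenosyl-L-methionine"]],  # correct answer at rank 2
    [["S-adenosyl-L-methionine"]],
)
list_f = list_f_measure(
    ["Aspergillus terreus", "Saccharomyces cerevisiae"],  # one hit, one miss
    ["Aspergillus terreus", "Aspergillus niger", "Ustilago maydis"],
)
print(factoid, list_f)
```

Note that with plain string matching the abbreviation "SAM" is scored as wrong even though it names the same compound, which is one reason the automatic scores are complemented by manual assessment from the experts.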

First year technology/results overview • Task 1a • Mainly SVMs and learning-to-rank. • Mostly flat classification, ignoring class taxonomy. • Mediocre results by hierarchical methods. • One of the systems outperformed NLM’s system. • Task 1b • Phase A (retrieve relevant documents, concepts, snippets, triples): low performance (compared to baselines). • Phase B (formulate ‘exact’ and ‘ideal’ answers): poor performance for ‘exact’ answers (except for yes/no questions); high performance for ‘ideal’ answers (paragraph-sized summaries), but starting with gold documents, snippets etc. • Large scope for improvements, esp. in Task 1b.

“Exact” answer results (batch 2/3)

“Ideal” answer results (batch 2/3)

Results – task a – flat measures
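
For readers unfamiliar with the flat measures, the sketch below shows how micro-averaged (MiF) and macro-averaged (MaF) F-measures are commonly computed for multi-label classification. It follows the standard label-based definitions and is not taken from the BioASQ evaluation code; the function names and the toy label sets are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0 when there is nothing to count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    predicted: List[Iterable[str]], gold: List[Iterable[str]]
) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) over articles with sets of labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred, true in zip(predicted, gold):
        pred_set, true_set = set(pred), set(true)
        for label in pred_set & true_set:
            tp[label] += 1
        for label in pred_set - true_set:
            fp[label] += 1
        for label in true_set - pred_set:
            fn[label] += 1
    labels = set(tp) | set(fp) | set(fn)
    # Micro: pool the counts of all labels, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then average, so rare labels weigh equally.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels) if labels else 0.0
    return micro, macro


# Toy example with two articles; the label assignments are made up.
gold_labels = [{"Humans", "DNA Methylation"}, {"Plants"}]
predicted_labels = [{"Humans"}, {"Plants", "Humans"}]
micro_f, macro_f = micro_macro_f1(predicted_labels, gold_labels)
print(micro_f, macro_f)
```

Micro-averaging is dominated by frequent labels, while macro-averaging gives every label the same weight, so the two can diverge noticeably on a skewed label set such as MeSH.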

Results – task a – hierarchical

First challenge prizes

Sustainability Making the challenge viable, at very low cost, after the end of the project • BioASQ Oracle • Software release and installation instructions • Benchmark datasets • BioASQ social network • Involvement of the biomedical community in the process • Attracting sponsors for prizes

Project Consortium • National Centre for Scientific Research “Demokritos” – NCSR “D” (EL) • Transinsight GmbH – TI (D) • Universite Joseph Fourier – UJF (F) • University Leipzig – ULEI (D) • Universite Pierre et Marie Curie Paris 6 – UPMC (F) • Athens University of Economics and Business – Research Centre – AUEB-RC (EL)

Get in touch! • BioASQ workshop @CLEF (Sheffield, Sept 14) • Visit www.bioasq.org • Follow @BioASQ

Useful Links • BioASQ Annotation & assessment tools: • http://at.bioasq.org/ • http://assess.bioasq.org/ • https://github.com/AKSW/BioASQ-AT • BioASQ social network: • http://sn.bioasq.org/ • https://github.com/AKSW/BioASQ-SN • BioASQ platform: • http://bioasq.lip6.fr/ • BioASQ Oracles: • http://bioasq.lip6.fr/oracle/ • A. Kosmopoulos, I. Partalas, E. Gaussier, G. Paliouras, I. Androutsopoulos, Evaluation Measures for Hierarchical Classification: a unified view and novel approaches. Data Mining and Knowledge Discovery (To appear)
