Multilingual Access to Large Spoken Archives
Douglas W. Oard
University of Maryland, College Park, MD, USA
MALACH Project's Goal
Dramatically improve access to large multilingual spoken word collections
… by capitalizing on the unique characteristics of the Survivors of the Shoah Visual History Foundation's collection of videotaped oral history interviews.
1.5 million words/$
20% of capacity
38% recent use
Delivery
Supporting Information Access
Berlin-1939 Employment Josef Stein
Berlin-1939 Family life Gretchen Stein
Dresden-1939 Schooling Gunter Wendt
Language switching
MALACH Overview
English 200 39.6%
Czech 84 39.4%
Russian 20 (of 100) 66.6%
As of May 2003
~2,000 hours to manually transcribe
200 hours from 800 speakers
Hours to transcribe 15 minutes of speech
Training: 65 hours (acoustic model)/200 hours (language model)
Personal use
Who Uses the Collection?
Based on analysis of 280 access requests
Rich data collection
High school teacher
Opportunistic data collection
Focus group discussions
Observational Studies
Workshop 1 (June)
Workshop 2 (August)
Entity normalization
MALACH Overview
transcripts aligned with scratchpad-based boundaries
Use available data to estimate the temporal extent of labels in a way that optimizes the utility of the resulting estimates for interactive searching and browsing
Strictly from the point of view of finding out about the topic, how useful is this segment for the requester? This judgment is made independently of whether another segment (or 25 other segments) give the same information.
4 Makes an important contribution to the topic, right on target
3 Makes an important contribution to the topic
2 Should be looked at for an exhaustive treatment of the topic
1 Should be looked at if the user wants to leave no stone unturned
0 No need to look at this at all
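The five-point scale above can be written down as a small lookup, for example when recording assessor judgments. This is a minimal sketch in Python, not the project's assessment tooling: the names `OverallRelevance`, `DESCRIPTIONS`, and `describe` are hypothetical, and only the five scores and their wording come from the slide.

```python
from enum import IntEnum


class OverallRelevance(IntEnum):
    """Hypothetical names for the 0-4 scores listed on the slide."""
    NO_NEED = 0
    NO_STONE_UNTURNED = 1
    EXHAUSTIVE_TREATMENT = 2
    IMPORTANT_CONTRIBUTION = 3
    RIGHT_ON_TARGET = 4


# Wording copied from the scale above.
DESCRIPTIONS = {
    OverallRelevance.RIGHT_ON_TARGET:
        "Makes an important contribution to the topic, right on target",
    OverallRelevance.IMPORTANT_CONTRIBUTION:
        "Makes an important contribution to the topic",
    OverallRelevance.EXHAUSTIVE_TREATMENT:
        "Should be looked at for an exhaustive treatment of the topic",
    OverallRelevance.NO_STONE_UNTURNED:
        "Should be looked at if the user wants to leave no stone unturned",
    OverallRelevance.NO_NEED:
        "No need to look at this at all",
}


def describe(score: int) -> str:
    """Return the scale wording for one segment's score.

    Each segment is scored on its own, independently of what other
    segments say. Scores outside 0-4 raise ValueError.
    """
    level = OverallRelevance(score)
    return f"{level.value}: {DESCRIPTIONS[level]}"
```

Calling `describe(2)` returns the wording for a segment that matters only for an exhaustive treatment of the topic.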
Direct evidence for what the user asks for
Directly on topic, direct aboutness. The information describes the events or circumstances asked for or otherwise speaks directly to what the user is looking for. First-hand accounts are preferred, e.g., the testimony contains a report on the interviewee's own experience, or an eye-witness account on what happened, or self-report on how a survivor felt. Second-hand accounts (hearsay) are acceptable, such as a report on what an eyewitness told the interviewee or a report on how somebody else felt.
Direct Evidence: Evidence that stands on its own to prove an alleged fact, such as testimony of a witness who says she saw a defendant pointing a gun at a victim during a robbery. Direct proof of a fact, such as testimony by a witness about what that witness personally saw or heard or did. ('Lectric Law Library's Lexicon)
Provides indirect evidence on the topic, indirect aboutness (data from which one could infer, with some probability, something about the topic; what in law is known as circumstantial evidence). Such evidence often deals with events or circumstances that could not have happened or would not normally have happened unless the event or circumstance of interest (to be proven) has happened. It may also deal with events or circumstances that precede the events or circumstances of interest, either enabling them (establishing their possibility) or establishing their impossibility. This category takes precedence over context. One could say that whatever provides indirect evidence also provides context (but the reverse is not true).
Circumstances, Circumstantial Evidence: Circumstantial evidence is best explained by saying what it is not: it is not direct evidence from a witness who saw or heard something. Circumstantial evidence is a fact that can be used to infer another fact.
Provides background / context for topic, sheds additional light on a topic, facilitates understanding that some piece of information is directly on topic.
This category covers a variety of things: whatever influences, sets the stage for, or provides the environment for what the user asks for. (To take the law analogy again: anything in the history of a person who has committed a crime that might explain why he committed it.)
Includes support for or hindrance of an activity that is the topic of the query, and activities or circumstances that immediately follow on the activity or circumstance of interest.
In a way, this category is broader than indirect. If a context element can serve as indirect evidence, indirect takes precedence.
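The precedence rule between indirect evidence and context can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the function name `apply_precedence` and the string labels are hypothetical, and since the slides state a precedence only between "indirect" and "context", any other label is passed through unchanged.

```python
def apply_precedence(applicable: set[str]) -> set[str]:
    """Resolve overlapping relevance categories for one segment.

    Only the rule stated on the slides is applied: when a segment
    qualifies as both indirect evidence and context, "indirect" wins
    and "context" is dropped. Other labels are left as given.
    """
    resolved = set(applicable)  # copy so the caller's set is not modified
    if "indirect" in resolved:
        resolved.discard("context")
    return resolved
```

For example, a segment tagged with both `"indirect"` and `"context"` resolves to `"indirect"` alone, while a segment tagged only `"context"` is unchanged.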
Provides information on similar / parallel situations or on a contrasting situation for comparison
The basic theme of what the user is interested in, but played out in a different place or time or type of situation.
Comparable segments are those that provide information either on similar or parallel topics, or on contrasting topics. This type of relevance relationship identifies items that can aid understanding of the larger framework, perhaps contributing to identification of query terms or revision of search strategies. An example would be a segment in which an interviewee describes activities similar to those in a topic description, but which occurred at a different place or time.