Jürgen Krause University of Koblenz-Landau and Social Science Information Centre (IZ-Bonn)

Current research information as part of digital libraries and the heterogeneity problem. Integrated searches in the context of databases with different content analyses. CRIS2002, Kassel. Jürgen Krause, University of Koblenz-Landau and Social Science Information Centre (IZ-Bonn)

Current research information as part of digital libraries and the heterogeneity problem. Integrated searches in the context of databases with different content analyses. CRIS2002, Kassel

Jürgen Krause

University of Koblenz-Landau and Social Science Information Centre (IZ-Bonn)

Lennéstr. 30, 53113 Bonn, Germany,

mailto:[email protected]

“Scientists are increasingly using search engines to locate research of interest; some rarely use libraries, locating research articles primarily online ... About 85% of users use search engines to locate information.”
(Lawrence/Giles, 1999: 107)

“It doesn’t matter what you want to know, there are people in the Internet who already have this knowledge and want to help you”

(Hahn, 1999: 107)

Weizenbaum 2000/2001 Germany
  • „Das Internet ist ein großer Misthaufen ...“
  • (“The internet is a large dunghill ...”)
  • Nov. 2000: “Gutenbergs Folgen” congress, Mainz
  • May 2001: specialist seminar, Hamburg
  • www.heise.de/newsticker/data/wst-03.05.01-001/
WWW today: a) “Worst case” of the ambiguity problem

Out of the estimated 800 million pages on around three million servers, only 6% relate to the fields of science and education (by comparison: 1.5% relate to pornography).

NEC 1999

NEC 2000: one thousand million pages

  • the “worst case” for the ambiguity problem
  • No reasonable results can be obtained without additional conceptual components
Summary and consequences

When used for specialist information retrieval (IR), general WWW search engines run counter to nearly every criterion that permits a successful search based on IR knowledge. This involves all the main components of an IR system: the database and its selection, the use of research logic, and user expectations. Based on his/her knowledge of these aspects, the user should develop the best possible research strategy, something which is impossible with WWW search engines.


Nevertheless WWW search engines have one advantage compared with current specialist databases: embedded in an enormous volume of irrelevant data is data which is not found in specialist databases and which may be of value to experts. This means that it is simply not possible to return to the recommendation to narrow down the search to the original specialist databases. New ways have to be found to make research, including WWW sources, more satisfactory than is the case at present using general WWW search engines.

Conceptual gaps

In addition to technological integration:

  • Different stages of content analysis of textual data:
    • an intelligently indexed term in a library classification
    • ……
    • automatic full text indexing in fully unstructured data pools

Descriptor A in one such system: wide range of meanings

Research Projects IZ Bonn

ViBSoz: “Social Science Virtual Library”, virtual library project of the German Research Foundation (DFG)

CARMEN: “Content Analysis, Retrieval and Metadata: Effective Networking”, special support program of the German Federal Ministry of Education and Research (BMBF)

ELVIRA: “Electronic Retrieval and Analysis System for Industrial Associations”, funded by the German Federal Ministry of Economics and Technology

ETB: “The European Schools Treasury Browser”, funded by the European Commission

Metadata

U.S. Bureau of the Census: Integrated Information solutions – The future of census bureau data access and dissemination, Sept. 1999. Working paper

“Recent surveys of Census Bureau customers show that two out of three use multiple data sets. ... If we continue to saddle data users with the burden of putting data from disparate sources into digestible forms, we do it at the risk of our own peril.” (p. 2)

“Solutions of these issues ... will revolve around the further development of standards, metadata ...” (p. 3)

“IIS will help minimize data user burden, data uncertainty and maximize data quality and usefulness through the use of metadata” (p. 2)

DIN SICT paper: German position

„Strategie für die Standardisierung der Informations- und Kommunikationstechnik (ICT)“ (“Strategy for the standardization of information and communication technology (ICT)”), DIN Berlin 2002, draft

... It is ... necessary to find a new concept relating to the still existing demand for consistency retention and interoperability. This concept can be described by means of the following premise: standardization must be considered in terms of the remaining heterogeneity. Only joint interaction between intellectual and automatic processes for the treatment of heterogeneity and standardization will produce a solution strategy which also ensures, under present-day marginal conditions, usable consistency and interoperability conditions.

(translation from German)

CARMEN and ViBSoz: Coping with heterogeneity by transfer components

Documents

Metadata

Retrieval

  • Coping with heterogeneity
  • Cross-concordances
  • Statistical transformation and neural networks
  • Deductive methods

extract metadata from various document formats algorithmically

Mathematics – Physics: MSC and PACS

statistical:

PACS 62.30.+d Mechanical and elastic waves; vibrations (Mechanische und elastische Wellen, Schwingungslehre)

MSC 74S15 Boundary element methods (Randelementmethode)

intellectual:

PACS 62. Not connected

Example: semantic-pragmatic relation

Simple search

Search term: Dominanz (dominance)

Number of relevant hits: 16

G. Binder


Extended search

Transfer terms (German descriptors): Dominanz, Messen, Mongolei, Nichtregierungsorganisation, Flugzeug, Datenaustausch, Kommunikationsraum, Kommunikationstechnologie, Medienpädagogik,

Number of additional relevant hits: 7

Share of relevant hits among the additional hits: 50%

  • Members of the association [email protected] travelled to the UN women's conference in Beijing. On the journey through Mongolia and the desert ...

G. Binder
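
The figures on the simple and extended search slides can be combined in a short calculation. This is only a sketch: it assumes that the 50% share is exact rather than rounded, and the variable names are made up for illustration.

```python
# Figures taken from the simple and extended search slides.
relevant_simple = 16        # relevant hits of the simple search
additional_relevant = 7     # additional relevant hits found via transfer terms
share_relevant = 0.50       # share of relevant hits among the additional hits

# Implied number of additional hits, relevant or not (assumes 50% is exact).
additional_hits = additional_relevant / share_relevant

# Relevant hits after the extended search, and the relative gain.
total_relevant = relevant_simple + additional_relevant
gain = additional_relevant / relevant_simple

print(f"additional hits: {additional_hits:.0f}")
print(f"relevant hits in total: {total_relevant}")
print(f"gain in relevant hits: {gain:.1%}")
```

Under these assumptions, the transfer terms bring in about as many irrelevant hits as relevant ones, while raising the number of relevant hits by well over a third.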

Standard method: one step transformation
  • question
  • non-differentiated handling of vagueness
  • document term sets A, B, C

Two step transformation
  • question
  • V1: handling of vagueness between questions and terms
  • V2/V3: bilateral handling of vagueness
  • document term sets A, B, C
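
The two-step idea can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the term names, the weights, the rule of multiplying weights along a path, and the reading that V2 links term set A to B and V3 links A to C. The slides do not specify any of these details.

```python
from collections import defaultdict

# Hypothetical illustration data: terms and weights are invented.
# V1: vagueness between the user's question terms and term set A.
V1 = {"q1": {"a1": 0.9, "a2": 0.4}}

# V2 / V3: bilateral transfer modules (assumed here: A -> B and A -> C).
V2_A_TO_B = {"a1": {"b1": 0.8}, "a2": {"b1": 0.5, "b2": 0.6}}
V3_A_TO_C = {"a1": {"c1": 0.7}}


def apply_mapping(weighted_terms, mapping):
    """Send weighted terms through one mapping, keeping the best weight per target."""
    result = defaultdict(float)
    for term, weight in weighted_terms.items():
        for target, strength in mapping.get(term, {}).items():
            result[target] = max(result[target], weight * strength)
    return dict(result)


def two_step(question_terms):
    """Step 1: question -> A via V1. Step 2: A -> B via V2 and A -> C via V3."""
    in_a = apply_mapping({term: 1.0 for term in question_terms}, V1)
    return {
        "A": in_a,
        "B": apply_mapping(in_a, V2_A_TO_B),
        "C": apply_mapping(in_a, V3_A_TO_C),
    }


queries = two_step(["q1"])
for term_set, weighted in queries.items():
    print(term_set, weighted)
```

Because each module only knows two vocabularies, a module can be replaced or tuned without touching the others, which is the practical appeal of the bilateral design.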

Figure (diagram, labels only): the query “Jugendarbeitslosigkeit” (youth unemployment) is passed to a broker, which serves three document sets A, B and C.

  • A: SWD, USB Köln, Thesaurus A, “X from A”
  • B: “Jugendlicher + Arbeitslosigkeit” (adolescent + unemployment), IZ Thesaurus, IZ Soz., Thesaurus B, “Y from B”
  • C: Thesaurus C, “Z from C”
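
A broker of this kind could rewrite one descriptor into the vocabulary of each document set before sending the query on. The sketch below is hypothetical: the cross-concordance table, the `broker` function and the `AND` query syntax are assumptions, and the entry for thesaurus C is invented since the diagram does not name its term.

```python
# Hypothetical cross-concordance from thesaurus A to thesauri B and C.
# A descriptor may map to several target descriptors, which are ANDed.
CROSS_CONCORDANCE = {
    "B": {"Jugendarbeitslosigkeit": ["Jugendlicher", "Arbeitslosigkeit"]},
    "C": {"Jugendarbeitslosigkeit": ["Youth unemployment"]},  # invented entry
}


def broker(descriptor, source="A"):
    """Return one query string per document set for a descriptor from `source`."""
    queries = {source: descriptor}
    for target, table in CROSS_CONCORDANCE.items():
        mapped = table.get(descriptor)
        # Fall back to the unchanged descriptor when no mapping is known.
        queries[target] = " AND ".join(mapped) if mapped else descriptor
    return queries


for target, query in broker("Jugendarbeitslosigkeit").items():
    print(f"{target}: {query}")
```

The fallback to the unchanged descriptor is one possible design choice; a real system might instead skip the document set or apply a statistical transfer.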

Statistical and neural network transformation
  • Co-occurrence-based similarity
    • In ViBSoz: statistical crosswalk between two different thesauri (SWD as a universal thesaurus and the IZ thesaurus for the social sciences as a special thesaurus),
    • in ELVIRA between a thesaurus for data and free text terms
  • Transformation networks
    • USB Thesaurus to the IZ Thesaurus
    • the USB Thesaurus or IZ Thesaurus to the IZ

Fig. 3: Transformation network USB Thesaurus to IZ Thesaurus (Fig. 7-12 from Mandl 2000:206). Precision-recall plot; series: LSI and transformation network, statistical methods.
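
A co-occurrence-based crosswalk can be illustrated with a toy corpus in which every document carries terms from both thesauri. The sketch is not the method used in ViBSoz or ELVIRA: the documents, the conditional-probability measure and the threshold are assumptions, since the slides do not state which similarity measure was used.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus: (terms from thesaurus 1, terms from thesaurus 2).
DOCS = [
    ({"Jugendarbeitslosigkeit"}, {"Jugendlicher", "Arbeitslosigkeit"}),
    ({"Jugendarbeitslosigkeit"}, {"Jugendlicher", "Arbeitslosigkeit", "Arbeitsmarkt"}),
    ({"Jugendarbeitslosigkeit", "Ausbildung"}, {"Jugendlicher", "Berufsbildung"}),
    ({"Ausbildung"}, {"Berufsbildung"}),
]


def crosswalk(docs, threshold=0.6):
    """Map each source term to target terms with P(target | source) >= threshold."""
    source_count = Counter()
    pair_count = Counter()
    for source_terms, target_terms in docs:
        source_count.update(source_terms)
        pair_count.update(product(source_terms, target_terms))

    table = {}
    for (source, target), n in pair_count.items():
        probability = n / source_count[source]
        if probability >= threshold:
            table.setdefault(source, {})[target] = probability
    return table


table = crosswalk(DOCS)
for source, targets in table.items():
    print(source, "->", targets)
```

Raising the threshold trades recall for precision: fewer target terms are proposed, but those that remain co-occur more reliably with the source term.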

Conclusion

Today's search engines do not adequately solve the problem of a worldwide search for relevant documents and data in a special scientific community. They represent only an incomplete, albeit valuable, first step. Users want to interlink literature and research project databases with the catalogues of virtual libraries, the WWW homepages of science institutions and fact sources, e.g. data archives with their survey data. Here, integration should not be performed only on a technical level or using solely intellectually created links, as is the case at present. A key role is played by automatic transfer between the different content analysis methods and standardizations of the document sets to be integrated. Based on the initial empirical results of different IZ projects, the proposed strategy appears to be highly promising: vagueness problems are not treated non-specifically as a single transfer between all documents and the query, but are handled in a cognitively plausible way by individual bilateral modules.
