An Integrated Approach to Mapping Worldwide Bioterrorism Research Capabilities
Presentation Transcript

    1. An Integrated Approach to Mapping Worldwide Bioterrorism Research Capabilities MIS580 Spring 2009 Artificial Intelligence Lab University of Arizona

    2. Outline: Introduction; Literature Review; Research Testbed; Research Design; Experimental Study; Conclusion and Directions; References

    3. Bioterrorism Research Since the anthrax attacks that followed 9/11, bioterrorism has been given a high priority in national security. Biomedical research that is essential to creating the medicines, vaccines, and technologies to counter the threat of bioterrorism and naturally occurring disease might also be applied to biological weapons development. The U.S. Government has attempted to monitor and control biomedical research labs, especially those that study bioterrorism agents/diseases. However, monitoring worldwide biomedical research and researchers remains an open problem.

    4. Research Literature Literature resources are very important to scientific research. In general, experimental labs collect literature in their own specialized domain. As scientific cooperation becomes more common, sharing literature becomes a prerequisite (Huang et al., 2003). With the rapid development of science and technology, the number of scientific publications is growing exponentially, and there is an overwhelming number of journal articles in various research areas (Bruijn and Martin, 2002; Cohen and Hersh, 2005). Research literature may therefore be used as a resource to monitor bioterrorism research.

    5. Research Objectives In this study, we develop an integrated approach to monitoring and analyzing worldwide bioterrorism research literature by using knowledge mapping techniques. Our objectives are to identify: researchers who have expertise in the bioterrorism agents/diseases research domain, major institutions and countries where these researchers reside, and emerging topics and trends in bioterrorism agents/diseases research.

    6. Bioterrorism Literature Analysis In recent years, biomedical informatics tools have been developed to protect populations from bioterrorism attacks (Kohane, 2002). Some previous studies used bibliometrics to examine terrorism research publications and offer a view of how the field has evolved (Kennedy and Lum, 2003; Reid, 1997). Hu et al. (2006) used a text mining approach to identify candidate viruses and bacteria as potential bioterrorism weapons from PubMed. However, few efforts have been made to monitor and analyze worldwide bioterrorism research and researchers, and few studies have used knowledge mapping techniques to analyze the research status of the bioterrorism area.

    7. Knowledge Mapping Three types of analysis are often adopted in knowledge mapping research: text mining, network analysis, and information visualization (Chen and Roco, 2008). Text mining consists of two significant classes of techniques: Natural Language Processing (NLP) and content analysis. Social network analysis (SNA) has provided the means for studying the network of productive scholars. Information visualization techniques can be used to turn abstract textual documents into objects that can be displayed.

    8. Natural Language Processing (NLP) In NLP, automatic indexing (Salton, 1989) is a method commonly used to represent the content of a document by means of a vector of keywords or terms (Chen and Roco, 2008). Noun-phrasing techniques can capture a richer linguistic representation of a document than the Bag of Words (BOW). Examples of noun-phrasing tools include MIT's Chopper, NPtool (Voutilainen, 1997), and the Arizona Noun Phraser (Tolle and Chen, 2000). Information extraction is another computationally effective method to identify important concepts from text documents (Chen and Roco, 2008). The best systems have been shown to achieve more than 90% in both precision and recall when extracting persons, locations, organizations, dates, times, currencies, and percentages from newspaper articles (Chinchor, 1998).
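As a rough illustration of the automatic indexing idea on the slide above, the following sketch turns a document into a keyword-frequency vector. It is a minimal bag-of-words example, not the indexing method used in the study; the stop-word list, the function name `index_document`, and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real indexers use larger ones.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "to", "is", "for", "by"}


def index_document(text):
    """Return a keyword -> frequency vector for a single document."""
    # Lowercase the text and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stop words and count what remains.
    return Counter(token for token in tokens if token not in STOP_WORDS)


# Made-up sample sentence, not taken from the test collection.
vector = index_document("Detection of anthrax spores and the treatment of anthrax")
print(vector)
```

A noun-phrasing tool would replace the single-word tokens here with multi-word phrases, which is why it can represent a document more richly than a plain bag of words.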

    9. Content Analysis By using content analysis, articles that are collected and grouped based on authors, institutions, topic areas, countries, or regions can be analyzed to identify the underlying themes, patterns, or trends (Chen and Roco, 2008). Popular content analysis techniques include: Clustering Algorithms, Self-Organizing Map (SOM), Multidimensional Scaling (MDS), Principal Component Analysis (PCA), Co-word Analysis, and PathFinder Network.

    10. Social Network Analysis (SNA) SNA is capable of detecting subgroups (of scholars), discovering their patterns of interaction, identifying central individuals, and uncovering network organization and structure (Chen and Roco, 2008). Burt (1976) applied hierarchical clustering methods based on a structural equivalence measure (Lorrain and White, 1971) to detect subgroups in a social network. The blockmodel analysis approach can be used to discover patterns of interaction between subgroups (Wasserman and Faust, 1994; Xu and Chen, 2005). Several measures, such as degree, betweenness, and closeness, are related to centrality, which deals with the roles of individuals in a network (Wasserman and Faust, 1994).
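Two of the centrality measures named on the slide above can be sketched in a few lines. The example below computes normalized degree centrality and closeness centrality for a small undirected graph stored as an adjacency dictionary. It assumes a connected graph, omits betweenness for brevity, and uses hypothetical function names and a made-up star-shaped network.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """(Number of reachable nodes) / (sum of shortest-path distances to them)."""
    result = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Made-up star network: researcher A has co-authored with B, C, and D.
graph = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(degree["A"], closeness["B"])
```

In this toy network A scores highest on both measures, which matches the intuition that a central individual is the one through whom the others are connected.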

    11. Information Visualization The last step in the knowledge mapping process is to make knowledge transparent through the use of various information visualization (or mapping) techniques (Chen and Roco, 2008). Shneiderman (1996) proposed seven types of information representation methods: 1D (one-dimensional) representation, 2D representation, 3D representation, multi-dimensional representation, tree representation, network representation, and temporal representation.

    12. Research Testbed We built two sets of test data, based on human-related and animal-related bioterrorism agents/diseases respectively. For human bioterrorism agents/diseases, we retrieved 178,599 publication records from MEDLINE (1964-2005) by searching article abstracts and titles using 58 keywords from the CDC's list of agents by category. For animal bioterrorism agents/diseases, we retrieved 135,774 publication records from MEDLINE (1965-2005) by searching article abstracts and titles using 58 keywords from the OIE's list of diseases by species.

    13. Human Agents/Diseases Research Collection

    14. Animal Agents/Diseases Research Collection

    15. Research Design

    16. Data Acquisition Research articles are retrieved from the MEDLINE database. Compiled by the U.S. National Library of Medicine (NLM) and published on the Web by Community of Science, MEDLINE is the world's most comprehensive source of life sciences and biomedical bibliographic information. It contains nearly eleven million records from over 7,300 different publications from 1965 to November 16, 2005. All the related articles are collected by using keyword filtering.
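The keyword filtering step can be pictured with a small sketch like the one below, which keeps a record when any keyword appears in its title or abstract. The record layout, the function name `matches_keywords`, the two keywords, and the sample records are all assumptions for illustration; the slides do not show the actual 58-keyword lists or the query mechanism.

```python
def matches_keywords(record, keywords):
    """True if any keyword occurs in the record's title or abstract."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    # Plain substring matching; a real query would likely handle word boundaries.
    return any(keyword.lower() in text for keyword in keywords)


# Hypothetical keywords and records, used only to exercise the filter.
keywords = ["anthrax", "smallpox"]
records = [
    {"pmid": "1", "title": "Anthrax vaccine trial", "abstract": "Immune response results."},
    {"pmid": "2", "title": "Crop yields", "abstract": "Wheat harvest data."},
    {"pmid": "3", "title": "Outbreak report", "abstract": "A smallpox case study."},
]
selected = [r["pmid"] for r in records if matches_keywords(r, keywords)]
print(selected)
```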

    17. Data Parsing and Cleaning Data Parsing: The title, abstract, and author information of each article are parsed and stored in a relational database. The institutions and countries of the authors are parsed out by using dictionaries of countries, states, cities, and institutions. All the author names of an article are parsed out, but only the first author's institution is kept for later analysis. Facts Consolidation: Some variations of foreign institution names and city names were spot-checked and fixed manually.
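A dictionary-based parse of the kind described above might look like the following sketch. It splits a free-text affiliation on commas, looks each part up in small lookup tables, and keeps a single institution and country per article (the first author's, as on the slide). The lookup tables, field names, and sample record are hypothetical; the dictionaries actually used in the study are not shown in the slides.

```python
# Hypothetical lookup tables standing in for the study's dictionaries.
COUNTRIES = {"usa": "USA", "united states": "USA", "japan": "Japan", "uk": "UK"}
INSTITUTION_HINTS = ("university", "institute", "college", "hospital")


def split_affiliation(affiliation):
    """Split an affiliation string into cleaned comma-separated parts."""
    return [p.strip().rstrip(".") for p in affiliation.split(",") if p.strip()]


def parse_country(affiliation):
    # The country usually comes last, so scan the parts from the end.
    for part in reversed(split_affiliation(affiliation)):
        country = COUNTRIES.get(part.lower())
        if country:
            return country
    return None


def parse_institution(affiliation):
    # Return the first part that looks like an institution name.
    for part in split_affiliation(affiliation):
        if any(hint in part.lower() for hint in INSTITUTION_HINTS):
            return part
    return None


def parse_record(raw):
    """Extract title, all author names, and one institution/country."""
    authors = [a.strip() for a in raw["authors"].split(";") if a.strip()]
    return {
        "title": raw["title"].strip(),
        "authors": authors,
        "institution": parse_institution(raw["affiliation"]),
        "country": parse_country(raw["affiliation"]),
    }


# Made-up record for illustration.
raw = {
    "title": " Example article on tularemia ",
    "authors": "Doe J; Roe R",
    "affiliation": "Department of Microbiology, Example University, Tucson, AZ, USA.",
}
parsed = parse_record(raw)
print(parsed)
```

Name variants that the dictionaries miss come back as `None` here, which is the kind of case the manual spot-checking step would then handle.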

    18. Data Analysis Productivity Status: We use bibliographic analysis to study the productivity of authors, institutions, and countries. We also assess the trends and evolution of bioterrorism agents/diseases research activities. Collaboration Status: We use co-authorship analysis to study the collaborations between researchers. We also detect the independent or isolated research groups in the field. Research Trend Topics: We use SOMs to study the active research topics and to discover the emerging research topics in different time spans.
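The productivity part of this analysis comes down to counting publications along different dimensions. The sketch below tallies records per author, per country, and per year. The record fields and sample data are assumptions for illustration, and the country is taken to be the one parsed for the first author, in line with the parsing step.

```python
from collections import Counter


def productivity(records):
    """Count publications per author, per country, and per year."""
    by_author, by_country, by_year = Counter(), Counter(), Counter()
    for record in records:
        # Each author on a paper gets credit for one publication.
        by_author.update(set(record["authors"]))
        if record.get("country"):
            by_country[record["country"]] += 1
        by_year[record["year"]] += 1
    return by_author, by_country, by_year


# Made-up records for illustration.
records = [
    {"year": 2001, "authors": ["Doe J", "Roe R"], "country": "USA"},
    {"year": 2002, "authors": ["Doe J"], "country": "USA"},
    {"year": 2002, "authors": ["Sato K"], "country": "Japan"},
]
by_author, by_country, by_year = productivity(records)
print(by_author.most_common(1), by_country.most_common(1), sorted(by_year.items()))
```

Sorting the yearly counts gives a simple view of how research activity changes over time.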

    19. Experimental Study Human Agents/Diseases Research: Productivity Status, Collaboration Status, Emerging Topics. Animal Agents/Diseases Research: Productivity Status, Collaboration Status, Emerging Topics.

    20. Human Agents/Diseases Research: Productivity Status (Country Level)

    21. Human Agents/Diseases Research: Productivity Status (Institution Level)

    22. Human Agents/Diseases Research: Productivity Status (Researcher Level)

    23. Human Agents/Diseases Research: Productivity Status (Researcher Level)

    24. Human Agents/Diseases Research: Collaboration Status Different collaboration networks for authors are generated based on different agents/diseases and regions. Each node in the network represents an individual researcher. The bigger the node, the more publications the researcher has published. A link between two researchers means that they have published one or more scientific articles together. The thicker the link, the more articles the two authors have published together. We included only researchers who published more than five articles.
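The network described above can be sketched as two tallies: a publication count per researcher (node size) and a co-authored article count per pair (link thickness), restricted to researchers with more than five articles. The function name, the `min_articles` parameter, and the sample data are assumptions for illustration and are not taken from the study's implementation.

```python
from collections import Counter
from itertools import combinations


def build_coauthor_network(papers, min_articles=5):
    """Return (nodes, links) for authors with more than min_articles papers.

    nodes maps author -> number of publications (node size).
    links maps (author_a, author_b) -> number of joint articles (link weight).
    """
    publications = Counter()
    for authors in papers:
        publications.update(set(authors))
    kept = {a for a, count in publications.items() if count > min_articles}

    links = Counter()
    for authors in papers:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(set(authors) & kept), 2):
            links[pair] += 1

    nodes = {author: publications[author] for author in kept}
    return nodes, links


# Made-up author lists: A and B wrote six papers together, A and C one.
papers = [["A", "B"]] * 6 + [["A", "C"]]
nodes, links = build_coauthor_network(papers)
print(nodes, dict(links))
```

Researcher C drops out because a single article falls below the threshold, so the resulting network contains only A, B, and the link between them.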

    25. Human Agents/Diseases Research: Collaboration Status (by Disease)

    26. Human Agents/Diseases Research: Collaboration Status (by Region)

    27. Human Agents/Diseases Research: Emerging Topics Content map analysis was used to identify the emerging topics and trends. The nodes in the folder tree and the colored regions are topics extracted from research papers. The topics are organized by the multi-level self-organizing map algorithm. Conceptually closer topics (according to co-occurrence patterns) are positioned closer together on the map. The number of papers belonging to each topic is shown after the topic label. The size of each topic region also corresponds to the number of documents assigned to that topic. Region color indicates the growth rate of the associated topic: the warmer the color, the higher the growth rate. The growth rate is defined as the number of articles published in the previous time period / the number of articles published in the following time period for a particular topic (region).

    28. Human Agents/Diseases Research: Emerging Topics

    29. Animal Agents/Diseases Research: Productivity Status (Country Level)

    30. Animal Agents/Diseases Research: Productivity Status (Institution Level)

    31. Animal Agents/Diseases Research: Productivity Status (Researcher Level)

    32. Animal Agents/Diseases Research: Collaboration Status

    33. Animal Agents/Diseases Research: Emerging Topics

    34. Conclusion and Future Directions Monitoring worldwide bioterrorism research is becoming increasingly important and urgent. In this study, we built an integrated approach to mapping worldwide bioterrorism literature and capabilities. We analyzed the productivity status, collaboration status, and emerging topics by using knowledge mapping techniques. In the future, we plan to monitor and analyze more bioterrorism agents/diseases together with more literature sources. We also plan to develop and incorporate more advanced analysis and visualization techniques into our approach.

    35. References Bruijn, B. d. and J. Martin (2002). "Literature Mining in Molecular Biology." Proceedings of the EFMI Workshop on Natural Language Processing in Biomedical Applications. Nicosia, Cyprus. March 8-9: 1-5. Burt, R. S. (1976). "Positions in Networks." Social Forces 55(1): 93-122. Chen, H. and M. Roco (2008). Mapping Nanotechnology Innovations and Knowledge: Global, Longitudinal Patent and Literature Analysis. Chinchor, N. (1998). "MUC-7 test scores introduction." In Proceedings of the Seventh Message Understanding Conference. Cohen, A. M. and W. R. Hersh (2005). "A Survey of Current Work in Biomedical Text Mining." Briefings in Bioinformatics 6(1): 57-71. Hu, X., X. Zhang, et al. (2006). Text Mining the Biomedical Literature for Identification of Potential Virus / Bacterium as Bioterrorism Weapons, Springer.

    36. References Huang, L., W. Chen, et al. (2003). "Literature Resource Portal Based on Virtual and Dynamic Hierarchical Architecture." Proceedings of the Fifth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'03) 17(3): 329-347. Kennedy, L. W. and C. M. Lum (2003). Developing a Foundation for Policy Relevant Terrorism Research in Criminology, New Brunswick, Rutgers University. Kohane, I. S. (2002). "The Contributions of Biomedical Informatics to the Fight Against Bioterrorism." Biomedical Informatics and Bioterrorism 9: 116-119. Lorrain, F. and H. C. White (1971). "Structural equivalence of individuals in social networks." Journal of Mathematical Sociology 1: 49-80. Reid, E. O. F. (1997). "Evolution of a body of knowledge: an analysis of terrorism research." Information Processing and Management 33(1): 91-106. Salton, G. (1989). Automatic Text Processing: the Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley Longman Publishing Co., Inc.

    37. References Shneiderman, B. (1996). "The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations." In Proceedings of the IEEE Symposium on Visual Languages, Washington, IEEE Computer Society Press. Tolle, K. M. and H. Chen (2000). "Comparing Noun Phrasing Techniques for Use with Medical Digital Library Tools." Journal of the American Society for Information Science 51(4): 352-370. Voutilainen, A. (1997). "A Short Introduction to NPtool." In Wasserman, S. and K. Faust (1994). Social Network Analysis: Methods and Applications, Cambridge: Cambridge University Press. Xu, J. J. and H. Chen (2005b). "Criminal Network Analysis and Visualization." Communications of the ACM 48(6): 101-107.