Joe Zucca UPenn PPT

1. Notes: This presentation looks at the Penn Library Data Farm project, a six-year effort to think about and build systems that support the decision-making and assessment needs of a large research library. The talk has two parts. The first describes the current state of affairs with the Data Farm: after several years, we've learned what works well and which challenges are particularly tenacious. That leads into the second part: how can we apply the lessons learned, and how would we proceed if we were beginning anew?
Independently of this outline (where are we, and where might we go?), there is an overarching theme: the resources needed to build decision-support frameworks, to support assessment and intelligence gathering, if you will, are present in the architecture of our digital systems and in the profusion of data and data structures that make up, surround, and connect to these systems. Data structures, data manipulations (in the engineering sense rather than the statistical one), and our efforts to exploit them are the underlying themes of this talk.

2. A report like this one on Penn's Biology fund uses color codes to represent the grandparent, parent, and child funds, and shows the title-level purchases, the publisher, the split among other funds, various dollar amounts, and relevant dates. Here the output represents a third data structure in the chain, and I hasten to note that it is a very processed but still raw data file. If the consumer of this report is interested in producing a time series or something more complicated, that capability is available in Excel, and she's free to ask a number of different questions... If she later discovers something she didn't know at the start that she wanted to know, the data structure will, hopefully, be amenable to further manipulation. If not in Excel, then maybe at the Data Farm level...
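A time series of the kind mentioned above can be derived from the raw report rows with very little code. This is a minimal sketch, assuming each report row carries a fund name, a fiscal year, and an amount; the field layout, fund names, and values are hypothetical and do not reflect the actual Data Farm report.

```python
from collections import defaultdict

# Hypothetical rows from a fund report: (child fund, fiscal year, amount spent).
# Layout and values are illustrative only.
rows = [
    ("Biology-Books", 2007, 1200.00),
    ("Biology-Books", 2008, 950.50),
    ("Biology-Serials", 2007, 4300.00),
    ("Biology-Serials", 2008, 4710.25),
    ("Biology-Books", 2008, 49.50),
]


def spending_time_series(rows):
    """Sum amounts per fund per fiscal year; return {fund: [(year, total), ...]} sorted by year."""
    totals = defaultdict(lambda: defaultdict(float))
    for fund, year, amount in rows:
        totals[fund][year] += amount
    return {fund: sorted(by_year.items()) for fund, by_year in totals.items()}


series = spending_time_series(rows)
for fund, points in sorted(series.items()):
    print(fund, points)
```

Because the rows stay at title-level granularity, the same input can be regrouped later (by publisher, by parent fund) without going back to the source system.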

3. As in this case, where selectors came to us after the report builder was designed and asked to be able to pivot funds by publisher. This required a small alteration to the SQL in the report script and some modifications to the formatting code for Excel.
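A change like this amounts to adding the publisher to the grouping and then spreading the result into a grid. The sketch below uses Python's built-in `sqlite3` with a hypothetical `fund_report` table; the table, columns, and publisher names are assumptions for illustration, not the report script's actual SQL or schema.

```python
import sqlite3

# Hypothetical report table; column names are illustrative, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fund_report (fund TEXT, publisher TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fund_report VALUES (?, ?, ?)",
    [
        ("Biology-Books", "Publisher A", 300.0),
        ("Biology-Books", "Publisher B", 150.0),
        ("Biology-Serials", "Publisher A", 2000.0),
        ("Biology-Books", "Publisher A", 100.0),
    ],
)

# The "small alteration": group by publisher as well as by fund.
query = """
    SELECT fund, publisher, SUM(amount)
    FROM fund_report
    GROUP BY fund, publisher
    ORDER BY fund, publisher
"""
rows = conn.execute(query).fetchall()

# Pivot into a fund x publisher grid (missing cells become 0.0), ready for a spreadsheet.
publishers = sorted({pub for _, pub, _ in rows})
pivot = {}
for fund, pub, total in rows:
    pivot.setdefault(fund, dict.fromkeys(publishers, 0.0))[pub] = total

print(["fund"] + publishers)
for fund in sorted(pivot):
    print([fund] + [pivot[fund][p] for p in publishers])
```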

4. Restructure it in the Data Farm as a simpler table that can reproduce the fund hierarchies resident in Voyager, but comprehensively, unlike Voyager's own fund reporting clients. The result is...
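One way to make a simple table reproduce a fund hierarchy is to store each fund together with its full ancestry. This is a minimal sketch, assuming a hypothetical adjacency list of fund-to-parent links rather than Voyager's actual ledger tables; the fund names and the three-level depth are illustrative.

```python
# Hypothetical fund adjacency list: fund -> parent (None for a top-level fund).
# Names and shape are illustrative, not Voyager's actual ledger schema.
parents = {
    "Sciences": None,
    "Biology": "Sciences",
    "Biology-Books": "Biology",
    "Biology-Serials": "Biology",
    "Chemistry": "Sciences",
}


def ancestry(fund, parents):
    """Return the path from the top-level fund down to `fund`."""
    path, seen = [], set()
    while fund is not None:
        if fund in seen:  # guard against a cyclic hierarchy in dirty source data
            raise ValueError(f"cycle detected at fund {fund!r}")
        seen.add(fund)
        path.append(fund)
        fund = parents[fund]
    return list(reversed(path))


def flatten(parents, depth=3):
    """One row per fund as (grandparent, parent, child); keeps the last `depth` levels
    and pads shallower funds with None on the left."""
    result = []
    for fund in sorted(parents):
        path = ancestry(fund, parents)[-depth:]
        result.append(tuple([None] * (depth - len(path)) + path))
    return result


rows = flatten(parents)
for row in rows:
    print(row)
```

With one row per fund, any level of the hierarchy can be rolled up with a plain `GROUP BY` on the grandparent or parent column, which is the kind of comprehensive view the slide contrasts with Voyager's own clients.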

5. I began by saying this talk is about experiments we've been doing to repurpose the data that live around us, and that the Data Farm is, in the first place, about finding and leveraging those data structures. As I mentioned, these can live in logs or other highly structured text files, or they can be databases. These data structures should command attention before we consider what statistics the organization needs, at least from the perspective of developing intelligence frameworks. So let me drill in a little closer and look at just one example of how this restructuring is operationalized in the Data Farm, because it raises many issues for such projects: issues that are both tactical (that is, how work is accomplished) and strategic (why we do it the way we do).
Here are the relationships in the Voyager database that are necessary to track fund expenditures for books, serials, and e-resources. We can envision the data structure as a diagram or as an SQL query. We use this set of relationships to harvest specific data from Voyager and...
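Expressed as an SQL query, such a set of relationships becomes a join that yields one flat row per expenditure. The sketch below is illustrative only: the three tables (`fund`, `bib`, `line_item`) and their columns are hypothetical stand-ins for the much larger set of Voyager tables involved, and an in-memory SQLite database replaces the production system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the Voyager tables involved; not the real schema.
    CREATE TABLE fund (fund_id INTEGER PRIMARY KEY, fund_name TEXT, parent_id INTEGER);
    CREATE TABLE bib (bib_id INTEGER PRIMARY KEY, title TEXT, publisher TEXT, format TEXT);
    CREATE TABLE line_item (
        line_id INTEGER PRIMARY KEY,
        bib_id INTEGER REFERENCES bib(bib_id),
        fund_id INTEGER REFERENCES fund(fund_id),
        amount REAL,
        paid_date TEXT
    );
    INSERT INTO fund VALUES (1, 'Biology', NULL), (2, 'Biology-Books', 1);
    INSERT INTO bib VALUES (10, 'Example Title', 'Publisher A', 'book');
    INSERT INTO line_item VALUES (100, 10, 2, 85.0, '2008-03-01');
""")

# Harvest query: one row per paid line item, carrying fund, parent fund, and title data.
harvest_sql = """
    SELECT f.fund_name, p.fund_name AS parent_fund, b.title, b.publisher,
           b.format, li.amount, li.paid_date
    FROM line_item AS li
    JOIN fund AS f ON f.fund_id = li.fund_id
    LEFT JOIN fund AS p ON p.fund_id = f.parent_id
    JOIN bib AS b ON b.bib_id = li.bib_id
"""
harvested = conn.execute(harvest_sql).fetchall()
for row in harvested:
    print(row)
```

The `LEFT JOIN` on the parent fund keeps top-level funds that have no parent; an inner join would silently drop their expenditures.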

6. Here's the high-altitude plan of the Data Farm. We begin in the environment where service occurs and identify many potential data sources. They're the systems behind our catalogs, circulation services, and fund accounting (systems like Voyager or III); they include our web service logs, link resolvers, and ERMs (ered is a home-grown ERM and web page delivery application); they include third-party systems that might operate at the consortial level (that's what Sirsi-Dynix is to us); and they include the human-driven services of research and instruction. In the Data Farm setting, we harvest a great deal of data from this service sphere, as well as information about people and networks in our community.
Processes in the management information sphere (the Data Farm environment) capture, clean, normalize, and anonymize these sources and feed them into a central repository, which for us is a relational database. On top of the repository we then build various kinds of tools that enable staff to acquire and interact with data. The analytical work of staff, and the decisions they make, feed back into the service environment through planning, collection development, staff deployment, and other processes.
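The capture, clean, normalize, and anonymize steps can be sketched for a single source. This is a minimal sketch under stated assumptions: the raw record fields and date format are hypothetical, anonymization by a keyed HMAC-SHA256 hash is an illustrative choice rather than the Data Farm's documented method, and an in-memory SQLite database stands in for the central relational repository.

```python
import hashlib
import hmac
import sqlite3
from datetime import datetime
from typing import Optional

# Assumption: a secret key held only by the load process (hypothetical value).
SECRET_KEY = b"replace-with-a-real-secret"


def anonymize(patron_id: str) -> str:
    """Replace a patron ID with a keyed hash: records stay linkable but the ID is not stored."""
    return hmac.new(SECRET_KEY, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()


def normalize(raw: dict) -> Optional[tuple]:
    """Clean one raw circulation record; return None for records that cannot be used."""
    patron = (raw.get("patron_id") or "").strip()
    if not patron:
        return None  # drop records with no patron to anonymize
    try:
        when = datetime.strptime(raw["date"].strip(), "%m/%d/%Y").date().isoformat()
    except (KeyError, ValueError, AttributeError):
        return None  # drop records with a missing or malformed date
    dept = (raw.get("department") or "unknown").strip().lower()
    return (anonymize(patron), dept, when)


# Hypothetical raw extract, including two records the cleaning step should reject.
raw_records = [
    {"patron_id": " 12345 ", "department": "Biology ", "date": "03/01/2008"},
    {"patron_id": "12345", "department": "biology", "date": "03/02/2008"},
    {"patron_id": "", "department": "Chemistry", "date": "03/02/2008"},
    {"patron_id": "67890", "department": "Chemistry", "date": "not a date"},
]

repo = sqlite3.connect(":memory:")  # stands in for the central relational repository
repo.execute("CREATE TABLE circ_event (patron_hash TEXT, department TEXT, event_date TEXT)")
clean = [r for r in map(normalize, raw_records) if r is not None]
repo.executemany("INSERT INTO circ_event VALUES (?, ?, ?)", clean)

count, = repo.execute("SELECT COUNT(*) FROM circ_event").fetchone()
patrons, = repo.execute("SELECT COUNT(DISTINCT patron_hash) FROM circ_event").fetchone()
print(count, patrons)  # 2 1
```

A keyed hash is used here rather than a plain hash because short numeric IDs hashed without a secret could be recovered by brute force; the same patron still maps to the same value, so activity can be counted per person without storing who that person is.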

7. Just to try to represent the scale of the Data Farm...

25. Tommy, does fund resolution have to happen in this event, or can there be an expenditure event that links to it by an ID, such as the item ID? Could we pull the circ event into the DF Oracle database, where tables are maintained for such linkages? Some events may be composites of composites: spending money is one event, and circulating the purchased item is a separate event. Are there primary and secondary event processes that allow event XMLs to be layered? This could come up often in courseware, where you have a Blackboard use event, a reference event, and expenditure events relative to both.
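The layering question can be made concrete by keeping primary events separate and composing them by a shared key. This is a minimal sketch, assuming hypothetical event XML shapes and an `item_id` attribute as the linking key; it uses Python's standard `xml.etree.ElementTree` and is one possible design, not the Data Farm's actual event format.

```python
import xml.etree.ElementTree as ET

# Hypothetical primary events, each serialized independently as XML.
expenditure_xml = (
    '<event type="expenditure" item_id="i42">'
    "<fund>Biology-Books</fund><amount>85.00</amount></event>"
)
circ_xmls = [
    '<event type="circulation" item_id="i42"><date>2008-04-02</date></event>',
    '<event type="circulation" item_id="i42"><date>2008-05-10</date></event>',
    '<event type="circulation" item_id="i99"><date>2008-05-11</date></event>',
]


def compose(primary_xml, secondary_xmls, key="item_id"):
    """Wrap a primary event and every secondary event sharing its key in a composite element."""
    primary = ET.fromstring(primary_xml)
    composite = ET.Element("composite", {key: primary.get(key)})
    composite.append(primary)
    for text in secondary_xmls:
        event = ET.fromstring(text)
        if event.get(key) == primary.get(key):
            composite.append(event)
    return composite


composite = compose(expenditure_xml, circ_xmls)
print(ET.tostring(composite, encoding="unicode"))
```

Because the composite carries the same `item_id` attribute, its serialized form can itself be passed back to `compose` as a primary event, which is one way to get composites of composites (for example, adding a courseware use event keyed to the same item).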


Presentation Transcript

Joe Zucca Assessment, Planning and Publications Librarian University of Pennsylvania

Building Frameworks of Organizational Intelligence Joe Zucca Director for Planning and Communication University