Joe Zucca UPenn PPT

Presentation Transcript

    1. Notes: This presentation looks at the Penn Library Data Farm project, a six-year effort to think about and build systems that support the decision-making and assessment needs of a large research library. There are two parts to the talk. The first describes the current state of affairs with the Data Farm: after several years we've learned what works well and what challenges are particularly tenacious. That leads into the second part of the talk: how can we apply the lessons learned, and how would we proceed if we were beginning anew?
Independently of this outline--where are we and where might we go--there is an overarching theme: namely, that the resources needed to build decision-support frameworks, to support assessment and intelligence gathering, if you will, are present in the architecture of our digital systems and in the profusion of data and data structures that make up, surround, and connect to these systems. Data structures, data manipulations (in the engineering sense rather than the statistical), and our efforts to exploit them are the underlying themes of this talk.

    2. A report like this on Penn's Biology fund, with color codes that represent the grandparent, parent and child funds, the title-level purchases, publisher, split among other funds, various dollar amounts, and relevant dates. Here the output represents a third data structure in the chain and, I hasten to note, a very processed but still raw data file. If the consumer of this report is interested in producing a time series or something more complicated, that capability is available in Excel, and she's free to ask a number of different questions. If she discovers something later that she didn't know she wanted to know at the start, the data structure--hopefully--will be amenable to further manipulation, if not in Excel then maybe at the Data Farm level.

    3. As in this case, where selectors came to us after the report builder was designed and asked to be able to pivot funds by publisher. This required a small alteration to the SQL in the report script and some modifications to the formatting code for Excel.
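
As an illustration of the kind of change involved in pivoting funds by publisher, here is a minimal sketch. It is not the report script itself: the `fund_purchases` table, its columns, and the sample rows are hypothetical, and SQLite stands in for whatever database the report builder actually queries.

```python
import sqlite3
from collections import defaultdict


def pivot_funds_by_publisher(conn):
    """Return {publisher: {fund: total_spent}} from a hypothetical purchases table."""
    rows = conn.execute(
        "SELECT publisher, fund, SUM(amount) "
        "FROM fund_purchases "
        "GROUP BY publisher, fund "
        "ORDER BY publisher, fund"
    )
    pivot = defaultdict(dict)
    for publisher, fund, total in rows:
        pivot[publisher][fund] = total
    return dict(pivot)


def demo_connection():
    """Build an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fund_purchases (fund TEXT, publisher TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO fund_purchases VALUES (?, ?, ?)",
        [
            ("Biology", "Publisher A", 100.0),
            ("Biology", "Publisher B", 50.0),
            ("Chemistry", "Publisher A", 25.0),
            ("Biology", "Publisher A", 10.0),
        ],
    )
    return conn


if __name__ == "__main__":
    for publisher, funds in pivot_funds_by_publisher(demo_connection()).items():
        print(publisher, funds)
```

Grouping on publisher first and fund second is the "small alteration" in this sketch; laying the result out as a spreadsheet would be a separate formatting step.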

    4. Re-structure it in Data Farm as a simpler table that can reproduce the fund hierarchies resident in Voyager, but in a comprehensive way, unlike Voyager's own fund reporting clients. The result is
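
One way to picture a "simpler table" that reproduces a fund hierarchy is to flatten parent pointers into one row per child fund. The sketch below assumes a hypothetical mapping of each fund to its parent; it does not reflect Voyager's actual schema, and every fund name other than Biology is made up.

```python
def flatten_fund_hierarchy(parent_of):
    """Return one (grandparent, parent, child) row per leaf fund.

    parent_of maps each fund name to its parent fund, or to None
    for a top-level fund. The mapping itself is a hypothetical input.
    """
    funds_with_children = {p for p in parent_of.values() if p is not None}
    rows = []
    for fund in sorted(parent_of):
        if fund in funds_with_children:
            continue  # only leaf funds get their own row
        parent = parent_of.get(fund)
        grandparent = parent_of.get(parent) if parent is not None else None
        rows.append((grandparent, parent, fund))
    return rows


if __name__ == "__main__":
    hierarchy = {
        "Sciences": None,
        "Life Sciences": "Sciences",
        "Biology": "Life Sciences",
        "Botany": "Life Sciences",
    }
    for row in flatten_fund_hierarchy(hierarchy):
        print(row)
```

Each row carries its full ancestry, so a report can group or color-code by any level without walking the hierarchy again.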

    5. I began by saying this talk is about experiments we've been doing to repurpose the data that live around us, that Data Farm is in the first place about finding and leveraging the data structures that live around us. As I mentioned, these can live in logs or other highly structured text files, or they can be databases. I also said that these data structures should command attention before we consider what statistics the organization needs, at least from the perspective of developing intelligence frameworks. So let me drill in a little closer and look at just one example of how this re-structuring is operationalized in Data Farm, because it raises lots of issues for such projects, issues that are both tactical--that is, how work is accomplished--and strategic--why we are doing this in the manner we do.
Here are relationships in the Voyager database that are necessary to track fund expenditures for books, serials and e-resources. We can envision the data structure as a diagram or as an SQL query. We use this set of relationships to harvest specific data from Voyager and

    6. Here's the high-altitude plan of Data Farm. We begin in the environment where service occurs and identify lots of potential data sources. They're the systems behind our catalogs, circulation services, and fund accounting (systems like Voyager or III); they include our web service logs, link resolvers, and ERMs (ered is a home-grown ERM and web page delivery application); they include third-party systems that might operate at the consortial level (that's what Sirsi-Dynix is to us); and they include the human-driven services of research and instruction. In the Data Farm setting, we harvest lots of data from this service sphere as well as information about people and networks in our community.
Processes in the management info sphere (the Data Farm environment) capture, clean, normalize and anonymize these sources and feed them into a central repository, which for us is a relational database. We then build on top of the repository various kinds of tools to enable staff to acquire and interact with data. The analytical work of staff and the decisions they make feed back into the service environment through planning, collection development, staff deployment and other processes.
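
The capture, clean, normalize and anonymize steps might look something like the sketch below. This is an assumption-laden illustration, not the Data Farm code: the field names, the secret key, and the use of a keyed hash are all hypothetical, and a keyed hash is strictly pseudonymization, which may or may not match what the project means by anonymizing.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored outside the code.
SECRET_KEY = b"replace-with-a-private-key"

# Hypothetical field names for a harvested record.
REQUIRED_FIELDS = ("source", "resource", "patron_id")


def anonymize_id(patron_id, key=SECRET_KEY):
    """Replace a patron identifier with a keyed SHA-256 token."""
    digest = hmac.new(key, patron_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def normalize_event(raw):
    """Trim whitespace, lower-case the source name, and tokenize the patron."""
    return {
        "source": raw["source"].strip().lower(),
        "resource": raw["resource"].strip(),
        "patron": anonymize_id(raw["patron_id"].strip()),
    }


def load(records):
    """Drop incomplete records and return the cleaned rows for the repository."""
    cleaned = []
    for raw in records:
        if not all(raw.get(field, "").strip() for field in REQUIRED_FIELDS):
            continue  # cleaning step: skip records missing a required field
        cleaned.append(normalize_event(raw))
    return cleaned
```

Because the same patron always maps to the same token, usage from different sources can still be linked in the repository without storing the original identifier.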

    7. Just to try to represent the scale of Data Farm

    25. Tommy, does fund resolution have to happen in this event, or can there be an expenditure event that links to this by an id, like item? Could we pull the circ event into DF Oracle, where tables are maintained for such linkages? Some events may be composites of composites, if spending money is an event and circulating the purchased thing is a separate event. Are there primary and secondary event processes that allow the layering of event XMLs? This could come up often in courseware, where you have a Blackboard use event, a reference event, and expenditure events relative to both.
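
The linkage asked about here, separate expenditure and circulation events joined by a shared item id, could be sketched as follows. The event shapes and the `item_id` key are hypothetical; the note leaves open whether events are actually structured this way.

```python
from collections import defaultdict


def link_events(expenditures, circulations):
    """Attach circulation events to the expenditure event with the same item_id.

    Both arguments are lists of dicts that carry an "item_id" key
    (a hypothetical linking field).
    """
    circ_by_item = defaultdict(list)
    for circ in circulations:
        circ_by_item[circ["item_id"]].append(circ)
    return [
        {**exp, "circulations": circ_by_item.get(exp["item_id"], [])}
        for exp in expenditures
    ]


if __name__ == "__main__":
    spent = [{"item_id": 1, "fund": "Biology", "amount": 80.0}]
    circs = [
        {"item_id": 1, "date": "2008-01-15"},
        {"item_id": 1, "date": "2008-03-02"},
        {"item_id": 2, "date": "2008-02-10"},
    ]
    print(link_events(spent, circs))
```

Keeping the two event types separate and joining on an id means fund resolution does not have to happen inside the circulation event itself.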