Monday, May 16, 2011

Digital Public Library of America meeting, Amsterdam


Below are my rough notes from the first day of meetings held around the Digital Public Library of America (meeting agenda). CAVEAT: These are on-the-fly notes and may not accurately represent the ideas of the speakers.

Day 1
15 May 2011

Maura Marx (OKC)
Lucie Guibault (U of Amsterdam)

John Palfrey (Harvard)
Discussion of topics. One of the key themes is interoperability: the system that gets built must be interoperable not only in a technical sense, but also theoretically and at the institutional level. It should interoperate not only within the US, but also with other large national projects.

Dan Brickley (U of Amsterdam)
Quick overview of some of the buzz terms and words related to linked data.
- linked data
- semantic web

RDF first appeared in 1997 as a way of describing content (metadata). It started during the browser wars and fell afoul of the Microsoft and Netscape agendas, then went underground at the W3C until about 2000. DARPA funding brought it back out of hibernation, and it began to be used in academia. Tim Berners-Lee then put forward the "semantic web" concept to describe the thinking behind it.

The "semantic web" concept was perceived as too academic: more a research field than something for building practical things. Around 2005 a group of people began doing things like FOAF ("friend of a friend"). Tim Berners-Lee later (in a 2006 design note) reframed the work as linked data, with the following principles:

- use URIs as names for things
- use HTTP URIs so that people can look up those names
- when someone looks up a URI, provide useful, machine-friendly information
- include links to other URIs
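To make the four principles concrete, here is a minimal sketch that simulates them in memory rather than over the network. The `WEB` dictionary stands in for HTTP dereferencing, and all example.org URIs and property names (`creator`, `sameAs`, etc.) are hypothetical; a real linked data client would issue HTTP GET requests and parse RDF.

```python
# Hypothetical stand-in for the web: each HTTP URI (principles 1 and 2)
# dereferences to a machine-friendly description (principle 3) whose values
# may themselves be URIs pointing elsewhere (principle 4).
WEB = {
    "http://example.org/book/1": {
        "title": "Moby-Dick",
        "creator": "http://example.org/person/melville",
    },
    "http://example.org/person/melville": {
        "name": "Herman Melville",
        "sameAs": "http://example.org/authority/melville",
    },
    "http://example.org/authority/melville": {
        "label": "Melville, Herman, 1819-1891",
    },
}


def dereference(uri):
    """Look up a URI and return its description (empty if nothing is served)."""
    return WEB.get(uri, {})


def follow_links(start):
    """Collect every URI reachable from `start` by following URI-valued properties."""
    seen, queue = set(), [start]
    while queue:
        uri = queue.pop()
        if uri in seen:
            continue
        seen.add(uri)
        for value in dereference(uri).values():
            if isinstance(value, str) and value.startswith("http://"):
                queue.append(value)
    return seen


print(sorted(follow_links("http://example.org/book/1")))
```

Starting from a single book URI, the links lead to the person and on to an authority record; that chaining is what the fourth principle buys you.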

How does the abstract world of things and data get refactored into the real world of specific things and linkages? Over the past few years, the work has been to find the appropriate levels of description for things.

What is the relationship between ontologies and the definitions of properties and relationships? Ontologies let you say things like "a car is not a book."
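A statement like "a car is not a book" can be expressed in OWL with `owl:disjointWith`, and a reasoner can then flag resources typed as both. The sketch below does that one check by hand over plain Python triples; the example.org classes and items are hypothetical, and a real system would use an OWL reasoner rather than this hand-rolled loop.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_DISJOINT = "http://www.w3.org/2002/07/owl#disjointWith"
EX = "http://example.org/"  # hypothetical namespace

triples = {
    (EX + "Car", OWL_DISJOINT, EX + "Book"),  # "a car is not a book"
    (EX + "item/42", RDF_TYPE, EX + "Book"),
    (EX + "item/42", RDF_TYPE, EX + "Car"),   # deliberate modeling error
    (EX + "item/7", RDF_TYPE, EX + "Book"),
}


def disjointness_violations(graph):
    """Return resources typed with two classes declared disjoint."""
    disjoint = {(s, o) for s, p, o in graph if p == OWL_DISJOINT}
    disjoint |= {(o, s) for s, o in disjoint}  # disjointness is symmetric
    types = {}
    for s, p, o in graph:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return sorted(
        resource
        for resource, classes in types.items()
        if any((a, b) in disjoint for a in classes for b in classes)
    )


print(disjointness_violations(triples))  # ['http://example.org/item/42']
```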

How can you share descriptions of the same things when they use different vocabularies? From the book world, FRBR provides some of the context for doing that via ontologies.


First Panel
Linked Data and Interoperability in Europeana
Moderator: Martin Kalfatovic

Paul Keller, Kennisland
Project is part of EuropeanaConnect.

Spent the last two years working on a licensing framework.

As you work with IP issues, you'll find that libraries are easy, archives less so, museums are the hardest.

Copyright is one of the complex and divisive issues around sharing data. Copyright is supposed to deal with the works, not with the metadata. The factual data should be out there, shareable, and linkable. BUT this isn't always so.

Cultural heritage organizations (CHOs) don't always have the rights to the works, but they do have rights over the data. Since they have rights over the metadata, they feel there should be ways to monetize it. CHOs don't want to release the metadata for commercial purposes; they don't monetize it themselves, but they are afraid that someone else might exploit it. Europeana is trying to get partners to accept CC0 rights statements and is making progress.

Europeana held structured discussions around "risks and rewards." This has helped to frame the discussion and clarify issues for participants: ask partners what the risks are and define them. Don't confront institutions with blunt statements; work with them to define successes. Fears include loss of control and loss of revenue.

Question: what might count as "safe, low-res" metadata to serve as a gateway drug? Answer: there could be a defined set of simple, basic elements that could be used (short description, author, etc.).
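One way to read the answer is as an allowlist: publish only a small, agreed set of basic elements and hold everything else back. The sketch below assumes a hypothetical three-field allowlist (`title`, `creator`, `description`) and made-up internal fields; which elements count as "safe" would be a policy decision, not something this code settles.

```python
# Hypothetical allowlist of "safe, low-res" elements to release openly.
BASIC_FIELDS = ("title", "creator", "description")


def basic_record(record, fields=BASIC_FIELDS):
    """Return only the allowlisted elements that are present in `record`."""
    return {key: record[key] for key in fields if key in record}


full = {
    "title": "Mona Lisa",
    "creator": "Leonardo da Vinci",
    "description": "Portrait, oil on poplar panel",
    "insurance_value": "confidential",      # hypothetical internal field
    "internal_notes": "conservation plan",  # hypothetical internal field
}

print(basic_record(full))
```

An allowlist (rather than a blocklist) means new internal fields stay private by default, which speaks to the "loss of control" fear.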

Antoine Isaac, University of Amsterdam
Description of the Europeana Data Model (EDM).

The current data model looks a lot like a Dublin Core record: very static, and it forces an interoperability onto the data that is not inherent in the data.

The new EDM is going to be richer with more functionality.

Goal of EDM is to provide references to digitized content, so must include links to the objects on the partner sites.

There are lots of discrepancies in the data between types of institutions. Since a key goal is to initiate services on the data, how can the data be re-served for other purposes?

Since the current data records are very flat, you might get all types of data merged into one record (about the object, about the digitized version, about the creator of the object). The new EDM will try to tease these apart so that you have "stuff and descriptions about the stuff."
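As an illustration of "teasing apart" a flat record, the sketch below splits one record into three linked descriptions: the object, its creator, and the digitized version. The type labels borrow EDM class names (`edm:ProvidedCHO`, `edm:Agent`, `edm:WebResource`), but the property names, the URIs, and the direct `depicts` link are hypothetical simplifications; the actual EDM is richer than this.

```python
flat = {
    "title": "Mona Lisa",
    "creator_name": "Leonardo da Vinci",
    "creator_born": "1452",
    "image_url": "http://example.org/img/mona-lisa.jpg",  # hypothetical
    "image_format": "image/jpeg",
}


def split_record(flat, local_id, base="http://example.org/"):
    """Turn one flat record into three linked descriptions (hypothetical URIs)."""
    object_uri = f"{base}object/{local_id}"
    agent_uri = f"{base}agent/{local_id}-creator"
    resource_uri = flat["image_url"]
    return {
        object_uri: {
            "type": "edm:ProvidedCHO",  # the thing itself
            "title": flat["title"],
            "creator": agent_uri,       # a link, not a copied name
        },
        agent_uri: {
            "type": "edm:Agent",        # the creator
            "name": flat["creator_name"],
            "born": flat["creator_born"],
        },
        resource_uri: {
            "type": "edm:WebResource",  # the digitized version
            "format": flat["image_format"],
            "depicts": object_uri,      # hypothetical linking property
        },
    }


for uri, description in split_record(flat, "mona-lisa").items():
    print(uri, description["type"])
```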

How do you deal with different descriptions of the same things (e.g., books)? How do you merge these records (or dare you?), and how do you maintain the provenance of the metadata?
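One simple answer to the provenance question is never to overwrite: keep every value along with the source it came from, and let consumers decide which to trust. This is a minimal sketch of that idea with hypothetical source names; it is not how Europeana or EDM actually records provenance.

```python
from collections import defaultdict


def merge_with_provenance(descriptions):
    """descriptions: mapping of source name -> {property: value}.

    Returns {property: sorted list of (value, source)} so that no source's
    statement is lost and every value remains attributable.
    """
    merged = defaultdict(set)
    for source, record in descriptions.items():
        for prop, value in record.items():
            merged[prop].add((value, source))
    return {prop: sorted(pairs) for prop, pairs in merged.items()}


records = {
    "library_a": {"title": "Moby Dick", "date": "1851"},                # hypothetical
    "museum_b": {"title": "Moby-Dick; or, The Whale", "date": "1851"},  # hypothetical
}

merged = merge_with_provenance(records)
print(merged["date"])  # [('1851', 'library_a'), ('1851', 'museum_b')]
```

Agreement (both sources say 1851) and disagreement (two different titles) both stay visible after the merge.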

The EDM must also be able to serve up the data at various levels of granularity.

These different requirements are all best met through linked data/semantic web systems.

Developing the EDM has been a very collaborative process. It has taken about 2-3 years and is part of a Europeana work package that drew on many different types of institutions. It has been an open process, but NOT as open as everyone would have hoped.

Stefan Gradmann, Humboldt University Berlin
Did a quick recap of LOD and covered Tim Berners-Lee's original proposal for the WWW.

Basic concept of linked data is to extend linking in scope, description, etc.

RDF allows you to make statements about things on the web.
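An RDF statement is a subject-predicate-object triple. The sketch below writes a few triples in the N-Triples line format (`<subject> <predicate> <object> .`). The Dublin Core terms URIs are real vocabulary; the example.org resources are hypothetical, and the rule "strings starting with http are URIs, everything else is a literal" is a simplification for this sketch only.

```python
def term(value):
    """Render a URI as <...>; anything else as an escaped plain literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> {term(o)} ." for s, p, o in triples)


DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"

statements = [
    ("http://example.org/object/mona-lisa", DC_TITLE, "Mona Lisa"),
    ("http://example.org/object/mona-lisa", DC_CREATOR, "http://example.org/agent/leonardo"),
]

print(to_ntriples(statements))
```

The second statement's object is itself a URI, which is what lets statements about one thing link to statements about another.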

[joke about how Europeana is obsessed with the Mona Lisa - each presenter used it in their talk]

To extend the web in scope, you can use it to link descriptions of things to the things themselves (digital manifestations).

Certain goals:
- creating a data commons
- how do you move from QUANTITY of data to QUALITY of data?
- LOD2 is an EU-funded project about data integration (8.53 m Euro budget); it hopes to form a registry of LOD and uses some interesting tools to move from database structures to RDF. See the new book by Tom Heath and Christian Bizer on Linked Data

Be aware of the distinction between Linked Data and Linked Open Data. Can you have LD without it being LOD? Technically, yes, but to fully realize the benefits of LD, it must be LOD.

Stefan: "Dirt in the cloud"


Second Panel
Interoperable Discovery: Bibliographic Metadata
Rufus Pollock, Open Knowledge Foundation

Open Bibliography at OKF. Interest started about 7 years ago.

Biblio data is a platform, not a commodity; we want to build it, not sell it. In this context, open data means:

Open = freedom for anyone to use, reuse, and redistribute, even commercially

Open is fundamental to:
- interoperability
- scaling
- building the ecosystem

This is not the same as Creative Commons, which has a series of sometimes overlapping and conflicting rules.

Interoperability and scaling
- closed data doesn't scale
- to scale you need to componentize
- we need to reunite separated data

Open data ecosystem:
- many minds principle
- best things to do with your data will be thought of by someone else
- you will think of the best things to do with other peoples' data

JISC OpenBib and Bibliographica (unintentionally, a rapid prototype of Europeana in miniature)

In a world where reproduction costs nothing, matching is king
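If copies are free, the hard part is recognizing that two records describe the same thing. A common first step is a normalized match key. The sketch below is one crude, hypothetical recipe (strip accents, punctuation, case, and a leading English article from the title, then join with the normalized author); it is not the method used by OKF or Bibliographica.

```python
import re
import unicodedata


def _norm(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_key(title, author):
    """Crude title|author key; records with equal keys are candidate matches."""
    title = re.sub(r"^(the|a|an) ", "", _norm(title))
    return f"{title}|{_norm(author)}"


print(match_key("The Count of Monte-Cristo", "Dumas, Alexandre"))
print(match_key("Count of Monte Cristo", "DUMAS Alexandre"))
```

Both calls produce the same key, so the two records would be flagged as candidates for the same work; real matching would add more evidence (dates, identifiers) before merging.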

David Weinberger mentioned the "Library Cloud" project, which hopes to expose types of library data not usually displayed (circulation, hold, and reserve info, and search information).

Lorcan Dempsey, OCLC
The Virtual International Authority File

Overview of VIAF; encouraged people to look at Trove (from the National Library of Australia).

Walked people through how VIAF works. It combines the authority files from national sources (DNB, LOC, BnF, OCLC) and also puts this data "into the web" rather than just into a database.
VIAF data is available in a number of formats, including RDF, MARC, and native XML; it includes links to national libraries as well as Wikipedia.

Future of VIAF:
- more source files
- more types of names
- become an OCLC service
- used in WorldCat
- ISNI and ORCID (projects looking into names)
- Will remain open

Ed Summers, Library of Congress
Issues of data synchronization

Talked about a number of problems, issues, and solutions to synchronizing data across different systems.


Jonathan Rothman
Bibliographic Metadata & the HathiTrust

Quick overview of how HathiTrust aggregates metadata. It attempts to match metadata to objects, but the first record in wins; matching is done on the OCLC number.
The University of Michigan provides an RSS feed of public domain material from Google scans.
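The "first record in wins" rule described above, with matching on the OCLC number, can be sketched as a dictionary keyed on a normalized OCLC number. The normalization (keep digits, drop leading zeros, so prefixed forms such as `(OCoLC)ocm00012345` compare equal to `12345`) and the record fields are assumptions for illustration, not HathiTrust's actual ingest code.

```python
import re


def normalize_oclc(raw):
    """Reduce forms like '(OCoLC)ocm00012345' or 'ocn12345' to bare digits.

    The accepted input forms here are assumptions for this sketch.
    """
    return re.sub(r"\D", "", raw).lstrip("0")


def aggregate(records):
    """Keep the first record seen for each OCLC number; later ones are ignored."""
    kept = {}
    for record in records:
        kept.setdefault(normalize_oclc(record["oclc"]), record)
    return kept


incoming = [  # hypothetical contributor records, in arrival order
    {"oclc": "(OCoLC)ocm00012345", "source": "contributor_a", "title": "Moby Dick"},
    {"oclc": "12345", "source": "contributor_b", "title": "Moby-Dick; or, The Whale"},
    {"oclc": "ocn67890", "source": "contributor_b", "title": "Walden"},
]

result = aggregate(incoming)
print({number: record["source"] for number, record in result.items()})
# {'12345': 'contributor_a', '67890': 'contributor_b'}
```

The weakness is visible in the output: contributor_b's fuller title is silently dropped, which is one reason deduplication and clustering remain open work.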

OCLC is a consumer of the data and uses the files to derive digital master records from HathiTrust materials.

HathiTrust also has the ability to export individual metadata records as RDF output (just add .rdf to the end of the search term).

Continuing to work with OCLC on deduplication; would like to rely on WorldCat to get records from partners, but there is no method to handle volume information.

Working with CDL to develop a tool that will allow for clustering of records.
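Clustering differs from first-in-wins matching because it is transitive: if record A shares an OCLC number with B, and B shares an ISBN with C, all three end up together. A standard way to compute that is union-find. This is a generic sketch with hypothetical identifiers, not the CDL/HathiTrust tool.

```python
def cluster(records):
    """records: mapping of record id -> set of identifiers (OCLC numbers, ISBNs...).

    Returns clusters of record ids that are connected through shared identifiers.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_holder = {}  # identifier -> first record seen carrying it
    for rid, identifiers in records.items():
        for ident in identifiers:
            if ident in first_holder:
                parent[find(rid)] = find(first_holder[ident])  # union
            else:
                first_holder[ident] = rid

    groups = {}
    for rid in records:
        groups.setdefault(find(rid), set()).add(rid)
    return sorted(sorted(group) for group in groups.values())


records = {  # hypothetical records and identifiers
    "a": {"oclc:1"},
    "b": {"oclc:1", "isbn:X"},
    "c": {"isbn:X"},
    "d": {"oclc:2"},
}
print(cluster(records))  # [['a', 'b', 'c'], ['d']]
```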

John Weise, University of Michigan
Fulltext search and opportunities to extend discovery

HathiTrust will always have its own search and access points to ensure discovery and access; recall and precision are critical repository functions.

It also wants to have a presence where users already are (Summon, Google, etc.).

- 8 million volumes
- 2.2 million in the public domain
- 209 thousand serial titles
- 4.7 million book titles
- 1.2 billion pages
- 7 TB of OCR
- 4.5 TB index
- 1.7 trillion words (dirty OCR)

Metadata Search Interoperability Conundrum
- federated searches have weaknesses
- a unified index (e.g., Summon) is powerful, but requires third-party access to metadata or providing pre-built indexes
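Why a unified index needs third-party access becomes obvious once you build one: every provider's metadata has to be tokenized into a single index before any query runs. The sketch below is a toy inverted index with AND queries over hypothetical provider records; production systems (Summon, Solr-based indexes, etc.) add ranking, fielded search, and far more.

```python
import re
from collections import defaultdict


def build_index(providers):
    """providers: mapping of provider name -> {record id: metadata text}."""
    index = defaultdict(set)
    for provider, records in providers.items():
        for rid, text in records.items():
            for token in re.findall(r"\w+", text.lower()):
                index[token].add((provider, rid))
    return index


def search(index, query):
    """AND query: return (provider, id) pairs containing every query term."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


providers = {  # hypothetical metadata from two sources
    "hathitrust": {"ht1": "Moby Dick or the whale"},
    "library_b": {"b7": "Whale biology", "b8": "Moby Dick study guide"},
}
index = build_index(providers)
print(sorted(search(index, "moby dick")))
```

A federated search would instead send the query to each provider and merge their separately ranked answers; the unified index answers in one lookup, at the cost of holding everyone's data.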

Full text searches have these same problems, but with the added dimension of scale and copyright (since the content is actually part of the index).


Third Panel
Interoperable use: licensing frameworks and rights language
John Palfrey

Paul Keller, Kennisland
Marking the public domain

Overview of the creation of the Europeana document that attempts to clarify the public domain.

Creative Commons offers an option, the Public Domain Mark 1.0, though with too many caveats.
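Before a work can be marked as public domain, someone has to decide that it is. The sketch below encodes only the basic EU term rule (protection lasts for the author's life plus 70 years, counted to the end of the calendar year), so a work whose author died in 1940 becomes free on 1 January 2011. It is a deliberately simplified illustration, not the Public Domain Calculator's logic: it ignores co-authorship, anonymous works, national transitional rules, related rights, and non-EU terms.

```python
def eu_public_domain_year(death_year, term_years=70):
    """First calendar year in which the work is free under a life + 70 rule.

    Protection runs until the end of the year `term_years` after death.
    """
    return death_year + term_years + 1


def is_public_domain_eu(death_year, year):
    """Simplified check: True if the work is out of copyright in `year`."""
    return year >= eu_public_domain_year(death_year)


print(eu_public_domain_year(1940))  # 2011
```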

See Public Domain Calculator:


John Weise, HathiTrust
Determining rights and opening access in HathiTrust

Some more overview of HathiTrust, which seeks to make all legal, responsible uses of the collection as easy as possible.

Tries to give rights holders as much information as possible to manage their rights.

Copyright Review Management System
- web-based application
- 18 staff and 4 institutions (IU, UM, UMinn, UWisc)
- attempting to ID items in the public domain
- goal is to expand to non-US works

HathiTrust feels:
- copyright reform is necessary, esp. around duration of protection
- libraries don't make enough use of fair use
- orphan works legislation is probably not needed
- must identify orphan works as part of the cataloging process

Best case for in-copyright
- ideally, we locate copyright holders, and ask their permission
- most people say yes

Lucie Guibault, U of Amsterdam
Working with licensing frameworks

Lucie discussed the role of licensing content at a continental level: not relying on fair use (which doesn't exist in Europe), but on extended collective licensing.

Paola Mazzucchi, ARROW
Bridging gaps: ARROW Rights information infrastructure

Rights clearing project for Europeana.

What does it do?
- Comprehensive system to facilitate rights management for digitization projects
- Up and running in Germany, France, Spain, and the United Kingdom
- Alliance between libraries, publishers, authors, and commercial services
- Acts as an interoperability facilitator across domains and between public and private initiatives
- Enables simplified solutions for the licensing of certain categories of works
- Supports due diligence searches in automated, streamlined ways

Goal is to create digital libraries w/out black holes

Urs Gasser, Berkman
Concluding remarks

Various thoughts on how to do licensing of materials with interoperability issues in mind.

Three general observations:
- how do we avoid using old metaphors that might not still be relevant
- there are many organizational, structural, legal blockages to DPLA (antitrust, unfair competition, etc.)
- be prepared with a communication strategy for the public at large that will make DPLA relevant to the appropriate stakeholders
