Sunday, August 09, 2020

Archives and Special Collections Linked Data: Navigating between Notes and Nodes (2020) by @OCLC Research, @nbputnam, @KarenS_Y, et al.

OCLC Research Archives and Special Collections Linked Data Review Group. 2020. Archives and Special Collections Linked Data: Navigating between Notes and Nodes. Dublin, OH: OCLC Research. 14 pp.  
https://doi.org/10.25333/4gtz-zd88. COVID-19 Professional Reading. 


Linked data offers many possibilities for enhancing access to library and archives materials. OCLC has been a leading force in examining areas that can expand and simplify access to collections through a number of collaborative projects. See, for example, Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage (2019).


While there is significant work in expanding the use of linked data in what could be called commodity-level collections, it is in the area of archives and special(ized) collections that linked data has some of its most important potential applications.


This report from the OCLC Research Archives and Special Collections Linked Data Review Group is an important step in expanding the discussion of linked data in these areas. The group, comprising sixteen practitioners from a wide range of collections, was joined by OCLC colleagues led by Nathan Putnam and Karen Smith-Yoshimura and including team members Merrilee Proffitt, Bruce Washburn, and Chela Scott Weber, who represent some of the best applied thinking on linked data in the library world.


As noted in the introduction:


As libraries and other cultural heritage organizations plot a course to move toward a linked data future, it is important to ensure that all collections are represented in that future. While published works represented in monographs may be relatively easy to represent in linked data, this is not the case for special collections materials (rare books, manuscripts, photographs, institutional archives, etc.) that may be unique or may have special physical characteristics that are of particular interest for study (p. 5).


Though there is much promise in linked data for archives and special collections, the report specifically outlines barriers to, and concerns about, widespread implementation. Among these are:


Context trapped in narrative description

  • “When descriptions for these items or collections are represented in systems that were designed primarily for bibliographic description, much valuable data is recorded in unstructured notes. In either case, it is labor intensive to extract entities and relationships from these unstructured notes. This data may be well served by linked data structures, but the number of unique cases for archival and other traditions of descriptive practice may make it difficult to move forward as a community” (p. 6);


Complex models

  • “Problems have arisen due to trying to model everything in the description of an item.” Other issues include resistance to adopting a shared model, and complex or locally derived models that “make crosswalking into flatter linked data structures more difficult” (pp. 6-7).


High-barrier infrastructure

  • “Software, data, and workflows take a lot of work to establish and often lack institutional buy-in and support” and “Lack of support from identifiers in the data, such as issues with minting identifiers and persistency” (p. 7);


Scalability

  • “Can linked data standards, practices, and workflows scale? Institutions that collect and steward these materials include both large, relatively well-funded institutions and small and less well-resourced repositories. There are few successful exemplars of linked data implementation in special collections; stories that exemplify success will be necessary to build support with resource allocators and to demonstrate utility to dispersed user communities who will directly benefit from linked data efforts” (p. 7).


A few issues were discussed in more depth in the report. Specifically (pp. 7-10):

  • Descriptive models do not currently service special collections

  • Discursive description presents a challenge to entification

  • Potential for better discovery

  • Prescriptive vs. permissive data modeling

  • Ethical issues and community engagement

  • Challenges around multilinguality

  • Sustainability

  • Archival issues

  • The need to express relationships and change over time

  • The long tail of authorities/identifiers in special collections


I would like to focus on a few of these:


  • Potential for better discovery. As noted in the report, linked data can make key facets of a collection more discoverable. These include identity markers for people (e.g. gender), geographical indicators, and contact networks (e.g. who worked with whom). The report adds: “Even though linked data structures will bring significant advantages to discovery, end users will need to adapt to new ways of interacting with linked data structures.” (p. 8);

  • Ethical issues and community engagement. Linked data provides many opportunities to increase access, but if the means of expanding that access remain siloed in a knowledgeable but inherently limited group of practitioners, there is a strong danger of creating a well-linked and networked collection of silos that do not accommodate diverse terms, practices, and knowledge. As noted in the report: “This moment offers an opportunity to create pathways to disrupt our constructs of authority and to provide a space and a place for others with more expertise (researchers, community members) to contribute their knowledge. Institutions should be careful when balancing risk and opportunity for those who may be impacted” (p. 9);

  • Sustainability. “To date, the majority of linked data efforts have been grant-funded or special one-off projects ... Everyone wants easy-to-use and implementable tools embedded in a sustainable infrastructure.” (p. 9). No need to add more to that statement; 

  • The long tail of authorities/identifiers in special collections. Much of the value of archival and special collections is in the unique content they contain. Creating authority records and identifiers for entities that may have only a single global instance is both a workload and a resource issue. Expanding this to include museum collections further multiplies these issues. The report comments: “How do we create and manage authorities across systems and institutions in a lighter-weight way that still can take advantage of the network? It is not possible (or desirable) to create entities for everything; instead, we should develop approaches that help to balance local versus global IDs much like NACO and local authority files do now” (p. 10).


As I noted at the start, it is possible, even likely, that many of the issues around linked data for current publications and widely acquired materials will be solved through the application of vendor tools and distributed description processes. The huge potential for uncovering previously hidden information in archival and special collections is where linked data will have the most impact. In conclusion, the report notes:


Libraries are in a state of moving from experimentation to production with linked data, and it is vitally important that the needs for descriptions of special collections materials are not left behind at this critical moment (p. 11).

