Shawn Averkamp is Manager of Metadata Services at The New York Public Library where she directs strategy, production, ontology design, and quality control for digital resource and discovery metadata. Previously she worked as Data Services Librarian and Interim Head at the University of Iowa Libraries Digital Research and Publishing department, contributing to the Libraries’ digital collections, institutional repository, and crowdsourcing platform, DIYHistory, and as Metadata Librarian at the University of Alabama Libraries. She earned her MLIS from the University of Iowa and holds a BA in Music from Luther College.
Please describe your institution, its history, and mission.
Founded in 1895, NYPL is the nation’s largest public library system, featuring a unique combination of 88 neighborhood branches and four scholarly research centers. NYPL serves more than 17 million patrons a year, and millions more online. The Library holds more than 51 million items, from books, e-books, and DVDs to renowned research collections used by scholars from around the world. The Library is also well-known for its Digital Collections and digital experiments in public engagement and creative collection reuse.
Please briefly describe the resources that can be found on your digital platform and how they relate to physical holdings (i.e., a selection, only certain specific collections, do not relate at all)?
NYPL’s early digital collections highlighted its wealth of image-based resources, including large portions of the Mid-Manhattan picture collection, photographs of New York City, Japanese books and prints, plates and illustrations from rare books in the general research collections, and much more. With a migration to a new digital collections platform and advances in image display, digital object navigation, and collections contextualization, digitization grew to include more text-based archival collections, newspapers, journals, and full-book scans. We now have a wide range of our Research Libraries collections represented across almost all of our curatorial divisions and research centers.
Describe your role in maintaining the digital platform. What sized staff helps maintain the platform? Do you have other responsibilities at your institution separate from the digital platform?
I am the product manager for our homegrown metadata and workflow management system, which enables over 50 staff to create and maintain metadata, manage rights and access controls, and track digital imaging tasks. I engage with our staff users and other stakeholders to prioritize and gather requirements for new features and system improvements. This system is part of a larger suite of digital repository applications supported by in-house developers. I also manage a production services staff of five metadata specialists who guide staff across the Research Libraries in metadata creation, enrichment, and maintenance and help triage issues with our metadata management system. In addition to metadata production services and product management, I design data models for new library applications and interfaces.
How would you characterize the known user base of your platform? What methods do you employ to increase discoverability and traffic, and have they been successful?
In addition to our primary New York City audience, our Digital Collections website reaches users around the world. Our most successful recent engagement effort was a release of high-resolution downloads for nearly 200,000 public domain items. We not only gave the images away for free but encouraged creative reuse of our content through digital “remixes” such as games and visualizations designed by some of our developers. The overwhelming press and public response told us a lot about how people are using, or want to use, our collections and gave us ideas for future avenues for promotion. We also share our digital content through the Digital Public Library of America (DPLA). While the referrals from DPLA represent a small fraction of our total Digital Collections web traffic, participation helps us reach new audiences through education initiatives, such as DPLA’s primary source sets, and through innovative apps built upon DPLA’s API.
Describe the software or system you use to manage and present your collection. Why was this system chosen? (If you are unsure of the software and/or system, please describe the general requirements you had when the platform was created).
Our systems for managing and presenting digital content are custom-built and maintained in-house by software development teams. While we try to use existing open-source software when possible, as a large, high-profile library, we have to meet high expectations from our patrons, so it was important to have complete flexibility with design and functionality for our Digital Collections website. For example, with such a wide range in content from all areas of the research libraries, it was important to be able to give users as much context as possible about the source collections of our digitized materials as well as related records and sites. You can see this manifested in our collection view pages (ex. Writers’ Program, New York City: Negroes of New York Collection) and in our “View this item elsewhere” links in our item view pages (ex. Buttolph Collection menu).
Our Metadata Management System (MMS) is a custom-built Ruby-on-Rails app on top of a Fedora Commons digital repository. Because of our specific digitization, metadata creation, and rights workflows and our legacy metadata and digital object structure, at the time of migration, it made the most sense to build our own system rather than use an out-of-the-box solution. In looking ahead to future repository migrations, we are considering how to best adapt and contribute to current digital library software development efforts while also addressing our expanding needs for new content and metadata types.
What standards are you using for your records, if any (controlled vocabularies, authorities, schema, etc.)? If there was a clear decision-making process for any of these choices, please explain what it was. Are you currently using linked data in any way, and if so, how?
We currently use the MODS XML standard for our metadata schema, a combination of RDA, DACS, and local rules for our metadata content standards (depending on the source of the content within the Research Libraries), and a variety of controlled vocabularies for names, subjects, and genres, mostly from the Library of Congress. Our goal in selecting standards and vocabularies is to first meet the needs of our primary audiences while also considering interoperability with digital partners (such as DPLA) and the larger library community as well as the needs of downstream users of our API. We also understand how users outside of the library domain, such as developers or scholars, may find the MODS XML standard confusing and difficult to code around, so we have also been looking at ways to lower the barrier to our metadata. For our public domain release last year, we repackaged the metadata for our public domain items as both CSV and JSON downloads and included a data dictionary to help users better understand how to work with the data.
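To illustrate the kind of repackaging described above, here is a minimal sketch of flattening a MODS XML record into JSON and a CSV row. It is not NYPL's actual export code: the sample record, the chosen fields, the `flatten_mods` and `to_csv_row` helpers, and the pipe-delimited convention for repeated values are all assumptions made for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Namespace used by MODS version 3 records.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical record, invented for illustration only.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example menu</title></titleInfo>
  <name><namePart>Example, Person</namePart></name>
  <subject><topic>Menus</topic></subject>
  <subject><topic>Restaurants</topic></subject>
  <genre>Ephemera</genre>
</mods>"""


def flatten_mods(xml_text):
    """Reduce a nested MODS record to a flat dictionary of simple fields."""
    root = ET.fromstring(xml_text)

    def texts(path):
        # Collect the non-empty text of every element matching the path.
        return [
            el.text.strip()
            for el in root.findall(path, MODS_NS)
            if el.text and el.text.strip()
        ]

    return {
        "title": "; ".join(texts("mods:titleInfo/mods:title")),
        "names": texts("mods:name/mods:namePart"),
        "subjects": texts("mods:subject/mods:topic"),
        "genres": texts("mods:genre"),
    }


def to_csv_row(record):
    """Serialize one flattened record as a CSV line; repeated values are pipe-delimited."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([
        record["title"],
        "|".join(record["names"]),
        "|".join(record["subjects"]),
        "|".join(record["genres"]),
    ])
    return buffer.getvalue()


record = flatten_mods(SAMPLE_MODS)
print(json.dumps(record, indent=2))
print(to_csv_row(record), end="")
```

A developer who has never seen MODS can read the resulting JSON keys directly, which is the barrier-lowering effect a data dictionary then documents field by field.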
We do not yet use linked data in our Digital Collections, but behind the scenes, the Metadata Services Unit is working to make our metadata “linked data ready” by adding URIs for names, subjects, genres, and other vocabularies to our source records. We have also been developing an RDF-based profile based on DPLA’s Metadata Application Profile v.4.0 to power an experimental linked data discovery layer and new Research Libraries catalog interface. This exercise in mapping current models to future ambitions has not only taught us about gaps and needs in technology, but shown us where we will need to channel effort towards training and staffing for data cleanup and migration.
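As a rough illustration of what checking records for “linked data readiness” might involve, the sketch below audits a MODS record for controlled terms that do or do not carry a URI. It assumes the URIs are stored in the MODS `valueURI` attribute; the interview does not say how they are recorded in NYPL's source records. The sample record, the example.org URIs, and the `audit_uris` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical record; the example.org URIs stand in for real authority URIs.
SAMPLE_MODS_WITH_URIS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <name valueURI="http://example.org/names/1">
    <namePart>Example, Person</namePart>
  </name>
  <subject><topic valueURI="http://example.org/subjects/menus">Menus</topic></subject>
  <subject><topic>Restaurants</topic></subject>
  <genre>Ephemera</genre>
</mods>"""

# (path to the element that may carry valueURI, path to its label or None
# when the element's own text is the label). Assumed field selection.
TARGETS = [
    ("mods:name", "mods:namePart"),
    ("mods:subject/mods:topic", None),
    ("mods:genre", None),
]


def audit_uris(xml_text):
    """Return (linked, unlinked): terms with a valueURI and terms still lacking one."""
    root = ET.fromstring(xml_text)
    linked, unlinked = [], []
    for path, label_path in TARGETS:
        for el in root.findall(path, MODS_NS):
            label_el = el.find(label_path, MODS_NS) if label_path else el
            label = (label_el.text or "").strip() if label_el is not None else ""
            uri = el.get("valueURI")
            if uri:
                linked.append((label, uri))
            else:
                unlinked.append(label)
    return linked, unlinked


linked, unlinked = audit_uris(SAMPLE_MODS_WITH_URIS)
print("Linked terms:", linked)
print("Still need URIs:", unlinked)
```

Run across a whole collection, a report like this would give a concrete picture of how much cleanup remains before a migration to an RDF-based model.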
Do you have a wish list for future directions for your digital platform, and/or is anything new currently being developed? What are typical hurdles at your institution in actualizing such plans?
With the increase in digitization of textual materials, we would love to perform Optical Character Recognition (OCR) and provide full-text search for our digital collections not only for discoverability but for better accessibility. Search engines make this look easy, but full-text indexing is still a tricky problem that is not so easily scalable for libraries. We’re also interested in improving our asset viewers to better accommodate different content types, like books and newspapers, and to provide full-item downloads, basically anything that will make it easier for users to access our content in the way that best suits their needs. I, personally, would love to enable more user participation in our metadata contributions and corrections. I’m excited to see the progress being made in the web annotation space with projects like the Web Annotation Data Model, and I hope we can someday incorporate these features into our growing local linked data ecosystem.
Like most libraries, however, lack of resources is always a hurdle for us. Even though we employ more developers than many libraries, we are a large library with an enormous user base, and the list of software, web, and repository development needs for both the Research and Branch Libraries represents far more demand than our staff can handle. However, we are striving to approach our digital development work more holistically, so that gains in one area of the library can provide potential benefits in another. Even if we are unable to offer new features to our digital platform in the near term, I am optimistic that infrastructural work carried out on other digital projects will help provide a foundation for bringing new features online much more efficiently.