Collecting Institutional Email: Results from Two Case Studies

In my final post as a METRO Fellow, I’d like to share the written work I developed with the Brooklyn Academy of Music and the Guggenheim Museum to address the collection of institutional email records. During the reverse pitch process, both sites came forward to request assistance with what they identified as a ‘problem record.’ As a performing arts venue and an art museum, these two institutions operate on similar cycles of exhibitions/performances, during which high-value institutional records are regularly created. Although both have had similar goals for realistic incorporation of email into record management schedules, they offered differing examples of scale and staff management which suggested an opportunity to consider a cross-organizational framework for email archiving.

In contrast to a traditional residency, the documentation you’ll find below examines how working with two institutions exposed trends in producer habits as well as differences in collecting needs.  



In 2016, the Metropolitan New York Library Council (METRO) established a fellowship program to identify and address needs within its network of libraries, archives, and museums. This report reviews the findings of a fellowship project developed in response to two institutions in the METRO community, the Brooklyn Academy of Music (BAM) and the Solomon R. Guggenheim Museum (SRGM), both of which have worked in recent years to strengthen and simplify their born-digital workflows. At both sites, institutional email had been identified as a record type requiring additional attention and had therefore been excluded from previous project development. As a performing arts venue and an art museum, the two institutions operate on similar cycles of exhibitions and performances, during which staff regularly create high-value records. Although both had similar goals for realistically incorporating email into records management schedules, they offered differing examples of scale and staff practices, which suggested an opportunity to consider a cross-organizational framework for email archiving.

Solomon R. Guggenheim Museum
In 2013, the SRGM Archives received funding from the National Historical Publications and Records Commission (NHPRC) for an electronic records start-up project. The 18-month project produced a pilot plan for an electronic records repository that would make it possible to collect, preserve, and make available born-digital records documenting the activities and history of the Solomon R. Guggenheim Foundation. This endeavor extended work undertaken in 2005 to create a comprehensive records retention schedule for analog institutional records and to designate records management representatives in each department. During the initial stages of research, conducted through focus group meetings and staff interviews, email was identified as a complex format requiring further exploration. At the time, email was simply incorporated into the retention schedule under the category “significant correspondence,” with no further direction on general email management. Email was categorized as one of the “problem records,” along with very large files, obsolete formats, obscure formats, and web archives. The immediate solution was to obtain a PST file of a staff member’s account before or upon their departure from the museum, and to build a closer relationship with staff before departure in order to collect significant email (Cocciolo, 2014).

In an effort to revise practices on-site and to understand the extent of content created in email applications, the SRGM Archives submitted a proposal to the Metropolitan New York Library Council (METRO) Fellowship program requesting assistance with the development of a solution driven by staff habits and needs, stating in their proposal:

What makes our project idea especially interesting is that we want to focus on the human and organizational aspects, and look specifically at how curators document their work. This is the group of staff within the museum which, more than anyone, creates and receives material of long term interest to researchers….The Fellow will conduct one or more case studies with organizations in other sectors, to identify good practice in email management and, in particular their use of filing structures for email inboxes. We hope this will assist us in creating a folder structure for curatorial department emails, mirroring our retention schedule for paper records. This will make it easy for curators to transfer a file of email records to the Archives once their work on an exhibition is completed, and ensure that unique and valuable research material can be made accessible to researchers in the future. (Sloss, 2016)

This project plan was selected to be one of three studies pursued by a METRO fellow in 2016, as the work was deemed a topic of interest within the wider community of NYC libraries and archives. The result was an investigation into requirements for implementing new policies and workflows tailored to institutional habits.

Brooklyn Academy of Music
The first Records Retention Manual focusing on staff-produced material at the Brooklyn Academy of Music (BAM) was created in 2009 by Archives Director Sharon Lehner, working with nationally noted records management expert William Saffady. Outcomes of this effort included an institution-wide annual ‘Records Cleanup Day,’ an event initiated to support employee organization, and a commitment to transferring material to the Archives. In 2014, the IT Department instituted ‘Digital Cleanup Day’ to address dwindling server space. Recognizing that many significant records were no longer on paper, the Archives used the first Digital Cleanup Day, in summer 2015, to distribute detailed written guidelines directing departments to use a dedicated ‘Archives’ folder on their server as a staging area for inactive records of value. It soon became clear, however, that a more structured approach was needed to apply records retention requirements to digital records. The following fall, BAM Archives began developing a more comprehensive approach to collecting through participation in the National Digital Stewardship Residency (NDSR) program. An NDSR Fellow devoted nine months to modifying the Records Retention Manual to account for digital materials while concurrently developing procedures for depositing those materials, with accompanying descriptive and preservation metadata, into an Archives server. Additionally, in 2016 the Archives asked each department to appoint a records coordinator, in order to raise awareness of records retention and to provide the Archives with dedicated points of contact. During this residency, email was identified as a more complex format requiring additional attention, prompting the Archives to submit a proposal for a METRO Fellow to join BAM and review on-site production. The Archives specifically outlined a desire to fully understand the cultural and legal aspects of archiving email, stating:

The Archives still needs to determine when and how to ingest these records. The current plan is to transfer an executive’s email to the Archives at the end of his/her tenure. Is that the best policy? How can sensitive issues be identified? Should there then be selection? Redaction? Restrictions imposed? Is additional metadata important, or will search capabilities make adding metadata unnecessary? What type of description is appropriate? Moreover, in light of the recent University of Oregon controversy, what is an appropriate access policy to these unique records?….The BAM Hamm Archives needs a Metro Fellow to research how other institutions implement their email archiving in order to determine best practices and develop a plan on how best to implement the Capstone policy for an institution of our size. (Shunaman, 2016)

These questions reflected goals similar to SRGM’s, suggesting that a partnership could be formed with a METRO Fellow to review practices at both sites, as well as the policies of comparable institutions. This report reviews the findings and deliverables of a nine-month fellowship with BAM and SRGM, documenting decisions and workflows developed and tested by METRO Fellow Katherine Martinez under the mentorship of BAM Hamm Processing Archivist Evelyn Shunaman and SRGM Associate Archivist Tali Han.

Project Scope and Analysis

Significance of Archiving Email
Email software provides a flow of communication and content sharing, rendered as a thread of incoming and outgoing messages. Unlike most paper-based correspondence, email enables both sides of a conversation to be preserved, along with a rich store of information captured in message headers, which Chris Prom describes as an “embedded trail of evidence,” one that “can demonstrate how people lived and worked within a network of colleagues, friends and family members” (Prom, 2011). Email has become so ubiquitous a tool for communication that it now accumulates both significant and insignificant data, forming a unique database of personal histories that may be freshly viewed and reorganized by applying different sorts or search terms. Recent messages in a user’s inbox may serve as reminders of items to read, reply to, or track, while a store of past mail makes it possible to revive conversations after many years, reconstruct decision-making during past projects, and retain contacts in social networks (Pennock, 2006). Because users benefit from allowing content to build up over time, identifying records of lasting value among hundreds or thousands of messages is a daunting task for the archivist.

Email is still frequently treated as the digital equivalent of paper correspondence, despite vast differences between the formats. The scale at which email content is produced offers new potential for capturing more information over a longer period of time, but it also challenges archivists to reconsider established methods for processing and appraising collections. Mary Elings, archivist at The Bancroft Library, University of California, Berkeley, has remarked on the effectiveness of applying natural language processing and entity recognition tools to large datasets to illustrate the scope and content of a collection, referring to the process as dynamic arrangement and description (Elings, 2016). While this type of automated text analysis can expose countless new threads of inquiry for researchers and archivists, it may also be the only feasible approach to collections that span years, or an entire career. To illustrate the scale, Stanford University archivists quantified the difference between the two formats in their Robert Creeley collection, noting, “In the finding aids for this archive, the correspondence listing takes 122 pages out of a total of 251 pages, indicating the importance of letters. Note that this listing had to be painstakingly and manually generated by an archivist going through Creeley’s letters. In contrast, Creeley’s email corpus consists of 163,689 pieces of email, spanning about 13 years” (Hangal, 2015). Tools such as ePADD, ArchExtract, and the forthcoming BitCurator NLP project may support methods for approaching accumulated stores of mail, but they are less practical for managing active email production and organization.

While entity recognition and language parsing tools could be applied at a later stage, other efforts are needed to document contextual data about collections as they are produced, without prematurely handling the records. One recommendation for appraising an active email account is to isolate or annotate significant contacts in order to document contextual information about the producer’s social network. Specific interactions may offer data beyond what is contained in the body of messages, through rhythmic changes in conversation habits that mark transitions between projects, jobs, and cities (Viegas, 2006). Immersion, developed by the MIT Media Lab, is an example of a tool created to visualize this web of fluctuating connections, offering a new view that disrupts the traditional way we reference and remember past email exchanges (Hidalgo, 2011). For institutional email, collecting information about senders and recipients allows social roles to be tracked over time; alternatively, requesting a list of non-professional contacts, such as family members or friends, gives the archivist an efficient way to remove large groups of messages during a transfer (Cocciolo, 2016). An ongoing discussion between producers and archivists on this topic may also change the way producers perceive their email correspondence: continual recognition of institutional interest reminds staff that although messages are exchanged through personal desktops, laptops, and cellphones, they may eventually reside in the institution’s archives (Pennock, 2006).
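To make the contact-based approach more concrete, the following is a minimal Python sketch, not a tool used at BAM or SRGM. It assumes that an account has already been exported into messages that Python’s standard `email` package can read (for example, after converting a PST file); the helper names (`make_message`, `tally_contacts`, `split_by_exclusions`) and all addresses are hypothetical. It counts how often each correspondent appears in the From, To, and Cc headers, and separates messages involving contacts the producer has identified as non-professional.

```python
from collections import Counter
from email.message import EmailMessage
from email.utils import getaddresses


def make_message(sender, recipients, subject=""):
    """Build a header-only message for illustration (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipients
    msg["Subject"] = subject
    return msg


def correspondents(msg):
    """Return the lower-cased addresses in a message's From, To, and Cc headers."""
    fields = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _, addr in getaddresses(fields) if addr}


def tally_contacts(messages, owner):
    """Count how many messages each contact shares with the account owner."""
    counts = Counter()
    for msg in messages:
        for addr in correspondents(msg) - {owner.lower()}:
            counts[addr] += 1
    return counts


def split_by_exclusions(messages, excluded):
    """Separate messages that involve any contact on the producer's exclusion list."""
    excluded = {addr.lower() for addr in excluded}
    kept, removed = [], []
    for msg in messages:
        target = removed if correspondents(msg) & excluded else kept
        target.append(msg)
    return kept, removed
```

Matching on addresses rather than display names is a deliberate simplification; a producer’s personal contacts may write from several addresses, so an exclusion list would likely need review before any messages are removed from a transfer.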

Defining Project Goals
The work of the METRO Fellow was to review the current landscape of email archiving, consider more effective approaches to collecting, and determine whether practices at each site could be improved. In the initial project plan, SRGM and BAM selected the accounts of two former employees to serve as case studies for developing a guide to processing email. A common approach to incorporating email into an institution’s retention schedule is to export the entire email account of high-level staff on their departure, a quick, low-cost preservation solution. This hands-off method requires very little communication between the Archives and other departments, and therefore does little to clarify which types of records created in email should be included in, or left out of, the category “significant correspondence” on a retention schedule. Because this archival process was mostly undocumented, the assumption was that a thorough review of accounts at each site would give Archives staff much-desired guidance on appraising incoming email records by exposing practices or trends in production.

This plan was revised after legal counsel employed by BAM advised that, even in an institutional setting in which employees sign a contract stating that their email is the property of the organization, it would constitute a breach of ethics to review the contents of staff mailboxes while they remain employed; even after departure, such a review should take place only once enough time has passed for any business matters discussed in the mail to be considered closed. The decision to revisit the overall goals of the fellowship was further influenced by the early discovery that, while both sites had identified email as a record of significant value to the Archives, neither had working systems in place for regular collection, disposition, and management. Broadly, the policy at each institution was to obtain PST files of employee accounts on their departure; in practice, however, BAM retained employee email in a disaster recovery system without extraction, and SRGM had irregularly saved PST files in various directories on a shared server. Conversation threads that staff deemed significant for departmental operations were handled with similar inconsistency: sometimes a user would export the message as a PDF, other times it would be printed and saved to a physical file, and occasionally it would be exported as a fragile EML file. Together, these factors suggested that the project should address policies and workflows for collecting email records and related contextual data, rather than the processing of an email collection, which presents multiple barriers and could also be considered less pressing.

Staff Interviews
Opening a line of communication with contacts in each department was determined to be a necessary first step toward comprehensively addressing email in the records retention schedule. This collaboration encouraged staff to consider the types of records they create in relation to their role within a department or organization, prompting a conversation about what constitutes archival value and which records hold information not found in other documents. This type of interview can be considered an aspect of collecting unique to born-digital material, as suggested by Geof Huth’s description of an ‘embedded archivist,’ a role that nurtures relationships with donors to mediate record production at the creation stage. Huth emphasizes that the “ultimate goal of that dictum is to ensure that archivists do not wait so long to address digital records that they can no longer rescue the records. The other point is to demonstrate to archivists that they need to be a new breed of professional, that the processes of the past will not always work, that paper practices do not always translate well within a digital reality” (Huth, 2016). In an institutional context, this relationship ensures that the archives participates as new internal workflows are developed over the course of various projects, establishing early recognition of records that will eventually be submitted.

Staff interviews began with an introduction from the Archives expressing interest in records that might not be considered correspondence in the traditional sense, opening the discussion to a more general assessment of material shared through email that reflects the mission of the institution (Cocciolo, 2016). This exposed several types of records that were not being collected. In one major example, neither BAM nor SRGM had actively retained digital ephemera, such as flyers, press releases, postcards, newsletters, and invitations. The SRGM Marketing department sends over 350 email blasts annually, while BAM produces a number relative to its community. The design team regularly transfers the graphic elements of these documents, but the content and final versions are compiled and sent through email marketing software, creating a gap in the workflow in which the final files are never saved to a shared server. As a consequence, no one noticed that these records were not being sent to the Archives, despite being listed on the retention schedule. In both cases, a simple solution was to create an ‘Archives’ email address to be added to all mailing lists and to establish workflows for the Archives to save these records to external storage.

Another notable case was found in the Interactive Department at SRGM, which is responsible for delivering a range of web content, including video and audio files. Initially, this team stated that they did not create email of archival interest. Discussions during meetings revealed that correspondence about noteworthy design decisions and progress took place outside of email, in two web applications, Workmajig and Basecamp. Basecamp projects are packaged and saved for the Archives, while Workmajig was disregarded because it contained only administrative information, such as meeting details and deadlines. Used together, these platforms organized files more efficiently than Outlook or other email software, which often accumulates the kind of clutter that, in this instance, Workmajig kept separate. Basecamp’s interface design also encourages users to codify correspondence as it is produced, so that team members logging in and out of the system can easily view progress or new conversation threads. This contrasts with updates shared in email via the carbon copy (cc) function, in which multiple parties are copied with no need for a unified filing system after receipt, or any system at all. Similarly, the Creative Services department at BAM uses Slack for this type of communication. Although Slack can be considered an extension of digital correspondence, it is a more controversial platform because it tends to contain non-professional content, such as GIFs and other forms of casual office banter. For this reason, some institutions may decline to save these chats, but for others they could be a rich source of project history if used to develop major or lasting design concepts. Currently, BAM does not collect this content.

Identifying record types with a fixed-term retention that fall outside the scope of the Archives’ collecting responsibility is critical to writing a comprehensive records schedule, and this applied equally to email production. A significant finding of this project was that several departments, such as Finance, Development, Retail, and Visitor Services, do not need to be included in a long-term email archiving solution, since they use dedicated databases to track valuable data (e.g., Great Plains, Raiser’s Edge). These high-volume accounts may be set to a regular disposal schedule to free storage space and avoid unnecessary maintenance of old messages that are no longer referenced. At SRGM, IT and the Archives are beginning to plan how they will work together to set automatic deletion of identified routine email during the migration from Outlook 2010 to Office 365 in 2018.

Senior Level Accounts
In conjunction with department interviews, a second review considered which executive or senior staff accounts should be archived in their entirety. Focusing on email produced by higher-level staff assumes that the most important decisions and actions will eventually be discussed with department heads, and that saving these employees’ sent and received mail will therefore capture the most significant conversations occurring within and around the organization. This stage largely defined the difference in scale between BAM and SRGM and highlighted the divide between material produced in an art museum and that produced in a performing arts theater. The two organizations have comparable departmental structures, but they differed in the number of accounts producing vital records of archival value versus those with mostly operational exchanges that did not need to be permanently retained.

At BAM, the email accounts of four executive staff members were selected to be archived annually, after it was determined that the email of other directors and department heads has short-term significance relating to operational activity. At SRGM, by contrast, significant email is created in roles extending beyond executive titles. The most complex examples exist in the Conservation and Curatorial departments, specifically within the record series ‘Artist and Object Files.’ This series contains physical and born-digital material with critical information on artworks and artists in the collection, including technical documentation on artwork installation. Historically, these files have remained permanently in the custody of the producing department, while other documents created in both departments are sent to the Archives when deemed inactive. Distinguishing between records retained by the Conservation or Curatorial department and those sent to the Archives is harder with a collection of email, because conversations overlap across subjects and topic threads.

Moreover, the variety and crossover of content related to vital records encourages staff to use email applications as a search engine rather than as a filing system. While some staff actively export messages as PDFs to be saved to related subject files, most find that email chains contain so much interrelated content that they are difficult to categorize; it is easier to search across an entire account, which leaves little motivation to move specific messages or folders offline to a shared server. In this capacity, messages are actively referenced for extended, undefined periods of time. (Printing and filing email, or saving it as PDFs, is an unreliable system: it depends on each individual making the effort to save messages, and printed copies lose information such as text searchability and, sometimes, the links between past conversations.) The crossover of content and frequency of access make this category of vital records difficult for the Archives to categorize.

Overview of Project Recommendations

1. Prioritize collecting.
Early drafts of the project proposal suggested that a potential outcome would be documentation for processing the email collection of a former employee, on the assumption that this would provide insight into communication and organization trends, as well as guidelines for distinguishing significant from routine mail. Several conclusions motivated a shift in project goals. Consultation with legal counsel at BAM established access restrictions on individual email collections that prevented an immediate review of messages. Additionally, procedures for creating transfers to the Archives were not actively in place, suggesting that the fellowship should instead address workflows, building a more systematic approach to ingest and, above all, prioritizing the collection and secure storage of material. Documenting the types of significant records created in email involved writing an addendum to the current retention schedule to incorporate the specific requirements of the format.

Review current practices and create workflows for ingest.

Related Documents:
Brooklyn Academy of Music Records Retention Schedule: Addendum on Email
Solomon R. Guggenheim Museum Retention Schedule: Addendum on Email

2. Allow staff to define archival records.
Managing born-digital workflows requires regular contact with staff to remain abreast of current and emerging projects that may produce material eventually intended for the Archives. Keeping this relationship active supports the sustainability of workflows by forming partnerships while work is in progress and maintaining transparency around the Archives’ collecting policies. The types of records staff create in email can be reviewed through discussions with each department. During this project, digital ephemera in the form of newsletters, invitations, and educational outreach was identified as a large category of inconsistently captured records, despite its inclusion in the retention schedule. Taking note of this type of routine mail helps ensure that it is transferred to the Archives when that is the intent, and also gives staff an opportunity to set up rules based on recurring subject lines or senders, moving items of short-term interest into folders with automated disposition periods.
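The rule-based filing described above can be sketched as follows. This is an illustrative Python model of the logic only, not the rules interface of Outlook or Office 365; the `Rule` class, the `route` function, the folder names, the sender address, and the 90-day period are all hypothetical. Each rule matches on a sender address, a recurring subject phrase, or both, and the first matching rule assigns a folder and an optional disposition period.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical filing rule that matches on sender and/or subject text."""
    folder: str
    retention_days: int | None  # None means no rule-specific disposal period
    sender: str | None = None
    subject_contains: str | None = None

    def matches(self, sender: str, subject: str) -> bool:
        # A rule with no criteria never matches, so it cannot capture all mail.
        if self.sender is None and self.subject_contains is None:
            return False
        if self.sender is not None and sender.lower() != self.sender.lower():
            return False
        if (self.subject_contains is not None
                and self.subject_contains.lower() not in subject.lower()):
            return False
        return True


def route(sender: str, subject: str, rules: list[Rule]) -> tuple[str, int | None]:
    """Return (folder, retention_days) from the first rule that matches."""
    for rule in rules:
        if rule.matches(sender, subject):
            return rule.folder, rule.retention_days
    # No rule matched: leave the message in place; the account-wide policy applies.
    return "Inbox", None


# Illustrative rules only; addresses, folders, and periods are invented.
RULES = [
    Rule("Archives/Ephemera", None, sender="newsletter@example.org"),
    Rule("Routine/Reminders", 90, subject_contains="meeting reminder"),
]
```

Because the first match wins, listing archival rules (such as those for digital ephemera) before routine ones keeps a message that matches both from landing in a folder with automatic disposal.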

Perform staff interviews.
Establish and

Related Documents:
Aid4Mail Workflow
Review of Tools

3. Identify individuals with archival accounts.
A review of roles within each department helps identify individual accounts holding significant mail to be archived as a collection. At BAM, the criterion was executive staff position, after it was confirmed that the most important decisions within the institution would be discussed with the four employees selected for collection; all other production information would be captured through material submitted in final contracts and reports. SRGM offered more complex examples in senior Curatorial and Conservation staff, whose correspondence documents critical information about artworks and the collection and does not map strictly onto executive positions at the foundation level. This contrast demonstrated the importance of reviewing the functions and roles within a department to determine whether higher-level staff produce significant records for archival purposes, or whether their records serve routine business operations and need not be sent to the Archives.

This step requires a secondary action: establishing a workflow for regular ingests, preferably at least annually, to avoid loss of data over long spans of time. BAM encountered this issue when two prior server migrations caused errors in exporting a nine-year backlog of mail produced by one staff member. Exporting mail annually produces a PST file for the Archives, provides an opportunity to review staff turnover, and offers a chance to collect contextual metadata, such as contacts within the producer’s social network that should be specifically removed from a collection.

Collect annually

Related Documents:
Aid4Mail Workflow

4. Determine a standard retention period for general correspondence.
Previously at BAM, the standard policy for email disposition was seven years after creation for senior-level positions and three years for all other employees. This policy was driven by storage constraints that required regular removal of records to free space, though IT enacted it inconsistently. The 50 GB of storage per account provided through the migration to Office 365 allows retention periods to be more flexible. Several factors led to setting a single automated removal period of seven years for all staff. In addition to the storage now being available, a unified period is easier for IT to carry out and accounts for promotions within a department or transfers across teams. Executive accounts collected annually by the Archives will not have a set disposal date and will be the only accounts exempt from this rule, unless a department submits a request to IT for its disposition period to be changed or removed.
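As a rough illustration of how such a policy might be stated precisely for IT, the sketch below computes the date on which a message becomes eligible for deletion. The seven-year default and the executive exemption come from the policy above; the account and department names, the `department_overrides` mapping, and the function names are hypothetical, and in practice a rule like this would more likely be configured through the mail platform’s own retention settings than in custom code.

```python
from __future__ import annotations

from datetime import date

RETENTION_YEARS = 7  # unified period for all staff, per the policy described above


def add_years(start: date, years: int) -> date:
    """Shift a date by whole years, moving Feb 29 to Feb 28 when needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:  # Feb 29 does not exist in the target year
        return start.replace(year=start.year + years, day=28)


def disposal_date(created: date, account: str, department: str,
                  exempt_accounts: set[str],
                  department_overrides: dict[str, int | None] | None = None) -> date | None:
    """Return when a message may be deleted, or None if it is never deleted automatically."""
    if account in exempt_accounts:  # e.g. executive accounts collected by the Archives
        return None
    overrides = department_overrides or {}
    years = overrides.get(department, RETENTION_YEARS)
    if years is None:  # a department whose automatic disposal was removed on request
        return None
    return add_years(created, years)
```

Treating executive accounts as an explicit exemption list, rather than as a special department, mirrors the policy’s distinction between the accounts the Archives collects and the general rule applied to everyone else.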

SRGM’s IT department and Archives have worked collaboratively in recent years to enhance digital preservation within the institution, addressing more seamless internal file sharing, cloud storage, and robust data governance. During this project, both departments expressed a desire to move forward with a migration to Office 365 to assist with email management and removal. In the last two months of the METRO fellowship, the Archives submitted an IMLS grant proposal requesting support for this endeavor.

Collect annually

Related Documents:
Brooklyn Academy of Music Records Retention Schedule: Addendum on Email
Solomon R. Guggenheim Museum Retention Schedule: Addendum on Email

5. Distribute guidelines for record management.
One solution for sharing tips and information with staff about managing email, and for general assistance with born-digital records, was to create a web-based ‘digital archives handbook’ for BAM. This offers a centralized, interactive resource where employees can access all information related to institutional records, such as the retention schedule, contact details for the appointed records coordinators in each department, and a history of digital archives at the institution. As an online resource, the handbook can be linked from various access points, such as the staff social media platform, Yammer, or the Human Resources Staff Handbook. The site was created as an open source manual on GitHub to address the needs of both the Archives and staff. Version control will allow the Archives to track changes to the retention schedule through an annotated history of updates. It also ensures that staff always read the most up-to-date version of each document without relying on the Archives to circulate new items. The manual may also offer incoming Archives interns an opportunity to practice web development while contributing to evolving workflows for born-digital material.

Create a centralized resource for all documents related to records management as a way to share workflows, retention periods, and transparent collecting policies.

Related Documents:


Cocciolo, A. (2014). Pilot 9: Problem records: Preserving significant e-mail correspondence. Retrieved from:

Cocciolo, A. (2016). Email as cultural heritage resource: Appraisal solutions from an art museum context. Records Management Journal, 26(1).

Elings, M. (2016). Using NLP to support dynamic arrangement, description, and discovery of born-digital collections: The ArchExtract Experiment. BloggERS! Retrieved from:

Hangal, S., Chan, P., Piratla, V., Edwards, G., Manovit, C., & Lam, M. S. (2015). Historical research using email archives. CHI EA ’15 Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, 735-742.

Hidalgo, C., Jagdish, D., Smilkov, D. (2011). Immersion by MIT media lab. Retrieved from:

Huth, G. (2016). Module 14: Appraising digital records. In M. J. Shallcross & C. J. Prom (Eds.), Appraisal and acquisition strategies. Society of American Archivists.

Pennock, M. (2006). Curating E-Mails: A life-cycle approach to the management and preservation of e-mail messages. University of Bath, Bath, U.K. Retrieved from: (accessed 21 April, 2017).

Prom, C. J. (2011). Preserving email. Digital Preservation Coalition. Heslington, U.K. Retrieved from:

Shunaman, E. (2016). Practical email archiving for cultural institutions. Retrieved from:

Sloss, K. (2016). Making born digital records management easy. Retrieved from:

Viegas, F. B., Golder, S., Donath, J. (2006). Visualizing email content: Portraying relationships from conversational histories. CHI ’06: Proceedings of the SIGCHI conference on Human Factors in computing systems. Retrieved from:

Further Reading

Chan, P., & Schneider, J. (2016, May 3). Let the entities describe themselves [Blog post]. bloggERS!

Cronin, B. (2015). New email archive tool to sift literary legacies. Wall Street Journal. Retrieved from:

ePADD Presentations and Publications. (2011-2016). Stanford University Libraries.

InterPares 3: General Study 05 – Keeping and Preserving E-mail. Final Report 2009:

Moser, B. (2014). In the Sontag archives. New York Times.

Owens, T. (2014, October 20). The ePADD team on processing and accessing email archives [Blog post]. The Signal.

Schneider, J. (2015). ePADD: A new platform for conducting DH research on email correspondence. Retrieved from:

White, A. (2016). Politics, transparency, and email: Lessons learned from trying to preserve the historical record [Blog post]. bloggERS! Retrieved from:

Zalinger, B., et al. (2013). Reading Ben Shneiderman’s email: Identifying narrative elements in email archives. Personal Archiving: Preserving Our Digital Heritage. Medford, New Jersey.
