Research Guides: Data and Statistics: Archival Government Data

Finding Archival Government Data, List from Santa Fe Community College

Data Rescue Project
A volunteer-driven initiative that mobilized librarians, scientists, and technologists to identify and preserve vulnerable U.S. government data, particularly environmental and climate-related datasets, during the 2016–2017 federal transition. It organized nationwide "DataRescue" events to archive datasets at risk of being altered or deleted, working in partnership with the Internet Archive and Data Refuge to ensure long-term public access.
Internet Archive's Wayback Machine
The largest and oldest web archive in the world, dating back to 1996, with over 916 billion archived web pages. Users can enter any URL to view historical snapshots of websites, including government pages that have been removed. The "Save Page Now" feature allows anyone to manually archive current pages before they're potentially removed. The Wayback Machine has been actively documenting the current administration's website removals.
End of Term Web Archive
A collaborative project specifically designed to preserve U.S. government websites during presidential transitions. Partners include the Internet Archive, Library of Congress, University of North Texas, Stanford University, and several other institutions. The 2024/2025 crawl has already collected more than 500 terabytes of data including over 100 million unique web pages. This project has been preserving transition data since 2008 and is especially valuable during the current administration's extensive content removal.
Environmental Data & Governance Initiative (EDGI)
A nonprofit organization specifically focused on preserving federal environmental and climate datasets that may be at risk of removal. EDGI collaborates with the Internet Archive and accepts public nominations for data to preserve, making it particularly relevant as environmental policy information has been targeted for removal by the current administration.
CDCGuidelines.com
An independent archive specifically preserving Centers for Disease Control and Prevention (CDC) content. This is particularly valuable as CDC pages were among the first targeted for removal in January 2025 following executive orders. The site preserves public health guidelines and recommendations that may no longer be available on official government websites.
Data Liberation Project
A collaborative initiative run by MuckRock and Big Local News that identifies and reformats government datasets of public interest. They specifically target data that has been obscured or removed from public access, making their work particularly relevant during the current administration's information removal initiatives.
Harvard Dataverse
An academic repository where researchers like Jack Cushman are actively archiving government datasets that have been deleted from official sources. The platform provides permanent access to these preserved datasets and includes robust citation information for academic and journalistic purposes.
Environmental Data & Governance Initiative (EDGI)
An activist academic collaborative focused on preserving environmental and climate data and ensuring government accountability. EDGI monitors changes to federal environmental websites and openly archives vulnerable scientific data sets from agencies like EPA, NOAA, NASA, etc. Founded during the transition to the Trump administration, EDGI works to safeguard the “environmental right to know” by building tools and networks to save datasets, web pages, and databases before they can be removed or censored. Their archive efforts have captured things like climate databases, pollution data, and environmental justice tools that were at risk of deletion, making these resources accessible through their site and partner repositories.
Data Refuge
A public, collaborative project (initiated by the University of Pennsylvania’s Program in Environmental Humanities) created to archive federal climate and environmental data in danger of being lost. Data Refuge mobilized scientists, librarians, and citizen volunteers starting in late 2016 to identify important datasets (from EPA, NOAA, NASA, etc.), especially those not easily captured by web crawlers, and save them to a secure repository. The project developed workflows to download “uncrawlable” data (such as large databases, datasets behind interactive tools, or scientific PDFs) and host them in a public data catalog. By creating research-quality copies of these federal datasets and working alongside the Internet Archive’s End of Term Harvest, Data Refuge ensured that climate data and other scientific information remained available to researchers and the public despite any removal from official websites.
DataLumos (ICPSR)
A crowd-sourced archive for valuable government datasets, hosted by the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. DataLumos invites the public, researchers, and librarians to deposit copies of federal data resources that they fear may disappear, as well as to suggest important data for ICPSR to preserve. The repository is part of ICPSR’s long-standing mission to safeguard and disseminate social science and government data. It has accepted data ranging from education statistics and economic indicators to FEMA emergency management data and museum records, making these datasets available even if the source agencies revise or remove them. DataLumos became a primary home for datasets rescued by the Data Rescue movement, providing stable, citable access to the files.
Climate Mirror
A volunteer-driven project to mirror and back up federal climate change data. Launched via a grassroots effort in late 2016, Climate Mirror distributed the task of copying critical climate and environmental datasets to multiple servers around the world in anticipation of data suppression under the Trump administration. The project works in conjunction with universities (like Penn and University of Toronto) and the Internet Archive, and it stores copies of datasets from NOAA, NASA, EPA, and other agencies. By using a network of mirrored sites and even torrents, Climate Mirror ensures that climate data (e.g. temperature records, climate models, environmental databases) remain publicly available even if the original government webpages are removed or altered. This redundancy protects against any single point of failure or deletion.