    Repositories list

    • pplc
      Public
      Pandemic program link checker
      Python
      MIT License
      Updated Oct 31, 2024
    • Database views and scripts that support the Mosaic LLM project
      Shell
      Updated Oct 31, 2024
    • Corpus-specific schema objects for UN Archives metadata and text
      Updated Oct 31, 2024
    • Course materials from the 2023 & 2024 Archiving Digital Records track of the Archives as Data Summer Institute.
      Updated Oct 30, 2024
    • REST API for Freedom of Information Archive (FOIArchive)
      Python
      Updated Oct 26, 2024
    • History Lab COVID-19 Archive Prototype
      Python
      MIT License
      Updated Oct 25, 2024
    • Streamlit for FOIArchive search GUI
      Python
      MIT License
      Updated Oct 24, 2024
    • Scripts for updating corpus-specific topic models in the FOIArchive database.
      Shell
      MIT License
      Updated Oct 21, 2024
    • SQL scripts for dumping FOIArchive data to CSV
      Updated Sep 27, 2024
    • Scripts, configuration and examples for the PostgREST proof of concept
      PLpgSQL
      Updated Sep 18, 2024
    • Utility that takes a FOIArchive database SQL query as input and produces the result set in a JSON file.
      Python
      Updated Sep 4, 2024
    • Jupyter Notebook
      Updated May 20, 2024
    • Example of querying the FOIArchive REST API via a Python program
      Jupyter Notebook
      MIT License
      Updated Feb 27, 2024
    • Research project investigating OCR evaluation mechanisms at Columbia's History Lab.
      Python
      Updated Feb 13, 2024
    • Downloads PDFs, stores the text in the FOIArchive database, and keeps a copy in an S3 bucket
      Python
      MIT License
      Updated Jan 28, 2024
    • Scripts for preprocessing and loading metadata and text for the History Lab-Muckrock COVID-19 Collection
      Python
      MIT License
      Updated Oct 16, 2023
    • piir-eval
      Public
      Framework for PII redaction evaluation
      PLpgSQL
      Updated Apr 28, 2023
    • piir-gui
      Public
      Streamlit web GUI for PII redaction POC
      Python
      MIT License
      Updated Nov 23, 2022
    • Script for determining the primary language of a document in a FOIArchive collection
      Python
      MIT License
      Updated Jun 29, 2022
    • Finds dates in the first N characters of a FOIArchive document. Useful for finding or confirming a document date.
      Python
      MIT License
      Updated Jun 29, 2022
    • Java
      Updated Jun 21, 2022
    • Harvests archive metadata via the OAI-PMH API.
      Python
      MIT License
      Updated May 3, 2022
    • Streamlit interface to pdf2mbox
      Python
      MIT License
      Updated Mar 22, 2022
    • xmpdf
      Public
      A Python module for extracting emails from a PDF.
      Python
      MIT License
      Updated Mar 22, 2022
    • pdf2mbox
      Public
      A command-line utility and Python package for converting PDF emails to MBOX format
      Python
      MIT License
      Updated Mar 22, 2022
    • Jupyter Notebook
      Updated Aug 22, 2019
    • cabinet
      Public
      Updated Apr 3, 2018
    • Updated Jul 21, 2017
    • Scraper for State Department consular names and positions
      Python
      Updated Apr 28, 2014