Impresso is a Python library for interacting with the Impresso dataset. It provides a set of classes for working with the Impresso Public API and a set of tools that make working with the data easier.
Install with pip:
pip install impresso
See the sample notebooks in the examples/notebooks directory, or the examples available in the Impresso Datalab.
We use Poetry for dependency management. To install the package in development mode, run the following command in the root directory of the project:
poetry install
This creates a virtual environment (if one does not exist yet) and installs all the dependencies into it. Prefix commands with poetry run to execute them inside that environment.
To run the test suite:
poetry run pytest
To run the linter and the type checker:
poetry run flake8 impresso tests
poetry run mypy impresso tests
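The test, lint and type-check commands above can also be chained from a single script. This is a minimal sketch of a hypothetical helper, run_checks, which is not part of the repository: it runs each command in order and stops at the first failure.

```python
import subprocess
import sys
from typing import List, Sequence

# The development commands listed above, in the order they are usually run.
CHECKS: List[List[str]] = [
    ["poetry", "run", "pytest"],
    ["poetry", "run", "flake8", "impresso", "tests"],
    ["poetry", "run", "mypy", "impresso", "tests"],
]


def run_checks(commands: Sequence[Sequence[str]] = CHECKS) -> int:
    """Run each command in turn and stop at the first failure (hypothetical helper).

    Returns 0 if every command succeeds, 127 if an executable is missing,
    otherwise the exit code of the first failing command.
    """
    for command in commands:
        print("$", " ".join(command))
        try:
            result = subprocess.run(list(command), check=False)
        except FileNotFoundError:
            # The executable (e.g. poetry) is not on PATH.
            print(f"command not found: {command[0]}", file=sys.stderr)
            return 127
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling run_checks() from the project root runs the three commands in sequence; passing its return value to sys.exit makes the script fail with the same code as the first failing command.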
The OpenAPI client is generated with the OpenAPI Generator, together with Pydantic models derived from the OpenAPI spec. The following command generates both the client code and the Pydantic models:
poetry run generate-client
Whenever the OpenAPI spec of the Impresso Public API changes, the client code and the Pydantic models must be regenerated.
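One way to notice that the spec has changed is to keep a fingerprint of the spec the client was last generated from. This is a minimal sketch assuming you keep a local copy of the spec as a file; the file names and the helpers fingerprint, spec_changed and record_spec are hypothetical and not part of the project.

```python
import hashlib
from pathlib import Path


def fingerprint(spec_path: Path) -> str:
    """Return the SHA-256 hex digest of the spec file's bytes."""
    return hashlib.sha256(spec_path.read_bytes()).hexdigest()


def spec_changed(spec_path: Path, stamp_path: Path) -> bool:
    """Return True if the spec differs from the one recorded in stamp_path.

    A missing stamp file counts as "changed", so the first check always
    asks for a regeneration.
    """
    if not stamp_path.exists():
        return True
    return stamp_path.read_text().strip() != fingerprint(spec_path)


def record_spec(spec_path: Path, stamp_path: Path) -> None:
    """Store the current fingerprint after the client has been regenerated."""
    stamp_path.write_text(fingerprint(spec_path) + "\n")
```

If spec_changed(...) returns True, run poetry run generate-client and then call record_spec(...) so the next check compares against the freshly generated client.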
Filters used in some endpoints are serialized as a protobuf message. The protobuf message is defined in the impresso-jscommons project. The Python code is generated with the protoc compiler, which must be installed separately. The following command generates the Python code for the protobuf message:
poetry run generate-protobuf
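Because generate-protobuf relies on a separately installed protoc, it can help to check for the compiler before running it. This is a minimal sketch using only the standard library; the helper name protoc_version is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def protoc_version() -> Optional[str]:
    """Return protoc's version string, or None if protoc is not on PATH."""
    protoc = shutil.which("protoc")
    if protoc is None:
        return None
    # `protoc --version` prints a line such as "libprotoc <version>".
    result = subprocess.run(
        [protoc, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


version = protoc_version()
if version is None:
    print("protoc not found: install it before running generate-protobuf")
else:
    print(f"found {version}")
```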
Impresso - Media Monitoring of the Past is an interdisciplinary research project that aims to develop and consolidate tools for processing and exploring large collections of media archives across modalities, time, languages and national borders. The first project (2017-2021) was funded by the Swiss National Science Foundation under grant No. CRSII5_173719 and the second project (2023-2027) by the SNSF under grant No. CRSII5_213585 and the Luxembourg National Research Fund under grant No. 17498891.
Copyright (C) 2024 The Impresso team.
This program is provided as open source under the GNU Affero General Public License v3 or later.