Open Source Observer is a free analytics suite that helps funders measure the impact of open source software contributions to the health of their ecosystem.
- /apps: The OSO apps
- /docker: Docker files
- /lib: Common libraries
  - /oss-artifact-validators: Simple library to validate different properties of an "artifact"
- /warehouse: All code specific to the data warehouse
  - /dbt: dbt configuration
  - /oso_dagster: Dagster configuration for orchestrating software-defined assets
  - /oso_lets_go: Utility for setting up dbt with Google Cloud
  - Also contains other tools to manage warehouse pipelines
- /ops: Our ops related code
  - /external-prs: GitHub app for validating pull requests
  - /k8s-*: Kubernetes configuration
  - /tf-modules: Terraform modules
Before you begin you'll need the following on your system:
- Node >= 20 (we suggest installing with nvm)
- pnpm >= 9 (see here)
- Python >= 3.11 (see here)
- Python Poetry >= 1.8 (see here)
- git (see here)
- BigQuery access (see here if you don't have it set up already)
- gcloud (see here)
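The version requirements above can be checked mechanically before continuing. The following is a minimal sketch and not part of the repository: the helper names extract_version, version_ge, check_tool, and check_prereqs are hypothetical, it assumes every tool answers --version, and it assumes the Python interpreter is installed as python3. BigQuery access is not checked, and git and gcloud are only checked for presence.

```shell
#!/bin/sh
# Hypothetical prerequisite check; all helper names are illustrative only.

# extract_version TEXT: print the first dotted number found in TEXT.
extract_version() {
  printf '%s\n' "$1" |
    sed -n 's/[^0-9]*\([0-9][0-9]*\(\.[0-9][0-9]*\)*\).*/\1/p' |
    head -n 1
}

# version_ge HAVE WANT: succeed when HAVE >= WANT, comparing up to three
# numeric fields and treating missing fields as 0.
version_ge() {
  vg_have="$1." vg_want="$2."
  vg_i=0
  while [ "$vg_i" -lt 3 ]; do
    vg_h=${vg_have%%.*}; vg_have=${vg_have#*.}
    vg_w=${vg_want%%.*}; vg_want=${vg_want#*.}
    if [ "${vg_h:-0}" -gt "${vg_w:-0}" ]; then return 0; fi
    if [ "${vg_h:-0}" -lt "${vg_w:-0}" ]; then return 1; fi
    vg_i=$((vg_i + 1))
  done
  return 0
}

# check_tool NAME MIN: report whether NAME is installed at version >= MIN.
check_tool() {
  ct_name=$1 ct_min=$2
  if ! command -v "$ct_name" >/dev/null 2>&1; then
    echo "MISSING  $ct_name (need >= $ct_min)"
    return 1
  fi
  ct_ver=$(extract_version "$("$ct_name" --version 2>&1 | head -n 1)")
  if [ -n "$ct_ver" ] && version_ge "$ct_ver" "$ct_min"; then
    echo "OK       $ct_name $ct_ver"
  else
    echo "TOO OLD  $ct_name ${ct_ver:-unknown} (need >= $ct_min)"
    return 1
  fi
}

# check_prereqs: run every check and fail if any single one failed.
check_prereqs() {
  cp_failed=0
  check_tool node 20 || cp_failed=1
  check_tool pnpm 9 || cp_failed=1
  check_tool python3 3.11 || cp_failed=1   # assumption: binary is python3
  check_tool poetry 1.8 || cp_failed=1
  check_tool git 0 || cp_failed=1          # presence only
  check_tool gcloud 0 || cp_failed=1       # presence only
  return "$cp_failed"
}

check_prereqs || echo "Install or upgrade the tools flagged above."
```

Each tool is reported on its own line, so a single missing dependency does not hide the others.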
First, authenticate with gcloud:
gcloud auth application-default login
Then install Node.js dependencies:
pnpm install
Also install the Python dependencies:
poetry install
You will also need to set up dbt to connect to Google BigQuery for running the data pipeline. The following wizard will copy a small playground dataset to your personal Google account and set up dbt for you.
poetry run oso_lets_go
:::tip
The script is idempotent, so you can safely run it again if you encounter any issues.
:::
First, make sure the environment variables are set for ./apps/frontend. Take a look at ./apps/frontend/.env.local.example for the complete list.
- You can either set these yourself (e.g. in CI/CD)
- or copy the file to .env.local and populate it.
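The copy option can be scripted so that an existing .env.local is never overwritten. This is a minimal sketch, not a script shipped with the repository; the function name ensure_env_file is hypothetical, and it only assumes the two file names mentioned above.

```shell
#!/bin/sh
# ensure_env_file DIR: create DIR/.env.local from DIR/.env.local.example
# unless it already exists, so values you already filled in are kept.
# (Hypothetical helper, for illustration only.)
ensure_env_file() {
  ee_dir=$1
  if [ -f "$ee_dir/.env.local" ]; then
    echo "$ee_dir/.env.local already exists, leaving it untouched"
  elif [ -f "$ee_dir/.env.local.example" ]; then
    cp "$ee_dir/.env.local.example" "$ee_dir/.env.local"
    echo "Created $ee_dir/.env.local, now fill in the values"
  else
    echo "No .env.local.example found in $ee_dir" >&2
    return 1
  fi
}
```

From the repository root this would be called as ensure_env_file ./apps/frontend.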
Then, to do a turbo build of all apps, run the following:
pnpm install
pnpm build
The resulting static site can be found in ./build/.
If you've already run the build, you can use pnpm serve to serve the built files.
To run a dev server that watches for changes across code and Plasmic, run:
pnpm dev:frontend
Our datasets are public! If you'd like to use them directly as opposed to adding to our dbt models, check out our docs!
Once installation has completed you can enter the poetry environment.
$ poetry shell
From here you should have dbt on your path.
$ which dbt
This should return something like opensource-observer/oso/.venv/bin/dbt
If you have write access to the dataset then you can connect to it by setting the opensource_observer profile in dbt. Inside ~/.dbt/profiles.yml (create it if it isn't there), add the following:
opensource_observer:
  outputs:
    production:
      type: bigquery
      dataset: oso
      job_execution_time_seconds: 300
      job_retries: 1
      location: US
      method: oauth
      project: opensource-observer
      threads: 32
    playground:
      type: bigquery
      dataset: oso_playground
      job_execution_time_seconds: 300
      job_retries: 1
      location: US
      method: oauth
      project: opensource-observer
      threads: 32
  # By default we target the playground. It's less costly and also safer to write
  # there while developing
  target: playground
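Before running dbt, it can help to confirm that the profile actually landed in the file dbt will read. The sketch below is an illustration only: the function name profile_defined is hypothetical, and the check is a plain text match on a top-level key, not a YAML parse. It relies on dbt's DBT_PROFILES_DIR environment variable taking precedence over ~/.dbt when set.

```shell
#!/bin/sh
# profile_defined NAME FILE: succeed when FILE has a top-level "NAME:" key.
# (Hypothetical helper; a text match only, it does not validate the YAML.)
profile_defined() {
  [ -f "$2" ] && grep -q "^$1:" "$2"
}

# DBT_PROFILES_DIR overrides the default ~/.dbt location when it is set.
profiles_file="${DBT_PROFILES_DIR:-$HOME/.dbt}/profiles.yml"

if profile_defined opensource_observer "$profiles_file"; then
  echo "opensource_observer profile found in $profiles_file"
else
  echo "opensource_observer profile not found in $profiles_file"
fi
```

Once the profile is found, running dbt debug --target playground from the poetry environment is the usual way to confirm that dbt can reach BigQuery with it.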
The Power User for dbt core extension is pretty helpful.
You'll need the path to your poetry environment, which you can get by running
poetry env info --path
Then in VS Code:
- Install the extension
- Open the command palette, enter "Python: Select Interpreter"
- Select "Enter interpreter path..."
- Enter the path from the poetry command above
Check that you have a little check mark next to "dbt" in the bottom bar.
Once you've updated any models you can run dbt within the poetry environment by simply calling:
$ dbt run
:::tip
Note: If you configured the dbt profile as shown in this document, this dbt run will write to the opensource-observer.oso_playground dataset.
:::
It is likely best to target a specific model so things don't take so long on some of our materializations:
$ dbt run --select {name_of_the_model}
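dbt's node selection syntax also accepts graph operators, which is useful when the model you changed depends on parents that have not been built in your playground yet. The sketch below only prints the command instead of running it. The wrapper name dbt_run_cmd and the model name my_model are placeholders, and the playground target assumes the profile described in this document.

```shell
#!/bin/sh
# dbt_run_cmd MODEL [with-parents]: print a dbt command for a single model.
# A leading "+" asks dbt to also run everything the model depends on.
# (Hypothetical helper; it prints the command and never invokes dbt.)
dbt_run_cmd() {
  dr_model=${1:-}
  if [ -z "$dr_model" ]; then
    echo "usage: dbt_run_cmd MODEL [with-parents]" >&2
    return 1
  fi
  if [ "${2:-}" = "with-parents" ]; then
    dr_model="+$dr_model"
  fi
  printf 'dbt run --target playground --select %s\n' "$dr_model"
}

dbt_run_cmd my_model               # dbt run --target playground --select my_model
dbt_run_cmd my_model with-parents  # dbt run --target playground --select +my_model
```

Passing --target explicitly guards against writing to production if the default target in the profile is ever changed.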
For setup and common operations for each subproject, navigate into the respective directory and check out the README.md.
You can also find some operations guides on our documentation.
The code and documentation in this repository are released under Apache 2.0 (see LICENSE).
This repository does not contain data. Datasets may include material that may be subject to third party rights. For details on each dataset, see the Data Overview.