[daggy-u] - Misc. formatting changes #17287

Merged
merged 2 commits into from
Sep 18, 2024
@@ -56,63 +56,63 @@ The columns in the following table are as follows:

---

- **README.md**
- `README.md`
- Python
- A description and starter guide for the Dagster project.

---

- **dagster_university/**
- `dagster_university/`
- Dagster
- A Python module that will contain your Dagster code. This directory also contains the following:
- `__init__.py` - This file includes a `Definitions` object that defines what is loaded in your project, such as assets and sensors. This allows Dagster to load the definitions in a module. We’ll discuss this topic, and this file, later in this course.
- Several directories, for example: `/assets`. These directories follow our recommended best practices and will be used to contain the definitions - like assets - you create in the following lessons. We’ll discuss the files they contain later, too.

---

- **dagster_university/__init__.py**
- `dagster_university/__init__.py`
- Dagster
- Each Python module has an `__init__.py`. This root-level `__init__.py` is specifically used to import and combine the different aspects of your Dagster project. This is called defining your Code Location. You’ll learn more about this in a future lesson.

---

- **dagster_university/assets/constants.py**
- `dagster_university/assets/constants.py`
- Dagster U
- A pre-made file with some string constants that you’ll reference for convenience.

---

- **dagster_university_tests/**
- `dagster_university_tests/`
- Dagster
- A Python module that contains unit tests for `dagster_university`.

---

- **data/**
- `data/`
- Dagster U
- This directory (and directories within it) is where you’ll store the data assets you’ll make during this course. In production settings, this could be Amazon S3 or a data warehouse.

---

- **.env**
- `.env`
- Python
- A text file containing pre-configured environment variables. We’ll talk more about this file in Lesson 6, when we cover connecting to external services.

---

- **pyproject.toml**
- `pyproject.toml`
- Python
- A file that specifies package core metadata in a static, tool-agnostic way. This file includes a `tool.dagster` section which references the Python module with your Dagster definitions defined and discoverable at the top level. This allows you to use the `dagster dev` command to load your Dagster code without any parameters.

---

- **setup.py**
- `setup.py`
- Python
- A build script that defines your new project as a Python package. This file is used to specify the package's dependencies.

---

- **setup.cfg**
- `setup.cfg`
- Python
- A file that contains option defaults for `setup.py` commands.

@@ -12,19 +12,23 @@ To practice what you’ve learned, partition the `taxi_trips` asset by month usi

- With every partition, insert the new data into the `taxi_trips` table

- For convenience, add a `partition_date` column to represent which partition the record was inserted from. You’ll need to drop the existing `taxi_trips` because of the new `partition_date` column. In a Python REPL or scratch script, run the following:
- For convenience, add a `partition_date` column to represent which partition the record was inserted from.

{% callout %}
You’ll need to drop the existing `taxi_trips` because of the new `partition_date` column. In a Python REPL or scratch script, run the following:

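```python
import duckdb

# Connect to the local DuckDB database used in this course
conn = duckdb.connect(database="data/staging/data.duckdb")
# Drop the existing table so it can be recreated with the new partition_date column
conn.execute("drop table trips;")
```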
{% /callout %}

- Because the `taxi_trips` table will exist after the first partition materializes, the SQL query will have to change

- In this asset, you’ll need to do three actions:
- Create the `taxi_trips` table if it doesn’t already exist
- Delete any old data from that `partition_date` to prevent duplicates when backfilling
- Delete any old data from `partition_date` to prevent duplicates when backfilling
- Insert new records from the month’s parquet file
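
The three actions above can be sketched as a small, idempotent load step. This is only an illustration of the create / delete / insert pattern, not the course's solution: it uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs without extra dependencies, and the `load_partition` helper, the column names, and the sample rows are all hypothetical. In the real asset the rows would come from the month's parquet file.

```python
import sqlite3


def load_partition(conn, partition_date, rows):
    """Load one month's rows so that re-running the same partition is safe.

    Hypothetical helper: `rows` is a list of (pickup_datetime, total_amount)
    tuples standing in for the contents of the month's parquet file.
    """
    # 1. Create the table if it doesn't already exist.
    conn.execute(
        """
        create table if not exists taxi_trips (
            pickup_datetime text,
            total_amount real,
            partition_date text
        )
        """
    )
    # 2. Delete anything previously loaded for this partition,
    #    so a backfill doesn't produce duplicates.
    conn.execute(
        "delete from taxi_trips where partition_date = ?", (partition_date,)
    )
    # 3. Insert the new records, tagging each with its partition.
    conn.executemany(
        "insert into taxi_trips values (?, ?, ?)",
        [(pickup, amount, partition_date) for pickup, amount in rows],
    )
    conn.commit()


# In-memory database and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
march_rows = [("2023-03-01 08:00:00", 12.5), ("2023-03-02 09:30:00", 7.0)]

load_partition(conn, "2023-03-01", march_rows)
load_partition(conn, "2023-03-01", march_rows)  # simulate re-running a backfill

row_count = conn.execute("select count(*) from taxi_trips").fetchone()[0]
print(row_count)
```

Because the delete is scoped to a single `partition_date`, materializing the same month twice leaves the row count unchanged, while rows from other months are untouched.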

---