Adding source freshness reading and adding tests #2

ethanfuerst · 2023-04-06T14:45:05Z

creating requirements.txt and adding venv/ to .gitignore
breaking up text processing into methods, which makes it easier to read
using regex to make dbt core logs and dbt source logs work
formatting with black
adding tests with pytest - just run pytest in the terminal

ethanfuerst · 2023-04-06T19:20:38Z

While following the logic that breaks up a log line, I realized that we could probably break up the logs dynamically with regex. I used regex101 to test and build the regex; you can see the logic behind it here. For a future PR, I'd like to capture test cases as well, because they aren't captured in the current regex string! (You can see that in the link.)

foundinblank

Thank you so much for submitting this PR! 😍 I really like a lot of the improvements you've suggested here. I just have one major concern about the regex approach vs. the current string-parsing approach, especially because I previously used a regex approach and then switched to string parsing. Happy to discuss the pros and cons of each approach in more detail.

I learned loads about how to set up test cases from this PR, by the way!

foundinblank · 2023-04-18T10:24:15Z

README.md

 * ⏱️ Return runtime duration for each still-running model
 * 📊 Communicate model runtimes for successful models and maybe a cool viz
+* 📝 Add functionality for dbt test and dbt build


I think the script currently does support dbt test and dbt build (it's able to parse output when running views, tables, incrementals, tests, and seeds)

foundinblank · 2023-04-18T10:29:07Z

app.py

-        """
-    10:03:09  1 of 10 START sql table model hyrule.source_quests  [RUN]
+
+REGEX = r"\s*(?:\[0m)?(\d{2}:\d{2}:\d{2})\s+(\d+)\s+of\s+\d+\s+.*?(?:model\s|of\s)\b(\w+\.\w+)\b"


General comment about the regex approach taken here. In the first release of this script I used a regex approach too! Then I noticed that the output was very systematic and that by dropping certain words (extraneous_words), I could parse based on space separators and output into a data frame and do the rest there.

Your PR suggests returning to the regex approach and I'm not sure I want to. Want to walk me through your reasoning some more?
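
To make the trade-off concrete, here is a minimal sketch that runs both approaches on the sample line from the diff. The REGEX string is copied from this PR; the split-based helper and its EXTRANEOUS_WORDS set are assumptions made for illustration and are not the script's actual extraneous_words list.

```python
import re

# Pattern copied verbatim from the diff in this PR.
REGEX = r"\s*(?:\[0m)?(\d{2}:\d{2}:\d{2})\s+(\d+)\s+of\s+\d+\s+.*?(?:model\s|of\s)\b(\w+\.\w+)\b"

# Hypothetical filler words; the real list in app.py may differ.
EXTRANEOUS_WORDS = {"of", "sql", "table", "model"}

LINE = "10:03:09  1 of 10 START sql table model hyrule.source_quests  [RUN]"


def parse_with_regex(line):
    """Return (timestamp, model number, model name), or None if no match."""
    match = re.search(REGEX, line)
    return match.groups() if match else None


def parse_with_split(line):
    """Split on whitespace, then drop the filler words."""
    return [token for token in line.split() if token not in EXTRANEOUS_WORDS]


print(parse_with_regex(LINE))
print(parse_with_split(LINE))
```

The regex returns only the three captured fields, while the split version keeps every remaining token by position, so it relies on all log lines sharing the same shape once the filler words are gone.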

foundinblank · 2023-04-18T10:29:52Z

app.py

-    10:03:09  1 of 10 START sql table model hyrule.source_quests  [RUN]
+
+REGEX = r"\s*(?:\[0m)?(\d{2}:\d{2}:\d{2})\s+(\d+)\s+of\s+\d+\s+.*?(?:model\s|of\s)\b(\w+\.\w+)\b"
+EXAMPLE_LOGS = """10:03:09  1 of 10 START sql table model hyrule.source_quests  [RUN]


Love that idea of assigning all this to a variable! Feels like a "duh"

foundinblank · 2023-04-18T10:31:35Z

app.py

+    except AttributeError:
+        st.error(
+            "The input you provided doesn't look like dbt output. Please check your input and try again."
+        )
+        st.stop()


I love this error!
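
For readers wondering why the handler catches AttributeError: re.search returns None when nothing matches, so calling .groups() on that result raises AttributeError. A minimal sketch of that behaviour, with the Streamlit calls left out and a hypothetical helper name (groups_or_none is not part of the PR):

```python
import re

# Pattern copied verbatim from the diff in this PR.
REGEX = r"\s*(?:\[0m)?(\d{2}:\d{2}:\d{2})\s+(\d+)\s+of\s+\d+\s+.*?(?:model\s|of\s)\b(\w+\.\w+)\b"


def groups_or_none(line):
    """Return the captured groups, or None when the line is not dbt output."""
    try:
        return re.search(REGEX, line).groups()
    except AttributeError:
        # re.search returned None, and None has no .groups() method.
        return None


print(groups_or_none("10:03:09  1 of 10 START sql table model hyrule.source_quests  [RUN]"))
print(groups_or_none("hello world"))
```

In the app itself, the except branch shows st.error and calls st.stop() instead of returning None.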

foundinblank · 2023-04-18T10:32:22Z

app.py

+    try:
+        df["post_regex"] = df["raw_line"].apply(
+            lambda col: re.search(REGEX, col).groups()
+        )


See my previous comment about the regex approach. Help convince me this is better than df = df["raw_line"].str.split(expand=True)!

foundinblank · 2023-04-18T10:33:45Z

app.py

+    still_running = (
+        df.groupby("model_num")["raw_line"]
+        .count()
+        .reset_index()
+        .rename(columns={"raw_line": "count_records"})
+        .query("count_records == 1")["model_num"]
+        .to_list()


Nice way of thinking through this; it's cleaner than joining two subsets.
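
The idea in the snippet above appears to be that a model with a start line but no finish line shows up exactly once in the logs. Below is a pandas-free sketch of the same counting logic, using collections.Counter in place of groupby; the function name and sample data are made up for illustration.

```python
from collections import Counter


def find_still_running(model_nums):
    """Return the model numbers that appear exactly once, in first-seen order."""
    counts = Counter(model_nums)
    return [num for num, count in counts.items() if count == 1]


# One entry per parsed log line: models 1 and 3 logged twice, model 2 only once.
sample = ["1", "2", "1", "3", "3"]
print(find_still_running(sample))
```

Unlike the pandas version, which sorts the group keys by default, this sketch keeps the order in which the models first appear.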

foundinblank · 2023-04-18T10:34:07Z

app.py

+if __name__ == "__main__":
+    main()


What does this do?
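
A short illustration of what the guard does: Python sets __name__ to "__main__" only in the file that is executed directly, so the guarded call runs then and is skipped when the file is imported by another module (for example by a test file, assuming the tests import app.py). The main() body below is a placeholder, not the app's real one.

```python
def main():
    """Placeholder entry point; the real main() in app.py builds the Streamlit page."""
    return "app started"


if __name__ == "__main__":
    # Runs only when this file is executed directly, not when it is imported.
    print(main())
```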

foundinblank · 2023-04-18T10:34:37Z

app.py

+def main():
+    '''
+    Run the app
+    '''
+    title_and_description()
+    raw_input = get_logs()
+    df = clean_input(raw_input)
+    output(df)


Learned a lot from this approach of wrapping up most things in functions, thank you for the demo!

foundinblank · 2023-04-18T10:37:51Z

test_app.py

Is this a script that runs automatically or needs to be invoked manually?
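
Going by the PR description, the tests are invoked manually by running pytest in the terminal. pytest collects files named test_*.py and, inside them, functions whose names start with test_. Here is a minimal sketch of what such a test could look like; the helper and test names are hypothetical and not taken from test_app.py.

```python
import re

# Pattern copied verbatim from the diff in this PR.
REGEX = r"\s*(?:\[0m)?(\d{2}:\d{2}:\d{2})\s+(\d+)\s+of\s+\d+\s+.*?(?:model\s|of\s)\b(\w+\.\w+)\b"


def extract_model_name(line):
    """Hypothetical helper: return the schema.model name, or None if no match."""
    match = re.search(REGEX, line)
    return match.group(3) if match else None


def test_extracts_model_name_from_start_line():
    line = "10:03:09  1 of 10 START sql table model hyrule.source_quests  [RUN]"
    assert extract_model_name(line) == "hyrule.source_quests"


def test_returns_none_for_non_dbt_input():
    assert extract_model_name("hello world") is None
```

Saved in a file such as test_example.py, these functions would be picked up by a plain pytest run from the repository root; a failing assert is reported as a failed test.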

ethanfuerst added 6 commits April 5, 2023 16:58

creating requirements.txt and adding space for virtual environment

7a1dc70

methodizing

77a60a3

regex-ifying to work with dbt core, dbt cloud and dbt source freshness logs

7ef84e6

formatting with black

40bd41b

formatting, error handling and other minor changes

e3bc904

adding planned improvement to readme

5b0dea9

ethanfuerst added 7 commits April 6, 2023 16:25

removing unused column and cleaning up output

f4422aa

adding tests

93d8ed6

adding pycache to gitignore

3bb8e56

removing unused comment

26840aa

updating requirements

8df7eca

linting

6b50026

first pass at tests

7c2691c

ethanfuerst marked this pull request as ready for review April 6, 2023 20:38

ethanfuerst added 4 commits April 6, 2023 16:45

updating requirements

bb70b83

using pytest for tests

a74d2f7

adding assert clause to tests

8420a8f

removing workflows for now

016fc2c

ethanfuerst changed the title ~~Adding source freshness~~ Adding source freshness reading and adding tests Apr 6, 2023

ethanfuerst requested a review from foundinblank April 6, 2023 20:51

ethanfuerst added 2 commits April 6, 2023 16:53

removing unused gitignore

3807537

adding docstrings

c1b7e81

foundinblank reviewed Apr 18, 2023
