[WIP] Port R docs vignettes to epidatpy #40

rzats · 2024-08-16T14:05:39Z

Original vignettes can be found here: https://github.com/cmu-delphi/epidatr/tree/dev/vignettes

Makefile

rzats · 2024-08-16T14:06:18Z

docs/getting_started_with_epidatpy.rst

+Because the output data is a standard Pandas DataFrame, we can easily plot
+it using any of the available Python libraries:
+
+>>> data.plot(x="time_value", y="value", title="Smoothed CLI from Facebook Survey", xlabel="Date", ylabel="CLI")


What other visualization would we like?

This looks pretty good, but let's change this to a plot of multiple lines, one for each of pa, ca, and fl. Hopefully the scale is comparable (otherwise we might need to choose a normalized signal).

Also in the versioned_data.rst, it would be nice to have a plot that demonstrates data revisions where a time series as_of an older versioned is plotted against a more up to date snapshot. There are some plots that demonstrate this in epipredict vignettes (though they do it in the context of forecasting). We don't need to do anything quite so fancy, you just might be able to use the signals used there to generate a simpler plot. Just one signal but at three different as_of_snapshots (maybe a week apart?) and a choice of date and location so that the revision is visible would be good enough.

Added both of these :)

rzats · 2024-08-16T14:07:13Z

epidatpy/_covidcast.py

@@ -72,7 +72,7 @@ def define_covidcast_fields() -> List[EpidataFieldInfo]:
        EpidataFieldInfo("lag", EpidataFieldType.int),
        EpidataFieldInfo("value", EpidataFieldType.float),
        EpidataFieldInfo("stderr", EpidataFieldType.float),
-        EpidataFieldInfo("sample_size", EpidataFieldType.int),
+        EpidataFieldInfo("sample_size", EpidataFieldType.text),


Without this change the second query of docs_smoke_test fails, because in the dataset there's a sample size of O (the letter O, not zero).

😱

Could you isolate that row, get a reproducible API request, and make an issue about it in delphi-epidata? We definitely need to fix it in the database (and where-ever in acquisition that that O got introduced).

Ah, turns out I was dealing with a slightly misleading error message :) This column is supposed to have a float type, as documented here - that was the actual issue.

docs_smoke_test.py

dshemetov

Thanks Rostyslav, this looks great. I made a few requests to fix a few examples, but overall this is excellent. We might just need to let sample_size be type text for now (wow). A better getting_started plot would be good (request in a comment) and a simple plot for revisions would be good (request in a comment). Everything else is looking great though.

A few more things:

please remove the old unfinished docs: signals_covid.rst and covidcast_examples.rst
move the epidatpy reference section below the intro guides in the left bar
the signal-discovery page in epidatr has a table of available endpoints that produces a table like what is now in getting_started.rst... I would like to combine those; could you write a simple available_endpoints() function that pretty prints a table similar to what is there in R and use that function to produce a table in signal_discovery.rst?
let's move the information about epiweeks and dates from getting_started.rst to the bottom of getting_started_with_epidatpy.rst
after that we can remove getting_started.rst

docs/getting_started_with_epidatpy.rst

dshemetov · 2024-08-29T01:28:21Z

docs/getting_started_with_epidatpy.rst

+Because the output data is a standard Pandas DataFrame, we can easily plot
+it using any of the available Python libraries:
+
+>>> data.plot(x="time_value", y="value", title="Smoothed CLI from Facebook Survey", xlabel="Date", ylabel="CLI")


This looks pretty good, but let's change this to a plot of multiple lines, one for each of pa, ca, and fl. Hopefully the scale is comparable (otherwise we might need to choose a normalized signal).

dshemetov · 2024-08-29T01:31:06Z

docs/signal_discovery.rst

+This DataFrame contains the following columns:
+
+- ``source`` - Data source name.
+- ``signal`` - Signal name.
+- ``description`` - Description of the signal.
+- ``reference_signal`` - Geographic level for which this signal is available, such as county, state, msa, hss, hrr, or nation. Most signals are available at multiple geographic levels and will hence be listed in multiple rows with their own metadata.
+- ``license`` - The license
+- ``dua`` - Link to the Data Use Agreement.


fix: could you update the columns listed? They don't match the dataframe above.

docs/signal_discovery.rst

dshemetov · 2024-08-29T01:32:08Z

docs/signal_discovery.rst

+
+- ``data_source`` - Data source name.
+- ``signal`` - Signal name.
+- ``name`` - Name of signal.


Suggested change

- ``name`` - Name of signal.

- ``name`` - Human-readable signal name.

dshemetov · 2024-08-29T01:35:23Z

docs/getting_started_with_epidatpy.rst

+Because the output data is a standard Pandas DataFrame, we can easily plot
+it using any of the available Python libraries:
+
+>>> data.plot(x="time_value", y="value", title="Smoothed CLI from Facebook Survey", xlabel="Date", ylabel="CLI")


Also in the versioned_data.rst, it would be nice to have a plot that demonstrates data revisions where a time series as_of an older versioned is plotted against a more up to date snapshot. There are some plots that demonstrate this in epipredict vignettes (though they do it in the context of forecasting). We don't need to do anything quite so fancy, you just might be able to use the signals used there to generate a simpler plot. Just one signal but at three different as_of_snapshots (maybe a week apart?) and a choice of date and location so that the revision is visible would be good enough.

docs_smoke_test.py

dshemetov · 2024-08-29T01:38:38Z

epidatpy/_covidcast.py

@@ -72,7 +72,7 @@ def define_covidcast_fields() -> List[EpidataFieldInfo]:
        EpidataFieldInfo("lag", EpidataFieldType.int),
        EpidataFieldInfo("value", EpidataFieldType.float),
        EpidataFieldInfo("stderr", EpidataFieldType.float),
-        EpidataFieldInfo("sample_size", EpidataFieldType.int),
+        EpidataFieldInfo("sample_size", EpidataFieldType.text),


😱

Could you isolate that row, get a reproducible API request, and make an issue about it in delphi-epidata? We definitely need to fix it in the database (and where-ever in acquisition that that O got introduced).

rzats · 2024-09-10T17:22:37Z

@dshemetov finished updating this, please take a look again!

One thing that's still a WIP:

the signal-discovery page in epidatr has a table of available endpoints that produces a table like what is now in getting_started.rst... I would like to combine those; could you write a simple available_endpoints() function that pretty prints a table similar to what is there in R and use that function to produce a table in signal_discovery.rst?

Here I used a somewhat similar approach to R, but a bit more complex as Python doesn't have a nice native "table" format.

My solution instead is to:

grab all methods that start with pub_ or pvt_ from AEpiDataEndpoints;
get their docstrings;
combine the methods and the docstrings into a CSV file. I updated make doc so this file is regenerated before the docs;
display this CSV file as a pretty-printed table in the signal discovery doc.

Let me know if this approach is OK, and I can clean up the docstrings themselves. Currently they have a couple of missing values and inconsistent formatting, and don't specify links to the original endpoints either.

dshemetov · 2024-09-10T18:25:38Z

From a comment above: maybe a lot of the work here can be simplified by using Jupyter notebooks directly using https://github.com/spatialaudio/nbsphinx/. Pandas offers quite a lot of configuration options for how to print its tables, see here, maybe you could use that and available_endpoints()?

* switch default venv from env to .venv for uv * update gitignore

dshemetov

This looks great, thanks for setting this all up Rostyslav! I took another pass through all the documentation, made sure all the language flows, and that we we covered all the major features (like caching and signal discovery).

Port R docs vignettes to epidatpy

4443f45

rzats commented Aug 16, 2024

View reviewed changes

Makefile Show resolved Hide resolved

rzats commented Aug 16, 2024

View reviewed changes

docs_smoke_test.py Outdated Show resolved Hide resolved

rzats requested a review from dshemetov August 16, 2024 15:45

Port all initial vignettes

99fee98

dshemetov requested changes Aug 29, 2024

View reviewed changes

rzats added 3 commits September 10, 2024 16:48

Dynamic code generation

af4838c

Visualization

933be09

Final touches

ee6cec9

rzats requested a review from dshemetov September 10, 2024 17:17

rzats and others added 3 commits September 16, 2024 18:40

Port to notebooks

13609b2

newline

f7195ea

doc: iterate on all documentation pages, README

b092dad

* switch default venv from env to .venv for uv * update gitignore

dshemetov force-pushed the rzatserkovnyi/docs branch from 5294fe3 to b092dad Compare October 2, 2024 22:44

dshemetov approved these changes Oct 2, 2024

View reviewed changes

dshemetov merged commit 8d43b5e into dev Oct 2, 2024
1 check passed

dshemetov deleted the rzatserkovnyi/docs branch October 2, 2024 22:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Port R docs vignettes to epidatpy #40

[WIP] Port R docs vignettes to epidatpy #40

rzats commented Aug 16, 2024 •

edited

Loading

rzats Aug 16, 2024

dshemetov Aug 29, 2024

dshemetov Aug 29, 2024 •

edited

Loading

rzats Sep 10, 2024

rzats Aug 16, 2024

dshemetov Aug 29, 2024 •

edited

Loading

rzats Sep 10, 2024

dshemetov left a comment •

edited

Loading

dshemetov Aug 29, 2024

dshemetov Aug 29, 2024

dshemetov Aug 29, 2024

dshemetov Aug 29, 2024 •

edited

Loading

dshemetov Aug 29, 2024 •

edited

Loading

rzats commented Sep 10, 2024 •

edited

Loading

dshemetov commented Sep 10, 2024

dshemetov left a comment

	- ``name`` - Name of signal.
	- ``name`` - Human-readable signal name.

[WIP] Port R docs vignettes to epidatpy #40

[WIP] Port R docs vignettes to epidatpy #40

Conversation

rzats commented Aug 16, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dshemetov Aug 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dshemetov Aug 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dshemetov left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dshemetov Aug 29, 2024 • edited Loading

Choose a reason for hiding this comment

dshemetov Aug 29, 2024 • edited Loading

Choose a reason for hiding this comment

rzats commented Sep 10, 2024 • edited Loading

dshemetov commented Sep 10, 2024

dshemetov left a comment

Choose a reason for hiding this comment

rzats commented Aug 16, 2024 •

edited

Loading

dshemetov Aug 29, 2024 •

edited

Loading

dshemetov Aug 29, 2024 •

edited

Loading

dshemetov left a comment •

edited

Loading

dshemetov Aug 29, 2024 •

edited

Loading

dshemetov Aug 29, 2024 •

edited

Loading

rzats commented Sep 10, 2024 •

edited

Loading