-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add doctests #7145
base: main
Are you sure you want to change the base?
Add doctests #7145
Changes from 10 commits
fa37c79
983e2cb
91dc141
a7ba1c6
05d3772
e0574e0
512a525
624f3f2
5532415
bc1e717
29aaa7b
b86eaca
a302912
23a7d66
31a6c63
6ed9406
06a3cb3
adc3e26
61804a7
519b67e
261dc32
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
name: Links | ||
|
||
on: | ||
repository_dispatch: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is it a repository dispatch workflow? |
||
workflow_dispatch: | ||
schedule: | ||
- cron: "00 18 * * *" | ||
DN6 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
jobs: | ||
linkChecker: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- uses: actions/checkout@v3 | ||
|
||
- name: Link Checker | ||
id: lychee | ||
uses: lycheeverse/lychee-action@v1 | ||
with: | ||
args: './**/*.md' | ||
fail: true | ||
|
||
- name: Create Issue From File | ||
if: env.lychee_exit_code != 0 | ||
uses: diffusers/create-issue-from-file@v4 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would this do? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And where does this workflow live within |
||
with: | ||
title: Link Checker Report | ||
content-filepath: ./lychee/out.md | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would it automatically create the |
||
labels: report, automated issue |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
name: Doctests | ||
|
||
on: | ||
push: | ||
branches: | ||
- doctest* | ||
sayakpaul marked this conversation as resolved.
Show resolved
Hide resolved
|
||
repository_dispatch: | ||
schedule: | ||
- cron: "0 0 * * *" | ||
|
||
env: | ||
HF_HOME: /mnt/cache | ||
RUN_SLOW: yes | ||
OMP_NUM_THREADS: 16 | ||
MKL_NUM_THREADS: 16 | ||
|
||
jobs: | ||
run_doctests: | ||
runs-on: [single-gpu, nvidia-gpu, a10, ci] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I suspect some snippets might OOM out in case they don't have the memory optimization bits enabled? I know we cannot catch it early. Just making a note here. |
||
container: | ||
image: huggingface/diffusers-all-latest-gpu | ||
sayakpaul marked this conversation as resolved.
Show resolved
Hide resolved
|
||
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ | ||
|
||
steps: | ||
- name: Checkout diffusers | ||
uses: actions/checkout@v3 | ||
with: | ||
fetch-depth: 2 | ||
|
||
- name: NVIDIA-SMI | ||
uses: actions/checkout@v3 | ||
run: | | ||
nvidia-smi | ||
|
||
- name: Install dependencies | ||
run: python3 -m pip install -e .[quality,test,training] | ||
|
||
- name: Environment | ||
run: | | ||
python3 utils/print_env.py | ||
|
||
- name: Get doctest files | ||
run: | | ||
$(python3 -c 'from utils.tests_fetcher import get_all_doctest_files; to_test = get_all_doctest_files(); to_test = " ".join(to_test); fp = open("doc_tests.txt", "w"); fp.write(to_test); fp.close()') | ||
sayakpaul marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
- name: Run doctests | ||
env: | ||
HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }} | ||
run: | | ||
python3 -m pytest -v --make-reports doc_tests_gpu --doctest-modules $(cat doc_tests.txt) -sv --doctest-continue-on-failure --doctest-glob="*.md" | ||
|
||
- name: Failure short reports | ||
if: ${{ failure() }} | ||
continue-on-error: true | ||
run: cat reports/doc_tests_gpu/failures_short.txt | ||
|
||
- name: Test suite reports artifacts | ||
if: ${{ always() }} | ||
uses: actions/upload-artifact@v3 | ||
with: | ||
name: doc_tests_gpu_test_reports | ||
path: reports/doc_tests_gpu | ||
|
||
send_results: | ||
name: Send results to webhook | ||
runs-on: ubuntu-22.04 | ||
if: always() | ||
needs: [run_doctests] | ||
steps: | ||
- uses: actions/checkout@v3 | ||
- uses: actions/download-artifact@v3 | ||
- name: Send message to Slack | ||
env: | ||
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }} | ||
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY_DOCS }} | ||
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY_DOCS }} | ||
Comment on lines
+75
to
+76
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just ensuring if we have configured the channels already? |
||
CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to be informed about the dummy tests? |
||
run: | | ||
pip install slack_sdk | ||
python utils/notification_service_doc_tests.py | ||
DN6 marked this conversation as resolved.
Show resolved
Hide resolved
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,185 @@ | ||
import doctest | ||
import inspect | ||
import os | ||
import re | ||
from typing import Iterable | ||
|
||
from .utils import is_pytest_available | ||
|
||
|
||
if is_pytest_available(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this be under |
||
from _pytest.doctest import ( | ||
Module, | ||
_get_checker, | ||
_get_continue_on_failure, | ||
_get_runner, | ||
_is_mocked, | ||
_patch_unwrap_mock_aware, | ||
get_optionflags, | ||
import_path, | ||
) | ||
from _pytest.outcomes import skip | ||
from pytest import DoctestItem | ||
else: | ||
Module = object | ||
DoctestItem = object | ||
|
||
""" | ||
The following contains utils to run the documentation tests without having to overwrite any files. | ||
|
||
The `preprocess_string` function adds `# doctest: +IGNORE_RESULT` markers on the fly anywhere a `load_dataset` call is | ||
made as a print would otherwise fail the corresonding line. | ||
|
||
To skip cuda tests, make sure to call `SKIP_CUDA_DOCTEST=1 pytest --doctest-modules <path_to_files_to_test> | ||
""" | ||
|
||
|
||
def preprocess_string(string, skip_cuda_tests): | ||
"""Prepare a docstring or a `.md` file to be run by doctest. | ||
|
||
The argument `string` would be the whole file content if it is a `.md` file. For a python file, it would be one of | ||
its docstring. In each case, it may contain multiple python code examples. If `skip_cuda_tests` is `True` and a | ||
cuda stuff is detective (with a heuristic), this method will return an empty string so no doctest will be run for | ||
`string`. | ||
""" | ||
codeblock_pattern = r"(```(?:python|py)\s*\n\s*>>> )((?:.*?\n)*?.*?```)" | ||
|
||
codeblocks = re.split(re.compile(codeblock_pattern, flags=re.MULTILINE | re.DOTALL), string) | ||
is_cuda_found = False | ||
for i, codeblock in enumerate(codeblocks): | ||
if "load_dataset(" in codeblock and "# doctest: +IGNORE_RESULT" not in codeblock: | ||
codeblocks[i] = re.sub(r"(>>> .*load_dataset\(.*)", r"\1 # doctest: +IGNORE_RESULT", codeblock) | ||
if ( | ||
(">>>" in codeblock or "..." in codeblock) | ||
and re.search(r"cuda|to\(0\)|device=0", codeblock) | ||
and skip_cuda_tests | ||
): | ||
is_cuda_found = True | ||
break | ||
|
||
modified_string = "" | ||
if not is_cuda_found: | ||
modified_string = "".join(codeblocks) | ||
|
||
return modified_string | ||
|
||
|
||
class HfDocTestParser(doctest.DocTestParser): | ||
""" | ||
Overwrites the DocTestParser from doctest to properly parse the codeblocks that are formatted with black. This | ||
means that there are no extra lines at the end of our snippets. The `# doctest: +IGNORE_RESULT` marker is also | ||
added anywhere a `load_dataset` call is made as a print would otherwise fail the corresponding line. | ||
|
||
Tests involving cuda are skipped base on a naive pattern that should be updated if it is not enough. | ||
""" | ||
|
||
# This regular expression is used to find doctest examples in a | ||
# string. It defines three groups: `source` is the source code | ||
# (including leading indentation and prompts); `indent` is the | ||
# indentation of the first (PS1) line of the source code; and | ||
# `want` is the expected output (including leading indentation). | ||
# fmt: off | ||
_EXAMPLE_RE = re.compile(r''' | ||
# Source consists of a PS1 line followed by zero or more PS2 lines. | ||
(?P<source> | ||
(?:^(?P<indent> [ ]*) >>> .*) # PS1 line | ||
(?:\n [ ]* \.\.\. .*)*) # PS2 lines | ||
\n? | ||
# Want consists of any non-blank lines that do not start with PS1. | ||
(?P<want> (?:(?![ ]*$) # Not a blank line | ||
(?![ ]*>>>) # Not a line starting with PS1 | ||
# !!!!!!!!!!! HF Specific !!!!!!!!!!! | ||
(?:(?!```).)* # Match any character except '`' until a '```' is found (this is specific to HF because black removes the last line) | ||
# !!!!!!!!!!! HF Specific !!!!!!!!!!! | ||
(?:\n|$) # Match a new line or end of string | ||
)*) | ||
''', re.MULTILINE | re.VERBOSE | ||
) | ||
# fmt: on | ||
|
||
# !!!!!!!!!!! HF Specific !!!!!!!!!!! | ||
skip_cuda_tests: bool = bool(os.environ.get("SKIP_CUDA_DOCTEST", False)) | ||
# !!!!!!!!!!! HF Specific !!!!!!!!!!! | ||
|
||
def parse(self, string, name="<string>"): | ||
""" | ||
Overwrites the `parse` method to incorporate a skip for CUDA tests, and remove logs and dataset prints before | ||
calling `super().parse` | ||
""" | ||
string = preprocess_string(string, self.skip_cuda_tests) | ||
return super().parse(string, name) | ||
|
||
|
||
class HfDoctestModule(Module): | ||
""" | ||
Overwrites the `DoctestModule` of the pytest package to make sure the HFDocTestParser is used when discovering | ||
tests. | ||
""" | ||
|
||
def collect(self) -> Iterable["DoctestItem"]: | ||
class MockAwareDocTestFinder(doctest.DocTestFinder): | ||
"""A hackish doctest finder that overrides stdlib internals to fix a stdlib bug. | ||
|
||
https://github.com/pytest-dev/pytest/issues/3456 https://bugs.python.org/issue25532 | ||
""" | ||
|
||
def _find_lineno(self, obj, source_lines): | ||
"""Doctest code does not take into account `@property`, this | ||
is a hackish way to fix it. https://bugs.python.org/issue17446 | ||
|
||
Wrapped Doctests will need to be unwrapped so the correct line number is returned. This will be | ||
reported upstream. #8796 | ||
""" | ||
if isinstance(obj, property): | ||
obj = getattr(obj, "fget", obj) | ||
|
||
if hasattr(obj, "__wrapped__"): | ||
# Get the main obj in case of it being wrapped | ||
obj = inspect.unwrap(obj) | ||
|
||
# Type ignored because this is a private function. | ||
return super()._find_lineno( # type:ignore[misc] | ||
obj, | ||
source_lines, | ||
) | ||
|
||
def _find(self, tests, obj, name, module, source_lines, globs, seen) -> None: | ||
if _is_mocked(obj): | ||
return | ||
with _patch_unwrap_mock_aware(): | ||
# Type ignored because this is a private function. | ||
super()._find( # type:ignore[misc] | ||
tests, obj, name, module, source_lines, globs, seen | ||
) | ||
|
||
if self.path.name == "conftest.py": | ||
module = self.config.pluginmanager._importconftest( | ||
self.path, | ||
self.config.getoption("importmode"), | ||
rootpath=self.config.rootpath, | ||
) | ||
else: | ||
try: | ||
module = import_path( | ||
self.path, | ||
root=self.config.rootpath, | ||
mode=self.config.getoption("importmode"), | ||
) | ||
except ImportError: | ||
if self.config.getvalue("doctest_ignore_import_errors"): | ||
skip("unable to import module %r" % self.path) | ||
else: | ||
raise | ||
|
||
# !!!!!!!!!!! HF Specific !!!!!!!!!!! | ||
finder = MockAwareDocTestFinder(parser=HfDocTestParser()) | ||
# !!!!!!!!!!! HF Specific !!!!!!!!!!! | ||
optionflags = get_optionflags(self) | ||
runner = _get_runner( | ||
verbose=False, | ||
optionflags=optionflags, | ||
checker=_get_checker(), | ||
continue_on_failure=_get_continue_on_failure(self.config), | ||
) | ||
for test in finder.find(module, module.__name__): | ||
if test.examples: # skip empty doctests and cuda | ||
yield DoctestItem.from_parent(self, name=test.name, runner=runner, dtest=test) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this workflow do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it tests for broken links.
cc: @stevhliu can you confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this uses lychee to test for broken links