Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating broken notebooks #57

Merged
merged 9 commits into from
Oct 8, 2024
Merged

Updating broken notebooks #57

merged 9 commits into from
Oct 8, 2024

Conversation

Soap2G
Copy link
Collaborator

@Soap2G Soap2G commented Sep 4, 2024

This PR addresses a few straightforward code inconsistencies related to the notebooks.

List of notebooks fixed is at this link

Upgrades

I have:

  • Updated the environment.yml to account for missing packages in the docker image (e.g. hist and dask), which were causing the failure of the pipeline
  • Updated a few deprecated expressions (e.g. data_all = data_all.append(data) ---> data_all = pd.concat([data_all, data], ignore_index=True)) related to old packages.
  • Added a standard way to integrate packages' versions from the environment.yml file, through a setup cell that can be seen in this example notebook.
    The script is like this:
import yaml
import subprocess
import sys

# Path to your binder/environment.yml file
environment_file = "../../binder/environment.yml"

# Packages you want to install
required_packages = ['coffea', 'hist', 'dask', 'numpy', 'matplotlib']

# Load the environment.yml file
with open(environment_file, 'r') as file:
    environment_data = yaml.safe_load(file)

# Extract dependencies
dependencies = environment_data.get('dependencies', [])

# Create a list to hold the packages with versions
install_packages = []

# Find the versions for the required packages
for dep in dependencies:
    # Check if the dependency is a string (package name)
    if isinstance(dep, str):
        for package in required_packages:
            if dep.startswith(package):
                install_packages.append(dep)

# Install packages using pip
if install_packages:
    print(f"Installing packages: {install_packages}")
    subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", "--user"] + install_packages)
else:
    print("No matching packages found in environment.yml.")

ToDos

There are two items yet to be addressed:

  1. 13-TeV-examples/uproot_python/ttZ_ML_from_root.ipynb takes a LOT of time to run. This is due to the large number of events that have to be processed. It is ~1h run time and then it crashes. I'm not sure it's worth to keep it in the list of notebooks to be run.
  2. There is a notebook, 13-TeV-examples/uproot_python/Dark_Matter_Machine_Learning.ipynb, that has blank spaces, to be filled by the user. This cannot be run by definition, so I suggest we take it out of the list.
  3. The Coffea notebook has a nontrivial connectivity issue, probably due to an outdated version of the package fsspec, for which come samples return a FileNotFound error even if they do exist.

I'll leave these items to @AntonioJC and Reina, given that they are already working on that.

@Soap2G Soap2G added enhancement New feature or request automation CI/CD related improvement labels Sep 4, 2024
@Soap2G Soap2G self-assigned this Sep 4, 2024
@Soap2G Soap2G marked this pull request as ready for review September 8, 2024 12:04
@Soap2G Soap2G requested a review from a team as a code owner September 8, 2024 12:04
marianaiv
marianaiv previously approved these changes Sep 9, 2024
Copy link
Collaborator

@marianaiv marianaiv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Giovanni! Thanks for all the work. I like the script thing in the notebooks :)
As for the notebooks you think we should remove, I'm currently going through all the notebooks, seeing if they're worth keeping and writing some documentation on the ones that are. My point is: we can discuss the notebooks worth keeping in my next PR.

Copy link
Collaborator

@AntonioJC AntonioJC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work @Soap2G! Please find some comments

@AntonioJC
Copy link
Collaborator

Regarding the notebooks discussion, I'll address that in @marianaiv's PR, thanks!

@Soap2G
Copy link
Collaborator Author

Soap2G commented Oct 2, 2024

Comments are now addressed.
There are still 3 notebooks that do not pass the pipeline:

  • Coffea: issue with the dask processing.
  • DM: unfinished cells that are supposed to be filled by the user.
  • ttZ_ML_from_root: computation takes too long

Since these notebooks are already mentioned as faulty in the description, I would go ahead and address these two issues in the next iteration, since PR #58 is depending on this.

@Soap2G Soap2G merged commit 6d1f283 into master Oct 8, 2024
8 of 11 checks passed
@AntonioJC AntonioJC deleted the notebooks-fix branch October 8, 2024 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automation CI/CD related improvement enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants