Feature fraud detection 1.2.0 #148

Open
wants to merge 64 commits into
base: release/fy24-q1


AlexanderOllman
Collaborator

Provide a clear and concise description of the content changes you're proposing. List all the changes you are making
to the content.

  • Updated entire main notebook, providing more context to cells and splitting cells into sections.
  • Applied v1.2 variable fix (should already be present in release-1.2.0 version)
  • Updated README with new pyenv instructions, including new descriptive flow diagram (to be standard across new demos)

If there is no issue related to this PR, kindly create one first to describe the motivation behind these changes.

Checklist:

  • I have checked that my enhancements are not duplicates of existing content changes or additions.
  • I have tested the changes in a working environment to ensure they function as intended.
  • I have followed the style guide outlined in the contribution guidelines.

Reviewer's Tasks (for maintainers reviewing this PR):

  • Verify that the tutorial functions correctly in a live environment.
  • Verify that the updated content aligns with the style guide in the contribution guidelines.
  • Check for consistency, grammar, and clarity throughout the updated content.
  • Check that the related GitHub issue is up-to-date.

Dimitris Poulopoulos and others added 30 commits September 1, 2023 18:36
Use a set of public docker images by default, since new clusters do
not have the permissions to pull images from the gcr.io registry.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
bike-sharing: Call the mlflow library after imports
Create a new 'applications' directory to house tutorials that are
tailored to specific applications.

Relocate the following tutorials to this new directory:

* feast: ride-sharing
* kubeflow-pipelines: financial-time-series
* mlflow: bike-sharing
* ray: news-recommendation

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Create a new 'integration-tutorials' directory to house tutorials that
showcase the integration of different applications inside the EzUA
platform.

Relocate the following tutorials to this new directory:

* fraud-detection
* house-pricing
* investment-banking
* loan-approval
* question-answering
* wind-turbine

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Delete any unused or deprecated tutorial or file from the
'ezua-tutorials' catalogue.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Signed-off-by: Dimitris Poulopoulos <[email protected]>
Revise the README file to explain the structure of the repository and
detail how to get started, the requirements for following these
tutorials, and where to get help.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Introduce the 'CONTRIBUTING.md' file to specify the contribution
guidelines of the repository.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Refactor the repository to adhere to the following structure for
better readability and accessibility:

* Demos: Quick, low-code guides that showcase the platform's features
* Tutorials: Detailed, slow-paced guides designed to teach the
  functionalities of various tools

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Add a Pull Request (PR) template for content enhancements.
This template provides sections that contributors should complete
to create a thorough and detailed PR.
Introduce the GitHub issue templates through which contributors
can submit bug reports or new tutorial requests.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Add an `environment.yaml` file to capture the dependencies of the
tutorial. This file creates a new conda environment, called
`ride-sharing`, which installs every library the tutorial needs to run.

Refs #62

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Improve the Feast Ride Sharing tutorial by:
- Updating the code to work with the latest version of the Feast
  Python client.
- Extending the Notebook documentation and correcting syntax and
  grammar.
- Using the training dataset ingested in Feast to train a simple model.
- Simplifying the `definitions.py` file.

Refs #62

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Refs #62

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Add a fresh README that:
- Describes the tutorial's focus.
- Outlines prerequisites for getting started.
- Guides users on execution.
- Includes a references section.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Annotate the Notebooks of the MLflow example to provide more
information about the tutorial and its execution.

Closes #70

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Remove the unnecessary dependencies from the `environment.yaml` file.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Remove the KServe manifests. They are now generated as part of the
Notebook execution.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Update the README file to include information about the demo
procedure and a references section.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Improve the Notebook file by separating the model deployment section
into a new Pipeline step.

Also, set a new variable to hold the current user, as pipeline steps
might have no notion of the environment variable "USER".

Closes #68

Signed-off-by: Dimitris Poulopoulos <[email protected]>
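
The idea of holding the current user in a variable, rather than reading `USER` inside each step, can be sketched as follows. This is a minimal illustration and not the notebook's actual code: `capture_user` and `deploy_step` are hypothetical names, and the returned string only stands in for whatever the real step does with the user.

```python
import os


def capture_user(env=os.environ):
    """Read the user once, where the USER variable is known to exist."""
    user = env.get("USER")
    if not user:
        raise RuntimeError("USER is not set in this environment")
    return user


def deploy_step(user, model_name):
    """Hypothetical pipeline step: the user arrives as an argument,
    so the step never consults its own environment."""
    return f"{user}/{model_name}"


# A plain dict stands in for the notebook environment here.
current_user = capture_user({"USER": "alice"})
target = deploy_step(current_user, "fraud-model")
```

Passing the value explicitly keeps the step independent of whatever environment the pipeline runner provides.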
Update the README file to follow the contributing guidelines.

Refs #68

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Enhance the Notebook user experience by:
- Introducing a code cell to upload the dataset to its appropriate path
  prior to its use inside the Spark interactive session. This fixes the
  error where Spark tries to load the dataset from a location that does
  not exist.
- Refining Notebook annotations for a clearer tutorial flow.

Closes #64

Signed-off-by: Dimitris Poulopoulos <[email protected]>
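
Placing the dataset at its expected path before the session reads it might look like the sketch below. It assumes the destination is a locally mounted directory; the tutorial's real upload mechanism and paths are not shown in this PR, so `stage_dataset` and every path here are placeholders.

```python
import shutil
import tempfile
from pathlib import Path


def stage_dataset(src, dest_dir):
    """Copy the dataset into dest_dir, creating the directory if needed."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name
    shutil.copy2(src, target)
    return target


# Demonstration against a temporary directory (all paths are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "data.csv"
    source.write_text("a,b\n1,2\n")
    staged = stage_dataset(source, Path(tmp) / "shared" / "dataset")
    staged_ok = staged.exists() and staged.read_text() == "a,b\n1,2\n"
```

Creating the directory with `parents=True, exist_ok=True` makes the cell safe to re-run.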
Eliminate the redundant Python script that duplicates the code from the
notebook, offering no additional value to this tutorial.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Enhance the README file by:
- Introducing a 'Procedure' section that walks the user through the
  necessary steps for a successful run.
- Incorporating a "How it Works" section that elucidates what Livy and
  Sparkmagic are, and how they collaboratively streamline interactions
  with a Spark cluster.
- Including a 'References' section, providing links for extended
  reading.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Make the LLM predictor Docker image lighter:
- Remove the LangChain dependency: LangChain is not needed anymore since
  it was only used as a wrapper around the GPT4ALL Python library. The
  predictor now uses the GPT4ALL library directly to generate text.
- Remove the model: The model is not part of the Docker image anymore.
  The predictor is now responsible for downloading the model at runtime.
  This also permits us to change the model type (i.e., architecture)
  by passing an argument to the pod command (--model).

These changes yield a lighter image that is faster to build and push.

Refs #57

Signed-off-by: Dimitris Poulopoulos <[email protected]>
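
Selecting the model through a `--model` argument, as this commit describes, could be wired up roughly as below. Only the flag name comes from the commit message; the `parse_args` helper, the help text, and the example value are assumptions, and the actual predictor code may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the predictor's command-line arguments."""
    parser = argparse.ArgumentParser(description="LLM predictor (sketch)")
    parser.add_argument(
        "--model",
        required=True,
        help="name or type of the model to load at startup",
    )
    return parser.parse_args(argv)


# The value below is a placeholder, not a model used by the tutorial.
args = parse_args(["--model", "example-model"])
```

Because the model is chosen at startup, the same image can serve different models by changing only the pod command.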
Refine the LLM Transformer Docker image:
- Clean up the code: Remove unused imports.
- Introduce new features: add the `num_docs` argument for controlling
  the document retrieval count.
- Upgrade the KServe dependency: Pin the KServe dependency to `0.11.0`
  instead of `0.11.0rc0`

Refs #57

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Enhance the Vector Store Docker image:
- Support new features: Introduce the `num_docs` argument to control the
  document retrieval count.
- Pin dependencies: Pin all dependencies to a version that works for
  this tutorial.
- Fetch Torch+CPU: Download the CPU variant of PyTorch, which makes the
  image much lighter.

Refs #57

Signed-off-by: Dimitris Poulopoulos <[email protected]>
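
The effect of a `num_docs` argument on retrieval can be illustrated with a small top-k selection. This is a sketch under stated assumptions: the `retrieve` function, the default of 4, and the score dictionary are hypothetical and do not reflect the vector store's real interface.

```python
def retrieve(scores, num_docs=4):
    """Return the ids of the num_docs highest-scoring documents."""
    if num_docs < 1:
        raise ValueError("num_docs must be at least 1")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:num_docs]


# Hypothetical similarity scores keyed by document id.
scores = {"doc-a": 0.12, "doc-b": 0.91, "doc-c": 0.55}
top_two = retrieve(scores, num_docs=2)
```

A smaller `num_docs` shortens the context handed to the model, while a larger value trades prompt length for recall.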
Dimitris Poulopoulos and others added 15 commits October 13, 2023 17:46
Improve the Notebook file by:
- Fixing typos and wording.
- Fixing the code cells to adhere to the 79-character limit of PEP-8.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Improve the README file by:
- Adding a "How it Works" section.
- Fixing typos and wording.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Improve the Notebook file by:
- Fixing typos and wording.
- Changing code cells and Python code to adhere to the 79-character
  limit of PEP-8.
- Using `dataset` as the data directory. We use this name to standardise
  the directory that houses the datasets.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Improve the README file by:
- Fixing typos and wording.
- Adding a table of contents.
- Adding a "How it Works" section.
- Adding a "Clean Up" section.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Improve the Notebooks by:
- Adding table of contents.
- Fixing typos and wording.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Move the Bike Sharing example into the `demos` directory as it
integrates more than one tool (i.e., MLflow and KServe).

Signed-off-by: Dimitris Poulopoulos <[email protected]>
Create a user interface, using the Gradio framework, for the fraud
detection application:

- Add the source code and the Dockerfile to build the Docker image.
- Add the Helm chart that installs the application.
- Amend the README instructions to include the new UI.

Signed-off-by: Dimitris Poulopoulos <[email protected]>
@AlexanderOllman AlexanderOllman changed the base branch from release-1.2.0 to release/fy24-q1 February 7, 2024 19:32
@prakashmirji
Collaborator

Why are so many files deleted in this PR?

@ask664
Collaborator

ask664 commented Mar 12, 2024

Don't merge without approval.

@sercanCyberVision
Collaborator

Why do we delete all the Ray examples/ReadMe files and leave only one notebook for only one example?
All the detailed explanations, the GPU example, and the Fibonacci example are gone.

I have a ticket where I am working on restructuring and improving the Ray examples: https://jira-pro.it.hpe.com:8443/browse/EZAF-4409. With this ticket I will:

  • Keep all tutorials in one folder named Ray.
  • Have two sub-folders as GPU and CPU.
  • Have 2 CPU and 1 GPU examples.
  • Check and update the README files if necessary.
  • Make sure that tutorials have their own points/purposes.

If we intend to improve or simplify our tutorials, let's decide on some standards, create tickets for each app, and let the app owner make the changes.

The changes in this PR for Ray do not align with https://jira-pro.it.hpe.com:8443/browse/EZAF-4409. If we still need to merge this PR, @dpoulopoulos, please exclude the changes related to Ray; I will be working on them.

@prakashmirji @prasadadireddi @ask664

@sercanCyberVision
Collaborator

Please see Ray tutorial PR #155


7 participants