Feature improve wind turbine #74

Open: wants to merge 7 commits into `develop`
81 changes: 55 additions & 26 deletions demos/wind-turbine/README.md
# Wind Turbine (Spark - Livy - Sparkmagic)

In this demonstration, you use Spark to explore a dataset and train a Gradient-Boosted Tree (GBT) regressor that
leverages various features, such as wind speed and direction, to estimate the power output of a wind turbine.
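To give a sense of the kind of model the notebook builds, here is a minimal, standalone PySpark sketch of fitting a GBT regressor. The column names (`wind_speed`, `wind_direction`, `power_output`) and the CSV path are placeholders rather than the tutorial's actual schema, and in the actual tutorial the code runs remotely on the Spark cluster via Livy and Sparkmagic:

```python
# Minimal sketch (not the tutorial code): fit a GBT regressor on a few
# hypothetical SCADA columns; adjust names and paths to the real dataset.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import GBTRegressor

spark = SparkSession.builder.appName("wind-turbine-gbt").getOrCreate()

# Hypothetical schema: wind_speed, wind_direction, power_output
df = spark.read.csv("turbine_scada.csv", header=True, inferSchema=True)

# Assemble the assumed feature columns into a single vector column.
assembler = VectorAssembler(
    inputCols=["wind_speed", "wind_direction"],
    outputCol="features",
)
train_df = assembler.transform(df)

# Train the regressor and inspect a few predictions.
gbt = GBTRegressor(featuresCol="features", labelCol="power_output", maxIter=50)
model = gbt.fit(train_df)
model.transform(train_df).select("power_output", "prediction").show(5)
```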

![wind-farm](images/wind-farm.jpg)

Wind turbines hold tremendous potential as a sustainable source of energy, capable of supplying a substantial portion
of the world's power needs. However, the inherent unpredictability of power generation poses a challenge when it comes
to optimizing this process.

Fortunately, you have a powerful tool at your disposal: Machine Learning (ML). By leveraging advanced algorithms and data
analysis, you can develop models that accurately predict the power production of wind turbines. This enables you to
optimize the power generation process and overcome the challenges associated with its ingrained variability.

1. [What You'll Need](#what-youll-need)
1. [Procedure](#procedure)
1. [How it Works](#how-it-works)
1. [References](#references)

## What You'll Need

For this tutorial, ensure you have:

- Access to an HPE Ezmeral Unified Analytics cluster.

## Procedure

To complete this tutorial, follow the steps below:

1. Log in to your Ezmeral Unified Analytics (EzUA) cluster using your credentials.
1. Create a new Notebook server using the `jupyter-data-science` image. Request at least `4Gi` of memory for the
Notebook server.
1. Connect to the Notebook server and clone the repository locally.
1. Navigate to the tutorial's directory (`ezua-tutorials/demos/wind-turbine`).
1. Launch a new terminal window and create a new conda environment using the specified `environment.yaml` file:

```bash
conda env create -f environment.yaml
```

1. Add the new conda environment as an ipykernel:

```bash
python -m ipykernel install --user --name=wind-turbine
```

1. Refresh your browser tab to access the updated environment.
1. Launch the `wind-turbine.ipynb` notebook file and follow the instructions. Make sure to select the `wind-turbine`
environment kernel.

## How it Works

In this tutorial, you use Livy and Sparkmagic to remotely execute Python code in a Spark cluster. Livy is an open-source
REST service that enables remote and interactive analytics on Apache Spark clusters. It provides a way to interact with
Spark clusters programmatically using a REST API, allowing you to submit Spark jobs, run interactive queries, and manage
Spark sessions.
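For a rough picture of what Livy does under the hood, the sketch below (not part of the tutorial) creates an interactive PySpark session and submits a single statement over Livy's REST API. The Livy URL is a placeholder; on EzUA the endpoint is preconfigured and Sparkmagic issues these calls for you:

```python
# Hypothetical illustration of the Livy REST calls Sparkmagic makes for you.
import time
import requests

LIVY_URL = "http://livy.example.com:8998"  # placeholder endpoint

# 1. Create an interactive PySpark session.
session = requests.post(f"{LIVY_URL}/sessions", json={"kind": "pyspark"}).json()
session_url = f"{LIVY_URL}/sessions/{session['id']}"

# 2. Wait until the session is idle (ready to accept statements).
while requests.get(session_url).json()["state"] != "idle":
    time.sleep(5)

# 3. Submit a statement and poll until its result is available.
stmt = requests.post(
    f"{session_url}/statements",
    json={"code": "spark.range(10).count()"},
).json()
stmt_url = f"{session_url}/statements/{stmt['id']}"
while (result := requests.get(stmt_url).json())["state"] != "available":
    time.sleep(2)
print(result["output"])

# 4. Clean up the session when finished.
requests.delete(session_url)
```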

To communicate with Livy and manage your sessions, you use Sparkmagic, an open-source tool that provides a Jupyter kernel
extension. Sparkmagic integrates with Livy to provide the underlying communication layer between the Jupyter kernel and
the Spark cluster.
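For reference, a hypothetical sketch of the Sparkmagic workflow from a plain IPython kernel is shown below. The Livy URL and session name are placeholders, the exact `%spark add` flags depend on your deployment's authentication, and on EzUA the notebook image may already wire this up; each magic normally goes in its own notebook cell:

```python
# Hypothetical Sparkmagic workflow; each magic below lives in its own cell.

# --- cell 1: load the Sparkmagic IPython extension ---
%load_ext sparkmagic.magics

# --- cell 2: attach a PySpark session backed by Livy (placeholder URL) ---
%spark add -s wind-turbine -l python -u http://livy.example.com:8998

# --- cell 3: code in a %%spark cell executes remotely on the Spark cluster ---
%%spark
df = spark.range(100)
print(df.count())

# --- cell 4: list and clean up managed sessions when finished ---
%spark info
%spark cleanup
```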

## References

1. [Spark: Unified engine for large-scale data analytics](https://spark.apache.org/)
1. [Livy: A REST Service for Apache Spark](https://livy.apache.org/)
1. [Sparkmagic: Jupyter magics and kernels for working with remote Spark clusters](https://github.com/jupyter-incubator/sparkmagic)
1. [Wind Turbine Scada Dataset](https://www.kaggle.com/datasets/berkerisen/wind-turbine-scada-dataset/data)