Zephyr

A project from the Data to AI Lab at MIT.

A machine learning library for assisting in the generation of machine learning problems for wind farm operations data by analyzing past occurrences of events.

Important Links
💻 Website	Check out the Sintel Website for more information about the project.
📖 Documentation	Quickstarts, User and Development Guides, and API Reference.
⭐ Tutorials	Check out our notebooks.
Repository	The link to the GitHub repository of this library.
📜 License	The repository is published under the MIT License.
⌨️ Development Status	This software is in its Pre-Alpha stage.
Community	Join our Slack Workspace for announcements and discussions.

Homepage: https://github.com/signals-dev/zephyr

Overview

The Zephyr library is a framework designed to assist in the generation of machine learning problems for wind farm operations data by analyzing past occurrences of events.

The main features of Zephyr are:

EntitySet creation: tools designed to represent wind farm data and the relationships between different tables. There are functions to create EntitySets for datasets with PI data and for datasets with SCADA data.
Labeling Functions: a collection of functions, as well as tools to create custom versions of them, ready to be used to search past operations data for occurrences of specific types of events.
Prediction Engineering: a flexible framework designed to apply labeling functions on wind turbine operations data in a number of different ways to create labels for custom Machine Learning problems.
Feature Engineering: a guide to using Featuretools to apply automated feature engineering to wind farm data.

Install

Requirements

Zephyr has been developed and runs on Python 3.6 and 3.7.

Also, although it is not strictly required, using a virtualenv is highly recommended to avoid interfering with other software installed on the system where you run Zephyr.

Download and Install

Zephyr can be installed locally using pip with the following command:

pip install zephyr-ml

If you want to install from source or contribute to the project, please read the Contributing Guide.

Quickstart

In this short tutorial we will guide you through a series of steps that will help you get started with Zephyr.

1. Loading the data

The first step will be to use preprocessed data to create an EntitySet. Depending on the type of data, we will use either the zephyr_ml.create_pidata_entityset or the zephyr_ml.create_scada_entityset function.

NOTE: if you cloned the Zephyr repository, you will find some demo data inside the notebooks/data folder which has been preprocessed to fit the create_entityset data requirements.

import os
import pandas as pd
from zephyr_ml import create_scada_entityset

data_path = 'notebooks/data'

data = {
  'turbines': pd.read_csv(os.path.join(data_path, 'turbines.csv')),
  'alarms': pd.read_csv(os.path.join(data_path, 'alarms.csv')),
  'work_orders': pd.read_csv(os.path.join(data_path, 'work_orders.csv')),
  'stoppages': pd.read_csv(os.path.join(data_path, 'stoppages.csv')),
  'notifications': pd.read_csv(os.path.join(data_path, 'notifications.csv')),
  'scada': pd.read_csv(os.path.join(data_path, 'scada.csv'))
}

scada_es = create_scada_entityset(data)

This will load the turbine, alarms, stoppages, work order, notifications, and SCADA data, and return it as an EntitySet.

Entityset: SCADA data
  DataFrames:
    turbines [Rows: 1, Columns: 10]
    alarms [Rows: 2, Columns: 9]
    work_orders [Rows: 2, Columns: 20]
    stoppages [Rows: 2, Columns: 16]
    notifications [Rows: 2, Columns: 15]
    scada [Rows: 2, Columns: 5]
  Relationships:
    alarms.COD_ELEMENT -> turbines.COD_ELEMENT
    stoppages.COD_ELEMENT -> turbines.COD_ELEMENT
    work_orders.COD_ELEMENT -> turbines.COD_ELEMENT
    scada.COD_ELEMENT -> turbines.COD_ELEMENT
    notifications.COD_ORDER -> work_orders.COD_ORDER
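
Each relationship listed above links a key column in a child table to the same column in its parent table. Before building an EntitySet from your own data, it can help to check that every child key has a matching parent row. The following is a minimal, dependency-free sketch of such a check; the rows are hypothetical and `orphan_keys` is an illustrative helper, not part of Zephyr.

```python
# Hypothetical miniature tables; only the key columns from the printout are kept.
turbines = [{"COD_ELEMENT": 0}]
alarms = [
    {"COD_ELEMENT": 0, "COD_ALARM": 11},
    {"COD_ELEMENT": 0, "COD_ALARM": 12},
]
work_orders = [
    {"COD_ELEMENT": 0, "COD_ORDER": 100},
    {"COD_ELEMENT": 0, "COD_ORDER": 101},
]
notifications = [{"COD_ORDER": 100}, {"COD_ORDER": 999}]


def orphan_keys(child_rows, parent_rows, key):
    """Return the child key values that have no matching row in the parent table."""
    parent_keys = {row[key] for row in parent_rows}
    child_keys = {row[key] for row in child_rows}
    return sorted(child_keys - parent_keys)


# alarms.COD_ELEMENT -> turbines.COD_ELEMENT
alarm_orphans = orphan_keys(alarms, turbines, "COD_ELEMENT")
# notifications.COD_ORDER -> work_orders.COD_ORDER
notification_orphans = orphan_keys(notifications, work_orders, "COD_ORDER")

print(alarm_orphans)         # []
print(notification_orphans)  # [999]
```

An empty list means every child row can be attached to a parent; any value that is returned points at rows worth inspecting before loading.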

2. Selecting a Labeling Function

The second step will be to choose an adequate Labeling Function.

We can see the list of available labeling functions using the zephyr_ml.labeling.get_labeling_functions function.

from zephyr_ml import labeling

labeling.get_labeling_functions()

This returns a dictionary with the name and a short description of each available function.

{'brake_pad_presence': 'Calculates the total power loss over the data slice.',
 'converter_replacement_presence': 'Calculates the converter replacement presence.',
 'total_power_loss': 'Calculates the total power loss over the data slice.'}

In this case, we will choose the total_power_loss function, which calculates the total amount of power lost over a slice of time.
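
Conceptually, a labeling function reduces one slice of operations data to a single value. The sketch below illustrates that idea in plain Python. It is not Zephyr's implementation: the `LOST_ENERGY` field, the `start` timestamp field and both helper functions are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical stoppage records; the field names are assumptions.
stoppages = [
    {"start": datetime(2022, 1, 1, 8), "LOST_ENERGY": 1200.0},
    {"start": datetime(2022, 1, 2, 14), "LOST_ENERGY": 800.5},
    {"start": datetime(2022, 1, 9, 3), "LOST_ENERGY": 300.0},
]


def take_slice(rows, start, end):
    """Keep rows whose timestamp falls in the half-open window [start, end)."""
    return [row for row in rows if start <= row["start"] < end]


def total_loss(data_slice, column="LOST_ENERGY"):
    """Toy labeling function: reduce one slice of records to a single label."""
    return sum((row[column] for row in data_slice), 0.0)


first_week = take_slice(stoppages, datetime(2022, 1, 1), datetime(2022, 1, 8))
label = total_loss(first_week)
print(label)  # 2000.5
```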

3. Generate Target Times

Once we have loaded the data and chosen the Labeling Function, we are ready to use the zephyr_ml.DataLabeler class to generate a Target Times table.

from zephyr_ml import DataLabeler

data_labeler = DataLabeler(labeling.labeling_functions.total_power_loss)
target_times, metadata = data_labeler.generate_label_times(scada_es)

This returns a compose.LabelTimes containing the three columns required to start working on a Machine Learning problem: the turbine ID (COD_ELEMENT), the cutoff time (time) and the label.

   COD_ELEMENT       time    label
0            0 2022-01-01  45801.0

4. Feature Engineering

Using EntitySets and LabelTimes allows us to easily use Featuretools for automatic feature generation.

import featuretools as ft

feature_matrix, features = ft.dfs(
    entityset=scada_es,
    target_dataframe_name='turbines',
    cutoff_time_in_index=True,
    cutoff_time=target_times,
    max_features=20
)

Then we get a list of features and the computed feature_matrix.

                       TURBINE_PI_ID TURBINE_LOCAL_ID TURBINE_SAP_COD DES_CORE_ELEMENT      SITE DES_CORE_PLANT  ... MODE(alarms.COD_STATUS) MODE(alarms.DES_NAME)  MODE(alarms.DES_TITLE)  NUM_UNIQUE(alarms.COD_ALARM)  NUM_UNIQUE(alarms.COD_ALARM_INT)    label
COD_ELEMENT time                                                                                                 ...                                                                                                                                               
0           2022-01-01          TA00               A0          LOC000              T00  LOCATION            LOC  ...                  Alarm1                Alarm1  Description of alarm 1                             1                                 1  45801.0

[1 rows x 21 columns]
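
The cutoff time is what keeps information from the future out of the features: for each row of the target times table, aggregations only see data recorded up to that time. The plain-Python sketch below mimics this for three aggregations whose names follow the style of the printout above. It illustrates the idea and is not Featuretools code; the alarm rows and the `time` field are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical alarm rows for one turbine; the "time" field is an assumption.
alarms = [
    {"COD_ELEMENT": 0, "time": datetime(2021, 12, 20), "COD_ALARM": 12, "DES_NAME": "Alarm1"},
    {"COD_ELEMENT": 0, "time": datetime(2021, 12, 28), "COD_ALARM": 12, "DES_NAME": "Alarm1"},
    {"COD_ELEMENT": 0, "time": datetime(2022, 1, 5), "COD_ALARM": 13, "DES_NAME": "Alarm2"},
]


def features_at_cutoff(rows, turbine, cutoff):
    """Aggregate only the rows recorded at or before the cutoff time."""
    visible = [
        row for row in rows
        if row["COD_ELEMENT"] == turbine and row["time"] <= cutoff
    ]
    names = Counter(row["DES_NAME"] for row in visible)
    return {
        "COUNT(alarms)": len(visible),
        "NUM_UNIQUE(alarms.COD_ALARM)": len({row["COD_ALARM"] for row in visible}),
        "MODE(alarms.DES_NAME)": names.most_common(1)[0][0] if names else None,
    }


features_row = features_at_cutoff(alarms, turbine=0, cutoff=datetime(2022, 1, 1))
print(features_row)
```

The alarm dated 2022-01-05 falls after the cutoff, so it does not contribute to any of the three values.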

5. Modeling

Once we have the feature matrix, we can train a model using the Zephyr interface, which lets you train a pipeline, run inference with it, and evaluate it. First, we need to prepare our dataset for training by creating X and y variables and one-hot encoding features.

y = list(feature_matrix.pop('label'))
X = pd.get_dummies(feature_matrix).values
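
One-hot encoding replaces each categorical column with one indicator column per category, so that the model only receives numeric inputs. The dependency-free sketch below shows the transformation on a tiny example. It mirrors the idea behind pd.get_dummies rather than its implementation; the rows and the `one_hot` helper are hypothetical.

```python
# Hypothetical feature rows mixing a categorical and a numeric column.
rows = [
    {"SITE": "LOCATION_A", "COUNT(alarms)": 2},
    {"SITE": "LOCATION_B", "COUNT(alarms)": 0},
    {"SITE": "LOCATION_A", "COUNT(alarms)": 5},
]


def one_hot(rows, column):
    """Replace one categorical column with a 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


encoded = one_hot(rows, "SITE")
print(encoded[0])
# {'COUNT(alarms)': 2, 'SITE_LOCATION_A': 1, 'SITE_LOCATION_B': 0}
```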

In this example, we will use an 'xgb' regression pipeline to predict total power loss.

from zephyr_ml import Zephyr

pipeline_name = 'xgb_regressor'

zephyr = Zephyr(pipeline_name)

To train the pipeline, we simply use the fit function.

zephyr.fit(X, y)

After training finishes, we can make predictions using predict.

y_pred = zephyr.predict(X)

We can also use zephyr.evaluate to obtain the performance of the pipeline.
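
For a regression pipeline, performance is usually summarized with error metrics computed from the true and predicted values. Which metrics zephyr.evaluate reports is not stated here, so the sketch below computes three common ones (MAE, RMSE and R²) by hand as an illustration. The values are hypothetical and `regression_report` is not part of Zephyr.

```python
import math


def regression_report(y_true, y_pred):
    """Compute MAE, RMSE and R^2 for two equally long sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all true values are identical (e.g. a single row).
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical values, not produced by the pipeline above.
y_true = [100.0, 200.0, 300.0, 400.0]
y_hat = [110.0, 190.0, 310.0, 390.0]

report = regression_report(y_true, y_hat)
print(report)
```

Note that the demo data in this quickstart yields a single labeled row, which is too little for metrics such as R² to be meaningful.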

What's Next?

If you want to continue learning about Zephyr and all its features please have a look at the tutorials found inside the notebooks folder.
