In this repository, ClearML is used to track machine learning experiments.
A comparative machine learning task was performed: classifying comments from the toxic comments dataset (a text classification task).
- for poetry: `poetry add clearml`
- for mamba: `mamba create -y -n clearml python=3.10.14 numpy pandas polars clearml scikit-learn matplotlib pytorch transformers hydra-core omegaconf ipykernel jupyter`
- using docker-compose: use `docker-compose.yaml` and run:
```bash
docker-compose up -d
```
then create credentials in the ClearML server web UI, copy them, run on the local machine:
```bash
clearml-init
```
and add the credentials to `clearml.conf`:
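A minimal `api` section of `clearml.conf` for a self-hosted server might look like this (the hosts below assume the default docker-compose ports; the keys are placeholders):
```
api {
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
    credentials {
        "access_key" = "XXXXXXXXXXXXXXXX"
        "secret_key" = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    }
}
```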
- using SaaS: take credentials from the ClearML web UI and update or add them in `clearml.conf` (docs)

To authorize on the SaaS platform, get the secrets in the settings of your personal profile and add them to the environment in any convenient way. Example for Windows PowerShell:
```powershell
$env:CLEARML_WEB_HOST="https://app.clear.ml"
$env:CLEARML_API_HOST="https://api.clear.ml"
$env:CLEARML_FILES_HOST="https://files.clear.ml"
$env:CLEARML_API_ACCESS_KEY="XXXXXXXXXXXXXXXX"
$env:CLEARML_API_SECRET_KEY="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
```
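Alternatively, the same credentials can be set from Python before the first `Task.init` call; a minimal sketch (all values are placeholders for your own keys):
```python
from clearml import Task

# placeholders: substitute the keys from your ClearML profile settings
Task.set_credentials(
    web_host="https://app.clear.ml",
    api_host="https://api.clear.ml",
    files_host="https://files.clear.ml",
    key="XXXXXXXXXXXXXXXX",
    secret="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
)
```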
- To start working with ClearML, initiate a new session:
```python
from clearml import Task

task = Task.init(
    project_name=PROJECT_NAME,
    task_name=TASK_NAME,
    output_uri=True,
)
```
This code creates the project and the task on the ClearML server. If you want to create a new task, use a new task name.
- Uploading the dataset to the ClearML cloud with parameters that are then used in the notebooks:
```bash
clearml-data create --project "project_name" --name "dataset_name"
clearml-data add --files data/train.csv
clearml-data close
```
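The same upload can also be done from Python with the `Dataset` SDK; a minimal sketch, assuming the file lives at `data/train.csv`:
```python
from clearml import Dataset

# create a new dataset version on the ClearML server
dataset = Dataset.create(dataset_project="project_name", dataset_name="dataset_name")
dataset.add_files(path="data/train.csv")
dataset.upload()    # push the files to the file server
dataset.finalize()  # the equivalent of `clearml-data close`
```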
- Downloading from the cloud:
```python
from clearml import Dataset, Task

task = Task.init(project_name=cfg.project.name, task_name=TASK_NAME, output_uri=True)

# download a local copy of the dataset registered on the ClearML server
dataset_path = Dataset.get(
    dataset_project=cfg.dataset.project, dataset_name=cfg.dataset.name
).get_local_copy()

task.set_progress(0)
```
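`get_local_copy()` returns a path to a cached folder rather than the data itself, so the files are read from that folder; a hypothetical read of the `train.csv` uploaded above:
```python
from pathlib import Path

import pandas as pd

# the folder contains the files added with `clearml-data add`
train = pd.read_csv(Path(dataset_path) / "train.csv")
```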
- Uploading artifacts:
```python
import pickle

# store the fitted vectorizer and the feature matrices as task artifacts
task.upload_artifact(
    name="TfidfVectorizer", artifact_object=pickle.dumps(tfidf_vectorizer)
)
task.upload_artifact(
    name="train_features",
    artifact_object=(train_features, train["toxic"].to_numpy()),
)
task.upload_artifact(
    name="test_features",
    artifact_object=(test_features, test["toxic"].to_numpy()),
)
```
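These artifacts can later be pulled into another script; a minimal sketch, assuming you substitute the id of the completed task from the web UI:
```python
import pickle

from clearml import Task

# fetch the finished task by its id (visible in the ClearML web UI)
source_task = Task.get_task(task_id="...")

# the vectorizer was stored as pickled bytes, the features as a tuple
tfidf_vectorizer = pickle.loads(source_task.artifacts["TfidfVectorizer"].get())
train_features, train_labels = source_task.artifacts["train_features"].get()
```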
- Logging metrics:
```python
task.set_progress(95)

# report accuracy and the per-class metrics from the classification report
logger = task.get_logger()
logger.report_single_value("Accuracy", report.pop("accuracy"))
for class_name, metrics in report.items():
    for metric, value in metrics.items():
        logger.report_single_value(f"{class_name}_{metric}", value)
```
You can also upload plots, for example a confusion matrix:
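A minimal sketch using the logger's built-in confusion-matrix report, assuming `y_true` and `y_pred` come from the evaluation step:
```python
from sklearn.metrics import confusion_matrix

# compute the matrix and send it to the task's Plots tab
cm = confusion_matrix(y_true, y_pred)
logger.report_confusion_matrix(
    title="Confusion matrix",
    series="test",
    matrix=cm,
    xaxis="Predicted",
    yaxis="True",
)
```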
- After all the code has run, stop the task:
```python
task.close()
```
In this task, two variants of embeddings were compared, with a logistic regression model fitted on each:
1) TF-IDF: this variant was run on a local machine using the CPU.
2) BERT: the BERT model was fitted in Google Colab using ClearML Agent (docs); see the sketch below.
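A hedged sketch of how a script can hand its task to a ClearML Agent; the queue name `default` and the task name are assumptions, not values from this repository:
```python
from clearml import Task

task = Task.init(project_name=PROJECT_NAME, task_name="bert_training", output_uri=True)

# stop local execution and enqueue the task for a remote agent
# (an agent listening on the same queue, e.g. in Colab, picks it up)
task.execute_remotely(queue_name="default", exit_process=True)
```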
On the ClearML server there are two completed runs with artifacts, metrics, and plots. ![](images/completed work.png)