Commit

Re-propagate changes to bare-metal.md
mdlinville committed Jan 14, 2025
1 parent 737ce65 commit c3f4e80
Showing 1 changed file with 7 additions and 65 deletions.
72 changes: 7 additions & 65 deletions content/guides/hosting/hosting-options/self-managed/bare-metal.md
@@ -19,68 +19,6 @@ Reach out to the W&B Sales Team for related question: [[email protected]](mailto

Before you start deploying W&B, refer to the [reference architecture](./ref-arch.md#infrastructure-requirements), especially the infrastructure requirements.

{{% alert %}}
W&B strongly recommends deploying W&B Server into a Kubernetes cluster using the W&B Kubernetes Operator. Deploying with the operator ensures that you can use all existing and new W&B features.
{{% /alert %}}

{{% alert color="secondary" %}}
W&B application performance depends on scalable data stores that your operations team must configure and manage. The team must provide a MySQL 8 database cluster and an AWS S3-compatible object store for the application to scale properly.
{{% /alert %}}

### Application server

W&B recommends deploying W&B Server into its own namespace, on a node group that spans two availability zones, with the following specifications to provide the best performance, reliability, and availability:

| Specification | Value |
|----------------------------|-----------------------------------|
| Bandwidth | Dual 10 Gigabit+ Ethernet Network |
| Root Disk Bandwidth (Mbps) | 4,750+ |
| Root Disk Provision (GB) | 100+ |
| Core Count | 4 |
| Memory (GiB) | 8 |

This ensures that W&B Server has sufficient disk space to process the application data and store temporary logs before they are externalized.



It also ensures fast and reliable data transfer, the necessary processing power and memory for smooth operation, and that W&B will not be affected by any noisy neighbors.

These specifications are minimum requirements. Actual resource needs vary with the usage and workload of the W&B application. Monitor the application's resource usage and performance, and adjust resources as necessary.
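
As a rough illustration, the minimums in the table above can be checked on a candidate node with a short script. This is a minimal sketch, not an official W&B tool: the function names `check_specs` and `local_specs` are hypothetical, `local_specs` assumes a Linux host, and the thresholds are copied from the table. It does not measure network or disk bandwidth.

```python
import os
import shutil

# Minimums copied from the application server table above.
MIN_CORES = 4
MIN_MEMORY_GIB = 8
MIN_ROOT_DISK_GB = 100


def check_specs(cores, memory_gib, root_disk_gb):
    """Return a list of shortfalls; the list is empty if all minimums are met."""
    shortfalls = []
    if cores < MIN_CORES:
        shortfalls.append(f"cores: {cores} < {MIN_CORES}")
    if memory_gib < MIN_MEMORY_GIB:
        shortfalls.append(f"memory: {memory_gib:.1f} GiB < {MIN_MEMORY_GIB} GiB")
    if root_disk_gb < MIN_ROOT_DISK_GB:
        shortfalls.append(f"root disk: {root_disk_gb:.0f} GB < {MIN_ROOT_DISK_GB} GB")
    return shortfalls


def local_specs():
    """Read core count, total memory (GiB), and root disk size (GB) on Linux."""
    cores = os.cpu_count() or 0
    memory_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage("/").total
    return cores, memory_bytes / 2**30, disk_bytes / 10**9
```

Calling `check_specs(*local_specs())` on the node returns any shortfalls. Note that the operating system usually reports slightly less memory than the nominal size, so a node provisioned with exactly 8 GiB may be flagged.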

### Database server

W&B recommends a [MySQL 8](#mysql-database) database as a metadata store. The shape of the model parameters and related metadata affects the performance of the database. The database grows as ML practitioners track more training runs, and it incurs read-heavy load when queries run in run tables, user workspaces, and reports.

To ensure optimal performance, W&B recommends deploying the W&B database onto a server with the following starting specs:

| Specification | Value |
|--------------------------- |-----------------------------------|
| Bandwidth | Dual 10 Gigabit+ Ethernet Network |
| Root Disk Bandwidth (Mbps) | 4,750+ |
| Root Disk Provision (GB) | 1000+ |
| Core Count | 4 |
| Memory (GiB) | 32 |

Again, W&B recommends monitoring the resource usage and performance of the database to ensure that it operates optimally and to make adjustments as necessary.

Additionally, W&B recommends the following [parameter overrides](#mysql-database) to tune the DB for MySQL 8.

### Object storage

W&B is compatible with any object storage that supports the S3 API, signed URLs, and CORS. W&B recommends sizing the storage array for the current needs of your practitioners and planning capacity on a regular cadence.

For more details on object store configuration, see the [how-to section](../self-managed/bare-metal.md#object-store).

Some tested and working providers:
- [MinIO](https://min.io/)
- [Ceph](https://ceph.io/)
- [NetApp](https://www.netapp.com/)
- [Pure Storage](https://www.purestorage.com/)

#### Secure Storage Connector

The [Secure Storage Connector](../data-security/secure-storage-connector.md) is not currently available for teams on bare-metal deployments.

## MySQL database

{{% alert color="secondary" %}}
@@ -116,6 +54,11 @@ CREATE DATABASE wandb_local CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
GRANT ALL ON wandb_local.* TO 'wandb_local'@'%' WITH GRANT OPTION;
```

{{% alert %}}
This works only if the SSL certificate is trusted. W&B does not support self-signed certificates.
{{% /alert %}}


### Parameter group configuration

Ensure that the following parameter groups are set to tune the database performance:
@@ -160,7 +103,7 @@ s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME?tls=true
```

{{% alert color="secondary" %}}
This will only work if the SSL certificate is trusted. W&B does not support self-signed certificates.
This works only if the SSL certificate is trusted. W&B does not support self-signed certificates.
{{% /alert %}}

Set `BUCKET_QUEUE` to `internal://` if you use third-party object stores. This tells the W&B server to manage all object notifications internally instead of depending on an external SQS queue or equivalent.
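
The connection string format and the `BUCKET_QUEUE` setting described above can be assembled programmatically, for example when templating a deployment. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the `BUCKET` key is assumed to be the setting that holds the connection string, and no escaping is applied, so credentials containing characters such as `/` or `@` may need extra handling that this sketch does not cover.

```python
def bucket_connection_string(access_key, secret_key, host, bucket_name, tls=True):
    """Assemble s3://ACCESS_KEY:SECRET_KEY@HOST/BUCKET_NAME, adding ?tls=true if requested."""
    if "://" in host:
        raise ValueError("host must not include a scheme such as https://")
    url = f"s3://{access_key}:{secret_key}@{host.rstrip('/')}/{bucket_name.strip('/')}"
    return (url + "?tls=true") if tls else url


def object_store_settings(access_key, secret_key, host, bucket_name, tls=True):
    """Return the two settings for a third-party object store.

    "BUCKET" is an assumed key name for the connection string;
    "internal://" for BUCKET_QUEUE comes from the text above.
    """
    return {
        "BUCKET": bucket_connection_string(access_key, secret_key, host, bucket_name, tls),
        "BUCKET_QUEUE": "internal://",
    }
```

For example, `object_store_settings("AK", "SK", "minio.example.com:9000", "wandb")` uses placeholder credentials and a placeholder host. Keep `tls=True` only when the endpoint presents a trusted certificate, as noted above.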
@@ -189,7 +132,6 @@ mc mb --region=us-east1 local/local-files

The recommended installation method is with the official W&B Helm chart. Follow [this section](../operator.md#deploy-wb-with-helm-cli) to deploy the W&B Server application.


### OpenShift

W&B supports operating from within an [OpenShift Kubernetes cluster](https://www.redhat.com/en/technologies/cloud-computing/openshift).
@@ -311,7 +253,7 @@ wandb login --host=https://YOUR_DNS_DOMAIN
wandb verify
```

Check log files to view any errors the W&B Server hits at startup. Run the following commands:
Check log files to view any errors the W&B Server hits at startup. Run the following commands:

{{< tabpane text=true >}}
{{% tab header="Docker" value="docker" %}}