[HWORKS-284] Documentation for exporting logs #142

139 changes: 139 additions & 0 deletions docs/admin/monitoring/export-logs.md
@@ -0,0 +1,139 @@
# Exporting Hopsworks logs

## Introduction
Hopsworks sends service and application logs to [Logstash](https://www.elastic.co/logstash/), which then forwards them to OpenSearch for indexing.
Often organizations already have logging systems in place so streaming Hopsworks logs is necessary.
Contributor

so exporting Hopsworks logs is necessary?


## Prerequisites
To configure Logstash streaming logs outside of Hopsworks you will need SSH access to the cluster (Logstash node). Also, depending on the target system you might
Contributor

To configure Logstash to stream logs.

need authentication tokens or opening firewall rules.
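
Before touching the Logstash configuration, it can help to confirm that the Logstash node can actually reach the target system through the firewall. The following is a minimal sketch using only the Python standard library; the host name and port in the usage block are hypothetical placeholders, not values from this guide.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # Hypothetical target; replace with the host and port of your log system.
    target_host, target_port = "logs.example.com", 443
    state = "reachable" if can_connect(target_host, target_port) else "NOT reachable"
    print(f"{target_host}:{target_port} is {state}")
```

A successful TCP connection only shows that the firewall allows the traffic; it does not validate the authentication token.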

## Export logs
Logstash is a well established log collection service with many output [plugins](https://www.elastic.co/guide/en/logstash/7.17/output-plugins.html) available.
Contributor

Logstash is not a log collection service, it's just a processing pipeline.


Documentation of individual plugins is beyond the scope of this tutorial. In this guide we will give general instructions and also cover the basic but powerful `http` plugin.

Logstash processes logs in *pipelines*, where each pipeline is responsible for a logical group of logs. In Hopsworks we have multiple pipelines, and their configuration files are under `/srv/hops/logstash/config`.

### Export services logs
To stream various services' logs outside of Hopsworks you will need to **create another pipeline** similar to `services`.
Contributor

similar to the services pipeline.


#### Step 1
Copy `/srv/hops/logstash/config/services.conf` to `/srv/hops/logstash/config/services_http.conf`

Change the pipeline *input address* to:
```treetop
input {
  pipeline {
    address => services_http
  }
}
```

!!! note
    Take note of the pipeline address, as we will use it in Step 2.

At the end of the file is the `output` section which currently forwards them to OpenSearch. Replace the output section with a sample block such as
Contributor

forwards the logs to OpenSearch


```treetop
output {
  http {
    format => "json_batch"
    headers => ["x-api-key", "API_KEY"]
    http_compression => false
    http_method => "post"
    url => "https://localhost/logs"
    follow_redirects => false
  }
}
```
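
To see what the `http` output sends with `format => "json_batch"`, a throwaway receiver can be useful while testing. The sketch below is an assumption-laden illustration, not part of Hopsworks: it listens on plain HTTP at `127.0.0.1:8080` (so the `url` in the output block would have to point there during the test), checks the `x-api-key` header against the same `API_KEY` placeholder, and expects the request body to be a JSON array with one object per event.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder, matching the value in the sample output block.
EXPECTED_API_KEY = "API_KEY"


def parse_batch(body: bytes) -> list:
    """Decode a json_batch body: a JSON array with one object per event."""
    events = json.loads(body.decode("utf-8"))
    if not isinstance(events, list):
        raise ValueError("expected a JSON array of events")
    return events


class LogReceiver(BaseHTTPRequestHandler):
    def do_POST(self):
        # Reject requests that do not carry the expected key.
        if self.headers.get("x-api-key") != EXPECTED_API_KEY:
            self.send_response(401)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            events = parse_batch(self.rfile.read(length))
        except ValueError:
            # Malformed JSON or a body that is not an array.
            self.send_response(400)
            self.end_headers()
            return
        for event in events:
            # Print the message field when present, else the whole event.
            print(event.get("message", event) if isinstance(event, dict) else event)
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    # Hypothetical test address; not a Hopsworks default.
    HTTPServer(("127.0.0.1", 8080), LogReceiver).serve_forever()
```

The receiver only prints events and is meant for verifying the pipeline wiring, not for production use.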

#### Step 2
The next step is to configure Logstash to use the new pipeline.

Open `/srv/hops/logstash/config/pipelines.yml`

Add the new pipeline to the pipeline definitions:
```yaml
- pipeline.id: services_http
  path.config: "/srv/hops/logstash/config/services_http.conf"
  pipeline.batch.delay: 2000
  pipeline.batch.size: 50
```

**Instruct** the services pipeline to push logs also in the newly created pipeline by appending to `services-intake` for example:
Contributor

also to the newly created pipeline


```yaml
- pipeline.id: services-intake
  config.string: |
    input { beats { port => 5053 } }
    output { pipeline { send_to => ["services","services_http"] } }
```
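
Pipeline-to-pipeline wiring depends on every name in a `send_to` list matching an `address` declared by some pipeline input, so a typo here is easy to make and easy to miss. The following sketch checks that with simple regular expressions. It is a rough text scan, not a real Logstash config parser: it assumes the array form of `send_to` used in this guide, and the helper names are made up for illustration.

```python
import re
from pathlib import Path

# Matches: address => services_http   (optionally quoted)
ADDRESS_RE = re.compile(r"address\s*=>\s*\"?([\w-]+)\"?")
# Matches: send_to => ["services","services_http"]
SEND_TO_RE = re.compile(r"send_to\s*=>\s*\[([^\]]*)\]")


def declared_addresses(conf_text: str) -> set:
    """Collect pipeline input addresses declared in a config text."""
    return set(ADDRESS_RE.findall(conf_text))


def send_to_targets(pipelines_text: str) -> set:
    """Collect every address named in a send_to array."""
    targets = set()
    for group in SEND_TO_RE.findall(pipelines_text):
        targets.update(re.findall(r"[\w-]+", group))
    return targets


def unmatched_targets(pipelines_text: str, conf_texts: list) -> set:
    """Return send_to targets that no config declares as an address."""
    declared = set()
    for text in conf_texts:
        declared |= declared_addresses(text)
    return send_to_targets(pipelines_text) - declared


if __name__ == "__main__":
    config_dir = Path("/srv/hops/logstash/config")
    pipelines_text = (config_dir / "pipelines.yml").read_text()
    conf_texts = [p.read_text() for p in config_dir.glob("*.conf")]
    # Addresses may also be declared inline in pipelines.yml via config.string.
    conf_texts.append(pipelines_text)
    missing = unmatched_targets(pipelines_text, conf_texts)
    print("unmatched send_to targets:", sorted(missing) or "none")
```

Running it before restarting Logstash gives a quick hint when an address in `pipelines.yml` and the one in the copied `.conf` file have drifted apart.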

#### Step 3
The final step is to restart Logstash with `sudo systemctl restart logstash`.

Logstash logs can be found in `/srv/hops/logstash/log/logstash-plain.log`
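
After a restart, problems with the new pipeline usually show up as `WARN` or `ERROR` entries in that file. The sketch below filters them out. It assumes the default Logstash plain-log layout, where each entry starts with `[timestamp][LEVEL][logger]`; if the log4j2 configuration has been customized, the pattern will need adjusting.

```python
import re

# Assumed default layout: [timestamp][LEVEL][logger ...] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>[A-Z]+)\s*\]\[(?P<logger>[^\]]+)\]"
)


def problem_lines(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return log entries whose level is in levels, without trailing newlines."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") in levels:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    with open("/srv/hops/logstash/log/logstash-plain.log", encoding="utf-8") as fh:
        for entry in problem_lines(fh):
            print(entry)
```

Continuation lines such as stack traces do not match the pattern and are skipped, so the output is a summary, not a full record.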


### Export Spark logs
To stream applications' logs to another system the Steps are fairly similar to exporting services logs but need some additional configuration.
Contributor

lower s in Steps.

Contributor

I would also specify Spark application logs so users don't confuse them with the Hopsworks application.


#### Step 1
Copy `/srv/hops/logstash/config/spark-streaming.conf` to `/srv/hops/logstash/config/spark-streaming_http.conf`

Change the **input** section to:

```treetop
input {
  pipeline {
    address => spark_http
  }
}
```

Also, add an **output** block such as:

```treetop
output {
  http {
    format => "json_batch"
    headers => ["x-api-key", "API_KEY"]
    http_compression => false
    http_method => "post"
    url => "https://localhost/logs"
    follow_redirects => false
  }
}
```

#### Step 2
Edit `/srv/hops/logstash/config/spark-streaming.conf` and **change** the input to:

```treetop
input {
  pipeline {
    address => spark
  }
}
```

#### Step 3
Now you need to change `/srv/hops/logstash/config/pipelines.yml` and **add** the following pipeline definitions:

```yaml
- pipeline.id: spark-intake
  config.string: |
    input { beats { port => 5044 } }
    output { pipeline { send_to => ["spark", "spark_http"] } }
- pipeline.id: spark_http
  path.config: "/srv/hops/logstash/config/spark-streaming_http.conf"
```

#### Step 4
Finally, restart Logstash with `sudo systemctl restart logstash`.

## Conclusion
It is not easy to write a guide for a task that can be achieved in many different ways but in this guide we gave solid
Contributor

Users don't care that it's not easy. Just refer them to the different plugins configuration to understand how they need to configure their new pipeline to send data wherever they need to.

instructions for streaming Logstash pipelines to external resources. We did not go into detail on any specific output
method, as there are many plugins and the official documentation covers them; instead we focused on Hopsworks-related configuration.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -216,6 +216,7 @@ nav:
- Monitoring:
- Services Dashboards: admin/monitoring/grafana.md
- Services Logs: admin/monitoring/services-logs.md
- Export Logs: admin/monitoring/export-logs.md
- Authentication:
- Configure Authentication: admin/auth.md
- Configure OAuth2: