# Exporting Hopsworks logs

## Introduction
Hopsworks collects service and application logs with [Logstash](https://www.elastic.co/logstash/), which then forwards them to OpenSearch for indexing. Organizations often already have a logging system in place, so exporting Hopsworks logs to such a system is frequently necessary.

## Prerequisites
To configure Logstash to stream logs outside of Hopsworks you will need SSH access to the cluster (Logstash node). Depending on the target system, you might also need authentication tokens or firewall rules to be opened.
## Export logs
Logstash is a well-established log processing pipeline with many output [plugins](https://www.elastic.co/guide/en/logstash/7.17/output-plugins.html) available.

Documentation of individual plugins is beyond the scope of this tutorial. In this guide we give general instructions and also cover the basic but powerful `http` plugin.

Logstash processes logs in *pipelines*, where each pipeline is responsible for a logical group of logs. In Hopsworks there are multiple pipelines, and their configuration files are located under `/srv/hops/logstash/config`.
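
For orientation, each of these configuration files follows the standard Logstash layout of an `input` section, optional `filter` sections, and an `output` section. The sketch below is purely illustrative and not an actual Hopsworks pipeline; the pipeline address and the `stdout` output are placeholders:

```treetop
input {
  # Events enter the pipeline here, e.g. from Beats or from another pipeline
  pipeline {
    address => example_pipeline
  }
}

filter {
  # Optional per-event transformations (parsing, enriching, dropping fields)
}

output {
  # Processed events leave the pipeline here, e.g. to OpenSearch or HTTP
  stdout {
    codec => rubydebug
  }
}
```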

### Export services logs
To stream various services' logs outside of Hopsworks you will need to **create another pipeline** similar to the `services` pipeline.

#### Step 1
Copy `/srv/hops/logstash/config/services.conf` to `/srv/hops/logstash/config/services_http.conf`

Change the pipeline *input address* to:
```treetop
input {
  pipeline {
    address => services_http
  }
}
```

!!! note
    Take a note of the pipeline address as we will use it in Step 2.

At the end of the file is the `output` section, which currently forwards the logs to OpenSearch. Replace the output section with a sample block such as:
```treetop
output {
  http {
    format => "json_batch"
    headers => ["x-api-key", "API_KEY"]
    http_compression => false
    http_method => "post"
    url => "https://localhost/logs"
    follow_redirects => false
  }
}
```
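
In the example above, `API_KEY` and the `url` are placeholders for the credentials and endpoint of your target system. If you first want to verify that the new pipeline receives events at all, one option is to temporarily use Logstash's `stdout` output plugin instead, which prints each event to the Logstash log:

```treetop
output {
  # Temporary debug output: print every event so you can confirm the pipeline
  # receives data, then switch back to the http output shown above
  stdout {
    codec => rubydebug
  }
}
```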

#### Step 2
The next step is to configure Logstash to use the new pipeline.

Open `/srv/hops/logstash/config/pipelines.yml`

Add the new pipeline to the pipeline definitions:
```yaml
- pipeline.id: services_http
  path.config: "/srv/hops/logstash/config/services_http.conf"
  pipeline.batch.delay: 2000
  pipeline.batch.size: 50
```

**Instruct** the services pipeline to also push logs to the newly created pipeline by appending to `services-intake`, for example:
```yaml
- pipeline.id: services-intake
  config.string: |
    input { beats { port => 5053 } }
    output { pipeline { send_to => ["services","services_http"] } }
```
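
Putting both changes together, the relevant part of `pipelines.yml` would look roughly like the sketch below. The entry for the existing `services` pipeline is shown only for illustration; keep whatever definition is already present on your cluster:

```yaml
- pipeline.id: services-intake
  config.string: |
    input { beats { port => 5053 } }
    output { pipeline { send_to => ["services","services_http"] } }
# existing pipeline, kept as-is (its exact definition may differ on your cluster)
- pipeline.id: services
  path.config: "/srv/hops/logstash/config/services.conf"
# new pipeline created in Step 1
- pipeline.id: services_http
  path.config: "/srv/hops/logstash/config/services_http.conf"
  pipeline.batch.delay: 2000
  pipeline.batch.size: 50
```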

#### Step 3
The final step is to restart Logstash with `sudo systemctl restart logstash`

Logstash logs can be found in `/srv/hops/logstash/log/logstash-plain.log`

### Export Spark logs
To stream Spark applications' logs to another system the steps are fairly similar to exporting services' logs, but some additional configuration is needed.

#### Step 1
Copy `/srv/hops/logstash/config/spark-streaming.conf` to `/srv/hops/logstash/config/spark-streaming_http.conf`

Change the **input** section to:
```treetop
input {
  pipeline {
    address => spark_http
  }
}
```

Also, add an **output** block such as:
```treetop
output {
  http {
    format => "json_batch"
    headers => ["x-api-key", "API_KEY"]
    http_compression => false
    http_method => "post"
    url => "https://localhost/logs"
    follow_redirects => false
  }
}
```
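
The `http` output above is only one possibility. As an illustration of a different plugin, the sketch below uses Logstash's `file` output to write the Spark logs to dated files on the Logstash node instead; the target directory is arbitrary and the options may need adjusting for your setup:

```treetop
output {
  # Write each event as one JSON line into a file named after the event date
  file {
    path => "/srv/hops/logstash/spark-logs/%{+YYYY-MM-dd}.log"
    codec => json_lines
  }
}
```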

#### Step 2
Edit `/srv/hops/logstash/config/spark-streaming.conf` and **change** the input to:
```treetop
input {
  pipeline {
    address => spark
  }
}
```

#### Step 3
Now you need to change `/srv/hops/logstash/config/pipelines.yml` and **add** the following pipeline definitions:
```yaml
- pipeline.id: spark-intake
  config.string: |
    input { beats { port => 5044 } }
    output { pipeline { send_to => ["spark", "spark_http"] } }
- pipeline.id: spark_http
  path.config: "/srv/hops/logstash/config/spark-streaming_http.conf"
```

#### Step 4
Finally, restart Logstash with `sudo systemctl restart logstash`

## Conclusion
In this guide we gave general instructions for streaming logs from Hopsworks' Logstash pipelines to external systems, focusing on the Hopsworks-specific configuration. We did not go into detail on any particular output method; refer to the documentation of the individual output [plugins](https://www.elastic.co/guide/en/logstash/7.17/output-plugins.html) to configure your new pipeline to send data wherever you need it.