Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-32734 Add ElasticSearch metric sink configuration #19247

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kenrowland
Copy link
Contributor

@kenrowland kenrowland commented Oct 29, 2024

Added configuration options for the ElasticSearch sink Updated metrics readme with information on adding the sink to the cluster Added example ElasticSearch configuraion yaml file

Signed-Off-By: Kenneth Rowland [email protected]

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Added configuration options for the ElasticSearch sink
Updated metrics readme with information on adding the sink to the cluster
Added example ElasticSearch configuraion yaml file

Signed-Off-By: Kenneth Rowland [email protected]
Copy link

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32734

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

Copy link
Member

@rpastrana rpastrana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kenrowland looks good, but I do have some comments for you to ponder. They're all meant to be constructive, let me know if should clarify any points. Thanks.

data types to be stored in the index in their native types. Without dynamic mapping, the ElasticSearch
default mapping does not properly map value to unsigned 64-bit integers.

To create an index with dynamic mapping, use the following object when creating the index:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this write up seems very useful.
Is the goal still to handle the creation of the indices programmatically in later versions of the sink?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently no plans to create the index from the sink. If that was something we added, there are several considerations that must be taken into account. These are

  • If the cluster is using the same index for all metric reporting, creating and configuring the index must be coordinated somehow. If each component creates their own, there still needs to be coordination.
  • The sink would also need to add mapping templates in order to map values to the correct type. This could still use wildcard dynamic templates, or could be specific to the metrics registered for each metric framework instance
  • If the sink creates the index, we need to worry about life cycle management.

I felt like just telling the sink what index to use allows the policy of the ElasticSearch instance to remain there instead of pushing it down to the sink.

If we decide to push that responsibility to the sink, we can add it as an improvement addressing the concerns from above

<Environment>
<Software>
<metrics name="mymetricsconfig">
<sinks name="elastic" type="elastic">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial but changing the name to 'myelasticsink' would drive to point further and would follow the convention from the metrics name attribute above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. I'll change that.

<Software>
<metrics name="mymetricsconfig">
<sinks name="elastic" type="elastic">
<settings period="30"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The challenge of adding the configuration before the logic is higher potential for downstream restructure of configuration which can be very disruptive to end-users.

It's obviously hard to know what we don't know we'll need to support in later versions, but let's scrutinize each element of your config layout more than usual...
for instance, indexName, if we're planning on expanding the index creation, we might need a pattern rather than a name. with that in mind it might be worth supporting a structured element for the index, for now the only active attribute for the index element could be "name", later adding more attributes under the index element

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index config changed accordingly to allow sub items.

# type - sink type (must be elastic for ElasticSearch support)
# name - name for the sink instance
# settings.elasticHost - url for the ElasticSearch instance
# settings.indexName - ElasticSearch index name where metrics are indexed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to illustrate what I meant above, this could be settings.index.name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable. Change made. It also would allow keeping the index configuration separate in case we decide to push more policy into the sink.

@@ -45,11 +45,26 @@ ElasticMetricSink::ElasticMetricSink(const char *name, const IPropertyTree *pSet
ignoreZeroMetrics = pSettingsTree->getPropBool("@ignoreZeroMetrics", true);
pSettingsTree->getProp("@elasticHost", elasticHost);
pSettingsTree->getProp("@indexName", indexName);

// Initialize standard suffixes
countMetricSuffix.append("_count");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an advantage to use append rather than set? (I'm obviously assuming these suffix strings are empty before line 50)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they are empty. Use of append is based on code review comments from a previous PR from others on the team.

- type: elastic
name: elasticsink
settings:
elasticHost: http://localhost:9200,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps too picky, but "Host" implies the dns entry only, no protocol, port, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this I wanted to provide the full string to which the specific endpoint is appended. This makes it a bit easier than having to read multiple configuration values just to put them together internally. Any individual part can be changed in the string as easily as it could be in a separate config value.

However, thinking ahead, I can see where something like the port or other portion of the route could be a cluster wide parameter. For that reason, separating them into separate values makes sense.

Copy link
Contributor Author

@kenrowland kenrowland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pleas see comments

data types to be stored in the index in their native types. Without dynamic mapping, the ElasticSearch
default mapping does not properly map value to unsigned 64-bit integers.

To create an index with dynamic mapping, use the following object when creating the index:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently no plans to create the index from the sink. If that was something we added, there are several considerations that must be taken into account. These are

  • If the cluster is using the same index for all metric reporting, creating and configuring the index must be coordinated somehow. If each component creates their own, there still needs to be coordination.
  • The sink would also need to add mapping templates in order to map values to the correct type. This could still use wildcard dynamic templates, or could be specific to the metrics registered for each metric framework instance
  • If the sink creates the index, we need to worry about life cycle management.

I felt like just telling the sink what index to use allows the policy of the ElasticSearch instance to remain there instead of pushing it down to the sink.

If we decide to push that responsibility to the sink, we can add it as an improvement addressing the concerns from above

<Environment>
<Software>
<metrics name="mymetricsconfig">
<sinks name="elastic" type="elastic">
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. I'll change that.

- type: elastic
name: elasticsink
settings:
elasticHost: http://localhost:9200,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this I wanted to provide the full string to which the specific endpoint is appended. This makes it a bit easier than having to read multiple configuration values just to put them together internally. Any individual part can be changed in the string as easily as it could be in a separate config value.

However, thinking ahead, I can see where something like the port or other portion of the route could be a cluster wide parameter. For that reason, separating them into separate values makes sense.

@@ -45,11 +45,26 @@ ElasticMetricSink::ElasticMetricSink(const char *name, const IPropertyTree *pSet
ignoreZeroMetrics = pSettingsTree->getPropBool("@ignoreZeroMetrics", true);
pSettingsTree->getProp("@elasticHost", elasticHost);
pSettingsTree->getProp("@indexName", indexName);

// Initialize standard suffixes
countMetricSuffix.append("_count");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they are empty. Use of append is based on code review comments from a previous PR from others on the team.

# type - sink type (must be elastic for ElasticSearch support)
# name - name for the sink instance
# settings.elasticHost - url for the ElasticSearch instance
# settings.indexName - ElasticSearch index name where metrics are indexed
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable. Change made. It also would allow keeping the index configuration separate in case we decide to push more policy into the sink.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants