Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] OS Integrations complain about invalid schema, invalid index names #1148

Closed
mmehrten opened this issue Oct 20, 2023 · 5 comments
Closed
Assignees
Labels
bug Something isn't working

Comments

@mmehrten
Copy link

mmehrten commented Oct 20, 2023

What is the bug?
When using OpenSearch integrations, the UI will complain about invalid index names and schemas during setup even when using the same schemas that are provided in the source. For example, I used the index mappings for the CloudTrail integration here, but the UI complains about an invalid schema in the index. I also had to follow the SS4O index naming convention, but this was not documented anywhere and took trial and error to discover.

Example dataprepper pipeline: https://github.com/mmehrten/terraform/blob/main/dataprepper/docker/pipelines/pipeline.yaml

Example index templates: https://github.com/mmehrten/terraform/tree/main/dataprepper/docker/mappings/cloudtrail

Example Lambda to process CloudTrail data from CloudWatch and post to dataprepper: https://github.com/mmehrten/terraform/blob/main/dataprepper/cloudwatch_dataprepper.py

Maybe nitpicking, but I'll also call out that the provided schemas for CloudTrail don't really support CloudTrail data and will lead to ingest errors. responseElements, userIdentity, and requestParameters should all be object typed to allow more flexible schema mappings as you can see in the sample data.

What is the expected behavior?
CloudTrail events in SS4O format with any index name should be supported, and index validation should likely be more flexible.

What is your host/environment?
AWS Managed OpenSearch 2.9.

Do you have any screenshots?
If applicable, add screenshots to help explain your problem.
image

image

@mmehrten mmehrten added bug Something isn't working untriaged labels Oct 20, 2023
@derek-ho
Copy link
Collaborator

@Swiddis please take a look

@Swiddis
Copy link
Collaborator

Swiddis commented Oct 24, 2023

Hi, thanks for reaching out about this. I think there are a couple integrations that are dealing with similar issues right now, I haven't had time to look into it yet. I'll have time to look at this further in a week or so, I'll be sure to keep you posted.

Update: Looking into this now.

@Swiddis
Copy link
Collaborator

Swiddis commented Nov 9, 2023

So, it looks like the validation issue is because of a misunderstanding in how fields are added to the mapping, which affects most integrations. I'll have a PR out soon, in the meantime here's a reproducing test:

it('should correctly handle if the properties in a mapping mismatch with the mapping name', () => {
  const rootType = 'root';
  const dataSourceProps = {
    prop1: { type: 'string' },
    prop2: { type: 'number' },
  };
  const requiredMappings = {
    root: {
      template: {
        mappings: {
          properties: {
            prop1: { type: 'string' },
          },
        },
      },
    },
    child: {
      template: {
        mappings: {
          properties: {
            prop2: { type: 'number' },
          },
        },
      },
    },
  };

  const result = doPropertyValidation(rootType, dataSourceProps as any, requiredMappings);

  expect(result).toBe(true);
});

And another reproducer that demonstrates the false assumption:

it('should not assume child templates are included as nested props', () => {
  const rootType = 'root';
  const dataSourceProps = {
    prop1: { type: 'string' },
    child: { prop2: { type: 'number' } },
  };
  const requiredMappings = {
    root: {
      template: {
        mappings: {
          properties: {
            prop1: { type: 'string' },
          },
        },
      },
    },
    child: {
      template: {
        mappings: {
          properties: {
            prop2: { type: 'number' },
          },
        },
      },
    },
  };

  const result = doPropertyValidation(rootType, dataSourceProps as any, requiredMappings);

  expect(result).toBe(false);
});

@Swiddis
Copy link
Collaborator

Swiddis commented Nov 9, 2023

The bug will be fixed with #1234, though it may take a few days to hit managed service. Thanks again for reaching out!

Maybe nitpicking, but I'll also call out that the provided schemas for CloudTrail don't really support CloudTrail data and will lead to ingest errors. responseElements, userIdentity, and requestParameters should all be object typed to allow more flexible schema mappings as you can see in the sample data.

It looks like this fix was already merged to our schema repository but not correctly backported to the codebase here. I'll work on taking care of that as well.

Or, I may be misunderstanding. It looks like the fields you pointed to are already objects in our schema, can you help me understand what you mean by object-typed here?

@Swiddis
Copy link
Collaborator

Swiddis commented Nov 13, 2023

... though it may take a few days to hit managed service

I asked the rest of the team about it and it seems like the general flow is to require upgrades instead of backporting bugfixes. The bug isn't an issue in 2.11 since we no longer use this particular validation code (and allow adding non-naming-convention indices!), so the bug should be resolved code-wise. Still waiting for information on the schema mismatch before I mark the issue as resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants