Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Common: Upgrade yauaa to 7.x #678

Merged
merged 1 commit into from
Nov 8, 2022
Merged

Common: Upgrade yauaa to 7.x #678

merged 1 commit into from
Nov 8, 2022

Conversation

oguzhanunlu
Copy link
Member

@oguzhanunlu oguzhanunlu commented Aug 16, 2022

  • add yauaa_context/1-0-4 schema to commonFs2 test/iglu-client-embedded schemas
  • change com.snowplowanalytics.snowplow.enrich.common.fs2.enrichments.YauaaEnrichmentSpec to use new version

Copy link
Contributor

@istreeter istreeter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the original issue (#639) is about more than simply bumping the yauaa library version. I believe the issue was also requesting us to make a code change to use a new feature of yauaa. See my comment in the issue.

@oguzhanunlu
Copy link
Member Author

oguzhanunlu commented Sep 19, 2022

definitely, I'll update this PR, thanks!

@istreeter istreeter force-pushed the bump-yauaa-7.x branch 5 times, most recently from a1cdb6e to f533bce Compare October 25, 2022 19:55
@@ -9,7 +9,7 @@
"vendorPrefixes": [ "com.snowplowanalytics" ],
"connection": {
"http": {
"uri": "http://iglucentral.com"
"uri": "http://iglucentral-dev.com.s3-website-us-east-1.amazonaws.com/feature/yauaa-1-0-4/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line will be removed when the iglu central PR is merged.

Copy link
Contributor

@dilyand dilyand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to this comment, we're supposed to be filtering out a field. But I didn't see any changes to that effect.

In addition, I've added a few comments about things that can be improved. The main one being, that at present, if we have an explicit null or '' useragent, then we ignore all the client hint headers. I just want to ensure that this is indeed what we want to be doing.

@@ -40,7 +40,8 @@ object Dependencies {
val hikariCP = "5.0.1"
val jaywayJsonpath = "2.7.0"
val iabClient = "0.2.0"
val yauaa = "5.23"
val yauaa = "7.7.0"
val log4jToSlf4j = "2.18.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this required by the yauaa bump?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the related issue in yauaa.


/**
* Gets the map of attributes retrieved by YAUAA from the user agent.
* @return Map with all the fields extracted by YAUAA by parsing the user agent.
* If the input is null or empty, a map with just the DeviceClass set to Unknown is returned.
*/
def parseUserAgent(userAgent: String): Map[String, String] =
def parseUserAgent(userAgent: String, headers: List[String]): Map[String, String] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's annoying but it feels like this could be named better, since it now parses more than the useragent string. Perhaps something like extractUserAgent.

The userAgent arg is presumably coming from being explicitly added as a field in the event, rather than from a header. If it's null or empty, we are ignoring all headers. Is that what we want?

Alternatively, if the userAgent is coming from headers, then this function can maybe only take a single headers: List[String] argument?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The userAgent field can come from two places:

  • Extracted from the header
  • Explicitly set as a field in the event.

By the time we reach this enrichment, we have dropped information about where userAgent originally came from. This cannot be easily changed without quite a big re-write of the enrichment logic.

If the user agent is null or empty, then this request did not original from a browser. So it is fine to ignore client hint headers, because they are only relevant for web.

We cannot change the function to take a single headers field. Imagine the case of server-side tracking where the headers contains the user agent of the server, but the tp2 payload explicitly sets the user agent of the end client. In this case, we need to override the User-Agent header in the list. I put in a test for this case, so we don't get any accidental regressions if someone tries to change the code.

Renaming the function is a good idea. How about analyzeUserAgent -- afterall that's what the 3rd a in yauaa stands for.

}

/** Yauaa 7.x added many new fields which are not in the 1-0-4 schema */
private val validFields = Set(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we get these fields from the properties of the schema? These hardcoded values will not be fun to maintain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this too! If you think more widely, it's a bigger piece of functionality that is currently missing from Iglu libs:

Given some json (the output of yauaa enrichment) and given a schema (the yauaa 1-0-4) schema, cast the json so it is valid against the schema. In this case, drop fields and keep just the valid fields.

I did not implement it here for a few reasons:

  1. Yauaa enrichment would then have a dependency on Iglu client, to get the schema. It makes the code change quite a lot bigger, because it introduces a F[_] and we have to start handling failures to fetch the schema.
  2. It is a temporary problem that we cannot upgrade the yauaa schema to add extra fields. Once that problem is fixed then we don't need to maintain this list.

It's a balance. Weighing up 1 vs 2, and considering we have users waiting for this feature, I decided to implement it the easy way.

@@ -0,0 +1,233 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we have a duplicate of this schema here and in common-fs2/src/test. Can we not use a single file? And do we need it hardcoded once the iglu-central PR is merged?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the tip of something bigger. We have a lot of iglu central schemas hardcoded into Enrich's test resources. Several of them are duplicated between common and commonFs2.

Yes I could deduplicate those duplicates, but then it would look a bit odd that commonFs2 has some gaps in the long list of schemas that it maintains.

do we need it hardcoded once the iglu-central PR is merged?

I like how they are locally available when the tests run. For unit tests especially, it is good to avoid going to the network. But that doesn't mean they need to be hardcoded in Enrich!

One solution could be to download selected iglu central schemas before the tests run, and add them to a shared resources directory so they are available for all tests. It could be implemented as a sbt task, so that test depends on this new task. It would provide a better guarantee that our code is valid against the real Iglu Central.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be great! It exposes us to some rogue iglu central patch; but that seems less likely than us forgetting to update a schema somewhere in the hardcoded files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to avoid confusion: do you want to do it now or is that an idea for the future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea for the future, but I will create an issue now.

@istreeter
Copy link
Contributor

According to this comment, we're supposed to be filtering out a field. But I didn't see any changes to that effect.

We are filtering out the field. This is the relevant line.

@istreeter istreeter requested a review from dilyand November 1, 2022 12:40
Copy link
Contributor

@dilyand dilyand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

@istreeter
Copy link
Contributor

Will merge as soon as the Iglu Central schema is released.

@istreeter istreeter merged commit 66dbeea into develop Nov 8, 2022
@istreeter istreeter deleted the bump-yauaa-7.x branch November 8, 2022 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants