When someone should use user.name/user.id ? #1172

Open
lmolkova opened this issue Jun 20, 2024 · 6 comments
Labels
area:security area:user question Further information is requested

Comments

@lmolkova
Contributor

lmolkova commented Jun 20, 2024

Context

Related to #1142

We're considering adding user.id/user.name under the db namespace and are not sure which one (neither? both?) to use.
We'd like to stay consistent with the root user namespace so we can leverage the embedding mechanism (once it's implemented).

When connecting to a database, messaging system, cloud service, etc., it's common to talk about an identity rather than a user.

E.g. Azure has several kinds of identities (human, and multiple kinds of machine identities). AWS also has several: users, groups, roles.

Problem

The concept of user in the semconv seems to map to the human identity only. It does not seem to apply to a generic 'identity' used to access a resource.

Even within a 'human identity', it's not clear how to use it:

  • is name unique within the system?
  • if so, why is there an id? How are they different?
  • the login terminology does not seem to apply in many cases (authentication could be a broader term). I believe this is related to "Add authentication sub-namespace to user" (#1146).

So, I think we need to decide and document:

  • what does the user namespace describe? OS user? Human identity? Any identity?
  • what's the appropriate namespace based on this definition?
  • what are the right attributes to describe this thing?
@lmolkova added the question (Further information is requested) and area:user labels on Jun 20, 2024
@lmolkova
Contributor Author

cc @open-telemetry/semconv-security-approvers

@lmolkova
Contributor Author

Based on the SemConv SIG discussion 6/24

  • user.name is a login/human-readable name, e.g. root or a user login; user.id is something else, e.g. an identifier the system uses internally.
  • the user namespace is used for the end user (e.g. a website user) or the OS user.
  • it does not seem able to describe a generic identity. Also, multiple identities can act at the same time (the end user, the OS user the service runs as, client identities used to access resources).

Action items:

  • @open-telemetry/semconv-security-approvers will discuss

@mjwolf
Contributor

mjwolf commented Jun 27, 2024

I have some thoughts on what a user should be in semantic-conventions, and how it relates to identities.

Use Cases to Handle

A "user" (or multiple types of user resources) should handle these use cases.

  • Represent existing well-known usages of 'user' attributes.
    • Many existing concepts/objects have "user" attributes, and OTel should be able to generate telemetry with these attributes.
  • Have a trackable scope/chain for each representation of "user".
    • For example, the human user “John Smith” logs in to machine “laptop_123” with user account “jsmith”, connects to a kube pod “pod_abc” with user “root” inside the container, and runs a command. The log event for this should have the three different users and the context that each user exists within. There shouldn’t be conflicts that prevent writing all the info into the OTel event.
    • The Entities WG will work on a way to define relationships between entities. This could be used, if the users are entities.
  • Accurately handle an IAM system's "user". See the section below.
  • Handle OS user accounts.
    • OS user accounts have a different data model, scope, and functionality than IAM systems, so it might be best to handle them separately from IAM user.
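
The "trackable chain" use case above can be sketched as a single log record that carries all three users without key collisions. This is only an illustration: the `ScopedUser` type, the `to_attributes` helper, and every attribute key it produces (`user.human.name`, `user.host.scope_id`, and so on) are hypothetical and are not defined by the semantic conventions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedUser:
    """One 'user' plus the context it exists within (all names hypothetical)."""
    scope: str                      # kind of context: "human", "host", "container"
    name: str                       # user name as known inside that context
    scope_id: Optional[str] = None  # identifier of the context, if there is one


def to_attributes(chain):
    """Flatten a chain of users into attribute keys that cannot collide."""
    attrs = {}
    for user in chain:
        prefix = f"user.{user.scope}"
        if f"{prefix}.name" in attrs:
            raise ValueError(f"duplicate user scope: {user.scope}")
        attrs[f"{prefix}.name"] = user.name
        if user.scope_id is not None:
            attrs[f"{prefix}.scope_id"] = user.scope_id
    return attrs


# The example from the text: human -> laptop account -> container user.
chain = [
    ScopedUser("human", "John Smith"),
    ScopedUser("host", "jsmith", scope_id="laptop_123"),
    ScopedUser("container", "root", scope_id="pod_abc"),
]
attrs = to_attributes(chain)
```

With a single unqualified `user.name` key, the three values would overwrite each other. The scope prefix here is just one way to avoid that; relationships between entities could serve the same purpose.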

What is an IAM user?

Within IAM systems, a "user" is a type of identity. There are other types of identities such as Role, service account, or user group.

IAM users or managed user accounts are objects within the IAM system. Federated users are users that have existing, external user identities that are connected to the IAM system.

There are some differences in how users are implemented in different IAM systems. In AWS IAM, there is not a traditional "machine user". Instead, roles are typically attached to machine resources. It is possible to create a user that will be used by a workload. In GCP, there are service accounts, which act as machine users.

There are also Customer Identity and Access Management (CIAM) systems, such as Amazon Cognito. I'm not sure if there's any difference with CIAM that would impact telemetry.

A user is not a role, nor the credentials that identify the user, and care should be taken not to confuse them.

Two concepts of 'user'

A 'user' can be considered as two different concepts within OTel: user objects, and user attributes on non-user objects.

User Objects

Within IAM providers, "user" is a type of identity.

User objects would probably be best implemented as OTel entities rather than resources since they usually have mutable attributes.

Attributes on non-user objects.

There are many existing concepts that are not themselves users, but that have user attributes. Some examples are git commits and OS files/processes. These are not users, but they do have existing user attributes. Git commits have a user name, email, and signing key. Posix files have a user ID and name. A JWT is a token that can carry information on a user, but it's not a user itself.

With embedded attributes, it should be possible to add attributes from the user attribute registry into resources/entities that have the existing concepts of user attributes.

Right now the attribute registry descriptions can be rigid, and might not fit the existing usage in different concepts. For example, git user.name is the full name, while Linux file user.name is the short account name. So that's something that needs to be considered/handled.
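
To make the mismatch concrete, here is a rough sketch of what embedding the user attributes into other namespaces might look like. The embedding mechanism is not implemented yet, so the `embed` helper and the `vcs.commit` and `file` prefixes are assumptions made purely for illustration.

```python
def embed(prefix, user_attrs):
    """Re-key registry-style user attributes under another namespace (hypothetical)."""
    return {f"{prefix}.{key}": value for key, value in user_attrs.items()}


# Git commit: user.name holds the author's full name.
commit_attrs = embed("vcs.commit", {"user.name": "John Smith"})

# Linux file: user.name holds the short account name, user.id the numeric uid.
file_attrs = embed("file", {"user.name": "jsmith", "user.id": "1000"})
```

Both results contain a key ending in `user.name`, yet one value is a full name and the other a short account name, so a single rigid registry description would not fit both.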

Questions

Generic or IAM-specific resources?

The different cloud IAM providers have different implementations but generally follow the same high-level concepts. Should users/identity resources be generic, with a known mapping for each IAM provider, or separate resources for each IAM provider, and non-IAM users?

E.g. is there a single "human user" type, or separate types such as "AWS IAM user", "GCP managed user account", "systemd service user", etc.?

Users in the different IAM systems have different attributes, so I think they might need to be separate resources.

What attributes to add to the user registry?

Should we have all attributes for user, or only a select "most-important" set? Active Directory has about 60 attributes for human users, and since there are many other identity providers, the set could get very large if we try to have all attributes for all IAM providers. It might be better to include the IAM ID/reference, so other attributes can be retrieved from it directly, as well as a smaller set of the most-used or "interesting" attributes.

What are OS user accounts?

OS user accounts often represent human users, but they can also be services or machine accounts. Should user accounts be classified as human or machine users, or does it matter at all?

OS user accounts and IAM system users might be different enough that it wouldn't make sense to try to unify them. Maybe OS user and IAM user should have different concepts in OTel.

Should "Identity" be worked on first?

In IAM systems, "user" is just one part of the data model. It might be better to design the larger "identity" data model, rather than do user first, and try to work identity around it later.

@joaopgrassi
Member

@trisch-me I tried to find one but couldn't: should we create a project board for @open-telemetry/semconv-security-approvers and the project in general? I don't see it mentioned here either: https://github.com/open-telemetry/community/pull/1838/files. Can you please follow up on this? Once this issue is in a project, we can remove the triage label.

@trisch-me
Contributor

@joaopgrassi done, thanks for the hint

@lmolkova
Contributor Author

lmolkova commented Jul 30, 2024

Adding a relevant comment from @mjwolf on #1104 (comment):

I think user.id needs to be kept for the OS user use case. User ID is a well-defined concept, without any other qualifiers.
For example, from the POSIX specification for getpwuid: https://pubs.opengroup.org/onlinepubs/9699919799/functions/getpwuid.html, this refers to "user id"/uid, many times without any further qualification on user.

For a more concrete example of a security use case, Falco alerts can have a field user.uid, defined as just "user ID". I think it would make sense to map this to user.id in the registry, there's no qualifier or other namespace that would really make sense.
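
As a rough sketch of the OS user case, the lookup below maps a POSIX password database entry onto `user.id` and `user.name`. It uses Python's standard `pwd` module (Unix only). The `os_user_attributes` helper, and the choice to report the uid as a string, are assumptions for illustration rather than anything the conventions prescribe.

```python
import os
import pwd


def os_user_attributes(uid=None):
    """Return user.id / user.name for an OS user, defaulting to the current process."""
    if uid is None:
        uid = os.getuid()
    attrs = {"user.id": str(uid)}  # the numeric uid, as referred to by getpwuid
    try:
        # pwd.getpwuid wraps the POSIX getpwuid() lookup.
        attrs["user.name"] = pwd.getpwuid(uid).pw_name  # short account name
    except KeyError:
        # The uid has no password database entry (common in containers),
        # so only the id can be reported.
        pass
    return attrs


attrs = os_user_attributes()
```

Here the uid needs no further qualifier, which is the point made above: for an OS user, "user ID" already identifies a single well-defined value.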

user.id indeed makes perfect sense for the OS user, but it's hard to apply to other areas. E.g. in the case of a web/client app, there are multiple identifiers.

One solution could be to introduce user.os.id, os_user.id, etc., which would be very specific to the OS and would not need to be reused (with a detailed explanation) in browser apps.


5 participants