Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename db.name to db.namespace and generalize it #911

Merged
merged 17 commits into from
Apr 24, 2024

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented Apr 8, 2024

Changes

Deprecates db.name and db.redis.database_index in favor of db.collection.namespace
Removes db.mssql.instance_name (includes it into db.collection.namespace).

Open question:
where does the schema belong?

  • within name: namespace = mydb, name = dbo.mytable , span name is SELECT mydb.dbo.mytable
    • If schema is not set in the query, then we won't have it (namespace = mydb and name = mytable and span name is SELECT mydb.mytable
    • But it's a best practice to qualify table name with schema
  • within namespace: namespace = mydb.dbo, name = dbo.mytable, span name is SELECT mydb.dbo.dbo.mytable
    • if schema is provided as a part of table name (best practice), we have duplication

The schema (at least sometimes) requires actual database call: postgres jdbc driver, SQL Server JDBC driver.
It's not a connection property per se, but a database server configuration, i.e. can be modified elsewhere and can't be cached on the client.

So in this PR schema stays inside table name (if users provided it in the query) and is not captured inside the database name.

Merge requirement checklist

@lmolkova lmolkova requested review from a team April 8, 2024 23:21
docs/database/hbase.md Outdated Show resolved Hide resolved
docs/database/hbase.md Outdated Show resolved Hide resolved
docs/database/redis.md Outdated Show resolved Hide resolved
@Oberon00
Copy link
Member

Oberon00 commented Apr 9, 2024

Was there any discussion before this PR in some SIG meeting or similar? Is there any issue associated with this PR?

@lmolkova
Copy link
Contributor Author

lmolkova commented Apr 9, 2024

Was there any discussion before this PR in some SIG meeting or similar? Is there any issue associated with this PR?

yes, this is the design approach discussed within DB semconv WG: https://docs.google.com/document/d/1zTi_Z4WBisytPnXTpwtnGOK-4Gv5oSTuMHvIQ8--5hg

It partially addresses #749, clarifies parts of #718, #714

@trask
Copy link
Member

trask commented Apr 10, 2024

if schema is provided as a part of table name (best practice), we have duplication

wdyt of just SELECT {db.collection.name} for the span name? I'm not sure we need db.collection.namespace in the span name, sort of like HTTP spans include the route but not the server name. you can always group span names further by db.collection.namespace if that's desired

@trask
Copy link
Member

trask commented Apr 10, 2024

where does the schema belong?

For SQL at least:

  • db.collection.name will be whatever the user has in the query (may be partially or fully namespaced, or not namespaced at all)
  • so the question is whether to capture schema in db.collection.namespace or not

@lmolkova lmolkova force-pushed the db-collection-namespace branch from a68803e to a8e64c3 Compare April 11, 2024 00:33
@gregkalapos
Copy link
Member

gregkalapos commented Apr 12, 2024

As discussed in the last WG call, looked into what we can collect in .NET.

The DbConnection ADO interface has following relevant methods:

  • DbConnection.DataSource: name of the database server
  • DbConnection.Database: name of the database
  • DbConnection.GetSchema(string): returns a DataTable which contains metadata info about the objects in the db including schema and collection. Here is a super useful article on this.

I ran following code snippet with different DBs and libraries:

static void DisplayConnectionData(DbConnection connection)
{
    Console.WriteLine($"DataSource: {connection.DataSource}, Database: {connection.Database}");
    Console.WriteLine("============================");

    DataTable table = connection.GetSchema("Tables");
    foreach (System.Data.DataRow row in table.Rows)
    {
        foreach (System.Data.DataColumn col in table.Columns)
        {
            Console.WriteLine("{0} = {1}", col.ColumnName, row[col]);
        }
        Console.WriteLine("============================");
    }
}

Above code queries the Tables schema collection (so it only returns infos about tables). Here is a list about other collections (e.g. Procedures could be used for stored procedures).

MS SQL Server with System.Data.SqlClient

Output:

DataSource: 127.0.0.1,50312, Database: master
============================
TABLE_CATALOG = master
TABLE_SCHEMA = dbo
TABLE_NAME = spt_fallback_db
TABLE_TYPE = BASE TABLE
============================
....similar info for all tables...

MySQL with MySqlConnector

Output:

DataSource: 127.0.0.1, Database: test
============================
TABLE_CATALOG = def
TABLE_SCHEMA = information_schema
TABLE_NAME = CHARACTER_SETS
TABLE_TYPE = SYSTEM VIEW
ENGINE =
VERSION = 10
ROW_FORMAT =
TABLE_ROWS = 0
AVG_ROW_LENGTH = 0
DATA_LENGTH = 0
MAX_DATA_LENGTH = 0
INDEX_LENGTH = 0
DATA_FREE = 0
AUTO_INCREMENT =
CREATE_TIME = 12.04.2024 14:18:11
UPDATE_TIME =
CHECK_TIME =
TABLE_COLLATION =
CHECKSUM =
CREATE_OPTIONS =
TABLE_COMMENT =
============================
....similar info for all tables...

PostgreSql with Npgsql

Output:

DataSource: tcp://127.0.0.1:54251, Database: postgres
============================
table_catalog = postgres
table_schema = public
table_name = foo
table_type = BASE TABLE
============================

So, overall it seems, at least in ADO.NET we can get the schema. Issue I see with this is that this seems to query the database - not only that, but connection.GetSchema("Tables") returns the schema info for all tables, which can be very high overhead. Plus it's always a point in time info - e.g. if a table is created after the instrumentation calls connection.GetSchema("Tables"), then connection.GetSchema("Tables") needs to be called again to get the info for that table.

model/trace/database.yaml Outdated Show resolved Hide resolved
model/registry/db.yaml Outdated Show resolved Hide resolved
model/registry/db.yaml Outdated Show resolved Hide resolved
Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

docs/database/database-spans.md Outdated Show resolved Hide resolved
@trask trask changed the title Rename db.name to db.collection.namespace Rename db.name to db.namespace Apr 18, 2024
@trask trask changed the title Rename db.name to db.namespace Rename and generalize db.name to db.namespace Apr 18, 2024
@trask trask changed the title Rename and generalize db.name to db.namespace Rename db.name to db.namespace and generalize it Apr 18, 2024
model/trace/database.yaml Outdated Show resolved Hide resolved
@lmolkova lmolkova force-pushed the db-collection-namespace branch 2 times, most recently from 207d213 to a308a49 Compare April 19, 2024 18:49
@lmolkova lmolkova force-pushed the db-collection-namespace branch 2 times, most recently from 4cd93aa to 55697b5 Compare April 23, 2024 21:28
@lmolkova lmolkova force-pushed the db-collection-namespace branch from 5a87e3f to 15a0841 Compare April 24, 2024 17:03
@lmolkova lmolkova merged commit 3afe287 into open-telemetry:main Apr 24, 2024
12 checks passed

Semantic conventions for individual database systems SHOULD document what `db.namespace`
means in the context of that system.
examples: [ 'customers', 'test.users' ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These sound like table (collection) names rather than namespace (database) names

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

9 participants