Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Doctrine auto-instrumentation #300

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

DominicDetta
Copy link

No description provided.

@DominicDetta DominicDetta requested a review from a team as a code owner October 3, 2024 07:05
Copy link

linux-foundation-easycla bot commented Oct 3, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

Copy link

codecov bot commented Oct 3, 2024

Codecov Report

Attention: Patch coverage is 96.39640% with 4 lines in your changes missing coverage. Please review.

Project coverage is 79.51%. Comparing base (c06f8c6) to head (32d028b).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...mentation/Doctrine/src/DoctrineInstrumentation.php 96.39% 4 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##               main     #300      +/-   ##
============================================
- Coverage     80.35%   79.51%   -0.84%     
+ Complexity     1025      626     -399     
============================================
  Files            98       68      -30     
  Lines          4112     2738    -1374     
============================================
- Hits           3304     2177    -1127     
+ Misses          808      561     -247     
Flag Coverage Δ
Aws 85.51% <ø> (-0.24%) ⬇️
Context/Swoole ?
Instrumentation/CakePHP 20.00% <ø> (ø)
Instrumentation/CodeIgniter 73.94% <ø> (ø)
Instrumentation/Doctrine 96.39% <96.39%> (?)
Instrumentation/ExtAmqp 89.58% <ø> (ø)
Instrumentation/Guzzle 69.73% <ø> (ø)
Instrumentation/HttpAsyncClient 81.33% <ø> (ø)
Instrumentation/IO 70.90% <ø> (ø)
Instrumentation/MongoDB 77.33% <ø> (ø)
Instrumentation/OpenAIPHP 86.82% <ø> (ø)
Instrumentation/PDO 89.56% <ø> (ø)
Instrumentation/Psr14 78.12% <ø> (ø)
Instrumentation/Psr15 93.50% <ø> (ø)
Instrumentation/Psr16 97.50% <ø> (ø)
Instrumentation/Psr18 82.08% <ø> (ø)
Instrumentation/Psr3 ?
Instrumentation/Psr6 97.61% <ø> (ø)
Instrumentation/Slim 86.95% <ø> (ø)
Instrumentation/Symfony 89.07% <ø> (+0.03%) ⬆️
Instrumentation/Yii 77.77% <ø> (ø)
Logs/Monolog ?
Propagation/ServerTiming 100.00% <ø> (ø)
Propagation/TraceResponse 100.00% <ø> (ø)
ResourceDetectors/Container 93.02% <ø> (ø)
Sampler/RuleBased 32.14% <ø> (ø)
Shims/OpenTracing 92.99% <ø> (ø)
Symfony ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...mentation/Doctrine/src/DoctrineInstrumentation.php 96.39% <96.39%> (ø)

... and 33 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c06f8c6...32d028b. Read the comment docs.

@brettmc
Copy link
Collaborator

brettmc commented Oct 3, 2024

@DominicDetta one more thing to add is an entry to the top-level .gitsplit.yaml. When this is merged, that'll be used to split the package into a read-only repository which we can point packagist at.
You'll also need to add an entry to .github/workflows/php.yml so that the tests will run.

@DominicDetta
Copy link
Author

@brettmc I included the Doctrine directory in the files you specified

@DominicDetta
Copy link
Author

I agreed the EasyCLA and waiting for the merge approval

@brettmc
Copy link
Collaborator

brettmc commented Oct 4, 2024

Can you also update workflows/php to exclude 7.4 from the version matrix? The php version requirement is ^8.2, but I think it would work back to 8.0 since it doesn't hook internal functions. Either way, we need to either drop the requirement back to 8.0 or exclude pre-8.2 versions from the test matrix.

@DominicDetta
Copy link
Author

it's done

@DominicDetta
Copy link
Author

In the end I integrated again the PHP 7.4 version and excluded Doctrine from the pre-8.2 versions tests

@brettmc
Copy link
Collaborator

brettmc commented Oct 4, 2024

Green build is an excellent start. I'm happy to push this out as a 0.0.1 release. Since this works against doctrine 3 (according to the composer.json), and doesn't hook any internal/extension functions, I think it would probably work in 8.0 and 8.1 as well. But, that's not a blocker to getting this out if you're happy with the PR as it is.

@DominicDetta
Copy link
Author

I'm pretty confident it works even with PHP 8.0 version.
I can confirm that the classes \Doctrine\DBAL\Driver\Connection and \Doctrine\DBAL\Driver exist even in the next major versions of doctrine. (I'm using doctrine 3 at the moment). I'm glad I was able to contribute.

Copy link
Contributor

@Nevay Nevay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hooking driver methods will cause duplicated spans if a middleware like Symfony Debug Middleware is used.
An alternative approach to avoid this problem would be to move the implementation into a middleware and use a hook to inject this middleware into every connection; see also Nevay/otel-instrumentation-doctrine-dbal.

$builder
->setAttribute(TraceAttributes::SERVER_ADDRESS, $params[0]['host'] ?? 'unknown')
->setAttribute(TraceAttributes::SERVER_PORT, $params[0]['port'] ?? 'unknown')
->setAttribute(TraceAttributes::DB_SYSTEM, $params[0]['driver'] ?? 'unknown')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should try to follow semconv even if we don't specify a schema url; see semconv:

db.system has the following list of well-known values. If one of them applies, then the respective value MUST be used

Copy link
Author

@DominicDetta DominicDetta Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I didnt know there was a convention. I will apply the correct ones.

@DominicDetta
Copy link
Author

DominicDetta commented Oct 7, 2024

@brettmc @Nevay

Hooking driver methods will cause duplicated spans if a middleware like Symfony Debug Middleware is used. An alternative approach to avoid this problem would be to move the implementation into a middleware and use a hook to inject this middleware into every connection; see also Nevay/otel-instrumentation-doctrine-dbal.

The duplicate problem is also a common problem with other auto-instrumentations. In fact, we could not use psr-15 as it generated too many spans. I wonder if I really need to fix this problem or is there a way to filter the results.
Does OpenTelemetry have a feature to configure and filter duplicated spans?

@dkarlovi
Copy link

dkarlovi commented Oct 9, 2024

Does OpenTelemetry have a feature to configure and filter duplicated spans?

AFAIK the idea is to allow the agent to do that:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/filterprocessor/README.md

Copy link
Contributor

@cedricziel cedricziel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My proposal is to figure out if this instrumentation should depend on the existence of PDO and then define how the two interact. My understanding is that if a PDO is used and the PDO instrumentation is active, there will be 2 nested CLIENT spans for the same interaction. - IMHO this is severely impacting the user experience.

pre: static function (\Doctrine\DBAL\Driver $driver, array $params, string $class, string $function, ?string $filename, ?int $lineno) use ($instrumentation) {
/** @psalm-suppress ArgumentTypeCoercion */
$builder = self::makeBuilder($instrumentation, 'Doctrine\DBAL\Driver::connect', $function, $class, $filename, $lineno)
->setSpanKind(SpanKind::KIND_CLIENT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we think about the duality of PDO and dbal here? Most instrumented methods have a pendant in the PDO instrumentation i.e. - a compatible pdo driver will cause two client spans and the hierarchy will probably be

my_func (internal)
 -> connect (client) # from dbal
   -> PDO::__construct (client ) # from pdo

IMO the two should be either mutually exclusive or enrich each other. WDYT?

Copy link
Collaborator

@brettmc brettmc Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not have a way for two auto-instrumentations to cooperate like this (it's an idea that has come up before, and would be cool). So I think mutually exclusive is the best we have - or at least it's up to the user to install only one instrumentation, I don't think we need to go as far as setting them up as conflicting in composer.
Is documenting that recommendation good enough for now?

edit: a hacky approach would be to have this package check if PDO instrumentation is installed and enabled, and only apply some of its hooks (if it provides any value that PDO itself doesn't)...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the PDO instrumentation re-write the doctrine span from CLIENT to internal or bail if it recognizes the parent is already client?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To your point - i think documenting it makes the experience worse because users then need to make a whole lot of decisions when instrumenting

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think documenting it makes the experience worse

I only meant documenting such as "use pdo or doctrine auto-instrumentation, but probably not both" - I expect users to make decisions about which auto-instrumentations to install based on their workload/stack...just installing everything available is probably too noisy.

Could the PDO instrumentation re-write the doctrine span from CLIENT to internal or bail if it recognizes the parent is already client?

I think the issue here would be that pre/post hooks operate in isolation, so there's currently no way for a post hook to know whether the pre hook created a span or just modified the active span. All the implementations we have assume that pre created and activated a span, and post just closes whichever is the active span at the time.
We have previously brainstormed whether we could manage some state between a pre and post hook, but we thought it was going to be a hard problem.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect users to make decisions about which auto-instrumentations to install based on their workload/stack...just installing everything available is probably too noisy.

I completely agree on this matter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the PDO instrumentation ... bail if it recognizes the parent is already client?

Bailing out requires storing additional data within the Context to be able to detect duplicate instrumentations, see Javas span suppression strategies:
Instrumentation span suppression behavior
SpanSuppressionStrategy.java
SpanSuppressors.java

AFAIK the idea is to allow the agent to do that

Dropping at the agent is too late as it may lead to broken traces "Dropping a span may lead to orphaned spans if the dropped span is a parent." ref.

there's currently no way for a post hook to know whether the pre hook created a span or just modified the active span

Scopes attached via Context::storage()->attach() implement ArrayAccess to allow propagating state from pre hook to post hook ref.

hook(
    null,
    'example',
    static function() use ($tracer): void {
        $context = Context::getCurrent();
        if (lcg_value() > .5 /* some suppression condition */) {
            $span = $tracer
                ->spanBuilder('example')
                ->startSpan();
            $context = $span->storeInContext($context);
        }
        $scope = Context::storage()->attach($context);
        $scope[SpanInterface::class] = $span ?? null;
    },
    static function(): void {
        if (!$scope = Context::storage()->scope()) {
            return;
        }
        $scope->detach();
        /** @var SpanInterface|null $span */
        $span = $scope[SpanInterface::class] ?? null;
        $span?->end();
    },
);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brettmc Can you post-process the spans and just remove / enrich the duplicated ones?

e.g. Is there something that runs before the exporter?

'query',
pre: static function (\Doctrine\DBAL\Driver\Connection $connection, array $params, string $class, string $function, ?string $filename, ?int $lineno) use ($instrumentation) {
/** @psalm-suppress ArgumentTypeCoercion */
$builder = self::makeBuilder($instrumentation, 'Doctrine\DBAL\Driver\Connection::query', $function, $class, $filename, $lineno)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recommended span name here would be something like SELECT my_table.

The duplication with the pdo instrumentation is the same here.

https://opentelemetry.io/docs/specs/semconv/database/database-spans/#name

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Do you know a good agnostic SQL parser to achieve this? The one I found is focusing on Mysql syntax.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doctrine already knows the table name(s).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doctrine already knows the table name(s).

With the approach I'm following right now, I dont have this information. Are you referring to Doctrine ORM? I'm creating hooks on the methods of \Doctrine\DBAL\Driver and \Doctrine\DBAL\Driver\Connection

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that's a good point.

The issue with this approach is whichever parser you choose is bound to be quite slow and will probably be slowest part of the call (which includes the actual query), making it unusable for production use. ORM itself has a DQL query parser (which is close enough for high level comparison) and it's basically unusable in prod without it being cached with ORM query cache.

Running a random query parser on every DB query is unlikely to be viable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will use a simple regex expression to identify the db.operation.name and target from the query text

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we could consider the ORM instrumentation even if it's still not here yet.

For example, since ORM will know what the table names are, you could assume they will be put in context by the ORM instrumentation and then only try to detect it if not there? This would future proof this implementation and allow extensions when they happen.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response but I'm pretty busy at the moment.

IMO we could consider the ORM instrumentation even if it's still not here yet.

For example, since ORM will know what the table names are, you could assume they will be put in context by the ORM instrumentation and then only try to detect it if not there? This would future proof this implementation and allow extensions when they happen.

I see your point but my focus is to implement hooks for Doctrine DBAL and in the future time permitted we could refactor the functions to behave as you suggested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants