Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Use cls-hooked instead of continuation-local-storage #2

Conversation

josieusa
Copy link
Collaborator

@josieusa josieusa commented Jul 29, 2016

Hello,
I just wanted to let you know that, in this branch of mine, npm run testpasses.
So, this PR is meant as a starting point for you.
Of course there are still two unsolved problems at least:

  • I don't think that this branch should be merged as it is, because cls-hooked (and so this branch) requires a more recent version of Node than strongloop/loopback-context;
  • I'm going to do some deeper testing in the following days, with an example Loopback app, and I'll let you know.

PS
npm run test passes using loopback v2, too.

Thank you for your attention.

Contents of this PR:
Use cls-hooked instead of continuation-local-storage,
as suggested by @raymondfeng
in strongloop/loopback-context PR #1 comment #235931961
#1 (comment)

as suggested by @raymondfeng
in strongloop/loopback-context PR strongloop#1 comment #235931961
@raymondfeng
Copy link
Member

@josieusa Thank you for the patch. One possibility is to make it conditional based on the minimum Node.js version.

@josieusa
Copy link
Collaborator Author

@raymondfeng I was just going to commit the conditional logic you are talking about. I only need to do more testing.

@slnode
Copy link

slnode commented Jul 29, 2016

Can one of the admins verify this patch?

@bajtos
Copy link
Member

bajtos commented Aug 1, 2016

@josieusa thank you for the patch, it's great to see community contribution so quickly after creating this new module! 👏

I am afraid the conditional require will not work well, because npm install cls-hooked on Node v0.10 fails:

$ npm install cls-hooked
npm ERR! Darwin 14.5.0
npm ERR! argv "/Users/bajtos/.nvm/v0.10.44/bin/node" "/Users/bajtos/.nvm/v0.10.44/bin/npm" "install" "cls-hooked"
npm ERR! node v0.10.44
npm ERR! npm  v2.15.0
npm ERR! code ETARGET

npm ERR! notarget No compatible version found: cls-hooked@'latest'
npm ERR! notarget No valid targets found.
npm ERR! notarget Perhaps not compatible with your version of node?
npm ERR! notarget This is most likely not a problem with npm itself.
npm ERR! notarget In most cases you or one of your dependencies are requesting
npm ERR! notarget a package version that doesn't exist.

npm ERR! Please include the following file with any support request:
npm ERR!     /private/tmp/npm-debug.log

I guess we could go with an optional dependency?

Alternatively, we can have [email protected] that uses node-continuation-local-storage and [email protected] that uses cls-hooked. The biggest downside of this approach is that it will be difficult to make breaking changes in the branch supporting older Node versions (1.x), because the next major version (2.0) will be already taken.

@raymondfeng thoughts?

@bajtos bajtos self-assigned this Aug 1, 2016
@josieusa
Copy link
Collaborator Author

josieusa commented Aug 1, 2016

@bajtos
One use-case for optionalDependencies is when a package fails to install.
https://docs.npmjs.com/files/package.json#optionaldependencies
The downside is that it's your code's responsibility to handle the lack of the dependency.
So you could install the old continuation-local-storage as a dependency, and cls-hookedas an optional dependency. cls would then be dynamically required by your code, only if the optional cls-hooked dependency is not met, as a fallback mechanism.

Regarding the lost context issues with continuation-local-storage, I made some more manual tests. These issues are hard to reproduce, so manual tests are not very reliable.

I made an example Loopback v3 app which tries to reproduce the bug (i.e. it uses getCurrentContext() inside process.nextTick() callback). These are my findings:

  1. This branch of mine works fine with this example app, i.e. loopbackContext.getCurrentContext() works as expected. Unfortunately, this doesn't mean that continuation-local-storage issue is solved (see also point 2.), because the latest Loopback v3 (which uses the old continuation-local-storage module) works fine with my example app, too.
  2. Sadly, I tested my branch with a real app (whose code I can't share) which was affected by the issue, and my branch didn't solve the lost context issue (i.e. loopbackContext.getCurrentContext() still returns null).

So, manual tests suggest that this branch of mine doesn't solve the lost context issue, at least for v3(I could figure out a way to test with v2), but I think that cls-hooked is an improvement over continuation-local-storage anyway, because it uses the brand-new AsyncWrapper.

So, before continuing to contribuite, I'm going to try to figure out why the two apps behave differently in my manual tests, i.e. what are the differences between the two apps (if any, because it could be just cls's fault).

@raymondfeng
Copy link
Member

@bajtos @josieusa As you have brainstormed, there are a few options to make cls-hooked conditional.

  1. List cls-hooked as an optional dependency and have runtime to require cls or cls-hooked conditionally based on the node version range.
  2. Submit a patch to cls-hooked so that it won't fail on older version of node but report a runtime error if it's used without Node 6.x and above.

@josieusa
Copy link
Collaborator Author

josieusa commented Aug 2, 2016

Good news. I figured out the differences between my two test apps, and all my manual tests are passing now.
So, from these tests it looks like cls-hooked(branch cls-hooked2) could solve the lost context problems, at least in some situations.
This is promising.
Regarding the remaining work to be done, I will go for option 1. since it's quicker and I can't wait.
I will submit a new, working PR. This PR is just a POC and it doesn't work.

@slnode
Copy link

slnode commented Aug 2, 2016

Can one of the admins verify this patch? To accept patch and trigger a build add comment ".ok\W+to\W+test."

@bajtos
Copy link
Member

bajtos commented Aug 2, 2016

@josieusa I will submit a new, working PR. This PR is just a POC and it doesn't work.

Please don't open a new pull request and keep your changes in this pull request (your feature branch). That way the history of our discussion is preserved in a single place.

As suggested by @raymondfeng
in strongloop/loopback-context PR strongloop#2 comment #236644728
@davidcheung
Copy link

Leaving a note here, @marlonkjoseph had a snippet that reproduces continuation-local-storage loosing active context
strongloop/loopback#2541 (comment)

@josieusa
Copy link
Collaborator Author

josieusa commented Aug 4, 2016

@davidcheung

@marlonkjoseph had a snippet that reproduces continuation-local-storage loosing active context

Thanks, I will use it to write a test that fails with the official branch but not this branch of mine.

@josieusa
Copy link
Collaborator Author

josieusa commented Aug 5, 2016

Hello,
good news! I managed to reproduce in a Mocha test the main issue (i.e. getCurrentContext() returning null) more reliably than before.
Though the way I made the test fail on purpose (as in TDD), is different than I thought.
TL;DR One more commit and then I'll suggest to review this PR for merge.
In fact, having reproduced it allows me to make proper Test-Driven Development from now on, i.e. I'm following the best practices and doing my best.
But, before committing, I'd like to ask: is my solving strategy aligned with that of the mantainers?
One premise: I believe that the main issue is often, if not always, due to the following bug, since async is required by loopback-connector-mongodb, loopback-datasouce-juggler etc.:
othiym23/node-continuation-local-storage#55
The short-term solution, according to the link above, is to change everywhere the order in which async and continuation-local-storageare required, because functions like setImmediate, nextTick etc. are monkey-patched at runtime by cls.
Some proposed workarounds, like upgrading to async v2, downgrading Loopback components and so on, could be OK (I don't know), but could also be misleading and even harmful if (and only if) they change the order of dependencies for some reason (i.e. some overlooked new release of sub-dependencies), solving the issue just by accident and not reliably.
So my strategy, beside committing the Mocha test, is to warn the developer in the deprecation warning that the order of the aforementioned require() calls matters (the link above makes us sure about that, and has been mentioned in the past, e.g. strongloop/loopback#878 (comment)).
I'll also add another warning if my code detects that async was require()d too early. I don't know if I'll manage to detect it, it's going to take time, and this is why I wanted to know if our strategies are aligned.
(I'd also mention that async is not the only module affected by the require() order issue; and that other modules have a cls-friendly fork which should be used as a drop-in replacement of the regular module; of course other modules having these types of issues should be detected by my code in the same way, and a warning should be provided accordingly).
The long-term solution IMHO, which I already implemented, is to use cls-hooked. I'm saying this because @trevnorris said that AsyncWrap will properly support nextTick etc. and IMHO this could allow us to not care about the order of dependencies.
nodejs/node#666 (comment)
PS
To track the support of nextTick etc. see
nodejs/diagnostics#29

@josieusa
Copy link
Collaborator Author

josieusa commented Aug 5, 2016

I couldn't manage to detect when async (or any other cls-unfriendly module) is required, but I just managed to detect when some module monkey-patches nextTickor setTimeoutetc., which is what cls-hooked does as soon as it is required. So I use that event as a good enough estimate of when cls-hookedis required (there aren't many modules around that patch those functions).
This is enough to show a runtime warning if cls-hooked is required too late (unfortunately not at boot time, but it's better than nothing), which is a symptom of likely future problems, as I wrote before. Therefore it prevents headaches to Loopback users in some significant situations. Here is how it works.
My code requires a small utility module of mine I'm writing, which MUST be required before anything else. This module just monkey-patches nextTickand the javascript timer functions in order to do these things:

  • Detect if cls-hooked was required too late, and warn if needed.
  • Transparently use the versions of nextTick, setTimeout etc. patched by cls-hooked, if and when they become available (I mean, even after boot), and not the native ones, as an attempt to prevent runtime bugs due to dependency load order, that would otherwise happen without my utility.

I believe it's an improvement, I only need to test it.

azatoth pushed a commit to azatoth/loopback-context that referenced this pull request Aug 16, 2016
As suggested by @raymondfeng
in strongloop/loopback-context PR strongloop#2 comment #236644728
@josieusa
Copy link
Collaborator Author

I continued experimenting, and I just had some success in detecting if some of the "cls-unfriendly" modules are imported, and also if the necessary CLS shims for those modules (for example cls-mysqlin the case of mysqlmodule) are imported too. This is very good news, since it motivates me to continue to work on this. In fact, this is the piece that was missing to detect potential bugs sooner, while performing automated tests, which is fine enough. The only real downside is that a bit of manual work is needed in order to keep the list of "cls-unfriendly" modules up to date, but I think it's completely acceptable.
Unfortunately I don't know how to prevent these bugs in the first place, nor to detect these bugs statically. I mean that you would need to perform tests anyway. But for me it's enough.
The not so good news is that my tests are not ready yet, so I'm unable to prove that the whole things works yet. But I feel optimistic.
More commits coming soon.

@josieusa josieusa changed the title [DO-NOT-MERGE][POC] Use cls-hooked instead of continuation-local-storage [WIP] Use cls-hooked instead of continuation-local-storage Aug 23, 2016
@josieusa
Copy link
Collaborator Author

josieusa commented Sep 5, 2016

Great news! Just pushed! The approach I described is implemented for the most part, and is clear from looking at the tests now. I summarize again:

  1. The application passes a callback to loopback-context in this branch (this step is not done yet! done)
  2. loopback-context imports another package of mine, which detects potential context issues at runtime, using the knowledge I have about the lost context issues (which is partial knowledge from many Loopback issues on Github). The package of mine is not ready for production yet.
  3. loopback-context warns the application by calling the callback (not done yet done)
  4. The application itself (not my branch!) should adopt countermeasures that are described in issue loopback#1495 and other issues.

Let me know if someone wishes to give a look at it. For example, launch the tests by running npm test. Thank you
EDIT done steps 1) and 3)

@bajtos
Copy link
Member

bajtos commented Sep 6, 2016

@josieusa Hey, I am glad you are making progress in your work here. We (the LoopBack team) don't have applications where to test your new version, and to be honest, CLS-based current-context is not a priority for us. Take as much time as you need to get this pull request into state where you consider it production-ready. We'll leave this work in the hands of you and any other community members interested in CLS-based current-context. Just let us know when you get to the state where you will consider the implementation production-ready.

@josieusa
Copy link
Collaborator Author

I closed this in favor of PR #11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants