Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix Issue 23974 - A ModuleInfo in a separate Windows DLL should not b… #15298

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

WalterBright
Copy link
Member

…e referred to by MIimportedModules

(Somebody put tabs in transitivevisitor.d, and it got fixed by my detabber.)

@dlang-bot
Copy link
Contributor

dlang-bot commented Jun 7, 2023

Thanks for your pull request, @WalterBright!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

⚠️⚠️⚠️ Warnings ⚠️⚠️⚠️

  • In preparation for migrating from Bugzilla to GitHub Issues, the issue reference syntax has changed. Please add the word "Bugzilla" to issue references. For example, Fix Bugzilla Issue 12345 or Fix Bugzilla 12345.(Reminder: the edit needs to be done in the Git commit message, not the GitHub pull request.)

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#15298"

@WalterBright
Copy link
Member Author

buildkite is failing all over the place with:

dmd: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by dmd)

which should not be triggered by this Windows change.

@kinke
Copy link
Contributor

kinke commented Jun 7, 2023

I agree with Rainer, no point in tackling this 'problem' (& this is working just fine with LDC).

@rikkimax
Copy link
Contributor

rikkimax commented Jun 7, 2023

I agree with Rainer, no point in tackling this 'problem' (& this is working just fine with LDC).

There is one key reason that this is a good idea, minus how it determines out of binary: PAYG + -better = no more ModuleInfo stubs. So it does help me personally, even if it replaces something that will one day be automatable.

So I'm all for it, except for the fact that I believe that the mechanism to determine if it's out of binary is going to lead to bad times for people.

@kinke
Copy link
Contributor

kinke commented Jun 7, 2023

I think this would currently totally break separately compiled DLLs - if each module is compiled separately and at least some of those contain export stuff which is then treated as to be 'imported' IIUC, no ModuleInfo dependencies to those modules would be tracked in the DLL, potentially leading to invalid module ctor order inside the DLL itself.

@rikkimax
Copy link
Contributor

rikkimax commented Jun 7, 2023

I think this would currently totally break separately compiled DLLs - if each module is compiled separately and at least some of those contain export stuff which is then treated as to be 'imported' IIUC, no ModuleInfo dependencies to those modules would be tracked in the DLL, potentially leading to invalid module ctor order inside the DLL itself.

Agreed. I tried to explain it in the ticket but didn't get anywhere. I recommended https://issues.dlang.org/show_bug.cgi?id=23850 instead which would be tooling friendly.

@WalterBright
Copy link
Member Author

WalterBright commented Jun 8, 2023

I think this would currently totally break separately compiled DLLs - if each module is compiled separately and at least some of those contain export stuff which is then treated as to be 'imported' IIUC, no ModuleInfo dependencies to those modules would be tracked in the DLL, potentially leading to invalid module ctor order inside the DLL itself.

Why would someone create a module that has:

export int func();

that is not intended to be an imported module? If the export symbol is a declaration, that means its being imported from a separate DLL. If the export symbol is a definition, it means this module must be exporting it.

I've been thinking about this for a while, and am not seeing the confusing case, unless the user is himself confused about what an import is and what an export is.

@rikkimax
Copy link
Contributor

rikkimax commented Jun 8, 2023

They won't unless they are doing bindings and trying to do naughty things, things we do in druntime & phobos. Keep in mind the override switches like visibility add export to everything, so it won't matter if it isn't actually annotated by the user.

But they do, do this:

module a;

export void athing() {
}
module b;
import a;

export void bthing() {
    athing;
}
dmd -of a.obj a/a.d
dmd -of b.obj b/b.d -Ia
dmd -of mydll.dll -shared a.obj b.obj anotherdll.lib -Ia -Ib -Ianotherdll c.d

Does this work today? Yes. With -betterC I have this working right now with dmd and dub and so do plenty of others that I have helped. Works with ldc without -betterC of course.

Oh and related linker warnings that we NEED to prevent (which this thought process eliminates the prevention of, down the road): https://learn.microsoft.com/en-us/cpp/error-messages/tool-errors/linker-tools-warning-lnk4286?view=msvc-170

@WalterBright
Copy link
Member Author

That example wouldn't break with this PR. Both a and b export their definitions, and mydll exports those definitions.

@rikkimax
Copy link
Contributor

rikkimax commented Jun 8, 2023

When you compile b, the declarations in a would be in dllimport mode as they are not being compiled thanks to -I.

Note that the sources for a would be the same thing you would find in the includes pointing at anotherdll. There is no difference between them.

@WalterBright
Copy link
Member Author

When you compile b, the declarations in a would be in dllimport mode as they are not being compiled thanks to -I.

I don't know what dllimport mode is. Is that part of Rainer's PR?

@rikkimax
Copy link
Contributor

rikkimax commented Jun 8, 2023

When you compile b, the declarations in a would be in dllimport mode as they are not being compiled thanks to -I.

I don't know what dllimport mode is. Is that part of Rainer's PR?

No.

I was hoping you were asking about the switch here, but this would explain a lot (but yes Rainer's PR adds the switch). If my explanation here isn't enough, either me or Martin need to have a chat with you about this because we need to be able to communicate this even to people who don't know what a linker is.

Let's start with the basics, a D symbol can map directly to multiple linker symbols. On most platforms, this will be one-to-one, rather than one to two (like on Windows).

A D symbol can be in one of three modes, internal (you're most familiar with this, it's a straightforward block of bytes with optional export out of object file, or extern declaration to another object file, to a name). DllExport which constructs (with the help of the linker) a secondary exported symbol, which is a pointer to an internal symbol. Finally DllImport, which is the inverse of DllExport, it dereferences the pointer to get access to an internal pointer and uses that instead at runtime.

You should only use DllImport when accessing a symbol outside of the binary (executable or shared library, not static library or object file). Doing this when it is not outside of the binary will elicit linker warnings like LNK4286. If you do not use DllImport and try to use internal mode instead, the linker will error with symbol not found.

The ability to put a symbol into DllImport, DllExport or internal modes already exists in the compiler. DllImport/DllExport is incomplete due to not mutating memory at runtime to dereference the pointers which is needed to make pointer arrays ext. work. Which is the problem with ModuleInfo dependency array.

What we need is a way to tune it per D symbol to pick which one is needed (hence my proposal for adding a version parameter to export). Because you can have both D functions/globals declared in one module and still have bindings to DllImport or internal symbols outside of the object file.

Off the top of my head there are four ways to put a D symbol into DllImport mode: (EDIT: I forgot no body)

  • export + extern
  • export + no body
  • export + -I
  • Is internal mode + --dllimport

There are only two ways to put a D symbol into DllExport mode:

  • export and is not passed in via -I
  • Is not passed in via -I and --fvisibility=public.

Internal is the default.

The above lists aren't exhaustive, I'm sure there are more rules and exceptions to them, but at the very least it's what I've personally encountered or had to deal with when debugging.

Dub passes to ldc --dllimport=all and --fvisiblity=public automatically when building a DLL. This gets an out-of-the-box "usable" solution. Even if you do get linker warnings, it's better than nothing. Long term we should not be passing these flags. They are a smell that things are not working right. They override user intention and prevent optimizations.

@WalterBright
Copy link
Member Author

Here's how it works today:

export int x;   // dllexport
export extern int x; // dllimport

It's not ambiguous.

@rikkimax
Copy link
Contributor

rikkimax commented Jun 9, 2023

I've just remembered a very obscure detail of how import libraries work. They have a generated function in them that dereferences a DllImport symbol to allow calling functions using the internal name instead of the DllImport name. I've confirmed it by rereading: http://blog.omega-prime.co.uk/2011/07/04/everything-you-never-wanted-to-know-about-dlls/

After much checking of dmd-fe, you're right in that there is no special behavior of -I. This is working due to the above reason.

LDC does not modify anything related to isImportedSymbol, so --dllimport=all and --fvisibility=public do not affect this.

So it looks like both me and @kinke are wrong about this PR breaking things. It's quite the opposite. Unless you use the .di generator with explicitly export annotated code or explicitly create bindings to D code, this code does nothing at all!

It would certainly be hackable to make work but would require adding code to every module that is importable from a DLL, but that is a significantly worse solution than ModuleInfo stubs that can be automated. If only there was a way to turn it on at the build manager level automatically...

@kinke
Copy link
Contributor

kinke commented Jun 9, 2023

AFAICT, it's still easy to break:

// a.d:
export extern int someSymbol; // dllimport from some C DLL, fwd-declared in this module
shared static this() {}

// b.d:
import a;
void foo() {}
shared static this() {}

And then linking both to a binary. The single dllimport declaration in module a suffices to treat the whole module as ending up in another binary, and to cut off the module dependencies path at that expected boundary. druntime module ctor init code then doesn't guarantee a's module ctor is run before b's.

While cutting off the dependency paths at actual binary boundaries might be safe and perhaps desirable (I guess under the assumption that the other DLL's modules will have been initialized already when loading that DLL earlier), it's not really required, and would just try to tackle one tiny symptom of the general missing relocation for dllimported data symbols. And that general problem is, AFAIU, tackled in Rainer's PR.

@rikkimax
Copy link
Contributor

rikkimax commented Jun 9, 2023

Yeah absolutely. I kinda ignored it due to thinking --dllimport would take over and most people wouldn't be writing it.

The main thing this PR solves is when ModuleInfo is missing from DLL. But shouldn't setting the right flag to nullify the reference work instead?

@WalterBright WalterBright self-assigned this Jun 16, 2023
@WalterBright
Copy link
Member Author

The single dllimport declaration in module a suffices to treat the whole module as ending up in another binary

Yes, which is why best practice would be to have a.di generated by the compiler from a.d in the makefile.

@WalterBright
Copy link
Member Author

For reference, @rainers proposal: #14849

@rikkimax
Copy link
Contributor

Rainer's PR works on a per-symbol basis, this PR is making assumptions based upon a single symbol which could very well be a binding to a system API's or some other external binary.

Mixed binary and user code, will cause this to do the wrong thing.

@WalterBright
Copy link
Member Author

@rikkimax @kinke @rainers I am of the (probably minority) opinion that DLLs in Windows are not meant to coexist with things like template functions, meaning one does not know what functions will be actually generated when writing the code. DLLs are clearly set up to support a fixed, predetermined set of export/import points. The "export it all, whether it is intended to be exported or not" has an uncomfortable feel to it. For COM files, the set of exported names is very small, interfaces are discovered using QueryInterface.

That said, I understand that many people want it to work like shared libraries in Linux. They don't want to sit down and define what the entry points should be.

This PR is designed to support the "I want to think about what the entry points are and have a fixed set of them". @rainers is designed to support the "export it all" paradigm.

But we can support both. This PR doesn't interfere with @rainers proposal.

@WalterBright
Copy link
Member Author

this PR is making assumptions based upon a single symbol which could very well be a binding to a system API's or some other external binary.

I understand that. You'll find exactly the same thing when doing Windows DLLs in C or C++. It's kinda what people are accustomed to with Windows DLL programming, which has little in common with Linux shared libraries. People from that background will find this straightforward, and the "export it all" uncomfortable. I am in that category. After all, I implemented dllimport and dllexport for Windows C and C++, and have done COM programming with it.

Also, it's how DLLs work for Windows ImportC.

@WalterBright
Copy link
Member Author

PS for a template heavy library like Phobos, I doubt trying to make it a Windows DLL makes much sense.

@rikkimax
Copy link
Contributor

I take it you haven't looked at the code that I've linked to you in the past.

I go out of my way to make sure that symbols go into the right binary by de-templating functions. So much so that I'm kinda desperate (long term) to see sum types with implicit construction for function calls.

I.e. https://github.com/Project-Sidero/basic_memory/blob/main/imports/sidero/base/text/unicode/builder_utf8.d#L3568

What I'm saying here is very much coming from a place of experience with using D and explicitly annotating export everywhere (which druntime and Phobos do not do).

These string types under imports, were the result of the .di generator failing for me. I've tried it. It did not work for me. This does.

Yesterday I spent the whole day debugging dmd built binaries trying to figure out why it was causing my code to segfault (with -betterc DLL + full D executable). Do you want to know what was wrong? Even if exported __initZ symbols ARE NULL. I still need to come up with a test case for this to report it, because Rainer's PR should not fix this.

@WalterBright
Copy link
Member Author

the result of the .di generator failing for me

I've already fixed a couple problems with this, I want to fix where it's failing for your code.

@rikkimax
Copy link
Contributor

The .di generator is not a priority when we have bugs that prevent using dmd for code that 100% should work.

Once I'm happy with -betterC support, then I'll look into the .di generator, we just aren't there yet.

Stuff that I actually need:

  • Unresolved RTInfoImpl symbols due to symbol declaration in -betterC shared library
  • Generated symbols not exported even if other symbols in encapsulation are
  • __initZ actually working
  • export should not be a visibility modifier, I have to comment out private/package so that everything will work which is horrible

Things I want:

  • ModuleInfo in -betterC modules not be a linker error
  • Error if a template tries to access a non-templated non-export symbol and is in an external binary

Note: that these two things I want are actually linked in that both of them can be solved with the external -I.

At this point in time, I cannot see how anyone before me has got shared libraries for non-OMF working with dmd and have it look like regular D code. Let alone the 100k+ LOC that I've got. There are just so many pitfalls that it really takes some serious work to deal with.

@WalterBright
Copy link
Member Author

The .di generator is not a priority when we have bugs that prevent using dmd for code that 100% should work

You mentioned it was causing problems for DLL generation, and fixing DLL generation problems is a priority. Also, .di problems tend to be easier to fix. I like easy to fix problems, because then I can make a lot of progress.

I cannot see how anyone before me has got shared libraries for non-OMF working with dmd and have it look like regular D code

I can't either, but as noted earlier, Windows DLLs are simply not like shared libraries at all. They are essentially separate executables with a set of hooks.

export should not be a visibility modifier

I see your point, but the idea with encapsulating functionality in a DLL is one shouldn't have access to private symbols in the DLL. Access should be through a class reference to some public function in the class. A DLL should have a relatively small number of exported symbols, and I would be so bold as to suggest that variables shouldn't be exported at all from a properly designed DLL interface. Of all the DLLs from Microsoft that I have worked with, I cannot recall a single instance of exporting a variable.

Error if a template tries to access a non-templated non-export symbol and is in an external binary

That should be covered by .di generation. External binaries should supply .di files to interface with them.

I understand that this requires more thinking, design, and implementation work on the part of the user. But it will result in a much more robust and clean DLL.

For your other points, I appreciate hearing your needs and want to fix them, but the discussion about them belongs elsewhere. I recommend bugzilla, perhaps bringing them up in the n.g. as well. Having all the issues for betterC tagged with the betterC keyword helps a lot. If any of your concerns are not already in bugzilla, please add them!

@rikkimax
Copy link
Contributor

rikkimax commented Jun 20, 2023

I cannot see how anyone before me has got shared libraries for non-OMF working with dmd and have it look like regular D code

I can't either, but as noted earlier, Windows DLLs are simply not like shared libraries at all. They are essentially separate executables with a set of hooks.

Indeed, they are unlike shared libraries, they can't have an entry point like they can with ELF.

https://unix.stackexchange.com/questions/223385/why-and-how-are-some-shared-libraries-runnable-as-though-they-are-executables

@rikkimax
Copy link
Contributor

I can see that you are trying to be highly principled about limiting problems with shared library support. I am happy to see that.

However, in practice, we as a community have a fairly large number of people who don't know what a linker is and as this is a subject matter that is highly linker-related it is very easy to cause people problems.

Here is a recap of my latest debug I did for someone which was within the last week:

  1. They didn't have a DllMain, keep in mind that this function is an entirely optional feature that nobody should be using today unless they are all in on Win32 programming. Worse yet there are two different mechanisms to replace this that could be handled entirely without the user's awareness. Even more horrible is that Microsoft outright recommends against using DllMain if at all possible: https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-best-practices
  2. It was crashing due to std.stdio throwing if a console wasn't attached.

There are workarounds for the second problem using DllMain. However, this shouldn't be required. I even reported this issue as an example of poor leadership in my gripes list as std.stdio has some serious limitations and this has been known for a long time.

The user decided after I did all that debug work to just use Nim instead because they had that already working... without any workarounds.

Any issues that me or Martin raise regarding usage, are very much based on the fact that we are the ones debugging people's builds. I am sure I won't be the only one who would be quite happy to offer you a bottle of something if you were to take on this particular responsibility moving forward ;)

@WalterBright
Copy link
Member Author

I appreciate that you are doing the ground level work to help people be successful with DLLs.

They didn't have a DllMain, keep in mind that this function is an entirely optional feature

DllMain runs the static construction of the D libraries

It was crashing due to std.stdio throwing if a console wasn't attached.

I can't find it: https://issues.dlang.org/buglist.cgi?quicksearch=dll%20stdio&list_id=245227

@rikkimax
Copy link
Contributor

They didn't have a DllMain, keep in mind that this function is an entirely optional feature

DllMain runs the static construction of the D libraries

That is the current druntime solution for this purpose yes.

But it is not the only way to do it. There are two different mechanisms that could replace it. https://issues.dlang.org/show_bug.cgi?id=23756

It was crashing due to std.stdio throwing if a console wasn't attached.

I can't find it: https://issues.dlang.org/buglist.cgi?quicksearch=dll%20stdio&list_id=245227

You shouldn't. It's not a bug of std.stdio, but a bug of phobos in terms of scoping.

https://dlang.org/phobos/std_stdio.html#.File.write

std.stdio should not be used for interacting with the console for multiple reasons.

@WalterBright
Copy link
Member Author

It was crashing due to std.stdio throwing if a console wasn't attached.
It's not a bug of std.stdio, but a bug of phobos in terms of scoping.
https://dlang.org/phobos/std_stdio.html#.File.write
std.stdio should not be used for interacting with the console for multiple reasons.

?? Could you explain this in a bugzilla issue, please?

@WalterBright
Copy link
Member Author

There are two different mechanisms that could replace it. https://issues.dlang.org/show_bug.cgi?id=23756

There's a PR for that.

@rikkimax
Copy link
Contributor

https://issues.dlang.org/show_bug.cgi?id=24001

There are a lot of tickets referenced there. But just to be clear, std.stdio is doing the right thing, we are simply misusing it.

@WalterBright
Copy link
Member Author

Thanks for putting that bugzilla together.

@rainers
Copy link
Member

rainers commented Jun 20, 2023

PS for a template heavy library like Phobos, I doubt trying to make it a Windows DLL makes much sense.

If you want regular D code (not C with D syntax) to be usable across DLL boundaries I consider a shared runtime a prerequisite. druntime is a must (you have to share the GC, threads, etc.), not having phobos as a DLL would be pretty disappointing and can cause issues by multiple instances of similar resources (e.g. file handles in std.stdio). Templates can be a problem, but the compiler will not expect more symbols in the DLL than what is also built into the static library, so I don't think exchanging one for the other is an issue.

FYI: With "export-all", druntime has about 14000 exported symbols, and phobos about 13000. That's in the same ballpark of some C++-DLLs, e.g. Qt5 has similar DLLs.

@rikkimax
Copy link
Contributor

Of all the DLLs from Microsoft that I have worked with, I cannot recall a single instance of exporting a variable.

I've just had a very bad thought about this. This will need to work with dmd and it's related to my report regarding RTInfo.

https://github.com/dlang/dmd/blob/master/druntime/src/object.d#L3724

It uses a global which is why it has unresolved symbol errors.

https://issues.dlang.org/show_bug.cgi?id=23820

@WalterBright
Copy link
Member Author

This is still a good PR and is ready to merge.

dkorpel
dkorpel previously approved these changes Jul 3, 2023
@dkorpel
Copy link
Contributor

dkorpel commented Jul 3, 2023

@kinke @rainers Do you still have objections after the discussion?

@kinke
Copy link
Contributor

kinke commented Jul 3, 2023

Yes, my objection still stands - breaks existing code and doesn't tackle the root problem, only a minor symptom.

@dkorpel dkorpel dismissed their stale review July 3, 2023 17:55

There's still objections

@rikkimax
Copy link
Contributor

rikkimax commented Jul 3, 2023

Just so we are clear:

  1. All bindings to shared libraries that are static should be marked as export as it is more efficient
  2. Nobody marks their code export (including druntime/phobos)
  3. If you put bindings and your D code in a single module, it will be considered external to your binary
  4. If you use module constructors/unittests, they will not run

In the above scenarios, I am not referring to D being used as a shared library, it is for executables.

This is guaranteed to cause problems someday. It is a ticking time bomb.

I am not recommending the external import path switch for fun. I spent months trying to figure this out. It is the only solution that will work consistently and without problems. It will mean that Martin and I have to work on dub, the refactoring will not be enjoyable.

@rikkimax
Copy link
Contributor

rikkimax commented Aug 4, 2023

Unfortunately, I have hit a scenario where eliding of ModuleInfo for shared libraries is required.

lld-link: error: undefined symbol: ModuleInfo for sidero.base.text.format.rawwrite
>>> referenced by .dub\build\executable-debug-windows-x86_64-ldc_v1.31.0-F5CF8299D338992242C1C53ECC18FD67E166981F5A3173CE71859365507D3463\sidero_eventloop.obj:(ModuleInfo for sidero.eventloop.internal.event_waiting)
>>> referenced by .dub\build\executable-debug-windows-x86_64-ldc_v1.31.0-F5CF8299D338992242C1C53ECC18FD67E166981F5A3173CE71859365507D3463\sidero_eventloop.obj:(ModuleInfo for sidero.eventloop.networking.sockets)
>>> referenced by .dub\build\executable-debug-windows-x86_64-ldc_v1.31.0-F5CF8299D338992242C1C53ECC18FD67E166981F5A3173CE71859365507D3463\sidero_eventloop.obj:(ModuleInfo for sidero.eventloop.networking.internal.windows.mechanism)
>>> referenced 10 more times

It's a real shame this PR has unintended side effects that cannot go in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

6 participants