Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import style and modularisation #2288

Closed
mhauru opened this issue Jul 15, 2024 · 12 comments
Closed

Import style and modularisation #2288

mhauru opened this issue Jul 15, 2024 · 12 comments

Comments

@mhauru
Copy link
Member

mhauru commented Jul 15, 2024

Often when reading Turing.jl code (and many other Julia projects) I'm struck by how hard it is to see where names (functions, variables, types) are defined. There are two parts to this problem:

  • The using X syntax brings things into a namespace without explicitly naming them.
  • Having multiple files included in one module means that things defined in one file are in scope in another, without either file explicitly referencing the other.

To counter this issue, I'd propose two conventions:

  1. All imports should be explicit, so either using X: a, b, c, using X: X, import X: a, b, c, or import X.
  2. Each file should be its own module.

I wouldn't be surprised if 2. turned out to be a bit overly strict in some cases, and I'm happy to have some flexibility there. Maybe 2. should just say "use more submodules".

I don't propose anyone goes and writes a single massive PR for this. Rather I'm asking for comments on this as a style convention. If there's support for this, I'd slowly start applying the new conventions as I edit code.

I raise this as a Turing.jl issue, but I'd eventually like for this to apply to all of TuringLang (and preferably the rest of the universe too).

Thoughts?

@yebai
Copy link
Member

yebai commented Jul 15, 2024

Each file should be its own module.

This might be overly restrictive. A more moderate proposal is that each subfolder under src should be a module, and all its included files should be in that subfolder too.

I'm struck by how hard it is to see where names (functions, variables, types) are defined

I am probably too familiar with the code to say anything useful.

But vscode offers some tools for navigating code. There is also @which macro.

@torfjelde
Copy link
Member

The using X syntax brings things into a namespace without explicitly naming them.

I don't have super-strong opionions regarding exactly where we define the different namespaces, but I very much agree with not doing this 👍

@mhauru
Copy link
Member Author

mhauru commented Jul 16, 2024

This might be overly restrictive. A more moderate proposal is that each subfolder under src should be a module, and all its included files should be in that subfolder too.

That could be a good middle ground if one-module-one-file is a bit too much. Happy to be flexible with this, but I do think more submodules would help structure our namespaces and provide some clear interfaces, defining what is an implementation detail and what is a key part of functionality. This would also help readability of the code, to understand what are the "external facing" functions of some subpart of a package.

To take an example, in DynamicPPL I would make everything related to VarInfo a module. Could be that VarInfo, SimpleVarInfo, and AbstractVarInfo are also all their own modules, or could be that they are all together, but I think there's a natural interface between those types and the rest of DynamicPPL, and making the code structure reflect that separation of concerns would, in my opinion, clarify the code.

But vscode offers some tools for navigating code. There is also @which macro.

Yeah, these help, language servers help, but I do still find myself grepping the code to find where things are defined. Also, when browsing code on GitHub this comes up a lot.

@yebai
Copy link
Member

yebai commented Jul 16, 2024

Might be helpful: https://github.com/ericphanson/ExplicitImports.jl

@mhauru
Copy link
Member Author

mhauru commented Jul 17, 2024

Interesting. Seems like not using using X is about to become an official recommendation: JuliaLang/julia#42080

@devmotion
Copy link
Member

Yes, my impression is that explicit imports have been the (inofficial) recommendation for quite a while.

I also think that making every file its own module is too much. The new public keyword (presented in eg https://pretalx.com/juliacon2024/talk/KGQ8KL/) allows to better mark the official API than the current implicit conventions in the ecosystem.

My general approach to clearly mark where functions etc are defined is to not import anything beyond the package module (ie only import LinearAlgebra) and qualify all uses in the package. But this can be a bit verbose at times, so it's not a rule that I'd suggest following blindly.

@mhauru
Copy link
Member Author

mhauru commented Jul 17, 2024

Okay, so broad support for explicit imports, some support for more submodules, but maybe not quite to the one-file-one-module level.

What are the reasons why people would like to not go to one-file-one-module? Is it just the verbosity of having to add more imports at the beginning of each file?

That would bring in some boilerplate, but reason I'd be willing to pay that price is that I really enjoy the cleanness and simplicity of the guarantee that any name you encounter in a file is either 1) in Base, 2) defined in this file, or 3) explicitly imported at the top of file. You could not ever have something in your namespace that isn't made explicitly clear how it enters that namespace.

@devmotion
Copy link
Member

I'm against one module per file because modules are not designed and intended to be used in this way. Files are intentionally distinct from modules. There also seems to be a general sentiment in the community that many submodules seems to indicate that you should split your package into several packages.

You could not ever have something in your namespace that isn't made explicitly clear how it enters that namespace.

That's also the case if you don't import anything. Arguably, also for some functions such as dot or randexp it's obvious where they are defined, so not every instance seems problematic.

@mhauru
Copy link
Member Author

mhauru commented Jul 17, 2024

I'm against one module per file because modules are not designed and intended to be used in this way. Files are intentionally distinct from modules.

I see the intent of modules to be namespace management, which is very much what a one module per file policy would be about. I can imagine there are some cases where one wants to include without wrapping it in a module, maybe when working with small scripts. But generally I view include as a very low-level way of putting code together, akin to eval, and poorly suited for managing anything complex. I know this is not the way most of the Julia community does things, and I've always thought that that's been a very unfortunate design convention chosen by the community.

That's also the case if you don't import anything.

Sorry, I don't understand what you mean here.

The cases that bother me are things like these:

include("a.jl")
include("b.jl")
include("c.jl")
include("d.jl")
include("e.jl")
include("f.jl")
include("g.jl")

Say I'm reading g.jl and encounter a name dada. I can't find a mention of dada anywhere else in the file. I may start to wonder if dada is something in Base I'm not familiar with. If not, then I need to figure out which file I need to look in. Unless I have something like a language server at hand, first I need to figure out where g.jl was included, then which files were included before it in in the same file, and then do some guesswork or grepping to figure out which of the six candidates is the relevant one. I'd thus prefer it if the top of g.jl said import ..B: dada, making it explicit how dada enters the namespace I'm looking at.

Note also that probably files a-f.jl define all sorts of things that don't really need to have a life outside of that file. Small helper functions and such. All of those are in scope in g.jl, increasing the chance of accidental name collisions.

@penelopeysm
Copy link
Member

I'm increasingly convinced by one-module-per-file 👀

@penelopeysm
Copy link
Member

Unless there's a huge performance downside or something like that, of course?

I did a cursory Google search and https://www.reddit.com/r/Julia/comments/y4t0kc/is_there_a_benchmark_of_the_overhead_of_modules/ seems to suggest that 20,000 modules is problematic, but I don't think we're anywhere near that.

@yebai yebai closed this as completed Dec 18, 2024
@yebai
Copy link
Member

yebai commented Dec 18, 2024

It's likely resolved now; feel free to reopen otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants