Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate automatic creation of empty __init__.py files #7386

Open
limdor opened this issue Feb 8, 2019 · 15 comments
Open

Deprecate automatic creation of empty __init__.py files #7386

limdor opened this issue Feb 8, 2019 · 15 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-Python Native rules for Python type: feature request

Comments

@limdor
Copy link
Contributor

limdor commented Feb 8, 2019

Description of the problem / feature request:

Bazel is creating init.py files automatically. This is an issue in several situations (see bazelbuild/rules_python#55).

This behavior can be suppressed using:
https://docs.bazel.build/versions/master/be/python.html#py_test.legacy_create_init

But in case that you have a large code base with a lot of python rules this is not a good solution. It would mean that this flag needs to be set in every single rule in case that you have a folder in the root of the workspace with the same name as a python package.
When this happens, it is not so easy to figure out what is going on. For modern Python developers init.py files are like source, these are not generated files. Thus, is not intuitive that the error could be because some init files are automatically generated.

Feature requests: what underlying problem are you trying to solve with this feature?

In a first step: There should be an option to decide the behavior of legacy_create_init globally.
In a second step: The default option should be not to generate init.py files by default.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Same like bazelbuild/rules_python#55 or any setup that at the root of the workspace you have a folder called like a builtin python module (time, logging, platform, queue, etc.)

What operating system are you running Bazel on?

Ubuntu 18.04

What's the output of bazel info release?

release 0.22.0

@dslomov dslomov added team-Rules-Python Native rules for Python untriaged labels Feb 11, 2019
@brandjon
Copy link
Member

The end state shouldn't be a global option but rather removing the legacy create init behavior altogether. This might look like the following.

  1. Go through an incompatible change migration to set the default value of legacy_create_init to false.

  2. Then go through an incompatible change migration to force the value to false for everyone, effectively making the attribute a no-op.

  3. Then go through an incompatible change migration to delete the redundant attribute.

@brandjon brandjon added type: feature request P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Mar 14, 2019
@brandjon brandjon changed the title Global option to decide if to create __init__.py files for Python Deprecate automatic creation of empty __init__.py files Oct 21, 2019
@brandjon
Copy link
Member

The first step of the above is implemented in @Faqa's PR #9271, which is in the process of being merged. It introduces an incompatible change flag tracked by #10076.

Steps 2 and 3 above can come after that. They can actually be merged into a single step. But that'll come down the line, since I don't anticipate flipping the first flag for a while. (We need to see how difficult the migration is, and if it requires new tooling, that'll be hard to prioritize.)

We can use this bug to track the overall deprecation effort.

@brandjon
Copy link
Member

A concern came up in internal review of #9271: How will a user know that a failure at execution time (from missing __init__.py) is due to flipping this flag?

We already have some logic in the stub script to give users a more informative error message when a failure may be due to a different incompatible change. We can adopt a similar strategy for this change, so that when the incompatible flag is enabled, the exit code is 1, and the stderr matches ^(ModuleNotFoundError|ImportError): , we print an additional message linking to the incompatible change docs. (The second incompatible change, to remove legacy_create_init altogether, won't need any special diagnostic because it'll be a straightforward analysis-time error.)

The downside of this approach is that it means Bazel injects stderr spam into failing py_binarys. But since it would only trigger when legacy_create_init is set to the default value, auto, the user could opt out of the spam on a per-target basis by explicitly setting this attr to true or false -- which works until the second incompatible change to delete the attr. We'd remove the special message from the stub script once the first incompatible flag is flipped.

bazel-io pushed a commit that referenced this issue Oct 24, 2019
This effectively changes the default value of legacy_create_init to false.

Technically, the attribute type has changed from boolean (default true) to tristate (default auto). Therefore, it is possible to see a semantic change in code that introspects a py_binary's or py_test's attrs dict (via native.existing_rules). We think it is unlikely this will break anyone, and in any case it is not currently possible to gate a rule's definition on an incompatible change flag.

Closes #9271. Work toward #7386 and #10076.

RELNOTES: [Python] Added flag --incomaptible_default_to_explicit_init_py to switch the default value of legacy_create_init to True. With this flag enabled, your py_binary and py_test targets will no longer behave as if empty __init__.py files were implicitly littered in your runfiles tree. See [#10076](#10076).
PiperOrigin-RevId: 276526057
@scele
Copy link
Contributor

scele commented Feb 3, 2020

@brandjon Curious, are there some existing best practices for writing BUILD files when automatic __init__.py file generation is switched off?

Suppose that I have the following setup:

foo/
   __init__.py
   BUILD
   lib/
      __init__.py
      lib.py
      BUILD
   bin/
      __init__.py
      bin.py
      BUILD

With automatic __init__.py file generation, there need not be checked-in __init__.py files, and the BUILD files could look like this:

# foo/lib/BUILD
py_library(
    name = "lib",
    srcs = ["lib.py"],
)

# foo/bin/BUILD
py_library(
    name = "bin",
    srcs = ["bin.py"],
    deps = [
        "//foo/lib",
    ],
)

With manual __init__.py file generation, I guess I'd have to set up something like this:

# foo/BUILD
py_library(
    name = "init",
    srcs = ["__init__.py"],
)

# foo/lib/BUILD
py_library(
    name = "init",
    srcs = ["__init__.py"],
)
py_library(
    name = "lib",
    srcs = ["lib.py"],
    deps = [":init"],
)

# foo/bin/BUILD
py_library(
    name = "init",
    srcs = ["__init__.py"],
)
py_library(
    name = "bin",
    srcs = ["bin.py"],
    deps = [
        ":init",
        "//foo/lib",
        "//foo:init",
    ],
)

This is unfortunately pretty verbose, but somewhat doable for the most part I guess.. To me, the most annoying thing seems to be that we have to explicitly add dependencies to the top-level //foo:init target, in order to have foo/__init__.py show up in runfiles and make import foo.<something> work at runtime. It does not seem to scale well when the repository becomes larger: if I have a target //a/b/c/d:lib, it would need to depend on all parent __init__.py files, like //a:init, //a/b:init, //a/b/c:init. Encouraging a pattern where child libs mechanically depend on their parent packages could also be seen problematic - usually it's a good thing for unit test cacheability if leaf directories are independent.

Or maybe I'm missing something - are there existing bazel code bases that have switched automatic __init__.py generation off?

@brandjon
Copy link
Member

brandjon commented Feb 3, 2020

There's no recommended best practice for what dependency structure to use for __init__.py files. You could have it be included in the srcs of multiple py_librarys in the same package, or factor it into its own target and add it to deps. For parent init files you would have to depend on the parent packages.

@limdor
Copy link
Contributor Author

limdor commented Feb 6, 2020

I will take the liberty to answer, I would disagree on having the init.py files in seperate dependencies.
In this example, as you said with automatic __init__.py it would look something like this, but the issue that I have with automatic __init__.py is that several are not needed.

foo/
   __init__.py   <--- Not needed
   BUILD
   lib/
      __init__.py
      lib.py
      BUILD
   bin/
      __init__.py   <--- Not needed
      bin.py
      BUILD

An init file is to indicate that is a module, for libraries this should be part of a library, I would not recommend to remove it from the srcs of the library and put it separate.

# foo/lib/BUILD
py_library(
    name = "lib",
    srcs = [
        "__init__.py",  <-- This should be part of the library, it is indicating what is a module
        "lib.py",
    ],
    imports = ["."],
)

The imports "." is needed to import from this library, I think that this is unfortunate but probably relates to be able to import always from the root of the workspace. If it would be a big repository with a lot of different languages mix I don't think that scales always starting your import from the workspace.
Here you can see the example that you put how is best to my eyes but it will depend a lot from the project and this is just my personal opinion.
https://github.com/limdor/bazel-examples/tree/master/python

I would love to extend the example with more complex situations, I am also preparing one example to see how someone could integrate pylint into bazel.
I am sure that all together we will find the best way for each situation.

@chickenandpork
Copy link

Likely a naive idea: Is it possible to create a merging py_library variant that properly merges two libraries with clashing namespace?

For example, with google-cloud-datastore and google-cloud-bigquery, the proper result of a single dependency that offers google.cloud.* based on these two dependencies can then be a dependency of other targets. You'd have a BUILD something like:

py_test(
  name = "test_ns",
  srcs = [
    "test_ns.py",
  ],
  deps = [
      ":merged_google_cloud",
  ],
)

py_merged_library(      # new explicit merge action
  name = "merged_google_cloud",
  deps = [
    requirement("google-cloud-datastore"),
    requirement("google-cloud-bigquery"),
  ],
)

This magical new py_merged_library essentially pip-installs both dependencies, and repacks a single wheel.

@limdor
Copy link
Contributor Author

limdor commented Mar 16, 2021

@brandjon I just realized that we missed the oportunity to do the default flip on Bazel 4. What is the status? Are there any blockers to switch the default?

limdor added a commit to limdor/p2p-lending that referenced this issue Sep 18, 2021
In case a __init__.py file is not present, Bazel will by default
automatically create one. This can have several issues and that
is why we deactivate the automatic creation.
For more information see:
bazelbuild/bazel#7386
@aaliddell
Copy link
Contributor

Could this be added to the changes for Bazel 5?

@brandjon
Copy link
Member

This is not currently being worked on.

I don't think we have plans to prioritize Python rules work within Bazel, though we are looking into migrating them to Starlark (as with many other native rule sets). It's conceivable that migration might require removing edge cases and strange behaviors like this one.

@Irvenae
Copy link

Irvenae commented Jul 7, 2022

+1 for resolving this.

We had a major issue with this and after a long search found a hack
dillon-giacoppo/rules_python_external#38

sloretz added a commit to sloretz/drake-ros that referenced this issue Dec 6, 2022
EricCousineau-TRI pushed a commit to RobotLocomotion/drake-ros that referenced this issue Dec 7, 2022
Use rmw_isolation
Use legacy_create_init to workaround bazelbuild/bazel#7386
Add TODO about ODR violations
Use C++17
@EricCousineau-TRI
Copy link
Contributor

I've posted a related issue, #17691 - if explicit py_library(*, imports) could have precedence over implicitly created imports, then it may alleviate pain points related to implicitly created __init__.py files.

@tpudlik
Copy link
Contributor

tpudlik commented Nov 10, 2023

I noticed rules_python now sets this flag in its own .bazelrc. @rickeylev is the maintainers' view that everyone should be building with --incompatible_default_to_explicit_init_py, and we just haven't gotten around to flipping the flag at the Bazel level?

If my project flips this flag (which is very convenient for us), are we pre-adopting the future, or heading down a blind alley?

@rickeylev
Copy link
Contributor

Yeah, the flag will be flipped -- eventually. It a source of confusing behavior and bugs. I'm not sure when it'll be flipped, but it'll be after the rules_python Starlark implementation is activated (which is waiting for Bazel 7 to be released).

@martis42
Copy link

@rickeylev By now the rules_python Starlark implementation exists (thanks a lot for your efforts!) and Bazel 7 is half a year old. Do you believe we can flip this flag now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-Python Native rules for Python type: feature request
Projects
None yet
Development

No branches or pull requests