This is related to #489 but with a different way to reach it.
The background is that our build process will delete files when packages are upgraded but will leave directories behind, because it doesn't track who created the directory and whether it should remain or not.
This means that over time you can end up with a site-packages that contains several directories for the same package, such as setuptools (with the code), setuptools-82.0.1.dist-info (with the 82.0.1 metadata) and also setuptools-82.0.0.dist-info, which is empty.
If the on-disk directory order returns the empty 82.0.0 before 82.0.1, then importlib_metadata.entry_points() will return only a subset of what is expected.
For example, with a simple test case:
import importlib_metadata as metadata

for x in metadata.entry_points():
    print(x)
A fresh venv that has setuptools/build/pip installed finds 48 entry points:
EntryPoint(name='alias', value='setuptools.command.alias:alias', group='distutils.commands')
EntryPoint(name='bdist_egg', value='setuptools.command.bdist_egg:bdist_egg', group='distutils.commands')
EntryPoint(name='bdist_rpm', value='setuptools.command.bdist_rpm:bdist_rpm', group='distutils.commands')
EntryPoint(name='bdist_wheel', value='setuptools.command.bdist_wheel:bdist_wheel', group='distutils.commands')
EntryPoint(name='build', value='setuptools.command.build:build', group='distutils.commands')
...
EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts')
EntryPoint(name='pip3', value='pip._internal.cli.main:main', group='console_scripts')
EntryPoint(name='pyproject-build', value='build.__main__:entrypoint', group='console_scripts')
EntryPoint(name='build', value='build.__main__:entrypoint', group='pipx.run')
But creating setuptools-n.dist-info directories with different values of n, until ls -U shows one of them listed before the actual metadata, produces different behaviour:
EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts')
EntryPoint(name='pip3', value='pip._internal.cli.main:main', group='console_scripts')
EntryPoint(name='pyproject-build', value='build.__main__:entrypoint', group='console_scripts')
EntryPoint(name='build', value='build.__main__:entrypoint', group='pipx.run')
Note that none of the setuptools EPs were listed.
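To check whether a given site-packages is in this state, something like the following rough sketch may help. It assumes that a usable dist-info directory contains a METADATA file; the function name find_empty_dist_info and the throwaway demo layout are made up for illustration. os.scandir() yields entries in the filesystem's own order, which should be comparable to what ls -U shows.

```python
import os
import tempfile


def find_empty_dist_info(site_packages):
    """Return names of *.dist-info directories that lack a METADATA file,
    in the order the filesystem yields them."""
    empty = []
    with os.scandir(site_packages) as entries:
        for entry in entries:
            if entry.name.endswith(".dist-info") and entry.is_dir():
                if not os.path.isfile(os.path.join(entry.path, "METADATA")):
                    empty.append(entry.name)
    return empty


# Build a throwaway directory that mimics the layout described above.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "setuptools-82.0.0.dist-info"))  # stale, empty
os.mkdir(os.path.join(demo_root, "setuptools-82.0.1.dist-info"))
metadata_path = os.path.join(demo_root, "setuptools-82.0.1.dist-info", "METADATA")
with open(metadata_path, "w") as f:
    # Minimal placeholder metadata, only so that the file exists.
    f.write("Metadata-Version: 2.1\nName: setuptools\nVersion: 82.0.1\n")

stale = find_empty_dist_info(demo_root)
print(stale)
```

Pointing find_empty_dist_info() at a real site-packages path lists the leftover directories that could shadow the real metadata.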
My theory: entry_points() lists all distributions and then does a unique() on them, which removes duplicate distributions by name with the simple logic of keeping the first one seen. With the broken setup described above, this means it returns only the first, broken dist and ignores the one with actual content.
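The theory can be illustrated with a small stand-alone sketch. This is not the library's code: FakeDist, unique_by_name and collect_entry_points are hypothetical names, and the entry point lists are shortened stand-ins. It only shows how "keep the first dist seen per name" drops the real entry points when the empty dist comes first.

```python
from collections import namedtuple

# Hypothetical stand-in for a distribution: a name plus its entry point names.
FakeDist = namedtuple("FakeDist", ["name", "entry_points"])


def unique_by_name(dists):
    """Yield only the first distribution seen for each name."""
    seen = set()
    for dist in dists:
        if dist.name not in seen:
            seen.add(dist.name)
            yield dist


def collect_entry_points(dists):
    """Flatten the entry points of the de-duplicated distributions."""
    return [ep for dist in unique_by_name(dists) for ep in dist.entry_points]


empty_first = [
    FakeDist("setuptools", []),  # stale, empty 82.0.0 dist-info
    FakeDist("setuptools", ["alias", "bdist_egg"]),  # real 82.0.1 metadata
    FakeDist("pip", ["pip", "pip3"]),
]
valid_first = [empty_first[1], empty_first[0], empty_first[2]]

print(collect_entry_points(empty_first))  # setuptools entry points are missing
print(collect_entry_points(valid_first))  # setuptools entry points are present
```

The only difference between the two inputs is the order, which matches the symptom of the result depending on directory order.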
I've verified locally that adding another _prefer_valid() call to Distribution.discover() resolves this by sorting valid dists first:
context = context or DistributionFinder.Context(**kwargs)
return cls._prefer_valid(itertools.chain.from_iterable(
    resolver(context) for resolver in cls._discover_resolvers()
))
This feels like a bit of a heavy hammer, though; would better logic in entry_points() be preferable?
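For reference, the effect of sorting valid dists first can be sketched in isolation as below. This is only an illustration under assumptions: FakeDist, prefer_valid and first_per_name are hypothetical names, has_metadata stands in for whatever validity check the real code uses, and the pip version is a placeholder.

```python
import itertools
from collections import namedtuple

# Hypothetical stand-in; has_metadata models "the dist-info has real content".
FakeDist = namedtuple("FakeDist", ["name", "version", "has_metadata"])


def prefer_valid(dists):
    """Stable partition: dists with metadata first, the rest afterwards."""
    dists = list(dists)
    valid = [d for d in dists if d.has_metadata]
    invalid = [d for d in dists if not d.has_metadata]
    return itertools.chain(valid, invalid)


def first_per_name(dists):
    """Yield only the first distribution seen for each name."""
    seen = set()
    for dist in dists:
        if dist.name not in seen:
            seen.add(dist.name)
            yield dist


discovered = [
    FakeDist("setuptools", "82.0.0", False),  # stale, empty directory
    FakeDist("setuptools", "82.0.1", True),
    FakeDist("pip", "0.0.0", True),  # placeholder version
]

chosen = [(d.name, d.version) for d in first_per_name(prefer_valid(discovered))]
print(chosen)
```

Note that this sketch reads the whole iterable into a list before yielding anything, so every distribution is discovered up front even if the caller only needs the first few.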