entry_points doesn't handle empty .dist-info files well #534

@rossburton

This is related to #489 but with a different way to reach it.

The background is that our build process will delete files when packages are upgraded but will leave directories behind, because it doesn't track who created the directory and whether it should remain or not.

This means that over time site-packages can accumulate directories such as setuptools (with the code), setuptools-82.0.1.dist-info (with the 82.0.1 metadata), and setuptools-82.0.0.dist-info, which is empty.

If the on-disk order returns the empty 82.0.0 directory before 82.0.1, then importlib_metadata.entry_points() returns only a subset of the expected entry points.

For example, with a simple test case:

import importlib_metadata as metadata

for x in metadata.entry_points():
    print(x)

A fresh venv that has setuptools/build/pip installed finds 48 entry points:

EntryPoint(name='alias', value='setuptools.command.alias:alias', group='distutils.commands')
EntryPoint(name='bdist_egg', value='setuptools.command.bdist_egg:bdist_egg', group='distutils.commands')
EntryPoint(name='bdist_rpm', value='setuptools.command.bdist_rpm:bdist_rpm', group='distutils.commands')
EntryPoint(name='bdist_wheel', value='setuptools.command.bdist_wheel:bdist_wheel', group='distutils.commands')
EntryPoint(name='build', value='setuptools.command.build:build', group='distutils.commands')
...
EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts')
EntryPoint(name='pip3', value='pip._internal.cli.main:main', group='console_scripts')
EntryPoint(name='pyproject-build', value='build.__main__:entrypoint', group='console_scripts')
EntryPoint(name='build', value='build.__main__:entrypoint', group='pipx.run')

But creating setuptools-n.dist-info directories with different values of n, until ls -U shows one listed before the directory holding the actual metadata, produces different behaviour:

EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts')
EntryPoint(name='pip3', value='pip._internal.cli.main:main', group='console_scripts')
EntryPoint(name='pyproject-build', value='build.__main__:entrypoint', group='console_scripts')
EntryPoint(name='build', value='build.__main__:entrypoint', group='pipx.run')

Note that none of the setuptools EPs were listed.
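For anyone who wants to poke at this layout without touching a real venv, here is a minimal sketch that builds it in a temporary directory. It uses the stdlib importlib.metadata rather than the importlib_metadata backport, and the package name demo_pkg, its versions, and the helper names are all made up for illustration. It only lists what each discovered distribution reports; it does not reproduce the ordering problem deterministically, since the order depends on the filesystem.

```python
import tempfile
from importlib import metadata
from pathlib import Path


def build_site(root: Path) -> None:
    """Create one empty and one populated .dist-info directory under root."""
    # Leftover from an older version: the directory exists but holds no files.
    (root / "demo_pkg-1.0.0.dist-info").mkdir()

    # Current version: minimal metadata plus a single console script.
    live = root / "demo_pkg-1.0.1.dist-info"
    live.mkdir()
    (live / "METADATA").write_text(
        "Metadata-Version: 2.1\nName: demo-pkg\nVersion: 1.0.1\n"
    )
    (live / "entry_points.txt").write_text(
        "[console_scripts]\ndemo = demo_pkg:main\n"
    )


def entry_point_names_per_dist(root: Path) -> list:
    """Return the entry point names of each distribution found under root."""
    return [
        sorted(ep.name for ep in dist.entry_points)
        for dist in metadata.distributions(path=[str(root)])
    ]


def run_demo() -> list:
    """Build the layout in a temporary directory and summarize it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_site(root)
        return entry_point_names_per_dist(root)


if __name__ == "__main__":
    # The order of the two results is not fixed; it depends on the
    # directory listing order of the temporary directory.
    for names in run_demo():
        print(names)
```

Both directories show up as distributions with the same name, one with no entry points and one with the demo script, so anything that keeps only the first one per name is sensitive to which comes first.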

My theory: entry_points() lists all distributions and then runs unique() over them, which removes duplicate distributions by name, keeping whichever one is seen first. With the broken setup described above, this means it returns only the first, broken dist and ignores the one with actual content.
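To illustrate the theory, here is a small stand-alone simulation. It is not the importlib_metadata code: FakeDist, first_seen_unique and collect_entry_points are hypothetical names, and the dedup is just the "keep the first one seen per name" logic I am guessing at.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDist:
    """Hypothetical stand-in for a discovered distribution."""
    name: str
    has_metadata: bool
    entry_points: list = field(default_factory=list)


def first_seen_unique(dists):
    """Yield one distribution per name, keeping the first one encountered."""
    seen = set()
    for dist in dists:
        if dist.name not in seen:
            seen.add(dist.name)
            yield dist


def collect_entry_points(dists):
    """Gather entry points from the de-duplicated distributions."""
    return [ep for dist in first_seen_unique(dists) for ep in dist.entry_points]


stale = FakeDist("setuptools", has_metadata=False)  # empty .dist-info
live = FakeDist("setuptools", has_metadata=True, entry_points=["alias", "bdist_egg"])
pip = FakeDist("pip", has_metadata=True, entry_points=["pip", "pip3"])

# Valid directory listed first: everything is found.
print(collect_entry_points([live, stale, pip]))  # ['alias', 'bdist_egg', 'pip', 'pip3']

# Empty directory listed first: the setuptools entry points vanish.
print(collect_entry_points([stale, live, pip]))  # ['pip', 'pip3']
```

The only difference between the two calls is the listing order, which matches what I see with ls -U.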

I've verified locally that adding another _prefer_valid() call to Distribution.discover() resolves this by sorting valid dists first:

        context = context or DistributionFinder.Context(**kwargs)
        return cls._prefer_valid(itertools.chain.from_iterable(
            resolver(context) for resolver in cls._discover_resolvers()
        ))

This feels like a bit of a heavy hammer, though; possibly better logic in entry_points() would be preferable?
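As one possible shape for that "better logic", here is a sketch of a de-duplication step that lets a later valid distribution replace an earlier invalid one with the same name. This is only an idea expressed against a toy model: Dist, its has_metadata flag and unique_preferring_valid are hypothetical, and how validity would actually be detected in the library is left open.

```python
from typing import NamedTuple


class Dist(NamedTuple):
    """Hypothetical stand-in for a discovered distribution."""
    name: str
    has_metadata: bool
    entry_points: tuple = ()


def unique_preferring_valid(dists):
    """Keep one distribution per name, preferring ones that have metadata.

    The first distribution seen for a name is kept unless it lacks metadata
    and a later one with the same name has it.
    """
    chosen = {}
    for dist in dists:
        current = chosen.get(dist.name)
        if current is None or (not current.has_metadata and dist.has_metadata):
            chosen[dist.name] = dist
    # Dicts preserve insertion order, so names keep their first-seen position.
    return list(chosen.values())


stale = Dist("setuptools", has_metadata=False)
live = Dist("setuptools", has_metadata=True, entry_points=("alias", "bdist_egg"))
pip = Dist("pip", has_metadata=True, entry_points=("pip", "pip3"))

picked = unique_preferring_valid([stale, live, pip])
print([d.entry_points for d in picked])  # [('alias', 'bdist_egg'), ('pip', 'pip3')]
```

Compared with sorting every discovered distribution, this only changes the outcome when a name has both an invalid and a valid candidate, and leaves the order otherwise untouched.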
