
ENH: Native support for Quantity objects in Pandas Series/DataFrame #19350

@Julian-Harbeck

Description


What is the problem this feature will solve?

Currently, Astropy Quantity objects are converted to float64 when they are put into a pandas DataFrame, for example through the QTable.to_pandas() method. In the process they lose their unit attribute, which makes them much less useful. A workaround is to use an object dtype for the pd.Series, but that is quite inefficient, especially when several operations are performed on the data later, because the data has to be converted back and forth between pd.Series and Quantity over and over.

In 2019/20 @janpipek developed the pandas-units-extension package to enable native support for Quantity objects in pandas through the pandas ExtensionDtype/ExtensionArray interface. Some months ago I forked the project (see pandas-units-extension (fork)) to make it compatible with modern pandas. Since then I have updated the implementation and the tests and added several features. I think it is now ready to be published again.
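The core idea behind the ExtensionArray approach can be illustrated without pandas or astropy: instead of an object-dtype column that holds one Quantity per row, the column stores a contiguous float buffer plus a single unit shared by all rows, so a unit conversion is a single pass over plain floats. The following is a hypothetical, stdlib-only sketch (`UnitColumn` and the `_TO_METRE` factor table are made up for illustration); it is not the code of pandas-units-extension.

```python
from array import array
from dataclasses import dataclass

# Hypothetical conversion factors to metres, for illustration only.
_TO_METRE = {"m": 1.0, "cm": 0.01, "km": 1000.0, "ft": 0.3048}


@dataclass
class UnitColumn:
    """Hypothetical column: one float buffer plus one unit for all rows."""

    values: array  # typecode "d": contiguous C doubles, no per-row objects
    unit: str

    @property
    def dtype_name(self) -> str:
        # Mirrors the "unit[m]" representation shown in the examples.
        return f"unit[{self.unit}]"

    def to(self, new_unit: str) -> "UnitColumn":
        # A single scale factor is computed once and applied to every value.
        factor = _TO_METRE[self.unit] / _TO_METRE[new_unit]
        return UnitColumn(array("d", (v * factor for v in self.values)), new_unit)


col = UnitColumn(array("d", [1.0, 2.0, 3.0]), "m")
print(col.dtype_name)             # unit[m]
print(list(col.to("cm").values))  # [100.0, 200.0, 300.0]
```

With an object column, the same conversion would have to touch one Python object per row and re-check its unit each time.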

I introduced the project at last month's Astropy Dev Telecon, where we concluded that in the longer term this could move from a separate package into astropy's core package, but that some points still need to be discussed (for example the dtype string representation, see Issue #7). I will be available again during this week's meeting if anyone would like to chat. If my notes are correct, @taldcroft and @neutrinoceros were interested in the topic, but I should also tag the astropy.units core maintainers, who according to the team page are @nstarman and @mhvk.
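Since the dtype string representation is still open (Issue #7), here is a minimal sketch of how a parametrized name such as `unit[m / s]` (the form shown in the examples) could be parsed, with a bare `"unit"` meaning that the unit is taken from the data. The grammar and the function name `parse_unit_dtype` are assumptions for illustration; in pandas, this kind of parsing would typically sit in the `construct_from_string` classmethod of an `ExtensionDtype` subclass, which is expected to raise `TypeError` for strings it does not recognize.

```python
import re
from typing import Optional

# Accepts "unit" or "unit[<unit string>]"; this grammar is an assumption
# based on the "unit[m]" / "unit[m / s]" names shown in this issue.
_UNIT_DTYPE = re.compile(r"^unit(?:\[(?P<unit>[^\[\]]+)\])?$")


def parse_unit_dtype(name: str) -> Optional[str]:
    """Return the unit string, or None for the unparametrized "unit" dtype."""
    match = _UNIT_DTYPE.match(name.strip())
    if match is None:
        # TypeError mirrors what pandas expects from construct_from_string.
        raise TypeError(f"Cannot construct a unit dtype from {name!r}")
    unit = match.group("unit")
    return unit.strip() if unit is not None else None


print(parse_unit_dtype("unit[m / s]"))  # m / s
print(parse_unit_dtype("unit"))         # None
```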

After a successful integration, one could also consider supporting more complex objects such as SkyCoord or masked Quantity. Building on the Astropy ExtensionArray (EA), I have already started developing another EA for the Skyfield API (Pandas Skyfield Extension); depending on the level of functionality required, this can be done fairly quickly.

Describe the desired outcome

Full support for Quantity objects in pandas Series and DataFrame, including:

  • Conversion from/to Quantity/QTable
  • Conversion to other units, including equivalencies
  • Conversion from strings, e.g. when reading a CSV file
  • Arithmetic and comparison operations between a Series/DataFrame and another pandas object or an astropy Quantity
  • Numeric reductions like sum, max, std, ...
  • Concatenation of Series with different units of the same physical type (e.g. m and ft)
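Two of the listed items, parsing values such as `"1.5 m"` from a CSV file and concatenating columns in `m` and `ft`, both come down to bringing every value to one common unit. A stdlib-only sketch, assuming a hypothetical length-only factor table and hypothetical helper names (`parse_quantity`, `parse_column`, `concat_lengths`); a real implementation would delegate parsing and conversion to `astropy.units`.

```python
# Hypothetical factors to metres; astropy.units would supply these in practice.
_TO_METRE = {"m": 1.0, "cm": 0.01, "ft": 0.3048}


def parse_quantity(text: str) -> tuple[float, str]:
    """Split a string such as "1.5 m" into (1.5, "m")."""
    value, unit = text.strip().split(maxsplit=1)
    return float(value), unit


def parse_column(strings: list[str]) -> tuple[list[float], str]:
    """Parse e.g. ["1 m", "2 m"] into ([1.0, 2.0], "m"); one unit per column."""
    pairs = [parse_quantity(s) for s in strings]
    units = {unit for _, unit in pairs}
    if len(units) != 1:
        raise ValueError(f"expected a single unit per column, got {sorted(units)}")
    return [value for value, _ in pairs], units.pop()


def concat_lengths(*columns: tuple[list[float], str], unit: str = "m"):
    """Concatenate (values, unit) columns after converting each one to `unit`."""
    out: list[float] = []
    for values, col_unit in columns:
        factor = _TO_METRE[col_unit] / _TO_METRE[unit]  # one factor per column
        out.extend(v * factor for v in values)
    return out, unit


metres = parse_column(["1 m", "2 m"])  # e.g. strings read from a CSV file
feet = parse_column(["3 ft"])
values, unit = concat_lengths(metres, feet)
print(unit, values)
```

Requiring a single unit per parsed column is a design choice of this sketch, matching the one-unit-per-dtype model in the examples.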

Some examples:
Create a pandas Series containing Quantity objects:

>>> import astropy.units as u
>>> import pandas as pd
>>> q: u.Quantity = [1, 2, 3] * u.m
>>> q
<Quantity [1., 2., 3.] m>
>>> length_sr: pd.Series = pd.Series(q, dtype="unit")
>>> length_sr
0    1.0 m
1    2.0 m
2    3.0 m
dtype: unit[m]

Comparison operations:

>>> length_sr > 150 * u.cm
0    False
1     True
2     True
dtype: bool
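The comparison above is only meaningful once both operands are in the same unit: `150 * u.cm` is first converted to the column's unit (1.5 m) and then compared element-wise, while incompatible units are rejected. A stdlib-only sketch of that rule, using hypothetical helpers (`convert`, `compare_gt`) and a hypothetical unit table; astropy performs these checks itself.

```python
# Hypothetical table: unit -> (physical type, factor to the SI base unit).
_UNITS = {"m": ("length", 1.0), "cm": ("length", 0.01), "s": ("time", 1.0)}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a scalar between two units of the same physical type."""
    from_type, from_factor = _UNITS[from_unit]
    to_type, to_factor = _UNITS[to_unit]
    if from_type != to_type:
        raise ValueError(f"cannot convert {from_unit!r} to {to_unit!r}")
    return value * from_factor / to_factor


def compare_gt(values: list[float], unit: str, other: float, other_unit: str):
    """Element-wise values > other, after converting other to the column unit."""
    threshold = convert(other, other_unit, unit)  # 150 cm -> 1.5 m
    return [v > threshold for v in values]


print(compare_gt([1.0, 2.0, 3.0], "m", 150.0, "cm"))  # [False, True, True]
```

Converting the scalar once, rather than converting the whole column to the scalar's unit, keeps the cost to a single multiplication.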

Arithmetic operations:

>>> velocity_sr: pd.Series = length_sr / (1 * u.s)
>>> velocity_sr
0    1.0 m / s
1    2.0 m / s
2    3.0 m / s
dtype: unit[m / s]
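For arithmetic, values and units are combined separately: the values are divided element-wise, and the result unit is formed by subtracting the divisor's unit exponents, which gives `m / s` above. A minimal stdlib sketch, assuming units are represented as hypothetical `{base: exponent}` dictionaries (the `^n` power notation is this sketch's own, not astropy's); astropy.units handles unit composition itself.

```python
from collections import Counter


def divide_units(num: dict[str, int], den: dict[str, int]) -> dict[str, int]:
    """Combine unit exponents for num / den, dropping cancelled bases."""
    result = Counter(num)
    result.subtract(den)  # keeps negative exponents, e.g. {"s": -1}
    return {base: exp for base, exp in result.items() if exp != 0}


def format_unit(exponents: dict[str, int]) -> str:
    """Render e.g. {"m": 1, "s": -1} as "m / s" (powers shown as "^n")."""
    def term(base: str, exp: int) -> str:
        return base if exp == 1 else f"{base}^{exp}"

    top = " ".join(term(b, e) for b, e in exponents.items() if e > 0) or "1"
    bottom = " ".join(term(b, -e) for b, e in exponents.items() if e < 0)
    return f"{top} / {bottom}" if bottom else top


values = [1.0, 2.0, 3.0]  # length column in m
divisor_value, divisor_unit = 1.0, {"s": 1}  # the scalar 1 * u.s
result_values = [v / divisor_value for v in values]
result_unit = format_unit(divide_units({"m": 1}, divisor_unit))
print(result_values, result_unit)  # [1.0, 2.0, 3.0] m / s
```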

Conversion to other units via a custom Series accessor (units):

>>> velocity_sr.units.to(u.km/u.h)
0     3.6 km / h
1     7.2 km / h
2    10.8 km / h
dtype: unit[km / h]
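Converting `m / s` to `km / h` amounts to one scale factor applied to the whole column: (1 m / 1 km) / (1 s / 1 h) = 0.001 × 3600 = 3.6. A stdlib-only sketch of that computation with a hypothetical table of SI factors; in astropy the same factor can be obtained with `(u.m / u.s).to(u.km / u.h)`.

```python
# Hypothetical SI factors for the base units used in this example.
_SI_FACTOR = {"m": 1.0, "km": 1000.0, "s": 1.0, "h": 3600.0}


def rate_factor(num_from: str, den_from: str, num_to: str, den_to: str) -> float:
    """Scale factor taking num_from/den_from to num_to/den_to."""
    numerator = _SI_FACTOR[num_from] / _SI_FACTOR[num_to]  # m -> km: 0.001
    denominator = _SI_FACTOR[den_from] / _SI_FACTOR[den_to]  # s -> h: 1/3600
    return numerator / denominator


factor = rate_factor("m", "s", "km", "h")  # approximately 3.6
velocities_kmh = [v * factor for v in [1.0, 2.0, 3.0]]
print(factor, velocities_kmh)
```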

Convert back to Quantity:

>>> velocity_sr.units.to_quantity()
<Quantity [1., 2., 3.] m / s>

These are just examples for now. Ideally, QTable.to_pandas() and QTable.from_pandas() would also be made unit-aware by using the UnitsDtype.
