You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The idea to use dataclasses has been discussed a number of times (see #2056), more recently in #2491 and there is also a connection to #2607. So here I am opening an issue to write down some thoughts.
This is rather for from being my final judgement on the matter and I'm aware that it will be a while until we properly discuss this. So more of a reminder for future-me.
Motivation
We have a lot of boilerplate code. especially the inits of classes like Message take up a lot of place, along with __slots__ and de/to_json.
There are ways to hide those away by using 3rd pary libraries (that inspired dataclasses) or the built-in dataclasses. Here is what such approaches can possibly do for us:
Making TelegramObjects immutable. I'm really not sure if we want that, but it's at least an interesting thought given that incoming updates represent a static information and changing them doesn't make sense at all.
Edits:
inspired by Remove redundant setters #2747 dataclasses could also be used for ext.Defaults and CallbackContext, especially making them immutable
with dataclasses, we could get rid of the __new__ workaround in TelegramObject: The reason for that was that we didn't want to have to call super().__init__ in all the subclasses. if TO is a dataclass as well, then the auto-generated inits of the subclasses would automatically do that for us
A word on 3rd party libs
A number of 3rd party libraries that offer similar functionality as dataclasses, most notably attrs and pydantic but also some others. I know that we try to keep our dependencies at a minimum and I'm fully backing that mentality. Still, let me list some pros & cons that I see of possible usage of attrs & pydantic - compared to using dataclasses
➕ Automated handling of (nested!) to/de_json logic. pydantic has this built-in, attrs can do this together with the cattrs library
➕/➖ Data validation. pydantic is heavily inspired by that. This means that e.g. self.text = text is replaced by self.text = str(text) in pydantic, but apparently also slows it down
➕ attrs comes with support for slots, which dataclass only adds in py3.10. IISC pydanticdoes notautomatically
➖ IISC neither pydantic nor attrs can do both extra arguments and slots
➖IDE- & type-checker integration may be non-optimal (pydantice provides plugins for mypy and PyCharm)
➖ They are dependencies. duh. If your own boilerplate is fasly, you can change it. If the dependency has a bug, then good night.
➕/➖ Handling of extra arguments - this is actually really important, also for [Discussion] kwargs for input-only classes (like InputMedia*) #2607 and I'll go into detail on that later. TL;DR: pydantic can dismiss additional kwargs that are passed to init, e.g. Chat(…, new_argument=42) will work. attrs can't, IISC. dataclass can't either
But let's focus on the last bullet point. If we can't handle that properly, then there is no use for attrs or dataclass at all
Dismissal of additional kwargs
Currently there is no way to make dataclass add **kwargs to the auto-generated __init__.
But what if we approach this problem differently? Instead of passing the additional kwargs, we can filter them out before. Ofc this is more expensive, but
the chance of this happening is low (new arguments are not added too aften & we're usually lightning fast on API updates *cough*)
similar to the discussion Refactor kwargs handling in Bot methods #1924, by removing **kwargs we no longer just swallow typos in arguments etc. Admittedly, this benefit is mall, since classes like Message, Chat etc are usually not manually instantiated.
Such an approach should work with both attrs and dataclass
One thing to keep in mind is that we currently pass bot to all classes, so a lot of them always have one additional kwarg. This should be easily solvable by different means. E.g. one could
add bot to all signatures
add a class variable HAS_SHORTCUTS that tells us whether or not we need to pass bot
add a method TelegramObject.set_bot to remove bot from all signatures
So here's a small comparison of the two approaches:
importinspectimporttimeitclassInspectDeJson:
_init_params=Nonedef__init__(self, arg1, arg2, arg3, arg4, arg5, arg6, arg7):
self.arg1=arg1self.arg2=arg2self.arg3=arg3self.arg4=arg4self.arg5=arg5self.arg6=arg6self.arg7=arg7@classmethoddefde_json(cls, data):
ifcls._init_paramsisNone:
signature=inspect.signature(cls)
cls._init_params=signature.parameters.keys()
# try-except is significantly faster in case we already have a correct argument settry:
returncls(**data)
exceptTypeErrorasexc:
ifstr(exc).startswith('__init__() got an unexpected keyword argument'):
returncls(**{k: vfork, vindata.items() ifkincls._init_params})
else:
raiseexcclassKwargsDeJson:
def__init__(self, arg1, arg2, arg3, arg4, arg5, arg6, arg7, **kwargs):
self.arg1=arg1self.arg2=arg2self.arg3=arg3self.arg4=arg4self.arg5=arg5self.arg6=arg6self.arg7=arg7@classmethoddefde_json(cls, data):
returncls(**data)
correct_data= {
'arg1': 'arg1',
'arg2': 'arg2',
'arg3': 'arg3',
'arg4': 'arg4',
'arg5': 'arg5',
'arg6': 'arg6',
'arg7': 'arg7',
}
overfull_data=correct_data.copy()
overfull_data.update({str(i): iforiinrange(5)})
number=int(5*10e6)
print('Passing correct data')
a=timeit.timeit('InspectDeJson.de_json(correct_data)', globals=globals(), number=number)
b=timeit.timeit('KwargsDeJson.de_json(correct_data)', globals=globals(), number=number)
print(f'InspectDeJson took {a/number} seconds on average')
print(f'KwargsDeJson took {b/number} seconds on average')
print(f'InspectDeJson took {a*100/b} % of KwargsDeJsons runtime')
print('\n'+20*"="+'\n')
print('Passing too much data')
a=timeit.timeit('InspectDeJson.de_json(overfull_data)', globals=globals(), number=number)
b=timeit.timeit('KwargsDeJson.de_json(overfull_data)', globals=globals(), number=number)
print(f'InspectDeJson took {a/number} seconds on average')
print(f'KwargsDeJson took {b/number} seconds on average')
print(f'InspectDeJson took {a*100/b} % of KwargsDeJsons runtime')
So in the "usual" case the overhead is neglectible IMO and the usage of inspect-dark magic is not too extensive.
In the cases where we have unhandled data, the slow down is heavily notiable - but we're still talking 4 vs 2 micro seconds and as explained above, this shouldn't happen too often.
Summary
For today, much punchlines are:
Dismissal of additional kwargs can be handled in a different way, that works reasonably well IMO
Using 3rd party lib could reduce some more boilerplate code, specifically to/de_json, but neither pydantic nor attrs has all the features that I would like. Neither does dataclass, but at least that's included batteries
PS:
PyCharm apparently still has no proper support for dataclasses - not that this would stop us, just mentioning it …
The idea to use dataclasses has been discussed a number of times (see #2056), more recently in #2491 and there is also a connection to #2607. So here I am opening an issue to write down some thoughts.
This is rather for from being my final judgement on the matter and I'm aware that it will be a while until we properly discuss this. So more of a reminder for future-me.
Motivation
We have a lot of boilerplate code. especially the inits of classes like
Messagetake up a lot of place, along with__slots__andde/to_json.There are ways to hide those away by using 3rd pary libraries (that inspired
dataclasses) or the built-indataclasses. Here is what such approaches can possibly do for us:Edits:
ext.DefaultsandCallbackContext, especially making them immutable__new__workaround inTelegramObject: The reason for that was that we didn't want to have to callsuper().__init__in all the subclasses. ifTOis a dataclass as well, then the auto-generated inits of the subclasses would automatically do that for usA word on 3rd party libs
A number of 3rd party libraries that offer similar functionality as
dataclasses, most notablyattrsandpydanticbut also some others. I know that we try to keep our dependencies at a minimum and I'm fully backing that mentality. Still, let me list some pros & cons that I see of possible usage ofattrs&pydantic- compared to usingdataclassespydantichas this built-in,attrscan do this together with thecattrslibrarypydanticis heavily inspired by that. This means that e.g.self.text = textis replaced byself.text = str(text)inpydantic, but apparently also slows it downattrscomes with support for slots, which dataclass only adds in py3.10. IISCpydanticdoes notautomaticallypydanticnorattrscan do both extra arguments and slotspydanticeprovides plugins formypyand PyCharm)InputMedia*) #2607 and I'll go into detail on that later. TL;DR:pydanticcan dismiss additional kwargs that are passed to init, e.g.Chat(…, new_argument=42)will work.attrscan't, IISC.dataclasscan't eitherBut let's focus on the last bullet point. If we can't handle that properly, then there is no use for
attrsordataclassat allDismissal of additional kwargs
Currently there is no way to make
dataclassadd**kwargsto the auto-generated__init__.But what if we approach this problem differently? Instead of passing the additional kwargs, we can filter them out before. Ofc this is more expensive, but
**kwargswe no longer just swallow typos in arguments etc. Admittedly, this benefit is mall, since classes likeMessage,Chatetc are usually not manually instantiated.Such an approach should work with both
attrsanddataclassOne thing to keep in mind is that we currently pass
botto all classes, so a lot of them always have one additional kwarg. This should be easily solvable by different means. E.g. one couldbotto all signaturesHAS_SHORTCUTSthat tells us whether or not we need to passbotTelegramObject.set_botto removebotfrom all signaturesSo here's a small comparison of the two approaches:
Example output on my machine:
So in the "usual" case the overhead is neglectible IMO and the usage of inspect-dark magic is not too extensive.
In the cases where we have unhandled data, the slow down is heavily notiable - but we're still talking 4 vs 2 micro seconds and as explained above, this shouldn't happen too often.
Summary
For today, much punchlines are:
to/de_json, but neitherpydanticnorattrshas all the features that I would like. Neither doesdataclass, but at least that's included batteriesPS:
PyCharm apparently still has no proper support for
dataclasses- not that this would stop us, just mentioning it …