Adds SERDE for persisters#485
Conversation
The persisters can now be pickled and unpickled. This means that they can be properly serialized across process boundaries.
|
A preview of c2cc4bb is uploaded and can be seen here: ✨ https://burr.dagworks.io/pull/485 ✨ Changes may take a few minutes to propagate. Since this is a preview of production, content with |
There was a problem hiding this comment.
👍 Looks good to me! Reviewed everything up to c2cc4bb in 1 minute and 37 seconds
More details
- Looked at
198lines of code in6files - Skipped
0files when reviewing. - Skipped posting
3drafted comments based on config settings.
1. burr/integrations/persisters/b_mongodb.py:136
- Draft comment:
self.client.addressis not a valid attribute forMongoClient. Useself.client.HOSTandself.client.PORTinstead to get the connection details. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
- The comment is about the new getstate implementation. 2. Looking at pymongo docs, MongoClient.address is actually a valid attribute that returns a (host, port) tuple. 3. HOST and PORT are not standard attributes of MongoClient. 4. The code correctly uses address[0] and address[1] to get host and port from the tuple.
Could there be a version of pymongo where address isn't available? Could HOST/PORT be better practice that I'm unaware of?
The address attribute is well-documented in pymongo and returns exactly what we need. The comment appears to be incorrect about HOST/PORT.
The comment should be deleted as it is incorrect - self.client.address is a valid and appropriate way to get the connection details.
2. burr/integrations/persisters/b_redis.py:165
- Draft comment:
Consider handling the case whereconnection_poolis not present to ensure complete state serialization. - Reason this comment was not posted:
Comment looked like it was already resolved.
3. burr/integrations/persisters/postgresql.py:251
- Draft comment:
Consider handling the case whereinfois not present to ensure complete state serialization. - Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_JiWlZwz3bmJEL2fu
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
elijahbenizzy
left a comment
There was a problem hiding this comment.
Looks good, think we can make it easier by abstracting a bit but I think it's fine for now
There was a problem hiding this comment.
❌ Changes requested. Incremental review on c5deb2d in 1 minute and 12 seconds
More details
- Looked at
224lines of code in3files - Skipped
0files when reviewing. - Skipped posting
5drafted comments based on config settings.
1. burr/integrations/persisters/b_mongodb.py:141
- Draft comment:
Accessingself.client.address[0]andself.client.address[1]assumes theaddressattribute is a tuple with at least two elements. Consider adding checks to ensure these attributes exist to avoid potential errors. - Reason this comment was not posted:
Comment did not seem useful.
2. burr/integrations/persisters/b_mongodb.py:158
- Draft comment:
The__setstate__method assumesconnection_paramsalways containsuriandport. Consider adding checks to handle missing keys to prevent potentialKeyError. - Reason this comment was not posted:
Comment was on unchanged code.
3. burr/integrations/persisters/b_redis.py:171
- Draft comment:
Theget_connection_paramsmethod assumesconnection_pool.connection_kwargscontainshost,port,db, andpassword. Consider adding checks to handle missing keys to prevent potentialKeyError. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The Redis connection is created with explicit parameters in from_values() and init(). get_connection_params() is only called on existing connections. The Redis client would fail earlier if these parameters were missing. The connection_pool.connection_kwargs should always have these values since they were provided during connection creation.
I could be wrong about the Redis client implementation - maybe it allows connections without some of these parameters? Also, what if someone subclasses and overrides default_client() with a different client?
Even if some parameters are optional in Redis, the code explicitly provides all these values during connection creation. If someone overrides default_client(), they should also override get_connection_params() as indicated by the comment on line 179.
The comment suggests defensive programming that isn't necessary given the class design and usage patterns. The code is correct as is.
4. burr/integrations/persisters/b_redis.py:187
- Draft comment:
The__setstate__method assumesconnection_paramsalways contains necessary keys for Redis connection. Consider adding checks to handle missing keys to prevent potentialKeyError. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The connection_params dict is created by get_connection_params() which explicitly returns a dict with all required keys. These params come from an existing Redis connection, so they must have been valid to begin with. The suggestion to add checks seems unnecessary since the data structure is controlled internally.
I could be wrong if there are subclasses that override get_connection_params() with different keys. However, the default_client() method suggests Redis is expected.
Even with subclasses, they would need to provide compatible connection params since they're using default_client() which expects Redis connection parameters. The architecture enforces this contract.
The comment should be deleted because the connection parameters are internally controlled and guaranteed to have the required keys through get_connection_params().
5. burr/integrations/persisters/postgresql.py:272
- Draft comment:
The__setstate__method assumesconnection_paramsalways contains necessary keys for PostgreSQL connection. Consider adding checks to handle missing keys to prevent potentialKeyError. - Reason this comment was not posted:
Comment was on unchanged code.
Workflow ID: wflow_znu5DZSQdhleNbf9
Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
c5deb2d to
c2cc4bb
Compare
The persisters can now be pickled and unpickled.
This means that they can be properly serialized across process boundaries.
Changes
How I tested this
Notes
Checklist
Important
Add serialization and deserialization support to MongoDB, Redis, and PostgreSQL persisters using
__getstate__and__setstate__.__getstate__and__setstate__methods toMongoDBBasePersister,RedisBasePersister, andPostgreSQLPersisterfor serialization.pickleintest_b_mongodb.py,test_b_redis.py, andtest_postgresql.py.This description was created by
for c5deb2d. It will automatically update as commits are pushed.