Hello,
This is to report a problem we are having with prometheus_client in multiprocess mode when worker processes are restarted.
How do we observe the problem?
In our production environment it looks like this:
- master process starts and spawns a set of worker processes;
- metrics reporting is fine;
- reconfiguration is requested, and all worker processes are replaced;
- sometimes after this reconfiguration:
  - metrics reporting stops working (the HTTP endpoint returns 500);
  - logs contain errors like the ones below, which suggests that the .db files get corrupted;
  - metrics do not come back until a complete restart (and removal of the corrupted .db files).
The errors may look like this:
```
...
  File "/Users/vasiliev/.virtualenvs/metrics-issue27/lib/python2.7/site-packages/prometheus_client/core.py", line 682, in __reset
    files[file_prefix] = _MmapedDict(filename)
  File "/Users/vasiliev/.virtualenvs/metrics-issue27/lib/python2.7/site-packages/prometheus_client/core.py", line 577, in __init__
    for key, _, pos in self._read_all_values():
  File "/Users/vasiliev/.virtualenvs/metrics-issue27/lib/python2.7/site-packages/prometheus_client/core.py", line 611, in _read_all_values
    encoded = unpack_from(('%ss' % encoded_len).encode(), data, pos)[0]
error: unpack_from requires a buffer of at least 1919251561 bytes
```

```
...
  File "/Users/vasiliev/.virtualenvs/metrics-issue27/lib/python2.7/site-packages/prometheus_client/multiprocess.py", line 42, in merge
    metric_name, name, labels = json.loads(key)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
```
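One detail that may help diagnosis: the number in the first traceback decodes to ASCII. 1919251561 is the little-endian integer reading of the bytes b'iter', i.e. the reader is interpreting text from the file as a length field. Below is a minimal stdlib sketch of a length-prefixed format like the one the traceback suggests (a length followed by a JSON-encoded key); the names here are illustrative, not the library's actual code:

```python
import json
import struct

# Illustrative reader for a length-prefixed record: a 4-byte
# little-endian length, then `length` bytes of JSON.
def read_entry(buf, pos):
    encoded_len = struct.unpack_from('<i', buf, pos)[0]
    encoded = struct.unpack_from('%ds' % encoded_len, buf, pos + 4)[0]
    return json.loads(encoded.decode('utf-8'))

# A well-formed entry round-trips:
key = json.dumps(["requests_total", "requests_total", {}]).encode('utf-8')
entry = struct.pack('<i', len(key)) + key
print(read_entry(entry, 0))  # ['requests_total', 'requests_total', {}]

# But if the reader's offset lands on text instead of a length field --
# for example on the ASCII bytes b'iter' -- the "length" becomes huge:
print(struct.unpack_from('<i', b'iter')[0])  # 1919251561
```

This is consistent with two processes racing on the same .db file: one writes while another reads from a stale offset, so the reader lands mid-record and treats key bytes as a length.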
How to reproduce the problem?
This is very tricky to reproduce in an isolated environment, and the full setup does not fit in the description of a GitHub issue, so I have put the code and instructions for reproducing it here:
https://github.com/lonlylocly/prometheus_client_concurrency_issue
It is essentially our production script stripped down to a minimal version. It reproduces the problem roughly 50% of the time.
What do I want from this issue?
I must admit that we are at a loss and cannot figure out how this issue can be mitigated. We love Prometheus and really like the convenience of the Python client, but metrics reporting regularly breaks with this problem and we would like to eliminate it.
I would appreciate any suggestions or advice, and I am also willing to help via a PR (if we manage to figure out a workaround; at the moment I don't even know where to start).
Thank you!