Description
Hello,
This issue follows up on Google Support Case 51622001 (high latency exporting to hOCR), which led to it being filed here.
Details
The slow call is the hOCR export:
from google.cloud.documentai_toolbox import document as documentai_document_wrapper
documentai_document_wrapper.Document.from_documentai_document(
    documentai_document=result.document
).export_hocr_str(title="title")
When the document contains tables, this call takes 30 to 50 seconds, depending on the complexity of the page (a large amount of data in table format).
I am looking for any optimization that reduces this latency.
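To help narrow down where the time goes, a profile of the export call may be useful. The following is a minimal sketch using only the standard library; `profile_call` is a hypothetical helper written for this report, not part of google-cloud-documentai-toolbox, and the usage shown in the comments assumes the `wrapped_document` object from the code example in this issue.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Hypothetical usage against the slow call described above:
# hocr_result, report = profile_call(wrapped_document.export_hocr_str, title="hocr")
# print(report)
```

Sorting by cumulative time should show which internal functions dominate the export, which is probably the most useful detail to attach to this issue.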
Environment details
- OS type and version: GCP cloudshell
- google-cloud-documentai-toolbox version: 0.13.3a0
Steps to reproduce
- create venv with the provided requirements.txt
- execute python3 main-hocr.py test.pdf
Code example
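# Excerpt from main-hocr.py; client, resource_name, raw_document and
# process_options are defined earlier in that script.
import time
from google.cloud import documentai
from google.cloud.documentai_toolbox import document as documentai_document_wrapper

request = documentai.ProcessRequest(
    name=resource_name,
    raw_document=raw_document,
    process_options=process_options,
)
# Time the Document AI request itself.
start = time.time()
result = client.process_document(request=request)
print(f"process_document {time.time() - start}")
# Time wrapping the result in the toolbox Document.
start = time.time()
wrapped_document = documentai_document_wrapper.Document.from_documentai_document(
    documentai_document=result.document
)
print(f"wrapped_document {time.time() - start}")
# Time the hOCR export, which is the slow step.
start = time.time()
hocr_result = wrapped_document.export_hocr_str(title="hocr")
print(f"export_hocr_str {time.time() - start}")
Stack trace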
N/A. The execution completes correctly, but the hOCR export takes 35 seconds.
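For anyone reproducing the measurements, the three timings in the code example can be collected with less repetition. This is a sketch only: `timed` is a hypothetical helper, and it uses `time.perf_counter()`, a monotonic clock intended for measuring intervals, instead of the `time.time()` calls in the attached script.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Time the enclosed block, print the duration, and optionally store it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label} {elapsed:.3f}s")


# Hypothetical usage with the objects from the code example:
# timings = {}
# with timed("export_hocr_str", timings):
#     hocr_result = wrapped_document.export_hocr_str(title="hocr")
```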
Attached sources to reproduce the test:
sources.zip
- main-hocr.py, with the full code of the example
- requirements.txt
- test.pdf, file to process with documentai: ocr plus hocr
Thanks!