feat: optimise copying in `left` for Utf8 and LargeUtf8 #19980

theirix · 2026-01-24T22:25:42Z

Which issue does this PR close?

Closes Additional optimizations for left function #19749.

Rationale for this change

A follow-up to an optimisation of the left function in #19571

What changes are included in this PR?

Improve memory performance to O(1) by eliminating more string copies. Discover a byte offset for the last character for both positive and negative length arguments and slice bytes directly.
For Utf8View (StringViewArray), implement a zero-copy slice operation reusing the same Arrow buffers. It is possible for both views since the string only shrinks. We only need to tune a German prefix.
An Arrow view construction helper shrink_string_view_array_view is included in this PR. Unfortunately, string view builders cannot provide a way to reuse Arrow buffers. I believe it should better reside in the core Arrow crates instead - I can follow up on it.

Are these changes tested?

Additional unit tests
SLTs

Are there any user-facing changes?

No

Improve performance to O(1) by eliminating more string copies. Discover a byte offset for the last character for both positive and negative length argument and slice bytes directly. For LargeUtf8, implement a zero-copy slice operation reusing same Arrow buffers. An Arrow view construction helper `shrink_string_view_array_view` is included to this PR (upstream to Arrow crates)

Jefffrey

Thanks for this PR. Would be good to see some benchmark results too, see:

https://github.com/apache/datafusion/blob/e5e76366a6200dcd7b85cc030128a191e3763dd1/datafusion/functions/benches/left.rs

Moreover, I find some of the points in the PR body confusing:

Improve performance to O(1) by eliminating more string copies. Discover a byte offset for the last character for both positive and negative length arguments and slice bytes directly.

Could you help me understand which changes here make it O(1)?

For LargeUtf8 (StringViewArray), implement a zero-copy slice operation reusing the same Arrow buffers. It is possible for both views since the string only shrinks. We only need to tune a German prefix.

LargeUtf8 and Utf8View are different types, so it is confusing to see them used interchangeably.

datafusion/functions/src/unicode/left.rs

Dandandan · 2026-01-25T07:35:32Z

Run benchmarks

alamb-ghbot · 2026-01-25T07:35:42Z

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing left-speedup-bytes (6e90fa2) to b463a9f diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

alamb-ghbot · 2026-01-25T08:15:13Z

🤖: Benchmark completed

Details

Comparing HEAD and left-speedup-bytes
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ left-speedup-bytes ┃       Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0 │  2428.22 ms │         2379.60 ms │    no change │
│ QQuery 1 │   919.84 ms │          967.92 ms │ 1.05x slower │
│ QQuery 2 │  1872.34 ms │         1905.27 ms │    no change │
│ QQuery 3 │  1032.12 ms │         1041.60 ms │    no change │
│ QQuery 4 │  2223.75 ms │         2243.69 ms │    no change │
│ QQuery 5 │ 27922.98 ms │        27909.80 ms │    no change │
│ QQuery 6 │  3982.09 ms │         3982.05 ms │    no change │
│ QQuery 7 │  2900.29 ms │         3020.20 ms │    no change │
└──────────┴─────────────┴────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 43281.63ms │
│ Total Time (left-speedup-bytes)   │ 43450.12ms │
│ Average Time (HEAD)               │  5410.20ms │
│ Average Time (left-speedup-bytes) │  5431.26ms │
│ Queries Faster                    │          0 │
│ Queries Slower                    │          1 │
│ Queries with No Change            │          7 │
│ Queries with Failure              │          0 │
└───────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ left-speedup-bytes ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     1.93 ms │            1.99 ms │     no change │
│ QQuery 1  │    50.83 ms │           49.49 ms │     no change │
│ QQuery 2  │   133.87 ms │          135.26 ms │     no change │
│ QQuery 3  │   156.52 ms │          162.61 ms │     no change │
│ QQuery 4  │  1057.99 ms │         1069.99 ms │     no change │
│ QQuery 5  │  1363.03 ms │         1363.20 ms │     no change │
│ QQuery 6  │     1.87 ms │            1.93 ms │     no change │
│ QQuery 7  │    55.25 ms │           54.22 ms │     no change │
│ QQuery 8  │  1427.70 ms │         1409.78 ms │     no change │
│ QQuery 9  │  1791.54 ms │         1719.94 ms │     no change │
│ QQuery 10 │   340.66 ms │          345.99 ms │     no change │
│ QQuery 11 │   394.39 ms │          394.24 ms │     no change │
│ QQuery 12 │  1226.58 ms │         1260.19 ms │     no change │
│ QQuery 13 │  1936.62 ms │         1946.31 ms │     no change │
│ QQuery 14 │  1233.66 ms │         1240.61 ms │     no change │
│ QQuery 15 │  1238.22 ms │         1233.00 ms │     no change │
│ QQuery 16 │  2491.33 ms │         2542.47 ms │     no change │
│ QQuery 17 │  2509.31 ms │         2495.10 ms │     no change │
│ QQuery 18 │  5274.00 ms │         4817.99 ms │ +1.09x faster │
│ QQuery 19 │   122.53 ms │          122.52 ms │     no change │
│ QQuery 20 │  1931.03 ms │         1833.75 ms │ +1.05x faster │
│ QQuery 21 │  2221.92 ms │         2122.60 ms │     no change │
│ QQuery 22 │  4143.05 ms │         3663.35 ms │ +1.13x faster │
│ QQuery 23 │ 15337.00 ms │        12037.08 ms │ +1.27x faster │
│ QQuery 24 │   201.13 ms │          209.94 ms │     no change │
│ QQuery 25 │   455.90 ms │          446.44 ms │     no change │
│ QQuery 26 │   208.02 ms │          214.63 ms │     no change │
│ QQuery 27 │  2656.51 ms │         2591.58 ms │     no change │
│ QQuery 28 │ 23599.03 ms │        23351.03 ms │     no change │
│ QQuery 29 │   979.14 ms │          960.16 ms │     no change │
│ QQuery 30 │  1285.85 ms │         1234.23 ms │     no change │
│ QQuery 31 │  1312.80 ms │         1306.61 ms │     no change │
│ QQuery 32 │  4895.03 ms │         4317.56 ms │ +1.13x faster │
│ QQuery 33 │  5773.52 ms │         5211.24 ms │ +1.11x faster │
│ QQuery 34 │  5818.88 ms │         5554.07 ms │     no change │
│ QQuery 35 │  1950.63 ms │         1861.19 ms │     no change │
│ QQuery 36 │    66.06 ms │           67.08 ms │     no change │
│ QQuery 37 │    45.09 ms │           46.24 ms │     no change │
│ QQuery 38 │    67.12 ms │           66.09 ms │     no change │
│ QQuery 39 │   102.94 ms │          103.51 ms │     no change │
│ QQuery 40 │    26.56 ms │           27.87 ms │     no change │
│ QQuery 41 │    23.70 ms │           23.27 ms │     no change │
│ QQuery 42 │    20.68 ms │           19.88 ms │     no change │
└───────────┴─────────────┴────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 95929.42ms │
│ Total Time (left-speedup-bytes)   │ 89636.24ms │
│ Average Time (HEAD)               │  2230.92ms │
│ Average Time (left-speedup-bytes) │  2084.56ms │
│ Queries Faster                    │          6 │
│ Queries Slower                    │          0 │
│ Queries with No Change            │         37 │
│ Queries with Failure              │          0 │
└───────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ left-speedup-bytes ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 106.79 ms │          103.88 ms │     no change │
│ QQuery 2  │  34.06 ms │           34.10 ms │     no change │
│ QQuery 3  │  40.10 ms │           40.89 ms │     no change │
│ QQuery 4  │  31.70 ms │           32.46 ms │     no change │
│ QQuery 5  │  91.41 ms │           91.91 ms │     no change │
│ QQuery 6  │  20.71 ms │           20.87 ms │     no change │
│ QQuery 7  │ 161.13 ms │          156.64 ms │     no change │
│ QQuery 8  │  42.33 ms │           40.22 ms │     no change │
│ QQuery 9  │ 105.29 ms │          105.32 ms │     no change │
│ QQuery 10 │  67.62 ms │           67.90 ms │     no change │
│ QQuery 11 │  19.36 ms │           19.46 ms │     no change │
│ QQuery 12 │  50.39 ms │           52.05 ms │     no change │
│ QQuery 13 │  46.08 ms │           50.58 ms │  1.10x slower │
│ QQuery 14 │  15.12 ms │           15.41 ms │     no change │
│ QQuery 15 │  30.20 ms │           30.09 ms │     no change │
│ QQuery 16 │  28.25 ms │           29.40 ms │     no change │
│ QQuery 17 │ 142.82 ms │          147.71 ms │     no change │
│ QQuery 18 │ 297.86 ms │          286.66 ms │     no change │
│ QQuery 19 │  52.54 ms │           41.33 ms │ +1.27x faster │
│ QQuery 20 │  57.45 ms │           56.96 ms │     no change │
│ QQuery 21 │ 186.86 ms │          193.20 ms │     no change │
│ QQuery 22 │  23.01 ms │           23.52 ms │     no change │
└───────────┴───────────┴────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 1651.06ms │
│ Total Time (left-speedup-bytes)   │ 1640.57ms │
│ Average Time (HEAD)               │   75.05ms │
│ Average Time (left-speedup-bytes) │   74.57ms │
│ Queries Faster                    │         1 │
│ Queries Slower                    │         1 │
│ Queries with No Change            │        20 │
│ Queries with Failure              │         0 │
└───────────────────────────────────┴───────────┘

theirix · 2026-01-25T13:29:24Z

Run benchmarks

alamb-ghbot · 2026-01-25T13:29:27Z

🤖 Hi @theirix, thanks for the request (#19980 (comment)). scrape_comments.py only responds to whitelisted users. Allowed users: Dandandan, Omega359, adriangb, alamb, comphead, gabotechs, geoffreyclaude, klion26, rluvaton, xudong963, zhuqi-lucas.

theirix · 2026-01-25T13:39:02Z

Thank you for the review!

Could you help me understand which changes here make it O(1)?

It's for memory complexity. We avoid an extra copy of the string into chars_buf and the collecting it back via collect, as suggested in the original PR. Now we just use byte slicing from the original string.

I cannot say about time complexity - it is improved, but not for all queries (QQuery 1). Since I cannot invoke benchmarks from the PR for the updated version, I'll try to set it up locally.

For LargeUtf8 (StringViewArray), implement a zero-copy slice operation reusing the same Arrow buffers. It is possible for both views since the string only shrinks. We only need to tune a German prefix.

LargeUtf8 and Utf8View are different types, so it is confusing to see them used interchangeably.

My bad, corrected it in the description.

Jefffrey

I think we'd have to see benchmarks from the microbenchmark of left itself; the benchmark queries from the bot don't use this function I think? (Unless its used internally in joins, etc.)

datafusion/functions/src/unicode/left.rs

theirix · 2026-01-26T20:16:50Z

I think we'd have to see benchmarks from the microbenchmark of left itself; the benchmark queries from the bot don't use this function I think? (Unless its used internally in joins, etc.)

Agree, here are the results of a more targeted left benchmark (they don't cover string views):

left size=1024/positive n/1024
                        time:   [28.333 µs 29.190 µs 30.124 µs]
                        change: [−75.427% −74.733% −74.087%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
left size=1024/negative n/1024
                        time:   [24.780 µs 25.203 µs 25.711 µs]
                        change: [−88.075% −87.696% −87.324%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

left size=4096/positive n/4096
                        time:   [97.199 µs 97.773 µs 98.422 µs]
                        change: [−82.227% −80.928% −79.732%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
left size=4096/negative n/4096
                        time:   [168.66 µs 173.51 µs 178.90 µs]
                        change: [−80.799% −80.005% −79.258%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

Jefffrey · 2026-01-27T05:46:20Z

I think it would be a good idea to enhance the benchmark to add Utf8View case as well; we can simply compare it to Utf8 numbers as the before case.

Looks like we got a ~80% improvement on Utf8/LargeUtf8 by simply removing the intermediate chars_buf buffer? 😮

theirix · 2026-01-27T13:34:47Z

I think it would be a good idea to enhance the benchmark to add Utf8View case as well; we can simply compare it to Utf8 numbers as the before case.

Good idea, I'll add it shortly.

Looks like we got a ~80% improvement on Utf8/LargeUtf8 by simply removing the intermediate chars_buf buffer? 😮

Indeed! Memory allocation and copying consume time. If we can go with zero copy, we should.

github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Jan 24, 2026

theirix marked this pull request as ready for review January 24, 2026 22:41

Jefffrey reviewed Jan 25, 2026

View reviewed changes

theirix added 5 commits January 25, 2026 12:06

Simplify return type

8a82c88

Build null buffer at once

bccb8b5

ByteView construction

0b2ee3e

Refactor invoke_with_args

315f180

Use debug_assert

db9765c

Jefffrey reviewed Jan 26, 2026

View reviewed changes

datafusion/functions/src/unicode/left.rs Outdated Show resolved Hide resolved

datafusion/functions/src/unicode/left.rs Outdated Show resolved Hide resolved

theirix added 2 commits January 26, 2026 18:10

Assert instead of error

ecc9b68

Simplify left_byte_length

25be451

feat: optimise copying in left for Utf8 and LargeUtf8 #19980

Are you sure you want to change the base?

feat: optimise copying in left for Utf8 and LargeUtf8 #19980

Conversation

theirix commented Jan 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

Jefffrey left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Dandandan commented Jan 25, 2026

Uh oh!

alamb-ghbot commented Jan 25, 2026

Uh oh!

alamb-ghbot commented Jan 25, 2026

Uh oh!

theirix commented Jan 25, 2026

Uh oh!

alamb-ghbot commented Jan 25, 2026

Uh oh!

theirix commented Jan 25, 2026

Uh oh!

Jefffrey left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

theirix commented Jan 26, 2026

Uh oh!

Jefffrey commented Jan 27, 2026

Uh oh!

theirix commented Jan 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

feat: optimise copying in `left` for Utf8 and LargeUtf8 #19980

feat: optimise copying in `left` for Utf8 and LargeUtf8 #19980

theirix commented Jan 24, 2026 •

edited

Loading