Skip to content

Conversation

@theirix
Copy link
Contributor

@theirix theirix commented Jan 24, 2026

Which issue does this PR close?

Rationale for this change

A follow-up to an optimisation of the left function in #19571

What changes are included in this PR?

  • Improve memory performance to O(1) by eliminating more string copies. Discover a byte offset for the last character for both positive and negative length arguments and slice bytes directly.

  • For Utf8View (StringViewArray), implement a zero-copy slice operation reusing the same Arrow buffers. It is possible for both views since the string only shrinks. We only need to tune a German prefix.

  • An Arrow view construction helper shrink_string_view_array_view is included in this PR. Unfortunately, string view builders cannot provide a way to reuse Arrow buffers. I believe it should better reside in the core Arrow crates instead - I can follow up on it.

Are these changes tested?

  • Additional unit tests
  • SLTs

Are there any user-facing changes?

No

Improve performance to O(1) by eliminating more string copies.
Discover a byte offset for the last character for both positive
and negative length argument and slice bytes directly.

For LargeUtf8, implement a zero-copy slice operation reusing
same Arrow buffers.

An Arrow view construction helper `shrink_string_view_array_view`
is included to this PR (upstream to Arrow crates)
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Jan 24, 2026
@theirix theirix marked this pull request as ready for review January 24, 2026 22:41
Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR. Would be good to see some benchmark results too, see:

Moreover, I find some of the points in the PR body confusing:

Improve performance to O(1) by eliminating more string copies. Discover a byte offset for the last character for both positive and negative length arguments and slice bytes directly.

Could you help me understand which changes here make it O(1)?

For LargeUtf8 (StringViewArray), implement a zero-copy slice operation reusing the same Arrow buffers. It is possible for both views since the string only shrinks. We only need to tune a German prefix.

LargeUtf8 and Utf8View are different types, so it is confusing to see them used interchangeably.

@Dandandan
Copy link
Contributor

Run benchmarks

@alamb-ghbot
Copy link

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing left-speedup-bytes (6e90fa2) to b463a9f diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot
Copy link

🤖: Benchmark completed

Details

Comparing HEAD and left-speedup-bytes
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ left-speedup-bytes ┃       Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0 │  2428.22 ms │         2379.60 ms │    no change │
│ QQuery 1 │   919.84 ms │          967.92 ms │ 1.05x slower │
│ QQuery 2 │  1872.34 ms │         1905.27 ms │    no change │
│ QQuery 3 │  1032.12 ms │         1041.60 ms │    no change │
│ QQuery 4 │  2223.75 ms │         2243.69 ms │    no change │
│ QQuery 5 │ 27922.98 ms │        27909.80 ms │    no change │
│ QQuery 6 │  3982.09 ms │         3982.05 ms │    no change │
│ QQuery 7 │  2900.29 ms │         3020.20 ms │    no change │
└──────────┴─────────────┴────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 43281.63ms │
│ Total Time (left-speedup-bytes)   │ 43450.12ms │
│ Average Time (HEAD)               │  5410.20ms │
│ Average Time (left-speedup-bytes) │  5431.26ms │
│ Queries Faster                    │          0 │
│ Queries Slower                    │          1 │
│ Queries with No Change            │          7 │
│ Queries with Failure              │          0 │
└───────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ left-speedup-bytes ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     1.93 ms │            1.99 ms │     no change │
│ QQuery 1  │    50.83 ms │           49.49 ms │     no change │
│ QQuery 2  │   133.87 ms │          135.26 ms │     no change │
│ QQuery 3  │   156.52 ms │          162.61 ms │     no change │
│ QQuery 4  │  1057.99 ms │         1069.99 ms │     no change │
│ QQuery 5  │  1363.03 ms │         1363.20 ms │     no change │
│ QQuery 6  │     1.87 ms │            1.93 ms │     no change │
│ QQuery 7  │    55.25 ms │           54.22 ms │     no change │
│ QQuery 8  │  1427.70 ms │         1409.78 ms │     no change │
│ QQuery 9  │  1791.54 ms │         1719.94 ms │     no change │
│ QQuery 10 │   340.66 ms │          345.99 ms │     no change │
│ QQuery 11 │   394.39 ms │          394.24 ms │     no change │
│ QQuery 12 │  1226.58 ms │         1260.19 ms │     no change │
│ QQuery 13 │  1936.62 ms │         1946.31 ms │     no change │
│ QQuery 14 │  1233.66 ms │         1240.61 ms │     no change │
│ QQuery 15 │  1238.22 ms │         1233.00 ms │     no change │
│ QQuery 16 │  2491.33 ms │         2542.47 ms │     no change │
│ QQuery 17 │  2509.31 ms │         2495.10 ms │     no change │
│ QQuery 18 │  5274.00 ms │         4817.99 ms │ +1.09x faster │
│ QQuery 19 │   122.53 ms │          122.52 ms │     no change │
│ QQuery 20 │  1931.03 ms │         1833.75 ms │ +1.05x faster │
│ QQuery 21 │  2221.92 ms │         2122.60 ms │     no change │
│ QQuery 22 │  4143.05 ms │         3663.35 ms │ +1.13x faster │
│ QQuery 23 │ 15337.00 ms │        12037.08 ms │ +1.27x faster │
│ QQuery 24 │   201.13 ms │          209.94 ms │     no change │
│ QQuery 25 │   455.90 ms │          446.44 ms │     no change │
│ QQuery 26 │   208.02 ms │          214.63 ms │     no change │
│ QQuery 27 │  2656.51 ms │         2591.58 ms │     no change │
│ QQuery 28 │ 23599.03 ms │        23351.03 ms │     no change │
│ QQuery 29 │   979.14 ms │          960.16 ms │     no change │
│ QQuery 30 │  1285.85 ms │         1234.23 ms │     no change │
│ QQuery 31 │  1312.80 ms │         1306.61 ms │     no change │
│ QQuery 32 │  4895.03 ms │         4317.56 ms │ +1.13x faster │
│ QQuery 33 │  5773.52 ms │         5211.24 ms │ +1.11x faster │
│ QQuery 34 │  5818.88 ms │         5554.07 ms │     no change │
│ QQuery 35 │  1950.63 ms │         1861.19 ms │     no change │
│ QQuery 36 │    66.06 ms │           67.08 ms │     no change │
│ QQuery 37 │    45.09 ms │           46.24 ms │     no change │
│ QQuery 38 │    67.12 ms │           66.09 ms │     no change │
│ QQuery 39 │   102.94 ms │          103.51 ms │     no change │
│ QQuery 40 │    26.56 ms │           27.87 ms │     no change │
│ QQuery 41 │    23.70 ms │           23.27 ms │     no change │
│ QQuery 42 │    20.68 ms │           19.88 ms │     no change │
└───────────┴─────────────┴────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 95929.42ms │
│ Total Time (left-speedup-bytes)   │ 89636.24ms │
│ Average Time (HEAD)               │  2230.92ms │
│ Average Time (left-speedup-bytes) │  2084.56ms │
│ Queries Faster                    │          6 │
│ Queries Slower                    │          0 │
│ Queries with No Change            │         37 │
│ Queries with Failure              │          0 │
└───────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ left-speedup-bytes ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 106.79 ms │          103.88 ms │     no change │
│ QQuery 2  │  34.06 ms │           34.10 ms │     no change │
│ QQuery 3  │  40.10 ms │           40.89 ms │     no change │
│ QQuery 4  │  31.70 ms │           32.46 ms │     no change │
│ QQuery 5  │  91.41 ms │           91.91 ms │     no change │
│ QQuery 6  │  20.71 ms │           20.87 ms │     no change │
│ QQuery 7  │ 161.13 ms │          156.64 ms │     no change │
│ QQuery 8  │  42.33 ms │           40.22 ms │     no change │
│ QQuery 9  │ 105.29 ms │          105.32 ms │     no change │
│ QQuery 10 │  67.62 ms │           67.90 ms │     no change │
│ QQuery 11 │  19.36 ms │           19.46 ms │     no change │
│ QQuery 12 │  50.39 ms │           52.05 ms │     no change │
│ QQuery 13 │  46.08 ms │           50.58 ms │  1.10x slower │
│ QQuery 14 │  15.12 ms │           15.41 ms │     no change │
│ QQuery 15 │  30.20 ms │           30.09 ms │     no change │
│ QQuery 16 │  28.25 ms │           29.40 ms │     no change │
│ QQuery 17 │ 142.82 ms │          147.71 ms │     no change │
│ QQuery 18 │ 297.86 ms │          286.66 ms │     no change │
│ QQuery 19 │  52.54 ms │           41.33 ms │ +1.27x faster │
│ QQuery 20 │  57.45 ms │           56.96 ms │     no change │
│ QQuery 21 │ 186.86 ms │          193.20 ms │     no change │
│ QQuery 22 │  23.01 ms │           23.52 ms │     no change │
└───────────┴───────────┴────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                 ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                 │ 1651.06ms │
│ Total Time (left-speedup-bytes)   │ 1640.57ms │
│ Average Time (HEAD)               │   75.05ms │
│ Average Time (left-speedup-bytes) │   74.57ms │
│ Queries Faster                    │         1 │
│ Queries Slower                    │         1 │
│ Queries with No Change            │        20 │
│ Queries with Failure              │         0 │
└───────────────────────────────────┴───────────┘

@theirix
Copy link
Contributor Author

theirix commented Jan 25, 2026

Run benchmarks

@alamb-ghbot
Copy link

🤖 Hi @theirix, thanks for the request (#19980 (comment)). scrape_comments.py only responds to whitelisted users. Allowed users: Dandandan, Omega359, adriangb, alamb, comphead, gabotechs, geoffreyclaude, klion26, rluvaton, xudong963, zhuqi-lucas.

@theirix
Copy link
Contributor Author

theirix commented Jan 25, 2026

Thank you for the review!

Could you help me understand which changes here make it O(1)?

It's for memory complexity. We avoid an extra copy of the string into chars_buf and the collecting it back via collect, as suggested in the original PR. Now we just use byte slicing from the original string.

I cannot say about time complexity - it is improved, but not for all queries (QQuery 1). Since I cannot invoke benchmarks from the PR for the updated version, I'll try to set it up locally.

For LargeUtf8 (StringViewArray), implement a zero-copy slice operation reusing the same Arrow buffers. It is possible for both views since the string only shrinks. We only need to tune a German prefix.

LargeUtf8 and Utf8View are different types, so it is confusing to see them used interchangeably.

My bad, corrected it in the description.

Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd have to see benchmarks from the microbenchmark of left itself; the benchmark queries from the bot don't use this function I think? (Unless its used internally in joins, etc.)

@theirix
Copy link
Contributor Author

theirix commented Jan 26, 2026

I think we'd have to see benchmarks from the microbenchmark of left itself; the benchmark queries from the bot don't use this function I think? (Unless its used internally in joins, etc.)

Agree, here are the results of a more targeted left benchmark (they don't cover string views):

left size=1024/positive n/1024
                        time:   [28.333 µs 29.190 µs 30.124 µs]
                        change: [−75.427% −74.733% −74.087%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
left size=1024/negative n/1024
                        time:   [24.780 µs 25.203 µs 25.711 µs]
                        change: [−88.075% −87.696% −87.324%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

left size=4096/positive n/4096
                        time:   [97.199 µs 97.773 µs 98.422 µs]
                        change: [−82.227% −80.928% −79.732%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
left size=4096/negative n/4096
                        time:   [168.66 µs 173.51 µs 178.90 µs]
                        change: [−80.799% −80.005% −79.258%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

@Jefffrey
Copy link
Contributor

I think it would be a good idea to enhance the benchmark to add Utf8View case as well; we can simply compare it to Utf8 numbers as the before case.

Looks like we got a ~80% improvement on Utf8/LargeUtf8 by simply removing the intermediate chars_buf buffer? 😮

@theirix
Copy link
Contributor Author

theirix commented Jan 27, 2026

I think it would be a good idea to enhance the benchmark to add Utf8View case as well; we can simply compare it to Utf8 numbers as the before case.

Good idea, I'll add it shortly.

Looks like we got a ~80% improvement on Utf8/LargeUtf8 by simply removing the intermediate chars_buf buffer? 😮

Indeed! Memory allocation and copying consume time. If we can go with zero copy, we should.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Additional optimizations for left function

4 participants