Skip to content

O(n²) insertion sort in unicodedata.normalize("NFC") canonical ordering #149079

@sethmlarson

Description

@sethmlarson

Bug report

Bug description:

From @ch4n3-yoon:

CPython's unicodedata.normalize("NFC") implementation uses an insertion sort for the canonical ordering step of combining characters, which exhibits O(n²) time complexity on crafted inputs. An attacker can cause denial of service by supplying a string with many consecutive combining characters whose Canonical Combining Class (CCC) values are arranged in alternating order, leading to excessive CPU consumption. A 0.5MB payload can consume over 30 seconds of CPU time.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions