Skip to content

Failing to read complex Unicode string embedded in JSON #4417

@kdowney-lot49

Description

@kdowney-lot49

Description

I tried to load urltestdata.json in nlohmann-hson, and get:

parse error at line 4853, column 45: syntax error while parsing value - invalid string: surrogate U+D800..U+DBFF must be followed by U+DC00..U+DFFF; last read: '"http://example.com/\uD800\uD801'

But this is the official WHATWG URL validation test set, and multiple JSON validators that I tried online

Reproduction steps

A simple parse of the above file reproduces it:

    std::filesystem::path testSourceLocation(__FILE__);
    auto buildDir = testSourceLocation.parent_path() / "../../build";
    auto testFixturePath = buildDir / "urltestdata.json";

    std::ifstream testFixtureIn(testFixturePath);
    nlohmann::json testFixtureJson = nlohmann::json::parse(testFixtureIn);

Expected vs. actual results

I expected this file to parse without errors.

Minimal code example

See above.

Error messages

As per above:


parse error at line 4853, column 45: syntax error while parsing value - invalid string: surrogate U+D800..U+DBFF must be followed by U+DC00..U+DFFF; last read: '"http://example.com/\uD800\uD801'


### Compiler and operating system

Ubunto 22.04 (Noble) with gcc 13.2

### Library version

3.11.3 (vcpkg)

### Validation

- [ ] The bug also occurs if the latest version from the [`develop`](https://github.com/nlohmann/json/tree/develop) branch is used.
- [ ] I can successfully [compile and run the unit tests](https://github.com/nlohmann/json#execute-unit-tests).

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions