-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
wasm: correct unicode support in stdlib #5255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
297fb3f to
0819bc0
Compare
|
Looking into this further, I think we may not be sacrificing any functionality with this change. If I'm reading the documentation correctly for the various These re-implementations will need to be tested further before I think this is ok to merge, but I believe the previous unicode concerns I raised are actually a non-issue. A quick GH search shows that no parsers use |
0819bc0 to
b289c80
Compare
|
Oh, I actually thought those functions would use UTF-8 character categories by default. I think it would be better to keep the unicode support compiled in, and look into setting the locale somehow. I think those functions are used in some scanners where UTF8 support is intended. |
- Use wasi-defined wide character functions for full unicode support - Depend on custom implemenations for `string.h` functions
b289c80 to
e7dd5b0
Compare
Gotcha. In that case, the recent additions to Since we don't export I also tested a rust |
This PR changes our minimal wasm stdlib such that all of the functions we export are our own implementations, instead of those from the wasi headers.
There are a few benefits to this, namely the >7x size reduction for the stdlib header (~14k -> ~2k), as well as matched behavior between parsers built for wasm (
ts b --wasm) and rust projects built targetingwasm32-unknown-unknown).The downside is mainly the lack of full unicode support for
towupperandtowlower. A lot (most, it seems) of the bloat from wasi's implementations comes from unicode tables, which we don't cover in our own stdlib impl. Instead, we only handle ASCII characters, which is the approach already taken for the existing stdlib implementation. We could also just use the wasi implementations for these functions, bringing the header size back up to7315bytes.@maxbrunsfeld Do you think the tradeoff here is worth it?
Edit: Not sure why the sanitizer checks are failing, will look into that later tonight.Cache issue