fix: allow UTF-16 surrogates to be passed through #93
dbushong wants to merge 2 commits into malwarefrank:master
|
Can you share a file that contains such a string? I want to add a test case. |
|
I do not think |
|
OK, I added an I'm trying to add test coverage to match, but I'm unsure how the exe fixtures are being generated? |
|
Here's an example of a Unicode string with unpaired surrogates:
Here's a round-tripping example: |
The UserStrings can be UTF-16-LE encoded values with "odd" surrogate code points. Per the Wikipedia page on UTF-16:
This change makes it so that we at least get them back as valid Python Unicode characters, rather than omitting the string.
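
As a rough illustration of the behavior described above, here is a minimal sketch using only the Python standard library. It is not taken from this PR's diff: the helper name `decode_user_string` and the sample bytes are hypothetical, and the use of the `surrogatepass` error handler is an assumption about one way to let unpaired surrogates through.

```python
# Hypothetical sample: 'A', a lone high surrogate (U+D800), then 'B',
# stored as UTF-16-LE code units. Not taken from a real fixture.
raw = b"A\x00\x00\xd8B\x00"


def decode_user_string(data: bytes) -> str:
    """Decode UTF-16-LE bytes, passing unpaired surrogates through.

    Hypothetical helper for illustration; 'surrogatepass' is an assumed
    choice of error handler, not necessarily what this PR uses.
    """
    return data.decode("utf-16-le", errors="surrogatepass")


# Strict decoding rejects the unpaired surrogate outright.
try:
    raw.decode("utf-16-le")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With 'surrogatepass' the lone surrogate survives as a str code point.
text = decode_user_string(raw)

# Encoding with the same handler restores the original bytes.
round_tripped = text.encode("utf-16-le", errors="surrogatepass")
```

Note that a string holding a lone surrogate still cannot be encoded to UTF-8 with the default strict handler, so callers that print or serialize the result may need their own error handling.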