-
Notifications
You must be signed in to change notification settings - Fork 326
fix(utils.py): specify utf-8 encoding when reading file #1268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can you provide an example of when this fails today and the complete traceback? |
|
Yes, this is an error that occurs in Japanese environments. Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Program Files\Python311\Scripts\twine.exe\__main__.py", line 7, in <module>
File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\__main__.py", line 33, in main
error = cli.dispatch(sys.argv[1:])
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\cli.py", line 139, in dispatch
return main(args.args)
^^^^^^^^^^^^^^^
File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\commands\upload.py", line 250, in main
upload_settings = settings.Settings.from_argparse(parsed_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\settings.py", line 288, in from_argparse
return cls(**settings)
^^^^^^^^^^^^^^^
File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\settings.py", line 116, in __init__
self._handle_repository_options(
File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\settings.py", line 304, in _handle_repository_options
self.repository_config = utils.get_repository_from_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\utils.py", line 154, in get_repository_from_config
config = get_config(config_file)[repository]
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\utils.py", line 66, in get_config
parser.read_file(f)
File "C:\Program Files\Python311\Lib\configparser.py", line 734, in read_file
self._read(f, source)
File "C:\Program Files\Python311\Lib\configparser.py", line 1037, in _read
for lineno, line in enumerate(fp, start=1):
UnicodeDecodeError: 'cp932' codec can't decode byte 0x88 in position 418: illegal multibyte sequence``` |
|
I made minimal changes based on this error. |
|
Specifically, it is as follows.
|
woodruffw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @harumaki4649! It's a bummer that this kind of system codec stuff still causes issues on Windows hosts, but I see no problem with explicitly requiring UTF-8 in the config here based on your explanation.
|
I've only been hesitant as I expect there may be some person somewhere using utf-16/utf-32 as rare as that is for whom this would break |
Ah yeah, that's a good point. Maybe what we could do here is attempt the read twice: once as the native codepage and then again with UTF-8 if the first fails? (Or do the read once, as bytes, and attempt decoding from bytes twice.) |
|
Yeah, that just feels not great in both cases |
|
@woodruffw Should we change it to use UTF-8 as a fallback when errors occur? |
|
@harumaki4649 yes. I'd catch a specific exception and then retry with utf-8 |
|
I’ve applied the improvements based on your review comments. |
Refactor configuration file parsing to use helper functions for better error handling and readability.
Refactor error handling for configuration file parsing.
|
Based on the feedback we received again, we have made further revisions. |
sigmavirus24
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now all we could use are some tests to trip this.
|
@harumaki4649 really all you need to do is provide an ini file that has a character outside of |
tests: add UTF-8 fallback tests for parsing config
Update logging level for caplog in tests to specify the logger.
|
I ran the tests in my local environment. I'm not entirely sure whether these are the tests you expected, but would this be acceptable? This was my first time performing this type of task, so I apologize in advance if I made any mistakes. Is there anything else you would like me to check? |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
twine/utils.py:104
- When the default config file doesn't exist (
FileNotFoundErroris caught andpath == DEFAULT_CONFIG_FILE), theparservariable is never initialized, but the code continues to use it on lines 108-109 and 115-134. This will cause anUnboundLocalError: local variable 'parser' referenced before assignment.
The fix should initialize an empty parser when the default config file doesn't exist:
try:
parser = _parse_config(realpath)
except FileNotFoundError:
# User probably set --config-file, but the file can't be read
if path != DEFAULT_CONFIG_FILE:
raise
parser = configparser.RawConfigParser() try:
parser = _parse_config(realpath)
except FileNotFoundError:
# User probably set --config-file, but the file can't be read
if path != DEFAULT_CONFIG_FILE:
raise
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
sigmavirus24
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Want @woodruffw to give it a second pass if he has time
|
Thanks, I'll do one tomorrow! |
woodruffw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks for your hard work here @harumaki4649.
|
@harumaki4649 Mind fixing those last two CI failures? Looks like a small formatting tweak + a typecheck nit. |
Thanks for the approval! I'll take care of those CI fixes when I have a moment. |
|
@woodruffw I'm sure this will pass the test. |
Still failing |
|
I apologize for the oversight. I will attempt to make the necessary corrections. |
Reformat comment for clarity on encoding.
Refactor test functions to improve clarity and maintainability. Update comments for better understanding of the code flow.
|
The tests passed without issues in GitHub codespaces. This time, the tests should pass for sure. |
|
So many problems came up... |
|
I believe I've fixed the issues! 🎉 All tests passing (231/231) ✅ |
|
I know I'm late to this conversation, and I haven't fully ingested the conversation, but in my humble opinion, locking in the legacy behavior of preferring the system default encoding is the wrong choice here. I'd have preferred for twine instead to simply prefer UTF-8, with the technically-backward-incompatible implications that has. The fact that Python has historically had platform- and environment-specific variable behavior is a behavior that's deprecated and slated to be replaced by a platform-independent behavior of preferring UTF-8. By implementing the "fallback to preferred behavior", it's locking in the deprecated behavior. That said, I'm grateful for providing this implementation. Perhaps we could consider a future, explicitly backward-incompatible release that removes this complexity and simply requires UTF-8? |
|
Yeah, I'd be fine (personally) with a backwards-incompatible change in the future. I think maybe it'd make sense to update the PyPUG docs first to assert that well-formed configs should be UTF-8, since right now users could understandably end up in this situation through no fault of their own. |
|
Thanks again @harumaki4649! |
|
I opened pypa/packaging.python.org#1980 to propose that |
Summary
Fixed file reading error in utils.py by explicitly specifying UTF-8 encoding.
Details
encoding="utf-8"to file open functionImpact