@harumaki4649
Contributor

Summary

Fixed a file-reading error in utils.py by explicitly specifying UTF-8 encoding.

Details

  • Added encoding="utf-8" to the open() call that reads the config file
  • Prevents errors in environments with different default encodings
  • Ensures consistent behavior across platforms
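The change described above amounts to passing an explicit encoding when the config file is opened. A minimal sketch, assuming a standalone helper (the name `read_pypirc` is hypothetical and is not twine's actual API):

```python
import configparser


def read_pypirc(path: str) -> configparser.RawConfigParser:
    """Parse a .pypirc-style file, always decoding it as UTF-8 (hypothetical helper)."""
    parser = configparser.RawConfigParser()
    # Without an explicit encoding, open() uses the locale's default encoding
    # (for example cp932 on Japanese Windows), which can reject UTF-8 files.
    with open(path, encoding="utf-8") as f:
        parser.read_file(f)
    return parser
```

With the encoding pinned, the same file parses identically regardless of the host's locale settings.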

Impact

  • No functional changes except avoiding UnicodeDecodeError
  • Safer and more predictable file reading

@sigmavirus24
Member

Can you provide an example of when this fails today and the complete traceback?

@harumaki4649
Contributor Author

Yes, this error occurs in Japanese Windows environments.
It's likely due to Japanese text in the .pypirc file.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Program Files\Python311\Scripts\twine.exe\__main__.py", line 7, in <module>
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\__main__.py", line 33, in main
    error = cli.dispatch(sys.argv[1:])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\cli.py", line 139, in dispatch
    return main(args.args)
           ^^^^^^^^^^^^^^^
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\commands\upload.py", line 250, in main
    upload_settings = settings.Settings.from_argparse(parsed_args)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\settings.py", line 288, in from_argparse
    return cls(**settings)
           ^^^^^^^^^^^^^^^
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\settings.py", line 116, in __init__
    self._handle_repository_options(
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\settings.py", line 304, in _handle_repository_options
    self.repository_config = utils.get_repository_from_config(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\utils.py", line 154, in get_repository_from_config
    config = get_config(config_file)[repository]
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\msi-z\AppData\Roaming\Python\Python311\site-packages\twine\utils.py", line 66, in get_config
    parser.read_file(f)
  File "C:\Program Files\Python311\Lib\configparser.py", line 734, in read_file
    self._read(f, source)
  File "C:\Program Files\Python311\Lib\configparser.py", line 1037, in _read
    for lineno, line in enumerate(fp, start=1):
UnicodeDecodeError: 'cp932' codec can't decode byte 0x88 in position 418: illegal multibyte sequence

@harumaki4649
Contributor Author

I made a minimal change based on this error:
I explicitly specified the encoding.
I believe UTF-8 is the best choice in most cases.

@harumaki4649
Contributor Author

To elaborate:

This error happens in Windows Japanese environments where the default encoding is cp932.
By explicitly specifying UTF-8, we avoid platform-dependent behavior and ensure consistent handling of .pypirc.
UTF-8 has become the de facto standard for configuration files, so I believe this change is safe and beneficial.

Member

@woodruffw woodruffw left a comment

Thanks @harumaki4649! It's a bummer that this kind of system codec stuff still causes issues on Windows hosts, but I see no problem with explicitly requiring UTF-8 in the config here based on your explanation.

@sigmavirus24
Member

I've only been hesitant because I expect there may be someone, somewhere, using UTF-16/UTF-32 (rare as that is) for whom this would break.

@woodruffw
Member

I've only been hesitant because I expect there may be someone, somewhere, using UTF-16/UTF-32 (rare as that is) for whom this would break.

Ah yeah, that's a good point. Maybe what we could do here is attempt the read twice: once with the native codepage, and then again with UTF-8 if the first attempt fails?

(Or do the read once, as bytes, and attempt decoding from bytes twice.)
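The bytes-based variant suggested here can be sketched as follows. It assumes the "native codepage" is what `locale.getpreferredencoding(False)` reports; the helper names and the `native` parameter are hypothetical and exist only to make the sketch testable.

```python
import configparser
import locale
from typing import Optional


def decode_config(raw: bytes, native: Optional[str] = None) -> str:
    """Decode config bytes with the native codepage first, then UTF-8 (hypothetical)."""
    if native is None:
        # Roughly the encoding open() would have used on this host.
        native = locale.getpreferredencoding(False)
    try:
        return raw.decode(native)
    except UnicodeDecodeError:
        # Second attempt on the same bytes; the file itself is read only once.
        return raw.decode("utf-8")


def parse_config_bytes(
    raw: bytes, native: Optional[str] = None
) -> configparser.RawConfigParser:
    parser = configparser.RawConfigParser()
    parser.read_string(decode_config(raw, native))
    return parser
```

One weakness of this approach: a permissive native codec such as latin-1 never raises, so UTF-8 input would silently decode to mojibake instead of triggering the UTF-8 retry.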

@sigmavirus24
Member

Yeah, that just doesn't feel great in either case.

@harumaki4649
Contributor Author

@woodruffw Should we change it to use UTF-8 as a fallback when errors occur?

@sigmavirus24
Member

@harumaki4649 Yes. I'd catch a specific exception and then retry with UTF-8.
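A minimal sketch of "catch a specific exception, then retry with UTF-8". It assumes `UnicodeDecodeError` is the exception to catch (it is the one in the traceback above) and adds a hypothetical `encoding` parameter so the first attempt can be forced to cp932 in tests; it is not necessarily how the merged twine code is structured.

```python
import configparser
from typing import Optional


def parse_config(
    path: str, encoding: Optional[str] = None
) -> configparser.RawConfigParser:
    """Parse `path` with the default encoding, retrying as UTF-8 (hypothetical)."""
    parser = configparser.RawConfigParser()
    try:
        # encoding=None keeps open()'s usual locale-dependent default.
        with open(path, encoding=encoding) as f:
            parser.read_file(f)
    except UnicodeDecodeError:
        # Discard anything parsed before the error and start over as UTF-8.
        parser = configparser.RawConfigParser()
        with open(path, encoding="utf-8") as f:
            parser.read_file(f)
    return parser
```

Re-opening the file keeps the happy path identical to the existing behavior; the second read only happens when the first decode fails.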

@harumaki4649
Contributor Author

harumaki4649 commented Oct 2, 2025

I’ve applied the improvements based on your review comments.
Could you please take another look and let me know if further adjustments are needed?

@harumaki4649 harumaki4649 requested a review from woodruffw October 3, 2025 10:37
Refactor configuration file parsing to use helper functions for better error handling and readability.
Refactor error handling for configuration file parsing.
@harumaki4649
Contributor Author

Based on the additional feedback, I have made further revisions.
Please review it once more.

Member

@sigmavirus24 sigmavirus24 left a comment

Now all we could use are some tests to trip this.

@sigmavirus24
Member

@harumaki4649 really all you need to do is provide an ini file that has a character outside of cp932 and in pytest use https://docs.python.org/3/library/locale.html to try to override the default behaviour in the test.
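One subtlety with such a fixture is that its UTF-8 bytes must actually be invalid cp932: many non-ASCII strings still decode, just as mojibake (the UTF-8 bytes of "é" map to two half-width katakana). The sketch below sidesteps the host locale by passing `encoding="cp932"` explicitly; the helper and test names are hypothetical, and `tmp_path` is pytest's built-in temporary-directory fixture.

```python
from pathlib import Path

# "あ" is E3 81 82 in UTF-8. cp932 reads E3 81 as one double-byte sequence,
# so 82 becomes a lead byte followed by "\n" (0x0A), which is not a valid
# trail byte, and decoding fails.
NON_CP932_INI = "[pypi]\nusername = あ\n"


def write_non_cp932_ini(directory: Path) -> Path:
    """Write a UTF-8 .pypirc that cp932 cannot decode (hypothetical helper)."""
    path = directory / ".pypirc"
    path.write_bytes(NON_CP932_INI.encode("utf-8"))
    return path


def test_fixture_trips_cp932(tmp_path: Path) -> None:
    path = write_non_cp932_ini(tmp_path)
    try:
        path.read_text(encoding="cp932")
    except UnicodeDecodeError:
        pass
    else:
        raise AssertionError("fixture unexpectedly decoded as cp932")
    # The same bytes are valid UTF-8.
    assert "あ" in path.read_text(encoding="utf-8")
```

Inside twine's own suite, `pytest.raises(UnicodeDecodeError)` would be the idiomatic spelling; plain try/except keeps the sketch free of a pytest import.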

@harumaki4649
Contributor Author

I ran the tests in my local environment. I'm not entirely sure whether these are the tests you expected, but would this be acceptable? This was my first time performing this type of task, so I apologize in advance if I made any mistakes. Is there anything else you would like me to check?

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

twine/utils.py:104

  • When the default config file doesn't exist (FileNotFoundError is caught and path == DEFAULT_CONFIG_FILE), the parser variable is never initialized, but the code continues to use it on lines 108-109 and 115-134. This will cause an UnboundLocalError: local variable 'parser' referenced before assignment.

The fix should initialize an empty parser when the default config file doesn't exist:

import configparser

try:
    parser = _parse_config(realpath)
except FileNotFoundError:
    # User probably set --config-file, but the file can't be read
    if path != DEFAULT_CONFIG_FILE:
        raise
    # The default config file is optional: fall back to an empty parser so
    # the code below can still use `parser`.
    parser = configparser.RawConfigParser()

Member

@sigmavirus24 sigmavirus24 left a comment

Looks good to me. I'd like @woodruffw to give it a second pass if he has time.

@woodruffw
Member

Thanks, I'll do one tomorrow!

Member

@woodruffw woodruffw left a comment

Looks good to me! Thanks for your hard work here @harumaki4649.

@woodruffw
Member

@harumaki4649 Mind fixing those last two CI failures? Looks like a small formatting tweak + a typecheck nit.

@harumaki4649
Contributor Author

@harumaki4649 Mind fixing those last two CI failures? Looks like a small formatting tweak + a typecheck nit

Thanks for the approval! I'll take care of those CI fixes when I have a moment.

@harumaki4649
Contributor Author

@woodruffw I'm sure this will pass the tests now.
Please confirm.

@sigmavirus24
Member

lint: commands[2]> flake8 twine/ tests/
tests/test_parse_config_encoding.py:9:1: D205 1 blank line required between summary line and description
tests/test_parse_config_encoding.py:9:1: D401 First line should be in imperative mood; try rephrasing
tests/test_parse_config_encoding.py:17:89: E501 line too long (103 > 88 characters)
tests/test_parse_config_encoding.py:22:1: D205 1 blank line required between summary line and description
tests/test_parse_config_encoding.py:22:1: D400 First line should end with a period
tests/test_parse_config_encoding.py:70:1: D205 1 blank line required between summary line and description
tests/test_parse_config_encoding.py:70:1: D400 First line should end with a period
lint: exit 1 (0.63 seconds) /home/runner/work/twine/twine> flake8 twine/ tests/ pid=2397
  lint: FAIL code 1 (5.08=setup[2.61]+cmd[0.23,1.61,0.63] seconds)
 types: commands[1]> mypy --html-report mypy --txt-report mypy twine
twine/utils.py:86: error: Unexpected keyword argument "path" for "UnableToReadConfigurationFile"  [call-arg]
.tox/types/lib/python3.12/site-packages/mypy/typeshed/stdlib/builtins.pyi:1960: note: "UnableToReadConfigurationFile" defined here
Generated HTML report (via XSLT): /Users/runner/work/twine/twine/mypy/index.html
Generated TXT report (via XSLT): /Users/runner/work/twine/twine/mypy/index.txt

Still failing

@harumaki4649
Contributor Author

I apologize for the oversight. I will attempt to make the necessary corrections.

Reformat comment for clarity on encoding.
Refactor test functions to improve clarity and maintainability. Update comments for better understanding of the code flow.
@harumaki4649
Contributor Author

The tests passed without issues in GitHub codespaces. This time, the tests should pass for sure.

@harumaki4649
Contributor Author

So many problems came up...
I forgot about pytest, I'll fix it.

@harumaki4649
Contributor Author

I believe I've fixed the issues! 🎉

All tests passing (231/231) ✅
Code coverage: 97% ✅
All lint checks passing (isort, black, flake8) ✅

@jaraco
Member

jaraco commented Dec 24, 2025

I know I'm late to this conversation, and I haven't fully ingested it, but in my humble opinion, locking in the legacy behavior of preferring the system default encoding is the wrong choice here. I'd have preferred for twine to simply prefer UTF-8, with the technically backward-incompatible implications that has. Python's historically platform- and environment-specific default encoding is deprecated and slated to be replaced by the platform-independent behavior of preferring UTF-8. By implementing the "fallback to preferred behavior", this change locks in the deprecated behavior.

That said, I'm grateful to you for providing this implementation. Perhaps we could consider a future, explicitly backward-incompatible release that removes this complexity and simply requires UTF-8?

@woodruffw
Member

Yeah, I'd be fine (personally) with a backwards-incompatible change in the future. I think maybe it'd make sense to update the PyPUG docs first to assert that well-formed configs should be UTF-8, since right now users could understandably end up in this situation through no fault of their own.

@woodruffw woodruffw merged commit 985b962 into pypa:main Dec 24, 2025
24 checks passed
@woodruffw
Member

Thanks again @harumaki4649!

@woodruffw
Member

I opened pypa/packaging.python.org#1980 to propose that .pypirc change. Not sure who the best people are to ping on it/get feedback from, though...

4 participants