UnicodeDecodeError when running cz bump #1110

Closed
@jg40305

Description

Hi, I'm attempting to use commitizen in my project. When I ran the cz bump command, I encountered a UnicodeDecodeError and received the following error message:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\kzsu\workspace\ssrreer\venv\Scripts\cz.exe\__main__.py", line 7, in <module>
  File "C:\Users\kzsu\workspace\ssrreer\venv\Lib\site-packages\commitizen\cli.py", line 607, in main
    args.func(conf, arguments)()
  File "C:\Users\kzsu\workspace\ssrreer\venv\Lib\site-packages\commitizen\commands\bump.py", line 306, in __call__
    changelog_cmd()
  File "C:\Users\kzsu\workspace\ssrreer\venv\Lib\site-packages\commitizen\commands\changelog.py", line 176, in __call__
    changelog_meta = self.changelog_format.get_metadata(self.file_name)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\kzsu\workspace\ssrreer\venv\Lib\site-packages\commitizen\changelog_formats\base.py", line 37, in get_metadata
    return self.get_metadata_from_file(changelog_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\kzsu\workspace\ssrreer\venv\Lib\site-packages\commitizen\changelog_formats\base.py", line 43, in get_metadata_from_file
    for index, line in enumerate(file):
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe7 in position 50: illegal multibyte sequence

Upon investigating the error, I found that CHANGELOG.md is opened with the system code page (cp950 in my case) instead of the encoding set in .cz.toml.

commitizen\changelog_formats\base.py#L36

I suspect the error is caused by the non-ASCII (Chinese) text in CHANGELOG.md:

## 1.0.0 (2024-05-15)

### BREAKING CHANGE

- 當作第一版(MAJOR)

### Feat

- test commitizen
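
The byte offset in the error message is consistent with a UTF-8 file being decoded as cp950. The following is a minimal sketch, not commitizen code. It assumes that CHANGELOG.md was written as UTF-8 with CRLF line endings (the git warnings in the reproduction steps suggest CRLF). The helper name try_decode is made up for illustration.

```python
# Rebuild the start of CHANGELOG.md as bytes.
# Assumption: the file is UTF-8 with CRLF line endings.
changelog_text = (
    "## 1.0.0 (2024-05-15)\r\n\r\n### BREAKING CHANGE\r\n\r\n- 當作第一版(MAJOR)\r\n"
)
raw = changelog_text.encode("utf-8")


def try_decode(data: bytes, encoding: str):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as err:
        return None, err


# Decoding with the Windows code page fails on the first Chinese character.
_, cp950_error = try_decode(raw, "cp950")
# Decoding with the encoding the file was written in succeeds.
utf8_text, utf8_error = try_decode(raw, "utf-8")

print(cp950_error)
print(hex(raw[cp950_error.start]), cp950_error.start)
print(utf8_error is None)
```

Under these assumptions the first byte of 當 (0xe7 in UTF-8) sits at offset 50, which matches the position reported in the traceback.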

Steps to reproduce

  1. Prepare __version__.py, .cz.toml:
  • __version__.py
__version__ = "0.0.1"
  • .cz.toml
[tool.commitizen]
name = "cz_conventional_commits"
tag_format = "$version"
version_scheme = "semver2"
version = "0.0.1"
encoding = "utf-8"
update_changelog_on_bump = true
version_files = [
  ".cz.toml:version",
  "__version__.py:__version__",
]
  2. git add * and cz commit:
? Select the type of change you are committing feat: A new feature. Correlates with MINOR in SemVer
? What is the scope of this change? (class or file name): (press [enter] to skip)
 
? Write a short and imperative summary of the code changes: (lower case and no period)
 test commitizen
? Provide additional contextual information about the code changes: (press [enter] to skip)
 testing
? Is this a BREAKING CHANGE? Correlates with MAJOR in SemVer Yes
? Footer. Information about Breaking Changes and reference issues that this commit closes: (press [enter] to skip)
 當作第一版(MAJOR)

feat: test commitizen

testing

BREAKING CHANGE: 當作第一版(MAJOR)


[master (root-commit) bcd9958] feat: test commitizen
 2 files changed, 12 insertions(+)
 create mode 100644 .cz.toml
 create mode 100644 __version__.py

Commit successful!
  3. cz bump
Tag 0.0.1 could not be found.
Possible causes:
- version in configuration is not the current version
- tag_format is missing, check them using 'git tag --list'

? Is this the first tag created? Yes
bump: version 0.0.1 → 1.0.0
tag to create: 1.0.0
increment detected: MAJOR

[master 61ba03b] bump: version 0.0.1 → 1.0.0
 3 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 CHANGELOG.md

warning: CRLF will be replaced by LF in .cz.toml.
The file will have its original line endings in your working directory
warning: CRLF will be replaced by LF in CHANGELOG.md.
The file will have its original line endings in your working directory

Done!
  4. git add .gitignore and cz commit
  • .gitignore
venv/
  • cz commit
? Select the type of change you are committing feat: A new feature. Correlates with MINOR in SemVer
? What is the scope of this change? (class or file name): (press [enter] to skip)
 
? Write a short and imperative summary of the code changes: (lower case and no period)
 add .gitignore file
? Provide additional contextual information about the code changes: (press [enter] to skip)
 
? Is this a BREAKING CHANGE? Correlates with MAJOR in SemVer No
? Footer. Information about Breaking Changes and reference issues that this commit closes: (press [enter] to skip)
 

feat: add .gitignore file


[master aa4b9c5] feat: add .gitignore file
 1 file changed, 1 insertion(+)
 create mode 100644 .gitignore

Commit successful!
  5. cz bump
bump: version 1.0.0 → 1.1.0
tag to create: 1.1.0
increment detected: MINOR

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\kzsu\workspace\ssrreer\venv\Scripts\cz.exe\__main__.py", line 7, in <module>
  File "C:\Users\kzsu\workspace\ssrreer\venv\Lib\site-packages\commitizen\cli.py", line 607, in main
    args.func(conf, arguments)()
  File "C:\Users\kzsu\workspace\ssrreer\venv\Lib\site-packages\commitizen\commands\bump.py", line 306, in __call__
    changelog_cmd()
  File "C:\Users\kzsu\workspace\ssrreer\venv\Lib\site-packages\commitizen\commands\changelog.py", line 176, in __call__
    changelog_meta = self.changelog_format.get_metadata(self.file_name)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\kzsu\workspace\ssrreer\venv\Lib\site-packages\commitizen\changelog_formats\base.py", line 37, in get_metadata
    return self.get_metadata_from_file(changelog_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\kzsu\workspace\ssrreer\venv\Lib\site-packages\commitizen\changelog_formats\base.py", line 42, in get_metadata_from_file
    for index, line in enumerate(file):
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe7 in position 50: illegal multibyte sequence
  6. Current CHANGELOG.md
## 1.0.0 (2024-05-15)

### BREAKING CHANGE

- 當作第一版(MAJOR)

### Feat

- test commitizen

Current behavior

I adjusted the code locally at commitizen\changelog_formats\base.py#L36, changing

with open(filepath) as changelog_file:

to

with open(filepath, encoding=self.config.settings["encoding"]) as changelog_file:

and then reran cz bump:

bump: version 1.0.0 → 1.1.0
tag to create: 1.1.0
increment detected: MINOR

[master 3652d2d] bump: version 1.0.0 → 1.1.0
 3 files changed, 8 insertions(+), 2 deletions(-)

Done!

This resolved the error.

  • CHANGELOG.md
## 1.1.0 (2024-05-15)

### Feat

- add .gitignore file

## 1.0.0 (2024-05-15)

### BREAKING CHANGE

- 當作第一版(MAJOR)

### Feat

- test commitizen

  • git log
commit 3652d2da3b1a52a0d3fc4fa276c1282a573b608f (HEAD -> master, tag: 1.1.0)
Author: John Su <jg40305@gmail.com>
Date:   Wed May 15 19:34:18 2024 +0800

    bump: version 1.0.0 → 1.1.0

commit aa4b9c58202275ceb8feadfa3dbadadc7568fda8
Author: John Su <jg40305@gmail.com>
Date:   Wed May 15 19:30:02 2024 +0800

    feat: add .gitignore file

commit 61ba03b180664cbb912b1d9628221c2f028854bd (tag: 1.0.0)
Author: John Su <jg40305@gmail.com>
Date:   Wed May 15 19:27:31 2024 +0800

    bump: version 0.0.1 → 1.0.0

commit bcd9958778d8fbf742e48d722e51eeee0ca44ea8
Author: John Su <jg40305@gmail.com>
Date:   Wed May 15 19:26:46 2024 +0800

    feat: test commitizen

    testing

    BREAKING CHANGE: 當作第一版(MAJOR)
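
The local change described in this section amounts to passing an explicit encoding to open() instead of relying on the locale default. Here is a minimal, self-contained sketch of that pattern. The function find_first_heading is a hypothetical stand-in for the changelog scan and is not commitizen's actual implementation.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_first_heading(filepath: str, encoding: str) -> Optional[int]:
    """Return the index of the first line starting with '## ', or None.

    Hypothetical helper for illustration only.
    """
    # An explicit encoding keeps the read independent of the system code page.
    with open(filepath, encoding=encoding) as changelog_file:
        for index, line in enumerate(changelog_file):
            if line.startswith("## "):
                return index
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "CHANGELOG.md"
    # Write a small UTF-8 changelog containing non-ASCII text.
    path.write_text(
        "intro\n\n## 1.0.0 (2024-05-15)\n\n- 當作第一版(MAJOR)\n",
        encoding="utf-8",
    )
    first_heading = find_first_heading(str(path), encoding="utf-8")

print(first_heading)
```

With open(filepath) alone, the same loop would decode with the locale's preferred encoding, which is cp950 on the machine in this report.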

Desired behavior

I think the most straightforward fix is to pass the configured encoding to open() at commitizen\changelog_formats\base.py#L36, so that the encoding setting in [tool.commitizen] is respected.
However, since BaseFormat appears to be a core class, there may be scenarios I have not foreseen that rule out this approach.

Screenshots

No response

Environment

  • commitizen version: 3.25.0
  • python version: Python 3.12.0
  • operating system: Windows
cz version --report
Commitizen Version: 3.25.0
Python Version: 3.12.0 (tags/v3.12.0:0fb18b0, Oct  2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)]
Operating System: Windows
