Opened 19 years ago
Closed 10 years ago
#8625 closed defect (fixed)
BUG: recode from utf8 fails (sometimes silently, corrupting the file)
Reported by: | vincent-opdarw@… | Owned by: | macports-tickets@… |
---|---|---|---|
Priority: | Normal | Milestone: | |
Component: | ports | Version: | 1.0 |
Keywords: | Cc: | ||
Port: | recode |
Description (last modified by tobypeterson)
With LC_CTYPE="en_US.ISO8859-1", I get the following error. I don't know whether this is an upstream bug, but the same command does not fail under Linux (with the same locales):
$ recode utf8.. < /dev/null
recode: System detected problem in step `UTF-8..CHAR'
Attachments (1)
Change History (10)
comment:1 Changed 18 years ago by vincent-opdarw@…
severity: | normal → critical |
---|---|
Summary: | BUG: recode utf8.. fails → BUG: recode from utf8 fails (sometimes silently, corrupting the file) |
Changed 18 years ago by vincent-opdarw@…
Attachment: | file.latin1.orig added |
---|
Test case showing a silent file corruption
comment:2 Changed 18 years ago by vincent-opdarw@…
(In reply to comment #1)
After some tests, the problem seems to occur near position 4096. Incorrect buffering?
In fact, it also occurs near position 2048, as shown in the test case.
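The buffering guess above can be illustrated with a minimal Python sketch. It does not use recode's code. It assumes, purely as a hypothesis, that the corruption comes from a multi-byte UTF-8 sequence being split across a fixed-size read buffer (4096 bytes here, matching the reported position). The helpers `split_fixed`, `decode_chunks_naive`, and `decode_chunks_incremental` are hypothetical names for illustration. Decoding each chunk on its own silently produces wrong output at the boundary, while an incremental decoder from Python's `codecs` module carries the incomplete sequence over to the next chunk.

```python
import codecs


def split_fixed(data, size):
    """Split bytes into fixed-size chunks, like successive reads into a fixed buffer."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def decode_chunks_naive(chunks):
    """Decode each chunk independently, replacing undecodable bytes.

    A multi-byte UTF-8 sequence split across two chunks is silently
    turned into replacement characters instead of raising an error.
    """
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)


def decode_chunks_incremental(chunks):
    """Decode with an incremental decoder that keeps an incomplete
    trailing sequence and completes it with the next chunk."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    parts = [decoder.decode(chunk) for chunk in chunks]
    # Flush at end of input; raises UnicodeDecodeError if input ends mid-sequence.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


if __name__ == "__main__":
    BUF = 4096  # hypothetical buffer size, chosen to match the reported position
    text = "a" * (BUF - 1) + "é" + "b" * 10  # 'é' is 2 bytes in UTF-8, straddling the boundary
    chunks = split_fixed(text.encode("utf-8"), BUF)
    print(decode_chunks_naive(chunks) == text)        # False
    print(decode_chunks_incremental(chunks) == text)  # True
```

The same failure would appear at every multiple of the buffer size, so errors near both 2048 and 4096 would be consistent with a 2048-byte buffer; that remains a guess, not a diagnosis.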
comment:3 Changed 18 years ago by pipping@…
Milestone: | → Available Ports |
---|
comment:4 Changed 18 years ago by pipping@…
Milestone: | Available Ports → Port Bugs |
---|
comment:6 Changed 15 years ago by tobypeterson
Description: | modified (diff) |
---|
Since this is a long-standing bug in a port with no upstream maintainer, it is perhaps a good candidate for the 'notes' feature in 1.8.
comment:7 Changed 15 years ago by vinc17@…
The bug mentioned in comment 1 (silent corruption) was fixed in r41031 (if I remove the patch, the same bug reappears). However the "System detected problem in step `UTF-8..CHAR'" error still occurs.
comment:8 Changed 15 years ago by jmroot (Joshua Root)
Port: | recode added |
---|
comment:9 Changed 10 years ago by jmroot (Joshua Root)
Resolution: | → fixed |
---|---|
Status: | new → closed |
Asking to convert from UTF-8 and supplying an input file that is not valid UTF-8 would be expected to cause an error, so AFAICT this is behaving correctly now.
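The reasoning above can be checked before running recode at all. Below is a minimal Python sketch that tests whether a byte string is strict UTF-8; `is_valid_utf8` is a hypothetical helper, not part of recode or MacPorts. Latin-1 text containing non-ASCII letters is generally not valid UTF-8, so a converter asked to read it as UTF-8 should report an error rather than produce output.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    latin1 = "café".encode("latin-1")  # b'caf\xe9'
    utf8 = "café".encode("utf-8")      # b'caf\xc3\xa9'
    print(is_valid_utf8(latin1))  # False
    print(is_valid_utf8(utf8))    # True
```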
Sometimes recode silently fails, producing incorrect data in the middle of a file. This is a serious bug, in particular for in-place recoding (i.e. when a file argument is given), because the file itself gets corrupted and data goes missing. After some tests, the problem seems to occur near position 4096. Incorrect buffering?
I'm going to attach a test case with such a silent file corruption.
Note: again, this bug doesn't occur under Linux (Debian).
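In-place recoding is what turns a conversion error into a corrupted file. A common mitigation, sketched here in Python under the assumption that the whole file fits in memory, is to convert into a temporary file in the same directory and rename it over the original only after the conversion has fully succeeded. `convert_file_in_place` is a hypothetical helper, not a recode or MacPorts feature, and it uses Python's own codecs rather than calling recode.

```python
import os
import shutil
import tempfile


def convert_file_in_place(path, src="utf-8", dst="latin-1"):
    """Convert path from encoding src to dst without risking the original.

    The whole file is decoded and re-encoded strictly first, so invalid
    input raises instead of producing partial output. Only then is the
    result written to a temporary file in the same directory and moved
    over the original.
    """
    with open(path, "rb") as f:
        data = f.read()
    # Raises UnicodeDecodeError / UnicodeEncodeError before anything is written.
    converted = data.decode(src, errors="strict").encode(dst, errors="strict")
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(converted)
        shutil.copymode(path, tmp_path)  # keep the original file's permissions
        os.replace(tmp_path, path)       # atomic rename on POSIX (same filesystem)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

If the input is not valid in the source encoding, the function raises and the original file is left byte-for-byte unchanged, which is the opposite of the silent corruption described above.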