#70670 closed defect (invalid)
libiconv @1.17_0: iconv on macOS Ventura 13.6+ does not perform correct conversions
Reported by: | seamusdemora (Seamus) | Owned by: | ryandesign (Ryan Carsten Schmidt) |
---|---|---|---|
Priority: | Normal | Milestone: | |
Component: | ports | Version: | 2.10.1 |
Keywords: | Cc: | ||
Port: | libiconv |
Description (last modified by ryandesign (Ryan Carsten Schmidt))
I'm trying to do something that seems simple (it can be done simply on my Linux box):
I copied and pasted a line from a PDF file (a French programming guide) into Terminal.app:
print("Numéro de boucle", i)
I wanted to convert this line to ASCII before pasting it into my editor, so I used 'iconv' as shown below. In each case, I used 'file' to check the "from" encoding:
% echo 'print("Numéro de boucle", i)' | file -
/dev/stdin: Unicode text, UTF-8 text
% echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit
print("Num'ero de boucle", i)
?!?!?!? I tried another example:
% echo "print("Protégé. Señorita. Coup de grâce", i)" | file -
/dev/stdin: Unicode text, UTF-8 text
% echo 'Protégé Señorita Coup de grâce' | iconv -f UTF-8 -t ASCII//TRANSLIT
Prot'eg'e Se~norita Coup de gr^ace
PLEASE NOTE: I have also tried using 'utf-8-mac' and 'utf8-mac' for the "from" encoding; this had no effect on the results - they were identical in all cases.
As you can see, this is not correct: a single quote has been added. I'm not a frequent user of 'iconv', so I checked this on my Debian 'bookworm' Linux box:
$ echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit
print("Numero de boucle", i)
I've checked to confirm that the version of 'iconv' on my macOS Ventura 13.6+ is one from MacPorts. I believe that it is:
% whereis iconv
iconv: /usr/bin/iconv /opt/local/share/man/man1/iconv.1.gz
% port installed requested
The following ports are currently installed:
...
libiconv @1.17_0 (active)
...
%
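A note on the check above: 'whereis' does not necessarily report the binary the shell would actually run, because it is not a plain PATH lookup. As a minimal sketch (the helper name find_on_path is made up for illustration), the following Python lists every executable named iconv on PATH in lookup order; barring shell aliases or functions, the first entry is normally the one that runs.

```python
import os


def find_on_path(name="iconv"):
    """Return every executable called `name` found on PATH, in lookup order."""
    hits = []
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Print one path per line; the first line is the one PATH lookup picks.
    for path in find_on_path():
        print(path)
```

With MacPorts in its default prefix, one would expect /opt/local/bin/iconv to appear in this list if the libiconv port's binary is reachable, alongside /usr/bin/iconv.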
And confirmation of my macports version:
% port -v
MacPorts 2.10.1
I can accept that it's broken, and I can accept that it can't be fixed (if that turns out to be the case). But I surely would appreciate an explanation of what has gone wrong - especially if it's something that I am doing incorrectly!
Rgds, ~S
Change History (4)
comment:1 Changed 3 months ago by seamusdemora (Seamus)
Description: | modified (diff) |
---|
comment:2 Changed 3 months ago by jmroot (Joshua Root)
Keywords: | iconv libiconv removed |
---|---|
Owner: | set to ryandesign |
Port: | @1.17_0 (active) removed |
Status: | new → assigned |
Summary: | iconv on macOS Ventura 13.6+ does not perform correct conversions → libiconv @1.17_0: iconv on macOS Ventura 13.6+ does not perform correct conversions |
comment:3 Changed 3 months ago by ryandesign (Ryan Carsten Schmidt)
Description: | modified (diff) |
---|---|
Resolution: | → invalid |
Status: | assigned → closed |
comment:4 Changed 3 months ago by jmroot (Joshua Root)
Linux systems also tend to use the iconv from glibc, rather than the standalone libiconv package (which according to upstream is less efficient and only really intended to be used on systems that don't have an iconv that supports Unicode). Maybe there are some code differences there.
The standards seem to be silent on what should happen when a character can't be represented in the target character set. If you want fine control over how transliteration is done, you might be better off using something like recode?
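If the goal is simply the accent-free output seen on the Linux box, one way to get it without depending on any particular iconv is to do the conversion explicitly. This is a minimal sketch in Python, not what iconv or recode do internally: it decomposes each character (Unicode NFD) and drops the combining marks. The function name strip_accents is made up for illustration, and characters with no ASCII base letter are replaced by '?', which is an arbitrary choice.

```python
import unicodedata


def strip_accents(text):
    """Drop combining accents so that e.g. 'é' becomes plain 'e'."""
    # NFD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Anything still outside ASCII becomes '?' (an assumption of this sketch).
    return without_marks.encode("ascii", "replace").decode("ascii")


if __name__ == "__main__":
    print(strip_accents('print("Numéro de boucle", i)'))
    print(strip_accents("Protégé Señorita Coup de grâce"))
```

Because the result does not depend on which iconv implementation or locale data is installed, it should behave the same on macOS and Linux.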
I get the same conversions as you (insertion of ' and ^ in place of the accents, just before the affected letters, in an attempt to mimic in ASCII what those accents look like) regardless of whether I use /usr/bin/iconv on macOS 12 (Apple's GNU libiconv 1.11) or /opt/local/bin/iconv (MacPorts GNU libiconv 1.17); therefore it is not a MacPorts bug.
I believe iconv uses locale information provided by the operating system to guide its conversions. Therefore your bug, I suppose, is with macOS, although I assume the result we observe is intentional and not considered a bug. In particular, what we're observing is called transliteration:
https://www.gnu.org/software/libiconv/
You have specifically requested that transliteration be enabled.
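To make the observed behaviour concrete, here is a rough Python imitation of it. This is only a sketch inferred from the output in this ticket, not libiconv's actual transliteration table: the ASCII_MARKS mapping and the function name accent_style_translit are made up for illustration, and only the three accents that appear in the ticket are covered.

```python
import unicodedata

# Hypothetical mapping, inferred from the output shown in this ticket:
# combining acute -> ', combining circumflex -> ^, combining tilde -> ~
ASCII_MARKS = {
    "\u0301": "'",
    "\u0302": "^",
    "\u0303": "~",
}


def accent_style_translit(text):
    """Write each accent as an ASCII look-alike placed before its base letter."""
    out = []
    # Compose first so every accented letter is a single character.
    for ch in unicodedata.normalize("NFC", text):
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        # Marks missing from the table are silently dropped in this sketch.
        out.append("".join(ASCII_MARKS.get(m, "") for m in marks) + base)
    return "".join(out)


if __name__ == "__main__":
    print(accent_style_translit("Protégé Señorita Coup de grâce"))
```

Run on the second example from the description, this reproduces the string reported there, which is consistent with the output being a deliberate transliteration style rather than data corruption.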
I don't know why you get different results on Linux. That is, it is presumably because the locale information provided by Linux differs from that provided by macOS, but I don't know why these two OS vendors have decided to do that. Possibly, the locale information on your Linux does not support transliteration, so your request to enable transliteration is being ignored on Linux.