Opened 2 months ago

Closed 2 months ago

Last modified 2 months ago

#70670 closed defect (invalid)

libiconv @1.17_0: iconv on macOS Ventura 13.6+ does not perform correct conversions

Reported by: seamusdemora (Seamus) Owned by: ryandesign (Ryan Carsten Schmidt)
Priority: Normal Milestone:
Component: ports Version: 2.10.1
Keywords: Cc:
Port: libiconv

Description (last modified by ryandesign (Ryan Carsten Schmidt))

I'm trying to do something that seems simple (it can be done simply on my Linux box):

I copied and pasted a line from a PDF file (a French programming guide) into Terminal.app:

print("Numéro de boucle", i)

I wanted to convert this line to ASCII before pasting it into my editor, so I used 'iconv' as shown below. In each case, I used 'file' to check the "from" encoding:

% echo 'print("Numéro de boucle", i)' | file -
/dev/stdin: Unicode text, UTF-8 text

% echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit
print("Num'ero de boucle", i) 

?!?!?!? I tried another example:

% echo "print("Protégé. Señorita. Coup de grâce", i)" | file -
/dev/stdin: Unicode text, UTF-8 text

% echo 'Protégé Señorita Coup de grâce' | iconv -f UTF-8 -t ASCII//TRANSLIT
Prot'eg'e Se~norita Coup de gr^ace

PLEASE NOTE: I have also tried using 'utf-8-mac' and 'utf8-mac' for the "from" encoding; this had no effect on the results, which were identical in all cases.

As you can see, this is not correct: a single quote has been added. I'm not a frequent user of 'iconv', so I checked this on my Debian 'bookworm' Linux box:

$ echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit
print("Numero de boucle", i)

I've checked to confirm that the version of 'iconv' on my macOS Ventura 13.6+ is one from MacPorts. I believe that it is:

% whereis iconv
iconv: /usr/bin/iconv /opt/local/share/man/man1/iconv.1.gz 

% port installed requested
The following ports are currently installed:
...
libiconv @1.17_0 (active)
...
%

And confirmation of my macports version:

% port -v
MacPorts 2.10.1

I can accept that it's broken, and I can accept that it can't be fixed (if that turns out to be the case). But I surely would appreciate an explanation of what has gone wrong - especially if it's something that I am doing incorrectly!

Rgds, ~S

Change History (4)

comment:1 Changed 2 months ago by seamusdemora (Seamus)

Description: modified (diff)

comment:2 Changed 2 months ago by jmroot (Joshua Root)

Keywords: iconv libiconv removed
Owner: set to ryandesign
Port: @1.17_0 (active) removed
Status: new → assigned
Summary: iconv on macOS Ventura 13.6+ does not perform correct conversions → libiconv @1.17_0: iconv on macOS Ventura 13.6+ does not perform correct conversions

comment:3 Changed 2 months ago by ryandesign (Ryan Carsten Schmidt)

Description: modified (diff)
Resolution: invalid
Status: assigned → closed

I get the same conversions as you (insertion of characters such as ' and ^ next to the formerly accented letters, in an attempt to mimic in ASCII what those accents look like) regardless of whether I use /usr/bin/iconv on macOS 12 (Apple's GNU libiconv 1.11) or /opt/local/bin/iconv (MacPorts GNU libiconv 1.17). Therefore, it is not a MacPorts bug.

I believe iconv uses locale information provided by the operating system to guide its conversions. Therefore your bug, I suppose, is with macOS, although I assume the result we observe is intentional and not considered a bug. In particular, what we're observing is called transliteration:

https://www.gnu.org/software/libiconv/

It has also some limited support for transliteration, i.e. when a character cannot be represented in the target character set, it can be approximated through one or several similarly looking characters. Transliteration is activated when //TRANSLIT is appended to the target encoding name.

You have specifically requested that transliteration be enabled.

I don't know why you get different results on Linux. Presumably the locale information provided by Linux differs from that provided by macOS, but I don't know why these two OS vendors have decided to do that. Possibly the locale information on your Linux system does not support transliteration, so your request to enable transliteration is being ignored there.
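
To make the observed behaviour concrete, here is a minimal Python sketch that reproduces the output shown in the description: each accent is replaced by an ASCII look-alike placed in front of the unaccented base letter. The function name `translit_like_libiconv` and the `ASCII_MARKS` table are hypothetical and were chosen only to match the output in this ticket; they are not taken from libiconv's source, and the real transliteration tables are certainly larger.

```python
import unicodedata

# Hypothetical table of combining marks and ASCII look-alikes.
# Chosen to match the output in this ticket, not copied from libiconv.
ASCII_MARKS = {
    "\u0300": "`",   # combining grave accent
    "\u0301": "'",   # combining acute accent
    "\u0302": "^",   # combining circumflex accent
    "\u0303": "~",   # combining tilde
    "\u0308": '"',   # combining diaeresis
}

def translit_like_libiconv(text: str) -> str:
    """Approximate the accent handling observed on macOS in this ticket."""
    out = []
    # Work on precomposed characters so each letter carries its own marks.
    for ch in unicodedata.normalize("NFC", text):
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        if not marks:
            # Plain character: keep ASCII, replace anything else with "?".
            out.append(ch if ch.isascii() else "?")
            continue
        # Emit the ASCII look-alike(s) first, then the base letter.
        out.extend(ASCII_MARKS.get(mark, "?") for mark in marks)
        out.append(base if base.isascii() else "?")
    return "".join(out)

print(translit_like_libiconv("Protégé Señorita Coup de grâce"))
# prints: Prot'eg'e Se~norita Coup de gr^ace
```

Seen this way, the extra single quote is the transliteration of the acute accent rather than a corrupted character.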

comment:4 Changed 2 months ago by jmroot (Joshua Root)

Linux systems also tend to use the iconv from glibc, rather than the standalone libiconv package (which according to upstream is less efficient and only really intended to be used on systems that don't have an iconv that supports Unicode.) Maybe there are some code differences there.

The standards seem to be silent on what should happen when a character can't be represented in the target character set. If you want fine control over how transliteration is done, you might be better off using something like recode?
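
If the goal is simply the Linux-style result (accents dropped, base letters kept) on any platform, one option that does not depend on which iconv is installed is to do the conversion in a few lines of Python. This is only a sketch under stated assumptions: the function name `strip_accents_to_ascii` and its `placeholder` parameter are hypothetical, and the approach (Unicode decomposition followed by dropping combining marks) is a substitute technique, not what glibc or libiconv do internally.

```python
import unicodedata

def strip_accents_to_ascii(text: str, placeholder: str = "?") -> str:
    """Drop accents and return ASCII-only text.

    Characters with no ASCII base letter become `placeholder`.
    """
    # NFKD splits accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # skip the accent itself
        out.append(ch if ch.isascii() else placeholder)
    return "".join(out)

print(strip_accents_to_ascii('print("Numéro de boucle", i)'))
# prints: print("Numero de boucle", i)
```

Characters that have no decomposition to an ASCII letter (for example "ß" or "ø") come out as the placeholder, so a tool with explicit mapping tables is still the better choice when those matter.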
