Opened 16 years ago
Closed 15 years ago
#18736 closed enhancement (fixed)
distname is not percent-encoded before gluing it into the URL
Reported by: | ryandesign (Ryan Carsten Schmidt) | Owned by: | macports-tickets@… |
---|---|---|---|
Priority: | Normal | Milestone: | MacPorts 1.9.0 |
Component: | base | Version: | 1.7.0 |
Keywords: | Cc: | nerdling (Jeremy Lavergne), blb@…, raimue (Rainer Müller) | |
Port: |
Description
MacPorts base does not percent-encode the distname before putting it in the final URL. This causes distnames containing spaces to be completely unavailable (causes a Tcl error), causes distnames containing a percent character to probably not download at all (probably not a common need), and causes distnames containing a plus character to work sometimes but not others, at the whim of the web server (since a plus should really be encoded as %2b the web server is completely within its right to not know what we're asking for).
Attachments (2)
Change History (18)
comment:1 Changed 16 years ago by ryandesign (Ryan Carsten Schmidt)
Changed 16 years ago by ryandesign (Ryan Carsten Schmidt)
Attachment: | urlencode.diff added |
---|
comment:2 follow-up: 3 Changed 16 years ago by raimue (Rainer Müller)
Cc: | raimue@… added |
---|
I also can't find mapReply in the Tcl docs. So this seems to have been an internal proc.
Now you already made the effort of writing this in Tcl, but as we are already using and linking with libcurl, we could use curl_easy_escape() for this purpose.
comment:3 follow-up: 11 Changed 16 years ago by ryandesign (Ryan Carsten Schmidt)
Replying to raimue@…:
Now you already made the effort of writing this in Tcl, but as we are already using and linking with libcurl, we could use curl_easy_escape() for this purpose.
That would be fine with me, except if it does what the docs say: "All input characters that are not a-z, A-Z or 0-9 are converted to their 'URL escaped' version" -- if that's what it really does then that's broken just like tcl < 8.4.12 was broken: it needs to convert all characters that are not a-z, A-Z, 0-9, -, ., _ or ~.
comment:4 follow-up: 5 Changed 16 years ago by raimue (Rainer Müller)
According to the man page, libcurl implements RFC2396, Tcl goes with RFC3986. It is not really broken, just outdated. Which is not a serious issue, as decoders are advised to decode these characters normally.
There is also a simple approach using the public http API, which is so simple that we could drop the wrapper as well. Although calling ::http::formatQuery with an odd number of arguments seems not to be documented (could also call with an additional empty string and strip the last char).
package require http proc urlencode {str} { return [::http::formatQuery $str] }
But if we are going to duplicate this, we should probably copy from the latest version which is not using regsub for performance reasons. It is not implemented in C as stated on the Tclers wiki.
comment:5 follow-ups: 6 10 Changed 16 years ago by ryandesign (Ryan Carsten Schmidt)
Replying to raimue@…:
According to the man page, libcurl implements RFC2396, Tcl goes with RFC3986. It is not really broken, just outdated. Which is not a serious issue, as decoders are advised to decode these characters normally.
It seems silly if you want to download foo-1.0.tar.gz to request foo%2d1%2e0%2etar%2egz but I agree it should still work.
There is also a simple approach using the public http API, which is so simple that we could drop the wrapper as well.
That would be fine if it works. I had not tried [::http::something] I had tried [http::something] which was how the example I saw used it. I'm not so familiar with tcl syntax when it comes to packages.
Although calling ::http::formatQuery with an odd number of arguments seems not to be documented (could also call with an additional empty string and strip the last char).
formatQuery is for formatting a query string (e.g. "?a=b&c=d"), but that's not what we're doing, so I don't feel we should be using that function. It seems mapReply is the correct function to use.
But if we are going to duplicate this, we should probably copy from the latest version which is not using regsub for performance reasons. It is not implemented in C as stated on the Tclers wiki.
Ok.
comment:6 follow-up: 7 Changed 16 years ago by raimue (Rainer Müller)
FYI ::http::* starts from the top level namespace, http::* looks for a http namespace in the current namespace first.
Replying to ryandesign@…:
formatQuery is for formatting a query string (e.g. "?a=b&c=d"), but that's not what we're doing, so I don't feel we should be using that function. It seems mapReply is the correct function to use.
http::mapReply is private API (at least in 8.4 and 8.5, seems to have changed for 8.6), we should not use it. But we cannot even use it before we call another http::* function, as it is not listed in pkgIndex.tcl. http::formatQuery is a wrapper around http::mapReply which is public. Actually we want to encode a string to be used as value in a query string, so it is not absolutely wrong.
comment:7 Changed 16 years ago by ryandesign (Ryan Carsten Schmidt)
Replying to raimue@…:
http::mapReply is private API (at least in 8.4 and 8.5, seems to have changed for 8.6), we should not use it. But we cannot even use it before we call another http::* function, as it is not listed in pkgIndex.tcl.
That's a shame.
http::formatQuery is a wrapper around http::mapReply which is public. Actually we want to encode a string to be used as value in a query string, so it is not absolutely wrong.
The query string is the part of a URL after the question mark. We are constructing a distfile URL which does not contain a question mark, meaning it does not have a query string component.
So what about using curl_easy_escape() as you suggested?
Changed 16 years ago by raimue (Rainer Müller)
Attachment: | patch-macports-curl-escape.diff added |
---|
comment:8 Changed 16 years ago by raimue (Rainer Müller)
patch-macports-curl-escape.diff
is a patch to add a curl escape
command, using curl_easy_escape() internally.
$ rlwrap /usr/bin/tclsh % source /Library/Tcl/macports1.0/macports_fastload.tcl 0 % package require Pextlib 1.0 % curl escape foo~bar+baz! foo%7Ebar%2Bbaz%21
comment:9 Changed 16 years ago by ryandesign (Ryan Carsten Schmidt)
Thanks.
When trying to compile MacPorts with this change, I got:
ld: Undefined symbols: _curl_easy_escape
I found that on this Intel Tiger Mac, I somehow had both library version 3 and 4 of curl, with all the symlinks pointing to version 3, and only version 4 has the curl_easy_escape function:
$ ls -l /usr/lib/libcurl* lrwxr-xr-x 1 root wheel 15 Sep 5 2008 /usr/lib/libcurl.2.dylib -> libcurl.3.dylib lrwxr-xr-x 1 root wheel 15 Sep 5 2008 /usr/lib/libcurl.3.0.0.dylib -> libcurl.3.dylib -rwxr-xr-x 1 root wheel 384692 Oct 24 02:02 /usr/lib/libcurl.3.dylib -rwxr-xr-x 1 root wheel 460136 Nov 6 15:53 /usr/lib/libcurl.4.dylib lrwxr-xr-x 1 root wheel 15 Sep 5 2008 /usr/lib/libcurl.dylib -> libcurl.3.dylib $ grep curl_easy_escape /usr/lib/libcurl* Binary file /usr/lib/libcurl.4.dylib matches $
I checked two other Tiger Macs (one Intel and one PowerPC) and they only have library version 4, and everything else is just symlinks to it:
$ ls -l libcurl* lrwxr-xr-x 1 root wheel 15 Mar 26 2008 libcurl.2.dylib -> libcurl.4.dylib lrwxr-xr-x 1 root wheel 15 Mar 16 2007 libcurl.3.0.0.dylib -> libcurl.3.dylib lrwxr-xr-x 1 root wheel 15 Mar 26 2008 libcurl.3.dylib -> libcurl.4.dylib lrwxr-xr-x 1 root wheel 15 Mar 26 2008 libcurl.4.0.0.dylib -> libcurl.4.dylib -rwxr-xr-x 1 root wheel 460136 Nov 6 15:53 libcurl.4.dylib lrwxr-xr-x 1 root wheel 15 Mar 26 2008 libcurl.dylib -> libcurl.4.dylib $
And I checked two Leopard Macs (one Intel, one PowerPC) which also look the same (except that the libcurl.3.0.0.dylib symlink is missing).
Once I fixed the symlinks on the first Mac I could build it.
comment:10 Changed 15 years ago by ryandesign (Ryan Carsten Schmidt)
Replying to ryandesign@…:
It seems silly if you want to download foo-1.0.tar.gz to request foo%2d1%2e0%2etar%2egz but I agree it should still work.
I've had this patch in place for awhile locally, and it seems to work with most servers, but not with downloads.sourceforge.net.
comment:11 Changed 15 years ago by nerdling (Jeremy Lavergne)
Cc: | snc@… added |
---|
Replying to ryandesign@…:
That would be fine with me, except if it does what the docs say: "All input characters that are not a-z, A-Z or 0-9 are converted to their 'URL escaped' version" -- if that's what it really does then that's broken just like tcl < 8.4.12 was broken: it needs to convert all characters that are not a-z, A-Z, 0-9, -, ., _ or ~.
We can probably summarize that as "Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits." (from PHP).
comment:12 Changed 15 years ago by jmroot (Joshua Root)
Milestone: | MacPorts 1.8.0 → MacPorts 1.8.1 |
---|
Pushing this back a little since it's not a regression and will apparently be nontrivial to fix.
comment:13 Changed 15 years ago by wsiegrist@…
Milestone: | MacPorts 1.8.1 → MacPorts 1.8.2 |
---|
comment:14 Changed 15 years ago by jmroot (Joshua Root)
1.8.1 milestone is closing, moving open tickets out
comment:15 Changed 15 years ago by jmroot (Joshua Root)
Milestone: | MacPorts 1.8.2 → MacPorts Future |
---|
comment:16 Changed 15 years ago by jmroot (Joshua Root)
Milestone: | MacPorts Future → MacPorts 1.9.0 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
I searched high and low to see how to do percent-encoding in Tcl. I found a function http::mapReply which does it. It's part of the http module included with Tcl. It seems like I should just be able to do something like this:
Unfortunately while I can get this to work with MacPorts tcl, I can't get it to work with the OS-provided tcl; it can't find the http::mapReply procedure. And even if we find out why that is and how to fix it, it won't work because http::mapReply was seriously broken until tcl 8.4.12; Mac OS X 10.4.11 has tcl 8.4.7 so it would not include the fix.
The http module appears to have originally been written in tcl (http.tcl) but is now apparently written in C. I copied the tcl code from http.tcl and modified it as suggested in the bug report so that it is not broken. I'll attach the patch I came up with.