Opened 14 months ago
Last modified 12 months ago
#68192 new defect
Archives can remain cached by the CDN without corresponding signature
Reported by: | ulange-eso-org | Owned by: | admin@… |
---|---|---|---|
Priority: | Normal | Milestone: | |
Component: | server/hosting | Version: | 2.8.1 |
Keywords: | | Cc: | |
Port: | py39-markupsafe | | |
Description (last modified by ryandesign (Ryan Carsten Schmidt))
For about a day we have been unable to port install py39-markupsafe-2.1.1_0.darwin_21.x86_64.tbz2.
The file is still there, but the associated rmd160 file is only 126 bytes long while it should be 512 bytes.
Can you please resurrect a proper checksum file?
Thanks, Uwe.
details:
curl https://packages.macports.org/py39-markupsafe/py39-markupsafe-2.1.3_0.darwin_21.x86_64.tbz2 -o 213.tbz2
curl https://packages.macports.org/py39-markupsafe/py39-markupsafe-2.1.1_0.darwin_21.x86_64.tbz2 -o 211.tbz2
curl https://packages.macports.org/py39-markupsafe/py39-markupsafe-2.1.3_0.darwin_21.x86_64.tbz2.rmd160 -o 213.tbz2.rmd160
curl https://packages.macports.org/py39-markupsafe/py39-markupsafe-2.1.1_0.darwin_21.x86_64.tbz2.rmd160 -o 211.tbz2.rmd160
ls -rtl 21*
-rw-r--r--. 1 ulange vlt 18884 Sep 18 14:46 213.tbz2
-rw-r--r--. 1 ulange vlt 18474 Sep 18 14:47 211.tbz2
-rw-r--r--. 1 ulange vlt 512 Sep 18 14:47 213.tbz2.rmd160
-rw-r--r--. 1 ulange vlt 126 Sep 18 14:48 211.tbz2.rmd160
Captured error snippet from our build system:
[macports] ---> Attempting to fetch py39-markupsafe-2.1.1_0.darwin_21.x86_64.tbz2.rmd160 from https://packages.macports.org/py39-markupsafe
[macports] % Total % Received % Xferd Average Speed Time Time Time Current
[macports] Dload Upload Total Spent Left Speed
[macports] 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 126 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
[macports] Error: Failed to archivefetch py39-markupsafe: Failed to fetch signature for archive: The requested URL returned error: 404
Change History (14)
comment:1 Changed 14 months ago by ulange-eso-org
Description: | modified (diff) |
---|
comment:2 Changed 14 months ago by ryandesign (Ryan Carsten Schmidt)
Description: | modified (diff) |
---|
comment:3 Changed 14 months ago by jmroot (Joshua Root)
The corresponding archive should have been deleted at the same time as the .rmd160 though, which should have just resulted in a build from source. I can only guess this was some combination of bad timing and CDN caching?
comment:4 Changed 14 months ago by ryandesign (Ryan Carsten Schmidt)
Yes, py39-markupsafe-2.1.1_0.darwin_21.x86_64.tbz2 appears still to be cached by the CDN (at least the edge server I happen to be connecting to) while py39-markupsafe-2.1.1_0.darwin_21.x86_64.tbz2.rmd160 isn't.
comment:5 Changed 14 months ago by ulange-eso-org
Thanks to both Ryan and Joshua for the clarification! Is there any rough estimate for how long the CDN will hold on to py39-markupsafe-2.1.1_0.darwin_21.x86_64.tbz2? Building from source would be okay for us, but it is currently prevented by the CDN caching.
comment:6 Changed 14 months ago by ryandesign (Ryan Carsten Schmidt)
Our main public web file server sends the header Cache-Control: max-age=2592000, public, which means the CDN will cache resources for 30 days.
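As a quick check of that arithmetic, here is a minimal Python sketch that reads the max-age directive out of a Cache-Control value and converts it to days. The function name max_age_days is made up for illustration and is not part of any MacPorts tooling.

```python
def max_age_days(cache_control: str) -> float:
    """Return the max-age directive of a Cache-Control header value, in days."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            return int(value) / 86400  # 86400 seconds per day
    raise ValueError("no max-age directive found")


print(max_age_days("max-age=2592000, public"))  # 30.0
```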
You can request a build from source by using the -s command line flag.
Why do you need this outdated version of markupsafe?
comment:7 Changed 14 months ago by ulange-eso-org
If both the signature file and the binary archive for version 2.1.1 had disappeared at the same time, we would be fine. The ports command would fall back to building from source.
If, because of the CDN, this cannot be guaranteed, then perhaps the ports command should detect that and fall back to build from source for the older version?
We have our own MacPorts package-building infrastructure (https://www.eso.org/sci/software/pipelines/installation/macports.html).
We build hundreds of MacPorts packages across different macOS versions. The basis is a set of VMware VM templates, one per supported OS version.
At the time of template creation we bake a snapshot of the then-current MacPorts definitions into them.
Our VMware templates were refreshed on 08-SEP-2023. This was in reaction to a similar issue we had in August for package https://ports.macports.org/port/py39-yaml (the older binary archive was still there but the matching signature file was already gone).
A new VMware template with updated MacPorts then triggers a full rebuild of all packages on our side, which can take several days to finish and puts considerable strain on our infrastructure. In this case only a few hours after the rebuild finished we now have to restart from the beginning because py39-markupsafe 2.1.1 -> 2.1.3.
comment:8 follow-up: 9 Changed 14 months ago by jmroot (Joshua Root)
Component: | ports → server/hosting |
---|---|
Owner: | set to admin@… |
Summary: | py39-markupsafe-2.1.1_0.darwin_21.x86_64.tbz2 : Associated rmd160 file possibly corrupted → Archives can remain cached by the CDN without corresponding signature |
We should figure out a way to immediately expire files from the CDN cache when we remove them from the main server. In the meantime, one workaround would be to avoid using the CDN by adding host_blacklist packages.macports.org to your macports.conf. (You'll still get archives from one of the other mirrors.)
comment:9 follow-up: 10 Changed 14 months ago by ryandesign (Ryan Carsten Schmidt)
Replying to ulange-eso-org:
perhaps the ports command should detect that and fall back to build from source for the older version?
I think that would be a good idea. MacPorts base should not assume both files will exist; if either one or the other doesn't exist, it should try the next packages mirror or build from source. It should try fetching the rmd160 file first, since it's smaller.
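The fetch order suggested here could look roughly like the following Python sketch. It is not MacPorts base code (base is not written in Python); the fetch callable, the fetch_archive name, and the "None means 404" convention are all assumptions made for illustration. What the caller does when no mirror has both files is deliberately left open.

```python
from typing import Callable, Iterable, Optional

# Hypothetical fetcher type: returns the file's bytes, or None if the mirror
# answers 404 (or otherwise cannot serve the file).
Fetch = Callable[[str], Optional[bytes]]


def fetch_archive(mirrors: Iterable[str], name: str, fetch: Fetch):
    """Return (mirror, archive, signature) from the first mirror that has
    both files, or None if no mirror has both."""
    for mirror in mirrors:
        base = mirror.rstrip("/")
        # Signature first: it is the smaller file, so a miss is cheap.
        signature = fetch(f"{base}/{name}.rmd160")
        if signature is None:
            continue  # do not bother downloading the archive from this mirror
        archive = fetch(f"{base}/{name}")
        if archive is None:
            continue
        return mirror, archive, signature
    return None  # no mirror had both files; the caller decides what to do next
```

Passing the fetcher in as a parameter keeps the sketch testable without network access; a real implementation would also verify the signature against the archive before using it.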
In this case only a few hours after the rebuild finished we now have to restart from the beginning because py39-markupsafe 2.1.1 -> 2.1.3.
You might not have seen this problem much before because I used to clean up old archives on the server only when its disk had nearly filled up, at which point I would try to remember how to run the scripts that do the cleanup. But for the past few months, as we've wanted to do for years, the cleanup has happened automatically every Sunday.
The cleanup script uses a number of criteria in weighting each outdated archive to decide which ones to delete first. Age and size are among the considerations. py-markupsafe is small and was only updated to 2.1.3 on September 10, 2023; however, the 2.1.1 archives dated back to March 16, 2022, which is probably why they got deleted this past Sunday. We expect users to keep their ports reasonably up to date; MacPorts warns you if your ports tree is more than two weeks old. In this case, the script decided to delete an old archive less than two weeks after the new version was available, which is not ideal, and it would be nice if we didn't do that, but I'm not sure how we should modify the scripts to achieve that.
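One possible direction, sketched in Python under loud assumptions: exclude any archive whose replacement has been available for less than a grace period, then rank the rest. This is not how delete_old_archives.py works; the OutdatedArchive fields, the deletion_candidates name, the 14-day default, and the simple "oldest first, then largest first" ordering are all hypothetical stand-ins for the real weighting.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class OutdatedArchive:
    name: str
    size_bytes: int
    built_on: date       # when this archive was created
    superseded_on: date  # when the newer version became available


def deletion_candidates(archives, today, grace=timedelta(days=14)):
    """Return archives eligible for deletion, oldest first, then largest first.

    Archives superseded less than `grace` ago are never eligible, however
    old or large they are.
    """
    eligible = [a for a in archives if today - a.superseded_on >= grace]
    return sorted(eligible, key=lambda a: (a.built_on, -a.size_bytes))
```

With these rules, an archive built on March 16, 2022 and superseded on September 10, 2023 would be skipped by a cleanup run one week later and become eligible from September 24, 2023 onward.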
Replying to jmroot:
We should figure out a way to immediately expire files from the CDN cache when we remove them from the main server.
We control when we run the deletion script on the private server, but we do not control when the main public mirror synchronizes with the private server, so doing the CDN purge when we run the deletion script wouldn't work. Someone could still request the file between the time that we purge it and the time that the public mirror synchronizes, which would cause the CDN to cache it again.
We could write a script to monitor the private server's rsync log and then purge those items from the CDN as we see in the log that their deletion was synchronized. It might still not happen at exactly the same time (I'm not sure if rsync needs additional time to commit changes to disk for example).
We'd also have to look into the CDN API for purging specific files. Hopefully there is an API that allows specifying an arbitrary number of paths at once. I would not want, for example, to have to send thousands of API requests, one per deleted file.
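The log-watching half of that idea could be sketched in Python as below. Two things are assumed rather than known: that the rsync log marks removals with "*deleting" followed by the path (as rsync's itemized output does), and that the CDN accepts several paths per purge request up to some limit. The purge call itself is left out because the CDN's API has not been looked into yet; the function names are made up for illustration.

```python
import re
from typing import Iterable, Iterator, List

# Assumed log line shape: any prefix (timestamp, PID), then "*deleting",
# whitespace, and the path of the removed item.
DELETION = re.compile(r"\*deleting\s+(?P<path>\S.*)$")


def deleted_paths(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield the paths of deleted files found in rsync log lines."""
    for line in log_lines:
        match = DELETION.search(line.rstrip("\n"))
        # Skip directories (trailing slash); only files are cached by the CDN.
        if match and not match.group("path").endswith("/"):
            yield match.group("path")


def batches(paths: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group paths into lists of at most `size`, one list per purge request."""
    batch: List[str] = []
    for path in paths:
        batch.append(path)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Batching this way would turn thousands of deletions into a handful of requests, provided the CDN's API supports multi-path purges at all.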
comment:10 follow-up: 11 Changed 14 months ago by jmroot (Joshua Root)
Replying to ryandesign:
Replying to ulange-eso-org:
perhaps the ports command should detect that and fall back to build from source for the older version?
I think that would be a good idea. MacPorts base should not assume both files will exist; if either one or the other doesn't exist, it should try the next packages mirror or build from source. It should try fetching the rmd160 file first, since it's smaller.
Attempting to fetch the .rmd160 from other mirrors complicates the logic a bit, but is doable. Falling back to building from source when only the .rmd160 is missing is much harder than it sounds. It would require the dependencies to be recomputed, since building may need dependencies that installing from an archive does not.
comment:11 follow-up: 12 Changed 14 months ago by ulange-eso-org
Replying to jmroot:
Replying to ryandesign:
Replying to ulange-eso-org:
perhaps the ports command should detect that and fall back to build from source for the older version?
I think that would be a good idea. MacPorts base should not assume both files will exist; if either one or the other doesn't exist, it should try the next packages mirror or build from source. It should try fetching the rmd160 file first, since it's smaller.
Attempting to fetch the .rmd160 from other mirrors complicates the logic a bit, but is doable.
A quick check on some mirrors shows that the needed rmd160 file seems to be gone from all of them, so I'm afraid hunting for that file across mirrors won't help, no?
Reading #56181 and the code for browser:macports-infrastructure/jobs/delete_old_archives.py:
Could it be that only the rmd160 file was deleted but the binary archive somehow not?
comment:12 Changed 14 months ago by jmroot (Joshua Root)
Replying to ulange-eso-org:
Could it be that only the rmd160 file was deleted but the binary archive somehow not?
Both files are gone from the origin server. One of them remains in the CDN cache.
There is no file py39-markupsafe-2.1.1_0.darwin_21.x86_64.tbz2.rmd160 on the server. You can confirm that by looking at the directory listing. It was probably automatically deleted because it was outdated. 2.1.3 is the current version. If you look at the contents of your 126-byte file it should be a "404 not found" message from our server.