Opened 16 years ago
Last modified 3 years ago
#16373 assigned enhancement
base should maintain a persistent working copy for all supported VCS fetches
Reported by: | ryandesign (Ryan Carsten Schmidt) | Owned by: | raimue (Rainer Müller) |
---|---|---|---|
Priority: | Normal | Milestone: | MacPorts Future |
Component: | base | Version: | 1.7.0 |
Keywords: | performance fetch | Cc: | jeremyhu (Jeremy Huddleston Sequoia), nerdling (Jeremy Lavergne), nonstop.server@…, cooljeanius (Eric Gallager), mojca (Mojca Miklavec), jul_bsd@…, anddam (Andrea D'Amore), Schamschula (Marius Schamschula), xuchunyang (Chunyang Xu), raimue (Rainer Müller) |
Port: |
Description
"fetch.type svn
" is inefficient in that it checks out a new working copy every time, directly to the work area. That would be like a normal port downloading the distfile every time. Instead, we should check out a working copy to that port's distpath, and then in the extract phase we should svn export
it to the work area.
Some checks will be needed in the fetch phase to ensure that an existing working copy:
- has no modifications: check svn status. Ideally we would try to clean up the working copy, for example by svn reverting modified, added or deleted files, and then, in a second svn status run, deleting any unversioned files. But it's already an improvement if we just discard the working copy if svn status --ignore-externals produces any output.
- is from the right URL: check svn info to see if the "URL" is the one we want. If not, check that the "Repository Root" is a prefix of the URL we want. If yes, try to svn switch to the URL and revision we want; if not, discard the working copy.
So the fetch phase would go something like...
if {working copy exists} {
    if {working copy has modifications} {
        delete working copy
    }
}
if {working copy exists} {
    if {working copy url is the one we want} {
        svn update to the desired revision
    } else {
        if {working copy repository root matches beginning of desired url} {
            try to svn switch to the desired url and revision
            if {an error occurred} {
                delete working copy
            }
        } else {
            delete working copy
        }
    }
}
if {working copy doesn't exist} {
    check out working copy
}
And the extract phase is simply to svn export the working copy from the distpath to the worksrcpath. (There is one problem if the working copy has externals and the user is using Subversion earlier than 1.5, for example Subversion 1.4.whatever which is included with Leopard. But rather than spend time working around this in base, I think this is a case where the port should depend on MacPorts subversion.)
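The fetch-phase checks described above could be sketched in shell roughly as follows. This is only an illustration, not code from base: the function names wc_action and fetch_svn and their argument order are made up for this example, and the sketch takes the simple route of discarding a modified working copy rather than trying to clean it up.

```shell
# Decide what to do with an existing working copy (hypothetical helper).
#   $1  "URL" reported by svn info for the working copy
#   $2  "Repository Root" reported by svn info
#   $3  URL the port wants
#   $4  output of svn status --ignore-externals (empty means clean)
# Prints one of: update, switch, discard
wc_action() {
    wc_url=$1 wc_root=$2 want_url=$3 wc_status=$4
    if [ -n "$wc_status" ] || [ -z "$wc_root" ]; then
        echo discard                      # modified, or not a usable working copy
    elif [ "$wc_url" = "$want_url" ]; then
        echo update
    else
        case $want_url in
            "$wc_root"|"$wc_root"/*) echo switch ;;   # same repository
            *)                       echo discard ;;
        esac
    fi
}

# Bring the working copy in $1 to URL $2 at revision $3 (hypothetical helper).
fetch_svn() {
    wc=$1 want_url=$2 want_rev=$3
    if [ -d "$wc" ]; then
        # LC_ALL=C keeps the "URL:" and "Repository Root:" labels untranslated.
        info=$(LC_ALL=C svn info "$wc" 2>/dev/null)
        status=$(svn status --ignore-externals "$wc" 2>&1)
        url=$(printf '%s\n' "$info" | sed -n 's/^URL: //p')
        root=$(printf '%s\n' "$info" | sed -n 's/^Repository Root: //p')
        case $(wc_action "$url" "$root" "$want_url" "$status") in
            update)  svn update -r "$want_rev" "$wc" ;;
            switch)  svn switch -r "$want_rev" "$want_url" "$wc" || rm -rf "$wc" ;;
            discard) rm -rf "$wc" ;;
        esac
    fi
    # Check out from scratch if nothing usable is left.
    [ -d "$wc" ] || svn checkout -r "$want_rev" "$want_url" "$wc"
}
```

The decision is kept in its own function so it can be exercised without a repository or network access.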
Attachments (1)
Change History (42)
comment:1 Changed 13 years ago by jeremyhu (Jeremy Huddleston Sequoia)
Keywords: | performance fetch added |
---|---|
Summary: | svn fetch type should maintain a persistent working copy → svn git and hg fetch type should maintain a persistent working copy |
comment:8 Changed 11 years ago by mojca (Mojca Miklavec)
I'm looking at options for git.
The following commands result in stable checksums:
git archive {shasum_or_branch} > /path/to/name_version.tar
gzip < /path/to/name_version.tar > /path/to/name_version.tar.gz

git archive {shasum_or_branch} > /path/to/name_version.tar
gzip -n /path/to/name_version.tar

git archive {shasum_or_branch} | gzip -n > /path/to/name_version.tar.gz

git archive {shasum_or_branch} | xz > /path/to/name_version.tar.xz
The first option results in a different checksum than the other two. I didn't try to understand the difference in the approaches, but in either case that would allow users to store the resulting compressed file, verify the checksums and store the file on MacPorts' server.
(Optionally the resulting file could be touched to get the same timestamp as the contents, but that's not a strict requirement.)
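As a small illustration of why the gzip -n variants are a safe choice for checksums: the -n option keeps the original file name and timestamp out of the gzip header, so the compressed bytes depend only on the input. The sketch below assumes a local clone already exists; stable_gzip and archive_commit are hypothetical names used only here.

```shell
# Compress stdin deterministically: with -n, gzip stores neither a file
# name nor a timestamp in its header.
stable_gzip() {
    gzip -n
}

# Archive a commit or tag from a local repository into a .tar.gz.
#   $1 repository path, $2 commit or tag, $3 output file
# Note: this relies on $2 naming a commit or tag; for a bare tree ID
# git archive uses the current time as the file modification time.
archive_commit() {
    git -C "$1" archive --format=tar "$2" | stable_gzip > "$3"
}
```

Running archive_commit twice on the same commit with the same git and gzip versions should then produce files with the same checksum.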
comment:9 Changed 11 years ago by ryandesign (Ryan Carsten Schmidt)
I'm not sure how this relates to this ticket. The solution I'm envisioning for this issue (in Subversion parlance, though I'm sure git and hg have equivalent concepts) is maintaining a persistent working copy which would be updated and switched as needed, or in extreme cases deleted and recreated, not creating any tarball, keeping any checksums, or uploading any file to a MacPorts server.
comment:10 Changed 11 years ago by nerdling (Jeremy Lavergne)
For git, we could store the downloaded repository in the distfiles directory. If the local repo doesn't exist, git clone; if it does exist, git reset --hard && git pull.
This repo can then be locally cloned or checked out to the working directory.
comment:11 Changed 11 years ago by mojca (Mojca Miklavec)
Sure, keeping the whole repository (and cleaning it in case it turns out to be "broken" or changed in unexpected ways) would be the optimal solution, but the solution I was talking about would probably be a lot faster to implement: it would be similar to what the GitHub PortGroup does, for example. It fetches a .tar.gz file from GitHub (even though it could clone the git repo) and calculates the checksums. If the checksum matches, all is well and a copy of that file gets mirrored on one of the MacPorts servers.
The solution I suggest would:
- check if ${distfile} exists
- if not, clone the git repository and create a ${name.version}.tar.gz / ${name.version}.tar.xz of the desired branch/tag/version in ${distpath}, delete the temporary git clone
- verify the checksums, extract the contents as usual ...
So something similar to what GitHub and BitBucket PortGroup already do (except that those fetch the distfiles from the server already).
I mentioned this because I believe it would be relatively easy to implement and it would allow keeping a mirror of a particular version on the server.
comment:12 Changed 11 years ago by mojca (Mojca Miklavec)
The problem is that I'm now trying to push some projects into making GitHub clones just for the sake of being able to avoid constant re-fetching of the sources from a random git repository. I would be really really grateful if MacPorts would get the ability to store the old repository and/or to mirror snapshots in the form of .tar.[gz|bz2|xz] files.
I suspect that solution would need to be implemented for each system separately anyway (different commands for svn, git and hg). I wanted to push the issue to start with git which is probably most widely used.
I would like to add a new port and I'm trying to figure out whether I should:
- make an unofficial mirror on GitHub (in my user account)
- deal with the pain of re-fetching from the original repository
- or make sure that the issue gets fixed in MacPorts
I would prefer the last one.
comment:13 follow-up: 14 Changed 11 years ago by neverpanic (Clemens Lang)
Keeping a (bare repo) clone of the whole thing would speed up fetching even after a port is updated, though. Packaging tarballs wouldn't. Also we can't easily avoid the git dependency because by the time the fetch phase is started we wouldn't know whether our mirrors already had a generated tarball or we'd have to fetch from git.
I guess getting this implemented using bare clones wouldn't be so hard after all. For git, you'd have to
- generate a unique identifier from the repository URL (e.g. using a hash function)
- test whether $cachedir/$identifier is a valid git repository
- create a bare clone if it isn't, run git fetch if it is
- export the version/revision/tag you need from $cachedir/$identifier into $worksrcdir.
I think that's actually easier to implement than getting the mirroring stuff you propose into the scripts that update our distfile mirrors.
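The four steps listed above might look roughly like this in shell. It is a sketch under several assumptions: cache_id, update_cache and export_rev are hypothetical names, cksum (a CRC) stands in for a proper hash function only because it is available everywhere, and git clone --mirror is used instead of a plain bare clone so that a later git fetch updates all refs.

```shell
# Derive a cache directory name from the repository URL.
cache_id() {
    printf '%s' "$1" | cksum | cut -d' ' -f1
}

# Create the cached mirror if it is missing or invalid, otherwise update it.
#   $1 cache directory, $2 repository URL
update_cache() {
    repo=$1/$(cache_id "$2")
    if git --git-dir="$repo" rev-parse --git-dir >/dev/null 2>&1; then
        git --git-dir="$repo" fetch --quiet --prune origin
    else
        rm -rf "$repo"
        git clone --quiet --mirror "$2" "$repo"
    fi
}

# Export one revision from the cache into a work directory.
#   $1 cache directory, $2 repository URL, $3 revision, $4 destination
export_rev() {
    repo=$1/$(cache_id "$2")
    mkdir -p "$4" &&
    git --git-dir="$repo" archive --format=tar "$3" | tar -x -f - -C "$4"
}
```

A real implementation would want a collision-resistant hash for the identifier and some locking around the cache directory.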
comment:14 follow-up: 15 Changed 11 years ago by mojca (Mojca Miklavec)
Replying to cal@…:
Keeping a (bare repo) clone of the whole thing would speed up fetching even after a port is updated, though. Packaging tarballs wouldn't.
Yes, that would be a huge benefit over tarballs.
Also we can't easily avoid the git dependency because by the time the fetch phase is started we wouldn't know whether our mirrors already had a generated tarball or we'd have to fetch from git.
I don't think that getting rid of the dependency on git would be of any substantial benefit.
I guess getting this implemented using bare clones wouldn't be so hard after all. For git, you'd have to
- generate a unique identifier from the repository URL (e.g. using a hash function)
- test whether $cachedir/$identifier is a valid git repository
- create a bare clone if it isn't, run git fetch if it is
- export the version/revision/tag you need from $cachedir/$identifier into $worksrcdir.
I would also suggest to add/check the SHA sum of the commit (even when dealing with tags) just to be on the safe side.
I think that's actually easier to implement than getting the mirroring stuff you propose into the scripts that update our distfile mirrors.
I'm too clumsy when it comes to tcl (I've learnt to handle the Portfiles, but changing anything in base is still too complex for me).
I would be thrilled if someone would be willing and able to implement this.
Once that gets implemented – how would you handle GitHub and BitBucket from that point on? And how would you handle situations when the servers go offline? Would you mirror the bare repository on one of MacPorts servers? (This is of course less important.)
comment:15 Changed 11 years ago by nerdling (Jeremy Lavergne)
Replying to mojca@…:
I would also suggest to add/check the SHA sum of the commit (even when dealing with tags) just to be on the safe side.
Using commitish over tags is helpful and uniform.
Once that gets implemented – how would you handle GitHub and BitBucket from that point on? And how would you handle situations when the servers go offline? Would you mirror the bare repository on one of MacPorts servers? (This is of course less important.)
There's no need to mirror their repositories. The authors can easily host it elsewhere and we simply update the portfile.
comment:16 follow-ups: 17 18 Changed 11 years ago by mojca (Mojca Miklavec)
I believe that both the SHA sum and the tag should be present. Tag doesn't always represent the exact version number (sometimes the version needs to be set separately for a github project anyway, but is often clear and helpful, often even for livecheck).
I wasn't talking about moving the git repositories. I meant situations when the server is not accessible for several days. Or when the sources disappear completely (there are certain tar.gz files that are only present on MacPorts mirrors and can still be installed, but are otherwise long gone from web).
comment:17 Changed 11 years ago by nerdling (Jeremy Lavergne)
Replying to mojca@…:
I believe that both the SHA sum and the tag should be present. Tag doesn't always represent the exact version number (sometimes the version needs to be set separately for a github project anyway, but is often clear and helpful, often even for livecheck).
And sometimes tags are never used.
I wasn't talking about moving the git repositories. I meant situations when the server is not accessible for several days. Or when the sources disappear completely (there are certain tar.gz files that are only present on MacPorts mirrors and can still be installed, but are otherwise long gone from web).
So we have two issues here: it's not a distfile, and keeping the whole repo would mean we have to manage history rewrites on our servers.
comment:18 follow-up: 20 Changed 11 years ago by ryandesign (Ryan Carsten Schmidt)
Replying to mojca@…:
I wasn't talking about moving the git repositories. I meant situations when the server is not accessible for several days. Or when the sources disappear completely (there are certain tar.gz files that are only present on MacPorts mirrors and can still be installed, but are otherwise long gone from web).
I consider that scenario to be outside the scope of this ticket.
If I get around to working on this issue, I would begin with the Subversion portion, since that's the version control system I'm most familiar with.
comment:19 Changed 11 years ago by mojca (Mojca Miklavec)
OK, it could be a mandatory SHA sum and an optional tag (or maybe this needs a bit of rethinking). One thing that I would also like to see supported out of the box (but is otherwise completely independent and also outside of scope of this ticket) is creating a version string like 3.14-beta-20140314-{short_SHA}. I mean: provided a full SHA string, I would like to be able to extract both date (just for "sorting" the increasing version) and a shortened version of the SHA sum.
But keeping a copy on MacPorts mirrors is definitely a lower priority than getting this functionality to work in the first place.
comment:20 Changed 11 years ago by nerdling (Jeremy Lavergne)
Replying to ryandesign@…:
If I get around to working on this issue
Could you give further guidance on this so that others who aren't as familiar with base might try to help out?
comment:21 follow-up: 22 Changed 11 years ago by neverpanic (Clemens Lang)
I'm not sure we really need a mandatory SHA sum. We currently trust git (or any other version control system) to do the right thing automatically when specifying tags (and not using github or setting fetch.type git). I'm also not sure how to implement a SHA sum of a complete source tree.
As for the version string, try git describe, it might generate what you want.
This needs to be implemented in browser:trunk/base/src/port1.0/portfetch.tcl; there are a couple of procs named portfetch::${vcs}fetch where this would have to be implemented.
comment:22 Changed 11 years ago by ryandesign (Ryan Carsten Schmidt)
Replying to cal@…:
This needs to be implemented in browser:trunk/base/src/port1.0/portfetch.tcl; there are a couple of procs named portfetch::${vcs}fetch where this would have to be implemented.
Currently, when using a non-distfile fetch.type, they fetch directly into workpath, and the extract phase does nothing; the extract phase would also have to be updated to do something.
comment:23 Changed 11 years ago by mojca (Mojca Miklavec)
Replying to cal@…:
I'm also not sure how to implement a SHA sum of a complete source tree.
One option is to generate a .tar or a .tar.[gz|bz2|xz] and calculate the checksum of that. There are other options for sure.
As for the version string, try git describe, it might generate what you want.
I meant something that would easily be accessible in Tcl, so that I could specify something like
git.branch ...sha...
version "3.14-beta-${git.commitdate}-${git.shortsha}"
I would need to learn how to interface git and Tcl first to implement that.
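For what it's worth, the two pieces are easy to get from git itself on the command line; wiring them into Tcl variables is a separate matter. A rough shell sketch, where make_version and commit_date are hypothetical helper names, the short SHA is simply the first 7 characters, and --date=format: needs a reasonably recent git (2.6 or later):

```shell
# Build "<base>-<date>-<shortsha>" from already-known pieces.
#   $1 base version, $2 commit date as YYYYMMDD, $3 full commit SHA
make_version() {
    printf '%s-%s-%.7s\n' "$1" "$2" "$3"
}

# Print the commit date of a revision in a local clone as YYYYMMDD.
#   $1 repository path, $2 revision
commit_date() {
    git -C "$1" log -1 --format=%cd --date=format:%Y%m%d "$2"
}
```

For example, make_version 3.14-beta "$(commit_date /path/to/clone "$sha")" "$sha" would give a string of the shape described above.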
comment:25 Changed 10 years ago by larryv (Lawrence Velázquez)
Cc: | larryv@… removed |
---|---|
Owner: | changed from macports-tickets@… to larryv@… |
comment:26 Changed 10 years ago by larryv (Lawrence Velázquez)
Status: | new → assigned |
---|
comment:27 Changed 10 years ago by dbevans (David B. Evans)
Note this is an issue with ports that use bzr fetches as well such as inkscape-devel.
comment:28 Changed 10 years ago by larryv (Lawrence Velázquez)
Summary: | svn git and hg fetch type should maintain a persistent working copy → base should maintain a persistent working copy for all supported VCS fetches |
---|
comment:32 Changed 9 years ago by mojca (Mojca Miklavec)
Can you please take a look at the attached Portfile for xchm? (Never mind the fact that it leads to a build error later on.)
I copy-pasted some code from portutil.tcl and portfetch.tcl. This is the relevant part:
checksums           rmd160  ... \
                    sha256  ...

use_xz              yes

pre-fetch {
    if {![file exists ${distpath}/${distname}${extract.suffix}]} {
        set git_dir ${workpath}/git

        # clone the git repository
        set options "-q"
        set cmdstring "${git.cmd} clone $options ${git.url} ${git_dir} 2>&1"
        ui_debug "Executing: $cmdstring"
        if {[catch {system $cmdstring} result]} {
            return -code error [msgcat::mc "Git clone failed"]
        }

        # create a tarball
        set xz [findBinary xz ${portutil::autoconf::xz_path}]
        set cmdstring "${git.cmd} archive ${git.branch} --prefix=${distname}/ | ${xz} > ${distpath}/${distname}${extract.suffix}"
        ui_debug "Executing: $cmdstring"
        if {[catch {system -W ${git_dir} ${cmdstring}} result]} {
            return -code error [msgcat::mc "Git archive failed"]
        }
    }
}
It works as explained months ago:
- In case the sources are missing, it will clone the repository, make a tarball out of it and store it in ${distpath}.
- (In the extract phase the sources are extracted from the tarball; they are not taken from the git repository.)
- Next time the sources are needed, it will simply extract everything from the tarball, with no need for a new clone and no precious bandwidth consumed.
- I assume that the buildbots / other servers would then also automatically keep a mirror of these tarballs, so it would no longer be a problem if the git repository goes offline.
However, this code should go to a portgroup (or possibly to core, depending on where the other code resides) and I'm not too comfortable writing the code for that yet.
Can someone please provide some feedback about this approach and in case that the approach sounds reasonable, possibly help me rewrite the code?
Changed 9 years ago by mojca (Mojca Miklavec)
Attachment: | xchm.Portfile added |
---|
Example of a portfile that creates a tarball from git on the fly
comment:33 Changed 9 years ago by ryandesign (Ryan Carsten Schmidt)
This assumes ${distname} is sufficiently unique, which it likely isn't. This particular port sets version to include the version number, ${git.branch} and a date, but the default for distname is ${name}-${version}, and it is common for projects that fetch from git to update their commit hash while the version and of course the name stay the same. The distname should probably be changed to be ${name}-${git.branch} for git, ${name}-r${svn.revision} for subversion, etc.
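To make the naming suggestion concrete, here is a tiny shell illustration of the scheme. The helper name vcs_distname is made up, and only the two fetch types mentioned above are covered:

```shell
# Print a distname that changes whenever the fetched revision changes.
#   $1 port name, $2 fetch type (git or svn), $3 commit hash or revision number
vcs_distname() {
    case $2 in
        git) printf '%s-%s\n' "$1" "$3" ;;
        svn) printf '%s-r%s\n' "$1" "$3" ;;
        *)   echo "unsupported fetch type: $2" >&2; return 1 ;;
    esac
}
```

With this, two snapshots of the same version but different commits end up in differently named files and cannot be confused.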
comment:35 Changed 9 years ago by raimue (Rainer Müller)
Cc: | raimue@… added |
---|
I sketched a possible solution for this on macports-dev: https://lists.macosforge.org/pipermail/macports-dev/2015-March/029917.html https://lists.macosforge.org/pipermail/macports-dev/2015-March/029936.html
comment:36 Changed 9 years ago by raimue (Rainer Müller)
Implementation started in ^/branches/vcs-fetch/base/.
comment:37 Changed 7 years ago by neverpanic (Clemens Lang)
Milestone: | MacPorts Future → MacPorts 2.5.0 |
---|
We would like to see this in 2.5.0.
comment:38 Changed 7 years ago by neverpanic (Clemens Lang)
Owner: | changed from larryv to raimue |
---|
comment:39 Changed 7 years ago by neverpanic (Clemens Lang)
Milestone: | MacPorts 2.5.0 → MacPorts 2.6.0 |
---|
Our plan is to merge this right after the 2.5.0 branch.
comment:40 Changed 5 years ago by jmroot (Joshua Root)
Milestone: | MacPorts 2.6.0 → MacPorts 2.7.0 |
---|
Ticket retargeted after milestone closed
comment:41 Changed 3 years ago by jmroot (Joshua Root)
Milestone: | MacPorts 2.7.0 → MacPorts Future |
---|
Ticket retargeted after milestone closed
This should be done for mercurial and git as well. It's quite annoying to have to redownload sources every time through my debug iteration even though they haven't changed.