Opened 12 years ago

Closed 7 years ago

Last modified 4 years ago

#36560 closed enhancement (fixed)

Use hfsCompression

Reported by: mfeiri Owned by: mfeiri
Priority: Normal Milestone: MacPorts 2.5.0
Component: base Version: 2.1.99
Keywords: haspatch Cc: nonstop.server@…, cooljeanius (Eric Gallager), raimue (Rainer Müller), RJVB (René Bertin), ryandesign (Ryan Carsten Schmidt), fracai, andre.david@…, eborisch (Eric A. Borisch), 1-61803
Port:

Description

The attached patch enables the port activation phase in registry2.0/portimage.tcl to take advantage of hfsCompression on Mac OS X >=10.6. Posting here for reference before committing to trunk.

Attachments (4)

hfscompression.diff (665 bytes) - added by mfeiri 12 years ago.
hfscompression2.diff (672 bytes) - added by mfeiri 12 years ago.
hfscompression3.diff (2.6 KB) - added by mfeiri 10 years ago.
activate_with_compression.patch (1.4 KB) - added by eborisch (Eric A. Borisch) 7 years ago.
Opportunistic bsdtar w/ --hfsCompression usage during activation.


Change History (40)

Changed 12 years ago by mfeiri

Attachment: hfscompression.diff added

comment:1 Changed 12 years ago by cooljeanius (Eric Gallager)

Could we get some benchmarks as to about how much of a difference it makes?

comment:2 Changed 12 years ago by ryandesign (Ryan Carsten Schmidt)

Any link explaining what hfsCompression is?

comment:3 Changed 12 years ago by nonstop.server@…

Cc: nonstop.server@… added

Cc Me!

comment:4 Changed 12 years ago by mfeiri

HFS compression is a fully transparent way to save disk space and reduce disk I/O. Apple uses it for all of /usr, /bin, /sbin, and /System; you can easily verify this using "ls -lO". The man page of ditto describes this feature as "intended to be used in installation and backup scenarios that involve system files", which should be appropriate enough for macports. Ditto will only actually compress files "if appropriate".
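
For anyone wanting to check this on their own machine, a minimal sketch of the "ls -lO" verification (the /usr/bin/true path is just an example of an Apple-shipped binary; guarded so it is a no-op on other systems):

```shell
# Show BSD ls file flags; on macOS, HFS+-compressed files are listed
# with a "compressed" flag. /usr/bin/true is only an example path.
if [ "$(uname -s)" = Darwin ]; then
    flags=$(ls -lO /usr/bin/true)
else
    flags="non-macOS system: no HFS+ file flags to show"
fi
echo "$flags"
```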

Here are two links describing some technical details and benefits of transparent compression in HFS:

And here are some numbers to show the effect of HFS compression on my system, using 24 recently updated ports:

  • 787,056K (sum of relevant .tbz2 archives in /opt/local/var/macports/software/)
  • 2,958,600K (sum of relevant unarchived files not using HFS compression)
  • 1,407,148K (sum of relevant installed files using HFS compression)
  • 1,551,452K (sum of space saved on disk)
  • 52% space saved on disk

Finally the list of ports I used to calculate the above values:

  • libpng-1.5.13_0+universal.darwin_12.i386-x86_64
  • gcc45-4.5.4_6.darwin_12.x86_64
  • gcc46-4.6.3_9.darwin_12.x86_64
  • llvm-2.9-2.9_12.darwin_12.x86_64
  • serf1-1.1.1_0.darwin_12.x86_64
  • clang-2.9-2.9_12+analyzer+python27.darwin_12.x86_64
  • swig-perl-2.0.8_2.darwin_12.noarch
  • swig-python-2.0.8_2.darwin_12.noarch
  • swig-2.0.8_2.darwin_12.x86_64
  • openmpi-1.6.2_0+gcc45.darwin_12.x86_64
  • llvm-3.0-3.0_11.darwin_12.x86_64
  • llvm-3.1-3.1_4.darwin_12.x86_64
  • clang-3.1-3.1_4+analyzer+python27.darwin_12.x86_64
  • db46-4.6.21_7+java+universal.darwin_12.i386-x86_64
  • sqlite3-3.7.14.1_0+universal.darwin_12.i386-x86_64
  • giflib-4.2.1_0+x11.darwin_12.x86_64
  • netpbm-10.60.01_0.darwin_12.x86_64
  • graphviz-2.28.0_8.darwin_12.x86_64
  • wireshark-1.8.3_0+no_python.darwin_12.x86_64
  • bind9-9.9.2_0.darwin_12.x86_64
  • flex-2.5.37_1.darwin_12.x86_64
  • subversion-1.7.7_0.darwin_12.x86_64
  • libusb-1.0.9_0.darwin_12.x86_64
  • mpfr-3.1.1-p2_0.darwin_12.x86_64

comment:5 Changed 12 years ago by mfeiri

Resolution: fixed
Status: new → closed

Committed in r98734

comment:6 Changed 12 years ago by jmroot (Joshua Root)

Doesn't this break hard links? And how much does it affect activation time, particularly on slower systems?

comment:7 in reply to:  6 Changed 12 years ago by jmroot (Joshua Root)

Replying to jmr@…:

Doesn't this break hard links?

Yes, it does. Reverted in r98737 because of this regression.

And how much does it affect activation time, particularly on slower systems?

It makes it slower by more than a factor of 10, apparently. One example:

file move:

sudo port -v activate git-core  1.98s user 0.68s system 80% cpu 3.305 total

ditto:

sudo port -v activate git-core  21.35s user 15.62s system 91% cpu 40.362 total

comment:8 Changed 12 years ago by jmroot (Joshua Root)

Resolution: fixed
Status: closed → reopened

comment:9 Changed 12 years ago by cooljeanius (Eric Gallager)

Cc: egall@… added

Cc Me!

Changed 12 years ago by mfeiri

Attachment: hfscompression2.diff added

comment:10 Changed 12 years ago by mfeiri

Owner: changed from macports-tickets@… to mfeiri@…
Status: reopened → new

Thanks for pointing out this regression. I've updated the patch to apply to extract_archive_to_tmpdir instead of _activate_file. This way ditto can compress the entire directory tree of a port, which is a lot faster and seems to preserve hard links. I've also tried to hook the HFS compression directly into the unarchiving pipe, e.g. bsdtar -cpf - --format cpio @${location} | ditto -xV --hfsCompress - $extractdir, but it turned out that the conversion from tar to cpio does not preserve hard links (see https://github.com/libarchive/libarchive/wiki/Hardlinks).

comment:11 Changed 12 years ago by jmroot (Joshua Root)

Hmm, I can't say I'm thrilled with writing out the files twice. It might be best to implement the relevant bits in C (using libarchive).

comment:12 Changed 12 years ago by mfeiri

AFAICT the only way for us to avoid writing files twice with ditto is to require the use of cpio instead of tar for archives. Not sure if this is a good idea.

I'm also not sure if libarchive would accept patches for direct HFS compression, because AFAIK there is no official API for HFS compression and it is a bit of a hack anyway. Maybe one day we will get something like ZFS and we can simply configure truly transparent filesystem compression per directory...

But for now ditto is the only available tool, and filesystem compression is a desirable feature.

comment:13 Changed 12 years ago by mfeiri

*bump*

I've been happily using the second iteration of this patch for quite a while now. Hard links work fine, and I guess adding one additional pass of I/O is as good as it gets (requiring cpio instead of tar, or forking libarchive, doesn't sound very attractive). I know some users are eager to see this patch in a released version of macports to save precious space on their SSD-equipped MacBooks. I'll commit to trunk again during the weekend.

comment:14 Changed 12 years ago by jmroot (Joshua Root)

Version: 2.1.2 → 2.1.99

Why would we need to fork libarchive? There are at least two implementations out there that turn on compression for individual files, and if you have that, you can do it as you extract with libarchive.

There is a fundamental space/speed tradeoff here, and you don't get to decide that everyone wants to save space. At minimum, it needs to be a conf option.

comment:15 in reply to:  14 Changed 12 years ago by cooljeanius (Eric Gallager)

Replying to jmr@…:

There is a fundamental space/speed tradeoff here, and you don't get to decide that everyone wants to save space. At minimum, it needs to be a conf option.

How would one go about making this into a conf option?

Last edited 12 years ago by cooljeanius (Eric Gallager) (previous) (diff)

comment:16 Changed 12 years ago by mfeiri

Oh, when I created this patch there was no support for hfsCompression in libarchive. I just noticed that a couple of weeks later "experimental support for HFS+ Compression" was added to libarchive. Yay! I guess this means we can get rid of the additional round of I/O. Would you suggest somehow depending on our port of libarchive, or importing libarchive/bsdtar into the base macports distribution?

I would have assumed that for the kinds of files installed by macports (text and executables) the cost/benefit tradeoff is clearly in favor of unconditionally enabling compression. Apple also uses hfsCompression in /usr/bin and similar locations. Once I have some spare time again I will look into extending the patch to allow opting out of hfsCompression.

comment:17 in reply to:  16 Changed 12 years ago by jmroot (Joshua Root)

Replying to mfeiri@…:

Oh, when I created this patch there was no support for hfsCompression in libarchive. I just noticed that a couple of weeks later "experimental support for HFS+ Compression" was added to libarchive. Yay! I guess this means we can get rid of the additional round of I/O. Would you suggest somehow depending on our port of libarchive, or importing libarchive/bsdtar into the base macports distribution?

Well, I guess using ${prefix}/bin/bsdtar --hfsCompression ... would be simplest, but the disadvantage would be it could only be used after the port is installed.

comment:18 Changed 10 years ago by ryandesign (Ryan Carsten Schmidt)

As of MacPorts 2.3.0 we now have the infrastructure in place to bundle other software packages with MacPorts base...

comment:19 Changed 10 years ago by raimue (Rainer Müller)

Cc: raimue@… added

Cc Me!

Changed 10 years ago by mfeiri

Attachment: hfscompression3.diff added

comment:20 Changed 10 years ago by mfeiri

Version 3 of this patch now uses bsdtar instead of ditto, e.g. as provided by the libarchive port. Integration follows the example of lbzip2 and pbzip2 in portimage.tcl (thanks raim for this hint). For maximal robustness an actual feature test is performed. If bsdtar breaks or is unavailable or does not support hfsCompression, everything transparently falls back to the regular tar utility.
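
A minimal sketch of such a runtime feature test (an assumption in spirit, not the exact check the patch performs: here the probe extracts an empty archive, which libarchive's bsdtar accepts, and falls back to plain tar if the flag or bsdtar itself is unavailable):

```shell
# Prefer a bsdtar that understands --hfsCompression; otherwise fall
# back to the regular tar utility. The empty-archive probe is an
# assumption about how to detect flag support at runtime.
EXTRACT_TAR=tar
if command -v bsdtar >/dev/null 2>&1 &&
   bsdtar --hfsCompression -xf /dev/null >/dev/null 2>&1; then
    EXTRACT_TAR="bsdtar --hfsCompression"
fi
echo "using: $EXTRACT_TAR"
```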

To give some motivation again: My installation of macports is 18GB in size (according to Finder Info), but only takes up 10GB of space on my SSD (according to "du -sh"). With this new patch the impact on unarchiving time is now down to around 2x versus 10x previously (measured with "time sudo port activate git" on a 2011 MBP).

PS: (Un)archiving and (de)compression happens in various places in macports. This patch only considers portimage.tcl and does not try to introduce configuration infrastructure for this feature.

comment:21 Changed 10 years ago by RJVB (René Bertin)

FYI, there *is* ZFS on OSX: www.o3x.org . While ZFS doesn't really support compression on a per-directory level, it does do fast (and less fast), transparent compression per mounted volume (dataset) ... and datasets can be made to mount wherever you want. So it would be perfectly straightforward to put all of MacPorts on a specific dataset, without the software even being aware of the fact. It'll be a little less straightforward if there is a dependency on HFS+ features, for example because you insist on Spotlight indexing or TM backups. NB: ZFS's default compression algorithm actually tends to speed up I/O on spinning storage, as CPUs are usually faster than HDDs.

To come back to the patch: could this be made to apply to source code extraction too? Source trees can be huge and while users won't usually keep them around the same is probably not true for port developers. But maybe bsdtar has an option to "extract with compression"?

comment:22 in reply to:  21 ; Changed 10 years ago by ryandesign (Ryan Carsten Schmidt)

Cc: rjvbertin@… ryandesign@… added

Replying to rjvbertin@…:

FYI, there *is* ZFS on OSX: www.o3x.org . While ZFS doesn't really support compression on a per-directory level, it does do fast (and less fast), transparent compression per mounted volume (dataset) ... and datasets can be made to mount wherever you want. So it would be perfectly straightforward to put all of MacPorts on a specific dataset, without the software even being aware of the fact. It'll be a little less straightforward if there is a dependency on HFS+ features, for example because you insist on Spotlight indexing or TM backups. NB: ZFS's default compression algorithm actually tends to speed up I/O on spinning storage, as CPUs are usually faster than HDDs.

I would say it is not MacPorts' job to suggest users use a third-party filesystem. We should assume MacPorts is installed on Apple's default filesystem and improve MacPorts to optimize that experience.

To come back to the patch: could this be made to apply to source code extraction too? Source trees can be huge and while users won't usually keep them around the same is probably not true for port developers. But maybe bsdtar has an option to "extract with compression"?

I would not recommend that. Building from source is a CPU-bound activity; you don't want to slow down the CPU further with compression/decompression during that.

comment:23 in reply to:  22 ; Changed 10 years ago by RJVB (René Bertin)

Replying to ryandesign@…:

I would say it is not MacPorts' job to suggest users use a third-party filesystem. We should assume MacPorts is installed on Apple's default filesystem and improve MacPorts to optimize that experience.

"Job" would probably be an inappropriate term. I'd say MacPorts could easily *play a role* in making suggestions how to optimise the experience to different requirements. NB: "optimising 'that' experience" isn't a perfectly defined concept as long as you don't take user requirements into account... ;)

I would not recommend that. Building from source is a CPU-bound activity; you don't want to slow down the CPU further with compression/decompression during that.

Here's what I just posted on the ML:

As an example: the Qt 5.4.0 source tree takes up 1.7Gb normally, impressively down to 400Mb extracted with --hfsCompression, with a 2x slower extraction.

Some performance examples, using tcsh's time command:

Without compression, 1st time
#> time fgrep QGenericUnixServices -R qt-everywhere-opensource-src-5.4.0/
snip

> 3.202 user_cpu 21.238 kernel_cpu 1:49.77 total_time 22.2%CPU {0W 0X 0D 0K 6875136M 0F 63303R 6572I 7752O 0r 0s 0k 171188w 36188c}
Repeated immediately

> 1.366 user_cpu 2.381 kernel_cpu 0:04.89 total_time 76.4%CPU {0W 0X 0D 0K 6895616M 0F 67970R 1I 372O 0r 0s 0k 5272w 482c}

With compression: 1st time
#> time fgrep QGenericUnixServices -R qt-everywhere-opensource-src-5.4.0/

> 1.875 user_cpu 36.627 kernel_cpu 1:54.56 total_time 33.5%CPU {0W 0X 0D 0K 6834176M 49F 72987R 26468I 13928O 0r 0s 0k 74067w 14795c}
Repeated immediately

> 1.618 user_cpu 15.632 kernel_cpu 0:46.32 total_time 37.2%CPU {0W 0X 0D 0K 6846464M 0F 71258R 25940I 5O 0r 0s 0k 48798w 7706c}

So there is little to no performance hit when reading from disk as long as the system isn't CPU bound, but for some reason the file/disk cache is less effective.

Compression-on-extraction can be activated on a per-port basis by adding extract.post_args-append --hfsCompression to the Portfile, but I haven't yet found out how to ensure that port:bsdtar is used, as the bsdtar that Apple ship with 10.9 doesn't support the argument.
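
For reference, a hypothetical Portfile fragment for that per-port opt-in (an untested sketch of the line quoted above, in context):

```tcl
# Per-port opt-in: append the flag to the tar stage of the default
# extract pipeline. Only meaningful when the tar being invoked
# understands --hfsCompression; Apple's tar on 10.9 does not, so this
# is a hopeful sketch rather than a working recipe on stock systems.
extract.post_args-append --hfsCompression
```

Forcing a MacPorts-provided bsdtar into the extract pipeline would additionally require overriding the extract command, which, as noted above, is still an open question here.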

[OT] Grmmblmbllgrmmbl! They develop a nifty fs feature allowing one to save considerable amounts of space, and then make it almost impossible to reap the benefits of it. And still some are saying that that's *not* to drive people to buy new (storage) hardware sooner ... O:-) OT

Version 0, edited 10 years ago by RJVB (René Bertin) (next)

comment:24 in reply to:  23 ; Changed 10 years ago by ryandesign (Ryan Carsten Schmidt)

I suppose modern Macs have many CPU cores and parallel builds often cannot build everything in parallel, leaving some cores free some of the time, so using hfscompression might not use that much more real time. But I guess we're only talking about hfscompressing the files that came out of the source tarball, not all the object files getting compiled, so it will only shrink the size of a built work directory a small amount.

comment:25 Changed 10 years ago by fracai

Cc: arno@… added

Cc Me!

comment:26 in reply to:  24 Changed 10 years ago by RJVB (René Bertin)

Replying to ryandesign@…:

I suppose modern Macs have many CPU cores and parallel builds often cannot build everything in parallel, leaving some cores free some of the time, so using hfscompression might not use that much more real time. But I guess

Exactly.

we're only talking about hfscompressing the files that came out of the source tarball, not all the object files getting compiled, so it will only shrink the size of a built work directory a small amount.

In any case, HFS doesn't apply compression transparently on write. So unless/until clang is redesigned to write its output leveraging compression (and it could just as well use any other form of compression for that), the build directory will not be compressed.

There would be a lot to be said for compressing the build directory. I often leave it, and thus the object files, around because they appear to be required for debugging or even for getting a sensible backtrace (with line numbers). And of course it allows incremental builds, which speeds up port development considerably.

I'm currently looking at how much one can gain from post-hoc build directory compression by creating a (slightly compressed) tarball and then untarring it with --hfsCompression (the Qt 5.4.0 out-of-source build directory is a hefty 26Gb, giving a 5.5Gb .tar.gz2 archive ... and 5.6Gb extracted to an hfsCompressed directory :)). I did get this error: build/qtbase/src/corelib/global/qconfig.cpp: Cannot restore xattr:com.apple.decmpfs so I'm going to have to spend another 45min or so checking if everything was extracted ...

comment:27 Changed 10 years ago by fracai

Rather than creating a temporary tarball, why not compress in place? afsctool (available in ports, though it points to old site information that should be updated to https://brkirch.wordpress.com/afsctool/; I've opened a ticket for this, #46683) can do this, or I'd think you'd be able to use ditto as well. If ditto can't use the same source and destination, I think it'd be simple enough to rename each file to $file.hfscompression and then use mv to replace the uncompressed version.
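
A minimal sketch of that in-place approach with afsctool (the target path is only an example; -c asks afsctool to compress the given file or folder, and the snippet degrades gracefully where afsctool is not installed):

```shell
# In-place HFS+ compression via afsctool, as suggested above.
# The target path is an example; adjust to taste.
target=/opt/local/share/doc
if command -v afsctool >/dev/null 2>&1; then
    # -c: compress the given file(s)/folder(s) in place
    afsctool -c "$target" && status=compressed || status=failed
else
    status=unavailable
fi
echo "afsctool: $status"
```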

Last edited 10 years ago by fracai (previous) (diff)

comment:28 Changed 10 years ago by RJVB (René Bertin)

Mostly because I simply wasn't aware it exists ... Also, how does that tool handle hard links (it can detect them, but then what)?

Last edited 10 years ago by RJVB (René Bertin) (previous) (diff)

comment:29 Changed 10 years ago by RJVB (René Bertin)

One observation: I compressed a ${destroot} folder like that before running sudo port install. That led to an error claiming that one of the files couldn't be removed from the temporary activation directory for lack of permissions. That was a read-only file, in case that makes any difference, and the same one that gave the xattr:com.apple.decmpfs error above.

comment:30 Changed 9 years ago by andre.david@…

Cc: andre.david@… added

Cc Me!

comment:31 Changed 7 years ago by eborisch (Eric A. Borisch)

Cc: eborisch added

comment:32 Changed 7 years ago by eborisch (Eric A. Borisch)

I've brought this up again on the mailing list, checking at runtime with a simple test for a bsdtar that recognizes --hfsCompression. (A modern libarchive's bsdtar does.) Patch will be attached presently.

Changed 7 years ago by eborisch (Eric A. Borisch)

Opportunistic bsdtar w/ --hfsCompression usage during activation.

comment:33 Changed 7 years ago by 1-61803

Cc: 1-61803 added

comment:34 Changed 7 years ago by eborisch (Eric A. Borisch)

I've created a pull request on GH.

comment:35 Changed 7 years ago by Eric A. Borisch <eborisch@…>

Resolution: fixed
Status: new → closed

In b833d34823a2f8c5c8b40a30efbc078e0c3de2dc/macports-base (master):

registry2.0: Apply HFS+ compression on extraction

HFS+ compression is applied if a bsdtar is available that
supports --hfsCompression.

Closes: #36560

comment:36 Changed 4 years ago by jmroot (Joshua Root)

Milestone: MacPorts 2.5.0