Opened 3 months ago

Last modified 2 months ago

#70556 new enhancement

"base" seemingly locking up for sometimes very long

Reported by: RJVB (René Bertin) Owned by:
Priority: Normal Milestone:
Component: base Version:
Keywords: Cc: barracuda156
Port:

Description (last modified by ryandesign (Ryan Carsten Schmidt))

I'm not the only one who has noticed or thought at times that the port command "hung" because apparently nothing was happening e.g. at the end of a command. I've also already had a registry corruption at least once in the past because I had interrupted (^C) a command that to the best of my recollection wasn't even one that should have touched the registry.

I know I'm also not the only one who has observed the "write-ahead log" registry.db-wal file getting larger than the db itself, and sometimes just persisting until for some unclear reason it disappears (which usually means one of the aforementioned "freezes").

I have a custom script written in Tcl that uses code from "base" to check a newly generated destroot tree for conflicts, new files, updated vs. touched files or files that will go missing compared to the currently installed version/variant of a port. This command routinely takes a comparatively long time to start up the 1st time after a port destroot command (curiously more noticeably on my Mac than on my significantly less powerful Linux set-up).

That all comes across as a bit sub-optimal:

  • long operations should be announced and if possible (optionally) provide a progress indicator
  • leaving db operations "in limbo" in the -wal file seems like a recipe for disaster (esp. if they're not just backups)

All in all it sounds like port is missing the equivalent of the fsck and/or vacuum maintenance commands provided by KDE PIM's akonadictl driver (KDE Mail stores emails, filters etc. in -by default an sqlite3- database).

Given how large the registry can become and how well it compresses (better than 2x with just LZ4 online compression on ZFS) it seems like an optimisation command would be welcome too.

I've already tried to use the maintenance/optimisation commands in sqlitebrowser but that utility warns about custom tables that made be abort the attempt.

On a related note: it wouldn't hurt to have an automatic backup mechanism that is removed after succesful and total completion of an operation except in case of a DB version update. I'm also not the only person who found out the hard way that you can't (always?) roll back from a MacPorts "base" update, and that DB updates can happen unexpectedly because of operator error e.g. after creating a secondary MacPorts install.

Change History (24)

comment:1 Changed 3 months ago by RJVB (René Bertin)

(My DB knowledge is just good enough to know that this is not an area where I should go around hacking in the code to come up with a proposition...!)

comment:2 Changed 3 months ago by barracuda156

Not too long ago with some earlier MacPorts version I had frequent “freeze-like” conditions when port command apparently did nothing and could not be quit normally: typically that happened when I ran sudo port upgrade for a port which version was not higher than installed one (i.e. -force was needed, which I forgot to add), but also after normal successful updates of some ports. Then the issue resolved, and I probably never had it, until 2.10.0. Today I had a similar freeze, and had to restart the machine.

While admittedly I have many ports installed on the Quad, so indexing is expected not to be instantaneous, it feels like some change after 2.9.3 made it worse. (This is a one-day experience, so take it FWIW. No strict measurements and repeated tests we done.)

comment:3 Changed 3 months ago by RJVB (René Bertin)

I can only compare to v2.7.1 (.55, i.e. master branch) I've been running until recently, and overall 2.10.0 is about 2x slower. Or, probably more exactly, *startup* is about 2x slower (as in simple commands like port work, port dir, port info etc.), to the point that I had to edit a number of my personal helper scripts that evoke multiple such commands and became just too slow.

That however has nothing to do with the original issue.

comment:4 Changed 3 months ago by RJVB (René Bertin)

I think this is about the only possible DB maintenance we can trigger with only routines from "base": a forced VACUUM.

It does seem to clean up lingering -wal files.

port-vacuum.tcl

#!/usr/bin/env port-tclsh

set t0 [clock clicks -millisec]
package require macports 1.0
package require registry 1.0
puts "Startup took [expr {([clock clicks -millisec]-$t0)/1000.}] sec"

array set ui_options        {}
array set global_options    {}
array set global_variations {}

set ui_options(ports_debug) yes
set ui_options(ports_verbose) yes

### init
puts -nonewline "Initialising MacPorts ..."
flush stdout
set t0 [clock clicks -millisec]
mportinit ui_options global_options global_variations
puts " done ([expr {([clock clicks -millisec]-$t0)}] ms)"

ui_msg "MacPorts version [macports::version], registry version [registry::metadata get version]"

set maybe ""
if {[catch {registry::set_needs_vacuum}]} {
     ui_warn "Your MacPorts version doesn't allow to program a registry VACUUM operation"
     set maybe "maybe "
} else {
     ui_msg "Called registry::set_needs_vacuum"
}

puts -nonewline "Closing registry and ${maybe}performing vacuum ..."
flush stdout
set t0 [clock clicks -millisec]
mportshutdown
puts " done ([expr {([clock clicks -millisec]-$t0)}] ms)"

(edited for the timing freaks like me)

Last edited 3 months ago by RJVB (René Bertin) (previous) (diff)

comment:5 Changed 3 months ago by ryandesign (Ryan Carsten Schmidt)

Description: modified (diff)

As of MacPorts 2.0.0, it does vacuum the registry; see [7d950c17e4be1c12ea13d487997806e5885e8f20/macports-base].

It used to do it after every operation but this was slow every time. As of MacPorts 2.9.0, it only vacuums if it's worth it, so now it's only slow at those times; see [c82a2699117181b1fedd490c62a3efffca7c8947/macports-base].

comment:6 Changed 3 months ago by RJVB (René Bertin)

I think my master-based install from a bit after 2.7.1 already had an internal trigger to determine whether a vacuum step was needed, but without feedback it's hard to say whether this was just "always". Either way the time spent "cleaning up" is variable even in that release, and vacuuming is probably not the only way one can optimise a DB. Have the settings (pragmas) been tuned for performance or for space-efficiency, for instance?

And isn't it a telltale sign that even a simple operation creates a -wal file that's as big as or even bigger than the db itself?

comment:7 in reply to:  5 Changed 3 months ago by barracuda156

Replying to ryandesign:

Ryan, something in 2.10.0 brought back the old bug from pre-2.9.x times. Now any force upgrade or uninstall leads to ridiculous time of freezing with doing nothing. It was the case before, but it was fixed (whether on purpose or accidentally), and I never had this issue until now. With 2.10.0 this is back, and reproducibly happens. Admittedly, I have a lot of ports installed, which may matter, but this bug was not present during 2.9.x versions (I do not remember re 2.8.x now).

comment:8 Changed 3 months ago by barracuda156

Cc: barracuda156 added

comment:9 Changed 3 months ago by jmroot (Joshua Root)

Cc: jmroot removed

comment:10 Changed 3 months ago by RJVB (René Bertin)

(apparently not a high-priority issue...)

Anyway, I see reg_vacuum() calculating the free space in MB by comparing to 1000*1000. Shouldn't that be 1024*1024 or are we sponsored by drive manufacturers? ;)

comment:11 Changed 3 months ago by RJVB (René Bertin)

FWIW, I just tweaked the code a bit so that registry::set_needs_vacuum also forces an actual vacuum step, and reg_vacuum traces what it's doing and how much free space there was to begin with.

The test install where I did this has a registry that's only 4MB (real MBs ;)) large, but the bit of my script above

puts "Closing registry and ${maybe}performing vacuum ..."
flush stdout
set t0 [clock clicks -millisec]
mportshutdown
puts "All done ([expr {([clock clicks -millisec]-$t0)}] ms)"

reports an execution time of about 240ms when actually doing the VACUUM (with 0 bytes free space) compared to only 2ms when the operation is skipped.

Is it conceivable that the registry ends up in a state where a single vacuum operation will never get all free space, or worse, where it's not possible to recuperate all free space? Supposing an sqlite DB file can be compared to a filesystem such situations can probably arise when the free space is fragmented enough.

Another thought: is there an architecture-related limit here, iow, could this have something to do with being on a 32bit machine?

comment:12 Changed 3 months ago by RJVB (René Bertin)

three more thoughts related to size:

  • sqlite3 is a relational database, and is currently used to store absolute paths. Couldn't that be optimised at least for the major common shared path, libdir, bindir etc. or *at least* prefix? Every access to the registry goes via one of the 2 binary extensions so if sqlite cannot already expand any variables, the appropriate extension can. Idem for mapping absolute paths to the way they'd be stored. So all the dependent Tcl code can remain oblivious of the change.
  • All those repeating shared path components must explain why I see a >2.3x compression with LZ4 (and >10x with xz -9!). I once modded a project that stored lots of code snippets via the filesystem to store them LZ4-compressed in an lmdb key/value database. LZ4 has very little compression overhead if you let it build up a dictionary so this change sped up the application considerably.
  • And a bit of a wild one: how about splitting the registry DB over two files, one for the inactive ports, one for the active ports? My guess is that the latter one will see the brunt of all operations, but should also remain about as compact as possible. Here too it might be possible to hide the design change completely in the C extensions.
Last edited 3 months ago by RJVB (René Bertin) (previous) (diff)

comment:13 Changed 3 months ago by RJVB (René Bertin)

In my Tcl scripts, I know put this before the call to mportinit:

set runner [thread::create -preserved [list thread::wait]]
thread::send $runner [list after 250 {puts stderr "Initialising MacPorts..."}] timerID

and a single line after that procedure terminate successfully:

thread::send -async $runner [list after cancel $timerID]

Later on I reuse the same thread to install a "patience timer" for mportshutdown.

I haven't looked if Tcl has repeating timers that can be set as easily but if so that could be a nice thing to have, making it print out something periodically to show that port is (presumably) not frozen.

comment:14 Changed 3 months ago by RJVB (René Bertin)

I see the DB page_size has been kept at the old 1024 bytes default value. It's 4x that nowadays and from what I understand it should be possible to bump the value for an existing database without negative effects. An idea and if so, what would be an optimum choice?

CF https://www.sqlite.org/pgszchng2016.html

Last edited 3 months ago by RJVB (René Bertin) (previous) (diff)

comment:15 Changed 3 months ago by barracuda156

I just uninstalled a port and tried not to kill tclsh8.6 immediately. It freezes for phenomenal amount of time. Perhaps already ~20 minutes and still stuck.

Process sample, FWIW:

Analysis of sampling tclsh8.6 (pid 15449) every 1 millisecond
Call graph:
    2225 Thread_3703
      2225 start
        2225 main
          2225 Tcl_MainEx
            2225 Tcl_FSEvalFileEx
              2225 TclEvalEx
                2225 TclNRRunCallbacks
                  2225 registry_close
                    2225 reg_vacuum
                      2225 sqlite3_step
                        2225 sqlite3VdbeExec
                          2225 sqlite3RunVacuum
                            2221 sqlite3_backup_step
                              2122 backupOnePage
                                2108 getPageNormal
                                  1274 pagerStress
                                    1272 pagerWalFrames
                                      1122 walFindFrame
                                        1090 walFindFrame
                                        32 walHashGet
                                          32 walHashGet
                                      146 walWriteOneFrame
                                        143 seekAndWriteFd
                                          143 pwrite$UNIX2003
                                            143 pwrite$UNIX2003
                                        3 walEncodeFrame
                                          2 walChecksumBytes
                                            2 walChecksumBytes
                                          1 sqlite3Put4byte
                                            1 sqlite3Put4byte
                                      3 pagerWalFrames
                                      1 bcmp
                                        1 bcmp
                                    1 pagerStress
                                    1 subjournalPageIfRequired
                                      1 subjournalPageIfRequired
                                  833 readDbPage
                                    801 walFindFrame
                                      766 walFindFrame
                                      35 walHashGet
                                        35 walHashGet
                                    29 unixRead
                                      28 seekAndRead
                                        28 pread$UNIX2003
                                          28 pread$UNIX2003
                                      1 unixRead
                                    2 readDbPage
                                    1 readDbPage
                                      1 unixRead
                                        1 unixRead
                                  1 pcache1FetchStage2
                                    1 pcache1RemoveFromHash
                                      1 pcache1RemoveFromHash
                                6 __memcpy
                                  6 __memcpy
                                3 backupOnePage
                                2 sqlite3PagerWrite
                                  2 sqlite3PagerWrite
                                1 __moddi3
                                  1 __moddi3
                                1 dyld_stub___moddi3
                                  1 dyld_stub___moddi3
                                1 memcpy
                                  1 memcpy
                              62 sqlite3_backup_step
                              28 getPageNormal
                                20 readDbPage
                                  19 unixRead
                                    19 seekAndRead
                                      19 pread$UNIX2003
                                        19 pread$UNIX2003
                                  1 readDbPage
                                7 pcache1FetchStage2
                                  6 pcache1Alloc
                                    4 sqlite3Malloc
                                      3 sqlite3MemMalloc
                                        1 dyld_stub_malloc_zone_malloc
                                          1 dyld_stub_malloc_zone_malloc
                                        1 malloc_zone_malloc
                                          1 malloc_zone_malloc
                                            1 szone_malloc
                                              1 szone_malloc
                                        1 pthread_mutex_unlock
                                          1 pthread_mutex_unlock
                                      1 pthread_mutex_lock
                                        1 pthread_mutex_lock
                                    1 pcache1Alloc
                                    1 pthread_mutex_lock
                                      1 pthread_mutex_lock
                                  1 pcache1FetchStage2
                                1 pcache1Fetch
                                  1 pcache1Fetch
                              6 pcache1Free
                                2 pcache1Free
                                2 sqlite3MemSize
                                  2 sqlite3MemSize
                                1 pthread_mutex_lock
                                  1 pthread_mutex_lock
                                1 szone_size
                                  1 szone_size
                              3 szone_free
                                3 szone_free
                            3 sqlite3PcacheRelease
                              3 sqlite3PcacheRelease
                            1 pcache1RemoveFromHash
                              1 pcache1RemoveFromHash

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
        walFindFrame        1856
        pwrite$UNIX2003        143
        walHashGet        67
        sqlite3_backup_step        62
        pread$UNIX2003        47
        __memcpy        6

comment:16 Changed 3 months ago by RJVB (René Bertin)

Do you have sqlitebrowser installed, or do you know how else to determine what your registry's pagesize is set to? (https://www.sqlite.org/pragma.html#pragma_page_size)

That, plus what's the architecture of the machine you're on and how fragmented is the drive holding the registry (if it's a spinning disk)?

Your registry is huge compared (5-10x mine) and if we're talking about one of your retro-computers it seems inevitable that DB maintenance is going to be slow for any of the reasons hinted at above.

This thread had me googling for sqlite3 optimisations, and the ones I found were

  • use a cheap transparent filesystem compression like LZ4
  • set pagesize to 64k
  • on ZFS, set the dataset recordsize to 64k too

I can confirm that the registry DB compresses awfully well, no wonder since it must be full of repeats of common paths, and this will of course improve I/O speeds maybe even on pre-nvme SSDs.

Transparent FS compression doesn't exist on Mac unless you install ZFS (www.o3x.org) but while it has become stable and fast I wouldn't advise it unless you keep very regular backups.

I think you should evaluate how much old luggage there is in your installation, and uninstall anything you really don't need anymore. If you make a backup of $prefix/var/macports/registry and the corresponding files under $prefix/var/macports/software you can always reinstall ports you shouldn't have uninstalled, with a bit of manual hackery. But, this cleanup will probably be a lengthy process: I have no idea how good MacPorts is in automatically uninstalling only the appropriate versions of dependencies and/or dependents of a given port version, while leaving all the other versions. Not very good I'm afraid, if it is any indication that port dependents foo will list every port you have installed that once depended on port:foo.

comment:17 in reply to:  16 Changed 2 months ago by barracuda156

Replying to RJVB:

Do you have sqlitebrowser installed, or do you know how else to determine what your registry's pagesize is set to? (https://www.sqlite.org/pragma.html#pragma_page_size)

I can try to check that.

BTW, once I tolerated that procedure once without killing it, it is back to normal now, like it was with 2.9.x (or at least it seems so, no more ridiculous freezing).

That, plus what's the architecture of the machine you're on and how fragmented is the drive holding the registry (if it's a spinning disk)?

It is an SSD, but with no too much space left (at the time of original post perhaps some 10 GB). I did some cleaning ever since, so that ccache and other stuff can use more disk space when needed.

Transparent FS compression doesn't exist on Mac unless you install ZFS (www.o3x.org) but while it has become stable and fast I wouldn't advise it unless you keep very regular backups.

I am not sure it is even supported in any build of 10.6 (a few of them has some ZFS-related components, not sure if FAT or x86-only) and won’t risk using it in any case.

I think you should evaluate how much old luggage there is in your installation, and uninstall anything you really don't need anymore. If you make a backup of $prefix/var/macports/registry and the corresponding files under $prefix/var/macports/software you can always reinstall ports you shouldn't have uninstalled, with a bit of manual hackery. But, this cleanup will probably be a lengthy process: I have no idea how good MacPorts is in automatically uninstalling only the appropriate versions of dependencies and/or dependents of a given port version, while leaving all the other versions. Not very good I'm afraid, if it is any indication that port dependents foo will list every port you have installed that once depended on port:foo.

Somewhat on a side note, if there a way to deactivate all ports and then activate back specifically those which were active prior to that? Some port conflict with each other, so activate installed will not work, and sorting that by hand gonna be annoying. A simple use case: I want to make a clean build of something, being sure that no irrelevant stuff gets linked in (build-bot-style build). But then I want my ports active back, because otherwise I do not know what gets updated.

comment:18 Changed 2 months ago by RJVB (René Bertin)

On the last question: port installed and active outputs what you'd expect it to, but you'd need to figure out how to use it to create a list you can pass to port activate. I'm not really certain how you'd got about deactivating all ports though, and reactivating everything *in the right order* might be even more tricky.

But IIRC there's a script to upgrade an installation after an OS upgrade which could be of help here.

comment:19 in reply to:  10 ; Changed 2 months ago by neverpanic (Clemens Lang)

Cc: neverpanic removed

On the topic of this ticket:

  • If you always cancel the vaccum because it takes long, you don't get to complain about registry operations being slow. Let it finish, that's what it's supposed to do.
  • If you're on old macOS versions, you're likely also using a very outdated copy of SQLite, since MacPorts uses the OS copy. Chances are a newer SQLite might be faster.

Replying to RJVB:

(apparently not a high-priority issue...)

It doesn't seem to be a common issue, so I'm not surprised. Just because it's slow for you doesn't make it an emergency for everybody else.

Replying to barracuda156:

Somewhat on a side note, if there a way to deactivate all ports and then activate back specifically those which were active prior to that? Some port conflict with each other, so activate installed will not work, and sorting that by hand gonna be annoying.

port snapshot and port restore, parts of the new automatic migration procedure, can probably do that.

Finally, a note on ticket hygiene: Please don't randomly Cc me to ticket you file because you believe I should do anything about your issue. I'm doing this in my free time, I have no obligation whatsoever to solve a problem you seem to have, nor do I provide any guarantees on MacPorts' performance on your particular machines.

comment:20 in reply to:  19 ; Changed 2 months ago by RJVB (René Bertin)

Replying to neverpanic:

  • If you're on old macOS versions, you're likely also using a very outdated copy of SQLite, since MacPorts uses the OS copy. Chances are a newer SQLite might be faster.

Note that @barracuda156 is at the current MP version with its own sqlite3 copy. While newer versions might be faster chances are that you'll only benefit optimally from those performance improvements with a new database.

Finally, a note on ticket hygiene: Please don't randomly Cc me to ticket you file because you believe I should do anything about your issue. I'm doing this in my free time, I have no obligation whatsoever to solve a problem you seem to have, nor do I provide any guarantees on MacPorts' performance on your particular machines.

There was none of that. You were CC'ed because the commit log suggests that you're among the few core devs who are familiar with the registry design, who could explain what kind of DB tweaks could be tried safely, what kind of customisation was performed on the registry database that "stock" sqlite3 doesn't want to modify it and how to work around that, etc.

I'm probably also one of the select very few who observe that the community "entraide" spirit has largely disappeared around here...

comment:21 in reply to:  20 ; Changed 2 months ago by neverpanic (Clemens Lang)

Replying to RJVB:

Replying to neverpanic:

Finally, a note on ticket hygiene: Please don't randomly Cc me to ticket you file because you believe I should do anything about your issue. I'm doing this in my free time, I have no obligation whatsoever to solve a problem you seem to have, nor do I provide any guarantees on MacPorts' performance on your particular machines.

There was none of that. You were CC'ed because the commit log suggests that you're among the few core devs who are familiar with the registry design, who could explain what kind of DB tweaks could be tried safely>

What makes you think I want to spend my time to explain what kind of DB tweaks could be tried safely? This seems to be an issue that doesn't affect our general user base, just a few corner cases on old machines. I do not want to spend my time remotely debugging or helping to debug those. If you haven't read it by now, my support policy is macOS (current-2;current), and it doesn't seem that that's the case here. And that's precisely why I don't want to be Cc'd.

what kind of customisation was performed on the registry database that "stock" sqlite3 doesn't want to modify it and how to work around that, etc.

See port notes macports.sqlext.

I'm probably also one of the select very few who observe that the community "entraide" spirit has largely disappeared around here...

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED.

I try to be helpful to people in my support policy. You're typically not one of them.

Those people outside of my support policy are welcome to open pull requests to solve their own issues. You've done that for some issues recently, and I fully intend to still review those. Outside of those, I largely ignore your emails, because they always seem to concern some corner case, are possibly related to a local modification you've made, or are on some non-standard system (Linux, old macOS, etc). It is absolutely not fun for me to look at your requests (just look at this one, 20 rather long comments with detail I really don't have the time to read and understand), so it's baffling to me why you seem to think you'd be entitled to "entraide" spirit here.

comment:22 in reply to:  21 Changed 2 months ago by RJVB (René Bertin)

Replying to neverpanic:

Apart from observing that I clearly have a less exclusive idea of the principle of entraide (which I may need to reconsider as far as MacPorts is concerned) I'm going to ignore this bunch of vitriol.

However, macports.sqlext seems to be outdated or otherwise ineffective with current sqlite versions or the instructions how to use it are incomplete. I have the version from the macports repo (master branch of a few weeks old) built and installed and just loading it changes nothing.

I will point out this though:

  • the brunt of the user base will probably accept slowness of certain operations as normal, prefer to create a fresh install of the few things they need after upgrading their OS rather than following complicated upgrading instructions ... or have moved to HB long since
  • I am not going to be convinced easily that anyone who did end up with a decades-old registry of several Gbs on a current OS version is going to be unaffected by any of the symptomes described above.
  • Until just now I wouldn't have believed that anyone with any amount of heart for the code s/he wrote would not have been incited to ask themselves if there were not some way to optimise the registry database for instance by determining an optimal pagesize.

From what I understand there is 1 (ONE) frickin parameter that can be tweaked and that is likely to have an impact on performance.

How much time could it take to indicate how to change that parameter safely?!

comment:23 Changed 2 months ago by RJVB (René Bertin)

Last edited 2 months ago by ryandesign (Ryan Carsten Schmidt) (previous) (diff)

comment:24 Changed 2 months ago by RJVB (René Bertin)

Just managed a >10x decrease in the time VACUUM takes combined with a slight reduction in file size (for a larger installation).

Note: See TracTickets for help on using tickets.