Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#61154 closed defect (fixed)

Error: port deactivate failed: this entry does not own the given file

Reported by: ryandesign (Ryan Carsten Schmidt) Owned by: jmroot (Joshua Root)
Priority: Normal Milestone: MacPorts 2.7.0
Component: base Version: 2.6.3
Keywords: Cc: jmroot (Joshua Root)
Port:

Description

After a power failure on the 10.11 buildbot worker, I cannot deactivate a port:

$ /opt/local/bin/port installed gettext
The following ports are currently installed:
  gettext @0.19.8.1_2 (active)
$ sudo /opt/local/bin/port -f deact gettext
--->  Deactivating gettext @0.19.8.1_2
Error: Failed to deactivate gettext: this entry does not own the given file
Error: See /opt/local/var/macports/logs/_opt_local_var_macports_registry_portfiles_gettext-0.19.8.1_2_5f96e69dd9bdd69fcd8bbcc0e1b726ec1f4f302db28dd0b7a28ef6326418e34c-4461/gettext/main.log for details.
Warning: Failed to execute portfile from registry for gettext @0.19.8.1_2
--->  Deactivating gettext @0.19.8.1_2
Error: port deactivate failed: this entry does not own the given file
$ /opt/local/bin/port installed gettext
The following ports are currently installed:
  gettext @0.19.8.1_2 (active)
$ 

Attachments (1)

main.log (161.1 KB) - added by ryandesign (Ryan Carsten Schmidt) 4 years ago.

Download all attachments as: .zip

Change History (13)

Changed 4 years ago by ryandesign (Ryan Carsten Schmidt)

Attachment: main.log added

comment:1 Changed 4 years ago by ryandesign (Ryan Carsten Schmidt)

To clarify, the power outage occurred while the worker was busy doing builds, possibly while it was modifying the registry.

Looking in src/cregistry/entry.c, reg_entry_deactivate prints "this entry does not own the given file" whenever the SQL statement UPDATE registry.files INDEXED BY file_actual SET active=0 WHERE actual_path=? AND id=? does not change any rows. It appears to assume the only reason why a row is not changed is an ownership problem. But in this case, I wonder if the problem is really that the row was already changed before the power went out.

Poking around in the registry with sqlite3, I see that most of gettext's files are marked as active, but some aren't.

sqlite> select * from ports where id=9837;
9837|gettext|5f96e69dd9bdd69fcd8bbcc0e1b726ec1f4f302db28dd0b7a28ef6326418e34c-4461|/opt/local/var/macports/software/gettext/gettext-0.19.8.1_2.darwin_15.x86_64.tbz2|2|0.19.8.1|2|||installed|1595235473|image|x86_64|0|darwin|15|libc++|0
sqlite> select count(*) from files where id=9837 and active=1;
1772
sqlite> select count(*) from files where id=9837 and active=0;
158

I suppose a fix would be for me to mark those inactive files as active; then port should be able to deactivate it.

My hope would be that MacPorts would perform sets of related database operations using transactions that either commit or rollback as a whole so that there is no possibility to leave the database in an inconsistent state. However, based on previous problems I've encountered and maybe this one as well, it's evident to me that MacPorts either is not doing that or is not doing it correctly.

comment:2 in reply to:  1 Changed 4 years ago by ryandesign (Ryan Carsten Schmidt)

Replying to ryandesign:

I suppose a fix would be for me to mark those inactive files as active; then port should be able to deactivate it.

Hmm, that did not fix it.

comment:3 Changed 4 years ago by ryandesign (Ryan Carsten Schmidt)

I improved the error message to say which file it's having trouble deactivating in the registry. With that change, sudo port deact gettext now shows:

Warning: Failed to open Portfile from registry for gettext @0.19.8.1_2
--->  Deactivating gettext @0.19.8.1_2
Error: port deactivate failed: /opt/local/share/doc/gettext/examples/hello-c++-kde/po/mt.po is not owned by this entry

sqlite3 confirms that it cannot find that file when searching for it using = (like port does):

$ sqlite3 registry.db
SQLite version 3.33.0 2020-08-14 13:23:32
Enter ".help" for usage hints.
sqlite> select * from files where actual_path = "/opt/local/share/doc/gettext/examples/hello-c++-kde/po/mt.po";

But oddly it can find it when searching using like:

sqlite> select * from files where actual_path like "/opt/local/share/doc/gettext/examples/hello-c++-kde/po/mt.po";
9837|/opt/local/share/doc/gettext/examples/hello-c++-kde/po/mt.po|/opt/local/share/doc/gettext/examples/hello-c++-kde/po/mt.po|1|0

And oddly this only affects this file, not other similar files:

sqlite> select * from files where actual_path = "/opt/local/share/doc/gettext/examples/hello-c++-kde/po/tr.po";
9837|/opt/local/share/doc/gettext/examples/hello-c++-kde/po/tr.po|/opt/local/share/doc/gettext/examples/hello-c++-kde/po/tr.po|1|0
sqlite> select * from files where actual_path = "/opt/local/share/doc/gettext/examples/hello-c++-kde/po/vi.po";
9837|/opt/local/share/doc/gettext/examples/hello-c++-kde/po/vi.po|/opt/local/share/doc/gettext/examples/hello-c++-kde/po/vi.po|1|0

comment:4 Changed 4 years ago by ryandesign (Ryan Carsten Schmidt)

Turns out there was a problem with an index and rebuilding the index fixed it:

sqlite> .load macports.sqlext
sqlite> pragma integrity_check;
row 907518 missing from index file_actual
row 907615 missing from index file_actual
row 907621 missing from index file_actual
row 907730 missing from index file_actual
row 907787 missing from index file_actual
row 907839 missing from index file_actual
row 907904 missing from index file_actual
row 907968 missing from index file_actual
row 908027 missing from index file_actual
row 908072 missing from index file_actual
row 908134 missing from index file_actual
row 908188 missing from index file_actual
row 908292 missing from index file_actual
row 908345 missing from index file_actual
row 908399 missing from index file_actual
sqlite> reindex file_actual;
sqlite> pragma integrity_check;
ok
sqlite> select * from files where actual_path = "/opt/local/share/doc/gettext/examples/hello-c++-kde/po/mt.po";
9837|/opt/local/share/doc/gettext/examples/hello-c++-kde/po/mt.po|/opt/local/share/doc/gettext/examples/hello-c++-kde/po/mt.po|1|0
sqlite> ^D
$ sudo /opt/local/bin/port deact gettext
--->  Deactivating gettext @0.19.8.1_2
--->  Cleaning gettext

But I'm not sure what we can do to prevent corrupt indexes from occurring in the future.

comment:5 Changed 4 years ago by jmroot (Joshua Root)

I wonder if the VM's virtual disk properly supports the mechanisms that sqlite uses to guarantee atomicity, like the F_BARRIERFSYNC and F_FULLFSYNC fcntls. Those operations would have to be done on the underlying real hardware as well to actually work.

comment:6 Changed 4 years ago by ryandesign (Ryan Carsten Schmidt)

VMware documentation states that it does guarantee atomicity of virtual disk operations, however we have encountered previous corruptions of the registry SQLite database that suggest that this may not be true.

https://www.mail-archive.com/sqlite-users@mailinglists.sqlite.org/msg113012.html

comment:7 Changed 4 years ago by jmroot (Joshua Root)

The link to the VMware docs seems to have broken, but from your quote in that thread, it sounds like fsync works correctly in a VM. It's not clear whether the fcntls do. F_FULLFSYNC goes beyond waiting for the drive to acknowledge the write and tells it to actually flush the write buffer and ensure everything is on the platter before saying it's done.

We could try using WAL when available, and only checkpoint when closing the registry. That should at least reduce the window in which corruption can occur if the system shuts down unexpectedly.

comment:8 Changed 4 years ago by ryandesign (Ryan Carsten Schmidt)

In 9e45d667ffad518813bdaad9bb26c357e7f038df/macports-base (master):

Improve deactivate/unmap failure error message

When the registry fails to deactivate or unmap a file, say which file it
is.

See: #61154

comment:9 Changed 4 years ago by neverpanic (Clemens Lang)

Yes, we should probably switch to WAL – we keep having various problems with the database as reported on the list, and not only when people are running MacPorts in virtual machines. Either crash recovery of SQLite on macOS isn't as good as upstream thinks it is, or that's because we're not really using WAL mode, which everybody else nowadays seems to do.

Adding a pragma when initially connecting to the database should do the trick, but not that we support versions of SQLite that do not have WAL mode.

I'm pretty sure we're already running all queries in a transaction, btw. Speed would be abysmal if we didn't, and the last time we had issues with that I went through all queries that modify the database and ensured that those are in transactions.

comment:10 Changed 4 years ago by jmroot (Joshua Root)

Owner: set to jmroot
Resolution: fixed
Status: newclosed

In 5779dbc66da9acb3e77df1ef1996772a925dd928/macports-base (master):

Enable fullfsync and WAL (if possible) for registry

Using fullfsync should make the registry much less likely to experience
corruption in scenarios like writing to it when power fails.
Write-ahead logging (WAL) should reduce the window in which syncing
needs to occur, hopefully improving reliability further as well as
improving performance. However, WAL is only practical for us to use
with SQLite versions 3.22.0 and newer, because older versions didn't
allow read-only access to the database, which would mean commands like
'port installed' run with normal privileges would not work with a
root-owned installation.

Closes: #61154

comment:11 Changed 4 years ago by jmroot (Joshua Root)

Milestone: MacPorts 2.7.0

comment:12 Changed 4 years ago by ryandesign (Ryan Carsten Schmidt)

There was also the follow-up commit [81d6ca25476e212cfc5de409b9ce7166b56087dd/macports-base].

Note: See TracTickets for help on using tickets.