
in On-Prem

I recently (backed up and) deleted a fairly large database in production and had gotten partway through reclaiming the space in Postgres with `bin/datomic -Ddatomic.gcStoragePaceMsec=5 gc-deleted-dbs [...]` when I decided to speed it up by removing the pacing, so I stopped it with Ctrl-C. The docs explicitly say that "You can kill a gc-deleted-dbs process and restart it later with no adverse effects," and I had done exactly that once before, also to adjust the pacing.

Unfortunately, on restart it no longer recognizes any need to clean up garbage segments, the third stage (after log and index segments). Adding a new temporary database, filling it with a bunch of junk transactions, and deleting it successfully goes through the whole process, but for that DB only; it does not trigger any additional garbage-segment cleanup. (If I don't add the junk transactions, it skips that stage entirely.)

1 Answer


@TuggyNE

What is your evidence that the job has not started again? Is it simply that you are not seeing work being done? You say it "no longer recognizes any need to clean up garbage segments, the third stage"; what is the output when you run it again? Is there an error, or does it complete?

Thanks for getting back to me! There's no error; it just finishes quickly. Specifically, if I run it by itself it says "GC deleted dbs: no deleted dbs found." If I add a new deleted DB as mentioned, it will delete that DB's log, index, and potentially even garbage segments (but only if there were a lot of changes that got obsoleted). It will not delete any additional segments: running the same temporary-DB creation code and then `gc-deleted-dbs` shows the same number of log, index, and garbage segments each time, which proves it is not, e.g., taking care of leftover garbage segments. (Creating an empty temporary database and deleting it omits the line about garbage segments entirely.)
@TuggyNE

Can you create a DB with the same name as the one you deleted, delete it, and then run gc-deleted-dbs?

Also, do you have any external validation, or a notion of how much space you expect to reclaim from storage by collecting this deleted DB? One thought: you know the disk size of your underlying storage, and you can determine the size of the other databases on that storage either by guesstimating (50 KB * index segments) or by looking at the sizes of Datomic-level backups of those databases (keeping in mind that this does not account for normally accruing storage-level garbage, which is collected by gc-storage).
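The guesstimate above is just arithmetic, but it is easy to get the unit conversion wrong. A minimal sketch (the 50 KB-per-index-segment figure is the guesstimate from this comment, and the segment count in the example is a made-up illustrative number, not a measured value):

```python
# Back-of-envelope size estimate for a Datomic database, using the
# ~50 KB-per-index-segment guesstimate from the comment above.
SEGMENT_SIZE_KB = 50  # guesstimated average size of one index segment

def estimated_db_size_gib(index_segments: int) -> float:
    """Estimate on-disk size of a database from its index segment count."""
    return index_segments * SEGMENT_SIZE_KB / (1024 * 1024)  # KB -> GiB

# A hypothetical database with ~210,000 index segments would be roughly 10 GiB:
print(round(estimated_db_size_gib(210_000), 1))  # → 10.0
```

Comparing that estimate against the actual disk usage of the storage (minus the other databases) gives a rough independent check of how much garbage should still remain.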

Basically, do you have a way to independently confirm that you did not already collect most or all of the deleted DB?
Yeah, it was the single largest DB in that storage, and would probably have been at least 10 GiB, but it only freed up about 1 GiB.

Creating a new DB with the same name has the problem that Datomic seems to append a GUID after the semantic name, and the two GUIDs are clearly different. The original log shows "Deleting storage for iops3-log-3c2f65cc-037d-4697-b856-5394ef53f482" and the latest one shows "Deleting storage for iops3-log-793d9415-200b-46bb-99ac-c2975248d1df". (These were created using the same database URI.) So it behaves like any other new database and only clears out its own new garbage, and I have no idea how to specify the GUID.
@TuggyNE

What is your underlying storage?  

I have attempted to reproduce this and talked with dev about the stages of this script, and we do not believe there is any issue with gc-deleted-dbs leaving partial work. My best guess at this point, without a reproduction, is that the job worked and marked all segments for garbage collection, but there are still some steps on the underlying storage needed to finish reclaiming space (i.e. VACUUM on Postgres).
...