The Unclear Impact

There are lots of peculiar design decisions in the APT 3.0 solver design I came up with, and I'm curious to see if that actually works.

Upgrading installed packages, what does it even mean? Basically treat it the same as a depends on either version of the package.

So while still technically a SAT solver, it can only solve clauses of the forms A->B|...|Z and A->!B.

Otherwise it is unit propagation, then pick a free variable* and try it, and back track as needed.

* Which as you know, we don't pick randomly, our work queue is not on the variables, but the clauses. So we pick the smallest clause and then try the choices in order (augmented, if any choice is recommended, they are tried first).

So that's an accidental algorith equivalent to DPLL.

@juliank A->B|...|Z looks suspiciously like a dual-Horn clause !A | B | ... | Z (although I don’t know what to make of A->!B, other than just !A | !B). but it’s defnitely no accident that your algorithm is pretty much DPLL, given that your input is already basically in CNF blobfoxscience

maybe this is a good time for CDCL?


@kristof Yes, package dependencies are CNF :D

I think this will morph somewhat into CDCL. The data structures are highly original and unorthodox:

It's top-down reinvention without reading much literature, I think it will have very annoying issues because it pretty much doesn't have a global view of all the clauses.

It essentially only knows clauses that are not yet satisfied, and introduces further clauses as it traverses the tree further.

@kristof Essentially it traverses a graph rather than try to satisfy a formula.

So it only knows the edges it has seen so far.

I expect it will backtrack more than necessary and perform badly for general purpose SAT solving.

But we also rely on the problem being reasonably defined by humans to work with the algorithm.

@juliank ah, materializing further clauses sound reasonable, especially if solutions have a fairly regular structure. i guess your alternative would be to pull in clauses for all transitive dependencies, which sounds a lot more painful

a tricky thing is how to represent useful lemmas (CDCL-style) when some variables from the learned clause might not even be in the original problem (pulled in by a transitive dependency in your case).

the very least, i’m having a lot of trouble with that on the first-order model generator that we’re working on (i can’t even pull in all clauses eagerly in the beginning, because i don’t know the model size in advance and a FOL quantifier could match any model element)

2**32 levels of backtracking.

holy (super) cow! is that a state space of 2**(2**32)?

@kristof I don't have much idea what I'm doing really :D

So the thing with the 2**32 is that that's probably outrageous but that's the integer size we use to store references (and the IDs of the literals [package version objects]) and depth.

Actually I think I can reduce the depth to 16 bit and save some space, but need to actually write the code first.

I think it's pretty theoretic.

We know all available literals, because a literal is just the ID of the version object in the cache.

@kristof I think when we reach a conflict we have visited all the variables we need to know about, e.g.


(B1->C1, etc)

We will have visited e.g, A-B1-C1-!A, and can record something that I think looks like, A->!C1|!B1, in other words !A|!C1|!B1.

I may have messed up the negation somewhere.

@kristof So I _think_ that works and my issue is understanding non-chronological backtracking :D

@juliank the tricky thing is what would happen if you backtrack and the clauses that were in conflict get removed (if you remove versions of packages you decide not necessarily to install on backtracking).

in this case the conflict cause would become useful once you decide to install the same package at some other branch of the search when the variables and clauses get added back again. until then, some variables would be “dangling” (appear in the lemma, but not in the problem proper)

but i guess this is not actually a problem if variables can be uniquely identified (so it is apparent when they are re-added to the problem) and you never branch on “dangling” variables blobfoxthinkgoogly

SAT solvers are weird, man

@kristof I think the learnt clauses are kept in a separate set and we only check choices we make against them, but we're not driven by them essentially.

Ignoring optional dependencies and optimisations, we maintain a set of needed versions and rejected versions.

To start the solver we iterate over the needs set, look at the corresponding APT objects, mark all packages we conflict with as rejected, and then add a work queue for dependencies, essentially a list of lists of ordered choices.

@kristof Then we process that queue of unsolved dependencies, rather than unsolved literals.

If we make a choice, we bump the depth counter.

If we reach a conflict, we mark the conflict,decrease the counter, remove all the propagations from the choice, and then reconstruct the work queue from the needed set.

That's slow and lazy. There's technically I think no need to reconstruct the queue because we could just prepend our new entries for the current branch.

Don't know if that makes sense.

@kristof I really need to write down the latex with all the ideas in a halfway sensible math notation.

I started doing that for about an hour in a pastebin but then it reloaded and I lost it, lol.

Lots of: Let $\phi_{ij}$ be the work queue at depth i, step j and stuff like that.

@kristof I started writing but it's missing bits and a counter for the branching or something.