The Unclear Impact

data structures, long

@juliank at uni, we’re developing something similar (in java, however), for SAT solving actually

in a previous version of the system, we were basically using journaling. that makes backtracking relatively easy, but jumping across different branches of the derivation tree (e.g., some heuristic tells you to abandon whatever you were doing, try something else, but if that doesn’t yield a solution quickly, go back to wherever you were in the search space) become inefficient, because you always have to first go back to the common ancestor, and then drill down again

currently, we’re experimenting with persistent data structures. HAMTs seem to work nicely as a big collection of labeled tuples (~graph edges). we’re following this paper: but implemented it from scratch with our own optimizations

one particular optimization that seems useful is to have immutable and mutable tree nodes, and “check in” a new revision of a collection by copying each mutable node into a packed immutable one, which can handle batch updates more efficiently (immutable -> mutable when performing an update is just CoW)

of course, a persistent set of tuples is barely a graph, so we also have to maintain indexes for adjacency. the cool thing is that it’s very fast to determine the delta between two persistent data structures (just iterate over the two trees, skipping over immutable nodes that are referentially identical): when backtracking, you can calculate a delta (usually small), and update (mutable) indexes accordingly. we’re using to incrementally maintain indexes and graph queries, but would be another alternative

another idea would be to use another persistent data structure, like a ROMDD, radix tree, or even B-tree to store the indexes. but that might lead to pathological behavior on some inputs (that a randomly selected hash function in HAMT can avoid)

w.r.t. doing this in native code, I think you could get away with just releasing all mutable nodes when “checking in” and never releasing immutable ones (since “checking in” can control the number of stored versions, hopefully, there won’t be extremely many). otherwise, I’d try adding a refcount to mutable nodes, but I have no idea about its efficiency


@juliank we’re trying to generate large(-ish) graphs which satisfy some first-order constraints (useful as test inputs or design space exploration), so we have the benefit that our graph starts small and grows big as the solver runs – it seems you’re having to deal with the opposite blobfoxupsidedown

but I’ll try to keep this use-case in mind – the very least, now I know that I could make nice graph-processing benchmarks from apt install :)

clause learning seems quite hard in this space, because the nodes of the graph ‘shift’ around as you are merging them (we’re doing the opposite by splitting), so there is no easy way to generalize the results of conflict analysis into a lemma…

@juliank the parallel exploration on different parts of the graph sounds a bit weird for me at first. in the end, you’d need to integrate the results from multiple threads, where the nodes could have shifted around in different ways.

a simpler approach could be to explore different parts of the search space in parallel: say, pick an initial branching point, and explore the possible outcomes (selected node in the or-group in your case) in separate threads. however, this may need more memory to keep track of each candidate solution on each threads (but persistent data structures with structural sharing can reduce the overhead considerably)

@juliank btw (I am not very familiar with apt, so pardon me if this is nonsense), how large/difficult are these dependency resolution problems that you need a custom decision procedure for them? naively, directly encoding everything into SAT seems possible

(OTOH p2 in Eclipse does just that, and is usually dreadfully slow)

@juliank @juliank ah, thanks! very interesting

I never realized apt was allowed reshuffle already (automatically) installed packages, but in that case, stability sounds useful indeed blobfoxcomfy