This is the second post related to the two bus scheduling solver formulations that can be found in the OR-Tools source code. Last time I took a deep dive into the program bus_driver_scheduling_sat. At the end, I was left wondering why whoever wrote the example did not avail themselves of the CP-SAT circuit constraint. In this post I take a look at whether using the circuit constraint will speed things up at all.

Hacking to add in a circuit constraint call

In my last post, I worked my way through the main solver function noting all the places where it looked like the standard approach to setting up a circuit constraint. And then I hit this block of code:

# Create circuit constraint.
model.add_exactly_one(outgoing_source_literals)
for s in range(num_shifts):
    model.add_exactly_one(outgoing_literals[s])
    model.add_exactly_one(incoming_literals[s])
model.add_exactly_one(incoming_sink_literals)

Interestingly, the comment says that it is creating a circuit constraint, but the API function model.add_circuit(arcs) is not used. So that’s my starting point—populate the arcs and be able to call add_circuit. I am going to hack directly on a copy of the original program, keep all the logic, and just do the minimum to get this API call in there.

So first, what is an arc? According to the handy file cp_model.py, the list of arcs for the circuit constraint is defined as follows:

   arcs: a list of arcs. An arc is a tuple (source_node, destination_node,
     literal). The arc is selected in the circuit if the literal is true.
     Both source_node and destination_node must be integers between 0 and the
     number of nodes - 1.

There is a bit more to the definition found in the C++ header file cp_model.h:

   * The circuit constraint is defined on a graph where the arc presence is
   * controlled by literals. That is the arc is part of the circuit of its
   * corresponding literal is assigned to true.
   *
   * For now, we ignore node indices with no incident arc. All the other nodes
   * must have exactly one incoming and one outgoing selected arc (i.e. literal
   * at true). All the selected arcs that are not self-loops must form a single
   * circuit.

It took me a long time to parse the language here, and honestly every time I go back to using the circuit constraint I have to reread this a few times until I remember. Each node can have any number of incoming and outgoing arcs when setting things up. However, all “nodes must have exactly one incoming and one outgoing selected arc.” So it is perfectly fine even to have multiple arcs between two nodes i and j, just so long as in the end exactly one is selected as active.

So back to the code. An individual arc is a tuple with three elements. The first is the integer index for the tail node of the arc (the source), the second is the integer index for the head of the arc (the sink), and the third is the boolean variable that is true if the arc is active and used, false if not. So what we need is a container for the arcs:

arcs: list[tuple[int, int, cp_model.LiteralT]] = []

Again quoting from cp_model.py there is a type defined for an arc:

ArcT = tuple[IntegralT, IntegralT, LiteralT]

So I can simplify that type hint to just use ArcT:

arcs: list[cp_model.ArcT] = []

So where can I slot in this container? Well, the code creates one complete circuit constraint for each driver, so it seems I should put it inside the loop over each driver, right after the lists for the literals.

for d in range(num_drivers):
    ...
    incoming_literals = collections.defaultdict(list)
    outgoing_literals = collections.defaultdict(list)
    outgoing_source_literals = []
    incoming_sink_literals = []
    arcs: list[cp_model.ArcT] = []

Poking around other examples in the source code, I found a handy little function in an example called music_playlist_sat.py that I’m going to steal here:

def AddArc(i, j):
    literals[(i, j)] = model.new_bool_var(f"lit_{i}_{j}")
    arcs.append((i, j, literals[(i, j)]))

It relies on global variables which I’m not going to use, so I’ll pass in the arc container instead, and drop the literals dict because I don’t think I need it.

def AddArc(model: cp_model.CpModel,
           arcs: list[cp_model.ArcT],
           i: int,
           j: int) -> cp_model.LiteralT:
    """Add an arc to the list of arcs from i to j, append the tuple to the list
    of arcs, and return the new BoolVar we just created."""
    lit: cp_model.LiteralT = model.new_bool_var(f"lit_{i}_{j}")
    arcs.append((i, j, lit))
    return lit

Another quirk of the circuit constraint is that I’ve found it is best to stick with positive integers as indices, and reserve 0 for the source/sink node. Since the existing code uses zero-based indexing for the shifts, I’m going to add one to each when calling my AddArc function.

So the loop over shifts, starting about line 1847 in the original code, currently looks like this:

for s in range(num_shifts):
    shift = shifts[s]
    duration = shift[5]

    # Arc from source to shift.
    #    - set the start time of the driver
    #    - increase driving time and driving time since break
    source_lit = model.new_bool_var("%i from source to %i" % (d, s))

I want to use my function to create the literals, so I’m modifying that to look like this:

for s in range(num_shifts):
    shift = shifts[s]
    duration = shift[5]

    # Arc from source to shift.
    #    - set the start time of the driver
    #    - increase driving time and driving time since break
    source_lit = AddArc(model, arcs, 0, s+1)

That way all the rest of the code can stay the same.

Similar modifications are needed for the sink link and the skip-the-node self-loop.

sink_lit = AddArc(model, arcs, s+1, 0)
...
# Node not performed
skipped_by_d_lit = AddArc(model, arcs, s+1, s+1)
# hard equivalence between ~skipped_lit and performed[d,s]
model.add(performed[d, s] == ~skipped_by_d_lit)
...
for o in range(num_shifts):
    other = shifts[o]
    delay = other[3] - shift[4]
    if delay < min_delay_between_shifts:
        continue
    lit = AddArc(model, arcs, s+1, o+1)

The only oddness is the extra code for the performed[d, s] lit. Since that lit is already defined, I have to link it to the arc literal skipped_by_d_lit. The self loop from the node to itself is used only when the node is skipped. Therefore there is a hard equivalence between performed(d, s) and ~skipped_by_d_lit. If the node is skipped, it is not performed; if it is performed, it is not skipped.

I also added one more thing to make sure that the circuit constraint tracks drivers that are not used.

if minimize_drivers:
    # Driver is not working.
    # working = model.new_bool_var("working_%i" % d)
    working = ~AddArc(model, arcs, 0, 0)

If the driver is not working, then there is a self-loop from the source back to the source, and every node for the driver’s circuit also has self-loops. Without this escape valve, the solver will be forced to assign at least one shift to each driver.

With these changes, the solver still runs and gets similar answers, and takes roughly the same time. With the circuit constraint call, the tiny instance consistently takes a bit over 2s, and the small problem instance takes around 45s, but sometimes is a fast as 35s. The original code is consistently just under 2s for the tiny problem, and maybe a bit under 45s on average for the small problem, with the fastest run coming it at 29s. The timings are variable, of course due to the randomness of the CP-SAT solver, but it feels like the version with the circuit constraint is consistently a bit slower.

Remove the previous “circuit” constraints

In the original code, there were a number of constraints added to create the circuit constraint. What I want to do now is to remove those, and see if the same behavior can be obtained through the circuit constraint alone. Mostly I’m doing this to make sure that I have set up the circuit constraint properly. So the following lines were commented out:

# # Create circuit constraint.
# model.add_exactly_one(outgoing_source_literals)
# for s in range(num_shifts):
#     model.add_exactly_one(outgoing_literals[s])
#     model.add_exactly_one(incoming_literals[s])
# model.add_exactly_one(incoming_sink_literals)

model.add_circuit(arcs)

Without those lines, the final result for the small model is 8 drivers and a second pass optimal objective function of 1902, which is the same as the original program.

Round two, slightly more invasive change

At this point, I have proven to myself that the circuit constraint works, and the results are not broken. So now it is time to put on my hacker hat (the one with the propeller) and start trying to break things.

There is nothing that requires having separate circuit constraints for each driver. There can be any number of arcs between two nodes, just as long as only one is active. So my next hack is to collect all the arcs from all the passes over the drivers, and use all them in one big circuit constraint. The changes are as follows. First, I moved the arcs list outside of the driver loop, as well as the outgoing and incoming source and sink literals lists:

outgoing_source_literals = []
incoming_sink_literals = []
arcs: list[cp_model.ArcT] = []
for d in range(num_drivers):
    ...

I also removed all of the shared list of literals, shared_incoming_literals, shared_outgoing_literals, and shared_outgoing_source_literals. There is no need for them now because I have one big circuit constraint spanning all the drivers.

Then, inside the drivers loop everything is more or less the same:

Before:

for s in range(num_shifts):
    shift = shifts[s]
    duration = shift[5]

    # Arc from source to shift.
    #    - set the start time of the driver
    #    - increase driving time and driving time since break
    source_lit = AddArc(model, arcs, 0, s+1)
    shared_outgoing_source_literals.append(source_lit)
    outgoing_source_literals.append(source_lit)
    incoming_literals[s].append(source_lit)
    shared_incoming_literals[s].append(source_lit)
    model.add(start_times[d] == shift[3] - setup_time).only_enforce_if(
        source_lit
    )
    model.add(total_driving[d, s] == duration).only_enforce_if(source_lit)
    model.add(no_break_driving[d, s] == duration).only_enforce_if(source_lit)
    starting_shifts[d, s] = source_lit

And after:

for s in range(num_shifts):
    shift = shifts[s]
    duration = shift[5]

    # Arc from source to shift.
    #    - set the start time of the driver
    #    - increase driving time and driving time since break
    source_lit = AddArc(model, arcs, 0, s+1)
    outgoing_source_literals.append(source_lit)
    model.add_implication(source_lit, performed[d, s])
    model.add(start_times[d] == shift[3] - setup_time).only_enforce_if(
        source_lit
    )
    model.add(total_driving[d, s] == duration).only_enforce_if(source_lit)
    model.add(no_break_driving[d, s] == duration).only_enforce_if(source_lit)
    starting_shifts[d, s] = source_lit

There are a few differences. In the new version, I have fewer lists of literals as they are all stored in arcs. Also, in the new version I added an implication between the source literal and the performed literal:

    model.add_implication(source_lit, performed[d, s])

The list of which nodes are performed by which drivers exists somewhat outside of the logic of the circuit constraint, so I do my best to make sure the logic links the arc literals with the performed literals wherever possible.

Similar changes are made for the exit arc (from the shift to the sink). The final difference in this “outer loop” logic is that I removed the ability to drop nodes. The following code was before:

# Node not performed
#    - set both driving times to 0
#    - add a looping arc on the node
skipped_by_d_lit = AddArc(model, arcs, s+1, s+1)
# hard equivalence between ~skipped_lit and performed[d,s]
model.add(performed[d, s] == ~skipped_by_d_lit)
model.add(total_driving[d, s] == 0).only_enforce_if(~performed[d, s])
model.add(no_break_driving[d, s] == 0).only_enforce_if(~performed[d, s])
incoming_literals[s].append(~performed[d, s])
outgoing_literals[s].append(~performed[d, s])

And it has now been changed to this:

# Node not performed
#    - set both driving times to 0
#    - add a looping arc on the node
# skipped_by_d_lit = AddArc(model, arcs, s+1, s+1)
# hard equivalence between ~skipped_lit and performed[d,s]
# model.add(performed[d, s] == ~skipped_by_d_lit)
model.add(total_driving[d, s] == 0).only_enforce_if(~performed[d, s])
model.add(no_break_driving[d, s] == 0).only_enforce_if(~performed[d, s])

Next we get to the inner loop, handling moves from shift s to some other shift o. The before code looks like:

for s in range(num_shifts):
    shift = shifts[s]
    ...
    for o in range(num_shifts):
        other = shifts[o]
        delay = other[3] - shift[4]
        if delay < min_delay_between_shifts:
            continue
        lit = AddArc(model, arcs, s+1, o+1)
        # Cost part
        delay_literals.append(lit)
        delay_weights.append(delay)
        # Increase driving time
        model.add(
            total_driving[d, o] == total_driving[d, s] + other[5]
        ).only_enforce_if(lit)
        # Increase no_break_driving or reset it to 0 depending on the delay
        if delay >= min_pause_after_4h:
            model.add(no_break_driving[d, o] == other[5]).only_enforce_if(lit)
        else:
            model.add(
                no_break_driving[d, o] == no_break_driving[d, s] + other[5]
            ).only_enforce_if(lit)
        # add arc
        outgoing_literals[s].append(lit)
        shared_outgoing_literals[s].append(lit)
        incoming_literals[o].append(lit)
        shared_incoming_literals[o].append(lit)

That has now been changed to this:

for s in range(num_shifts):
    shift = shifts[s]
    ...
    for o in range(num_shifts):
        other = shifts[o]
        delay = other[3] - shift[4]
        if delay < min_delay_between_shifts:
            continue
        # add arc
        lit = AddArc(model, arcs, s+1, o+1)
        # cannot go from s to o if either is unperformed
        model.add_implication(~performed[d, s], ~lit)
        model.add_implication(~performed[d, o], ~lit)
        # if the lit is true, then both must be performed
        model.add_implication(lit, performed[d, s])
        model.add_implication(lit, performed[d, o])
        # Cost part
        delay_literals.append(lit)
        delay_weights.append(delay)
        # Increase driving time
        model.add(
            total_driving[d, o] == total_driving[d, s] + other[5]
        ).only_enforce_if(lit)
        # Increase no_break_driving or reset it to 0 depending on the delay
        if delay >= min_pause_after_4h:
            model.add(no_break_driving[d, o] == other[5]).only_enforce_if(lit)
        else:
            model.add(
                no_break_driving[d, o] == no_break_driving[d, s] + other[5]
            ).only_enforce_if(lit)

There are two main differences. I removed stashing the arc lit in the global lists for shared outgoing and shared incoming literals. And in the new code, I added new logic linking up the arc lit with the variables for performed[d, s] annd performed[d, o]. Before I could just link performed directly with whether or not the node was skipped by linking it to the node’s self-arc. I don’t have that connection anymore, as none of the nodes are allowed to be skipped in this new single circuit constraint. So in order to link whether or not a driver performs a node, I have to add as much logic as possible connecting states.

Note that the performed variables are for the nodes, not the arcs. Therefore, it is wrong to say that if both performed[d, s] and performed[d, o] are true, that the arc lit from s to o is true. Just because the driver performs both nodes, it does not mean that s is the immediate predecessor to o. So the logic rules linking performed and the arc lit are somewhat roundabout.

# cannot go from s to o if either is unperformed
model.add_implication(~performed[d, s], ~lit)
model.add_implication(~performed[d, o], ~lit)
# if the lit is true, then both must be performed
model.add_implication(lit, performed[d, s])
model.add_implication(lit, performed[d, o])

Not performing s of course means the link from s to o is impossible. Ditto not performing o. I also know that if the lit is true, the both of the nodes must have been performed by d. But if the lit is false, I can’t say anything about the nodes being performed or not, and if the performed variables are true, I can’t say anything about the arc’s true or false state.

Finally, I move the call to model.add_circuit outside of the loop. Before I had:

    else:
        # Working time constraints
        model.add(working_times[d] >= min_working_time)

    # # Create circuit constraint.
    # model.add_exactly_one(outgoing_source_literals)
    # for s in range(num_shifts):
    #     model.add_exactly_one(outgoing_literals[s])
    #     model.add_exactly_one(incoming_literals[s])
    # model.add_exactly_one(incoming_sink_literals)

    model.add_circuit(arcs)
# Each shift is covered.
for s in range(num_shifts):

Now I have this version. Note the significant white space difference in the indentation of model.add_circuit:

    else:
        # Working time constraints
        model.add(working_times[d] >= min_working_time)

    # # Create circuit constraint.
    # model.add_exactly_one(outgoing_source_literals)
    # for s in range(num_shifts):
    #     model.add_exactly_one(outgoing_literals[s])
    #     model.add_exactly_one(incoming_literals[s])
    # model.add_exactly_one(incoming_sink_literals)

model.add_circuit(arcs)
# Each shift is covered above, but this next makes sure that performed is linked
for s in range(num_shifts):

But that doesn’t work. The solver is subbornly infeasible.

Starting presolve at 0.00s
INFEASIBLE: 'var #537 as empty domain after intersecting with [0]'
Unsat after presolving constraint #12096 (warning, dump might be inconsistent): circuit { tails: 0 tails: 1 tails: 1 tails: 1 tails: 1 tails: 1 tails: 1 tails: 1 tails: 1 tails: 1 tails: 1 tails: 1 tails: 1 tails: ...

What is happening is that the circuit constraint needs just one outgoing link from the source, which means there can be at most one path through the nodes. Because I need multiple paths (one per driver), the straight-up circuit constraint fails. My guess is that the logic has assigned one driver and the first outgoing arc from the source, but then it comes to the second driver and can’t do anything because all other arcs out of the source node must be false if the first one is true. Either that, or it is impossible to serve all nodes with a single circuit, and so it runs into an empty case trying to get to a node that must be served. Looking at the model dump, variable number 537 is:

variables { name: "lit_0_2" domain: [0, 1] }

Which means the failure is the first guess: it can’t get from source to node 2, because source to node 1 is already true and so all other outgoing links from source have been set to false.

Lucky for us, the programmers over at Google have already thought of this, and have added an expanded version of the circuit constraint! This version is called the multiple circuit constraint. It is just like the regular circuit constraint, except that it allows multiple outgoing arcs from the source node only.

The multiple circuit constraint is defined as follows in cp_model.py:

The direct graph where arc #i (from tails[i] to head[i]) is present iff
literals[i] is true must satisfy this set of properties:
- #incoming arcs == 1 except for node 0.
- #outgoing arcs == 1 except for node 0.
- for node zero, #incoming arcs == #outgoing arcs.
- There are no duplicate arcs.
- Self-arcs are allowed except for node 0.
- There is no cycle in this graph, except through node 0.

I’m not sure what the line means “there are no duplicate arcs”, but my guess it that it refers only to the true arcs. In my case I have multiple duplicate arcs between nodes, one per driver, and I would think that would be fairly common. Anyway, try it and see what happens, that’s my motto.

So now my call to create the circuit constraint uses model.add_multiple_circuit(arcs). All together, the core code for creating the arcs (all the calls to AddArc()) and the creation of the multiple circuit now looks like this:

arcs: list[cp_model.ArcT] = []
for d in range(num_drivers):
    # set the shift constraints
    for s in range(num_shifts):
        # Arc from source to shift.
        source_lit = AddArc(model, arcs, 0, s+1)
        ...
        # Arc from shift to sink
        sink_lit = AddArc(model, arcs, s+1, 0)
        ...
        # Node not performed
        # skipped_by_d_lit = AddArc(model, arcs, s+1, s+1)
        for o in range(num_shifts):
            lit = AddArc(model, arcs, s+1, o+1)
        ...
    ...
model.add_multiple_circuit(arcs)

When I run that, it works great, but it is not any faster, and in fact seems to be slower! The tiny case runs in about 2 seconds which is about the same, but the small case runs consistently at about 60 seconds, instead of 40 seconds or so with the original code. And it still cannot even estimate the minimum number of drivers in the medium problem instance. I suspect the root of the problem is all the duplicated arcs used to track drivers, but I’m going to poke around other things first to see if I can speed it up.

Try to use the prior constraints as redundant constraints

My first guess to speed up the multiple circuit version is to try to add in some redundant constraints. Perhaps the explicit setting of the constraints on the ad hoc “circuits” were beneficial. Perhaps adding those back it will speed things up, even though they are redundant constraints. I’m essentially computing all the things anyway in order to track drivers per route, so I may as well use them.

(Maybe this will work, maybe it won’t. I’m writing this post at the same time I’m trying these things out!)

First I’m going to try to put back in the per-driver circuit constraints from the original as redundant constraints. I will change the list variables to have the word “driver_” in them to make sure I’m not using an existing list:

# Create circuit constraint.
model.add_exactly_one(driver_outgoing_source_literals)
for s in range(num_shifts):
    model.add_exactly_one(driver_outgoing_literals[s])
    model.add_exactly_one(driver_incoming_literals[s])
model.add_exactly_one(driver_incoming_sink_literals)

Next, inside of the loop over the drivers, I have to create these four lists and populate them.

for d in range(num_drivers):
    # create the reundant literal containers
    driver_outgoing_source_literals: list[cp_model.LiteralT] = []
    driver_incoming_sink_literals: list[cp_model.LiteralT] = []
    driver_outgoing_literals = collections.defaultdict(list)
    driver_incoming_literals = collections.defaultdict(list)
    for s in range(num_shifts):
        source_lit = AddArc(model, arcs, 0, s+1)
        driver_outgoing_source_literals.append(source_lit)
        driver_incoming_literals[s].append(source_lit)
        ...
        sink_lit = AddArc(model, arcs, s+1, 0)
        driver_incoming_sink_literals.append(sink_lit)
        driver_outgoing_literals[s].append(source_lit)
        ...
        # add "dropping" a node per driver if not performed
        driver_incoming_literals[s].append(~performed[d, s])
        driver_outgoing_literals[s].append(~performed[d, s])
        ...
        for o in range(num_shifts):
            lit = AddArc(model, arcs, s+1, o+1)
            # redundant constraints stuff
            driver_outgoing_literals[s].append(lit)
            driver_incoming_literals[o].append(lit)
            ...
    if minimize_drivers:
        # Driver is not working.
        working = model.new_bool_var("working_%i" % d)
        driver_outgoing_source_literals.append(~working)
        driver_incoming_sink_literals.append(~working)

Sadly, this helped a little bit but not much. The new way is still consistently over 50 seconds (averaging around 1 minute) for the small problem instance, whereas the original is never over 50 seconds.

Maybe combine both the multiple circuit and the per-driver circuit constraints?

Another thought is to try nesting an official circuit constraint call for the inner loop as well. To do that, just replace all of the new driver_..._literals type variables with creating a new arc in a driver-specific arcs list.

for d in range(num_drivers):
    # create the reundant literal containers
    driver_arcs: list[cp_model.ArcT] = []
    ...
    for s in range(num_shifts):
        ...
        source_lit = AddArc(model, arcs, 0, s+1)
        driver_arcs.append((0, s+1, source_lit))
        ...
        sink_lit = AddArc(model, arcs, s+1, 0)
        driver_arcs.append((s+1, 0, sink_lit))
        ...
        # Node not performed by this driver
        driver_arcs.append((s+1, s+1, ~performed[d, s]))
        ...
        for o in range(num_shifts):
            ...
            lit = AddArc(model, arcs, s+1, o+1)
            driver_arcs.append((s+1, o+1, lit))
    if minimize_drivers:
        # Driver is not working.
        working = model.new_bool_var("working_%i" % d)
        # working = ~AddArc(model, arcs, 0, 0)
        driver_arcs.append((0, 0, ~working))
    ...
    # Create circuit constraint.
    model.add_circuit(driver_arcs) 
model.add_multiple_circuit(arcs)

Unfortunately, once again that does nothing to speed up the solver. The small instance still takes 50 seconds or more (although in prepping this for publishing it, I did just get a result of 48 seconds).

Conclusions and where to next?

The use of one big circuit constraint over all the nodes via the add_multiple_circuit call works, but it seems to slow things down. I tossed in some diagnostic print statements to just see whether it was the search for the minimum drivers or the optimization of the shifts that was slower, but it seems both passes are equally slower than the original. So it really does have something to do with that big multiple circuit.

As I mentioned above, I suspect that the reason the multiple circuit constraint is slower is that it does the circuit constraint thing over a network with way too many redundant arcs between node pairs. Those arcs are just there to track which driver performs which arc, but I don’t really need that. The very first arc in every path sort of dictates a unique driver for the path.

On the other hand, replacing the multiple ad hoc circuits with calling the official add_circuit call does not seem to slow things down, although it also does not speed things up either. Plus there was no impact at all on the ability to handle the medium problem instance. So sadly, I think my conclusion here is that a straight replacement of the ad hoc circuit constraint with the official add_circuit doesn’t help. I think that is all I can really say here and that I’m done with this post.

Luckily, before even starting on this post I had already hit on a different formulation that runs a bit faster. My idea is to completely remove the per-driver circuit, and instead concentrate on calculating where the breaks between shifts should be. Instead of one circuit constraint per driver, I end up with one multiple circuit constraint to calculate the tour assignments, and one multiple circuit constraint to establish where the breaks must go. My next blog post will talk about that.