Lec 18 | MIT 6.046J / 18.410J Introduction to Algorithms (SMA 5503), Fall 2005
Good morning, everyone. Glad you are all here bright and early. I'm counting the days till the TAs outnumber the students. They'll show up. We return to a familiar story. This is part two, The Empire Strikes Back.
So last time, our adversary, the graph, came to us with a problem. We had a source, and we had a directed graph, and we had weights on the edges, and they were all nonnegative. And there was happiness. And we triumphed over the Empire by designing Dijkstra's algorithm, and very efficiently finding single-source shortest paths: the shortest path weight from s to every other vertex.
Today, however, the Death Star has a new trick up its sleeve, and we have negative weights, potentially. And we're going to have to somehow deal with, in particular, negative-weight cycles. And we saw that when we have a negative-weight cycle, we can just keep going around, and around, and around, and go back in time farther, and farther, and farther. And we can get arbitrarily far back in the past. And so there's no shortest path, because whatever path you take, you can get a shorter one.
So we want to address that issue today, and we're going to come up with a new algorithm, actually simpler than Dijkstra, but not as fast, called the Bellman-Ford algorithm. And, it's going to allow negative weights, and in some sense allow negative-weight cycles, although maybe not as much as you might hope. We have to leave room for a sequel, of course.
OK, so the Bellman-Ford algorithm, invented by two guys, as you might expect, computes the shortest path weights. So, it makes no assumption about the weights: weights are arbitrary, and it's going to compute the shortest path weights. So, remember this notation: delta(s, v) is the weight of the shortest path from s to v. s is called the source vertex. And, we want to compute these weights for all vertices v. The claim is that computing from s to everywhere is no harder than computing from s to one particular location. So, we're going to do it for all of them. That's still going to be the case here.
And, it allows negative weights. And this is the good case; but there's an alternative, which is that Bellman-Ford may just say, oops, there's a negative-weight cycle. And in that case it will just say so. So, it will say a negative-weight cycle exists, and therefore some of these deltas are minus infinity. And that seems weird. So, Bellman-Ford as we'll present it today is intended for the case where there are no negative-weight cycles, which is more intuitive. It sort of allows them, but it will just report them. In that case, it will not give you delta values. You can change the algorithm to give you delta values in that case, but we are not going to see it here. So, an exercise, after you see the algorithm: compute these deltas in all cases. It's not hard to do, but we don't have time for it here.
So, here's the algorithm. It's pretty straightforward. As I said, it's easier than Dijkstra. It's a relaxation algorithm: the main thing that it does is relax edges, just like Dijkstra. So, we'll be able to use a lot of lemmas from Dijkstra, and the proof of correctness will be three times shorter, because the first two-thirds we already have from Dijkstra. But I'm jumping ahead a bit. So, the first part is initialization. Again, d[v] will represent the estimated distance from s to v, and we're going to be updating those estimates as the algorithm goes along. Initially, d[s] is zero, which now may not be the right answer, conceivably. Everyone else is infinity, which is certainly an upper bound. OK, these are both upper bounds on the true distances. So that's fine. That's initialization, just like before. And now we have a main loop, which happens |V| minus one times. We're not actually going to use the index i; it's just a counter.
And we're just going to look at every edge and relax it. It's a very simple idea. If you learn about relaxation, this is the first thing you might try. The question is when do you stop. It's sort of like this friend of mine: when he was like six years old, he would claim, oh, I know how to spell banana, I just don't know when to stop. OK, same thing with relaxation. This is our relaxation step, just as before. We look at the edge; we see whether it violates the triangle inequality according to our current estimates. We know the distance from s to v should be at most the distance from s to u plus the weight of that edge from u to v. If it isn't, we set it equal. We've proved that this is always an OK thing to do: these d[v]'s never get too small if we do a bunch of relaxations. So, the idea is you take every edge and you relax it. I don't care in which order; just relax every edge, once each. And then do that |V| minus one times. The claim is that that should be enough if you have no negative-weight cycles. So, if there's a negative-weight cycle, we need to figure it out. And, we'll do that in a fairly straightforward way, which is: we're going to do exactly the same thing. So this is outside the for loop here. We'll have the same for loop: for each edge in our graph, we'll try to relax it. And if you can still relax some edge, the claim is that there has to be a negative-weight cycle. So this is the main thing that needs proof.
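Here is a minimal Python sketch of the algorithm as just described; the function name and the edge-list representation are mine, not from the lecture:

```python
# A sketch of Bellman-Ford, assuming the graph is given as a collection
# of vertices and a list of directed edges (u, v, w).
def bellman_ford(vertices, edges, s):
    # Initialization: d[s] = 0, everyone else infinity (upper bounds).
    d = {v: float('inf') for v in vertices}
    d[s] = 0
    # Main loop: |V| - 1 rounds; the round index is just a counter.
    for _ in range(len(vertices) - 1):
        # Relax every edge, once each, in any order.
        for u, v, w in edges:
            if d[v] > d[u] + w:   # triangle inequality violated?
                d[v] = d[u] + w   # relax: set it equal
    # One more pass over the edges: if anything can still be relaxed,
    # report a negative-weight cycle.
    for u, v, w in edges:
        if d[v] > d[u] + w:
            raise ValueError("negative-weight cycle")
    return d
```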
OK, and that's the algorithm. So the claim is that at the end we should have d[v] equal to delta(s, v) for every vertex v: if we don't find a negative-weight cycle according to this rule, then we should have all the shortest path weights. That's the claim. Now, the first question is about the running time, which is very easy to analyze. So let's start with the running time. We can compare it to Dijkstra, which is over here. What is the running time of this algorithm? V times E, exactly.
OK, I'm going to assume, because it's pretty reasonable, that V and E are both positive. Then it's O(VE). So, this is a little bit slower, or a fair amount slower, than Dijkstra's algorithm. There it is: E plus V log V, which, ignoring the logs, is pretty much linear time. Here we have something that's at least quadratic in V, assuming your graph is connected. So, it's slower, but it's going to handle these negative weights. Dijkstra can't handle negative weights at all. So, let's do an example, to make it clear why you might hope this algorithm works. And then we'll prove that it works, of course. But the proof will be pretty easy.
So, I'm going to draw a graph that has negative weights, but no negative-weight cycles, so that I get an interesting answer. Good. The other thing I need, in order to make the output of this algorithm well defined, is an order on the edges, because the output depends on which order you visit the edges in. So I'm going to assign an arbitrary order to these edges. I could just ask you for an order, but to be consistent with the notes, I'll put an ordering on it. Let's say I put number four here; say that's the fourth edge I'll visit. It doesn't matter, but it will affect what happens during the algorithm for a particular graph. Did I get them all? One, two, three, four, five, six, seven, eight, OK.
And my source is going to be A. And, that's it. So, I want to run this algorithm. I'm just going to initialize everything. So, I set the estimate for s to be zero, and everyone else to be infinity. And to give me some notion of time, over here I'm going to write down what all of these d values are as the algorithm proceeds, because I'm going to start crossing them out and rewriting them, and the figure will get a little bit messier. But we can keep track of it over here. It's initially zero and infinities. Yeah? It doesn't matter. So, for the algorithm, you can go through the edges in a different order every time if you want. We'll prove that, but here I'm going to go through the same order every time. Good question. It turns out it doesn't matter here.
OK, so here's the starting point. Now I'm going to relax every edge. So, there are going to be a lot of edges here that don't do anything. I try to relax edge number one. I'd say, well, I know how to get from s to B with weight infinity, and with infinity plus two I could get from s to E through B. Well, infinity plus two is not much better than infinity. OK, so I don't do anything; I don't update this. I mean, infinity plus two sounds even worse. But infinity plus two is infinity. OK, that's edge number one: no relaxation. Edge number two, same deal; edge number three, same deal. At edge number four we start to get something interesting, because I have a finite value here that says I can get from A to B using a total weight of minus one. So that seems good. I'll write down minus one here, and update B to minus one. The rest stay the same. So, I'm just going to keep doing this over and over. That was edge number four.
Number five: we also get a relaxation. Four is better than infinity, so C gets a value of four. Then we get to edge number six. Infinity plus five is worse than four, so no relaxation there. Edge number seven is interesting, because I have a finite value here: minus one, plus the weight of this edge, which is three. That's a total of two, which is actually better than four. So this route, A, B, C, is actually better than the route I just found a second ago. So, this is now a two. This is all happening in one iteration of the main loop: we actually found two good paths to C, and we found one better than the other. OK, and that was edge number seven; and edge number eight is over here. It doesn't matter. OK, so that was round one of this outer loop, the first value of i, i equals one. OK, now we continue.
Just keep going. So, we start with edge number one. Now, minus one plus two is one. That's better than infinity. It'll start speeding up; it's repetitive, and it's actually not too much longer until we're done. Number two: this is an infinity, so we don't do anything. Number three: minus one plus two is one, better than infinity. This is vertex D; that was edge number three. Number four we've already done; nothing changed. Number five: this is where we see the path of weight four again, but that's worse than two, so we don't update anything. Number six: one plus five is six, which is bigger than two, so no good. Go around this way. Number seven: same deal. Number eight is interesting. So, we have a weight of one here, and a weight of minus three here. So, the total is minus two, which is better than one. So, that updates D. And, I believe that's it. So that was definitely the end of that round, round i equals two, because we just looked at the eighth edge. And, I'll cheat and check. Indeed, that is the last thing that happens.
We can check the couple of outgoing edges from D, because that's the only vertex whose value just changed. And, there are no more relaxations possible. So, that was done in two rounds, and the claim is we got all the shortest path weights. The algorithm would actually loop four times to guarantee correctness, because we have five vertices here, and one less than that is four. So, in fact, in the execution here there are two more blank rounds at the bottom. Nothing happens. But, what the hell?
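For concreteness, here is the lecture's example fed through the sketch above. The edge list below is reconstructed from the walkthrough (source A, edges numbered one through eight); edge number two is never read out explicitly, so its endpoints and weight are an assumption on my part:

```python
# Edges (u, v, w), numbered as in the lecture's ordering.
edges = [
    ('B', 'E', 2),   # edge 1
    ('D', 'B', 1),   # edge 2 (assumed; not stated explicitly)
    ('B', 'D', 2),   # edge 3
    ('A', 'B', -1),  # edge 4
    ('A', 'C', 4),   # edge 5
    ('D', 'C', 5),   # edge 6
    ('B', 'C', 3),   # edge 7
    ('E', 'D', -3),  # edge 8
]
d = bellman_ford('ABCDE', edges, 'A')
print(d)  # {'A': 0, 'B': -1, 'C': 2, 'D': -2, 'E': 1}, as on the board
```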
OK, so that is Bellman-Ford. I mean, it's certainly not doing anything wrong. The question is, why is it guaranteed to converge in V minus one steps, unless there is a negative-weight cycle? Question? Right, so that's an optimization: if you discover that a whole round goes by and nothing happens (you can keep track of that in the algorithm), you can stop. In the worst case, it won't make a difference. But in practice, you probably want to do that. Yeah? Good question.
All right, so some simple observations. I mean, we're only doing relaxation, so we can use a lot of our analysis from before. In particular, the d values are only decreasing, monotonically. As we cross out values here, we are always making them smaller, which is good. Another nifty thing about this algorithm is that you can run it even in a distributed system. If this is some actual network, some computer network, and these are machines, and they're communicating by these links, then relaxation is a purely local thing. You don't need any global strategy. And you were asking about whether we can do a different order in each step: well, yeah, you could just keep relaxing edges, and keep relaxing edges, and just keep going for the entire lifetime of the network. And eventually, you will find shortest paths. So, this algorithm is guaranteed to finish in V rounds in a distributed system. It might be more asynchronous, and it's a little harder to analyze, but it will still work eventually. It's guaranteed to converge. And so, Bellman-Ford is used in the Internet for finding shortest paths.
OK, so let's finally prove that it works. This should only take a couple of boards. So let's suppose we have a graph and some edge weights that have no negative-weight cycles. Then the claim is that we terminate with the correct answer: Bellman-Ford terminates with all of these d[v] values set to the delta values, for every vertex. OK, the proof is going to be pretty immediate, using the lemmas that we had from before, if you remember them. So, we're just going to look at every vertex separately. So, I'll call the vertex v. The claim is that this holds by the end of the algorithm. So, remember what we need to prove: that at some point, d[v] equals delta(s, v). Because we know it decreases monotonically, and we know that it never gets any smaller than the correct value (relaxations are always safe), we just need to show that at some point this holds, and then it will hold at the end. So, by monotonicity of the d values, and by correctness part one, which was that the d[v]'s are always greater than or equal to the deltas, we only need to show that at some point we have equality.
So that's our goal. So what we're going to do is just look at v, and the shortest path to v, and see what happens to the algorithm relative to that path. So, I'm going to name the path. Let's call it p. It starts at vertex v_0 and goes to v_1, v_2, whatever, and ends at v_k. And, this is not just any shortest path: it's one that starts at s. So, v_0 is s, and it ends at v. So, I'm giving a couple of names to s and v so I can talk about the path more uniformly. So, this is a shortest path from s to v. Now, I also want it to be not just any shortest path from s to v, but, among all shortest paths from s to v, one with the fewest possible edges. OK, so shortest here means in terms of the total weight of the path. Subject to being shortest in weight, I want it to also be shortest in the number of edges. And, the reason I want that is to be able to conclude that p is a simple path, meaning that it doesn't repeat any vertices. Now, can anyone tell me why I need to assume that the number of edges is the smallest possible in order to guarantee that p is simple?
The claim is that not all shortest paths are necessarily simple. Yeah? Right, I can have a zero-weight cycle, exactly. In the theorem here, we're assuming that there are no negative-weight cycles, but there might still be zero-weight cycles. And given a zero-weight cycle, you can put it in the middle of any shortest path to make it arbitrarily long, repeating vertices over and over. That's going to be annoying. What I want is for p to be simple. And, I can guarantee that essentially by shortcutting: if ever I take a zero-weight cycle, I throw it away. And this is one mathematical way of doing that. OK, now what else do we know about this shortest path? Well, we know that subpaths of shortest paths are shortest paths. That's optimal substructure. So, we know what the shortest path from s to v_i is, sort of inductively: it's the shortest path weight to the previous vertex plus the weight of the last edge, delta(s, v_i) = delta(s, v_{i-1}) + w(v_{i-1}, v_i). So, this is by optimal substructure, as we proved last time.
OK, and I think that's pretty much the warm-up. So, I want to sort of do this inductively in i: start out with v_0, and go up to v_k. So, the first question is, what is d of v_0, which is s? What is d of the source? Well, certainly at the beginning of the algorithm it's zero, because that's what we set it to. And it only goes down from there. So, it's certainly at most zero. The real question is, what is delta(s, v_0), the shortest path weight from s to s? It has to be zero, otherwise you have a negative-weight cycle, exactly. My favorite answer: zero. If we had a shorter path from s to s, I mean, that is a cycle. So, it's got to be zero. So, these are actually equal at the beginning of the algorithm, which is great. That means they will be for all time, because, as we just argued up here, d only goes down, and it never can get too small. So, we have d of v_0 set to the right thing. Great: good for the base case of the induction. Of course, what we really care about is v_k, which is v. So, let's talk about the v_i inductively, and then we will get v_k as a result.
So, yeah, let's do it by induction. That's more fun. Let's say that d of v_i is equal to delta(s, v_i) after i rounds of the algorithm. So, this is actually referring to the i that is in the algorithm here. These are rounds: one round is an entire execution of all the edges, a relaxation of every edge. So, this is certainly true for i equals zero. We just proved that: after zero rounds, at the beginning of the algorithm, d of v_0 equals delta(s, v_0). OK, so now, that's not really what I wanted, but OK, fine: now we'll prove it for d of v_{i+1}. Generally, I recommend you assume something. In fact, why don't I follow my own advice and change it? It's usually nicer to think of induction as recursion. So, you assume that this is true, let's say, for all j less than the i that you care about, and then you prove it for d of v_i. It's usually a lot easier to think about it that way. In particular, you can use strong induction, for all j less than i. Here, we're only going to need it for one less. We have some relation between i and i minus one here, in terms of the deltas. And so, we want to argue something about the d values.
OK, well, let's think about what's going on here. We know, let's say, that after i minus one rounds, we have this inductive hypothesis: d of v_{i-1} equals delta(s, v_{i-1}). And, we want to conclude that after i rounds (so we have one more round to do this) d of v_i has the right answer, delta(s, v_i). Does that look familiar at all? So, we relax every edge in this round. In particular, at some point, we have to relax the edge from v_{i-1} to v_i. We know that this path consists of edges; that's the definition of a path. So, during the i'th round, we relax every edge, so we'd better relax (v_{i-1}, v_i). And, what happens then? It's a test of memory. Quick, the Death Star is approaching. So, if we have the correct value for v_{i-1}, and we relax an outgoing edge from there, and that edge is an edge of the shortest path from s to v_i, what do we know? d of v_i becomes the correct value, delta(s, v_i). This was called the correctness lemma last time, one of the things we proved about Dijkstra's algorithm, but it was really just a fact about relaxation. And it was a pretty simple proof. And it comes from this fact: we know the shortest path weight delta(s, v_i) is delta(s, v_{i-1}) plus the weight of the edge. So, certainly d of v_i was at least this big, and let's suppose it's greater, or otherwise we're done. We know d of v_{i-1} is set to delta(s, v_{i-1}). And so, this is exactly the condition that's being checked in the relaxation step: the d of v_i value is greater than this, let's suppose, and then we'll set it equal to this. And that's exactly delta(s, v_i). So, when we relax that edge, we've got to set it to the right value.
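Written out (in my notation, following the board), the inductive step is just:

```latex
% Assume d[v_{i-1}] = \delta(s, v_{i-1}) after round i-1.
% During round i we relax the edge (v_{i-1}, v_i), so afterwards
\begin{align*}
d[v_i] &\le d[v_{i-1}] + w(v_{i-1}, v_i) \\
       &= \delta(s, v_{i-1}) + w(v_{i-1}, v_i) \\
       &= \delta(s, v_i) \quad \text{(optimal substructure)},
\end{align*}
% and since d[v_i] \ge \delta(s, v_i) always holds (relaxation is safe),
% we conclude d[v_i] = \delta(s, v_i) after round i.
```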
So, this is the end of the proof, right? It's very simple. The point is, you look at your shortest path. Here it is. And if we assume there are no negative-weight cycles, this has the correct value initially: d of s is going to be zero. After the first round, you've got to relax this edge, and then you get the right value for that vertex. After the second round, you've got to relax this edge, which gets you the right d value for this vertex, and so on. And so, no matter which shortest path you take, you can apply this analysis. And you know that, if the length of this path is k edges, as we assumed, then after k rounds you've got to be done. OK, so this was not actually the end of the proof. Sorry. So this means after k rounds, we have the right answer for v_k, which is v. So, the only question is, how big could k be? And, it had better be at most V minus one; that is the claim in the algorithm, that you only need to do V minus one rounds. And indeed, the number of edges in a simple path in a graph is at most the number of vertices minus one: k is at most V minus one, because p is simple. So, that's why we had to assume that it wasn't just any shortest path. It had to be a simple one, so it didn't repeat any vertices. So there are at most V vertices in the path, and so at most V minus one edges in the path. OK, and that's all there is to Bellman-Ford: pretty simple in correctness. Of course, we're using a lot of the lemmas that we proved last time, which makes it easier.
OK, a consequence of this theorem, or of this proof, is the following. Suppose Bellman-Ford fails to converge; that's what the algorithm is checking, whether this relaxation still requires work after those V minus one rounds. Right, the end of this algorithm is: run another round, a V'th round, and see whether anything changes. So, say the algorithm fails to converge after V minus one steps, or rounds. Then, there has to be a negative-weight cycle. OK, this is just the contrapositive of what we proved. We proved that if you assume there's no negative-weight cycle, then we know that d of s is zero, and then all this argument says is that you've got to converge after V minus one rounds. There can't be anything left to do once you've reached the shortest path weights, because you're going down monotonically, and you can never go below the floor. So, if you fail to converge somehow after V minus one rounds, you've got to have violated the assumption. And the only assumption we made was that there's no negative-weight cycle. So, this tells us that Bellman-Ford is actually correct: when it says that there is a negative-weight cycle, it indeed means it. It's true.
OK, and you can modify Bellman-Ford in that case to run a little longer, and find where all the minus infinities are. And that is, in some sense, one of the things you have to do in your problem set, I believe. So, I won't cover it here. But it's a good exercise in any case to figure out how you would find where the minus infinities are. What are all the vertices reachable from a negative-weight cycle? Those are the ones that have minus infinities. OK, so you might say, well, that was awfully fast. Actually, it's not over yet. The episode has not yet ended. We're going to use Bellman-Ford to solve even bigger and greater shortest path problems. In the remainder of today's lecture, we will see it applied to a more general problem, in some sense, called linear programming. And in the next lecture, we'll really use it to do some amazing stuff with all-pairs shortest paths. Let's go over here.
So, our goal, although it won't be obvious today, is to be able to compute the shortest paths between every pair of vertices, which we could certainly do at this point just by running Bellman-Ford V times. OK, but we want to do better than that, of course. And, that will be the climax of the trilogy. OK, today we just discovered who Luke's father is. So, it turns out the father of shortest paths is linear programming. Actually, simultaneously the father and the mother, because programs do not have gender. OK, my father likes to say... we both took improv comedy lessons, so we have degrees in improvisation. And he said, you know, we went to improv classes in order to learn how to make our humor better. And, the problem is, it didn't actually make our humor better. It just made us less afraid to use it. [LAUGHTER] So, you are subjected to all this improv humor. I didn't see the connection to Luke's father, but there you go.
OK, so, linear programming is a very general problem, a very big tool. Has anyone seen linear programming before? OK, one person. And, I'm sure that at some time in your life, if you do anything vaguely related to computing or optimization, linear programming will come up at some point. It's a very useful tool. You're given a matrix and two vectors: not too exciting yet. What you want to do is find a vector. This is a very dry description. We'll see what makes it so interesting in a moment.
So, you want to maximize some objective, and you have some constraints. And they're all linear. So, the objective is a linear function in the variables x, and your constraints are a bunch of linear inequality constraints; that's what makes it interesting. It's not just solving a linear system, as you've seen in linear algebra, or whatever. Or, of course, it could be that there is no such x. OK: vaguely familiar, you might think, to the theorem about Bellman-Ford. And, we'll show that there's some kind of connection here: either you want to find something, or show that it doesn't exist. Well, that's still a pretty vague connection, but I also want to maximize something, sort of like how shortest paths minimize something, OK, somewhat similar. We have these constraints. So, yeah. This may be intuitive to you, I don't know. I prefer a more geometric picture, and I will try to draw such a geometric picture. I've never tried to do this on a blackboard, so it should be interesting. I think I'm going to fail miserably. It sort of looks like a dodecahedron, right? Sort of, kind of, not really. A bit rough on the bottom, OK.
So, if you have a bunch of linear constraints (this is supposed to be in 3-D; now I've labeled it, so it's in 3-D, good), each constraint turns out to define a hyperplane in n dimensions. OK, so you have this space here that's three-dimensional, so n equals three. And, these hyperplanes: if you're looking at one side of a hyperplane, that's the less-than-or-equal-to, and if you take the intersection, you get some convex polytope or polyhedron. In 3-D, you might get a dodecahedron or whatever. And, your goal: you have some objective vector c, let's say, up. Suppose that's the c vector. Your goal is to find the highest point in this polytope. So here, it's maybe this one. OK, this is the target. This is the optimal x. That is the geometric view. If you prefer the algebraic view, you want to maximize c transpose times x. So, this is m, this is n; check that the dimensions work out. So that's saying you want to maximize the dot product: you want to maximize the extent to which x is in the direction c. And, you want to maximize that subject to some constraints, which look something like this, maybe. So, this is A, and it's m by n. You want to multiply it by something of height n; that's x, so let me put x down here, n by one. And, it should be less than or equal to something of this height, which is b, the right-hand side. OK, that's the algebraic view, where you can check that all the dimensions work out. But, you can read the constraints off: each row here, when multiplied by this column, gives you one value here, and that's just a linear constraint on all the x_i's. So, you want to maximize this linear function of x_1 up to x_n, subject to these constraints, OK? Pretty simple, but pretty powerful in general.
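Collecting the two boards, the general linear program can be stated compactly as follows:

```latex
% Linear programming: given A (m x n), b (m x 1), c (n x 1), find x (n x 1).
\begin{align*}
\text{maximize}   \quad & c^{\mathsf{T}} x \\
\text{subject to} \quad & A x \le b,
\end{align*}
% or report that no such x exists (infeasible).
```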
So, it turns out that you can formulate a huge number of problems, such as shortest paths, as a linear program. So, it's a general tool. In this class, we will not cover any algorithms for solving linear programming. It's a bit tricky. I'll just mention that they are out there. So, there are many efficient algorithms, and lots of code that does this. It's a very practical setup. So, lots of algorithms to solve LPs, linear programs; linear programming is usually called LP. And, I'll mention a few of them. There's the simplex algorithm; this is one of the first, and I think it is the first. There's the ellipsoid algorithm. There are interior point methods, and there's random sampling. I'll just say a little bit about each of these, because we're not going to talk about any of them in depth.
The simplex algorithm: this is, I mean, one of the first algorithms in the world in some sense, certainly one of the most popular. It's still used today. Almost all linear programming code uses the simplex algorithm. It happens to run in exponential time in the worst case, so it's actually pretty bad theoretically. But in practice, it works really well, and there is some recent work that tries to understand this. It's still exponential in the worst case, but it's practical. There's actually an open problem whether there exists a variation of simplex that runs in polynomial time. But, I won't go into that. That's a major open problem in this area of linear programming. The ellipsoid algorithm was the first algorithm to solve linear programming in polynomial time. So, for a long time, people didn't know. Around this time, people started realizing polynomial time is a good thing. That happened around the late '60s. Polynomial time is good.
And, the ellipsoid algorithm is the first one to do it. It's a very general algorithm, and very powerful, theoretically: completely impractical, but it's cool. It lets you do things like solve a linear program that has exponentially many constraints in polynomial time. You get all sorts of crazy things. So, I'll just say it's polynomial time. If I can't say something nice about it, I won't say anything at all: it's impractical. Interior point methods are sort of the mixture. They run in polynomial time (you can guarantee that), and they are also pretty practical. And there's sort of this competition these days about whether simplex or interior point is better. I don't know what it is today, but a few years ago they were neck and neck. And, random sampling is a brand new approach. This is just from a couple of years ago, by two MIT professors, Dimitris Bertsimas and Santosh Vempala; I guess the other is in applied math. So, just to show you, there's active work in this area; people are still finding new ways to solve linear programs. This is completely randomized, and very simple, and very general. It hasn't been implemented, so we don't know how practical it is yet. But, it has potential. OK: pretty neat.
OK, we're going to look at a somewhat simpler version of linear programming. The first restriction we are going to make is actually not much of a restriction, but nonetheless we will consider it; it's a little bit easier to think about. So here, we had some polytope, and we wanted to maximize some objective. In a feasibility problem, I just want to know: is the polytope empty? Can you find any point in that polytope? Can you find any set of values, x, that satisfies these constraints? OK, so there's no objective c: just find x such that Ax is less than or equal to b. It turns out you can prove a very general theorem that if you can solve linear feasibility, you can also solve linear programming. We won't prove that here, but linear feasibility is actually no easier than the original problem, even though it feels easier, and it's easier to think about. That's what I was just writing: actually no easier than LP.
OK, the next restriction we're going to make is a real restriction. And it simplifies the problem quite a bit. And that's to look at difference constraints. And, if all this seemed a bit abstract so far, we will now ground ourselves a little bit. A system of difference constraints is a linear feasibility problem (so, an LP where there's no objective) with a restriction: each row of the matrix A has one 1 and one -1, and everything else in the row is zero. OK, in other words, each constraint has a very simple form. It involves two variables and some number: we have something like x_j minus x_i is less than or equal to w_ij. This w_ij is just a number, and these are two of the variables. There's a minus sign, no coefficients up here, and none of the other x_k's appear; just two of them. And, you have a bunch of constraints of this form, one per row of the matrix. Geometrically, I haven't thought about what this means. I think it means the hyperplanes are pretty simple. Sorry I can't do better than that; it's a little hard to see this in high dimensions. But, it will start to correspond to something we've seen (namely, the board that it's next to) very shortly.
OK, so let's do a very quick example, mainly to have something to point at. Here's a very simple system of difference constraints: x_1 - x_2 ≤ 3, x_2 - x_3 ≤ -2, x_1 - x_3 ≤ 2. OK, and a solution. Why not? It's not totally trivial to solve this, but here's a solution. And the only thing to check is that each of these constraints is satisfied: x_1 minus x_2 is three, which is less than or equal to three, and so on. There could be negative values; there could be positive values. It doesn't matter.
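In the matrix form from before (each row of A with one 1, one -1, and zeros elsewhere), this example system, as I've reconstructed it from the board, reads:

```latex
\begin{equation*}
\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 1 & 0 & -1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\le
\begin{pmatrix} 3 \\ -2 \\ 2 \end{pmatrix}
\end{equation*}
```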
I'd like to transform this system of difference constraints into a graph, because we know a lot about graphs. So, we're going to call this the constraint graph. And, it's going to represent these constraints. How do I do it? Well, I take every constraint, which in general looks like this, and I convert it into an edge. OK, so if I write it as x_j minus x_i is less than or equal to some w_ij (w seems suggestive of weights; that's exactly why I called it w), I'm going to make that an edge from v_i to v_j. So, the order flips a little bit. And, the weight of that edge is w_ij. So, just do that. Make n vertices: the number of vertices equals n, and the number of edges equals the number of constraints, which is m, the height of the matrix. And just transform. So, for example, here we have three variables, so we have three vertices, v_1, v_2, v_3. We have x_1 minus x_2, so we have an edge from v_2 to v_1 of weight three. We have x_2 minus x_3, so we have an edge from v_3 to v_2 of weight minus two. And, we have x_1 minus x_3, so we have an edge from v_3 to v_1 of weight two. I hope I got the directions right. Yep. So, there it is, a graph: currently no obvious connection to shortest paths, right?
But in fact, this constraint is closely related to shortest paths. So let me just rewrite it. You could say, well, x_j is less than or equal to x_i plus w_ij. Or, you could think of it as d[j] less than or equal to d[i] plus w_ij. This is a conceptual balloon. Look awfully familiar? A lot like the triangle inequality, a lot like relaxation. So, there's a very close connection between these two problems, as we will now prove.
So, we're going to have two theorems. And, they're going to look similar to the correctness of Bellman-Ford, in that they talk about negative-weight cycles. Here we go. I mean, we have this constraint graph. It can have negative weights; it can have positive weights. It turns out what matters is whether you have a negative-weight cycle. So, the first thing to prove is that if you have a negative-weight cycle, then something bad happens. OK, what bad thing could happen? Well, we're just trying to satisfy this system of constraints. So, the bad thing is that there might not be any solution; these constraints may be infeasible. And that's the claim. The claim is that this is actually an if and only if, but first we'll prove the "if": if you have a negative-weight cycle, you're doomed. The difference constraints are unsatisfiable. That's the more intuitive way to say it; in the LP world, they call it infeasible, but unsatisfiable makes a lot more sense. There's no way to assign the x_i's so as to satisfy all the constraints simultaneously. So, let's just take a look. Consider a negative-weight cycle. It starts at some vertex, goes through some vertices, and at some point comes back. I don't care whether it repeats vertices, just as long as this cycle, from v_1 back to v_1, has strictly negative weight.
OK, and what I'm going to do is just write down all the constraints. Each of these edges corresponds to a constraint, which must be in the set of constraints, because that's where the graph came from. So, these are all edges. Let's look at what they give us. So, we have an edge from v_1 to v_2. That corresponds to: x_2 minus x_1 is at most w_12. Then we have x_3 minus x_2, at most the weight w_23, and so on. Eventually we get up to x_k minus x_(k-1), at most w_(k-1),k, for this edge; and lastly we have this edge, which wraps around: x_1 minus x_k, at most w_k1, if I've got the signs right. Good, so here's a bunch of constraints. What do you suggest I do with them? Anything interesting about these constraints, say, the left-hand sides? Sorry? It sounded like the right word; what was it? Telescopes, yes, good. Everything cancels. If I add these up, there's an x_2 and a minus x_2; there's a minus x_1 and an x_1; there's a minus x_k and an x_k. Everything here cancels if I add up the left-hand sides. So, what happens if I add up the right-hand sides? Over here I get zero, my favorite answer. And over here, we get the sum of the weights of all the edges in the negative-weight cycle, which is the weight of the cycle, which is negative.
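On the board, the telescoping sum looks like this:

```latex
% Add up the k constraints around the cycle v_1 -> v_2 -> ... -> v_k -> v_1:
\begin{align*}
0 &= (x_2 - x_1) + (x_3 - x_2) + \cdots + (x_1 - x_k) \\
  &\le w_{12} + w_{23} + \cdots + w_{k-1,k} + w_{k1}
   = w(\text{cycle}) < 0.
\end{align*}
```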
So, zero is strictly less than zero: contradiction. Contradiction? Wait a minute, we didn't assume anything that was false. So, it's not really a contradiction in the mathematical sense; we didn't contradict the world. We just said that these constraints are contradictory. In other words, if you pick any values for the x_i's, there is no way that these can all be true, because then you would get a contradiction. So, it's impossible for these things to be satisfied by real x_i's; they must be unsatisfiable. To say it a little more precisely: there's no satisfying assignment x_1 up to x_n, because the constraints add up to zero on the left-hand side, and to something negative on the right-hand side. OK, so that's an easy proof. The reverse direction will be only slightly harder.
OK, so, cool. We have this connection. The motivation is: suppose you want to solve these difference constraints. And we'll see one such application. I Googled around for difference constraints. There is a fair number of papers that care about difference constraints, and they all use shortest paths to solve them. So, if we can prove a connection between shortest paths, which we know how to compute, and difference constraints, then we'll have something cool. And, next class we will see even more applications of difference constraints. It turns out they're really useful for all-pairs shortest paths. OK, but for now let's just prove this equivalence and finish it off.
So, the reverse direction is: if there's no negative-weight cycle in this constraint graph, then the system had better be satisfiable. The claim is that these negative-weight cycles are the only barriers to finding a solution to these difference constraints. I have this feeling it's somewhere here; I had to talk about the constraint graph. Good. Satisfied, good. So, here we're going to see a technique that is very useful when thinking about shortest paths. And, it's a bit hard to guess, especially if you haven't seen it before. This is useful in problem sets, and in quizzes, and finals, and everything, so keep this in mind. I mean, I'm using it to prove this rather simple theorem, but the idea of changing the graph (I'm going to call this constraint graph G) is a very powerful idea.
61:04 - 61:04vertex, s, or source,
use the source, -
61:08 - 61:08Luke, and we're going to add a
bunch of edges from s because -
61:13 - 61:13being a source,
it better be connected to some -
61:17 - 61:17things.
So, we are going to add a zero -
61:24 - 61:24weight edge, or weight zero edge
from s to everywhere, -
61:30 - 61:30so, to every other vertex in
the constraint graph. -
61:36 - 61:36Those vertices are called v_i,
v_1 up to v_n. -
61:40 - 61:40So, I have my constraint graph.
But I'll copy this one so I can -
61:46 - 61:46change it.
It's always good to backup your -
61:50 - 61:50work before you make changes,
right? -
61:53 - 61:53So now, I want to add a new
vertex, s, over here, -
61:58 - 61:58my new source.
I just take my constraint -
62:01 - 62:01graph, whatever it looks like,
add in weight zero edges to all -
62:07 - 62:07the other vertices.
Simple enough. -
Now, what did I do? Well, I have a candidate source now which can reach all the vertices. So, paths from s exist: I can get from s to everywhere with weight at most zero. OK, maybe less. Could it be less? Well, you know, like v_2: I can get to it by zero minus two. So, that's less than zero. So I've got to be a little careful. What if there's a negative-weight cycle? Oh no; then there wouldn't be any shortest paths. Fortunately, we assumed that there's no negative-weight cycle in the original graph. And if you think about it: if there's no negative-weight cycle in the original graph, and we add an edge from s to everywhere else, we're not making any new negative-weight cycles, because you can start at s and go somewhere at a cost of zero, which doesn't affect any weights, and then you are forced to stay in the old graph. So, there can't be any new negative-weight cycles. So, the modified graph has no negative-weight cycles, because the original didn't. And it has paths from s: there's a path from s to every vertex. There may not have been before; before, I couldn't get from v_2 to v_3, for example. Well, that's still true, but from s I can get to everywhere. So, that means that this modified graph has shortest paths: shortest paths from s exist. In other words, if I took all the shortest path weights, like if I ran Bellman-Ford from s, then I would get a bunch of finite numbers, d of v, for every vertex.
That seems like a good idea. Let's do it. So, shortest paths exist. Let's just assign x_i to be the shortest path weight from s to v_i. Why not? That's a good choice for a number, the shortest path weight from s to v_i. This is finite, because it's less than infinity and greater than minus infinity: some finite number. That's what we need in order to satisfy these constraints.
65:00 - 65:00The claim is that this is a
satisfying assignment. -
65:04 - 65:04Why?
Triangle inequality. -
65:06 - 65:06Somewhere here we wrote
triangle inequality. -
65:09 - 65:09This looks a lot like the
triangle inequality. -
65:13 - 65:13In fact, I think that's the end
of the proof. -
65:16 - 65:16Let's see here.
What we want to be true with -
65:20 - 65:20this assignment is that x_j
minus x_i is less than or equal -
65:25 - 65:25to w_ij whenever ij is an edge.
Or, let's say v_i, -
65:28 - 65:28v_j, for every such constraint,
so, for v_i, -
65:32 - 65:32v_j in the edge set.
OK, so what is this true? -
65:37 - 65:37Well, let's just expand it out.
So, x_i is this delta, -
65:42 - 65:42and x_j is some other delta.
So, we have delta of s, -
65:47 - 65:47vj minus delta of s_vi.
And, on the right-hand side, -
65:52 - 65:52well, w_ij, that was the weight
of the edge from I to J. -
65:57 - 65:57So, this is the weight of v_i
to v_j. -
66:01 - 66:01OK, I will rewrite this
slightly. -
66:04 - 66:04Delta of s, v_j is less than or
equal to delta of s, -
66:07 - 66:07v_i plus w of v_i,
v_j. -
66:09 - 66:09And that's the triangle
inequality more or less. -
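Spelled out as one chain of equivalent rewrites, this is:

$$x_j - x_i \le w_{ij} \;\Longleftrightarrow\; \delta(s, v_j) - \delta(s, v_i) \le w(v_i, v_j) \;\Longleftrightarrow\; \delta(s, v_j) \le \delta(s, v_i) + w(v_i, v_j).$$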
66:13 - 66:13The shortest path from s to v_j
is, at most, the shortest path from -
66:18 - 66:18s to v_i plus a particular path
from v_i to v_j, -
66:22 - 66:22namely the single edge v_i to
v_j. -
66:25 - 66:25This could only be longer than
the shortest path from v_i to v_j. -
66:30 - 66:30And so, that makes the
right-hand side bigger, -
66:33 - 66:33which makes the inequality
easier to satisfy: it was true -
66:38 - 66:38with the shortest path,
and it's still true here. -
66:40 - 66:40And, that proves it.
This is true. -
66:42 - 66:42And, these were all equivalent
statements. -
66:46 - 66:46This we know to be true by
triangle inequality. -
66:49 - 66:49Therefore, these constraints
are all satisfied. -
66:52 - 66:52Magic.
I'm so excited here. -
66:54 - 66:54So, we've proved that having a
negative weight cycle is exactly -
66:59 - 66:59when this system of difference
constraints is unsatisfiable. -
67:05 - 67:05So, if we want to satisfy them,
if we want to find the right -
67:08 - 67:08answer to x, we run
Bellman-Ford. -
67:10 - 67:10Either it says,
oh no, there's a negative weight cycle. -
67:12 - 67:12Then you are hosed.
Then, there is no solution. -
67:15 - 67:15But that's the best you could
hope to know. -
67:17 - 67:17Otherwise, it says,
oh, there was no negative -
67:20 - 67:20weight cycle,
and here are your shortest path -
67:22 - 67:22weights.
You just plug them in, -
67:24 - 67:24and bam, you have your x_i's
that satisfy the constraints. -
67:27 - 67:27Awesome.
Now, it wasn't just any graph. -
67:30 - 67:30I mean, we started with
constraints, algebra, -
67:33 - 67:33we converted it into a graph by
this transform. -
67:36 - 67:36Then we added a source vertex,
s. -
67:38 - 67:38So, I mean, we had to build a
graph to solve our problem, -
67:42 - 67:42very powerful idea.
Cool. -
67:43 - 67:43This is the idea of reduction.
You can reduce the problem you -
67:47 - 67:47want to solve into some problem
you know how to solve. -
67:51 - 67:51You know how to solve shortest
paths when there are no negative -
67:55 - 67:55weight cycles,
or find out that there is a -
67:57 - 67:57negative weight cycle by
Bellman-Ford. -
68:01 - 68:01So, now we know how to solve
difference constraints. -
68:06 - 68:06It turns out you can do even
more. -
68:09 - 68:09Bellman-Ford does a little bit
more than just solve these -
68:15 - 68:15constraints.
But first let me write down -
68:19 - 68:19what I've been jumping up and
down about. -
68:23 - 68:23The corollary is you can use
Bellman-Ford. -
68:27 - 68:27I mean, you make this graph.
Then you apply Bellman-Ford, -
68:34 - 68:34and it will solve your system
of difference constraints. -
68:41 - 68:41So, let me put in some numbers
here. -
68:46 - 68:46You have m difference
constraints. -
68:50 - 68:50And, you have n variables.
And, it will solve them in -
68:56 - 68:56order m times n time.
Actually, these numbers go up -
69:02 - 69:02slightly because we are adding n
edges, and we're adding one -
69:07 - 69:07vertex, but assuming all of
these numbers are nontrivial, -
69:12 - 69:12m is at least n.
It's order mn time. -
69:15 - 69:15OK, we're just avoiding degenerate cases where
some of these numbers are close to zero. -
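To make the whole reduction concrete, here is a minimal sketch in Python; the function name and the constraint encoding (a triple (i, j, w) meaning x_j - x_i <= w, with variables numbered 0..n-1) are my own, not from the lecture.

```python
def solve_difference_constraints(n, constraints):
    """Return a satisfying assignment x[0..n-1], or None if the system
    is unsatisfiable (i.e., the constraint graph has a negative cycle)."""
    # Constraint graph: edge v_i -> v_j with weight w for each constraint
    # x_j - x_i <= w, plus a source s (index n) with weight-0 edges to
    # every vertex, exactly as in the construction above.
    edges = list(constraints) + [(n, v, 0) for v in range(n)]

    INF = float("inf")
    d = [INF] * (n + 1)
    d[n] = 0  # the source

    # Bellman-Ford: with n+1 vertices, n relaxation passes suffice.
    for _ in range(n):
        for (u, v, w) in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w

    # One extra pass: any further improvement means a negative cycle.
    for (u, v, w) in edges:
        if d[u] + w < d[v]:
            return None  # unsatisfiable

    return d[:n]  # x_i = delta(s, v_i), each one finite


# Tiny example: x_1 - x_0 <= -2 and x_0 - x_1 <= 3.
print(solve_difference_constraints(2, [(0, 1, -2), (1, 0, 3)]))
# -> [0, -2], and indeed -2 - 0 <= -2 and 0 - (-2) <= 3.
```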
69:20 - 69:20Good.
So, some other facts, -
69:22 - 69:22that's what I just said.
And we'll leave these as -
69:26 - 69:26exercises because they're not
too essential. -
69:31 - 69:31The main thing we need is this.
But, some other cool facts is -
69:36 - 69:36that Bellman-Ford actually
optimizes some objective -
69:39 - 69:39functions.
So far, we treated it as just a -
69:42 - 69:42feasibility problem.
We just want to know whether -
69:46 - 69:46these constraints are
satisfiable. -
69:49 - 69:49In fact, you can add a
particular objective function. -
69:53 - 69:53So, you can't give it an
arbitrary objective function, -
69:57 - 69:57but here's one of interest.
x_1 plus x_2 plus ... plus x_n, -
70:05 - 70:05OK, but not just that.
We have some constraints. -
70:24 - 70:24OK, this is a linear program.
I want to maximize the sum of -
70:27 - 70:27the x_i's subject to all the
x_i's being nonpositive and the -
70:31 - 70:31difference constraints.
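Written out in full (my reconstruction of what goes on the board), the linear program is:

$$\begin{aligned} \text{maximize} \quad & x_1 + x_2 + \cdots + x_n \\ \text{subject to} \quad & x_i \le 0 \quad \text{for all } i, \\ & x_j - x_i \le w_{ij} \quad \text{for all edges } (v_i, v_j). \end{aligned}$$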
So, this we had before. -
70:34 - 70:34This is fine.
We noticed at some point you -
70:36 - 70:36could get from s to everywhere
with cost, at most, -
70:39 - 70:39zero.
So, we know that in this -
70:41 - 70:41assignment all of the x_i's are
negative. -
70:43 - 70:43That's not necessary,
but it's true when you run -
70:46 - 70:46Bellman-Ford.
So if you solve your system -
70:48 - 70:48using Bellman-Ford,
which is no less general than -
70:51 - 70:51anything else,
you happen to get nonpositive -
70:53 - 70:53x_i's.
And so, subject to that -
70:55 - 70:55constraint, it actually makes
them as close to zero as -
70:58 - 70:58possible in the L1 norm.
That is, looking at the sum of these values, -
71:04 - 71:04it tries to make the sum as
close to zero as it can, -
71:09 - 71:09making the values as
small as possible in absolute -
71:15 - 71:15value, in this sense.
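Since all the x_i here are nonpositive, maximizing the sum is the same as minimizing the L1 norm:

$$\max \sum_i x_i \;=\; -\min \sum_i |x_i| \;=\; -\min \|x\|_1 \qquad (x_i \le 0 \text{ for all } i).$$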
OK, it does more than that. -
71:20 - 71:20It cooks, it cleans,
it finds shortest paths. -
71:25 - 71:25It also minimizes the spread,
the maximum over all i of x_i -
71:32 - 71:32minus the minimum over all i of
x_i. -
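In symbols, the spread is:

$$\operatorname{spread}(x) = \max_i x_i - \min_i x_i.$$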
71:37 - 71:37So, I mean, if you have your
real line, and here are the -
71:41 - 71:41x_i's wherever they are.
It minimizes this distance. -
71:44 - 71:44And zero is somewhere over
here. -
71:47 - 71:47So, it tries to make the x_i's
as compact as possible. -
71:50 - 71:50This is actually the L infinity
norm, if you know stuff about -
71:54 - 71:54norms from your linear algebra
class. -
71:57 - 71:57OK, this is the L1 norm.
I think it minimizes every L_p -
72:01 - 72:01norm.
Good, so let's use this for -
72:05 - 72:05something.
Yeah, let's solve a real -
72:09 - 72:09problem, and then we'll be done
for today. -
72:14 - 72:14Next class we'll see the really
cool stuff, the really cool -
72:21 - 72:21application of all of this.
For now, and we'll see a cool -
72:27 - 72:27but relatively simple
application, which is VLSI -
72:33 - 72:33layout.
We talked a little bit about -
72:38 - 72:38VLSI way back in divide and
conquer. -
72:41 - 72:41You have a bunch of chips,
and you want to arrange them -
72:46 - 72:46to minimize some objectives.
So, there are -
72:50 - 72:50tons of problems that come out
of VLSI layout. -
72:55 - 72:55Here's one of them.
You have a bunch of features of -
72:59 - 72:59an integrated circuit.
You want to somehow arrange -
73:05 - 73:05them on your circuit without
putting any two of them too -
73:10 - 73:10close to each other.
You have some minimum -
73:14 - 73:14separation: at the very least, they
should not be on top of each -
73:19 - 73:19other.
Probably, you also need some -
73:22 - 73:22separation to put wires in
between, and so on, -
73:27 - 73:27so, without putting any two
features too close together. -
73:33 - 73:33OK, so just to give you an
idea, so I have some objects and -
73:37 - 73:37I'm going to be a little bit
vague about how this works. -
73:41 - 73:41You have some features.
This is stuff, -
73:44 - 73:44some chips, whatever.
We don't really care what their -
73:47 - 73:47shapes look like.
I just want to be able to move -
73:51 - 73:51them around so that the gap at
any point, so let me just think -
73:55 - 73:55about this gap.
This gap should be at least -
73:58 - 73:58some delta.
Or, I don't want to use delta. -
74:01 - 74:01Let's say epsilon,
good, small number. -
74:05 - 74:05So, I just need some separation
between all of my parts. -
74:09 - 74:09And for this problem,
I'm going to be pretty simple, -
74:12 - 74:12just say that the parts are
only allowed to slide -
74:16 - 74:16horizontally.
So, it's a one-dimensional -
74:18 - 74:18problem.
These objects are in 2-d, -
74:21 - 74:21or whatever,
but I can only slide them in the x -
74:24 - 74:24coordinate.
So, to model that, -
74:26 - 74:26I'm going to look at the left
edge of every part and say, -
74:30 - 74:30well, these two left edges
should be at least some -
74:33 - 74:33separation.
So, I think of it as whatever -
74:37 - 74:37the distance is plus some
epsilon. -
74:39 - 74:39But, you know,
if you have some funky 2-d -
74:42 - 74:42shapes you have to compute,
well, this is a little bit too -
74:45 - 74:45close because these come into
alignment. -
74:48 - 74:48But, there's some constraint,
well, for any two pieces, -
74:51 - 74:51I could figure out how close
they can get. -
74:54 - 74:54They should get no closer.
So, I'm going to call this x_1. -
74:57 - 74:57I'll call this x_2.
So, we have some constraint -
75:00 - 75:00like x_2 minus x_1 is at least d
plus epsilon, -
75:03 - 75:03or whatever you compute that
weight to be. -
75:07 - 75:07OK, so for every pair of
pieces, I can do this, -
75:10 - 75:10compute some constraint on how
far apart they have to be. -
75:13 - 75:13And, now I'd like to assign
these x coordinates. -
75:16 - 75:16Right now, I'm assuming they're
just variables. -
75:19 - 75:19I want to slide these pieces
around horizontally in order to -
75:22 - 75:22compactify them as much as
possible so they fit in the -
75:25 - 75:25smallest chip that I can make
because it costs money, -
75:28 - 75:28and time, and everything,
and power, everything. -
75:31 - 75:31You always want your chip
small. -
75:34 - 75:34So, Bellman-Ford does that.
All right, so Bellman-Ford -
75:40 - 75:40solves these constraints because
it's just a bunch of difference -
75:48 - 75:48constraints.
And we know that they are -
75:52 - 75:52solvable because you could
spread all the pieces out -
75:58 - 75:58arbitrarily far.
And, it minimizes the spread, -
76:03 - 76:03minimizes the size of the chip
I need, the max of x_i minus the -
76:10 - 76:10min of x_i.
So, it maximizes -
76:15 - 76:15compactness, or minimizes size
of the chip. -
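As a quick usage sketch, reusing the hypothetical solve_difference_constraints from earlier, and with a made-up gap value: an "at least" spacing constraint becomes a difference constraint by negating it.

```python
# Hypothetical spacing example reusing solve_difference_constraints from
# the earlier sketch. x_1 - x_0 >= d + eps is not in difference form,
# but negating both sides gives x_0 - x_1 <= -(d + eps).
d, eps = 5.0, 1.0  # made-up minimum gap and safety margin

constraints = [
    (1, 0, -(d + eps)),  # encodes x_0 - x_1 <= -(d + eps)
]
x = solve_difference_constraints(2, constraints)
print(x)  # [-6.0, 0]: part 1 sits 6 units to the right of part 0
```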
76:18 - 76:18OK, this is a one-dimensional
problem, so it may seem a little -
76:23 - 76:23artificial, but the two
dimensional problem is really -
76:27 - 76:27hard to solve.
And this is, -
76:29 - 76:29in fact, the best you can do
with a nice polynomial time -
76:33 - 76:33algorithm.
There are other applications if -
76:37 - 76:37you're scheduling events in,
like, a multimedia environment, -
76:42 - 76:42and you want to guarantee that
this audio plays at least two -
76:47 - 76:47seconds after this video,
but then there are things that -
76:51 - 76:51are playing at the same time,
and they have to be within some -
76:56 - 76:56gap of each other,
so, lots of papers about using -
76:59 - 76:59Bellman-Ford to
solve difference constraints to -
77:03 - 77:03enable multimedia environments.
OK, so there you go. -
77:07 - 77:07And next class we'll see more
applications of Bellman-Ford to -
77:11 - 77:11all pairs shortest paths.
Questions? -
77:14 - 77:14Great.