Lec 19 | MIT 6.046J / 18.410J Introduction to Algorithms (SMA 5503), Fall 2005
-- shortest paths. This is the finale; hopefully it was worth waiting for. A reminder: there's a quiz coming up soon, and you should be studying for it. There's no problem set due at the same time as the quiz, because you should be studying now. It's a take-home exam, and it's required that you come to class on Monday. Of course you'll all come, but everyone watching at home should also come next Monday to get the quiz. It's the required lecture.
So, we need a bit of a recap of the trilogy so far. The last two lectures, the last two episodes, were about single-source shortest paths: we wanted to find the shortest path from a source vertex to every other vertex, and we saw a few algorithms for this. Here's some recap. In the unweighted case, sort of the easiest case, where all the edge weights are one, we could use breadth-first search, and that costs what we call linear time in the graph world: O(V + E), the number of vertices plus the number of edges.
The next simplest case, perhaps, is nonnegative edge weights. And in that case, what algorithm do we use? Dijkstra; all right, everyone's awake. Several answers at once, great. This takes almost linear time if you use a good heap structure: O(V lg V + E). And in the general case, with general weights, we would use Bellman-Ford, which you saw. That costs O(VE), which is quite a bit worse. Ignoring log factors, Dijkstra is basically linear time, while Bellman-Ford is quadratic if you have a connected graph. So in the sparse case, when E is order V, Dijkstra is about linear and Bellman-Ford is about quadratic; in the dense case, when E is about V^2, Dijkstra is quadratic and Bellman-Ford is cubic. So Dijkstra and Bellman-Ford are separated by about a factor of V, which is pretty bad. But that's the best we know how to do for single-source shortest paths: with negative edge weights, Bellman-Ford is the best.
We also saw in recitation the case of a DAG. And there, what do you do? Topological sort, yeah. You do a topological sort to get an ordering on the vertices, and then you run one round of Bellman-Ford; that's one way to think of what's going on. You relax the edges once, in the order given by the topological sort, and you get a linear-time algorithm. So a DAG is another case where we know how to do well even with weights; unweighted, we can also do linear time.
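As a quick sketch of that DAG case, here's a minimal Python version. This is not from the lecture board; the function names and the adjacency-list-of-(vertex, weight)-pairs representation are just illustrative.

    from math import inf

    def dag_shortest_paths(adj, s):
        # Single-source shortest paths in a DAG, even with negative weights:
        # topologically sort, then relax each out-edge once, in order. O(V + E).
        n = len(adj)
        order, seen = [], [False] * n
        def visit(u):                      # DFS; vertices finish in reverse
            seen[u] = True                 # topological order
            for v, _ in adj[u]:
                if not seen[v]:
                    visit(v)
            order.append(u)
        for u in range(n):
            if not seen[u]:
                visit(u)
        order.reverse()
        dist = [inf] * n
        dist[s] = 0
        for u in order:                    # the single "round of Bellman-Ford"
            if dist[u] < inf:
                for v, w in adj[u]:
                    if dist[u] + w < dist[v]:
                        dist[v] = dist[u] + w
        return dist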
Most of the time, though, you should keep these cases in mind, especially on the quiz: when you get a shortest path problem, or what you end up with turns out to be a shortest path problem, think about which is the best algorithm you can use in that case.
OK, so that's single-source shortest paths. In our evolution of the Death Star, initially it was just nonnegative edge weights; then we got negative edge weights. Today, the Death Star challenges us with all-pairs shortest paths, where we want to know the shortest path weight between every pair of vertices.
OK, so let's get some quick results. What could we do in this case? For example, suppose I have an unweighted graph. Any suggestions of how I should compute all-pairs shortest paths? Between every pair of vertices, I want to know the shortest path weight. BFS; a couple more words? Yeah? Right: BFS, V times. OK, I'll say V times BFS. So the running time would be O(V^2 + VE), which, assuming your graph is connected, is O(VE). OK, good. That's probably about the best algorithm we know for unweighted graphs. A lot of these are going to be the obvious answer: you take your single-source algorithm and run it V times. That's the best you can do, or at least the best we know how to do. This is not so bad; it's the cost of one run of Bellman-Ford, for comparison. We definitely need at least V^2 time, because the size of the output is V^2: that's how many shortest path weights we have to compute. So this is not perfect, but pretty good, and we are not going to improve on that.
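Here's what "V times BFS" might look like as a minimal Python sketch; the names and the adjacency-list representation are illustrative, not from the lecture.

    from collections import deque
    from math import inf

    def all_pairs_bfs(adj):
        # "V times BFS": one BFS per source vertex, O(V(V + E)) total.
        # adj[u] lists the out-neighbors of u; dist[u][v] is the number of
        # edges on a shortest u -> v path, or inf if v is unreachable.
        n = len(adj)
        dist = [[inf] * n for _ in range(n)]
        for s in range(n):
            dist[s][s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                for v in adj[u]:
                    if dist[s][v] == inf:
                        dist[s][v] = dist[s][u] + 1
                        queue.append(v)
        return dist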
So, nonnegative edge weights: the natural thing to do is to run Dijkstra V times; no big surprise. The running time of that is O(VE + V^2 lg V), which is also not too bad; it's basically the same as running BFS V times, plus the log factor. If you ignore the log factor, the VE term is the dominant one (and the V-times-BFS bound had an added V^2 as well). So these are both pretty good. This is kind of neat: in essentially the time it takes to run one Bellman-Ford, plus a log factor, you can compute all-pairs shortest paths if you have nonnegative edge weights. Comparing all-pairs to single-source, this seems a lot better, except we can only handle nonnegative edge weights.
OK, so now let's think about the general case. This is the focus of today, and here's where we can actually make an improvement. The obvious thing is V times Bellman-Ford, which would cost O(V^2 E). That's pretty pitiful, and we're going to try to improve it to something closer to that nonnegative-edge-weight bound. It turns out that here we can actually make an improvement, whereas in those special cases we really can't do much better. I don't have a good intuition why, but it's the case. We'll cover something like three algorithms today for this problem. The last one will be the best, but along the way we'll see some nice connections between shortest paths and dynamic programming, which we haven't really seen yet. We've seen shortest paths, and applied greedy algorithms to them, but today we'll actually do dynamic programming. The intuition is that with all-pairs shortest paths there's more potential subproblem reuse: we've got to compute the shortest path from x to y for all x and y, and maybe we can reuse those shortest paths in computing other shortest paths. There's a bit more reusability, let's say.
OK, let me quickly define all-pairs shortest paths formally, because we're going to change our notation slightly; it's because we care about all pairs. As usual, the input is a directed graph G = (V, E). We're going to say the vertices are labeled 1 to n for convenience, because with all pairs we're going to think of things as an n-by-n matrix; it doesn't help to think in terms of adjacency lists anymore. And you have edge weights as usual; this is what makes it interesting, since some of them may be negative. So w : E -> R maps each edge to a real number, and the target output is a shortest-path matrix: an n-by-n matrix (n is just the number of vertices) of shortest path weights, where delta(i, j) is the shortest path weight from i to j, for all pairs of vertices. So you can represent the output as an n-by-n matrix in particular.
OK, so now let's start doing algorithms. We have this very simple algorithm, V times Bellman-Ford, at O(V^2 E). Just for comparison's sake, let me rewrite that: V times Bellman-Ford gives us a running time of O(V^2 E), and I'm going to think about the case where the graph is dense, meaning that the number of edges is quadratic in the number of vertices. In that case this takes O(V^4) time, which is pretty slow. We'd like to do better. So the first goal is just to beat V^4; V hypercubed, I guess. And we are going to use dynamic programming to do that, or at least that's where the motivation will come from. It will take us a while before we can even beat V^4, which is maybe a bit pathetic, but it takes some clever insights, let's say.
OK, so I'm going to introduce a bit more notation for this graph: the weighted adjacency matrix A. I don't think we've really seen this in lecture before, although I think it's in the appendix. Normally an adjacency matrix has a one if there's an edge and a zero if there isn't (and this is a digraph, so you have to be a little bit careful). Here, the entries in the matrix are going to be the weights of the edges:

    a_ij = w(i, j)   if (i, j) is an edge
         = infinity  if there is no edge

In terms of shortest paths, this is a more useful way to represent the graph, and it includes everything that we need. Now we just have to think about it as a matrix; matrices will be a useful tool in a little while.
OK, so now I'm going to define -
13:25 - 13:25some sub problems.
And, there's different ways -
13:28 - 13:28that you could define what's
going on in the shortest paths -
13:32 - 13:32problem.
OK, the natural thing is I want -
13:36 - 13:36to go from vertex i to vertex j.
What's the shortest path? -
13:39 - 13:39OK, we need to refine the sub
problems a little but more than -
13:43 - 13:43that.
Not surprising. -
13:44 - 13:44And if you think about my
analogy to Bellman-Ford, -
13:47 - 13:47what Bellman-Ford does is it
tries to build longer and longer -
13:50 - 13:50shortest paths.
But here, length is in terms of -
13:53 - 13:53the number of edges.
So, first, it builds shortest -
13:55 - 13:55paths of length one.
We've proven the first round it -
13:58 - 13:58does that.
The second round, -
14:02 - 14:02it provides all shortest paths
of length two, -
14:06 - 14:06of count two,
and so on. -
14:09 - 14:09We'd like to do that sort of
analogously, and try to reuse -
14:15 - 14:15things a little bit more.
So, I'm going to say d_ij^(m) -
14:21 - 14:21is the weight of the shortest
path from i to j with some -
14:26 - 14:26restriction involving m.
So: shortest path from i to j -
14:33 - 14:33using at most m edges.
OK, for example, -
14:37 - 14:37if m is zero,
then we don't have to really -
14:41 - 14:41think very hard to find all
shortest paths of length zero. -
14:47 - 14:47OK, they use zero edges,
I should say. -
So, Bellman-Ford sort of tells us how to go from m to m + 1; let's just figure that out. One thing we know from the Bellman-Ford analysis: unless you have a negative-weight cycle, the longest shortest path of relevance is when m equals n - 1, because that's the longest simple path you can have. So d_ij^(n-1) should be the shortest path weight from i to j, and it would stay that way no matter what larger value you put in the superscript: d_ij^(m) = delta(i, j) for all m >= n - 1, if there are no negative-weight cycles. OK, so this feels good for dynamic programming: if we can compute this for all m, then we'll have the shortest path weights in particular. We need a way to detect negative-weight cycles, but let's not worry about that too much for now. There are negative weights, but let's just assume for now there are no negative-weight cycles.
OK, and we get a recurrence. The base case is when m equals zero; that's pretty easy: d_ij^(0) is 0 if i = j (same vertex, weight zero), and infinity otherwise. And then the actual recurrence, for m >= 1, is:

    d_ij^(m) = min over k of ( d_ik^(m-1) + a_kj )

OK, if I got this right, this is a pretty easy, intuitive recurrence: d_ij^(m) is a min of smaller things, in terms of m - 1. I'll just show the picture, and then the proof of that claim should be obvious.
So, this is proof by picture. We have, on the one hand, i over here and j over here, and we want to know the shortest path from i to j using at most m edges. The idea is: well, you could use at most m - 1 edges to get to some other place, call it k, and then take the edge directly from k to j. The last edge costs a_kj, and the first part costs d_ik^(m-1). That's a candidate path from i to j that uses at most m edges, and the min is essentially just considering all of them: all the candidate values of k as intermediate nodes, whatever. So there they are; we take the best such path, and that should encompass all shortest paths. This is essentially what Bellman-Ford is doing, although not exactly. We also sort of want to think about: well, what if I just go directly with, say, m - 1 edges? What if there is no final edge that I want to use, in some sense? Well, the way the a's are defined, there's always a zero-weight edge from a vertex to itself. So you could just take the shorter path from i to j (j is a particular value of k that we might consider), and then take the zero-weight edge a_jj at the end. OK, so this really encompasses everything; it's a pretty trivial claim.
OK, now once we have such a recurrence, we get a dynamic program. I mean, written recursively, this is it, in some sense. You can also write it bottom-up, and I would like to write it bottom-up a little bit, because, while it doesn't look like it, this is a relaxation; this is yet another relaxation algorithm. So I'll give you, sort of, the algorithm. It's not a very interesting algorithm, so you don't have to write it all down if you don't feel like it; it's probably not even in the book. It's just an intermediate step. We loop over all m; that's sort of the outermost thing to do, since I want to build longer and longer paths. This vaguely corresponds to Bellman-Ford, although it's actually worse than Bellman-Ford. But hey, what the heck? It's a stepping stone.
OK, then for all i and j, we want to compute this min, so we just loop over all k and relax:

    if d_ij > d_ik + a_kj then d_ij <- d_ik + a_kj

That's where we're actually computing the min, and it's a relaxation; that is the point. This is our good friend, the relaxation step. I guess, relative to source i, we are relaxing the edge (k, j), something like that; we don't have quite the same clear notion as before, since it's not just a single edge anymore because we don't have a single source. But this is clearly a relaxation: we are just making the triangle inequality true if it wasn't before, and the triangle inequality has to hold between all pairs. And it's implementing this min, right? You take d_ij, and take the min with what it was before in some sense (that was one of the possibilities we considered, via the zero-weight edge), or you go from i to some k in a way that we knew how to before, and then add on the edge; if that's better, you set your current estimate to that. And you do this for all k. In particular, you might actually compute something smaller than this min, because I didn't put superscripts up here. But that's just making paths even better; you have to argue that relaxation is always a safe thing to do. So, by not putting superscripts, maybe I do some more relaxation, but more relaxation never hurts us. You can still argue correctness using this claim.
So, it's not quite the direct implementation, but there you go: a dynamic programming algorithm. The main reason I write it down is so you see that it's a relaxation, and so you see the running time: n^4, which is certainly no better than V times Bellman-Ford. That was n^4 in the dense case too, and a little better in the sparse case. So: not doing so great. But it's a start, and it gets our dynamic-programming minds thinking. We'll get a better dynamic program in a moment.
22:47 - 22:47something useful we can do with
this formulation, -
22:53 - 22:53and I guess I'll ask,
but I'll be really impressed if -
22:59 - 22:59anyone can see.
Does this formula look like -
23:05 - 23:05anything else that you've seen
in any context, -
23:10 - 23:10mathematical or algorithmic?
Have you seen that recurrence -
23:16 - 23:16anywhere else?
OK, not exactly as stated, -
23:20 - 23:20but similar.
I'm sure if you thought about -
23:25 - 23:25it for awhile,
you could come up with it. -
23:30 - 23:30Any answers?
I didn't think you would be -
23:33 - 23:33very intuitive,
but the answer is matrix -
23:36 - 23:36multiplication.
And it may now be obvious to -
23:40 - 23:40you, or it may not.
You have to think with the -
23:43 - 23:43right quirky mind.
Then it's obvious that it's -
23:47 - 23:47matrix multiplication.
Remember, matrix -
23:50 - 23:50multiplication,
we have A, B, -
23:52 - 23:52and C.
They're all n by n matrices. -
23:55 - 23:55And, we want to compute C
equals A times B. -
24:00 - 24:00And what that meant was,
well, c_ij was a sum over all k -
24:05 - 24:05of a_ik times b_kj.
All right, that was our -
24:09 - 24:09definition of matrix
multiplication. -
24:12 - 24:12And that formula looks kind of
like this one. -
24:16 - 24:16I mean, notice the subscripts:
ik and kj. -
24:19 - 24:19Now, the operators are a little
different. -
24:23 - 24:23Here, we're multiplying the
inside things and adding them -
24:28 - 24:28all together.
There, we're adding the inside -
24:34 - 24:34things and taking them in.
But other than that, -
24:41 - 24:41it's the same.
OK, weird, but here we go. -
So, the connection to shortest paths is that you replace the operators. Take matrix multiplication and replace plus with min, and replace dot with plus. This is just a different algebra to work in, where "plus" actually means min and "dot" actually means plus. You have to check that things sort of work out in that context, but if we do that, then we get

    c_ij = min over k of ( a_ik + b_kj )

And that looks like what we actually want to compute here for one value of m (you have to do this m times): conceptually, c_ij plays the role of d_ij^(m), a_ik plays d_ik^(m-1), and b_kj plays a_kj. So this is looking like a matrix product, which is kind of cool.
So, if we plug in this claim and think about things as matrices, the recurrence gives us, written in matrix form:

    D^(m) = D^(m-1) (x) A

Here A is the weighted adjacency matrix, D^(m-1) was the previous d value, and D^(m) is the new one; I'm just rewriting the recurrence with capital letters, and I circle the operations that use this funny algebra, in particular the circled product (x). OK, so that's kind of nifty. We know something about computing matrix multiplications: we can do it in n^3 time, and if we were a bit fancier, maybe we could do it in subcubic time.
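As a concrete sketch of this circled (min-plus) product in Python; the name and the list-of-lists representation, with inf for missing edges, are mine rather than the lecture's.

    from math import inf

    def circle_mult(A, B):
        # Circled product: ordinary matrix multiply with (min, +) in place
        # of (+, *), so C[i][j] = min over k of A[i][k] + B[k][j].
        n = len(A)
        C = [[inf] * n for _ in range(n)]
        for i in range(n):
            for k in range(n):
                if A[i][k] == inf:
                    continue               # nothing goes through this k
                for j in range(n):
                    if A[i][k] + B[k][j] < C[i][j]:
                        C[i][j] = A[i][k] + B[k][j]
        return C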
So, we could try to use this connection. Think about what we are computing here: D^(m) is the previous one times A. So, what is D^(m)? Is that some other algebraic notion that we know? Yeah: it's a power. We're taking A and raising it to the power m, with this funny notion of product. In other words, D^(m) is really just A to the m-th circled power. So, that sounds good. We also know how to compute powers of things relatively quickly, if you remember how.
27:50 - 27:50this power notion,
to make sense, -
27:53 - 27:53I should say what A to the zero
means. -
27:55 - 27:55And so, I need some kind of
identity matrix. -
28:00 - 28:00And for here,
the identity matrix is this -
28:03 - 28:03one, if I get it right.
So, it has zeros along the -
28:07 - 28:07diagonal, and infinities
everywhere else. -
28:09 - 28:09OK, that sort of just to match
this definition. -
28:13 - 28:13d_ij zero should be zeros on
the diagonals and infinity -
28:17 - 28:17everywhere else.
But you can check this is -
28:20 - 28:20actually an identity.
If you multiply it with this -
28:23 - 28:23funny multiplication against any
other matrix, -
28:27 - 28:27you get the matrix back.
Nothing changes. -
28:31 - 28:31This really is a valid identity
matrix. -
28:35 - 28:35And, I should mention that for
A to the m to make sense, -
28:40 - 28:40you really knew that your
product operation is -
28:45 - 28:45associative.
So, actually A to the m circled -
28:49 - 28:49makes sense because circled
multiplication is associative, -
28:54 - 28:54and you can check that;
not hard because, -
28:58 - 28:58I mean, min is associative,
and addition is associative, -
29:04 - 29:04and all sorts of good stuff.
And you have some kind of distributivity property. This is, in turn, because the real numbers, with (let me get the right order here) min as your addition operation and plus as your multiplication operation, form a closed semiring. So if ever you want to know when powers make sense, this is a good rule: if you have a closed semiring, then matrix products over that semiring give you an associative operator, and then, good, you can take products.
OK, that's just some formalism. So now we have some intuition, and the question is: what's the right algorithm? There are many possible answers, some of which are right, some of which are not. We have this connection to matrix products, and we have a connection to matrix powers, and we have algorithms for both. The question is, what should we do? All we need to do now is compute A to the circled power n - 1; n - 1 is when we get shortest paths, assuming we have no negative-weight cycles. In fact, we could compute a larger power than n - 1: once you get beyond n - 1, multiplying by A doesn't change you anymore. So, how should we do it? OK, you're not giving any smart answers, so I'll give the stupid answer. You could say: well, I take A; I multiply it by A; then I multiply it by A, and I multiply it by A, and I use normal, boring matrix multiplication. So I do, like, n - 2 standard matrix multiplies. A standard multiply costs, like, n^3, and I'm doing about n of them. So this gives me an n^4 algorithm: compute all the shortest path weights in n^4. Woohoo!
OK, no improvement. So, how can I do better? Right: the natural thing to try, which sadly does not work, is to use a subcubic matrix multiply algorithm. We will, in some sense, get there in a moment with a somewhat simpler problem. But it's actually not known how to compute shortest paths using fast matrix multiplication like Strassen's algorithm. Good suggestion, though. You have to think about why it doesn't work, and I'll tell you; it's not obvious, so it's a perfectly reasonable suggestion, but in this context it doesn't quite work. The problem is that Strassen requires a notion of subtraction. And here, addition is min, and there's no inverse to min: once you take the min of the arguments, you can't undo it. So there's no notion of subtraction, and it's not known how to pull that off, sadly.
So, what other tricks do we have up our sleeve? Yeah? Divide and conquer; log-n powering; yeah, repeated squaring. That works. Good. We had a fancy way: if you had a number n, you looked at the binary representation of n, and you either squared, or squared and multiplied in another factor of A. Here, we don't even have to be that smart; we really only have to think about powers of two. What we want to compute (and I'm going to need a bigger font here, because there are multiple levels of superscripts) is A to the circled power 2^ceil(lg(n-1)). Actually, n - 1 would be enough, but this just means the next power of two at or above n - 1. We don't have to stop exactly at n - 1; we can go further, because anything farther than n - 1 still just gives the shortest path weights. If you look at the definition, and you know that your paths are simple, which is true if you have no negative-weight cycles, then fine, just go farther; why not? And to compute this, we just do ceil(lg(n-1)) products: take A squared, then take the result and square it, then take the result and square it, and so on. So this is order lg n squarings. We don't know how to use Strassen, but we can use the boring, standard n^3 multiply, and that gives us O(n^3 lg n) running time, which finally is something that beats Bellman-Ford in the dense case. In the dense case, V times Bellman-Ford was n^4; here we get n^3 lg n, finally something better. In the sparse case, it's about the same, maybe a little worse: if E is order V, then V times Bellman-Ford is about V^3, and here we still pay n^3 lg n. So, modulo log factors, this is an improvement some of the time, and about the same the other times.
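Putting those two boards together, here's a sketch of the repeated-squaring version, reusing the circle_mult sketch above; it assumes, as in lecture, that A has zero-weight self-loops (a_ii = 0) and no negative-weight cycle.

    def apsp_squaring(A):
        # Repeated squaring under the circled product: O(lg n) squarings
        # of n^3 each, so O(n^3 lg n) total.
        n = len(A)
        D, m = A, 1
        while m < n - 1:                   # reach the next power of two >= n-1
            D = circle_mult(D, D)          # D now covers paths of <= 2m edges
            m *= 2
        return D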
Another nifty thing that you get for free out of this is that you can detect negative-weight cycles. So here's a bit of a puzzle: after I compute this power of A, how would I know whether I found a negative-weight cycle? What would that mean in this matrix of all shortest paths of at most a certain length? If there were a negative-weight cycle, what would have to be in that matrix? Yeah? Right: I could, for example, take this thing, multiply it by A, and see whether the matrix changed at all. That works fine; that's what we do in Bellman-Ford, and it's the same running time, so it's a good answer. But there's an even simpler thing: it's already there, and you don't have to multiply. The diagonal would have a negative value, yeah. So this is just a cute thing: both approaches work, but you can detect a negative-weight cycle just by looking at the diagonal of the matrix; you just look for a negative value on the diagonal. OK.
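That diagonal check, as a one-line Python sketch over the matrix computed above:

    def has_negative_weight_cycle(D):
        # After the final power, a negative-weight cycle shows up as a
        # negative entry on the diagonal: some vertex reaches itself
        # with total weight < 0.
        return any(D[i][i] < 0 for i in range(len(D)))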
So, that's algorithm one, let's say; I mean, we've seen several that are all bad, but I'll call this number one. OK, we'll see two more. (This is the only one that will... well, I shouldn't say that. Fine, there we go.) So, this was one dynamic program that wasn't so helpful, except it showed us a connection to matrix multiplication, which is interesting; we'll see why it's useful a little bit more later. But it led to these nasty four nested loops, and using the squaring trick we got down to n^3 lg n. Let's try for just n^3: get rid of that log, it's annoying, and it makes you a little bit worse than Bellman-Ford in the sparse case. So, let's just erase one of these nested loops. OK, obviously the algorithm doesn't work if you simply delete the loop over k, because then k isn't defined. But, you know, I've got enough variables: why don't I just define k to be m? OK, it turns out that works. I'll do it from scratch, but why not? I don't know if that's how Floyd and Warshall came up with their algorithm, but here you go. Here's Floyd-Warshall.
The idea is to define the subproblems a little bit more cleverly, so that to compute one of these values you don't have to take a min of n things; I just want to take a min of two things. If I could do that, and I still only have n^3 subproblems, then I would have n^3 time. All right: the running time of a dynamic program is the number of subproblems times the time to compute the recurrence for one subproblem. Before, that was n^3 subproblems times linear time each, and we want n^3 times constant. That would be good.
So that's Floyd-Warshall. Here's the way we're going to redefine things (before, it was called d_ij; we're going to define something new). c_ij^(k) is now going to be the weight of the shortest path from i to j, as before; notice I use the superscript k instead of m, because I want k and m to be the same thing. Deep. OK, now here's the new constraint: I want all intermediate vertices along the path, meaning all vertices except for i and j at the beginning and the end, to have a small label; they should be in the set {1, ..., k}. And this is where we are really using the fact that our vertices are labeled 1 up to n. So I'm going to say: well, first think about the shortest paths that don't use any intermediate vertices; that's when k is zero. Then think about all the shortest paths that maybe use vertex 1. And then think about the shortest paths that maybe use vertex 1 or vertex 2. Why not? You could define it in this way. It turns out that when you increase k, you only have to think about one new vertex. Before, we had to take a min over all k; now we know which k to look at. OK, maybe that made sense; maybe it's not quite obvious yet.
Maybe it's not quite obvious -
39:57 - 39:57yet.
But I'm going to redo this -
39:59 - 39:59claim, redo a recurrence.
So, maybe first I should say -
40:04 - 40:04some obvious things.
So, if I want delta of ij of -
40:07 - 40:07the shortest pathway,
well, just take all the -
40:11 - 40:11vertices.
So, take c_ij superscript n. -
40:13 - 40:13That's everything.
And this even works, -
40:16 - 40:16this is true even if you have a
negative weight cycle. -
40:20 - 40:20Although, again,
we're going to sort of ignore -
40:23 - 40:23negative weight cycles as long
as we can detect them. -
40:27 - 40:27And, another simple case is if
you have, well, -
40:30 - 40:30c_ij to zero.
Let me put that in the claim to -
40:36 - 40:36be a little bit more consistent
here. -
So, here's the new claim. If we want to compute c_ij^(0), what is it? Superscript zero means I really shouldn't use any intermediate vertices. So this has a very simple answer, a three-letter answer. It's not "zero"; that's four letters. What's that, "nil"? No, not working yet; it has some subscripts, too. So, the definition would be: what's the shortest path weight from i to j when you're not allowed to use any intermediate vertices? Sorry? Yeah, it has a very simple name; that's the tricky part. All right, so "if i equals j"... [LAUGHTER] you're clever, right: the bracket [i = j], meaning one if i equals j. Well, OK. It sort of works, but it's not quite right. In fact, I'd want infinity if i does not equal j and zero if i equals j... a_ij, good. I think it's a_ij. It should be, right? Maybe I'm wrong. Right: a_ij. So it's essentially not what I said; that's the point. If i does not equal j, you still have to think about a single edge connecting i to j, right? OK, so that's a bit of a subtlety: the superscript restricts only intermediate vertices, so you could still go from i to j via a single edge, and that costs a_ij: the edge weight if there is an edge, infinity if there isn't. So c_ij^(0) = a_ij. OK, so that gets us started. And then, we want a recurrence.
And the recurrence is: well, maybe you get away with only the vertices that you had before. So if you want paths that use intermediate vertices from {1, ..., k}, maybe you just use {1, ..., k - 1}; you could try that. Or you could use k. So, either you use k or you don't. If you don't, it's got to be c_ij^(k-1). If you do, then you've got to go through k: you go from i to k using the previous vertices (obviously, you don't want to repeat k in there), and then you go from k to j, somehow, using vertices that are not k. So:

    c_ij^(k) = min( c_ij^(k-1), c_ik^(k-1) + c_kj^(k-1) )

This should be pretty intuitive. Again, I can draw a picture. Either you never visit k, and that's the wiggly line where you go from i to j using only vertices 1 up to k - 1 (here we're allowed to use 1 up to k, so this just means: don't use k). Or you use k somewhere in the middle there. OK, it's got to be one of the two. And in the second case, you go from i to k using only smaller-labeled vertices, because you don't want to repeat k, and then from k to j using only smaller-labeled vertices. So, every path is one of the two, and we take the shorter of these two subproblems. That's the answer. So now we have a min of two things, which takes constant time to compute, and we get a cubic algorithm. So, let me write it down.
So, let me write it down. -
44:37 - 44:37So, this is the Floyd-Warshall
algorithm. -
44:41 - 44:41I'll write the name again.
You give it a matrix A. -
44:47 - 44:47That's all it really needs to
know. -
44:50 - 44:50It codes everything.
You copy C to A. -
44:54 - 44:54That's the warm up.
Right at time zero, -
44:59 - 44:59C equals A.
And then you just have these -
45:03 - 45:03three loops for every value of
k, for every value of i, -
45:07 - 45:07and for every value of j.
You compute that min. -
45:11 - 45:11And if you think about it a
little bit, that min is a -
45:15 - 45:15relaxation.
Surprise, surprise. -
45:47 - 45:47So, that is the Floyd-Warshall
algorithm. -
45:52 - 45:52And, the running time is
clearly n^3, three nested loops, -
45:58 - 45:58constant time inside.
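Those three loops, as a minimal Python sketch (same matrix conventions as the earlier snippets; the superscripts are dropped, as in lecture, because the update is done in place and extra relaxations never hurt):

    def floyd_warshall(A):
        # All-pairs shortest paths in Theta(n^3); assumes no negative-weight
        # cycle. C[i][j] ends up as delta(i, j).
        n = len(A)
        C = [row[:] for row in A]          # c^(0) = A
        for k in range(n):                 # now allow vertex k in the middle
            for i in range(n):
                for j in range(n):
                    # the relaxation: avoid k, or go i -> k -> j
                    if C[i][k] + C[k][j] < C[i][j]:
                        C[i][j] = C[i][k] + C[k][j]
        return C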
So, we're finally getting something that is never worse than Bellman-Ford. In the sparse case, it's the same; and anything denser, where the number of edges is superlinear, this is strictly better than V times Bellman-Ford. And it's better than everything we've seen so far for all-pairs shortest paths, and it handles negative weights; a very simple algorithm, even simpler than the one before. It's just relaxation within three loops; what more could you ask for? We do need to check that this is indeed the min we're computing here, except that the superscripts are omitted. That's, again, a bit of hand-waving, but it's OK to omit the superscripts because that can only mean you're doing more relaxations than you strictly need, and doing more relaxations can never hurt you. In particular, we do all the ones that we have to; therefore, we find the shortest path weights. And, again, here we're assuming that there are no negative-weight cycles. It shouldn't be hard to find them, but you have to think about that a little bit: you could run another round, for example, and see whether it relaxes any edges again. I think there's no nifty trick for this version.
And that's our second algorithm for all-pairs shortest paths. Before we go on to the third algorithm, which is going to be the cleverest of them all, the one Ring to rule them all, to switch trilogies, we're going to take a little bit of a diversion, a side story, whatever, and talk about transitive closure briefly.
This is just a good thing to know about, and it relates to the algorithms we've seen so far. So, here's the transitive closure problem. I give you a directed graph, and for all pairs of vertices i and j, I want to compute this number:

    t_ij = 1 if there's a path from i to j, and 0 otherwise

OK, this is sort of like a boring adjacency matrix with no weights, except it's about paths instead of being about edges. So how can I compute this? That's very simple. How should I compute this? This t gives me a graph, in some sense: it's the adjacency matrix of a new graph, called the transitive closure of my input graph.
48:55 - 48:55closure of my input graph.
So, breadth first search, -
49:02 - 49:02yeah, good.
So, all I need to do is find -
49:05 - 49:05shortest paths,
and if the weights come out -
49:08 - 49:08infinity, then there's no path.
If it's less than infinity, -
49:13 - 49:13that there's a path.
And so here, -
49:15 - 49:15so you are saying maybe I don't
care about the weights, -
49:20 - 49:20so I can run breadth first
search n times, -
49:23 - 49:23and that will work indeed.
So, if we do B times B of S, -
49:27 - 49:27so it's maybe weird that I'm
covering here in the middle, -
49:32 - 49:32but it's just an interlude.
So, we have, -
49:36 - 49:36then, something like V times E.
OK, you can run any of these -
49:42 - 49:42algorithms.
You could take Floyd-Warshall -
49:47 - 49:47for example.
Why not? -
49:49 - 49:49OK, then it would just be V^ I
mean, you could run in any of -
49:55 - 49:55these algorithms with weights of
one or zero, and just check -
50:01 - 50:01whether the values are infinity
or not. -
50:06 - 50:06So, I mean, t_ij equals zero,
if and only if the shortest -
50:10 - 50:10path weight from i to j is
infinity. -
50:13 - 50:13So, just solve this.
This is an easier problem than -
50:16 - 50:16shortest paths.
It is, in fact, strictly easier in a certain sense. What's going on with transitive closure (and I just want to mention this out of interest, because transitive closure is a useful thing to know about) is, essentially, that we are using a different pair of operators: logical OR and AND instead of min and plus. Because, if you think about the relaxation, in terms of that min: if I want to know whether there's a path from i to j that uses vertices labeled 1 through k in the middle, well, either there is a path that doesn't use vertex k, OR there is a path that uses k, and then it would have to look like before: there would have to be a path from i to k AND a path from k to j. So:

    t_ij^(k) = t_ij^(k-1) OR ( t_ik^(k-1) AND t_kj^(k-1) )

The min and plus get replaced with OR and AND. And if you remember, the min used to be a sum and the plus used to be a product in the matrix world. So plus is now like OR, and multiply is now like AND, which sounds very good, right? Plus does feel like OR, and multiply does feel like AND, if you live in a zero-one world. So, in fact, this is not quite the field Z mod 2, but it's a good, nice world to work in: the Boolean world. So I'll just write "Boole". Good old Boole knows all about this; it's like his master's thesis, I think, talking about Boolean algebra.
And this actually means that you can use fast matrix multiplication: you can use Strassen's algorithm, and the fancier algorithms, and compute the transitive closure in subcubic time. V times BFS is subcubic if the graph is sparse but cubic in the worst case when there are lots of edges, and Floyd-Warshall is cubic; here, you can actually do better using Strassen. I'll just say you can do it; no details here. In fact, there is a theorem, probably not in the textbook, that says transitive closure is just as hard as matrix multiply: they are equivalent, and their running times are the same. We don't know how long it takes to do a matrix multiply over a field; it's somewhere between n^2 and roughly n^2.376. But whatever the answer is, it's the same for transitive closure.
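As a sketch of the cubic, Floyd-Warshall-style version with OR and AND in place of min and plus (the subcubic route via Strassen is the one the theorem refers to; the names and representation here are mine):

    def transitive_closure(adj):
        # adj[i][j] is True iff edge (i, j) exists. Returns t with t[i][j]
        # True iff there is a path from i to j (every vertex reaches itself
        # by the empty path).
        n = len(adj)
        t = [[adj[i][j] or i == j for j in range(n)] for i in range(n)]
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    # OR/AND relaxation: a path avoiding k, or one through k
                    t[i][j] = t[i][j] or (t[i][k] and t[k][j])
        return t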
OK, there's the interlude. And that's where we actually get to use Strassen and friends. Remember, Strassen was the n^(lg 7) algorithm. Remember that, especially on the final; those are things you should have at the tip of your tongue.
OK, the last algorithm we're going to cover really builds on what we saw last time: Johnson's algorithm. I've lost some of the running times from the board, but when we had unweighted graphs, we could do all pairs really fast, just as fast as a single-source Bellman-Ford. That's kind of nifty. We don't know how to improve Bellman-Ford in the single-source case, so we can't really hope to get anything better than V times E. And, if you remember, running V times Dijkstra was about the same. So, just to put this in the recall bubble here: V times Dijkstra would give us O(VE + V^2 lg V), and if you ignore that log factor, this is just VE. OK, so this was really good; Dijkstra was great. And this was for nonnegative edge weights. So, with negative edge weights, somehow we'd like to get the same running time.
Now, how might I get the same running time? Well, it would be really nice if I could use Dijkstra. Of course, Dijkstra doesn't work with negative weights. So what could I do? What would I hope to do? Suppose that in the middle of the algorithm it says "run Dijkstra n times". What should I do to prepare for that? Make all the weights positive, or nonnegative. Why not, right? We're doing wishful thinking; that's what we'll do. This is called graph re-weighting, and what's cool is that we actually already know how to do it. We just don't know that we know how to do it. But I know that we know that we know how to do it; you don't yet know that we know that I know that we know how to do it. So, it turns out you can re-weight the vertices. At the end of the last class someone asked me: can you just, like, add the same weight to all the edges? That doesn't work, not quite, because different paths have different numbers of edges. What we are going to do is add a particular weight to each vertex.
What does that mean? Well, because we really only have weights on the edges, here's what we'll do. We'll re-weight each edge (u, v) (going back into graph speak instead of matrix speak: (u, v) instead of i and j), and we'll call the modified weight w_h, where h is our function giving a number for every vertex:

    w_h(u, v) = w(u, v) + h(u) - h(v)

It's just the old weight of that edge, plus the value at the start vertex, minus the value at the terminating vertex. I'm sure these have good names; one of them is the head and the other is the tail, but I can never remember which. We have a directed edge (u, v); you add h at one endpoint and subtract it at the other, and since it's a directed edge, that's a consistent definition. OK, so that's called re-weighting.
Now, this is actually a theorem. If you do this, then, let's say, for any two vertices u and v in the graph, all paths from u to v have the same weight as they did before... well, not quite: they are all re-weighted by the same amount. So if you look at all the different paths, and you ask what the difference is between delta, the old shortest path weights, and delta_h, the shortest path weights according to this new weight function, then that difference is the same for every path between the same pair of endpoints. We'll say that all these paths are re-weighted by the same amount. And this is actually a statement about all paths, not just shortest paths.
There we go. OK, to how many people is this obvious already? A few, yeah, it is. And what's the one word? OK, it's maybe not that obvious. All right, shout out the word when you figure it out; meanwhile, I'll write out this rather verbose proof. There's a one-word proof; still waiting. So, let's just take one of these paths, p = (v_1, ..., v_k), that starts at u = v_1 and ends at v = v_k. Take any path. We're just going to see what its new weight is relative to its old weight.
And so, let's just write out w_h of the path, which we define in the usual way as the sum over all edges of the new weights:

    w_h(p) = sum over i of w_h(v_i, v_{i+1})
           = sum over i of ( w(v_i, v_{i+1}) + h(v_i) - h(v_{i+1}) )

Do you have the word? No? Tough puzzle, then. OK, so the first line is the definition of the weight of a path, and each term then expands by the definition of w_h; the whole parenthesized thing is summed over i. Now I need the magic word. Telescopes, good. Now this is obvious: each h term cancels against one from the previous or next edge, except at the very beginning and the very end. So this is the sum of the original edge weights, but then, outside the sum, we have plus h(v_1) and minus h(v_k).
OK, those guys don't quite cancel; we're not looking at a cycle, just a path. And the remaining sum is just w(p), the normal weight of the path. So the difference between w_h(p) and w(p) is h(v_1) - h(v_k), which is just h(u) - h(v). And the point is, that's the same as long as you fix the endpoints u and v: you're changing the weight of every path from u to v by the same amount. This is for any path from u to v, and that proves the theorem. So the one word here was "telescopes": the changes in weight telescope over any path. Therefore, if you want to find shortest paths, you just find the shortest paths in this re-weighted version, and then you correct by this one amount; you subtract it off instead of adding it, and that gives you the shortest path weight in the original weights.
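A tiny sanity check of the telescoping claim in Python, with made-up numbers (the path and the h values here are arbitrary, just for illustration):

    # Path p = (v1, v2, v3) with edge weights w(v1,v2) = 3 and w(v2,v3) = -5,
    # and an arbitrary re-weighting h.
    w12, w23 = 3, -5
    h = {'v1': 7, 'v2': -2, 'v3': 4}

    w_p = w12 + w23                                   # original weight: -2
    wh_p = (w12 + h['v1'] - h['v2']) + (w23 + h['v2'] - h['v3'])
    assert wh_p == w_p + h['v1'] - h['v3']            # only the endpoints survive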
OK, so this is a tool: we now know how to change the weights in the graph. But what we really want is to change the weights in the graph so that the weights all come out nonnegative. OK, how do we do that? Why in the world would there be a function h that makes all the edge weights nonnegative? It doesn't seem to make sense. But it turns out we already know how to find one.
So, I should write down this consequence; let me get it in the right order. In particular, the shortest path weight changes by the same amount:

    delta_h(u, v) = delta(u, v) + h(u) - h(v)

And if you want delta, you just move the stuff to the other side: if we compute delta_h, then we can compute delta(u, v) = delta_h(u, v) - h(u) + h(v). That's the corollary here. How many people here pronounce this word "COR-ollary"? OK, and how many people pronounce it "co-ROLL-ary"? Yeah, we are alone. Usually I get at least one other student, and they're usually Canadian or British or something; I think it's the accent. So I always avoid pronouncing this word unless I really think about it; it's "corollary", and I try to get it right. I at least say Z, not Zed.
62:58 - 62:58OK, here we go.
So, what we want to do is find -
63:03 - 63:03one of these functions.
I mean, let's just write down -
63:09 - 63:09what we could hope to have.
We want to find a re-weighted -
63:16 - 63:16function, h, the signs of weight
to each vertex such that w_h of -
63:23 - 63:23(u,v) is nonnegative.
That would be great for all -
63:28 - 63:28edges, all (u,v) in E.
OK, then we could run Dijkstra. -
63:35 - 63:35We could run Dijkstra,
get the delta h's, -
63:38 - 63:38and then just undo the
re-weighting, -
63:41 - 63:41and get what we want.
And, that is Johnson's -
63:45 - 63:45algorithm.
The claim is that this is -
63:48 - 63:48always possible.
OK, why should it always be -
63:52 - 63:52possible?
Well, let's look at this -
63:55 - 63:55constraint.
w_h of (u,v) is that. -
63:58 - 63:58So, it's w of (u,v) plus h of u
minus h of v should be -
64:02 - 64:02nonnegative.
Let me rewrite this a little -
64:10 - 64:10bit.
I'm going to put these guys -
64:15 - 64:15over here.
That would be the right thing, -
64:22 - 64:22h of v minus h of u is less
than or equal to w of (u,v). -
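In symbols, the rearrangement is:

```latex
w_h(u, v) = w(u, v) + h(u) - h(v) \;\ge\; 0
\quad\Longleftrightarrow\quad
h(v) - h(u) \;\le\; w(u, v).
```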
64:31 - 64:31Does that look familiar?
Did I get it right? -
64:39 - 64:39It should be right.
Anyone seen that inequality -
64:46 - 64:46before?
Yeah, yes, correct answer. -
64:52 - 64:52OK, where?
In a previous lecture? -
64:57 - 64:57In the previous lecture.
What is this called if I -
65:06 - 65:06replace h with x?
Charles knows. -
65:11 - 65:11Good, anyone else remember all
the way back to episode two? -
65:21 - 65:21I know there was a weekend.
What's this operator called? -
65:31 - 65:31Not subtraction but,
I think I heard it, -
65:34 - 65:34oh man.
All right, I'll tell you. -
65:37 - 65:37It's a difference constraint,
all right? -
65:40 - 65:40This is the difference
operator. -
65:42 - 65:42OK, it's our good friend
difference constraints. -
65:46 - 65:46So, this is what we want to
satisfy. -
65:48 - 65:48We have a system of difference
constraints. -
65:52 - 65:52h of v minus h of u should be,
we want to find these. -
65:56 - 65:56These are our unknowns.
Subject to these constraints, -
66:00 - 66:00we are given the w's.
Now, we know when these -
66:06 - 66:06difference constraints are
satisfiable. -
66:11 - 66:11Can someone tell me when these
constraints are satisfiable? -
66:19 - 66:19We know exactly when for any
set of difference constraints. -
66:27 - 66:27You've got to remember the
math. -
66:32 - 66:32Terminology,
I can understand. -
66:38 - 66:38It's hard to remember words
unless you're a linguist, -
66:48 - 66:48perhaps.
So, when is the system of -
66:54 - 66:54difference constraints
satisfiable? -
67:02 - 67:02All right, you should
definitely, very good. -
67:08 - 67:08[LAUGHTER] Yes,
very good. -
67:12 - 67:12Someone brought their lecture
notes: when the constraint graph -
67:21 - 67:21has no negative weight cycles.
Good, thank you. -
67:28 - 67:28Now, what is the constraint
graph? -
67:34 - 67:34OK, this has a one letter
answer more or less. -
67:38 - 67:38I'll accept the one letter
answer. -
67:40 - 67:40What?
A? -
67:41 - 67:41A: close.
G. -
67:42 - 67:42Yeah, I mean,
same thing. -
67:44 - 67:44Yeah, so the constraint graph
is essentially G. -
67:48 - 67:48Actually, it is G.
The constraint graph is G, -
67:51 - 67:51good.
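For the record, the characterization from last lecture, restated with h in the role of the unknowns:

```latex
\{\, h(v) - h(u) \le w(u, v) \;:\; (u, v) \in E \,\}
\ \text{is satisfiable}
\iff
G\ \text{has no negative-weight cycle.}
```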
And, we prove this by adding a -
67:54 - 67:54new source vertex,
and connecting that to -
67:58 - 67:58everyone.
But that's sort of beside the -
68:02 - 68:02point.
That was in order to actually -
68:04 - 68:04satisfy them.
But this is our -
68:06 - 68:06characterization.
So, if we assume that there are -
68:09 - 68:09no negative weight cycles in our
graph, which we've been doing -
68:12 - 68:12all the time,
then we know that this thing is -
68:15 - 68:15satisfiable.
Therefore, there is an -
68:17 - 68:17assignment of these h's.
There is a re-weighting that -
68:20 - 68:20makes all the weights
nonnegative. -
68:22 - 68:22Then we can run Dijkstra.
OK, we're done. -
68:25 - 68:25Isn't that cool?
And how do we satisfy these -
68:27 - 68:27constraints?
We know how to do that with one -
68:30 - 68:30run of Bellman-Ford,
which costs order VE, -
68:32 - 68:32which is less than V times
Dijkstra. -
68:36 - 68:36So, that's it,
write down the details -
68:40 - 68:40somewhere.
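Here is a rough sketch of those details in Python (the function and variable names are illustrative, not from the lecture): a super-source Bellman-Ford that either produces a suitable h or reports a negative-weight cycle.

```python
from math import inf

def reweighting_function(vertices, edges):
    """Sketch: find h with w(u,v) + h(u) - h(v) >= 0 for every edge,
    or return None if the graph has a negative-weight cycle.
    `edges` is a list of (u, v, w) triples."""
    # Add a super-source s with a zero-weight edge to every vertex,
    # as in the difference-constraints lecture.
    s = object()                      # fresh vertex, cannot collide
    h = {v: inf for v in vertices}
    h[s] = 0
    all_edges = [(s, v, 0) for v in vertices] + list(edges)

    # Bellman-Ford from s: with |V| + 1 vertices, relax for |V| rounds.
    for _ in range(len(vertices)):
        for u, v, w in all_edges:
            if h[u] + w < h[v]:
                h[v] = h[u] + w

    # One more pass: if any edge still relaxes, the graph
    # contains a negative-weight cycle.
    for u, v, w in all_edges:
        if h[u] + w < h[v]:
            return None
    del h[s]
    return h                          # h(v) = delta(s, v) satisfies every constraint
```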
-
69:00 - 69:00So, this is Johnson's
algorithm. -
69:04 - 69:04This is the fanciest of them
all. -
69:08 - 69:08It will be our fastest,
all pairs shortest path -
69:14 - 69:14algorithm.
So, the claim is, -
69:17 - 69:17we can find a function,
h, from V to R such that the -
69:24 - 69:24modified weight of every edge is
nonnegative for every edge, -
69:31 - 69:31(u,v), in our graph.
And, we do that using -
69:37 - 69:37Bellman-Ford to solve the
difference constraints. -
69:57 - 69:57These are exactly the difference
constraints that we learned to -
70:01 - 70:01solve
last time. -
70:04 - 70:04The graphs here
correspond exactly if you -
70:07 - 70:07look back at the definition.
Or, Bellman-Ford will tell us -
70:10 - 70:10that there is a negative weight
cycle. -
70:13 - 70:13OK, great, so it's not that we
really have to assume that there -
70:17 - 70:17is no negative weight cycle.
We'll find out. -
70:20 - 70:20And if you're fancy,
you can actually figure out the -
70:23 - 70:23minus infinities from this.
But, at this point, -
70:26 - 70:26I just want to think about the
case where there is no negative -
70:30 - 70:30weight cycle.
But if there is, -
70:34 - 70:34we can find out that it exists,
and then just tell the user. -
70:40 - 70:40OK, then we'd stop.
Otherwise, there is no negative -
70:45 - 70:45weight cycle.
Therefore, there is an -
70:49 - 70:49assignment that gives us
nonnegative edge weights. -
70:54 - 70:54So, we just use it.
We use it to run Dijkstra. -
71:00 - 71:00So, step two is,
oh, I should say the running -
71:03 - 71:03time of all this is V times E.
So, we're just running -
71:06 - 71:06Bellman-Ford on exactly the
input graph. -
71:08 - 71:08Plus, we add a source,
if you recall, -
71:11 - 71:11to solve a set of difference
constraints. -
71:13 - 71:13You add a source vertex,
S, connected to everyone at -
71:16 - 71:16weight zero, run Bellman-Ford
from there because we don't have -
71:20 - 71:20a source here.
We just have a graph. -
71:22 - 71:22We want to know all pairs.
So, this, you can use to find -
71:26 - 71:26whether there is a negative
weight cycle anywhere. -
71:30 - 71:30Or, we get this magic
assignment. -
71:33 - 71:33So now, w_h is nonnegative,
so we can run Dijkstra on w_h. -
71:40 - 71:40We'll say, using w_h,
so you compute w_h. -
71:44 - 71:44That takes linear time.
And, we run Dijkstra for each -
71:49 - 71:49possible source.
I'll write this out explicitly. -
71:54 - 71:54We've had this in our minds
several times. -
72:00 - 72:00But, when we said n times
Dijkstra or n times BFS, -
72:05 - 72:05here it is.
We want to compute delta sub h -
72:10 - 72:10now, of (u,v) for all v,
and we do this separately for -
72:15 - 72:15all u.
And so, the running time here -
72:19 - 72:19is VE plus V^2 log V.
This is just V times the -
72:24 - 72:24running time of Dijkstra,
which is E plus V log V. -
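As a one-line check of that arithmetic:

```latex
V \cdot O(E + V \log V) \;=\; O(VE + V^2 \log V).
```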
72:30 - 72:30OK, it happens that this term
is the same as this one, -
72:35 - 72:35which is nice,
because that means step one -
72:39 - 72:39costs us nothing asymptotically.
OK, and then, -
72:43 - 72:43last step is,
well, now we know delta h. -
72:47 - 72:47We just need to compute delta.
So, for each pair of vertices, -
72:53 - 72:53we'll call it (u,v),
we just compute what the -
72:57 - 72:57original weights would be,
so what delta (u,v) is. -
73:03 - 73:03And we can do that using this
corollary. -
73:07 - 73:07It's just delta sub h of (u,v)
minus h of u plus h of v. -
73:14 - 73:14I got the signs right.
Yeah, so this takes V^2 time, -
73:20 - 73:20also dwarfed by the running
time of Dijkstra. -
73:25 - 73:25So, the overall running time of
Johnson's algorithm is just the -
73:32 - 73:32running time of step two,
running Dijkstra n times -- -
73:51 - 73:51-- which is pretty cool.
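Putting the three steps together, a minimal sketch in Python, reusing the illustrative reweighting_function above (vertex names are assumed comparable, e.g. integers, so ties in the priority queue are harmless):

```python
import heapq
from math import inf

def johnson(vertices, edges):
    """Sketch of Johnson's algorithm as described in lecture.
    Returns delta[u][v] for all pairs, or None on a negative-weight
    cycle.  `edges` is a list of (u, v, w) triples."""
    # Step 1: super-source Bellman-Ford gives the re-weighting h.
    h = reweighting_function(vertices, edges)
    if h is None:
        return None                              # report the negative-weight cycle and stop

    # Re-weight every edge: w_h(u,v) = w(u,v) + h(u) - h(v) >= 0.
    adj = {u: [] for u in vertices}
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))

    delta = {}
    # Step 2: run Dijkstra from each source on the nonnegative weights w_h.
    for src in vertices:
        dist = {v: inf for v in vertices}
        dist[src] = 0
        pq = [(0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue                         # stale heap entry
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(pq, (dist[v], v))
        # Step 3: undo the re-weighting, pair by pair:
        # delta(u,v) = delta_h(u,v) - h(u) + h(v).
        delta[src] = {v: dist[v] - h[src] + h[v] if dist[v] < inf else inf
                      for v in vertices}
    return delta
```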
When it comes to single source -
73:55 - 73:55shortest paths,
Bellman-Ford is the best thing -
73:58 - 73:58for general weights.
Dijkstra is the best thing for -
74:02 - 74:02nonnegative weights.
But for all pairs shortest -
74:05 - 74:05paths, we can skirt the whole
negative weight issue by using -
74:09 - 74:09this magic we saw from
Bellman-Ford. -
74:11 - 74:11But now, running Dijkstra n
times, which is still the best -
74:15 - 74:15thing we know how to do,
pretty much, -
74:17 - 74:17for the all pairs nonnegative
weights, now we can do it for -
74:21 - 74:21general weights too,
which is a pretty nice -
74:24 - 74:24combination of all the
techniques we've seen. -
74:28 - 74:28In the trilogy,
and along the way, -
74:30 - 74:30we saw lots of dynamic
programming, which is always -
74:34 - 74:34good practice.
Any questions? -
74:35 - 74:35This is the last new content
lecture before the quiz. -
74:39 - 74:39On Wednesday it will be quiz
review, if I recall correctly. -
74:43 - 74:43And then it's Thanksgiving,
so there's no recitation. -
74:46 - 74:46And then the quiz starts on
Monday. -
74:49 - 74:49So, study up.
See you then.