Lec 19 | MIT 6.046J / 18.410J Introduction to Algorithms (SMA 5503), Fall 2005
-- shortest paths. This is the finale; hopefully it was worth waiting for. A reminder: there's a quiz coming up soon, and you should be studying for it. There's no problem set due at the same time as the quiz, because you should be studying now. It's a take-home exam, and it's required that you come to class on Monday. Of course you'll all come, but everyone watching at home should also come next Monday to get the quiz. It's the required lecture.
So, we need a bit of a recap of the trilogy so far. The last two lectures, the last two episodes, were about single-source shortest paths: we wanted to find the shortest path from a source vertex to every other vertex, and we saw a few algorithms for this. Here's some recap. In the unweighted case, sort of the easiest case, where all the edge weights are one, we could use breadth-first search, and that costs what we call linear time in the graph world: O(V + E), the number of vertices plus the number of edges.
The next simplest case, perhaps, is nonnegative edge weights. And in that case, what algorithm do we use? Dijkstra; all right, everyone's awake. Several answers at once, great. This takes almost linear time if you use a good heap structure: O(V lg V + E). And in the general case, with general weights, we would use Bellman-Ford, which you saw. That costs O(VE), which is quite a bit worse. Ignoring log factors, Dijkstra is basically linear time, while Bellman-Ford is quadratic if you have a connected graph. So in the sparse case, when E is order V, Dijkstra is about linear and Bellman-Ford is about quadratic; in the dense case, when E is about V^2, Dijkstra is quadratic and Bellman-Ford is cubic. So Dijkstra and Bellman-Ford are separated by about a factor of V, which is pretty bad. But that's the best we know how to do for single-source shortest paths: with negative edge weights, Bellman-Ford is the best.
We also saw in recitation the case of a DAG. And there, what do you do? Topological sort, yeah. You do a topological sort to get an ordering on the vertices, and then you run one round of Bellman-Ford; that's one way to think of what's going on. You relax the edges once, in the order given by the topological sort, and you get a linear-time algorithm. So a DAG is another case where we know how to do well even with weights; unweighted, we can also do linear time.
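As a quick sketch of that DAG case, here's a minimal Python version. This is not from the lecture board; the function names and the adjacency-list-of-(vertex, weight)-pairs representation are just illustrative.

    from math import inf

    def dag_shortest_paths(adj, s):
        # Single-source shortest paths in a DAG, even with negative weights:
        # topologically sort, then relax each out-edge once, in order. O(V + E).
        n = len(adj)
        order, seen = [], [False] * n
        def visit(u):                      # DFS; vertices finish in reverse
            seen[u] = True                 # topological order
            for v, _ in adj[u]:
                if not seen[v]:
                    visit(v)
            order.append(u)
        for u in range(n):
            if not seen[u]:
                visit(u)
        order.reverse()
        dist = [inf] * n
        dist[s] = 0
        for u in order:                    # the single "round of Bellman-Ford"
            if dist[u] < inf:
                for v, w in adj[u]:
                    if dist[u] + w < dist[v]:
                        dist[v] = dist[u] + w
        return dist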
Most of the time, though, you should keep these cases in mind, especially on the quiz: when you get a shortest path problem, or what you end up with turns out to be a shortest path problem, think about which is the best algorithm you can use in that case.
OK, so that's single-source shortest paths. In our evolution of the Death Star, initially it was just nonnegative edge weights; then we got negative edge weights. Today, the Death Star challenges us with all-pairs shortest paths, where we want to know the shortest path weight between every pair of vertices.
OK, so let's get some quick results. What could we do in this case? For example, suppose I have an unweighted graph. Any suggestions of how I should compute all-pairs shortest paths? Between every pair of vertices, I want to know the shortest path weight. BFS; a couple more words? Yeah? Right: BFS, V times. OK, I'll say V times BFS. So the running time would be O(V^2 + VE), which, assuming your graph is connected, is O(VE). OK, good. That's probably about the best algorithm we know for unweighted graphs. A lot of these are going to be the obvious answer: you take your single-source algorithm and run it V times. That's the best you can do, or at least the best we know how to do. This is not so bad; it's the cost of one run of Bellman-Ford, for comparison. We definitely need at least V^2 time, because the size of the output is V^2: that's how many shortest path weights we have to compute. So this is not perfect, but pretty good, and we are not going to improve on that.
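Here's what "V times BFS" might look like as a minimal Python sketch; the names and the adjacency-list representation are illustrative, not from the lecture.

    from collections import deque
    from math import inf

    def all_pairs_bfs(adj):
        # "V times BFS": one BFS per source vertex, O(V(V + E)) total.
        # adj[u] lists the out-neighbors of u; dist[u][v] is the number of
        # edges on a shortest u -> v path, or inf if v is unreachable.
        n = len(adj)
        dist = [[inf] * n for _ in range(n)]
        for s in range(n):
            dist[s][s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                for v in adj[u]:
                    if dist[s][v] == inf:
                        dist[s][v] = dist[s][u] + 1
                        queue.append(v)
        return dist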
So, nonnegative edge weights: the natural thing to do is to run Dijkstra V times; no big surprise. The running time of that is O(VE + V^2 lg V), which is also not too bad; it's basically the same as running BFS V times, plus the log factor. If you ignore the log factor, the VE term is the dominant one (and the V-times-BFS bound had an added V^2 as well). So these are both pretty good. This is kind of neat: in essentially the time it takes to run one Bellman-Ford, plus a log factor, you can compute all-pairs shortest paths if you have nonnegative edge weights. Comparing all-pairs to single-source, this seems a lot better, except we can only handle nonnegative edge weights.
OK, so now let's think about the general case. This is the focus of today, and here's where we can actually make an improvement. The obvious thing is V times Bellman-Ford, which would cost O(V^2 E). That's pretty pitiful, and we're going to try to improve it to something closer to that nonnegative-edge-weight bound. It turns out that here we can actually make an improvement, whereas in those special cases we really can't do much better. I don't have a good intuition why, but it's the case. We'll cover something like three algorithms today for this problem. The last one will be the best, but along the way we'll see some nice connections between shortest paths and dynamic programming, which we haven't really seen yet. We've seen shortest paths, and applied greedy algorithms to them, but today we'll actually do dynamic programming. The intuition is that with all-pairs shortest paths there's more potential subproblem reuse: we've got to compute the shortest path from x to y for all x and y, and maybe we can reuse those shortest paths in computing other shortest paths. There's a bit more reusability, let's say.
OK, let me quickly define all-pairs shortest paths formally, because we're going to change our notation slightly; it's because we care about all pairs. As usual, the input is a directed graph G = (V, E). We're going to say the vertices are labeled 1 to n for convenience, because with all pairs we're going to think of things as an n-by-n matrix; it doesn't help to think in terms of adjacency lists anymore. And you have edge weights as usual; this is what makes it interesting, since some of them may be negative. So w : E -> R maps each edge to a real number, and the target output is a shortest-path matrix: an n-by-n matrix (n is just the number of vertices) of shortest path weights, where delta(i, j) is the shortest path weight from i to j, for all pairs of vertices. So you can represent the output as an n-by-n matrix in particular.
OK, so now let's start doing algorithms. We have this very simple algorithm, V times Bellman-Ford, at O(V^2 E). Just for comparison's sake, let me rewrite that: V times Bellman-Ford gives us a running time of O(V^2 E), and I'm going to think about the case where the graph is dense, meaning that the number of edges is quadratic in the number of vertices. In that case this takes O(V^4) time, which is pretty slow. We'd like to do better. So the first goal is just to beat V^4; V hypercubed, I guess. And we are going to use dynamic programming to do that, or at least that's where the motivation will come from. It will take us a while before we can even beat V^4, which is maybe a bit pathetic, but it takes some clever insights, let's say.
OK, so I'm going to introduce a bit more notation for this graph: the weighted adjacency matrix A. I don't think we've really seen this in lecture before, although I think it's in the appendix. Normally an adjacency matrix has a one if there's an edge and a zero if there isn't (and this is a digraph, so you have to be a little bit careful). Here, the entries in the matrix are going to be the weights of the edges:

    a_ij = w(i, j)   if (i, j) is an edge
         = infinity  if there is no edge

In terms of shortest paths, this is a more useful way to represent the graph, and it includes everything that we need. Now we just have to think about it as a matrix; matrices will be a useful tool in a little while.
OK, so now I'm going to define -
13:25 - 13:25some sub problems.
And, there's different ways -
13:28 - 13:28that you could define what's
going on in the shortest paths -
13:32 - 13:32problem.
OK, the natural thing is I want -
13:36 - 13:36to go from vertex i to vertex j.
What's the shortest path? -
13:39 - 13:39OK, we need to refine the sub
problems a little but more than -
13:43 - 13:43that.
Not surprising. -
13:44 - 13:44And if you think about my
analogy to Bellman-Ford, -
13:47 - 13:47what Bellman-Ford does is it
tries to build longer and longer -
13:50 - 13:50shortest paths.
But here, length is in terms of -
13:53 - 13:53the number of edges.
So, first, it builds shortest -
13:55 - 13:55paths of length one.
We've proven the first round it -
13:58 - 13:58does that.
The second round, -
14:02 - 14:02it provides all shortest paths
of length two, -
14:06 - 14:06of count two,
and so on. -
14:09 - 14:09We'd like to do that sort of
analogously, and try to reuse -
14:15 - 14:15things a little bit more.
So, I'm going to say d_ij^(m) -
14:21 - 14:21is the weight of the shortest
path from i to j with some -
14:26 - 14:26restriction involving m.
So: shortest path from i to j -
14:33 - 14:33using at most m edges.
OK, for example, -
14:37 - 14:37if m is zero,
then we don't have to really -
14:41 - 14:41think very hard to find all
shortest paths of length zero. -
14:47 - 14:47OK, they use zero edges,
I should say. -
So, Bellman-Ford sort of tells us how to go from m to m + 1; let's just figure that out. One thing we know from the Bellman-Ford analysis: unless you have a negative-weight cycle, the longest shortest path of relevance is when m equals n - 1, because that's the longest simple path you can have. So d_ij^(n-1) should be the shortest path weight from i to j, and it would stay that way no matter what larger value you put in the superscript: d_ij^(m) = delta(i, j) for all m >= n - 1, if there are no negative-weight cycles. OK, so this feels good for dynamic programming: if we can compute this for all m, then we'll have the shortest path weights in particular. We need a way to detect negative-weight cycles, but let's not worry about that too much for now. There are negative weights, but let's just assume for now there are no negative-weight cycles.
OK, and we get a recurrence. The base case is when m equals zero; that's pretty easy: d_ij^(0) is 0 if i = j (same vertex, weight zero), and infinity otherwise. And then the actual recurrence, for m >= 1, is:

    d_ij^(m) = min over k of ( d_ik^(m-1) + a_kj )

OK, if I got this right, this is a pretty easy, intuitive recurrence: d_ij^(m) is a min of smaller things, in terms of m - 1. I'll just show the picture, and then the proof of that claim should be obvious.
So, this is proof by picture. We have, on the one hand, i over here and j over here, and we want to know the shortest path from i to j using at most m edges. The idea is: well, you could use at most m - 1 edges to get to some other place, call it k, and then take the edge directly from k to j. The last edge costs a_kj, and the first part costs d_ik^(m-1). That's a candidate path from i to j that uses at most m edges, and the min is essentially just considering all of them: all the candidate values of k as intermediate nodes, whatever. So there they are; we take the best such path, and that should encompass all shortest paths. This is essentially what Bellman-Ford is doing, although not exactly. We also sort of want to think about: well, what if I just go directly with, say, m - 1 edges? What if there is no final edge that I want to use, in some sense? Well, the way the a's are defined, there's always a zero-weight edge from a vertex to itself. So you could just take the shorter path from i to j (j is a particular value of k that we might consider), and then take the zero-weight edge a_jj at the end. OK, so this really encompasses everything; it's a pretty trivial claim.
OK, now once we have such a recurrence, we get a dynamic program. I mean, written recursively, this is it, in some sense. You can also write it bottom-up, and I would like to write it bottom-up a little bit, because, while it doesn't look like it, this is a relaxation; this is yet another relaxation algorithm. So I'll give you, sort of, the algorithm. It's not a very interesting algorithm, so you don't have to write it all down if you don't feel like it; it's probably not even in the book. It's just an intermediate step. We loop over all m; that's sort of the outermost thing to do, since I want to build longer and longer paths. This vaguely corresponds to Bellman-Ford, although it's actually worse than Bellman-Ford. But hey, what the heck? It's a stepping stone.
OK, then for all i and j, we want to compute this min, so we just loop over all k and relax:

    if d_ij > d_ik + a_kj then d_ij <- d_ik + a_kj

That's where we're actually computing the min, and it's a relaxation; that is the point. This is our good friend, the relaxation step. I guess, relative to source i, we are relaxing the edge (k, j), something like that; we don't have quite the same clear notion as before, since it's not just a single edge anymore because we don't have a single source. But this is clearly a relaxation: we are just making the triangle inequality true if it wasn't before, and the triangle inequality has to hold between all pairs. And it's implementing this min, right? You take d_ij, and take the min with what it was before in some sense (that was one of the possibilities we considered, via the zero-weight edge), or you go from i to some k in a way that we knew how to before, and then add on the edge; if that's better, you set your current estimate to that. And you do this for all k. In particular, you might actually compute something smaller than this min, because I didn't put superscripts up here. But that's just making paths even better; you have to argue that relaxation is always a safe thing to do. So, by not putting superscripts, maybe I do some more relaxation, but more relaxation never hurts us. You can still argue correctness using this claim.
So, it's not quite the direct implementation, but there you go: a dynamic programming algorithm. The main reason I write it down is so you see that it's a relaxation, and so you see the running time: n^4, which is certainly no better than V times Bellman-Ford. That was n^4 in the dense case too, and a little better in the sparse case. So: not doing so great. But it's a start, and it gets our dynamic-programming minds thinking. We'll get a better dynamic program in a moment.
22:47 - 22:47something useful we can do with
this formulation, -
22:53 - 22:53and I guess I'll ask,
but I'll be really impressed if -
22:59 - 22:59anyone can see.
Does this formula look like -
23:05 - 23:05anything else that you've seen
in any context, -
23:10 - 23:10mathematical or algorithmic?
Have you seen that recurrence -
23:16 - 23:16anywhere else?
OK, not exactly as stated, -
23:20 - 23:20but similar.
I'm sure if you thought about -
23:25 - 23:25it for awhile,
you could come up with it. -
23:30 - 23:30Any answers?
I didn't think you would be -
23:33 - 23:33very intuitive,
but the answer is matrix -
23:36 - 23:36multiplication.
And it may now be obvious to -
23:40 - 23:40you, or it may not.
You have to think with the -
23:43 - 23:43right quirky mind.
Then it's obvious that it's -
23:47 - 23:47matrix multiplication.
Remember, matrix -
23:50 - 23:50multiplication,
we have A, B, -
23:52 - 23:52and C.
They're all n by n matrices. -
23:55 - 23:55And, we want to compute C
equals A times B. -
24:00 - 24:00And what that meant was,
well, c_ij was a sum over all k -
24:05 - 24:05of a_ik times b_kj.
All right, that was our -
24:09 - 24:09definition of matrix
multiplication. -
24:12 - 24:12And that formula looks kind of
like this one. -
24:16 - 24:16I mean, notice the subscripts:
ik and kj. -
24:19 - 24:19Now, the operators are a little
different. -
24:23 - 24:23Here, we're multiplying the
inside things and adding them -
24:28 - 24:28all together.
There, we're adding the inside -
24:34 - 24:34things and taking them in.
But other than that, -
24:41 - 24:41it's the same.
OK, weird, but here we go. -
So, the connection to shortest paths is that you replace the operators. Take matrix multiplication and replace plus with min, and replace dot with plus. This is just a different algebra to work in, where "plus" actually means min and "dot" actually means plus. You have to check that things sort of work out in that context, but if we do that, then we get

    c_ij = min over k of ( a_ik + b_kj )

And that looks like what we actually want to compute here for one value of m (you have to do this m times): conceptually, c_ij plays the role of d_ij^(m), a_ik plays d_ik^(m-1), and b_kj plays a_kj. So this is looking like a matrix product, which is kind of cool.
So, if we plug in this claim and think about things as matrices, the recurrence gives us, written in matrix form:

    D^(m) = D^(m-1) (x) A

Here A is the weighted adjacency matrix, D^(m-1) was the previous d value, and D^(m) is the new one; I'm just rewriting the recurrence with capital letters, and I circle the operations that use this funny algebra, in particular the circled product (x). OK, so that's kind of nifty. We know something about computing matrix multiplications: we can do it in n^3 time, and if we were a bit fancier, maybe we could do it in subcubic time.
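As a concrete sketch of this circled (min-plus) product in Python; the name and the list-of-lists representation, with inf for missing edges, are mine rather than the lecture's.

    from math import inf

    def circle_mult(A, B):
        # Circled product: ordinary matrix multiply with (min, +) in place
        # of (+, *), so C[i][j] = min over k of A[i][k] + B[k][j].
        n = len(A)
        C = [[inf] * n for _ in range(n)]
        for i in range(n):
            for k in range(n):
                if A[i][k] == inf:
                    continue               # nothing goes through this k
                for j in range(n):
                    if A[i][k] + B[k][j] < C[i][j]:
                        C[i][j] = A[i][k] + B[k][j]
        return C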
So, we could try to use this connection. Think about what we are computing here: D^(m) is the previous one times A. So, what is D^(m)? Is that some other algebraic notion that we know? Yeah: it's a power. We're taking A and raising it to the power m, with this funny notion of product. In other words, D^(m) is really just A to the m-th circled power. So, that sounds good. We also know how to compute powers of things relatively quickly, if you remember how.
27:50 - 27:50this power notion,
to make sense, -
27:53 - 27:53I should say what A to the zero
means. -
27:55 - 27:55And so, I need some kind of
identity matrix. -
28:00 - 28:00And for here,
the identity matrix is this -
28:03 - 28:03one, if I get it right.
So, it has zeros along the -
28:07 - 28:07diagonal, and infinities
everywhere else. -
28:09 - 28:09OK, that sort of just to match
this definition. -
28:13 - 28:13d_ij zero should be zeros on
the diagonals and infinity -
28:17 - 28:17everywhere else.
But you can check this is -
28:20 - 28:20actually an identity.
If you multiply it with this -
28:23 - 28:23funny multiplication against any
other matrix, -
28:27 - 28:27you get the matrix back.
Nothing changes. -
28:31 - 28:31This really is a valid identity
matrix. -
28:35 - 28:35And, I should mention that for
A to the m to make sense, -
28:40 - 28:40you really knew that your
product operation is -
28:45 - 28:45associative.
So, actually A to the m circled -
28:49 - 28:49makes sense because circled
multiplication is associative, -
28:54 - 28:54and you can check that;
not hard because, -
28:58 - 28:58I mean, min is associative,
and addition is associative, -
29:04 - 29:04and all sorts of good stuff.
And you have some kind of distributivity property. This is, in turn, because the real numbers, with (let me get the right order here) min as your addition operation and plus as your multiplication operation, form a closed semiring. So if ever you want to know when powers make sense, this is a good rule: if you have a closed semiring, then matrix products over that semiring give you an associative operator, and then, good, you can take products.
OK, that's just some formalism. So now we have some intuition, and the question is: what's the right algorithm? There are many possible answers, some of which are right, some of which are not. We have this connection to matrix products, and we have a connection to matrix powers, and we have algorithms for both. The question is, what should we do? All we need to do now is compute A to the circled power n - 1; n - 1 is when we get shortest paths, assuming we have no negative-weight cycles. In fact, we could compute a larger power than n - 1: once you get beyond n - 1, multiplying by A doesn't change you anymore. So, how should we do it? OK, you're not giving any smart answers, so I'll give the stupid answer. You could say: well, I take A; I multiply it by A; then I multiply it by A, and I multiply it by A, and I use normal, boring matrix multiplication. So I do, like, n - 2 standard matrix multiplies. A standard multiply costs, like, n^3, and I'm doing about n of them. So this gives me an n^4 algorithm: compute all the shortest path weights in n^4. Woohoo!
OK, no improvement. So, how can I do better? Right: the natural thing to try, which sadly does not work, is to use a subcubic matrix multiply algorithm. We will, in some sense, get there in a moment with a somewhat simpler problem. But it's actually not known how to compute shortest paths using fast matrix multiplication like Strassen's algorithm. Good suggestion, though. You have to think about why it doesn't work, and I'll tell you; it's not obvious, so it's a perfectly reasonable suggestion, but in this context it doesn't quite work. The problem is that Strassen requires a notion of subtraction. And here, addition is min, and there's no inverse to min: once you take the min of the arguments, you can't undo it. So there's no notion of subtraction, and it's not known how to pull that off, sadly.
So, what other tricks do we have up our sleeve? Yeah? Divide and conquer; log-n powering; yeah, repeated squaring. That works. Good. We had a fancy way: if you had a number n, you looked at the binary representation of n, and you either squared, or squared and multiplied in another factor of A. Here, we don't even have to be that smart; we really only have to think about powers of two. What we want to compute (and I'm going to need a bigger font here, because there are multiple levels of superscripts) is A to the circled power 2^ceil(lg(n-1)). Actually, n - 1 would be enough, but this just means the next power of two at or above n - 1. We don't have to stop exactly at n - 1; we can go further, because anything farther than n - 1 still just gives the shortest path weights. If you look at the definition, and you know that your paths are simple, which is true if you have no negative-weight cycles, then fine, just go farther; why not? And to compute this, we just do ceil(lg(n-1)) products: take A squared, then take the result and square it, then take the result and square it, and so on. So this is order lg n squarings. We don't know how to use Strassen, but we can use the boring, standard n^3 multiply, and that gives us O(n^3 lg n) running time, which finally is something that beats Bellman-Ford in the dense case. In the dense case, V times Bellman-Ford was n^4; here we get n^3 lg n, finally something better. In the sparse case, it's about the same, maybe a little worse: if E is order V, then V times Bellman-Ford is about V^3, and here we still pay n^3 lg n. So, modulo log factors, this is an improvement some of the time, and about the same the other times.
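Putting those two boards together, here's a sketch of the repeated-squaring version, reusing the circle_mult sketch above; it assumes, as in lecture, that A has zero-weight self-loops (a_ii = 0) and no negative-weight cycle.

    def apsp_squaring(A):
        # Repeated squaring under the circled product: O(lg n) squarings
        # of n^3 each, so O(n^3 lg n) total.
        n = len(A)
        D, m = A, 1
        while m < n - 1:                   # reach the next power of two >= n-1
            D = circle_mult(D, D)          # D now covers paths of <= 2m edges
            m *= 2
        return D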
Another nifty thing that you get for free out of this is that you can detect negative-weight cycles. So here's a bit of a puzzle: after I compute this power of A, how would I know whether I found a negative-weight cycle? What would that mean in this matrix of all shortest paths of at most a certain length? If there were a negative-weight cycle, what would have to be in that matrix? Yeah? Right: I could, for example, take this thing, multiply it by A, and see whether the matrix changed at all. That works fine; that's what we do in Bellman-Ford, and it's the same running time, so it's a good answer. But there's an even simpler thing: it's already there, and you don't have to multiply. The diagonal would have a negative value, yeah. So this is just a cute thing: both approaches work, but you can detect a negative-weight cycle just by looking at the diagonal of the matrix; you just look for a negative value on the diagonal. OK.
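That diagonal check, as a one-line Python sketch over the matrix computed above:

    def has_negative_weight_cycle(D):
        # After the final power, a negative-weight cycle shows up as a
        # negative entry on the diagonal: some vertex reaches itself
        # with total weight < 0.
        return any(D[i][i] < 0 for i in range(len(D)))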
So, that's algorithm one, let's say; I mean, we've seen several that are all bad, but I'll call this number one. OK, we'll see two more. (This is the only one that will... well, I shouldn't say that. Fine, there we go.) So, this was one dynamic program that wasn't so helpful, except it showed us a connection to matrix multiplication, which is interesting; we'll see why it's useful a little bit more later. But it led to these nasty four nested loops, and using the squaring trick we got down to n^3 lg n. Let's try for just n^3: get rid of that log, it's annoying, and it makes you a little bit worse than Bellman-Ford in the sparse case. So, let's just erase one of these nested loops. OK, obviously the algorithm doesn't work if you simply delete the loop over k, because then k isn't defined. But, you know, I've got enough variables: why don't I just define k to be m? OK, it turns out that works. I'll do it from scratch, but why not? I don't know if that's how Floyd and Warshall came up with their algorithm, but here you go. Here's Floyd-Warshall.
The idea is to define the subproblems a little bit more cleverly, so that to compute one of these values you don't have to take a min of n things; I just want to take a min of two things. If I could do that, and I still only have n^3 subproblems, then I would have n^3 time. All right: the running time of a dynamic program is the number of subproblems times the time to compute the recurrence for one subproblem. Before, that was n^3 subproblems times linear time each, and we want n^3 times constant. That would be good.
So that's Floyd-Warshall. Here's the way we're going to redefine things (before, it was called d_ij; we're going to define something new). c_ij^(k) is now going to be the weight of the shortest path from i to j, as before; notice I use the superscript k instead of m, because I want k and m to be the same thing. Deep. OK, now here's the new constraint: I want all intermediate vertices along the path, meaning all vertices except for i and j at the beginning and the end, to have a small label; they should be in the set {1, ..., k}. And this is where we are really using the fact that our vertices are labeled 1 up to n. So I'm going to say: well, first think about the shortest paths that don't use any intermediate vertices; that's when k is zero. Then think about all the shortest paths that maybe use vertex 1. And then think about the shortest paths that maybe use vertex 1 or vertex 2. Why not? You could define it in this way. It turns out that when you increase k, you only have to think about one new vertex. Before, we had to take a min over all k; now we know which k to look at. OK, maybe that made sense; maybe it's not quite obvious yet.
Maybe it's not quite obvious -
39:57 - 39:57yet.
But I'm going to redo this -
39:59 - 39:59claim, redo a recurrence.
So, maybe first I should say -
40:04 - 40:04some obvious things.
So, if I want delta of ij of -
40:07 - 40:07the shortest pathway,
well, just take all the -
40:11 - 40:11vertices.
So, take c_ij superscript n. -
40:13 - 40:13That's everything.
And this even works, -
40:16 - 40:16this is true even if you have a
negative weight cycle. -
40:20 - 40:20Although, again,
we're going to sort of ignore -
40:23 - 40:23negative weight cycles as long
as we can detect them. -
40:27 - 40:27And, another simple case is if
you have, well, -
40:30 - 40:30c_ij to zero.
Let me put that in the claim to -
40:36 - 40:36be a little bit more consistent
here. -
So, here's the new claim. If we want to compute c_ij^(0), what is it? Superscript zero means I really shouldn't use any intermediate vertices. So this has a very simple answer, a three-letter answer. It's not "zero"; that's four letters. What's that, "nil"? No, not working yet; it has some subscripts, too. So, the definition would be: what's the shortest path weight from i to j when you're not allowed to use any intermediate vertices? Sorry? Yeah, it has a very simple name; that's the tricky part. All right, so "if i equals j"... [LAUGHTER] you're clever, right: the bracket [i = j], meaning one if i equals j. Well, OK. It sort of works, but it's not quite right. In fact, I'd want infinity if i does not equal j and zero if i equals j... a_ij, good. I think it's a_ij. It should be, right? Maybe I'm wrong. Right: a_ij. So it's essentially not what I said; that's the point. If i does not equal j, you still have to think about a single edge connecting i to j, right? OK, so that's a bit of a subtlety: the superscript restricts only intermediate vertices, so you could still go from i to j via a single edge, and that costs a_ij: the edge weight if there is an edge, infinity if there isn't. So c_ij^(0) = a_ij. OK, so that gets us started. And then, we want a recurrence.
And the recurrence is: well, maybe you get away with only the vertices that you had before. So if you want paths that use intermediate vertices from {1, ..., k}, maybe you just use {1, ..., k - 1}; you could try that. Or you could use k. So, either you use k or you don't. If you don't, it's got to be c_ij^(k-1). If you do, then you've got to go through k: you go from i to k using the previous vertices (obviously, you don't want to repeat k in there), and then you go from k to j, somehow, using vertices that are not k. So:

    c_ij^(k) = min( c_ij^(k-1), c_ik^(k-1) + c_kj^(k-1) )

This should be pretty intuitive. Again, I can draw a picture. Either you never visit k, and that's the wiggly line where you go from i to j using only vertices 1 up to k - 1 (here we're allowed to use 1 up to k, so this just means: don't use k). Or you use k somewhere in the middle there. OK, it's got to be one of the two. And in the second case, you go from i to k using only smaller-labeled vertices, because you don't want to repeat k, and then from k to j using only smaller-labeled vertices. So, every path is one of the two, and we take the shorter of these two subproblems. That's the answer. So now we have a min of two things, which takes constant time to compute, and we get a cubic algorithm. So, let me write it down.
So, let me write it down. -
44:37 - 44:37So, this is the Floyd-Warshall
algorithm. -
44:41 - 44:41I'll write the name again.
You give it a matrix A. -
44:47 - 44:47That's all it really needs to
know. -
44:50 - 44:50It codes everything.
You copy C to A. -
44:54 - 44:54That's the warm up.
Right at time zero, -
44:59 - 44:59C equals A.
And then you just have these -
45:03 - 45:03three loops for every value of
k, for every value of i, -
45:07 - 45:07and for every value of j.
You compute that min. -
45:11 - 45:11And if you think about it a
little bit, that min is a -
45:15 - 45:15relaxation.
Surprise, surprise. -
45:47 - 45:47So, that is the Floyd-Warshall
algorithm. -
45:52 - 45:52And, the running time is
clearly n^3, three nested loops, -
45:58 - 45:58constant time inside.
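Those three loops, as a minimal Python sketch (same matrix conventions as the earlier snippets; the superscripts are dropped, as in lecture, because the update is done in place and extra relaxations never hurt):

    def floyd_warshall(A):
        # All-pairs shortest paths in Theta(n^3); assumes no negative-weight
        # cycle. C[i][j] ends up as delta(i, j).
        n = len(A)
        C = [row[:] for row in A]          # c^(0) = A
        for k in range(n):                 # now allow vertex k in the middle
            for i in range(n):
                for j in range(n):
                    # the relaxation: avoid k, or go i -> k -> j
                    if C[i][k] + C[k][j] < C[i][j]:
                        C[i][j] = C[i][k] + C[k][j]
        return C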
So, we're finally getting something that is never worse than Bellman-Ford. In the sparse case, it's the same; and anything denser, where the number of edges is superlinear, this is strictly better than V times Bellman-Ford. And it's better than everything we've seen so far for all-pairs shortest paths, and it handles negative weights; a very simple algorithm, even simpler than the one before. It's just relaxation within three loops; what more could you ask for? We do need to check that this is indeed the min we're computing here, except that the superscripts are omitted. That's, again, a bit of hand-waving, but it's OK to omit the superscripts because that can only mean you're doing more relaxations than you strictly need, and doing more relaxations can never hurt you. In particular, we do all the ones that we have to; therefore, we find the shortest path weights. And, again, here we're assuming that there are no negative-weight cycles. It shouldn't be hard to find them, but you have to think about that a little bit: you could run another round, for example, and see whether it relaxes any edges again. I think there's no nifty trick for this version.
And that's our second algorithm for all-pairs shortest paths. Before we go on to the third algorithm, which is going to be the cleverest of them all, the one Ring to rule them all, to switch trilogies, we're going to take a little bit of a diversion, a side story, whatever, and talk about transitive closure briefly.
This is just a good thing to know about, and it relates to the algorithms we've seen so far. So, here's the transitive closure problem. I give you a directed graph, and for all pairs of vertices i and j, I want to compute this number:

    t_ij = 1 if there's a path from i to j, and 0 otherwise

OK, this is sort of like a boring adjacency matrix with no weights, except it's about paths instead of being about edges. So how can I compute this? That's very simple. How should I compute this? This t gives me a graph, in some sense: it's the adjacency matrix of a new graph, called the transitive closure of my input graph.
48:55 - 48:55closure of my input graph.
So, breadth first search, -
49:02 - 49:02yeah, good.
So, all I need to do is find -
49:05 - 49:05shortest paths,
and if the weights come out -
49:08 - 49:08infinity, then there's no path.
If it's less than infinity, -
49:13 - 49:13that there's a path.
And so here, -
49:15 - 49:15so you are saying maybe I don't
care about the weights, -
49:20 - 49:20so I can run breadth first
search n times, -
49:23 - 49:23and that will work indeed.
So, if we do B times B of S, -
49:27 - 49:27so it's maybe weird that I'm
covering here in the middle, -
49:32 - 49:32but it's just an interlude.
So, we have, -
49:36 - 49:36then, something like V times E.
OK, you can run any of these -
49:42 - 49:42algorithms.
You could take Floyd-Warshall -
49:47 - 49:47for example.
Why not? -
49:49 - 49:49OK, then it would just be V^ I
mean, you could run in any of -
49:55 - 49:55these algorithms with weights of
one or zero, and just check -
50:01 - 50:01whether the values are infinity
or not. -
50:06 - 50:06So, I mean, t_ij equals zero,
if and only if the shortest -
50:10 - 50:10path weight from i to j is
infinity. -
50:13 - 50:13So, just solve this.
This is an easier problem than -
50:16 - 50:16shortest paths.
It is, in fact, strictly easier in a certain sense. What's going on with transitive closure (and I just want to mention this out of interest, because transitive closure is a useful thing to know about) is, essentially, that we are using a different pair of operators: logical OR and AND instead of min and plus. Because, if you think about the relaxation, in terms of that min: if I want to know whether there's a path from i to j that uses vertices labeled 1 through k in the middle, well, either there is a path that doesn't use vertex k, OR there is a path that uses k, and then it would have to look like before: there would have to be a path from i to k AND a path from k to j. So:

    t_ij^(k) = t_ij^(k-1) OR ( t_ik^(k-1) AND t_kj^(k-1) )

The min and plus get replaced with OR and AND. And if you remember, the min used to be a sum and the plus used to be a product in the matrix world. So plus is now like OR, and multiply is now like AND, which sounds very good, right? Plus does feel like OR, and multiply does feel like AND, if you live in a zero-one world. So, in fact, this is not quite the field Z mod 2, but it's a good, nice world to work in: the Boolean world. So I'll just write "Boole". Good old Boole knows all about this; it's like his master's thesis, I think, talking about Boolean algebra.
And this actually means that you can use fast matrix multiplication: you can use Strassen's algorithm, and the fancier algorithms, and compute the transitive closure in subcubic time. V times BFS is subcubic if the graph is sparse but cubic in the worst case when there are lots of edges, and Floyd-Warshall is cubic; here, you can actually do better using Strassen. I'll just say you can do it; no details here. In fact, there is a theorem, probably not in the textbook, that says transitive closure is just as hard as matrix multiply: they are equivalent, and their running times are the same. We don't know how long it takes to do a matrix multiply over a field; it's somewhere between n^2 and roughly n^2.376. But whatever the answer is, it's the same for transitive closure.
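As a sketch of the cubic, Floyd-Warshall-style version with OR and AND in place of min and plus (the subcubic route via Strassen is the one the theorem refers to; the names and representation here are mine):

    def transitive_closure(adj):
        # adj[i][j] is True iff edge (i, j) exists. Returns t with t[i][j]
        # True iff there is a path from i to j (every vertex reaches itself
        # by the empty path).
        n = len(adj)
        t = [[adj[i][j] or i == j for j in range(n)] for i in range(n)]
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    # OR/AND relaxation: a path avoiding k, or one through k
                    t[i][j] = t[i][j] or (t[i][k] and t[k][j])
        return t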
OK, there's the interlude. And that's where we actually get to use Strassen and friends. Remember, Strassen was the n^(lg 7) algorithm. Remember that, especially on the final; those are things you should have at the tip of your tongue.
OK, the last algorithm we're going to cover really builds on what we saw last time: Johnson's algorithm. I've lost some of the running times from the board, but when we had unweighted graphs, we could do all pairs really fast, just as fast as a single-source Bellman-Ford. That's kind of nifty. We don't know how to improve Bellman-Ford in the single-source case, so we can't really hope to get anything better than V times E. And, if you remember, running V times Dijkstra was about the same. So, just to put this in the recall bubble here: V times Dijkstra would give us O(VE + V^2 lg V), and if you ignore that log factor, this is just VE. OK, so this was really good; Dijkstra was great. And this was for nonnegative edge weights. So, with negative edge weights, somehow we'd like to get the same running time.
Now, how might I get the same running time? Well, it would be really nice if I could use Dijkstra. Of course, Dijkstra doesn't work with negative weights. So what could I do? What would I hope to do? Suppose that in the middle of the algorithm it says "run Dijkstra n times". What should I do to prepare for that? Make all the weights positive, or nonnegative. Why not, right? We're doing wishful thinking; that's what we'll do. This is called graph re-weighting, and what's cool is that we actually already know how to do it. We just don't know that we know how to do it. But I know that we know that we know how to do it; you don't yet know that we know that I know that we know how to do it. So, it turns out you can re-weight the vertices. At the end of the last class someone asked me: can you just, like, add the same weight to all the edges? That doesn't work, not quite, because different paths have different numbers of edges. What we are going to do is add a particular weight to each vertex.
What does that mean? Well, because we really only have weights on the edges, here's what we'll do. We'll re-weight each edge (u, v) (going back into graph speak instead of matrix speak: (u, v) instead of i and j), and we'll call the modified weight w_h, where h is our function giving a number for every vertex:

    w_h(u, v) = w(u, v) + h(u) - h(v)

It's just the old weight of that edge, plus the value at the start vertex, minus the value at the terminating vertex. I'm sure these have good names; one of them is the head and the other is the tail, but I can never remember which. We have a directed edge (u, v); you add h at one endpoint and subtract it at the other, and since it's a directed edge, that's a consistent definition. OK, so that's called re-weighting.
Now, this is actually a theorem. If you do this, then, let's say, for any two vertices u and v in the graph, all paths from u to v have the same weight as they did before... well, not quite: they are all re-weighted by the same amount. So if you look at all the different paths, and you ask what the difference is between delta, the old shortest path weights, and delta_h, the shortest path weights according to this new weight function, then that difference is the same for every path between the same pair of endpoints. We'll say that all these paths are re-weighted by the same amount. And this is actually a statement about all paths, not just shortest paths.
There we go. OK, to how many people is this obvious already? A few, yeah, it is. And what's the one word? OK, it's maybe not that obvious. All right, shout out the word when you figure it out; meanwhile, I'll write out this rather verbose proof. There's a one-word proof; still waiting. So, let's just take one of these paths, p = (v_1, ..., v_k), that starts at u = v_1 and ends at v = v_k. Take any path. We're just going to see what its new weight is relative to its old weight.
And so, let's just write out w_h of the path, which we define in the usual way as the sum over all edges of the new weights:

    w_h(p) = sum over i of w_h(v_i, v_{i+1})
           = sum over i of ( w(v_i, v_{i+1}) + h(v_i) - h(v_{i+1}) )

Do you have the word? No? Tough puzzle, then. OK, so the first line is the definition of the weight of a path, and each term then expands by the definition of w_h; the whole parenthesized thing is summed over i. Now I need the magic word. Telescopes, good. Now this is obvious: each h term cancels against one from the previous or next edge, except at the very beginning and the very end. So this is the sum of the original edge weights, but then, outside the sum, we have plus h(v_1) and minus h(v_k).
OK, those guys don't quite cancel; we're not looking at a cycle, just a path. And the remaining sum is just w(p), the normal weight of the path. So the difference between w_h(p) and w(p) is h(v_1) - h(v_k), which is just h(u) - h(v). And the point is, that's the same as long as you fix the endpoints u and v: you're changing the weight of every path from u to v by the same amount. This is for any path from u to v, and that proves the theorem. So the one word here was "telescopes": the changes in weight telescope over any path. Therefore, if you want to find shortest paths, you just find the shortest paths in this re-weighted version, and then you correct by this one amount; you subtract it off instead of adding it, and that gives you the shortest path weight in the original weights.
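A tiny sanity check of the telescoping claim in Python, with made-up numbers (the path and the h values here are arbitrary, just for illustration):

    # Path p = (v1, v2, v3) with edge weights w(v1,v2) = 3 and w(v2,v3) = -5,
    # and an arbitrary re-weighting h.
    w12, w23 = 3, -5
    h = {'v1': 7, 'v2': -2, 'v3': 4}

    w_p = w12 + w23                                   # original weight: -2
    wh_p = (w12 + h['v1'] - h['v2']) + (w23 + h['v2'] - h['v3'])
    assert wh_p == w_p + h['v1'] - h['v3']            # only the endpoints survive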
OK, so this is a tool: we now know how to change the weights in the graph. But what we really want is to change the weights in the graph so that the weights all come out nonnegative. OK, how do we do that? Why in the world would there be a function h that makes all the edge weights nonnegative? It doesn't seem to make sense. But it turns out we already know how to find one.
So, I should write down this consequence; let me get it in the right order. In particular, the shortest path weight changes by the same amount:

    delta_h(u, v) = delta(u, v) + h(u) - h(v)

And if you want delta, you just move the stuff to the other side: if we compute delta_h, then we can compute delta(u, v) = delta_h(u, v) - h(u) + h(v). That's the corollary here. How many people here pronounce this word "COR-ollary"? OK, and how many people pronounce it "co-ROLL-ary"? Yeah, we are alone. Usually I get at least one other student, and they're usually Canadian or British or something; I think it's the accent. So I always avoid pronouncing this word unless I really think about it; it's "corollary", and I try to get it right. I at least say Z, not Zed.
62:58 - 62:58OK, here we go.
So, what we want to do is find -
63:03 - 63:03one of these functions.
I mean, let's just write down -
63:09 - 63:09what we could hope to have.
We want to find a re-weighted -
63:16 - 63:16function, h, the signs of weight
to each vertex such that w_h of -
63:23 - 63:23(u,v) is nonnegative.
That would be great for all -
63:28 - 63:28edges, all (u,v) in E.
OK, then we could run Dijkstra. -
63:35 - 63:35We could run Dijkstra,
get the delta h's, -
63:38 - 63:38and then just undo the
re-weighting, -
63:41 - 63:41and get what we want.
And, that is Johnson's -
63:45 - 63:45algorithm.
The claim is that this is -
63:48 - 63:48always possible.
OK, why should it always be -
63:52 - 63:52possible?
Well, let's look at this -
63:55 - 63:55constraint.
w_h of (u,v) is that. -
63:58 - 63:58So, it's w of (u,v) plus h of u
minus h of v should be -
64:02 - 64:02nonnegative.
Let me rewrite this a little -
64:10 - 64:10bit.
I'm going to put these guys -
64:15 - 64:15over here.
That would be the right thing, -
64:22 - 64:22h of v minus h of u is less
than or equal to w of (u,v). -
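In symbols, the rearrangement is:

```latex
w_h(u, v) = w(u, v) + h(u) - h(v) \;\ge\; 0
\quad\Longleftrightarrow\quad
h(v) - h(u) \;\le\; w(u, v).
```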
64:31 - 64:31Does that look familiar?
Did I get it right? -
64:39 - 64:39It should be right.
Anyone seen that inequality -
64:46 - 64:46before?
Yeah, yes, correct answer. -
64:52 - 64:52OK, where?
In a previous lecture? -
64:57 - 64:57In the previous lecture.
What is this called if I -
65:06 - 65:06replace h with x?
Charles knows. -
65:11 - 65:11Good, anyone else remember all
the way back to episode two? -
65:21 - 65:21I know there was a weekend.
What's this operator called? -
65:31 - 65:31Not subtraction but,
I think I heard it, -
65:34 - 65:34oh man.
All right, I'll tell you. -
65:37 - 65:37It's a difference constraint,
all right? -
65:40 - 65:40This is the difference
operator. -
65:42 - 65:42OK, it's our good friend
difference constraints. -
65:46 - 65:46So, this is what we want to
satisfy. -
65:48 - 65:48We have a system of difference
constraints. -
65:52 - 65:52h of v minus h of u should be,
we want to find these. -
65:56 - 65:56These are our unknowns.
Subject to these constraints, -
66:00 - 66:00we are given the w's.
Now, we know when these -
66:06 - 66:06difference constraints are
satisfiable. -
66:11 - 66:11Can someone tell me when these
constraints are satisfiable? -
66:19 - 66:19We know exactly when for any
set of difference constraints. -
66:27 - 66:27You've got to remember the
math. -
66:32 - 66:32Terminology,
I can understand. -
66:38 - 66:38It's hard to remember words
unless you're a linguist, -
66:48 - 66:48perhaps.
So, when is the system of -
66:54 - 66:54difference constraints
satisfiable? -
67:02 - 67:02All right, you should
definitely, very good. -
67:08 - 67:08[LAUGHTER] Yes,
very good. -
67:12 - 67:12Someone brought their lecture
notes: when the constraint graph -
67:21 - 67:21has no negative weight cycles.
Good, thank you. -
67:28 - 67:28Now, what is the constraint
graph? -
67:34 - 67:34OK, this has a one letter
answer more or less. -
67:38 - 67:38I'll accept the one letter
answer. -
67:40 - 67:40What?
A? -
67:41 - 67:41A: close.
G. -
67:42 - 67:42Yeah, I mean,
same thing. -
67:44 - 67:44Yeah, so the constraint graph
is essentially G. -
67:48 - 67:48Actually, it is G.
The constraint graph is G, -
67:51 - 67:51good.
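For the record, the characterization from last lecture, restated with h in the role of the unknowns:

```latex
\{\, h(v) - h(u) \le w(u, v) \;:\; (u, v) \in E \,\}
\ \text{is satisfiable}
\iff
G\ \text{has no negative-weight cycle.}
```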
And, we prove this by adding a -
67:54 - 67:54new source vertex,
and connecting that to -
67:58 - 67:58everyone.
But that's sort of beside the -
68:02 - 68:02point.
That was in order to actually -
68:04 - 68:04satisfy them.
But this is our -
68:06 - 68:06characterization.
So, if we assume that there are -
68:09 - 68:09no negative weight cycles in our
graph, which we've been doing -
68:12 - 68:12all the time,
then we know that this thing is -
68:15 - 68:15satisfiable.
Therefore, there is an -
68:17 - 68:17assignment of these h's.
There is a re-weighting that -
68:20 - 68:20makes all the weights
nonnegative. -
68:22 - 68:22Then we can run Dijkstra.
OK, we're done. -
68:25 - 68:25Isn't that cool?
And how do we satisfy these -
68:27 - 68:27constraints?
We know how to do that with one -
68:30 - 68:30run of Bellman-Ford,
which costs order VE, -
68:32 - 68:32which is less than V times
Dijkstra. -
68:36 - 68:36So, that's it,
write down the details -
68:40 - 68:40somewhere.
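Here is a rough sketch of those details in Python (the function and variable names are illustrative, not from the lecture): a super-source Bellman-Ford that either produces a suitable h or reports a negative-weight cycle.

```python
from math import inf

def reweighting_function(vertices, edges):
    """Sketch: find h with w(u,v) + h(u) - h(v) >= 0 for every edge,
    or return None if the graph has a negative-weight cycle.
    `edges` is a list of (u, v, w) triples."""
    # Add a super-source s with a zero-weight edge to every vertex,
    # as in the difference-constraints lecture.
    s = object()                      # fresh vertex, cannot collide
    h = {v: inf for v in vertices}
    h[s] = 0
    all_edges = [(s, v, 0) for v in vertices] + list(edges)

    # Bellman-Ford from s: with |V| + 1 vertices, relax for |V| rounds.
    for _ in range(len(vertices)):
        for u, v, w in all_edges:
            if h[u] + w < h[v]:
                h[v] = h[u] + w

    # One more pass: if any edge still relaxes, the graph
    # contains a negative-weight cycle.
    for u, v, w in all_edges:
        if h[u] + w < h[v]:
            return None
    del h[s]
    return h                          # h(v) = delta(s, v) satisfies every constraint
```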
-
69:00 - 69:00So, this is Johnson's
algorithm. -
69:04 - 69:04This is the fanciest of them
all. -
69:08 - 69:08It will be our fastest,
all pairs shortest path -
69:14 - 69:14algorithm.
So, the claim is, -
69:17 - 69:17we can find a function,
h, from V to R such that the -
69:24 - 69:24modified weight of every edge is
nonnegative for every edge, -
69:31 - 69:31(u,v), in our graph.
And, we do that using -
69:37 - 69:37Bellman-Ford to solve the
difference constraints. -
69:57 - 69:57These are exactly the difference
constraints that we learned to -
70:01 - 70:01solve
last time. -
70:04 - 70:04The graphs here
correspond exactly if you -
70:07 - 70:07look back at the definition.
Or, Bellman-Ford will tell us -
70:10 - 70:10that there is a negative weight
cycle. -
70:13 - 70:13OK, great, so it's not that we
really have to assume that there -
70:17 - 70:17is no negative weight cycle.
We'll find out. -
70:20 - 70:20And if you're fancy,
you can actually figure out the -
70:23 - 70:23minus infinities from this.
But, at this point, -
70:26 - 70:26I just want to think about the
case where there is no negative -
70:30 - 70:30weight cycle.
But if there is, -
70:34 - 70:34we can find out that it exists,
and then just tell the user. -
70:40 - 70:40OK, then we'd stop.
Otherwise, there is no negative -
70:45 - 70:45weight cycle.
Therefore, there is an -
70:49 - 70:49assignment that gives us
nonnegative edge weights. -
70:54 - 70:54So, we just use it.
We use it to run Dijkstra. -
71:00 - 71:00So, step two is,
oh, I should say the running -
71:03 - 71:03time of all this is V times E.
So, we're just running -
71:06 - 71:06Bellman-Ford on exactly the
input graph. -
71:08 - 71:08Plus, we add a source,
if you recall, -
71:11 - 71:11to solve a set of difference
constraints. -
71:13 - 71:13You add a source vertex,
S, connected to everyone at -
71:16 - 71:16weight zero, run Bellman-Ford
from there because we don't have -
71:20 - 71:20a source here.
We just have a graph. -
71:22 - 71:22We want to know all pairs.
So, this, you can use to find -
71:26 - 71:26whether there is a negative
weight cycle anywhere. -
71:30 - 71:30Or, we get this magic
assignment. -
71:33 - 71:33So now, w_h is nonnegative,
so we can run Dijkstra on w_h. -
71:40 - 71:40We'll say, using w_h,
so you compute w_h. -
71:44 - 71:44That takes linear time.
And, we run Dijkstra for each -
71:49 - 71:49possible source.
I'll write this out explicitly. -
71:54 - 71:54We've had this in our minds
several times. -
72:00 - 72:00But, when we said n times
Dijkstra or n times BFS, -
72:05 - 72:05here it is.
We want to compute delta sub h -
72:10 - 72:10now, of (u,v) for all v,
and we do this separately for -
72:15 - 72:15all u.
And so, the running time here -
72:19 - 72:19is VE plus V^2 log V.
This is just V times the -
72:24 - 72:24running time of Dijkstra,
which is E plus V log V. -
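As a one-line check of that arithmetic:

```latex
V \cdot O(E + V \log V) \;=\; O(VE + V^2 \log V).
```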
72:30 - 72:30OK, it happens that this term
is the same as this one, -
72:35 - 72:35which is nice,
because that means step one -
72:39 - 72:39costs us nothing asymptotically.
OK, and then, -
72:43 - 72:43last step is,
well, now we know delta h. -
72:47 - 72:47We just need to compute delta.
So, for each pair of vertices, -
72:53 - 72:53we'll call it (u,v),
we just compute what the -
72:57 - 72:57original weights would be,
so what delta (u,v) is. -
73:03 - 73:03And we can do that using this
corollary. -
73:07 - 73:07It's just delta sub h of (u,v)
minus h of u plus h of v. -
73:14 - 73:14I got the signs right.
Yeah, so this takes V^2 time, -
73:20 - 73:20also dwarfed by the running
time of Dijkstra. -
73:25 - 73:25So, the overall running time of
Johnson's algorithm is just the -
73:32 - 73:32running time of step two,
running Dijkstra n times -- -
73:51 - 73:51-- which is pretty cool.
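Putting the three steps together, a minimal sketch in Python, reusing the illustrative reweighting_function above (vertex names are assumed comparable, e.g. integers, so ties in the priority queue are harmless):

```python
import heapq
from math import inf

def johnson(vertices, edges):
    """Sketch of Johnson's algorithm as described in lecture.
    Returns delta[u][v] for all pairs, or None on a negative-weight
    cycle.  `edges` is a list of (u, v, w) triples."""
    # Step 1: super-source Bellman-Ford gives the re-weighting h.
    h = reweighting_function(vertices, edges)
    if h is None:
        return None                              # report the negative-weight cycle and stop

    # Re-weight every edge: w_h(u,v) = w(u,v) + h(u) - h(v) >= 0.
    adj = {u: [] for u in vertices}
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))

    delta = {}
    # Step 2: run Dijkstra from each source on the nonnegative weights w_h.
    for src in vertices:
        dist = {v: inf for v in vertices}
        dist[src] = 0
        pq = [(0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue                         # stale heap entry
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(pq, (dist[v], v))
        # Step 3: undo the re-weighting, pair by pair:
        # delta(u,v) = delta_h(u,v) - h(u) + h(v).
        delta[src] = {v: dist[v] - h[src] + h[v] if dist[v] < inf else inf
                      for v in vertices}
    return delta
```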
When it comes to single source -
73:55 - 73:55shortest paths,
Bellman-Ford is the best thing -
73:58 - 73:58for general weights.
Dijkstra is the best thing for -
74:02 - 74:02nonnegative weights.
But for all pairs shortest -
74:05 - 74:05paths, we can skirt the whole
negative weight issue by using -
74:09 - 74:09this magic we saw from
Bellman-Ford. -
74:11 - 74:11But now, running Dijkstra n
times, which is still the best -
74:15 - 74:15thing we know how to do,
pretty much, -
74:17 - 74:17for the all pairs nonnegative
weights, now we can do it for -
74:21 - 74:21general weights too,
which is a pretty nice -
74:24 - 74:24combination of all the
techniques we've seen. -
74:28 - 74:28In the trilogy,
and along the way, -
74:30 - 74:30we saw lots of dynamic
programming, which is always -
74:34 - 74:34good practice.
Any questions? -
74:35 - 74:35This is the last new content
lecture before the quiz. -
74:39 - 74:39On Wednesday it will be quiz
review, if I recall correctly. -
74:43 - 74:43And then it's Thanksgiving,
so there's no recitation. -
74:46 - 74:46And then the quiz starts on
Monday. -
74:49 - 74:49So, study up.
See you then.