Lec 18 | MIT 6.046J / 18.410J Introduction to Algorithms (SMA 5503), Fall 2005
Good morning, everyone. Glad you are all here bright and early. I'm counting the days till the TAs outnumber the students. They'll show up. We return to a familiar story. This is part two, The Empire Strikes Back.
So last time, our adversary, the graph, came to us with a problem. We had a source, and we had a directed graph, and we had weights on the edges, and they were all nonnegative. And there was happiness. And we triumphed over the Empire by designing Dijkstra's algorithm, and very efficiently finding single-source shortest paths: the shortest path weight from s to every other vertex.
Today, however, the Death Star has a new trick up its sleeve, and we have negative weights, potentially. And we're going to have to somehow deal with, in particular, negative-weight cycles. And we saw that when we have a negative-weight cycle, we can just keep going around, and around, and around, and go back in time farther, and farther, and farther. And we can get arbitrarily far back in the past. And so there's no shortest path, because whatever path you take, you can get a shorter one.
So we want to address that issue today, and we're going to come up with a new algorithm, actually simpler than Dijkstra, but not as fast, called the Bellman-Ford algorithm. And, it's going to allow negative weights, and in some sense allow negative-weight cycles, although maybe not as much as you might hope. We have to leave room for a sequel, of course.
OK, so the Bellman-Ford algorithm, invented by two guys, as you might expect, computes the shortest path weights. So, it makes no assumption about the weights: weights are arbitrary, and it's going to compute the shortest path weights. So, remember this notation: delta(s, v) is the weight of the shortest path from s to v. s is called the source vertex. And, we want to compute these weights for all vertices v. The claim is that computing from s to everywhere is no harder than computing from s to one particular location. So, we're going to do it for all of them. That's still going to be the case here.
And, it allows negative weights. And this is the good case; but there's an alternative, which is that Bellman-Ford may just say, oops, there's a negative-weight cycle. And in that case it will just say so. So, it will say a negative-weight cycle exists, and therefore some of these deltas are minus infinity. And that seems weird. So, Bellman-Ford as we'll present it today is intended for the case where there are no negative-weight cycles, which is more intuitive. It sort of allows them, but it will just report them. In that case, it will not give you delta values. You can change the algorithm to give you delta values in that case, but we are not going to see it here. So, an exercise, after you see the algorithm: compute these deltas in all cases. It's not hard to do, but we don't have time for it here.
So, here's the algorithm. It's pretty straightforward. As I said, it's easier than Dijkstra. It's a relaxation algorithm: the main thing that it does is relax edges, just like Dijkstra. So, we'll be able to use a lot of lemmas from Dijkstra, and the proof of correctness will be three times shorter, because the first two-thirds we already have from Dijkstra. But I'm jumping ahead a bit. So, the first part is initialization. Again, d[v] will represent the estimated distance from s to v, and we're going to be updating those estimates as the algorithm goes along. Initially, d[s] is zero, which now may not be the right answer, conceivably. Everyone else is infinity, which is certainly an upper bound. OK, these are both upper bounds on the true distances. So that's fine. That's initialization, just like before. And now we have a main loop, which happens |V| minus one times. We're not actually going to use the index i; it's just a counter.
And we're just going to look at every edge and relax it. It's a very simple idea. If you learn about relaxation, this is the first thing you might try. The question is when do you stop. It's sort of like this friend of mine: when he was like six years old, he would claim, oh, I know how to spell banana, I just don't know when to stop. OK, same thing with relaxation. This is our relaxation step, just as before. We look at the edge; we see whether it violates the triangle inequality according to our current estimates. We know the distance from s to v should be at most the distance from s to u plus the weight of that edge from u to v. If it isn't, we set it equal. We've proved that this is always an OK thing to do: these d[v]'s never get too small if we do a bunch of relaxations. So, the idea is you take every edge and you relax it. I don't care in which order; just relax every edge, once each. And then do that |V| minus one times. The claim is that that should be enough if you have no negative-weight cycles. So, if there's a negative-weight cycle, we need to figure it out. And, we'll do that in a fairly straightforward way, which is: we're going to do exactly the same thing. So this is outside the for loop here. We'll have the same for loop: for each edge in our graph, we'll try to relax it. And if you can still relax some edge, the claim is that there has to be a negative-weight cycle. So this is the main thing that needs proof.
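Here is a minimal Python sketch of the algorithm as just described; the function name and the edge-list representation are mine, not from the lecture:

```python
# A sketch of Bellman-Ford, assuming the graph is given as a collection
# of vertices and a list of directed edges (u, v, w).
def bellman_ford(vertices, edges, s):
    # Initialization: d[s] = 0, everyone else infinity (upper bounds).
    d = {v: float('inf') for v in vertices}
    d[s] = 0
    # Main loop: |V| - 1 rounds; the round index is just a counter.
    for _ in range(len(vertices) - 1):
        # Relax every edge, once each, in any order.
        for u, v, w in edges:
            if d[v] > d[u] + w:   # triangle inequality violated?
                d[v] = d[u] + w   # relax: set it equal
    # One more pass over the edges: if anything can still be relaxed,
    # report a negative-weight cycle.
    for u, v, w in edges:
        if d[v] > d[u] + w:
            raise ValueError("negative-weight cycle")
    return d
```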
OK, and that's the algorithm. So the claim is that at the end we should have d[v] equal to delta(s, v) for every vertex v: if we don't find a negative-weight cycle according to this rule, then we should have all the shortest path weights. That's the claim. Now, the first question is about the running time, which is very easy to analyze. So let's start with the running time. We can compare it to Dijkstra, which is over here. What is the running time of this algorithm? V times E, exactly.
OK, I'm going to assume, because it's pretty reasonable, that V and E are both positive. Then it's O(VE). So, this is a little bit slower, or a fair amount slower, than Dijkstra's algorithm. There it is: E plus V log V, which, ignoring the logs, is pretty much linear time. Here we have something that's at least quadratic in V, assuming your graph is connected. So, it's slower, but it's going to handle these negative weights. Dijkstra can't handle negative weights at all. So, let's do an example, to make it clear why you might hope this algorithm works. And then we'll prove that it works, of course. But the proof will be pretty easy.
So, I'm going to draw a graph that has negative weights, but no negative-weight cycles, so that I get an interesting answer. Good. The other thing I need, in order to make the output of this algorithm well defined, is an order on the edges, because the output depends on which order you visit the edges in. So I'm going to assign an arbitrary order to these edges. I could just ask you for an order, but to be consistent with the notes, I'll put an ordering on it. Let's say I put number four here; say that's the fourth edge I'll visit. It doesn't matter, but it will affect what happens during the algorithm for a particular graph. Did I get them all? One, two, three, four, five, six, seven, eight, OK.
And my source is going to be A. And, that's it. So, I want to run this algorithm. I'm just going to initialize everything. So, I set the estimate for s to be zero, and everyone else to be infinity. And to give me some notion of time, over here I'm going to write down what all of these d values are as the algorithm proceeds, because I'm going to start crossing them out and rewriting them, and the figure will get a little bit messier. But we can keep track of it over here. It's initially zero and infinities. Yeah? It doesn't matter. So, for the algorithm, you can go through the edges in a different order every time if you want. We'll prove that, but here I'm going to go through the same order every time. Good question. It turns out it doesn't matter here.
OK, so here's the starting point. Now I'm going to relax every edge. So, there are going to be a lot of edges here that don't do anything. I try to relax edge number one. I'd say, well, I know how to get from s to B with weight infinity, and with infinity plus two I could get from s to E through B. Well, infinity plus two is not much better than infinity. OK, so I don't do anything; I don't update this. I mean, infinity plus two sounds even worse. But infinity plus two is infinity. OK, that's edge number one: no relaxation. Edge number two, same deal; edge number three, same deal. At edge number four we start to get something interesting, because I have a finite value here that says I can get from A to B using a total weight of minus one. So that seems good. I'll write down minus one here, and update B to minus one. The rest stay the same. So, I'm just going to keep doing this over and over. That was edge number four.
Number five: we also get a relaxation. Four is better than infinity, so C gets a value of four. Then we get to edge number six. Infinity plus five is worse than four, so no relaxation there. Edge number seven is interesting, because I have a finite value here: minus one, plus the weight of this edge, which is three. That's a total of two, which is actually better than four. So this route, A, B, C, is actually better than the route I just found a second ago. So, this is now a two. This is all happening in one iteration of the main loop: we actually found two good paths to C, and we found one better than the other. OK, and that was edge number seven; and edge number eight is over here. It doesn't matter. OK, so that was round one of this outer loop, the first value of i, i equals one. OK, now we continue.
Just keep going. So, we start with edge number one. Now, minus one plus two is one. That's better than infinity. It'll start speeding up; it's repetitive, and it's actually not too much longer until we're done. Number two: this is an infinity, so we don't do anything. Number three: minus one plus two is one, better than infinity. This is vertex D; that was edge number three. Number four we've already done; nothing changed. Number five: this is where we see the path of weight four again, but that's worse than two, so we don't update anything. Number six: one plus five is six, which is bigger than two, so no good. Go around this way. Number seven: same deal. Number eight is interesting. So, we have a weight of one here, and a weight of minus three here. So, the total is minus two, which is better than one. So, that updates D. And, I believe that's it. So that was definitely the end of that round, round i equals two, because we just looked at the eighth edge. And, I'll cheat and check. Indeed, that is the last thing that happens.
We can check the couple of outgoing edges from D, because that's the only vertex whose value just changed. And, there are no more relaxations possible. So, that was done in two rounds, and the claim is we got all the shortest path weights. The algorithm would actually loop four times to guarantee correctness, because we have five vertices here, and one less than that is four. So, in fact, in the execution here there are two more blank rounds at the bottom. Nothing happens. But, what the hell?
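For concreteness, here is the lecture's example fed through the sketch above. The edge list below is reconstructed from the walkthrough (source A, edges numbered one through eight); edge number two is never read out explicitly, so its endpoints and weight are an assumption on my part:

```python
# Edges (u, v, w), numbered as in the lecture's ordering.
edges = [
    ('B', 'E', 2),   # edge 1
    ('D', 'B', 1),   # edge 2 (assumed; not stated explicitly)
    ('B', 'D', 2),   # edge 3
    ('A', 'B', -1),  # edge 4
    ('A', 'C', 4),   # edge 5
    ('D', 'C', 5),   # edge 6
    ('B', 'C', 3),   # edge 7
    ('E', 'D', -3),  # edge 8
]
d = bellman_ford('ABCDE', edges, 'A')
print(d)  # {'A': 0, 'B': -1, 'C': 2, 'D': -2, 'E': 1}, as on the board
```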
OK, so that is Bellman-Ford. I mean, it's certainly not doing anything wrong. The question is, why is it guaranteed to converge in V minus one steps, unless there is a negative-weight cycle? Question? Right, so that's an optimization: if you discover that a whole round goes by and nothing happens (you can keep track of that in the algorithm), you can stop. In the worst case, it won't make a difference. But in practice, you probably want to do that. Yeah? Good question.
All right, so some simple observations. I mean, we're only doing relaxation, so we can use a lot of our analysis from before. In particular, the d values are only decreasing, monotonically. As we cross out values here, we are always making them smaller, which is good. Another nifty thing about this algorithm is that you can run it even in a distributed system. If this is some actual network, some computer network, and these are machines, and they're communicating by these links, then relaxation is a purely local thing. You don't need any global strategy. And you were asking about whether we can do a different order in each step: well, yeah, you could just keep relaxing edges, and keep relaxing edges, and just keep going for the entire lifetime of the network. And eventually, you will find shortest paths. So, this algorithm is guaranteed to finish in V rounds in a distributed system. It might be more asynchronous, and it's a little harder to analyze, but it will still work eventually. It's guaranteed to converge. And so, Bellman-Ford is used in the Internet for finding shortest paths.
OK, so let's finally prove that it works. This should only take a couple of boards. So let's suppose we have a graph and some edge weights that have no negative-weight cycles. Then the claim is that we terminate with the correct answer: Bellman-Ford terminates with all of these d[v] values set to the delta values, for every vertex. OK, the proof is going to be pretty immediate, using the lemmas that we had from before, if you remember them. So, we're just going to look at every vertex separately. So, I'll call the vertex v. The claim is that this holds by the end of the algorithm. So, remember what we need to prove: that at some point, d[v] equals delta(s, v). Because we know it decreases monotonically, and we know that it never gets any smaller than the correct value (relaxations are always safe), we just need to show that at some point this holds, and then it will hold at the end. So, by monotonicity of the d values, and by correctness part one, which was that the d[v]'s are always greater than or equal to the deltas, we only need to show that at some point we have equality.
So that's our goal. So what we're going to do is just look at v, and the shortest path to v, and see what happens to the algorithm relative to that path. So, I'm going to name the path. Let's call it p. It starts at vertex v_0 and goes to v_1, v_2, whatever, and ends at v_k. And, this is not just any shortest path: it's one that starts at s. So, v_0 is s, and it ends at v. So, I'm giving a couple of names to s and v so I can talk about the path more uniformly. So, this is a shortest path from s to v. Now, I also want it to be not just any shortest path from s to v, but, among all shortest paths from s to v, one with the fewest possible edges. OK, so shortest here means in terms of the total weight of the path. Subject to being shortest in weight, I want it to also be shortest in the number of edges. And, the reason I want that is to be able to conclude that p is a simple path, meaning that it doesn't repeat any vertices. Now, can anyone tell me why I need to assume that the number of edges is the smallest possible in order to guarantee that p is simple?
The claim is that not all shortest paths are necessarily simple. Yeah? Right, I can have a zero-weight cycle, exactly. In the theorem here, we're assuming that there are no negative-weight cycles, but there might still be zero-weight cycles. And given a zero-weight cycle, you can put it in the middle of any shortest path to make it arbitrarily long, repeating vertices over and over. That's going to be annoying. What I want is for p to be simple. And, I can guarantee that essentially by shortcutting: if ever I take a zero-weight cycle, I throw it away. And this is one mathematical way of doing that. OK, now what else do we know about this shortest path? Well, we know that subpaths of shortest paths are shortest paths. That's optimal substructure. So, we know what the shortest path from s to v_i is, sort of inductively: it's the shortest path weight to the previous vertex plus the weight of the last edge, delta(s, v_i) = delta(s, v_{i-1}) + w(v_{i-1}, v_i). So, this is by optimal substructure, as we proved last time.
OK, and I think that's pretty much the warm-up. So, I want to sort of do this inductively in i: start out with v_0, and go up to v_k. So, the first question is, what is d of v_0, which is s? What is d of the source? Well, certainly at the beginning of the algorithm it's zero, because that's what we set it to. And it only goes down from there. So, it's certainly at most zero. The real question is, what is delta(s, v_0), the shortest path weight from s to s? It has to be zero, otherwise you have a negative-weight cycle, exactly. My favorite answer: zero. If we had a shorter path from s to s, I mean, that is a cycle. So, it's got to be zero. So, these are actually equal at the beginning of the algorithm, which is great. That means they will be for all time, because, as we just argued up here, d only goes down, and it never can get too small. So, we have d of v_0 set to the right thing. Great: good for the base case of the induction. Of course, what we really care about is v_k, which is v. So, let's talk about the v_i inductively, and then we will get v_k as a result.
So, yeah, let's do it by induction. That's more fun. Let's say that d of v_i is equal to delta(s, v_i) after i rounds of the algorithm. So, this is actually referring to the i that is in the algorithm here. These are rounds: one round is an entire execution of all the edges, a relaxation of every edge. So, this is certainly true for i equals zero. We just proved that: after zero rounds, at the beginning of the algorithm, d of v_0 equals delta(s, v_0). OK, so now, that's not really what I wanted, but OK, fine: now we'll prove it for d of v_{i+1}. Generally, I recommend you assume something. In fact, why don't I follow my own advice and change it? It's usually nicer to think of induction as recursion. So, you assume that this is true, let's say, for all j less than the i that you care about, and then you prove it for d of v_i. It's usually a lot easier to think about it that way. In particular, you can use strong induction, for all j less than i. Here, we're only going to need it for one less. We have some relation between i and i minus one here, in terms of the deltas. And so, we want to argue something about the d values.
OK, well, let's think about what's going on here. We know, let's say, that after i minus one rounds, we have this inductive hypothesis: d of v_{i-1} equals delta(s, v_{i-1}). And, we want to conclude that after i rounds (so we have one more round to do this) d of v_i has the right answer, delta(s, v_i). Does that look familiar at all? So, we relax every edge in this round. In particular, at some point, we have to relax the edge from v_{i-1} to v_i. We know that this path consists of edges; that's the definition of a path. So, during the i'th round, we relax every edge, so we'd better relax (v_{i-1}, v_i). And, what happens then? It's a test of memory. Quick, the Death Star is approaching. So, if we have the correct value for v_{i-1}, and we relax an outgoing edge from there, and that edge is an edge of the shortest path from s to v_i, what do we know? d of v_i becomes the correct value, delta(s, v_i). This was called the correctness lemma last time, one of the things we proved about Dijkstra's algorithm, but it was really just a fact about relaxation. And it was a pretty simple proof. And it comes from this fact: we know the shortest path weight delta(s, v_i) is delta(s, v_{i-1}) plus the weight of the edge. So, certainly d of v_i was at least this big, and let's suppose it's greater, or otherwise we're done. We know d of v_{i-1} is set to delta(s, v_{i-1}). And so, this is exactly the condition that's being checked in the relaxation step: the d of v_i value is greater than this, let's suppose, and then we'll set it equal to this. And that's exactly delta(s, v_i). So, when we relax that edge, we've got to set it to the right value.
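Written out (in my notation, following the board), the inductive step is just:

```latex
% Assume d[v_{i-1}] = \delta(s, v_{i-1}) after round i-1.
% During round i we relax the edge (v_{i-1}, v_i), so afterwards
\begin{align*}
d[v_i] &\le d[v_{i-1}] + w(v_{i-1}, v_i) \\
       &= \delta(s, v_{i-1}) + w(v_{i-1}, v_i) \\
       &= \delta(s, v_i) \quad \text{(optimal substructure)},
\end{align*}
% and since d[v_i] \ge \delta(s, v_i) always holds (relaxation is safe),
% we conclude d[v_i] = \delta(s, v_i) after round i.
```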
So, this is the end of the proof, right? It's very simple. The point is, you look at your shortest path. Here it is. And if we assume there are no negative-weight cycles, this has the correct value initially: d of s is going to be zero. After the first round, you've got to relax this edge, and then you get the right value for that vertex. After the second round, you've got to relax this edge, which gets you the right d value for this vertex, and so on. And so, no matter which shortest path you take, you can apply this analysis. And you know that, if the length of this path is k edges, as we assumed, then after k rounds you've got to be done. OK, so this was not actually the end of the proof. Sorry. So this means after k rounds, we have the right answer for v_k, which is v. So, the only question is, how big could k be? And, it had better be at most V minus one; that is the claim in the algorithm, that you only need to do V minus one rounds. And indeed, the number of edges in a simple path in a graph is at most the number of vertices minus one: k is at most V minus one, because p is simple. So, that's why we had to assume that it wasn't just any shortest path. It had to be a simple one, so it didn't repeat any vertices. So there are at most V vertices in the path, and so at most V minus one edges in the path. OK, and that's all there is to Bellman-Ford: pretty simple in correctness. Of course, we're using a lot of the lemmas that we proved last time, which makes it easier.
OK, a consequence of this theorem, or of this proof, is the following. Suppose Bellman-Ford fails to converge; that's what the algorithm is checking, whether this relaxation still requires work after those V minus one rounds. Right, the end of this algorithm is: run another round, a V'th round, and see whether anything changes. So, say the algorithm fails to converge after V minus one steps, or rounds. Then, there has to be a negative-weight cycle. OK, this is just the contrapositive of what we proved. We proved that if you assume there's no negative-weight cycle, then we know that d of s is zero, and then all this argument says is that you've got to converge after V minus one rounds. There can't be anything left to do once you've reached the shortest path weights, because you're going down monotonically, and you can never go below the floor. So, if you fail to converge somehow after V minus one rounds, you've got to have violated the assumption. And the only assumption we made was that there's no negative-weight cycle. So, this tells us that Bellman-Ford is actually correct: when it says that there is a negative-weight cycle, it indeed means it. It's true.
OK, and you can modify Bellman-Ford in that case to run a little longer, and find where all the minus infinities are. And that is, in some sense, one of the things you have to do in your problem set, I believe. So, I won't cover it here. But it's a good exercise in any case to figure out how you would find where the minus infinities are. What are all the vertices reachable from a negative-weight cycle? Those are the ones that have minus infinities. OK, so you might say, well, that was awfully fast. Actually, it's not over yet. The episode has not yet ended. We're going to use Bellman-Ford to solve even bigger and greater shortest path problems. In the remainder of today's lecture, we will see it applied to a more general problem, in some sense, called linear programming. And in the next lecture, we'll really use it to do some amazing stuff with all-pairs shortest paths. Let's go over here.
So, our goal, although it won't be obvious today, is to be able to compute the shortest paths between every pair of vertices, which we could certainly do at this point just by running Bellman-Ford V times. OK, but we want to do better than that, of course. And, that will be the climax of the trilogy. OK, today we just discovered who Luke's father is. So, it turns out the father of shortest paths is linear programming. Actually, simultaneously the father and the mother, because programs do not have gender. OK, my father likes to say... we both took improv comedy lessons, so we have degrees in improvisation. And he said, you know, we went to improv classes in order to learn how to make our humor better. And, the problem is, it didn't actually make our humor better. It just made us less afraid to use it. [LAUGHTER] So, you are subjected to all this improv humor. I didn't see the connection to Luke's father, but there you go.
OK, so, linear programming is a very general problem, a very big tool. Has anyone seen linear programming before? OK, one person. And, I'm sure that at some time in your life, if you do anything vaguely related to computing or optimization, linear programming will come up at some point. It's a very useful tool. You're given a matrix and two vectors: not too exciting yet. What you want to do is find a vector. This is a very dry description. We'll see what makes it so interesting in a moment.
So, you want to maximize some objective, and you have some constraints. And they're all linear. So, the objective is a linear function in the variables x, and your constraints are a bunch of linear inequality constraints; that's what makes it interesting. It's not just solving a linear system, as you've seen in linear algebra, or whatever. Or, of course, it could be that there is no such x. OK: vaguely familiar, you might think, to the theorem about Bellman-Ford. And, we'll show that there's some kind of connection here: either you want to find something, or show that it doesn't exist. Well, that's still a pretty vague connection, but I also want to maximize something, sort of like how shortest paths minimize something, OK, somewhat similar. We have these constraints. So, yeah. This may be intuitive to you, I don't know. I prefer a more geometric picture, and I will try to draw such a geometric picture. I've never tried to do this on a blackboard, so it should be interesting. I think I'm going to fail miserably. It sort of looks like a dodecahedron, right? Sort of, kind of, not really. A bit rough on the bottom, OK.
So, if you have a bunch of linear constraints (this is supposed to be in 3-D; now I've labeled it, so it's in 3-D, good), each constraint turns out to define a hyperplane in n dimensions. OK, so you have this space here that's three-dimensional, so n equals three. And, these hyperplanes: if you're looking at one side of a hyperplane, that's the less-than-or-equal-to, and if you take the intersection, you get some convex polytope or polyhedron. In 3-D, you might get a dodecahedron or whatever. And, your goal: you have some objective vector c, let's say, up. Suppose that's the c vector. Your goal is to find the highest point in this polytope. So here, it's maybe this one. OK, this is the target. This is the optimal x. That is the geometric view. If you prefer the algebraic view, you want to maximize c transpose times x. So, this is m, this is n; check that the dimensions work out. So that's saying you want to maximize the dot product: you want to maximize the extent to which x is in the direction c. And, you want to maximize that subject to some constraints, which look something like this, maybe. So, this is A, and it's m by n. You want to multiply it by something of height n; that's x, so let me put x down here, n by one. And, it should be less than or equal to something of this height, which is b, the right-hand side. OK, that's the algebraic view, where you can check that all the dimensions work out. But, you can read the constraints off: each row here, when multiplied by this column, gives you one value here, and that's just a linear constraint on all the x_i's. So, you want to maximize this linear function of x_1 up to x_n, subject to these constraints, OK? Pretty simple, but pretty powerful in general.
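Collecting the two boards, the general linear program can be stated compactly as follows:

```latex
% Linear programming: given A (m x n), b (m x 1), c (n x 1), find x (n x 1).
\begin{align*}
\text{maximize}   \quad & c^{\mathsf{T}} x \\
\text{subject to} \quad & A x \le b,
\end{align*}
% or report that no such x exists (infeasible).
```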
So, it turns out that you can formulate a huge number of problems, such as shortest paths, as a linear program. So, it's a general tool. In this class, we will not cover any algorithms for solving linear programming. It's a bit tricky. I'll just mention that they are out there. So, there are many efficient algorithms, and lots of code that does this. It's a very practical setup. So, lots of algorithms to solve LPs, linear programs; linear programming is usually called LP. And, I'll mention a few of them. There's the simplex algorithm; this is one of the first, and I think it is the first. There's the ellipsoid algorithm. There are interior point methods, and there's random sampling. I'll just say a little bit about each of these, because we're not going to talk about any of them in depth.
The simplex algorithm: this is, I mean, one of the first algorithms in the world in some sense, certainly one of the most popular. It's still used today. Almost all linear programming code uses the simplex algorithm. It happens to run in exponential time in the worst case, so it's actually pretty bad theoretically. But in practice, it works really well, and there is some recent work that tries to understand this. It's still exponential in the worst case, but it's practical. There's actually an open problem whether there exists a variation of simplex that runs in polynomial time. But, I won't go into that. That's a major open problem in this area of linear programming. The ellipsoid algorithm was the first algorithm to solve linear programming in polynomial time. So, for a long time, people didn't know. Around this time, people started realizing polynomial time is a good thing. That happened around the late '60s. Polynomial time is good.
And, the ellipsoid algorithm is the first one to do it. It's a very general algorithm, and very powerful, theoretically: completely impractical, but it's cool. It lets you do things like solve a linear program that has exponentially many constraints in polynomial time. You get all sorts of crazy things. So, I'll just say it's polynomial time. If I can't say something nice about it, I won't say anything at all: it's impractical. Interior point methods are sort of the mixture. They run in polynomial time (you can guarantee that), and they are also pretty practical. And there's sort of this competition these days about whether simplex or interior point is better. I don't know what it is today, but a few years ago they were neck and neck. And, random sampling is a brand new approach. This is just from a couple of years ago, by two MIT professors, Dimitris Bertsimas and Santosh Vempala; I guess the other is in applied math. So, just to show you, there's active work in this area; people are still finding new ways to solve linear programs. This is completely randomized, and very simple, and very general. It hasn't been implemented, so we don't know how practical it is yet. But, it has potential. OK: pretty neat.
OK, we're going to look at a somewhat simpler version of linear programming. The first restriction we are going to make is actually not much of a restriction, but nonetheless we will consider it; it's a little bit easier to think about. So here, we had some polytope, and we wanted to maximize some objective. In a feasibility problem, I just want to know: is the polytope empty? Can you find any point in that polytope? Can you find any set of values, x, that satisfies these constraints? OK, so there's no objective c: just find x such that Ax is less than or equal to b. It turns out you can prove a very general theorem that if you can solve linear feasibility, you can also solve linear programming. We won't prove that here, but linear feasibility is actually no easier than the original problem, even though it feels easier, and it's easier to think about. That's what I was just writing: actually no easier than LP.
OK, the next restriction we're going to make is a real restriction. And it simplifies the problem quite a bit. And that's to look at difference constraints. And, if all this seemed a bit abstract so far, we will now ground ourselves a little bit. A system of difference constraints is a linear feasibility problem (so, an LP where there's no objective) with a restriction: each row of the matrix A has one 1 and one -1, and everything else in the row is zero. OK, in other words, each constraint has a very simple form. It involves two variables and some number: we have something like x_j minus x_i is less than or equal to w_ij. This w_ij is just a number, and these are two of the variables. There's a minus sign, no coefficients up here, and none of the other x_k's appear; just two of them. And, you have a bunch of constraints of this form, one per row of the matrix. Geometrically, I haven't thought about what this means. I think it means the hyperplanes are pretty simple. Sorry I can't do better than that; it's a little hard to see this in high dimensions. But, it will start to correspond to something we've seen (namely, the board that it's next to) very shortly.
OK, so let's do a very quick example, mainly to have something to point at. Here's a very simple system of difference constraints: x_1 - x_2 ≤ 3, x_2 - x_3 ≤ -2, x_1 - x_3 ≤ 2. OK, and a solution. Why not? It's not totally trivial to solve this, but here's a solution. And the only thing to check is that each of these constraints is satisfied: x_1 minus x_2 is three, which is less than or equal to three, and so on. There could be negative values; there could be positive values. It doesn't matter.
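In the matrix form from before (each row of A with one 1, one -1, and zeros elsewhere), this example system, as I've reconstructed it from the board, reads:

```latex
\begin{equation*}
\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 1 & 0 & -1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\le
\begin{pmatrix} 3 \\ -2 \\ 2 \end{pmatrix}
\end{equation*}
```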
I'd like to transform this system of difference constraints into a graph, because we know a lot about graphs. So, we're going to call this the constraint graph. And, it's going to represent these constraints. How do I do it? Well, I take every constraint, which in general looks like this, and I convert it into an edge. OK, so if I write it as x_j minus x_i is less than or equal to some w_ij (w seems suggestive of weights; that's exactly why I called it w), I'm going to make that an edge from v_i to v_j. So, the order flips a little bit. And, the weight of that edge is w_ij. So, just do that. Make n vertices: the number of vertices equals n, and the number of edges equals the number of constraints, which is m, the height of the matrix. And just transform. So, for example, here we have three variables, so we have three vertices, v_1, v_2, v_3. We have x_1 minus x_2, so we have an edge from v_2 to v_1 of weight three. We have x_2 minus x_3, so we have an edge from v_3 to v_2 of weight minus two. And, we have x_1 minus x_3, so we have an edge from v_3 to v_1 of weight two. I hope I got the directions right. Yep. So, there it is, a graph: currently no obvious connection to shortest paths, right?
But in fact, this constraint is closely related to shortest paths. So let me just rewrite it. You could say, well, x_j is less than or equal to x_i plus w_ij. Or, you could think of it as d[j] less than or equal to d[i] plus w_ij. This is a conceptual balloon. Look awfully familiar? A lot like the triangle inequality, a lot like relaxation. So, there's a very close connection between these two problems, as we will now prove.
So, we're going to have two theorems. And, they're going to look similar to the correctness of Bellman-Ford, in that they talk about negative-weight cycles. Here we go. I mean, we have this constraint graph. It can have negative weights; it can have positive weights. It turns out what matters is whether you have a negative-weight cycle. So, the first thing to prove is that if you have a negative-weight cycle, then something bad happens. OK, what bad thing could happen? Well, we're just trying to satisfy this system of constraints. So, the bad thing is that there might not be any solution; these constraints may be infeasible. And that's the claim. The claim is that this is actually an if and only if, but first we'll prove the "if": if you have a negative-weight cycle, you're doomed. The difference constraints are unsatisfiable. That's the more intuitive way to say it; in the LP world, they call it infeasible, but unsatisfiable makes a lot more sense. There's no way to assign the x_i's so as to satisfy all the constraints simultaneously. So, let's just take a look. Consider a negative-weight cycle. It starts at some vertex, goes through some vertices, and at some point comes back. I don't care whether it repeats vertices, just as long as this cycle, from v_1 back to v_1, has strictly negative weight.
OK, and what I'm going to do is just write down all the constraints. Each of these edges corresponds to a constraint, which must be in the set of constraints, because that's where the graph came from. So, these are all edges. Let's look at what they give us. So, we have an edge from v_1 to v_2. That corresponds to: x_2 minus x_1 is at most w_12. Then we have x_3 minus x_2, at most the weight w_23, and so on. Eventually we get up to x_k minus x_(k-1), at most w_(k-1),k, for this edge; and lastly we have this edge, which wraps around: x_1 minus x_k, at most w_k1, if I've got the signs right. Good, so here's a bunch of constraints. What do you suggest I do with them? Anything interesting about these constraints, say, the left-hand sides? Sorry? It sounded like the right word; what was it? Telescopes, yes, good. Everything cancels. If I add these up, there's an x_2 and a minus x_2; there's a minus x_1 and an x_1; there's a minus x_k and an x_k. Everything here cancels if I add up the left-hand sides. So, what happens if I add up the right-hand sides? Over here I get zero, my favorite answer. And over here, we get the sum of the weights of all the edges in the negative-weight cycle, which is the weight of the cycle, which is negative.
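On the board, the telescoping sum looks like this:

```latex
% Add up the k constraints around the cycle v_1 -> v_2 -> ... -> v_k -> v_1:
\begin{align*}
0 &= (x_2 - x_1) + (x_3 - x_2) + \cdots + (x_1 - x_k) \\
  &\le w_{12} + w_{23} + \cdots + w_{k-1,k} + w_{k1}
   = w(\text{cycle}) < 0.
\end{align*}
```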
So, zero is strictly less than zero: contradiction. Contradiction? Wait a minute, we didn't assume anything that was false. So, it's not really a contradiction in the mathematical sense; we didn't contradict the world. We just said that these constraints are contradictory. In other words, if you pick any values for the x_i's, there is no way that these can all be true, because then you would get a contradiction. So, it's impossible for these things to be satisfied by real x_i's; they must be unsatisfiable. To say it a little more precisely: there's no satisfying assignment x_1 up to x_n, because the constraints add up to zero on the left-hand side, and to something negative on the right-hand side. OK, so that's an easy proof. The reverse direction will be only slightly harder.
OK, so, cool. We have this connection. The motivation is: suppose you want to solve these difference constraints. And we'll see one such application. I Googled around for difference constraints. There is a fair number of papers that care about difference constraints, and they all use shortest paths to solve them. So, if we can prove a connection between shortest paths, which we know how to compute, and difference constraints, then we'll have something cool. And, next class we will see even more applications of difference constraints. It turns out they're really useful for all-pairs shortest paths. OK, but for now let's just prove this equivalence and finish it off.
So, the reverse direction is: if there's no negative-weight cycle in this constraint graph, then the system had better be satisfiable. The claim is that these negative-weight cycles are the only barriers to finding a solution to these difference constraints. I have this feeling it's somewhere here; I had to talk about the constraint graph. Good. Satisfied, good. So, here we're going to see a technique that is very useful when thinking about shortest paths. And, it's a bit hard to guess, especially if you haven't seen it before. This is useful in problem sets, and in quizzes, and finals, and everything, so keep this in mind. I mean, I'm using it to prove this rather simple theorem, but the idea of changing the graph (I'm going to call this constraint graph G) is a very powerful idea.
61:04 - 61:04vertex, s, or source,
use the source, -
61:08 - 61:08Luke, and we're going to add a
bunch of edges from s because -
61:13 - 61:13being a source,
it better be connected to some -
61:17 - 61:17things.
So, we are going to add a zero -
61:24 - 61:24weight edge, or weight zero edge
from s to everywhere, -
61:30 - 61:30so, to every other vertex in
the constraint graph. -
61:36 - 61:36Those vertices are called v_i,
v_1 up to v_n. -
61:40 - 61:40So, I have my constraint graph.
But I'll copy this one so I can -
61:46 - 61:46change it.
It's always good to backup your -
61:50 - 61:50work before you make changes,
right? -
61:53 - 61:53So now, I want to add a new
vertex, s, over here, -
61:58 - 61:58my new source.
I just take my constraint -
62:01 - 62:01graph, whatever it looks like,
add in weight zero edges to all -
62:07 - 62:07the other vertices.
Simple enough. -
Now, what did I do? Well, I have a candidate source now which can reach all the vertices. So, paths from s exist: I can get from s to everywhere with weight at most zero. OK, maybe less. Could it be less? Well, you know, like v_2: I can get to it by zero minus two. So, that's less than zero. So I've got to be a little careful. What if there's a negative-weight cycle? Oh no; then there wouldn't be any shortest paths. Fortunately, we assumed that there's no negative-weight cycle in the original graph. And if you think about it: if there's no negative-weight cycle in the original graph, and we add an edge from s to everywhere else, we're not making any new negative-weight cycles, because you can start at s and go somewhere at a cost of zero, which doesn't affect any weights, and then you are forced to stay in the old graph. So, there can't be any new negative-weight cycles. So, the modified graph has no negative-weight cycles, because the original didn't. And it has paths from s: there's a path from s to every vertex. There may not have been before; before, I couldn't get from v_2 to v_3, for example. Well, that's still true, but from s I can get to everywhere. So, that means that this modified graph has shortest paths: shortest paths from s exist. In other words, if I took all the shortest path weights, like if I ran Bellman-Ford from s, then I would get a bunch of finite numbers, d of v, for every vertex.
That seems like a good idea. Let's do it. So, shortest paths exist. Let's just assign x_i to be the shortest path weight from s to v_i. Why not? That's a good choice for a number, the shortest path weight from s to v_i. This is finite, because it's less than infinity and greater than minus infinity: some finite number. That's what we need in order to satisfy these constraints.
65:00 - 65:00The claim is that this is a
satisfying assignment. -
65:04 - 65:04Why?
Triangle inequality. -
65:06 - 65:06Somewhere here we wrote
triangle inequality. -
65:09 - 65:09This looks a lot like the
triangle inequality. -
65:13 - 65:13In fact, I think that's the end
of the proof. -
65:16 - 65:16Let's see here.
What we want to be true with -
65:20 - 65:20this assignment is that x_j
minus x_i is less than or equal -
65:25 - 65:25to w_ij whenever ij is an edge.
Or, let's say v_i, -
65:28 - 65:28v_j, for every such constraint,
so, for v_i, -
65:32 - 65:32v_j in the edge set.
OK, so what is this true? -
65:37 - 65:37Well, let's just expand it out.
So, x_i is this delta, -
65:42 - 65:42and x_j is some other delta.
So, we have delta of s, -
65:47 - 65:47vj minus delta of s_vi.
And, on the right-hand side, -
65:52 - 65:52well, w_ij, that was the weight
of the edge from I to J. -
65:57 - 65:57So, this is the weight of v_i
to v_j. -
66:01 - 66:01OK, I will rewrite this
slightly. -
66:04 - 66:04Delta of s, v_j is less than or
equal to delta of s, -
66:07 - 66:07v_i plus w of v_i,
v_j. -
66:09 - 66:09And that's the triangle
inequality more or less. -
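Spelled out as one chain of equivalent rewrites, this is:

$$x_j - x_i \le w_{ij} \;\Longleftrightarrow\; \delta(s, v_j) - \delta(s, v_i) \le w(v_i, v_j) \;\Longleftrightarrow\; \delta(s, v_j) \le \delta(s, v_i) + w(v_i, v_j).$$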
66:13 - 66:13The shortest path from s to v_j
is, at most, the shortest path from -
66:18 - 66:18s to v_i plus a particular path
from v_i to v_j, -
66:22 - 66:22namely the single edge v_i to
v_j. -
66:25 - 66:25This could only be longer than
the shortest path from v_i to v_j. -
66:30 - 66:30And so, that makes the
right-hand side bigger, -
66:33 - 66:33which makes the inequality
easier to satisfy: it was true -
66:38 - 66:38with the shortest path,
and it's still true here. -
66:40 - 66:40And, that proves it.
This is true. -
66:42 - 66:42And, these were all equivalent
statements. -
66:46 - 66:46This we know to be true by
triangle inequality. -
66:49 - 66:49Therefore, these constraints
are all satisfied. -
66:52 - 66:52Magic.
I'm so excited here. -
66:54 - 66:54So, we've proved that having a
negative weight cycle is exactly -
66:59 - 66:59when this system of difference
constraints is unsatisfiable. -
67:05 - 67:05So, if we want to satisfy them,
if we want to find the right -
67:08 - 67:08answer to x, we run
Bellman-Ford. -
67:10 - 67:10Either it says,
oh no, there's a negative weight cycle. -
67:12 - 67:12Then you are hosed.
Then, there is no solution. -
67:15 - 67:15But that's the best you could
hope to know. -
67:17 - 67:17Otherwise, it says,
oh, there was no negative -
67:20 - 67:20weight cycle,
and here are your shortest path -
67:22 - 67:22weights.
You just plug them in, -
67:24 - 67:24and bam, you have your x_i's
that satisfy the constraints. -
67:27 - 67:27Awesome.
Now, it wasn't just any graph. -
67:30 - 67:30I mean, we started with
constraints, algebra, -
67:33 - 67:33we converted it into a graph by
this transform. -
67:36 - 67:36Then we added a source vertex,
s. -
67:38 - 67:38So, I mean, we had to build a
graph to solve our problem, -
67:42 - 67:42very powerful idea.
Cool. -
67:43 - 67:43This is the idea of reduction.
You can reduce the problem you -
67:47 - 67:47want to solve into some problem
you know how to solve. -
67:51 - 67:51You know how to solve shortest
paths when there are no negative -
67:55 - 67:55weight cycles,
or find out that there is a -
67:57 - 67:57negative weight cycle by
Bellman-Ford. -
68:01 - 68:01So, now we know how to solve
difference constraints. -
68:06 - 68:06It turns out you can do even
more. -
68:09 - 68:09Bellman-Ford does a little bit
more than just solve these -
68:15 - 68:15constraints.
But first let me write down -
68:19 - 68:19what I've been jumping up and
down about. -
68:23 - 68:23The corollary is you can use
Bellman-Ford. -
68:27 - 68:27I mean, you make this graph.
Then you apply Bellman-Ford, -
68:34 - 68:34and it will solve your system
of difference constraints. -
68:41 - 68:41So, let me put in some numbers
here. -
68:46 - 68:46You have m difference
constraints. -
68:50 - 68:50And, you have n variables.
And, it will solve them in -
68:56 - 68:56order m times n time.
Actually, these numbers go up -
69:02 - 69:02slightly because we are adding n
edges, and we're adding one -
69:07 - 69:07vertex, but assuming all of
these numbers are nontrivial, -
69:12 - 69:12m is at least n.
It's order mn time. -
69:15 - 69:15OK, we're just avoiding degenerate cases where
some of these numbers are close to zero. -
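To make the whole reduction concrete, here is a minimal sketch in Python; the function name and the constraint encoding (a triple (i, j, w) meaning x_j - x_i <= w, with variables numbered 0..n-1) are my own, not from the lecture.

```python
def solve_difference_constraints(n, constraints):
    """Return a satisfying assignment x[0..n-1], or None if the system
    is unsatisfiable (i.e., the constraint graph has a negative cycle)."""
    # Constraint graph: edge v_i -> v_j with weight w for each constraint
    # x_j - x_i <= w, plus a source s (index n) with weight-0 edges to
    # every vertex, exactly as in the construction above.
    edges = list(constraints) + [(n, v, 0) for v in range(n)]

    INF = float("inf")
    d = [INF] * (n + 1)
    d[n] = 0  # the source

    # Bellman-Ford: with n+1 vertices, n relaxation passes suffice.
    for _ in range(n):
        for (u, v, w) in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w

    # One extra pass: any further improvement means a negative cycle.
    for (u, v, w) in edges:
        if d[u] + w < d[v]:
            return None  # unsatisfiable

    return d[:n]  # x_i = delta(s, v_i), each one finite


# Tiny example: x_1 - x_0 <= -2 and x_0 - x_1 <= 3.
print(solve_difference_constraints(2, [(0, 1, -2), (1, 0, 3)]))
# -> [0, -2], and indeed -2 - 0 <= -2 and 0 - (-2) <= 3.
```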
69:20 - 69:20Good.
So, some other facts, -
69:22 - 69:22that's what I just said.
And we'll leave these as -
69:26 - 69:26exercises because they're not
too essential. -
69:31 - 69:31The main thing we need is this.
But, some other cool facts is -
69:36 - 69:36that Bellman-Ford actually
optimizes some objective -
69:39 - 69:39functions.
So far, we treated it as just a -
69:42 - 69:42feasibility problem.
We just want to know whether -
69:46 - 69:46these constraints are
satisfiable. -
69:49 - 69:49In fact, you can add a
particular objective function. -
69:53 - 69:53So, you can't give it an
arbitrary objective function, -
69:57 - 69:57but here's one of interest.
x_1 plus x_2 plus ... plus x_n, -
70:05 - 70:05OK, but not just that.
We have some constraints. -
70:24 - 70:24OK, this is a linear program.
I want to maximize the sum of -
70:27 - 70:27the x_i's subject to all the
x_i's being nonpositive and the -
70:31 - 70:31difference constraints.
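Written out in full (my reconstruction of what goes on the board), the linear program is:

$$\begin{aligned} \text{maximize} \quad & x_1 + x_2 + \cdots + x_n \\ \text{subject to} \quad & x_i \le 0 \quad \text{for all } i, \\ & x_j - x_i \le w_{ij} \quad \text{for all edges } (v_i, v_j). \end{aligned}$$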
So, this we had before. -
70:34 - 70:34This is fine.
We noticed at some point you -
70:36 - 70:36could get from s to everywhere
with cost, at most, -
70:39 - 70:39zero.
So, we know that in this -
70:41 - 70:41assignment all of the x_i's are
negative. -
70:43 - 70:43That's not necessary,
but it's true when you run -
70:46 - 70:46Bellman-Ford.
So if you solve your system -
70:48 - 70:48using Bellman-Ford,
which is no less general than -
70:51 - 70:51anything else,
you happen to get nonpositive -
70:53 - 70:53x_i's.
And so, subject to that -
70:55 - 70:55constraint, it actually makes
them as close to zero as -
70:58 - 70:58possible in the L1 norm.
That is, looking at the sum of these values, -
71:04 - 71:04it tries to make the sum as
close to zero as it can, -
71:09 - 71:09making the values as
small as possible in absolute -
71:15 - 71:15value, in this sense.
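Since all the x_i here are nonpositive, maximizing the sum is the same as minimizing the L1 norm:

$$\max \sum_i x_i \;=\; -\min \sum_i |x_i| \;=\; -\min \|x\|_1 \qquad (x_i \le 0 \text{ for all } i).$$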
OK, it does more than that. -
71:20 - 71:20It cooks, it cleans,
it finds shortest paths. -
71:25 - 71:25It also minimizes the spread,
the maximum over all i of x_i -
71:32 - 71:32minus the minimum over all i of
x_i. -
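In symbols, the spread is:

$$\operatorname{spread}(x) = \max_i x_i - \min_i x_i.$$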
71:37 - 71:37So, I mean, if you have your
real line, and here are the -
71:41 - 71:41x_i's wherever they are.
It minimizes this distance. -
71:44 - 71:44And zero is somewhere over
here. -
71:47 - 71:47So, it tries to make the x_i's
as compact as possible. -
71:50 - 71:50This is actually the L infinity
norm, if you know stuff about -
71:54 - 71:54norms from your linear algebra
class. -
71:57 - 71:57OK, this is the L1 norm.
I think it minimizes every L_p -
72:01 - 72:01norm.
Good, so let's use this for -
72:05 - 72:05something.
Yeah, let's solve a real -
72:09 - 72:09problem, and then we'll be done
for today. -
72:14 - 72:14Next class we'll see the really
cool stuff, the really cool -
72:21 - 72:21application of all of this.
For now, and we'll see a cool -
72:27 - 72:27but relatively simple
application, which is VLSI -
72:33 - 72:33layout.
We talked a little bit about -
72:38 - 72:38VLSI way back in divide and
conquer. -
72:41 - 72:41You have a bunch of chips,
and you want to arrange them -
72:46 - 72:46to minimize some objectives.
So, there are -
72:50 - 72:50tons of problems that come out
of VLSI layout. -
72:55 - 72:55Here's one of them.
You have a bunch of features of -
72:59 - 72:59an integrated circuit.
You want to somehow arrange -
73:05 - 73:05them on your circuit without
putting any two of them too -
73:10 - 73:10close to each other.
You have some minimum -
73:14 - 73:14separation: at the very least, they
should not be on top of each -
73:19 - 73:19other.
Probably, you also need some -
73:22 - 73:22separation to put wires in
between, and so on, -
73:27 - 73:27so, without putting any two
features too close together. -
73:33 - 73:33OK, so just to give you an
idea, so I have some objects and -
73:37 - 73:37I'm going to be a little bit
vague about how this works. -
73:41 - 73:41You have some features.
This is stuff, -
73:44 - 73:44some chips, whatever.
We don't really care what their -
73:47 - 73:47shapes look like.
I just want to be able to move -
73:51 - 73:51them around so that the gap at
any point, so let me just think -
73:55 - 73:55about this gap.
This gap should be at least -
73:58 - 73:58some delta.
Or, I don't want to use delta. -
74:01 - 74:01Let's say epsilon,
good, small number. -
74:05 - 74:05So, I just need some separation
between all of my parts. -
74:09 - 74:09And for this problem,
I'm going to be pretty simple, -
74:12 - 74:12just say that the parts are
only allowed to slide -
74:16 - 74:16horizontally.
So, it's a one-dimensional -
74:18 - 74:18problem.
These objects are in 2-d, -
74:21 - 74:21or whatever,
but I can only slide them in the x -
74:24 - 74:24coordinate.
So, to model that, -
74:26 - 74:26I'm going to look at the left
edge of every part and say, -
74:30 - 74:30well, these two left edges
should be at least some -
74:33 - 74:33separation.
So, I think of it as whatever -
74:37 - 74:37the distance is plus some
epsilon. -
74:39 - 74:39But, you know,
if you have some funky 2-d -
74:42 - 74:42shapes you have to compute,
well, this is a little bit too -
74:45 - 74:45close because these come into
alignment. -
74:48 - 74:48But, there's some constraint,
well, for any two pieces, -
74:51 - 74:51I could figure out how close
they can get. -
74:54 - 74:54They should get no closer.
So, I'm going to call this x_1. -
74:57 - 74:57I'll call this x_2.
So, we have some constraint -
75:00 - 75:00like x_2 minus x_1 is at least d
plus epsilon, -
75:03 - 75:03or whatever you compute that
weight to be. -
75:07 - 75:07OK, so for every pair of
pieces, I can do this, -
75:10 - 75:10compute some constraint on how
far apart they have to be. -
75:13 - 75:13And, now I'd like to assign
these x coordinates. -
75:16 - 75:16Right now, I'm assuming they're
just variables. -
75:19 - 75:19I want to slide these pieces
around horizontally in order to -
75:22 - 75:22compactify them as much as
possible so they fit in the -
75:25 - 75:25smallest chip that I can make
because it costs money, -
75:28 - 75:28and time, and everything,
and power, everything. -
75:31 - 75:31You always want your chip
small. -
75:34 - 75:34So, Bellman-Ford does that.
All right, so Bellman-Ford -
75:40 - 75:40solves these constraints because
it's just a bunch of difference -
75:48 - 75:48constraints.
And we know that they are -
75:52 - 75:52solvable because you could
spread all the pieces out -
75:58 - 75:58arbitrarily far.
And, it minimizes the spread, -
76:03 - 76:03minimizes the size of the chip
I need, the max of x_i minus the -
76:10 - 76:10min of x_i.
So, it maximizes -
76:15 - 76:15compactness, or minimizes size
of the chip. -
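As a quick usage sketch, reusing the hypothetical solve_difference_constraints from earlier, and with a made-up gap value: an "at least" spacing constraint becomes a difference constraint by negating it.

```python
# Hypothetical spacing example reusing solve_difference_constraints from
# the earlier sketch. x_1 - x_0 >= d + eps is not in difference form,
# but negating both sides gives x_0 - x_1 <= -(d + eps).
d, eps = 5.0, 1.0  # made-up minimum gap and safety margin

constraints = [
    (1, 0, -(d + eps)),  # encodes x_0 - x_1 <= -(d + eps)
]
x = solve_difference_constraints(2, constraints)
print(x)  # [-6.0, 0]: part 1 sits 6 units to the right of part 0
```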
76:18 - 76:18OK, this is a one-dimensional
problem, so it may seem a little -
76:23 - 76:23artificial, but the two
dimensional problem is really -
76:27 - 76:27hard to solve.
And this is, -
76:29 - 76:29in fact, the best you can do
with a nice polynomial time -
76:33 - 76:33algorithm.
There are other applications if -
76:37 - 76:37you're scheduling events in,
like, a multimedia environment, -
76:42 - 76:42and you want to guarantee that
this audio plays at least two -
76:47 - 76:47seconds after this video,
but then there are things that -
76:51 - 76:51are playing at the same time,
and they have to be within some -
76:56 - 76:56gap of each other,
so, lots of papers about using -
76:59 - 76:59Bellman-Ford to
solve difference constraints to -
77:03 - 77:03enable multimedia environments.
OK, so there you go. -
77:07 - 77:07And next class we'll see more
applications of Bellman-Ford to -
77:11 - 77:11all pairs shortest paths.
Questions? -
77:14 - 77:14Great.