-
We're going to get started.
Handouts are by the door if
-
anybody didn't pick one up.
My name is Charles Leiserson.
-
I will be lecturing this course
this term, Introduction to
-
Algorithms, with Erik Demaine.
In addition,
-
this is an SMA course,
a Singapore MIT Alliance course
-
which will be run in Singapore
by David Hsu.
-
And so all the lectures will be
videotaped and made available on
-
the Web for the Singapore
students, as well as for MIT
-
students who choose to watch
them on the Web.
-
If you have an issue of not
wanting to be on the videotape,
-
you should sit in the back row.
OK?
-
Otherwise, you will be on it.
There is a video recording
-
policy, but it seems like they
ran out.
-
If anybody wants to see it,
please pass it around once
you're done reading it,
or you can come up.
I did secure one copy.
-
Before we get into the content
of the course,
-
let's briefly go over the
course information because there
-
are some administrative things
that we sort of have to do.
-
As you can see,
this term we have a big staff.
-
Take a look at the handout
here.
-
Including, this term, six TAs,
which is two more TAs than we
-
normally get for this course.
That means recitations will be
-
particularly small.
There is a World Wide Web page,
-
and you should bookmark that
and go there regularly because
-
that is where everything will be
distributed.
-
Email.
You should not be emailing
directly, even though we give
you our email addresses,
to the individual members of
the staff.
-
You should email us generally.
And the reason is you will get
-
much faster response.
And also, for any
-
communications,
generally we like to monitor
-
what the communications are so
it's helpful to have emails
-
coming to everybody on the
course staff.
-
As I mentioned,
we will be doing distance
-
learning this term.
And so you can watch lectures
-
online if you choose to do that.
I would recommend,
-
for people who have the
opportunity to watch,
-
to come live.
It's better live.
-
You get to interact.
There's an intangible that
-
comes with having it live.
In fact, in addition to the
-
videos, I meet weekly with the
Singapore students so that they
-
have a live session as well.
Prerequisites.
-
The prerequisites for this
course are 6.042,
-
which is Math for Computer
Science, and 6.001.
-
You basically need discrete
mathematics and probability,
-
as well as programming
experience to take this course
-
successfully.
People who do not have that
background should not be in the
class.
-
We will be checking
prerequisites.
-
If you have any questions,
please come to talk to us after
-
class.
Let's see.
-
Lectures are here.
For SMA students,
-
they have the videotapes and
they will also have a weekly
-
meeting.
Students must attend a one-hour
-
recitation session each week.
There will be new material
-
presented in the recitation.
Unlike the lectures,
-
they will not be online.
Unlike the lectures,
-
there will not be lecture notes
distributed for the recitations
-
in general.
And, yet, there will be
-
material there that is directly
on the exams.
-
And so every term we say oh,
when did you cover that?
-
That was in recitation.
You missed that one.
-
So, recitations are mandatory.
And, in particular,
-
also let me just mention your
recitation instructor is the one
-
who assigns your final grade.
So we have a grade meeting to
keep everybody normalized,
but your recitation instructor
has the final say on your grade.
Handouts.
-
Handouts are available on the
course Web page.
-
We will not generally,
except for this one,
-
first handout,
be bringing handouts to class.
-
Textbook is this book,
Introduction to Algorithms.
-
MIT students can get it any of
the local bookstores,
-
including the MIT Coop.
There is also a new online
-
service that provides textbooks.
You can also get a discount if
-
you buy it at the MIT Press
Bookstore.
-
There is a coupon in the MIT
Student Telephone Directory for
-
a discount on MIT Press books.
And you can use that to
-
purchase this book at a
discount.
-
Course website.
This is the course website.
-
It links to the Stellar
website, which is where,
-
actually, everything will be
kept.
-
And SMA students have their own
website.
-
Some students find this course
particularly challenging, so we
-
will have extra help.
We will post weekly office
-
hours on the course website for
the TAs.
-
And then as an experiment this
term, we are going to offer
-
homework labs for this class.
What a homework lab is,
-
is it's a place and a time you
can go where other people in the
-
course will go to do homework.
And there will be typically two
-
TAs who staff the lab.
And so, as you're working on
-
your homework,
you can get help from the TAs
-
if you need it.
We're going to schedule those,
and the course calendar will
list where and when they will be
held, but usually Sundays 2:00
to 4:00 PM, or else it will be
some evening.
I think the first one is an
-
evening, right?
Near to when the homework is
-
due.
Your best bet is to try to do
the homework in advance of the
homework lab.
But then, if you want extra
help, you can talk over your
solutions with other people
because, as we will discuss,
problem sets may be solved in
collaboration with other people
in the class.
In addition,
-
there are several peer
assistance programs.
-
Also the Office of Minority
Education has an assistance
-
program, and those usually get
booked up pretty quickly.
-
If you're interested in those,
it's a good idea to make an
appointment to get there and get
help soon.
-
The homework labs,
I hope a lot of people will try
-
that out.
We've never done this.
-
I don't know of any other
course.
-
Do other people know of courses
at MIT that have done this?
-
6.011 did it,
OK.
-
Good.
And was it successful in that
-
class?
It never went,
-
OK.
Good.
-
[LAUGHTER] We will see.
If it's not paying off then we
-
will just return to ordinary
office hours for those TAs,
-
but I think for some students
that is a good opportunity.
-
If you wish to be registered in
this course, you must sign up on
-
the course Web page.
So, that is requirement one.
-
It must be done today.
You will find it difficult to
-
pass the course if you are not
in the class.
-
And you should notify your TA
if you decide to drop so that we
-
can get you off and stop the
mailings, stop the spam.
-
And you should register today
before 7:00 PM.
-
And then we're going to email
your recitation assignment to
-
you before Noon tomorrow.
And if you don't receive this
-
information by Thursday Noon,
please send us an email to the
-
course staff generally,
not to me individually,
-
saying that you didn't receive
your recitation assignment.
-
And so if you haven't received
it by Thursday noon, you want to
let us know.
-
I think generally they are
going to send them out tonight
-
or at least by tomorrow morning.
Yeah.
-
OK.
SMA students don't have to
-
worry about this.
Problem sets.
-
We have nine problem sets that
we project will be assigned
-
during the semester.
A couple things about problem
-
sets.
Late homeworks won't generally
be accepted; if you have
extenuating circumstances, you
should make prior arrangements
with your recitation instructor.
-
In fact, almost all of the
administrative stuff,
-
you shouldn't come to me to ask
and say can I hand in something
-
late?
You should be talking to your
-
recitation instructor.
You can read the other things
-
about the form,
but let me just mention that
-
there are exercises that should
be solved but not handed in as
-
well to give you drill on the
material.
-
I highly recommend doing
the exercises.
-
They both test your
understanding of the material,
-
and exercises have this way of
finding themselves on quizzes.
-
You're often asked to describe
algorithms.
-
And here is a little outline of
what you can use to describe an
-
algorithm.
The grading policy is something
that I somehow always cover.
And always every term there are
-
at least a couple of students
who pretend like I never showed
-
them this.
If you skip problems it has a
-
nonlinear effect on your grade.
Nonlinear, OK?
-
If you don't skip any problems,
no effect on your grade.
-
If you skip one problem,
a hundredth of a letter grade,
-
we can handle that.
But two problems it's a tenth.
-
And, as you see,
by the time you have skipped
like five problems (this is not
problem sets, by the way,
this is problems, OK?),
you're down a third of a letter
grade.
-
And if you don't do nine or
more, so that's typically about
-
three to four problem sets,
you don't pass the class.
-
I always have some students
coming at the end of the year
-
saying oh, I didn't do any of my
problems.
-
Can you just pass me because I
did OK on the exams?
-
Answer no, a very simple answer
because we've said it upfront.
-
So, the problem sets are an
integral part of the course.
-
Collaboration policy.
This is extremely important so
-
everybody pay attention.
If you are asleep now wake up.
-
Like that's going to wake
anybody up, right?
-
[LAUGHTER] The goal of
homework.
-
Professor Demaine and my
philosophy is that the goal of
-
homework is to help you learn
the material.
-
And one way of helping to learn
is not to just be stuck and
-
unable to solve something
because then you're in no better
-
shape when the exam rolls
around, which is where we're
-
actually evaluating you.
So, you're encouraged to
-
collaborate.
But there are some commonsense
-
things about collaboration.
If you go and you collaborate
-
to the extent that all you're
doing is getting the information
-
from somebody else,
you're not learning the
-
material and you're not going to
do well on the exams.
-
In our experience,
students who collaborate
-
generally do better than
students who work alone.
-
But you owe it to yourself,
if you're going to work in a
-
study group, to be prepared for
your study group meeting.
-
And specifically you should
spend a half an hour to 45
-
minutes on each problem before
you go to group so you're up to
-
speed and you've tried out your
ideas.
-
And you may have solutions to
some, you may be stuck on some
-
other ones, but at least you
applied yourself to it.
-
After 30 to 45 minutes,
if you cannot get the problem,
-
just sitting there and banging
your head against it makes no
-
sense.
It's not a productive use of
-
your time.
And I know most of you have
-
issues with having time on your
hands, right?
-
Like it's not there.
So, don't go banging your head
-
against problems that are too
hard or where you don't
-
understand what's going on or
whatever.
-
That's when the study group can
help out.
-
And, as I mentioned,
we'll have homework labs which
-
will give you an opportunity to
go and do that and coordinate
-
with other students rather than
necessarily having to form your
-
own group.
And the TAs will be there.
-
If your group is unable to
solve the problem then talk to
-
other groups or ask your
recitation instructor.
-
And, that's how you go about
solving them.
-
Writing up the problem sets,
however, is your individual
-
responsibility and should be
done alone.
-
You don't write up your problem
solutions with other students,
-
you write them up on your own.
And, because this is an
academic place where we
understand that the sourcing of
academic information is very
important, if you collaborated
on solutions you should write a
list of the collaborators on
your problem sets.
Say I worked with so-and-so on
this solution.
It does not affect your grade.
-
It's just a question of being
scholarly.
-
It is a violation of this
policy to submit a problem
-
solution that you cannot orally
explain to a member of the
-
course staff.
You say oh, well,
-
my write-up is similar to that
other person's.
-
I didn't copy them.
We may ask you to orally
-
explain your solution.
If you are unable to,
then according to this policy
the presumption is that you
cheated.
So, do not write up stuff that
-
you don't understand.
You should be able to write up
-
the stuff that you understand.
Understand why you're putting
-
down what you're putting down.
If it isn't obvious:
no collaboration whatsoever is
permitted on exams.
-
Exams are when we evaluate you.
And now we're not interested in
-
evaluating other people,
we're interested in evaluating
-
you.
So, no collaboration on exams.
-
We will have a take-home exam
for the second quiz.
-
You should look at the
schedule.
-
If there are problems with the
schedule of that,
-
we want to know early.
And we will give you more
-
details about the collaboration
in the lecture on Monday,
-
November 28th.
Now, generally,
-
the lectures here,
they're mandatory and you have
-
to know them,
but I know that some people say
-
gee, 9:30 is kind of early,
especially on a Monday or
-
whatever.
It can be kind of early to get
-
up.
However, on Monday,
-
November 28th,
you fail the exam if you do not
-
show up to lecture on time.
That one day you must show up.
-
Any questions about that?
That one day you must show up
-
here, even if you've been
watching them on the Web.
-
And generally,
if you think you have
-
transgressed,
the best is to come to us to
-
talk about it.
We can usually work something
-
out.
It's when we find somebody has
-
transgressed from a third-party
or from obvious analyses that we
-
do with homeworks,
that's when things get messy.
-
So, if you think,
for some reason or other,
-
oh, I may have done something
wrong, please come and talk to
-
us.
We actually were students once,
-
too, albeit many years ago.
Any questions?
-
So, this class has great
material.
-
Fabulous material.
And it's really fun,
-
but you do have to work hard.
Let's talk content.
-
This is the topic of the first
part of the course.
-
The first part of the course is
focused on analysis.
-
The second part of the course
is focused on design.
-
Before you can do design,
you have to master a bunch of
-
techniques for analyzing
algorithms.
-
And then you'll be in a
position to design algorithms
-
that you can analyze and that
which are efficient.
-
The analysis of algorithms is
the theoretical study --
-
-- of computer program
performance --
-
-- and resource usage.
With a particular focus on
-
performance.
We're studying how to make
-
things fast.
In particular,
-
computer programs.
We also will discover and talk
-
about other resources such as
communication,
-
such as memory,
whether RAM memory or disk
-
memory.
There are other resources that
-
we may care about,
but predominantly we focus on
-
performance.
Because this is a course about
-
performance, I like to put
things in perspective a little
-
bit by starting out and asking,
in programming,
-
what is more important than
performance?
-
If you're in an engineering
situation and writing code,
-
writing software,
what's more important than
-
performance?
Correctness.
-
Good.
OK.
-
What else?
Simplicity can be.
-
Very good.
Yeah.
-
Maintainability often much more
important than performance.
-
Cost.
And what type of cost are you
-
thinking?
No, I mean cost of what?
-
We're talking software here,
right?
-
What type of cost do you have
in mind?
-
There are some costs that are
involved when programming like
-
programmer time.
So, programmer time is another
thing that might be more
important than performance.
Stability.
-
Robustness of the software.
Does it break all the time?
-
What else?
-
Come on.
We've got a bunch of engineers
-
here.
A lot of things.
-
How about features?
Features can be more important.
-
Having a wider collection of
features than your competitors.
-
Functionality.
Modularity.
-
Is it designed in a way where
you can make changes in a local
-
part of the code and you don't
have to make changes across the
-
code in order to affect a simple
change in the functionality?
-
There is one big one which
definitely, especially in the
-
`90s, was like the big thing in
computers.
-
The big thing.
Well, security actually.
-
Good.
I don't even have that one
-
down.
Security is excellent.
-
That's actually been more the
case in the 2000s.
-
Security has been far more
important often than
-
performance.
Scalability has been important,
-
although scalability,
in some sense,
-
is performance related.
But, yes, scalability is good.
-
What was the big breakthrough
and why do people use Macintosh
-
rather than Windows,
those people who are of that
-
religion?
User-friendliness.
-
Wow.
If you look at the number of
-
cycles of computers that went
into user-friendliness in the
-
`90s, it grew from almost
nothing to where now the
vast part of the computation
goes into user-friendliness.
-
So, all those things are more
important than performance.
-
This is a course on
performance.
-
Then you can say OK,
well, why do we bother and why
-
study algorithms and performance
if it's at the bottom of the
-
heap?
Almost always people would
-
rather have these other things
than performance.
-
You go off and you ask
somebody, would you rather have
performance or more
user-friendliness?
-
It's almost always more
important than performance.
-
Why do we care then?
Yeah?
-
That wasn't user-friendly.
Sometimes performance is
-
correlated with
user-friendliness,
-
absolutely.
Nothing is more frustrating
-
than sitting there waiting,
right?
-
So, that's a good reason.
What are some other reasons
-
why?
Sometimes they have real-time
-
constraints so they don't
actually work unless they
-
perform adequately.
Yeah?
-
Hard to get,
well, we don't usually quantify
-
user-friendliness so I'm not
sure, but I understand what
-
you're saying.
He said we don't get
-
exponential performance
improvements in
-
user-friendliness.
We often don't get that in
-
performance either,
by the way.
-
[LAUGHTER] Sometimes we do,
but that's good.
-
There are several reasons that
I think are important.
-
One is that often performance
measures the line between the
-
feasible and the infeasible.
We have heard some of these
-
things.
For example,
-
when there are real-time
requirements,
-
if it's not fast enough it's
simply not functional.
-
Or, if it uses too much memory
it's simply not going to work
-
for you.
And, as a consequence,
-
what you find is algorithms are
on the cutting edge of
-
entrepreneurship.
If you're talking about just
-
re-implementing stuff that
people did ten years ago,
-
performance isn't that
important at some level.
-
But, if you're talking about
doing stuff that nobody has done
-
before, one of the reasons often
that they haven't done it is
-
because it's too time-consuming.
Things don't scale and so
-
forth.
So, that's one reason,
-
is the feasible versus
infeasible.
-
Another thing is that
algorithms give you a language
-
for talking about program
behavior, and that turns out to
be a language that has become
pervasive throughout computer
science: the theoretical
language is what gets adopted by
all the practitioners because
it's a clean way of thinking
-
about things.
A good way I think about
-
performance, and the reason it's
on the bottom of the heap,
-
is sort of like performance is
like money, it's like currency.
-
You say what good does a stack
of hundred dollar bills do for
-
you?
Would you rather have food or
-
water or shelter or whatever?
And you're willing to pay those
-
hundred dollar bills,
if you have hundred dollar
-
bills, for that commodity.
Even though water is far more
-
important to your living.
Well, similarly,
-
performance is what you use to
pay for user-friendliness.
-
It's what you pay for security.
And you hear people say,
-
for example,
that I want greater
-
functionality,
so people will program in Java,
-
even though it's much slower
than C, because they'll say it
-
costs me maybe a factor of three
or something in performance to
-
program in Java.
But Java is worth it because
-
it's got all these
object-oriented features and so
-
forth, exception mechanisms and
so on.
-
And so people are willing to
pay a factor of three in
-
performance.
So, that's why you want
-
performance because you can use
it to pay for these other things
-
that you want.
And that's why,
-
in some sense,
it's on the bottom of the heap,
-
because it's the universal
thing that you quantify.
-
Do you want to spend a factor
of two on this or spend a factor
-
of three on security,
et cetera?
-
And, in addition,
the lessons generalize to other
-
resource measures like
communication,
-
like memory and so forth.
And the last reason we study
-
algorithm performance is it's
tons of fun.
-
Speed is always fun,
right?
-
Why do people drive fast cars,
race horses,
-
whatever?
Rockets, et cetera,
-
why do we do that?
Because speed is fun.
-
Ski.
Who likes to ski?
-
I love to ski.
I like going fast on those
-
skis.
It's fun.
-
Hockey, fast sports,
right?
-
We all like the fast sports.
Not all of us,
-
I mean.
Some people say he's not
-
talking to me.
OK, let's move on.
-
That's sort of a little bit of
a notion as to why we study
-
this, is that it does,
in some sense,
-
form a common basis for all
these other things we care
-
about.
And so we want to understand
-
how can we generate money for
ourselves in computation?
-
We're going to start out with a
very simple problem.
-
It's one of the oldest problems
that has been studied in
-
algorithms, is the problem of
sorting.
-
We're going to actually study
this for several lectures
-
because sorting contains many
algorithmic techniques.
-
The sorting problem is the
following.
-
We have a sequence a_1,
a_2 up to a_n of numbers as
-
input.
And our output is a permutation
-
of those numbers.
-
A permutation is a
rearrangement of the numbers.
-
Every number appears exactly
once in the rearrangement a'_1,
a'_2 up to a'_n such that,
I sometimes use a dollar
sign to mean "such that," a'_1
is less than or equal to a'_2,
and so on up to a'_n.
Such that they are
monotonically increasing in
size.
-
Take a bunch of numbers,
put them in order.
-
Here's an algorithm to do it.
It's called insertion sort.
-
And we will write this
algorithm in what we call
-
pseudocode.
It's sort of a programming
-
language, except it's got
English in there often.
-
And it's just a shorthand for
writing for being precise.
-
So this sorts A from 1 to n.
And here is the code for it.
-
This is what we call
pseudocode.
-
And if you don't understand the
pseudocode then you should ask
-
questions about any of the
notations.
-
You will start to get used to
it as we go on.
-
One thing is that in the
pseudocode we use indentation,
-
where in most languages they
have some kind of begin-end
-
delimiters like curly braces or
something in Java or C,
-
for example.
We just use indentation.
-
The whole idea of the
pseudocode is to try to get the
-
algorithms as short as possible
while still understanding what
-
the individual steps are.
In practice,
-
there actually have been
languages that use indentation
-
as a means of showing the
nesting of things.
-
It's generally a bad idea,
because if things go over one
-
page to another,
for example,
-
you cannot tell what level of
nesting it is.
-
Whereas, with explicit braces
it's much easier to tell.
-
So, there are reasons why this
is a bad notation if you were
-
doing software engineering.
But it's a good one for us
-
because it just keeps things
short and makes fewer things to
-
write down.
So, this is insertion sort.
-
Let's try to figure out a
little bit what this does.
-
It basically takes an array A,
and at any point the thing to
understand is that
we're running the outer loop
from j equals 2 to n,
and the inner loop starts
at j minus 1 and goes down
until it reaches zero.
Basically, if we look at any
-
point in the algorithm,
we essentially are looking at
-
some element here j.
A of j, the jth element.
-
And what we do essentially is
we pull a value out here that we
-
call the key.
And at this point the important
-
thing to understand,
and we'll talk more about this
-
in recitation on Friday,
is that there is an invariant
-
that is being maintained by this
loop each time through.
-
And the invariant is that this
part of the array is sorted.
-
And the goal each time through
the loop is to increase,
-
is to add one to the length of
the things that are sorted.
-
And the way we do that is we
pull out the key and we just
-
copy values up like this.
And keep copying up until we
-
find the place where this key
goes, and then we insert it in
-
that place.
And that's why it's called
-
insertion sort.
We just sort of move the
-
things, copy the things up until
we find where it goes,
-
and then we put it into place.
And now we have it from A from
-
one to j is sorted,
and now we can work on j plus
-
one.
Let's give an example of that.
-
Imagine we are doing 8,
2, 4, 9, 3, 6.
-
We start out with j equals 2.
And we figure out that we want
-
to insert it there.
Now we have 2,
-
8, 4, 9, 3, 6.
Then we look at the four and
-
say oh, well,
that goes over here.
-
We get 2, 4,
8, 9, 3, 6 after the second
-
iteration of the outer loop.
Then we look at 9 and discover
-
immediately it just goes right
there.
-
Very little work to do on that
step.
-
So, we have exactly the same
output after that iteration.
-
Then we look at the 3 and
that's going to be inserted over
-
there.
2, 3, 4, 8, 9,
-
6.
And finally we look at the 6
-
and that goes in there.
2, 3, 4, 6, 8,
-
9.
And at that point we are done.
-
Question?
-
The array initially starts at
one, yes.
-
A[1...n], OK?
So, this is the insertion sort
-
algorithm.
And it's the first algorithm
-
that we're going to analyze.
And we're going to pull out
-
some tools that we have from our
math background to help to
-
analyze it.
First of all,
-
let's take a look at the issue
of running time.
-
The running time of this
algorithm depends on a
lot of things.
One thing it depends on is the
-
input itself.
For example,
-
if the input is already sorted
--
-
-- then insertion sort has very
little work to do.
-
Because every time through it's
going to be like this case.
-
It doesn't have to shuffle too
many guys over because they're
-
already in place.
Whereas, in some sense,
-
what's the worst case for
insertion sort?
-
If it is reverse sorted then
it's going to have to do a lot
-
of work because it's going to
have to shuffle everything over
-
on each step of the outer loop.
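One way to make this concrete is to count the shifts that insertion sort's inner loop performs on each kind of input; the shift-counting wrapper below is an illustrative sketch, not part of the lecture:

```python
def insertion_sort_shifts(a):
    """Run insertion sort on a copy of a and return how many
    element shifts the inner loop performs."""
    a = list(a)
    shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]      # shuffle one element over
            i -= 1
            shifts += 1
        a[i + 1] = key
    return shifts

print(insertion_sort_shifts(range(10)))         # already sorted: 0 shifts
print(insertion_sort_shifts(range(9, -1, -1)))  # reverse sorted: 45 = n(n-1)/2
```

An already-sorted input triggers no shifts at all, while a reverse-sorted input forces every element past every earlier one, n(n-1)/2 shifts in total.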
In addition to the actual input
-
it depends, of course,
on the input size.
-
Here, for example,
we did six elements.
-
It's going to take longer if
we, for example,
-
do six times ten to the ninth
elements.
-
If we were sorting a lot more
stuff, it's going to take us a
-
lot longer.
Typically, the way we handle
-
that is we are going to
parameterize things by the input
-
size.
We are going to talk about time
-
as a function of the size of
things that we are sorting so we
-
can look at what is the behavior
of that.
-
And the last thing I want to
say about running time is
-
generally we want upper bounds on
the running time.
-
We want to know that the time
is no more than a certain
-
amount.
And the reason is because that
-
represents a guarantee to the
user.
-
If I say it's not going to run,
for example,
-
if I tell you here's a program
and it won't run more than three
-
seconds, that gives you real
information about how you could
-
use it, for example,
in a real-time setting.
-
Whereas, if I said here's a
program and it goes at least
-
three seconds,
you don't know if it's going to
-
go for three years.
It doesn't give you that much
-
guarantee if you are a user of
it.
-
Generally we want upper bounds
because it represents a
-
guarantee to the user.
-
There are different kinds of
analyses that people do.
-
The one we're mostly going to
focus on is what's called
-
worst-case analysis.
And this is what we do usually
-
where we define T of n to be the
maximum time on any input of
-
size n.
So, it's the maximum over inputs,
-
the maximum it could possibly
cost us on an input of size n.
-
What that does is,
if you look at the fact that
-
sometimes the inputs are better
and sometimes they're worse,
-
we're looking at the worst case
of those because that's the way
-
we're going to be able to make a
guarantee.
-
It always does something rather
than just sometimes does
-
something.
So, we're looking at the
-
maximum.
Notice that if I didn't have
-
maximum then T(n) in some sense
is a relation,
-
not a function,
because the time on an input of
-
size n depends on which input of
size n.
-
I could have many different
times, but by putting the
-
maximum at it,
it turns that relation into a
-
function because there's only
one maximum time that it will
-
take.
Sometimes we will talk about
-
average case.
Sometimes we will do this.
-
Here T of n is then the
expected time over all inputs of
-
size n.
It's the expected time.
-
Now, if I talk about expected
time, what else do I need to say
-
here?
What does that mean,
-
expected time?
I'm sorry.
-
Raise your hand.
Expected inputs.
-
What does that mean,
expected inputs?
-
I need more math.
What do I mean by expected time
here, mathematically?
You have to take the time of
-
every input and then average
them, OK.
-
That's kind of what we mean by
expected time.
-
Good.
Not quite.
-
I mean, what you say is
completely correct,
-
except is not quite enough.
Yeah?
-
It's the time of every input
times the probability that it
-
will be that input.
It's a way of taking a weighted
-
average, exactly right.
How do I know what the
-
probability of every input is?
How do I know what the
-
probability a particular input
occurs is in a given situation?
-
I don't.
I have to make an assumption.
-
What's that assumption called?
What kind of assumption do I
-
have to make?
I need an assumption --
-
-- of the statistical
distribution of inputs.
-
Otherwise, expected time
doesn't mean anything because I
-
don't know what the probability
of something is.
-
In order to do probability,
you need some assumptions and
-
you've got to state those
assumptions clearly.
-
One of the most common
assumptions is that all inputs
-
are equally likely.
That's called the uniform
-
distribution.
Every input of size n is
-
equally likely,
that kind of thing.
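Under that uniform assumption, the expected time can be computed by brute force for small n: weight every one of the n! input orders equally and average. A sketch, using the number of key comparisons insertion sort makes as the "time":

```python
from itertools import permutations
from math import factorial

def comparisons(a):
    """Key comparisons made by insertion sort on input a."""
    a = list(a)
    count = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0:
            count += 1           # one comparison of a[i] vs key
            if a[i] <= key:
                break
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return count

# Expected time: average over all n! equally likely inputs.
n = 5
avg = sum(comparisons(p) for p in permutations(range(n))) / factorial(n)
print(avg)   # between the best case (n-1) and the worst case n(n-1)/2
```

This is exactly the weighted average mentioned above: each input's time multiplied by its probability, here 1/n! for every permutation.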
-
But there are other ways that
you could make that assumption,
-
and they may not all be true.
This is much more complicated,
-
as you can see.
Fortunately,
-
all of you have a strong
probability background.
-
And so we will not have any
trouble addressing these
-
probabilistic issues of dealing
with expectations and such.
-
If you don't,
time to go and say gee,
-
maybe I should take that
Probability class that is a
-
prerequisite for this class.
The last one I am going to
-
mention is best-case analysis.
And this I claim is bogus.
-
Bogus.
No good.
-
Why is best-case analysis
bogus?
-
Yeah?
The best-case probably doesn't
-
ever happen.
Actually, it's interesting
-
because for the sorting problem,
interestingly, the most common
things that get sorted are
things that are already sorted,
or at least almost sorted.
For example,
-
one of the most common things
that are sorted is check numbers
-
by banks.
They tend to come in,
-
in the same order that they are
written.
-
They're sorting things that are
almost always sorted.
-
I mean, it's good.
Why an upper bound,
not a lower bound?
Yeah, you want to make a
-
guarantee.
And so why is this not a
-
guarantee?
You're onto something there,
-
but we need a little more
precision here.
-
How can I cheat?
Yeah?
-
Yeah, you can cheat.
You cheat.
-
You take any slow algorithm
that you want and just check for
-
some particular input,
and if it's that input,
-
then you say immediately yeah,
OK, here is the answer.
-
And then it's got a good
best-case.
-
But I didn't tell you anything
about the vast majority of what
-
is going on.
So, you can cheat with a slow
-
algorithm that works fast on
some input.
-
It doesn't really do much for
you so we normally don't worry
-
about that.
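The cheat described above can be sketched in a few lines; this is a deliberately bogus, hypothetical example (the special-cased input is just the lecture's running example):

```python
def cheating_sort(a):
    """A slow algorithm with a great best case: it special-cases
    one particular input and answers immediately, and falls back
    to a Theta(n^2) selection sort for everything else."""
    if list(a) == [8, 2, 4, 9, 3, 6]:
        return [2, 3, 4, 6, 8, 9]       # constant-time "best case"
    a = list(a)
    for i in range(len(a)):             # slow path: selection sort
        m = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[m] = a[m], a[i]
    return a
```

Its best-case time is constant, yet that tells you nothing about its behavior on the vast majority of inputs, which is why best-case analysis is bogus.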
Let's see.
-
What is insertion sort's
worst-case time?
-
Now we get into some sort of
funny issues.
-
First of all,
it sort of depends on the
-
computer you're running on.
Whose computer,
-
right?
Is it a big supercomputer or is
-
it your wristwatch?
They have different
-
computational abilities.
And when we compare algorithms,
-
we compare them typically for
relative speed.
-
This is if you compared two
algorithms on the same machine.
-
You could argue,
well, it doesn't really matter
-
what the machine is because I
will just look at their relative
-
speed.
But, of course,
-
I may also be interested in
absolute speed.
-
Is one algorithm actually
better no matter what machine
-
it's run on?
-
And so this kind of gets sort
of confusing as to how I can
-
talk about the worst-case time
of an algorithm of a piece of
-
software when I am not talking
about the hardware because,
-
clearly, if I had run on a
faster machine,
-
my algorithms are going to go
faster.
-
So, this is where you get the
big idea of algorithms.
-
Which is why algorithms is such
a huge field,
-
why it spawns companies like
Google, like Akamai,
-
like Amazon.
Why algorithmic analysis,
-
throughout the history of
computing, has been such a huge
-
success, is our ability to
master and to be able to take
-
what is apparently a really
messy, complicated situation and
-
reduce it to being able to do
some mathematics.
-
And that idea is called
asymptotic analysis.
-
And the basic idea of
asymptotic analysis is to ignore
-
machine-dependent constants --
-
-- and, instead of the actual
running time,
-
look at the growth of the
running time.
-
So, we don't look at the actual
running time.
-
We look at the growth.
Let's see what we mean by that.
-
This is a huge idea.
It's not a hard idea,
-
otherwise I wouldn't be able to
teach it in the first lecture,
-
but it's a huge idea.
We are going to spend a couple
-
of lectures understanding the
implications of that and will
-
basically be doing it throughout
the term.
-
And if you go on to be
practicing engineers,
-
you will be doing it all the
time.
-
In order to do that,
we adopt some notations that
-
are going to help us.
In particular,
-
we will adopt asymptotic
notation.
-
Most of you have seen some kind
of asymptotic notation.
-
Maybe a few of you haven't,
but mostly you should have seen
-
a little bit.
The one we're going to be using
-
in this class predominantly is
theta notation.
-
And theta notation is pretty
easy notation to master because
-
all you do is,
from a formula,
-
just drop low order terms and
ignore leading constants.
-
For example,
if I have a formula like 3n^3 +
-
90n^2 - 5n + 6046,
I say, well,
-
what low-order terms do I drop?
Well, n^3 is a bigger term
-
than n^2.
I am going to drop all these
-
terms and ignore the leading
constant, so I say that's
-
Theta(n^3).
That's pretty easy.
-
So, that's theta notation.
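The drop-the-low-order-terms rule can be checked numerically. Here is a quick sketch (my own illustration, not from the lecture) showing that the full formula and its leading term 3n^3 agree more and more closely as n grows, which is why theta notation can safely throw the rest away:

```python
# Compare the full formula 3n^3 + 90n^2 - 5n + 6046 against its
# leading term 3n^3: the ratio approaches 1 as n grows, so the
# low-order terms and the constant stop mattering asymptotically.

def full_formula(n):
    return 3 * n**3 + 90 * n**2 - 5 * n + 6046

def leading_term(n):
    return 3 * n**3

for n in [10, 100, 10_000]:
    print(n, full_formula(n) / leading_term(n))
```

At n = 10 the low-order terms still dominate the picture, but by n = 10,000 the ratio is within a fraction of a percent of 1.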
Now, this is an engineering way
-
of manipulating theta notation.
There is actually a
-
mathematical definition for
this, which we are going to talk
-
about next time,
which is a definition in terms
-
of sets of functions.
And, you are going to be
-
responsible, this is both a math
and a computer science
-
engineering class.
Throughout the course you are
-
going to be responsible both for
mathematical rigor as if it were
-
a math course and engineering
commonsense because it's an
-
engineering course.
We are going to be doing both.
-
This is the engineering way of
understanding what you do,
-
so you're responsible for being
able to do these manipulations.
-
You're also going to be
responsible for understanding
-
the mathematical definition of
theta notation and of its related
-
O notation and omega notation.
If I take a look as n
-
approached infinity,
a Theta(n^2) algorithm always
-
beats, eventually,
a Theta(n^3) algorithm.
-
As n gets bigger,
it doesn't matter what these
-
other terms were if I were
describing the absolute precise
-
behavior in terms of a formula.
If I had a Theta(n^2)
-
algorithm, it would always be
faster for sufficiently large n
-
than a Theta(n^3) algorithm.
It wouldn't matter what those
-
low-order terms were.
It wouldn't matter what the
-
leading constant was.
This one will always be faster.
-
Even if you ran the Theta(n^2)
algorithm on a slow computer and
-
the Theta(n^3) algorithm on a
fast computer.
-
The great thing about
asymptotic notation is it
-
satisfies our issue of being
able to compare both relative
-
and absolute speed,
because we are able to do this
-
no matter what the computer
platform.
-
On different platforms we may
get different constants here,
-
machine-dependent constants for
the actual running time,
-
but if I look at the growth as
the size of the input gets
-
larger, the asymptotics
generally won't change.
-
For example,
I will just draw that as a
-
picture.
If I have n on this axis and
-
T(n) on this axis.
This may be,
-
for example,
a Theta(n^3) algorithm and this
-
may be a Theta(n^2) algorithm.
There is always going to be
-
some point n_0 where for
everything larger the Theta(n^2)
-
algorithm is going to be cheaper
than the Theta(n^3) algorithm
-
no matter how much advantage
you give it at the beginning in
-
terms of the speed of the
computer you are running on.
-
Now, from an engineering point
of view, there are some issues
-
we have to deal with because
sometimes it could be that that
-
n_0 is so large that the
computers aren't big enough to
-
run the problem.
That's why we,
-
nevertheless,
are interested in some of the
-
slower algorithms,
because some of the slower
-
algorithms, even though they
are asymptotically slower.
-
They may still be faster on
reasonable sizes of things.
-
And so we have to both balance
our mathematical understanding
-
with our engineering commonsense
in order to do good programming.
-
So, just having done analysis
of algorithms doesn't
-
automatically make you a good
programmer.
-
You also need to learn how to
program and use these tools in
-
practice to understand when they
are relevant and when they are
-
not relevant.
There is a saying.
-
If you want to be a good
programmer, you just program every
-
day for two years,
you will be an excellent
-
programmer.
If you want to be a world-class
-
programmer, you can program
every day for ten years,
-
or you can program every day
for two years and take an
-
algorithms class.
Let's get back to what we were
-
doing, which is analyzing
insertion sort.
-
We are going to look at the
worst-case.
-
Which, as we mentioned before,
is when the input is reverse
-
sorted.
The biggest element comes first
-
and the smallest last because
now every time you do the
-
insertion you've got to shuffle
everything over.
-
You can write down the running
time by looking at the nesting
-
of loops.
What we do is we sum up.
-
What we assume is that every
operation, every elemental
-
operation is going to take some
constant amount of time.
-
But we don't have to worry
about what that constant is
-
because we're going to be doing
asymptotic analysis.
-
As I say, the beauty of the
method is that it causes all
-
these things that are real
distinctions to sort of vanish.
-
We sort of look at them from
30,000 feet rather than from
-
three millimeters or something.
Each of these operations is
-
going to sort of be a basic
operation.
-
One way to think about this,
in terms of counting
-
operations, is counting memory
references.
-
How many times do you actually
access some variable?
-
That's another way of sort of
thinking about this model.
-
When we do that,
well, we're going to go through
-
this loop, j is going from 2 to
n, and then we're going to add
-
up the work that we do within
the loop.
-
We can sort of write that in
math as summation of j equals 2
-
to n.
And then what is the work that
-
is going on in this loop?
Well, the work that is going on
-
in this loop varies,
but in the worst case how many
-
operations are going on here for
each value of j?
-
For a given value of j,
how much work goes on in this
-
loop?
Can somebody tell me?
-
Asymptotically.
It's j times some constant,
-
so it's theta j.
So, there is theta j work going
-
on here because this loop starts
out with i being j minus 1,
-
and then it's doing just a
constant amount of stuff for
-
each step of the value of i,
and i is running from j minus
-
one down to zero.
So, we can say that is theta j
-
work that is going on.
Do people follow that?
-
OK.
And now we have a formula we
-
can evaluate.
What is the evaluation?
-
If I want to simplify this
formula, what is that equal to?
-
Sorry. In the back there.
-
Yeah. OK.
That's just Theta(n^2),
-
good.
Because what you're saying is
-
the sum of consecutive numbers,
you mean what?
-
What's the mathematical term we
have for that so we can
-
communicate?
You've got to know these things
-
so you can communicate.
It's called what type of
-
sequence?
It's actually a series,
-
but that's OK.
What type of series is this
-
called?
Arithmetic series,
-
good.
Wow, we've got some sharp
-
people who can communicate.
This is an arithmetic series.
-
You're basically summing 1 + 2
+ 3 + 4, some constants in
-
there, but basically it's 1 + 2
+ 3 + 4 + 5 + 6 up to n.
-
That's Theta(n^2).
If you don't know this math,
-
there is a chapter in the book,
or you could have taken the
-
prerequisite.
Arithmetic series.
-
People have this vague
recollection.
-
Oh, yeah.
Good.
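The arithmetic series shows up directly if you instrument insertion sort. The lecture's pseudocode is on the board; this Python version is my own sketch, with a counter added so you can see the 1 + 2 + ... + (n-1) = n(n-1)/2 shifts on reverse-sorted input:

```python
# Insertion sort instrumented to count element shifts. On reverse-sorted
# input, inserting the j-th element shifts all j-1 earlier elements,
# so the total is the arithmetic series 1 + 2 + ... + (n-1) = n(n-1)/2,
# which is where the Theta(n^2) worst case comes from.

def insertion_sort_count_shifts(a):
    shifts = 0
    for j in range(1, len(a)):        # j = 2..n in the board's 1-based pseudocode
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:  # shift larger elements right
            a[i + 1] = a[i]
            i -= 1
            shifts += 1
        a[i + 1] = key
    return shifts

n = 100
reverse_sorted = list(range(n, 0, -1))
print(insertion_sort_count_shifts(reverse_sorted))  # n(n-1)/2 = 4950
```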
-
Now, you have to learn these
manipulations.
-
We will talk about a bit next
time, but you have to learn your
-
theta manipulations for what
works with theta.
-
And you have to be very careful
because theta is a weak
-
notation.
A strong notation is something
-
like Leibniz notation from
calculus where the chain rule is
-
just canceling two things.
It's just fabulous that you can
-
cancel in the chain rule.
And Leibniz notation just
-
expresses that so directly you
can manipulate.
-
Theta notation is not like
that.
-
If you think it is like that
you are in trouble.
-
You really have to think of
what is going on under the theta
-
notation.
And it is more of a descriptive
-
notation than it is a
manipulative notation.
-
There are manipulations you can
do with it, but unless you
-
understand what is really going
on under the theta notation you
-
will find yourself in trouble.
And next time we will talk a
-
little bit more about theta
notation.
-
Is insertion sort fast?
-
Well, it turns out for small n
it is moderately fast.
-
But it is not at all for large
n.
-
So, I am going to give you an
algorithm that is faster.
-
It's called merge sort.
I wonder if I should leave
-
insertion sort up.
Why not.
-
I am going to write on this
later, so if you are taking
-
notes, leave some space on the
left.
-
Here is merge sort of an array
A from 1 up to n.
-
And it is basically three
steps.
-
If n equals 1 we are done.
Sorting one element,
-
it is already sorted.
All right.
-
Recursive algorithm.
Otherwise, what we do is we
-
recursively sort A from 1 up to
the ceiling of n over 2.
-
And the array A of the ceiling
of n over 2 plus one up to n.
-
So, we sort two halves of the
input.
-
And then, three,
we take those two lists that we
-
have done and we merge them.
-
And, to do that,
we use a merge subroutine which
-
I will show you.
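The three steps on the board can be written out as runnable code. This is my own Python sketch of the usual top-down merge sort, not the lecture's notation, with the split at the ceiling of n over 2 as in the pseudocode:

```python
import math

# Merge sort in three steps: (1) one element is already sorted,
# (2) recursively sort the two halves, (3) merge the sorted halves.

def merge_sort(a):
    n = len(a)
    if n <= 1:                      # step 1: base case, already sorted
        return a
    mid = math.ceil(n / 2)          # split at ceiling(n/2)
    left = merge_sort(a[:mid])      # step 2: sort A[1..ceil(n/2)]
    right = merge_sort(a[mid:])     #         and A[ceil(n/2)+1..n]
    return merge(left, right)       # step 3: merge the two sorted lists

def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:     # smaller head wins; advance its pointer
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])            # one list is exhausted; copy the rest
    out.extend(right[j:])
    return out

print(merge_sort([20, 13, 7, 2, 12, 11, 9, 1]))
```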
-
The key subroutine here is
merge, and it works like this.
-
I have two lists.
Let's say one of them is 20.
-
I am doing this in reverse
order.
-
I have sorted this like this.
And then I sort another one.
-
I don't know why I do it this
order, but anyway.
-
Here is my other list.
I have my two lists that I have
-
sorted.
So, this is A[1] to A[⌈n/2⌉]
-
and A[⌈n/2⌉+1] to A[n] for the
way it will be called in this
-
program.
And now to merge these two,
-
what I want to do is produce a
sorted list out of both of them.
-
What I do is first observe
where is the smallest element of
-
any two lists that are already
sorted?
-
It's in one of two places,
the head of the first list or
-
the head of the second list.
I look at those two elements
-
and say which one is smaller?
This one is smaller.
-
Then what I do is output into
my output array the smaller of
-
the two.
And I cross it off.
-
And now where is the next
smallest element?
-
And the answer is it's going to
be the head of one of these two
-
lists.
Then I cross out this guy and
-
put him here and circle this
one.
-
Now I look at these two guys.
This one is smaller so I output
-
that and circle that one.
Now I look at these two guys,
-
output 9.
So, every step here is some
-
fixed number of operations that
is independent of the size of
-
the arrays at each step.
Each individual step is just me
-
looking at two elements and
picking out the smallest and
-
advancing some pointers into the
array so that I know where the
-
current head of that list is.
And so, therefore,
-
the time is order n on n total
elements.
-
The time to actually go through
this and merge two lists is
-
order n.
We sometimes call this linear
-
time because it's not quadratic
or whatever.
-
It is proportional to n,
proportional to the input size.
-
It's linear time.
I go through and just do this
-
simple operation,
just working up these lists,
-
and in the end I have done
essentially n operations,
-
order n operations each of
which cost constant time.
-
That's a total of order n time.
Everybody with me?
-
OK.
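The claim that merging is linear can be checked by counting the constant-work steps. This sketch (mine, not from the lecture) counts the head-to-head comparisons: for n total elements there are n output steps and at most n - 1 comparisons, so order n work in total:

```python
# Merge with a comparison counter. Each output step is constant work:
# compare the two heads, output the smaller, advance one pointer.
# For n total elements there are at most n - 1 comparisons.

def merge_counting(left, right):
    out, i, j, comparisons = [], 0, 0, 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])            # remainder copied with no comparisons
    out.extend(right[j:])
    return out, comparisons

merged, comps = merge_counting([1, 9, 12, 20], [2, 7, 11, 13])
print(merged, comps)
```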
So, this is a recursive
-
program.
We can actually now write what
-
is called a recurrence for this
program.
-
The way we do that is say let's
let the time to sort n elements
-
to be T(n).
Then how long does it take to
-
do step one?
-
That's just constant time.
We just check to see if n is 1,
-
and if it is we return.
That's independent of the size
-
of anything that we are doing.
It just takes a certain number
-
of machine instructions on
whatever machine and we say it
-
is constant time.
We call that theta one.
-
This is actually a little bit
of an abuse if you get into it.
-
And the reason is because
typically in order to say it you
-
need to say what it is growing
with.
-
Nevertheless,
we use this as an abuse of the
-
notation just to mean it is a
constant.
-
So, that's an abuse just so
people know.
-
But it simplifies things if I
can just write theta one.
-
And it basically means the same
thing.
-
Now we recursively sort these
two things.
-
How can I describe that?
The time to do this,
-
I can describe recursively as T
of ceiling of n over 2 plus T of
-
n minus ceiling of n over 2.
That is actually kind of messy,
-
so what we will do is just be
sloppy and write 2T(n/2).
-
So, this is just us being
sloppy.
-
And we will see on Friday in
recitation that it is OK to be
-
sloppy.
That's the great thing about
-
algorithms.
As long as you are rigorous and
-
precise, you can be as sloppy as
you want.
-
[LAUGHTER] This is sloppy
because I didn't worry about
-
what was going on,
because it turns out it doesn't
-
make any difference.
And we are going to actually
-
see that that is the case.
And, finally,
-
I have to merge the two sorted
lists which have a total of n
-
elements.
And we just analyze that using
-
the merge subroutine.
And that takes us to theta n
-
time.
-
That allows us now to write a
recurrence for the performance
-
of merge sort.
-
Which is to say that T of n is
equal to theta 1 if n equals 1
-
and 2T of n over 2 plus theta of
n if n is bigger than 1.
-
Because either I am doing step
one or I am doing all steps one,
-
two and three.
Here I am doing step one and I
-
return and I am done.
Or else I am doing step one,
-
I don't return,
and then I also do steps two
-
and three.
So, I add those together.
-
I could say theta n plus theta
1, but theta n plus theta 1 is
-
just theta n because theta 1 is
a lower order term than theta n
-
and I can throw it away.
It is either theta 1 or it is
-
2T of n over 2 plus theta n.
Now, typically we won't be
-
writing this.
Usually we omit this.
-
If it makes no difference to
the solution of the recurrence,
-
we will usually omit constant
base cases.
-
In algorithms,
it's not true generally in
-
mathematics, but in algorithms
if you are running something on
-
a constant size input it takes
constant time always.
-
So, we don't worry about what
this value is.
-
And it turns out it has no real
impact on the asymptotic
-
solution of the recurrence.
How do we solve a recurrence
-
like this?
I now have T of n expressed in
-
terms of T of n over 2.
That's in the book and it is
-
also in Lecture 2.
We are going to do Lecture 2 to
-
solve that, but in the meantime
what I am going to do is give
-
you a visual way of
understanding what this costs,
-
which is one of the techniques
we will elaborate on next time.
-
It is called a recursion tree
technique.
-
And I will use it for the
actual recurrence that is almost
-
the same 2T(n/2),
but I am going to actually
-
explicitly, because I want you
to see where it occurs,
-
plus some constant times n
where c is a constant greater
-
than zero.
So, we are going to look at
-
this recurrence with a base case
of order one.
-
I am just making the constant
in here, the upper bound on the
-
constant be explicit rather than
implicit.
-
And the way you do a recursion
tree is the following.
-
You start out by writing down
the left-hand side of the
-
recurrence.
And then what you do is you say
-
well, that is equal to,
and now let's write it as a
-
tree.
I do c of n work plus now I am
-
going to have to do work on each
of my two children.
-
T of n over 2 and T of n over
2.
-
If I sum up what is in here,
I get this because that is what
-
the recurrence says,
T(n)=2T(n/2)+cn.
-
I have 2T(n/2)+cn.
Then I do it again.
-
I have cn here.
I now have here cn/2.
-
And here is cn/2.
And each of these now has a
-
T(n/4).
-
And these each have a T(n/4).
And this has a T(n/4).
-
And I keep doing that,
the dangerous dot,
-
dot, dots.
And, if I keep doing that,
-
I end up with it looking like
this.
-
And I keep going down until I
get to a leaf.
-
And a leaf, I have essentially
a T(1).
-
That is T(1).
And so the first question I ask
-
here is, what is the height of
this tree?
-
Yeah.
It's log n.
-
It's actually very close to
exactly log n because I am
-
starting out at the top with n
and then I go to n/2 and n/4 and
-
all the way down until I get to
1.
-
The number of halvings of n
until I get to 1 is log n so the
-
height here is log n.
It's OK if it is constant times
-
log n.
It doesn't matter.
-
How many leaves are in this
tree, by the way?
-
How many leaves does this tree
have?
-
Yeah.
The number of leaves,
-
once again, is actually pretty
close.
-
It's actually n.
If you took it all the way
-
down.
Let's make some simplifying
-
assumption.
n is a perfect power of 2,
-
so it is an integer power of 2.
Then this is exactly log n to
-
get down to T(1).
And then there are exactly n
-
leaves, because the number of
leaves here, the number of nodes
-
at this level is 1,
2, 4, 8.
-
And if I go down height h,
I have 2 to the h leaves,
-
2 to the log n,
that is just n.
-
We are doing math here,
right?
-
Now let's figure out how much
work, if I look at adding up
-
everything in this tree I am
going to get T(n),
-
so let's add that up.
Well, let's add it up level by
-
level.
How much do we have in the
-
first level?
Just cn.
-
If I add up the second level,
how much do I have?
-
cn.
How about if I add up the third
-
level?
cn.
-
How about if I add up all the
leaves?
-
Theta n.
It is not necessarily cn
-
because the boundary case may
have a different constant.
-
It is actually theta n,
but cn all the way here.
-
If I add up the total amount,
that is equal to cn times log
-
n, because that's the height,
that is how many cn's I have
-
here, plus theta n.
And this is a higher order term
-
than this, so this goes away,
get rid of the constants,
-
that is equal to theta(n lg n).
And theta(n lg n) is
-
asymptotically faster than
theta(n^2).
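The recursion-tree answer can be verified exactly. Under the lecture's simplifying assumption that n is a power of 2, the recurrence T(n) = 2T(n/2) + cn with T(1) = t1 solves to exactly c·n·lg n + t1·n: the c·n·lg n comes from the lg n levels each summing to cn, and the t1·n from the n leaves. A quick check with arbitrary constants of my own choosing:

```python
# Evaluate the recurrence T(n) = 2T(n/2) + c*n, T(1) = t1, directly
# for n a power of 2, and compare against the closed form
# c*n*lg(n) + t1*n predicted by the recursion tree.

def T(n, c=3, t1=5):
    if n == 1:
        return t1
    return 2 * T(n // 2, c, t1) + c * n

for k in range(1, 11):
    n = 2 ** k
    assert T(n) == 3 * n * k + 5 * n   # c*n*lg(n) + t1*n, with lg(n) = k
print("recurrence matches c*n*lg(n) + t1*n for n = 2 .. 1024")
```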
-
So, merge sort,
on a large enough input size,
-
is going to beat insertion
sort.
-
Merge sort is going to be a
faster algorithm.
-
Sorry, you guys,
I didn't realize you couldn't
-
see over there.
You should speak up if you
-
cannot see.
So, this is a faster algorithm
-
because theta(n lg n) grows more
slowly than theta(n^2).
-
And merge sort asymptotically
beats insertion sort.
-
Even if you ran insertion sort
on a supercomputer,
-
somebody running on a PC with
merge sort for sufficient large
-
input will clobber them because
actually n^2 is way bigger than
-
n log n once you get the n's to
be large.
-
And, in practice,
merge sort tends to win here
-
for n bigger than,
say, 30 or so.
-
If you have a very small input
like 30 elements,
-
insertion sort is a perfectly
decent sort to use.
-
But merge sort is going to be a
lot faster even for something
-
that is only a few dozen
elements.
-
It is going to actually be a
faster algorithm.
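The supercomputer-versus-PC point can be illustrated with made-up constants. The per-step costs below are hypothetical, chosen to give the Theta(n^2) algorithm a 25-to-1 head start; even so, the n·lg n growth wins once n is large enough:

```python
import math

# Hypothetical machine-dependent constants (not from the lecture):
# insertion sort on a fast machine at 2 units per step versus
# merge sort on a slow machine at 50 units per step. Growth rate
# still decides the winner for large n.

def insertion_cost(n):
    return 2 * n * n                 # Theta(n^2), small constant

def merge_cost(n):
    return 50 * n * math.log2(n)     # Theta(n lg n), large constant

crossover = next(n for n in range(2, 10**6)
                 if merge_cost(n) < insertion_cost(n))
print("merge sort is cheaper for all n >=", crossover)
```

For small n the quadratic algorithm with the better constant wins, which is exactly why insertion sort is a perfectly decent choice for a few dozen elements.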
-
That's sort of the lessons,
OK?
-
Remember that to get your
recitation assignments and
-
attend recitation on Friday.
Because we are going to be
-
going through a bunch of the
things that I have left on the
-
table here.
And see you next Monday.