-
We're going to get started.
Handouts are by the door if
-
anybody didn't pick one up.
My name is Charles Leiserson.
-
I will be lecturing this course
this term, Introduction to
-
Algorithms, with Erik Demaine.
In addition,
-
this is an SMA course,
a Singapore MIT Alliance course
-
which will be run in Singapore
by David Hsu.
-
And so all the lectures will be
videotaped and made available on
-
the Web for the Singapore
students, as well as for MIT
-
students who choose to watch
them on the Web.
-
If you have an issue of not
wanting to be on the videotape,
-
you should sit in the back row.
OK?
-
Otherwise, you will be on it.
There is a video recording
-
policy, but it seems like they
ran out.
-
If anybody wants to see it,
please pass it around once
you're done reading it,
or you can come up.
I did secure one copy.
-
Before we get into the content
of the course,
-
let's briefly go over the
course information because there
-
are some administrative things
that we sort of have to do.
-
As you can see,
this term we have a big staff.
-
Take a look at the handout
here.
-
Including, this term, six TAs,
which is two more TAs than we
-
normally get for this course.
That means recitations will be
-
particularly small.
There is a World Wide Web page,
-
and you should bookmark that
and go there regularly because
-
that is where everything will be
distributed.
-
Email.
You should not be emailing
directly, even though we give
you our email addresses,
to the individual members of
the staff.
-
You should email us generally.
And the reason is you will get
-
much faster response.
And also, for any
-
communications,
generally we like to monitor
-
what the communications are so
it's helpful to have emails
-
coming to everybody on the
course staff.
-
As I mentioned,
we will be doing distance
-
learning this term.
And so you can watch lectures
-
online if you choose to do that.
I would recommend,
-
for people who have the
opportunity to watch,
-
to come live.
It's better live.
-
You get to interact.
There's an intangible that
-
comes with having it live.
In fact, in addition to the
-
videos, I meet weekly with the
Singapore students so that they
-
have a live session as well.
Prerequisites.
-
The prerequisites for this
course are 6.042,
-
which is Math for Computer
Science, and 6.001.
-
You basically need discrete
mathematics and probability,
-
as well as programming
experience to take this course
-
successfully.
People who do not have that
background should not be in the
class.
-
We will be checking
prerequisites.
-
If you have any questions,
please come to talk to us after
-
class.
Let's see.
-
Lectures are here.
For SMA students,
-
they have the videotapes and
they will also have a weekly
-
meeting.
Students must attend a one-hour
-
recitation session each week.
There will be new material
-
presented in the recitation.
Unlike the lectures,
-
they will not be online.
Unlike the lectures,
-
there will not be lecture notes
distributed for the recitations
-
in general.
And, yet, there will be
-
material there that is directly
on the exams.
-
And so every term we say oh,
when did you cover that?
-
That was in recitation.
You missed that one.
-
So, recitations are mandatory.
And, in particular,
-
also let me just mention your
recitation instructor is the one
-
who assigns your final grade.
So we have a grade meeting to
keep everybody normalized,
but your recitation instructor
has the final say on your grade.
Handouts.
-
Handouts are available on the
course Web page.
-
We will not generally,
except for this one,
-
first handout,
be bringing handouts to class.
-
Textbook is this book,
Introduction to Algorithms.
-
MIT students can get it any of
the local bookstores,
-
including the MIT Coop.
There is also a new online
-
service that provides textbooks.
You can also get a discount if
-
you buy it at the MIT Press
Bookstore.
-
There is a coupon in the MIT
Student Telephone Directory for
-
a discount on MIT Press books.
And you can use that to
-
purchase this book at a
discount.
-
Course website.
This is the course website.
-
It links to the Stellar
website, which is where,
-
actually, everything will be
kept.
-
And SMA students have their own
website.
-
Some students find this course
particularly challenging, so we
-
will have extra help.
We will post weekly office
-
hours on the course website for
the TAs.
-
And then as an experiment this
term, we are going to offer
-
homework labs for this class.
What a homework lab is,
-
is it's a place and a time you
can go where other people in the
-
course will go to do homework.
And there will be typically two
-
TAs who staff the lab.
And so, as you're working on
-
your homework,
you can get help from the TAs
-
if you need it.
We're going to schedule those,
and the course calendar will
list where and when they will be
held, but usually Sundays 2:00
to 4:00 PM, or else it will be
some evening.
I think the first one is an
-
evening, right?
Near to when the homework is
-
due.
Your best bet is to try to do
the homework in advance of the
homework lab.
But then, if you want extra
help, you can talk over your
solutions with other people
because, as we will discuss,
problem sets may be solved in
collaboration with other people
in the class.
In addition,
-
there are several peer
assistance programs.
-
Also the Office of Minority
Education has an assistance
-
program, and those usually get
booked up pretty quickly.
-
If you're interested in those,
it's a good idea to make an
appointment to get there and get
help soon.
-
The homework labs,
I hope a lot of people will try
-
that out.
We've never done this.
-
I don't know of any other
course.
-
Do other people know of courses
at MIT that have done this?
-
6.011 did it,
OK.
-
Good.
And was it successful in that
-
class?
It never went,
-
OK.
Good.
-
[LAUGHTER] We will see.
If it's not paying off then we
-
will just return to ordinary
office hours for those TAs,
-
but I think for some students
that is a good opportunity.
-
If you wish to be registered in
this course, you must sign up on
-
the course Web page.
So, that is requirement one.
-
It must be done today.
You will find it difficult to
-
pass the course if you are not
in the class.
-
And you should notify your TA
if you decide to drop so that we
-
can get you off and stop the
mailings, stop the spam.
-
And you should register today
before 7:00 PM.
-
And then we're going to email
your recitation assignment to
-
you before Noon tomorrow.
And if you don't receive this
-
information by Thursday Noon,
please send us an email to the
-
course staff generally,
not to me individually,
-
saying that you didn't receive
your recitation assignment.
-
And so if you haven't received
it by Thursday noon, you want to
let us know.
-
I think generally they are
going to send them out tonight
-
or at least by tomorrow morning.
Yeah.
-
OK.
SMA students don't have to
-
worry about this.
Problem sets.
-
We have nine problem sets that
we project will be assigned
-
during the semester.
A couple things about problem
-
sets.
Late homeworks won't generally
be accepted; if you have
extenuating circumstances, you
should make prior arrangements
with your recitation instructor.
-
In fact, almost all of the
administrative stuff,
-
you shouldn't come to me to ask
and say can I hand in something
-
late?
You should be talking to your
-
recitation instructor.
You can read the other things
-
about the form,
but let me just mention that
-
there are exercises that should
be solved but not handed in as
-
well to give you drill on the
material.
-
I highly recommend doing
the exercises.
-
They both test your
understanding of the material,
-
and exercises have this way of
finding themselves on quizzes.
-
You're often asked to describe
algorithms.
-
And here is a little outline of
what you can use to describe an
-
algorithm.
The grading policy is something
that I somehow always cover.
And always every term there are
-
at least a couple of students
who pretend like I never showed
-
them this.
If you skip problems it has a
-
nonlinear effect on your grade.
Nonlinear, OK?
-
If you don't skip any problems,
no effect on your grade.
-
If you skip one problem,
a hundredth of a letter grade,
-
we can handle that.
But two problems it's a tenth.
-
And, as you see,
by the time you have skipped
like five problems (this is not
problem sets, by the way,
this is problems, OK?),
you're down a third of a letter
grade.
-
And if you don't do nine or
more, so that's typically about
-
three to four problem sets,
you don't pass the class.
-
I always have some students
coming at the end of the year
-
saying oh, I didn't do any of my
problems.
-
Can you just pass me because I
did OK on the exams?
-
Answer no, a very simple answer
because we've said it upfront.
-
So, the problem sets are an
integral part of the course.
-
Collaboration policy.
This is extremely important so
-
everybody pay attention.
If you are asleep now wake up.
-
Like that's going to wake
anybody up, right?
-
[LAUGHTER] The goal of
homework.
-
Professor Demaine and my
philosophy is that the goal of
-
homework is to help you learn
the material.
-
And one way of helping to learn
is not to just be stuck and
-
unable to solve something
because then you're in no better
-
shape when the exam rolls
around, which is where we're
-
actually evaluating you.
So, you're encouraged to
-
collaborate.
But there are some commonsense
-
things about collaboration.
If you go and you collaborate
-
to the extent that all you're
doing is getting the information
-
from somebody else,
you're not learning the
-
material and you're not going to
do well on the exams.
-
In our experience,
students who collaborate
-
generally do better than
students who work alone.
-
But you owe it to yourself,
if you're going to work in a
-
study group, to be prepared for
your study group meeting.
-
And specifically you should
spend a half an hour to 45
-
minutes on each problem before
you go to group so you're up to
-
speed and you've tried out your
ideas.
-
And you may have solutions to
some, you may be stuck on some
-
other ones, but at least you
applied yourself to it.
-
After 30 to 45 minutes,
if you cannot get the problem,
-
just sitting there and banging
your head against it makes no
-
sense.
It's not a productive use of
-
your time.
And I know most of you have
-
issues with having time on your
hands, right?
-
Like it's not there.
So, don't go banging your head
-
against problems that are too
hard or where you don't
-
understand what's going on or
whatever.
-
That's when the study group can
help out.
-
And, as I mentioned,
we'll have homework labs which
-
will give you an opportunity to
go and do that and coordinate
-
with other students rather than
necessarily having to form your
-
own group.
And the TAs will be there.
-
If your group is unable to
solve the problem then talk to
-
other groups or ask your
recitation instructor.
-
And, that's how you go about
solving them.
-
Writing up the problem sets,
however, is your individual
-
responsibility and should be
done alone.
-
You don't write up your problem
solutions with other students,
-
you write them up on your own.
And, because this is an
academic place where we
understand that the sourcing of
academic information is very
important, if you collaborated
on solutions you should write a
list of the collaborators on
your problem sets.
Say I worked with so-and-so on
this solution.
It does not affect your grade.
-
It's just a question of being
scholarly.
-
It is a violation of this
policy to submit a problem
-
solution that you cannot orally
explain to a member of the
-
course staff.
You say oh, well,
-
my write-up is similar to that
other person's.
-
I didn't copy them.
We may ask you to orally
-
explain your solution.
If you are unable to,
then according to this policy
the presumption is that you
cheated.
So, do not write up stuff that
-
you don't understand.
You should be able to write up
-
the stuff that you understand.
Understand why you're putting
-
down what you're putting down.
If it isn't obvious:
no collaboration whatsoever is
permitted on exams.
-
Exams are when we evaluate you.
And now we're not interested in
-
evaluating other people,
we're interested in evaluating
-
you.
So, no collaboration on exams.
-
We will have a take-home exam
for the second quiz.
-
You should look at the
schedule.
-
If there are problems with the
schedule of that,
-
we want to know early.
And we will give you more
-
details about the collaboration
in the lecture on Monday,
-
November 28th.
Now, generally,
-
the lectures here,
they're mandatory and you have
-
to know them,
but I know that some people say
-
gee, 9:30 is kind of early,
especially on a Monday or
-
whatever.
It can be kind of early to get
-
up.
However, on Monday,
-
November 28th,
you fail the exam if you do not
-
show up to lecture on time.
That one day you must show up.
-
Any questions about that?
That one day you must show up
-
here, even if you've been
watching them on the Web.
-
And generally,
if you think you have
-
transgressed,
the best is to come to us to
-
talk about it.
We can usually work something
-
out.
It's when we find somebody has
-
transgressed from a third-party
or from obvious analyses that we
-
do with homeworks,
that's when things get messy.
-
So, if you think,
for some reason or other,
-
oh, I may have done something
wrong, please come and talk to
-
us.
We actually were students once,
-
too, albeit many years ago.
Any questions?
-
So, this class has great
material.
-
Fabulous material.
And it's really fun,
-
but you do have to work hard.
Let's talk content.
-
This is the topic of the first
part of the course.
-
The first part of the course is
focused on analysis.
-
The second part of the course
is focused on design.
-
Before you can do design,
you have to master a bunch of
-
techniques for analyzing
algorithms.
-
And then you'll be in a
position to design algorithms
-
that you can analyze and that
which are efficient.
-
The analysis of algorithms is
the theoretical study --
-
-- of computer program
performance --
-
-- and resource usage.
With a particular focus on
-
performance.
We're studying how to make
-
things fast.
In particular,
-
computer programs.
We also will discover and talk
-
about other resources such as
communication,
-
such as memory,
whether RAM memory or disk
-
memory.
There are other resources that
-
we may care about,
but predominantly we focus on
-
performance.
Because this is a course about
-
performance, I like to put
things in perspective a little
-
bit by starting out and asking,
in programming,
-
what is more important than
performance?
-
If you're in an engineering
situation and writing code,
-
writing software,
what's more important than
-
performance?
Correctness.
-
Good.
OK.
-
What else?
Simplicity can be.
-
Very good.
Yeah.
-
Maintainability often much more
important than performance.
-
Cost.
And what type of cost are you
-
thinking?
No, I mean cost of what?
-
We're talking software here,
right?
-
What type of cost do you have
in mind?
-
There are some costs that are
involved when programming like
-
programmer time.
So, programmer time is another
thing that might be more
important than performance.
Stability.
-
Robustness of the software.
Does it break all the time?
-
What else?
-
Come on.
We've got a bunch of engineers
-
here.
A lot of things.
-
How about features?
Features can be more important.
-
Having a wider collection of
features than your competitors.
-
Functionality.
Modularity.
-
Is it designed in a way where
you can make changes in a local
-
part of the code and you don't
have to make changes across the
-
code in order to affect a simple
change in the functionality?
-
There is one big one which
definitely, especially in the
-
`90s, was like the big thing in
computers.
-
The big thing.
Well, security actually.
-
Good.
I don't even have that one
-
down.
Security is excellent.
-
That's actually been more the
case in the 2000s.
-
Security has been far more
important often than
-
performance.
Scalability has been important,
-
although scalability,
in some sense,
-
is performance related.
But, yes, scalability is good.
-
What was the big breakthrough
and why do people use Macintosh
-
rather than Windows,
those people who are of that
-
religion?
User-friendliness.
-
Wow.
If you look at the number of
-
cycles of computers that went
into user-friendliness in the
-
`90s, it grew from almost
nothing to where now the
vast part of the computation
goes into user-friendliness.
-
So, all those things are more
important than performance.
-
This is a course on
performance.
-
Then you can say OK,
well, why do we bother and why
-
study algorithms and performance
if it's at the bottom of the
-
heap?
Almost always people would
-
rather have these other things
than performance.
-
You go off and you ask
somebody, would you rather have
performance or more
user-friendliness?
-
It's almost always more
important than performance.
-
Why do we care then?
Yeah?
-
That wasn't user-friendly.
Sometimes performance is
-
correlated with
user-friendliness,
-
absolutely.
Nothing is more frustrating
-
than sitting there waiting,
right?
-
So, that's a good reason.
What are some other reasons
-
why?
Sometimes they have real-time
-
constraints so they don't
actually work unless they
-
perform adequately.
Yeah?
-
Hard to get,
well, we don't usually quantify
-
user-friendliness so I'm not
sure, but I understand what
-
you're saying.
He said we don't get
-
exponential performance
improvements in
-
user-friendliness.
We often don't get that in
-
performance either,
by the way.
-
[LAUGHTER] Sometimes we do,
but that's good.
-
There are several reasons that
I think are important.
-
One is that often performance
measures the line between the
-
feasible and the infeasible.
We have heard some of these
-
things.
For example,
-
when there are real-time
requirements,
-
if it's not fast enough it's
simply not functional.
-
Or, if it uses too much memory
it's simply not going to work
-
for you.
And, as a consequence,
-
what you find is algorithms are
on the cutting edge of
-
entrepreneurship.
If you're talking about just
-
re-implementing stuff that
people did ten years ago,
-
performance isn't that
important at some level.
-
But, if you're talking about
doing stuff that nobody has done
-
before, one of the reasons often
that they haven't done it is
-
because it's too time-consuming.
Things don't scale and so
-
forth.
So, that's one reason,
-
is the feasible versus
infeasible.
-
Another thing is that
algorithms give you a language
-
for talking about program
behavior, and that turns out to
be a language that has become
pervasive throughout computer
science: the theoretical
language is what gets adopted by
all the practitioners because
it's a clean way of thinking
-
about things.
A good way I think about
-
performance, and the reason it's
on the bottom of the heap,
-
is sort of like performance is
like money, it's like currency.
-
You say what good does a stack
of hundred dollar bills do for
-
you?
Would you rather have food or
-
water or shelter or whatever?
And you're willing to pay those
-
hundred dollar bills,
if you have hundred dollar
-
bills, for that commodity.
Even though water is far more
-
important to your living.
Well, similarly,
-
performance is what you use to
pay for user-friendliness.
-
It's what you pay for security.
And you hear people say,
-
for example,
that I want greater
-
functionality,
so people will program in Java,
-
even though it's much slower
than C, because they'll say it
-
costs me maybe a factor of three
or something in performance to
-
program in Java.
But Java is worth it because
-
it's got all these
object-oriented features and so
-
forth, exception mechanisms and
so on.
-
And so people are willing to
pay a factor of three in
-
performance.
So, that's why you want
-
performance because you can use
it to pay for these other things
-
that you want.
And that's why,
-
in some sense,
it's on the bottom of the heap,
-
because it's the universal
thing that you quantify.
-
Do you want to spend a factor
of two on this or spend a factor
-
of three on security,
et cetera?
-
And, in addition,
the lessons generalize to other
-
resource measures like
communication,
-
like memory and so forth.
And the last reason we study
-
algorithm performance is it's
tons of fun.
-
Speed is always fun,
right?
-
Why do people drive fast cars,
race horses,
-
whatever?
Rockets, et cetera,
-
why do we do that?
Because speed is fun.
-
Ski.
Who likes to ski?
-
I love to ski.
I like going fast on those
-
skis.
It's fun.
-
Hockey, fast sports,
right?
-
We all like the fast sports.
Not all of us,
-
I mean.
Some people say he's not
-
talking to me.
OK, let's move on.
-
That's sort of a little bit of
a notion as to why we study
-
this, is that it does,
in some sense,
-
form a common basis for all
these other things we care
-
about.
And so we want to understand
-
how can we generate money for
ourselves in computation?
-
We're going to start out with a
very simple problem.
-
It's one of the oldest problems
that has been studied in
-
algorithms, is the problem of
sorting.
-
We're going to actually study
this for several lectures
-
because sorting contains many
algorithmic techniques.
-
The sorting problem is the
following.
-
We have a sequence a_1,
a_2 up to a_n of numbers as
-
input.
And our output is a permutation
-
of those numbers.
-
A permutation is a
rearrangement of the numbers.
-
Every number appears exactly
once in the rearrangement a'_1,
a'_2 up to a'_n such that,
I sometimes use a dollar
sign to mean "such that," a'_1
is less than or equal to a'_2,
and so on up to a'_n.
Such that they are
monotonically increasing in
size.
-
Take a bunch of numbers,
put them in order.
-
Here's an algorithm to do it.
It's called insertion sort.
-
And we will write this
algorithm in what we call
-
pseudocode.
It's sort of a programming
-
language, except it's got
English in there often.
-
And it's just a shorthand for
writing for being precise.
-
So this sorts A from 1 to n.
And here is the code for it.
-
This is what we call
pseudocode.
-
And if you don't understand the
pseudocode then you should ask
-
questions about any of the
notations.
-
You will start to get used to
it as we go on.
-
One thing is that in the
pseudocode we use indentation,
-
where in most languages they
have some kind of begin-end
-
delimiters like curly braces or
something in Java or C,
-
for example.
We just use indentation.
-
The whole idea of the
pseudocode is to try to get the
-
algorithms as short as possible
while still understanding what
-
the individual steps are.
In practice,
-
there actually have been
languages that use indentation
-
as a means of showing the
nesting of things.
-
It's generally a bad idea,
because if things go over one
-
page to another,
for example,
-
you cannot tell what level of
nesting it is.
-
Whereas, with explicit braces
it's much easier to tell.
-
So, there are reasons why this
is a bad notation if you were
-
doing software engineering.
But it's a good one for us
-
because it just keeps things
short and makes fewer things to
-
write down.
So, this is insertion sort.
-
Let's try to figure out a
little bit what this does.
-
It basically takes an array A,
and at any point the thing to
understand is that
we're running the outer loop
from j equals 2 to n,
and the inner loop starts
at j minus 1 and goes down
until it reaches zero.
Basically, if we look at any
-
point in the algorithm,
we essentially are looking at
-
some element here j.
A of j, the jth element.
-
And what we do essentially is
we pull a value out here that we
-
call the key.
And at this point the important
-
thing to understand,
and we'll talk more about this
-
in recitation on Friday,
is that there is an invariant
-
that is being maintained by this
loop each time through.
-
And the invariant is that this
part of the array is sorted.
-
And the goal each time through
the loop is to increase,
-
is to add one to the length of
the things that are sorted.
-
And the way we do that is we
pull out the key and we just
-
copy values up like this.
And keep copying up until we
-
find the place where this key
goes, and then we insert it in
-
that place.
And that's why it's called
-
insertion sort.
We just sort of move the
-
things, copy the things up until
we find where it goes,
-
and then we put it into place.
And now we have it from A from
-
one to j is sorted,
and now we can work on j plus
-
one.
Let's give an example of that.
-
Imagine we are doing 8,
2, 4, 9, 3, 6.
-
We start out with j equals 2.
And we figure out that we want
-
to insert it there.
Now we have 2,
-
8, 4, 9, 3, 6.
Then we look at the four and
-
say oh, well,
that goes over here.
-
We get 2, 4,
8, 9, 3, 6 after the second
-
iteration of the outer loop.
Then we look at 9 and discover
-
immediately it just goes right
there.
-
Very little work to do on that
step.
-
So, we have exactly the same
output after that iteration.
-
Then we look at the 3 and
that's going to be inserted over
-
there.
2, 3, 4, 8, 9,
-
6.
And finally we look at the 6
-
and that goes in there.
2, 3, 4, 6, 8,
-
9.
And at that point we are done.
-
Question?
-
The array initially starts at
one, yes.
-
A[1...n], OK?
So, this is the insertion sort
-
algorithm.
And it's the first algorithm
-
that we're going to analyze.
And we're going to pull out
-
some tools that we have from our
math background to help to
-
analyze it.
First of all,
-
let's take a look at the issue
of running time.
-
The running time of this
algorithm depends on a
lot of things.
One thing it depends on is the
-
input itself.
For example,
-
if the input is already sorted
--
-
-- then insertion sort has very
little work to do.
-
Because every time through it's
going to be like this case.
-
It doesn't have to shuffle too
many guys over because they're
-
already in place.
Whereas, in some sense,
-
what's the worst case for
insertion sort?
-
If it is reverse sorted then
it's going to have to do a lot
-
of work because it's going to
have to shuffle everything over
-
on each step of the outer loop.
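One way to make this concrete is to count the shifts that insertion sort's inner loop performs on each kind of input; the shift-counting wrapper below is an illustrative sketch, not part of the lecture:

```python
def insertion_sort_shifts(a):
    """Run insertion sort on a copy of a and return how many
    element shifts the inner loop performs."""
    a = list(a)
    shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]      # shuffle one element over
            i -= 1
            shifts += 1
        a[i + 1] = key
    return shifts

print(insertion_sort_shifts(range(10)))         # already sorted: 0 shifts
print(insertion_sort_shifts(range(9, -1, -1)))  # reverse sorted: 45 = n(n-1)/2
```

An already-sorted input triggers no shifts at all, while a reverse-sorted input forces every element past every earlier one, n(n-1)/2 shifts in total.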
In addition to the actual input
-
it depends, of course,
on the input size.
-
Here, for example,
we did six elements.
-
It's going to take longer if
we, for example,
-
do six times ten to the ninth
elements.
-
If we were sorting a lot more
stuff, it's going to take us a
-
lot longer.
Typically, the way we handle
-
that is we are going to
parameterize things by the input
-
size.
We are going to talk about time
-
as a function of the size of
things that we are sorting so we
-
can look at what is the behavior
of that.
-
And the last thing I want to
say about running time is
-
generally we want upper bounds on
the running time.
-
We want to know that the time
is no more than a certain
-
amount.
And the reason is because that
-
represents a guarantee to the
user.
-
If I say it's not going to run,
for example,
-
if I tell you here's a program
and it won't run more than three
-
seconds, that gives you real
information about how you could
-
use it, for example,
in a real-time setting.
-
Whereas, if I said here's a
program and it goes at least
-
three seconds,
you don't know if it's going to
-
go for three years.
It doesn't give you that much
-
guarantee if you are a user of
it.
-
Generally we want upper bounds
because it represents a
-
guarantee to the user.
-
There are different kinds of
analyses that people do.
-
The one we're mostly going to
focus on is what's called
-
worst-case analysis.
And this is what we do usually
-
where we define T of n to be the
maximum time on any input of
-
size n.
So, it's the maximum over inputs,
-
the maximum it could possibly
cost us on an input of size n.
-
What that does is,
if you look at the fact that
-
sometimes the inputs are better
and sometimes they're worse,
-
we're looking at the worst case
of those because that's the way
-
we're going to be able to make a
guarantee.
-
It always does something rather
than just sometimes does
-
something.
So, we're looking at the
-
maximum.
Notice that if I didn't have
-
maximum then T(n) in some sense
is a relation,
-
not a function,
because the time on an input of
-
size n depends on which input of
size n.
-
I could have many different
times, but by putting the
-
maximum at it,
it turns that relation into a
-
function because there's only
one maximum time that it will
-
take.
Sometimes we will talk about
-
average case.
Sometimes we will do this.
-
Here T of n is then the
expected time over all inputs of
-
size n.
It's the expected time.
-
Now, if I talk about expected
time, what else do I need to say
-
here?
What does that mean,
-
expected time?
I'm sorry.
-
Raise your hand.
Expected inputs.
-
What does that mean,
expected inputs?
-
I need more math.
What do I mean by expected time
here, mathematically?
You have to take the time of
-
every input and then average
them, OK.
-
That's kind of what we mean by
expected time.
-
Good.
Not quite.
-
I mean, what you say is
completely correct,
-
except is not quite enough.
Yeah?
-
It's the time of every input
times the probability that it
-
will be that input.
It's a way of taking a weighted
-
average, exactly right.
How do I know what the
-
probability of every input is?
How do I know what the
-
probability a particular input
occurs is in a given situation?
-
I don't.
I have to make an assumption.
-
What's that assumption called?
What kind of assumption do I
-
have to make?
I need an assumption --
-
-- of the statistical
distribution of inputs.
-
Otherwise, expected time
doesn't mean anything because I
-
don't know what the probability
of something is.
-
In order to do probability,
you need some assumptions and
-
you've got to state those
assumptions clearly.
-
One of the most common
assumptions is that all inputs
-
are equally likely.
That's called the uniform
-
distribution.
Every input of size n is
-
equally likely,
that kind of thing.
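Under that uniform assumption, the expected time can be computed by brute force for small n: weight every one of the n! input orders equally and average. A sketch, using the number of key comparisons insertion sort makes as the "time":

```python
from itertools import permutations
from math import factorial

def comparisons(a):
    """Key comparisons made by insertion sort on input a."""
    a = list(a)
    count = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0:
            count += 1           # one comparison of a[i] vs key
            if a[i] <= key:
                break
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return count

# Expected time: average over all n! equally likely inputs.
n = 5
avg = sum(comparisons(p) for p in permutations(range(n))) / factorial(n)
print(avg)   # between the best case (n-1) and the worst case n(n-1)/2
```

This is exactly the weighted average mentioned above: each input's time multiplied by its probability, here 1/n! for every permutation.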
-
But there are other ways that
you could make that assumption,
-
and they may not all be true.
This is much more complicated,
-
as you can see.
Fortunately,
-
all of you have a strong
probability background.
-
And so we will not have any
trouble addressing these
-
probabilistic issues of dealing
with expectations and such.
-
If you don't,
time to go and say gee,
-
maybe I should take that
Probability class that is a
-
prerequisite for this class.
The last one I am going to
-
mention is best-case analysis.
And this I claim is bogus.
-
Bogus.
No good.
-
Why is best-case analysis
bogus?
-
Yeah?
The best-case probably doesn't
-
ever happen.
Actually, it's interesting
-
because for the sorting problem,
interestingly, the most common
things that get sorted are
things that are already sorted,
or at least almost sorted.
For example,
-
one of the most common things
that are sorted is check numbers
-
by banks.
They tend to come in,
-
in the same order that they are
written.
-
They're sorting things that are
almost always sorted.
-
I mean, it's good.
Why an upper bound,
not a lower bound?
Yeah, you want to make a
-
guarantee.
And so why is this not a
-
guarantee?
You're onto something there,
-
but we need a little more
precision here.
-
How can I cheat?
Yeah?
-
Yeah, you can cheat.
You cheat.
-
You take any slow algorithm
that you want and just check for
-
some particular input,
and if it's that input,
-
then you say immediately yeah,
OK, here is the answer.
-
And then it's got a good
best-case.
-
But I didn't tell you anything
about the vast majority of what
-
is going on.
So, you can cheat with a slow
-
algorithm that works fast on
some input.
-
It doesn't really do much for
you so we normally don't worry
-
about that.
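The cheat described above can be sketched in a few lines; this is a deliberately bogus, hypothetical example (the special-cased input is just the lecture's running example):

```python
def cheating_sort(a):
    """A slow algorithm with a great best case: it special-cases
    one particular input and answers immediately, and falls back
    to a Theta(n^2) selection sort for everything else."""
    if list(a) == [8, 2, 4, 9, 3, 6]:
        return [2, 3, 4, 6, 8, 9]       # constant-time "best case"
    a = list(a)
    for i in range(len(a)):             # slow path: selection sort
        m = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[m] = a[m], a[i]
    return a
```

Its best-case time is constant, yet that tells you nothing about its behavior on the vast majority of inputs, which is why best-case analysis is bogus.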
Let's see.
-
What is insertion sort's
worst-case time?
-
Now we get into some sort of
funny issues.
-
First of all,
it sort of depends on the
-
computer you're running on.
Whose computer,
-
right?
Is it a big supercomputer or is
-
it your wristwatch?
They have different
-
computational abilities.
And when we compare algorithms,
-
we compare them typically for
relative speed.
-
This is if you compared two
algorithms on the same machine.
-
You could argue,
well, it doesn't really matter
-
what the machine is because I
will just look at their relative
-
speed.
But, of course,
-
I may also be interested in
absolute speed.
-
Is one algorithm actually
better no matter what machine
-
it's run on?
-
And so this kind of gets sort
of confusing as to how I can
-
talk about the worst-case time
of an algorithm of a piece of
-
software when I am not talking
about the hardware because,
-
clearly, if I had run on a
faster machine,
-
my algorithms are going to go
faster.
-
So, this is where you get the
big idea of algorithms.
-
Which is why algorithms is such
a huge field,
-
why it spawns companies like
Google, like Akamai,
-
like Amazon.
Why algorithmic analysis,
-
throughout the history of
computing, has been such a huge
-
success, is our ability to
master and to be able to take
-
what is apparently a really
messy, complicated situation and
-
reduce it to being able to do
some mathematics.
-
And that idea is called
asymptotic analysis.
-
And the basic idea of
asymptotic analysis is to ignore
-
machine-dependent constants --
-
-- and, instead of the actual
running time,
-
look at the growth of the
running time.
-
So, we don't look at the actual
running time.
-
We look at the growth.
Let's see what we mean by that.
-
This is a huge idea.
It's not a hard idea,
-
otherwise I wouldn't be able to
teach it in the first lecture,
-
but it's a huge idea.
We are going to spend a couple
-
of lectures understanding the
implications of that and will
-
basically be doing it throughout
the term.
-
And if you go on to be
practicing engineers,
-
you will be doing it all the
time.
-
In order to do that,
we adopt some notations that
-
are going to help us.
In particular,
-
we will adopt asymptotic
notation.
-
Most of you have seen some kind
of asymptotic notation.
-
Maybe a few of you haven't,
but mostly you should have seen
-
a little bit.
The one we're going to be using
-
in this class predominantly is
theta notation.
-
And theta notation is pretty
easy notation to master because
-
all you do is,
from a formula,
-
just drop low order terms and
ignore leading constants.
-
For example,
if I have a formula like 3n^3 +
-
90n^2 - 5n + 6046,
I say, well,
-
what low-order terms do I drop?
Well, n^3 is a bigger term
-
than n^2.
I am going to drop all these
-
terms and ignore the leading
constant, so I say that's
-
Theta(n^3).
That's pretty easy.
-
So, that's theta notation.
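The drop-the-low-order-terms rule can be checked numerically. Here is a quick sketch (my own illustration, not from the lecture) showing that the full formula and its leading term 3n^3 agree more and more closely as n grows, which is why theta notation can safely throw the rest away:

```python
# Compare the full formula 3n^3 + 90n^2 - 5n + 6046 against its
# leading term 3n^3: the ratio approaches 1 as n grows, so the
# low-order terms and the constant stop mattering asymptotically.

def full_formula(n):
    return 3 * n**3 + 90 * n**2 - 5 * n + 6046

def leading_term(n):
    return 3 * n**3

for n in [10, 100, 10_000]:
    print(n, full_formula(n) / leading_term(n))
```

At n = 10 the low-order terms still dominate the picture, but by n = 10,000 the ratio is within a fraction of a percent of 1.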
Now, this is an engineering way
-
of manipulating theta notation.
There is actually a
-
mathematical definition for
this, which we are going to talk
-
about next time,
which is a definition in terms
-
of sets of functions.
And, you are going to be
-
responsible, this is both a math
and a computer science
-
engineering class.
Throughout the course you are
-
going to be responsible both for
mathematical rigor as if it were
-
a math course and engineering
commonsense because it's an
-
engineering course.
We are going to be doing both.
-
This is the engineering way of
understanding what you do,
-
so you're responsible for being
able to do these manipulations.
-
You're also going to be
responsible for understanding
-
the mathematical definition of
theta notation and of its related
-
O notation and omega notation.
If I take a look as n
-
approached infinity,
a Theta(n^2) algorithm always
-
beats, eventually,
a Theta(n^3) algorithm.
-
As n gets bigger,
it doesn't matter what these
-
other terms were if I were
describing the absolute precise
-
behavior in terms of a formula.
If I had a Theta(n^2)
-
algorithm, it would always be
faster for sufficiently large n
-
than a Theta(n^3) algorithm.
It wouldn't matter what those
-
low-order terms were.
It wouldn't matter what the
-
leading constant was.
This one will always be faster.
-
Even if you ran the Theta(n^2)
algorithm on a slow computer and
-
the Theta(n^3) algorithm on a
fast computer.
-
The great thing about
asymptotic notation is it
-
satisfies our issue of being
able to compare both relative
-
and absolute speed,
because we are able to do this
-
no matter what the computer
platform.
-
On different platforms we may
get different constants here,
-
machine-dependent constants for
the actual running time,
-
but if I look at the growth as
the size of the input gets
-
larger, the asymptotics
generally won't change.
-
For example,
I will just draw that as a
-
picture.
If I have n on this axis and
-
T(n) on this axis.
This may be,
-
for example,
a Theta(n^3) algorithm and this
-
may be a Theta(n^2) algorithm.
There is always going to be
-
some point n_0 where for
everything larger the Theta(n^2)
-
algorithm is going to be cheaper
than the Theta(n^3) algorithm
-
no matter how much advantage
you give it at the beginning in
-
terms of the speed of the
computer you are running on.
-
Now, from an engineering point
of view, there are some issues
-
we have to deal with because
sometimes it could be that that
-
n_0 is so large that the
computers aren't big enough to
-
run the problem.
That's why we,
-
nevertheless,
are interested in some of the
-
slower algorithms,
because some of the slower
-
algorithms, even though they
are asymptotically slower.
-
They may still be faster on
reasonable sizes of things.
-
And so we have to both balance
our mathematical understanding
-
with our engineering commonsense
in order to do good programming.
-
So, just having done analysis
of algorithms doesn't
-
automatically make you a good
programmer.
-
You also need to learn how to
program and use these tools in
-
practice to understand when they
are relevant and when they are
-
not relevant.
There is a saying.
-
If you want to be a good
programmer, you just program every
-
day for two years,
you will be an excellent
-
programmer.
If you want to be a world-class
-
programmer, you can program
every day for ten years,
-
or you can program every day
for two years and take an
-
algorithms class.
Let's get back to what we were
-
doing, which is analyzing
insertion sort.
-
We are going to look at the
worst-case.
-
Which, as we mentioned before,
is when the input is reverse
-
sorted.
The biggest element comes first
-
and the smallest last because
now every time you do the
-
insertion you've got to shuffle
everything over.
-
You can write down the running
time by looking at the nesting
-
of loops.
What we do is we sum up.
-
What we assume is that every
operation, every elemental
-
operation is going to take some
constant amount of time.
-
But we don't have to worry
about what that constant is
-
because we're going to be doing
asymptotic analysis.
-
As I say, the beauty of the
method is that it causes all
-
these things that are real
distinctions to sort of vanish.
-
We sort of look at them from
30,000 feet rather than from
-
three millimeters or something.
Each of these operations is
-
going to sort of be a basic
operation.
-
One way to think about this,
in terms of counting
-
operations, is counting memory
references.
-
How many times do you actually
access some variable?
-
That's another way of sort of
thinking about this model.
-
When we do that,
well, we're going to go through
-
this loop, j is going from 2 to
n, and then we're going to add
-
up the work that we do within
the loop.
-
We can sort of write that in
math as summation of j equals 2
-
to n.
And then what is the work that
-
is going on in this loop?
Well, the work that is going on
-
in this loop varies,
but in the worst case how many
-
operations are going on here for
each value of j?
-
For a given value of j,
how much work goes on in this
-
loop?
Can somebody tell me?
-
Asymptotically.
It's j times some constant,
-
so it's theta j.
So, there is theta j work going
-
on here because this loop starts
out with i being j minus 1,
-
and then it's doing just a
constant amount of stuff for
-
each step of the value of i,
and i is running from j minus
-
one down to zero.
So, we can say that is theta j
-
work that is going on.
Do people follow that?
-
OK.
And now we have a formula we
-
can evaluate.
What is the evaluation?
-
If I want to simplify this
formula, what is that equal to?
-
Sorry. In the back there.
-
Yeah. OK.
That's just Theta(n^2),
-
good.
Because what you're saying is
-
the sum of consecutive numbers,
you mean what?
-
What's the mathematical term we
have for that so we can
-
communicate?
You've got to know these things
-
so you can communicate.
It's called what type of
-
sequence?
It's actually a series,
-
but that's OK.
What type of series is this
-
called?
Arithmetic series,
-
good.
Wow, we've got some sharp
-
people who can communicate.
This is an arithmetic series.
-
You're basically summing 1 + 2
+ 3 + 4, some constants in
-
there, but basically it's 1 + 2
+ 3 + 4 + 5 + 6 up to n.
-
That's Theta(n^2).
If you don't know this math,
-
there is a chapter in the book,
or you could have taken the
-
prerequisite.
Arithmetic series.
-
People have this vague
recollection.
-
Oh, yeah.
Good.
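The arithmetic series shows up directly if you instrument insertion sort. The lecture's pseudocode is on the board; this Python version is my own sketch, with a counter added so you can see the 1 + 2 + ... + (n-1) = n(n-1)/2 shifts on reverse-sorted input:

```python
# Insertion sort instrumented to count element shifts. On reverse-sorted
# input, inserting the j-th element shifts all j-1 earlier elements,
# so the total is the arithmetic series 1 + 2 + ... + (n-1) = n(n-1)/2,
# which is where the Theta(n^2) worst case comes from.

def insertion_sort_count_shifts(a):
    shifts = 0
    for j in range(1, len(a)):        # j = 2..n in the board's 1-based pseudocode
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:  # shift larger elements right
            a[i + 1] = a[i]
            i -= 1
            shifts += 1
        a[i + 1] = key
    return shifts

n = 100
reverse_sorted = list(range(n, 0, -1))
print(insertion_sort_count_shifts(reverse_sorted))  # n(n-1)/2 = 4950
```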
-
Now, you have to learn these
manipulations.
-
We will talk about a bit next
time, but you have to learn your
-
theta manipulations for what
works with theta.
-
And you have to be very careful
because theta is a weak
-
notation.
A strong notation is something
-
like Leibniz notation from
calculus where the chain rule is
-
just canceling two things.
It's just fabulous that you can
-
cancel in the chain rule.
And Leibniz notation just
-
expresses that so directly you
can manipulate.
-
Theta notation is not like
that.
-
If you think it is like that
you are in trouble.
-
You really have to think of
what is going on under the theta
-
notation.
And it is more of a descriptive
-
notation than it is a
manipulative notation.
-
There are manipulations you can
do with it, but unless you
-
understand what is really going
on under the theta notation you
-
will find yourself in trouble.
And next time we will talk a
-
little bit more about theta
notation.
-
Is insertion sort fast?
-
Well, it turns out for small n
it is moderately fast.
-
But it is not at all for large
n.
-
So, I am going to give you an
algorithm that is faster.
-
It's called merge sort.
I wonder if I should leave
-
insertion sort up.
Why not.
-
I am going to write on this
later, so if you are taking
-
notes, leave some space on the
left.
-
Here is merge sort of an array
A from 1 up to n.
-
And it is basically three
steps.
-
If n equals 1 we are done.
Sorting one element,
-
it is already sorted.
All right.
-
Recursive algorithm.
Otherwise, what we do is we
-
recursively sort A from 1 up to
the ceiling of n over 2.
-
And the array A of the ceiling
of n over 2 plus one up to n.
-
So, we sort two halves of the
input.
-
And then, three,
we take those two lists that we
-
have done and we merge them.
-
And, to do that,
we use a merge subroutine which
-
I will show you.
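The three steps on the board can be written out as runnable code. This is my own Python sketch of the usual top-down merge sort, not the lecture's notation, with the split at the ceiling of n over 2 as in the pseudocode:

```python
import math

# Merge sort in three steps: (1) one element is already sorted,
# (2) recursively sort the two halves, (3) merge the sorted halves.

def merge_sort(a):
    n = len(a)
    if n <= 1:                      # step 1: base case, already sorted
        return a
    mid = math.ceil(n / 2)          # split at ceiling(n/2)
    left = merge_sort(a[:mid])      # step 2: sort A[1..ceil(n/2)]
    right = merge_sort(a[mid:])     #         and A[ceil(n/2)+1..n]
    return merge(left, right)       # step 3: merge the two sorted lists

def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:     # smaller head wins; advance its pointer
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])            # one list is exhausted; copy the rest
    out.extend(right[j:])
    return out

print(merge_sort([20, 13, 7, 2, 12, 11, 9, 1]))
```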
-
The key subroutine here is
merge, and it works like this.
-
I have two lists.
Let's say one of them is 20.
-
I am doing this in reverse
order.
-
I have sorted this like this.
And then I sort another one.
-
I don't know why I do it this
order, but anyway.
-
Here is my other list.
I have my two lists that I have
-
sorted.
So, this is A[1] to A[⌈n/2⌉]
-
and A[⌈n/2⌉+1] to A[n] for the
way it will be called in this
-
program.
And now to merge these two,
-
what I want to do is produce a
sorted list out of both of them.
-
What I do is first observe
where is the smallest element of
-
any two lists that are already
sorted?
-
It's in one of two places,
the head of the first list or
-
the head of the second list.
I look at those two elements
-
and say which one is smaller?
This one is smaller.
-
Then what I do is output into
my output array the smaller of
-
the two.
And I cross it off.
-
And now where is the next
smallest element?
-
And the answer is it's going to
be the head of one of these two
-
lists.
Then I cross out this guy and
-
put him here and circle this
one.
-
Now I look at these two guys.
This one is smaller so I output
-
that and circle that one.
Now I look at these two guys,
-
output 9.
So, every step here is some
-
fixed number of operations that
is independent of the size of
-
the arrays at each step.
Each individual step is just me
-
looking at two elements and
picking out the smallest and
-
advancing some pointers into the
array so that I know where the
-
current head of that list is.
And so, therefore,
-
the time is order n on n total
elements.
-
The time to actually go through
this and merge two lists is
-
order n.
We sometimes call this linear
-
time because it's not quadratic
or whatever.
-
It is proportional to n,
proportional to the input size.
-
It's linear time.
I go through and just do this
-
simple operation,
just working up these lists,
-
and in the end I have done
essentially n operations,
-
order n operations each of
which cost constant time.
-
That's a total of order n time.
Everybody with me?
-
OK.
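The claim that merging is linear can be checked by counting the constant-work steps. This sketch (mine, not from the lecture) counts the head-to-head comparisons: for n total elements there are n output steps and at most n - 1 comparisons, so order n work in total:

```python
# Merge with a comparison counter. Each output step is constant work:
# compare the two heads, output the smaller, advance one pointer.
# For n total elements there are at most n - 1 comparisons.

def merge_counting(left, right):
    out, i, j, comparisons = [], 0, 0, 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])            # remainder copied with no comparisons
    out.extend(right[j:])
    return out, comparisons

merged, comps = merge_counting([1, 9, 12, 20], [2, 7, 11, 13])
print(merged, comps)
```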
So, this is a recursive
-
program.
We can actually now write what
-
is called a recurrence for this
program.
-
The way we do that is say let's
let the time to sort n elements
-
to be T(n).
Then how long does it take to
-
do step one?
-
That's just constant time.
We just check to see if n is 1,
-
and if it is we return.
That's independent of the size
-
of anything that we are doing.
It just takes a certain number
-
of machine instructions on
whatever machine and we say it
-
is constant time.
We call that theta one.
-
This is actually a little bit
of an abuse if you get into it.
-
And the reason is because
typically in order to say it you
-
need to say what it is growing
with.
-
Nevertheless,
we use this as an abuse of the
-
notation just to mean it is a
constant.
-
So, that's an abuse just so
people know.
-
But it simplifies things if I
can just write theta one.
-
And it basically means the same
thing.
-
Now we recursively sort these
two things.
-
How can I describe that?
The time to do this,
-
I can describe recursively as T
of ceiling of n over 2 plus T of
-
n minus ceiling of n over 2.
That is actually kind of messy,
-
so what we will do is just be
sloppy and write 2T(n/2).
-
So, this is just us being
sloppy.
-
And we will see on Friday in
recitation that it is OK to be
-
sloppy.
That's the great thing about
-
algorithms.
As long as you are rigorous and
-
precise, you can be as sloppy as
you want.
-
[LAUGHTER] This is sloppy
because I didn't worry about
-
what was going on,
because it turns out it doesn't
-
make any difference.
And we are going to actually
-
see that that is the case.
And, finally,
-
I have to merge the two sorted
lists which have a total of n
-
elements.
And we just analyze that using
-
the merge subroutine.
And that takes us to theta n
-
time.
-
That allows us now to write a
recurrence for the performance
-
of merge sort.
-
Which is to say that T of n is
equal to theta 1 if n equals 1
-
and 2T of n over 2 plus theta of
n if n is bigger than 1.
-
Because either I am doing step
one or I am doing all steps one,
-
two and three.
Here I am doing step one and I
-
return and I am done.
Or else I am doing step one,
-
I don't return,
and then I also do steps two
-
and three.
So, I add those together.
-
I could say theta n plus theta
1, but theta n plus theta 1 is
-
just theta n because theta 1 is
a lower order term than theta n
-
and I can throw it away.
It is either theta 1 or it is
-
2T of n over 2 plus theta n.
Now, typically we won't be
-
writing this.
Usually we omit this.
-
If it makes no difference to
the solution of the recurrence,
-
we will usually omit constant
base cases.
-
In algorithms,
it's not true generally in
-
mathematics, but in algorithms
if you are running something on
-
a constant size input it takes
constant time always.
-
So, we don't worry about what
this value is.
-
And it turns out it has no real
impact on the asymptotic
-
solution of the recurrence.
How do we solve a recurrence
-
like this?
I now have T of n expressed in
-
terms of T of n over 2.
That's in the book and it is
-
also in Lecture 2.
We are going to do Lecture 2 to
-
solve that, but in the meantime
what I am going to do is give
-
you a visual way of
understanding what this costs,
-
which is one of the techniques
we will elaborate on next time.
-
It is called a recursion tree
technique.
-
And I will use it for the
actual recurrence that is almost
-
the same 2T(n/2),
but I am going to actually
-
explicitly, because I want you
to see where it occurs,
-
plus some constant times n
where c is a constant greater
-
than zero.
So, we are going to look at
-
this recurrence with a base case
of order one.
-
I am just making the constant
in here, the upper bound on the
-
constant be explicit rather than
implicit.
-
And the way you do a recursion
tree is the following.
-
You start out by writing down
the left-hand side of the
-
recurrence.
And then what you do is you say
-
well, that is equal to,
and now let's write it as a
-
tree.
I do c of n work plus now I am
-
going to have to do work on each
of my two children.
-
T of n over 2 and T of n over
2.
-
If I sum up what is in here,
I get this because that is what
-
the recurrence says,
T(n)=2T(n/2)+cn.
-
I have 2T(n/2)+cn.
Then I do it again.
-
I have cn here.
I now have here cn/2.
-
And here is cn/2.
And each of these now has a
-
T(n/4).
-
And these each have a T(n/4).
And this has a T(n/4).
-
And I keep doing that,
the dangerous dot,
-
dot, dots.
And, if I keep doing that,
-
I end up with it looking like
this.
-
And I keep going down until I
get to a leaf.
-
And a leaf, I have essentially
a T(1).
-
That is T(1).
And so the first question I ask
-
here is, what is the height of
this tree?
-
Yeah.
It's log n.
-
It's actually very close to
exactly log n because I am
-
starting out at the top with n
and then I go to n/2 and n/4 and
-
all the way down until I get to
1.
-
The number of halvings of n
until I get to 1 is log n so the
-
height here is log n.
It's OK if it is constant times
-
log n.
It doesn't matter.
-
How many leaves are in this
tree, by the way?
-
How many leaves does this tree
have?
-
Yeah.
The number of leaves,
-
once again, is actually pretty
close.
-
It's actually n.
If you took it all the way
-
down.
Let's make some simplifying
-
assumption.
n is a perfect power of 2,
-
so it is an integer power of 2.
Then this is exactly log n to
-
get down to T(1).
And then there are exactly n
-
leaves, because the number of
leaves here, the number of nodes
-
at this level is 1,
2, 4, 8.
-
And if I go down height h,
I have 2 to the h leaves,
-
2 to the log n,
that is just n.
-
We are doing math here,
right?
-
Now let's figure out how much
work, if I look at adding up
-
everything in this tree I am
going to get T(n),
-
so let's add that up.
Well, let's add it up level by
-
level.
How much do we have in the
-
first level?
Just cn.
-
If I add up the second level,
how much do I have?
-
cn.
How about if I add up the third
-
level?
cn.
-
How about if I add up all the
leaves?
-
Theta n.
It is not necessarily cn
-
because the boundary case may
have a different constant.
-
It is actually theta n,
but cn all the way here.
-
If I add up the total amount,
that is equal to cn times log
-
n, because that's the height,
that is how many cn's I have
-
here, plus theta n.
And this is a higher order term
-
than this, so this goes away,
get rid of the constants,
-
that is equal to theta(n lg n).
And theta(n lg n) is
-
asymptotically faster than
theta(n^2).
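The recursion-tree answer can be verified exactly. Under the lecture's simplifying assumption that n is a power of 2, the recurrence T(n) = 2T(n/2) + cn with T(1) = t1 solves to exactly c·n·lg n + t1·n: the c·n·lg n comes from the lg n levels each summing to cn, and the t1·n from the n leaves. A quick check with arbitrary constants of my own choosing:

```python
# Evaluate the recurrence T(n) = 2T(n/2) + c*n, T(1) = t1, directly
# for n a power of 2, and compare against the closed form
# c*n*lg(n) + t1*n predicted by the recursion tree.

def T(n, c=3, t1=5):
    if n == 1:
        return t1
    return 2 * T(n // 2, c, t1) + c * n

for k in range(1, 11):
    n = 2 ** k
    assert T(n) == 3 * n * k + 5 * n   # c*n*lg(n) + t1*n, with lg(n) = k
print("recurrence matches c*n*lg(n) + t1*n for n = 2 .. 1024")
```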
-
So, merge sort,
on a large enough input size,
-
is going to beat insertion
sort.
-
Merge sort is going to be a
faster algorithm.
-
Sorry, you guys,
I didn't realize you couldn't
-
see over there.
You should speak up if you
-
cannot see.
So, this is a faster algorithm
-
because theta(n lg n) grows more
slowly than theta(n^2).
-
And merge sort asymptotically
beats insertion sort.
-
Even if you ran insertion sort
on a supercomputer,
-
somebody running on a PC with
merge sort for sufficient large
-
input will clobber them because
actually n^2 is way bigger than
-
n log n once you get the n's to
be large.
-
And, in practice,
merge sort tends to win here
-
for n bigger than,
say, 30 or so.
-
If you have a very small input
like 30 elements,
-
insertion sort is a perfectly
decent sort to use.
-
But merge sort is going to be a
lot faster even for something
-
that is only a few dozen
elements.
-
It is going to actually be a
faster algorithm.
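The supercomputer-versus-PC point can be illustrated with made-up constants. The per-step costs below are hypothetical, chosen to give the Theta(n^2) algorithm a 25-to-1 head start; even so, the n·lg n growth wins once n is large enough:

```python
import math

# Hypothetical machine-dependent constants (not from the lecture):
# insertion sort on a fast machine at 2 units per step versus
# merge sort on a slow machine at 50 units per step. Growth rate
# still decides the winner for large n.

def insertion_cost(n):
    return 2 * n * n                 # Theta(n^2), small constant

def merge_cost(n):
    return 50 * n * math.log2(n)     # Theta(n lg n), large constant

crossover = next(n for n in range(2, 10**6)
                 if merge_cost(n) < insertion_cost(n))
print("merge sort is cheaper for all n >=", crossover)
```

For small n the quadratic algorithm with the better constant wins, which is exactly why insertion sort is a perfectly decent choice for a few dozen elements.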
-
That's sort of the lessons,
OK?
-
Remember that to get your
recitation assignments and
-
attend recitation on Friday.
Because we are going to be
-
going through a bunch of the
things that I have left on the
-
table here.
And see you next Monday.