Robert Tjian (Berkeley/HHMI) Part 1: Gene regulation: An introduction

  • 0:00 - 0:04
    - My name is Bob Tjian,
    I'm a Professor of MCB
  • 0:04 - 0:06
    at the University of
    California at Berkeley,
  • 0:06 - 0:10
    and I'm also serving as the
    President of the Howard Hughes.
  • 0:10 - 0:13
    I'm going to spend the
    next 25 or 30 minutes
  • 0:13 - 0:16
    telling you about some fundamentals
  • 0:16 - 0:20
    of one of the most important
    molecular processes
  • 0:20 - 0:23
    in living cells, which is
    the expression of genes
  • 0:23 - 0:26
    through a process called transcription.
  • 0:27 - 0:32
    Now, first, to understand
    what gene expression means,
  • 0:32 - 0:36
    you have to have a sense
    of what we tend to refer to
  • 0:36 - 0:40
    in the field as a central
    dogma of molecular biology.
  • 0:40 - 0:41
    Another way to think about this
  • 0:41 - 0:46
    is the flow of biological
    information from DNA,
  • 0:47 - 0:48
    in other words, our chromosomes,
  • 0:48 - 0:51
    which every cell has its complement,
  • 0:52 - 0:57
    to be transcribed into a
    sister molecule called RNA.
  • 0:57 - 1:00
    So, this process of
    converting DNA into RNA
  • 1:00 - 1:02
    is called transcription,
  • 1:02 - 1:05
    and that is the topic of this lecture.
  • 1:05 - 1:09
    This process is very complicated,
  • 1:09 - 1:12
    as you'll see by the
    end of my two lectures,
  • 1:12 - 1:13
    and it is very important
  • 1:13 - 1:18
    for many, many fundamental
    processes in biology.
  • 1:18 - 1:21
    So, what I'm gonna
    spend today's lecture on
  • 1:21 - 1:25
    is the discovery of a large family
  • 1:25 - 1:27
    of transcription proteins.
  • 1:27 - 1:31
    These are factors, as we call
    them, that are key molecules
  • 1:31 - 1:35
    that regulate the use
    of genetic information
  • 1:35 - 1:38
    that has been encoded in the genome.
  • 1:39 - 1:42
    Now, transcription factors or proteins
  • 1:42 - 1:46
    are involved in many
    fundamental aspects of biology,
  • 1:46 - 1:48
    including embryonic development,
  • 1:48 - 1:52
    cellular differentiation, and cell fate.
  • 1:52 - 1:55
    In other words, pretty much
    what your cells are doing,
  • 1:55 - 1:56
    how a tissue works,
  • 1:56 - 2:00
    and how an organism
    survives and reproduces
  • 2:00 - 2:03
    is dependent on the
    process of gene expression.
  • 2:03 - 2:07
    And the first step in this
    process is transcription.
  • 2:09 - 2:12
    Now, there are many other reasons
  • 2:12 - 2:15
    why a large group of people and scientists
  • 2:15 - 2:16
    are interested in transcription,
  • 2:16 - 2:19
    and another reason is that understanding
  • 2:19 - 2:22
    the fundamental molecular mechanisms
  • 2:22 - 2:25
    that control transcription in humans
  • 2:25 - 2:27
    or in any other organism
  • 2:27 - 2:32
    can inform us and teach
    us about what happens
  • 2:32 - 2:35
    when something goes wrong,
    for example, in diseases.
  • 2:35 - 2:38
    And I list here just a few diseases
  • 2:38 - 2:41
    that we could study as a result
  • 2:41 - 2:43
    of understanding the
    structure and function
  • 2:43 - 2:46
    of these transcription factor proteins
  • 2:46 - 2:48
    that I'm going to be telling you about.
  • 2:48 - 2:51
    And of course, the hope
    is that in understanding
  • 2:51 - 2:55
    the molecular underpinnings of
    complex diseases like cancer,
  • 2:55 - 2:58
    diabetes, Parkinson's, and so forth,
  • 2:58 - 3:01
    that we will be able to develop and use
  • 3:02 - 3:05
    better, more specific therapeutic drugs
  • 3:05 - 3:08
    and also to develop more accurate
  • 3:08 - 3:10
    and rapid diagnostic tools.
  • 3:10 - 3:12
    So, those are a couple of the reasons
  • 3:12 - 3:14
    why many of us have spent,
  • 3:14 - 3:16
    in my case, over 30 years
  • 3:16 - 3:19
    studying this process of
    transcriptional regulation.
  • 3:21 - 3:23
    Now, to get the whole thing started,
  • 3:23 - 3:24
    I have to give you a sense
  • 3:24 - 3:27
    of what the magnitude of the problem is.
  • 3:27 - 3:30
    So, imagine that one would
    really like to understand
  • 3:30 - 3:35
    how this process of decoding
    the genome happens in humans.
  • 3:35 - 3:36
    So, as you may know,
  • 3:36 - 3:39
    the human genome has
    some 3 billion base pairs
  • 3:39 - 3:41
    or bits of genetic information,
  • 3:41 - 3:45
    and that encodes roughly 22,000 genes.
  • 3:45 - 3:48
    These are stretches of DNA sequence
  • 3:48 - 3:51
    that encode ultimately a product
  • 3:51 - 3:56
    that is a protein which actually
    makes the cells function.
  • 3:56 - 3:58
    So, as I already explained to you,
  • 3:58 - 4:01
    there's this flow of
    biological information
  • 4:01 - 4:04
    where you have to extract the
    information buried in DNA,
  • 4:04 - 4:05
    convert it into RNA.
  • 4:05 - 4:08
    And what I'm not gonna
    tell you about today
  • 4:08 - 4:10
    is the process of going
    from RNA to protein,
  • 4:10 - 4:14
    which is a reaction called
    a translational reaction.
  • 4:14 - 4:17
    I'm going to instead just
    focus on the first step
  • 4:17 - 4:19
    of converting DNA into RNA,
  • 4:19 - 4:21
    which is the process of transcription.
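As a rough illustration of the DNA-to-RNA step described above, here is a minimal sketch that is not part of the lecture. It assumes a template strand written 3' to 5' and applies base pairing only; the function name `transcribe` is hypothetical, and nothing about RNA polymerase II or its regulation is modeled.

```python
# Base pairing used when RNA is copied from a DNA template strand:
# A pairs with U (RNA uses uracil instead of thymine), T with A, G with C, C with G.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (read 5'->3') complementary to a template strand given 3'->5'.

    Hypothetical helper for illustration only; raises KeyError on non-ACGT input.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


if __name__ == "__main__":
    print(transcribe("TACGGT"))  # AUGCCA
```

The sketch captures only the information transfer; the rest of the lecture is about how the cell decides where and when this copying starts.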
  • 4:23 - 4:27
    Now, one of the most amazing results
  • 4:27 - 4:29
    that we got over the last decade or so
  • 4:29 - 4:32
    was when the human genome
    was entirely sequenced,
  • 4:32 - 4:35
    the first few that were sequenced,
  • 4:35 - 4:38
    we realized that actually
    the number of genes in humans
  • 4:38 - 4:42
    is not vastly different
    from many other organisms,
  • 4:42 - 4:46
    even simple organisms like little worms
  • 4:46 - 4:47
    or fruit flies and so forth.
  • 4:47 - 4:51
    That is roughly 22 to 25,000 genes
  • 4:51 - 4:53
    is all the number of genes
  • 4:53 - 4:56
    that all of these
    different organisms have.
  • 4:56 - 4:58
    And yet, anybody looking at us
  • 4:58 - 5:02
    versus a little roundworm
    in the soil or a fruit fly
  • 5:02 - 5:05
    can tell that we're a
    much more complex organism
  • 5:05 - 5:07
    with a much bigger brain,
  • 5:07 - 5:10
    much more complex behavior, and so forth.
  • 5:10 - 5:11
    So, how does this happen?
  • 5:12 - 5:16
    Part of the answer to this
    very interesting mystery
  • 5:16 - 5:20
    or paradox lies in the way
    that genes are organized
  • 5:20 - 5:22
    and how they're regulated.
  • 5:22 - 5:24
    And one of the most striking results
  • 5:24 - 5:26
    of the genome sequencing
    project was to realize
  • 5:26 - 5:31
    that a vast, vast majority
    of the DNA in our chromosomes
  • 5:31 - 5:35
    is actually not coding for
    specific gene products,
  • 5:35 - 5:40
    and that only roughly 3% of
    the DNA is actually encoding
  • 5:40 - 5:42
    let's call those little arrows
  • 5:42 - 5:45
    that I show you on this purple DNA
  • 5:45 - 5:46
    are the gene coding regions.
  • 5:46 - 5:50
    So, you'll notice that there's
    a lot of non-arrow sequences,
  • 5:50 - 5:53
    which I'll show you in
    this next slide as green.
  • 5:53 - 5:55
    These are non-coding regions.
  • 5:55 - 5:59
    So, the vast majority, 97%
    or greater is non-coding,
  • 5:59 - 6:03
    so what are these other sequences doing?
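To put the round numbers quoted above side by side, here is a back-of-the-envelope sketch that is not part of the lecture. It uses only the figures mentioned (about 3 billion base pairs, roughly 3% coding, roughly 22,000 genes); the per-gene average is a derived, order-of-magnitude estimate and not a measured value.

```python
# Round figures quoted in the lecture (approximate by design).
GENOME_BP = 3_000_000_000   # ~3 billion base pairs
CODING_PERCENT = 3          # ~3% of the DNA is coding
GENE_COUNT = 22_000         # ~22,000 genes

# Integer arithmetic keeps the estimate exact for these round inputs.
coding_bp = GENOME_BP * CODING_PERCENT // 100
noncoding_bp = GENOME_BP - coding_bp
avg_coding_bp_per_gene = coding_bp // GENE_COUNT  # derived estimate only

if __name__ == "__main__":
    print(f"coding:     {coding_bp:,} bp")
    print(f"non-coding: {noncoding_bp:,} bp")
    print(f"average coding sequence per gene: ~{avg_coding_bp_per_gene:,} bp")
```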
  • 6:03 - 6:06
    And of course, it turns
    out that these sequences
  • 6:06 - 6:10
    carry very important
    little fragments of DNA,
  • 6:10 - 6:12
    which we call regulatory sequences.
  • 6:12 - 6:15
    And these are the sequences
    that actually control
  • 6:15 - 6:19
    whether a gene gets turned on or not.
  • 6:19 - 6:22
    And I'll be spending much
    of the next 20 minutes
  • 6:22 - 6:24
    telling you about how
    this process all works
  • 6:24 - 6:28
    and what these little
    bits of DNA sequences
  • 6:28 - 6:32
    actually function to
    control gene expression.
  • 6:34 - 6:37
    Now, the other thing that I
    have to bring you up to date on
  • 6:37 - 6:41
    is this mysterious process
    we're calling transcription,
  • 6:41 - 6:44
    which reads double-stranded DNA
  • 6:44 - 6:45
    and then makes a related molecule,
  • 6:45 - 6:48
    which is a single-stranded RNA molecule,
  • 6:48 - 6:50
    which is an informational molecule.
  • 6:50 - 6:54
    That reaction is catalyzed
    by a very complex
  • 6:54 - 6:59
    multi-subunit enzyme
    called RNA polymerase II.
  • 6:59 - 7:01
    Now, there's the roman
    numeral II at the end of this
  • 7:01 - 7:06
    because there were actually
    three enzymes in most mammals,
  • 7:07 - 7:09
    at least three enzymes that
    carry out different processes
  • 7:09 - 7:12
    and different types of RNA production.
  • 7:12 - 7:13
    But I'm only gonna tell you
  • 7:13 - 7:16
    about the ones that make
    the classical messenger RNA,
  • 7:16 - 7:19
    which then ultimately becomes proteins.
  • 7:19 - 7:22
    So, now one of the things that we learned
  • 7:22 - 7:26
    early on in the study of mammalian
  • 7:26 - 7:30
    or other multicellular organism
    transcription processes
  • 7:30 - 7:33
    is that despite the fact that this enzyme
  • 7:33 - 7:35
    is quite complex in its structure,
  • 7:36 - 7:39
    it turns out to be an
    enzyme that nevertheless
  • 7:39 - 7:42
    needs a lot of help to do its job.
  • 7:42 - 7:45
    So, on its own, this RNA polymerase II
  • 7:45 - 7:49
    cannot tell the difference
    between the non-coding regions
  • 7:49 - 7:52
    of the genome and places where
    it's supposed to be coding
  • 7:52 - 7:57
    or reading to make the
    appropriate messenger RNAs.
  • 7:57 - 8:00
    So, this sort of leads you
    to think that there must be
  • 8:00 - 8:05
    a number of other factors
    that somehow direct
  • 8:05 - 8:08
    RNA polymerase to the right
    place at the right time
  • 8:08 - 8:11
    in the genome of every cell in your body
  • 8:11 - 8:14
    so that the right products get made
  • 8:14 - 8:17
    so each cell in your body
    is functioning properly.
  • 8:18 - 8:22
    And this is where things
    get really interesting
  • 8:22 - 8:25
    is some 25, 30 years ago,
  • 8:25 - 8:29
    a number of laboratories took on the job
  • 8:29 - 8:33
    of hunting for these elusive
    and, as it turned out,
  • 8:33 - 8:35
    specialized protein factors
  • 8:35 - 8:39
    that recognize these little
    stretches of DNA sequences
  • 8:39 - 8:40
    that I've been telling you about
  • 8:40 - 8:42
    that make up the vast majority
  • 8:42 - 8:45
    of the non-coding part of the genome.
  • 8:45 - 8:48
    And how these proteins then can recognize
  • 8:48 - 8:51
    and ultimately physically interact
  • 8:51 - 8:54
    with these little bits
    of genetic information
  • 8:54 - 8:58
    to then turn genes on or off.
  • 8:58 - 9:00
    Now, in this lecture,
  • 9:00 - 9:04
    I can't go into all the details
    of the types of experiments
  • 9:04 - 9:08
    or the ranges of experiments
    that many, many laboratories
  • 9:08 - 9:09
    have done over the last two decades
  • 9:09 - 9:13
    to finally work out this molecular puzzle
  • 9:13 - 9:15
    of how transcription works.
  • 9:15 - 9:17
    But I can tell you that
    there are fundamentally
  • 9:17 - 9:19
    two major approaches that have been taken
  • 9:19 - 9:24
    over the last few decades
    to kind of get a parts list
  • 9:25 - 9:27
    of the machinery that decodes the genome
  • 9:27 - 9:30
    and carries out the
    process of transcription.
  • 9:30 - 9:32
    One is kind of the old style,
  • 9:32 - 9:35
    I'll call it bucket biochemistry
  • 9:35 - 9:39
    or take a live cell, crush it up,
  • 9:39 - 9:42
    spread out all of its parts
    and then try to figure out
  • 9:42 - 9:43
    how to put it back together again,
  • 9:43 - 9:46
    that's what I call in vitro biochemistry.
  • 9:46 - 9:48
    And the other one is in vivo genetics
  • 9:48 - 9:51
    where you effectively use genetic tools,
  • 9:51 - 9:55
    mutagenesis to go in there
    and selectively remove
  • 9:55 - 9:59
    or knock down or knock out
    certain genes and gene products
  • 9:59 - 10:02
    and then ask what is the
    consequence on that cell
  • 10:02 - 10:03
    or that organism?
  • 10:03 - 10:08
    Both of these technologies
    are very powerful
  • 10:08 - 10:13
    and highly complementary,
    and they continue to be used.
  • 10:13 - 10:16
    Today, I will focus primarily
  • 10:16 - 10:19
    on the in vitro biochemical techniques
  • 10:19 - 10:23
    which led us to the discovery
    of the first few classes
  • 10:23 - 10:24
    of transcription factors.
  • 10:24 - 10:26
    And in subsequent lectures,
  • 10:26 - 10:30
    we'll go to more recent technologies
  • 10:30 - 10:33
    that allow us to sort of
    speed up this whole process
  • 10:33 - 10:35
    of identifying key regulatory molecules
  • 10:35 - 10:37
    and how they work.
  • 10:39 - 10:43
    So, let's go back to the
    sort of the basic unit
  • 10:43 - 10:45
    of gene expression, which is a gene,
  • 10:45 - 10:49
    here shown in the orange arrow,
  • 10:49 - 10:52
    and the non-coding
    sequences surrounding it.
  • 10:52 - 10:55
    And you'll see that now I've
    added a few more elements
  • 10:55 - 10:56
    to this purple DNA.
  • 10:56 - 10:59
    You see some symbols, a blue square,
  • 10:59 - 11:03
    a round circle that's pink,
    and then a yellow triangle.
  • 11:03 - 11:07
    Those are just a way for
    me to graphically represent
  • 11:07 - 11:09
    the little bits of DNA sequences
  • 11:09 - 11:11
    that I told you about that
    are the regulatory sequences.
  • 11:11 - 11:15
    So, the little round one
    happens to be very GC-rich,
  • 11:15 - 11:18
    the triangle one is a classical element
  • 11:18 - 11:19
    that's called a TATA box,
  • 11:19 - 11:21
    I'll tell you about a little bit later.
  • 11:21 - 11:24
    And the blue one is yet
    another recognition element.
  • 11:24 - 11:27
    So, why are we so interested
    in these little stretches
  • 11:27 - 11:30
    of nucleic acid sequence in the genome
  • 11:30 - 11:33
    when it's buried amongst
    billions of other sequences?
  • 11:33 - 11:36
    Well, these individual little sequences
  • 11:36 - 11:39
    turn out to be very important
    because of where they sit,
  • 11:39 - 11:42
    you'll notice they're sitting
    near the top of the arrow,
  • 11:42 - 11:46
    and they are recognized
    by very special proteins
  • 11:46 - 11:48
    which are the transcription factors.
  • 11:48 - 11:51
    So, now I'm showing you some symbols
  • 11:51 - 11:54
    with little cutouts which
    fit into either the square,
  • 11:54 - 11:56
    the circle, or the triangle.
  • 11:56 - 11:59
    So, transcription factors,
  • 11:59 - 12:03
    at least one major family
    of transcription factors,
  • 12:03 - 12:07
    are proteins whose
    three-dimensional structure
  • 12:07 - 12:10
    is folded into a shape that
    allows them to recognize
  • 12:10 - 12:13
    these short stretches
    of double-stranded DNA.
  • 12:14 - 12:16
    In fact, largely through interactions
  • 12:16 - 12:17
    with the major groove of DNA,
  • 12:17 - 12:20
    and I'll show you a structure
    of one in a little bit.
  • 12:21 - 12:24
    So, now it turns out
    that there are probably
  • 12:24 - 12:27
    thousands of these transcription factors
  • 12:27 - 12:29
    because the number of genes
    that we have to control,
  • 12:29 - 12:33
    as I showed you, is in the
    order of 20 or 25,000 genes.
  • 12:33 - 12:37
    And so, it turns out that you
    need a pretty large percentage
  • 12:37 - 12:41
    of the genome devoted to encoding
    these regulatory proteins
  • 12:41 - 12:45
    in order for a complex organism
    like ourselves to survive.
  • 12:45 - 12:47
    Then the other component of this,
  • 12:47 - 12:50
    let's call it the
    transcriptional apparatus,
  • 12:50 - 12:52
    is, of course, the enzyme
    that catalyzes RNA synthesis.
  • 12:52 - 12:56
    And I already told you
    that this enzyme on its own
  • 12:56 - 12:59
    can't tell the difference
    between random DNA sequence
  • 12:59 - 13:01
    and a gene or a promoter.
  • 13:01 - 13:06
    These other sequence-specific
    DNA-binding proteins
  • 13:06 - 13:09
    are the ones that must recruit
  • 13:09 - 13:11
    or otherwise direct RNA polymerase
  • 13:11 - 13:16
    to essentially land on the right
    place and at the right time
  • 13:16 - 13:19
    in the genome to turn on
    a certain subset of genes
  • 13:19 - 13:24
    that are specifically required
    in a specialized cell type,
  • 13:24 - 13:26
    whatever cell you happen to be looking at.
  • 13:26 - 13:30
    So, that is kind of the
    first level of complexity
  • 13:30 - 13:33
    of sort of informational interactions
  • 13:33 - 13:35
    between the transcription factors
  • 13:35 - 13:38
    and the more ubiquitous,
  • 13:38 - 13:42
    and I would call it promiscuous
    RNA polymerase II enzyme.
  • 13:44 - 13:45
    Well, as it turns out,
  • 13:45 - 13:49
    it took several decades to work out
  • 13:49 - 13:53
    most if not all of the components
  • 13:53 - 13:56
    of this so-called
    transcriptional machinery.
  • 13:56 - 14:01
    And it turns out in this
    slide I'm showing you
  • 14:01 - 14:03
    things are already starting
    to get more complicated.
  • 14:03 - 14:05
    So, not only do you have RNA polymerase,
  • 14:05 - 14:08
    but you have a bunch of other
    proteins that go by names
  • 14:08 - 14:11
    like TFIIA, B,
  • 14:11 - 14:13
    you know, D, E, H, F, and so forth.
  • 14:13 - 14:16
    So, it looks like there
    are going to be many,
  • 14:16 - 14:18
    many proteins that are necessary
  • 14:18 - 14:22
    to form the transcriptional apparatus.
  • 14:22 - 14:23
    And then on top of that
  • 14:23 - 14:26
    you need sequence-specific
    DNA-binding proteins
  • 14:26 - 14:29
    which I already described to you
  • 14:29 - 14:33
    to further inform or
    otherwise regulate the process
  • 14:33 - 14:36
    of when a particular
    RNA polymerase molecule
  • 14:36 - 14:38
    should be binding to a particular gene.
  • 14:38 - 14:40
    So, that's the sort of overview,
  • 14:40 - 14:42
    now let me get into the specifics
  • 14:42 - 14:45
    and how did we actually discover
    this family of proteins.
  • 14:45 - 14:47
    And it'll be interesting for you to see
  • 14:47 - 14:51
    how science in this field evolved.
  • 14:51 - 14:54
    Now, as is often the case
  • 14:54 - 14:57
    when you first try to tackle
    a very complex problem,
  • 14:57 - 14:59
    and, of course, we didn't
    really know how complex it was
  • 14:59 - 15:01
    when we began these studies,
  • 15:01 - 15:03
    but we assumed it might be complicated,
  • 15:03 - 15:07
    certainly would be more
    complicated than systems
  • 15:07 - 15:09
    that we had already had some idea about,
  • 15:09 - 15:14
    for example, in bacteria
    or in bacteriophages.
  • 15:14 - 15:17
    We took a lesson from our
    studies of bacteriophages
  • 15:17 - 15:20
    and decided that to begin to dissect
  • 15:20 - 15:22
    the molecular complexities
  • 15:22 - 15:25
    of the transcription
    process in animal cells,
  • 15:25 - 15:27
    we should start with viruses
  • 15:27 - 15:31
    because we knew that viruses
    will enter these host cells,
  • 15:31 - 15:34
    these complex cells that
    we ultimately want to study
  • 15:34 - 15:36
    and have to use the
    same molecular machinery
  • 15:36 - 15:39
    to transcribe their genes
  • 15:39 - 15:42
    as the host mammalian cell would do.
  • 15:42 - 15:44
    So, this was kind of a trick
  • 15:44 - 15:47
    or a way to look at a molecular window
  • 15:47 - 15:50
    into a complex system
    and try to simplify it.
  • 15:50 - 15:51
    And in our case,
  • 15:51 - 15:55
    the early studies of the
    late '70s and early '80s
  • 15:55 - 15:56
    involved very simple,
  • 15:56 - 15:59
    one of the simplest
    double-stranded DNA viruses
  • 15:59 - 16:01
    called simian virus 40.
  • 16:01 - 16:03
    And simian virus 40, of
    course, is a monkey virus,
  • 16:03 - 16:06
    which was nice because
    it's very close to humans
  • 16:06 - 16:08
    and many things that we could learn
  • 16:08 - 16:11
    about the way this virus uses its host,
  • 16:11 - 16:13
    which are monkey cells, to replicate
  • 16:13 - 16:16
    and to express their RNAs and genes
  • 16:16 - 16:20
    would be applicable to our
    studies of humans, as you'll see.
  • 16:20 - 16:23
    And this virus was one of the first
  • 16:23 - 16:28
    whose DNA, its double-stranded
    DNA of about 5,000 base pairs
  • 16:28 - 16:29
    was fully sequenced.
  • 16:29 - 16:33
    This was long before
    rapid modern day sequencing
  • 16:33 - 16:36
    was available, so this gave
    us a very powerful tool.
  • 16:36 - 16:38
    It basically allowed us to
    look at the entire genome
  • 16:38 - 16:41
    of this virus, which
    was tiny by comparison,
  • 16:41 - 16:45
    only 5,243 base pairs.
  • 16:45 - 16:48
    But just that information
    was already very important
  • 16:48 - 16:50
    'cause it very quickly allowed us,
  • 16:50 - 16:53
    for example, to map where the genes are.
  • 16:53 - 16:56
    And one of the genes encoded a protein
  • 16:56 - 16:58
    called a tumor antigen,
  • 16:58 - 17:00
    which turns out to be
    a transcription factor.
  • 17:00 - 17:03
    This then allowed us to get our hands
  • 17:03 - 17:06
    basically to do biochemistry and genetics
  • 17:06 - 17:09
    on the very first eukaryotic
    transcription factor,
  • 17:09 - 17:12
    which in this case
    happens to be a repressor.
  • 17:12 - 17:15
    That is a protein that
    when it binds the DNA
  • 17:15 - 17:19
    just the same way as I showed
    you for the model case,
  • 17:20 - 17:24
    it binds through specific
    protein-DNA interactions.
  • 17:24 - 17:27
    But in this case, actually
    shuts transcription down
  • 17:27 - 17:28
    rather than turn it up.
  • 17:30 - 17:34
    In the process of studying
    the way that this little virus
  • 17:34 - 17:36
    when it infects a mammalian cell
  • 17:36 - 17:39
    uses proteins like T-antigen
  • 17:39 - 17:43
    to regulate its gene expression,
  • 17:43 - 17:46
    it became clear that it had
    to use the host machinery
  • 17:46 - 17:48
    to do the process.
  • 17:48 - 17:53
    And that meant that there
    must be monkey proteins
  • 17:53 - 17:55
    that are also involved in activating
  • 17:55 - 17:58
    or repressing genes of this virus.
  • 17:58 - 18:01
    And this then led us to
    the most important step,
  • 18:01 - 18:04
    which is to transfer the
    technology we learned about viruses
  • 18:04 - 18:07
    and how to work with the
    virus transcription factor
  • 18:07 - 18:09
    like T-antigen to the cellular ones.
  • 18:09 - 18:11
    And I'm gonna give you just one example
  • 18:11 - 18:14
    of how the simple jump into the host cell
  • 18:14 - 18:18
    allowed us to discover the first
    human transcription factor.
  • 18:18 - 18:21
    So, the question that we then asked
  • 18:21 - 18:26
    back in the early 1980s
    was what host molecule
  • 18:26 - 18:29
    is regulating the expression
    of transcription of this virus
  • 18:29 - 18:31
    when the virus is in the host?
  • 18:31 - 18:34
    And we knew from the DNA
    sequence of the virus
  • 18:34 - 18:39
    that there were these six
    very GC-rich snippets of DNA
  • 18:40 - 18:42
    that were regulatory
    'cause if we deleted them,
  • 18:42 - 18:46
    the virus no longer would
    express the gene of interest.
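The idea of locating short GC-rich regulatory snippets in a sequenced genome can be sketched as a simple motif scan. This is a minimal illustration that is not from the lecture: it assumes the commonly cited GC-box core `GGGCGG` (the lecture does not spell out the sequence), the function names are hypothetical, and the test sequence is a made-up toy string, not SV40 DNA.

```python
# Assumed GC-box core motif; the lecture only describes these sites as "very GC-rich".
GC_BOX = "GGGCGG"

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_gc_boxes(seq: str) -> list[tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for GC-box hits on either strand.

    A "-" hit means the motif lies on the opposite strand, so its reverse
    complement appears in the sequence as written.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", GC_BOX), ("-", reverse_complement(GC_BOX))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)  # allow overlapping hits
    return sorted(hits)


if __name__ == "__main__":
    toy = "ATATGGGCGGTTTTCCGCCCAT"  # hypothetical sequence for illustration
    print(find_gc_boxes(toy))  # [(4, '+'), (14, '-')]
```

Finding such a site by sequence only says where a factor could bind; the deletion experiments described above are what showed the sites were actually needed for expression.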
  • 18:46 - 18:48
    So, we knew that something
    was probably responsible
  • 18:48 - 18:51
    for recognizing these GC boxes,
  • 18:51 - 18:54
    and we knew that it wasn't
    a virally encoded gene
  • 18:54 - 18:57
    because we had tested
    all of the viral genes
  • 18:57 - 18:59
    of which there were
    only six to begin with.
  • 18:59 - 19:01
    So, we knew it had to be a host gene
  • 19:01 - 19:05
    and that led us to a whole, I would say,
  • 19:05 - 19:08
    family of experiments
    that led to the discovery
  • 19:08 - 19:11
    of sequence-specific mammalian
    transcription factors.
  • 19:11 - 19:14
    And as I said, we could have
    taken multiple approaches
  • 19:14 - 19:17
    to try to address this complicated issue.
  • 19:17 - 19:19
    I'll just give you one example
  • 19:19 - 19:21
    of using in vitro biochemistry
  • 19:21 - 19:25
    to finally get our hands
    on this key sequence-specific
  • 19:25 - 19:27
    human transcription factor,
  • 19:27 - 19:31
    which, of course, has a
    homologue in the monkey.
  • 19:32 - 19:35
    And the way we did it was very interesting
  • 19:35 - 19:37
    and simple in retrospect,
  • 19:37 - 19:39
    and that is recognizing the fact
  • 19:39 - 19:42
    that whatever this protein was,
  • 19:42 - 19:45
    it had to have the property of recognizing
  • 19:45 - 19:50
    those GC boxes that were sitting
    next to the viral gene.
  • 19:50 - 19:52
    We assumed that it must
    be a sequence-specific
  • 19:52 - 19:54
    DNA-binding protein, so all we had to do
  • 19:54 - 19:58
    was figure out a way to extract proteins
  • 19:58 - 20:01
    from human cells or monkey cells
  • 20:01 - 20:05
    and then try to fish out
    those specific proteins
  • 20:05 - 20:07
    out of the many thousands
    of different proteins
  • 20:07 - 20:10
    that were in this gemisch
    of cellular extract
  • 20:10 - 20:12
    that would be responsible
    for discriminating
  • 20:12 - 20:17
    between random DNA sequences
    and the specific GC box.
  • 20:17 - 20:21
    And I'll quickly run through
    sort of the logic behind this.
  • 20:21 - 20:25
    So, what I'm showing you
    here is a solid surface
  • 20:25 - 20:29
    with DNA coupled to it
    that is highly enriched
  • 20:29 - 20:31
    for the recognition element, the GC box,
  • 20:31 - 20:33
    which should be the sequence
  • 20:33 - 20:35
    recognized by the protein of interest.
  • 20:35 - 20:37
    Now, we had no idea what this
    protein was gonna look like,
  • 20:37 - 20:40
    how many proteins there
    were gonna be, and so forth,
  • 20:40 - 20:42
    but we knew it had to
    recognize the GC box.
  • 20:42 - 20:45
    So, we're gonna try to
    fish this out of a pool
  • 20:45 - 20:47
    of many thousands of other proteins.
  • 20:47 - 20:50
    Now, the key trick here
  • 20:50 - 20:52
    was that because all cell extracts
  • 20:52 - 20:55
    contain not only one DNA-binding protein,
  • 20:55 - 20:57
    but, as I told you, thousands of different
  • 20:57 - 20:58
    DNA-binding proteins.
  • 20:58 - 21:01
    But most of them, or in fact in our case,
  • 21:01 - 21:05
    none of the other of several
    hundred to a thousand proteins
  • 21:05 - 21:09
    that could bind DNA actually
    happen to recognize the GC box,
  • 21:09 - 21:11
    they just bind other DNA sequences.
  • 21:11 - 21:14
    So, to kind of favor our protein
  • 21:14 - 21:16
    being able to bind to our GC box
  • 21:16 - 21:19
    and not have to compete
    with all the other proteins,
  • 21:19 - 21:23
    what we did was to add non-specific DNA
  • 21:23 - 21:27
    in vast stoichiometric excess
  • 21:27 - 21:30
    so that all the other proteins
    that wouldn't recognize
  • 21:30 - 21:33
    the GC box would still have
    some partner to hang onto.
  • 21:33 - 21:35
    And this trick worked very well.
  • 21:35 - 21:40
    So, having the specific
    DNA on the solid resin
  • 21:40 - 21:44
    and the non-specific DNA
    flowing all over the place,
  • 21:44 - 21:48
    we could capture selectively
    the pink molecules here,
  • 21:48 - 21:50
    which are the GC box recognition ones,
  • 21:50 - 21:53
    and the blue-green molecules,
  • 21:53 - 21:56
    of course, predominantly
    bind to non-specific DNA.
  • 21:56 - 21:58
    I show you one little
    blue one on the column
  • 21:58 - 22:01
    because nothing works
    perfectly in real science
  • 22:01 - 22:04
    and tells you that we have
    to go through this process
  • 22:04 - 22:08
    iteratively to actually
    finally obtain a preparation
  • 22:08 - 22:12
    that's purely pink molecules
    with no green-blue ones.
  • 22:12 - 22:14
    Well, that turned out
    to work very, very well.
  • 22:14 - 22:18
    And that whole process of
    biochemical fractionation
  • 22:18 - 22:23
    followed by a direct affinity
    sequence-specific DNA resin
  • 22:24 - 22:28
    gave us the ability to perform
    a biochemical purification
  • 22:28 - 22:32
    followed by a molecular cloning
    of the transcription factor
  • 22:32 - 22:35
    that encodes the protein Sp1.
  • 22:35 - 22:37
    And then we carried out
    a bunch of experiments,
  • 22:37 - 22:39
    which I'll tell you about next,
  • 22:39 - 22:40
    to show that this protein
  • 22:40 - 22:42
    actually does activate transcription.
  • 22:44 - 22:46
    And of course, we went back and
    we proved that this protein,
  • 22:46 - 22:49
    which turned out to be a
    rather large polypeptide,
  • 22:49 - 22:52
    can indeed recognize the GC box.
  • 22:52 - 22:56
    And it doesn't matter if it's
    a GC box from the SV40 genome
  • 22:56 - 22:59
    or any other GC box that we
    could find in the human genome,
  • 22:59 - 23:02
    it would find that sequence and bind to it
  • 23:02 - 23:06
    and then it would generally
    activate transcription.
  • 23:06 - 23:08
    So, this led to the discovery of the first
  • 23:08 - 23:11
    of a very large family
  • 23:11 - 23:14
    of sequence-specific DNA-binding proteins.
  • 23:14 - 23:16
    Now, I told you that
    the way these proteins
  • 23:16 - 23:19
    tend to recognize short DNA sequences
  • 23:19 - 23:22
    is to interact with DNA
    through the major groove.
  • 23:22 - 23:23
    And here's a perfect example.
  • 23:23 - 23:25
    So, the thick blue model there
  • 23:25 - 23:29
    shows the actual three structures
  • 23:29 - 23:30
    that are called zinc fingers.
  • 23:30 - 23:31
    And the reason they're called zinc fingers
  • 23:31 - 23:35
    is because there are amino
    acids that are organized
  • 23:35 - 23:38
    around a center that
    contains a zinc ion
  • 23:38 - 23:41
    which holds the three-dimensional
    shape of the polypeptide
  • 23:41 - 23:44
    in a position just right
  • 23:44 - 23:46
    for fitting into the
    major groove of the DNA.
  • 23:46 - 23:48
    And the DNA here is shown in pink,
  • 23:48 - 23:50
    and you can see that that blue outline
  • 23:50 - 23:53
    fits right into the
    major groove of the DNA,
  • 23:53 - 23:55
    but not to the minor groove.
  • 23:55 - 23:58
    And one of the most important findings
  • 23:58 - 23:59
    was not only the discovery
  • 23:59 - 24:01
    of the first human transcription factor,
  • 24:01 - 24:05
    but the realization that most
    if not all sequence-specific
  • 24:05 - 24:07
    DNA-binding transcription factors
  • 24:07 - 24:09
    have a similar structural motif.
  • 24:09 - 24:13
    That is to say some structure
    is built to recognize
  • 24:13 - 24:16
    sequences in the major groove of DNA.
  • 24:16 - 24:19
    And these three-dimensional motifs
  • 24:19 - 24:24
    are recognizable as amino
    acid sequences in the genome.
  • 24:24 - 24:28
    So, we can now much more
    quickly scan the entire sequence
  • 24:28 - 24:30
    of a genome and identify genes
  • 24:30 - 24:32
    that are likely to be DNA-binding proteins
  • 24:32 - 24:35
    as a result of understanding
    the structure-function
  • 24:35 - 24:39
    relationships of these DNA-binding
    motifs like zinc fingers.
  • 24:40 - 24:43
    So, what I'd like to show you now
  • 24:43 - 24:46
    is that I've only
    introduced you to one class
  • 24:46 - 24:48
    of transcription factors,
  • 24:48 - 24:51
    which are the sequence-specific
    DNA-binding proteins.
  • 24:51 - 24:54
    Well, I think I gave you a little taste
  • 24:54 - 24:55
    of the level of complexity
  • 24:55 - 24:57
    that's probably going to be needed
  • 24:57 - 25:00
    to be able to build the machine
  • 25:00 - 25:03
    that's ultimately going
    to be able to allow you
  • 25:03 - 25:07
    to transcribe every gene in
    every cell of a human body.
  • 25:07 - 25:10
    So, that turns out to be a
    much more elaborate machine
  • 25:10 - 25:12
    than what I just showed you.
  • 25:12 - 25:14
    So, I wanna show you now
  • 25:14 - 25:17
    what is sort of our
    state-of-the-art thinking
  • 25:17 - 25:21
    about what is actually
    needed to build the machinery
  • 25:21 - 25:25
    at a gene to allow it to be
    expressed and transcribed.
  • 25:25 - 25:28
    And the term I want to introduce you to
  • 25:28 - 25:31
    is the pre-initiation complex.
  • 25:31 - 25:33
    And it's pretty much what it says.
  • 25:33 - 25:36
    It's the complex of multiple subunits
  • 25:36 - 25:41
    that has to essentially land
    on the promoter of a gene
  • 25:41 - 25:44
    which will be designated
    for later expression.
  • 25:45 - 25:50
    And this is a process that
    is probably quite orderly,
  • 25:50 - 25:52
    that is, there's an order
    of events that happens,
  • 25:52 - 25:55
    which we, by the way,
    are not entirely sure
  • 25:55 - 25:57
    exactly what the order
    is or even if the order
  • 25:57 - 25:59
    is the same from one gene to the next,
  • 25:59 - 26:02
    but we can kind of see where
    it starts and where it ends up.
  • 26:02 - 26:04
    And the pathway in between,
  • 26:04 - 26:07
    I would say is still a little bit murky.
  • 26:07 - 26:10
    And the story here again starts
    with a little snippet of DNA
  • 26:10 - 26:11
    called the TATA box,
  • 26:11 - 26:14
    which I already introduced you to briefly.
  • 26:14 - 26:19
    It's an AT-rich sequence which
    sits at the five prime end
  • 26:19 - 26:21
    or the beginning of many
    genes, but not all genes,
  • 26:21 - 26:25
    maybe 20% of the genes might
    contain this AT-rich region.
  • 26:26 - 26:30
    And that AT sequence is the signal
  • 26:30 - 26:32
    or a landmark, if you like,
  • 26:32 - 26:34
    for a particular protein to bind to it.
  • 26:34 - 26:36
    And that protein is called,
  • 26:36 - 26:38
    not surprisingly, the TATA-binding protein
  • 26:38 - 26:40
    'cause it's the TATA sequence.
  • 26:40 - 26:44
    And so, this represents a second class
  • 26:44 - 26:45
    of transcription factors.
  • 26:45 - 26:48
    These are not the type that
    I just introduced you to,
  • 26:48 - 26:51
    which are gonna be
    different for every gene.
  • 26:51 - 26:52
    The TATA sequence is present
  • 26:52 - 26:54
    in a very large number of genes,
  • 26:54 - 26:57
    so it can't be gene specific,
  • 26:57 - 26:59
    but it turns out to be very crucial
  • 26:59 - 27:02
    for our understanding of
    how gene regulation works.
  • 27:02 - 27:06
    So, you start with
    a TATA-binding protein
  • 27:06 - 27:08
    finding a TATA box.
  • 27:08 - 27:10
    We later found out that
    the TATA-binding protein
  • 27:10 - 27:14
    rarely functions on its own
    and has a bunch of friends
  • 27:14 - 27:17
    that we call TAFs or
    TBP associated factors.
  • 27:17 - 27:19
    And now you're talking about an assembly
  • 27:19 - 27:23
    of multi-subunit complex of
    almost a million daltons.
  • 27:23 - 27:26
    There are somewhere
    between 12 to 15 subunits
  • 27:26 - 27:28
    in addition to the TATA-binding protein
  • 27:28 - 27:31
    that make up this little
    complex of proteins
  • 27:31 - 27:33
    that kind of travels around together.
  • 27:33 - 27:36
    And this is found in most cell types,
  • 27:36 - 27:39
    and later on I'll show you
    in a subsequent lecture
  • 27:39 - 27:41
    that not every cell type
  • 27:41 - 27:44
    might have exactly the same
    complement of these subunits,
  • 27:44 - 27:48
    but many of them have
    this prototypic complex.
  • 27:48 - 27:52
    Is this enough for building
    the pre-initiation complex?
  • 27:52 - 27:54
    Unfortunately not.
  • 27:54 - 27:58
    It turns out that there
    are a host of other,
  • 27:58 - 28:00
    I'll call them ancillary factors
  • 28:00 - 28:03
    in addition to the multi-subunit
    RNA polymerase itself
  • 28:03 - 28:08
    that are necessary for you
    to build up an ensemble
  • 28:08 - 28:12
    that is necessary to form an active,
  • 28:12 - 28:16
    ready-to-activate transcriptional
    pre-initiation complex
  • 28:16 - 28:17
    or the PIC.
  • 28:20 - 28:24
    And this is kind of the
    picture we're getting to,
  • 28:24 - 28:26
    and even this picture
    with many, many colors
  • 28:26 - 28:29
    and many, many different polypeptides,
  • 28:29 - 28:30
    you know, that adds up to probably greater
  • 28:30 - 28:34
    than 85 individual proteins
  • 28:34 - 28:37
    that all have to kind of fit
    together like a jigsaw puzzle.
  • 28:37 - 28:39
    It's probably not even the whole story,
  • 28:39 - 28:42
    you'll notice I still have one
    big red question mark there
  • 28:42 - 28:47
    because I think as we begin
    to study specific cell types
  • 28:47 - 28:50
    and specific processes
    like embryonic development
  • 28:50 - 28:53
    or germ layer formation,
  • 28:53 - 28:56
    additional components
    that are not present here
  • 28:56 - 28:58
    in this prototypic pre-initiation complex
  • 28:58 - 29:00
    will come into play,
  • 29:00 - 29:03
    and that's a subject
    of a subsequent lecture.
  • 29:03 - 29:06
    But already you can tell that
    the transcriptional machinery
  • 29:06 - 29:09
    is anything but simple.
  • 29:10 - 29:13
    So, can we get a better
    idea of what transcription
  • 29:13 - 29:16
    might actually look like
    and what's happening
  • 29:16 - 29:18
    when a transcription process takes place?
  • 29:18 - 29:22
    So, let me first of all say
    that I'm gonna finish my lecture
  • 29:22 - 29:24
    now with a little cartoon,
  • 29:24 - 29:29
    which is our attempt to imagine
  • 29:29 - 29:31
    the events that take place
  • 29:31 - 29:33
    when you form a pre-initiation complex,
  • 29:33 - 29:37
    you bring regulatory proteins
    to the activated gene
  • 29:37 - 29:40
    and what happens during this process.
  • 29:40 - 29:44
    Now, keep in mind that
    this is at this point
  • 29:44 - 29:48
    mostly a cartoon that
    is in our imagination
  • 29:48 - 29:53
    and only parts, if any,
    of this are probably real,
  • 29:53 - 29:56
    but it gives you a sense of the complexity
  • 29:56 - 29:59
    of the transactions
    that have to take place
  • 29:59 - 30:02
    just for one gene to
    transcribe and express itself.
  • 30:02 - 30:04
    So, let me show you the movie,
  • 30:04 - 30:07
    and then we'll finish
    just by keeping in mind
  • 30:07 - 30:10
    that there's much to be learned.
  • 30:10 - 30:13
    And in my next lecture,
    we'll go into the selectivity
  • 30:13 - 30:17
    of this process in specialized cell types.
  • 30:17 - 30:20
    So, now let's see what
    this sort of cartoon
  • 30:20 - 30:22
    of transcription looks like.
  • 30:22 - 30:24
    So, we start off with DNA
  • 30:24 - 30:27
    with some preassembled TFIID molecule,
  • 30:27 - 30:29
    and along comes this other green molecule,
  • 30:29 - 30:31
    which is actually a co-factor,
  • 30:31 - 30:33
    which then forms this very large complex
  • 30:33 - 30:34
    with RNA polymerase.
  • 30:34 - 30:38
    And then a distal
    activator protein came in
  • 30:38 - 30:39
    and activated the process.
  • 30:39 - 30:44
    And this molecule, this bluish
    molecule that's moved away
  • 30:45 - 30:48
    from the complex is
    actually the RNA polymerase.
  • 30:48 - 30:52
    And that little yellow
    sort of bead on a string
  • 30:52 - 30:54
    is actually the RNA product.
  • 30:54 - 30:58
    So, that gives you a sense that
    things have to happen quickly
  • 30:58 - 31:00
    and yet it involves many, many molecules
  • 31:00 - 31:02
    having to assemble and then disassemble
  • 31:02 - 31:04
    for this reaction to happen.
  • 31:04 - 31:07
    And in my next lecture,
  • 31:07 - 31:10
    we'll go into more specific
    aspects of this reaction,
  • 31:10 - 31:13
    and particularly during
    embryonic development
  • 31:13 - 31:16
    and tissue-specific gene expression.
Title:
Robert Tjian (Berkeley/HHMI) Part 1: Gene regulation: An introduction
Video Language:
English
Duration:
31:29
