1 00:00:00,240 --> 00:00:03,720 - My name is Bob Tjian, I'm a Professor of MCB 2 00:00:03,720 --> 00:00:06,000 at the University of California at Berkeley, 3 00:00:06,000 --> 00:00:10,020 and I'm also serving as the President of the Howard Hughes. 4 00:00:10,020 --> 00:00:13,440 I'm going to spend the next 25 or 30 minutes 5 00:00:13,440 --> 00:00:15,870 telling you about some fundamentals 6 00:00:15,870 --> 00:00:19,620 of one of the most important molecular processes 7 00:00:19,620 --> 00:00:23,490 in living cells, which is the expression of genes 8 00:00:23,490 --> 00:00:25,593 through a process called transcription. 9 00:00:26,610 --> 00:00:31,610 Now, first, to understand what gene expression means, 10 00:00:32,190 --> 00:00:36,360 you have to have a sense of what we tend to refer to 11 00:00:36,360 --> 00:00:39,900 in the field as the central dogma of molecular biology. 12 00:00:39,900 --> 00:00:41,280 Another way to think about this 13 00:00:41,280 --> 00:00:45,877 is the flow of biological information from DNA, 14 00:00:46,950 --> 00:00:48,480 in other words, our chromosomes, 15 00:00:48,480 --> 00:00:50,523 which every cell has its complement, 16 00:00:52,020 --> 00:00:56,730 to be transcribed into a sister molecule called RNA. 17 00:00:56,730 --> 00:00:59,940 So, this process of converting DNA into RNA 18 00:00:59,940 --> 00:01:01,800 is called transcription, 19 00:01:01,800 --> 00:01:05,370 and that is the topic of this lecture. 20 00:01:05,370 --> 00:01:08,760 This process is very complicated, 21 00:01:08,760 --> 00:01:11,640 as you'll see by the end of my two lectures, 22 00:01:11,640 --> 00:01:13,260 and it is very important 23 00:01:13,260 --> 00:01:18,210 for many, many fundamental processes in biology. 24 00:01:18,210 --> 00:01:20,910 So, what I'm gonna spend today's lecture on 25 00:01:20,910 --> 00:01:24,840 is the discovery of a large family 26 00:01:24,840 --> 00:01:27,030 of transcription proteins. 
27 00:01:27,030 --> 00:01:31,380 These are factors, as we call them, that are key molecules 28 00:01:31,380 --> 00:01:35,370 that regulate the use of genetic information 29 00:01:35,370 --> 00:01:37,863 that has been encoded in the genome. 30 00:01:38,730 --> 00:01:41,670 Now, transcription factors or proteins 31 00:01:41,670 --> 00:01:46,050 are involved in many fundamental aspects of biology, 32 00:01:46,050 --> 00:01:48,270 including embryonic development, 33 00:01:48,270 --> 00:01:51,660 cellular differentiation, and cell fate. 34 00:01:51,660 --> 00:01:55,140 In other words, pretty much what your cells are doing, 35 00:01:55,140 --> 00:01:56,400 how a tissue works, 36 00:01:56,400 --> 00:01:59,970 and how an organism survives and reproduces 37 00:01:59,970 --> 00:02:03,480 is dependent on the process of gene expression. 38 00:02:03,480 --> 00:02:06,873 And the first step in this process is transcription. 39 00:02:09,150 --> 00:02:11,760 Now, there are many other reasons 40 00:02:11,760 --> 00:02:14,940 why a large group of people and scientists 41 00:02:14,940 --> 00:02:16,410 are interested in transcription, 42 00:02:16,410 --> 00:02:19,350 and another reason is that understanding 43 00:02:19,350 --> 00:02:21,750 the fundamental molecular mechanisms 44 00:02:21,750 --> 00:02:24,990 that control transcription in humans 45 00:02:24,990 --> 00:02:26,560 or in any other organism 46 00:02:27,420 --> 00:02:31,800 can inform us and teach us about what happens 47 00:02:31,800 --> 00:02:34,860 when something goes wrong, for example, in diseases. 48 00:02:34,860 --> 00:02:38,100 And I list here just a few diseases 49 00:02:38,100 --> 00:02:40,740 that we could study as a result 50 00:02:40,740 --> 00:02:43,380 of understanding the structure and function 51 00:02:43,380 --> 00:02:45,900 of these transcription factor proteins 52 00:02:45,900 --> 00:02:48,240 that I'm going to be telling you about. 
53 00:02:48,240 --> 00:02:50,550 And of course, the hope is that in understanding 54 00:02:50,550 --> 00:02:54,780 the molecular underpinnings of complex diseases like cancer, 55 00:02:54,780 --> 00:02:57,900 diabetes, Parkinson's, and so forth, 56 00:02:57,900 --> 00:03:01,120 that we will be able to develop and use 57 00:03:02,130 --> 00:03:05,040 better, more specific therapeutic drugs 58 00:03:05,040 --> 00:03:07,890 and also to develop more accurate 59 00:03:07,890 --> 00:03:10,200 and rapid diagnostic tools. 60 00:03:10,200 --> 00:03:11,820 So, those are a couple of the reasons 61 00:03:11,820 --> 00:03:14,070 why many of us have spent, 62 00:03:14,070 --> 00:03:15,960 in my case, over 30 years 63 00:03:15,960 --> 00:03:19,173 studying this process of transcriptional regulation. 64 00:03:20,520 --> 00:03:23,010 Now, to get the whole thing started, 65 00:03:23,010 --> 00:03:24,360 I have to give you a sense 66 00:03:24,360 --> 00:03:27,180 of what the magnitude of the problem is. 67 00:03:27,180 --> 00:03:29,910 So, imagine that one would really like to understand 68 00:03:29,910 --> 00:03:34,770 how this process of decoding the genome happens in humans. 69 00:03:34,770 --> 00:03:35,970 So, as you may know, 70 00:03:35,970 --> 00:03:39,120 the human genome has some 3 billion base pairs 71 00:03:39,120 --> 00:03:41,400 or bits of genetic information, 72 00:03:41,400 --> 00:03:45,000 and that encodes roughly 22,000 genes. 73 00:03:45,000 --> 00:03:48,120 These are stretches of DNA sequence 74 00:03:48,120 --> 00:03:51,390 that encode ultimately a product 75 00:03:51,390 --> 00:03:55,650 that is a protein which actually makes the cells function. 76 00:03:55,650 --> 00:03:57,810 So, as I already explained to you, 77 00:03:57,810 --> 00:04:00,720 there's this flow of biological information 78 00:04:00,720 --> 00:04:04,140 where you have to extract the information buried in DNA, 79 00:04:04,140 --> 00:04:05,460 convert it into RNA. 
80 00:04:05,460 --> 00:04:07,650 And what I'm not gonna tell you about today 81 00:04:07,650 --> 00:04:10,140 is the process of going from RNA to protein, 82 00:04:10,140 --> 00:04:13,950 which is a reaction called a translational reaction. 83 00:04:13,950 --> 00:04:17,040 I'm going to instead just focus on the first step 84 00:04:17,040 --> 00:04:18,720 of converting DNA into RNA, 85 00:04:18,720 --> 00:04:20,620 which is the process of transcription. 86 00:04:22,980 --> 00:04:26,850 Now, one of the most amazing results 87 00:04:26,850 --> 00:04:29,160 that we got over the last decade or so 88 00:04:29,160 --> 00:04:31,890 was when the human genome was entirely sequenced, 89 00:04:31,890 --> 00:04:34,530 the first few that were sequenced, 90 00:04:34,530 --> 00:04:38,130 we realized that actually the number of genes in humans 91 00:04:38,130 --> 00:04:42,420 is not vastly different from many other organisms, 92 00:04:42,420 --> 00:04:45,690 even simple organisms like little worms 93 00:04:45,690 --> 00:04:47,400 or fruit flies and so forth. 94 00:04:47,400 --> 00:04:51,210 That is roughly 22 to 25,000 genes 95 00:04:51,210 --> 00:04:53,010 is all the number of genes 96 00:04:53,010 --> 00:04:55,590 that all of these different organisms have. 97 00:04:55,590 --> 00:04:58,140 And yet, anybody looking at us 98 00:04:58,140 --> 00:05:02,370 versus a little roundworm in the soil or a fruit fly 99 00:05:02,370 --> 00:05:04,830 can tell that we're a much more complex organism 100 00:05:04,830 --> 00:05:06,780 with a much bigger brain, 101 00:05:06,780 --> 00:05:09,810 much more complex behavior, and so forth. 102 00:05:09,810 --> 00:05:11,060 So, how does this happen? 103 00:05:12,090 --> 00:05:16,200 Part of the answer to this very interesting mystery 104 00:05:16,200 --> 00:05:20,010 or paradox lies in the way that genes are organized 105 00:05:20,010 --> 00:05:21,690 and how they're regulated. 
106 00:05:21,690 --> 00:05:23,520 And one of the most striking results 107 00:05:23,520 --> 00:05:26,310 of the genome sequencing project was to realize 108 00:05:26,310 --> 00:05:31,200 that a vast, vast majority of the DNA in our chromosomes 109 00:05:31,200 --> 00:05:35,070 is actually not coding for specific gene products, 110 00:05:35,070 --> 00:05:39,930 and that only roughly 3% of the DNA is actually encoding 111 00:05:39,930 --> 00:05:41,850 let's call those little arrows 112 00:05:41,850 --> 00:05:44,670 that I show you on this purple DNA 113 00:05:44,670 --> 00:05:46,350 are the gene coding regions. 114 00:05:46,350 --> 00:05:50,160 So, you'll notice that there's a lot of non-arrow sequences, 115 00:05:50,160 --> 00:05:52,560 which I'll show you in this next slide as green. 116 00:05:52,560 --> 00:05:54,870 These are non-coding regions. 117 00:05:54,870 --> 00:05:58,800 So, the vast majority, 97% or greater is non-coding, 118 00:05:58,800 --> 00:06:02,550 so what are these other sequences doing? 119 00:06:02,550 --> 00:06:05,790 And of course, it turns out that these sequences 120 00:06:05,790 --> 00:06:10,020 carry very important little fragments of DNA, 121 00:06:10,020 --> 00:06:12,420 which we call regulatory sequences. 122 00:06:12,420 --> 00:06:15,450 And these are the sequences that actually control 123 00:06:15,450 --> 00:06:19,020 whether a gene gets turned on or not. 124 00:06:19,020 --> 00:06:21,630 And I'll be spending much of the next 20 minutes 125 00:06:21,630 --> 00:06:24,390 telling you about how this process all works 126 00:06:24,390 --> 00:06:28,500 and what these little bits of DNA sequences 127 00:06:28,500 --> 00:06:32,283 actually function to control gene expression. 
128 00:06:34,350 --> 00:06:37,290 Now, the other thing that I have to bring you up to date on 129 00:06:37,290 --> 00:06:40,980 is this mysterious process we're calling transcription, 130 00:06:40,980 --> 00:06:43,710 which reads double-stranded DNA 131 00:06:43,710 --> 00:06:45,390 and then makes a related molecule, 132 00:06:45,390 --> 00:06:47,550 which is a single-stranded RNA molecule, 133 00:06:47,550 --> 00:06:49,740 which is an informational molecule. 134 00:06:49,740 --> 00:06:54,300 That reaction is catalyzed by a very complex 135 00:06:54,300 --> 00:06:59,010 multi-subunit enzyme called RNA polymerase II. 136 00:06:59,010 --> 00:07:01,290 Now, there's the roman numeral II at the end of this 137 00:07:01,290 --> 00:07:05,733 because there were actually three enzymes in most mammals, 138 00:07:06,690 --> 00:07:09,270 at least three enzymes that carry out different processes 139 00:07:09,270 --> 00:07:11,610 and different types of RNA production. 140 00:07:11,610 --> 00:07:12,630 But I'm only gonna tell you 141 00:07:12,630 --> 00:07:16,380 about the ones that make the classical messenger RNA, 142 00:07:16,380 --> 00:07:19,350 which then ultimately becomes proteins. 143 00:07:19,350 --> 00:07:22,170 So, now one of the things that we learned 144 00:07:22,170 --> 00:07:25,860 early on in the study of mammalian 145 00:07:25,860 --> 00:07:30,090 or other multicellular organism transcription processes 146 00:07:30,090 --> 00:07:32,640 is that despite the fact that this enzyme 147 00:07:32,640 --> 00:07:34,893 is quite complex in its structure, 148 00:07:35,880 --> 00:07:39,480 it turns out to be an enzyme that nevertheless 149 00:07:39,480 --> 00:07:42,450 needs a lot of help to do its job. 
150 00:07:42,450 --> 00:07:45,090 So, on its own, this RNA polymerase II 151 00:07:45,090 --> 00:07:48,960 cannot tell the difference between the non-coding regions 152 00:07:48,960 --> 00:07:52,470 of the genome and places where it's supposed to be coding 153 00:07:52,470 --> 00:07:56,940 or reading to make the appropriate messenger RNAs. 154 00:07:56,940 --> 00:08:00,480 So, this sort of leads you to think that there must be 155 00:08:00,480 --> 00:08:04,950 a number of other factors that somehow direct 156 00:08:04,950 --> 00:08:08,490 RNA polymerase to the right place at the right time 157 00:08:08,490 --> 00:08:11,190 in the genome of every cell in your body 158 00:08:11,190 --> 00:08:14,010 so that the right products get made 159 00:08:14,010 --> 00:08:17,403 so each cell in your body is functioning properly. 160 00:08:18,480 --> 00:08:21,540 And this is where things get really interesting 161 00:08:21,540 --> 00:08:25,170 is some 25, 30 years ago, 162 00:08:25,170 --> 00:08:29,310 a number of laboratories took on the job 163 00:08:29,310 --> 00:08:32,640 of hunting for these elusive and, as it turned out, 164 00:08:32,640 --> 00:08:35,340 specialized protein factors 165 00:08:35,340 --> 00:08:39,390 that recognize these little stretches of DNA sequences 166 00:08:39,390 --> 00:08:40,410 that I've been telling you about 167 00:08:40,410 --> 00:08:42,210 that make up the vast majority 168 00:08:42,210 --> 00:08:45,420 of the non-coding part of the genome. 169 00:08:45,420 --> 00:08:48,480 And how these proteins then can recognize 170 00:08:48,480 --> 00:08:51,150 and ultimately physically interact 171 00:08:51,150 --> 00:08:53,910 with these little bits of genetic information 172 00:08:53,910 --> 00:08:57,660 to then turn genes on or off. 
173 00:08:57,660 --> 00:09:00,060 Now, in this lecture, 174 00:09:00,060 --> 00:09:04,320 I can't go into all the details of the types of experiments 175 00:09:04,320 --> 00:09:07,680 or the ranges of experiments that many, many laboratories 176 00:09:07,680 --> 00:09:09,330 have done over the last two decades 177 00:09:09,330 --> 00:09:12,510 to finally work out this molecular puzzle 178 00:09:12,510 --> 00:09:14,730 of how transcription works. 179 00:09:14,730 --> 00:09:17,100 But I can tell you that there are fundamentally 180 00:09:17,100 --> 00:09:19,320 two major approaches that have been taken 181 00:09:19,320 --> 00:09:24,320 over the last few decades to kind of get a parts list 182 00:09:24,510 --> 00:09:27,090 of the machinery that decodes the genome 183 00:09:27,090 --> 00:09:29,640 and carries out the process of transcription. 184 00:09:29,640 --> 00:09:32,430 One is kind of the old style, 185 00:09:32,430 --> 00:09:34,950 I'll call it bucket biochemistry 186 00:09:34,950 --> 00:09:38,820 or take a live cell, crush it up, 187 00:09:38,820 --> 00:09:41,640 spread out all of its parts and then try to figure out 188 00:09:41,640 --> 00:09:43,050 how to put it back together again, 189 00:09:43,050 --> 00:09:45,540 that's what I call in vitro biochemistry. 190 00:09:45,540 --> 00:09:47,910 And the other one is in vivo genetics 191 00:09:47,910 --> 00:09:50,880 where you effectively use genetic tools, 192 00:09:50,880 --> 00:09:54,570 mutagenesis to go in there and selectively remove 193 00:09:54,570 --> 00:09:59,100 or knock down or knock out certain genes and gene products 194 00:09:59,100 --> 00:10:01,620 and then ask what is the consequence on that cell 195 00:10:01,620 --> 00:10:02,970 or that organism? 196 00:10:02,970 --> 00:10:07,970 Both of these technologies are very powerful 197 00:10:08,250 --> 00:10:12,513 and highly complementary, and they continue to be used. 
198 00:10:13,410 --> 00:10:16,410 Today, I will focus primarily 199 00:10:16,410 --> 00:10:18,510 on the in vitro biochemical techniques 200 00:10:18,510 --> 00:10:22,740 which led us to the discovery of the first few classes 201 00:10:22,740 --> 00:10:24,450 of transcription factors. 202 00:10:24,450 --> 00:10:25,800 And in subsequent lectures, 203 00:10:25,800 --> 00:10:29,580 we'll go to more recent technologies 204 00:10:29,580 --> 00:10:32,760 that allow us to sort of speed up this whole process 205 00:10:32,760 --> 00:10:35,370 of identifying key regulatory molecules 206 00:10:35,370 --> 00:10:37,323 and how they work. 207 00:10:38,790 --> 00:10:42,690 So, let's go back to the sort of the basic unit 208 00:10:42,690 --> 00:10:45,060 of gene expression, which is a gene, 209 00:10:45,060 --> 00:10:48,720 here shown in the orange arrow, 210 00:10:48,720 --> 00:10:51,780 and the non-coding sequences surrounding it. 211 00:10:51,780 --> 00:10:54,510 And you'll see that now I've added a few more elements 212 00:10:54,510 --> 00:10:56,100 to this purple DNA. 213 00:10:56,100 --> 00:10:59,160 You see some symbols, a blue square, 214 00:10:59,160 --> 00:11:02,580 a round circle that's pink, and then a yellow triangle. 215 00:11:02,580 --> 00:11:06,897 Those are just a way for me to graphically represent 216 00:11:06,897 --> 00:11:08,970 the little bits of DNA sequences 217 00:11:08,970 --> 00:11:11,250 that I told you about that are the regulatory sequences. 218 00:11:11,250 --> 00:11:14,730 So, the little round one happens to be very GC-rich, 219 00:11:14,730 --> 00:11:17,970 the triangle one is a classical element 220 00:11:17,970 --> 00:11:19,200 that's called a TATA box, 221 00:11:19,200 --> 00:11:20,610 I'll tell you about a little bit later. 222 00:11:20,610 --> 00:11:23,550 And the blue one is yet another recognition element. 
223 00:11:23,550 --> 00:11:26,520 So, why are we so interested in these little stretches 224 00:11:26,520 --> 00:11:29,760 of nucleic acid sequence in the genome 225 00:11:29,760 --> 00:11:33,240 when it's buried amongst billions of other sequences? 226 00:11:33,240 --> 00:11:35,580 Well, these individual little sequences 227 00:11:35,580 --> 00:11:38,730 turn out to be very important because of where they sit, 228 00:11:38,730 --> 00:11:42,180 you'll notice they're sitting near the top of the arrow, 229 00:11:42,180 --> 00:11:45,930 and they are recognized by very special proteins 230 00:11:45,930 --> 00:11:48,300 which are the transcription factors. 231 00:11:48,300 --> 00:11:50,640 So, now I'm showing you some symbols 232 00:11:50,640 --> 00:11:53,880 with little cutouts which fit into either the square, 233 00:11:53,880 --> 00:11:56,040 the circle, or the triangle. 234 00:11:56,040 --> 00:11:58,560 So, transcription factors, 235 00:11:58,560 --> 00:12:03,150 at least one major family of transcription factors, 236 00:12:03,150 --> 00:12:06,660 are proteins whose three-dimensional structure 237 00:12:06,660 --> 00:12:10,110 is folded into a shape that allows them to recognize 238 00:12:10,110 --> 00:12:12,663 these short stretches of double-stranded DNA. 239 00:12:13,530 --> 00:12:15,900 In fact, largely through interactions 240 00:12:15,900 --> 00:12:17,400 with the major groove of DNA, 241 00:12:17,400 --> 00:12:20,050 and I'll show you a structure of one in a little bit. 242 00:12:21,330 --> 00:12:24,210 So, now it turns out that there are probably 243 00:12:24,210 --> 00:12:26,610 thousands of these transcription factors 244 00:12:26,610 --> 00:12:29,010 because the number of genes that we have to control, 245 00:12:29,010 --> 00:12:33,480 as I showed you, is in the order of 20 or 25,000 genes. 
246 00:12:33,480 --> 00:12:37,200 And so, it turns out that you need a pretty large percentage 247 00:12:37,200 --> 00:12:40,890 of the genome devoted to encoding these regulatory proteins 248 00:12:40,890 --> 00:12:45,240 in order for a complex organism like ourselves to survive. 249 00:12:45,240 --> 00:12:47,250 Then the other component of this, 250 00:12:47,250 --> 00:12:49,500 let's call it the transcriptional apparatus, 251 00:12:49,500 --> 00:12:51,900 is, of course, the enzyme that catalyzes RNA. 252 00:12:51,900 --> 00:12:56,490 And I already told you that this enzyme on its own 253 00:12:56,490 --> 00:12:59,370 can't tell the difference between random DNA sequence 254 00:12:59,370 --> 00:13:01,440 and a gene or a promoter. 255 00:13:01,440 --> 00:13:05,520 These other sequence-specific DNA-binding proteins 256 00:13:05,520 --> 00:13:08,550 are the ones that must recruit 257 00:13:08,550 --> 00:13:11,130 or otherwise direct RNA polymerase 258 00:13:11,130 --> 00:13:15,630 to essentially land on the right place and at the right time 259 00:13:15,630 --> 00:13:19,290 in the genome to turn on a certain subset of genes 260 00:13:19,290 --> 00:13:23,640 that are specifically required in a specialized cell type, 261 00:13:23,640 --> 00:13:26,460 whatever cell you happen to be looking at. 262 00:13:26,460 --> 00:13:30,090 So, that is kind of the first level of complexity 263 00:13:30,090 --> 00:13:32,910 of sort of informational interactions 264 00:13:32,910 --> 00:13:35,250 between the transcription factors 265 00:13:35,250 --> 00:13:38,250 and the more ubiquitous, 266 00:13:38,250 --> 00:13:41,853 and I would call it promiscuous RNA polymerase II enzyme. 267 00:13:43,560 --> 00:13:44,880 Well, as it turns out, 268 00:13:44,880 --> 00:13:49,020 it took several decades to work out 269 00:13:49,020 --> 00:13:52,770 most if not all of the components 270 00:13:52,770 --> 00:13:56,070 of this so-called transcriptional machinery. 
271 00:13:56,070 --> 00:14:00,810 And it turns out in this slide I'm showing you 272 00:14:00,810 --> 00:14:03,030 things are already starting to get more complicated. 273 00:14:03,030 --> 00:14:04,740 So, not only do you have RNA polymerase, 274 00:14:04,740 --> 00:14:07,800 but you have a bunch of other proteins that go by names 275 00:14:07,800 --> 00:14:10,680 like TFIIA, B, 276 00:14:10,680 --> 00:14:13,350 you know, D, E, H, F, and so forth. 277 00:14:13,350 --> 00:14:16,440 So, it looks like there are going to be many, 278 00:14:16,440 --> 00:14:18,270 many proteins that are necessary 279 00:14:18,270 --> 00:14:21,930 to form the transcriptional apparatus. 280 00:14:21,930 --> 00:14:23,490 And then on top of that 281 00:14:23,490 --> 00:14:26,370 you need sequence-specific DNA-binding proteins 282 00:14:26,370 --> 00:14:28,560 which I already described to you 283 00:14:28,560 --> 00:14:32,730 to further inform or otherwise regulate the process 284 00:14:32,730 --> 00:14:35,580 of when a particular RNA polymerase molecule 285 00:14:35,580 --> 00:14:37,860 should be binding to a particular gene. 286 00:14:37,860 --> 00:14:40,110 So, that's the sort of overview, 287 00:14:40,110 --> 00:14:41,580 now let me get into the specifics 288 00:14:41,580 --> 00:14:45,480 and how did we actually discover this family of proteins. 289 00:14:45,480 --> 00:14:47,280 And it'll be interesting for you to see 290 00:14:47,280 --> 00:14:51,360 how science in this field evolved. 
291 00:14:51,360 --> 00:14:54,210 Now, as is often the case 292 00:14:54,210 --> 00:14:56,850 when you first try to tackle a very complex problem, 293 00:14:56,850 --> 00:14:59,460 and, of course, we didn't really know how complex it was 294 00:14:59,460 --> 00:15:00,780 when we began these studies, 295 00:15:00,780 --> 00:15:03,480 but we assumed it might be complicated, 296 00:15:03,480 --> 00:15:06,660 certainly would be more complicated than systems 297 00:15:06,660 --> 00:15:09,150 that we had already had some idea about, 298 00:15:09,150 --> 00:15:13,710 for example, in bacteria or in bacteriophages. 299 00:15:13,710 --> 00:15:17,010 We took a lesson from our studies of bacteriophages 300 00:15:17,010 --> 00:15:20,430 and decided that to begin to dissect 301 00:15:20,430 --> 00:15:22,080 the molecular complexities 302 00:15:22,080 --> 00:15:24,750 of the transcription process in animal cells, 303 00:15:24,750 --> 00:15:26,850 we should start with viruses 304 00:15:26,850 --> 00:15:30,840 because we knew that viruses will enter these host cells, 305 00:15:30,840 --> 00:15:33,990 these complex cells that we ultimately want to study 306 00:15:33,990 --> 00:15:36,480 and have to use the same molecular machinery 307 00:15:36,480 --> 00:15:38,580 to transcribe their genes 308 00:15:38,580 --> 00:15:41,640 as the host mammalian cell would do. 309 00:15:41,640 --> 00:15:43,890 So, this was kind of a trick 310 00:15:43,890 --> 00:15:46,980 or a way to look at a molecular window 311 00:15:46,980 --> 00:15:49,710 into a complex system and try to simplify it. 312 00:15:49,710 --> 00:15:51,060 And in our case, 313 00:15:51,060 --> 00:15:54,510 the early studies of the late '70s and early '80s 314 00:15:54,510 --> 00:15:55,920 involved very simple, 315 00:15:55,920 --> 00:15:58,590 one of these simplest double-stranded DNA viruses 316 00:15:58,590 --> 00:16:00,840 called Simian virus 40. 
317 00:16:00,840 --> 00:16:03,330 And Simian virus 40, of course, is a monkey virus, 318 00:16:03,330 --> 00:16:06,450 which was nice because it's very close to humans 319 00:16:06,450 --> 00:16:07,890 and many things that we could learn 320 00:16:07,890 --> 00:16:10,770 about the way this virus uses its host, 321 00:16:10,770 --> 00:16:13,140 which are monkey cells, to replicate 322 00:16:13,140 --> 00:16:16,440 and to express their RNAs and genes 323 00:16:16,440 --> 00:16:20,370 would be applicable to our studies of humans, as you'll see. 324 00:16:20,370 --> 00:16:23,190 And this virus was one of the first 325 00:16:23,190 --> 00:16:27,930 whose DNA, its double-stranded DNA of about 5,000 base pairs 326 00:16:27,930 --> 00:16:29,310 was fully sequenced. 327 00:16:29,310 --> 00:16:32,670 This was long before a rapid modern day sequencing 328 00:16:32,670 --> 00:16:35,580 was available, so this gave us a very powerful tool. 329 00:16:35,580 --> 00:16:38,340 It basically allowed us to look at the entire genome 330 00:16:38,340 --> 00:16:41,220 of this virus, which was tiny by comparison, 331 00:16:41,220 --> 00:16:44,880 only 5,243 base pairs. 332 00:16:44,880 --> 00:16:47,790 But just that information was already very important 333 00:16:47,790 --> 00:16:49,650 'cause it very quickly allowed us, 334 00:16:49,650 --> 00:16:52,920 for example, to map where the genes are. 335 00:16:52,920 --> 00:16:56,190 And one of the genes encoded a protein 336 00:16:56,190 --> 00:16:57,540 called a tumor antigen, 337 00:16:57,540 --> 00:17:00,210 which turns out to be a transcription factor. 338 00:17:00,210 --> 00:17:03,120 This then allowed us to get our hands 339 00:17:03,120 --> 00:17:06,030 basically to do biochemistry and genetics 340 00:17:06,030 --> 00:17:09,390 on the very first eukaryotic transcription factor, 341 00:17:09,390 --> 00:17:12,210 which in this case happens to be a repressor. 
342 00:17:12,210 --> 00:17:15,210 That is a protein that when it binds the DNA 343 00:17:15,210 --> 00:17:19,173 just the same way as I showed you for the model case, 344 00:17:20,160 --> 00:17:23,910 it binds through specific protein DNA interactions. 345 00:17:23,910 --> 00:17:26,610 But in this case, actually shuts transcription down 346 00:17:26,610 --> 00:17:27,903 rather than turn it up. 347 00:17:29,580 --> 00:17:33,930 In the process of studying the way that this little virus 348 00:17:33,930 --> 00:17:36,480 when it infects a mammalian cell 349 00:17:36,480 --> 00:17:39,180 uses proteins like T-antigen 350 00:17:39,180 --> 00:17:42,540 to regulate its gene expression, 351 00:17:42,540 --> 00:17:45,990 it became clear that it had to use the host machinery 352 00:17:45,990 --> 00:17:48,180 to do the process. 353 00:17:48,180 --> 00:17:52,710 And that meant that there must be monkey proteins 354 00:17:52,710 --> 00:17:55,050 that are also involved in activating 355 00:17:55,050 --> 00:17:57,540 or repressing genes of this virus. 356 00:17:57,540 --> 00:18:00,510 And this then led us to the most important step, 357 00:18:00,510 --> 00:18:04,170 which is to transfer the technology we learned about viruses 358 00:18:04,170 --> 00:18:06,630 and how to work with the virus transcription factor 359 00:18:06,630 --> 00:18:09,390 like T-antigen to the cellular ones. 360 00:18:09,390 --> 00:18:11,220 And I'm gonna give you just one example 361 00:18:11,220 --> 00:18:13,920 of how the simple jump into the host cell 362 00:18:13,920 --> 00:18:17,820 allowed us to discover the first human transcription factor. 363 00:18:17,820 --> 00:18:21,180 So, the question that we then asked 364 00:18:21,180 --> 00:18:25,620 back in the early 1980s was what host molecule 365 00:18:25,620 --> 00:18:29,310 is regulating the expression of transcription of this virus 366 00:18:29,310 --> 00:18:31,260 when the virus is in the host? 
367 00:18:31,260 --> 00:18:33,960 And we knew from the DNA sequence of the virus 368 00:18:33,960 --> 00:18:38,960 that there were these six very GC-rich snippets of DNA 369 00:18:39,780 --> 00:18:42,240 that were regulatory 'cause if we deleted them, 370 00:18:42,240 --> 00:18:45,570 the virus no longer would express the gene of interest. 371 00:18:45,570 --> 00:18:48,330 So, we knew that something was probably responsible 372 00:18:48,330 --> 00:18:51,300 for recognizing these GC boxes, 373 00:18:51,300 --> 00:18:53,970 and we knew that it wasn't a virally encoded gene 374 00:18:53,970 --> 00:18:56,610 because we had tested all of the viral genes 375 00:18:56,610 --> 00:18:58,950 of which there were only six to begin with. 376 00:18:58,950 --> 00:19:00,960 So, we knew it had to be a host gene 377 00:19:00,960 --> 00:19:04,740 and that led us to a whole, I would say, 378 00:19:04,740 --> 00:19:07,740 family of experiments that led to the discovery 379 00:19:07,740 --> 00:19:10,860 of sequence-specific mammalian transcription factors. 380 00:19:10,860 --> 00:19:13,860 And as I said, we could have taken multiple approaches 381 00:19:13,860 --> 00:19:16,830 to try to address this complicated issue. 382 00:19:16,830 --> 00:19:18,510 I'll just give you one example 383 00:19:18,510 --> 00:19:20,940 of using in vitro biochemistry 384 00:19:20,940 --> 00:19:24,780 to finally get our hands on this key sequence 385 00:19:24,780 --> 00:19:27,300 specific human transcription factor, 386 00:19:27,300 --> 00:19:30,993 which, of course, has a homologue in the monkey. 
387 00:19:31,950 --> 00:19:35,190 And the way we did it was very interesting 388 00:19:35,190 --> 00:19:36,930 and simple in retrospect, 389 00:19:36,930 --> 00:19:39,030 and that is recognizing the fact 390 00:19:39,030 --> 00:19:41,760 that whatever this protein was, 391 00:19:41,760 --> 00:19:44,760 it had to have the property of recognizing 392 00:19:44,760 --> 00:19:49,650 those GC boxes that were sitting next to the viral gene. 393 00:19:49,650 --> 00:19:52,200 We assume that it must be a sequence-specific 394 00:19:52,200 --> 00:19:54,060 DNA-binding protein, so all we had to do 395 00:19:54,060 --> 00:19:57,510 was figure out a way to extract proteins 396 00:19:57,510 --> 00:20:01,110 from human cells or monkey cells 397 00:20:01,110 --> 00:20:04,770 and then try to fish out those specific proteins 398 00:20:04,770 --> 00:20:06,660 out of the many thousands of different proteins 399 00:20:06,660 --> 00:20:09,720 that were in this gemisch of cellular extract 400 00:20:09,720 --> 00:20:12,300 that would be responsible for discriminating 401 00:20:12,300 --> 00:20:17,100 between random DNA sequences and the specific GC box. 402 00:20:17,100 --> 00:20:20,580 And I'll quickly run through sort of the logic behind this. 403 00:20:20,580 --> 00:20:25,440 So, what I'm showing you here is a solid surface 404 00:20:25,440 --> 00:20:28,830 with DNA coupled to it that is highly enriched 405 00:20:28,830 --> 00:20:31,440 for the recognition element, the GC box, 406 00:20:31,440 --> 00:20:33,450 which should be the sequence 407 00:20:33,450 --> 00:20:35,370 recognized by the protein of interest. 408 00:20:35,370 --> 00:20:37,440 Now, we had no idea what this protein was gonna look like, 409 00:20:37,440 --> 00:20:39,510 how many proteins there were gonna be, and so forth, 410 00:20:39,510 --> 00:20:42,030 but we knew it had to recognize the GC box. 
411 00:20:42,030 --> 00:20:45,090 So, we're gonna try to fish this out of a pool 412 00:20:45,090 --> 00:20:47,460 of many thousands of other proteins. 413 00:20:47,460 --> 00:20:49,500 Now, the key trick here 414 00:20:49,500 --> 00:20:52,380 was that because all cell extracts 415 00:20:52,380 --> 00:20:54,870 contain not only one DNA binding protein, 416 00:20:54,870 --> 00:20:56,820 but, as I told you, thousands of different 417 00:20:56,820 --> 00:20:58,410 DNA binding proteins. 418 00:20:58,410 --> 00:21:00,870 But most of them, or in fact in our case, 419 00:21:00,870 --> 00:21:04,950 none of the other of several hundred to a thousand proteins 420 00:21:04,950 --> 00:21:08,790 that could bind DNA actually happen to recognize the GC box, 421 00:21:08,790 --> 00:21:11,190 they just bind other DNA sequences. 422 00:21:11,190 --> 00:21:14,070 So, to kind of favor our protein 423 00:21:14,070 --> 00:21:16,080 being able to bind to our GC box 424 00:21:16,080 --> 00:21:18,780 and not have to compete with all the other proteins, 425 00:21:18,780 --> 00:21:22,920 what we did was to add non-specific DNA 426 00:21:22,920 --> 00:21:26,700 in vast stoichiometric excess 427 00:21:26,700 --> 00:21:29,580 so that all the other proteins that wouldn't recognize 428 00:21:29,580 --> 00:21:33,270 the GC box would still have some partner to hang onto. 429 00:21:33,270 --> 00:21:34,920 And this trick worked very well. 430 00:21:34,920 --> 00:21:39,840 So, having the specific DNA on the solid resin 431 00:21:39,840 --> 00:21:43,710 and the non-specific DNA flowing all over the place, 432 00:21:43,710 --> 00:21:47,850 we could capture selectively the pink molecules here, 433 00:21:47,850 --> 00:21:50,160 which are the GC box recognition ones, 434 00:21:50,160 --> 00:21:52,560 and the blue-green molecules, 435 00:21:52,560 --> 00:21:56,040 of course, predominantly bind to non-specific DNA. 
436 00:21:56,040 --> 00:21:58,470 I show you one little blue one on the column 437 00:21:58,470 --> 00:22:01,290 because nothing works perfectly in real science 438 00:22:01,290 --> 00:22:03,540 and tells you that we have to go through this process 439 00:22:03,540 --> 00:22:07,650 iteratively to actually finally obtain a preparation 440 00:22:07,650 --> 00:22:11,520 that's purely pink molecules with no green-blue ones. 441 00:22:11,520 --> 00:22:14,340 Well, that turned out to work very, very well. 442 00:22:14,340 --> 00:22:18,030 And that whole process of biochemical fractionation 443 00:22:18,030 --> 00:22:23,030 followed by a direct affinity sequence-specific DNA resin 444 00:22:23,730 --> 00:22:28,110 gave us the ability to perform a biochemical purification 445 00:22:28,110 --> 00:22:31,620 followed by a molecular cloning of the transcription factor 446 00:22:31,620 --> 00:22:35,040 that encodes the protein SP1. 447 00:22:35,040 --> 00:22:37,230 And then we carried out a bunch of experiments, 448 00:22:37,230 --> 00:22:38,580 which I'll tell you next, 449 00:22:38,580 --> 00:22:40,140 to show that this protein 450 00:22:40,140 --> 00:22:42,483 actually does activate transcription. 451 00:22:43,530 --> 00:22:46,380 And of course, we went back and we proved that this protein, 452 00:22:46,380 --> 00:22:49,170 which turned out to be a rather large polypeptide, 453 00:22:49,170 --> 00:22:52,020 can indeed recognize the GC box. 454 00:22:52,020 --> 00:22:55,500 And it doesn't matter if it's a GC box from the SV40 genome 455 00:22:55,500 --> 00:22:59,460 or any other GC box that we could find in the human genome, 456 00:22:59,460 --> 00:23:02,370 it would find that sequence and bind to it 457 00:23:02,370 --> 00:23:05,670 and then it would generally activate transcription.
458 00:23:05,670 --> 00:23:08,280 So, this led to the discovery of the first 459 00:23:08,280 --> 00:23:10,830 of a very large family 460 00:23:10,830 --> 00:23:13,680 of sequence-specific DNA-binding proteins. 461 00:23:13,680 --> 00:23:15,960 Now, I told you that the way these proteins 462 00:23:15,960 --> 00:23:19,200 tend to recognize short DNA sequences 463 00:23:19,200 --> 00:23:21,900 is to interact with DNA through the major groove. 464 00:23:21,900 --> 00:23:23,220 And here's a perfect example. 465 00:23:23,220 --> 00:23:25,230 So, the thick blue model there 466 00:23:25,230 --> 00:23:28,590 shows the actual three structures 467 00:23:28,590 --> 00:23:29,910 that are called zinc fingers. 468 00:23:29,910 --> 00:23:31,470 And the reason they're called zinc fingers 469 00:23:31,470 --> 00:23:34,770 is because there are amino acids that are organized 470 00:23:34,770 --> 00:23:37,860 around a center that contains a zinc ion 471 00:23:37,860 --> 00:23:41,190 which holds the three-dimensional shape of the polypeptide 472 00:23:41,190 --> 00:23:43,560 in a position just right 473 00:23:43,560 --> 00:23:45,570 for fitting into the major groove of the DNA. 474 00:23:45,570 --> 00:23:47,700 And the DNA here is shown in pink, 475 00:23:47,700 --> 00:23:50,160 and you can see that that blue outline 476 00:23:50,160 --> 00:23:52,710 fits right into the major groove of the DNA, 477 00:23:52,710 --> 00:23:54,690 but not to the minor groove. 478 00:23:54,690 --> 00:23:57,540 And one of the most important findings 479 00:23:57,540 --> 00:23:58,830 was not only the discovery 480 00:23:58,830 --> 00:24:00,750 of the first human transcription factor, 481 00:24:00,750 --> 00:24:04,890 but the realization that most if not all sequence-specific 482 00:24:04,890 --> 00:24:06,570 DNA-binding transcription factors 483 00:24:06,570 --> 00:24:09,090 have a similar structural motif.
484 00:24:09,090 --> 00:24:13,170 That is to say some structure is built to recognize 485 00:24:13,170 --> 00:24:15,750 sequences in the major groove of DNA. 486 00:24:15,750 --> 00:24:19,170 And these three-dimensional motifs 487 00:24:19,170 --> 00:24:23,760 are recognizable as amino acid sequences in the genome. 488 00:24:23,760 --> 00:24:27,810 So, we can now much more quickly scan the entire sequence 489 00:24:27,810 --> 00:24:29,760 of a genome and identify genes 490 00:24:29,760 --> 00:24:31,920 that are likely to be DNA-binding proteins 491 00:24:31,920 --> 00:24:34,860 as a result of understanding the structure-function 492 00:24:34,860 --> 00:24:38,853 relationships of these DNA-binding motifs like zinc fingers. 493 00:24:39,960 --> 00:24:42,690 So, what I'd like to show you now 494 00:24:42,690 --> 00:24:46,260 is that I've only introduced you to one class 495 00:24:46,260 --> 00:24:48,210 of transcription factors, 496 00:24:48,210 --> 00:24:51,210 which are the sequence-specific DNA-binding proteins. 497 00:24:51,210 --> 00:24:53,910 Well, I think I gave you a little taste 498 00:24:53,910 --> 00:24:55,320 of the level of complexity 499 00:24:55,320 --> 00:24:57,060 that's probably going to be needed 500 00:24:57,060 --> 00:24:59,580 to be able to build the machine 501 00:24:59,580 --> 00:25:02,940 that's ultimately going to be able to allow you 502 00:25:02,940 --> 00:25:07,050 to transcribe every gene in every cell of a human body. 503 00:25:07,050 --> 00:25:10,286 So, that turns out to be a much more elaborate machine 504 00:25:10,286 --> 00:25:12,090 than what I just showed you. 505 00:25:12,090 --> 00:25:14,400 So, I wanna show you now 506 00:25:14,400 --> 00:25:16,830 what is sort of our state-of-the-art thinking 507 00:25:16,830 --> 00:25:20,580 about what is actually needed to build the machinery 508 00:25:20,580 --> 00:25:25,380 at a gene to allow it to be expressed and transcribed.
509 00:25:25,380 --> 00:25:27,930 And the term I want to introduce you to 510 00:25:27,930 --> 00:25:30,870 is the pre-initiation complex. 511 00:25:30,870 --> 00:25:33,360 And it's pretty much what it says. 512 00:25:33,360 --> 00:25:35,850 It's the complex of multiple subunits 513 00:25:35,850 --> 00:25:40,850 that has to essentially land on the promoter of a gene 514 00:25:40,860 --> 00:25:44,433 which will be designated for later expression. 515 00:25:45,390 --> 00:25:49,980 And this is a process that is probably quite orderly, 516 00:25:49,980 --> 00:25:52,410 that is there's an order of events that happens, 517 00:25:52,410 --> 00:25:55,080 which we, by the way, are not entirely sure 518 00:25:55,080 --> 00:25:57,330 exactly what the order is or even if the order 519 00:25:57,330 --> 00:25:59,310 is the same from one gene to the next, 520 00:25:59,310 --> 00:26:02,070 but we can kind of see where it starts and where it ends up. 521 00:26:02,070 --> 00:26:03,690 And the pathway in between, 522 00:26:03,690 --> 00:26:06,540 I would say is still a little bit murky. 523 00:26:06,540 --> 00:26:10,290 And the story here again starts with a little snippet of DNA 524 00:26:10,290 --> 00:26:11,220 called the TATA box, 525 00:26:11,220 --> 00:26:13,560 which I already introduced you to briefly. 526 00:26:13,560 --> 00:26:18,560 It's an AT-rich sequence which sits at the five prime end 527 00:26:18,660 --> 00:26:20,970 or the beginning of many genes, but not all genes, 528 00:26:20,970 --> 00:26:25,383 maybe 20% of the genes might contain this AT-rich region. 529 00:26:26,460 --> 00:26:29,850 And that AT sequence is the signal 530 00:26:29,850 --> 00:26:31,500 or a landmark, if you like, 531 00:26:31,500 --> 00:26:33,930 for a particular protein to bind to it. 532 00:26:33,930 --> 00:26:35,850 And that protein is called, 533 00:26:35,850 --> 00:26:38,190 not surprisingly, the TATA-binding protein 534 00:26:38,190 --> 00:26:40,290 'cause it's the TATA sequence. 
535 00:26:40,290 --> 00:26:43,620 And so, this represents a second class 536 00:26:43,620 --> 00:26:45,210 of transcription factors. 537 00:26:45,210 --> 00:26:48,240 These are not the type that I just introduced you to, 538 00:26:48,240 --> 00:26:50,640 which are gonna be different for every gene, 539 00:26:50,640 --> 00:26:52,200 the TATA sequence is present 540 00:26:52,200 --> 00:26:54,120 in a very large number of genes, 541 00:26:54,120 --> 00:26:56,700 so it can't be gene specific, 542 00:26:56,700 --> 00:26:58,680 but it turns out to be very crucial 543 00:26:58,680 --> 00:27:02,130 for our understanding of how gene regulation works. 544 00:27:02,130 --> 00:27:06,300 So, you start with a TATA-binding protein 545 00:27:06,300 --> 00:27:07,950 finding a TATA box. 546 00:27:07,950 --> 00:27:10,440 We later found out that the TATA-binding protein 547 00:27:10,440 --> 00:27:13,920 rarely functions on its own and has a bunch of friends 548 00:27:13,920 --> 00:27:17,010 that we call TAFs or TBP-associated factors. 549 00:27:17,010 --> 00:27:19,260 And now you're talking about an assembly 550 00:27:19,260 --> 00:27:23,340 of multi-subunit complex of almost a million daltons. 551 00:27:23,340 --> 00:27:26,070 There are somewhere between 12 and 15 subunits 552 00:27:26,070 --> 00:27:27,780 in addition to the TATA-binding protein 553 00:27:27,780 --> 00:27:30,930 that make up this little complex of proteins 554 00:27:30,930 --> 00:27:33,390 that kind of travels around together. 555 00:27:33,390 --> 00:27:35,520 And this is found in most cell types, 556 00:27:35,520 --> 00:27:38,760 and later on I'll show you in a subsequent lecture 557 00:27:38,760 --> 00:27:40,590 that not every cell type 558 00:27:40,590 --> 00:27:43,530 might have exactly the same complement of these subunits, 559 00:27:43,530 --> 00:27:47,850 but many of them have this prototypic complex. 560 00:27:47,850 --> 00:27:51,960 Is this enough for building the pre-initiation complex?
561 00:27:51,960 --> 00:27:54,120 Unfortunately not. 562 00:27:54,120 --> 00:27:57,630 It turns out that there are a host of other, 563 00:27:57,630 --> 00:28:00,030 I'll call them ancillary factors 564 00:28:00,030 --> 00:28:03,330 in addition to the multi-subunit RNA polymerase itself 565 00:28:03,330 --> 00:28:08,330 that are necessary for you to build up an ensemble 566 00:28:08,490 --> 00:28:12,180 that is necessary to form an active 567 00:28:12,180 --> 00:28:16,380 ready to activate transcriptional pre-initiation complex 568 00:28:16,380 --> 00:28:17,213 or the PIC. 569 00:28:19,890 --> 00:28:23,520 And this is kind of the picture we're getting to, 570 00:28:23,520 --> 00:28:26,160 and even this picture with many, many colors 571 00:28:26,160 --> 00:28:28,560 and many, many different polypeptides, 572 00:28:28,560 --> 00:28:30,330 you know, that adds up to probably greater 573 00:28:30,330 --> 00:28:33,660 than 85 individual proteins 574 00:28:33,660 --> 00:28:36,960 that all have to kind of fit together like a jigsaw puzzle. 575 00:28:36,960 --> 00:28:39,360 It's probably not even the whole story, 576 00:28:39,360 --> 00:28:42,390 you'll notice I still have one big red question mark there 577 00:28:42,390 --> 00:28:46,980 because I think as we begin to study specific cell types 578 00:28:46,980 --> 00:28:50,280 and specific processes like embryonic development 579 00:28:50,280 --> 00:28:52,890 or germ layer formation, 580 00:28:52,890 --> 00:28:55,770 additional components that are not present here 581 00:28:55,770 --> 00:28:58,484 in this prototypic pre-initiation complex 582 00:28:58,484 --> 00:29:00,210 will come into play, 583 00:29:00,210 --> 00:29:03,420 and that's the subject of a subsequent lecture. 584 00:29:03,420 --> 00:29:06,360 But already you can tell that the transcriptional machinery 585 00:29:06,360 --> 00:29:08,673 is anything but simple.
586 00:29:09,630 --> 00:29:12,720 So, can we get a better idea of what transcription 587 00:29:12,720 --> 00:29:16,440 might actually look like and what's happening 588 00:29:16,440 --> 00:29:18,360 when a transcription process takes place? 589 00:29:18,360 --> 00:29:21,720 So, let me first of all say that I'm gonna finish my lecture 590 00:29:21,720 --> 00:29:24,450 now with a little cartoon, 591 00:29:24,450 --> 00:29:29,160 which is our attempt to imagine 592 00:29:29,160 --> 00:29:30,990 the events that take place 593 00:29:30,990 --> 00:29:33,000 when you form a pre-initiation complex, 594 00:29:33,000 --> 00:29:37,410 you bring regulatory proteins to the activated gene 595 00:29:37,410 --> 00:29:40,020 and what happens during this process. 596 00:29:40,020 --> 00:29:43,530 Now, keep in mind that this is at this point 597 00:29:43,530 --> 00:29:47,700 mostly a cartoon that is in our imagination 598 00:29:47,700 --> 00:29:52,700 and only parts, if any, of this are probably real, 599 00:29:52,890 --> 00:29:56,340 but it gives you a sense of the complexity 600 00:29:56,340 --> 00:29:58,800 of the transactions that have to take place 601 00:29:58,800 --> 00:30:02,040 just for one gene to transcribe and express itself. 602 00:30:02,040 --> 00:30:04,290 So, let me show you the movie, 603 00:30:04,290 --> 00:30:07,410 and then we'll finish just by keeping in mind 604 00:30:07,410 --> 00:30:09,960 that there's much to be learned. 605 00:30:09,960 --> 00:30:13,290 And in my next lecture, we'll go into the selectivity 606 00:30:13,290 --> 00:30:16,560 of this process in specialized cell types. 607 00:30:16,560 --> 00:30:20,280 So, now let's see what this sort of cartoon 608 00:30:20,280 --> 00:30:22,140 of transcription looks like.
609 00:30:22,140 --> 00:30:23,700 So, we start off with DNA 610 00:30:23,700 --> 00:30:26,790 with some preassembled TFIID molecule, 611 00:30:26,790 --> 00:30:28,800 and along comes this other green molecule, 612 00:30:28,800 --> 00:30:30,690 which is actually a co-factor, 613 00:30:30,690 --> 00:30:32,760 which then forms this very large complex 614 00:30:32,760 --> 00:30:34,170 with RNA polymerase. 615 00:30:34,170 --> 00:30:37,830 And then a distal activator protein came in 616 00:30:37,830 --> 00:30:39,270 and activated the process. 617 00:30:39,270 --> 00:30:44,270 And this molecule, this bluish molecule that's moved away 618 00:30:44,610 --> 00:30:48,060 from the complex is actually the RNA polymerase. 619 00:30:48,060 --> 00:30:51,810 And that little yellow sort of bead on a string 620 00:30:51,810 --> 00:30:53,640 is actually the RNA product. 621 00:30:53,640 --> 00:30:57,660 So, that gives you a sense that things have to happen quickly 622 00:30:57,660 --> 00:30:59,850 and yet it involves many, many molecules 623 00:30:59,850 --> 00:31:02,460 having to assemble and then disassemble 624 00:31:02,460 --> 00:31:04,170 for this reaction to happen. 625 00:31:04,170 --> 00:31:06,810 And in my next lecture, 626 00:31:06,810 --> 00:31:10,380 we'll go into more specific aspects of this reaction, 627 00:31:10,380 --> 00:31:13,470 and particularly during embryonic development 628 00:31:13,470 --> 00:31:16,203 and tissue-specific gene expression.