How is eukaryotic transcriptional regulation achieved?
Beyond the bHLH, these transcription factors may merge into a leucine zipper motif or another protein interaction domain for dimerization. Though the primary binding domain is N-terminal to H1, the H1 domains also appear to play a role in binding the major groove of the DNA. NF-kB (nuclear factor kB) is a ubiquitous transcription factor that was discovered in, and is most prominent in, the immune system. When active, it is a heterodimer, with both subunits containing a Rel homology region (RHR).
Rel is an oncogene, and the RHR is named for its similarity to the previously sequenced rel. Just as with the other types of transcription factors, some RHR-containing proteins are repressors, while others are activators.
The regulation of NF-kB is rather interesting: once it is in the nucleus, it is generally active. However, like almost all cellular proteins, it is made in the cytoplasm.
Inhibitors of NF-kB (IkB) also reside in the cytoplasm, and they act by binding NF-kB and covering the nuclear localization signal that allows its import into the nucleus.
Thus sequestered, the NF-kB must remain inactive in the cytoplasm until some stimulus activates IkB kinase, which phosphorylates the IkB and leads to its ubiquitination and degradation, finally releasing the NF-kB from its bonds. Not surprisingly for a factor discovered in the immune system, it is activated in response to bacterial and viral antigens, as well as other types of cellular stress or insult.
The longer the promoter, the more space is available for proteins to bind. This also adds more control to the transcription process.
The length of the promoter is gene-specific and can differ dramatically between genes. Consequently, the level of control of gene expression can also differ quite dramatically between genes. The purpose of the promoter is to bind transcription factors that control the initiation of transcription. Within the core promoter region, 25 to 35 bases upstream of the transcriptional start site, resides the TATA box. Some of these transcription factors help to bind the RNA polymerase to the promoter, and others help to activate the transcription initiation complex.
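As a minimal sketch of the positional statement above, the following Python function looks for a TATA-like motif in the window 25 to 35 bases upstream of a known transcription start site. The consensus string `TATAAA`, the function name `find_tata_box`, and the 0-based `tss` index are illustrative assumptions rather than anything specified in the text; real TATA boxes are degenerate, and an exact-match search is only a first approximation.

```python
def find_tata_box(seq, tss, motif="TATAAA", lo=25, hi=35):
    """Return the upstream distances (in bases) at which `motif` starts
    inside the window [lo, hi] upstream of the transcription start site.

    seq   : DNA string, written 5'->3' on the coding strand
    tss   : 0-based index of the transcription start site within seq
    motif : assumed consensus sequence (exact matching is a simplification)
    """
    seq = seq.upper()
    hits = []
    for dist in range(lo, hi + 1):
        start = tss - dist              # motif would begin `dist` bases upstream
        if start < 0:
            continue                    # window runs off the 5' end of seq
        if seq[start:start + len(motif)] == motif:
            hits.append(dist)
    return hits
```

An empty list means no exact match in the window, which by itself says nothing about whether the promoter is functional.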
In addition to the TATA box, other binding sites are found in some promoters. Some biologists prefer to restrict the range of the eukaryotic promoter to the core promoter, or polymerase binding site, and refer to these additional sites as promoter-proximal elements, because they are usually found within a few hundred base pairs upstream of the transcriptional start site.
Specific transcription factors can bind to these promoter-proximal elements to regulate gene transcription. A given gene may have its own combination of these specific transcription-factor binding sites. There are hundreds of transcription factors in a cell, each of which binds specifically to a particular DNA sequence motif. A binding site in the promoter just upstream of the encoded gene is referred to as a cis-acting element, because it is on the same chromosome, right next to the gene.
Transcription factors respond to environmental stimuli that cause the proteins to find their binding sites and initiate transcription of the gene that is needed.
Evolutionary changes in differential gene expression among sexes have been documented (Schiff et al.). Microarray surveys suggest that populations can harbor variation in which genes are expressed in a sex-specific manner (Jin et al.).
In multicellular organisms, many genes are expressed in a succession of spatially and temporally distinct phases during the life cycle (for examples, see Gerhart and Kirschner; Carroll, Grenier, and Weatherbee; and Davidson).
Examples include several independent losses of patterning roles for homeodomain transcription factors in arthropods (Dawes et al.). Conversely, a new regulatory linkage may be established if a promoter acquires a binding site for a different transcription factor, a process known as recruitment or co-option (Duboule and Wilkins; Wilkins). Many likely cases have been identified (Lowe and Wray; Saccone et al.).
Evolutionary gains and losses of particular phases of gene expression may be facilitated by the modular organization of promoters see section 2. Promoter function has both a biochemical phenotype, the gene expression profile, as well as an organismal phenotype, involving features such as anatomy, physiology, life history, and behavior. These biochemical and organismal effects are evolutionarily dissociable to some extent, because some changes in gene expression appear to have no consequence for organismal phenotype.
Such changes in gene expression are analogous to conservative amino acid replacements in a protein (table 5), many of which are likewise thought to have no impact on organismal phenotype (Kimura; Gillespie). Several cases are known where the timing or spatial extent of gene expression differs among species without any obvious phenotypic consequence e. Although it may be difficult to demonstrate beyond any doubt that a particular difference in transcription is phenotypically silent, the opposite case is easier to establish.
Differences in gene expression have been linked to diverse aspects of organismal phenotype, including: 1 anatomy Burke et al. The genetic basis for an observed difference in the expression of a particular gene in some cases does not reside in cis , but rather within one of the loci encoding transcription factors that interact with it.
Three classes of mutations can underlie these trans effects. Numerous experiments demonstrate that this trans effect is pervasive: manipulating the expression of a transcription factor typically alters the expression of its downstream targets (Gilbert; Alberts et al.). Although many evolutionary differences in the expression profiles of transcription factors are known see section 4.
Indirect evidence of an evolutionary role comes from phenotypic correlates of interspecific differences in transcription factor expression e. Amino acid substitutions in DNA binding domains of transcription factors can affect the expression of downstream genes e. Such changes are apparently relatively rare, as the amino acid sequences of DNA binding domains are usually highly conserved Duboule ; Latchman Nonetheless, variants are sometimes found within populations e. Interspecies gene-swapping experiments support this view: in a surprising number of cases, a vertebrate gene encoding a transcription factor can restore a somewhat wild-type phenotype to a fly that is homozygous for a null allele of the orthologous gene e.
Few gene swaps rescue phenotype perfectly and some fail almost completely, however, which may be due in part to changes in DNA binding specificity. Again, experiments provide evidence of this third class of trans effects on transcription (Hope and Struhl; Dawson, Morris, and Latchman). Functional changes in protein-protein interaction domains have evolved in Hox transcription factors within the Arthropoda (Galant and Carroll; Ronshaugen, McGinnis, and McGinnis) and in serum response transcription factors between arthropods and chordates (Avila et al.).
Sequence comparisons suggest that amino acid substitutions in protein-protein interaction domains can evolve rapidly under positive selection (Sutton and Wilkinson; Barrier, Robichaux, and Purugganan). All three classes of trans effects mentioned above are likely to be highly pleiotropic because of the large number of downstream target genes that would be affected see section 3.
The classic modes of natural selection that operate on coding sequences and morphology also operate on promoter sequences also see section 2. Many deleterious promoter alleles have been identified in humans, involving a wide range of genes and phenotypic consequences summarized in Cooper Cases of long-term conservation of binding sites see section 4.
Likely cases include some histocompatibility loci in humans and mice Guardiola et al. Reasons for overdominant selection on coding sequences of these loci are reasonably well understood, and transcription profiles should be under selection for variation in the cell type in which they are expressed Guardiola et al. Environmental heterogeneity within the range of a single species can result in local adaptation and balancing selection.
When binding sites within a promoter differ but the resulting expression profile is unchanged, stabilizing selection may be operating. Some haplotypes contain a second mutation within the promoter that adds a third Sp1 binding site, elevating transcription and resulting in an improved prognosis Romey et al.
The third Sp1 site never occurs in haplotypes that produce wild-type protein, suggesting that it may be under positive selection as a result of its compensatory effect.
The structure and function of promoter sequences are profoundly different from those of coding sequences (table 6). These differences impose nontrivial challenges for studying the evolution of transcriptional regulation. Coding sequences have a regular, direct, precise, and easily interpreted relationship with their proximate biochemical phenotype, a specific sequence of amino acids.
In contrast, promoters have an idiosyncratic, indirect, nonlinear, and context-dependent relationship with their proximate phenotype, a particular transcription profile. Furthermore, the transcription profile generated by a promoter depends on other loci that encode transcription factors that bind to it, and on the loci encoding the transcription factors that regulate these immediate upstream regulators, and so forth.
This regress transcends generations, in that maternally loaded transcription factors or their mRNAs are required to activate early zygotic gene expression. Environmental influences on gene expression add a further layer of complexity. Although the amino acid sequence of a protein rarely changes during the course of development or in response to environmental conditions, the transcription profile of most genes is modulated during the life cycle and in response to changing external conditions.
Even when differential splicing produces distinct protein isoforms from the same locus under different circumstances, the relationship between DNA and protein sequence remains direct. It is important to recognize that sequence data alone cannot reveal the organization of binding sites within a promoter; nor can they show what proteins bind to them, or how they function, or what transcription profile they generate.
This is partly a matter of missing information: for instance, the full matrix of binding sequences is not yet known for most transcription factors, even in well-studied species. But it is mostly an inescapable consequence of the way transcription is regulated: many potential binding sites have no influence on transcription in vivo, sequences essential for transcription always reside both in cis and in trans, and transcription can be strongly influenced by genetic background, physiological status, and environmental conditions.
Because the sequences bound by transcription factors are short and imprecise see section 3. Only a fraction of these binding sites actually influence transcription Latchman ; Weinzierl ; Li and Johnston ; Lee et al. Potential binding sites may not function for a variety of reasons see section 3.
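A rough calculation illustrates why short binding sequences turn up so often by chance. The sketch below assumes equal base frequencies and independence between positions, both simplifications that real genomes violate, and the function name is hypothetical. It returns the expected number of exact matches to one specific k-mer in a random sequence of a given length.

```python
def expected_chance_matches(k, length, both_strands=True):
    """Expected number of exact matches to one specific k-mer in a random
    DNA sequence of `length` bases.

    Assumes each base is A, C, G, or T with probability 1/4, independently
    (an idealization). Counting both strands doubles the expectation for
    non-palindromic motifs.
    """
    if length < k:
        return 0.0
    windows = length - k + 1            # number of start positions
    per_window = 0.25 ** k              # chance that one window matches
    strands = 2 if both_strands else 1
    return windows * per_window * strands
```

Under these assumptions a 6 bp motif is expected roughly five times in 10 kb of random sequence when both strands are counted, and tolerating mismatches raises the number further.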
Which sites actually influence transcription, and are therefore possible targets of selection, can only be determined experimentally. Biochemical characterizations can identify binding sites precisely and are the only way to determine whether consensus sequences differ among species. The most common methods are footprinting and mobility shift assays (Carey and Smale). Because these assays are carried out in vitro, they cannot reveal the influence of chromatin modulation on protein binding or transcription.
Assays of in vivo binding sites Walter and Biggin ; Ren et al. The only definitive means of identifying a binding site with a role in regulating transcription is to modify its sequence and assay transcription in vivo, typically by transient or stable transformation with a reporter gene see section 5.
All of the methods mentioned above require considerable effort when used to test a potential binding site at multiple phases of the life cycle and under a variety of environmental conditions. In practice, most promoters have only been searched for potential binding sites at a restricted phase of the life cycle and under uniform culture conditions. For this reason, experimentally verified binding sites are nearly always an underestimate, and the physical extent of a promoter is rarely well defined.
The resulting difficulties for studying promoter evolution are substantial. Few promoters have been subjected to thorough searches for binding sites, and some binding sites probably remain unidentified even in carefully studied cases. Information about the functional consequences of binding site differences among species is limited to just a few cases e.
Because of the way promoter function is typically studied, some kinds of binding sites are naturally less likely to be discovered. These include binding sites that mediate responses to physiological status and environmental conditions (because most assays are carried out under uniform conditions), binding sites that act at restricted times during the life cycle (because typically only part of the life cycle is assayed), and binding sites of weak effect (because of assay insensitivity).
In addition, most studies measure either quantitative or spatial aspects of transcription and some ignore temporal changes; as a result, the binding sites that are identified are often biased with respect to their effects on time, space, and level of transcription.
Because empirical validation of binding sites is laborious, attempts have been made to increase the reliability of informatic approaches to binding site identification. We discuss here a few of the many approaches that have been developed (for additional information, see Hardison; Stormo; Ohler and Niemann; and Markstein and Levine). Most informatic approaches apply either to a specific locus or to a complete genome. In the former category are programs that use databases of known binding site matrices to scan a sequence for potential binding sites e.
However, many of the potential binding sites these programs identify have no biological function and are simply spurious matches to a binding site see previous paragraphs and section 3. This method can successfully identify previously unknown binding sites Loots et al.
The effectiveness of this method is limited, however, because nucleotides can be conserved by chance, because real binding sites can turn over even when the transcriptional output is maintained, and because some aspects of transcription are species-specific e. The first problem leads to false positives, whereas the second and third generate false negatives. When a complete genome sequence is available, several additional methods can be applied to identify unknown binding sites.
These algorithms rely on large data sets to identify overrepresented sequence motifs e. For all of these methods, both false positives and false negatives remain a significant issue.
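The idea of searching large data sets for overrepresented motifs can be sketched with simple k-mer counting. The code below is not any of the published algorithms the text alludes to; it is a toy enrichment score, with hypothetical function names, that compares k-mer frequencies in a foreground set of promoters against a background set, using a pseudocount to avoid division by zero.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def enriched_kmers(foreground, background, k, pseudocount=1.0):
    """Rank k-mers seen in `foreground` by their frequency relative to
    `background`. Returns (kmer, ratio) pairs, highest ratio first.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    scores = []
    for kmer, n in fg.items():
        fg_rate = (n + pseudocount) / (fg_total + pseudocount)
        bg_rate = (bg[kmer] + pseudocount) / (bg_total + pseudocount)
        scores.append((kmer, fg_rate / bg_rate))
    # Sort by descending ratio; break ties alphabetically for determinism.
    return sorted(scores, key=lambda kv: (-kv[1], kv[0]))
```

A high ratio only flags a candidate. As the text stresses, both false positives and false negatives are expected, so ranked motifs still need empirical validation.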
Although methods for informatic detection of binding sites are becoming more sophisticated, for now the results are best viewed as a starting point for empirical validation rather than as a definitive identification of transcription factor binding sites.
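The matrix-scanning approach mentioned earlier in this section can be sketched as a log-odds scan with a position weight matrix. The matrix format (one dictionary of base probabilities per position), the uniform background of 0.25, and the threshold are all illustrative assumptions; any real matrix would come from a curated database, and the sequence is assumed to contain only A, C, G, and T.

```python
import math


def pwm_scan(seq, pwm, threshold, background=0.25):
    """Slide a position weight matrix along `seq` and report windows whose
    log-odds score meets `threshold`.

    pwm : list of dicts, one per motif position, mapping each base to a
          nonzero probability (assumed format; zeros would break the log)
    Returns a list of (start_index, window, score) tuples.
    """
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        # Sum of per-position log2 odds against a uniform background.
        score = sum(math.log2(pwm[j][base] / background)
                    for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, window, score))
    return hits
```

Lowering the threshold recovers weaker sites at the cost of more spurious matches, which is the trade-off that makes such output a starting point rather than an identification.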
A binding site may be occupied by different transcription factors or by none at different times or places during development see section 3. Overlapping binding site specificities have important implications for evolutionary studies: (1) A transcription factor might not influence transcription, even if a consensus binding site for it exists and is known to bind protein.
The presence of a binding site is necessary but not sufficient for transcription factor binding. Indeed, the protein bound most of the time may not be the one whose consensus recognition motif is the closest match. Recognizing cases of varied binding site occupancy requires testing nuclear extracts across developmental stages, among cell types, and under diverse environmental conditions using supershift assays or in vivo footprinting. In addition, an interaction that has been biochemically validated in one species may not occur in another, even if the sequence of the binding site is perfectly conserved.
Demonstrating a conserved or altered protein-DNA interaction requires comparative biochemical data. Once functional binding sites have been mapped, the next step is identifying homologous binding sites among species or alleles. Promoter sequences can usually be aligned rather easily within a species, although binding sites that fall within repeats can be problematic. In the most straightforward interspecific comparisons, potential binding sites that occupy similar positions, spacing, and orientation relative to the start site of transcription and relative to each other are likely to be homologous.
Complications can arise for a variety of reasons: binding site spacing is often functionally unconstrained see section 3. We use the term homologous in its usual, phylogenetic, sense to denote the hypothesis that a binding site is present in two living species because it was present in their latest common ancestor and has persisted since.
Sequence similarity, in contrast, is simply an observation, and can be due to either homology or homoplasy. Once homologous binding sites have been identified, routine methods of comparative analysis can be applied to polarize character state transformations, identify reversals and parallel transformations, and reconstruct ancestral states. Most published comparisons of promoter structure involve just two species, with the emphasis typically on identifying conserved binding sites.
By surveying more taxa and incorporating functional data, it becomes possible to identify origins, losses, and turnover of binding sites. As with all comparative analyses, dense phylogenetic sampling provides a more robust understanding of evolutionary transformations within promoters, particularly in cases of rapid sequence divergence.
The only way to determine the expression profile produced by a promoter haplotype is to assay it in vivo, in its normal chromosomal and cell biological contexts.
This is most easily accomplished by examining the spatial and temporal distribution of transcripts using in situ hybridization, RNA gel blots, or quantitative PCR.
Because even small differences in promoter sequence can alter transcription see section 4. Similarly, the only way to understand the contribution of specific sequence differences within a promoter is to carry out comparative functional tests.
The most common kind of experiment involves coupling a test regulatory region to a reporter gene whose product is easily detected, and then placing this construct in embryos or cells where it is exposed to the array of transcription factors encountered by the endogenous promoter (Carey and Smale). Further experiments, such as testing the consequences of nucleotide substitutions within a specific binding site, deleting a binding site, altering spacing or orientation between binding sites, or testing restricted portions of the promoter, can be immensely informative.
Although such experiments are laborious, they provide almost the only reliable information about binding site function. Fortunately, expression assays are feasible in a growing number of organisms.
For comparative analyses, it is important to carry out reciprocal functional tests, because transcription is a product of both cis and trans sequences. Comparative experimental tests, although unusual in the literature, are necessary for analyzing the evolution of promoter function. Issues of particular interest include the following. A binding site whose sequence is conserved between two species may nonetheless function differently in them, because the transcription factors and cofactors that interact with it are expressed differently or because an adjacent binding site for a cofactor has changed.
Multiple binding sites for the same transcription factor within a single promoter fig. The position, spacing, and orientation of individual binding sites in some cases matter a great deal and in other cases not at all see 3. Although some binding sites may function continuously and ubiquitously, most probably do so only during part of the life cycle, in certain cell types, or in response to particular environmental conditions.
Binding site function may also be context dependent, changing under different circumstances see section 3. Interspecific differences in transcription profiles might be due to changes in cis or trans see sections 4.
The functional consequences of changes in cis can be identified by means of in vivo expression assays (examples reviewed in Paigen and in Cavener). The difficulty of carrying out comparative expression assays imposes severe practical constraints on analyses of promoter evolution.
Characterizing promoter function involves techniques that are labor intensive and unfamiliar to most molecular evolutionists. Yet without this information, it is difficult to interpret comparative sequence data meaningfully. Few promoters have been analyzed biochemically or functionally in more than one species, and even in these cases analyses have been limited to a fraction of the complete cis-regulatory region. The relative magnitude of likely functional consequences of a mutation within a promoter can be organized into a very rough rank order, as it can be more precisely within exons (table 5).
Overall, however, the considerably less regular structure-function relationship within promoters will make it much more difficult to discern general patterns of sequence evolution.
Although tests of selection on promoters are not fundamentally different from tests on coding regions, they must be applied with caveats.
The major problems arise when applying tests that use classes of nucleotide substitutions to promoter data e. These tests classify coding mutations as synonymous or nonsynonymous, and they test for selection under the assumption that synonymous sites evolve neutrally. To apply these tests to promoter sequences, most authors classify promoter mutations as occurring within binding sites or within non-binding-site nucleotides and assume that nonbinding sites are evolving neutrally.
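To make the structure of a classed test concrete, the sketch below applies a two-sided Fisher's exact test to a 2x2 table of substitution counts. The layout (binding-site versus non-binding-site nucleotides, crossed with fixed differences versus polymorphisms) follows the familiar McDonald-Kreitman design and is an assumption here, since the specific tests the text has in mind are not named in this copy. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table

                     fixed   polymorphic
        binding        a          b
        non-binding    c          d

    Returns the probability, under independence of rows and columns, of a
    table at least as unlikely as the observed one (margins held fixed).
    """
    n = a + b + c + d
    row1 = a + b
    col1 = a + c

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))
```

The arithmetic is the easy part. The caveats in the surrounding text concern the inputs: if nucleotides are assigned to the wrong class, or if the "neutral" class is not neutral, a small p-value does not mean what it would for coding sequence.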
However, the functional consequences of mutations in promoters cannot be classified without additional functional data see section 5. More specifically, practical difficulties in identifying binding sites see section 5.
Binding sites absent in the species for which functional data are available but present in all other species will be missed, while those sites functional in the reference species but not in all other species may mistakenly be considered present in all. Both types of error will result in some sequence differences being classed incorrectly and will degrade the signal-to-noise ratio in tests for selection.
It follows that sequence comparisons among the promoters of closely related species, or classed tests that only use data from one species e. Even fewer data from comparative studies are usually available about the functional consequences of sequence differences within a binding site. Only rarely will it be possible to reliably detect the effect of nucleotide differences on changes in binding specificity between species see section 5.
A second problem with tests that rely on classes of sites relates to the mechanism by which binding sites arise and are presumably selected for. Although an excess of nonsynonymous substitutions relative to synonymous substitutions is good evidence for positive selection, it is difficult to imagine a situation in which an excess of binding-site substitutions relative to nonbinding-site substitutions can be interpreted in the same way.
This follows from three features of promoters. First, binding sites are sometimes not functionally restricted to a specific position. Individual binding sites may therefore turn over by changing position without positive selection (Ludwig). Second, sequences which have no binding affinity for any transcription factor often need only a single base-pair change in order to become a functional binding site (Stone and Wray). A single point mutation can therefore establish a new functional site consisting of several nucleotides.
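The second feature can be illustrated by listing the windows in a sequence that sit exactly one substitution away from a motif. This is a toy sketch with hypothetical function names: it treats a binding site as a single exact string, whereas real sites are degenerate, so it understates how many near-sites exist.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def one_step_from_site(seq, motif):
    """Return (position, window) pairs for every window in `seq` that is
    exactly one point mutation away from `motif`.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    return [(i, seq[i:i + k])
            for i in range(len(seq) - k + 1)
            if hamming(seq[i:i + k], motif) == 1]
```

Each reported window is a place where a single substitution would create an exact match, without any change at the other nucleotides of the site.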
Third, most nucleotide substitutions within a binding site modulate or eliminate its function, whereas relatively few mutations will change it into a binding site for a different transcription factor. Rarely will an excess of substitutions within a binding site be a signal of positive selection, because binding sites often simply cease to function after multiple substitutions. None of these three features precludes selection for changes in binding sites; rather, they may combine to significantly reduce the ability of classed tests to detect this selection in practice.
All classed tests of selection suffer from these problems, but non-classed tests of neutrality e. Often a combination of these tests, as well as studies of geographic structure of allele frequencies, may be necessary to detect the action of selection in promoters e. No general framework exists for understanding, interpreting, and predicting how transcription evolves. In this section, we present an initial attempt at providing such a framework, in the form of testable hypotheses derived from three sources: models of molecular evolution, mechanisms of promoter function, and empirical evidence.
Our hope is that these hypotheses will encourage investigators to dig a little deeper into their data to address a broad range of questions about promoter evolution. Thus, we emphasize hypotheses that can be tested with available techniques. Because promoters are organized and function differently from other regions of the genome, they are subject to distinct functional constraints. Predictable patterns of sequence evolution should therefore distinguish promoters from sequences that lack a role in transcriptional regulation.
Because most nucleotides in promoters do not affect transcription, most substitutions and many indels should have no functional consequence and should therefore evolve without constraint. Exceptions might include the promoters of genes encoding proteins that tolerate many amino acid substitutions, are under balancing selection, or whose introns contain regulatory sequences e.
Three factors may contribute to a distinctive evolutionary dynamic of length variation within promoter regions: the lack of a reading frame, the low density of functionally important nucleotides, and the ability of many binding sites to operate in a position-independent manner. Indels should be more common, the frequency spectrum of indel size should not be biased toward multiples of three (as it is in coding sequences), and repeat variation and large indels should be much more common.
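The prediction about the indel size spectrum is straightforward to check once indel lengths have been called. A minimal sketch, assuming a plain list of signed indel lengths (negative for deletions) as input; the function name is hypothetical:

```python
def multiple_of_three_fraction(indel_lengths):
    """Fraction of indels whose length is a multiple of three.

    indel_lengths : iterable of integers; sign is ignored and zeros
                    (no length change) are dropped.
    Returns 0.0 when no indels remain.
    """
    lengths = [abs(n) for n in indel_lengths if n != 0]
    if not lengths:
        return 0.0
    in_frame = sum(1 for n in lengths if n % 3 == 0)
    return in_frame / len(lengths)
```

If the hypothesis holds, this fraction should be elevated for coding-region indels, where frame-preserving lengths are favored, and unremarkable for promoter indels. Judging the difference would still require a proper statistical comparison and adequate sample sizes.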
These patterns are evident in some cases e. Codons have an obligate colinearity with the amino acid sequence of their protein product, whereas binding sites and modules within promoters can often function to some extent independently of position and order see section 3.
Gross organizational changes within promoters should be limited largely by mutation, whereas in coding regions such changes should be limited primarily by functional constraints.
Small-scale inversions in promoters can exist within populations e. The analogous process of module shuffling within promoters may occur at a higher frequency. Several examples of mobile element insertions that have brought functional binding sites into range of a gene are known (Britten; Kidwell and Lisch). The modular organization of many promoters means that a transcription profile could be dramatically modified in a functionally integrated way.
The output of a promoter derives from the nucleotide sequences and spatial arrangement of transcription factor binding sites see section 3. It follows that sequences that lie between binding sites should be free to vary, at most showing weak biases that reflect mutational processes or weak selection to maintain overall base composition or conformational properties. Nonetheless, some studies have found evidence for preferential conservation of binding sites e.
RNA also contains ribose sugar molecules. The main purpose of the translation process is protein synthesis. Ribosomes reach the mRNA and read the sequence of the bases. Each sequence of three bases is called a codon, and each codon contains the instructions for one amino acid.
tRNA molecules deliver the matching amino acids to the ribosome, which joins them into a growing polypeptide chain. The protein continues to be built until a stop codon is encountered. A stop codon is a three-base sequence that does not code for an amino acid.
Three different types of RNA molecules are required for transcription and translation, and each type of RNA has a different function. Each codon specifies a particular amino acid (the building blocks of a protein). When the ribosome attaches to the mRNA, the codons are read. Stop codons do not code for any of the 20 amino acids, whereas the start codon (AUG) codes for methionine. Proteins are the result of DNA transcription and translation.
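The codon-by-codon reading described above can be sketched in a few lines of Python. The code uses the standard genetic code, in which AUG specifies methionine and UAA, UAG, and UGA are stop codons. It begins at the first AUG it finds, which is a simplification of how ribosomes select a start site, and the function name is hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna):
    """Translate an mRNA string into one-letter amino acid codes.

    Reading starts at the first AUG and stops at the first in-frame stop
    codon ('*' in the table) or when fewer than three bases remain.
    """
    mrna = mrna.upper().replace("T", "U")   # tolerate DNA-style input
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)
```

For example, `translate("GGAUGUUUGGCUAAUU")` reads AUG, UUU, GGC, and then the stop codon UAA, giving `"MFG"`.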
Proteins are macromolecules made of one or more polypeptide chains, each of which is a sequence built from 20 different amino acids.
Proteins bind to other molecules called ligands. After the polypeptide chains are completed, they fold onto themselves to create a three-dimensional structure.
The resulting three-dimensional structure determines the function of the protein in the cell. Antibodies bind to foreign particles (e.g., viruses and bacteria) to protect the body. White blood cells, specifically B lymphocytes, produce antibodies. Enzymes catalyze chemical reactions in cells (e.g., those underlying muscle contraction). Enzymes assist with bodily functions such as digestion and DNA replication.