What is the significance of the invariant nucleotides?

Various types of RNA (see table) function in protein synthesis. However, RNA is a very versatile molecule. In some cases, RNA performs functions typically considered DNA-like, such as serving as the genetic material for certain viruses, or roles typically carried out by proteins, as with RNA enzymes (ribozymes). See also: Enzyme; Ribozyme. RNA is a linear polymer of four different nucleotides (Fig.). Each nucleotide is composed of three parts: a five-carbon sugar known as ribose, a phosphate group, and one of four bases attached to the ribose.

The four bases are adenine (A), cytosine (C), guanine (G), and uracil (U). The combination of base and sugar constitutes a nucleoside. The structure of RNA is basically a repeating chain of ribose and phosphate moieties, with one of the four bases attached to each ribose.

The structure and function of the RNA vary depending on its sequence and length. See also: Nucleotide. RNA differs from DNA by a single change in the sugar group (ribose instead of deoxyribose) and by the substitution of uracil for the base thymine (T). Typically, RNA does not exist as long double-stranded chains as DNA does, but rather as short single chains with higher-order structure due to base pairing and tertiary interactions within the RNA molecule.
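As a small illustration of the base substitution described above, the sketch below rewrites a DNA strand in the RNA alphabet by replacing T with U. The function name `dna_to_rna` is hypothetical, and the sketch only changes the alphabet; it does not model the sugar difference or any strand selection.

```python
def dna_to_rna(dna: str) -> str:
    """Return the same base order written in the RNA alphabet (T replaced by U)."""
    dna = dna.upper()
    unknown = set(dna) - set("ACGT")
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    return dna.replace("T", "U")


print(dna_to_rna("ATGGCT"))  # AUGGCU
```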

Transfer RNAs (tRNAs) act as adapters that translate the nucleotide sequence of the mRNA into protein sequence. They accomplish this by carrying the appropriate amino acid to ribosomes during the process of protein synthesis. Each cell contains at least one type (usually several types) of tRNA specific for each of the 20 amino acids. Complementary base-pairing interactions between different parts of the tRNA play a major role in specifying its three-dimensional structure, with additional contributions from higher-order interactions.

At one end of the L shape is the three-nucleotide anticodon sequence that binds its complementary sequence in the mRNA; at the other end is the amino acid specified by the mRNA codon. The base sequence in the mRNA directs the appropriate amino acid-carrying tRNAs to the ribosome to ensure that the correct protein sequence is made.
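The codon-anticodon complementarity described above can be sketched in a few lines. This is a simplified illustration that assumes strict Watson-Crick pairing and ignores modified bases and nonstandard pairs; the names `RNA_COMPLEMENT` and `anticodon_for` are hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Watson-Crick anticodon, written 5'->3', for an mRNA codon written 5'->3'."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in RNA_COMPLEMENT for base in codon):
        raise ValueError("codon must be three bases drawn from A, C, G, U")
    # Pairing is antiparallel, so complement each base and reverse the order.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))


print(anticodon_for("AUG"))  # CAU
```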

In the tRNA tertiary structure, a variable-length extra arm is accommodated within the hinge region that joins the two parts of the L-shaped structure. See also: Amino acid. Ribosomes are complex ribonucleoprotein particles that act as workbenches for the process of protein synthesis, that is, the process of linking amino acids to form proteins. Each ribosome is made up of several structural rRNA molecules and more than 50 different proteins, and it is divided into two subunits, termed large and small.

A bacterial ribosome contains one copy each of 5S rRNA approximately nucleotides and 23S rRNA approximately nucleotides in its large subunit, and one copy of 16S rRNA approximately nucleotides in its small subunit. In eukaryotic (nucleus-containing) organisms, these rRNAs are also present in somewhat longer versions in ribosomes of greater size that also contain an additional rRNA of 5.

See also: Ribosomes. The RNA components of the ribosome account for more than half of its weight. Each of these rRNAs is closely associated with ribosomal proteins and is essential in determining the exact structure of the ribosome.

In addition, the rRNAs, rather than the ribosomal proteins, are likely the basic functional elements of the ribosome. They are actively involved at most, if not all, stages of protein synthesis. In this process, they make direct and sometimes transient contacts, via short stretches of base pairing, with the mRNA, tRNA, and each other.

The nucleotide sequences of many rRNAs from different organisms have been characterized. The secondary stem-loop structures adopted by the RNAs are more highly conserved than the overall primary sequence, indicating that the higher-order structure is the functional one.

The Background section has been revised to indicate that intron splicing is part of the mRNA processing step. Our method does not address this issue, but we selected one of the best complete genomic sequences at the time to ensure sequence errors are minimal.

Also, our models are not generative, but rather descriptive. Please specify what is and what is not novel in your method. This term seems not scientifically sound. Is the observed difference between the cumulative evolutionary slope and those of random sequences statistically significant? The intended meaning was in the context of simulating a DNA-like sequence based only on the observed nucleotide frequencies, as a multinomial.
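The multinomial simulation mentioned above can be sketched as follows: each position is drawn independently with probabilities equal to the observed nucleotide frequencies. The function name `simulate_multinomial` and its parameters are hypothetical and are not taken from the authors' Perl scripts.

```python
import random
from collections import Counter


def simulate_multinomial(sequence, length=None, seed=0):
    """Draw an i.i.d. sequence whose symbol probabilities match the observed frequencies."""
    rng = random.Random(seed)
    counts = Counter(sequence)
    symbols = sorted(counts)
    weights = [counts[s] for s in symbols]  # random.choices normalizes the weights
    n = len(sequence) if length is None else length
    return "".join(rng.choices(symbols, weights=weights, k=n))


print(simulate_multinomial("ACGTACGTAACC", seed=1))
```

Because every position is sampled independently, the simulated sequence matches the nucleotide proportions only in expectation, not exactly.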

In constructing the predictive model, the authors introduced equivalence classes for the paired nucleotides. In fact, they proposed a code allowing permutations within neighbouring nucleotides and between neighbouring sequence pairs. Is there a biological basis for these assumptions? This is a very interesting question. As for the separation into equivalence classes and the invariance of the slope given a code representation from the class, this invariance is a purely mathematical property.

Is your slope measure invariant to different kinds of sequence truncation (at the beginning of a CDS, at its end, or at repeated loci)? The invariance relates only to the proposed coding of nucleotides.

Sequence truncation and repeated loci are likely to induce a change in slope. How were the different lengths of truncated regions normalized in Figs. Is it scalable? If yes, please explain. Thanks for this question. The results shown in Figs. Please indicate your motivation and selection criteria for the organism of interest (honeybee). Why have only genes located on chromosome 1 been used for the analysis?

Why honeybee, and why the first chromosome? Results for other species might be important for evaluation of the method. Since our newly developed mathematical approach aims at quantifying sequence regularity within genes, we deemed it necessary to emphasize the application to gene sequences of one organism at a time to see if the method is effective.

During our preliminary analysis (November ), the NCBI Genome Database provided a collection of reference sequences of many prokaryotic and eukaryotic organisms. We set out to study the genomes of eukaryotes, as their genes comprise exons and introns.

Accurate exon-intron boundaries are crucial information in our work. Therefore, we had to exclude any unfinished or ambiguous sequences. This yields a list of reference genomes for eukaryotic organisms. The genome of the honeybee (Apis mellifera) was selected for several reasons. The honeybee is one of several model organisms. We expected it to have been studied extensively, including gene annotation and the evolution and characterization of gene structure.

This provides higher confidence than the partial genomes of many higher eukaryotes. Instead, we focused on high-confidence annotated genes located on honeybee chromosome 1. This chromosome has the largest size and contains the highest number of genes.

What is the exact p-value for the observed global slopes in Fig. Is it significant? Given the large sample sizes used in calculating the spectra, the p-values will be highly significant because the tests are overpowered. Moreover, the significance of p-values may not translate to an efficient classification procedure. Instead, we opted for classification measures (a confusion table) to describe differences between coding and non-coding regions.

The estimations of specificity, sensitivity, robustness and reproducibility of the method must be reported. It was partially reported. It is important to demonstrate the advantages and disadvantages of your method in comparison with alternative methods, for instance, probabilistic models that take into account the triplet code and the relatively high evolutionary conservation of SDS sequences.
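For readers unfamiliar with the classification measures taken from a confusion table, the sketch below computes sensitivity, specificity, and the Youden index (sensitivity + specificity - 1) from the four cell counts. The function name and the example counts are hypothetical and are not results from the paper.

```python
def classification_measures(tp, fn, tn, fp):
    """Sensitivity, specificity and Youden index from confusion-table counts."""
    sensitivity = tp / (tp + fn)   # fraction of true positives that were detected
    specificity = tn / (tn + fp)   # fraction of true negatives that were detected
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_j": sensitivity + specificity - 1,
    }


# Made-up counts, for illustration only.
print(classification_measures(tp=80, fn=20, tn=90, fp=10))
```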

We stated that alternative methods all agree that coding regions translate to more irregular numerical objects, while non-coding regions exhibit increased regularity. Which programming language has been used? All sequence processing tasks were performed with in-house Perl scripts. The programming code and the instructions must be publicly available to the research community according to the open-source policy of the Biol. Direct journal.

Page 4. More information is needed for the reader to understand Haar transforms. As it is now, the background is missing, which will distract many readers. Also, define the Hadamard square. We gave a brief and informal introduction to wavelets. Mindful of the audience, we avoided technicalities. However, we directed interested readers to important references where detailed descriptions can be found. Page 5. This is not informative.

Similarly, it is unclear to me at least why the slope is invariant with respect to permutations within and among AC and GT and not with respect to other permutations? It would be very difficult and possibly not illuminating to formalize these statements.

Since a positive slope means that the average energy at the higher resolution level exceeds the energy at the coarser resolution level, a positive slope signifies high irregularity. The signals with excess of irregularity Hurst exponents between 0 and 0. As regards the permutations of AC and GT within the pairs and among the pairs, this is true only for the s 1 column of Table 1. Note that one element of each column is needed for the invariant coding. Maybe a larger example would help.
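The sign convention for the slope described above can be illustrated with a minimal sketch. It assumes an orthonormal Haar transform on a numeric signal whose length is a power of two, and fits a least-squares line to the log2 of the mean squared detail coefficients per level, with the level index increasing toward finer resolution. The function names, the normalization, and the use of all levels in the fit are assumptions; the authors' exact procedure may differ.

```python
import math


def haar_detail_levels(signal):
    """Orthonormal Haar transform: detail coefficients, finest level first."""
    n = len(signal)
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two (>= 2)")
    approx = [float(x) for x in signal]
    levels = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        levels.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return levels


def spectral_slope(signal):
    """Least-squares slope of log2(mean detail energy) against resolution level j.

    Larger j means finer resolution, so a positive slope means that finer
    levels carry more energy than coarser ones.
    """
    levels = haar_detail_levels(signal)
    points = []
    for depth, detail in enumerate(levels):
        energy = sum(d * d for d in detail) / len(detail)
        if energy > 0:  # log2 is undefined for levels with zero energy
            j = len(levels) - 1 - depth
            points.append((j, math.log2(energy)))
    if len(points) < 2:
        raise ValueError("need at least two levels with nonzero energy")
    mean_j = sum(j for j, _ in points) / len(points)
    mean_e = sum(e for _, e in points) / len(points)
    num = sum((j - mean_j) * (e - mean_e) for j, e in points)
    den = sum((j - mean_j) ** 2 for j, _ in points)
    return num / den


smooth = list(range(16))                            # regular ramp
rough = [(-1) ** i + 0.01 * i for i in range(16)]   # fast oscillation plus a slight trend
print(spectral_slope(smooth) < 0, spectral_slope(rough) > 0)  # True True
```

In this toy example the ramp concentrates its energy at coarse levels and gives a negative slope, while the rapidly oscillating signal concentrates its energy at the finest level and gives a positive slope.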

This seems fundamental for the method. Thanks for your comments. So our point was that the proposed method leads to no ambiguities and translation is unique. Page 7. The two mechanisms of intron length increase are mentioned, but the description is cursory.

Explain more or drop. The simulation-based test seems simplistic. A permutation test (one of many discussed in the literature) would, I think, be more on target.

We are grateful for your suggestion and agree that a permutation test would be more on target here. The way we did the simulation is a variant of the parametric bootstrap, where we selected samples from a multinomial distribution with prescribed class probabilities.

As such, the proportions of nucleotides are matched only in expectation and not exactly. A permutation test keeps these proportions fixed. We conducted the permutation test by taking the DNA string, permuting it 20, times, and finding the spectral slope for each permutation. The achieved significance rate for this test is 0. Please see the included figure illustrating the results of the permutation test.
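The permutation procedure described above can be sketched generically: shuffle the sequence many times, recompute a statistic for each shuffle, and report the fraction of shuffles at least as extreme as the observed value. In the sketch below the statistic is a deliberately simple stand-in (the count of adjacent identical symbols), not the spectral slope used by the authors; the function names, the one-sided comparison, and the add-one correction are assumptions.

```python
import random


def permutation_p_value(sequence, statistic, n_permutations=1000, seed=0):
    """One-sided permutation p-value; shuffling keeps the symbol counts fixed."""
    rng = random.Random(seed)
    observed = statistic(sequence)
    symbols = list(sequence)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(symbols)  # in-place shuffle, composition unchanged
        if statistic("".join(symbols)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def adjacent_repeats(seq):
    """Stand-in statistic: number of positions equal to their right neighbour."""
    return sum(a == b for a, b in zip(seq, seq[1:]))


blocky = "A" * 20 + "C" * 20 + "G" * 20 + "T" * 20
print(permutation_p_value(blocky, adjacent_repeats, n_permutations=200, seed=1))
```

Any function that maps a sequence to a number can be passed as `statistic`, so a slope estimator could be substituted for the stand-in.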

Note that slope Therefore, given the fixed content of nucleotides, the autocorrelation among nucleotides, that is, spectral slopes in the wavelet domain, distinguish genuine genome from random sequences. Page 8. Please explain the biological interpretation of combination slopes. Also, referring to the study of Youden Index, please describe in more detail the sample that was used as a gold standard and reasons it is believed to be a gold standard. As a general remark, why is the honeybee genome considered?

The most natural choice is probably the human genome, or even better, several genomes of distantly related species. The scaling separation of exons and introns is a phenomenological observation, and we have not found an explanatory biological interpretation so far. As regards the gold standard, the genome of the honeybee is fully sequenced and we know the exact locations of exons, introns, and their combinations. We agree with you that the human genome, or a comparative analysis of genomes of distantly related species, would be more interesting for the readership.

We thank you for the suggestion and we hope to address this in the future. The matrix W4 or W16 need not be a wavelet transformation. Another row corresponds to another type of DNA walk.

Since the first row is not used in the calculation of the scaling measure, it should be removed. Therefore, the authors should reconsider the matrices W4 and W16. If there is an implementation reason why W4 or W16 must be a wavelet transformation, the authors should discuss that. The reviewers are correct. The matrices W4 and W16 need not be wavelet transform matrices. However, there are compelling reasons why these should be selected as wavelet matrices.

This preservation of energy is critical for a coherent definition of spectra. With non-wavelet matrices, the counterpart of the slope may not be connected with H in an obvious manner. The first row in D, as the reviewer points out, is redundant, but this row is not used in the calculation of spectral slopes, for it corresponds to scaling wavelet coefficients and not to the detail. Only detail coefficients are used for spectral assessment.

Non-wavelet matrices can be used if the clustering or classification of the nucleotide sequences is of interest, without being precise about the exact degree of scaling. In Results and discussion: the advantage of the proposed method is that the scaling measure is invariant to the assignment of nucleotides.

This implies that the method can capture any characteristics of nucleotides e. Nevertheless, the authors showed only an example of intron-exon sequences in honeybee.

As mentioned by the authors in Background, the exon sequences are correlated with GC-content. Therefore, it is validated that the method can capture the characteristics of the GC-content, but it cannot be validated for the other characteristics. To claim the universality of the method, the authors should show that the method can be applied to any characteristics of nucleotides e.

The scaling measures (slopes) indeed depend on the GC-content of nucleotides. However, we demonstrated through our permutation test that when GC-content is fixed, spectral slopes still differ for different sequences.

In this paper, we have taken exons and introns as examples of types of sequences to be analyzed using our methodology. We plan to investigate other examples of sequences in the future. Minor comments 1. In Fig. Major comment: First of all, the Background section improved notably after the first review, and we have no more questions to it.

However, there are still open questions about the study design and methodology. There is very limited hope that the method may be competitive with the dozen alternative methods used in the field. Unfortunately, the authors still ignore several general and basic requirements for methodological work, including the reproducibility analysis and the comparison with known methods.

Many questions, including statistical tests and significance values, are still open. It is not easy to see what the benefit of the method for biological applications is.

The authors appreciate the time you invested in detailed reading and a useful response. Below are our answers to your specific concerns. Equivalence classes. It remains unclear to me why the authors use a paired nucleotide notation to introduce equivalence classes. Please explain. Is there any difference if we use a three-nucleotide (for example, the genetic code) or four-nucleotide notation? Equivalence classes are determined by the invariance of the slopes with respect to permutations of nucleotides.

In this sense, the classes can be thought of as mathematical objects, and they are not predesigned. It happens that a particular class can be fully described by permutations of the pairs of nucleotides. Using groups of three nucleotides, for example, would not keep the slopes unchanged. Although this pairing is purely mathematical, there are some biological consequences. From the biological perspective, an equivalence class based on paired nucleotide notation is more suitable than a three- or four-nucleotide notation, mainly because interesting classification schemes of the nucleotide symbols A, C, G, and T based on nucleotide characteristics occur in pairs (purine-pyrimidine nucleotides, nucleotides with weak-strong hydrogen bonds, keto-amino nucleotides).

Therefore, in addition to its purely mathematical generative process, the proposed method allows such physicochemical characteristics of nucleotides to be captured simultaneously.
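The three pairwise classification schemes named above can be written down as a small lookup, with each scheme splitting A, C, G, T into two groups of two. The sketch below encodes a sequence as +1/-1 under a chosen scheme. The dictionary name, the function name, and the choice of which group maps to +1 are arbitrary assumptions made for illustration; this is not the coding from the paper's Table 1.

```python
CLASSIFICATIONS = {
    "purine_pyrimidine": ("AG", "CT"),  # purines A, G; pyrimidines C, T
    "weak_strong": ("AT", "CG"),        # weak (A, T) vs strong (C, G) hydrogen bonding
    "amino_keto": ("AC", "GT"),         # amino A, C; keto G, T
}


def encode(sequence, scheme):
    """Map each nucleotide to +1 (first group) or -1 (second group) of a scheme."""
    first, second = CLASSIFICATIONS[scheme]
    encoded = []
    for base in sequence.upper():
        if base in first:
            encoded.append(1)
        elif base in second:
            encoded.append(-1)
        else:
            raise ValueError(f"unexpected symbol: {base!r}")
    return encoded


print(encode("ACGT", "amino_keto"))  # [1, 1, -1, -1]
```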

In addition, the use of paired nucleotide notation in equivalence class allows the interference of commonly found CG dinucleotides to be eliminated. It is nonsense. Probably there is a misunderstanding in the terminology use. In trying to assess the classification accuracy of the method, the annotation would mean that we know the ground truth. Of course, the method, once established on the training sample, would be applicable to data where the ground truth is not available.

We selected honeybee genome to be our illustration dataset, partly because it has a complete genome and is considered a model organism. Genomes of other eukaryotic species could be selected as well, of course, but we simply did not analyze them. Does it mean that authors take a complementary non-coding strand of a gene if it is located on the opposite strand?

If so, this is most probably incorrect. Given the complementary directions of the DNA strands in the double helix, it is sufficient to represent a DNA molecule by the nucleotide sequence of a single strand.

Figures 5 and 6. Our method used a window size of 32 nucleotides to calculate a cumulative slope. Without defining the combination regions, the slopes were getting averaged for portions of exons and introns together which affected the results. The significance is already provided in the main text, if I am correct. The question is related to a global slope calculation, but it looks like the authors did not understand the question.

The global slope measures the sequence regularity in a gene and ignores the sequence annotation (introns, exons). We used this slope to compare the overall regularity of a given honeybee gene vs.

As the method of global calculation requires the sequence length to be a power of 2, each sequence was truncated accordingly. The average percentage of sequence length retained for global slope analysis across all genes is Truncation at the beginning of a gene is also possible but we did not do so. The reviewer commented that sequence truncation is also possible at other locations within a gene at the beginning of a CDS, its end, or repeated loci.
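The power-of-two truncation described above can be sketched as follows. The sketch assumes that the leading part of the sequence is kept and the tail is dropped, which is how the passage reads; the function name is hypothetical, and the retained fraction it returns corresponds to the percentage of sequence length retained.

```python
def truncate_to_power_of_two(sequence):
    """Keep the longest prefix whose length is a power of two.

    Returns the truncated sequence and the fraction of the original length retained.
    """
    n = len(sequence)
    if n == 0:
        raise ValueError("empty sequence")
    keep = 1 << (n.bit_length() - 1)  # largest power of two not exceeding n
    return sequence[:keep], keep / n


truncated, retained = truncate_to_power_of_two("A" * 1000)
print(len(truncated), retained)  # 512 0.512
```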

We did not examine such scenarios since (i) it is impractical to apply such a setting to every gene in our gene set, and (ii) there is no way to determine how many nucleotides should be removed per sub-location so that the final gene length equals 2^n.

The aim of this paper was not to show the universal applicability of the proposed methodology to a range of species with decoded genomes.

We reiterate that its main contribution is the determination of scaling exponents in nucleotide sequences that are invariant with respect to the assignment of particular nucleotides to numbers, and that the honeybee genome was used as an illustration in the context of the well-understood phenomenon of different scaling in exons and introns. Thus honeybee chromosome 1 was a showcase for the methodology. Looking at the Fig.

These nonstandard base pairs are weaker than other common base pairs, hence the "wobble" hypothesis [6].

Most cells have a synthetase for each amino acid, and the reaction is coupled to the hydrolysis of ATP. The process begins with ATP being hydrolyzed and donating an AMP, which binds to the carboxyl group of the amino acid, thus forming an adenylated amino acid. The aminoacyl group is then transferred to a hydroxyl group on either the 2' or 3' carbon of the 3'-end nucleotide of the tRNA molecule, forming an ester bond with the tRNA and yielding the aminoacyl tRNA [7]. The two classes of aminoacyl tRNA synthetases (class 1 and class 2) bind to different faces of the tRNA molecule, and each class includes enzymes specific for 10 of the 20 fundamental amino acids.
