Syntax
SeqNT = aa2nt(SeqAA)
SeqNT = aa2nt(SeqAA, ...'GeneticCode', GeneticCodeValue, ...)
SeqNT = aa2nt(SeqAA, ...'Alphabet', AlphabetValue, ...)
Input Arguments
SeqAA
One of the following:
String of single-letter codes specifying an amino acid sequence. For valid letter codes, see the table Mapping Amino Acid Letter Codes to Integers. Unknown characters are mapped to 0.
Row vector of integers specifying an amino acid sequence. For valid integers, see the table Mapping Amino Acid Integers to Letter Codes.
MATLAB structure containing a Sequence field that contains an amino acid sequence, such as returned by fastaread, getgenpept, genpeptread, getpdb, or pdbread.
Examples: 'ARN' or [1 2 3]
GeneticCodeValue Integer or string specifying a genetic code number or code name from the table Genetic Code. Default is 1 or 'Standard'.
Tip If you use a code name, you can truncate the name to the first two letters of the name.
AlphabetValue String specifying a nucleotide alphabet. Choices are 'DNA' (default) or 'RNA'.
Output Arguments
SeqNT Nucleotide sequence specified by a character string of letter codes.
Description
SeqNT = aa2nt(SeqAA) converts an amino acid sequence, specified by SeqAA, to a nucleotide sequence, returned in SeqNT, using the standard genetic code.
In general, the mapping from an amino acid to a nucleotide codon is not a one-to-one mapping. For amino acids with multiple possible nucleotide codons, this function randomly selects a codon corresponding to that particular amino acid. For the ambiguous characters B and Z, one of the amino acids corresponding to the letter is selected randomly, and then a codon sequence is selected randomly. For the ambiguous character X, a codon sequence is selected randomly from all possibilities.
SeqNT = aa2nt(SeqAA, ...'PropertyName', PropertyValue, ...) calls aa2nt with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
SeqNT = aa2nt(SeqAA, ...'GeneticCode', GeneticCodeValue, ...) specifies a genetic code to use when converting an amino acid sequence to a nucleotide sequence. GeneticCodeValue can be an integer or string specifying a code number or code name from the table Genetic Code. Default is 1 or 'Standard'. The amino acid to nucleotide codon mapping for the Standard genetic code is shown in the table Standard Genetic Code.
Tip If you use a code name, you can truncate the name to the first two letters of the name.
SeqNT = aa2nt(SeqAA, ...'Alphabet', AlphabetValue, ...) specifies a nucleotide alphabet. AlphabetValue can be 'DNA', which uses the symbols A, C, G, and T, or 'RNA', which uses the symbols A, C, G, and U. Default is 'DNA'.
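The selection rules described above can be sketched outside MATLAB. The following Python sketch is not the aa2nt implementation; it only illustrates random codon selection under the standard genetic code. The names reverse_translate, AA_TO_CODONS, and SENSE_CODONS are made up for illustration, and the sketch assumes that X draws from the 61 sense codons, which the text does not state explicitly. Gaps and unknown characters are not handled.

```python
import random

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TO_AA = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Invert the table: each amino acid (and '*' for stop) maps to its codons.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)

AMBIGUOUS = {"B": "DN", "Z": "EQ"}  # B = Asp or Asn, Z = Glu or Gln
# Assumption: X is drawn from sense codons only (stop codons excluded).
SENSE_CODONS = [c for c, aa in CODON_TO_AA.items() if aa != "*"]


def reverse_translate(seq_aa, alphabet="DNA", rng=random):
    """Pick one codon at random for each amino acid letter in seq_aa."""
    codons = []
    for aa in seq_aa.upper():
        if aa == "X":
            codons.append(rng.choice(SENSE_CODONS))
            continue
        if aa in AMBIGUOUS:
            aa = rng.choice(AMBIGUOUS[aa])  # first pick one of the two amino acids
        codons.append(rng.choice(AA_TO_CODONS[aa]))
    seq_nt = "".join(codons)
    return seq_nt.replace("T", "U") if alphabet.upper() == "RNA" else seq_nt


print(reverse_translate("MATLAP"))
print(reverse_translate("MATLAP", alphabet="RNA"))
```

Because the codon choice is random, repeated calls generally return different nucleotide sequences that all translate back to the same amino acid sequence.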
Genetic Code
Code Number    Code Name
9              Echinoderm Mitochondrial
10             Euplotid Nuclear
11             Bacterial and Plant Plastid
12             Alternative Yeast Nuclear
13             Ascidian Mitochondrial
14             Flatworm Mitochondrial
15             Blepharisma Nuclear
16             Chlorophycean Mitochondrial
21             Trematode Mitochondrial
22             Scenedesmus Obliquus Mitochondrial
23             Thraustochytrium Mitochondrial
Standard Genetic Code
Amino Acid Name                                           Code  Nucleotide Codon
Alanine                                                   A     GCT GCC GCA GCG
Arginine                                                  R     CGT CGC CGA CGG AGA AGG
Asparagine                                                N     AAT AAC
Aspartic acid (Aspartate)                                 D     GAT GAC
Cysteine                                                  C     TGT TGC
Glutamine                                                 Q     CAA CAG
Glutamic acid (Glutamate)                                 E     GAA GAG
Glycine                                                   G     GGT GGC GGA GGG
Histidine                                                 H     CAT CAC
Isoleucine                                                I     ATT ATC ATA
Leucine                                                   L     TTA TTG CTT CTC CTA CTG
Lysine                                                    K     AAA AAG
Methionine                                                M     ATG
Phenylalanine                                             F     TTT TTC
Proline                                                   P     CCT CCC CCA CCG
Serine                                                    S     TCT TCC TCA TCG AGT AGC
Threonine                                                 T     ACT ACC ACA ACG
Tryptophan                                                W     TGG
Tyrosine                                                  Y     TAT TAC
Valine                                                    V     GTT GTC GTA GTG
Asparagine or Aspartic acid (Aspartate)                   B     Random codon from D and N
Glutamine or Glutamic acid (Glutamate)                    Z     Random codon from E and Q
Unknown amino acid (any amino acid)                       X     Random codon
Translation stop                                          *     TAA TAG TGA
Gap of indeterminate length                               -     ---
Unknown character (any character or symbol not in table)  ?     ???
Examples
Convert an amino acid sequence to a nucleotide sequence using the standard genetic code.
aa2nt('MATLAP')
ans = ATGGCGACGTTAGCGCCG
Convert an amino acid sequence to a nucleotide sequence using the Vertebrate Mitochondrial genetic code.
aa2nt('MATLAP', 'GeneticCode', 2)
ans = ATGGCAACTCTAGCGCCT
Convert an amino acid sequence to a nucleotide sequence using the Echinoderm Mitochondrial genetic code and the RNA alphabet.
aa2nt('MATLAP','GeneticCode','ec','Alphabet','RNA')
ans = AUGGCCACAUUGGCACCU
The codon AUG serves two related functions: it signals the start of translation, and it codes for the incorporation of the amino acid methionine (Met) into the growing polypeptide chain.
The genetic code can be expressed as either RNA codons or DNA codons. RNA codons occur in messenger RNA (mRNA) and are the codons that are actually "read" during the synthesis of polypeptides (the process called translation). But each mRNA molecule acquires its sequence of nucleotides by transcription from the corresponding gene. Because DNA sequencing has become so rapid and because most genes are now being discovered at the level of DNA before they are discovered as mRNA or as a protein product, it is extremely useful to have a table of codons expressed as DNA. So here are both. Note that for each table, the left-hand column gives the first nucleotide of the codon, the 4 middle columns give the second nucleotide, and the last column gives the third nucleotide.
The RNA Codons
        U                              C                    A                        G
U       UUU Phenylalanine (Phe)        UCU Serine (Ser)     UAU Tyrosine (Tyr)       UGU Cysteine (Cys)      U
        UUC Phe                        UCC Ser              UAC Tyr                  UGC Cys                 C
        UUA Leucine (Leu)              UCA Ser              UAA STOP                 UGA STOP                A
        UUG Leu                        UCG Ser              UAG STOP                 UGG Tryptophan (Trp)    G
C       CUU Leucine (Leu)              CCU Proline (Pro)    CAU Histidine (His)      CGU Arginine (Arg)      U
        CUC Leu                        CCC Pro              CAC His                  CGC Arg                 C
        CUA Leu                        CCA Pro              CAA Glutamine (Gln)      CGA Arg                 A
        CUG Leu                        CCG Pro              CAG Gln                  CGG Arg                 G
A       AUU Isoleucine (Ile)           ACU Threonine (Thr)  AAU Asparagine (Asn)     AGU Serine (Ser)        U
        AUC Ile                        ACC Thr              AAC Asn                  AGC Ser                 C
        AUA Ile                        ACA Thr              AAA Lysine (Lys)         AGA Arginine (Arg)      A
        AUG Methionine (Met) or START  ACG Thr              AAG Lys                  AGG Arg                 G
G       GUU Valine (Val)               GCU Alanine (Ala)    GAU Aspartic acid (Asp)  GGU Glycine (Gly)       U
        GUC Val                        GCC Ala              GAC Asp                  GGC Gly                 C
        GUA Val                        GCA Ala              GAA Glutamic acid (Glu)  GGA Gly                 A
        GUG Val                        GCG Ala              GAG Glu                  GGG Gly                 G

The DNA Codons
        T         C         A         G
T       TTT Phe   TCT Ser   TAT Tyr   TGT Cys   T
        TTC Phe   TCC Ser   TAC Tyr   TGC Cys   C
        TTA Leu   TCA Ser   TAA STOP  TGA STOP  A
        TTG Leu   TCG Ser   TAG STOP  TGG Trp   G
C       CTT Leu   CCT Pro   CAT His   CGT Arg   T
        CTC Leu   CCC Pro   CAC His   CGC Arg   C
        CTA Leu   CCA Pro   CAA Gln   CGA Arg   A
        CTG Leu   CCG Pro   CAG Gln   CGG Arg   G
A       ATT Ile   ACT Thr   AAT Asn   AGT Ser   T
        ATC Ile   ACC Thr   AAC Asn   AGC Ser   C
        ATA Ile   ACA Thr   AAA Lys   AGA Arg   A
        ATG Met*  ACG Thr   AAG Lys   AGG Arg   G
G       GTT Val   GCT Ala   GAT Asp   GGT Gly   T
        GTC Val   GCC Ala   GAC Asp   GGC Gly   C
        GTA Val   GCA Ala   GAA Glu   GGA Gly   A
        GTG Val   GCG Ala   GAG Glu   GGG Gly   G
*When within gene; at beginning of gene, ATG signals start of translation.
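The table layout described above (first nucleotide on the left, second nucleotide across the middle columns, third nucleotide on the right) can be generated from a compact encoding of the standard code. The Python sketch below is illustrative only; codon_grid is a hypothetical helper name, and the 64-letter string is the standard code written in T, C, A, G order.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon, in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_grid(alphabet="DNA"):
    """Return 16 text rows; each row holds the four codons that share a
    first and a third nucleotide, one per choice of second nucleotide."""
    rows = []
    for i, first in enumerate(BASES):
        for k, third in enumerate(BASES):
            cells = []
            for j, second in enumerate(BASES):
                codon = first + second + third
                aa = AMINO_ACIDS[16 * i + 4 * j + k]
                if alphabet.upper() == "RNA":
                    codon = codon.replace("T", "U")
                cells.append(f"{codon} {'STOP' if aa == '*' else aa}")
            rows.append("  ".join(cells))
    return rows


for row in codon_grid("RNA"):
    print(row)
```

Switching the argument to "DNA" yields the same grid with T in place of U, which is the only difference between the two tables.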
Codon Bias
All but two of the amino acids (Met and Trp) can be encoded by 2 to 6 different codons. However, the genomes of most organisms reveal that certain codons are preferred over others. In humans, for example, alanine is encoded by GCC four times as often as by GCG. This probably reflects a greater translation efficiency by the translation apparatus (e.g., ribosomes) for certain codons over their synonyms.
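Codon bias of this kind is usually measured by counting how often each synonymous codon appears in coding sequences. The Python sketch below shows the counting step only; the function names and the toy sequence are hypothetical and chosen so that GCC appears four times as often as GCG, mirroring the alanine example. It is not real genomic data.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in a coding sequence, reading in frame from position 0."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def synonymous_fractions(counts, synonyms):
    """Share of each synonymous codon among all codons for one amino acid."""
    total = sum(counts[c] for c in synonyms)
    return {c: (counts[c] / total if total else 0.0) for c in synonyms}


ALA_CODONS = ["GCT", "GCC", "GCA", "GCG"]
toy_cds = "GCCGCCGCCGCCGCGATG"  # hypothetical toy sequence, not real data
fractions = synonymous_fractions(codon_counts(toy_cds), ALA_CODONS)
print(fractions)
```

On real data the same counts would be accumulated over many genes before comparing synonymous codons.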
Mitochondrial genes
When mitochondrial mRNA from animals or microorganisms (but not from plants) is placed in a test tube with the cytosolic protein-synthesizing machinery (amino acids, enzymes, tRNAs, ribosomes) it fails to be translated into a protein. The reason: these mitochondria use UGA to encode tryptophan (Trp) rather than as a chain terminator. When translated by cytosolic machinery, synthesis stops where Trp should have been inserted. In addition, most animal mitochondria use AUA for methionine rather than isoleucine, and all vertebrate mitochondria use AGA and AGG as chain terminators. Yeast mitochondria assign all codons beginning with CU to threonine instead of leucine (which is still encoded by UUA and UUG as it is in cytosolic mRNA).
Plant mitochondria use the universal code, and this has permitted angiosperms to transfer mitochondrial genes to their nucleus with great ease.
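The effect of these reassignments can be illustrated by translating one message with two codon tables. The Python sketch below builds the standard table and then applies only the vertebrate mitochondrial changes named in the text (UGA to Trp, AUA to Met, AGA and AGG to stop). The helper names and the toy mRNA are hypothetical, and the override set is not a complete rendering of any official mitochondrial code.

```python
BASES = "TCAG"
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Reassignments named in the text for vertebrate mitochondria (DNA spelling).
VERTEBRATE_MITO = {"TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def build_table(overrides=None):
    """Standard codon table, optionally patched with codon reassignments."""
    table = {
        b1 + b2 + b3: STANDARD[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)
    }
    table.update(overrides or {})
    return table


def translate(seq_nt, table):
    """Translate in frame from position 0 and stop at the first stop codon."""
    seq_nt = seq_nt.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq_nt) - len(seq_nt) % 3, 3):
        aa = table[seq_nt[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


message = "AUGUGAGCCAUAUAA"  # hypothetical toy mRNA: AUG UGA GCC AUA UAA
print(translate(message, build_table()))                 # cytosolic reading
print(translate(message, build_table(VERTEBRATE_MITO)))  # mitochondrial reading
```

With the standard table the toy message terminates at UGA after a single methionine, which is the truncation the test-tube experiment describes; with the patched table UGA is read as tryptophan and translation continues.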
Nuclear genes
Violations of the universal code are far rarer for nuclear genes. A few unicellular eukaryotes have been found that use one or two (of their three) STOP codons for amino acids instead.
Selenocysteine. This amino acid is encoded by UGA. UGA is still used as a chain terminator, but the translation machinery is able to discriminate when a UGA codon should be used for selenocysteine rather than STOP. This codon usage has been found in certain Archaea, eubacteria, and animals (humans synthesize 25 different proteins containing selenium).
Pyrrolysine. In several species of Archaea and bacteria, this amino acid is encoded by UAG. How the translation machinery knows when it encounters UAG whether to insert a tRNA with pyrrolysine or to stop translation is not yet known.
British cosmologist Sir Fred Hoyle was an atheistic evolutionist when he began his inquiry into the chances of a living thing evolving from chemicals. (He has since greatly changed his view.) Hoyle has said that if you filled the solar system shoulder-to-shoulder with blind men shuffling Rubik's cubes randomly (this would mean 10^50 blind men), the chances of getting one simple long chain molecule of the type on which life depends is the same as all of those blind men simultaneously achieving the solution by random shuffling! He further points out that we would then only have one single useless molecule compared to the intricate and interrelated machinery of a functioning, living cell.
supercomputer (as of the end of 1999). The proposed speed of Blue Gene is hardly the speed of life. Go life!
biology. "The really important work that can be done with this technology is in smaller-scale simulations rather than the demonstration project of protein folding."

For IBM, Blue Gene is a research program of its renowned Watson Labs. But it is the expected trickle-down of research knowledge into commercial uses that justifies the company's $100 million investment on Blue Gene.

Since mid-1998, IBM has jumped from third in supercomputer installations worldwide, after the Cray division of Silicon Graphics and Sun Microsystems, to the top spot. In that time, IBM has nearly doubled its share of the 500 most powerful machines, from 75 to 141 last month, according to "The Top 500 Supercomputing Sites," a list compiled by three academic supercomputer experts. At the same time, the number of installations for Cray fell sharply while Sun Microsystems held steady.

"There is no doubt in our mind that a lot of that improvement is because of what we learned with Deep Blue," said Paul Horn, senior vice president of research. "The payoff can be enormous."

Several IBM supercomputers are already at work on the human genome project worldwide, including one that is host to one of the project's central databases in Toronto.

The announcement Monday, just as the project is getting under way, is also clearly an image-burnishing step by IBM, intended to emphasize its commitment to supercomputing and to research.

Blue Gene, experts agree, is a multidisciplinary endeavor requiring not only computer hardware, software and manufacturing expertise but also mathematicians, biologists, chemists and physicists. In addition, the Blue Gene project should serve as a kind of recruiting tool for IBM research -- and perhaps serve as a venture that could lift the stature of computer-science research in general.

Such a lift, according to Kennedy of Rice University, is badly needed. Computer talent, to be sure, has perhaps never been in such great demand as it is today. Yet the excitement of Internet start-ups and the lure of stock options, Kennedy notes, has meant that computer-science students increasingly shun graduate studies and advanced research.

"A few projects like this could re-establish research institutions -- academic or corporate -- as centers of excitement in computing," Kennedy said. "It's going to bring some of those minds back."

The frontier of computational biology is certainly a field that can stir excitement in the research community as well as hold out the promise of being a huge industry someday. In the last few years, IBM has built a 30-person team of researchers in computational biology. IBM hopes its supercomputer project will stimulate the field.

"We want to attract significant interest and involvement from university researchers and from the scientific community in general," Dr. Sharon Nunes, a senior research manager, said. "If we can influence this fundamental research, it will happen faster."

The computing innovation behind Blue Gene, in essence, is to build a computer that works much as nature works -- a triumph, if it succeeds, of marrying simplicity and complexity.

The computer scientists at IBM plan to sharply simplify the RISC (reduced instruction-set computing) architecture used in the chips that run engineering work stations and supercomputers today. The "instruction set" -- the total vocabulary of machine-language instructions a computer understands -- will number 57 for Blue Gene, compared with about 200 for most RISC machines.

Then, instead of putting a single microprocessor on a chip, Blue Gene will have 32 microprocessors -- the calculating engines of computers -- on each chip. Sixty-four such chips will be inserted on each motherboard, with eight motherboards in each of the 64 computing towers of Blue Gene.

When completed, Blue Gene will stand about six feet high, occupying a floor space of 40 feet by 40 feet at the Watson labs in Yorktown Heights, N.Y. It will have a total of about 1 million microprocessors.

Among the innovations computer scientists find most impressive about Blue Gene is that IBM will place memory for storing data on the same chip as the microprocessor. In conventional computer designs, the memory for storage is separate from the processor. Shuttling data from the memory to the processor is a major bottleneck in computers, slowing them down. Only within the last year or so, because of advances in chip making and miniaturization, has it become possible to consider putting memory and processing on the same chip in the way that IBM is developing.

To attain the speeds Blue Gene seeks within five years, IBM must try a new architecture of computing. The conventional wisdom holds that microprocessor speeds can theoretically double every 18 months, a phenomenon known as Moore's Law, for Gordon Moore, the chip pioneer who first observed it. With Moore's Law, it would take about 15 years to achieve the speed target for Blue Gene.
"There's no way you get to where IBM is heading unless you change today's computing architecture," said Arvind, a computer scientist at the Massachusetts Institute of Technology, who uses only a single name. "It looks as if they have an outstanding engineering plan. If they can execute it properly, it will be a real breakthrough."

Blue Gene's speed target is a petaflop -- that is, a thousand trillion floating point operations, or calculations, each second. Such a speed would make the machine 500 times faster than the two fastest supercomputers in operation today -- an IBM supercomputer at the Lawrence Livermore national laboratory, and an Intel machine at the Los Alamos lab. To translate Blue Gene's speed into a personal computer scale: If a fast PC was represented as an inch tall, the IBM machine would be 20 miles high.

The hardware design of Blue Gene is innovative indeed, but the real challenge, as is so often the case in computing, will be the software. For in simplifying the hardware design for speed, the complexity of protein folding is left to the software. And the software, among other things, has to be "self-healing" so that the simulation does not grind to a halt if a few processors break down. The software must recognize the flawed processors and re-route the data.

"We have some idea how we're going to do this," said Marc Snir, a senior researcher at Watson. "But I would be lying if I said we have solved this. We do have research to do."

If all the computer wizardry works as planned, it will still take Blue Gene about a year to simulate on the computer the folding of a single protein. How long does it take the body to fold one? Less than a second.

"It is absolutely amazing the complexity of the problem and the simplicity with which the body does it every day," Ajay Royyuru, a researcher in IBM's computational biology center, noted.
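The figures quoted in the article can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text (32 processors per chip, 64 chips per board, 8 boards per tower, 64 towers, a petaflop target, an 18-month doubling time, a 500-fold speedup, and the inch versus 20 miles comparison). The variable names are illustrative, and the derived values are back-of-the-envelope estimates, not IBM specifications.

```python
import math

# Processor count implied by the hardware layout described in the article.
processors = 32 * 64 * 8 * 64
print(processors)  # total microprocessors, "about 1 million" in the text

# A petaflop is a thousand trillion floating point operations per second.
petaflop = 1e15
flops_per_processor = petaflop / processors
print(f"{flops_per_processor:.3g} operations per second per processor")

# Moore's Law as quoted: speed doubles every 18 months (1.5 years).
doubling_years = 1.5
speedup_in_15_years = 2 ** (15 / doubling_years)
years_for_500x = doubling_years * math.log2(500)
print(speedup_in_15_years, round(years_for_500x, 1))

# "If a fast PC was represented as an inch tall, the IBM machine would be 20 miles high."
inches_in_20_miles = 20 * 5280 * 12
implied_pc_flops = petaflop / inches_in_20_miles
print(f"{implied_pc_flops:.3g} operations per second for the implied fast PC")
```

The processor count works out to 2^20, which matches the "about 1 million" figure, and a 500-fold speedup at one doubling every 18 months takes roughly 13 to 14 years, in line with the article's "about 15 years".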
Version 1.2 2004 Trevor Mander, www.DeepScience.com

General Observations
1. Darwinists use the news media to cast all opponents as religious dogmatists who prevent learning and insert religion into secular schools.
2. All these are attacks against character (ad hominem) that do not deal with the scientific issues.
3. The fact remains that there are scientific problems with Darwinism that are quite independent of what anybody thinks of the Bible.
4. In addition, the doctrine of Darwinism can be shown to be a philosophical assumption not proved by scientific observation.
5. Intelligent design includes a belief in God, Darwinism includes a belief in materialism. Both are "religious" or philosophical worldviews.
6. Materialism and Naturalism did not found modern science, but the belief in God did: For example, Johannes Kepler, Blaise Pascal, Robert Boyle, Nicolaus Steno, Isaac Newton, Michael Faraday, Louis Agassiz, James Young Simpson, Gregor Mendel, Louis Pasteur, William Thomson (Lord Kelvin), Joseph Lister, James Clerk Maxwell, William Ramsay.
7. There is a difference between origin science (a type of forensic science which looks into evidence for past events) and operation science (which is observation of current events).

Intelligent Design
1. Entropy (chaos) is increasing - therefore there was a beginning to the universe.
2. Time is Limited - only a finite number of moments before this one.
3. Limited Causality - can't have an infinite series of causes of "being".
4. The universe had a beginning - three logical possibilities:
   - Uncaused (but nothing cannot cause something)
   - Self-caused (but it would have to exist before it existed in order to cause its own existence - which is silly - like pulling yourself into the air by tugging on your shoe laces)
   - Caused by another - the most logical choice
5. Anthropic principle - Earth is so finely balanced to support life that it is practically impossible (as opposed to theoretically impossible) that this would have come about by random chance.
6. Intelligent Design arguments do not rule out religious solutions to the problem of pain and suffering in the world like materialistic Darwinian evolution does.
7. Specified complexity exists in all living things.
   - There is no simple life - even a single-celled amoeba has the complexity of the city of London and reproduces that complexity in only 20 minutes.
   - Primary information is the chemical structure of something. E.g., cover a school white board with marker ink - the ink has a chemical structure.
   - Secondary information is information that is added on top of and in addition to the chemical structural layer. E.g., write "take out the rubbish please" on the school white board. The ink still has exactly the same chemical structure of ink but that structure now also contains a second level of information. The message carried has nothing to do with the chemical structure - it has its own meaning.
   - Just as books are not just complicated (binding, pages, ink etc.), living things are more than just complicated groupings of chemicals. Both these things have specified complexity, not just complexity. It's the difference between any old mountain and Mt Rushmore.
   - There is no natural process which can blindly construct secondary information structures. Only a guided process of construction can do it, i.e., either copying information from one place to another or the presence of an intelligent designer.
   - DNA, the building structure for all living things, has a chemical structure and a secondary information structure. If written down, the code for a human being might cover 500,000 pages of text.
   - The presence of a digital watch lying in a field would point to an intelligent designer because of its specified complexity.
   - The simplest form of life is more complicated than a 747 jumbo jet.

Darwinian Evolution
Darwinism Defined:
   - Many transitional forms will be found (there are fewer today than in Darwin's day; some forms were found to be fakes)
   - New species will be made (they have not)
   - Purely natural processes (natural selection and random mutation) have created the different species observed today
Common Problems:
1. Incorrect Distinction: There is a difference between micro-evolution and macro-evolution. Evidence given for evolution is (almost always) evidence of micro-evolution, that is, small changes within species. Everybody accepts that micro-evolution (commonly known just as "evolution") occurs. Evidence for micro-evolution is not evidence for macro-evolution, Darwinian evolution, or big changes from one species to another.
2. Begging the question, avoiding the issue, and materialism
   - "You have a religious bias, why can't you accept the findings of science?"
   - "All events are natural events because supernatural events don't happen."
3. Distancing the problem doesn't make it go away
   - "Ok, so life didn't form on earth, it came from outer space."
4. Not understanding the problem
   - "simple life is easy to make."
5. Strawman arguments that also attack character, not the issue:
   - "All creationists are biblical literalists"
   - "All creationists believe the universe is 10,000 years old."
   - "All creationists are fundamentalist Christians who don't have proper training in science."
6. Absence of pre-Cambrian fossil ancestors. The missing link is still missing. There are a bunch of fossils in the ground. People can see links between them in the same way that people see shapes in the clouds. Recognising similar design is not the same as showing causal order.
7. Can't offer a solution to the existence of pain and suffering
8. About 90% of people don't believe in Darwinian evolution
9. It is essentially a religious dogma unsupported by the scientific evidence
10. Changes in finch beak length show adaptation or micro-evolution; they are not proof that humans are the result of a random, purposeless, materialist universe, slowly being accidentally changed from an amoeba.
11. Ultimately Darwinian evolution can't explain:
   - The origin of first life (incredible specified complexity)
   - The origin of species (fundamentally different forms of specified complexity)