
Problem Set 5 Name:

Genetics 0350/Ernst
Due November 8, 2012 by 9:30 am
10 points total


You will need to access the internet to do this problem. You will be using a website run by the
National Center for Biotechnology Information (NCBI), which is used extensively by biologists.
Tools on this site include PubMed, which is used to search the scientific literature for papers,
and BLAST, which is used to find similarity between an entered nucleotide or protein sequence
and the sequences in the databases. All sequences that biologists publish are added to the
databases. These may be from individual research labs or from genome projects.

This Word document has hotlinks to take you to the appropriate web pages. If it is easier since
you can cut and paste your sequences, you may type your answers in this document and turn it
in.

For this problem you are going to do a mini-genome project. You are going to assemble a small
contig, translate the DNA into all six reading frames, and identify the encoded protein. This is
essentially what is done in genome projects except that the DNA sequences are longer, there
are more DNA sequences, and not all ORFs encode proteins with similarity to known proteins.

a) 2 points. You have three sequences from a genomic DNA library. Use these three
sequences to create a sequence contig. The sequences are written from 5' to 3'.

Sequence 1: T A C T G T T A C A C A G C T C T G A C A A A C G G G


Sequence 2: T G T A A C A G T A T C C T G G G C G A A C A T C T A


Sequence 3: G C T G G T T A G A G C A G C G C C C G T T T G T C


Contig:


5'-TAGATGTTCGCCCAGGATACTGTTACA-3'

5'-TACTGTTACACAGCTCTGACAAACGGG-3'

5'-GACAAACGGGCGCTGCTCTAACCAGC-3'


5'-TAGATGTTCGCCCAGGATACTGTTACACAGCTCTGACAAACGGGCGCTGCTCTAACCAGC-3'


3'-ATCTACAAGCGGGTCCTATGACAATGTGTCGAGACTGTTTGCCCGCGACGAGATTGGTCG-5'
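
The contig above can be checked with a short script. This is only a minimal sketch, not how genome assemblers work: it assumes the orientation of each read has already been worked out by hand (Sequences 2 and 3 are reverse-complemented, Sequence 1 is used as written), and the helper names reverse_complement, overlap, and merge are made up for this illustration.

```python
# Illustrative sketch: merge three short reads by their longest suffix/prefix overlap.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def overlap(left, right, min_len=5):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0

def merge(left, right, min_len=5):
    """Join two reads on their overlap; return None if they do not overlap."""
    n = overlap(left, right, min_len)
    return left + right[n:] if n else None

seq1 = "TACTGTTACACAGCTCTGACAAACGGG"
seq2 = "TGTAACAGTATCCTGGGCGAACATCTA"
seq3 = "GCTGGTTAGAGCAGCGCCCGTTTGTC"

# Orientation chosen by hand (an assumption of this sketch).
left_read = reverse_complement(seq2)
right_read = reverse_complement(seq3)

contig = merge(merge(left_read, seq1), right_read)
print(contig)
```

Each pair of neighboring reads shares a 10-nucleotide overlap, so the three reads (27 + 27 + 26 nucleotides) collapse into a 60-nucleotide contig.
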
b) 3 points. Take your contig sequence and translate it in all six reading frames. You will need
to use the one letter amino acid code in order to do part (c). I posted a codon table on
CourseWeb under the Useful Documents tab that gives the one letter amino acid code. For
each frame, you may stop translating if you reach a stop codon. The order that you do this (i.e.
which frame is 1, which is 2, etc.) does not matter.

Frame 1: TAG ATG TTC GCC CAG GAT ACT GTT ACA CAG CTC TGA CAA ACG GGC GCT GCT CTA ACC AGC
*
Frame 2: T AGA TGT TCG CCC AGG ATA CTG TTA CAC AGC TCT GAC AAA CGG GCG CTG CTC TAA CCA GC
R C S P R I L L H S S D K R A L L *
Frame 3: TA GAT GTT CGC CCA GGA TAC TGT TAC ACA GCT CTG ACA AAC GGG CGC TGC TCT AAC CAG C
D V R P G Y C Y T A L T N G R C S N Q
Frame 4: GCT GGT TAG AGC AGC GCC CGT TTG TCA GAG CTG TGT AAC AGT ATC CTG GGC GAA CAT CTA
A G *
Frame 5: G CTG GTT AGA GCA GCG CCC GTT TGT CAG AGC TGT GTA ACA GTA TCC TGG GCG AAC ATC TA
L V R A A P V C Q S C V T V S W A N I
Frame 6: GC TGG TTA GAG CAG CGC CCG TTT GTC AGA GCT GTG TAA CAG TAT CCT GGG CGA ACA TCT A
W L E Q R P F V R A V *
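
The six translations above can be double-checked with a short script. This is a rough sketch that assumes the standard genetic code and, like the answer above, stops each frame at the first stop codon; the names CODON_TABLE, translate, and six_frames are made up for this illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq, offset):
    """Translate from `offset`, stopping after the first stop codon (*)."""
    protein = []
    for i in range(offset, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)

def six_frames(seq):
    """Frames 1-3 read the given strand; frames 4-6 read its reverse complement."""
    rc = reverse_complement(seq)
    return [translate(s, off) for s in (seq, rc) for off in range(3)]

contig = "TAGATGTTCGCCCAGGATACTGTTACACAGCTCTGACAAACGGGCGCTGCTCTAACCAGC"
frames = six_frames(contig)
for number, protein in enumerate(frames, start=1):
    status = "open" if "*" not in protein else "stop"
    print(f"Frame {number}: {protein} ({status})")
```

Frames 3 and 5 are the only ones with no stop codon, so those are the two sequences to use in part (c).
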

c) 2 points. For the two reading frames where you did not find a stop codon (in other words,
the sequence is in an open reading frame), compare the protein sequence to the protein
sequence database by going to: http://www.ncbi.nlm.nih.gov/BLAST/

Under Basic BLAST, click on protein blast. Type or paste each protein sequence using the one
letter amino acid code into the Enter Query Sequence box separately (i.e. you have to do two
separate searches). Click the blue BLAST box at the bottom of the page. In a few seconds to a
few minutes your search results will be displayed. At the top of the page there is a summary of
conserved domains if present and a graphical representation of similar proteins, in the middle
of the page is the list of similar proteins, and at the bottom of the page are the actual sequence
alignments. The E value on the right of the list of proteins is a measure of how good the
similarity is and should be very low.

If you got stuck on part (b), you can do this part using the nucleotide sequences in part (a). In
this case, choose nucleotide blast under Basic BLAST. Enter the query sequences, and then
under Choose Search Set click on Others and pick Nucleotide collection (nr/nt).

In order to get credit for this part, you must show that you did the BLAST searches on your
own by printing the first page of the BLAST reports and attaching it to your problem set. The
report has a Query ID, date, and time, which will be unique with every BLAST search run.


What is the product of this gene? The product of this gene is Fibrillin 1.


Can you tell what species the DNA sequence came from? Why or why not? Yes, you can tell which
species the DNA sequence came from: the very low E value shows that the similarity is good, and
the BLAST results confirm that the protein comes from a known gene.






d) 1 point. Now go to the Genetics Home Reference website at http://ghr.nlm.nih.gov/, which
is a service of the US National Library of Medicine. Search with the name of the gene that you
have found. What syndrome is associated with this gene? (Use the first one listed; don't look
at Weill-Marchesani syndrome, which can be caused by more than one gene.)


The syndrome associated with this gene (FBN1) is Marfan syndrome.



e) 2 points. This syndrome is inherited in an autosomal dominant fashion. If someone has the
syndrome, what are the two possible ways that they got a mutated allele?


The first possible way is that they inherited the mutated allele from a parent who also has the
syndrome; because the allele is dominant, it can't be hidden the way it can with an autosomal
recessive syndrome. The second way is a new mutation in the FBN1 gene, which can occur even
when the family shows no history of the disorder.
