Abstract
We explore a problem suggested by Brian Hayes in 1998: what proteins in the two-dimensional
hydrophilic-hydrophobic (H-P) model have unique optimal foldings? In particular, we prove
that there are closed chains of monomers (amino acids) with this property for all (even)
lengths, and that there are open monomer chains with this property for all lengths divisible
by four. Along the way, we prove and conjecture several structural results about bonds in
the H-P model.
1 Introduction
Protein folding is a central problem in bioinformatics with the potential to reveal an
understanding of the function and behavior of proteins, the building blocks of life. Such an
understanding would greatly influence many areas in biology and medicine such as drug design.
One of the most popular models of protein folding is the hydrophilic-hydrophobic (H-P)
model [2, 4, 6], which defines both a geometry and a quality metric of foldings. This
combinatorial model is attractive in its simplicity, and already seems to capture several
essential features of protein folding, such as the tendency for the hydrophobic components
to fold to the center of a globular protein [2]. While the H-P model is most intuitively
defined in 3-D to match the physical world, in fact it is more realistic as a 2-D model for
computationally feasible sizes. The basic reason for this is that the perimeter-to-area
ratio of a short 2-D chain is a close approximation to the surface-to-volume ratio of a long
3-D chain [2, 6].
Much work has been done on the H-P model. Recently, on the computational side, Berger and
Leighton [1] proved NP-completeness of finding the optimal folding in 3-D, and Crescenzi et
al. [3] proved NP-completeness in 2-D. Hart and Istrail [5] have developed a
3/8-approximation for protein folding in 3-D and a 1/4-approximation for protein folding in
2-D.
In this paper we study several structural aspects of the 2-D H-P model. In particular we
explore a problem suggested by Brian Hayes [6] about the existence of stable protein foldings
* Institut für Grundlagen der Informationsverarbeitung, Technische Universität Graz,
Inffeldgasse 16b, A-8010 Graz, Austria, email: oaich@igi.tu-graz.ac.at. Supported by the
Austrian Programme for Advanced Research and Technology (APART).
† Faculty of Computer Science, University of New Brunswick, P. O. Box 4400, Fredericton,
N. B. E3B 5A3, Canada, email: bremner@unb.ca.
‡ Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada,
email: eddemaine@uwaterloo.ca.
§ Department of Computing and Information Science, Queen's University, Kingston, Ontario
K7L 3N6, Canada, email: henk@cs.queensu.ca.
¶ Departament de Matemàtica Aplicada II, Universitat Politècnica de Catalunya, Pau Gargallo 5,
08028 Barcelona, Spain, email: vera@ma2.upc.es. Supported by DURSI Gen. Cat. 1999SGR00356
and Proyecto DGES-MEC PB98-0933.
‖ School of Computer Science, McGill University, 3480 University Street, Montréal, Québec
H3A 2A7, Canada, email: soss@cs.mcgill.ca.
of all lengths. We solve this problem in a positive sense for circular protein strands. We
also nearly solve the problem for open strands by exhibiting an infinite class of proteins
with unique optimal foldings.
More precisely, we prove the following main results, in a sense establishing the existence
of stable protein foldings:
1. We exhibit a simple family of closed chains of monomers, one for every possible (even)
length, and prove that each chain has a unique optimal folding according to the H-P model.
2. We exhibit a related family of open chains of monomers, one for every length divisible by
4, with the same uniquely-foldable property. Note that a result as strong as (1) cannot be
obtained for open chains, because there are some lengths for which no uniquely foldable open
chains exist.
We also make an interesting conjecture about the structure of optimal protein foldings in
the H-P model; see Section 3.1. Finally, we present experimental results supporting another
conjecture, strengthening the second result described above to all even lengths.
2 H-P Model
In this section we review the H-P model and the very basics of the biology behind it.
Proteins are chains of monomers, each monomer one of the 20 naturally occurring amino acids.
In the H-P model, only two types of monomers are distinguished: hydrophobic (H), which tend
to bundle together to avoid surrounding water, and hydrophilic (P), which are attracted to
water and are frequently found on the surface of a folding [2]. In our figures we use small
gray disks to denote H monomers and black disks to denote P monomers. These monomers are
strung together in some combination to form an H-P chain, either an open chain (path or arc)
or a closed chain (cycle or polygon).
Proteins are folded onto the regular square lattice. More formally, a lattice embedding of
a graph is a placement of vertices on distinct points of the (regular square) lattice such
that each edge of the graph maps to two adjacent (unit-distance) points on the lattice. In
the H-P model, proteins must fold according to lattice embeddings, so we also call such
embeddings foldings.
The quality of a folding in the H-P model is simply given by the number of pairs of
hydrophobic monomers (light-gray H nodes) that are not adjacent in the protein but are
adjacent in the folding. More formally, the bond graph of a folding has the same vertex set
as the chain, and there is an edge between every two H vertices that are adjacent in the
folding onto the lattice, but not adjacent along the chain. The edges of the bond graph are
called bonds; in our figures, bonds are drawn as light-gray edges.
An optimal folding maximizes the number of bonds over all foldings. Intuitively, if a
protein is folded to bond together many hydrophobic monomers (H nodes), then those monomers
are hidden from the surrounding water as much as possible.
There is a natural bijection between strings in $\{H, P\}^*$ and protein chains. We consider
the nodes in a chain as labeled by their order in the string. We sometimes use a limited
form of regular expressions to describe chains where, e.g., $H^k$ indicates $k$ H nodes in
sequence. Similarly, if we walk along an embedded chain in the order given and read off the
direction of each edge, we can encode foldings as strings in $\{E, W, N, S\}^*$.
3 General Observations and Ambiguous Foldings
In this section we describe some basic structural and combinatorial results about bonds in
the H-P model.
Fact 3.1 A folding of an open (closed) chain with $h$ H nodes has at most $h + 1$ ($h$) bonds.
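One way to see this bound is a degree count on the square lattice. The sketch below is offered as a plausible argument rather than as the authors' proof; the symbols $B$ (number of bonds), $b(v)$ (bonds at node $v$) and $c(v)$ (chain neighbors of $v$) are notation introduced here.

```latex
% Degree-counting sketch; B, b(v), c(v) are notation assumed for this note.
% Every lattice point has 4 neighbors, and c(v) of them are used by chain
% edges, so at most 4 - c(v) remain available for bonds at v.
\[
  2B \;=\; \sum_{v \text{ an H node}} b(v)
     \;\le\; \sum_{v \text{ an H node}} \bigl(4 - c(v)\bigr).
\]
% Closed chain: c(v) = 2 for every node.
\[
  2B \le 2h \quad\Longrightarrow\quad B \le h .
\]
% Open chain: c(v) = 2 except at the two endpoints, where c(v) = 1.
\[
  2B \le 2h + 2 \quad\Longrightarrow\quad B \le h + 1 .
\]
```

Equality in the closed case forces $b(v) = 2$ at every H node, which is consistent with the cycle structure stated in Corollary 3.3.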
Corollary 3.3 If a folding of a closed chain with $h$ H nodes has $h$ bonds, then its bond
graph is a union of vertex-disjoint even cycles.
Corollary 3.4 There can be a bond between two H nodes only if they have opposite parity
(i.e., there is an even number of nodes between them) in the chain.
Fact 3.5 Any optimal folding of the closed chain $(PHP)^{4k}$ has a bond graph consisting of
$k$ 4-cycles. (See Figure 1.)
Fact 3.6 For any $n$, there exists an $n$-node closed chain with at least $2^{\Omega(n)}$
optimal foldings, all with isomorphic bond graphs. In addition to the trivial example of
$P^n$, the chain $(PHP)^{4k}$ is an example (see Figure 1).
For any closed chain, we can consider the area of a folding, and whether a bond is internal
or external to the folding (polygon).
Conjecture 3.7 If there is a lattice point strictly interior to a folding of a closed chain,
then there is another folding of that chain with smaller area and no fewer bonds.
A proof of this conjecture would support the experimental evidence that most proteins fold
into a tightly packed "globule" [2].
Fact 3.8 For any folding of an $n$-node closed chain there are at most $(n - 4)/2$ internal
bonds.
Corollary 3.9 There exists a closed chain whose optimal folding requires at least one
external bond. Namely, $H^{12}$ is an example. (See Figure 3.)
Fact 3.10 Any folding of an $n$-node closed chain has at most $(n - p)/2$ external bonds,
where $p$ is the perimeter of the bounding box of the folding.
Let $\pi(k)$ denote the minimum perimeter of a rectangle containing $k$ lattice points.
Corollary 3.11 In any folding of an $n$-node closed chain, there are at most
$(n - \pi(n))/2$ external bonds and at most $(2n - \pi(n) - 4)/2$ total bonds.
Fact 3.12 $\pi(k) = 4(\lfloor \sqrt{k} \rfloor - 1) + 2(\lceil k / \lfloor \sqrt{k} \rfloor \rceil - \lfloor \sqrt{k} \rfloor)$.
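The minimum-perimeter quantity and the resulting bound on total bonds are easy to tabulate. The sketch below makes several assumptions: the rectangle is taken to be axis-aligned with lattice corners, "containing k lattice points" is read as "at least k", and the closed-form expression used is one that agrees with the brute-force minimum rather than a quotation of the paper's formula. All function names are hypothetical.

```python
from math import isqrt


def min_perimeter_brute(k):
    """Minimum perimeter of an a-by-b grid of lattice points with a * b >= k.

    An a-by-b grid of points spans a rectangle with side lengths a - 1 and b - 1.
    """
    best = None
    for a in range(1, k + 1):
        b = -(-k // a)  # ceil(k / a): fewest columns needed with a rows
        perimeter = 2 * (a - 1) + 2 * (b - 1)
        if best is None or perimeter < best:
            best = perimeter
    return best


def min_perimeter_closed_form(k):
    """Closed form assumed in this sketch: a square of side floor(sqrt(k)) - 1,
    widened by however many extra columns are needed to reach k points."""
    r = isqrt(k)
    return 4 * (r - 1) + 2 * (-(-k // r) - r)


def total_bond_bound(n):
    """Upper bound on total bonds for an n-node closed chain:
    (n - 4) / 2 internal bonds plus (n - min perimeter) / 2 external bonds."""
    return (2 * n - min_perimeter_brute(n) - 4) // 2
```

For n = 12 the 3-by-4 grid of points gives a minimum perimeter of 10, so `total_bond_bound(12)` evaluates to 5, matching the 5-bond folding of $H^{12}$ referred to in Figure 3.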
Theorem 4.1 For each $k \ge 1$, $F_k$ is the unique optimal folding of $S_k$.
5 Uniquely Foldable Open Chains
Hayes [6] has established experimentally that for each $1 \le n \le 14$ except 3 and 5 there
is an open chain with a unique optimal folding. We have extended Hayes's results as shown in
Figure 4. Figure 5 shows the unique foldings for these chains.
A natural question is for what values of $n$ there is an $n$-node open chain with a unique
optimal folding. Based on our results about closed chains, one approach is to consider the
open version of $S_k$ with the first and last nodes removed. That is, define
$Z_k = (HP)^u (PH)^d$ where $u = \lceil k/2 \rceil$ and $d = \lfloor k/2 \rfloor$. It turns
out that this chain has multiple optimal foldings for odd $k$, but only one optimal folding
for even $k$:
Theorem 5.1 The open chain $Z_{2j} = (HP)^j (PH)^j$ has a unique optimal embedding for each
$j$.
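Small cases of such chains can be explored by exhaustive search. The following is a minimal brute-force sketch, not the method used in the paper: it enumerates every self-avoiding walk, so it is only practical for short chains, and it identifies two foldings only when they differ by a rotation or reflection of the lattice (chain reversal is not identified, which is an assumption of this sketch). The names `z_chain`, `count_bonds`, `canonical` and `optimal_foldings` are hypothetical.

```python
# Sketch only: exhaustive search, practical for short chains.
STEPS = ((1, 0), (0, 1), (-1, 0), (0, -1))

# The 8 symmetries of the square lattice that fix the origin.
SYMMETRIES = (
    lambda x, y: (x, y), lambda x, y: (-y, x),
    lambda x, y: (-x, -y), lambda x, y: (y, -x),
    lambda x, y: (x, -y), lambda x, y: (-x, y),
    lambda x, y: (y, x), lambda x, y: (-y, -x),
)


def z_chain(k):
    """The string (HP)^u (PH)^d with u = ceil(k/2) and d = floor(k/2)."""
    return "HP" * ((k + 1) // 2) + "PH" * (k // 2)


def count_bonds(chain, pts):
    """Count lattice-adjacent H-H pairs that are not adjacent along the open chain."""
    index = {p: i for i, p in enumerate(pts)}
    total = 0
    for i, (x, y) in enumerate(pts):
        if chain[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # east and north only: each pair once
            j = index.get((x + dx, y + dy))
            if j is not None and chain[j] == "H" and abs(i - j) > 1:
                total += 1
    return total


def canonical(pts):
    """Smallest image of a folding (first node at the origin) under the 8 symmetries."""
    return min(tuple(f(x, y) for x, y in pts) for f in SYMMETRIES)


def optimal_foldings(chain):
    """Return (maximum number of bonds, number of optimal foldings up to symmetry)."""
    best = -1
    classes = set()

    def extend(pts, occupied):
        nonlocal best, classes
        if len(pts) == len(chain):
            b = count_bonds(chain, pts)
            if b > best:
                best, classes = b, {canonical(pts)}
            elif b == best:
                classes.add(canonical(pts))
            return
        x, y = pts[-1]
        for dx, dy in STEPS:
            q = (x + dx, y + dy)
            if q not in occupied:
                pts.append(q)
                occupied.add(q)
                extend(pts, occupied)
                pts.pop()
                occupied.remove(q)

    extend([(0, 0)], {(0, 0)})
    return best, len(classes)
```

For example, `optimal_foldings(z_chain(2))` examines the 4-node chain HPPH, whose only bond-forming shape is a U-turn, and returns `(1, 1)`.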
Combining this theorem with results from [6] and Figure 4, we know that there are open
chains with unique optimal foldings for $n = 2$, $n = 4$, and $6 \le n \le 20$.
Acknowledgments. The authors would like to thank Godfried Toussaint for pointing us to this
topic and for organizing the workshop at which the research was initiated. The authors would
also like to thank Vida Dujmović, Jeff Erickson, Ferran Hurtado, and Suneeta Ramaswami for
stimulating conversations on this and other topics.
References
[1] B. Berger and T. Leighton. Protein folding in the hydrophobic-hydrophilic (HP) model is
NP-complete. Journal of Computational Biology, 5(1):27–40, 1998.
[2] H. S. Chan and K. A. Dill. The protein folding problem. Physics Today, pages 24–32,
February 1993.
[3] P. Crescenzi, D. Goldman, C. Papadimitriou, A. Piccolboni, and M. Yannakakis. On the
complexity of protein folding. Journal of Computational Biology, 5(3), 1998.
[4] K. A. Dill. Dominant forces in protein folding. Biochemistry, 29(31):7133–7155,
August 1990.
[5] W. E. Hart and S. Istrail. Fast protein folding in the hydrophobic-hydrophilic model
within three-eighths of optimal. In Proceedings of the 27th Annual ACM Symposium on the
Theory of Computing, pages 157–168, Las Vegas, Nevada, May–June 1995.
[6] B. Hayes. Prototeins. American Scientist, 86:216–221, 1998.
Figure 1: Example of an optimal folding of $(PHP)^{4k}$.
Figure 2: Examples of $S_k$ folded according to $F_k$ for $k \in \{2, 8, 9\}$.
Figure 3: A folding of $H^{12}$ with 5 bonds.