
Long Proteins with Unique Optimal Foldings in the H-P Model

Oswin Aichholzer* David Bremner† Erik D. Demaine‡ Henk Meijer§
Vera Sacristán¶ Michael Soss‖

Abstract
We explore a problem suggested by Brian Hayes in 1998: what proteins in the two-dimensional hydrophilic-hydrophobic (H-P) model have unique optimal foldings? In particular, we prove that there are closed chains of monomers (amino acids) with this property for all (even) lengths, and that there are open monomer chains with this property for all lengths divisible by four. Along the way, we prove and conjecture several structural results about bonds in the H-P model.

1 Introduction
Protein folding is a central problem in bioinformatics with the potential to reveal an understanding of the function and behavior of proteins, the building blocks of life. Such an understanding would greatly influence many areas in biology and medicine such as drug design.
One of the most popular models of protein folding is the hydrophilic-hydrophobic (H-P) model [2, 4, 6], which defines both a geometry and a quality metric of foldings. This combinatorial model is attractive in its simplicity, and already seems to capture several essential features of protein folding such as the tendency for the hydrophobic components to fold to the center of a globular protein [2]. While the H-P model is most intuitively defined in 3-D to match the physical world, in fact it is more realistic as a 2-D model for computationally feasible sizes. The basic reason for this is that the perimeter-to-area ratio of a short 2-D chain is a close approximation to the surface-to-volume ratio of a long 3-D chain [2, 6].
Much work has been done on the H-P model. Recently, on the computational side, Berger and Leighton [1] proved NP-completeness of finding the optimal folding in 3-D, and Crescenzi et al. [3] proved NP-completeness in 2-D. Hart and Istrail [5] have developed a 3/8-approximation for protein folding in 3-D and a 1/4-approximation for protein folding in 2-D.
In this paper we study several structural aspects of the 2-D H-P model. In particular, we explore a problem suggested by Brian Hayes [6] about the existence of stable protein foldings of all lengths. We solve this problem in a positive sense for circular protein strands. We also nearly solve the problem for open strands by exhibiting an infinite class of proteins with unique optimal foldings.

* Institut für Grundlagen der Informationsverarbeitung, Technische Universität Graz, Inffeldgasse 16b, A-8010 Graz, Austria, email: oaich@igi.tu-graz.ac.at. Supported by the Austrian Programme for Advanced Research and Technology (APART).
† Faculty of Computer Science, University of New Brunswick, P. O. Box 4400, Fredericton, N. B. E3B 5A3, Canada, email: bremner@unb.ca.
‡ Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada, email: eddemaine@uwaterloo.ca.
§ Department of Computing and Information Science, Queen's University, Kingston, Ontario K7L 3N6, Canada, email: henk@cs.queensu.ca.
¶ Departament de Matemàtica Aplicada II, Universitat Politècnica de Catalunya, Pau Gargallo 5, 08028 Barcelona, Spain, email: vera@ma2.upc.es. Supported by DURSI Gen. Cat. 1999SGR00356 and Proyecto DGES-MEC PB98-0933.
‖ School of Computer Science, McGill University, 3480 University Street, Montréal, Québec H3A 2A7, Canada, email: soss@cs.mcgill.ca.

More precisely, we prove the following main results, in a sense establishing the existence of stable protein foldings:

1. We exhibit a simple family of closed chains of monomers, one for every possible (even) length, and prove that each chain has a unique optimal folding according to the H-P model.
2. We exhibit a related family of open chains of monomers, one for every length divisible by 4, with the same uniquely-foldable property. Note that a result as strong as (1) cannot be obtained for open chains, because there are some lengths for which no uniquely foldable open chains exist.

We also make an interesting conjecture about the structure of optimal protein foldings in the H-P model; see Section 3.1. Finally, we present experimental results supporting another conjecture, which would strengthen the second result described above to all even lengths.

2 H-P Model
In this section we review the H-P model and the very basics of the biology behind it.
Proteins are chains of monomers, each monomer one of the 20 naturally occurring amino acids. In the H-P model, only two types of monomers are distinguished: hydrophobic (H), which tend to bundle together to avoid surrounding water, and hydrophilic (P), which are attracted to water and are frequently found on the surface of a folding [2]. In our figures we use small gray disks to denote H monomers and black disks to denote P monomers. These monomers are strung together in some combination to form an H-P chain, either an open chain (path or arc) or a closed chain (cycle or polygon).
Proteins are folded onto the regular square lattice. More formally, a lattice embedding of a graph is a placement of vertices on distinct points of the (regular square) lattice such that each edge of the graph maps to two adjacent (unit-distance) points on the lattice. In the H-P model, proteins must fold according to lattice embeddings, so we also call such embeddings foldings.
The quality of a folding in the H-P model is simply given by the number of pairs of hydrophobic monomers (light-gray H nodes) that are not adjacent in the protein but adjacent in the folding. More formally, the bond graph of a folding has the same vertex set as the chain, and there is an edge between every two H vertices that are adjacent in the folding onto the lattice, but not adjacent along the chain. The edges of the bond graph are called bonds; in our figures, bonds are drawn as light-gray edges.
An optimal folding maximizes the number of bonds over all foldings. Intuitively, if a protein is folded to bond together many hydrophobic monomers (H nodes), then those monomers are hidden from the surrounding water as much as possible.
There is a natural bijection between strings in $\{H, P\}^*$ and protein chains. We consider the nodes in a chain as labeled by their order in the string. We sometimes use a limited form of regular expressions to describe chains, where e.g. $H^k$ indicates $k$ H nodes in sequence. Similarly, if we walk along an embedded chain in the order given and read off the direction of each edge, we can encode foldings as strings in $\{E, W, N, S\}^*$.
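As an illustration of these definitions, the following Python sketch places a chain on the lattice from a direction string and counts its bonds. The function names (`positions`, `count_bonds`), the choice of the origin as the starting point, and the convention that the direction string of a closed chain includes the closing edge are assumptions made for this example, not notation from the paper.

```python
# Unit steps for the four lattice directions used to encode foldings.
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def positions(directions, closed=False):
    """Walk the direction string from the origin; return one lattice point per node."""
    pts = [(0, 0)]
    for d in directions:
        dx, dy = STEP[d]
        x, y = pts[-1]
        pts.append((x + dx, y + dy))
    if closed:
        # Assumed convention: the last edge of a closed chain returns to node 0.
        if pts[-1] != pts[0]:
            raise ValueError("direction string does not close up")
        pts.pop()
    if len(set(pts)) != len(pts):
        raise ValueError("not a lattice embedding: two nodes share a point")
    return pts


def count_bonds(chain, directions, closed=False):
    """Count pairs of H nodes adjacent on the lattice but not along the chain."""
    pts = positions(directions, closed)
    if len(pts) != len(chain):
        raise ValueError("chain and folding have different lengths")
    n = len(chain)
    index_at = {p: i for i, p in enumerate(pts)}
    bonds = 0
    for i, (x, y) in enumerate(pts):
        if chain[i] != "H":
            continue
        # Look east and north only, so every lattice-adjacent pair is seen once.
        for q in ((x + 1, y), (x, y + 1)):
            j = index_at.get(q)
            if j is None or chain[j] != "H":
                continue
            gap = abs(i - j)
            adjacent_on_chain = gap == 1 or (closed and gap == n - 1)
            if not adjacent_on_chain:
                bonds += 1
    return bonds
```

For instance, `count_bonds("HPPH", "ESW")` evaluates to 1: the open chain HPPH folded into a U shape brings its two end nodes next to each other.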

3 General Observations and Ambiguous Foldings
In this section we describe some basic structural and combinatorial results about bonds in the H-P model.
Fact 3.1 A folding of an open (closed) chain with $h$ H nodes has at most $h + 1$ ($h$) bonds.

Fact 3.2 Any lattice-embeddable graph is bipartite.

Corollary 3.3 If a folding of a closed chain with $h$ H nodes has $h$ bonds, then its bond graph is a union of vertex-disjoint even cycles.

Corollary 3.4 There can be a bond between two H nodes only if they have opposite parity (i.e., there is an even number of nodes between them) in the chain.

Fact 3.5 Any optimal folding of the closed chain $(PHP)^{4k}$ has a bond graph consisting of $k$ 4-cycles. (See Figure 1.)

Fact 3.6 For any $n$, there exists an $n$-node closed chain with at least $2^{\Omega(n)}$ optimal foldings, all with isomorphic bond graphs. In addition to the trivial example of $P^n$, the chain $(PHP)^{4k}$ is an example (see Figure 1).

3.1 Internal and External Bonds

For any closed chain, we can consider the area of a folding, and whether a bond is internal or external to the folding (polygon).
Conjecture 3.7 If there is a lattice point strictly interior to a folding of a closed chain, then there is another folding of that chain with smaller area and no fewer bonds.

A proof of this conjecture would support the experimental evidence that most proteins fold into a tightly packed "globule" [2].
Fact 3.8 For any folding of an $n$-node closed chain there are at most $(n - 4)/2$ internal bonds.
Corollary 3.9 There exists a closed chain whose optimal folding requires at least one external bond. Namely, $H^{12}$ is an example. (See Figure 3.)
Fact 3.10 Any folding of an $n$-node closed chain has at most $(n - p)/2$ external bonds, where $p$ is the perimeter of the bounding box of the folding.

Let $\pi(k)$ denote the minimum perimeter of a rectangle containing $k$ lattice points.
Corollary 3.11 In any folding of an $n$-node closed chain, there are at most $(n - \pi(n))/2$ external bonds and at most $(2n - \pi(n) - 4)/2$ total bonds.
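To make these counting bounds concrete, here is a small Python sketch. It assumes the bounds read as follows: at most (n - 4)/2 internal bonds, at most (n - p)/2 external bonds for a bounding box of perimeter p, and hence at most (2n - π(n) - 4)/2 bonds in total, where π(k) is the minimum perimeter of a rectangle containing k lattice points. It further assumes that rectangles are axis-parallel, that points on the boundary count, and that "containing k lattice points" means at least k. The names `min_perimeter` and `bond_bounds` are hypothetical, and the search is a brute-force reading of the definition rather than a closed formula.

```python
def min_perimeter(k):
    """Smallest perimeter of an axis-parallel lattice rectangle holding >= k lattice points."""
    best = None
    for a in range(1, k + 1):        # a lattice points per row
        b = -(-k // a)               # fewest rows with a * b >= k (ceiling division)
        # An a-by-b block of lattice points spans a rectangle of size (a-1) x (b-1).
        perimeter = 2 * (a - 1) + 2 * (b - 1)
        if best is None or perimeter < best:
            best = perimeter
    return best


def bond_bounds(n):
    """Assumed upper bounds (internal, external, total) for an n-node closed chain."""
    p = min_perimeter(n)
    internal = (n - 4) // 2
    external = (n - p) // 2
    total = (2 * n - p - 4) // 2     # sum of the two bounds above
    return internal, external, total
```

For n = 12 this gives a minimum perimeter of 10 and bounds of 4 internal, 1 external, and 5 total bonds, which agrees with the 5-bond folding of $H^{12}$ mentioned for Figure 3.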
Fact 3.12 $\pi(k) = 4(\lfloor \sqrt{k} \rfloor - 1) + 2(\lceil k / \lfloor \sqrt{k} \rfloor^2 \rceil - 1)$.

4 Uniquely Foldable Closed Chains


In this section we are concerned with closed H-P chains whose optimal foldings are unique (modulo isometries).
For each $k \ge 1$, we define a closed chain $S_k$ as follows; see Figure 2. Let $A_m$ denote the sequence $(HP)^m$. Define $u = \lceil k/2 \rceil$ and $d = \lfloor k/2 \rfloor$. Then define $S_k$ as $P A_u P A_d$.
We also define a folding $F_k$ of $S_k$ as follows; see Figure 2. Let $D_m$ (a "down staircase") denote the alternating path $(ES)^m$. Let $U_m$ (an "up staircase") denote the alternating path $(WN)^m$. If $k$ is even, define $F_k$ as $E D_d W U_u$. If $k$ is odd, define $F_k$ as $E D_d S U_u$.
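The two definitions above can be spelled out mechanically. The Python sketch below builds the chain string for $S_k$ and the direction string for $F_k$, then checks that the folding closes up, is self-avoiding, and counts its bonds. The helper names (`chain_S`, `folding_F`, `closed_bonds`) and the choice of the origin as the starting point are assumptions of this example; the sketch only evaluates the given folding and does not verify optimality or uniqueness.

```python
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def chain_S(k):
    """S_k = P A_u P A_d with A_m = (HP)^m, u = ceil(k/2), d = floor(k/2)."""
    u, d = (k + 1) // 2, k // 2
    return "P" + "HP" * u + "P" + "HP" * d


def folding_F(k):
    """F_k = E D_d W U_u for even k and E D_d S U_u for odd k."""
    u, d = (k + 1) // 2, k // 2
    middle = "W" if k % 2 == 0 else "S"
    return "E" + "ES" * d + middle + "WN" * u


def closed_bonds(chain, directions):
    """Bonds of a closed chain folded by `directions` (closing edge included)."""
    pts = [(0, 0)]
    for step in directions:
        dx, dy = STEP[step]
        pts.append((pts[-1][0] + dx, pts[-1][1] + dy))
    if pts.pop() != (0, 0):
        raise ValueError("folding does not return to its start")
    if len(pts) != len(chain) or len(set(pts)) != len(pts):
        raise ValueError("nodes must occupy distinct lattice points")
    n = len(chain)
    index_at = {p: i for i, p in enumerate(pts)}
    bonds = 0
    for i, (x, y) in enumerate(pts):
        for q in ((x + 1, y), (x, y + 1)):   # east and north: each pair once
            j = index_at.get(q)
            if j is None or chain[i] != "H" or chain[j] != "H":
                continue
            if abs(i - j) not in (1, n - 1):  # skip neighbours along the cycle
                bonds += 1
    return bonds
```

For example, `chain_S(2)` is `"PHPPHP"` and `folding_F(2)` is `"EESWWN"`; evaluating `closed_bonds(chain_S(k), folding_F(k))` for k = 2, 3, 4 gives 1, 2, and 3 bonds respectively.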

Theorem 4.1 For each $k \ge 1$, $F_k$ is the unique optimal folding of $S_k$.
5 Uniquely Foldable Open Chains
Hayes [6] has established experimentally that for each $1 \le n \le 14$ except 3 and 5 there is an open chain with a unique optimal folding. We have extended Hayes's results as shown in Figure 4. Figure 5 shows the unique foldings for these chains.
A natural question is for what values of $n$ there is an $n$-node open chain with a unique optimal folding. Based on our results about closed chains, one approach is to consider the open version of $S_k$ with the first and last nodes removed. That is, define $Z_k = (HP)^u (PH)^d$ where $u = \lceil k/2 \rceil$ and $d = \lfloor k/2 \rfloor$. It turns out that this chain has multiple optimal foldings for odd $k$, but only one optimal folding for even $k$:
Theorem 5.1 The open chain $Z_{2j} = (HP)^j (PH)^j$ has a unique optimal embedding for each positive $j$. (See Figure 6 for an example.)
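For very short open chains, uniqueness can be explored by exhaustive search. The Python sketch below enumerates all self-avoiding foldings of a chain, keeps those with the maximum number of bonds, and counts them up to the eight isometries of the square lattice. It is an assumption of this example, not a statement from the text, that a folding of a palindromic chain and the same shape traversed from the other end may be treated as one folding; the flag `identify_reversal` makes that choice explicit. The names `canonical` and `optimal_foldings` are hypothetical, and the search takes exponential time, so it is only usable for small lengths.

```python
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
ROTATE = {"E": "N", "N": "W", "W": "S", "S": "E"}    # quarter turn
MIRROR = {"E": "E", "W": "W", "N": "S", "S": "N"}    # reflection in the x-axis
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}


def canonical(walk, with_reversal):
    """Lexicographically smallest image of a direction string under the symmetries."""
    variants = [walk]
    if with_reversal:
        # The same shape read from the last node back to the first.
        variants.append("".join(OPPOSITE[d] for d in reversed(walk)))
    images = []
    for w in variants:
        for image in (w, "".join(MIRROR[d] for d in w)):
            for _ in range(4):
                image = "".join(ROTATE[d] for d in image)
                images.append(image)
    return min(images)


def optimal_foldings(chain, identify_reversal=False):
    """Return (maximum number of bonds, number of distinct optimal foldings)."""
    n = len(chain)
    # Reversal only maps foldings of the chain to foldings of the same chain
    # when the chain is a palindrome.
    with_reversal = identify_reversal and chain == chain[::-1]
    occupied = {(0, 0): 0}           # lattice point -> node index
    best = -1
    classes = set()

    def extend(point, walk, bonds):
        nonlocal best, classes
        i = len(occupied)            # index of the node to place next
        if i == n:
            if bonds > best:
                best, classes = bonds, set()
            if bonds == best:
                classes.add(canonical(walk, with_reversal))
            return
        for d, (dx, dy) in STEP.items():
            nxt = (point[0] + dx, point[1] + dy)
            if nxt in occupied:
                continue
            gained = 0
            if chain[i] == "H":
                # New bonds: earlier H nodes next to nxt, except the chain predecessor.
                for ex, ey in STEP.values():
                    j = occupied.get((nxt[0] + ex, nxt[1] + ey))
                    if j is not None and j != i - 1 and chain[j] == "H":
                        gained += 1
            occupied[nxt] = i
            extend(nxt, walk + d, bonds + gained)
            del occupied[nxt]

    extend((0, 0), "", 0)
    return best, len(classes)
```

Under these assumptions, `optimal_foldings("HPPH")` returns `(1, 1)`. For the 8-node chain `"HPHPPHPH"` the search returns `(3, 2)` when only isometries are identified and `(3, 1)` with `identify_reversal=True`, so the count for a palindromic chain depends on which notion of sameness is used.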

Combining this theorem with results from [6] and Figure 4, we know that there are open chains with unique optimal foldings for $n = 2$, $n = 4$, and $6 \le n \le 20$.
Acknowledgments. The authors would like to thank Godfried Toussaint for pointing us to this topic and for organizing the workshop at which the research was initiated. The authors would also like to thank Vida Dujmović, Jeff Erickson, Ferran Hurtado, and Suneeta Ramaswami for stimulating conversations on this and other topics.

References

[1] B. Berger and T. Leighton. Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. Journal of Computational Biology, 5(1):27–40, 1998.
[2] H. S. Chan and K. A. Dill. The protein folding problem. Physics Today, pages 24–32, February 1993.
[3] P. Crescenzi, D. Goldman, C. Papadimitriou, A. Piccolboni, and M. Yannakakis. On the complexity of protein folding. Journal of Computational Biology, 5(3), 1998.
[4] K. A. Dill. Dominant forces in protein folding. Biochemistry, 29(31):7133–7155, August 1990.
[5] W. E. Hart and S. Istrail. Fast protein folding in the hydrophobic-hydrophilic model within three-eighths of optimal. In Proceedings of the 27th Annual ACM Symposium on the Theory of Computing, pages 157–168, Las Vegas, Nevada, May–June 1995.
[6] B. Hayes. Prototeins. American Scientist, 86:216–221, 1998.

Figure 1: Example of an optimal folding of $(PHP)^{4k}$.
Figure 2: Examples of $S_k$ folded according to $F_k$ for $k \in \{2, 8, 9\}$.
Figure 3: A folding of $H^{12}$ with 5 bonds.

Len.  Chain                        Max. Bonds
15    $(PH)^3 H^3 (PH)^3$          7
17    $H^5 (PHHP) H^4 (PHHP)$      9
18    $H^3 (PH)^3 H^3 PHPPPH$      9
19    $H^3 (PH)^3 H^3 PHPPPHP$     9

Figure 4: Experimentally computed open chains with unique optimal foldings.
Figure 5: Unique foldings for the chains in Figure 4 (panels for lengths 15, 17, 18, 19).
Figure 6: Unique optimal folding of $Z_8$.
