
A Fast Direct N-body Solver on the

Connection Machine

Jean-Philippe Brunet
Alan Edelman
Jill P. Mesirov
Thinking Machines Corporation

245 First Street, Cambridge, MA 02142

(January 30, 1990)

Abstract

The direct method for solving N-body problems has been efficiently
implemented on the Connection Machine. The key feature is an optimal
communication pattern on the hypercube architecture of the CM-2, using
rotated Gray codes to obtain time-wise edge disjoint Hamiltonian paths.
The utilization of the full communication bandwidth of the machine makes
the communication cost negligible for large $N$, thus the execution time
is $O(N^2/P)$, where $P$ is the number of processors. Timings are
presented for an example application case of the computation of the
velocity field induced by a set of interacting point vortices. [perhaps
give an actual timing here]

1 Introduction

The N-body algorithm is a critical kernel in a wide variety of application
areas including astronomy, molecular biology, and fluid dynamics. The direct
method (as opposed to local correction [?], hierarchical [?, ?], or multipole
[?] methods) runs in $O(N^2)$ serial time for $N$ bodies since all pairwise
interactions are computed; the force on each body, $v_i$, is given by

$$\mathrm{Force}(v_i) = \sum_j F_{v_j}(v_i),$$

where $F_{v_j}(v_i)$ is the force exerted on $v_i$ by $v_j$.
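As a minimal serial sketch of the direct method (an illustration only, not the CM-2 code), the sum above can be evaluated with a double loop over the bodies. The two-dimensional positions, the function names, and the unit-strength inverse-square pair force are assumptions made for this example; the self term $j = i$ is skipped.

```python
import math


def direct_forces(positions, pair_force):
    """Total force on every body, summing all pairwise terms: O(N^2) work."""
    n = len(positions)
    forces = []
    for i in range(n):
        fx = fy = 0.0
        for j in range(n):
            if j == i:
                continue  # a body exerts no force on itself
            dfx, dfy = pair_force(positions[i], positions[j])
            fx += dfx
            fy += dfy
        forces.append((fx, fy))
    return forces


def inverse_square(target, source):
    """Hypothetical pair force: unit-strength attraction of target toward source."""
    dx, dy = source[0] - target[0], source[1] - target[1]
    r = math.hypot(dx, dy)
    return dx / r**3, dy / r**3
```

Any other pair force with the same signature can be passed in, which reflects the point that the direct method does not depend on the form of the interaction.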

The direct method is especially important in cases where the interaction
force is not Coulombic.

At the highest level, the algorithm associates each body with a processor.
To compute the $N^2$ interactions the data for each body must be broadcast to
all of the other processors. In the case of the CM-2, this corresponds to all
to all broadcasting on a hypercube. We will describe how this can be
optimally done in $2^d - 1$ communication steps utilizing the full bandwidth
of the hypercube [?]. The technique uses rotated Gray codes to produce $d$
time-wise edge disjoint Hamiltonian paths through the hypercube. A
Hamiltonian path in a graph visits all the nodes in the graph only once.
Time-wise edge disjoint means that, although the paths may share certain
edges of the hypercube, no two paths traverse the same edge on the same
communication step.

In Section 2 we will describe the Connection Machine architecture, and
the slicewise model of which we make use. Section 3 contains an explanation
of how the Hamiltonian paths are generated and the data motion of the
algorithm. More detailed implementation issues are discussed in Section 4,
and Section 5 presents the timings for a sample application.

2 The Connection Machine Architecture

The CM-2 is composed of a microsequencer and a maximum of 64K single-bit
processing elements. The processors run in SIMD mode, with the instruction
stream broadcast by the sequencer. It is possible to deselect any subset of
the processors, so that any instruction is only performed by those processors
in the currently selected set. The sequencer is controlled by an external
front end machine, usually a SUN, SYMBOLICS Lisp Machine, or VAX.

Each processor has 64K or 256K bits of local RAM, and there is a single
high-speed floating point unit for every 32 processors. There are 16
processors on a CM-2 chip and the chips are connected in a boolean n-cube
topology, e.g., a 12-cube for a 64K processor machine. The system software
supports the notion of virtual processors. This allows the programmer to
implement his code with the number of processors appropriate for the
application. Virtual processors are mapped to physical processors by evenly
segmenting the memory of the physical processors and time multiplexing the
physical processors. The virtual processor ratio is the number of virtual
processors assigned to each physical processor by the mapping. In the N-body
problem, each body is associated with a virtual processor which carries out
the computation of the forces for it. [This is not exactly true in the
slicewise model case, but I wanted to give an example of the one data element
to vp paradigm. Perhaps I should use the one vp per grid point or something
else instead?]

The CM-2 supports two basic communication mechanisms. There is general
pointer based communication by which data can be exchanged between the
memories of different processors as necessary to complete the computation.
For more structured communication patterns the machine can be configured as
a k-dimensional grid, which is automatically superimposed by the system
software onto the boolean cube using a multidimensional Gray code mapping.

When performing floating point computations, better performance is usually
obtained by looking at the machine in what is called the slicewise model.
That is, we consider processing nodes on the machine to be the ensemble of a
floating point unit and the memory of the associated physical processors of
the CM-2. In the usual fieldwise model the storage of a 32-bit word would be
allocated in 32 consecutive bits of a physical processor's memory. In the
slicewise model words are stored in a 32-bit slice across the memories of
the 32 processors in the node, i.e., 1 bit per processor. From this
viewpoint, the 64K processor CM-2 becomes 2048 floating point nodes
connected in an 11-dimensional hypercube with two communication wires
between connected nodes instead of one. Moreover, this model of the machine
[illegible] the floating point unit [illegible] needs data from all 32 of
its physical processors, i.e., in one cycle a 32-bit slice across processors
is read into the floating point unit. We should note that it is possible, by
transposing data, to go back and forth between the slicewise and fieldwise
models of the machine. This is sometimes necessary since the floating point
unit does not have a complete instruction set. [This is probably not the
right way to say this, but you know what I mean ...]

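To illustrate the relation between the fieldwise and slicewise layouts, the following sketch treats the memory of one node as a 32 x 32 bit matrix and transposes it. This is only a model of the data layout under stated assumptions, not the CM-2 system routine: the function name is hypothetical, `words[p]` stands for the 32-bit word held fieldwise by processor `p`, and entry `a` of the result is the slice at bit address `a`, holding one bit from each processor.

```python
def fieldwise_to_slicewise(words, width=32):
    """Transpose a width x width bit matrix given as a list of `width` integers.

    Bit `a` of words[p] becomes bit `p` of result[a].
    """
    assert len(words) == width
    slices = [0] * width
    for p, word in enumerate(words):
        for a in range(width):
            if (word >> a) & 1:
                slices[a] |= 1 << p
    return slices
```

Because a transpose is its own inverse, applying the same function a second time recovers the fieldwise layout.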
It is with this model of the CM-2 that we implemented our fast direct
N-body solver. On a CM-2 with $2^d$ physical processors, there are
$P = 2^{d-5}$ slicewise floating point nodes. We divide the data for the $N$
bodies evenly among the nodes; each node will be responsible for accumulating
the forces for $N/P$ bodies. We are thus left with the problem of how to
optimally broadcast the data for the bodies at each node to all of the nodes.
In the next two sections we will describe how that is done.
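The arithmetic of this decomposition can be sketched as follows. The function names and the contiguous block assignment of bodies to nodes are assumptions for illustration, and the sketch assumes that the number of nodes divides the number of bodies evenly.

```python
def slicewise_nodes(d):
    """Number of slicewise nodes on a CM-2 with 2**d physical processors."""
    return 2 ** (d - 5)  # one floating point node per 32 = 2**5 processors


def bodies_for_node(n_bodies, n_nodes, node):
    """Half-open index range [start, stop) of the bodies assigned to `node`."""
    per_node = n_bodies // n_nodes  # assumes n_nodes divides n_bodies
    return node * per_node, (node + 1) * per_node
```

For a 64K processor machine, `slicewise_nodes(16)` gives the 2048 nodes mentioned in the text.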

3 Hypercubes, Gray codes, and Hamiltonian paths

3.1 Background

For the purposes of this section, we can think of a $d$-dimensional hypercube
as a graph with $2^d$ nodes labeled by the $d$-bit binary representation of
the integers 0 to $2^d - 1$. Each bit in the $d$-bit representation
represents a different dimension of the cube. There is an edge between two
nodes $i$ and $j$ in the hypercube if and only if their binary
representations differ in only one bit. We can think of the edges as
traversing different dimensions of the hypercube.
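This labeling makes the hypercube easy to handle with bit operations, as the short sketch below shows. The function names are illustrative, and numbering the dimensions from 1 with dimension $k$ corresponding to bit $k - 1$ is an assumed convention.

```python
def neighbors(node, d):
    """Neighbors of `node` in the d-cube, listed for dimensions 1, 2, ..., d."""
    return [node ^ (1 << (k - 1)) for k in range(1, d + 1)]


def is_edge(i, j):
    """True when the labels of i and j differ in exactly one bit."""
    diff = i ^ j
    return diff != 0 and diff & (diff - 1) == 0
```

For example, node 5 = 101 in the 3-cube is adjacent to 4 = 100, 7 = 111, and 1 = 001.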

A Gray code is a circuit of all binary $d$-tuples such that only one
coordinate position changes at each step. Thus, one can think of a Gray code
as a Hamiltonian circuit or path through a hypercube. There are lots of ways
to construct Gray codes; a standard example is the binary reflected Gray
code [?]. The dimension code corresponding to a Gray code is the list of
positions which change at each step of the code, or the list of dimensions
traversed on the hypercube at each step of the circuit. In Figure 1 we show
the binary reflected Gray code for 3-tuples as well as its dimension code and
the associated Hamiltonian path through the 3-cube.
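A small sketch of these two definitions follows. It uses the standard formula `i ^ (i >> 1)` for the binary reflected Gray code; the function names and the numbering of dimensions from 1 (lowest-order bit) are assumptions for illustration.

```python
def reflected_gray_code(d):
    """Binary reflected Gray code on d-tuples, starting at 0."""
    return [i ^ (i >> 1) for i in range(2 ** d)]


def dimension_code(code):
    """Dimension (1-based bit position) that changes at each step of `code`."""
    dims = []
    for a, b in zip(code, code[1:]):
        diff = a ^ b
        # A Gray code changes exactly one bit per step.
        assert diff != 0 and diff & (diff - 1) == 0
        dims.append(diff.bit_length())
    return dims
```

For $d = 3$ this yields the code 000, 001, 011, 010, 110, 111, 101, 100 with dimension code 1, 2, 1, 3, 1, 2, 1.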

Given a Gray code on $d$-tuples beginning at 0, one can generate a Gray
code starting at any other node $i$, by taking the exclusive or of $i$ with
the $d$-tuples in the original code. The reason for this is obvious, since if
$j$ and $k$ differ only in one bit, then $j \oplus i$ and $k \oplus i$ will
also differ by the same bit. Moreover, this means that the dimension codes
for the two Gray codes will be the same. Figure 2 shows the Gray code
starting at node 5 = 101 resulting from translating the code in Figure 1 as
we have just described.
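The translation can be checked with a few lines of code. This is a sketch with illustrative function names; it rebuilds the binary reflected Gray code so that it stands alone.

```python
def reflected_gray_code(d):
    """Binary reflected Gray code on d-tuples, starting at 0."""
    return [i ^ (i >> 1) for i in range(2 ** d)]


def translate(code, start):
    """Gray code beginning at `start`: exclusive or of `start` with each word."""
    return [word ^ start for word in code]


def dimension_code(code):
    """Dimension (1-based bit position) that changes at each step of `code`."""
    return [(a ^ b).bit_length() for a, b in zip(code, code[1:])]
```

Translating the 3-tuple code to start at node 5 gives 101, 100, 110, 111, 011, 010, 000, 001, and the dimension code is unchanged.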

3.2 All to All Broadcasting

Recall that our problem is to send data in every node of a hypercube, say
of dimension $d$, to every other node using all of the edges (or wires) of
the cube all of the time and avoiding collisions. We will assume that we have
at least $d$ pieces of data stored in each node so that we have enough to
keep the edges filled.

We start by considering the binary-reflected Gray code on $d$-tuples and
the associated Hamiltonian path through the $d$-cube starting at node 0. By
performing a 1-bit left circular shift of the bits, one can generate another
Gray code. Moreover the entries in the associated dimension code have
increased by 1 (modulo $d$). In particular this means that if we followed the
two Hamiltonian paths defined by the Gray codes, at each step they are
guaranteed to traverse a different dimension.
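The effect of the circular shift can be verified directly. The sketch below uses illustrative function names and again numbers dimensions from 1 at the lowest-order bit, which is an assumed convention.

```python
def reflected_gray_code(d):
    """Binary reflected Gray code on d-tuples, starting at 0."""
    return [i ^ (i >> 1) for i in range(2 ** d)]


def rotate_left(word, d, shift=1):
    """Left circular shift of a d-bit word by `shift` positions."""
    shift %= d
    mask = (1 << d) - 1
    return ((word << shift) | (word >> (d - shift))) & mask


def dimension_code(code):
    """Dimension (1-based bit position) that changes at each step of `code`."""
    return [(a ^ b).bit_length() for a, b in zip(code, code[1:])]
```

For $d = 3$ the rotated code is 000, 010, 110, 100, 101, 111, 011, 001, and its dimension code 2, 3, 2, 1, 2, 3, 2 is the original one with every entry advanced by 1 modulo 3, so the two paths never use the same dimension on the same step.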

Step   Reflected Gray code   Dimension   Rotated Gray code   Dimension
0      000                               000
1      001                   1           010                 2
2      011                   2           110                 3
3      010                   1           100                 2
4      110                   3           101                 1
5      111                   1           111                 2
6      101                   2           011                 3
7      100                   1           001                 2

By repeating the same operation a total of $d$ rotated Gray codes can be
generated which in turn define $d$ rotated Hamiltonian paths on the
$d$-dimensional hypercube which traverse a different dimension at each step.
They are given in Figure 3 for the 3-cube. Because we have constructed the
paths by rotating the Gray codes, we are assured that at each time step each
of the $d$ paths is traversing a different one of the $d$ dimensions of the
cube. Thus, while the $d$ Hamiltonian paths may share some edges, they are
guaranteed not to traverse the same edge in the same direction at the same
time, i.e., they are time-wise edge disjoint. Thus if we had $d$ pieces of
data in node 0 they could be sent out along these $d$ paths and they would
each visit every other node of the $d$-cube in exactly $2^d - 1$ steps
without conflict. The dimension codes for 3 Hamiltonian paths in the 3-cube
constructed as we have just described are given below; the corresponding
Gray codes and paths through the cube are shown in Figure ?.

Step   Path 1   Path 2   Path 3
1      1        2        3
2      2        3        1
3      1        2        3
4      3        1        2
5      1        2        3
6      2        3        1
7      1        2        3
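The table of dimension codes can be generated for any $d$ with the sketch below. The function name is hypothetical; row $t$ lists the dimension traversed at step $t$ by each of the $d$ rotated paths, obtained by advancing the base dimension code by the rotation index modulo $d$.

```python
def dimension_table(d):
    """Rows are time steps; column r is the dimension used by the path rotated r times."""
    gray = [i ^ (i >> 1) for i in range(2 ** d)]
    base = [(a ^ b).bit_length() for a, b in zip(gray, gray[1:])]
    return [[(g - 1 + r) % d + 1 for r in range(d)] for g in base]
```

Each row is a permutation of 1, ..., $d$, which is the statement that the $d$ paths use $d$ different dimensions at every step.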

What about the data resident in the other nodes? We can construct a Gray
code beginning at any node $i$ in the cube by taking the exclusive or of $i$
with the $d$-tuples of the binary reflected Gray code which begins at 0.
Again by taking $d$ rotations of this Gray code we construct $d$ time-wise
edge disjoint Hamiltonian paths through the $d$-cube starting at node $i$.
Note that the dimension codes of these paths are identical with the ones
that resulted from the $d$ paths starting at 0. In fact one could think of
the construction as just translating the dimension codes from node to node,
rather than translating the original Gray code and rotating it. In any case,
what we now have are $d$ time-wise edge disjoint Hamiltonian paths at every
one of the $2^d$ nodes of the cube. It is straightforward to see that these
paths traverse the $d$-cube in such a way that no two of them leave on the
same dimension after they intersect at a node. Therefore, data conflicts are
avoided. For emphasis, we make the obvious observation that with $d$ data
items at each node it takes $2^d - 1$ steps for all of the data to visit all
of the nodes, and every dimension is traversed from every node at every time
step.

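The conflict-free property can be checked by brute force on small cubes. The following is a verification sketch under stated assumptions, not the CM-2 implementation: the function name is hypothetical, each of the $2^d$ nodes starts with $d$ items, and item $r$ of every node follows the dimension code rotated $r$ times.

```python
def simulate_all_to_all(d):
    """Move d items from every node along the rotated paths and check for conflicts.

    Returns (every item visited every node, number of communication steps).
    """
    n = 2 ** d
    gray = [i ^ (i >> 1) for i in range(n)]
    base = [(a ^ b).bit_length() for a, b in zip(gray, gray[1:])]
    position = {(src, r): src for src in range(n) for r in range(d)}
    visited = {item: {item[0]} for item in position}
    for g in base:
        used = set()  # directed edges (sending node, dimension) busy this step
        for (src, r), node in list(position.items()):
            dim = (g - 1 + r) % d + 1
            assert (node, dim) not in used, "two items on one directed edge"
            used.add((node, dim))
            nxt = node ^ (1 << (dim - 1))
            position[(src, r)] = nxt
            visited[(src, r)].add(nxt)
        assert len(used) == n * d  # every dimension is used from every node
    return all(len(seen) == n for seen in visited.values()), len(base)
```

For $d = 3$ the call returns `(True, 7)`: all data visit all nodes in $2^3 - 1$ steps with every wire busy in each direction at every step.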
The key to going from this construction to a reasonable implementation
on a hypercube is the use of the dimension codes. We use them to determine
how the $d$ incoming data at each node are distributed among the $d$ outgoing
dimensions. Since the $d$ paths starting at each node have the same dimension
codes, at each time step successive rows of the same table of dimension
codes tell us how to redirect the data coming in from each of the different
dimensions. For example, referring to the table for the 3-cube above, at
communication step 2, the data coming in to every node along dimensions 1,
2, and 3 go out along dimensions 2, 3, and 1 respectively. At step 3, data
from dimensions 1, 2, and 3 go out along 3, 1, and 2 respectively. In fact,
every row of the dimension code table is just a circular shift of
$(1, 2, \ldots, d)$, and the identical shift works for every node. Thus we
only need to keep a list of the amount of each shift (we call this the
rotation parameter) for every time step. The fact that the same list works
for every node simplifies the implementation.
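A sketch of how such a list might be used follows. The function names are hypothetical, and the convention that the rotation parameter of a step is the left circular shift that turns $(1, 2, \ldots, d)$ into that step's table row is an assumption; the text does not fix the convention.

```python
def rotation_parameters(d):
    """Shift amount for each of the 2**d - 1 steps (assumed convention)."""
    gray = [i ^ (i >> 1) for i in range(2 ** d)]
    return [(a ^ b).bit_length() - 1 for a, b in zip(gray, gray[1:])]


def table_row(shift, d):
    """Row of the dimension code table: (1, ..., d) circularly shifted by `shift`."""
    return [(k + shift) % d + 1 for k in range(d)]


def outgoing_dimension(incoming, step, params, d):
    """Dimension on which data that arrived along `incoming` leaves at `step` (step >= 2)."""
    delta = params[step - 1] - params[step - 2]  # change in shift between steps
    return (incoming - 1 + delta) % d + 1
```

With `params = rotation_parameters(3)`, data arriving on dimensions 1, 2, 3 leave on 2, 3, 1 at step 2 and on 3, 1, 2 at step 3, in agreement with the example in the text.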

4 Implementation on the CM-2

5 A sample application with timings
