
SOCK_RAW Demystified
by ithilgore - ithilgore.ryu.L@gmail.com
sock-raw.org / sock-raw.homeunix.org
May 2008

0x0. Index
0x1. Introduction
0x2. Creation
0x3. IP_HDRINCL
0x4. raw input
0x5. raw output
0x6. Summary
0x7. Conclusion
0x8. References

0x1. Introduction
=================

This paper's purpose is to explain the often misunderstood nature of raw
sockets. The driving force of writing this text was the curiosity of the
author to learn the ins and outs of this powerful socket type also known as
SOCK_RAW. What is going to be discussed here will *not* be another tutorial
on how to hand-craft one's own packets. This topic has been overly discussed
many times and one can find quite a few references on the net about it
(mixter etc). What is going to be discussed here is what raw sockets do
behind the scenes. We are going to delve into network stack internals for
this reason. It is assumed that the reader already has some experience with
sockets and is willing to look at some kernel code, since the raw sockets
implementation is actually OS dependent. We will cover both the FreeBSD 7.0
and Linux 2.6 implementations. Most things covered for FreeBSD may also
apply to OpenBSD, NetBSD and even Mac OS X.

0x2. Creation
=============

First things first. Creation. How is a raw socket created? What are the main
intricacies involved? A raw socket is created by calling the socket(2)
syscall and defining the socket type as SOCK_RAW like this:

    int fd = socket(AF_INET, SOCK_RAW, XXX);

where XXX is the *int protocol* that, as we shall discuss further on, is the
main source of confusion and problems, stemming from the very fact that
different combinations can apply here. Valid values are: IPPROTO_RAW,
IPPROTO_ICMP, IPPROTO_IGMP, IPPROTO_TCP, 0 (caution here - see below),
IPPROTO_UDP etc. With a different combination arises a different behaviour.
And this behaviour is critical to the way the kernel interacts with the
application creating the raw socket.

Before getting into the specific combinations for each OS, let's first take
a look at the actual meaning of the *protocol* value. For all protocols to
work concurrently, a specific common design approach has been used.
According to it, similar protocols are grouped into domains. A domain is
usually defined by what is known as a Protocol Family or Address Family (the
latter being the more recent practice) and a number of constants are used to
differentiate between them. The most common ones are:

    PF_INET / AF_INET                      --> Internet protocols (TCP, UDP etc)
    PF_LOCAL, PF_UNIX / AF_LOCAL, AF_UNIX  --> Unix local IPC protocol
    PF_ROUTE / AF_ROUTE                    --> routing tables

Linux defines these constants in
/usr/src/linux-2.6.*/include/linux/socket.h

    /* Supported address families. */
    #define AF_UNSPEC       0
    #define AF_UNIX         1       /* Unix domain sockets          */
    #define AF_LOCAL        1       /* POSIX name for AF_UNIX       */
    #define AF_INET         2       /* Internet IP Protocol         */
    /* ... */

    /* Protocol families, same as address families. */
    #define PF_UNSPEC       AF_UNSPEC
    #define PF_UNIX         AF_UNIX
    #define PF_LOCAL        AF_LOCAL
    #define PF_INET         AF_INET
    /* ... */

FreeBSD defines the above values (nearly the same) in
/usr/src/sys/sys/socket.h

As you might already have guessed, we are going to occupy ourselves with the
AF_INET family. The internet family breaks down its protocols into protocol
types, with each type having the possibility of consisting of more than one
protocol.

Linux defines the internet family protocol types in
/usr/src/linux-2.6.*/include/linux/net.h

    enum sock_type {
            SOCK_STREAM     = 1,
            SOCK_DGRAM      = 2,
            SOCK_RAW        = 3,
            SOCK_RDM        = 4,
            SOCK_SEQPACKET  = 5,
            SOCK_DCCP       = 6,
            SOCK_PACKET     = 10,
    };

FreeBSD defines the AF_INET types in /usr/src/sys/sys/socket.h

    /*
     * Types
     */
    #define SOCK_STREAM     1       /* stream socket */
    #define SOCK_DGRAM      2       /* datagram socket */
    #define SOCK_RAW        3       /* raw-protocol interface */
    #if __BSD_VISIBLE
    #define SOCK_RDM        4       /* reliably-delivered message */
    #endif
    #define SOCK_SEQPACKET  5       /* sequenced packet stream */

If you have done some socket programming in the past, then you probably
recognise some of the above. One of them has to be the 2nd argument of a
socket(AF_INET, ..., ...) call. The 3rd argument is the IPPROTO_XXX value
which defines the actual protocol above IP. It is important to understand
this implication. This value/number is what the IP layer will write to the
protocol_type field in its header to define the upper level protocol. It is
the "Protocol" field as you see in the IP header below (RFC 791).

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Version|  IHL  |Type of Service|          Total Length         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         Identification        |Flags|      Fragment Offset    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Time to Live |    Protocol   |         Header Checksum       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       Source Address                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Destination Address                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Options                    |    Padding    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

It is one of the most crucial fields since it is the one that will be used
by the IP layer on the receiver end to understand to which layer above it
(for example TCP or UDP) the datagram has to be delivered.

Linux defines these protocols in /usr/src/linux-2.6.*/include/linux/in.h

    /* Standard well-defined IP protocols.  */
    enum {
      IPPROTO_IP = 0,        /* Dummy protocol for TCP                   */
      IPPROTO_ICMP = 1,      /* Internet Control Message Protocol        */
      IPPROTO_IGMP = 2,      /* Internet Group Management Protocol       */
      IPPROTO_IPIP = 4,      /* IPIP tunnels (older KA9Q tunnels use 94) */
      IPPROTO_TCP = 6,       /* Transmission Control Protocol            */
      IPPROTO_EGP = 8,       /* Exterior Gateway Protocol                */
      IPPROTO_PUP = 12,      /* PUP protocol                             */
      IPPROTO_UDP = 17,      /* User Datagram Protocol                   */
      IPPROTO_IDP = 22,      /* XNS IDP protocol                         */
      IPPROTO_DCCP = 33,     /* Datagram Congestion Control Protocol     */
      IPPROTO_RSVP = 46,     /* RSVP protocol                            */
      IPPROTO_GRE = 47,      /* Cisco GRE tunnels (rfc 1701,1702)        */
      IPPROTO_IPV6 = 41,     /* IPv6-in-IPv4 tunnelling                  */
      IPPROTO_ESP = 50,      /* Encapsulation Security Payload protocol  */
      IPPROTO_AH = 51,       /* Authentication Header protocol           */
      IPPROTO_BEETPH = 94,   /* IP option pseudo header for BEET         */
      IPPROTO_PIM = 103,     /* Protocol Independent Multicast           */
      IPPROTO_COMP = 108,    /* Compression Header protocol              */
      IPPROTO_SCTP = 132,    /* Stream Control Transport Protocol        */
      IPPROTO_UDPLITE = 136, /* UDP-Lite (RFC 3828)                      */
      IPPROTO_RAW = 255,     /* Raw IP packets                           */
      IPPROTO_MAX
    };

FreeBSD defines the IPPROTO_XXX values in /usr/src/sys/netinet/in.h
Here's the example for IPPROTO_RAW:

    #if __POSIX_VISIBLE >= 200112
    #define IPPROTO_RAW             255             /* raw IP packet */
    #define INET_ADDRSTRLEN         16
    #endif
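The same constants are visible from userspace through <netinet/in.h>, so the
numbers quoted above can be checked without reading kernel sources. The
helper below is only a small sketch; ipproto_name() is a hypothetical
function made up for this illustration, not something either system ships.

```c
#include <stddef.h>
#include <netinet/in.h>

/*
 * Hypothetical helper (not part of any system header): translate a few of
 * the IPPROTO_XXX constants discussed above into printable names.
 * Returns NULL for every protocol number it does not know about.
 */
const char *ipproto_name(int protocol)
{
    switch (protocol) {
    case IPPROTO_IP:   return "IPPROTO_IP";    /* 0, the "dummy" value */
    case IPPROTO_ICMP: return "IPPROTO_ICMP";  /* 1   */
    case IPPROTO_IGMP: return "IPPROTO_IGMP";  /* 2   */
    case IPPROTO_TCP:  return "IPPROTO_TCP";   /* 6   */
    case IPPROTO_UDP:  return "IPPROTO_UDP";   /* 17  */
    case IPPROTO_RAW:  return "IPPROTO_RAW";   /* 255 */
    default:           return NULL;
    }
}
```

Calling ipproto_name(6) yields the TCP name, which is the same number the IP
layer stores in the Protocol field of every TCP datagram it sends.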

With so many different combinations it is best to discuss things serially,
so let's begin with the much used *0* protocol value. Did you ever wonder
how the socket(2) system call magically finds which protocol to use even if
it has been called like socket(..., ..., 0)?

For example when an application calls it like:

    socket(AF_INET, SOCK_STREAM, 0);

how does the kernel find out which protocol to associate the socket with?
Well the fact is that the kernel doesn't make any kind of guess - the kernel
doesn't play dice with the userspace ( to quote a crazy physicist & great
mind http://www.quotedb.com/quotes/878 in a slightly different context ) -
in most cases that is ( see encryption & entropy for the opposite paradigm )

On FreeBSD all that it does is associate the first protocol it finds in the
domains linked list through the function pffindtype(dom, type). Let's be
more specific and see some code. A socket is created using the kernel
function socreate() which is defined like this:
(source code from FreeBSD 7.0 /usr/src/sys/kern/uipc_socket.c)

    /*
     * socreate returns a socket with a ref count of 1.  The socket should be
     * closed with soclose().
     */
    int
    socreate(int dom, struct socket **aso, int type, int proto,
        struct ucred *cred, struct thread *td)
    {
            struct protosw *prp;
            struct socket *so;
            int error;

            if (proto)
                    prp = pffindproto(dom, proto, type);
            else
                    prp = pffindtype(dom, type);
            /* .... */
    }

The secret lies in the two functions pffindproto() and pffindtype(). If
proto == 0 then pffindtype() is called, which is less strict than
pffindproto(). As we can see from the code, pffindtype() doesn't check the
protocol value at all and just returns the *first* protosw struct it finds,
deducing it from just the pr_type (protocol type: SOCK_STREAM, SOCK_DGRAM,
SOCK_RAW etc) and family/domain (AF_INET, AF_LOCAL etc). Each protosw
(protocol switch) struct is an association of a SOCK_XXX type and an
IPPROTO_XXX protocol. All of the protosw structs are inside the inetsw[]
table which is pointed to by the inetdomain entry in the global domains
linked list. A graphical representation might clear things up a bit:

    domains:    (domain linked list)

     -------        isodomain:        inetdomain:
    |       |        -------           -------
    |       |-----> |       | ------> |       | ------> .....
     -------         -------           -------
                        |                 |
                        |                 |
                        |---> isosw[]:    |---> inetsw[]:
                                                 ---------
                                                |   IP    |
                                                 ---------
                                                |   UDP   |
                                                 ---------
                                                |   TCP   |
                                                 ---------
                                                | IP(raw) |  (default entry)
                                                 ---------
                                                |  ICMP   |
                                                 ---------
                                                |  IGMP   |
                                                 ---------
                                                |   ...   |
                                                 ---------
                                                |   ...   |
                                                 ---------
                                                | IP(raw) |  (wildcard entry)
                                                 ---------

Note: The place of IP(raw) in the 4th index (inetsw[3]) is mentioned for
historical reasons and newer implementations of the FreeBSD stack differ in
that regard. In particular, if SCTP support is defined in the kernel then
IP(raw), ICMP and the rest are moved 3 places up the inetsw[] array,
effectively becoming inetsw[6], inetsw[7] etc. Of course, this doesn't make
any significant difference in kernel sources since inetsw[] is never
accessed by index but by name. Throughout the whole text, we are going to
use the convention of referring to the default entry (inetsw[3]) as
default_RAW and to the wildcard RAW entry (the last member of inetsw[] and
in the old times inetsw[6]) as wildcard_RAW for clarity reasons.

The raw wildcard entry (the one with .pr_protocol not assigned a value and
thus having a value of 0) is defined as the last member of the inetsw[]
array in /usr/src/sys/netinet/in_proto.c:

    /* raw wildcard */
    {
            .pr_type =              SOCK_RAW,
            .pr_domain =            &inetdomain,
            .pr_flags =             PR_ATOMIC|PR_ADDR,
            .pr_input =             rip_input,
            .pr_ctloutput =         rip_ctloutput,
            .pr_init =              rip_init,
            .pr_usrreqs =           &rip_usrreqs
    },
    };      /* end of inetsw[] */

Back to the search which goes like this:

pffindtype:
 1. find the corresponding domain through the *family* value
 2. return the first entry of the corresponding protosw table which matches
    the *type* value

pffindproto:
 1. find the corresponding domain through the *family* value
 2. return the first match of the pair *type* - *protocol*
 3. if no pair is found and type is SOCK_RAW then return the raw IP entry
    whose pr_protocol is 0 - wildcard_RAW (see below)

Both functions return an inetsw[] entry (that is a protosw * pointer to the
corresponding array offset).

/usr/src/sys/kern/uipc_domain.c:

    struct protosw *
    pffindtype(int family, int type)
    {
            struct domain *dp;
            struct protosw *pr;

            for (dp = domains; dp; dp = dp->dom_next)
                    if (dp->dom_family == family)
                            goto found;
            return (0);
    found:
            for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++)
                    if (pr->pr_type && pr->pr_type == type)
                            return (pr);
            return (0);
    }

    struct protosw *
    pffindproto(int family, int protocol, int type)
    {
            struct domain *dp;
            struct protosw *pr;
            struct protosw *maybe = 0;

            if (family == 0)
                    return (0);
            for (dp = domains; dp; dp = dp->dom_next)
                    if (dp->dom_family == family)
                            goto found;
            return (0);
    found:
            for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++) {
                    if ((pr->pr_protocol == protocol) && (pr->pr_type == type))
                            return (pr);

                    if (type == SOCK_RAW && pr->pr_type == SOCK_RAW &&
                        pr->pr_protocol == 0 && maybe == (struct protosw *)0)
                            maybe = pr;
            }
            return (maybe);
    }

Inspecting the code above, we notice another great importance of SOCK_RAW.
Basically, what the last few lines of pffindproto() do is use SOCK_RAW as a
fallback default protocol. This means that if a user application calls
socket(2) like this:

    socket(AF_INET, SOCK_RAW, 30);

where a protocol of value 30 is not specified in the kernel, then instead of
failing, the wildcard_RAW is used. This is because it is the only protosw
struct in the inetsw[] array containing a .pr_protocol of 0 and a pr_type of
SOCK_RAW. The same goes for a call like this:

    socket(AF_INET, SOCK_RAW, IPPROTO_TCP);

The SOCK_RAW type and IPPROTO_TCP don't match since SOCK_RAW only has
entries for ICMP, IGMP and raw IP. However, it is a perfectly valid call
since the kernel will return the wildcard entry of SOCK_RAW.

This goes for FreeBSD. As far as Linux is concerned, as you may already have
noticed from the code snippet from above (in.h), the *0* value is actually
defined as another protocol type for TCP. This practice has some strong
effect in porting applications between *BSD and Linux. For example, for *BSD
this is correct:

    socket(AF_INET, SOCK_RAW, 0);

Note that issuing a value of 0 will return the default_RAW entry and not the
wildcard entry, since pffindtype() will return the first protosw struct with
a type of SOCK_RAW inside the inetsw[] table and the first one is the
default_RAW entry. On Linux you will get an EPROTONOSUPPORT error. See the
kernel code below to understand why this happens.

/usr/src/linux-2.6.*/net/ipv4/af_inet.c:

    /* Upon startup we insert all the elements in inetsw_array[] into
     * the linked list inetsw.
     */
    static struct inet_protosw inetsw_array[] =
    {
            {
                    .type =       SOCK_STREAM,
                    .protocol =   IPPROTO_TCP,
                    .prot =       &tcp_prot,
                    .ops =        &inet_stream_ops,
                    .capability = -1,
                    .no_check =   0,
                    .flags =      INET_PROTOSW_PERMANENT |
                                  INET_PROTOSW_ICSK,
            },

            {
                    .type =       SOCK_DGRAM,
                    .protocol =   IPPROTO_UDP,
                    .prot =       &udp_prot,
                    .ops =        &inet_dgram_ops,
                    .capability = -1,
                    .no_check =   UDP_CSUM_DEFAULT,
                    .flags =      INET_PROTOSW_PERMANENT,
            },

            {
                    .type =       SOCK_RAW,
                    .protocol =   IPPROTO_IP,        /* wild card */
                    .prot =       &raw_prot,
                    .ops =        &inet_sockraw_ops,
                    .capability = CAP_NET_RAW,
                    .no_check =   UDP_CSUM_DEFAULT,
                    .flags =      INET_PROTOSW_REUSE,
            }
    };

    static int inet_create(struct net *net, struct socket *sock, int protocol)
    {
            /* ... */

            /* Look for the requested type/protocol pair. */
            answer = NULL;
    lookup_protocol:
            err = -ESOCKTNOSUPPORT;
            rcu_read_lock();
            list_for_each_rcu(p, &inetsw[sock->type]) {
                    answer = list_entry(p, struct inet_protosw, list);

                    /* Check the non-wild match. */
                    if (protocol == answer->protocol) {
                            if (protocol != IPPROTO_IP)
                                    break;
                    } else {
                            /* Check for the two wild cases. */
                            if (IPPROTO_IP == protocol) {
                                    protocol = answer->protocol;
                                    break;
                            }
                            if (IPPROTO_IP == answer->protocol)
                                    break;
                    }
                    err = -EPROTONOSUPPORT;
                    answer = NULL;
            }

            /* ... */
    }

Remember from above that IPPROTO_IP = 0. The above code can be broken down
to the following cases:

1) socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

   protocol = 6
   answer = inet_protosw[0]
   protocol == answer->protocol : first "if" TRUE
   OK

2) socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

   protocol = 17
   answer = inet_protosw[1]
   protocol == answer->protocol : first "if" TRUE
   OK

3) socket(AF_INET, SOCK_STREAM, 0);

   protocol = 0
   answer = inet_protosw[0]
   if (protocol == answer->protocol) : FALSE
   check else :
       /* Check for the two wild cases. */
       if (IPPROTO_IP == protocol) {
               protocol = answer->protocol;
               break;
       }
   : TRUE
   note that protocol value 0 is substituted with the real value of
   IPPROTO_TCP in line: protocol = answer->protocol;
   OK

4) socket(AF_INET, SOCK_DGRAM, 0);

   protocol = 0
   answer = inet_protosw[1]
   if (protocol == answer->protocol) : FALSE
   check else :
       /* Check for the two wild cases. */
       if (IPPROTO_IP == protocol) {
               protocol = answer->protocol;
               break;
       }
   : TRUE
   note that protocol value 0 is substituted with the real value of
   IPPROTO_UDP in line: protocol = answer->protocol;
   OK

5) socket(AF_INET, SOCK_RAW, 0);

   protocol = 0
   answer = inet_protosw[2]
   protocol == IPPROTO_IP so : if (protocol != IPPROTO_IP) is FALSE
   not OK -> EPROTONOSUPPORT

6) socket(AF_INET, SOCK_STREAM, 9);
   (where 9 can be any protocol except IPPROTO_TCP)

   protocol = 9
   answer = inet_protosw[0]
   if (protocol == answer->protocol) : FALSE
   check else :
       /* Check for the two wild cases. */
       if (IPPROTO_IP == protocol) {
               protocol = answer->protocol;
               break;
       }
       if (IPPROTO_IP == answer->protocol)
               break;
   both are FALSE
   not OK -> EPROTONOSUPPORT

7) socket(AF_INET, SOCK_DGRAM, 9);
   (where 9 can be any protocol except IPPROTO_UDP)

   same as above
   not OK -> EPROTONOSUPPORT

8) socket(AF_INET, SOCK_RAW, 9);
   (where 9 can be *any* protocol except 0)

   protocol = 9
   answer = inet_protosw[2]
   if (protocol == answer->protocol) : FALSE
   check else :
       /* Check for the two wild cases. */
       if (IPPROTO_IP == protocol) {
               protocol = answer->protocol;
               break;
       }
   : FALSE
       if (IPPROTO_IP == answer->protocol)
               break;
   : TRUE
   OK

Case 8 demonstrates how Linux uses SOCK_RAW as a fallback protocol just like
we saw with FreeBSD above.
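The two lookup policies can be replayed side by side in userspace. What
follows is only a toy model under stated assumptions: struct toy_sw, the two
tables and the functions bsd_lookup() and linux_lookup() are names invented
for this sketch, the FreeBSD table is the SCTP-less layout described in the
text, and the Linux per-type lists are flattened into one array. It is not
kernel code, it merely follows the same comparisons.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Toy protocol switch entry: only the two fields the lookups inspect. */
struct toy_sw {
    int type;       /* SOCK_XXX                      */
    int protocol;   /* IPPROTO_XXX, 0 means wildcard */
};

/* FreeBSD-like inetsw[] order (assumption: kernel built without SCTP). */
static const struct toy_sw bsd_sw[] = {
    { 0,           IPPROTO_IP   },
    { SOCK_DGRAM,  IPPROTO_UDP  },
    { SOCK_STREAM, IPPROTO_TCP  },
    { SOCK_RAW,    IPPROTO_RAW  },   /* default_RAW  */
    { SOCK_RAW,    IPPROTO_ICMP },
    { SOCK_RAW,    IPPROTO_IGMP },
    { SOCK_RAW,    0            },   /* wildcard_RAW */
};

/* Linux-like inetsw_array[] as listed in the text. */
static const struct toy_sw linux_sw[] = {
    { SOCK_STREAM, IPPROTO_TCP },
    { SOCK_DGRAM,  IPPROTO_UDP },
    { SOCK_RAW,    IPPROTO_IP  },    /* wild card */
};

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Follows socreate(): pffindtype() logic when proto is 0, pffindproto()
 * logic otherwise. Returns an index into bsd_sw[] or -1 if nothing matches.
 */
int bsd_lookup(int type, int proto)
{
    int maybe = -1;
    size_t i;

    for (i = 0; i < NELEM(bsd_sw); i++) {
        const struct toy_sw *pr = &bsd_sw[i];

        if (proto == 0) {                       /* pffindtype() */
            if (pr->type && pr->type == type)
                return (int)i;
            continue;
        }
        if (pr->protocol == proto && pr->type == type)
            return (int)i;                      /* exact pair   */
        if (type == SOCK_RAW && pr->type == SOCK_RAW &&
            pr->protocol == 0 && maybe == -1)
            maybe = (int)i;                     /* raw fallback */
    }
    return maybe;
}

/*
 * Follows the matching loop of inet_create(). Returns an index into
 * linux_sw[] or -1 where the kernel would report an error.
 */
int linux_lookup(int type, int proto)
{
    size_t i;

    for (i = 0; i < NELEM(linux_sw); i++) {
        const struct toy_sw *answer = &linux_sw[i];

        if (answer->type != type)       /* the kernel keeps a list per type */
            continue;
        if (proto == answer->protocol) {
            if (proto != IPPROTO_IP)    /* the non-wild match */
                return (int)i;
        } else {
            if (proto == IPPROTO_IP)    /* caller asked for 0 */
                return (int)i;
            if (answer->protocol == IPPROTO_IP)  /* wild card entry */
                return (int)i;
        }
    }
    return -1;
}
```

With these tables bsd_lookup(SOCK_RAW, 0) lands on default_RAW while
linux_lookup(SOCK_RAW, 0) fails, which is exactly the porting trap described
above. Both functions fall back to their raw wildcard entry for an unknown
protocol such as 30.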

These are not the only differences between the two systems. We will discuss
more of them further on.

0x3. IP_HDRINCL
===============

One of the most serious decisions when writing low level network programs
with raw sockets is whether the application will construct the IP header as
well, along with the transport level protocol header. It would be best to do
some historical flashback first, since many things have changed since the
old days.

Nowadays, the obvious way to tell the IP layer not to prepend its own header
is by calling the setsockopt(2) syscall and setting the IP_HDRINCL (Header
Included) option. However, this option did not always exist. In releases
before Net/3, there was no IP_HDRINCL option and the only way to not have
the kernel prepend its own header was to use specific kernel patches and set
the protocol as IPPROTO_RAW ( inetsw[3] -- default entry ). These patches
were first made for 4.3BSD and Net/1 to support Traceroute, which needed to
write its own complete IP datagrams since it messed with the TTL field.

The interesting part is that since the arrival of IP_HDRINCL, Linux and
FreeBSD chose different ways to continue the "tradition". On Linux, when
setting the protocol as IPPROTO_RAW, the kernel sets the IP_HDRINCL option
by default and thus does not prepend its own IP header. The code below
proves this.

/usr/src/linux-2.6.*/net/ipv4/af_inet.c

    static int inet_create(struct net *net, struct socket *sock, int protocol)
    {
            /* ... */
            if (SOCK_RAW == sock->type) {
                    inet->num = protocol;
                    if (IPPROTO_RAW == protocol)
                            inet->hdrincl = 1;   /* set IP_HDRINCL */
            }
            /* ... */
    }

On FreeBSD however, the kernel never sets IP_HDRINCL by default, even if
IPPROTO_RAW is used. This means that the application has to explicitly set
the option whenever it wants to hand-craft its own IP header. We will see
below, when we delve into more details of raw sockets input/output, that
some IP header fields are/were always set by the kernel.

To sum up, to make portable raw sockets applications that need to construct
their own IP header, it is best to always set IP_HDRINCL, since it is the
common way for both Linux and *BSD.
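A minimal sketch of that portable habit is shown here. The function name
open_raw_hdrincl() is made up for this example; the calls themselves are the
standard socket(2), setsockopt(2) and close(2). Creating a raw socket needs
root (or CAP_NET_RAW on Linux), so an unprivileged caller should expect -1.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Hypothetical helper: open an AF_INET raw socket for the given protocol
 * and explicitly request IP_HDRINCL, regardless of what the OS would do by
 * default. Returns the descriptor, or -1 with errno describing the failure.
 */
int open_raw_hdrincl(int protocol)
{
    const int on = 1;
    int saved_errno;
    int fd;

    fd = socket(AF_INET, SOCK_RAW, protocol);
    if (fd < 0)
        return -1;              /* typically EPERM without privileges */

    if (setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) < 0) {
        saved_errno = errno;    /* close() may overwrite errno */
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

On Linux the setsockopt() call is redundant for IPPROTO_RAW, but it does no
harm there and it is what makes the same code behave identically on FreeBSD.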

0x4. raw input
==============

After initializing a raw socket in our application, we have to know in
advance which datagrams we are expecting to receive. The implementation
approach taken here is entirely different between Linux and FreeBSD and
deserves our attention.

a. Linux
********

Linux has chosen the less resource-savvy approach. After the IP layer
processes a new incoming IP datagram, it calls the ip_local_deliver_finish()
kernel function which is responsible for calling a registered transport
protocol handler by inspecting the protocol field of the IP header (remember
from above). However, before it delivers the datagram to the handler, it
checks every time if an application has created a raw socket with the *same*
protocol number. If there is one or more such applications, it makes a copy
of the datagram and delivers it to them as well.

/usr/src/linux-2.6.*/net/ipv4/ip_input.c

    static int ip_local_deliver_finish(struct sk_buff *skb)
    {
            __skb_pull(skb, ip_hdrlen(skb));

            /* Point into the IP datagram, just past the header. */
            skb_reset_transport_header(skb);

            rcu_read_lock();
            {
                    /* Note: See raw.c and net/raw.h,
                     * RAWV4_HTABLE_SIZE==MAX_INET_PROTOS */
                    int protocol = ip_hdr(skb)->protocol;
                    int hash;
                    struct sock *raw_sk;
                    struct net_protocol *ipprot;

            resubmit:
                    hash = protocol & (MAX_INET_PROTOS - 1);
                    raw_sk = sk_head(&raw_v4_htable[hash]);

                    /* If there maybe a raw socket we must check - if not we
                     * don't care less
                     */
                    if (raw_sk && !raw_v4_input(skb, ip_hdr(skb), hash))
                            raw_sk = NULL;

                    /* ... */
                    /* transport protocol handler calling */
                    /* ... */
            }
    }

A question that arises here is what part of the datagram is passed to the
raw socket? The IP payload only or the whole IP datagram (along with the IP
header that is)? Taking a closer look at the code above we can deduce that
the whole IP datagram is passed. Let's see why. First we have:

    __skb_pull(skb, ip_hdrlen(skb));

What this essentially does is shift the data pointer of the sk_buff to point
just past the IP header (although this sounds contradictory to what we
deduced above, wait until you have read the whole part before raising
obvious objections). It does that by adding ip_hdrlen(skb) to the skb->data
member. This means that skb->data now points to the IP payload and, since
the packet travels upward the network stack, the beginning of the IP payload
is actually the beginning of the transport header (most likely the TCP
header).


__skb_pull is defined in:
/usr/src/linux-2.6.*/include/linux/skbuff.h

    static inline unsigned char *__skb_pull(struct sk_buff *skb,
                                            unsigned int len)
    {
            skb->len -= len;
            BUG_ON(skb->len < skb->data_len);
            return skb->data += len;
    }

After __skb_pull we've got another skb-relevant function being called:

    skb_reset_transport_header(skb);

which updates the skb->transport_header member of the skb struct to point to
the place skb->data has just been moved to - guess what - the transport
header.

skb_reset_transport_header(skb) is defined in:
/usr/src/linux-2.6.*/include/linux/skbuff.h

    static inline void skb_reset_transport_header(struct sk_buff *skb)
    {
            skb->transport_header = skb->data;
    }

Our sk_buff struct will now look like this:

    sk_buff {                              buffer
                                        -------------
                                       |             |
                                       |-------------|
        .network_header -------------> |  IP header  |
                                       |-------------|
        .data -----------------------> | IP payload  |
        .transport_header -----------> |             |
                                       |             |
                                       |-------------|
    }

So in what part do we actually see that the *whole* IP datagram (along with
the IP header) is sent to the user process having created a raw socket?
Enter raw_v4_input(). As the name itself suggests, this is the main raw
socket handler and it is called by this line in ip_local_deliver_finish():

    if (raw_sk && !raw_v4_input(skb, ip_hdr(skb), hash))

Note the second argument ip_hdr(skb) to raw_v4_input. ip_hdr is defined in
/usr/src/linux-2.6.*/include/linux/ip.h:

    static inline struct iphdr *ip_hdr(const struct sk_buff *skb)
    {
            return (struct iphdr *)skb_network_header(skb);
    }

and skb_network_header is defined in
/usr/src/linux-2.6.*/include/linux/skbuff.h:

    static inline unsigned char *skb_network_header(const struct sk_buff *skb)
    {
            return skb->network_header;
    }


What the above basically does is pass the IP header of the current packet to
the raw_v4_input function as a separate iphdr struct. This is done by
casting the skbuff data part where the skb->network_header member points
(the beginning of the IP header) into an iphdr struct. It's time for the
raw_v4_input analysis:

/usr/src/linux-2.6.*/net/ipv4/raw.c:

    /* IP input processing comes here for RAW socket delivery.
     * Caller owns SKB, so we must make clones.
     *
     * RFC 1122: SHOULD pass TOS value up to the transport layer.
     * -> It does. And not only TOS, but all IP header.
     */
    int raw_v4_input(struct sk_buff *skb, struct iphdr *iph, int hash)
    {
            struct sock *sk;
            struct hlist_head *head;
            int delivered = 0;

            read_lock(&raw_v4_lock);
            head = &raw_v4_htable[hash];
            if (hlist_empty(head))
                    goto out;
            sk = __raw_v4_lookup(__sk_head(head), iph->protocol,
                                 iph->saddr, iph->daddr,
                                 skb->dev->ifindex);

            while (sk) {
                    delivered = 1;
                    if (iph->protocol != IPPROTO_ICMP || !icmp_filter(sk, skb)) {
                            struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC);

                            /* Not releasing hash table! */
                            if (clone)
                                    raw_rcv(sk, clone);
                    }
                    sk = __raw_v4_lookup(sk_next(sk), iph->protocol,
                                         iph->saddr, iph->daddr,
                                         skb->dev->ifindex);
            }
    out:
            read_unlock(&raw_v4_lock);
            return delivered;
    }

Note that the __raw_v4_lookup() function, which essentially checks if an
application has created a raw socket with the specific protocol number
mentioned in the IP header, takes into account the member iph->protocol as
expected. One of the most important parts of the code above is the
skb_clone() function which is called as many times as the number of the
applications that need to receive a copy of the current datagram. This is
implemented inside the while loop. The next part is the raw_rcv() call:


    int raw_rcv(struct sock *sk, struct sk_buff *skb)
    {
            if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) {
                    kfree_skb(skb);
                    return NET_RX_DROP;
            }
            nf_reset(skb);

            skb_push(skb, skb->data - skb_network_header(skb));

            raw_rcv_skb(sk, skb);
            return 0;
    }

This function accomplishes two things (it's imperative that the below is
understood, as this is the part that resolves the possible objections to our
statement about the passing of the whole IP datagram to the raw socket):

1) Calls skb_push() to revert the skbuff into its old form. skb_push() is
   the exact opposite of skb_pull():

    /**
     *      skb_push - add data to the start of a buffer
     *      @skb: buffer to use
     *      @len: amount of data to add
     *
     *      This function extends the used data area of the buffer at the
     *      buffer start. If this would exceed the total buffer headroom the
     *      kernel will panic. A pointer to the first byte of the extra data
     *      is returned.
     */
    static inline unsigned char *skb_push(struct sk_buff *skb,
                                          unsigned int len)
    {
            skb->data -= len;
            skb->len  += len;
            if (unlikely(skb->data<skb->head))
                    skb_under_panic(skb, len, current_text_addr());
            return skb->data;
    }

   So now remember its arguments and what the skbuff looked like in the
   diagram above.

    skb_push(skb, skb->data - skb_network_header(skb));

   The skbuff will now look like this:

    sk_buff {                              buffer
                                        -------------
                                       |             |
                                       |-------------|
        .data -----------------------> |  IP header  |
        .network_header -------------> |             |
                                       |-------------|
        .transport_header -----------> | IP payload  |
                                       |             |
                                       |             |
                                       |-------------|
    }

   You might now be wondering why do the skb_pull() part from above if we
   are going to turn back to this form anyway by calling skb_push() to do
   the exact opposite operation? The answer to this question is that raw
   sockets are an intermediate place where datagrams go. In the usual case
   where no raw sockets are open for some IPPROTO_ protocol, skb_pull() is
   only natural as the packet moves up the network stack, since the packet
   is slowly stripped of the lower layer headers. However, this overhead
   could perhaps be avoided by calling skb_pull() after the raw sockets
   mechanism is served, thus avoiding the extra skb_push() inside
   raw_rcv(). (This needs to be tested though)

2) Calls raw_rcv_skb() which is responsible for calling the generic function
   that passes the sk_buff to the application socket (and would also
   increment a not yet implemented raw drops counter in case this fails):

    static int raw_rcv_skb(struct sock * sk, struct sk_buff * skb)
    {
            /* Charge it to the socket. */

            if (sock_queue_rcv_skb(sk, skb) < 0) {
                    /* FIXME: increment a raw drops counter here */
                    kfree_skb(skb);
                    return NET_RX_DROP;
            }

            return NET_RX_SUCCESS;
    }
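The pull-then-push round trip is easy to replay outside the kernel. The
sketch below uses a cut-down stand-in for struct sk_buff; struct toy_skb and
the toy_* functions are invented for this illustration and keep only the
pointer arithmetic of __skb_pull(), skb_reset_transport_header() and
skb_push(), without any of the kernel's sanity checks.

```c
#include <stddef.h>

/* Toy stand-in for struct sk_buff: only the members used in this section. */
struct toy_skb {
    unsigned char *head;              /* start of the allocated buffer  */
    unsigned char *data;              /* current start of packet data   */
    unsigned int   len;               /* bytes from data to end of data */
    unsigned char *network_header;    /* start of the IP header         */
    unsigned char *transport_header;  /* start of the IP payload        */
};

/* Same arithmetic as __skb_pull(): drop len bytes from the front. */
unsigned char *toy_pull(struct toy_skb *skb, unsigned int len)
{
    skb->len -= len;
    return skb->data += len;
}

/* Same arithmetic as skb_push(): give len bytes back at the front. */
unsigned char *toy_push(struct toy_skb *skb, unsigned int len)
{
    skb->data -= len;
    skb->len += len;
    return skb->data;
}

/*
 * Replays the sequence described above: the pull done on the way into
 * ip_local_deliver_finish(), followed by the push done in raw_rcv().
 * Returns 1 when data ends up on the IP header again, 0 otherwise.
 */
int toy_deliver_raw(struct toy_skb *skb, unsigned int ip_hdrlen)
{
    toy_pull(skb, ip_hdrlen);                 /* data -> IP payload       */
    skb->transport_header = skb->data;        /* remember transport start */
    toy_push(skb, (unsigned int)(skb->data - skb->network_header));
    return skb->data == skb->network_header;  /* data -> IP header again  */
}
```

For a 28-byte datagram with a 20-byte header, the call leaves data at the
start of the buffer with len back at 28, while transport_header still marks
offset 20.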

From all the above we conclude that not only is the IP payload passed to the
raw socket but the IP header as well.
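For the receiving application this means that whatever recv(2) returns on a
raw socket starts with the IP header, and the payload has to be located by
reading the header length field. The following is a hedged sketch: struct
raw_view and split_raw_datagram() are hypothetical names, and the function
deliberately relies only on the version/IHL byte and the Protocol byte
rather than on the total length field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: where the pieces of a received datagram are. */
struct raw_view {
    uint8_t        protocol;     /* value of the IP Protocol field      */
    size_t         hdr_len;      /* IP header length in bytes (IHL * 4) */
    const uint8_t *payload;      /* first byte after the IP header      */
    size_t         payload_len;  /* bytes following the IP header       */
};

/*
 * Split a buffer, as read from an AF_INET raw socket, into IP header and
 * payload. len is the byte count returned by recv(). Returns 0 on success
 * and -1 if the buffer cannot be a complete IPv4 header.
 */
int split_raw_datagram(const uint8_t *buf, size_t len, struct raw_view *out)
{
    size_t ihl;

    if (buf == NULL || out == NULL || len < 20)
        return -1;                      /* shorter than a minimal header */
    if ((buf[0] >> 4) != 4)
        return -1;                      /* Version field is not IPv4     */

    ihl = (size_t)(buf[0] & 0x0f) * 4;  /* IHL counts 32-bit words       */
    if (ihl < 20 || ihl > len)
        return -1;

    out->protocol    = buf[9];          /* Protocol field, byte offset 9 */
    out->hdr_len     = ihl;
    out->payload     = buf + ihl;
    out->payload_len = len - ihl;
    return 0;
}
```

A buffer that begins with 0x45 and carries 1 at offset 9 is reported as an
ICMP datagram with a 20-byte header.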

b. FreeBSD
**********

FreeBSD takes another approach. It *never* passes TCP or UDP packets to raw
sockets. Such packets need to be read directly at the datalink layer by
using libraries like libpcap or the bpf API. It also *never* passes any
fragmented datagram. Each datagram has to be completely reassembled before
it is passed to a raw socket.

FreeBSD passes to a raw socket:

 a) every IP datagram with a protocol field that is not registered in the
    kernel
 b) all IGMP packets after the kernel finishes processing them
 c) all ICMP packets (except echo request, timestamp request and address
    mask request) after the kernel finishes processing them

We are going to study the first case which is of more interest. But before
that, some introductory material on how the FreeBSD kernel registers the
protocol handlers has to be studied. In part 0x2. Creation we mentioned a
few things about the protosw structs, which are associations of a SOCK_XXX
type with an IPPROTO_XXX protocol and reside in the global table inetsw[].
Apart from that, they have members which are function pointers to the
corresponding protocol handlers (the so called hooks). pr_input() is
responsible for handling incoming data from a lower level protocol while
pr_output() handles outgoing data from a higher level protocol.

/usr/src/sys/sys/protosw.h:

    struct protosw {
            short   pr_type;                /* socket type used for */
            struct  domain *pr_domain;      /* domain protocol a member of */
            short   pr_protocol;            /* protocol number */
            short   pr_flags;               /* see below */
    /* protocol-protocol hooks */
            pr_input_t *pr_input;           /* input to protocol (from below) */
            pr_output_t *pr_output;         /* output to protocol (from above) */
            pr_ctlinput_t *pr_ctlinput;     /* control input (from below) */
            pr_ctloutput_t *pr_ctloutput;   /* control output (from above) */
    /* user-protocol hook */
            pr_usrreq_t     *pr_ousrreq;
    /* utility hooks */
            pr_init_t *pr_init;
            pr_fasttimo_t *pr_fasttimo;     /* fast timeout (200ms) */
            pr_slowtimo_t *pr_slowtimo;     /* slow timeout (500ms) */
            pr_drain_t *pr_drain;           /* flush any excess space possible */

            struct  pr_usrreqs *pr_usrreqs; /* supersedes pr_usrreq() */
    };

So the IP layer will usually call the corresponding protocol handler-hook
pr_input() through this table when it receives a packet. To know which
protocol goes to which handler, an initialization phase is mandatory.
First, all protosw structures are initialized in
/usr/src/sys/netinet/in_proto.c
This resembles what we saw earlier on Linux at
/usr/src/linux-2.6.*/net/ipv4/af_inet.c

/usr/src/sys/netinet/in_proto.c:

    struct protosw inetsw[] = {
    {
            .pr_type =              0,
            .pr_domain =            &inetdomain,
            .pr_protocol =          IPPROTO_IP,
            .pr_init =              ip_init,
            .pr_slowtimo =          ip_slowtimo,
            .pr_drain =             ip_drain,
            .pr_usrreqs =           &nousrreqs
    },
    {
            .pr_type =              SOCK_DGRAM,
            .pr_domain =            &inetdomain,
            .pr_protocol =          IPPROTO_UDP,
            .pr_flags =             PR_ATOMIC|PR_ADDR,
            .pr_input =             udp_input,
            .pr_ctlinput =          udp_ctlinput,
            .pr_ctloutput =         ip_ctloutput,
            .pr_init =              udp_init,
            .pr_usrreqs =           &udp_usrreqs
    },
    {
            .pr_type =              SOCK_STREAM,
            .pr_domain =            &inetdomain,
            .pr_protocol =          IPPROTO_TCP,
            .pr_flags =             PR_CONNREQUIRED|PR_IMPLOPCL|PR_WANTRCVD,
            .pr_input =             tcp_input,
            .pr_ctlinput =          tcp_ctlinput,
            .pr_ctloutput =         tcp_ctloutput,
            .pr_init =              tcp_init,
            .pr_slowtimo =          tcp_slowtimo,
            .pr_drain =             tcp_drain,
            .pr_usrreqs =           &tcp_usrreqs
    },
    /* #ifdef SCTP */
    /* ... */
    /* #endif */
    {
            .pr_type =              SOCK_RAW,
            .pr_domain =            &inetdomain,
            .pr_protocol =          IPPROTO_RAW,
            .pr_flags =             PR_ATOMIC|PR_ADDR,
            .pr_input =             rip_input,
            .pr_ctlinput =          rip_ctlinput,
            .pr_ctloutput =         rip_ctloutput,
            .pr_usrreqs =           &rip_usrreqs
    },
    {
            .pr_type =              SOCK_RAW,
            .pr_domain =            &inetdomain,
            .pr_protocol =          IPPROTO_ICMP,
            .pr_flags =             PR_ATOMIC|PR_ADDR|PR_LASTHDR,
            .pr_input =             icmp_input,
            .pr_ctloutput =         rip_ctloutput,
            .pr_usrreqs =           &rip_usrreqs
    },
    {
            .pr_type =              SOCK_RAW,
            .pr_domain =            &inetdomain,
            .pr_protocol =          IPPROTO_IGMP,
            .pr_flags =             PR_ATOMIC|PR_ADDR|PR_LASTHDR,
            .pr_input =             igmp_input,
            .pr_ctloutput =         rip_ctloutput,
            .pr_init =              igmp_init,
            .pr_fasttimo =          igmp_fasttimo,
            .pr_slowtimo =          igmp_slowtimo,
            .pr_usrreqs =           &rip_usrreqs
    },
    {
            .pr_type =              SOCK_RAW,
            .pr_domain =            &inetdomain,
            .pr_protocol =          IPPROTO_RSVP,
            .pr_flags =             PR_ATOMIC|PR_ADDR|PR_LASTHDR,
            .pr_input =             rsvp_input,
            .pr_ctloutput =         rip_ctloutput,
            .pr_usrreqs =           &rip_usrreqs
    },

The question is: how does the IP layer know which protocol handler it
should hand the packet to? This is done by inspecting the Protocol field in
the IP header (we mentioned this above) and then looking up the ip_protox[]
table, which is initialized in function ip_init().

/usr/src/sys/netinet/ip_input.c:

    void
    ip_init(void)
    {
        struct protosw *pr;
        int i;

        TAILQ_INIT(&in_ifaddrhead);
        in_ifaddrhashtbl = hashinit(INADDR_NHASH, M_IFADDR, &in_ifaddrhmask);
        pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
        if (pr == NULL)
            panic("ip_init: PF_INET not found");

        /* Initialize the entire ip_protox[] array to IPPROTO_RAW. */
        for (i = 0; i < IPPROTO_MAX; i++)
            ip_protox[i] = pr - inetsw;
        /*
         * Cycle through IP protocols and put them into the appropriate place
         * in ip_protox[].
         */
        for (pr = inetdomain.dom_protosw;
            pr < inetdomain.dom_protoswNPROTOSW; pr++)
            if (pr->pr_domain->dom_family == PF_INET &&
                pr->pr_protocol && pr->pr_protocol != IPPROTO_RAW) {
                /* Be careful to only index valid IP protocols. */
                if (pr->pr_protocol < IPPROTO_MAX)
                    ip_protox[pr->pr_protocol] = pr - inetsw;
            }
        /* ... */
    }

The ip_protox[] array is nothing else than a simple associative array which
is used by IP to demultiplex incoming datagrams based on the *real*
transport level protocol number. What do we mean by real? The actual number
residing in the IP header in the Protocol field - the standardised number
by all RFCs. The number defined in /usr/src/linux-2.6.*/include/linux/in.h
for Linux and /usr/src/sys/netinet/in.h for FreeBSD. IPPROTO_XXX.
Most protocols have such a global common number. The inetsw[] table doesn't
have the protocol protosw structs arranged by this number. For example TCP,
which is defined to be protocol number 6, is in inetsw[2]. So ip_protox[6]
has to point at inetsw[2]. Fairly simple. The same logic goes for all
protocols that IP may need to hand datagrams to. Another diagram:


      ip_protox[]                             inetsw[]

       ---------                         ---------------
    0  |   3   | --> inetsw[3]       [0] |     IP      |
       ---------                         ---------------
    1  |   4   | --> inetsw[4]       [1] |     UDP     |
       ---------                         ---------------
    2  |   5   | --> inetsw[5]       [2] |     TCP     |
       ---------                         ---------------
    3  |   3   | --> inetsw[3]       [3] |   IP(raw)   | (default)
       ---------                         ---------------
    4  |  ...  |                     [4] |    ICMP     |
       ---------                         ---------------
    5  |  ...  |                     [5] |    IGMP     |
       ---------                         ---------------
    6  |  ...  |                         |     ...     |
       ---------                         ---------------
    7  |  ...  |                         |     ...     |
       ---------                         ---------------
   ... |  ...  |                         |   IP(raw)   | (wildcard)
       ---------                         ---------------

(Not represented above, to avoid cluttering, are the pointers coming from
the unregistered protocols and going to the IP(raw) default entry.)

Where do SOCK_RAW and IPPROTO_RAW come into play in this part? To put it
simply, if their protosw doesn't exist, the kernel panics! Why is that?
Because every single entry in ip_protox[] has to be initialized to point to
the (first) IP(raw) protosw struct. And with a good reason: if the kernel
doesn't know how to handle a protocol (there is no other registered handler
for it) it passes it to the default entry handler that is
SOCK_RAW - IPPROTO_RAW (default_RAW). Such is the importance of its
existence in all contemporary kernels. Of course, all ip_protox[] entries
that *do* have a handler for the protocol number specified are defined to
point to the actual inetsw[] entry, thus overwriting the pointer to the
default entry. But all those that don't have one base their (not official)
existence on IP(raw).

Knowing this information, ip_input(), after doing all checks that have to
be done (for example, to see if the checksum of the datagram is correct),
can finally call the higher-level protocol handler.

Note: The higher level protocol doesn't necessarily have to be transport
level, since IP can hand datagrams to ICMP and IGMP, which are *not*
transport protocols but are considered to be part of the network layer.
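The table-filling logic of ip_init() can be tried out in user space. What
follows is only a minimal sketch, not kernel code: struct handler, handlers[],
protox[], protox_init() and demux() are hypothetical stand-ins for
struct protosw, inetsw[], ip_protox[], ip_init() and the lookup done in
ip_input(), and the table is assumed to hold just the six entries of the
excerpt above (SCTP and RSVP left out).

```c
#include <stddef.h>

#define PROTO_MAX   256   /* one slot per possible IP protocol number */
#define PROTO_RAW   255   /* value of IPPROTO_RAW */
#define DEFAULT_RAW 3     /* index of the default raw entry below */

/* Hypothetical, stripped-down stand-in for struct protosw. */
struct handler {
    int         protocol; /* IP protocol number, 0 = none */
    const char *name;
};

/* Same order as the inetsw[] excerpt (assumption: SCTP not compiled in). */
static const struct handler handlers[] = {
    {   0, "ip"   },      /* [0] */
    {  17, "udp"  },      /* [1] */
    {   6, "tcp"  },      /* [2] */
    { 255, "raw"  },      /* [3] default entry */
    {   1, "icmp" },      /* [4] */
    {   2, "igmp" },      /* [5] */
};

static unsigned char protox[PROTO_MAX];

/* Mirrors the two loops of ip_init(). */
void protox_init(void)
{
    size_t i;

    /* First every protocol number points at the default raw entry. */
    for (i = 0; i < PROTO_MAX; i++)
        protox[i] = DEFAULT_RAW;

    /* Then registered protocols overwrite their own slot. */
    for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
        int p = handlers[i].protocol;
        if (p != 0 && p != PROTO_RAW && p < PROTO_MAX)
            protox[p] = (unsigned char)i;
    }
}

/* Name of the handler that would receive a datagram with this ip_p. */
const char *demux(unsigned char ip_p)
{
    return handlers[protox[ip_p]].name;
}
```

After protox_init(), demux(6) names the TCP entry while any number without a
registered handler, say 200, falls through to the raw entry, which is exactly
the property the kernel relies on.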

/usr/src/sys/netinet/ip_input.c:

    /*
     * Ip input routine.  Checksum and byte swap header.  If fragmented
     * try to reassemble.  Process options.  Pass to next level.
     */
    void
    ip_input(struct mbuf *m)
    {
        struct ip *ip = NULL;
        struct in_ifaddr *ia = NULL;
        struct ifaddr *ifa;
        int    checkif, hlen = 0;
        u_short sum;
        int dchg = 0;               /* dest changed after fw */
        struct in_addr odst;        /* original dst address */

        /* ... */

        /*
         * Switch out to protocol's input routine.
         */
        ipstat.ips_delivered++;
        (*inetsw[ip_protox[ip->ip_p]].pr_input)(m, hlen);
        return;
    bad:
        m_freem(m);
    }

Suppose now that ip->ip_p (the Protocol field in the ip header) is something
that *does not* have a registered handler in the kernel. What will the
following line from ip_input() call?

    (*inetsw[ip_protox[ip->ip_p]].pr_input)(m, hlen);

You guessed right. pr_input of inetsw[default_RAW] (SOCK_RAW - IPPROTO_RAW).
Time to analyse rip_input().

/usr/src/sys/netinet/raw_ip.c:

    /*
     * Setup generic address and protocol structures
     * for raw_input routine, then pass them along with
     * mbuf chain.
     */
    void
    rip_input(struct mbuf *m, int off)
    {
        struct ip *ip = mtod(m, struct ip *);
        int proto = ip->ip_p;
        struct inpcb *inp, *last;

        INP_INFO_RLOCK(&ripcbinfo);
        ripsrc.sin_addr = ip->ip_src;
        last = NULL;
        LIST_FOREACH(inp, &ripcb, inp_list) {
            INP_LOCK(inp);
            if (inp->inp_ip_p && inp->inp_ip_p != proto) {
    docontinue:
                INP_UNLOCK(inp);
                continue;
            }
    #ifdef INET6
            if ((inp->inp_vflag & INP_IPV4) == 0)
                goto docontinue;
    #endif
            if (inp->inp_laddr.s_addr &&
                inp->inp_laddr.s_addr != ip->ip_dst.s_addr)
                goto docontinue;
            if (inp->inp_faddr.s_addr &&
                inp->inp_faddr.s_addr != ip->ip_src.s_addr)
                goto docontinue;
            if (jailed(inp->inp_socket->so_cred))
                if (htonl(prison_getip(inp->inp_socket->so_cred))
                    != ip->ip_dst.s_addr)
                    goto docontinue;
            if (last) {
                struct mbuf *n;

                n = m_copy(m, 0, (int)M_COPYALL);
                if (n != NULL)
                    (void) raw_append(last, ip, n);
                /* XXX count dropped packet */
                INP_UNLOCK(last);
            }
            last = inp;
        }
        if (last != NULL) {
            if (raw_append(last, ip, m) != 0)
                ipstat.ips_delivered--;
            INP_UNLOCK(last);
        } else {
            m_freem(m);
            ipstat.ips_noproto++;
            ipstat.ips_delivered--;
        }
        INP_INFO_RUNLOCK(&ripcbinfo);
    }

Upon reading the above source code, we can discern the following points:

1) If a raw socket (type SOCK_RAW) is created with a protocol value of 0
   then:

   a) if the process has called bind() and the address specified matches
      the destination address in the datagram, then *ALL* such datagrams
      which the kernel doesn't know how to handle are passed to its raw
      socket.

   b) if the process has called connect() and the address specified matches
      the foreign (source) address in the datagram, then *ALL* such
      datagrams which the kernel doesn't know how to handle are passed to
      its raw socket.

   c) if the process has called neither bind() nor connect(), then *ALL*
      datagrams which the kernel doesn't know how to handle are passed to
      its raw socket.

   d) any combination of a or b that fails its check means that the
      datagram is not passed to the application.

2) If a raw socket (type SOCK_RAW) is created with a nonzero protocol value
   then we have the following cases:

   a) if the protocol value specified has a registered protocol handler in
      the kernel, the application does *not* receive any datagram
      (exceptions are ICMP and IGMP, but they follow another mechanism not
      discussed here).

   b) if the protocol value specified doesn't have a registered protocol
      handler in the kernel, the application receives *only* the datagrams
      which have this protocol value specified in the IP header.
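The rules in 1) and 2) boil down to three comparisons per raw socket. The
sketch below restates them in user space under a few stated assumptions:
struct raw_sock, struct dgram and raw_sock_matches() are hypothetical names
invented for illustration, addresses are plain 32-bit values, and the jail and
INET6 checks of the real rip_input() are left out. It also only models what
happens once rip_input() has been reached, i.e. for protocols without a
registered handler.

```c
#include <stdint.h>

/* Hypothetical, flattened view of the inpcb fields rip_input() inspects. */
struct raw_sock {
    uint8_t  proto;   /* third argument of socket(), 0 = any protocol */
    uint32_t laddr;   /* local address set by bind(), 0 = not bound */
    uint32_t faddr;   /* foreign address set by connect(), 0 = not connected */
};

/* Hypothetical view of the IP header fields that matter here. */
struct dgram {
    uint8_t  proto;   /* Protocol field */
    uint32_t src;     /* source address */
    uint32_t dst;     /* destination address */
};

/* Returns 1 if the socket gets the datagram, 0 if it is skipped. */
int raw_sock_matches(const struct raw_sock *s, const struct dgram *d)
{
    /* A nonzero socket protocol must equal the datagram's protocol. */
    if (s->proto && s->proto != d->proto)
        return 0;
    /* A bound socket only sees datagrams sent to its local address. */
    if (s->laddr && s->laddr != d->dst)
        return 0;
    /* A connected socket only sees datagrams coming from its peer. */
    if (s->faddr && s->faddr != d->src)
        return 0;
    return 1;
}
```

A socket created with protocol 0 and never bound or connected has all three
fields at zero, so every check is skipped and it matches any datagram that
reaches this code.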

The above can be deduced from the following code snippet:

        if (inp->inp_ip_p && inp->inp_ip_p != proto) {
    docontinue:
            INP_UNLOCK(inp);
            continue;
        }
        if (inp->inp_laddr.s_addr &&
            inp->inp_laddr.s_addr != ip->ip_dst.s_addr)
            goto docontinue;
        if (inp->inp_faddr.s_addr &&
            inp->inp_faddr.s_addr != ip->ip_src.s_addr)
            goto docontinue;

Using reverse logic, the first check skips any raw socket that has a
non-zero protocol value which differs from the protocol of the datagram
being examined. This means that any raw socket that has specified a
protocol value of 0 will pass this check and will not be ignored. The same
goes for any raw socket that has a non-zero protocol value but has the same
protocol value as the one in the datagram checked. The same applies to the
two checks that follow the first, which test the local and foreign
addresses (in case bind() or/and connect() was called). Of course the case
where the application does not receive any datagram although it has created
a raw socket (but with a protocol which is registered) comes from the fact
that rip_input() will not be called in that case at all.

A small parenthesis here to comment on something from the *BSD man pages.
In particular man 4 ip mentions that:

    Raw IP sockets
    "If proto is 0, the default protocol IPPROTO_RAW is used for outgoing
    packets, and only incoming packets destined for that protocol are
    received."

This is partially confusing, since it sounds like creating a raw socket
with a protocol value of 0 would result in only packets destined for
IPPROTO_RAW (as in 255) being received and nothing else. This is not true
however. What the man page author wants to say is that the socket will
receive every IP datagram whose protocol value *results in the IP
demultiplexing done by ip_protox[] pointing to an IPPROTO_RAW entry*
(remember that all unknown protocols point to that default entry). By this
definition, it is correct.
The funny thing is that specifying a protocol of IPPROTO_RAW when creating
a raw socket will *normally* result in the socket not receiving *any*
datagram based on the logic above. Normally, because datagrams with such a
protocol number do/should not appear on the wire. Of course this doesn't
mean that a process can't forge a packet with such a protocol number and
send it. It's another case of the fundamental difference between shouldn't
and can't. In addition to that, not all packet-forging programs have the
desire to always communicate... (as in two-way communication)

On Linux, the same would happen, since raw_v4_input() is called every time
and will check if a raw socket with protocol number IPPROTO_RAW exists in
user space.

The following two small programs demonstrate the above functionality of
IPPROTO_RAW on Linux. Remember from part 0x3. IP_HDRINCL that the Linux
kernel sets IP_HDRINCL by default when creating a socket with IPPROTO_RAW,
so we will have to create our own header lest we want to send garbage on
the wire. The demonstration uses the loopback device but will have the same
effects on any real ethernet device. Both programs run on FreeBSD/OpenBSD
too, although crafting our own IP header is not necessary (then
netinet/in.h should be used in place of netinet/ip.h).

    /*** IPPROTO_RAW receiver ***/
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <netinet/ip.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int s;
        struct sockaddr_in saddr;
        char packet[50];

        if ((s = socket(AF_INET, SOCK_RAW, IPPROTO_RAW)) < 0) {
            perror("error:");
            exit(EXIT_FAILURE);
        }

        memset(packet, 0, sizeof(packet));

        while(1) {
            /* recvfrom() updates fromlen, so reset it on every pass */
            socklen_t fromlen = sizeof(saddr);

            if (recvfrom(s, packet, sizeof(packet), 0,
                (struct sockaddr *)&saddr, &fromlen) < 0)
                perror("packet receive error:");

            /* skip the IP header and print the payload */
            size_t i = sizeof(struct iphdr);
            while (i < sizeof(packet)) {
                fprintf(stderr, "%c", packet[i]);
                i++;
            }
            printf("\n");
        }
        exit(EXIT_SUCCESS);
    }

    /*** IPPROTO_RAW sender ***/
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <netinet/ip.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>     /* sleep() */

    #define DEST "127.0.0.1"

    int main(void)
    {
        int s;
        struct sockaddr_in daddr;
        char packet[50];
        /* point the iphdr to the beginning of the packet */
        struct iphdr *ip = (struct iphdr *)packet;

        if ((s = socket(AF_INET, SOCK_RAW, IPPROTO_RAW)) < 0) {
            perror("error:");
            exit(EXIT_FAILURE);
        }

        daddr.sin_family = AF_INET;
        daddr.sin_port = 0; /* not needed in SOCK_RAW */
        inet_pton(AF_INET, DEST, &daddr.sin_addr);
        memset(daddr.sin_zero, 0, sizeof(daddr.sin_zero));
        memset(packet, 'A', sizeof(packet));   /* payload will be all As */

        ip->ihl = 5;
        ip->version = 4;
        ip->tos = 0;
        ip->tot_len = htons(40);    /* 16 bit value */
        ip->frag_off = 0;           /* no fragment */
        ip->ttl = 64;               /* default value */
        ip->protocol = IPPROTO_RAW; /* protocol at L4 */
        ip->check = 0;              /* not needed in iphdr */
        ip->saddr = daddr.sin_addr.s_addr;
        ip->daddr = daddr.sin_addr.s_addr;

        while(1) {
            sleep(1);
            if (sendto(s, packet, sizeof(packet), 0,
                (struct sockaddr *)&daddr, (socklen_t)sizeof(daddr)) < 0)
                perror("packet send error:");
        }
        exit(EXIT_SUCCESS);
    }

Running them will result in perfectly valid communication, since no one
stops them from using IPPROTO_RAW(255) as an L4 protocol.

#./send
#tcpdump -i lo -X -vv
19:49:27.048431 IP (tos 0x0, ttl 64, id 16705, offset 0, flags [none],
proto unknown (255), length 50) localhost.localdomain >
localhost.localdomain: ip-proto-255 30
    0x0000:  4500 0032 4141 0000 40ff 3a8a 7f00 0001  E..2AA..@.:.....
    0x0010:  7f00 0001 4141 4141 4141 4141 4141 4141  ....AAAAAAAAAAAA
    0x0020:  4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
    0x0030:  4141                                     AA
19:49:28.048466 IP (tos 0x0, ttl 64, id 16705, offset 0, flags [none],
proto unknown (255), length 50) localhost.localdomain >
localhost.localdomain: ip-proto-255 30
    0x0000:  4500 0032 4141 0000 40ff 3a8a 7f00 0001  E..2AA..@.:.....
    0x0010:  7f00 0001 4141 4141 4141 4141 4141 4141  ....AAAAAAAAAAAA
    0x0020:  4141 4141 4141 4141 4141 4141 4141 4141  AAAAAAAAAAAAAAAA
    0x0030:  4141                                     AA

#./recv
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Further on with the rip_input() analysis:

Just as Linux uses skb_clone() to make a copy of the datagram, FreeBSD uses
m_copy() (mbuf instead of sk_buff struct). Of course, a copy is made only
when deemed necessary - that is, when a matching raw socket (one that is
not ignored for one of the reasons mentioned above) is found during the
for loop search. m_copy() is quite an expensive operation since it includes
actual memory copying, not just moving pointers around. For that reason,
FreeBSD uses a smart way to avoid doing this operation if only one
application needs the datagram. This is accomplished with the use of the
last variable, which points to the last socket found to need the datagram.
For completeness' sake we are going to discuss the logic:


last first points to NULL. If a socket is considered viable to have the
current datagram, which means it passed all the ignore-checks, on say
iteration i, then last will point to it but the if condition will fail
(the first time). On a later iteration i+N, when the next socket that is
also deemed viable to receive the datagram is found, the m_copy() will take
place since the if condition will now be true. Then raw_append() will be
called for the socket that last points to (the previous match), which is
responsible for passing the copy to the application by calling
sbappendaddr_locked() (see raw_append() code). This mechanism will continue
until the entire raw socket list is traversed. The if condition which is
outside the LIST_FOREACH loop is needed to hand the original datagram (no
copy this time) to the last raw socket that needs it. If no socket needs
it, no copy is made and the mbuf is simply freed.

        last = NULL;
        LIST_FOREACH(inp, &ripcb, inp_list) {
            /* which-sockets-to-ignore logic */
            /* ... */
            if (last) {
                struct mbuf *n;

                n = m_copy(m, 0, (int)M_COPYALL);
                if (n != NULL)
                    (void) raw_append(last, ip, n);
                /* XXX count dropped packet */
                INP_UNLOCK(last);
            }
            last = inp;
        }
        if (last != NULL) {
            if (raw_append(last, ip, m) != 0)
                ipstat.ips_delivered--;
            INP_UNLOCK(last);
        } else {
            m_freem(m);
            ipstat.ips_noproto++;
            ipstat.ips_delivered--;
        }
        INP_INFO_RUNLOCK(&ripcbinfo);

Note that Linux always has to make at least one copy even if only one raw
socket needs the datagram. This happens because, as we said before, Linux
uses the raw socket mechanism as a man-in-the-middle, meaning that it may
have to pass the datagram to a normal application along with a
raw-sockets-using one.

FreeBSD passes the whole IP datagram (along with the IP header) to the
application, like Linux.
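The effect of the last variable is easiest to see by counting copies. The
following user-space sketch assumes a hypothetical array matches[] that says
which sockets passed the ignore-checks; deliver_all() is an illustrative name,
not a kernel function, and it only counts what rip_input() would do instead of
touching real mbufs.

```c
#include <stddef.h>

/*
 * matches[i] is non-zero when socket i passed all the ignore-checks.
 * Returns how many sockets received the datagram and stores in *copies
 * how many times the datagram had to be duplicated (m_copy() calls).
 */
int deliver_all(const int *matches, size_t n, int *copies)
{
    int last = -1;      /* index of the most recent match, -1 = none yet */
    int delivered = 0;
    size_t i;

    *copies = 0;
    for (i = 0; i < n; i++) {
        if (!matches[i])
            continue;   /* the "docontinue" path */
        if (last != -1) {
            /* A second match exists: the previous one gets a copy. */
            (*copies)++;
            delivered++;
        }
        last = (int)i;
    }
    /* After the loop the final match gets the original, no copy needed. */
    if (last != -1)
        delivered++;
    return delivered;
}
```

With k matching sockets the datagram is copied k - 1 times, so the common case
of a single listener costs no copy at all.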

0x5. raw output
===============

Having analysed how incoming raw datagrams are processed, we are ready to
move on with the details of the raw output mechanism. What we are going to
inspect here is which values of the IP header are written by default by
each kernel and which ones we have the responsibility to fill in on our
own. This of course only applies in case the IP_HDRINCL option is on
(either set by us or by the kernel by default [Linux - IPPROTO_RAW]).


a. Linux
********

The main raw socket output handler is raw_sendmsg() which is defined in
/usr/src/linux-2.6.*/net/ipv4/raw.c:

    static int raw_sendmsg(struct kiocb *iocb, struct sock *sk,
                           struct msghdr *msg, size_t len)
    {
        struct inet_sock *inet = inet_sk(sk);
        struct ipcm_cookie ipc;
        struct rtable *rt = NULL;
        /* ... */
        if (inet->hdrincl)
            err = raw_send_hdrinc(sk, msg->msg_iov, len,
                                  rt, msg->msg_flags);
        /* ... */
    }

If IP_HDRINCL has been set then raw_send_hdrinc() is called (defined in the
same file):

    static int raw_send_hdrinc(struct sock *sk, void *from, size_t length,
                               struct rtable *rt, unsigned int flags)
    {
        /* ... */
        if (length > rt->u.dst.dev->mtu) {
            ip_local_error(sk, EMSGSIZE, rt->rt_dst, inet->dport,
                           rt->u.dst.dev->mtu);
            return -EMSGSIZE;
        }

        /* We don't modify invalid header */
        iphlen = iph->ihl * 4;
        if (iphlen >= sizeof(*iph) && iphlen <= length) {
            if (!iph->saddr)
                iph->saddr = rt->rt_src;
            iph->check = 0;
            iph->tot_len = htons(length);
            if (!iph->id)
                ip_select_ident(iph, &rt->u.dst, NULL);

            iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
        }
        if (iph->protocol == IPPROTO_ICMP)
            icmp_out_count(((struct icmphdr *)
                skb_transport_header(skb))->type);

        err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
                      dst_output);
        /* ... */
    }

The code here is fairly obvious. If the application hasn't defined the IP
source address then it is filled in. The same goes for the IP
identification number. The two fields that are always filled in are: the IP
checksum (hopefully for us - it saves us the trouble) and the total length,
iph->tot_len, of the datagram (no corruption allowed here). The above are
confirmed by man 7 raw:


+---------------------------------------------------+
|IP Header fields modified on sending by IP_HDRINCL |
+----------------------+----------------------------+
|IP Checksum           |Always filled in.           |
+----------------------+----------------------------+
|Source Address        |Filled in when zero.        |
+----------------------+----------------------------+
|Packet Id             |Filled in when zero.        |
+----------------------+----------------------------+
|Total Length          |Always filled in.           |
+----------------------+----------------------------+

Notice that in the end, the function dst_output() is called and the IP
layer isn't involved at all. This has one negative effect as a result
(although in performance it's better): no IP fragmentation will take place
if needed. This means that a raw packet larger than the MTU of the
interface will be discarded: ip_local_error(), which does general sk_buff
cleaning, is called and an EMSGSIZE error is returned. On the other hand,
normal raw socket fragmentation takes place when we do not include our own
IP header.
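To see what "always filled in" means for the checksum, the value can be
recomputed by hand from the header bytes in the tcpdump capture shown earlier
(4500 0032 4141 0000 40ff 3a8a 7f00 0001 7f00 0001). The function below is a
plain user-space sketch of the Internet checksum algorithm, not the kernel's
ip_fast_csum(); ip_checksum() is a name chosen here for illustration, and it
returns the value as it would be read from the wire in big-endian order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's complement sum of 16-bit big-endian words over len bytes.
 * The checksum field inside hdr must be zero while computing.
 */
uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Add up the header two bytes at a time. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

    /* An odd trailing byte is padded with zero on the right. */
    if (len & 1)
        sum += (uint32_t)hdr[len - 1] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Feeding it the 20 header bytes from the capture, with bytes 10 and 11 zeroed,
gives 0x3a8a, the value the Linux kernel wrote although the sender had set
ip->check to 0.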

b. FreeBSD
**********

Raw output on FreeBSD is handled by rip_output(). A chain of kernel
functions is called before we reach rip_output() however.

First, sosend_generic(), which is defined in
/usr/src/sys/kern/uipc_socket.c, accesses the pr_usrreqs (user request)
struct and issues a call like this:

    *so->so_proto->pr_usrreqs->pru_send(...)

where so is of type struct socket *. so_proto is a struct protosw * pointer
and points to the protocol of the socket. pr_usrreqs is a struct whose
members are pointers to functions which are used by the protocol to service
requests coming from the socket layer (usually issued by system calls on
the application level). For the RAW protocol pr_usrreqs is called
rip_usrreqs and is defined in /usr/src/sys/netinet/raw_ip.c just before the
end of the file:

    struct pr_usrreqs rip_usrreqs = {
        .pru_abort =            rip_abort,
        .pru_attach =           rip_attach,
        .pru_bind =             rip_bind,
        .pru_connect =          rip_connect,
        .pru_control =          in_control,
        .pru_detach =           rip_detach,
        .pru_disconnect =       rip_disconnect,
        .pru_peeraddr =         in_getpeeraddr,
        .pru_send =             rip_send,
        .pru_shutdown =         rip_shutdown,
        .pru_sockaddr =         in_getsockaddr,
        .pru_sosetlabel =       in_pcbsosetlabel,
    };

This means that calling *so->so_proto->pr_usrreqs->pru_send(...) on a raw
socket will call rip_send(), which is defined in raw_ip.c and calls
rip_output(), with which we are going to occupy ourselves now.


    static int
    rip_send(struct socket *so, int flags, struct mbuf *m,
        struct sockaddr *nam, struct mbuf *control, struct thread *td)
    {
        struct inpcb *inp;
        u_long dst;
        /* ... */
        return rip_output(m, so, dst);
    }

rip_output() will either partially construct the IP header before handing
it to the IP layer or, in case we have set the IP_HDRINCL option, mark the
packet as not-to-be-altered by setting the IP_RAWOUTPUT flag. In the end,
it calls ip_output() and rests in peace(rip) for the moment.

    /*
     * Generate IP header and pass packet to ip_output.
     * Tack on options user may have setup with control call.
     */
    int
    rip_output(struct mbuf *m, struct socket *so, u_long dst)
    {
        struct ip *ip;
        int error;
        struct inpcb *inp = sotoinpcb(so);
        int flags = ((so->so_options & SO_DONTROUTE) ? IP_ROUTETOIF : 0) |
            IP_ALLOWBROADCAST;

        /*
         * If the user handed us a complete IP packet, use it.
         * Otherwise, allocate an mbuf for a header and fill it in.
         */
        if ((inp->inp_flags & INP_HDRINCL) == 0) {
            if (m->m_pkthdr.len + sizeof(struct ip) > IP_MAXPACKET) {
                m_freem(m);
                return(EMSGSIZE);
            }
            M_PREPEND(m, sizeof(struct ip), M_DONTWAIT);
            if (m == NULL)
                return(ENOBUFS);

            INP_LOCK(inp);
            ip = mtod(m, struct ip *);
            ip->ip_tos = inp->inp_ip_tos;
            if (inp->inp_flags & INP_DONTFRAG)
                ip->ip_off = IP_DF;
            else
                ip->ip_off = 0;
            /* write XXX protocol value mentioned in
               socket(AF_INET, SOCK_RAW, XXX) */
            ip->ip_p = inp->inp_ip_p;
            ip->ip_len = m->m_pkthdr.len;
            if (jailed(inp->inp_socket->so_cred))
                ip->ip_src.s_addr =
                    htonl(prison_getip(inp->inp_socket->so_cred));
            else
                ip->ip_src = inp->inp_laddr;
            ip->ip_dst.s_addr = dst;
            ip->ip_ttl = inp->inp_ip_ttl;
        } else {
            /* checks */
            /* ... */
            /* don't allow both user specified and setsockopt options,
               and don't allow packet length sizes that will crash */
            if (((ip->ip_hl != (sizeof (*ip) >> 2)) && inp->inp_options)
                || (ip->ip_len > m->m_pkthdr.len)
                || (ip->ip_len < (ip->ip_hl << 2))) {
                INP_UNLOCK(inp);
                m_freem(m);
                return EINVAL;
            }
            if (ip->ip_id == 0)
                ip->ip_id = ip_newid();
            /* XXX prevent ip_output from overwriting header fields */
            flags |= IP_RAWOUTPUT;
            ipstat.ips_rawout++;
        }
        /* ... */
        error = ip_output(m, inp->inp_options, NULL, flags,
            inp->inp_moptions, inp);
        INP_UNLOCK(inp);
        return error;
    }

As far as the case when the kernel prepends its own header is concerned, it
is worth our time to mention that the Protocol field of the IP header is
assigned the value mentioned in socket(AF_INET, SOCK_RAW, XXX) and *not*
IPPROTO_RAW, even if the default_RAW entry is actually its handler.

The case which interests us is the one shown below the else condition (the
case where the user handed over a complete IP packet, header included).
Well, in the old days a packet with a big length could potentially crash
some things since the checks on ip_len (which is the total length of the
datagram) weren't there. Another interesting thing here is that the kernel
will use a random value to set the ip_id (identification number - used in
fragmentation) if it has been left with a 0 value. Next thing is the flag
IP_RAWOUTPUT which we mentioned just earlier. We are going to see how this
is going to influence ip_output(), which is defined in the file of the same
name, /usr/src/sys/netinet/ip_output.c:

    int
    ip_output(struct mbuf *m, struct mbuf *opt, struct route *ro, int flags,
        struct ip_moptions *imo, struct inpcb *inp)
    {
        /* ... */
        if ((flags & (IP_FORWARDING|IP_RAWOUTPUT)) == 0) {
            ip->ip_v = IPVERSION;
            ip->ip_hl = hlen >> 2;
            ip->ip_id = ip_newid();
            ipstat.ips_localout++;
        } else {
            hlen = ip->ip_hl << 2;
        }
        /* ... */
        /*
         * If small enough for interface, or the interface will take
         * care of the fragmentation for us, we can just send directly.
         */
        /* ... */
        ip->ip_sum = 0;
        if (sw_csum & CSUM_DELAY_IP)
            ip->ip_sum = in_cksum(m, hlen);
        /* ... */
        /*
         * If the source address is not specified yet, use the address
         * of the outoing interface.
         */
        if (ip->ip_src.s_addr == INADDR_ANY) {
            /* Interface may have no addresses. */
            if (ia != NULL) {
                ip->ip_src = IA_SIN(ia)->sin_addr;
            }
        }
        /* ... */

As we can easily see here, if the packet is marked with the flag
IP_RAWOUTPUT then (nearly) no field is taken care of by the kernel. *Only*
the IP checksum is always computed for us. In addition to that, if the
application hasn't specified any source ip address, then the address of the
outgoing interface is used. In general, the user has full control over the
IP header. After ip_output() finishes processing the packet, it is handed
out to the lower level function if_output(), whose internals are beyond the
scope of this text.

In contrast with Linux, raw packets on FreeBSD will always (even if we
include our own IP header, that is) be fragmented if needed, giving the
user more flexibility in crafting large packets.
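The three sanity checks that rip_output() applies to a user-supplied header
can be restated as a small predicate. This is a sketch under assumptions:
hdrincl_header_ok() and its parameters are hypothetical names, the lengths are
taken as plain host-order integers, and the constant 5 stands for
sizeof(struct ip) >> 2, i.e. a 20-byte header without options.

```c
#include <stddef.h>

/*
 * Returns 1 if a header passed with IP_HDRINCL would be accepted,
 * 0 if rip_output() would reject the send with EINVAL.
 *
 *   ip_hl        header length field, in 32-bit words
 *   ip_len       total length claimed by the header, in bytes
 *   pkt_len      number of bytes actually handed to the kernel
 *   has_sockopt  non-zero if IP options were also set with setsockopt()
 */
int hdrincl_header_ok(unsigned ip_hl, size_t ip_len, size_t pkt_len,
                      int has_sockopt)
{
    /* Options inside the header and options via setsockopt() clash. */
    if (ip_hl != 5 && has_sockopt)
        return 0;
    /* The header may not claim more data than was supplied. */
    if (ip_len > pkt_len)
        return 0;
    /* The total length must at least cover the header itself. */
    if (ip_len < ((size_t)ip_hl << 2))
        return 0;
    return 1;
}
```

For the 50-byte packet of the earlier sender, hdrincl_header_ok(5, 50, 50, 0)
holds, while a header claiming 60 bytes for the same buffer would be refused.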

0x6. Summary
============

The research results of the current paper can be summarized in the
following points:

a. Linux
********

socket(AF_INET, SOCK_RAW, 0);  -->  EPROTONOSUPPORT
-----------------------------

socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
---------------------------------------
          |                         |
     -----------               ------------
     |  input  |               |  output  |
     -----------               ------------
  only datagrams           IP_HDRINCL : protocol: (user specified)
  with a protocol          (it is set by default)
  field: 255 (IPPROTO_RAW)

socket(AF_INET, SOCK_RAW, XXX);
-------------------------------
          |                         |
     -----------               ------------
     |  input  |               |  output  |
     -----------               ------------
  only datagrams           IP_HDRINCL : protocol: (user specified)
  with a protocol         !IP_HDRINCL : protocol: XXX
  field: XXX
  (whatever that may be)

b. FreeBSD
**********

socket(AF_INET, SOCK_RAW, 0);
-----------------------------
          |                         |
     -----------               ------------
     |  input  |               |  output  |
     -----------               ------------
  all raw datagrams        IP_HDRINCL : protocol: (user specified)
                          !IP_HDRINCL : 0

socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
---------------------------------------
          |                         |
     -----------               ------------
     |  input  |               |  output  |
     -----------               ------------
  only datagrams           IP_HDRINCL : protocol: (user specified)
  with a protocol         !IP_HDRINCL : IPPROTO_RAW
  field: 255 (IPPROTO_RAW)

socket(AF_INET, SOCK_RAW, XXX);
-------------------------------
          |                         |
     -----------               ------------
     |  input  |               |  output  |
     -----------               ------------
  only datagrams           IP_HDRINCL : protocol: (user specified)
  with a protocol         !IP_HDRINCL : protocol: XXX
  field: XXX
  and only *unregistered*
  protocols

0x7. Conclusion
===============

It is beyond doubt that SOCK_RAW is a powerful socket type that deserves the
attention of anyone seriously interested in low level network programming.
What we have discussed in this paper is just a glimpse at the complex
network internals at work in two of the most popular network stacks with
regard to raw sockets. The raw sockets mechanism implementation may sound
complex but can actually provide a good starting point for anyone wishing
to delve into the details of what makes up the foundations of today's
internet. I hope this text will provide a comprehensible guide on its own,
but I strongly advise anyone who has a keen interest in this stuff to get
his hands on all the books mentioned in References. They have proven
invaluable and explain in detail many of the things that someone usually
takes for granted. For the future, I hope to write a more security oriented
text that goes far beyond the basic stuff discussed here and is going to
deal with some of the uncommon end cases that lurk in the corners of the
netinet sources.


0x8. References
===============

1. TCP/IP Illustrated Vol.2 The Implementation
2. Unix Network Programming - The Sockets Networking API
3. Understanding Linux Network Internals
4. kernel sources of Linux.2.6.24 and FreeBSD-7.0
