
Discovering Bitcoin’s Public Topology and Influential Nodes

Andrew Miller∗ James Litton∗ Andrew Pachulski∗ Neal Gupta∗ Dave Levin∗
Neil Spring∗ Bobby Bhattacharjee∗

∗ University of Maryland, College Park. {amiller, litton, ajp, ngupta, dml, nspring, bobby}@cs.umd.edu

Abstract
The Bitcoin network relies on peer-to-peer broadcast to distribute pending transactions and confirmed blocks. The topology over which this broadcast is distributed affects which nodes have advantages and whether some attacks are feasible. As such, it is particularly important to understand not just which nodes participate in the Bitcoin network, but how they are connected.

In this paper, we introduce AddressProbe, a technique that discovers peer-to-peer links in Bitcoin, and apply this to the live topology. To support AddressProbe and other tools, we develop CoinScope, an infrastructure to manage short, but large-scale experiments in Bitcoin. We analyze the measured topology to discover both high-degree nodes and a well connected giant component. Yet, efficient propagation over the Bitcoin backbone does not necessarily result in a transaction being accepted into the block chain. We introduce a “decloaking” method to find influential nodes in the topology that are well connected to a mining pool. Our results find that in contrast to Bitcoin’s idealized vision of spreading mining responsibility to each node, mining pools are prevalent and hidden: roughly 2% of the (influential) nodes represent three-quarters of the mining power.

1 Introduction
Bitcoin communication is built upon peer-to-peer broadcast, which carries transactions, blocks, and other global protocol state. Through broadcast, Bitcoin achieves eventual consistency: information about all transactions and blocks is relayed to all peers, and despite inconsistent ordering and partial incompleteness, all honest peers eventually “agree” on a globally consistent state of committed transactions and blocks.

The underlying peer topology over which protocol messages are exchanged is of critical importance: broadcast over this topology is the only mechanism by which peers learn and inform each other of transactions and blocks. Understanding how information propagates throughout the Bitcoin ecosystem is therefore integral to being able to reason about attacks and manipulation in the Bitcoin network. For instance, initial studies have revealed that “advantages” within the broadcast network can be parlayed directly into gains in coins mined [10]. Furthermore, a broadcast advantage enables an attacker to pull off forms of double-spending attacks against fast-payments processors [14].

In this paper, we describe mechanisms to discover two features of Bitcoin’s topological structure: first, we map the public topology consisting of the edges that comprise the peer-to-peer network. Next, within the discovered topology, we find “influential” nodes that appear to directly interface with a hidden topology that consists of mining pools that are otherwise not connected to the public Bitcoin network. (We are unable to map the hidden intra-pool topology, since these are private networks operating using potentially proprietary protocols.)

To map the public topology, we introduce a probing mechanism called AddressProbe that efficiently discovers peer links. Using AddressProbe, we can efficiently map the connectable Bitcoin network, i.e., AddressProbe will find links between x and y iff x and y are connected, and permit incoming connections. AddressProbe can also find links made by non-connectable nodes (e.g., nodes that are behind a NAT), as long as such a node initiates connections to a probe host. Our techniques also identify a set of artificially high-degree nodes that attempt to connect to many peers, potentially to reduce latency in learning about and propagating new blocks and transactions. We map these nodes to various services, including mining pools and wallet services.

Although AddressProbe is able to map the entire connectable Bitcoin network within minutes, discovering the public topology alone is insufficient to account for the Bitcoin ecosystem’s mining power. The original Bitcoin paper [22] proposed a democratic world-view (“one-cpu-one-vote”) where peers participating in the broadcast would also mine for new coins. As Bitcoin has become popular and financially relevant, coin mining has become the domain of specialized miners operating special-purpose hardware around the world organized into “mining pools.” Miners often do not connect directly to the Bitcoin broadcast network. Instead, pool operators deploy gateway hosts that transfer transactions and blocks between the Bitcoin network and pool members. Pools may choose to remain clandestine about their gateway because they may be targeted by attackers, such as other competing pools [13, 34], and it is not clear
how or where these mining pools connect to the Bitcoin broadcast network.

Furthermore, because of the well-known attacks on Bitcoin if a single principal gains more than 50% of the mining power, large pools may prefer to disguise some of their power. In fact, while a malicious principal with a majority of the hash power can subvert Bitcoin’s most basic security goals (i.e., revise the transaction history arbitrarily), prior work has shown that even a third is sufficient to unbalance the incentive structure and reward scheme [10]. Thus, large enough pools may choose to mine anonymously, hiding their true mining power by paying out to different keys, without disclosing their gateway(s).

A primary contribution of our work is uncovering influential nodes within the public topology that provide connectivity to mining pools. In particular, we find that efficient propagation over the Bitcoin network does not necessarily result in a transaction being accepted into the block chain or a block being extended. Instead, there are a few (approx. 100) hitherto unidentified nodes that act as “front-ends” to mining pools, and it is far more important that these nodes receive a transaction or block more efficiently than others. We introduce “decloaking” techniques to identify these influential nodes, and analyze how these nodes map to different mining pools. Our analysis cannot reveal if an influential node is merely well connected to a pool or is a gateway run by the pool operator; instead, our techniques allow us to map specific nodes to blocks that are claimed by different pools.

In summary, our contributions are as follows:

• We introduce the AddressProbe technique, which can find the connectable Bitcoin topology within minutes, and can discover other peers over time. Using AddressProbe, we show that the deployed Bitcoin topology is not a random graph.

• We introduce decloaking techniques to find influential nodes that skew broadcast fairness. First, a randomized technique efficiently finds candidate influential nodes, and a related technique validates these candidates.

• All of our measurements use a new software infrastructure, CoinScope. This paper presents results of running AddressProbe and decloaking using CoinScope over the live Bitcoin network.

The rest of the paper is structured as follows: In Section 2, we present a background on the pertinent details of Bitcoin, along with related work on mapping the Bitcoin network. Section 3 introduces the design and implementation of our AddressProbe technique, which we apply in Section 4 to analyze the live Bitcoin network. We present our techniques for finding influential Bitcoin nodes in Section 5, and analyze the most influential nodes in Section 6. Finally, we conclude in Section 7.

2 Background and Related Work
One of Bitcoin’s salient features, especially in contrast to digital currencies that have preceded it (e.g., DigiCash [32]), is that it runs on a decentralized peer-to-peer network and has no formally designated administrators. Instead, participation in Bitcoin is open, and the network largely self-organizes. Most of the novel aspects of Bitcoin’s design, and indeed its peculiarities, arise from the challenge of converging on a global and coherent state.

In this section we provide a brief introduction to the basic operation of the Bitcoin network, with a focus on how it achieves globally consistent state (for more general information on Bitcoin, please see [5]). We also discuss the role that Bitcoin’s peer-to-peer broadcast plays in ensuring consistency and fairness. Finally, we close this section by reviewing related work on measuring and analyzing Bitcoin’s broadcast topology.

2.1 The Bitcoin protocol
The Bitcoin protocol is built around the creation and dissemination of a public global log of the state of all bitcoins in the system. Each entry in this log is a transaction, which represents the transfer of virtual currency from one “account” to another. Transactions consist of inputs and outputs: a transaction “spends” a set of transaction inputs and “creates” a set of transaction outputs. In general, each transaction input contains a reference to (i.e., the hash of) an output of a previous transaction; in this sense, the log is append-only, and extending it requires access to the latest log entry. Each transaction output contains a value representing the quantity of bitcoin currency, as well as information describing which user “owns” those coins. Transactions are structured as a directed graph, which facilitates maintaining invariants about the transaction log (for example, that users cannot spend more money than they have).

The protocol guarantees each transaction output can be spent by at most one subsequent transaction. Conflicting transactions are a pair of distinct transactions that spend a common transaction input. A valid transaction is one that has valid signatures for each of the transaction inputs; additionally, the sum of the values of the outputs must not exceed the sum of inputs (the difference is treated as a transaction processing fee, as we’ll describe shortly). The central tenet behind Bitcoin is that if everyone has access to the log of transactions, then anyone can verify who has the right to spend what bitcoins.

2.2 The role of broadcast in Bitcoin
The Bitcoin protocol’s primary goal is to converge on an eventually consistent sequential log of transactions. For the system to succeed, users must be able to submit
transactions for timely inclusion in the directed transaction graph, and the network should converge quickly to a single valid (prefix of a) graph. If an attacker could prevent transactions from entering this graph or delay agreement, it would deny service to users. Alternatively, if an attacker could revert an agreed-upon graph, he could effectively steal funds by double-spending.

Bitcoin achieves this consistency through the use of a peer-to-peer broadcast topology. Unfortunately, little is standardized about how exactly this broadcast operates beyond the format of the various messages; we describe here what the reference implementation (“Satoshi client”) does.¹ Every Bitcoin peer maintains a database, called the addrMan, of peers it has heard about. A peer first learns about a set of peers by contacting bootstrap DNS nodes; peers subsequently exchange addrMan information with one another to learn about new peers. Every Bitcoin peer initiates a connection to up to eight others, and maintains a maximum of 125 total connections (incoming and outgoing), rejecting any connection request that would push it beyond this capacity. Its total connections constitute that peer’s neighbors.

¹ Other Bitcoin clients and variants appear to adopt the same protocol details.

Bitcoin’s peer-to-peer broadcast is based on flooding neighbors’ links in a gossip-like manner. At a high level, when a peer learns of a new transaction or block, it informs its neighbors by sending an INV message containing the item’s hash to each of its neighbors. In response, if a given neighbor does not yet have that item, it requests it with a GETDATA message. The original peer responds with a TX or BLOCK message containing the relevant data. Finally, because this new neighbor has learned about a new transaction or block, the entire process continues recursively until all reachable peers receive it.

This ad hoc broadcast protocol forms the entire basis of Bitcoin’s global, eventually consistent log, and is therefore of utmost importance to its correct and fair operation. If a data item does not spread throughout the network quickly then the system risks reaching an inconsistent state. Moreover, if a peer were somehow able to have their messages spread more quickly than others’, it could help that peer gain disproportionate profits from deviating from the protocol [10]. Bitcoin’s network formation procedure is intended to induce a random graph topology that should propagate information efficiently; however, as the topology has not been studied, it is unknown whether this ideal is actually attained. Thus a quantitative, thorough measurement and analysis of the Bitcoin peer-to-peer network is of critical importance to understanding and evaluating its properties.

2.3 Miners and mining pools
A novel and unusual aspect of Bitcoin’s design is the “mining” mechanism by which the network robustly reaches global agreement on the set of committed transactions. Some nodes on the network, called “miners,” opt in to attempt to solve proof-of-work puzzles [2]. As these proofs of work are based on finding inputs to hashes that yield digests with many leading zero bits, they are solvable only through brute force. The difficulty of the puzzle is set so that on average, some miner on the network finds a solution every 10 minutes. Each puzzle identifier contains the hash of a previous puzzle solution as well as a new batch of transactions to append to the log, together called a “block”; the proof-of-work solutions therefore form a “blockchain”. Miners always work to extend the longest such blockchain they know of.² Upon producing a block, the miner propagates it to the rest of the network using the same broadcast mechanism as that for transactions. Honest miners who receive this block accept it and begin working to append to it.

² More accurately, the blockchain with the greatest cumulative puzzle difficulty. These can differ when the difficulty is adjusted.

Although it is possible during ordinary operation for two different puzzle solutions extending equal-length chains to be found at approximately the same time, this happens infrequently since the average time between blocks is relatively long compared to network latency [9, 19]; any such event is quickly resolved when one “fork” or the other gets extended and pulls ahead.

Mining is expensive; participation is encouraged through use of an incentive mechanism. Upon producing a valid block, the winning miner is rewarded with bitcoins in two forms: first, any difference between the input value and output value of a transaction included in the block is rewarded as a “fee”; second, a “block creation bonus” of newly minted coins. A miner claims these rewards by including a single transaction with no inputs, called the “coinbase” transaction.

These rewards also provide incentives for other, interesting behaviors. In particular, miners have incentive to garner as much computational power as possible; the more CPUs they have, the greater the chance they will be able to extend the block before anyone else. Additionally, miners have incentive to collude by pooling their resources together and splitting the profits rather than competing for them. This has led to so-called mining pools, who recruit other miners to collude.

On the one hand, mining pools require some degree of exposure (being able to advertise high win rates can be a useful tool for recruiting other members). However, they also have incentive not to appear to have grown too large: if any one entity approached a majority of the mining power, they could possibly prevent the rest of the network from globally converging on a growing transaction
log [11, 19, 22]. It is therefore of critical importance to develop the ability to investigate the true extent of mining pools’ collusion.

2.4 Related Work
Previous empirical studies have examined facets of Bitcoin’s surrounding ecosystem, such as online currency exchanges (and their tendency to collapse) [21], illicit marketplaces [8, 18], and botnets that supplement their income through Bitcoin mining [12, 26]. A major focus has been on evaluating user privacy. While users can interact with Bitcoin using only pseudonyms, it has been demonstrated that Bitcoin transactions can be linked across pseudonyms to effectively “deanonymize” users [6, 15, 18, 24, 28, 30].

Relatively less work has examined the structure of the Bitcoin communication network itself. Decker et al. [9] measured the rate of information propagation throughout the network, and proposed modifications to accelerate it. Babaioff et al. [1] pointed out that peers have incentives not to participate in broadcast, but this could be remedied by paying fees to relaying peers. The effectiveness of the broadcast mechanism is essential to Bitcoin’s operation for two main reasons. First, it determines the potential security of “fast payments,” by which transactions are accepted based on their apparent propagation through the network, even before being ratified through inclusion in proof-of-work blocks [4, 14]. Second, an advantage in the broadcast network can be leveraged by a non-compliant miner to gain disproportionate rewards from mining [3, 10]. Prior work had focused on strategies non-compliant miners could use to gain reward if they had a broadcast advantage. Our work studies the underlying network topology, and shows that indeed, substantial broadcast advantage can be gained on the deployed network.

Our AddressProbe technique is directly related to previous work on network structure detection and analysis [20, 27, 29]. Biryukov et al. [6] disclose a technique that also uses Bitcoin address propagation messages to detect peer links. Their technique, however, is somewhat invasive since it involves polluting each node’s table of potential peers with fake entries. In contrast, AddressProbe only gathers and analyzes information that nodes readily provide, and is better suited to network-wide mapping.

While Bitcoin mining was initially performed using commodity computer equipment (i.e., “one-cpu-one-vote” [22]), the mining landscape has evolved according to two main trends: first, mining pools [31] have allowed users to pool their resources and share the rewards; and second, customized Bitcoin-specific mining hardware has been developed that vastly outperforms general-purpose computers [33]. There have been several game-theoretic analyses of Bitcoin mining coalitions [10, 16] but, to our knowledge, no systematic empirical analysis that identifies mining entities. However, speculation about miner activities abounds [23], and we believe our work (Section 5) provides the first systematic mechanism to locate nodes correlated with mining pools in the wild.

3 Mapping the Broadcast Topology
In this section we describe an efficient method for discovering peer links in the Bitcoin network. We validate our approach, AddressProbe, using ground-truth data, and present an analysis of the broadcast topology.

3.1 Using timestamps to infer links
Recall that new Bitcoin nodes find initial network peers by querying a set of hard-coded DNS servers. The DNS servers provide joining nodes with their initial peer list to try to connect to. Each node maintains a local database called the addrMan, which it tries to populate with the addresses of other peers. Nodes exchange address information using two protocol messages: GETADDR and ADDR. GETADDR messages are requests and ADDR messages are replies that contain address information.

Upon initiating a new connection a Bitcoin node (say x) requests address information by issuing a GETADDR request (say to a peer y). Node y will reply with up to 1000 entries from its addrMan database, chosen uniformly at random.³ For each chosen entry, the reply ADDR message includes the (IP address, port) pair, a timestamp, and a list of services offered. The vast majority of entries in addrMan do not correspond to active connections but to addresses that the node has learned from ADDR messages of others. Bitcoin includes a somewhat unintuitive way of updating the timestamp corresponding to an address, which we exploit in formulating the AddressProbe technique. (Appendix A gives a more detailed walkthrough of the actual code.)

³ This is a simplification that describes the general behavior. For full details see Algorithm 1 in the appendix.

• For outgoing connections, i.e., ones that it initiates, a node updates the timestamp (corresponding to the peer IP address and port in addrMan) each time it receives a message from the peer. Therefore, timestamps for outgoing connections are updated frequently, on the order of a few minutes maximum.

• For incoming connections, the timestamp is set to when the connection was created, but it is not updated as the peers exchange messages. Thus, if an incoming connection is long-lived, the ADDR message does not provide information to distinguish it as live.

• For all other (address, port) pairs that a node learns of (from ADDR messages sent by others), the node
“ages” the address by adding a two hour penalty before adding the address to its addrMan.

Finally, we note that a node can send unsolicited ADDR messages in two cases: first, upon receiving an incoming connection from n, a node x sends an ADDR message, to a randomly chosen peer, containing only x with the timestamp set to the current local time. Nodes keep state about which addresses their neighbors know of (because they received/sent information about these nodes from/to their neighbor). When nodes receive an ADDR message with fewer than 10 entries, either as the result of a GETADDR response or a new connection, they choose two peers at random and relay the same ADDR information (without updating any timestamps), as long as the node believes the neighbors don’t already have this address. Nodes purge all information about what addresses their neighbors know every twenty-four hours. Thus when a node with a hitherto unknown address joins the network, the relay messages containing the node’s IP address eventually flood the entire network. Whenever an existing node x makes a new connection, its address propagates in the network; how far depends on how many nodes already knew about x. Finally, a node will update addrMan upon receiving an ADDR message with a newer timestamp for an address. The two-hour aging penalty is applied to the new timestamp learned.

We illustrate how timestamps update with a simplified example: consider a scenario where node x initiates a connection to node y at time t. Suppose that node y relays the information about the connection to neighbor n, and the randomized protocol further relays information about the connection to nodes r_0, r_1, .... Finally, assume node l learns about node x from node r_0 at a later time (not during the initial relay but as a reply to a GETADDR).

• As long as this connection is active, node x will report a recent timestamp for node y.

• Node y will not update its timestamp t for x, regardless of how long the connection stays open, unless it hears about x with a more recent timestamp than t − 20m from a third node.

• Similarly, node y’s neighbor n and the relay nodes r_i will initially also report the same timestamp t for node x. If this single relay manages to flood the network, then all nodes except those with outgoing connections to x will report timestamp t for x. They may update their timestamp for x later if they hear of a more recent timestamp or initiate a connection to x.

• Suppose node l received timestamp t′ for node x. Node l will report t′ − 2h, since it will age the timestamp by two hours. Any node that learns of x through l will further age the timestamp by two hours, and so on.

Therefore, by issuing GETADDR messages to all nodes in the network to whom we can connect, and analyzing the timestamps, we can obtain a map of connections, and potentially even when connections are made. By analyzing discrete two hour differences in timestamps, we can infer how nodes learn about each other. We summarize the inferences based on timestamps obtained by issuing GETADDR messages in Table 1.

                                          B’s ts of A
A’s ts of B             | ts ≥ 2hr         | Unique & ts < 2hr | Not unique & ts < 2hr
------------------------+------------------+-------------------+----------------------
ts ≥ 2hr                | ∄ edge           | ∃ edge, B→A       | Unclear, A ↛ B
Unique and ts < 2hr     | ∃ edge, A→B      | ∃ edge, A↔B       | ∃ edge, A→B
Not unique and ts < 2hr | Unclear, B ↛ A   | ∃ edge, B→A       | Unclear

Table 1: Connection Inference rules for AddressProbe: Timestamps for nodes A and B as reported by repeated GETADDR requests. Here, A → B denotes that we can infer that A initiated the connection to B.

Unfortunately, the description above simplifies true behavior in a few ways, which can lead to both false positive and false negative inferences.

False Positives. The addrMan database is not updated when connections break or peers depart. Hence, a recently (less than two hours) broken outgoing connection leads to an inferred edge that no longer exists.

Simply receiving a “recent” timestamp is not an indication of an active outgoing connection, since relay nodes (r_i in our example) also respond with the connection genesis timestamp (t in our example). Thus, for new connections, all relay nodes (potentially the entire network) will respond with a recent and identical timestamp, which can lead to a false positive diagnosis. This is why Table 1 requires that a timestamp be both recent and unique in inferring outgoing connections.

False Negatives. The addrMan structure is finite, and nodes may evict addresses, including those of actively connected peers. Addresses included in replies to GETADDR messages are chosen randomly, and it is possible that multiple GETADDR messages may “miss” an address, possibly of an outgoing connection. Both these scenarios will cause the inference to miss existing edges.

3.2 AddressProbe Implementation
Our approach to measuring the Bitcoin topology relies on short bursts of message activity to create a snapshot of the network. Yet, to achieve both a swift measurement and a wide one requires addressing a few technical challenges.

First, connections take time to establish, which means that an experimental platform must be long-running. To
connect to some hosts requires patience, as they may be temporarily “full” of connections and refuse more. Incoming connections, such as from hosts behind NATs and firewalls, accumulate slowly, as such hosts must first learn about our node through ADDR advertisements and then choose to connect to us. Once established, connections must be kept active to be maintained.

To provide a long-running platform, we isolate typical, base Bitcoin protocol functionality that accepts, creates, and maintains connections. This forms the CoinScope “connector”.

Second, the measurement techniques we apply are diverse and rely on wide distribution of messages. For example, requests for addresses may be sent to all connected hosts or inventory messages (Section 5) sent to a select set. Fortunately, techniques do not always need to process responses on-line.

We design the interface for CoinScope clients so that they issue commands to the connector using a library that connects via a control channel, and handle responses in a separate path. Typical commands include requests to connect to an address, list connected peers, or to broadcast GETADDR messages. Multiple CoinScope clients can connect to the control channel simultaneously to support concurrent (non-conflicting) experiments. CoinScope is efficient: it can saturate a 1 Gbps network connection with Bitcoin protocol messages without significant CPU overhead. In our experiments, CoinScope is throttled such that messages to the entire network, such as GETADDR requests, are transmitted over one minute.

Third, developing measurement techniques can require substantial reinterpretation of data as a model of protocol behavior is refined. For example, our approach to interpreting ADDR messages based on their size has evolved. This encourages persistent storage of responses at the most primitive level (connection events as they occur and messages as they are received) so that these results can be reinterpreted.

To provide both persistent storage and the feedback required by some CoinScope clients, we apply a logserver that marshals tagged messages from the connector to subscribers. A “verbatim” subscriber collects all events and writes them to disk for archival. A CoinScope client may subscribe to only relevant log messages, for example, to only those associated with ADDR messages to determine when to stop requesting from a given host. We have found that the tagged logserver architecture simplifies the combined task of archiving data while supporting on-line experiments.

In sum, the CoinScope architecture permits a wide view of the network by maintaining long-lived connections; supports concurrent, short-lived experiments by allowing clients to issue requests to the connector via a control channel; and transparently stores measurement results persistently by marshaling all responses through a logging system.

GT Node | Num. true pos. | Num. false pos. | Num. false neg.
--------+----------------+-----------------+----------------
A       | 15.12 ± 1.84   | 0.08 ± 0.03     | 5.02 ± 0.69
B       |  8.29 ± 1.10   | 0.41 ± 0.17     | 2.13 ± 0.36
C       |  8.28 ± 1.13   | 0.29 ± 0.14     | 2.22 ± 0.37
D       |  7.63 ± 2.12   | 0.02 ± 0.04     | 2.92 ± 0.95
E       |  6.52 ± 0.81   | 0.20 ± 0.13     | 1.64 ± 0.27

Figure 1: Ground truth validation of AddressProbe, using runs spanning October 20–November 7 with five ground-truth nodes. Values denote averages with 95% confidence intervals.

[Figure 2 plot: x-axis, number of consecutive FNs; y-axis, fraction of streaks]
Figure 2: False negatives typically do not persist across more than one or two AddressProbe experiments, thus they are due to under-scraping peers’ addrMan.

[Figure 3 plot: x-axis, number of ADDRs with 2 or more reports; y-axis, fraction of nodes]
Figure 3: The vast majority of ADDR messages containing two or more addresses arrive within at most 24 GETADDR queries, thus we rarely under-scrape addrMan.

3.3 Validation using Ground Truth
To validate AddressProbe’s accuracy, we operated five “ground-truth nodes” throughout our experiments (from October 20, 2014 to November 7, 2014). Each of our ground-truth nodes was a mainline Satoshi client. Every two minutes, we collected a list of all active connections each peer has (as reflected in the PeerInfo data structure). Unfortunately, not all ground-truth nodes remained up the entire 18 days; in the results that follow, we average only over the experiments when a given node was available.

For the purposes of comparing ground-truth edges to edges inferred from AddressProbe, we distinguish between what we call stable and transient edges. We define an edge to be “stable” with respect to a given experiment if it appears in all PeerInfo snapshots within four hours before and four hours after the experiment. If AddressProbe fails to detect a stable edge, then we treat that as a false negative: it is very likely that the edge exists during the experiment if our ground-truth says it
was consistently present before and after the experiment.
Similarly, if an edge appears in at least one such snapshot
but not all, then we call it a “transient” edge. Failing to
detect a transient edge is less dire than for a stable edge:
it might not have existed during the experiment. Thus, if
AddressProbe detects a transient edge, we count it as a
true positive, whereas if it fails to detect a transient edge,
we do not count it as a false negative. Finally, we clarify
that we only consider edges between our ground-truth
nodes and other nodes to whom we could connect—in
particular, we do not include NAT’ed nodes.
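The stable/transient rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the in-memory snapshot representation, the function names, and the treatment of a detected edge that appears in no nearby snapshot as a false positive are all assumptions made for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=4)  # four hours before and after the experiment


def classify_edge(edge, snapshots, experiment_time, window=WINDOW):
    """Label one edge as 'stable', 'transient', or 'absent'.

    snapshots is a list of (timestamp, set_of_edges) pairs, one per
    PeerInfo snapshot (a hypothetical in-memory representation).
    """
    nearby = [edges for ts, edges in snapshots
              if abs(ts - experiment_time) <= window]
    hits = sum(1 for edges in nearby if edge in edges)
    if nearby and hits == len(nearby):
        return "stable"      # present in every snapshot in the window
    if hits > 0:
        return "transient"   # present in some, but not all, snapshots
    return "absent"


def score_experiment(detected, snapshots, experiment_time):
    """Count true positives, false negatives, and false positives."""
    seen = set()
    for _, edges in snapshots:
        seen |= edges
    counts = {"true_positive": 0, "false_negative": 0, "false_positive": 0}
    for edge in seen | set(detected):
        kind = classify_edge(edge, snapshots, experiment_time)
        if edge in detected:
            # Assumption: a detected edge never seen nearby is a false positive.
            key = "false_positive" if kind == "absent" else "true_positive"
            counts[key] += 1
        elif kind == "stable":
            counts["false_negative"] += 1
        # An undetected transient edge is deliberately not counted.
    return counts


t0 = datetime(2014, 10, 28, 12, 0)
snapshots = [
    (t0 - timedelta(hours=1), {("gt", "a"), ("gt", "b")}),
    (t0, {("gt", "a")}),
    (t0 + timedelta(hours=1), {("gt", "a"), ("gt", "c")}),
]
detected = {("gt", "b"), ("gt", "d")}
result = score_experiment(detected, snapshots, t0)
```

In this toy data, edge `("gt", "a")` is stable and missed, `("gt", "b")` is transient and detected, `("gt", "c")` is transient and missed (so it is ignored), and `("gt", "d")` is detected but never observed.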
We present our ground-truth results in Figure 1. Our
false positive rates are extremely low across all ground-
truth nodes: that is, AddressProbe is very unlikely to ever
assert that an edge exists when it does not. A false pos-
itive can occur when a non-unique time-stamp less than
two hours old appears to us to be unique, for instance because the time-stamp did not
propagate far into the network. Our data indicate that such cases are rare.
Next, we turn to the false negative rates in Figure 1, that is, cases when AddressProbe
fails to find an edge. Although higher than its near-zero false positive rates,
AddressProbe’s false negative rates are still considerably less than its true positive
rate. Broadly speaking, there are two possible causes for a false negative in
AddressProbe:
1. AddressProbe under-scraped a peer’s addrMan and simply did not send enough GETADDR
messages to learn about all of the peer’s edges, or
2. the peer evicted the edge’s corresponding entry from its addrMan. This can occur in the
mainline Satoshi client if its addrMan has too many entries.

Ideally, the more common cause would be under-scraping: if eviction were common, that
would be problematic for AddressProbe, as it would violate our assumption that if an edge
exists, then so does a time-stamp less than two hours old in at least one of the peers’
addrMan. Because peers’ responses to GETADDR choose randomly from addrMan, we expect that
if we were to under-scrape, then it would be very unlikely to obtain a false negative on
the same edge for many consecutive experiments. To evaluate this, we plot in Figure 2 how
often AddressProbe obtains the same false negative for multiple consecutive experiments.
This shows that 70% of the false negatives are resolved in the subsequent experiment;
nearly 90% are resolved within two experiments. This provides strong evidence that
under-scraping is the most likely cause for AddressProbe’s false negatives.

Figure 2 also shows evidence of eviction. There was a single edge, for instance, which
AddressProbe failed to detect over 19 consecutive experiments. Because it is
astronomically unlikely for an edge not to be randomly chosen from a peer’s addrMan after
so many trials, we believe this is an example of eviction. Fortunately, while the tail is
long, it is also shallow, and thus we conclude that the root cause of false negatives is
under-scraping.

We could of course improve AddressProbe’s false negative rate by simply scraping more.
However, Figure 3 shows that we quickly reach a point of diminishing returns. In this
figure, we show how many ADDR messages we received that contain two or more addresses (as
these are the ADDR messages in response to our GETADDR queries). The x-axis represents how
many GETADDR queries we sent until a peer stopped sending us ADDR responses with two or
more addresses. The concentration at x = 0 corresponds to the set of nodes who had yet to
populate their addrMan data structures. Most nodes’ addrMan were exhausted after we sent
16 GETADDR requests, with a sharp decline after the 20th GETADDR. Guided by this, we
instituted a cut-off at 24 GETADDR messages in order to balance between completeness of
results and bandwidth preservation.

To summarize, AddressProbe is effective at accurately detecting edges, and its threats to
validity (predominantly eviction) are extremely uncommon. Our main source of errors is our
decision to accept some false negatives in exchange for lower bandwidth consumption, but
future applications of AddressProbe need not make this trade-off.

Figure 7: A snapshot of the (reachable) Bitcoin network discovered by AddressProbe on
Nov. 5. The highest degree nodes (with degrees ranging 90–708) are colored. (Legend:
Bitcoin Affiliate Network Mining Pool; Bifubao web wallet service; EC2/Linode;
Unclassified.)

4 The Bitcoin Topology
We begin our analysis of the Bitcoin topology by investigating the topological features of
the peer-to-peer net-

Figure 4: Degree distributions from AddressProbe runs on 10/20–11/7. (Plot: cumulative
distribution P(degree of node) < d versus degree d on a log scale, with Min, Mean, and Max
curves.)
Figure 5: Nodes probed and largest community. (Plots: #Nodes probed and size of largest
community, by date, 10/20 to 11/07.)
Figure 6: Community connectedness. (Plots: #Communities and connectedness of the Measured
and Random graphs, by date, 10/20 to 11/07.)

work using AddressProbe measurements over a period of 18 days. Most of our measurements
span four consecutive days (10/28–10/31), allowing us to see if there are topological
changes within a relatively short window of time. We also performed AddressProbe
measurements one week before and one week after this period of time, though less
frequently, so as to determine if these were representative or sensitive to longer-term
diurnal patterns. In sum, we collected 133 snapshots of the network using AddressProbe.

We present a representative snapshot of the Bitcoin topology in Figure 7. Each node is
sized proportionally to its degree. Two features immediately stand out: First, a handful
of nodes have far greater degree than others in the system; in this section, we identify
these high-degree nodes. Second, upon visual inspection, this graph appears to be random;
we demonstrate in this section, however, that it exhibits properties that distinguish it
from a random graph.

Node degree. Figure 4 shows the degree distributions averaged across all our network
snapshots. We also plot the single minimal and single maximal distributions (as determined
by the runs’ average degree). This demonstrates that the mean is representative, and that
the shape is upheld across all 2.5 weeks of our experimentation.

On average, the majority of nodes to which we could connect have degree in the range of
8–12. This is consistent with the mainline Satoshi client implementation: unmodified peers
seek to maintain eight outgoing connections, and permit incoming connections, as well.
Because this is constrained only to the edges for which we could connect to both nodes,
this result does not include any edges from NAT’ed nodes, and thus nodes are likely to
have greater degrees than is plotted here. However, recall that peers cannot create
outgoing connections to NAT’ed nodes; if AddressProbe is accurate, then it should be able
to detect all outgoing edges. The fact that the degree distribution is so heavily
concentrated near eight indicates that AddressProbe is indeed accurate at detecting these
outgoing edges.

Another feature of the mainline Satoshi client is that it permits a maximum of 125 active
connections, but we consistently see nodes that far exceed this maximum, sometimes by a
factor of nearly 80. This is not a singular phenomenon; we see these high degree nodes
across all runs of AddressProbe. That is, extremely high-degree nodes are persistent over
time.

Benign measurement studies seeking to understand the Bitcoin topology could appear to be
high-degree nodes (footnote 4). In an effort to understand the root cause behind these
high degrees, we tried to manually classify all nodes with degree at least 90 in the
network snapshot of Figure 7. Of the nodes we could classify, over half of them were
members of the Bitcoin Affiliate Network mining pool. Also, we identify one node from a
Bitcoin wallet service. These results indicate that the long tail of high degree nodes is
indeed not an anomaly of AddressProbe, but rather an accurate reflection of coordinated
efforts to measure the network. We also identify many high-degree nodes running within
cloud providers such as EC2, but have thus far been unable to classify them further.

Footnote 4: Acknowledging this, AddressProbe does not reply to GETADDR messages like other
clients, so we expect that our experiments have not affected others.

Graph randomness. Visual inspection of Figure 7 seems to indicate that the Bitcoin network
is random. To evaluate this more formally, we apply the so-called Louvain community
detection algorithm [7] to each snapshot graph of the Bitcoin network AddressProbe
returns. This algorithm returns a set of communities, with the property that nodes within
a given community exhibit greater connectivity than two nodes in different communities.
Figure 5 shows that the largest community in the network typically constitutes roughly 25%
of the overall network.

For a given community C, let E_intra(C) denote the set of intra-community edges (edges
connecting two members within C) and let E_all(C) denote all edges involving a member of C
(inter- and intra-community edges). For each graph returned by AddressProbe, we compute
what we call the graph’s community connectedness: the weighted average of communities’
fraction of intra-community edges. Concretely, if a graph has N nodes

and M communities (C_1, ..., C_M), then its community connectedness is:

\[ \sum_{i=1}^{M} \frac{|C_i|}{N} \cdot \frac{|E_{intra}(C_i)|}{|E_{all}(C_i)|} \tag{1} \]

This definition of connectedness serves as a useful metric for determining how well the
graph supports fast mixing and dispersion of data. Note that community connectedness is
maximized with a value of one for graphs whose communities’ edges are strictly inside the
community, that is, the graph would be fragmented. Broadly speaking, low values of
connectedness indicate a healthier Bitcoin network.

Figure 6 shows the community connectedness (bottom) and overall number of communities
(top) across all 133 snapshots of the Bitcoin network we obtained with AddressProbe. Note
that the community connectedness remains within a relatively small bound over these 18
days, tightly centered around 0.24. This low variance indicates that there are few major
changes to the inherent structure of Bitcoin’s network over time.

We use connectedness to answer the question of whether the Bitcoin network is truly a
random graph. To lend perspective to the raw values of connectedness, we also generated
random graphs with the same vertex and edge counts as each of the 133 networks and
computed their connectedness. We do not model the high-degree nodes in our generation of
these random graphs; because the high-degree nodes constitute a small fraction of the
overall edges, we believe this not to affect the results. Figure 6 presents the average
connectedness among ten random graphs, as well as the average plus two standard
deviations. We observe that truly random graphs of the same size exhibit connectedness
that is statistically significantly smaller than that of the Bitcoin network. In more than
98% of the runs, the measured graph’s connectedness was more than two standard deviations
greater than that of the random graph. This indicates that the Bitcoin network is not
purely random. We hypothesize that this deterministic structure is due to the process by
which nodes join the Bitcoin network: recall that they connect to a small set of DNS
nodes, and slowly percolate to more distal parts of the graph as they learn of new peers.
Quantitatively evaluating the source of determinism is an area of future work.

Discussion. AddressProbe provides an accurate, repeatable snapshot of the entire Bitcoin
network within tens of minutes. With this unprecedented view into the Bitcoin topology, we
make two important observations:
1. There are peers who connect to ∼125× more than the typical peer; these almost
exclusively correspond to mining pools and other measurement nodes.
2. The Bitcoin graph is, for the most part, consistently well connected, but it exhibits
properties that distinguish it from a truly random graph.

In small doses, these topological abnormalities are not detrimental to Bitcoin’s
operation. However, it is important to note that the Bitcoin protocol does nothing to keep
them from happening. Malicious or misguided behavior could drive the network to bad
states, such as worse connectedness or, worse yet, a disconnected network. Regularly
running AddressProbe to collect network-wide topology information can help the community
more quickly detect and react to attacks and mistakes. We continue to run AddressProbe and
will make our data publicly available for the Bitcoin and research communities.

5 Discovering Influential Nodes
AddressProbe can efficiently map the reachable Bitcoin topology, and our analysis has
shown both the structure of the network, and how malicious nodes can affect broadcast.
However, the vast majority of compute power that sustains Bitcoin and extends the block
chain is not visible, but instead is concealed within mining pools. This is apparent from
observing that these pools mine the vast majority of blocks and reap new coins. In this
section, we describe the structure of these pools, and then introduce techniques that
discover how pools interact and interface with the visible broadcast network.

5.1 Structure of Mining Pools
Commercial mining pools have a centralized administrative structure. The mining pool
operator uses a command and control (C&C) system to coordinate the pool’s computation.

A prototypical mining pool (Figure 8) consists of a pool server, one or more gateways, and
the pool members. We describe each in turn next.

Figure 8: A typical mining pool architecture. The gateway and pool servers are
administered by the pool operator. A pool server generally has a publicly known address,
while the gateway does not. A gateway may accept inbound connections. (Diagram: pool peers
in the Bitcoin network connect to the gateway; the gateway and the pool server, e.g.,
stratum+tcp://uk1.ghash.io:3333, communicate over the bitcoin rpc protocol; the pool
server and the members, e.g., BFGminer, cgminer, communicate over the stratum protocol.)
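Returning briefly to the community connectedness metric of Equation (1) in Section 4, the computation can be sketched as follows. This is a minimal sketch under stated assumptions: the community assignment is taken as given (in the paper it comes from Louvain community detection), the function name and input representation are hypothetical, and a community with no incident edges is assumed to contribute zero.

```python
from collections import defaultdict


def community_connectedness(edges, community_of):
    """Weighted average of each community's fraction of intra-community edges.

    edges: iterable of (u, v) pairs (undirected, no duplicates).
    community_of: dict mapping every node to its community label.
    """
    n = len(community_of)
    size = defaultdict(int)
    for label in community_of.values():
        size[label] += 1

    intra = defaultdict(int)  # |E_intra(C)|: both endpoints inside C
    total = defaultdict(int)  # |E_all(C)|: at least one endpoint inside C
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            intra[cu] += 1
            total[cu] += 1
        else:
            total[cu] += 1
            total[cv] += 1

    # Assumption: communities with no incident edges contribute zero.
    return sum(size[c] / n * intra[c] / total[c] for c in size if total[c] > 0)


# Two triangles, first fully separate, then joined by one bridge edge.
community_of = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
fragmented = community_connectedness(triangles, community_of)
bridged = community_connectedness(triangles + [(2, 3)], community_of)
```

The fragmented graph reaches the maximum value of one, and adding a single inter-community edge lowers the metric, matching the observation that lower connectedness indicates a better-mixed network.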

Pool Server. The pool server assigns units of work to pool members. Specifically, the
mining pool periodically assigns each member a Merkle root of a set of transactions, and a
range of nonce-space to search through. Mining participants explore this search space
looking for nonces that result in valid “shares”, which are partial proofs-of-work that
are a superset of the ones that qualify as actual Bitcoin blocks. Mining pools support one
of several protocols for communicating with members’ mining-rigs, the most popular of
which is Stratum [25]. Generally, the IP address of a pool’s Stratum server is publicly
known, and can be found in documentation for joining the pool.

Many pools operate multiple pool servers in different geographic regions to improve fault
tolerance and latency. There are several open source implementations of pool servers,
although mining pools may also use custom implementations (footnote 5).

Footnote 5: See https://en.bitcoin.it/wiki/Poolservers for a comparison of publicly
available pool server software.

Gateways. A mining pool typically connects its pool server to a local or trusted instance
of a Bitcoin node (the gateway), with which the pool server communicates using the Bitcoin
RPC protocol. Gateways interface pool resources with the global Bitcoin network, and must
provide low latency broadcast for the following reasons:
• When a pool member finds a share that is a winning Bitcoin solution, the pool operator
needs to claim the embedded coins and fees by broadcasting the new block to the rest of
the network. This should be done as quickly as possible, since any delay increases the
risk that a competing pool will find a competing block (we ignore for now block-withholding
attacks, in which a pool may deliberately delay block propagation if it harms competing
pools more than itself [10]).
• A pool must also learn about blocks and transactions broadcast by other nodes in the
network quickly to minimize wasted work done by mining on an old block. The pool also has
an incentive to learn about transactions so it can include them in its blocks and collect
the transaction fees.

Pool Members. Mining pool participants typically run specialized mining-rig control
software, such as BFGMiner, cgminer, and poclbm. One reason for the variety of such
software is the need to control a variety of mining equipment, with tasks such as
overclocking, monitoring temperature, and detecting errors. Mining pool participants need
not run an ordinary Bitcoin node; the task of receiving, validating, and relaying
transactions is delegated to the mining pool operator.

Tracking mining pool power. On average, a mining pool wins a number of blocks in
proportion to the total amount of hash power the pool members contribute. Mining pools
often claim credit for the blocks they produce by posting them on the pool’s website.
(Presumably publicly claiming blocks helps pools recruit new members.) Pools often use a
longstanding public key, and mine blocks that pay the block reward to this key. It is
possible for a pool to mine a block, forego the block reward, and reward another, since
only the public key is necessary to pay the block reward to a key. A pool may also claim
its blocks by including a short message in the coinbase transaction (effectively, a string
that miners can use to convey arbitrary information); such claims can be more readily
spoofed since spoofing them does not require the miner to forego the block reward.

Occasionally, large enough pools may have an incentive to conceal their hashpower. Since
there are devastating attacks on Bitcoin (e.g., history revision attacks) that become
feasible when a single entity wields more power than the rest of the network, when a pool
grows very large in size it attracts the concern and suspicion of the Bitcoin community
[17].

Several aggregator websites, including blockchain.info, maintain low-latency connections
to thousands of nodes and record the IP address of the first connected node to relay each
block. This method is most effective when the aggregator has a direct connection to the
gateways that initiate the broadcast of each block; in any case, this method is sensitive
to transmission latency.

Several other websites such as http://organofcorti.blogspot.com and http://mempool.info/
use the methods described along with other anecdotal evidence specifically to monitor the
activity of large mining entities.

5.2 Influential Nodes
In the rest of this section, we describe a new technique for finding a small set of
“influential” nodes in the Bitcoin broadcast network. We show that this small set of nodes
(≈100 or so) seemingly accounts for over three-quarters of the hash power. We hypothesize
that these nodes are either gateways to mining pools, or are otherwise connected with
low-latency to gateways.

Interestingly, the set of influential nodes has entirely benign topological and protocol
features: these nodes don’t have exceptionally high degree, don’t form an exclusive
community, don’t have a unique version string, and aren’t even particularly long lived in
the network, making it difficult to find and track them. However, these nodes provide an
exceptional network advantage for broadcast: as long as a transaction (block) reaches
these nodes, it is far more likely to be included in a block (extended in the block
chain).

Prior work [10] has shown that a selfish mining pool can gain advantage if it initially
withholds blocks it finds, but then releases them as soon as a competing block is

found. The advantage accrues in proportion to the fraction of the rest of the miners that
extends the block released by the selfish pool. Assuming honest pools broadcast using the
standard protocol, and the selfish pool selectively creates low latency connections to the
influential set, the latter can gain a huge broadcast advantage. In the limit, the attacker
can win every broadcast race, and therefore profits from selfish mining regardless of its
fraction of hashpower. As the rest of the section will demonstrate, such an attack is not
hypothetical, but indeed feasible on the current Bitcoin network.

5.3 Decloaking
We introduce a randomized protocol for identifying influential nodes. Our protocol has two
phases: Candidate Selection (CS), which finds potential influential nodes, followed by
Influence Validation (IV), which demonstrates the candidates do indeed represent
disproportionate mining power. In practice, the CS and IV phases should be run
concurrently, with IV validating the output of CS.

The CS and IV algorithms both use two low-level primitives, which we introduce first.

Coloring Nodes. Each Bitcoin node maintains a set of transactions it has received in a
data structure called memPool. Typical mining pool server implementations (e.g.,
CoiniumServ) use the memPool of the connected gateway node to determine which transactions
to include in their blocks. Two Bitcoin transactions “conflict” if they both spend a
common transaction input (i.e., represent a double spend). Bitcoin requires that among a
pair of conflicting transactions, at most one may be included in a block. Nodes must
guarantee that none of the transactions in their memPool are conflicting. The reference
client always prefers the first transaction it receives, i.e., a node discards any
transaction that conflicts with one already resident in its memPool.

Ideally, we could color each node with a different conflicting transaction, and upon
repeating this step, we would find the influential nodes, because the transactions sent to
them would “win”, or be included in a block, more often. This could potentially be
accomplished by creating a unique conflicting transaction for each reachable node in the
network, and delivering them to each node simultaneously. (Otherwise, once a node receives
its transaction, following the protocol, it would forward to others, and the mapping of
transaction to node would no longer remain one-to-one.) Unfortunately, due to varying
network latencies and connections between nodes via other unreachable (NAT’ed) nodes, it
is practically impossible to deliver a distinct transaction to each node simultaneously.
We describe a feasible technique for coloring next.

InvBlocking. Bitcoin uses a flooding protocol to propagate transactions (and blocks)
throughout the network. To reduce bandwidth, and indeed to curb uncontrolled flooding, a
node does not simply broadcast any new transaction (or block) to all neighbors, but
instead employs a three-round protocol. After a node accepts a new transaction into its
memPool, it floods an INV message containing the transaction hash only to each neighbor.
Neighbors may choose to pull the transaction using a GETDATA message with the
corresponding hash.

As a broadcast makes its way through the network, a node may receive the same INV from
more than one neighbor prior to receiving the transaction itself. Upon receiving
subsequent INV messages containing the same hash, the node adds each subsequent message to
a queue, and waits two minutes for any outstanding GETDATA message to time out before
sending another GETDATA to a neighbor who had also sent an INV. It is therefore possible
to block a node from hearing about a transaction for two minutes by sending an INV message
and then ignoring the resultant GETDATA (footnote 6).

Footnote 6: Technically, it is possible to delay a node longer by sending multiple INV
messages. This bug was found concurrently to our work by Bitcoin developers (see
https://github.com/bitcoin/bitcoin/pull/4547) but a patch has not yet been deployed.

The InvBlock procedure (Algorithm 2) sends a set of INV messages (corresponding to a set
of conflicting transactions when coloring nodes), before sending out the transaction
itself. This provides a two minute window over which selected transactions can be sent to
specific nodes, without having to win a network latency race. InvBlock could be used to
distinctly color each node, but it can also be used to color disjoint sets of nodes, where
each node in a set receives the same transaction, but each set receives a different
transaction. This latter method allows more precise control of bandwidth overhead, since
the number of INV messages sent to each node is not dependent on the size of the network
but on the number of sets. We use this generalized InvBlock in the CS and IV algorithms as
described next.

5.3.1 Candidate Selection (CS)
The CS algorithm (Algorithm 3) is parameterized with an integer M corresponding to the
number of node sets in generalized InvBlock. The set of all known nodes is partitioned
into M (roughly) equal sets, with each node having the same probability of being mapped to
any set. These sets are colored using generalized InvBlock, i.e., each node in the
randomly generated set receives the same transaction, and each set receives a conflicting
transaction.

One of these transactions eventually gets included in a block. We identify the set to
which this transaction was sent, and increment a “win” score for each node in the set by
one.
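The scoring loop of Candidate Selection can be sketched as below. This is an illustrative sketch rather than the paper's Algorithm 3: the function names are hypothetical, and `pick_winner` is a stand-in for the real-world step of coloring the sets with generalized InvBlock and observing which conflicting transaction is included in a block. The toy demonstration assumes a single node whose set always wins.

```python
import random
from collections import Counter


def partition_nodes(nodes, m, rng):
    """Split nodes into m roughly equal sets, uniformly at random."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    return [shuffled[i::m] for i in range(m)]


def run_candidate_selection(nodes, m, trials, pick_winner, rng):
    """Accumulate per-node 'win' scores over repeated random partitions.

    pick_winner(sets) returns the index of the set whose transaction was
    included in a block (hypothetical callback standing in for the network).
    """
    scores = Counter()
    for _ in range(trials):
        sets = partition_nodes(nodes, m, rng)
        winning_index = pick_winner(sets)
        for node in sets[winning_index]:
            scores[node] += 1  # every node in the winning set gets credit
    return scores


# Toy model (assumption): node 7 is the only influential node, so the set
# containing it wins every trial.
def toy_winner(sets):
    return next(i for i, s in enumerate(sets) if 7 in s)


rng = random.Random(0)
scores = run_candidate_selection(range(50), m=10, trials=20,
                                 pick_winner=toy_winner, rng=rng)
top_node, top_score = scores.most_common(1)[0]
```

Because set membership is re-randomized in each trial, nodes that merely share a set with the influential node collect only occasional wins, while the influential node is credited in every trial; this is why disproportionately high scores single out candidates.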

Over multiple trials with different randomly generated sets, this simple procedure
identifies influential nodes as they tend to have disproportionately high scores. Figure 9
shows the observed results of approximately 500 trials on the deployed Bitcoin network
with M = 100. Each trial was spaced 5 minutes apart over two days. The figure shows the
number of wins per node (IP address, port) compared to a distribution obtained through 100
rounds of Monte Carlo simulation in which the winning node is chosen uniformly at random.
(We used a Monte Carlo simulation since the number of nodes in the network changed in each
run.) The signal from CS is stark: compared to uniform chance, CS cleanly identifies the
small number of influential node candidates.

Figure 9: Candidate Selector Results. (Plot: number of “wins” versus number of nodes on a
log scale, for the Observed and Monte Carlo distributions.)

5.3.2 Influence Validation (IV)
We have seen that CS cleanly distinguishes a small set of nodes as influential. The IV
algorithm validates this set (or indeed a candidate set generated from any other source)
using a similar procedure. IV uses generalized InvBlock with the following structure: each
CS-identified influential node is a singleton set, and the rest of the network is one
other (giant) set. Thus each potential influential node works on a different conflicting
transaction, and the rest of the network works on a single (conflicting) transaction. This
procedure is detailed in Algorithm 4.

We ran IV 258 times in total (always after running CS) with runs spaced 5 minutes apart. A
different conflicting transaction was sent to each of the influential nodes, and the rest
of the network received yet another conflicting transaction. The influential nodes
comprised the top 86 from CS and the 14 top relayers of unknown blocks from
blockchain.info/pools (together < 2% of the network in total) and won in 189/258 trials
(73%). The candidates identified by CS accounted for 179 out of 189 wins.

6 Bitcoin’s Influential Nodes
In this section, we present an analysis of the influential nodes: how they map to mining
pools, and how our procedure can identify pools that were previously unknown.

The IV experiment shows that the candidate nodes found by the CS experiment are indeed
influential, accounting for roughly 3/4 of transactions injected during our two day
experiment. In this section, we present an analysis of the “winning-most” IV nodes
(referred to as the IV set below). Here, we say an IV node “wins” when the transaction
sent to it (and it alone) is the unique one (among the set of conflicting transactions)
that gets included in a block.

Our analysis infers details about the organization of mining pools (at the time of
writing); in some cases we can corroborate these details with supplemental evidence found
in public records on the web and in DNS records.

We present our results in Table 2, which shows aggregate behavior of nodes (columns) and
pools (rows) that had multiple “wins”. Each entry shows how often a node (an IP address
and port, designated here by a letter) wins for a pool. The table includes pools our
experiment has discovered.

The two largest pools at the time of writing, DiscusFish and GHash.IO, account for 17/35
nodes and 115 wins among the multiple winners in the IV set. Neither the addresses of the
DiscusFish nodes (registration, location) nor their protocol messages distinguished them
as influential; we were able to locate these nodes only by the CS experiment. The IP
addresses for two of the nodes (L and O) associated with GHash.IO resolve to the DNS name
of GHash.IO’s Amsterdam pool server (nl1.ghash.io); the two nodes with the most wins for
this pool (H and I) are located within the same /24 block.

BitFury is in fact not a mining pool, but rather a commercial entity that manufactures
Bitcoin mining equipment and operates large-scale mining centers. All of the “wins” from
blocks paying out to BitFury’s publicly-known Bitcoin address correspond to two nodes (S
and T) with IP addresses within the same /24 block registered in Iceland (one of the
locations where BitFury claims to have a mining operation, see bitfury.com).

Three “unknown” entities, identified only by the addresses they pay out to (1AcAj...,
19vvt..., and 1FeDt...) correspond respectively to nodes located in Germany and Georgia.
Nodes U and V are both resolved to by names within the “high.re” DNS hierarchy. The
association between node U and 1AcAj... was previously suspected (see below), and was
confirmed by CS and IV; the other IP and Bitcoin address associations are new.

Associating Bitcoin addresses with pools (or IP addresses) is an open problem. There is
public speculation about an association between the anonymous Bitcoin address 1AcAj... and
BitFury [23]. Our techniques independently associated node U’s IP address with the payout
address, as they did for the previously unknown nodes V and W with different anonymous
Bitcoin addresses.

We end by noting that the nodes S, T, U, V, and W are

A B C D E F G H I J K L M N O P Q R S T U V W X Y
DiscusFish 21 14 13 12 10 3 3
GHash.IO 8 6 5 5 3 3 3 3 4
KnCMiner 1 3 2 2
BitFury 13 7
1AcAj... 2
19vvt... 2
1FeDt... 19
Slush 2 2 2
Total 21 14 13 12 10 3 3 9 6 5 5 3 3 3 3 5 2 2 13 7 2 3 23 2 3
Table 2: Selected results from the Gateway Identifier experiment. Each column (A-Y) represents the IP address of a
reachable node, and each row represents a mining pool. Each value indicates the number of times a transaction sent to
the corresponding node was included in a block mined by the corresponding pool.
(Figure 10 plot: wins marked per trial, x-axis Trial # from 0 to 250, one horizontal line
per group, listed from top to bottom.)
DiscusFish (7 nodes) [76 wins]
GHash.IO (10 nodes) [39 wins]
KnCMiner (3 nodes) [9 wins]
BitFury (2 nodes) [20 wins]
1AcAj... (1 nodes) [2 wins]
19vvt... (1 nodes) [3 wins]
1FeDt... (1 nodes) [23 wins]
Slush (4 nodes) [7 wins]
Other IV (6 nodes) [10 wins]
Total IV (35 nodes) [189 wins]
Other (~4800 nodes) [69 wins]
Figure 10: IV set versus the rest of the network: wins over time. The win patterns appear ‘bursty’ because a single
block often contains multiple IV transactions.

associated with IP addresses registered in countries in which BitFury publicly claims to host infrastructure; they all run the same version of Bitcoin, host the same web server (with the same version of nginx), and T-W serve the same (insecure) authentication page.

In Figure 10 we show the time-varying behavior during the IV experiment. Each horizontal line shows the wins over time for one set of nodes; for example, all wins by the nodes that win for DiscusFish (A-G) are aggregated on the highest horizontal line. The second-lowest horizontal line shows all wins by the IV set, and the lowest horizontal line tabulates the wins by all the other reachable nodes in Bitcoin. Recall that this time series spans two days in total, with new conflicting transactions, configured as described in the IV experiment, injected every five minutes. The figure provides a visual measure of the pool influence, and also of how influential the IV set is.

7 Conclusion
Bitcoin’s successful, fair operation is predicated on the notion of peers being able to reach a global consensus. A critical component of the Bitcoin protocol is a broadcast substrate that serves as the sole means by which honest peers can learn from and inform other peers. We have described techniques for efficiently mapping the Bitcoin broadcast network and for identifying nodes that have disproportionate influence on the Bitcoin network.

Our AddressProbe technique is distinguished from prior work in that it is efficient and operates without polluting peers’ local state; we therefore believe that it is more scalable and less likely to affect Bitcoin peers when applied widely. Indeed, with our CoinScope implementation, we show that we are able to scan the entire network in minutes, and at regular intervals.

Bitcoin network topologies reconstructed using AddressProbe show that the broadcast network is resilient, but does not behave like a traditional random graph.

A major contribution of this paper is the finding that the broadcast topology conceals influential nodes that represent disproportionate amounts of mining power. We introduced novel mechanisms to find these nodes and to measure their impact. Our results show that roughly 2% of the nodes account for three-quarters of the mining power.

Our intent is that these techniques can be applied to perform longitudinal analyses of the Bitcoin network. We acknowledge, however, that AddressProbe is somewhat “fragile” in that it depends on undocumented features of the mainline Satoshi client; it would not be infeasible for a developer to modify some of the internal data structures in a way that would confuse AddressProbe. Conversely, disabling the decloaking techniques would be far more difficult, as they are based on the three-round exchange on which Bitcoin’s efficient broadcast depends. We hope that the findings in this paper, namely that understanding topology can identify structural faults in the broadcast, will encourage the Bitcoin community to evolve the protocol to explicitly support efficient topology discovery.
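As a rough check on the three-quarters figure stated in the conclusion, the win counts reported in the Figure 10 legend can be turned into a share. The sketch below is illustrative only: the variable names are ours, and treating a win in the IV experiment as a proxy for mining power is an assumption.

```python
# Win counts taken from the Figure 10 legend.
iv_wins = 189     # wins by the 35-node IV set
other_wins = 69   # wins by the ~4800 other reachable nodes

total_wins = iv_wins + other_wins
iv_share = iv_wins / total_wins

print(f"IV set share of wins: {iv_share:.1%}")  # IV set share of wins: 73.3%
```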

A Bitcoin Address Propagation
This section lists pseudocode for the behavior followed by the Bitcoin reference client for handling peer addresses. The pseudocode, listed in Algorithm 1, describes the algorithm found in the version 0.9.2 Satoshi client, in the files addrman.[cpp,h] and net.cpp. The data structures used in the algorithm are as follows:

• addrMan: A data structure containing every address the Bitcoin node currently knows about.
• addrKnown: Bitcoin will not send the same ADDR message to a node twice within 24 hours. This structure maps ADDR sets to the peers they have been sent to.
• addrBuf: A buffer that accrues future ADDR messages for a given node.
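The three structures can be pictured with a small Python sketch. This is only an illustration of the description in Appendix A, not the reference client's code: the class and field names (PeerState, NodeState) are hypothetical, the example address is made up, and addrKnown is modeled as a per-peer set of addresses, following the per-peer fields listed in Algorithm 1.

```python
from dataclasses import dataclass, field


@dataclass
class PeerState:
    """Per-connection state (hypothetical names)."""
    outbound: bool = False
    addr_known: set = field(default_factory=set)  # addresses this peer already knows
    addr_buf: set = field(default_factory=set)    # addresses queued to send to this peer


@dataclass
class NodeState:
    """State kept by one node."""
    addr_man: dict = field(default_factory=dict)  # address -> timestamp
    peers: dict = field(default_factory=dict)     # peer id -> PeerState


node = NodeState()
node.addr_man["198.51.100.7:8333"] = 1_400_000_000
node.peers["peer-1"] = PeerState(outbound=True)
```

Using `default_factory` gives every peer its own sets, so queuing an address for one peer never leaks into another peer's buffer.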
B Pseudocode for Candidate Selection and Influence Validation Algorithms
In Algorithm 2, we provide detailed pseudocode listings for the InvBlock procedure. In Algorithm 3 (resp. Algorithm 4) we provide pseudocode for the Candidate Selection (resp. Influence Validation) routines described in Section 5.
InvBlock(n, tx, τ):
    for ⌈τ/(2 minutes)⌉ rounds do
        send INV[H(tx)] to n
        on recv GETDATA[H(tx)] from n do
            Ignore; do not send tx to n.

Algorithm 2: The InvBlock procedure delays node n from learning about transaction tx for time τ.
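The loop bound in Algorithm 2 is a ceiling division. A minimal sketch, reading the bound as one INV announcement per 2-minute interval; the function name inv_rounds and the constant name are ours:

```python
import math

INV_INTERVAL_SECONDS = 2 * 60  # the "2 minutes" in the loop bound of Algorithm 2


def inv_rounds(tau_seconds: float) -> int:
    """Number of INV messages needed to cover a delay of tau_seconds."""
    return math.ceil(tau_seconds / INV_INTERVAL_SECONDS)


print(inv_rounds(20 * 60))  # 20-minute delay, as used in Algorithms 3 and 4 -> 10
```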

at node x store the following data:
    addrMan: a mapping from node n to timestamp ts
    list of connected peers, for each peer:
        outbound     // peer is an outbound connection
        addrKnown    // peer knows about address (can set/get)
        addrBuf      // set of addresses to send to peer

on receive (∗, y) do    // any message received from peer y
    if y.outbound then    // y is an outbound connection
        if addrMan[y].ts < (now − 20 minutes) then addrMan[y].ts ← now

on receive (ADDR[addr vector], y) do    // ADDR message from peer y with addresses in addr vector
    for each address a in addr vector do
        y.addrKnown ← y.addrKnown ∪ {a}
        if a.ts is invalid (very old or 10+ minutes in the future) then a.ts ← (now − 5 hours)
        if a.ts < (now − 10 minutes) then
            choose 1-2 nodes n uniformly at random
            buffer-to-send(n, a)
        addrMan[a].ts ← (now − 2 hours)    // store in addrMan with a 2 hour penalty

on receive (GETADDR, y) do
    y.addrBuf ← ∅    // clear send buffer
    q ← up to 2500 addresses chosen uniformly at random from addrMan
    buffer-to-send(y, q)

on receive (VERSION, y) do
    send GETADDR to y
    if y.outbound then buffer-to-send(y, x)

procedure buffer-to-send(peer y, addr vector A)
    for each address a in A do
        if a ∉ y.addrKnown then y.addrBuf ← y.addrBuf ∪ {a}

every 100 ms do
    for one randomly chosen connected peer p do
        p.addrKnown ← p.addrKnown ∪ p.addrBuf    // up to 1000 (addr, ts) in each message
        send p.addrBuf to p and clear p.addrBuf    // could be multiple messages

every 24 hours do
    for every connected node p do buffer-to-send(p, x)

Algorithm 1: Address Propagation Behavior
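Part of Algorithm 1 can be exercised in a few lines of Python: the buffer-to-send filter, the GETADDR response, and the periodic flush. This is a sketch under stated assumptions, not the reference client's implementation: peers are plain dictionaries, the function names are ours, the addresses are synthetic, and the timestamp rules of the ADDR handler are deliberately left out. The 2500 and 1000 limits are the ones given in the pseudocode.

```python
import random

MAX_GETADDR = 2500  # cap on addresses returned for GETADDR (from Algorithm 1)
MAX_PER_MSG = 1000  # cap on (addr, ts) entries per ADDR message (from Algorithm 1)


def buffer_to_send(peer, addrs):
    """Queue only the addresses not already in the peer's addr_known set."""
    for a in addrs:
        if a not in peer["addr_known"]:
            peer["addr_buf"].add(a)


def on_getaddr(addr_man, peer, rng=random):
    """Clear the send buffer, then queue a uniform random sample of addr_man."""
    peer["addr_buf"].clear()
    sample = rng.sample(sorted(addr_man), min(MAX_GETADDR, len(addr_man)))
    buffer_to_send(peer, sample)


def flush(peer):
    """Mark buffered addresses as known and split them into ADDR messages."""
    pending = sorted(peer["addr_buf"])
    peer["addr_known"].update(pending)
    peer["addr_buf"].clear()
    return [pending[i:i + MAX_PER_MSG] for i in range(0, len(pending), MAX_PER_MSG)]


# Synthetic address table with 3000 entries (timestamps omitted).
addr_man = {f"10.0.{i // 256}.{i % 256}:8333": 0 for i in range(3000)}
peer = {"addr_known": set(), "addr_buf": set()}
on_getaddr(addr_man, peer)
messages = flush(peer)
print([len(m) for m in messages])  # [1000, 1000, 500]
```

After the flush, every sent address is in the peer's addr_known set, so queuing the same addresses again leaves the buffer empty.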

CandidateSelection(n_1, ..., n_N):
    Partition the N nodes into M = 100 random sets c_1, ..., c_M of size C = ⌈N/M⌉, where each c_i = (c_{i,1}, ..., c_{i,C}).
    for 1 ≤ i ≤ M do
        // Each tx_i conflicts with the others
        tx_i := MakeTx(tx_∅[0])

    for 1 ≤ i ≤ M do
        for 1 ≤ j ≤ N do
            InvBlock(n_j, tx_i, 20 minutes)

    Wait for time ∆? for INV to settle

    for 1 ≤ i ≤ M do
        for c_{i,j} ∈ c_i do
            send TX[tx_i] to c_{i,j}

    Wait until a block is found containing some tx_i.
    Increment wins_j for every n_j ∈ c_i.

Algorithm 3: Candidate Selection (CS)
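The partitioning and win-crediting steps of Candidate Selection can be sketched in Python. This is a minimal illustration, not the authors' tooling: the function names and node labels are hypothetical, and the network steps (InvBlock, sending TX, waiting for a block) are replaced by a pretend winning index. When N is not a multiple of M, this sketch produces a smaller last set and may produce fewer than M sets.

```python
import math
import random


def partition_nodes(nodes, m=100, rng=random):
    """Randomly split nodes into sets of size at most ceil(len(nodes) / m)."""
    shuffled = list(nodes)
    if not shuffled:
        return []
    rng.shuffle(shuffled)
    size = math.ceil(len(shuffled) / m)
    return [shuffled[k:k + size] for k in range(0, len(shuffled), size)]


def credit_winners(wins, sets, winning_index):
    """Give one win to every node in the set whose transaction was mined."""
    for node in sets[winning_index]:
        wins[node] = wins.get(node, 0) + 1


nodes = [f"n{j}" for j in range(4800)]
sets = partition_nodes(nodes, m=100, rng=random.Random(0))
wins = {}
credit_winners(wins, sets, winning_index=3)  # pretend a block included the 4th transaction
print(len(sets), len(sets[0]), len(wins))  # 100 48 48
```

Because every node in the winning set is credited, a single round cannot single out one node; repeating the round with fresh random partitions is what separates consistently winning nodes from their set-mates.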

InfluenceValidation(w_1, ..., w_W, n_1, ..., n_N):
    /* (w_1, ..., w_W) are the candidates */
    for 1 ≤ i ≤ W + 1 do
        // Each tx_i conflicts with the others
        tx_i := MakeTx(tx_∅[0])

    for 1 ≤ i ≤ W + 1 do
        for 1 ≤ j ≤ N do
            InvBlock(n_j, tx_i, 20 minutes)

    Wait for ∆? for INV to settle

    for 1 ≤ i ≤ W do
        send TX[tx_i] to w_i
    for 1 ≤ j ≤ N do
        send TX[tx_{W+1}] to n_j

    Wait until a block is found containing some tx_i. If i = W + 1, then discard. Otherwise, determine which mining pool P is associated with this block and increment wins_{i,P}.
Algorithm 4: Influence Validation (IV)
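The final bookkeeping step of Influence Validation, which is also how a table of per-node, per-pool win counts can be accumulated, is sketched below. The function name, the pool labels, and the observations are hypothetical; only the rule itself (discard index W + 1, otherwise increment the counter for the candidate and pool) comes from Algorithm 4.

```python
from collections import defaultdict


def record_block(wins, tx_index, pool, num_candidates):
    """Update wins[(candidate index, pool)] for one mined block.

    tx_index is the 1-based index of the conflicting transaction found in the
    block. Index num_candidates + 1 is the transaction sent to all other
    nodes; a block containing it is discarded.
    """
    if tx_index == num_candidates + 1:
        return False
    wins[(tx_index, pool)] += 1
    return True


wins = defaultdict(int)
W = 3
# Hypothetical observations: (index of the mined transaction, pool credited with the block).
for tx_index, pool in [(1, "PoolA"), (4, "PoolB"), (1, "PoolA"), (2, "PoolC")]:
    record_block(wins, tx_index, pool, W)
print(dict(wins))  # {(1, 'PoolA'): 2, (2, 'PoolC'): 1}
```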
