1 Introduction
With the rapid popularization of the mobile Internet and the wide application of smart
mobile devices, mobile social networks such as WeChat, Twitter and Momo have gained
enormous momentum, providing a more convenient way for people to communicate with
each other [1–3]. Meanwhile, they also make it convenient for malicious users to engage
in illegal activities such as fraud. WeChat is one of the largest mobile social networks
in China and has users around the world. Because mobile social network applications can
conveniently access the location of devices, a wide variety of Location-Based Social
Networks (LBSNs) and particular Location-Based Services (LBSs) have sprung up [4, 5],
which makes it possible to geolocate social network users. Research on geolocation
technology for mobile social network users is significant for pinpointing the location
of malicious users and raising awareness of location privacy protection among ordinary
users; it also helps service providers offer more secure services [6]. This paper focuses
on WeChat, one of the most popular social network applications in China, to explore
whether its existing privacy protection strategies can protect users' location privacy
effectively. The existing geolocation methods for WeChat users can be divided into two
categories: number theory based geolocation algorithms and successive approximation
based geolocation algorithms.
Number theory based geolocation algorithms abstract the relation between reported
distance and actual distance as an ideal mathematical model. Probes are placed
equidistantly, according to certain rules, in the region where the target user is located;
the reported distances of the target user collected from the probes are then solved as
constraints, and the location of the target user is thereby pinpointed. Paper [7] first
proposed a one-dimensional adversarial method to detect the target user's location along
a line, and then extended the method to two-dimensional space. Moreover, the method is
proved theoretically to achieve high positioning accuracy under ideal conditions. Paper [8]
analyzed the influence of noise on the reported distance under certain hypotheses and
proposed a heuristic number theory approach, which reduces the influence of noise on the
geolocating procedure to a certain extent. Paper [9] pointed out that, because of noise,
the geolocating error of number theory based algorithms increases as the actual distance
between the probes and the target user increases. In addition, placement strategies for
the first probe were proposed to improve the practicality of the number theory based
geolocation algorithm. Number theory based geolocation algorithms have high accuracy in
theory, but in practice there is no stable relation between reported distance and actual
distance, so the gap between actual results and the theoretical precision is large and
the theoretical precision is hard to achieve.
Successive approximation based geolocation algorithms first determine that the target
user is located within a certain region. The region is then divided into several
sub-regions by collecting the target user's reported distance from different positions.
By determining which sub-region contains the target user, the size of the potential
region is reduced. This procedure is repeated to approach the real location of the target
user. Paper [10] geolocated WeChat users with an improved triangle geolocation algorithm
that utilizes the band-like reported distance characteristic of WeChat. The center of the
intersection of the rings determined by several probes was taken as the location of the
target user. In order to break the minimum reported distance limit of WeChat, Paper [11]
proposed a geolocation algorithm based on space partition. The algorithm first locates
the target user within the minimum reported distance, and then partitions the space
determined by the minimum reported distance until the threshold is reached.
The space partition based geolocation algorithm is less time consuming and easy to
implement. As shown in paper [11], the algorithm is able to geolocate 50% of users with
an error of less than 40 m, and the average positioning accuracy is about 51 m. However,
actual tests show that, affected by the update of the location protection strategy of
WeChat, there is no stable correspondence between reported distance and actual distance,
and the algorithm has difficulty geolocating target users with high precision under
current conditions.
A WeChat User Geolocating Algorithm 225
To analyze the correspondence between reported distance and actual distance and,
furthermore, to improve the geolocating accuracy for WeChat users in an actual
environment, a geolocating algorithm based on optimization parameter selection of space
partition is proposed.
In this paper, we improve the space partition based geolocation algorithm by selecting
optimized parameters based on statistical characteristics of the relation between
reported distance and actual distance, and we propose stepwise strategies to improve
the accuracy rate of space partition. Experimental results show that, if the target user
can be discovered, the proposed algorithm geolocates WeChat users with higher accuracy
than the classical space partition based algorithm and the heuristic number theory based
algorithm, and the highest geolocating accuracy is within 10 m.
2 Problem Statement
In this section, Location-based Social Discovery (LBSD) services and the location
protection strategies of WeChat are introduced. Then the theory and shortcomings of the
classical space partition based geolocation algorithm are discussed.
Each probe reports the relative distance to the target user in bands of K, and the
relation between the reported relative distance W_d and the actual distance d can be
formalized as follows:

W_d = \left( \left\lfloor \frac{d}{K} \right\rfloor + 1 \right) \cdot K \qquad (1)
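As a small illustration of the banded model, the sketch below evaluates the relation under the assumption that it reads W_d = (floor(d / K) + 1) * K and that K = 100 m, which matches the 100 m minimum reported distance discussed in the text. The function name `reported_distance` is introduced here for illustration only and is not part of the original paper.

```python
import math


def reported_distance(d: float, k: float = 100.0) -> float:
    """Idealized banded distance: W_d = (floor(d / K) + 1) * K.

    d is the actual distance and k the band width, both in meters.
    The default k = 100 m is an assumption taken from the minimum
    reported distance mentioned in the text.
    """
    if d < 0:
        raise ValueError("actual distance must be non-negative")
    if k <= 0:
        raise ValueError("band width must be positive")
    return (math.floor(d / k) + 1) * k


if __name__ == "__main__":
    # Print the band reported for a few actual distances.
    for d in (0.0, 99.9, 100.0, 250.0):
        print(d, reported_distance(d))
```

Under this idealized model every actual distance inside the same band yields the same reported value, which is why a single query only constrains the target to a ring (or, in the box simplification, to a square).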
[Figure legend: probe, target user, moving direction, intermediate results]
The algorithm is based on the idea of successive approximation. For simplicity of
presentation, the algorithm treats the minimum distance limit as a box rather than a
circle. Firstly, the location of the probe is changed dynamically until the target user
appears with the minimum reported distance in the query results list of the probe. The
potential area where the target user is located is thus determined as a box with a side
length of 200 m, and the current position of the probe is taken as the intermediate
geolocation result. Secondly, the probe is shifted relative to the last intermediate
result so that the range determined by the minimum reported distance covers half of the
potential area. Thirdly, the half in which the target user is located is determined by
checking whether the reported distance of the target user obtained from the shifted probe
changes. If it changes, the target user is in the un-overlapped half; otherwise, the
target user is in the overlapped half. In this way, the potential area is halved after
each round. The intermediate result is then moved to the center of the current potential
area. This partition is repeated for multiple rounds until the expected accuracy is
achieved, and the last intermediate result is taken as the location of the target user.
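The rounds described above can be sketched as a bisection over an axis-aligned box. This is a minimal simulation under stated assumptions, not the implementation of paper [11]: coordinates are planar meters, the minimum reported distance is modeled as a box of half side 100 m, and the query is a noise-free oracle. The names `make_ideal_oracle` and `space_partition` are hypothetical.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def make_ideal_oracle(target: Point, half: float = 100.0) -> Callable[[Point], bool]:
    """Build a noise-free query: True when the target shows the minimum
    reported distance from the probe, using the box simplification."""
    def sees_min_band(probe: Point) -> bool:
        return (abs(target[0] - probe[0]) <= half
                and abs(target[1] - probe[1]) <= half)
    return sees_min_band


def space_partition(start: Point,
                    sees_min_band: Callable[[Point], bool],
                    half: float = 100.0,
                    threshold: float = 10.0) -> Point:
    """Halve the potential box until its longest side is below threshold."""
    if not sees_min_band(start):
        raise ValueError("target is not within the minimum reported distance")
    # Initial potential area: box of side 2 * half centered on the probe.
    x_lo, x_hi = start[0] - half, start[0] + half
    y_lo, y_hi = start[1] - half, start[1] + half
    while max(x_hi - x_lo, y_hi - y_lo) >= threshold:
        cx, cy = (x_lo + x_hi) / 2, (y_lo + y_hi) / 2
        if x_hi - x_lo >= y_hi - y_lo:
            # Shift the probe so its box ends at cx and overlaps the left half.
            if sees_min_band((cx - half, cy)):
                x_hi = cx  # reported distance unchanged: overlapped half
            else:
                x_lo = cx  # reported distance changed: un-overlapped half
        else:
            # Same step along the other axis (lower half overlapped).
            if sees_min_band((cx, cy - half)):
                y_hi = cy
            else:
                y_lo = cy
    # The center of the final box is the last intermediate result.
    return ((x_lo + x_hi) / 2, (y_lo + y_hi) / 2)


if __name__ == "__main__":
    oracle = make_ideal_oracle((37.3, -61.8))
    print(space_partition((0.0, 0.0), oracle, threshold=10.0))
```

With a noise-free oracle the estimate always ends within half the threshold of the target on each axis. Every decision in the loop relies on a single query being correct, so one wrong answer sends the search into the wrong half for all later rounds.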
The space partition based geolocation algorithm takes the region determined by the
minimum reported distance as the target space, and decides which sub-space contains
the target by checking whether the reported distance of the target user increases after
shifting the probe. The location accuracy of the algorithm depends on a strict
correspondence between reported distance and actual distance, that is, formula (1). The
algorithm achieves high accuracy if the correspondence is satisfied. However, actual
tests show that there is no strict correspondence between reported distance and actual
distance. The reported relative distance is frequently 200 m or more even though the
actual distance between two users is less than 100 m. Under such conditions, the
algorithm places the target user in the wrong half space, and this misjudgment enlarges
the geolocation error.
The update of the privacy protection policy is a possible reason why there is no strict
correspondence between reported distance and actual distance in the People Nearby
function of WeChat. In this paper, we only consider the region within a reported
distance of 1000 m. By analyzing the relation between reported distance and actual
distance, optimal parameters that decrease the probability of misjudgment are selected.
In addition, stepwise strategies are adopted to improve the accuracy rate of space
partition.
Assume that the algorithm takes Wpi as the target reported distance, which means that
the algorithm starts to geolocate the target user when the reported distance of the
target user is Wpi in the query results list of the probes. For simplicity of
presentation, we consider the area determined by Wpi as a box rather than a circle,
centered on the current probe with 2D as the edge length. Let R be the effective actual
distance determined by Wpi: when the actual distance is less than R, the probability
that the reported distance of the target user is not greater than Wpi is higher than the
probability that the reported distance is greater than Wpi. The framework of the WeChat
user geolocating algorithm based on optimization parameter selection of space partition
is shown in Fig. 2, and the main steps are as follows.
[Fig. 2. Framework flowchart. Box labels recoverable from the figure: reported distances
under different actual distances; calculate variance of upper limit distribution of
actual distance for reported distances; the target reported distance; effective distance
R corresponds to the target reported distance; target users appear within the target
reported distance; determine initial space range; maximum side length of current
space < 2R; select single query point; select multiple query points; modify the location
of probes; query the target user several times; select the most frequent reported
distance; the reported distance increase; the target user is in overlapped region; the
target user is in un-overlapped region; narrow the target space; maximum side length of
current space < threshold; take the central position of the current space; positioning
results for target users. Branches are labeled Y and N.]
How to determine the target reported distance and the corresponding geographical
spatial scope is the first problem to be solved when geolocating a WeChat user. In this
paper, we study the relation between reported distance and actual distance by analyzing
the statistical characteristics of the actual distance upper limit corresponding to each
reported distance (from 100 m to 900 m), and determine the target space range based
on these characteristics.
P_i(d \le D) > P \qquad (2)
where d is the actual distance. The space where the target user is located is
determined as a box centered on the probe, with 2D as the side length. When the target
user is within the target reported distance of the probe, the probability that the
determined area covers the target user is more than P.
The value of D is determined according to cumulative probability instead of the
maximum value, which makes the target space cover the location of the target user with
high probability (if P is big enough) while narrowing the target space to reduce time
consumption.
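Choosing D by cumulative probability rather than by the maximum amounts to taking an empirical quantile of the measured upper limits. The sketch below assumes the measurements are available as a plain list of actual distance upper limits in meters. The function name `distance_at_cumulative_probability` and the sample values are hypothetical and are not data from the paper.

```python
from typing import Sequence


def distance_at_cumulative_probability(upper_limits: Sequence[float], p: float) -> float:
    """Return the smallest observed value D such that the fraction of
    samples less than or equal to D is at least p (an empirical quantile)."""
    if not upper_limits:
        raise ValueError("at least one sample is required")
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(upper_limits)
    n = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        if rank / n >= p:
            return value
    return ordered[-1]  # not reached for valid p; kept as a safeguard


if __name__ == "__main__":
    # Made-up upper limits (meters) for one reported distance.
    samples = [150, 160, 170, 180, 190, 200, 210, 220, 230, 240]
    print(distance_at_cumulative_probability(samples, 0.95))
    print(max(samples))
```

When the sample contains a few unusually large upper limits, a quantile below 100% yields a smaller D than the maximum, which is the trade-off between coverage and search time that the text describes.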
230 W. Shi et al.
Because the value D is selected to make the determined target area cover the target
user's location with high probability, the reported distance is very likely to be bigger
than Wpi even when the actual distance is smaller than D. This paper therefore delimits
the effective space range of the target reported distance and proposes stepwise
strategies to improve the accuracy rate of space partition.
Fig. 4. The variance of actual distance upper limit for each reported distance
The figure shows that the variances for each reported distance are fairly large and tend
to increase as the reported distance increases. We take 200 m as the target reported
distance because it has a smaller variance. It is worth noting that the variance for the
100 m reported distance is abnormally large. This anomaly is caused by the frequent
occurrence of the following situation: even when two users are geographically close to
each other, the reported distance of user A obtained from user B is 200 m or more,
rather than 100 m. A possible reason is that WeChat considers it privacy-threatening to
report a relative distance of 100 m when the actual distance between users is short,
because the location of users could then be determined within a smaller area. The
reported distance of 100 m would therefore be avoided deliberately.
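The selection of the target reported distance by variance can be illustrated as follows. This is a sketch under assumptions: measurements are given as (reported distance, actual distance upper limit) pairs in meters, the population variance is used, and the candidate with the smallest variance is picked. The paper does not state which variance estimator it uses, and the function names and sample values below are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance
from typing import Dict, Iterable, Tuple


def variance_by_reported_distance(
        observations: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Group upper limits by reported distance and compute each group's
    population variance."""
    groups = defaultdict(list)
    for reported, upper_limit in observations:
        groups[reported].append(upper_limit)
    return {reported: pvariance(values) for reported, values in groups.items()}


def pick_target_reported_distance(variances: Dict[int, float]) -> int:
    """Pick the reported distance whose upper limits vary the least."""
    return min(variances, key=variances.get)


if __name__ == "__main__":
    # Made-up measurements: (reported distance, actual distance upper limit).
    observations = [
        (100, 20.0), (100, 180.0), (100, 90.0),
        (200, 165.0), (200, 170.0), (200, 175.0),
        (300, 240.0), (300, 280.0), (300, 310.0),
    ]
    variances = variance_by_reported_distance(observations)
    print(variances)
    print(pick_target_reported_distance(variances))
```

A small variance means the actual distance at which the reported value flips to the next band is more predictable, which is what makes a reported distance usable as the starting condition for space partition.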
The probability distribution is shown in Fig. 5. The upper limit of the actual distance
for the 200 m reported distance approximately obeys a normal distribution, and the upper
limit has its maximum probability at about 170 m. That is, the probability that the
reported distance changes to 300 m is high when the actual distance between users is
about 170 m. Following the parameter selection strategy, we calculate the values at
which the cumulative probability of the upper limit distribution for the target reported
distance reaches 95% and 50% respectively, and take the corresponding actual distances
as the initial target space range D and the effective distance R. In this way, the
probability that the determined target space covers the location of the target user is
greater than 95%, and the determined effective space range meets the constraints. The
obtained parameters are D = 225 m and R = 173 m.
Fig. 5. Probability distribution of the upper limit of the actual distance for the target
reported distance (x-axis: actual distance/m, 0 to 300; y-axis: probability, 0 to 0.012)
[Fig. 6: cumulative probability (0 to 1) versus positioning error (error/m, 0 to 120) for
the space partition based algorithm, the heuristic number theory based algorithm and the
proposed algorithm]
Experimental results show that the average error of the proposed algorithm is
56.5 m, which is lower than the 68.9 m of the original algorithm and the 63.1 m of the
heuristic number theory based algorithm. As shown in Fig. 6, under the same experimental
conditions, the minimum positioning error of the proposed algorithm is less than 20 m,
whereas all the positioning errors of the original algorithm are higher than 20 m.
Moreover, 56% of the localization errors of the proposed algorithm are less than 60 m,
compared with 34.8% for the original algorithm and 42% for the heuristic number theory
based algorithm. A possible reason for the high geolocating error of the original space
partition based algorithm is that, even when the reported distances of target users are
the minimum reported distance, the actual relative distances between the target users
and the probe are greater than 100 m. The initial space then fails to cover the location
of the target users, and frequent misjudgment during space partition further enlarges
the deviation between the positioning results and the actual location of the target
users. For the heuristic number theory based algorithm, the unstable relationship
between reported distance and actual distance makes it hard to obtain accurate
coordinates of target users. Fig. 6 shows that the proposed algorithm can geolocate
WeChat users accurately in a practical environment. Factors that may affect the
geolocating accuracy include errors of fake location applications, low-probability
misjudgment during space partition, the length of time for which a user's location is
cached on the WeChat server, and so on.
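The two summary statistics used in this comparison, the average error and the share of errors below a given bound (one point on the cumulative curve), can be computed as in the sketch below. The function names and the error values are hypothetical; they are not the measurements reported in the paper.

```python
from typing import Sequence


def mean_error(errors: Sequence[float]) -> float:
    """Average positioning error in meters."""
    if not errors:
        raise ValueError("at least one error value is required")
    return sum(errors) / len(errors)


def fraction_below(errors: Sequence[float], limit: float) -> float:
    """Share of positioning errors strictly below limit (meters)."""
    if not errors:
        raise ValueError("at least one error value is required")
    return sum(1 for e in errors if e < limit) / len(errors)


if __name__ == "__main__":
    # Made-up positioning errors in meters for one algorithm.
    errors = [12.0, 35.0, 58.0, 61.0, 90.0]
    print(mean_error(errors))
    print(fraction_below(errors, 60.0))
```

Evaluating `fraction_below` over a range of limits for each algorithm gives the kind of cumulative probability curves that Fig. 6 compares.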
7 Conclusion
LBSD services of mobile social networks report the relative distance of nearby users in
concentric bands. However, there is no strict correspondence between the reported
distance and the actual distance. In this paper, the relationship between reported
distance and actual distance in WeChat is analyzed. We improve the space partition based
geolocation algorithm by selecting optimized parameters based on the statistical
characteristics of the relation between reported distance and actual distance, and we
propose stepwise strategies to improve the accuracy rate of space partition.
Experimental results show that, compared with the original space partition based
geolocation algorithm and the heuristic number theory based algorithm, the proposed
algorithm has higher localization accuracy. Note that the proposed algorithm is
effective only when the user uses the LBSD service and can be discovered by other users.
In future work, we will focus on how the relation between reported distance and actual
distance differs across orientations, as well as on the relation between the proximity
of users in the query results list and the difference in their actual relative distances
to the probe. We hope this research provides technical support for geolocating malicious
LBSN users and raises the awareness of ordinary users about location privacy protection.
Acknowledgment. The work presented in this paper is supported by the National Natural
Science Foundation of China (No. U1636219, 61379151, 61401512, 61572052), the National
Key R&D Program of China (No. 2016YFB0801303, 2016QY01W0105) and the Key
Technologies R&D Program of Henan Province (No. 162102210032).
References
1. Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media?
In: Proceedings of the International Conference on World Wide Web, pp. 591–600. ACM,
Raleigh (2010)
2. Nemelka, C.L., Ballard, C.L., Liu, K., Xue, M., Ross, K.W.: You can yak but you can't hide.
In: Proceedings of the ACM Conference on Online Social Networks, p. 99. ACM, Stanford
(2015)
3. Wang, G., Wang, B., Wang, T., Nika, A., Zheng, H., Zhao, B.Y.: Whispers in the dark:
analysis of an anonymous social network. In: Proceedings of the Internet Measurement
Conference, pp. 137–150. ACM, Vancouver (2014)
4. Zheng, Y.: Location-based social networks: users. In: Zheng, Y., Zhou, X. (eds.) Computing
with Spatial Trajectories, pp. 243–276. Springer, New York (2011).
doi:10.1007/978-1-4614-1629-6_8
5. Hoang, N.P., Asano, Y., Yoshikawa, M.: Your neighbors are my spies: location and other
privacy concerns in dating apps. In: Proceedings of the 18th International Conference on
Advanced Communication Technology, pp. 715–721. IEEE, PyeongChang (2016)
6. Shokri, R., Theodorakopoulos, G., Papadimitratos, P., Kazemi, E., Hubaux, J.: Hiding in the
mobile crowd: location privacy through collaboration. IEEE Trans. Dependable Secure
Comput. 11(3), 266–279 (2014)
7. Xue, M., Liu, Y., Ross, K.W., Qian, H.: I know where you are: thwarting privacy protection
in location-based social discovery services. In: Proceedings of the IEEE Conference on
Computer Communications Workshops, pp. 179–184. IEEE, Hong Kong (2015)
8. Peng, J., Meng, Y., Xue, M., Hei, X., Ross, K.W.: Attacks and defenses in location-based
social networks: a heuristic number theory approach. In: Proceedings of the International
Symposium on Security and Privacy in Social Networks and Big Data, pp. 64–71. IEEE,
Hangzhou (2015)
9. Cheng, H., Mao, S., Xue, M., Hei, X.: On the impact of location errors on localization
attacks in location-based social network services. In: Wang, G., Ray, I., Alcaraz Calero, J.
M., Thampi, S.M. (eds.) SpaCCS 2016. LNCS, vol. 10066, pp. 343–357. Springer, Cham
(2016). doi:10.1007/978-3-319-49148-6_29
10. Ding, Y., Peddinti, S.T., Ross, K.W.: Stalking Beijing from Timbuktu: a generic
measurement approach for exploiting location-based social discovery. In: Proceedings of
the 4th ACM Workshop on Security and Privacy in Smartphones & Mobile Devices,
pp. 75–80. ACM, Scottsdale (2014)
11. Li, M., Zhu, H., Gao, Z., Chen, S., Ren, K., Yu, L., Hu, S.: All your location are belong to
us: breaking mobile social networks for automated user location tracking. In: Proceedings of
the 15th ACM International Symposium on Mobile Ad Hoc Networking and Computing,
pp. 43–52. ACM, Philadelphia (2014)
12. Polakis, I., Argyros, G., Petsios, T., Sivakorn, S., Keromytis, A.D.: Where's Wally? Precise
user discovery attacks in location proximity services. In: Proceedings of the 22nd
ACM SIGSAC Conference on Computer and Communications Security, pp. 817–828.
ACM, Denver (2015)
13. Wang, R., Xue, M., Liu, K., Qian, H.: Data-driven privacy analytics: a WeChat case study in
location-based social networks. In: Xu, K., Zhu, H. (eds.) WASA 2015. LNCS, vol. 9204,
pp. 561–570. Springer, Cham (2015). doi:10.1007/978-3-319-21837-3_55