1 http://qgis.org/en/site/
2 https://en.wikipedia.org/wiki/Moran%27s_I
2.2 Kernel Density Estimation (KDE)
In Kuala Lumpur, Malaysia, the spatial density of dengue cases
within six zones was examined using spatial statistical tools. The
KDE interpolation technique was used to analyze the hotspot
localities [3]. KDE is regarded as an advanced technique for
generalizing incident locations to the whole study area. It helps
identify high-risk areas within point patterns of disease incidence
by producing a continuous and smooth surface that indicates the
level of risk for a particular area [4].
In Singapore, dengue clusters are also formed using KDE on a raster
layer. The output layer is then converted into a vector layer, and
the clusters are extracted based on an approximate KDE value. The
map below shows an example of a cluster view of dengue distribution
(footnote 3).

elaborate on how to solve it and prevent errors from occurring in
the following sections. A better and more up-to-date geocoding API
is encouraged for this step.
3.1.2 Data Formatting
In the simulated data provided by our professor, the format of the
NOTIF_DATE attribute is mm/dd/yyyy. However, we later discovered
that this format cannot be used during the case selection process,
for reasons that will be elaborated later. Therefore, before
importing the data into QGIS, we reformatted the date as
yyyy-mm-dd.
3.2 Clustering
3.2.1 Customized Clustering Method
As discussed in the related works section, most countries perform
dengue hotspot clustering with statistical methods such as
Moran's I or Kernel Density Estimation. Clusters formed this way
are based on statistical measures and may cut through buildings. To
better assign the roles and responsibilities for implementing
corrective and preventive actions, we propose a more intuitive view
of clusters in the shape of building buffers.
Cases are reported with each person's residential address, which is
located at the centroid of the residential building. We define a
cluster by joining buildings that have dengue cases reported within
14 days and within 150 meters. The number of cases is counted per
cluster, and the day on which the cluster is formed is used to
label it.
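The joining rule above can be illustrated with a small sketch. It
is not the QGIS Modeler workflow used in this project: it assumes
each building is reduced to its centroid in a metre-based projection
such as SVY21, that the 14-day case selection has already been
applied, and that two buildings are joined when their centroids are
no further apart than a chosen link distance. The function name
clusterBuildings and the fields id, x, y and cases are hypothetical.

```javascript
// Sketch only: groups buildings into clusters with a union-find structure.
// Assumption: x and y are centroid coordinates in metres, and every record
// is a building that already passed the 14-day case selection.
function clusterBuildings(buildings, linkDistance) {
  const parent = buildings.map((_, i) => i);

  // Find the representative of the group that contains building i.
  const find = (i) => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving keeps the trees shallow
      i = parent[i];
    }
    return i;
  };
  const union = (a, b) => {
    parent[find(a)] = find(b);
  };

  // Join every pair of buildings that is close enough.
  for (let i = 0; i < buildings.length; i++) {
    for (let j = i + 1; j < buildings.length; j++) {
      const dx = buildings[i].x - buildings[j].x;
      const dy = buildings[i].y - buildings[j].y;
      if (Math.hypot(dx, dy) <= linkDistance) union(i, j);
    }
  }

  // Collect building ids and count cases per cluster.
  const clusters = new Map();
  buildings.forEach((b, i) => {
    const root = find(i);
    if (!clusters.has(root)) clusters.set(root, { buildingIds: [], numCases: 0 });
    const cluster = clusters.get(root);
    cluster.buildingIds.push(b.id);
    cluster.numCases += b.cases;
  });
  return Array.from(clusters.values());
}
```

The pairwise loop is quadratic in the number of buildings, which is
acceptable for a few hundred case buildings but would need a spatial
index for larger inputs.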
Buildings outside Singapore: To efficiently select only those
buildings located within the Singapore coastal outline, the
building layer is intersected with the Singapore Coastal Outline
data layer in QGIS using Vector > Geoprocessing Tools > Intersect.
After selection, 44,792 buildings are kept.
3.2.3 Techniques Used in Clustering
1. Geoprocessing Tools
This is an easier way to link cases to clusters by referencing the
cluster identifiers, i.e. cluster ID and cluster date. A
geoprocessing tool such as join attributes by location can be used
to append cluster information to each case.
The advantage of this method is that it is a built-in function of
QGIS, so it can be automated in QGIS Modeler and does not require
any database server license.
However, it requires more complicated front-end display logic to
retrieve cases based on cluster ID and cluster date. In addition,
this method requires duplicated case information to be stored,
which may reduce data storage efficiency and increase data
processing time.
2. PostGIS
Leveraging database management technologies to manage all the
spatial data of dengue cases and dengue clusters will increase data
processing efficiency and allow more powerful query functions to be
executed, reducing the number of steps in the whole process.
However, as open source software, PostGIS may pose security
concerns to government authorities. It also places higher
requirements on users and is not as lightweight as the QGIS
built-in methods.
3. SpatiaLite
SpatiaLite is a spatial extension to SQLite, providing vector
geodatabase functionality. It is lightweight: the whole SQL engine
is directly embedded within the application itself, and a complete
database is simply an ordinary file which can be freely copied (or
even deleted) and transferred from one computer/OS to a different
one without any special precaution. The SQL query is written
slightly differently. The following query is used to append the IDs
of the contained cases to each cluster:
    SELECT cluster_id, group_concat(obj_ID) AS casesIDs,
           count(*) AS numCases
    FROM "caseWithClusterId"
    GROUP BY cluster_id
    ORDER BY cluster_id
However, SpatiaLite is not integrated with QGIS Modeler.
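For readers who want to see what the aggregation query computes,
the following sketch reproduces the same grouping in JavaScript on
rows that carry the cluster_id and obj_ID columns. It is only an
illustration under assumptions: the function name
summariseCasesByCluster is hypothetical, cluster_id is assumed to be
numeric, and the IDs are joined with a comma, which is the default
separator of group_concat in SQLite.

```javascript
// Sketch only: mirrors GROUP BY cluster_id with group_concat and count(*).
// Assumption: each row looks like { cluster_id: <number>, obj_ID: <id> }.
function summariseCasesByCluster(rows) {
  const idsByCluster = new Map();
  for (const row of rows) {
    if (!idsByCluster.has(row.cluster_id)) idsByCluster.set(row.cluster_id, []);
    idsByCluster.get(row.cluster_id).push(row.obj_ID);
  }
  return Array.from(idsByCluster.entries())
    .sort(([a], [b]) => a - b) // ORDER BY cluster_id
    .map(([cluster_id, ids]) => ({
      cluster_id,
      casesIDs: ids.join(","), // group_concat(obj_ID)
      numCases: ids.length, // count(*)
    }));
}
```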
3.3 Modeler
3.3.1 Automation with Modeler
QGIS Modeler is a built-in QGIS tool that provides an easy way to
handle complex operation flows. By creating models in the graphical
modeler, we can replace manual execution with automated workflows,
which saves considerable time and effort.
We first needed to work out which processes are required in the
modeler. After discussion and consultation with our professor, we
finalized the following steps (underscores means process
Figure 3.3.2 Expression for selecting cases within 14 days
After brainstorming, we decided to try converting the dates in the
csv files into integers so that we could simply subtract 14 in the
QGIS selection expression. Unfortunately, this approach also
failed: if the generated web map is to be flexible and interactive,
users must be able to enter any date they want in order to retrieve
different data. The $now function in the Date and Time group can be
used to form clusters on a daily basis (here we assume that NEA
updates the day's dengue cases daily). The problem is that the
format of $now is yyyy-mm-dd, so it cannot be used together with
dates in integer format in a selection query, which meant we had to
find another way to make it work.
Figure 3.3.4 NoneType Error
Finally, after considering usability, the effort required and the
technical limitations of QGIS, we decided to change the date format
to yyyy-mm-dd in the csv file, as introduced in the Data
Pre-processing section above, and still use a csvt file to specify
the Date type of this attribute. We believe that this will not take
more time or effort than the date format currently in use, because
once the format is set, all newly updated data will follow it. In
this way, we can use the $now function in a QGIS selection query to
retrieve the cases within 14 days, as required.
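The two date steps described in this section, reformatting
mm/dd/yyyy to yyyy-mm-dd and selecting cases within 14 days of a
reference date, can be sketched outside QGIS as follows. This is an
illustration and not the QGIS expression used in the model: the
function names toIsoDate and casesWithinDays are hypothetical, the
window is assumed to include both the reference day and the day
exactly 14 days earlier, and only the NOTIF_DATE field comes from
the data described above.

```javascript
// Sketch only: convert "mm/dd/yyyy" (or "m/d/yyyy") to "yyyy-mm-dd".
function toIsoDate(mdy) {
  const [month, day, year] = mdy.split("/");
  return `${year}-${month.padStart(2, "0")}-${day.padStart(2, "0")}`;
}

// Sketch only: keep cases notified no more than `days` days before the
// reference date. Both dates are "yyyy-mm-dd" strings; date-only ISO
// strings are parsed as UTC midnight, so the difference is whole days.
function casesWithinDays(cases, referenceIso, days) {
  const dayMs = 24 * 60 * 60 * 1000;
  const reference = Date.parse(referenceIso);
  return cases.filter((c) => {
    const age = (reference - Date.parse(c.NOTIF_DATE)) / dayMs;
    return age >= 0 && age <= days;
  });
}
```

Because yyyy-mm-dd strings are unambiguous and sort chronologically,
the same stored value can be used for display, for comparison and
for parsing, which is the practical benefit of the format change.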
prepared user interface for visualization. This step is considered
troublesome, because every time there are newly updated dengue
cases and newly generated clusters, users would have to finish the
export process manually. Given that NEA updates new cases on a
daily basis, this manual process is definitely not preferred, so we
decided to find a more automated alternative, which is also
consistent with our initial motivation: automation.
Fortunately, QGIS Modeler allows us to save the final generated
clusters and ids as geojson files. Thus, our web application can
read data directly from the specified folder and update the map and
report based on the new data. Whenever the model is executed, the
web application page can be updated simply by refreshing, without
any manual export.

3. Wrong coordinates in geojson file
Successfully generating the geojson file is a big step, because it
means that we can automate file export. During the testing phase,
the geojson file displayed the correct cluster map as a QGIS vector
layer. However, when we tested the geojson file further, the
coordinates inside were completely wrong: all the numbers were very
large, making it impossible for our code to read and display them
properly in our web app.
After debugging for quite a while, we realized that the target
projection of geojson files is WGS84 [EPSG:4326], whereas our
shapefiles are projected as SVY21/Singapore [EPSG:3414], and that
is why the coordinates do not match in the geojson files. After
further exploration, we found that we can easily change the
projection to WGS84 using QGIS Modeler. In this way, we finally
achieved automation of web map visualization and report generation
in a correct and user-friendly interface.
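A simple guard in the web application can catch this kind of
projection mismatch early. The sketch below is an assumption-laden
illustration, not part of the original code: the function name
looksLikeWgs84 is hypothetical, it assumes every feature has a
geometry with a coordinates array, and it only checks that all
positions fall inside the valid longitude/latitude range, which
projected SVY21 coordinates in metres do not.

```javascript
// Sketch only: returns true when every position in a GeoJSON
// FeatureCollection is a plausible [longitude, latitude] pair.
function looksLikeWgs84(geojson) {
  const inRange = ([lon, lat]) => Math.abs(lon) <= 180 && Math.abs(lat) <= 90;
  // Coordinates are nested to different depths for points, lines and
  // polygons, so recurse until an array of numbers is reached.
  const check = (coords) =>
    typeof coords[0] === "number" ? inRange(coords) : coords.every(check);
  return geojson.features.every((f) => check(f.geometry.coordinates));
}
```

A file that fails this check was most likely exported in a
projected coordinate system and needs to be reprojected to
EPSG:4326 before it is handed to the web map.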
As mentioned before, there are two outputs generated by the
modeler. One contains the appended cluster data, which is used to
display the cluster layer on the web application; the other
contains the appended case data, which is used to generate the
report of each dengue cluster.
In order to read the two output files, we used XMLHttpRequest, an
API that provides client functionality for transferring data
between a client and a server. It provides an easy way to get data
from a URL without reloading the page, and it can be used to
retrieve any type of data. After retrieving the data from the file,
we convert the text into JSON format for further processing. The
code below shows the steps to read the data.

4.2 Display Cluster Layer
As shown above, OpenStreetMap is used as the base map of the web
application. The clusters are shown on the base map based on the
generated geometry data, so the location of each cluster is easy to
identify. We used Leaflet, an open-source JavaScript library for
interactive maps, to display the map. As the input data file of the
web application is geojson, we add the data to the web application
as a layer with the function L.geoJson( <Object> geojson?, <GeoJSON
options> options? ). The file used for displaying the clusters is
the appended cluster data.
To display the cluster layer based on the date, we used the filter
function provided by Leaflet when adding the geojson layer to the
web application.
Figure 4.3 Filter Cluster Data
As shown in the code, the layer was filtered based on the date on
which the cluster exists.

4.2.2 Hotspot Legend
In order to make the clusters more meaningful and easier for the
user to analyze, we classified the clusters based on the number of
dengue cases in each cluster. As shown in the legend, clusters with
more cases have a deeper color.
Figure 4.4 Legend for display Cluster
To classify the clusters, we used Leaflet's styling features and
changed the style of each cluster as shown in the code (the example
styles a cluster with 1 or 2 cases inside it).
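The reading, filtering and styling steps described in this section
can be sketched as follows. This is not the project's original
code, which appeared in figures: the function names loadGeoJson,
makeDateFilter and clusterStyle, the property names cluster_date
and numCases, the class breaks and the colors are all assumptions
made for illustration. Only XMLHttpRequest and the filter and style
options of L.geoJson are taken from the libraries themselves.

```javascript
// Browser only: request a geojson file and pass the parsed object on.
function loadGeoJson(url, onLoaded) {
  const request = new XMLHttpRequest();
  request.open("GET", url);
  request.onload = () => {
    // The response arrives as text and is converted to a JSON object.
    if (request.status === 200) onLoaded(JSON.parse(request.responseText));
  };
  request.send();
}

// Returns a filter callback that keeps only the clusters whose
// (hypothetical) cluster_date property equals the selected date.
function makeDateFilter(selectedDate) {
  return (feature) => feature.properties.cluster_date === selectedDate;
}

// Returns Leaflet path options; clusters with more cases get a deeper
// fill color. The breaks (1-2, 3-5, 6+) are illustrative assumptions.
function clusterStyle(feature) {
  const n = feature.properties.numCases;
  const fillColor = n <= 2 ? "#fee5d9" : n <= 5 ? "#fb6a4a" : "#a50f15";
  return { fillColor, color: "#555555", weight: 1, fillOpacity: 0.7 };
}

// Usage in the page, assuming `map` is an existing L.map instance and
// "clusters.geojson" is a placeholder file name:
// loadGeoJson("clusters.geojson", (data) => {
//   L.geoJson(data, {
//     filter: makeDateFilter("2016-03-15"),
//     style: clusterStyle,
//   }).addTo(map);
// });
```

Keeping the filter and style logic in plain functions means they
can be checked without a browser, while the Leaflet call itself
stays a one-line composition.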
to explore, a pop-up window shows basic information about the
selected cluster, such as the cluster ID, the cluster type (the
building type inside the cluster) and the number of cases in the
cluster (as shown in the figure). We used the pop-up window feature
of Leaflet and customized the information for display. The code
shown below is how we implemented the feature.
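A minimal sketch of such a pop-up is given here, since the original
code figure is not reproduced. The property names cluster_id,
cluster_type and numCases and the function names popupHtml and
bindClusterPopup are assumptions; bindPopup and the onEachFeature
option are standard Leaflet features.

```javascript
// Builds the pop-up text from a cluster's (hypothetical) properties.
function popupHtml(properties) {
  return [
    `<b>Cluster ID:</b> ${properties.cluster_id}`,
    `<b>Cluster type:</b> ${properties.cluster_type}`,
    `<b>Number of cases:</b> ${properties.numCases}`,
  ].join("<br>");
}

// Callback for the onEachFeature option of L.geoJson: attaches the
// pop-up to the layer that Leaflet creates for each cluster feature.
function bindClusterPopup(feature, layer) {
  layer.bindPopup(popupHtml(feature.properties));
}

// Usage in the page, assuming `data` is the parsed cluster geojson and
// `map` is an existing L.map instance:
// L.geoJson(data, { onEachFeature: bindClusterPopup }).addTo(map);
```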
Figure 4.12 Overall Web Application
5 DISCUSSION
5.1 New Practice Introduced
The highlight of our project is the use of QGIS Modeler, a new open
source tool introduced by our professor and further explored by our
team. Unlike other Geospatial Analytics projects, we do not focus
on the analysis itself but on proving a concept. Since our aim is
to customize and propose a solution to NEA, we focused on the
automation and flexibility of our cluster formation process and on
the use of our web map and report.
With QGIS Modeler, we implemented the whole cluster formation
process in a much more convenient and easy way, which also provides
flexibility for possible changes in criteria, adjustment of
detailed steps, types of output files and so on.
In addition, with our extra code on the web map side, we also
provide an interactive approach to dynamically view and update the
maps as well as the dengue case report, which we believe is a
better alternative to an unchangeable pdf report.

5.2 User Study
On poster day, we were honored to present our project outcome to
representatives from NEA. Based on our observation, some of our
proposed methods really impressed them and have the potential to be
adopted and implemented by NEA. A few points mentioned by the
representatives from NEA are listed below.
1. Clustering can be auto-generated by QGIS Modeler without human
effort, which can be a very important and effort-saving method for
NEA.
2.
3. NEA would still prefer to combine QGIS Modeler and ArcGIS, which
is what they are currently using. However, due to the limitation of
software adaptability, this is not feasible for now.
4. NEA cares about the speed of running the model with a large
amount of data, which was not tested using the simulated data.
Further improvement can be made in this area.
5. The dynamic web report and the interactive way of showing the
distribution of dengue cases and the demographic distribution are
impressive.
6. The flexibility to change algorithm parameters in QGIS Modeler
for better customization also caters to their needs and interests.
7. The display of the cluster map with different color darkness
indicating cluster density is also interesting and impressive.

5.3 Disadvantage of Our Method
Firstly, clusters are formed simply according to whether two
building buffers touch each other. If more buildings are clustered,
it implies that more cases are in the cluster, even if there is
only one case in each building. This means the method cannot show
the density within a single building. In other words, the values
used to show the severity of a cluster do not take the area of the
cluster into account.
Secondly, even if there is only one case in a building, the area of
its 150 m building buffer is identified as a cluster. This allows
the authority to manage individual cases at their emerging stage,
but it can hardly be called a cluster in the traditional sense.
6 FUTURE WORK
Our system is currently just an implementation of our proposal to
NEA: the use of QGIS Modeler and building buffers. It may not fully
fit NEA's current execution/research process, so further refinement
and improvement are important.

6.1 Database for large amount of data
The system currently uses shapefiles directly for executing models
and generating the web map and report. This works well for small
data sets (400 rows of data were used for testing) by joining
different files and retrieving data based on primary keys. However,
as the representatives from NEA were concerned, it may take longer
for large amounts of data.
Fortunately, there is an existing alternative: integrating a
database with QGIS. During our exploration of cluster formation, we
tried integrating PostGIS with QGIS Modeler and found that using
database queries, with the correct database connection and
configuration, can increase data retrieval efficiency when
executing QGIS Modeler. This addresses the potential problem and
can be implemented further.

6.2 Integration with other tools
QGIS Modeler offers much convenience by automating the execution
process and provides flexibility with various parameter
configurations. However, as mentioned by NEA, they would prefer to
integrate QGIS Modeler with ArcGIS to reduce the learning curve and
adoption time. Although such integration is not implemented
currently, QGIS Modeler can export models as Python scripts, which
may serve as a starting point for use with other software such as
ArcGIS. This potential integration can be attractive to our
potential users, but the configuration and connection required to
enable it can also be challenging and tedious.

6.3 Auto-Geocoding
Across the whole process, geocoding is one of the very few parts
where manual work is involved, and the accuracy and efficiency of
geocoding are crucial to the following steps. For now the X-Y
coordinates are converted using the geocoding API provided, which
is slightly outdated. Therefore, a more accurate auto-geocoding
process can be implemented to further improve the workflow.

7 CONCLUSION
The customized dengue cluster process allows authorities to
automate their dengue clustering process. They can generate
clusters based on building shape, view clusters by date, and view
case reports in both summary format and individual case format. It
provides a more intuitive reference for the authority's decision
making and strategy setting.

8 REFERENCES
[1] Phaisarn Jeefoo, Nitin Kumar Tripathi and Marc Souris. 2011.
Spatio-Temporal Diffusion Pattern and Hotspot Detection of Dengue
in Chachoengsao Province, Thailand. Int. J. Environ. Res. Public
Health 2011, 8, 51-74. doi:10.3390/ijerph8010051.
[2] Do Thi Thanh Toan, Wenbiao Hu, Pham Quang Thai, Luu Ngoc Hoat,
Pamela Wright and Pim Martens. 2013. Hot spot detection and
spatio-temporal dispersion of dengue fever in Hanoi, Vietnam.
PUBLIC HEALTH IN VIETNAM, Co-Action.
[3] S. Aziz, R.M. Aidil, M.N. Nisfariza, R. Ngui, Y.A.L. Lim, W.S.
Wan Yusoff & R. Ruslan. 2014. Spatial density of Aedes distribution
in urban areas: A case study of breteau index in Kuala Lumpur,
Malaysia. Department of Geography, Faculty of Arts and Social
Sciences, University of Malaya.
[4] Bithell JF. An application of density estimation to
geographical epidemiology. Stat Med 1990; 9: 691-701.
[5] Levine N. Crimestat: A spatial statistic program for the
analysis of crime incident locations (Version 3.0). Houston, TX:
Ned Levine & Associates; and Washington, D.C.: National Institute
of Justice 2007; p. 12.
[6] Daley, D. J. & Gani, J. (2005). Epidemic Modeling: An
Introduction. NY: Cambridge University Press.
[7] N.A. (n.d.). The graphical modeler. Documentation QGIS 2.0:
http://docs.qgis.org/2.0/ca/docs/user_manual/processing/modeler.html
[8] N.A. (n.d.). XMLHttpRequest. Mozilla Developer Network:
https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest
[9] N.A. (n.d.). Using GeoJSON with Leaflet. Leaflet:
http://leafletjs.com/examples/geojson.html