
TRIBHUVAN UNIVERSITY

INSTITUTE OF ENGINEERING
CENTRAL CAMPUS, PULCHOWK

EVALUATION of PEDESTRIAN LOS at SIDEWALK by CLUSTER ANALYSIS and
QUESTIONNAIRE SURVEY (A CASE STUDY OF KATHMANDU VALLEY)

by

Ajeya Acharya
2073/MSTR/252

A THESIS PROPOSAL
SUBMITTED TO THE DEPARTMENT OF CIVIL ENGINEERING

DEGREE OF MASTER OF SCIENCE IN


TRANSPORTATION ENGINEERING

DEPARTMENT OF CIVIL ENGINEERING


LALITPUR, NEPAL

September, 2018
ABSTRACT
CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
Chapter 1
INTRODUCTION
1.1 Background
1.2 Statement of Problem
1.3 Objectives
1.4 Overall Framework of the Study
Chapter 2
LITERATURE REVIEW
Chapter 3
METHODOLOGY
3.1 Methods of cluster analysis
3.2 Cluster validation and selection of best cluster
3.3 Questionnaire Form Survey
3.4 Factor analysis (KMO test)
3.5 Reliability test (Cronbach’s Alpha)
3.6 PLOS model based upon perception of pedestrian
Chapter 4
STUDY AREA and DATA COLLECTION
4.1 Study Area
4.2 Data Collection
Chapter 5
RESULT and ANALYSIS
Chapter 6
SUMMARY AND CONCLUSION
REFERENCES
LIST OF FIGURES
LIST OF TABLES
Chapter 1

INTRODUCTION

1.1 Background
The urban population of Nepal has increased significantly, from 269 thousand in 1955 to
5.8 million in 2018, and is estimated to grow to around 8 million by 2030. Kathmandu is the
only city in Nepal with a population of more than 1 million. The annual growth rate of the motor
vehicle population has also been rising significantly. Transportation is one of the important
factors in urban development, and walking forms a significant part of every trip whatever the
mode, which implies that the pedestrian is an inseparable part of the transportation system. The
design of urban and transportation facilities must therefore consider the needs of pedestrians
along with those of motor vehicles. A large percentage of people in Kathmandu travel on foot or by
public transportation, but with the exponential increase in vehicular traffic, public transport
and pedestrian facilities have not received adequate attention, which has led to higher pedestrian
fatalities. In addition, high levels of air and noise pollution caused by the large vehicle
population, together with heat, dust, poor walking surfaces and long trip distances, lead people
to prefer driving and riding over walking.
Walking provides mobility to a large percentage of people in cities like Kathmandu, especially
tourists, students and poor people. It is also essential for supporting public transport and for
the short-distance trips of people owning private vehicles. Improved pedestrian safety and a more
walkable environment bring many benefits, such as better accessibility for pedestrians, lower
transportation costs, more efficient parking, a more attractive environment, reduced pollution,
and better public health through walking.
According to the HCM, Pedestrian Level of Service (PLOS) is a “qualitative measure that describe the
operational characteristics of pedestrian which is based upon several service measures like speed,
travel time, comfort, convenience, interruptions and freedom to maneuver.” PLOS is described by six
classes, from “A” to “F”, that represent operations from best to worst for each type of facility.
Traffic on the urban streets of Nepal is highly heterogeneous, consisting of various kinds of
vehicles with different operational behaviors (motorbike, car, bus, micro, tempo, etc.) moving in
the same carriageway. The country does not have a flexible working-hour system, so most people,
such as students and employees, travel in a similar time frame, which results in heavy congestion
at peak hours. Highly heterogeneous traffic flow, poor enforcement of traffic law, illegal parking,
poor surface condition, obstructions and unauthorized vendor activities are some of the major
factors that affect the PLOS of urban off-street facilities, i.e. sidewalks. Sidewalk
characteristics in low-income countries differ considerably from those in high-income, developed
countries. Therefore, an analysis of PLOS based upon models from developed countries may not suit a
low-income country like Nepal.
In Nepal, there is no established methodology for evaluating Pedestrian Level of Service (PLOS) in
urban areas. Suitable methodologies need to be developed that can help in the planning, design and
operation phases of transportation projects. This study will attempt to define and evaluate PLOS
on the urban sidewalks of Nepal. Both qualitative and quantitative methods will be used to define
and classify PLOS. The qualitative method will be based upon real-time analysis of pedestrian
perception, and the quantitative method will classify PLOS from speed, flow rate, space and
volume-to-capacity ratio using cluster analysis algorithms.
1.2 Statement of Problem
Rapid growth in the number of vehicles has created threats to pedestrian safety. Engineers often
fail to provide satisfactory facilities on the roadside, or they compromise pedestrian safety in
order to design better facilities for vehicles. A large percentage of deaths and injuries occur
among pedestrians, as they are the most vulnerable road users. A safe environment must be provided
for pedestrians, one that keeps them free of conflicts with other modes of transportation.
PLOS analysis defines the operating condition of a pedestrian facility and helps in addressing
growth management. PLOS criteria judge the operational efficiency of pedestrian infrastructure.
The road users, traffic, road facilities and environmental factors of Nepal are completely
different from those of developed countries like the US, Canada and Australia. Modified criteria
are therefore necessary in the Nepalese context for the qualitative measurement of the service
level of sidewalks.

1.3 Objectives
This thesis will attempt to determine the factors that affect sidewalk performance based upon the
perception of pedestrians, and the level of service will be calculated from the information
gathered. In addition, videographic techniques will be adopted to obtain pedestrian data for the
quantitative estimation of PLOS. The specific objectives are:
1. To identify criteria suitable for evaluating pedestrian LOS on urban streets in the Nepalese
context.
2. To determine perception-based PLOS by means of questionnaire surveys.
3. To determine PLOS from pedestrian average space, speed, flow rate and v/c ratio with the help
of various clustering algorithms.
4. To find the most suitable cluster analysis algorithm for defining the ranges of PLOS.
5. To compare the obtained ranges of PLOS with existing, widely used international PLOS models.
1.4 Overall Framework of the Study

Selection of study area: sidewalk

Data collection:
   Traffic data (video data): speed, volume
   Geometric data: effective walkway width, length of segment
   Questionnaire survey based upon user perception

Analysis and check (quantitative method):
1. Pedestrian average space, flow rate, speed, v/c ratio
2. Cluster analysis and cluster validation
3. Determination of ranges of PLOS categories

Analysis and check (qualitative method):
1. Gathering qualitative data by questionnaire survey
2. Factor test (KMO test)
3. Reliability test (Cronbach's alpha)
4. PLOS modelling and its range

Conclusion and report writing

Fig: Overall framework of the study


Chapter 2

LITERATURE REVIEW

Level of Service (LOS) is used by traffic engineers to evaluate the effectiveness of a transport
facility. LOS was first introduced in the Highway Capacity Manual (HCM) of 1965 to describe the
quality of service provided by a given facility under different conditions. Later, HCM 2000 divided
PLOS into two broad segments: uninterrupted and interrupted pedestrian facilities. HCM 2010
analyzed PLOS by measuring the pedestrian flow rate, which incorporates speed, density, volume and
sidewalk space. As volume and density increase, pedestrian speed decreases and pedestrian space
shrinks, which reduces the ease of maneuvering.
HCM defines six levels of service, ranging from A to F. On a sidewalk at LOS A, pedestrians are
able to move along their desired path without needing to alter their movements. At LOS B, there is
sufficient area for pedestrians to walk freely and bypass others, with only an occasional need to
adjust their path to avoid conflicts. At LOS C, pedestrians need to adjust their path frequently to
avoid conflicts, but space is sufficient for normal walking speed. At LOS D, the freedom to walk at
normal speed and to bypass slower pedestrians is restricted. At LOS E, virtually all pedestrians
have their walking speed restricted and need to adjust their walking behavior frequently. LOS F is
the worst condition, where all walking speeds are severely restricted and there is frequent contact
with other pedestrians.
The following steps are used to determine the PLOS of a sidewalk in HCM.

Determine effective walkway width

Calculate pedestrian flow rate

Calculate average pedestrian space

Determine LOS

Step1: Determine effective walkway width


HCM defines effective walkway width as a portion of walkway that can be used effectively by
pedestrians. Various obstructions reduce the walkway area.
$W_E = W_T - W_O$
Where
$W_E$ = Effective walkway width (m)
$W_T$ = Total walkway width (m)
$W_O$ = Width reduction by various types of obstruction, whose values are given in HCM (m)
Step2: Calculation of pedestrian flow rate
The hourly demand is converted into a peak 15-minute flow, and LOS is based upon the busiest 15
consecutive minutes during an hour.
$V_{15} = \dfrac{V_h}{4 \times PHF}$
where,
$V_{15}$ = pedestrian flow rate during the peak 15 min (p/15-min)
$V_h$ = pedestrian demand during the hour (p/h)
$PHF$ = Peak Hour Factor
If peak 15-minute pedestrian volumes are available, then the highest 15-minute volume is used
directly and is converted into a unit flow rate as below.
$V_p = \dfrac{V_{15}}{15 \times W_E}$
Where
$V_p$ = pedestrian flow per unit width (p/min/m), since $V_{15}$ is divided by 15 minutes

Step3: Calculation of Average pedestrian space


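Pedestrian space is the inverse of density and is defined by:
$A_P = S_P / V_P$
Where
$A_P$ = pedestrian space (m²/p)
$S_P$ = pedestrian speed (m/min)
$V_P$ = pedestrian flow per unit width (p/min/m)
Speed and flow must use the same time unit for the ratio to come out in m²/p; a speed measured in
m/s has to be multiplied by 60 before use.
The volume to capacity (v/c) ratio is also an important parameter for the determination of PLOS.
Hourly volume can be found from video data collection, and the capacity of sidewalks can be
obtained from the manual.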
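Steps 1 to 3 can be chained in a few lines of code. The sketch below is only an illustration of the arithmetic: the function names and all input values (a 3.0 m sidewalk with 1.0 m of obstructions, 1800 p/h, PHF of 0.9 and a walking speed of 72 m/min) are hypothetical, and speed is assumed to be expressed in m/min so that it matches a flow rate in p/min/m.

```python
def effective_width(total_width_m, obstruction_width_m):
    """Step 1: W_E = W_T - W_O, in metres."""
    return total_width_m - obstruction_width_m


def peak_15min_flow(hourly_volume, phf):
    """Step 2a: V_15 = V_h / (4 * PHF), in pedestrians per 15 minutes."""
    return hourly_volume / (4.0 * phf)


def unit_flow_rate(v15, eff_width_m):
    """Step 2b: V_p = V_15 / (15 * W_E), in p/min/m."""
    return v15 / (15.0 * eff_width_m)


def pedestrian_space(speed_m_per_min, unit_flow):
    """Step 3: A_P = S_P / V_P, in m^2 per pedestrian (speed in m/min)."""
    return speed_m_per_min / unit_flow


# Hypothetical inputs, chosen only to illustrate the calculation.
w_e = effective_width(3.0, 1.0)        # 2.0 m of usable width
v15 = peak_15min_flow(1800, 0.9)       # about 500 p/15-min
v_p = unit_flow_rate(v15, w_e)         # about 16.7 p/min/m
a_p = pedestrian_space(72.0, v_p)      # about 4.32 m^2/p

print(round(w_e, 2), round(v15, 1), round(v_p, 2), round(a_p, 2))
```

The resulting average space is the quantity that is compared against the LOS thresholds in Step 4.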

Step4: Determination of LOS


Six ranges of LOS, from A to F, can be determined based upon average pedestrian space and other
related measures such as flow rate, average speed and v/c ratio. HCM also defines LOS for cross
flows, queuing areas and platoon flow conditions.
Table: PLOS categories in HCM 2010
(Flow rate, average speed and v/c ratio are the related measures.)
LOS | Average space (ft²/p) | Flow rate (p/min/ft) | Average speed (ft/s) | v/c ratio  | Comments
A   | >60                   | <=5                  | >4.25                | <=0.21     | Ability to move in desired path, no need to alter movements
B   | >40-60                | >5-7                 | >4.17-4.25           | >0.21-0.31 | Occasional need to adjust path to avoid conflicts
C   | >24-40                | >7-10                | >4-4.17              | >0.31-0.44 | Frequent need to adjust path to avoid conflicts
D   | >15-24                | >10-15               | >3.75-4              | >0.44-0.65 | Speed and ability to pass slower pedestrians restricted
E   | >8-15                 | >15-23               | >2.5-3.75            | >0.65-1    | Speed restricted, very limited ability to pass slower pedestrians
F   | <=8                   | variable             | <=2.5                | variable   | Speeds severely restricted, frequent contact with other users
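The average-space column of the table can be read as a simple threshold lookup. The following is a minimal sketch under that reading; the function names and the m² to ft² conversion helper are hypothetical additions for illustration, and only the thresholds come from the table above.

```python
SQFT_PER_SQM = 10.7639  # square feet in one square metre


def m2_to_ft2(space_m2_per_ped):
    """Convert pedestrian space from m^2/p to ft^2/p."""
    return space_m2_per_ped * SQFT_PER_SQM


def plos_from_space(space_ft2_per_ped):
    """Return the LOS letter for an average pedestrian space in ft^2/p."""
    # Lower bounds of each class, taken from the average-space column.
    lower_bounds = [(60.0, "A"), (40.0, "B"), (24.0, "C"), (15.0, "D"), (8.0, "E")]
    for lower, grade in lower_bounds:
        if space_ft2_per_ped > lower:
            return grade
    return "F"  # 8 ft^2/p or less


# Hypothetical example: 4.32 m^2/p is roughly 46.5 ft^2/p.
grade = plos_from_space(m2_to_ft2(4.32))
print(grade)
```

The same pattern could be repeated for flow rate, average speed or v/c ratio by swapping in the corresponding column of thresholds.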

Jaskiewicz (2000) proposed a method of evaluating PLOS based upon trip quality. Nine specific
measures of the pedestrian system were evaluated in terms of pleasantness, safety, and
functionality: enclosure/definition, complexity of path networks, building articulations,
complexity of spaces, transparencies, buffers, shades, trees, overhangs/awnings/varied roof lines,
and physical components/conditions. Each of these measures was derived from a combination of
safety issues and volume and capacity considerations.
Miller et al. (2000) used visualization as a simulation tool to validate and calibrate PLOS in
suburban areas. The simulations were produced in a short time, and respondents were able to
understand from the visualizations what type of improvements were being considered.
Pascal (2003) included the obstacles in pedestrian simulations. A person requires a 0.3m lateral
spacing on each side and extra longitudinal space for speed deviation. Based on this research, the
measured distance to obstacles is 0.45m for wall, 0.35m for fence and roadway, 0.3m for poles.
Sarkar (2003) introduced some major theoretical guidelines for the qualitative evaluation of the
levels of comfort offered along walkways in major activity centers. Research on urban design,
environmental psychology, landscape architecture, and urban planning was used to develop the
method. The method included two separate evaluations: the service level, which gives standards
for overall desirable and undesirable comfort conditions at the macro level, and the quality
level, which looks at the finer, micro-level details of pedestrian comfort. Service level and
quality level were based on physical, physiological, and psychological comfort. Comfort
requirements vary with cultural and spatial context.
Rahaman (2005) explored the qualitative level of pedestrian comfort in Dhaka using six broad
categories of the roadside walking environment: safety, security, convenience and comfort,
continuity of the walkway, system coherence, and attractiveness, each assessed through specific
facilities. Some qualitative data were collected through an observation survey, whereas the
walkers' responses were recorded through a questionnaire survey. The questionnaire was designed to
obtain the opinion of pedestrians on the sidewalk environment against those six criteria. The
research found that pedestrians' safety and convenience were neglected. Hence, city authorities
must give more attention to pedestrian infrastructure rather than to that for motorized vehicles.
Kim et al. (2006) found that street performers have a negative impact on pedestrian LOS because
they create congestion, limit access and interfere with pedestrian flows. Petritsch et al. (2006)
incorporated traffic volumes on the adjacent roadway and exposure at conflict points with
intersections and driveways. Their study revealed that traffic volumes on the adjacent roadway and
the density of conflict points along the facility are the primary factors in the LOS model for
pedestrians traveling along urban arterials with sidewalks.
Dandan (2007) developed a method to assess pedestrian LOS from pedestrian perceptions.
Respondents were categorized into three groups based on age, gender, and walking experience, and
a questionnaire survey was then conducted. A stepwise regression was used to build the model,
which included variables such as bicycle volume, pedestrian volume, vehicle volume, driveway
access quantity per meter, and the distance between the sidewalk and the vehicle lanes. The study
assessed pedestrian level of service by analyzing the relationship between pedestrians' subjective
perceptions and the quality of the physical road facilities as well as the traffic flow operation.
The model was developed using 395 real-time observations from 12 urban roadway segment sidewalks
in China. After data collection, Pearson correlation coefficients were calculated to measure the
linear relation between pairs of variables, and the following model was developed.
$PLOS = -1.43 + 0.006\,Q_B - 0.003\,Q_P + 0.056\,Q_V / W_r + 11.24\,(P - 1.17\,P^3)$
Where
$Q_B$ = bicycle traffic during a 5 min period
$Q_P$ = pedestrian traffic during a 5 min period
$Q_V$ = vehicle traffic during a 5 min period
$P$ = driveway access quantity per meter
$W_r$ = distance between sidewalk and vehicle lane

Jianhong et al. (2008) analyzed the pedestrian flow characteristics on basis of one-way
passageways, two-way passageways, descending stairways, and ascending stairways in Shanghai
metro stations and revealed that consistent rules for traffic flow, density, and speed could be
applied to both pedestrian flows and vehicle flows from the view of macroscopic statistics.
Aultman-Hall et al. (2009) found that season and weather affect pedestrian volume levels in
downtown Montpelier, Vermont. Precipitation reduces the average hourly volume level by nearly 13%
and the winter months reduce it by 16%. It was noted that, at best, a combination of weather
variables accounts for 30% of the variance measured in hourly volumes.
Australian Method: The Australian method for PLOS depends on three groups of factors, namely
physical characteristics, location factors, and user factors. Pedestrian conditions are described
by a PLOS grade from PLOS A (ideal pedestrian conditions) to PLOS E (unsuitable pedestrian
conditions). Physical characteristics include path width, surface quality, obstructions, crossing
opportunities, and support facilities. Location factors address issues related to connectivity,
path environment, and the potential for vehicle conflict. Path environment is a measure of the
degree of pleasantness of the surrounding environment and relates to distance from the roadway.
User factors take into consideration pedestrian volume, the mix of path users and personal
security. The method also effectively identifies which factors contribute to a high or low PLOS.
When the Australian method is used to determine PLOS values for the study sidewalks, geometric,
location and user factors are considered.
Hidayat (2010) determined the factors affecting sidewalk performance based on pedestrians'
perception. A questionnaire with 27 items was developed to measure pedestrian perception in five
areas: (a) safety, (b) sidewalk performance, (c) accessibility, (d) vendor presence, and
(e) comfort/convenience. Each item was believed to potentially affect sidewalk performance. Data
were collected in the Pratunam area, one of the commercial areas of Bangkok, Thailand, where
street vendors operate side by side along the sidewalks. The majority (60%) were female. By age,
respondents were grouped as under 18 years (32%), 18 to 30 years (61%), and 31 to 56 years (5%).
By walking behavior, they were walking in pairs (45%), alone (29%), in a group of three (12%), or
in a group of more than three (10%). About 67% of respondents stated that walking was their main
mode during the survey. The Kaiser-Meyer-Olkin (KMO) test and/or Bartlett's test of sphericity was
then applied to the interview data to check whether factor analysis was appropriate, and a
reliability test was used to measure the consistency of the questionnaire.
Sahani and Bhuyan (2013, 2015) studied off-street pedestrian facilities in mid-sized cities in
India. They calculated pedestrian space, flow rate, v/c ratio and average walking speed as
measures of effectiveness in two cities with populations of less than one million, Bhubaneswar
and Rourkela, using a self-organizing map in an artificial neural network system. Cluster analysis
was central to their study: the Self-Organizing Map (SOM), an unsupervised Artificial Neural
Network (ANN), was used to group a small subset of typical traffic patterns and to determine the
appropriate number of groups.
Prativa Gywali (2014) developed a PLOS model in the context of Kathmandu city, Nepal. Pedestrian
perception was also considered to a limited extent during the analysis.
$Y = 1.76158 + 0.0048\,FR + 1.00495\,D - 0.2531\,W - 0.6712\,B$

Where,
$Y$ = PLOS
$FR$ = Flow Rate
$D$ = Density
$W$ = Width
$B$ = Buffer
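As a quick illustration of how such a linear model is applied, the sketch below evaluates the equation for one set of inputs. The function name and the input values are hypothetical, and the units of each variable are not stated here, so the numbers only demonstrate the arithmetic.

```python
def plos_score(flow_rate, density, width, buffer):
    """Evaluate Y = 1.76158 + 0.0048*FR + 1.00495*D - 0.2531*W - 0.6712*B."""
    return (1.76158
            + 0.0048 * flow_rate
            + 1.00495 * density
            - 0.2531 * width
            - 0.6712 * buffer)


# Hypothetical inputs: FR = 20, D = 0.5, W = 2, B = 1 (units as used by the model).
y = plos_score(flow_rate=20, density=0.5, width=2, buffer=1)
print(round(y, 4))
```

The signs of the coefficients show the expected direction of each effect: higher flow rate and density raise the score, while a wider sidewalk and the presence of a buffer lower it.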

Cluster Analysis:
Clustering is the formation of groups of objects based upon information in the data that describes
their relationships; it is a way of forming groups from a large data set. Various cluster analysis
methods are used to define pedestrian level of service categories, such as k-means, fuzzy c-means,
hierarchical agglomerative clustering, SOM in ANN, affinity propagation and GA fuzzy cluster
analysis.
K-means clustering
K-means is an unsupervised learning method used when the data are unlabeled (i.e., data without
predefined groups). The algorithm finds groups in the data, with the number of groups represented
by the variable K. It works iteratively to assign each data point to one of the K groups based on
the features provided, so that data points are clustered by similarity of features. The k-means
function partitions the observed data into k mutually exclusive clusters and returns a vector of
indices indicating the cluster to which each observation has been assigned. The algorithm
minimizes the sum of distances from each object to the centroid of its cluster, moving objects
between clusters until this sum cannot be decreased further.
Kim and Yamashita (2005) used k-means clustering to analyze the pedestrian crash pattern. They
illustrated the use of k-means clustering technique to analyze the locations and patterns of traffic
accidents. They also found that for pedestrian safety analysis k-means is the most appropriate
method to locate compact, localized clusters.
Fuzzy c-means clustering
Fuzzy c-means clustering generalizes partition clustering methods (such as k-means) by allowing an
individual to be classified into more than one cluster. Suppose there are k clusters, and a set of
variables is defined that represents the probability that object i is classified into cluster k.
In partition clustering algorithms, one of these values is one and the rest are zero, so each
individual is classified into one and only one cluster. In fuzzy clustering, however, objects are
not assigned to a single cluster: they possess a membership function indicating the strength of
membership in all or some of the clusters. This is called fuzzification of the cluster
configuration. The concept of a membership function derives from fuzzy logic, an extension of
Boolean logic in which the concepts of true and false are replaced by that of partial truth.
Boolean logic can be represented by set theory and, in an analogous manner, fuzzy logic is
represented by fuzzy set theory.
Chakroborthy and Kikuchi (1990) have shed light on the application of fuzzy set theory to the
analysis of highway capacity and level of service. The authors have shown the inadequacies in use
of the current procedure to determine highway capacity and service level. The values of input
variables and output variables involved in calculating capacity and service level were represented
by the fuzzy numbers. In this study it has been shown that it is much better if the levels of service
categories are defined as fuzzy sets.
Hierarchical agglomerative clustering
Hierarchical clustering investigates groupings in the data at several scales simultaneously by
creating a cluster tree. The tree is not a single set of clusters, but rather a multi-level
hierarchy, where clusters at one level are joined as clusters at the next higher level. This
allows us to decide what level or scale of clustering is most appropriate in our application.
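The idea of cutting the same hierarchy at different levels can be sketched in a few lines. The example below is only an illustration: it assumes single linkage (the distance between two clusters is the smallest distance between their members) on one-dimensional data, and the walking speeds are hypothetical values, not survey data.

```python
def single_linkage(points, n_clusters):
    """Agglomerative clustering of 1-D points, stopped at n_clusters groups."""
    clusters = [[p] for p in points]          # start with one cluster per object
    while len(clusters) > n_clusters:
        best = None                           # (distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of members across the two clusters.
                d = min(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical walking speeds in m/s.
speeds = [0.6, 0.7, 1.0, 1.1, 1.6, 1.7]
three_groups = single_linkage(speeds, 3)
two_groups = single_linkage(speeds, 2)
print(three_groups)
print(two_groups)
```

Stopping at three groups keeps the three pairs of similar speeds apart, while stopping at two merges the two slower pairs, which is the choice of level or scale described above.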
Lingra (1995) compared grouping of traffic pattern using the Hierarchical Agglomerative
Clustering and the Kohonen Neural Network methods in classifying traffic patterns. It has been
mentioned that the Kohonen neural network integrates the hierarchical grouping of complete
patterns and the least-mean-square approach for classifying incomplete patterns. It is advantageous
to use hierarchical grouping on a small subset of typical traffic patterns to determine the
appropriate number of groups and change its parameters to reflect the changing traffic patterns.
Such an approach is useful in using hour-to-hour and day-to-day traffic variations in addition to
the monthly traffic-volume variation in classifying highway sections.
Self-Organizing Map (SOM) Clustering
A Self-Organizing Map (SOM) is an Artificial Neural Network (ANN). It has the capability to learn
the pattern of its input and to find correlations between inputs and responses. An ANN may
therefore be applied to the clustering of speed data for the particular problem of defining the
LOS of urban streets. Levy et al. (1994) compared the ability of supervised and unsupervised
learning methods for classification and clustering. Garni and Abdennour (2008) developed a method
to detect and count the vehicles plying on a road from video data using an ANN. The authors
applied a self-organizing neural network pattern recognition method to classify highway traffic
states into a number of distinctive cluster centers.
Jian-ming (2010) devised a way to combine ANN and Genetic Algorithm methods for the prediction of
traffic volume in the Shanghai Metropolitan Area. This combined algorithm improved the accuracy of
future traffic volume prediction. Cetiner et al. (2010) developed a back-propagation neural
network traffic flow model for predicting the traffic volume of Istanbul city. The model uses
historical data at major junctions of the city to predict future traffic volume. Florio and
Mussone (1995) took advantage of the application of ANN to classification problems to develop the
flow-density relationship of a motorway. The authors defined the stability and instability of the
spacing of vehicles in the traffic stream. Murat and Basken (2006) used ANN to determine
non-uniform delay, which is part of the total vehicular delay at signalized intersections. Sharma
et al. (1994) studied and compared the learning ability of both supervised and unsupervised
learning methods for clustering.
Affinity Propagation (AP) clustering
Affinity propagation is a message-passing clustering method developed by Frey and Dueck (2007). It
initially considers all data points as potential center points. Messages are exchanged between
data points, each reflecting the current preference of one data point for choosing another as its
center point, also called an exemplar. Researchers have used this algorithm to solve various
clustering problems. Frey and Dueck (2007) used the AP algorithm to cluster images of faces and
genes in microarray data. They found that AP performed accurately and in about one-hundredth of
the time taken by other conventional clustering methods. Conroy and Xi (2009) developed a
semi-supervised AP algorithm for face-image clustering and functional Magnetic Resonance Imaging
(fMRI) volumetric pixel clustering. Xia et al. (2008) presented two variants of AP for grouping
large-scale data with a dense similarity matrix. The local approach was Partition Affinity
Propagation (PAP) and the global method was Landmark Affinity Propagation (LAP). Refianti et al.
(2012) compared the accuracy and effectiveness of the AP and K-means algorithms. They found that
AP was more effective than K-means when applying both algorithms to the relationship between two
variables, i.e. Grade Point Average (GPA) and duration of Bachelor-Thesis completion at Gunadarma
University. Zhang and Zhuang (2008) presented a modified AP algorithm called voting partition
affinity propagation (voting-PAP), which is a method of clustering using evidence accumulation.
Yang et al. (2010) used the AP clustering algorithm in traffic engineering. The authors proposed a
model-based temporal association scheme and novel pre-processing and post-processing operations
which, together with affinity propagation, make a successful method for vehicle detection and
traffic surveillance. Zhang et al. (2012) proposed an instant traffic clustering algorithm using
AP to find points on a road having similar traffic patterns. The authors found the algorithm to be
suitable for predicting the traffic pattern and for finding the influence of the traffic pattern
at one point on that at another point.
Chapter 3

METHODOLOGY

Pedestrian LOS is determined by two methods. One is the quantitative method, by cluster analysis
of pedestrian data (average speed, flow rate, v/c ratio and pedestrian space), and the other is
the qualitative method, by questionnaire survey based on pedestrian perceptions.
3.1 Methods of cluster analysis
The key steps in applying the methodology are:

Study area selection, video and field data

Cluster analysis algorithm (k-means, fuzzy c-means, hierarchical agglomerative clustering, SOM,
AP, GA-fuzzy)

Cluster validation (silhouette width index)

Determination of PLOS category ranges

K-means clustering algorithms


The following steps are followed in k-means clustering.
Step 1: Place K points into the space represented by the data that are being clustered. These
points represent the initial group centroids.
Step 2: Assign each object to the group whose centroid is closest to the object.
Step 3: Recalculate the positions of the K centroids after all objects have been assigned.
Step 4: Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of
the objects into groups.
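The four steps above can be sketched as follows. This is a minimal one-dimensional illustration, not the implementation used in the study: the initial centroids are simply picked at evenly spaced positions in the sorted data (an assumption made to keep the example deterministic), and the pedestrian space values are hypothetical.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster 1-D values into k groups; returns (centroids, groups)."""
    data = sorted(values)
    # Step 1: initial centroids, spread evenly over the sorted data (assumption).
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    groups = []
    for _ in range(max_iter):
        # Step 2: assign each object to the group with the closest centroid.
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # Step 3: recalculate centroids (an empty group keeps its old centroid).
        new_centroids = [sum(g) / len(g) if g else centroids[i]
                         for i, g in enumerate(groups)]
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups


# Hypothetical pedestrian space values in m^2/p.
space = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1, 6.0, 6.3, 5.9]
centroids, groups = kmeans_1d(space, k=3)
print(groups)
```

The boundaries between neighbouring groups can then be read off as candidate limits between PLOS categories.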
[Flow-chart: list of data to be clustered; assign number of clusters; centroid; object-to-centroid
distance; grouping of data based on minimum distance; if no object moves group, End]

Fig: Flow-chart for k-means clustering

Mathematically,
Step 1: From a data set of N points, the k-means algorithm allocates each data point to one of c
clusters so as to minimize the within-cluster sum of squares.
$$D_{ik}^2 = (x_k - v_i)^T (x_k - v_i), \quad 1 \le i \le c, \; 1 \le k \le N$$
Where,
$D_{ik}^2$ is the matrix of squared distances between the data points and the cluster centers,
$x_k$ is the kth data point,
$v_i$ is the mean of the data points in cluster i, called the cluster center.

Step 2: Assign to each cluster the points that have the minimal distance from its centroid.
Step 3: Calculate the cluster centers
$$v_i^{(l)} = \frac{\sum_{j=1}^{N_i} x_j}{N_i}$$
and repeat Steps 2 and 3 while $\max \left| v^{(l)} - v^{(l-1)} \right| \ne 0$.
Where, $N_i$ is the number of objects in cluster i, $x_j$ is the jth object in cluster i, and l is the iteration number.
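The four steps above can be sketched in plain Python as follows. This is only a minimal illustration, not the implementation that will be used in the study: the function names, the random choice of initial centroids and the six sample observations (speed in m/min, space in m²/ped) are all hypothetical.

```python
import random


def squared_distance(x, v):
    """D^2 = (x - v)^T (x - v) for two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def k_means(points, c, max_iter=100, seed=0):
    """Cluster `points` into `c` groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: use c distinct data points as the initial centroids (an assumption).
    centers = [list(p) for p in rng.sample(points, c)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        labels = [
            min(range(c), key=lambda i: squared_distance(p, centers[i]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centers = []
        for i in range(c):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if not members:  # keep the old center if a cluster becomes empty
                new_centers.append(centers[i])
                continue
            dim = len(members[0])
            new_centers.append(
                [sum(m[d] for m in members) / len(members) for d in range(dim)]
            )
        # Step 4: stop when no centroid has moved.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical (speed, space) observations, for illustration only.
data = [(80.0, 5.0), (82.0, 5.5), (79.0, 4.8), (50.0, 1.2), (52.0, 1.0), (48.0, 1.1)]
centers, labels = k_means(data, c=2)
print(centers, labels)
```

In practice the variables would normally be scaled before clustering, since the distance is dominated by whichever variable has the largest numerical range.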
Fuzzy c-means clustering
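The fuzzy c-means clustering algorithm is based on the minimization of an objective function
called the c-means functional:
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^m \, \lVert x_i - c_j \rVert^2, \quad 1 < m < \infty$$
Here,
m = any real number greater than 1
N = number of data points and C = number of clusters
$u_{ij}$ = degree of membership of $x_i$ in cluster j
$x_i$ = ith of the d-dimensional measured data
$c_j$ = d-dimensional center of cluster j
$\lVert \cdot \rVert$ is any norm expressing the similarity between a measured data point and a center.
Fuzzy partitioning is carried out through an iterative optimization of the objective function $J_m$, with
the membership $u_{ij}$ and the cluster centers $c_j$ updated by:
$$u_{ij} = \frac{1}{\sum_{l=1}^{C} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_l \rVert} \right)^{2/(m-1)}}, \qquad c_j = \frac{\sum_{i=1}^{N} u_{ij}^m \, x_i}{\sum_{i=1}^{N} u_{ij}^m}$$
This iteration will stop when
$$\max_{ij} \left| u_{ij}^{(k+1)} - u_{ij}^{(k)} \right| < \varepsilon$$
Where
ε = termination criterion between 0 and 1,
k = iteration step.
This procedure converges to a local minimum or a saddle point of $J_m$.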

The fuzzy c-means algorithm is as follows:

Step 1: Initialize the membership matrix U = [uij], U(0).
Step 2: At step k, calculate the center vectors C(k) = [cj] with U(k).
Step 3: Update U(k) to U(k+1).
Step 4: If || U(k+1) - U(k) || < ε then STOP; otherwise return to Step 2.
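A minimal sketch of these four steps is given below, using the standard fuzzy c-means update rules with the Euclidean norm. The function name, the random initialisation of U, the default values of m and ε, and the sample data are assumptions made for illustration only.

```python
import random


def fuzzy_c_means(points, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Return (centers, U); U[i][j] is the membership of point i in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random membership matrix U(0), each row normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(max_iter):
        # Step 2: centers are membership-weighted means of the data.
        centers = []
        for j in range(c):
            weights = [U[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / wsum
                 for d in range(dim)]
            )
        # Step 3: update the memberships from the distances to the centers.
        new_U = []
        for p in points:
            dist = [
                max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, 1e-12)
                for ctr in centers
            ]
            new_U.append(
                [1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0)) for l in range(c))
                 for j in range(c)]
            )
        # Step 4: stop when the largest membership change is below eps.
        change = max(abs(new_U[i][j] - U[i][j]) for i in range(n) for j in range(c))
        U = new_U
        if change < eps:
            break
    return centers, U


# Hypothetical (speed, space) observations, for illustration only.
data = [(80.0, 5.0), (82.0, 5.5), (79.0, 4.8), (50.0, 1.2), (52.0, 1.0), (48.0, 1.1)]
centers, U = fuzzy_c_means(data, c=2)
hard_labels = [max(range(2), key=lambda j: row[j]) for row in U]
print(hard_labels)
```

Taking the cluster with the largest membership for each point, as in `hard_labels`, is one simple way to turn the fuzzy partition into crisp PLOS categories.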
Hierarchical Agglomerative Clustering Algorithm:
Step 1: Find the similarity or dissimilarity between every pair of objects in the data set. Here, we
calculate the distance between objects using the distance function.
Euclidean distance is given by,
$$D(i,j) = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \dots + (x_{ip} - x_{jp})^2}$$
City block distance or Manhattan distance is given by,
$$D(i,j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \dots + |x_{ip} - x_{jp}|$$
Minkowski distance, which is a generalization of both the Euclidean and Manhattan distances, is given
by,
$$D(i,j) = \left( |x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \dots + |x_{ip} - x_{jp}|^q \right)^{1/q}$$
Here, q ≥ 1; for q = 1 this distance is the Manhattan distance and for q = 2 it is the
Euclidean distance.
Step 2: Group the objects into a binary, hierarchical cluster tree. Here, we link together pairs of
objects that are in close proximity using the linkage function. The linkage function uses the
distance information generated in step 1 to determine the proximity of objects to each other. As
objects are paired into binary clusters, the newly formed clusters are grouped into larger clusters
until a hierarchical tree is formed.
Step 3: Determine where to divide the hierarchical tree into clusters. Here, we divide the objects
in the hierarchical tree into clusters using the cluster function. The cluster function can create
clusters by detecting natural groupings in the hierarchical tree or by cutting off the hierarchical
tree at an arbitrary point.
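The three steps can be illustrated with the short sketch below. The text does not fix a linkage function, so average linkage is assumed here; the tree is "cut" simply by stopping the merging once a requested number of clusters remains. Function names and sample data are hypothetical.

```python
def minkowski(x, y, q=2):
    """Minkowski distance; q=1 gives Manhattan, q=2 gives Euclidean."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def agglomerative(points, n_clusters, q=2):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    # Step 1: pairwise distances between all objects.
    n = len(points)
    dist = [[minkowski(points[i], points[j], q) for j in range(n)] for i in range(n)]
    # Start with every object in its own cluster.
    clusters = [[i] for i in range(n)]
    # Steps 2 and 3: merge the closest pair of clusters until n_clusters remain,
    # which is equivalent to cutting the tree at that level.
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                link = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical (speed, space) observations, for illustration only.
data = [(80.0, 5.0), (82.0, 5.5), (79.0, 4.8), (50.0, 1.2), (52.0, 1.0), (48.0, 1.1)]
groups = agglomerative(data, n_clusters=2)
print(groups)
```

Passing `q=1` to `agglomerative` switches the same procedure to the Manhattan distance.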

SOM Algorithm:
In SOM, a set of nodes is arranged in a geometric pattern, typically a 2-dimensional lattice.
The arrangement of neurons may follow a grid, hexagonal or random topology. Each node is associated
with a weight vector of the same dimension as the input space. The purpose of the SOM is to
find a good mapping from the input space to the lattice. During training, each input vector is
presented to the map and compared with the weight vectors of the nodes. Clustering with the SOM
algorithm follows two steps.
Step 1:
• Compare the input data with all the weight vectors mi(t).
• Identify the Best Matching Unit (BMU) on the map. The BMU is the node having the lowest
Euclidean distance with respect to the input pattern x(t). The final topological organization of
the map is heavily influenced by this distance.
The BMU $m_c(t)$ is identified by:
$$\lVert x(t) - m_c(t) \rVert \le \lVert x(t) - m_i(t) \rVert \quad \text{for all } i$$
Step 2: Update the weight vectors of the BMU and its neighbours as:
$$m_i(t+1) = m_i(t) + h_{b(x),i}(t) \, \left( x(t) - m_i(t) \right)$$
where $h_{b(x),i}$ is the neighbourhood function defined as:
$$h_{b(x),i}(t) = \alpha(t) \exp\left( -\frac{\lVert r_i - r_{b(x)} \rVert^2}{2\sigma^2(t)} \right)$$
Here, 0 < α(t) < 1 is the learning factor, which decreases with each iteration, σ(t) is the width of
the neighbourhood, and $r_i$ and $r_{b(x)}$ are the locations of neuron i and of the BMU in the lattice.
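The two training steps can be sketched as below. This is an illustrative toy version, not the procedure fixed for the study: the 1 x 3 lattice, the linear decay of the learning factor and neighbourhood width, the initialisation from random data points and all names are assumptions. The learning factor is applied once, inside the Gaussian neighbourhood function.

```python
import math
import random


def best_matching_unit(weights, x):
    """Step 1: the BMU is the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(x, weights[node]))


def train_som(data, rows=1, cols=3, n_iter=200, alpha0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Each node has a lattice position (row, col) and a weight vector,
    # initialised here from a randomly chosen data point.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for t in range(n_iter):
        frac = 1.0 - t / n_iter
        alpha = alpha0 * frac            # learning factor, decreasing with t
        sigma = max(sigma0 * frac, 0.1)  # neighbourhood width, decreasing with t
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        # Step 2: pull the BMU and its lattice neighbours towards x.
        for node, m in weights.items():
            lattice_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = alpha * math.exp(-lattice_d2 / (2.0 * sigma ** 2))
            for d in range(dim):
                m[d] += h * (x[d] - m[d])
    return weights


# Hypothetical (speed, space) observations, for illustration only.
data = [(80.0, 5.0), (82.0, 5.5), (79.0, 4.8), (50.0, 1.2), (52.0, 1.0), (48.0, 1.1)]
som = train_som(data)
clusters = [best_matching_unit(som, x) for x in data]
print(clusters)
```

After training, each observation is assigned to the node that is its BMU, so the nodes of the map play the role of clusters.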

Affinity Propagation (AP) Algorithm:


Step 1: Input the similarity matrix s(i,k).
Step 2: Initialize the availabilities a(i,k) to zero, i.e. a(i,k) = 0.
Step 3: Update all responsibilities r(i,k).
Step 4: Update all availabilities a(i,k).
Step 5: Add the availability and responsibility matrices to monitor the exemplar decisions. For
a particular data point i, a(i,k) + r(i,k) > 0 identifies an exemplar.
Step 6: If the decisions made in Step 5 have not changed for a certain number of iterations, or a
fixed number of iterations has been reached, stop. Otherwise, go to Step 3.

Fig: Flowchart of AP clustering (start; construct similarity matrix; set availability A = 0; update responsibility R; update availability A; compute E = A + R; exemplar identified when E > 0; if there is no change in decision, end)
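The message-passing loop can be sketched as follows, using the standard responsibility and availability update rules of affinity propagation with damping. Several choices here are assumptions for illustration: the similarity is the negative squared Euclidean distance, the preference on the diagonal is the median similarity, the damping factor is 0.5, and a point k is treated as an exemplar when a(k,k) + r(k,k) > 0.

```python
import statistics


def affinity_propagation(S, damping=0.5, max_iter=200, stable_iter=15):
    """S: n x n similarity matrix whose diagonal holds the preferences.
    Returns, for every point, the index of the exemplar it is assigned to."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]  # Step 2: availabilities start at zero
    exemplars, previous, unchanged = [], None, 0
    for _ in range(max_iter):
        # Step 3: r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Step 4: a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k});
        #         a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            for i in range(n):
                support = sum(max(0.0, R[j][k]) for j in range(n) if j not in (i, k))
                target = support if i == k else min(0.0, R[k][k] + support)
                A[i][k] = damping * A[i][k] + (1 - damping) * target
        # Step 5: point k is an exemplar when a(k,k) + r(k,k) > 0.
        exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
        # Step 6: stop when the exemplar set has been stable for a while.
        unchanged = unchanged + 1 if exemplars == previous else 0
        previous = exemplars
        if exemplars and unchanged >= stable_iter:
            break
    if not exemplars:  # no decision reached: every point stays on its own
        return list(range(n))
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


# Hypothetical (speed, space) observations, for illustration only.
data = [(80.0, 5.0), (82.0, 5.5), (79.0, 4.8), (50.0, 1.2), (52.0, 1.0), (48.0, 1.1)]
n = len(data)
S = [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in data] for p in data]
preference = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
for i in range(n):
    S[i][i] = preference
labels = affinity_propagation(S)
print(labels)
```

Unlike k-means, the number of clusters is not given in advance; it emerges from the preference value, so a lower preference generally yields fewer exemplars.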
GA-Fuzzy Algorithm:
Genetic algorithms are search algorithms based on the concepts of natural selection and
natural genetics. The genetic algorithm was developed to simulate some of the processes observed in
natural evolution, a process that operates on chromosomes (organic devices for encoding the
structure of living beings). The genetic algorithm differs from other search methods in that it
searches among a population of points and works with a coding of the parameter set rather than the
parameter values themselves. It also uses objective function information without any gradient
information, and its transition scheme is probabilistic, whereas traditional methods use gradient
information. Because of these features, genetic algorithms are used as general-purpose
optimization algorithms. They also provide a means to search irregular spaces and hence are
applied to a variety of function optimization, parameter estimation and machine learning
applications.
The quality of a clustering result is determined by the sum of distances from the objects to the
centers of the clusters, weighted by the corresponding membership values:
$$J = \sum_{k=1}^{c} \sum_{i=1}^{N} (\mu_{ki})^m \, d(v_k, x_i)$$
Here,
d(vk, xi) = Euclidean distance between object xi = (xi1, xi2, …, xin) and cluster center
vk = (vk1, vk2, …, vkn)
μki = membership value of object xi in cluster k
c = number of clusters and N = number of objects
m ∈ (1, ∞) is the exponential weight that determines the fuzziness of the clusters
The local minimum obtained with the fuzzy c-means algorithm often differs from the global
minimum. Because of the large volume of calculation, searching for the global minimum of the
function J is difficult. GA, which uses survival of the fittest, gives good results for such
optimization problems.
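One possible way of combining the two ideas is sketched below: each individual in the population is a complete set of cluster centers, and its fitness is the objective J evaluated with fuzzy c-means memberships. The encoding is an assumption made for brevity (real-valued genes rather than a binary coding), as are the tournament selection, uniform crossover, Gaussian mutation, elitism, parameter values and all names.

```python
import math
import random


def memberships(centers, x, m=2.0):
    """Fuzzy c-means membership of point x in every cluster, given the centers."""
    d = [max(math.dist(x, v), 1e-12) for v in centers]
    return [1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in d) for dk in d]


def objective(centers, points, m=2.0):
    """J = sum over clusters k and objects i of (mu_ki)^m * d(v_k, x_i)."""
    total = 0.0
    for x in points:
        mu = memberships(centers, x, m)
        total += sum(mu[k] ** m * math.dist(centers[k], x) for k in range(len(centers)))
    return total


def ga_fuzzy(points, c, pop_size=20, generations=40, seed=0):
    """Search for cluster centers minimising J; returns (centers, J, history)."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    population = [
        [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(c)]
        for _ in range(pop_size)
    ]
    history = []  # best J found in each generation
    for _ in range(generations):
        population.sort(key=lambda ind: objective(ind, points))
        history.append(objective(population[0], points))
        next_gen = [population[0]]  # elitism: the best individual always survives
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=lambda ind: objective(ind, points))
            p2 = min(rng.sample(population, 3), key=lambda ind: objective(ind, points))
            # Uniform crossover gene by gene, followed by Gaussian mutation.
            child = []
            for k in range(c):
                gene = []
                for d in range(dim):
                    value = p1[k][d] if rng.random() < 0.5 else p2[k][d]
                    if rng.random() < 0.1:
                        value += rng.gauss(0.0, 0.05 * (hi[d] - lo[d] or 1.0))
                    gene.append(value)
                child.append(gene)
            next_gen.append(child)
        population = next_gen
    best = min(population, key=lambda ind: objective(ind, points))
    return best, objective(best, points), history


# Hypothetical (speed, space) observations, for illustration only.
data = [(80.0, 5.0), (82.0, 5.5), (79.0, 4.8), (50.0, 1.2), (52.0, 1.0), (48.0, 1.1)]
best_centers, best_j, history = ga_fuzzy(data, c=2)
print(best_centers, best_j)
```

Because the best individual is carried over unchanged, the best value of J cannot get worse from one generation to the next, which is the property the elitism step is there to guarantee.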
3.2 Cluster validation and selection of best cluster:
From the above six clustering methods, the method most relevant to cities in the Nepalese context
has to be determined; this will be evaluated with the help of the Silhouette Width Index.
The Silhouette Width Index was proposed by Rousseeuw (1987) to evaluate clustering results.
The silhouette width s(i) is a composite index which reflects the compactness and separation of
the clusters. The average s(i) over all data points reflects the quality of the clustering result. A
larger silhouette value signifies a better clustering.
Silhouette width is calculated as follows:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$
Where,
a(i) = average distance of data point i to the other data points in the same cluster
b(i) = average distance of data point i to all the data points belonging to the nearest
cluster

The silhouette ranges from -1 to 1. A high value indicates that the object is well matched to its
own cluster and poorly matched to neighboring clusters. If most objects have a high value, then
the clustering configuration is appropriate. If many points have a low or negative value, then the
clustering configuration may have too many or too few clusters.
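As an illustration, the silhouette width can be computed directly from its definition as in the sketch below. The function name, the use of Euclidean distance, the convention of assigning s(i) = 0 to a point that is alone in its cluster, and the sample data and labellings are assumptions.

```python
import math


def silhouette_widths(points, labels):
    """Return s(i) for every point; needs at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            widths.append(0.0)
            continue
        # a(i): mean distance to the other points of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b(i): smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        widths.append((b - a) / max(a, b))
    return widths


# Hypothetical (speed, space) observations and two candidate labellings.
data = [(80.0, 5.0), (82.0, 5.5), (79.0, 4.8), (50.0, 1.2), (52.0, 1.0), (48.0, 1.1)]
good = silhouette_widths(data, [0, 0, 0, 1, 1, 1])
poor = silhouette_widths(data, [0, 1, 0, 1, 0, 1])
avg_good = sum(good) / len(good)
avg_poor = sum(poor) / len(poor)
print(avg_good, avg_poor)
```

Comparing the average width across the six algorithms, and across candidate numbers of clusters, is how the index would be used to pick the best configuration.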
3.3 Questionnaire Form Survey:
The key steps in applying the methodology are:
1. Study area selection and questionnaire survey to gather qualitative data
2. Factor analysis (KMO test)
3. Reliability test (Cronbach's alpha)
4. PLOS modelling

3.4 Factor analysis (KMO test)


Factor analysis involves two steps, as follows:
1. Correlation matrix computation for all variables
2. Determination of the number of factors necessary to represent the data
The Kaiser-Meyer-Olkin (KMO) test is a measure of how suited the data are for factor analysis. The
test measures sampling adequacy for each variable in the model and for the complete model. The
statistic is a measure of the proportion of variance among variables that might be common
variance. The higher the proportion, the more suited the data are to factor analysis. KMO returns
values between 0 and 1. KMO values between 0.8 and 1 indicate that the sampling is adequate. KMO
values less than 0.6 indicate that the sampling is not adequate and that remedial action should be
taken, although some authors put this threshold at 0.5.

For reference, Kaiser put the following values on the results:


KMO value       Sampling adequacy
0.00 to 0.49    unacceptable
0.50 to 0.59    miserable
0.60 to 0.69    mediocre
0.70 to 0.79    middling
0.80 to 0.89    meritorious
0.90 to 1.00    marvelous

The formula for the KMO test is:
$$KMO_j = \frac{\sum_{i \ne j} r_{ij}^2}{\sum_{i \ne j} r_{ij}^2 + \sum_{i \ne j} u_{ij}^2}$$
Where
R = [rij] is the correlation matrix and
U = [uij] is the partial covariance matrix.
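The formula can be evaluated as in the following sketch. It assumes that the uij are the partial correlations obtained from the inverse of the correlation matrix, which is a common way of computing the statistic; the helper names and the eight hypothetical respondents scoring three items on the 1 to 6 scale are for illustration only.

```python
def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (one row per respondent)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    dev = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(d[a] * d[b] for d in dev) for b in range(p)] for a in range(p)]
    return [[cov[a][b] / (cov[a][a] * cov[b][b]) ** 0.5 for b in range(p)]
            for a in range(p)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(rows):
    """Return (overall KMO, list of per-variable KMO_j)."""
    R = correlation_matrix(rows)
    P = invert(R)
    p = len(R)
    # Assumed definition: u_ij is the partial correlation between items i and j.
    U = [[-P[i][j] / (P[i][i] * P[j][j]) ** 0.5 for j in range(p)] for i in range(p)]
    per_variable = []
    r_total = u_total = 0.0
    for j in range(p):
        r2 = sum(R[i][j] ** 2 for i in range(p) if i != j)
        u2 = sum(U[i][j] ** 2 for i in range(p) if i != j)
        per_variable.append(r2 / (r2 + u2))
        r_total += r2
        u_total += u2
    return r_total / (r_total + u_total), per_variable


# Hypothetical questionnaire scores (1 to 6), for illustration only.
responses = [(1, 2, 1), (2, 2, 3), (3, 4, 3), (4, 3, 5),
             (5, 5, 4), (6, 5, 6), (2, 1, 2), (5, 6, 5)]
overall, per_variable = kmo(responses)
print(overall, per_variable)
```

The overall value would then be read against Kaiser's scale, and items with a low individual value would be candidates for removal before factor analysis.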

3.5 Reliability test (Cronbach’s Alpha)


Cronbach’s alpha, α (or coefficient alpha), developed by Lee Cronbach in 1951, is a measure of
reliability or internal consistency. “Reliability” is how well a test measures what it should.
Cronbach’s alpha tests whether multiple-question Likert scale surveys are reliable. These
questions measure latent variables (hidden or unobservable variables such as a person’s
conscientiousness, neurosis or openness) that are very difficult to measure in real life. Cronbach’s
alpha indicates whether the designed test measures the variable of interest consistently.
The internal consistency is examined to ensure, at a certain level, that the scale (1-6) used for
measuring the relative significance of the questionnaire items yields the same result over time.
The formula for Cronbach’s alpha is:
$$\alpha = \frac{N \cdot \bar{c}}{\bar{v} + (N - 1) \cdot \bar{c}}$$
Where:
α = Cronbach’s alpha
N = the number of items
c̄ = average covariance between item-pairs
v̄ = average variance

Cronbach’s alpha    Internal consistency
α >= 0.9            Excellent
0.9 > α >= 0.8      Good
0.8 > α >= 0.7      Acceptable
0.7 > α >= 0.6      Questionable
0.6 > α >= 0.5      Poor
0.5 > α             Unacceptable
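A small sketch of the calculation, following the formula above term by term, is given here. The function name, the use of sample (n - 1) variances and covariances, and the hypothetical responses are assumptions.

```python
def cronbach_alpha(rows):
    """alpha = N * c_bar / (v_bar + (N - 1) * c_bar); `rows` holds one
    tuple of item scores per respondent."""
    n_resp, n_items = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n_resp for j in range(n_items)]

    def cov(a, b):
        # Sample covariance between items a and b (variance when a == b).
        return sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n_resp - 1)

    # v_bar: average item variance.
    v_bar = sum(cov(j, j) for j in range(n_items)) / n_items
    # c_bar: average covariance over all distinct item pairs.
    pairs = [(a, b) for a in range(n_items) for b in range(a + 1, n_items)]
    c_bar = sum(cov(a, b) for a, b in pairs) / len(pairs)
    return n_items * c_bar / (v_bar + (n_items - 1) * c_bar)


# Hypothetical questionnaire scores (1 to 6), for illustration only.
responses = [(1, 2, 1), (2, 2, 3), (3, 4, 3), (4, 3, 5),
             (5, 5, 4), (6, 5, 6), (2, 1, 2), (5, 6, 5)]
alpha = cronbach_alpha(responses)
# Items that always agree perfectly give alpha = 1.
identical = cronbach_alpha([(1, 1, 1), (2, 2, 2), (4, 4, 4)])
print(alpha, identical)
```

The resulting value would be read against the table above to judge whether the questionnaire items can be combined into a single scale.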

3.6 PLOS model based upon perception of pedestrian


A linear relationship will be formed between PLOS and the data from the questionnaire by the
inverse variance method:
Y = a0 + a1X1 + a2X2 + a3X3 + a4X4 + a5X5
Where
Y = PLOS
X1 = safety condition
X2 = comfort condition
X3 = obstruction condition
X4 = accessibility condition
X5 = environmental condition
a0, a1, a2, a3, a4, a5 are coefficients
After the model is formed, the range of PLOS will be determined from the questionnaire survey.
For this, the PLOS values for the best and worst conditions, ymax and ymin, will be obtained, and
the interval between them will be divided appropriately to obtain the PLOS ranges from A to F.
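The range-setting step can be illustrated with the sketch below. It is only one possible reading of "appropriate interval": six equal-width bands are assumed, with A assigned to the band nearest the best-condition score. The coefficients are hypothetical placeholders, not estimated values, and the assumption that a higher Y means a better condition is made purely for the example.

```python
def plos_score(coefficients, x):
    """Y = a0 + a1*X1 + ... + a5*X5."""
    a0, slopes = coefficients[0], coefficients[1:]
    return a0 + sum(a * xi for a, xi in zip(slopes, x))


def plos_ranges(y_best, y_worst, grades="ABCDEF"):
    """Split the interval between the best and worst scores into equal bands."""
    n = len(grades)
    span = y_worst - y_best
    return [
        (g, y_best + span * k / n, y_best + span * (k + 1) / n)
        for k, g in enumerate(grades)
    ]


def grade_for(y, ranges):
    """Return the first grade whose band contains y."""
    for g, start, end in ranges:
        lower, upper = sorted((start, end))
        if lower <= y <= upper:
            return g
    raise ValueError("score outside the modelled range")


# Hypothetical coefficients a0..a5; real values would come from the survey data.
coeffs = (0.0, 0.25, 0.25, 0.25, 0.125, 0.125)
y_max = plos_score(coeffs, (6, 6, 6, 6, 6))  # assumed best condition
y_min = plos_score(coeffs, (1, 1, 1, 1, 1))  # assumed worst condition
ranges = plos_ranges(y_best=y_max, y_worst=y_min)
print(ranges)
print(grade_for(3.0, ranges))
```

If the questionnaire wording means that lower scores are better, the same functions apply with the two arguments of `plos_ranges` swapped.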
Chapter 4

STUDY AREA and DATA COLLECTION

4.1 Study Area:


The study area will be the sidewalks of important areas of Kathmandu city. A map of
Kathmandu city will be taken from Google Maps to study the locations before data collection.
The sidewalks selected will be heavily used ones, such as those in school, market, office and
hospital areas.
4.2 Data Collection:
Three techniques will be used for data collection. The first is the videographic technique, which
involves collecting data such as the speed and flow of pedestrians from video. A large sample of
pedestrians will be observed at various sidewalk locations over about 2 months, in both peak and
off-peak hours of working and non-working days. After collection, the video clips will be loaded
onto a computer to extract data such as the 15-min peak flow rate and average pedestrian speed in
each segment of sidewalk. Flow rate, volume/capacity ratio and average pedestrian space will then
be calculated from the video data.
The second is field observation for geometric data. The total width of each sidewalk used for the
study will be measured. Obstructions will also be observed, and the effective width of the
sidewalk will be calculated using HCM 2010. A stretch of 6 m will be marked on every sidewalk
under study to observe the flow of pedestrians. The land use pattern of each area will also be
observed for analysis purposes.
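As an illustration of how the four clustering variables could be derived from the video counts and the marked stretch, a small sketch follows. It assumes the usual relations in which the unit flow rate is the peak 15-min count divided by 15 minutes and the effective width, and pedestrian space is average speed divided by unit flow rate. The capacity of 75 ped/min/m is an assumed default, and the function name and sample numbers are hypothetical.

```python
def pedestrian_measures(count_15min, effective_width_m, travel_times_s,
                        trap_length_m=6.0, capacity_p_min_m=75.0):
    """Return (speed in m/min, flow in ped/min/m, v/c ratio, space in m^2/ped)."""
    # Average speed over the marked stretch (mean of the individual speeds).
    speeds = [trap_length_m / t * 60.0 for t in travel_times_s]
    avg_speed = sum(speeds) / len(speeds)
    # Unit flow rate: peak 15-min count per minute per metre of effective width.
    flow_rate = count_15min / (15.0 * effective_width_m)
    # Volume/capacity ratio against the assumed capacity.
    vc_ratio = flow_rate / capacity_p_min_m
    # Pedestrian space: average speed divided by unit flow rate.
    space = avg_speed / flow_rate
    return avg_speed, flow_rate, vc_ratio, space


# Hypothetical observation: 450 pedestrians in the peak 15 min on a sidewalk with
# 2.0 m effective width, and two pedestrians timed over the 6 m stretch.
speed, flow, vc, space = pedestrian_measures(450, 2.0, [5.0, 6.0])
print(speed, flow, vc, space)
```

Each sidewalk segment and time period would yield one such set of values, which together form the data set passed to the clustering algorithms.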
The third technique is the questionnaire form, in which data are collected based on pedestrian
perception. Questionnaires will be developed in different perception areas, namely safety, comfort,
obstruction, accessibility and environmental condition, that could potentially affect sidewalk
performance. Every question will be scored from 1 to 6 (1 represents strongly agree and 6
represents strongly disagree), and the form will be filled in according to the pedestrian's
perception in real time by the interview technique.
Chapter 5

RESULT and ANALYSIS


Chapter 6

SUMMARY AND CONCLUSION


REFERENCES
