A Regression Analysis On The Determinants of Crime Rates Across Philippine Provinces

A study presented to Ms. Angela D. Nalica Professor, Stat 136
In partial fulfilment of the requirements for STAT 136: Introduction to Regression Analysis University of the Philippines, Diliman, Quezon City
CRUZ, Clemence-Fatima MACARAIG, Miguel Rodrigo SANTOS, Marvin Allan
May 28, 2010
Abstract
This study focuses on the provincial crime rate in the Philippines for the year 2000. The aim is to ascertain the possible factors that affect the crime rate in the Philippines using multiple linear regression. The results show that, at a 0.05 level of significance, the following variables contribute to crime rate: population density, poverty incidence, number of policemen, and number of courts.
Introduction
Crime is a truth that exists for all, whether it is taken as a moral or legal construct. This is a truth that we would have to accept, no matter how appalling it may seem. Many efforts have been exerted in order to eradicate crime. Unfortunately, a world or a country without crime is strictly utopian. As such, an existence without crime is impossible to achieve. For us to be able to eradicate the concept of crime, we must first remove the concepts it violates, namely morals and laws, which is, again, a utopian task. This would be a discourse meant for philosophical minds, and would therefore be beyond our concerns. Thus, the best course of action would be to deter or lessen the crime incidence or lower the crime rate that prevails in our country.
Here in the Philippines, crime is one of the foremost problems. However, in the midst of more urgent problems such as poverty, corruption, and hunger, crime loses most of its significance and is relegated to the bottom of the Philippines' long list of problems. The solution to crime deterrence becomes limited to debates on revising punishments for crimes and reinstating the death penalty, a punishment which has no proven effect of deterring crime. What country officials fail to recognize is that band-aid solutions such as imposing severe punishments do not work on large-scale problems such as this. If so, what do we have to do in order to address this problem?
To any problem, the solution is to ascertain its true cause and attack it from its roots. This may sound easy and simple enough; however in our country where problems are more tangled than politics, this would be a complicated task. In most cases, people do not seem to know for certain which problem causes which. For example: is the Philippines poor because there is a high incidence of corruption? Or is the Philippines corrupt because there is poverty? The same goes for crime rate. Is there a high rate of crime because the Philippines is poor? Or is it that the Philippines is poor because there is a high crime rate? Here lies the dilemma.
If the possible factors that affect crime rate are correctly or sufficiently identified, a more feasible solution, by means of addressing or alleviating these factors, may be formulated. And thus, we ask this question: what are the possible factors that affect crime rate here in the Philippines, and by how much do these factors affect crime rate? It is in hopes of answering these questions that this study was conducted.
This study aims to ascertain the possible factors that affect the crime rate here in the Philippines, and to provide an estimate regarding the impact that these factors present through the use of multiple linear regression. This study will focus on crime rate in the Philippines using provincial data from the National Statistical Coordination Board (NSCB). All data retrieved were from the year 2000.
Definition of Terms
Crime Rate** - the number of reported crimes per 100,000 population.

Cohort Survival Rate - the percentage of enrollees at the beginning grade or year in a given school year who reached the final grade or year of the elementary or secondary level.

Consumer Price Index (CPI) - indicator of the change in the average prices of a fixed basket of goods and services commonly purchased by households relative to a base year.

Enrolment - total number of pupils/students who register/enlist in a school year.

Family Income - includes primary income and receipts from other sources received by all family members during the calendar year as participants in any economic activity or as recipients of transfers, pensions, grants, etc. (2000 FIES, NSO). Primary income includes: salaries and wages from employment; commissions, tips, bonuses, family and clothing allowance, transportation and representation allowance and honoraria; other forms of compensation and net receipts derived from the operation of family-operated enterprises/activities and the practice of profession or trade. Income from other sources includes: imputed rental values of owner-occupied dwelling units; interests; rentals, including the landowner's share of agricultural products; pensions; support and value of food and non-food items received as gifts by the family (as well as the imputed value of services rendered free of charge to the family); receipts from family sustenance activities, which are not considered as family-operated enterprise.

Family Expenditures - refers to the expenses or disbursements made by the family purely for personal consumption during the reference period. They exclude all expenses in relation to farm or business operations, investment ventures, purchase of real property and other disbursements which do not involve personal consumption. Gifts, support, assistance or relief in goods and services received by the family from friends, relatives, etc. and consumed during the reference period are included in the family expenditures. Value consumed from net share of crops, fruits and vegetables produced or livestock raised by other households, family sustenance and entrepreneurial activities are also considered as family expenditures.

Functional Literacy - represents a significantly higher level of literacy which includes not only reading and writing skills but also numeracy skills. This skill must be sufficiently advanced to enable the individual to participate fully and effectively in activities commonly occurring in his life situation that require a reasonable capability of communicating by written language.

Gini Ratio - the ratio of the area between the Lorenz curve and the diagonal (the line of perfect equality) to the area below the diagonal. Note: It is a measure of the extent to which the distribution of income/expenditure among families/individuals deviates from a perfectly equal distribution, with limits 0 for perfect equality and 1 for perfect inequality.

Gross Regional Domestic Product - aggregate of the gross value added or income from each industry or economic activity of the regional economy.

Human Development Index - a measure of how well a country has performed, not only in terms of real income growth, but also in terms of social indicators of people's ability to lead a long and healthy life, to acquire knowledge and skills, and to have access to the resources needed to afford a decent standard of living.

Literacy Rate, Simple/Basic - the percentage of the population 10 years old and over who can read, write and understand simple messages in any language or dialect.

Population Density - refers to the number of persons per unit of land area (usually in square kilometers). This measure is more meaningful if given as population per unit of arable land.

Poverty Incidence - the proportion of families/individuals with per capita income/expenditure less than the per capita poverty threshold to the total number of families/individuals.

Province - the largest unit in the political structure of the Philippines. It consists, in varying numbers, of municipalities and, in some cases, of component cities. Its functions and duties in relation to its component cities and municipalities are generally coordinative and supervisory.

Social Services - this covers expenditures for education, health, social security, labor and employment, housing and community development and other social activities.

Unemployment Rate - proportion in percent of the total number of unemployed persons to the total number of persons in the labor force.
__________________
National Statistical Coordination Board. (2009). Philippine statistical yearbook. (2009 edition). Makati City, Philippines: Author.
**Note that crime rate = (total crime incidence/population)*100,000. 100,000 is a magnifier, and as such any power of ten may be used. Usually, 1,000 and 100,000 are used as magnifiers.
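The note above can be illustrated with a short sketch. The function name and the province figures below are hypothetical and only show how the choice of magnifier changes the scale of the rate, not its meaning.

```python
def crime_rate(reported_crimes, population, per=100_000):
    """Reported crimes per `per` persons (the magnifier in the note above)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return reported_crimes / population * per


# Hypothetical province: 1,250 reported crimes among 500,000 residents.
rate_per_100k = crime_rate(1_250, 500_000)            # about 250 per 100,000
rate_per_1k = crime_rate(1_250, 500_000, per=1_000)   # about 2.5 per 1,000
```

Both numbers describe the same province; only the unit of expression differs.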
Review of Related Literature
Crime was previously viewed as a moral and social construct. Over the years, however, there has been a shift from a social point of view to an economic one. At the forefront of this economic view on crime is Gary Becker's work "Crime and Punishment: An Economic Approach," published in 1968. Here, Becker views crime as an economic construct, one that presents opportunity and economic costs and has a supply of offenses. He further asserts that "some persons become criminals...not because their basic motivation differs...but because their benefits and costs differ." Thus, according to Becker, crime is not entirely psychological or social, but economic, in which choices and utility are of importance.
Ehrlich ("On the Relation Between Education and Crime," 1975) attempts to establish a link between education and crime, again from an economic perspective. In his work, Ehrlich states that education may be viewed as an opportunity-maker. Education, Ehrlich postulates, is important for on-the-job training. These two, in turn, help determine labour distribution and personal income. He further states that it is not educational attainment that is closely related to crimes. Rather, it is the inequalities in the distribution of schooling that are strongly related to the incidence of many crimes. Ehrlich echoes Becker's statement that the behaviour of crime is not merely psychological or social, but also economic. Specifically, this is due to the relative earnings of offenders between legitimate and illegitimate activities. As a form of rehabilitation, Ehrlich suggests training geared towards legitimate activities before convicts are released from prison.
According to Wadsworth (2001), employment is an important factor that affects crime rate. He asserts that both industrial composition and labor force participation have direct and indirect effects on violent and property crime rates. These effects cannot be explained entirely by the fact that individuals who are unemployed commit more crimes. There is a contextual influence of weak labor market opportunity that operates above and beyond influencing individual employment experiences. As such, it is not merely the fact that they are unemployed that turns them into criminals. It is mostly due to the individual experiences a person has of employment or lack thereof. This is a perspective that Ehrlich, Becker, and Wadsworth seem to share. As for Reynolds (2000), an implementation of more get-tough policies would be a helpful deterrent to crime since federal programs to reduce the so-called root causes have done more harm than good. Simply put, Reynolds believes that mere programs to alleviate the root causes of crime would not deter it. Serious and strict policy-making is the key to deterring crime. Other factors may be the urbanization of a place (since urbanization opens new avenues for crimes) and police visibility (Sanidad-Leones, 2010).
Yasir, et al. (2009) also state in their study of crime rate in Pakistan that poverty, unemployment, inflation, and volatile policies may contribute to the rise of crime rate. They further assert that a possible way to alleviate this would be the formulation of stable economic policies. In general, economic factors such as those mentioned above affect crime rate. This is especially true for the policy-sensitive variables. A possible solution to this is the combination of counter-cyclical redistributive policies and increases in the resources of apprehending and convicting criminals, especially during economic recessions (Fajnzylber, et al., 1998).
Gillado and Cruz (2004) constructed a regression model for three different classifications of crime: against property, against person, and rape. It is in this work that they incorporated social, demographic, and economic factors. The following variables were considered for the three models: per capita regional domestic products, average income of people in rural and urban areas, cohort survival rates in elementary and secondary education, corruption index, police population, population density, alcohol consumption, Gini coefficient, unemployment rate, and consumer price index.
Although crime may still be a social construct, it can be observed that there is a shift from a social perspective to an economic one. However, considering only one perspective would be insufficient, since both perspectives are applicable to crime. Although there may be opposing views regarding the factors of crime, it cannot be denied that these factors, both economic and social, play their roles in affecting crime.
The variables considered for this study are heavily based on the works abovementioned. For this particular study, the variables that were taken into consideration are poverty incidence, unemployment rate, police force population, CPI, and population density. These variables coincide with the variables mentioned in the study of Gillado and Cruz. Data on cohort survival rates, however, are not available for provinces. In order to account for the possible effect of education on crime, the researchers have included literacy, functional literacy, and enrolment as variables. Other variables not included in the abovementioned works, but were included in this study are expenditures on social services, number of courts, family income and expenditure, human development index (HDI), and geographical setting (based on archipelagic division).
Methodological Sketch
The data used in this study were obtained from the National Statistical Coordination Board's publication The Philippine Countryside in Figures, which is available both in print and in electronic format. The electronic format may be accessed via http://www.nscb.gov.ph/countryside/default.asp. Note that in the NSCB Philippine Statistical Yearbook, crime rate is defined as the number of crimes per 100,000 population. However, for this study, crime rate is defined per 1,000 population.
A level of significance of 0.05 was set prior to any fitting or testing procedure. This level of significance was chosen since studies in crime rate do not present very severe consequences. However, the subject matter itself is of importance, since it is one of the foremost problems present in the Philippines.
Before fitting, the data is subjected to checking. Here we check for any missing values for the variables, especially for the dependent variable. Since SAS would omit observations with missing values, these observations were deleted from the data set. The data set was also checked for any possible encoding errors. For example, the province of Camiguin recorded a number of 73549 policemen, whereas the rest of the observations would range from about 400 to 2000 only. This observation was, then, deleted as it presents a possible encoding error.
The variable crime rate was then regressed on seventeen variables: population density, poverty incidence, family income, family expenditures, literacy rate, functional literacy rate, consumer price index, human development index, unemployment rate, expenditures on social services, number of courts, number of policemen, enrolment rate, cohort survival rates for elementary and secondary education, and the two dummy variables for location. In order to check whether at least one of the independent variables would be able to explain the variability found in crime rate, the F-test was used. The significance of each independent variable was assessed through the t-test, in which a p-value greater than the stipulated 0.05 level of significance leads to the removal of the corresponding independent variable. The independent variable with the highest p-value is removed first. Crime rate is then regressed on the remaining independent variables, and the same process is repeated until all the independent variables become significant.
The coefficient of multiple determination, R², is not the only criterion in checking the soundness of the model. In order to assess whether or not the model is good, several diagnostic tests have to be performed. These include tests on multicollinearity, normality, heteroskedasticity, linearity, and autocorrelation. Furthermore, an assessment of outliers is essential to ascertain whether or not any outliers would greatly influence the model.
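The elimination procedure described above can be sketched as a short loop. This is a minimal illustration and not the SAS procedure used in the study: `backward_eliminate`, its `keep` argument, and the `fit_pvalues` hook are hypothetical names, and the toy hook simply returns a few fixed p-values from Table 2, whereas a real refit would produce new p-values at every step.

```python
def backward_eliminate(variables, fit_pvalues, alpha=0.05, keep=()):
    """Drop the least significant variable until all p-values are <= alpha.

    variables   : list of candidate predictor names
    fit_pvalues : callable taking a list of names and returning
                  {name: p-value} for a model refitted on those predictors
                  (hypothetical hook; plug in any regression routine)
    keep        : names that are never removed, e.g. for theoretical reasons
    """
    current = list(variables)
    removed = []
    while current:
        pvalues = fit_pvalues(current)
        # Only non-protected variables above alpha are candidates for removal.
        candidates = [v for v in current
                      if v not in keep and pvalues[v] > alpha]
        if not candidates:
            break
        worst = max(candidates, key=lambda v: pvalues[v])
        current.remove(worst)
        removed.append(worst)
    return current, removed


# Toy stand-in for a real refit: fixed p-values taken from Table 2.
# In a real analysis the p-values change every time the model is refitted.
TOY_PVALUES = {"POPDEN": 0.0333, "POVINC": 0.0309, "UNEMPLOYR": 0.1340,
               "CPI": 0.6306, "GEOG1": 0.5507}


def toy_fit(names):
    return {name: TOY_PVALUES[name] for name in names}


kept, removed = backward_eliminate(list(TOY_PVALUES), toy_fit,
                                   keep=("UNEMPLOYR",))
```

The `keep` argument mirrors the choice, made later in the study, to retain unemployment rate on theoretical grounds even though it is not significant.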
Multicollinearity among independent variables was checked with the use of condition indices and proportion of variation. Normality was checked using the Shapiro-Wilk, Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling tests, for which p-values should not be less than the level of significance, 0.05. Heteroskedasticity was checked through the shape of the residual plot (versus the predicted value of y). A funnel-shaped plot would indicate a problem in heteroskedasticity. In order to be more certain about problems with heteroskedasticity, White's test and the spec option were utilized. Autocorrelation was checked using the Durbin-Watson statistic, for which a value of d close to 2 is desired. Departures from linearity were checked using partial regression plots.
As for outliers, there are two areas in which they have to be detected: outliers in the dependent variable and outliers in the independent variables. Outliers in the dependent variable are detected using Studentized residuals. An observation whose Studentized residual exceeds the corresponding critical value from the t table is considered an outlier. Outliers in the independent variables are detected through the leverage (the diagonals of the hat matrix). A leverage greater than the cut-off 2p/n, where p is the number of parameters and n is the number of observations, implies that the corresponding observation is an outlier. Not all outliers are influential. Thus, it is also necessary to check the influence of the outliers. Influence may be checked through Cook's D, DFFITS, and DFBETAS.
If any of these criteria is violated, necessary actions such as transformations and removal of unimportant independent variables would have to be performed. On the other hand, if the proposed model meets all the criteria, then the model can reasonably predict crime rate.
Results and Discussion
Preliminary Results and Diagnostic Checking
In order to ascertain which factors influence crime rate, a regression model was built. The initial model for crime rate is

$$
\begin{aligned}
\text{crime} = {} & \beta_0 + \beta_1\,\text{popden} + \beta_2\,\text{povinc} + \beta_3\,\text{unemployr} + \beta_4\,\text{pnp} + \beta_5\,\text{faminc} + \beta_6\,\text{famexp} \\
& + \beta_7\,\text{litr} + \beta_8\,\text{flit} + \beta_9\,\text{enrolment} + \beta_{10}\,\text{cpi} + \beta_{11}\,\text{hdi} + \beta_{12}\,\text{socserv} + \beta_{13}\,\text{courts} \\
& + \beta_{14}\,\text{geog1} + \beta_{15}\,\text{geog2} + \beta_{16}\,\text{cohsurve} + \beta_{17}\,\text{cohsurvs} + \varepsilon
\end{aligned}
$$

where
crime = crime rate per province
popden = population density per province
povinc = poverty incidence per province
unemployr = unemployment rate per province
pnp = number of policemen per province
faminc = average family income per province
famexp = average family expenditure per province
litr = literacy rate per province
flit = functional literacy per province
enrolment = enrolment rate per province
cpi = consumer price index per province
hdi = human development index per province
socserv = expenditures on social services per province
courts = number of courts per province
geog1 = 1 if the province is in Luzon, 0 if otherwise
geog2 = 1 if the province is in Visayas, 0 if otherwise
cohsurve = cohort survival rate for elementary education
cohsurvs = cohort survival rate for secondary education
$\varepsilon \sim N(0, \sigma^2)$
Using the F-test under ANOVA, it is apparent that at least one of the independent variables can explain crime rate. And in checking the t-values, there are, indeed, some independent variables that are significant.
Table 1. Analysis of Variance Results

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              17        1102.64607       64.86153       7.42    <.0001
Error              40         349.53181        8.73830
Corrected Total    57        1452.17788

Root MSE            2.95606    R-Square    0.7593
Dependent Mean      4.30810    Adj R-Sq    0.6570
Coeff Var          68.61629
Table 2. Individual T-tests

Parameter Estimates

Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1               6.50746          22.06194       0.29      0.7695
POPDEN         1               0.00550           0.00250       2.20      0.0333
POVINC         1              -0.14712           0.06577      -2.24      0.0309
UNEMPLOYR      1              -0.15076           0.09856      -1.53      0.1340
PNP            1              -0.00532           0.00126      -4.22      0.0001
FAMINC         1           -0.00013465        0.00007928      -1.70      0.0972
FAMEXP         1            0.00009779        0.00009885       0.99      0.3285
LITR           1               0.12483           0.17442       0.72      0.4783
FLIT           1              -0.13971           0.11416      -1.22      0.2282
ENROLMENT      1              -0.07913           0.10521      -0.75      0.4564
CPI            1              -0.02358           0.04865      -0.48      0.6306
GEOG1          1               1.23942           2.05934       0.60      0.5507
GEOG2          1               1.09009           1.53537       0.71      0.4818
HDI            1              49.39113          27.34424       1.81      0.0784
SOCSERV        1               0.00952           0.00579       1.64      0.1082
COURTS         1               0.45229           0.11380       3.97      0.0003
COHSURVE       1              -0.27471           0.12593      -2.18      0.0351
COHSURVS       1               0.11458           0.21994       0.52      0.6053
The coefficient of multiple determination has a value of 0.7593. This means that the model formulated can explain 75.93 percent of the variability found in crime rate. The mean sum of squares due to regression is also relatively large compared to the mean sum of squares due to error. This means that the variability found in crime rate may be attributed to the regression model rather than the error.
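As a quick arithmetic check, the summary statistics in Table 1 can be recomputed from the sums of squares and degrees of freedom. The sketch below uses only the standard definitions of R², adjusted R², and the F statistic; the function name is illustrative.

```python
import math


def anova_summary(ss_model, ss_error, df_model, df_error):
    """Recompute R-square, adjusted R-square, F, and root MSE from an ANOVA table."""
    ss_total = ss_model + ss_error
    df_total = df_model + df_error
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "r2": ss_model / ss_total,
        "adj_r2": 1 - ms_error / (ss_total / df_total),
        "f": ms_model / ms_error,
        "root_mse": math.sqrt(ms_error),
    }


# Values taken from Table 1 (initial model with seventeen predictors).
initial = anova_summary(ss_model=1102.64607, ss_error=349.53181,
                        df_model=17, df_error=40)
```

Rounded to the precision of Table 1, the recomputed values agree with the reported R-Square of 0.7593, Adj R-Sq of 0.6570, F Value of 7.42, and Root MSE of 2.95606.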
Out of the seventeen independent variables in the model, only five have p-values below 0.05 in Table 2. These are population density, poverty incidence, number of policemen, number of courts, and cohort survival rate for elementary education. The first variable to be removed is CPI since it has the highest p-value among the independent variables (0.6306).
Table 3. Individual T-tests without CPI

Parameter Estimates

Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1               1.22757          19.00492       0.06      0.9488
POPDEN         1               0.00516           0.00237       2.18      0.0354
POVINC         1              -0.14110           0.06398      -2.21      0.0331
UNEMPLOYR      1              -0.15067           0.09764      -1.54      0.1305
PNP            1              -0.00531           0.00125      -4.25      0.0001
FAMINC         1           -0.00013025        0.00007802      -1.67      0.1027
FAMEXP         1            0.00009276        0.00009738       0.95      0.3464
LITR           1               0.11264           0.17098       0.66      0.5137
FLIT           1              -0.14148           0.11303      -1.25      0.2178
ENROLMENT      1              -0.07440           0.10378      -0.72      0.4775
GEOG1          1               1.01635           1.98843       0.51      0.6120
GEOG2          1               0.98527           1.50581       0.65      0.5166
HDI            1              52.12947          26.50338       1.97      0.0560
SOCSERV        1               0.00910           0.00567       1.60      0.1165
COURTS         1               0.44333           0.11123       3.99      0.0003
COHSURVE       1              -0.26792           0.12397      -2.16      0.0366
COHSURVS       1               0.12224           0.21731       0.56      0.5768

After removing CPI, the same five variables remained significant. The R² dropped from 0.7593 to 0.7579. GEOG1 was then removed from the model because it has the highest p-value (0.6120) among the remaining variables.
Table 4. Individual T-tests without CPI and GEOG1

Parameter Estimates

Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1              -3.28597          16.68016      -0.20      0.8448
POPDEN         1               0.00495           0.00232       2.14      0.0384
POVINC         1              -0.12928           0.05913      -2.19      0.0344
UNEMPLOYR      1              -0.14410           0.09593      -1.50      0.1406
PNP            1              -0.00547           0.00120      -4.55      <.0001
FAMINC         1           -0.00013425        0.00007695      -1.74      0.0884
FAMEXP         1            0.00010185        0.00009490       1.07      0.2893
LITR           1               0.09829           0.16717       0.59      0.5597
FLIT           1              -0.14496           0.11183      -1.30      0.2020
ENROLMENT      1              -0.05650           0.09682      -0.58      0.5626
GEOG2          1               0.58537           1.27523       0.46      0.6486
HDI            1              53.08694          26.20356       2.03      0.0492
SOCSERV        1               0.00877           0.00559       1.57      0.1240
COURTS         1               0.46781           0.09950       4.70      <.0001
COHSURVE       1              -0.25504           0.12031      -2.12      0.0400
COHSURVS       1               0.16347           0.20001       0.82      0.4184
For this model the R² is 0.7563, which is still close to the initial model's R². In Table 4, HDI now has a p-value below 0.05 (0.0492), in addition to population density, poverty incidence, number of policemen, number of courts, and cohort survival rate for elementary education. The variable GEOG2 is then deleted since its p-value is the highest among the remaining variables.
Table 5. Individual T-tests without CPI, GEOG1, and GEOG2

Parameter Estimates

Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1              -2.28893          16.38564      -0.14      0.8896
POPDEN         1               0.00495           0.00230       2.16      0.0366
POVINC         1              -0.12928           0.05859      -2.21      0.0327
UNEMPLOYR      1              -0.13270           0.09181      -1.45      0.1556
PNP            1              -0.00533           0.00115      -4.63      <.0001
FAMINC         1           -0.00013622        0.00007612      -1.79      0.0806
FAMEXP         1            0.00010224        0.00009402       1.09      0.2829
LITR           1               0.09915           0.16562       0.60      0.5525
FLIT           1              -0.14972           0.11032      -1.36      0.1818
ENROLMENT      1              -0.05250           0.09554      -0.55      0.5855
HDI            1              53.44495          25.95045       2.06      0.0455
SOCSERV        1               0.00898           0.00552       1.63      0.1107
COURTS         1               0.46252           0.09792       4.72      <.0001
COHSURVE       1              -0.24798           0.11823      -2.10      0.0419
COHSURVS       1               0.14043           0.19183       0.73      0.4681
Here, it can be observed that there are six significant variables after removing GEOG2. The coefficient of multiple determination dropped from 0.7563 to 0.7551, which is still not far from the initial R².
The next variables to be deleted were enrolment, literacy rate, cohort survival rate for secondary education, functional literacy, family expenditure, family income, human development index, cohort survival rate for elementary education, and expenditures on social services, deleted one at a time. The results are shown on the following table.
Table 6. ANOVA Results and Individual T-tests for the Modified Model
Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5        1014.48120      202.89624      23.29    <.0001
Error              56         487.93771        8.71317
Corrected Total    61        1502.41891

Root MSE            2.95181    R-Square    0.6752
Dependent Mean      4.07475    Adj R-Sq    0.6462
Coeff Var          72.44154

Parameter Estimates

Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept      1               9.07796           1.84835       4.91      <.0001
POPDEN         1               0.00669           0.00182       3.68      0.0005
POVINC         1              -0.11473           0.03338      -3.44      0.0011
UNEMPLOYR      1              -0.12234           0.07180      -1.70      0.0940
PNP            1              -0.00414        0.00096282      -4.30      <.0001
COURTS         1               0.39710           0.08870       4.48      <.0001
Note that in this modified model, there are four significant variables at the 0.05 level of significance: population density, poverty incidence, number of policemen, and number of courts. Unemployment rate is still not significant. However, it is retained since, theoretically, unemployment rate would have an effect on crime rate. The F-test for this model implies that at least one of the independent variables is able to explain the variability found in crime rate. The R², however, dropped from an initial value of 0.7593 to 0.6752. Therefore, this model can explain only 67.52 percent of the variability found in crime rate. This is understandable, though, since 12 variables were removed from the model.
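The new model is given by:

$$
\text{crime} = \beta_0 + \beta_1\,\text{popden} + \beta_2\,\text{povinc} + \beta_3\,\text{unemployr} + \beta_4\,\text{pnp} + \beta_5\,\text{courts} + \varepsilon
$$

where
crime = crime rate per province
popden = population density per province
povinc = poverty incidence per province
unemployr = unemployment rate per province
pnp = number of policemen per province
courts = number of courts per province
$\varepsilon \sim N(0, \sigma^2)$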
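To make the reduced model concrete, the following sketch plugs the parameter estimates from Table 6 into the equation above. The province values are hypothetical and chosen only for illustration, and the function name is not from the study.

```python
# Parameter estimates from Table 6 (reduced, untransformed model).
COEFFICIENTS = {
    "popden": 0.00669,
    "povinc": -0.11473,
    "unemployr": -0.12234,
    "pnp": -0.00414,
    "courts": 0.39710,
}
INTERCEPT = 9.07796


def predict_crime_rate(province):
    """Fitted crime rate for one province given its predictor values."""
    return INTERCEPT + sum(coef * province[name]
                           for name, coef in COEFFICIENTS.items())


# Hypothetical province (illustrative values, not from the data set).
example = {"popden": 200, "povinc": 30, "unemployr": 10,
           "pnp": 500, "courts": 5}
fitted = predict_crime_rate(example)  # roughly 5.67
```

Reading the signs of the estimates, holding the other variables fixed, a higher population density or number of courts is associated with a higher fitted crime rate, while higher poverty incidence, unemployment rate, or number of policemen is associated with a lower one.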
Before this model can be accepted as the best model, diagnostic checking is necessary. In checking for normality, we have the following result.
Table 7. Tests for Normality

Tests for Normality

Test                  Statistic              p Value
Shapiro-Wilk          W       0.90266        Pr < W       0.0001
Kolmogorov-Smirnov    D       0.121649       Pr > D       0.0222
Cramer-von Mises      W-Sq    0.211265       Pr > W-Sq    <0.0050
Anderson-Darling      A-Sq    1.377216       Pr > A-Sq    <0.0050
Note that for all the tests for normality, the p-values are less than the level of significance, 0.05. The null hypothesis of normality of error terms is then rejected. It is necessary for remedial measures such as transformations to be performed. These will be discussed later on in this section.
After normality, homoskedasticity is checked. A residual plot (versus the predicted value of crime rate) is utilized in order to check for homoskedasticity. The residual plot does not seem to exhibit any shape (funnel or diamond) that would imply heteroskedasticity. In order to verify this, the spec option under the regression procedure was used. The result indicates that the null hypothesis of constant variance should not be rejected.
Autocorrelation and multicollinearity were also checked. For autocorrelation, the Durbin-Watson test statistic is d = 2.246, with a first-order autocorrelation of -0.127. For a negative value of the first-order autocorrelation, the statistic 4 - d is used instead of d, which yields a value of 1.754. This value is compared to the tabulated values for the Durbin-Watson test: if it exceeds dU, the null hypothesis is not rejected; if it is below dL, the null hypothesis is rejected. Using the tabulated values for n = 65 and k = 5, dL = 1.438 and dU = 1.767, so the statistic lies between dL and dU. In this case, the test is inconclusive. As for multicollinearity, it can be observed that the condition indices do not exceed 30. Thus, the model is free of problems on multicollinearity. Results for these are shown in Table 8.
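The Durbin-Watson decision rule described above can be written as a small function. This is a sketch under the assumption that d > 2 is treated as the negative-autocorrelation case in which 4 - d is used; the function name is illustrative, and the bounds are those quoted in the text.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Apply the bounds test described in the text.

    For d above 2 (negative first-order autocorrelation) the statistic
    4 - d is compared with the bounds instead of d.
    """
    stat = 4 - d if d > 2 else d
    if stat < d_lower:
        decision = "reject"
    elif stat > d_upper:
        decision = "do not reject"
    else:
        decision = "inconclusive"
    return stat, decision


# Values reported in the text: d = 2.246, dL = 1.438, dU = 1.767.
stat, decision = durbin_watson_decision(2.246, d_lower=1.438, d_upper=1.767)
```

With the reported values, the compared statistic is about 1.754 and falls between the two bounds, which matches the inconclusive result stated above.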
Table 8. Results for Spec Option, Durbin-Watson Test, and Multicollinearity Indicators
Collinearity Diagnostics*

Number    Eigenvalue    Condition Index
1            4.83022            1.00000
2            0.63842            2.75062
3            0.24895            4.40480
4            0.13439            5.99517
5            0.11903            6.37016
6            0.02898           12.90959

*Proportion of variation is omitted.

Test of First and Second Moment Specification

DF    Chi-Square    Pr > ChiSq
20         16.96        0.6558

Durbin-Watson D                 2.246
Number of Observations             62
1st Order Autocorrelation      -0.127
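The condition indices in Table 8 follow from the eigenvalues: each index is the square root of the largest eigenvalue divided by the eigenvalue in question. The sketch below recomputes them; the function name is illustrative, and small differences in the last digits are expected because the tabulated eigenvalues are rounded.

```python
import math


def condition_indices(eigenvalues):
    """Condition index for each eigenvalue: sqrt(largest / eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]


# Eigenvalues from Table 8.
eigenvalues = [4.83022, 0.63842, 0.24895, 0.13439, 0.11903, 0.02898]
indices = condition_indices(eigenvalues)

# Rule used in the text: indices above 30 signal multicollinearity.
has_multicollinearity = any(index > 30 for index in indices)
```

The largest recomputed index is about 12.9, well below 30, which is consistent with the conclusion in the text.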
Linearity is also checked to ascertain whether any departures from it may be observed. Partial regression plots were obtained for crime rate versus each independent variable, population density, poverty incidence, number of policemen, and number of courts. The plots show that no distinct departure from linearity may be observed (plots found in the appendices).
For outliers, three possible outliers can be observed among the observations. Outliers in the dependent variable are detected through Studentized residuals, and outliers in the independent variables through leverages. The cut-off for Studentized residuals is equal to two. As for the leverage, the computed cut-off is 0.1935. Observations 12, 18, and 58 are possible outliers. The same observations may be influential as well. In checking for influence, Cook's D, DFFITS, and DFBETAS have to be consulted. The cut-offs for DFFITS and DFBETAS are 0.622 and 0.254, respectively.
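The cut-offs quoted above can be reproduced with the conventional rules of thumb, shown here as a sketch. The leverage rule 2p/n is stated in the methodology; the DFFITS rule 2·sqrt(p/n) and the DFBETAS rule 2/sqrt(n) are assumed here because they reproduce the reported values with n = 62 observations and p = 6 parameters (five predictors plus the intercept).

```python
import math


def outlier_cutoffs(n_obs, n_params):
    """Conventional cut-offs for leverage, DFFITS, and DFBETAS."""
    return {
        "leverage": 2 * n_params / n_obs,            # 2p/n
        "dffits": 2 * math.sqrt(n_params / n_obs),   # assumed rule of thumb
        "dfbetas": 2 / math.sqrt(n_obs),             # assumed rule of thumb
    }


# Reduced model: 62 observations, 5 predictors plus the intercept.
cutoffs = outlier_cutoffs(n_obs=62, n_params=6)
```

Rounded, these give 0.1935, 0.622, and 0.254, matching the cut-offs reported in the text.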
Table 9. Outliers and Influential Observations

Output Statistics

Obs    Dependent    Predicted    Std Error       Residual    Std Error    Student     Cook's D
       Variable     Value        Mean Predict                Residual     Residual
12       25.6633      19.7931          1.6627      5.8702        2.439       2.407       0.449
18       10.0296       4.4596          0.8913      5.5701        2.814       1.979       0.066
58       22.6754      14.3599          1.2853      8.3155        2.657       3.129       0.382

Obs    RStudent    Hat Diag H    Cov Ratio    DFFITS
12       2.5191        0.3173       0.8476    1.7174
18       2.0341        0.0912       0.7934    0.6442
58       3.4141        0.1896       0.4339    1.6514
Observation 12 has a Studentized residual equal to 2.407 and a Studentized deleted residual of 2.5191. Its leverage is equal to 0.3173. These values exceed the computed cut-offs. This observation may be considered influential since its Cook's D has the highest value relative to the other observations. Moreover, its DFFITS is equal to 1.7174, which is well beyond the cut-off. Its DFBETAS under the variables UNEMPLOYR, PNP, and COURTS exceed the cut-off, which makes it a candidate for an influential observation.
Observation 18 has a Studentized deleted residual equal to 2.0341 but its leverage is equal to 0.0912. This implies that the value of crime rate may be an outlier for this observation while its independent variables are not. Its Cook's D is not relatively high; however, its DFFITS is equal to 0.6442. Its DFBETAS under the variables POVINC, PNP, and COURTS exceed the cut-off, which also makes it a candidate for an influential observation.
Observation 58 has a Studentized residual equal to 3.129, a Studentized deleted residual equal to 3.4141, and leverage equal to 0.1896. These support the supposition that observation 58 may be an outlier. It may also be influential since its DFFITS is equal to 1.6514, and the DFBETAS under POPDEN and POVINC exceed the cut-off.
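As a rough consistency check on Table 9, DFFITS and Cook's D can be recomputed from the residuals and leverages using the standard identities DFFITS = RStudent * sqrt(h / (1 - h)) and D = t^2 * h / (p * (1 - h)), where t is the Studentized residual and h the hat diagonal. The sketch assumes p = 6 estimated parameters, which is not stated in the text; the function names are illustrative.

```python
import math

P = 6  # assumed number of estimated parameters (intercept + 5 regressors)

# obs: (Student residual, RStudent, hat diagonal), copied from Table 9
table9 = {
    12: (2.407, 2.5191, 0.3173),
    18: (1.979, 2.0341, 0.0912),
    58: (3.129, 3.4141, 0.1896),
}


def dffits(rstudent, h):
    """DFFITS from the Studentized deleted residual and leverage."""
    return rstudent * math.sqrt(h / (1 - h))


def cooks_d(student, h, p=P):
    """Cook's distance from the Studentized residual and leverage."""
    return student ** 2 / p * h / (1 - h)


for obs, (student, rstudent, h) in table9.items():
    print(obs, round(dffits(rstudent, h), 4), round(cooks_d(student, h), 3))
```

Under these assumptions the recomputed values agree with the tabulated DFFITS and Cook's D up to rounding.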
Corrective Measures
Since there is a problem in normality, it is necessary to perform corrective measures. One possible corrective measure is transformation of variables. For this particular study, several transformations of variables were made. Some of these are the natural logarithms and square roots. Several combinations of transformed variables were also considered in order to correct the problem. Finally, the combination of the square roots of crime rate, poverty incidence, and number of courts (coded as sqcrime, sqpovinc, and sqcourts respectively) and population density helped in correcting the problem of normality.
Thus, the transformed model is

\[ \text{sqcrime} = \beta_0 + \beta_1\,\text{popden} + \beta_2\,\text{sqpovinc} + \beta_3\,\text{unemployr} + \beta_4\,\text{pnp} + \beta_5\,\text{sqcourts} + \varepsilon \]

where
sqcrime = square root of the crime rate per province
popden = population density per province
sqpovinc = square root of the poverty incidence per province
unemployr = unemployment rate per province
pnp = number of policemen per province
sqcourts = square root of the number of courts per province
$\varepsilon \sim N(0, \sigma^2)$
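To illustrate why a square-root transformation can help with non-normality, the sketch below compares the skewness of a right-skewed variable before and after the transformation. The data are hypothetical values chosen for illustration only, not the provincial figures used in the study, and the helper function is not part of the original analysis.

```python
import math


def skewness(values):
    """Population skewness: third central moment over variance^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed crime rates (illustration only).
crime = [1.0, 4.0, 4.0, 9.0, 9.0, 16.0, 100.0]
sqcrime = [math.sqrt(v) for v in crime]

print("skewness before:", round(skewness(crime), 3))
print("skewness after: ", round(skewness(sqcrime), 3))
```

The square root compresses large values more than small ones, so the long right tail is pulled in and the skewness drops. Whether the residuals of the fitted model become approximately normal still has to be checked with the formal tests.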
Again, this model is checked for normality, autocorrelation, heteroskedasticity, linearity, multicollinearity, and outliers.
After transformation, the R² of the model increased. Recall that the R² of the previous model was 0.6752; the R² of the transformed model is 0.7141. This suggests an improvement in the amount of variability the transformed model can explain. The same four independent variables remain significant at the 0.05 level of significance, while unemployment rate remains insignificant. These results are shown in Table 10.
Table 10. ANOVA Results and Parameter Estimates for the Transformed Model
R-Square    0.7141
Adj R-Sq    0.6886

Parameter Estimates
Variable    Label       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept    1    2.95684             0.55258            5.35    <.0001
POPDEN      POPDEN       1    0.00081505          0.00036395         2.24    0.0291
SQPOVINC    SQPOVINC     1   -0.29978             0.07181           -4.17    0.0001
UNEMPLOYR   UNEMPLOYR    1   -0.00672             0.01379           -0.49    0.6278
SQCOURTS    SQCOURTS     1    0.55679             0.09904            5.62    <.0001
PNP         PNP          1   -0.00096767          0.00018859        -5.13    <.0001
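Each t value in Table 10 is the parameter estimate divided by its standard error. The short check below recomputes the ratios from the tabulated estimates and standard errors; it is a sketch for verifying the table, not a re-estimation of the model.

```python
# variable: (parameter estimate, standard error, reported t value), from Table 10
table10 = {
    "Intercept": (2.95684, 0.55258, 5.35),
    "POPDEN": (0.00081505, 0.00036395, 2.24),
    "SQPOVINC": (-0.29978, 0.07181, -4.17),
    "UNEMPLOYR": (-0.00672, 0.01379, -0.49),
    "SQCOURTS": (0.55679, 0.09904, 5.62),
    "PNP": (-0.00096767, 0.00018859, -5.13),
}

# t statistic for H0: coefficient = 0 is estimate / standard error
t_values = {name: est / se for name, (est, se, _) in table10.items()}

for name, (_, _, reported) in table10.items():
    print(f"{name:<10} computed t = {t_values[name]:6.2f}   reported t = {reported:6.2f}")
```

The only ratio that is small in absolute value belongs to UNEMPLOYR, which matches its large p-value in the table.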
The first tests performed after transformation are those for heteroskedasticity, multicollinearity, and autocorrelation. The results are shown in Table 11.
Table 11. Tests for Multicollinearity, Heteroskedasticity and Autocorrelation

Collinearity Diagnostics
Number   Eigenvalue   Condition Index
1        5.15775       1.00000
2        0.45889       3.35256
3        0.18424       5.29102
4        0.10986       6.85197
5        0.07841       8.11067
6        0.01086      21.79524

Collinearity Diagnostics: Proportion of Variation
Number   Intercept    POPDEN    SQPOVINC     UNEMPLOYR   PNP       SQCOURTS
1        0.00060965   0.00725   0.00093940   0.00474     0.00512   0.00305
2        0.00234      0.40328   0.01132      0.02230     0.00123   0.00690
3        0.00195      0.22684   0.00249      0.14538     0.45895   0.04345
4        0.01717      0.01756   0.05286      0.71655     0.14420   0.02612
5        0.00027899   0.25731   0.02890      0.04728     0.36060   0.72823
6        0.97765      0.08777   0.90349      0.06375     0.02990   0.19225

Durbin-Watson D               2.063
Number of Observations
1st Order Autocorrelation    -0.037
Note that the condition indices are all less than 30, which suggests that multicollinearity is not a problem after transforming the model. The Durbin-Watson test statistic is d = 2.063, with a first-order autocorrelation of -0.037. Since d exceeds 2, the statistic 4 - d is considered instead. This yields a value of 1.937, which is close to 2. When this statistic is compared with the value from the Durbin-Watson table (recall that du is equal to 1.767), 4 - d exceeds du. The null hypothesis of no autocorrelation is therefore not rejected, and it can be concluded that there is no problem of autocorrelation.
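The condition indices in Table 11 follow from the eigenvalues: each index is the square root of the largest eigenvalue divided by the eigenvalue in question. The sketch below recomputes them from the tabulated eigenvalues and applies the cut-off of 30 used in the text; the function name is illustrative.

```python
import math

# Eigenvalues reported in Table 11
eigenvalues = [5.15775, 0.45889, 0.18424, 0.10986, 0.07841, 0.01086]


def condition_indices(eigs):
    """Condition index for each eigenvalue: sqrt(largest / eigenvalue)."""
    largest = max(eigs)
    return [math.sqrt(largest / e) for e in eigs]


indices = condition_indices(eigenvalues)
for number, ci in enumerate(indices, start=1):
    print(number, round(ci, 3))

# Cut-off of 30 as used in the text
print("all below 30:", all(ci < 30 for ci in indices))
```

Small differences in the last decimals relative to the table are expected, since the eigenvalues are entered here at their printed precision.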
Next, the outliers and influential observations were addressed. Recall that three observations were identified as outliers and influential observations. These were removed one at a time, and after each removal the model was checked for multicollinearity, autocorrelation, normality, outliers, and heteroskedasticity. After all three observations were removed, the final model was again checked against the same criteria. The results are found in Appendices A-30 onwards.
Table 12. ANOVA, Parameter Estimates, and Multicollinearity Tests

R-Square    0.7487
Adj R-Sq    0.7246

Parameter Estimates
Variable    Label       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept   Intercept    1    2.59768             0.48076            5.40    <.0001     0
POPDEN      POPDEN       1    0.00076341          0.00031984         2.39    0.0207     1.60853
SQPOVINC    SQPOVINC     1   -0.25736             0.06291           -4.09    0.0001     1.44434
UNEMPLOYR   UNEMPLOYR    1   -0.00848             0.01183           -0.72    0.4767     1.02209
PNP         PNP          1   -0.00082873          0.00015836        -5.23    <.0001     1.54602
SQCOURTS    SQCOURTS     1    0.53004             0.08455            6.27    <.0001     2.10035

Collinearity Diagnostics
Number   Eigenvalue   Condition Index
1        5.17743       1.00000
2        0.44097       3.42650
3        0.18328       5.31492
4        0.10913       6.88793
5        0.07905       8.09268
6        0.01014      22.60067

Collinearity Diagnostics: Proportion of Variation
Number   Intercept    POPDEN    SQPOVINC     UNEMPLOYR    PNP          SQCOURTS
1        0.00057169   0.00744   0.00086302   0.00465      0.00501      0.00302
2        0.00239      0.43102   0.01025      0.02384      0.00092325   0.00725
3        0.00261      0.23433   0.00378      0.10920      0.47734      0.04222
4        0.01606      0.02305   0.05667      0.79816      0.04533      0.00171
5        0.00010800   0.22789   0.01191      0.00065690   0.43714      0.75883
6        0.97826      0.07627   0.91652      0.06349      0.03426      0.18696
Table 13. Results for Spec Option, Durbin-Watson Test, and Tests for Normality
Durbin-Watson D               2.142
Number of Observations
1st Order Autocorrelation    -0.081

Tests for Normality
Test                  Statistic           p Value
Shapiro-Wilk          W      0.980824     Pr < W      0.4877
Kolmogorov-Smirnov    D      0.096603     Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.095625     Pr > W-Sq   0.1290
Anderson-Darling      A-Sq   0.486635     Pr > A-Sq   0.2244
After these corrective measures, the final model has a coefficient of multiple determination equal to 0.7487. This means that the model can explain 74.87 percent of the variability in the (square-root transformed) crime rate. Note that its VIFs are all well below 10 and its condition indices are all less than 30, so multicollinearity is not a problem with this model. Linearity was checked using partial regression plots, and again there seem to be no distinct departures from linearity.

The Durbin-Watson test statistic is d = 2.142, with a first-order autocorrelation of -0.081. Since d exceeds 2, the statistic 4 - d is again considered instead, giving a value of 1.858, which is still close to 2. Compared with the tabulated value (Durbin-Watson table, n = 60, k = 5), 4 - d = 1.858 is greater than du = 1.767. The null hypothesis of no autocorrelation is not rejected, so there is no evidence of autocorrelation.

For the tests of normality, the p-values all exceed the 0.05 level of significance. The null hypothesis that the error terms are normally distributed is therefore not rejected, and the normality assumption is taken to hold. In testing for heteroskedasticity, the spec option in SAS is used. Since the computed Chi-square value is 11.33 with a p-value of 0.9372, the null hypothesis of constant variance is not rejected. Thus, there is no evidence of heteroskedasticity.
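The Durbin-Watson comparison applied to both the transformed and the final model can be summarized as a small decision rule. The sketch below is a hedged illustration: the upper bound du = 1.767 comes from the text, while the lower bound of 1.408 is an assumed tabulated value for n = 60 and k = 5 that does not appear in this section. The function name is illustrative.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Bounds test for first-order autocorrelation.

    When d exceeds 2, 4 - d is compared with the bounds instead,
    which tests for negative autocorrelation.
    """
    stat = d if d <= 2 else 4 - d
    if stat > d_upper:
        return stat, "do not reject H0: no autocorrelation"
    if stat < d_lower:
        return stat, "reject H0: autocorrelation present"
    return stat, "inconclusive"


D_UPPER = 1.767  # du quoted in the text (n = 60, k = 5)
D_LOWER = 1.408  # assumed lower bound for the same n and k; not in the text

for label, d in [("transformed model", 2.063), ("final model", 2.142)]:
    stat, verdict = durbin_watson_decision(d, D_LOWER, D_UPPER)
    print(f"{label}: d = {d}, compared value = {stat:.3f}, {verdict}")
```

In both cases the compared value lies above du, so the conclusion does not depend on the assumed lower bound.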
Conclusion
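The final model of crime rate is then

\[ \text{crime rate}^{*} = \beta_0 + \beta_1\,\text{popden} + \beta_2\,\text{povinc}^{*} + \beta_3\,\text{unemployr} + \beta_4\,\text{pnp}^{*} + \beta_5\,\text{courts}^{*} + \varepsilon \]

where crime rate* = sqcrime, povinc* = sqpovinc, pnp* = sqpnp, and courts* = sqcourts.

The estimated model for crime rate is then

\[ \widehat{\text{crime rate}^{*}} = 2.1856 + 0.0008\,\text{popden} - 0.2657\,\text{povinc}^{*} - 0.0001\,\text{unemployr} - 0.0008\,\text{pnp}^{*} + 0.5014\,\text{courts}^{*} \]

where each parameter estimate after the intercept represents the increase or decrease in the estimated mean of the transformed crime rate per unit increase in the corresponding independent variable, holding all other variables constant.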
Note that for the independent variables poverty incidence (povinc), unemployment rate (unemployr), and number of courts (courts), the signs of the coefficients are contrary to theoretical expectations. As common sense would dictate, a rise in poverty incidence would entail a rise in the crime rate, and the same can be said for the unemployment rate. On the other hand, a rise in the number of courts would be expected to bring a decrease in the crime rate. In this model, however, the relationships of poverty incidence and unemployment rate with crime rate are inverted: as poverty incidence increases, the estimated crime rate decreases, and as unemployment rate increases, the estimated crime rate decreases. This may be attributed to the political and economic instability during the year 2000 surrounding the impeachment and eventual ouster of former President Joseph Estrada. If the poverty incidence, unemployment rate, and crime rate for this year are compared with those of other years, it can be observed that the poverty incidence and the unemployment rate for the year 2000 are high and the crime rate for the same year is low, relative to other years.

As for the number of courts, there is a direct relationship between this variable and crime rate. This may be because, as the number of courts increases, the opportunity for people to file cases also increases. There would then be an increase in the number of reported crimes, which would lead to an increase in the crime rate. Although theory and empirical data differ in this study, the model obtained is not extraordinary and is still a plausible one.
Thus, through this model, we were able to establish a linear relationship between crime and the factors population density, poverty incidence, number of police per province, and number of courts per province. Since this model has satisfied the conditions and assumptions, the model may be a plausible predictor of crime rate in the Philippines.
Recommendations
As previously mentioned, this particular study focused on crime rate in the Philippines from provincial data for the year 2000 only. As a means of improving the study, the group recommends considering data from other time periods as well as data from municipalities or cities. The group also recommends formulating separate regression lines for the different classifications of crimes (index and non-index crimes, and crimes against property and person) as different factors may affect each category of crime. Separating the regression lines would allow for classification of factors among different types of crime. This may lead to a better model in terms of the coefficient of determination.
Appendices
Appendix A. Results (SAS Outputs)
Appendix B. Durbin-Watson Table
