This document can be accessed here: http://tiny.cc/w15pirate
We need to start by loading the packages and reading in the data:
if (!require("rspatial")) devtools::install_github('rspatial/rspatial')
library(rspatial)
city <- sp_data('city')
crime <- sp_data('crime')
Let’s look at a quick map of both datasets:
plot(city, col='light blue')
points(crime, col='red', cex=.5, pch='+')
Let’s look at a table of the frequency of each type of crime:
tbdf <- as.data.frame(sort(table(crime$CATEGORY))[-1])  # [-1] drops the first (least frequent) category
tbdf
## Var1 Freq
## 1 Arson 9
## 2 Weapons 15
## 3 Robbery 49
## 4 Auto Theft 86
## 5 Drugs or Narcotics 134
## 6 Commercial Burglary 143
## 7 Grand Theft 143
## 8 Assaults 172
## 9 DUI 212
## 10 Residential Burglary 219
## 11 Vehicle Burglary 221
## 12 Drunk in Public 232
## 13 Vandalism 355
## 14 Petty Theft 665
Let’s get the coordinates of the crime data:
xyco <- coordinates(crime)
## What are the dimensions of this?
dim(xyco)
## [1] 2661 2
# Let's take a look at the first few rows
head(xyco)
## coords.x1 coords.x2
## [1,] 6628868 1963718
## [2,] 6632796 1964362
## [3,] 6636855 1964873
## [4,] 6626493 1964343
## [5,] 6639506 1966094
## [6,] 6640478 1961983
## How many unique places have crime?
xy <- unique(xyco)
## What are the dimensions of this?
dim(xy)
## [1] 1208 2
Compute the mean center and standard distance for the crime data.
# mean center of locations where crimes occurred
mc <- apply(xy, 2, mean)
mc
## coords.x1 coords.x2
## 6635645 1963105
# standard distance
sd <- sqrt(sum((xy[,1] - mc[1])^2 + (xy[,2] - mc[2])^2) / nrow(xy))
sd
## [1] 7224.564
# mean center of locations weighted by number of crimes
mcwt <- apply(xyco, 2, mean)
mcwt
## coords.x1 coords.x2
## 6635693 1962890
# standard distance
sdwt <- sqrt(sum((xyco[,1] - mcwt[1])^2 + (xyco[,2] - mcwt[2])^2) / nrow(xyco))
sdwt
## [1] 7157.337
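As a self-contained sketch of those two formulas (using made-up coordinates, not the crime data): the mean center is the column-wise mean of the coordinates, and the standard distance is the root-mean-square distance of the points from that center.

```r
# Four made-up points at the corners of a 2x2 square
xy_toy <- cbind(x = c(0, 2, 2, 0), y = c(0, 0, 2, 2))

# Mean center: column-wise mean of the coordinates
mc_toy <- apply(xy_toy, 2, mean)

# Standard distance: root-mean-square distance from the mean center
sd_toy <- sqrt(sum((xy_toy[, 1] - mc_toy[1])^2 +
                   (xy_toy[, 2] - mc_toy[2])^2) / nrow(xy_toy))

mc_toy  # (1, 1), the center of the square
sd_toy  # sqrt(2), since every corner is sqrt(2) from the center
```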
A quadrat analysis is a way of testing whether or not a spatial pattern is significantly different than a random spatial pattern.
It requires applying a grid (cells of any shape) to our study area and then counting how many points fall within each cell.
When the study area is divided into cells and all the points fall in a single cell (or ‘quadrat’), the variation in the number of points per cell is high. Conversely, if every cell contains exactly the same number of points, the variance is zero, as no cell differs from any other.
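A tiny numeric illustration of those two extremes, using invented counts of 20 points spread over 4 quadrats:

```r
# Toy illustration (not the crime data): 20 points in 4 quadrats.
even_counts      <- c(5, 5, 5, 5)    # perfectly dispersed: same count per cell
clustered_counts <- c(20, 0, 0, 0)   # all points in a single quadrat

var(even_counts)       # 0: no cell differs from any other
var(clustered_counts)  # 100: counts vary wildly around the mean of 5
```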
Here is a basic approach to computing point density (although there are many).
This calculates the surface area of our study area (“city”), then computes the density as the number of crimes (the number of rows in the crime data) divided by the area of the city:
CityArea <- raster::area(city)
dens <- nrow(xyco) / CityArea
To compute quadrat counts I first create quadrats (e.g. a RasterLayer). I get the extent for the raster from the city polygon, and then assign an arbitrary resolution of 1000. (In real life you should always try a range of resolutions, as this choice seriously affects your results.)
r <- raster(city)
res(r) <- 1000
r
## class : RasterLayer
## dimensions : 15, 34, 510 (nrow, ncol, ncell)
## resolution : 1000, 1000 (x, y)
## extent : 6620591, 6654591, 1956519, 1971519 (xmin, xmax, ymin, ymax)
## crs : +proj=lcc +lat_1=38.33333333333334 +lat_2=39.83333333333334 +lat_0=37.66666666666666 +lon_0=-122 +x_0=2000000 +y_0=500000.0000000001 +datum=NAD83 +units=us-ft +no_defs +ellps=GRS80 +towgs84=0,0,0
To find the cells that are in the city, I create polygons from the RasterLayer.
r <- rasterize(city, r)
plot(r)
quads <- as(r, 'SpatialPolygons')
plot(quads, add=TRUE)
points(crime, col='red', cex=.5)
The number of events (aka points) in each quadrat/cell can be counted using the ‘rasterize’ function. That function can be used to summarize the number of points within each cell, but also to compute statistics based on the ‘marks’ (attributes). For example, we could compute the number of different crime types by changing the ‘fun’ argument to another function (see ?rasterize).
nc <- rasterize(coordinates(crime), r, fun='count', background=0)
plot(nc)
plot(city, add=TRUE)
This map (nc) has crime counts (yay!) and we can see clustering (hooray!). But, you can see that the raster cells are spilling outside of the city limits, and we know that we only have data for the city, so those areas should be excluded from the map. We can do that with the mask function (see ?mask).
ncrimes <- mask(nc, r)
plot(ncrimes)
plot(city, add=TRUE)
Better. Now let’s look at the frequencies of crimes per cell in this grid
f <- freq(ncrimes, useNA='no')
head(f)
## value count
## [1,] 0 48
## [2,] 1 29
## [3,] 2 24
## [4,] 3 22
## [5,] 4 19
## [6,] 5 16
# Let's graph this (the frequency of crimes per cell)
plot(f, pch=20)
This shows that 48 cells have 0 crimes, while one cell has over 200 crimes, an extreme outlier.
Now compute average number of crimes per quadrat (cell).
# number of quadrats
quadrats <- sum(f[,2])
quadrats
## [1] 283
# number of cases
cases <- sum(f[,1] * f[,2])
cases
## [1] 2621
mu <- cases / quadrats
# This now tells us the mean quadrat count (μ)
mu
## [1] 9.261484
Now let’s create a table for computing the variance, showing the number of events per quadrat (K), the count of quadrats with that number of events (X), the deviation from the mean (Kmu = K - \(\mu\)), the squared deviation (Kmu2), and the squared deviation multiplied by the number of quadrats (XKmu2):
ff <- data.frame(f)
colnames(ff) <- c('K', 'X')
ff$Kmu <- ff$K - mu
ff$Kmu2 <- ff$Kmu^2
ff$XKmu2 <- ff$Kmu2 * ff$X
head(ff)
## K X Kmu Kmu2 XKmu2
## 1 0 48 -9.261484 85.77509 4117.2042
## 2 1 29 -8.261484 68.25212 1979.3115
## 3 2 24 -7.261484 52.72915 1265.4996
## 4 3 22 -6.261484 39.20618 862.5360
## 5 4 19 -5.261484 27.68321 525.9811
## 6 5 16 -4.261484 18.16025 290.5639
Finally, based on this table, we can calculate the observed variance (s2), which is the average squared deviation from the mean of the dataset. In other words, the sum of the \(X(K-\mu)^2\) column divided by the total count of quadrats minus one (\(\sum X - 1\)).
s2 <- sum(ff$XKmu2) / (sum(ff$X)-1)
s2
## [1] 276.5555
To get the variance-mean ratio (VMR) for the entire data set, divide s2 by \(\mu\):
VMR <- s2 / mu
VMR
## [1] 29.86082
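To see the table arithmetic end to end, here is a self-contained sketch using a small made-up frequency table (the K and X values are invented for illustration). Expanding the table back into one count per quadrat lets us cross-check s2 against R's built-in var().

```r
# Made-up frequency table: K = events per quadrat,
# X = number of quadrats with that count.
ff_toy <- data.frame(K = c(0, 1, 2, 5), X = c(4, 3, 2, 1))

quadrats <- sum(ff_toy$X)              # total number of quadrats (10)
cases    <- sum(ff_toy$K * ff_toy$X)   # total number of events (12)
mu       <- cases / quadrats           # mean events per quadrat (1.2)

ff_toy$Kmu   <- ff_toy$K - mu          # deviation from the mean
ff_toy$Kmu2  <- ff_toy$Kmu^2           # squared deviation
ff_toy$XKmu2 <- ff_toy$Kmu2 * ff_toy$X # weighted by quadrat count

s2  <- sum(ff_toy$XKmu2) / (quadrats - 1)  # observed variance
VMR <- s2 / mu

# Cross-check: expanding the table into one count per quadrat
# gives the same variance via R's built-in var().
counts <- rep(ff_toy$K, ff_toy$X)
all.equal(s2, var(counts))  # TRUE
VMR                         # 2: more clustered than random
```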
If every cell in the study area contained exactly the same number of points, the variance (s2) would be zero, making the variance-mean ratio (VMR) also zero. This would imply that the points are perfectly dispersed. In this extreme (and very rare) scenario, the pattern is not random. Any time the VMR < 1, the point pattern is more evenly spaced (dispersed) than random.
If a point pattern is highly clustered with most cells containing no points and only a few cells having many points, the variance of cell frequencies (S2) will be large relative to the mean cell frequency (\(\mu\)). This will produce a very large value for the VMR. Anytime the VMR > 1, the point pattern is more clustered than random.
For a perfectly random point pattern, the variance (S2) of the cell frequency is equal to the mean cell frequency (\(\mu\)). When the VMR is close to 1, the points have a random arrangement (neither distributed nor clustered).
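We can check this intuition by simulating quadrat counts (the lambda, mu, and size parameters below are arbitrary choices for illustration): counts drawn from a Poisson distribution, the model of complete spatial randomness, give a VMR near 1, while counts from an overdispersed negative binomial give a VMR well above 1.

```r
set.seed(42)

# Quadrat counts under complete spatial randomness (Poisson)
random_counts <- rpois(500, lambda = 9)

# Overdispersed counts, mimicking a clustered pattern
nb_counts <- rnbinom(500, mu = 9, size = 0.5)

var(random_counts) / mean(random_counts)  # close to 1: random
var(nb_counts) / mean(nb_counts)          # well above 1: clustered
```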
As our VMR was about 29.86, it’s clear that our point pattern is clustered.
Was it a random process (Poisson) that generated this pattern? We need a chi-square test to determine this, but that is a process for another day.
The majority of this code is adapted directly from [Robert Hijmans for rspatial.org](https://rspatial.org/raster/analysis/8-pointpat.html)
McGrew Jr., J. C., & Monroe, C. B. (2009). An introduction to statistical problem solving in geography. Waveland Press.
O’Sullivan, D., & Unwin, D. (2014). Geographic information analysis. John Wiley & Sons.