This document can be accessed here: http://tiny.cc/w15pirate

Point Pattern Analysis

We need to start by loading the packages and reading in the data:

if (!require("rspatial")) devtools::install_github('rspatial/rspatial')
library(rspatial)
city <- sp_data('city')    # city boundary polygons
crime <- sp_data('crime')  # crime locations (points)

Let’s look at a quick map of both datasets:

plot(city, col='light blue')
points(crime, col='red', cex=.5, pch='+')

Let’s look at a table of the frequency of each type of crime:

tbdf <- as.data.frame(sort(table(crime$CATEGORY))[-1])  # drop the least frequent category
tbdf
##                    Var1 Freq
## 1                 Arson    9
## 2               Weapons   15
## 3               Robbery   49
## 4            Auto Theft   86
## 5    Drugs or Narcotics  134
## 6   Commercial Burglary  143
## 7           Grand Theft  143
## 8              Assaults  172
## 9                   DUI  212
## 10 Residential Burglary  219
## 11     Vehicle Burglary  221
## 12      Drunk in Public  232
## 13            Vandalism  355
## 14          Petty Theft  665

Let’s get the coordinates of the crime data:

xyco <- coordinates(crime)
# What are the dimensions of this?
dim(xyco)
## [1] 2661    2
# Let's take a look at the first few rows
head(xyco)
##      coords.x1 coords.x2
## [1,]   6628868   1963718
## [2,]   6632796   1964362
## [3,]   6636855   1964873
## [4,]   6626493   1964343
## [5,]   6639506   1966094
## [6,]   6640478   1961983
# How many unique places have crime?
xy <- unique(xyco)
# What are the dimensions of this?
dim(xy)
## [1] 1208    2

Basic statistics

Compute the mean center and standard distance for the crime data.

# mean center of locations where crimes occurred
mc <- apply(xy, 2, mean)
mc
## coords.x1 coords.x2 
##   6635645   1963105
# standard distance
sd <- sqrt(sum((xy[,1] - mc[1])^2 + (xy[,2] - mc[2])^2) / nrow(xy))
sd
## [1] 7224.564
# mean center of locations weighted by number of crimes
mcwt <- apply(xyco, 2, mean)
mcwt
## coords.x1 coords.x2 
##   6635693   1962890
# standard distance
sdwt <- sqrt(sum((xyco[,1] - mcwt[1])^2 + (xyco[,2] - mcwt[2])^2) / nrow(xyco))
sdwt
## [1] 7157.337
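
To visualize these statistics, here is a quick sketch (not in the original tutorial) that marks the mean center on the map and draws a circle with radius equal to the standard distance:

# Plot the city, the crimes, the mean center, and the standard distance circle
plot(city, col='light blue')
points(crime, col='red', cex=.5, pch='+')
points(mc[1], mc[2], pch='*', col='blue', cex=3)
# inches=FALSE keeps the radius in map units (US feet for this CRS)
symbols(mc[1], mc[2], circles=sd, add=TRUE, inches=FALSE)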

Quadrat analysis

A quadrat analysis is a way of testing whether or not a spatial pattern is significantly different from a random spatial pattern.

It requires applying a grid (cells of any shape) to our study area and then counting how many points fall within each cell.

When the study area is divided into cells and all points fall in a single cell (or ‘quadrat’), the variation in the number of points per cell is at its maximum. Conversely, if every cell contains exactly the same number of points, the variance is zero, as no cell differs from any other.
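
A toy example (with invented counts) makes the intuition concrete:

# Four cells, 20 points in total
var(c(5, 5, 5, 5))    # perfectly even: variance is 0
var(c(20, 0, 0, 0))   # all points in one cell: variance is 100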

Checking density

Here is a basic approach to computing point density (although there are many others).

This calculates the surface area of our study area (“city”), then computes the density as the number of crimes (the number of rows in the crime coordinate matrix) divided by the area of the city:

CityArea <- raster::area(city)
dens <- nrow(xyco) / CityArea

What is the number of crimes per square km?
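
The CRS of these data uses US feet (see the raster summary below), so CityArea is in square feet. Here is a sketch of the conversion, assuming 1 ft = 0.3048 m:

# Convert square feet to square km, then recompute the density
CityArea_km2 <- CityArea * 0.3048^2 / 1e6
nrow(xyco) / CityArea_km2  # crimes per square km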

To compute quadrat counts, I first create quadrats (here, the cells of a RasterLayer). I get the extent for the raster from the city polygon, and then assign an arbitrary resolution of 1000 (feet, given the CRS). (In real life you should always try a range of resolutions, as the choice seriously impacts your results.)

r <- raster(city)
res(r) <- 1000
r
## class      : RasterLayer 
## dimensions : 15, 34, 510  (nrow, ncol, ncell)
## resolution : 1000, 1000  (x, y)
## extent     : 6620591, 6654591, 1956519, 1971519  (xmin, xmax, ymin, ymax)
## crs        : +proj=lcc +lat_1=38.33333333333334 +lat_2=39.83333333333334 +lat_0=37.66666666666666 +lon_0=-122 +x_0=2000000 +y_0=500000.0000000001 +datum=NAD83 +units=us-ft +no_defs +ellps=GRS80 +towgs84=0,0,0

To find the cells that are in the city, I create polygons from the RasterLayer.

r <- rasterize(city, r)
plot(r)
quads <- as(r, 'SpatialPolygons')
plot(quads, add=TRUE)
points(crime, col='red', cex=.5)

Finally, we can count!

The number of events (aka points) in each quadrat/cell can be counted with the ‘rasterize’ function. That function can summarize the number of points within each cell, but it can also compute statistics based on the ‘marks’ (attributes). For example, we could compute the number of different crime types by changing the ‘fun’ argument to another function (see ?rasterize); a sketch of this follows the next code block.

nc <- rasterize(coordinates(crime), r, fun='count', background=0)
plot(nc)
plot(city, add=TRUE)
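
As a sketch of the ‘marks’ idea mentioned above, we could count how many distinct crime categories occur in each cell. This example is not in the original tutorial; the categories are recoded as integers because ‘rasterize’ expects numeric values:

# Number of distinct crime categories per cell
cat_codes <- as.integer(factor(crime$CATEGORY))
ntypes <- rasterize(coordinates(crime), r, field=cat_codes,
                    fun=function(x, ...) length(unique(na.omit(x))), background=0)
plot(ntypes)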

This map (nc) has crime counts (yay!) and we can see clustering (hooray!). But you can see that the raster cells spill outside of the city limits, and since we only have data for the city, those areas should be excluded from the map. We can do that with the mask function (see ?mask).

ncrimes <- mask(nc, r)
plot(ncrimes)
plot(city, add=TRUE)  

Better. Now let’s look at the frequencies of crimes per cell in this grid:

f <- freq(ncrimes, useNA='no')
head(f)
##      value count
## [1,]     0    48
## [2,]     1    29
## [3,]     2    24
## [4,]     3    22
## [5,]     4    19
## [6,]     5    16
# Let's graph this (the frequency of crimes per cell)
plot(f, pch=20)

This shows that 48 cells have 0 crimes and that most cells have low counts, while one cell has over 200 crimes, an extreme outlier.
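
To locate that outlier, a quick sketch (not part of the original tutorial):

# Find the cell with the most crimes
i <- which.max(values(ncrimes))  # index of the maximum count (NAs are skipped)
values(ncrimes)[i]               # the maximum count itself
xyFromCell(ncrimes, i)           # the map coordinates of that cell's center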

Now compute the average number of crimes per quadrat (cell).

# number of quadrats
quadrats <- sum(f[,2])
quadrats
## [1] 283
# number of cases
cases <- sum(f[,1] * f[,2])
cases
## [1] 2621
mu <- cases / quadrats
# This is the mean quadrat count (mu)
mu
## [1] 9.261484

Now let’s build the table for the variance calculation, showing the number of events per quadrat (K), the count of quadrats with that many events (X), the deviation from the mean (Kmu = K - \(\mu\)), the squared deviation (Kmu2), and the squared deviation multiplied by the count of quadrats (XKmu2):

ff <- data.frame(f)
colnames(ff) <- c('K', 'X')
ff$Kmu <- ff$K - mu
ff$Kmu2 <- ff$Kmu^2
ff$XKmu2 <- ff$Kmu2 * ff$X
head(ff)
##   K  X       Kmu     Kmu2     XKmu2
## 1 0 48 -9.261484 85.77509 4117.2042
## 2 1 29 -8.261484 68.25212 1979.3115
## 3 2 24 -7.261484 52.72915 1265.4996
## 4 3 22 -6.261484 39.20618  862.5360
## 5 4 19 -5.261484 27.68321  525.9811
## 6 5 16 -4.261484 18.16025  290.5639

Finally, based on this table, we can calculate the observed variance (\(s^2\)), the average squared deviation from the mean of the dataset. In other words, the sum of the XKmu2 column divided by the total count of quadrats minus one: \(s^2 = \frac{\sum X(K-\mu)^2}{(\sum X) - 1}\)

s2 <- sum(ff$XKmu2) / (sum(ff$X)-1)
s2
## [1] 276.5555

To get the variance-mean ratio (VMR) for the entire data set, divide \(s^2\) by \(\mu\):

VMR <- s2 / mu
VMR
## [1] 29.86082
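
As a sanity check (not in the original tutorial), all three quantities can be computed directly from the raster’s cell values:

# mu, s2, and VMR straight from the cell values
v <- na.omit(values(ncrimes))
mean(v)           # should match mu
var(v)            # should match s2 (var() also uses the n - 1 denominator)
var(v) / mean(v)  # should match VMR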

What does this all mean?

If every cell in the study area contained exactly the same number of points, the variance (\(s^2\)) would be zero, making the variance-mean ratio (VMR) also zero. This would imply that the points are perfectly dispersed. In this extreme (and very rare) scenario, the pattern is not random. Any time the VMR < 1, the point pattern is more evenly spaced (dispersed) than random.

If a point pattern is highly clustered, with most cells containing no points and only a few cells containing many points, the variance of the cell frequencies (\(s^2\)) will be large relative to the mean cell frequency (\(\mu\)). This produces a very large VMR. Any time the VMR > 1, the point pattern is more clustered than random.

For a perfectly random point pattern, the variance (\(s^2\)) of the cell frequencies equals the mean cell frequency (\(\mu\)). When the VMR is close to 1, the points have a random arrangement (neither dispersed nor clustered).

As our VMR was 29.86, it’s clear that our point pattern is clustered.
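
Recall the earlier caveat that the cell resolution seriously impacts the results. Here is a sketch (with hypothetical resolutions of 500, 1000, and 2000 feet) of checking whether the conclusion holds across a range of cell sizes:

# Recompute the VMR for several cell sizes (in feet, given the CRS)
for (res_ft in c(500, 1000, 2000)) {
  rr <- raster(city)
  res(rr) <- res_ft
  rr <- rasterize(city, rr)
  counts <- mask(rasterize(coordinates(crime), rr, fun='count', background=0), rr)
  v <- na.omit(values(counts))
  cat(res_ft, 'ft cells: VMR =', round(var(v) / mean(v), 2), '\n')
}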

Was it a random process (Poisson) that generated this pattern? We need a chi-square test to determine this, but that is a process for another day.
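
For a preview, here is a minimal sketch, assuming the standard index-of-dispersion test in which \((n-1) \times VMR\) is compared to a chi-square distribution with \(n-1\) degrees of freedom (n being the number of quadrats):

# Chi-square (index of dispersion) test against complete spatial randomness
chi2 <- VMR * (quadrats - 1)
pchisq(chi2, df = quadrats - 1, lower.tail = FALSE)  # a tiny p-value suggests clustering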

References

The majority of this code is directly from [Robert Hijmans for rspatial.org](https://rspatial.org/raster/analysis/8-pointpat.html).

McGrew Jr, J. C., & Monroe, C. B. (2009). An introduction to statistical problem solving in geography. Waveland Press.

O’Sullivan, D., & Unwin, D. (2014). Geographic information analysis. John Wiley & Sons.