Tips, tricks, and a template for creating your Practical A formative.

Assignment

You can find the assignment on Duo. This site is a supplementary guide only.

Formative Assessment Practical A: Spatial Data Analysis

2020-2021

GEOG2472 Social Research in Geography
Assessment: Formative
Component: Practical A
Submission Deadline: 19 November 2020 by 16.00

Assignment Length (number of pages): 4 pages of text with mixed media (code, maps, figures, and references do not count towards the 4 pages)

The relationship between the formative and summative assessment: While the formative and summative elements differ vastly in content and subject matter, they cohere around the same principles, which require students to:

  1. Gain experience of conducting research projects by identifying individual research topics and questions for enquiry.
  2. Engage critically and reflexively with key issues in social research methods, from the conceptual and operational to the more practical elements of conducting research.
  3. Apply practical skills, quantitative approaches, and theoretical understanding of the techniques in empirical analyses when writing an individual report or essay.

Assessment/Assignment Title:

R for Spatial Data Analysis: Formative Assessment

Assessment Details:

R is a tool that helps scientists obtain, process, wrangle, analyse, and visualize spatial data. RMarkdown offers the ability to document the methodological details and parameters used for robust and reproducible social science. This includes annotations of code, user-generated descriptions, and the graphs and maps produced. Scientific results produced through this process are reproducible given consistent inputs (data) and consistent methods and parameters (code).

You will write a report with a maximum length of four pages that showcases the code and spatial analysis skills you have acquired in this practical. Your report should be produced through RMarkdown to ensure the integrity and reproducibility of the code. You will use the data.police.uk API to extract at least 3 months of crime data within a one-mile radius of your chosen spatial coordinates. You will then perform a quadrat analysis to summarize this data, map the data, and write a brief reflection on the process.

Your report should include a written description that explains:

  • The rationale for what you are doing: What area did you choose to analyse and why? Why would someone study the spatial distribution of crime?
  • A description of your methods: What did you do? Which packages did you use? What is an API and why are you using it? What does a quadrat analysis show?
  • Discussion: What do the results indicate about the area surrounding your chosen coordinates? What is represented in this dataset? What might not be represented in this data set? Are there any problems with using police data? Are there any problems with using a quadrat analysis?
  • Results: What kinds of crimes are represented in this data? What did you find about the spatial distribution of points?
  • Reflection of the process: Detail any problems you encountered. How do you imagine using spatial analysis and R in your research? How does R compare to using a traditional GIS like ArcMap?

Each component should include the code, your results, and the parameters necessary to reproduce your results:

  • Pulling data from an API: What were the parameters (inputs) you used for the API?
  • A summary of the data: What are the geographic extents? What are the basic characteristics of this data (where does it cover, how many points does it have, etc.)?
  • Maps and graphs of the data: include the maps and the parameters of your map in your report.

Submission:

The assessment must be submitted as an *.html file written in RMarkdown by the deadline to the module electronic drop box on DUO. There is NO hard copy to be handed in.

Formatting RMarkdown

You need to be sure your document is an RMarkdown document. To work in R Markdown, go to the file menu in RStudio: File –> New File –> R Markdown, and pick HTML.

(Screenshot: selecting HTML output)

This template (with a table of contents over to the left) was created using the “material” template from rmdformats.

The top of your RMarkdown is your YML information (like metadata about your document). Make sure that is filled out.
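A minimal YAML header looks something like this (the title, author, and date shown are placeholders for your own):

```yaml
---
title: "Practical A: Spatial Data Analysis"
author: "Your name here"
date: "19 November 2020"
output: html_document
---
```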

The next chunk (around line 10) sets your global options for how the chunks of code in RMarkdown will be evaluated and displayed. Within the options, make sure eval is TRUE and echo is TRUE, but set message=FALSE and warning=FALSE, as you don’t want warning messages printing in your document. This options chunk itself normally has echo=FALSE or include=FALSE set, which you should keep.

knitr::opts_chunk$set(eval = TRUE, prompt = FALSE, tidy = TRUE, message = FALSE, warning = FALSE)

Templates

Once you have RMarkdown installed, you can add in templates. This is not compulsory, but allows you to personalize the output and make it pretty.

this blog post gives you an idea of what templates are possible

Once you have RMarkdown installed, you can install a template package as follows:

# prettydoc provides the very pretty "html_pretty" templates
install.packages("prettydoc")
library("prettydoc")

# OR

install.packages("rmdformats")
library(rmdformats)

Then create a new document by going to File –> New File –> R Markdown –> From Template, and pick one, for example:

(Screenshot: selecting a pretty template)

If you need help navigating RMarkdown

please check out one of the many guides to RMarkdown, such as this one

If you are concerned with the syntax (e.g. headings, lists, links, blockquotes, bold words, etc), check out Markdown Basics
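For quick reference, the most common Markdown syntax looks like this:

```markdown
# A top-level heading
## A sub-heading

**bold**, *italic*, and `inline code`

- a bulleted list item
1. a numbered list item

[a link](https://example.com)

> a blockquote
```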

If you are more worried about the functionality, check out this guide to the inline code.

Formative Sections

Introduction

Introduce your project and give the rationale for what you are doing in this study:

  • What area did you choose to analyse and why?
  • Why would someone study the spatial distribution of crime?

Methods & Results

Be sure that you describe your methods:

  • What did you do?
  • Which packages did you use? Why did you choose these?
  • What is an API and why are you using it?
  • What is each step of the code doing?
  • What does a quadrat analysis show?
  • What are the parameters you used and why?

Include each chunk of code. Each chunk should include a description of what the code does and which parameters you chose (and why). You may want to intersperse your results with your methods.

Be sure you summarize and describe the data:

  • What are the geographic extents?
  • What are the basic characteristics of this data
    • Where does it cover?
    • How many points does it have?
    • Anything else that is relevant or important

Include all maps and graphs of the data, as well as the parameters of your map, in your report.

Results

  • Summarize
    • What kinds of crimes are represented in this data?
    • What did you find about the spatial distribution of points?

Discussion

  • Discuss your findings:
    • What do the results indicate about the area surrounding your chosen coordinates?
    • What is represented in this dataset?
    • What might not be represented in this data set?
    • Are there any problems with using police data?
    • Are there any problems with using a quadrat analysis?

Reflection

This is where you get to add your opinion to the report. Be sure to reflect on the processes you learned in Practical A.

  • Detail any problems you encountered.
  • How do you imagine using spatial analysis and R in your research?
  • How does R compare to using a traditional GIS like ArcMap?

Conclusion

What did you already tell us?

References

You can cite within RMarkdown, see this section of the RMarkdown book. However, you may also copy and paste from the word processor of your choice (which may be easier).

Be sure to include citations to any sources you used as an example for code. Include citations to your packages.

One fun R trick is that you can have R tell you the citation for the packages by using the citation() command. For example, if I wanted to cite the package I used for this template (rmdformats):

citation("rmdformats")

To cite package 'rmdformats' in publications use:

  Julien Barnier (2020). rmdformats: HTML Output Formats and Templates
  for 'rmarkdown' Documents. R package version 0.3.7.
  https://CRAN.R-project.org/package=rmdformats

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {rmdformats: HTML Output Formats and Templates for 'rmarkdown' Documents},
    author = {Julien Barnier},
    year = {2020},
    note = {R package version 0.3.7},
    url = {https://CRAN.R-project.org/package=rmdformats},
  }

I could also use .packages() to find all the packages loaded in a current R session

Finally, by using the write_bib command we can export references for all of our packages to a .bib file in our working directory:

knitr::write_bib(c(.packages(), "rmdformats"), "packages.bib")

Step-by-step

The following sections are intended as examples of the code needed. There are multiple ways of doing things, and I have suppressed the code examples where you need to formulate them yourself. This does not mean you need to copy directly what I wrote.

Your mission

You will be using the UK Crime mapping API and mapping crime in a 1-mile radius of a specified location for at least 3 consecutive months of 2020. You will do a quadrat analysis and include maps of this.

General R setup

Set your working directory

Remember to do this, as you will be knitting your RMarkdown in your working directory. You should have all of the data that you are trying to load in this directory.
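For example (a sketch; the path below is a placeholder, so substitute your own project folder):

```r
# Assumption: "~/GEOG2472/practicalA" is a placeholder path; use your own folder
# setwd("~/GEOG2472/practicalA")
getwd()  # check which directory you are currently working in
```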

Load your packages

Include a code block for loading your packages. Remember to describe what each package does. If you include the code for installing a package, make sure that chunk is set to eval=FALSE, because it will create an error when you knit this file. See the RMarkdown Book for more information about those code blocks.
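For instance, an installation chunk that is displayed but never run on knit could look like this (a sketch; the chunk label is my own choice):

````markdown
```{r install-packages, eval=FALSE}
install.packages("sf")
```
````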

library("ggplot2")
library("sf")
library("sp")
library("dplyr")
library("formatR")

# for the API pulls
library("jsonlite")
library("httr")
library("tidyverse")

# for the quadrat analysis
library("spatstat")
library("rspatial")
library("rgdal")
library("maptools")
library("raster")
library("tigris")
options(tigris_class = "sf")

# for making maps
library("leaflet")

Getting data from the API

Picking a location

You may pick any location in the UK. You need to find the latitude and the longitude of this location, as these will be inputs for the API.

You may want to use Google Maps to identify an X, and Y coordinate. Either way, create an object for your lat and long that you can call later.
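For example (the coordinates below are an assumption, roughly central Bradford; substitute your own):

```r
# Assumption: example coordinates only -- replace with your chosen location
lat <- 53.7689
lng <- -1.8039
```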

You may add a map of this location if it helps:

m <- leaflet()
m <- addTiles(m)
m <- addMarkers(m, lng = lng, lat = lat, popup = "my point")
m

Make the path an object

Make the path for the UK Police API into an object that you can call later

path <- "https://data.police.uk/api/crimes-street/all-crime?"

Creating a dataframe with crimes

There are several different ways to do this. Here I created 3 different objects for each month and then combined them. If you wanted to include more months, you could just add more lines of code here.

may <- GET(url = path, query = list(lat = lat, lng = lng, date = "2020-5"))

june <- GET(url = path, query = list(lat = lat, lng = lng, date = "2020-6"))

july <- GET(url = path, query = list(lat = lat, lng = lng, date = "2020-7"))

Check if there was an error. If each of these returns “[1] 200” you are good; if one returns a 404 or something else, there is a problem.

may$status_code
[1] 200
june$status_code
[1] 200
july$status_code
[1] 200

Let’s convert our API response into text and make sure the characters are UTF-8 (otherwise it might look like gunk):

may <- content(may, as = "text", encoding = "UTF-8")
june <- content(june, as = "text", encoding = "UTF-8")
july <- content(july, as = "text", encoding = "UTF-8")

Create a dataframe that converts the JSON response of the API into a dataframe that you can read (otherwise you’ll have trouble reading the raw JSON, with all its brackets and attribute names):

month1 <- fromJSON(may, flatten = TRUE) %>% data.frame()

month2 <- fromJSON(june, flatten = TRUE) %>% data.frame()

month3 <- fromJSON(july, flatten = TRUE) %>% data.frame()

Next, I used rbind to combine the three months of data together before cleaning it:

crime <- rbind(month1, month2, month3)

Let’s clean this file. Because the months are now combined in one dataframe, this should be fairly easy:

df <- dplyr::select(crime, month, category, location = location.street.name, long = location.longitude, 
    lat = location.latitude)

Now we have the dataframe with our data for the crimes.

Describing the data

How could we describe the data? By number of crimes? By unique locations?

How about a table to view by number of crimes?
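One way to build that table (a sketch; it assumes the cleaned dataframe df from the previous step, and the name tbdf is my own choice, reused for the map colours later):

```r
# Assumption: df is the cleaned crime dataframe built above;
# a tiny stand-in is defined here only so the chunk runs on its own
if (!exists("df")) {
  df <- data.frame(category = c("drugs", "burglary", "drugs"))
}
# count crimes per category and sort from least to most frequent
tbdf <- as.data.frame(sort(table(df$category)))
tbdf
```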

Var1 Freq
theft-from-the-person 6
bicycle-theft 7
shoplifting 27
other-crime 41
burglary 45
vehicle-crime 46
drugs 55
other-theft 61
criminal-damage-arson 123
public-order 169
anti-social-behaviour 289
violent-crime 528

Let’s take a quick look at where they are. Do they look clustered?

Converting to a spatial object

These points for the locations of crimes are not an sf object, so let’s convert them and assign a coordinate system. Because we know these are in EPSG:4326 (WGS84), we have to assign that. We can change it by projecting it later.

point_geo <- st_as_sf(df, coords = c(x = "long", y = "lat"), remove = FALSE, crs = 4326)

Let’s put those points on a basemap

leaflet(data = point_geo) %>% addTiles() %>% addMarkers()

Want to learn more about leaflet? Check out Computing for the Social Sciences’ guide

Want to make custom icons? Leaflet Awesome

Maybe we should label them by the type of crime:

factpal <- colorFactor(rainbow(nrow(tbdf)), tbdf$Var1)

leaflet(point_geo) %>% addTiles() %>% addCircleMarkers(radius = 5, color = ~factpal(category), 
    stroke = FALSE, fillOpacity = 0.7)

We can also add a legend and change the basemap. To learn more about leaflet basemap tile sets, here is a good preview.

factpal <- colorFactor(rainbow(nrow(tbdf)), tbdf$Var1)

leaflet(point_geo) %>% addProviderTiles(providers$CartoDB.Positron) %>% addCircleMarkers(radius = 5, 
    color = ~factpal(category), stroke = FALSE, fillOpacity = 0.7) %>% addLegend("bottomright", 
    pal = factpal, values = ~category, title = "Crimes", opacity = 1)

Projections

reproject our points

# check if it's sp or sf
class(point_geo)
[1] "sf"         "data.frame"
# if sf, then use st_transform instead of spTransform

# proj4string(point_geo) <- CRS('+init=epsg:4326')
projcrimes <- st_transform(point_geo, CRS("+init=epsg:27700"))

The advantage of working with EPSG:27700 (https://epsg.io/27700) is that it is the British National Grid, a projected coordinate system that uses metres as its unit of measurement (much easier to communicate with than decimal degrees).

The disadvantage is that Leaflet doesn’t work easily with coordinate systems other than WGS84

draw a buffer

Remember that we were initially pulling the data from a 1-mile circle around an X and Y location? Let’s draw that circle on the map by drawing a 1-mile buffer around our original lat/long.

me <- st_sfc(st_point(c(lng, lat)), crs = 4326)
me_utm <- st_transform(me, CRS("+init=epsg:27700"))
# The API drew a 1-mile radius, but the British National Grid is in metres,
# so convert 1 mile to metres

# there are 1.609344 kilometres in a mile and 1000 metres in a kilometre
mile_in_m <- 1.609344 * 1000

bufme <- st_buffer(me_utm, mile_in_m)

let’s look at that buffer with our points

Leaflet doesn’t work well with projections, so we’ll have to resort to a static map.

Let’s add that buffer to the map

xmin <- lng - 0.05
xmax <- lng + 0.05
ymin <- lat - 0.025
ymax <- lat + 0.025

ggplot(data = bufme) + geom_sf(data = bufme, fill = "white", color = "black") + geom_sf(data = projcrimes, 
    aes(color = category)) + geom_sf(data = me_utm, pch = "+", cex = 5) + coord_sf(xlim = c(xmin, 
    xmax), ylim = c(ymin, ymax), expand = FALSE)

Now let’s summarize our data


dim(projcrimes)
[1] 1403    6
# Let's take a look at the first few rows
head(projcrimes)
Simple feature collection with 6 features and 5 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 412390 ymin: 429201 xmax: 413662 ymax: 431499
CRS:            PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["OSGB 1936",
            ELLIPSOID["Airy 1830",6377563.396,299.3249646,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6277]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8901]]],
    CONVERSION["unknown",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",49,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-2,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996012717,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",400000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",-100000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
    month              category                    location      long       lat
1 2020-05 anti-social-behaviour On or near Selbourne Villas -1.813460 53.779676
2 2020-05 anti-social-behaviour    On or near Sandall Magna -1.803010 53.759005
3 2020-05 anti-social-behaviour    On or near Barwick Green -1.797337 53.768451
4 2020-05 anti-social-behaviour   On or near Frensham Grove -1.806514 53.775513
5 2020-05 anti-social-behaviour      On or near Shire Close -1.794254 53.759943
6 2020-05 anti-social-behaviour      On or near Shire Close -1.794254 53.759943
               geometry
1 POINT (412390 431499)
2 POINT (413085 429201)
3 POINT (413456 430253)
4 POINT (412849 431037)
5 POINT (413662 429307)
6 POINT (413662 429307)
## how many unique crime records are there once duplicate rows are removed?
xy <- unique(projcrimes)
## What are the dimensions of this?
dim(xy)
[1] 994   6

Point Pattern Analysis

We will use the spatstat package to perform a quadrat analysis on our points. Make sure spatstat is both installed and loaded.

Spatstat works by creating its own object class, ppp objects, which have two components: a set of (x, y) points and a study area or ‘window’.

There is no easy function to convert the sf objects we have into ppp objects, so we first have to convert them to sp objects (using the as() function):

spcrime <- as(projcrimes, "Spatial")

Now we can convert this sp point object into a ppp object, using as() again:

pppcrime <- as(spcrime, "ppp")

We may need to clean the data before progressing. If so, keep in mind that a ppp object doesn’t need to have variables or attribute information; in the spatstat package, variables are referred to as marks.

In a quadrat analysis we are more concerned with the pattern generated by the points than with their attributes, so you may want to clean your data by setting all the attributes (or “marks”) to NULL:

marks(pppcrime) <- NULL

Next we explicitly define the study boundary within which we will observe the pattern. Spatstat uses a special boundary object, an owin, which stands for “observation window.” We will need to coerce our circle (“bufme”) to an object of class owin using the function as.owin(). Similar to the sf-to-ppp conversion, you need to convert sf objects to sp first before coercing them into an owin object.

circlesp <- as(bufme, "Spatial")
# use 'as.owin' to convert our circle into an owin
owin <- as.owin(circlesp)
# let's check what class this is
class(owin)
[1] "owin"

Third, you need to set or “bind” the one-mile boundary owin to the pppcrime point object to establish the study window. Use the Window() function, which is a spatstat function.

Window(pppcrime) <- owin

Let’s make a map and see what we get!

plot(owin, col = "light blue")
points(pppcrime, col = "red", cex = 0.5, pch = "+")

Centrography statistics

Compute the mean center and standard distance for the crime data. To calculate these, we need to extract the x and y coordinates:

xy <- coords(pppcrime)

# compute the mean center (mc)
mc <- summarize(xy, xmean = mean(x), ymean = mean(y))
mc
     xmean    ymean
1 413565.7 430339.1
# standard distance
sd <- sqrt(sum((xy[, 1] - mc[1])^2 + (xy[, 2] - mc[2])^2)/nrow(xy))
sd
[1] 44.12344
# calculate the density: points per unit area of the 1-mile study circle
# (note: take the area of the buffer, not of the point features themselves)
CityArea <- as.numeric(st_area(bufme))
density <- nrow(xy)/CityArea
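For reference, the standard distance computed above is defined as:

```latex
SD = \sqrt{\frac{\sum_{i=1}^{n}\left[(x_i-\bar{x})^2+(y_i-\bar{y})^2\right]}{n}}
```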

Because the density is calculated with the area in metres^2, this is a slightly better measure; however, it would be even better to have a quadrat count.

Quadrat counts

Here, we break up the area into equally sized cells and count the number of points that fall into each cell.

We can carry out a simple quadrat analysis on our data using the quadratcount() function in the spatstat package. You have to specify the number of cells in your area. You should try several grid sizes and see which works best for the pattern you see in your data.

Here, we make a 6 x 6 grid (nx = 6, ny = 6) and count the number of points within each of the 36 cells.

qcounts1 <- quadratcount(pppcrime, nx = 6, ny = 6)

The object qcounts1 stores the number of points inside each quadrat. You can plot the quadrats along with the counts as follows.

# First, plot the points
plot(pppcrime, pch = 20, cols = "grey70", main = NULL)
# Then plot the cells and counts
p1 <- plot(qcounts1, col = "red", add = TRUE)

p1
NULL

If there are loads of zeros, then six-by-six might be too fine a grid, and you could try a 5 by 5 grid.

qcounts2 <- quadratcount(pppcrime, nx = 5, ny = 5)

p3 <- plot(pppcrime, pch = 20, cols = "grey70", main = NULL)
plot(qcounts2, col = "blue", add = TRUE)

p3
Symbol map with constant values
pch: 20
cols: grey70

Based on how much crime is in this area, I need an even denser grid; let’s try 8 x 8:

qcounts2 <- quadratcount(pppcrime, nx = 8, ny = 8)

p2 <- plot(pppcrime, pch = 20, cols = "grey70", main = NULL)
plot(qcounts2, col = "blue", add = TRUE)

p2
Symbol map with constant values
pch: 20
cols: grey70

In real life one should always try a range of row and column sizes to get a sense of how sensitive the results are.

These are neat plots, but let’s try to summarize them numerically. We’ll need to convert the resulting qcounts2 object into a regular data frame to calculate the variance-to-mean ratio (VMR). Use the data.frame() function.

Qcount <- data.frame(qcounts2)

We can get a basic measure of the spatial arrangement of points by calculating the VMR of the quadrat counts:

var(Qcount$Freq)/mean(Qcount$Freq)
[1] 22.84445

A VMR greater than 1.0 indicates a tendency towards clustering; a value of about 22.8 suggests substantial clustering.
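To see why the VMR separates clustered from even patterns, here is a toy example with four quadrats (the counts are hypothetical):

```r
# Hypothetical quadrat counts: 20 points spread evenly vs. packed into one cell
uniform   <- c(5, 5, 5, 5)
clustered <- c(18, 1, 0, 1)

var(uniform) / mean(uniform)      # 0: perfectly even, no clustering
var(clustered) / mean(clustered)  # about 15: strong clustering
```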

Chi-square

Let’s use a Chi-square test for spatial randomness using the handy dandy quadrat.test() function. The null hypothesis is spatial randomness.

quadrat.test(pppcrime, nx = 8, ny = 8)

    Chi-squared test of CSR using quadrat counts

data:  pppcrime
X2 = 1337.4, df = 59, p-value < 2.2e-16
alternative hypothesis: two.sided

Quadrats: 60 tiles (irregular windows)

This p-value is very close to zero, indicating that we can reject the null hypothesis: our points are clustered, not random.

How would your results be different at a different scale?

Bibliography/References used

You can either paste these in or automatically generate them

Reminder

Knit often! When you are finished you’ll have an *.html document; this document is what you are uploading. You do not need to upload your .Rmd document, but you may choose to do so.