Tips, tricks, and a template for creating your Practical A formative.
You can find the assignment on Duo. This site is a supplementary guide only.
2020-2021
GEOG2472 | Social Research in Geography |
---|---|
Assessment | Formative |
Component | Practical A |
Submission Deadline | 19 November 2020 by 16.00 |
Assignment Length | 4 pages of text with mixed media (code, maps, figures, and references do not count towards the 4 pages) |
The relationship between the formative and summative assessment: while they differ considerably in content and subject matter, the formative and summative elements cohere around the same principles, which require students to:
R for Spatial Data Analysis: Formative Assessment
R is a tool that helps scientists obtain, process, wrangle, analyse, and visualize spatial data. RMarkdown offers the ability to document the methodological details and parameters used for robust and reproducible social science. This includes annotations of code, user-generated descriptions, as well as graphs and maps produced. Scientific results produced through this process are reproducible given consistent inputs (data) and consistent methods and parameters (code).
You will write a report with a maximum length of four pages that showcases the code and spatial analysis skills you have acquired in this practical. Your report should be produced through RMarkdown to ensure the integrity and reproducibility of the code. You will use the data.police.uk API to extract at least three months of crime data within a one-mile radius of your chosen spatial coordinates, perform a quadrat analysis to summarize the data, map the data, and write a brief reflection on the process.
The assessment must be submitted as an *.html file written in RMarkdown by the deadline to the module electronic drop box on DUO. There is NO hard copy to be handed in.
You need to be sure your document is an RMarkdown document. To work in R Markdown, go to the file menu in RStudio–>New File–>R Markdown and pick HTML.
This template (with a table of contents over to the left) was created using the “material” template from rmdformats.
The top of your RMarkdown file is your YAML information (metadata about your document). Make sure that is filled out.
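For example, a minimal YAML header might look like the following (the title, author, and date here are placeholders; the output line changes if you use a template such as rmdformats::material):
---
title: "GEOG2472 Practical A"
author: "Your Name"
date: "November 2020"
output: html_document
---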
The next chunk (around line 10) sets your global options for how the chunks of code in your RMarkdown will be evaluated and displayed. Within the options, make sure that eval is TRUE and echo is TRUE, but set message=FALSE and warning=FALSE, as you don't want warning messages printing in your document. The setup block itself, however, normally has echo=FALSE or include=FALSE, which you should keep.
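As a rough sketch, the global options call inside that setup chunk might look like this:
# evaluate and show the code, but hide messages and warnings in the knitted output
knitr::opts_chunk$set(eval = TRUE, echo = TRUE, message = FALSE, warning = FALSE)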
Once you have RMarkdown installed, you can add in templates. This is not compulsory, but allows you to personalize the output and make it pretty.
This blog post gives you an idea of what templates are possible.
Once you have RMarkdown installed, you can install a template package as follows:
# prettydoc provides the very pretty "html_pretty" templates
install.packages("prettydoc")
library("prettydoc")
# OR
install.packages("rmdformats")
library(rmdformats)
Then create a new document by going to File–>New File–>R Markdown in RStudio, select “From Template”, and pick one, for example:
Introduce your project and give the rationale for what you are doing in this study.
Be sure that you describe your methods:
Include each chunk of code. Each chunk should include a description of what the code does and which parameters you chose (and why). You may want to intersperse your results with your methods.
Be sure you summarize and describe the data:
Include all maps and graphs of the data, as well as the parameters of your map, in your report.
This is where you get to add your opinion to the report. Be sure to reflect on the processes you learned in Practical A.
What did you already tell us?
You can cite within RMarkdown, see this section of the RMarkdown book. However, you may also copy and paste from the word processor of your choice (which may be easier).
Be sure to include citations to any sources you used as an example for code. Include citations to your packages.
One fun R trick is that you can have R tell you the citation for the packages by using the citation() command. For example, if I wanted to cite the package I used for this template (rmdformats):
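# calling citation() on the package name produces the reference text shown below
citation("rmdformats")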
To cite package 'rmdformats' in publications use:
Julien Barnier (2020). rmdformats: HTML Output Formats and Templates
for 'rmarkdown' Documents. R package version 0.3.7.
https://CRAN.R-project.org/package=rmdformats
A BibTeX entry for LaTeX users is
@Manual{,
title = {rmdformats: HTML Output Formats and Templates for 'rmarkdown' Documents},
author = {Julien Barnier},
year = {2020},
note = {R package version 0.3.7},
url = {https://CRAN.R-project.org/package=rmdformats},
}
I could also use .packages() to find all the packages loaded in the current R session.
Finally, by using the write_bib() command from the knitr package, we can export references for all of our packages to a .bib file in our working directory:
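# export BibTeX entries for all currently loaded packages; "packages.bib" is just an example file name
knitr::write_bib(x = .packages(), file = "packages.bib")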
The following sections are intended as examples of the code needed. There are multiple ways of doing things, and I have suppressed the code examples where you need to formulate the code yourself. This does not mean you need to copy directly what I wrote.
You will be using the UK Crime mapping API and mapping crime in a 1-mile radius of a specified location for at least 3 consecutive months of 2020. You will do a quadrat analysis and include maps of this.
Remember that when you knit, your RMarkdown uses its own location as the working directory. You should have all of the data that you are trying to load in this directory.
Include a code block for loading your packages. Remember to describe what each package does. If you include the code for installing a package, make sure that chunk is set to eval=FALSE, because otherwise it will create an error when you knit the file. See the RMarkdown Book for more information about those code blocks.
library("ggplot2")
library("sf")
library("sp")
library("dplyr")
library("formatR")
# for the API pulls
library("jsonlite")
library("httr")
library("tidyverse")
# for the quadrat analysis
library("spatstat")
library("rspatial")
library("rgdal")
library("maptools")
library("raster")
library("tigris")
options(tigris_class = "sf")
# for making maps
library("leaflet")
You may pick any location in the UK; you need to find the latitude and longitude of this location, as these will be inputs for the API.
You may want to use Google Maps to identify an X and Y coordinate. Either way, create an object for your lat and long that you can call later.
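For example (these coordinates are placeholders; substitute your own chosen location):
# latitude and longitude of the chosen location (placeholder values)
lat <- 53.77
lng <- -1.80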
You may add a map to this location if it helps
Make the path for the UK Police API into an object that you can call later
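For example, using the street-level crimes endpoint documented on data.police.uk:
# base URL for the street-level crimes endpoint of the data.police.uk API
path <- "https://data.police.uk/api/crimes-street/all-crime"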
There are several different ways to do this. Here I created 3 different objects for each month and then combined them. If you wanted to include more months, you could just add more lines of code here.
may <- GET(url = path, query = list(lat = lat, lng = lng, date = "2020-5"))
june <- GET(url = path, query = list(lat = lat, lng = lng, date = "2020-6"))
july <- GET(url = path, query = list(lat = lat, lng = lng, date = "2020-7"))
Check if there was an error. If each of these returns “[1] 200” you are good; if one returns a 404 or something else, there is a problem.
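One way to check is to look at the HTTP status code of each response:
# 200 means the request succeeded; 404 or 5xx means something went wrong
status_code(may)
status_code(june)
status_code(july)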
[1] 200
[1] 200
[1] 200
Let’s convert our API response into text and make sure the characters are UTF-8 (otherwise it might look like gunk).
may <- content(may, as = "text", encoding = "UTF-8")
june <- content(june, as = "text", encoding = "UTF-8")
july <- content(july, as = "text", encoding = "UTF-8")
Create a dataframe that converts the JSON response of the API into a dataframe that you can read (otherwise you’ll have trouble reading it with all the raw “[}":"attribute…” characters).
month1 <- fromJSON(may, flatten = TRUE) %>% data.frame()
month2 <- fromJSON(june, flatten = TRUE) %>% data.frame()
month3 <- fromJSON(july, flatten = TRUE) %>% data.frame()
Next, I used “rbind” to combine three months of data together before cleaning it
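Something along these lines, producing the combined crime object used below:
# stack the three monthly dataframes into one
crime <- rbind(month1, month2, month3)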
Let’s clean this file. Because the three months are now in one dataframe, this should be fairly easy.
df <- dplyr::select(crime, month, category, location = location.street.name, long = location.longitude,
lat = location.latitude)
Now we have the dataframe with our data for the crimes.
How could we describe the data? By number of crimes? By unique locations?
How about a table to view by number of crimes?
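One way to build it (this also creates the tbdf object used for the color palette further down):
# count crimes by category and sort from least to most frequent
tbdf <- as.data.frame(table(df$category))
tbdf <- tbdf[order(tbdf$Freq), ]
# display as a formatted table
knitr::kable(tbdf)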
Var1 | Freq |
---|---|
theft-from-the-person | 6 |
bicycle-theft | 7 |
shoplifting | 27 |
other-crime | 41 |
burglary | 45 |
vehicle-crime | 46 |
drugs | 55 |
other-theft | 61 |
criminal-damage-arson | 123 |
public-order | 169 |
anti-social-behaviour | 289 |
violent-crime | 528 |
Let’s take a quick look at where they are. Do they look clustered?
These points for the locations of crimes are not an “sf” object, so let’s convert them and assign a coordinate system. Because we know these were in EPSG:4326, we have to assign that; we can change it by projecting it later.
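A minimal sketch of one way to do this:
# the API returns coordinates as text, so make them numeric first
df$long <- as.numeric(df$long)
df$lat <- as.numeric(df$lat)
# convert to an sf object and assign the WGS84 (EPSG:4326) coordinate system
point_geo <- st_as_sf(df, coords = c("long", "lat"), crs = 4326)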
Let’s put those points on a basemap
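For example, a quick default basemap with the crime points on top:
# plot the crime points on a default OpenStreetMap basemap
leaflet(point_geo) %>% addTiles() %>% addCircleMarkers(radius = 5, stroke = FALSE, fillOpacity = 0.7)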
Want to learn more about leaflet? Check out Computing for the Social Science’s guide
Want to make custom icons? Leaflet Awesome
Maybe we should label them by the type of crime:
factpal <- colorFactor(rainbow(nrow(tbdf)), tbdf$Var1)
leaflet(point_geo) %>% addTiles() %>% addCircleMarkers(radius = 5, color = ~factpal(category),
stroke = FALSE, fillOpacity = 0.7)
We can also add a legend and change the basemap. To learn more about leaflet basemap tile sets, here is a good preview.
factpal <- colorFactor(rainbow(nrow(tbdf)), tbdf$Var1)
leaflet(point_geo) %>% addProviderTiles(providers$CartoDB.Positron) %>% addCircleMarkers(radius = 5,
color = ~factpal(category), stroke = FALSE, fillOpacity = 0.7) %>% addLegend("bottomright",
pal = factpal, values = ~category, title = "Crimes", opacity = 1)
[1] "sf" "data.frame"
# if sf, then use st_transform instead of spTransform
# proj4string(point_geo) <- CRS('+init=epsg:4326')
projcrimes <- st_transform(point_geo, CRS("+init=epsg:27700"))
The advantage of working with EPSG:27700 (https://epsg.io/27700), the British National Grid, is that it is a projected coordinate system that uses meters as its unit of measurement (much easier to communicate with than decimal degrees).
The disadvantage is that Leaflet doesn’t work easily with coordinate systems other than WGS84
Remember that we were initially pulling the data from a 1-mile circle around an X and Y location? Let’s draw that circle on the map by drawing a 1-mile buffer around our original lat/long.
me <- st_sfc(st_point(c(lng, lat)), crs = 4326)
me_utm <- st_transform(me, CRS("+init=epsg:27700"))
# The API drew a 1-mile radius, but EPSG:27700 is in meters, so convert 1 mile to meters
# there are 1.609344 kilometers in a mile and 1000 meters in a km
mile_in_m <- 1.609344 * 1000
bufme <- st_buffer(me_utm, mile_in_m)
Leaflet doesn’t work well with projections, so we’ll have to resort to a static map.
Let’s add that buffer to the map
# set the plot limits from the bounding box of the 1-mile buffer
# (these are in EPSG:27700 meters, matching the projected data, rather than decimal degrees)
bb <- st_bbox(bufme)
ggplot() + geom_sf(data = bufme, fill = "white", color = "black") + geom_sf(data = projcrimes,
    aes(color = category)) + geom_sf(data = me_utm, pch = "+", cex = 5) + coord_sf(xlim = c(bb["xmin"],
    bb["xmax"]), ylim = c(bb["ymin"], bb["ymax"]), expand = FALSE)
Now let’s summarize our data:
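# one way to produce the summaries shown below:
# number of crimes (rows) and attributes (columns)
dim(projcrimes)
# look at the first few records
head(projcrimes)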
[1] 1403 6
Simple feature collection with 6 features and 5 fields
geometry type: POINT
dimension: XY
bbox: xmin: 412390 ymin: 429201 xmax: 413662 ymax: 431499
CRS: PROJCRS["unknown",
BASEGEOGCRS["unknown",
DATUM["OSGB 1936",
ELLIPSOID["Airy 1830",6377563.396,299.3249646,
LENGTHUNIT["metre",1]],
ID["EPSG",6277]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8901]]],
CONVERSION["unknown",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",49,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",-2,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996012717,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",400000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",-100000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
month category location long lat
1 2020-05 anti-social-behaviour On or near Selbourne Villas -1.813460 53.779676
2 2020-05 anti-social-behaviour On or near Sandall Magna -1.803010 53.759005
3 2020-05 anti-social-behaviour On or near Barwick Green -1.797337 53.768451
4 2020-05 anti-social-behaviour On or near Frensham Grove -1.806514 53.775513
5 2020-05 anti-social-behaviour On or near Shire Close -1.794254 53.759943
6 2020-05 anti-social-behaviour On or near Shire Close -1.794254 53.759943
geometry
1 POINT (412390 431499)
2 POINT (413085 429201)
3 POINT (413456 430253)
4 POINT (412849 431037)
5 POINT (413662 429307)
6 POINT (413662 429307)
## how many unique places have crime
xy <- unique(projcrimes)
## What are the dimensions of this?
dim(xy)
[1] 994 6
We will use the spatstat package to perform a quadrat analysis on our points. Make sure spatstat is both installed and loaded.
Spatstat works by creating its own object class: ppp objects, which have two components, a set of (x, y) points and a study area or ‘window’.
There is no easy function to convert the sf objects we have into ppp objects, so we first have to convert them to sp objects (using the ‘as’ function).
Now we can convert this sp point object into a ppp object using ‘as’ again.
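A sketch of the two-step conversion (the intermediate object name here is a placeholder; pppcrime is the name the later code expects, and maptools needs to be loaded for the ppp coercion):
# first coerce the sf points to an sp object
crimesp <- as(projcrimes, "Spatial")
# then coerce the sp points to a spatstat ppp object (needs maptools loaded)
pppcrime <- as(crimesp, "ppp")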
We may need to clean the data before progressing. If so, keep in mind that a ppp object doesn’t need to have variables or attribute information. In the spatstat package, variables are referred to as marks.
In a quadrat analysis we are more concerned with the pattern generated by the points than with their attributes, so you may want to clean your data by setting all the attributes (or “marks”) to NULL.
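For example:
# drop the attribute information ("marks") so only the point pattern remains
marks(pppcrime) <- NULL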
Next we explicitly define the study boundary which we will use to observe the pattern. Spatstat uses a special boundary object, an owin, which stands for an “observation window.” We will need to coerce our circle (“bufme”) to an object of class owin using the function as.owin(). Similar to the sf-to-ppp conversion, you need to convert sf objects to sp first before coercing them into an owin object.
circlesp <- as(bufme, "Spatial")
# use 'as.owin' to convert our circle into an owin
owin <- as.owin(circlesp)
# let's check what class this is
class(owin)
[1] "owin"
Third, you need to set or “bind” the one-mile boundary owin to the pppcrime point feature object to establish the study window. Use the Window() function from spatstat.
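In code:
# bind the one-mile circular window to the point pattern
Window(pppcrime) <- owin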
Let’s make a map and see what we get!
Compute the mean center and standard distance for the crime data. To calculate this, we need to extract the x and y coordinates.
xy <- coords(pppcrime)
# compute the mean center (mc)
mc <- summarize(xy, xmean = mean(x), ymean = mean(y))
mc
xmean ymean
1 413565.7 430339.1
[1] 44.12344
Given that the density is calculated over an area in meters^2, this is a slightly better measure; however, it would be even better to have a quadrat count.
Here, we break up the area into equally sized cells and count the number of points that fall into each cell.
We can carry out a simple quadrat analysis on our data using the quadratcount() function in the spatstat package. You have to specify the number of cells in your area. You should try several and see which one works best for the pattern you see in your data
Here, we make a 6 x 6 grid (nx = 6, ny = 6) and count the number of points within each of the 36 cells.
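In code, that looks something like this:
# count the points falling in each cell of a 6 x 6 grid over the study window
qcounts1 <- quadratcount(pppcrime, nx = 6, ny = 6)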
The object qcounts1 stores the number of points inside each quadrat. You can plot the quadrats along with the counts as follows.
# First, plot the city and points
plot(pppcrime, pch = 20, cols = "grey70", main = NULL)
# Then plot the cells and counts
p1 <- plot(qcounts1, col = "red", add = TRUE)
NULL
If there are loads of zeros, then six by six might be too many cells and you could try a 5 by 5 grid.
qcounts2 <- quadratcount(pppcrime, nx = 5, ny = 5)
p3 <- plot(pppcrime, pch = 20, cols = "grey70", main = NULL)
plot(qcounts2, col = "blue", add = TRUE)
Symbol map with constant values
pch: 20
cols: grey70
Based on how much crime there is in this area, I need an even denser grid; let’s try 8 x 8:
qcounts2 <- quadratcount(pppcrime, nx = 8, ny = 8)
p2 <- plot(pppcrime, pch = 20, cols = "grey70", main = NULL)
plot(qcounts2, col = "blue", add = TRUE)
Symbol map with constant values
pch: 20
cols: grey70
In real life one should always try a range of row and column sizes to get a sense of how sensitive the results are.
These are neat plots, but let’s try to summarize them numerically. We’ll need to convert the resulting qcounts2 object into a regular data frame to calculate the VMR (variance-to-mean ratio). Use the data.frame() function.
We can get a basic measure of the spatial arrangement of the points by calculating the VMR of the quadrat counts.
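A sketch of that calculation (assuming the counts end up in a column called Freq, which is what the data frame conversion typically produces):
# convert the quadrat counts to a data frame and compute the variance-to-mean ratio
qdf <- data.frame(qcounts2)
var(qdf$Freq) / mean(qdf$Freq)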
[1] 22.84445
A VMR greater than 1.0 indicates a tendency towards clustering; this is large, but not that large.
Let’s run a Chi-square test for spatial randomness using the handy-dandy quadrat.test() function. The null hypothesis is spatial randomness.
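Judging from the output below, the call used an 8 x 8 grid, something like:
# chi-squared test of complete spatial randomness over an 8 x 8 quadrat grid
quadrat.test(pppcrime, nx = 8, ny = 8)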
Chi-squared test of CSR using quadrat counts
data: pppcrime
X2 = 1337.4, df = 59, p-value < 2.2e-16
alternative hypothesis: two.sided
Quadrats: 60 tiles (irregular windows)
This p-value is very close to zero. This indicates that we can reject the null hypothesis: our points are clustered and are not random.
How would your results be different at a different scale?
You can either paste these in or automatically generate them
Knit often! When you are finished you’ll have an *.html document; this document is what you are uploading. You do not need to upload your .Rmd document, but you may choose to do so.