1 Introduction

Link to introductory video

Load the libraries:

library(sf)
library(tidyverse)
library(rnaturalearth)
library(ggthemes)
library(knitr)
library(units)
library(lubridate)
library(ggspatial)
library(ggmap)

options(scipen = 999, width = 100)

2 Calculate polygon area

In real life the spatial distribution of any phenomenon is usually not regular, and neither is the distribution of administrative units. This means that if we want to compare different municipalities with each other, absolute values may not be the best option (at least they do not describe the full picture). Next to absolute values, density is very often used: a value per spatial unit (e.g. per km2). Let us calculate the area of each district and then find which districts in Georgia have the highest and lowest population density.

# calculate area for each district:
ge_adm_2_pop$area <- st_area(ge_adm_2_pop)

# Show areas of first 5 municipalities:
ge_adm_2_pop$area[1:5]
## Units: [m^2]
## [1]  313647912
## [2]  793499298
## [3] 1236316014
## [4]  970579105
## [5]   12850125

We see that polygon areas are calculated in the same units as the reference system of the geolayer (here metres) and the variable type is units. Density is usually given per square kilometre. We can convert the units from square metres to square kilometres and store the result as numeric:

# st units to km2
ge_adm_2_pop$area <- set_units(ge_adm_2_pop$area, km^2)

#Check again:
ge_adm_2_pop$area[1:5]
## Units: [km^2]
## [1]  313.64791
## [2]  793.49930
## [3] 1236.31601
## [4]  970.57911
## [5]   12.85013
# sometimes it is needed that the area is numeric:
ge_adm_2_pop$area <- as.numeric(ge_adm_2_pop$area)

Where is the biggest / smallest district (by area) of Georgia?

# filter biggest and smallest
ge_adm_2_extreme <- ge_adm_2_pop %>% filter(area == min(area) | area == max(area)) #"|" corresponds to 'OR'

Create the map:

ggplot()+
  theme_minimal()+
  geom_sf(data = ge_adm_1, fill = "grey70", col = "grey80", size= 0.2)+
  geom_sf(data = ge_adm_2_pop, fill="grey96", colour = "grey80", size = 0.2)+
  geom_sf(data = ge_adm_2_extreme, aes(fill = as.factor(area), colour = as.factor(area)), size=1)+
  geom_sf_text(data = ge_adm_2_extreme, aes(label = ADM_EN))+
  guides(fill=F, colour = F)+
  scale_fill_manual(values =c("red", "dodgerblue"))+
  scale_colour_manual(values =c("red", "dodgerblue"))

What is the area of the biggest municipality? The smallest? What is the average area of the municipalities?

2.1 Spatial density

The next step is to calculate the spatial density of the population in the districts:

# it can be done in several ways
# first, you can calculate the density inside the ggplot call:
ggplot()+
  theme_minimal()+
  geom_sf(data = ge_adm_0, fill = "grey70", colour = "grey60", size= 0.2)+
  geom_sf(data = ge_adm_2_pop, aes(fill = POP_Total / area), colour = "grey60", size = 0.2)+
  scale_fill_viridis_c(na.value = "red")

# Or you can calculate the density as a separate column and then draw the map:
ge_adm_2_pop <- ge_adm_2_pop %>% 
  mutate(pop_dens = POP_Total / area)

ggplot()+
  theme_minimal()+
  geom_sf(data = ge_adm_0, fill = "grey70", colour = "grey60", size= 0.2)+
  geom_sf(data = ge_adm_2_pop, aes(fill = pop_dens), colour = "grey60", size=0.25)+
  scale_fill_viridis_c(na.value = "red")+
  labs(fill = "population\ndensity\nkm2")

In which municipality is the population density highest? Where is it smallest?

# the geometry column slows down queries when we just want a quick overview.
# Copy the layer and delete the geometry column:
temp0 <- ge_adm_2_pop %>% 
  st_drop_geometry()

# district with highest density of population:
temp0 %>% 
  filter(pop_dens == max(pop_dens)) %>% # filter max value
  kable() # format the output
|ADM_PCODE |ADM_EN    | POP_Total|     area| pop_dens|
|:---------|:---------|---------:|--------:|--------:|
|GE4711    |Gori City |     48143| 15.96035| 3016.413|
# 10 districts with lowest population density:
temp0 %>% 
  arrange(pop_dens) %>% 
  head(10) %>% 
  kable()
|ADM_PCODE |ADM_EN         | POP_Total|      area|  pop_dens|
|:---------|:--------------|---------:|---------:|---------:|
|GE3830    |Mestia         |      9316| 3068.4855|  3.036026|
|GE3526    |Lentekhi       |      4386| 1442.8955|  3.039721|
|GE3231    |Kazbegi        |      3795| 1067.2891|  3.555738|
|GE3529    |Oni            |      6130| 1363.0849|  4.497152|
|GE3523    |Ambrolauri     |      9139| 1121.3702|  8.149851|
|GE2928    |Dedoplistskaro |     21221| 2520.8410|  8.418222|
|GE3225    |Dusheti        |     25659| 3016.2177|  8.507012|
|GE3227    |Tianeti        |      9468|  939.0467| 10.082566|
|GE4125    |Aspindza       |     10372|  852.5888| 12.165302|
|GE3531    |Tsageri        |     10387|  747.7607| 13.890808|
# highest density in top10 municipalities:
temp0 %>% 
  arrange(-pop_dens) %>% 
  head(10) %>% 
  kable()
|ADM_PCODE |ADM_EN           | POP_Total|      area|  pop_dens|
|:---------|:----------------|---------:|---------:|---------:|
|GE4711    |Gori City        |     48143|  15.96035| 3016.4129|
|GE2611    |Kutaisi          |    147635|  56.92257| 2593.6111|
|GE1511    |Batumi           |    152839|  65.11924| 2347.0635|
|GE11      |Tbilisi          |   1171100| 502.51710| 2330.4679|
|GE3811    |Zugdidi City     |     42998|  21.42328| 2007.0693|
|GE2911    |Telavi City      |     19629|  12.00048| 1635.6848|
|GE4411    |Rustavi          |    125103|  79.29119| 1577.7668|
|GE4111    |Akhaltsikhe City |     17903|  12.85013| 1393.2160|
|GE3814    |Poti             |     41465|  41.29668| 1004.0759|
|GE2311    |Ozurgeti City    |     14785|  22.41720|  659.5381|

But wait! Look again at the map of population density!
It’s ugly! We can do better! Go back to the first practical session and review how to use the classInt package to change class breaks. Do it and add it to your collection of homework.
Like this?
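As a reminder, a classified version of the density map could be sketched roughly as below. This is only a sketch: it assumes the ge_adm_2_pop layer with the pop_dens column from above, and the "fisher" style and number of classes are arbitrary choices you should adapt.

```r
# sketch: class breaks with classInt, then a classified choropleth
# (assumes ge_adm_2_pop$pop_dens from the code above)
library(classInt)

# compute 5 class breaks with the Fisher-Jenks style:
brk <- classIntervals(ge_adm_2_pop$pop_dens, n = 5, style = "fisher")

# cut the continuous variable into classes using those breaks:
ge_adm_2_pop$dens_class <- cut(ge_adm_2_pop$pop_dens,
                               breaks = brk$brks,
                               include.lowest = TRUE)

ggplot()+
  theme_minimal()+
  geom_sf(data = ge_adm_2_pop, aes(fill = dens_class), colour = "grey60", size = 0.2)+
  scale_fill_viridis_d(na.value = "red")+
  labs(fill = "population\ndensity\nper km2")
```

Experiment with other styles ("quantile", "equal", "jenks" etc.) and class counts to see how the picture changes.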

3 Base map

Sometimes the problem is not only that your thematic layer doesn’t fit well with the base map; an even bigger problem is that you don’t have a proper base map at all. One nice possibility is to use geolayers from Natural Earth, and as an R user you can do it directly from R (vignette).

3.1 Import spatial data

Let us assume that we have a dataset about genetic variance collected at different locations. Your task is to map it to reveal potential spatial patterns. The dataset contains geographical coordinates, a sample ID and the variance (variable1) of some gene parameter (the content is not important right now).

Import the data:

genes <- read.csv2("http://aasa.ut.ee/tbilisi/data/geneticVariance.csv")

glimpse(genes)
## Observations: 70
## Variables: 4
## $ sample.ID <int> ...
## $ variable1 <int> ...
## $ latitude  <fct> ...
## $ longitude <fct> ...

Is everything okay?
Look again! Latitude and longitude are stored as factors. Convert them to numeric:

genes <- genes %>% 
  mutate(longitude = as.numeric(as.character(longitude)), 
         latitude = as.numeric(as.character(latitude)))

glimpse(genes)
## Observations: 70
## Variables: 4
## $ sample.ID <int> ...
## $ variable1 <int> ...
## $ latitude  <dbl> ...
## $ longitude <dbl> ...

Visualize it:

ggplot()+
  geom_point(data = genes, aes(x=longitude, y=latitude, colour=variable1))

The number of dots on the plot is 12, but the number of rows in the dataset is 70. Why is there such a big difference?
It is because several samples were collected at every location. Can we avoid this overlapping and bring out all the samples?
As usual there are several options:

  • jitter the points randomly
  • calculate average (or median) value for every location

3.2 Jitter the points

The principles of this operation are well described here (link). But we are not doing it just on the plot; we jitter the locations in the data table itself.

3.2.1 Convert data table to spatial (sf) object

In GIS the most common data format is probably SHP. It works quite well but has several disadvantages (one layer consists of several files: geometry, attributes, reference system etc.). Today the rising standard for spatial objects is simple features (sf).

For point data this conversion is very simple:

glimpse(genes)
## Observations: 70
## Variables: 4
## $ sample.ID <int> ...
## $ variable1 <int> ...
## $ latitude  <dbl> ...
## $ longitude <dbl> ...
genes_sf <- st_as_sf(genes, coords = c("longitude", "latitude"), crs = 4326)

class(genes_sf)
## [1] "sf"        
## [2] "data.frame"

And now we can jitter the points:

genes_sf_jitt <- st_jitter(genes_sf, factor = 0.02) # read from Help, how the 'factor' parameter works!

ggplot()+
  geom_sf(data = genes_sf_jitt, aes(colour = variable1))+
  scale_colour_viridis_c()

Better! But there are still two major problems: jittering doesn’t really clarify the picture (maybe we should try calculating the average for every location), and secondly, although we can see the coordinate values on the axes, that is not enough. For better spatial context we need a proper base map!

First we calculate average values for every location:

glimpse(genes)
## Observations: 70
## Variables: 4
## $ sample.ID <int> ...
## $ variable1 <int> ...
## $ latitude  <dbl> ...
## $ longitude <dbl> ...
genes_aggr <- genes %>% 
  group_by(longitude, latitude) %>% 
  summarise(var1_mean = mean(variable1)) %>% 
  ungroup()

# see the result:
genes_aggr %>% kable()
| longitude|  latitude| var1_mean|
|---------:|---------:|---------:|
|  139.6730| -5.478421|   3959071|
|  142.9462| -5.851010|   3633741|
|  143.5166| -4.186353|   4430432|
|  143.6572| -6.147770|   3913224|
|  144.9667| -6.016660|   3067700|
|  145.1788| -5.673380|   3318188|
|  145.5927| -6.542736|   3404230|
|  145.8885| -6.974360|   3936487|
|  146.9323| -6.682114|   1610366|
|  147.7822| -9.994000|   3262092|
|  151.9846| -4.639850|   2663686|
|  152.1350| -4.215275|   2069085|

The result is just a plain table. We have to convert it to an sf-object:

genes_aggr_sf <- st_as_sf(genes_aggr, coords = c("longitude", "latitude"), crs = 4326) # 4326 refers to WGS84!

3.3 Base map for spatial context

3.3.1 shp-layers as base maps

There are many different geolayers available at different scale level. Currently we download just country polygons:

countries50 <- ne_download(scale = 50, type = 'countries', category = 'cultural', returnclass = "sf")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Administrator\AppData\Local\Temp\RtmpuK8tQT", layer: "ne_50m_admin_0_countries"
## with 241 features
## It has 94 fields
## Integer64 fields read as strings:  POP_EST NE_ID

Plot the base map and sample points

ggplot()+
  geom_sf(data = countries50, fill = "white", colour = "grey", size=0.2)+
  geom_sf(data=genes_aggr_sf, colour = "red")

An experienced geographer can tell that the gene samples were collected in Papua New Guinea and Indonesia. Select those countries from the base map layer:

countries50_ii <- countries50 %>% 
  filter(ADMIN == "Papua New Guinea" | ADMIN == "Indonesia") # '|' means 'OR'!!

#plot:
ggplot()+
  geom_sf(data = countries50_ii, fill = "white", colour = "grey", size=0.2)+
  geom_sf(data=genes_aggr_sf, colour = "red")

Much better! But we should trim the base map more (crop the empty western part):

# Crop by coordinates:
countries50_iii <- st_crop(countries50_ii, c(xmin = 138, xmax = 155, ymin = -12, ymax = 0))

#plot:
ggplot()+
  geom_sf(data = countries50_iii, fill = "white", colour = "grey", size=0.2)+
  geom_sf(data = genes_aggr_sf, colour = "red")

This is already pretty good. And now the spatial pattern:

ggplot()+
  geom_sf(data = countries50_iii, fill = "white", colour = "grey", size=0.25)+
  geom_sf(data=genes_aggr_sf, aes(colour = var1_mean), size=3)+
  scale_colour_viridis_c()

3.3.2 3rd-party base maps

When you want to add good spatial context to your thematic map, the best option is usually to create your own base map. There are many good data sources available (Land Board Estonia, OSM, Natural Earth etc.).
The only problem is that it may take a lot of time to create a good, relevant and informative map. For a quicker solution we can use base maps from other service providers (Google, OSM, Stamen etc.).
Next we try the possibilities offered by OpenStreetMap. For that you have to install the Java JDK on your computer (currently it’s Windows 64bit, link) and in R you must install the package OpenStreetMap.
To download a base map from OSM, first you have to calculate a bounding box to define the extent of your map:

# calculate bounding box of map extent:
gene_base_bbox <- st_bbox(countries50_iii)

# bounding box coordinates:
gene_base_bbox
##       xmin       ymin       xmax       ymax 
## 138.000000 -11.630566 155.000000  -1.353223
# download base map:
library(OpenStreetMap)
# logic is simple: openmap(c(upper latitude, left longitude), c(lower latitude, right longitude))
# you can take the coordinates directly from bounding box: 
gene_base_map <- openmap(c(gene_base_bbox[4], gene_base_bbox[1]), c(gene_base_bbox[2], gene_base_bbox[3]), type = "nps")

# plot it:
autoplot(gene_base_map)

# Look from help what are the other available base map types. Can you use them?

In case you don’t have the Java JDK installed, you can use another option: ggmap. The syntax is slightly different, but the principles are the same:

# start library:
library(ggmap)

# calculate & define bounding box: 
gene_bbox <- st_bbox(gene_base_bbox) %>% as.numeric()

# it can be defined as an independent object:
bbox <- c(left = gene_bbox[1], 
          bottom = gene_bbox[2], 
          right = gene_bbox[3],
          top = gene_bbox[4])

# download the map:
gene_stamen_map <- get_stamenmap(bbox, maptype = "terrain-background", zoom = 7)

#plot it:
ggmap(gene_stamen_map, extent = "device")

3.3.2.1 Mapping with ggmap (source = stamen map)

genes_aggr_crd <- genes_aggr_sf %>% 
  st_coordinates() %>% # extract coordinates
  as_tibble() # store them as table

# extract attributes from sf-object:
genes_aggr_df <- genes_aggr_sf %>% 
  st_drop_geometry()

# bind columns together:
genes_aggr_df <- bind_cols(genes_aggr_df, genes_aggr_crd)

ggmap(gene_stamen_map, extent = "device")+
  theme_minimal()+
  theme(axis.title = element_blank(),
        axis.text = element_blank())+
  geom_point(data = genes_aggr_df, aes(x = X, y = Y, fill = var1_mean), size=3, shape = 22, colour = "grey30")+ # you have to define coordinates columns as well!
  scale_fill_gradientn(colours = c("navyblue", "grey", "red"))+ # define your own palette
  labs(title = "Gene map", 
       caption = "A. Aasa",
       fill = "value")

3.4 Bubble map

A bubble map uses circles of different sizes to represent a numeric value over a territory. It displays one bubble per geographic coordinate, or one bubble per region (in this case the bubble is usually placed at the centroid of the region).

Bubble maps are used with two types of dataset:

  • a list of geographic coordinates (longitude and latitude) and a numeric variable controlling the size of the bubble. In the previous example, the number of tweets at each unique pair of coordinates was used.

  • a list of regions with attributed values and known boundaries. In this case, the bubble map replaces the usual choropleth map. Note that it avoids the bias caused by differing regional areas in choropleth maps (big regions tend to carry more visual weight). (link)
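The second case can be sketched with the Georgian districts from section 2: one bubble per district, placed at the district centroid. This is only an illustration, assuming the ge_adm_2_pop layer from above; note that st_centroid() may warn when applied to longitude/latitude data.

```r
# sketch: one bubble per region, placed at the region's centroid
# (assumes ge_adm_2_pop from section 2)
ge_adm_2_cent <- st_centroid(ge_adm_2_pop)

ggplot()+
  theme_minimal()+
  geom_sf(data = ge_adm_2_pop, fill = "grey96", colour = "grey80", size = 0.2)+
  geom_sf(data = ge_adm_2_cent, aes(size = POP_Total), colour = "firebrick", alpha = 0.6)+
  scale_size_continuous(range = c(1, 8))+
  labs(size = "population")
```

Compare this with the choropleth of population density above: the bubble map no longer over-emphasises large, sparsely populated districts.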

3.4.1 Coronavirus world map

The new virus has plunged the world into chaos. Health care systems are under pressure. Human mobility has changed dramatically.
Therefore it is very important to understand how to measure the impacts, how to fight the virus and how to manage the process of economic recovery. Today we live in an information society and have access to very many different data sources. Wise decision making is based on knowledge; knowledge is based on analysis, and for the analysis we need data. In the first step the situation is usually described. Today we have access to many different datasets which describe the virus outbreak. But as you know, having the data and analysing the data are not the same thing. For the analysis we need tools. R is very powerful and there are many R-based Coronavirus dashboards available:

Let’s try to download and analyse the latest Coronavirus case data. I chose the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE (link).

covid_cases <- read_csv("https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")

glimpse(covid_cases)
## Observations: 264
## Variables: 94
## $ `Province/State` <chr> ...
## $ `Country/Region` <chr> ...
## $ Lat              <dbl> ...
## $ Long             <dbl> ...
## $ `1/22/20`        <dbl> ...
## $ `1/23/20`        <dbl> ...
## $ `1/24/20`        <dbl> ...
## $ ...              <dbl> ... (one column per day)
## $ `4/19/20`        <dbl> ...
## $ `4/20/20`        <dbl> ...

The imported dataset is in wide format with quite many date columns. Convert it to long format, then rename the columns for faster processing. In the next step merge Country_Region and Province_State, then clean the results and convert them to an sf object:

# convert to long format:
cases <- gather(covid_cases, "key", "value", 5:ncol(covid_cases))

# rename & clean:
cases <- cases %>% 
  rename(Country_Region = `Country/Region`,
         Province_State = `Province/State`) %>% 
  mutate(date = mdy(key),
         place = paste0(Country_Region, ", ", Province_State))

cases <- cases %>% 
  mutate(place = str_replace(place, ", NA", ""))

# create sf:
cases_sf <- st_as_sf(cases, coords = c("Long", "Lat"), crs = 4326)

# check the result:
glimpse(cases_sf)
## Observations: 23,760
## Variables: 7
## $ Province_State <chr> ...
## $ Country_Region <chr> ...
## $ key            <chr> ...
## $ value          <dbl> ...
## $ date           <date> ...
## $ place          <chr> ...
## $ geometry       <POINT [arc_degree]> ...

Compare the latest (-1 day) situation with the situation 7 days ago:

cases_sf_latest <- cases_sf %>% 
  filter(date == max(date)-1 | date == max(date)-1 - days(7)) 

Did you understand everything in the previous script? ‘|’ stands for ‘or’.

Draw the map of COVID-19 cases for 2 dates:

ggplot()+
  geom_sf(data = countries50, fill = "white", colour = "grey", size = 0.2)+
  geom_sf(data = cases_sf_latest, aes(size = value, alpha = value, colour = factor(date)))+
   facet_wrap(~date, ncol = 1)

As you noticed, facet_wrap creates a separate map for each of the two dates.

An experienced eye may have noticed on the previous map that there are points with latitude = 0 & longitude = 0. Most probably this is an error and we should get rid of them. For that we have to extract the coordinates from the sf object and filter the 0,0 coordinates out.

# extract coordinates:
cases_crd_latest <- st_coordinates(cases_sf_latest) %>% 
  as_tibble()
  
# bind coordinates to sf-table:
cases_sf_latest_2 <- bind_cols(cases_sf_latest, cases_crd_latest)

# filter 00-coordinates out:
cases_sf_latest_2 <- cases_sf_latest_2 %>% 
  filter(X != 0) %>% 
  filter(Y != 0)

Plot the map:

ggplot()+
  geom_sf(data = countries50, fill = "white", colour = "grey", size=0.2)+
  geom_sf(data = cases_sf_latest_2, aes(size = value, colour = value), alpha = 0.7)+
  facet_wrap(~date, ncol = 1)+
  #scale_alpha(range = c(0.1, 0.5))+
  scale_colour_gradientn(colours = c("darkgreen", "purple"), trans = "log10")

How to visualize change during the last 7 days?

  • Select relevant columns,
  • Place them next to each other
  • calculate difference
  • plot the results:

# select relevant columns and spread them to wide format:
cases_sf_latest_3 <- cases_sf_latest_2 %>% 
  select(place, X, Y, value, date) %>% 
  st_drop_geometry() %>% 
  spread(date, value)

# extract column names where we have dates:
dates <- colnames(cases_sf_latest_3)[4:5]

# rename columns (this is needed because of the tutorial script: dates are changing and exact dates can't be used):
colnames(cases_sf_latest_3)[4:5] <- c("a", "b")

# Replace cells without values ('NA') with zero:
cases_sf_latest_3$a[is.na(cases_sf_latest_3$a)] <- 0
cases_sf_latest_3$b[is.na(cases_sf_latest_3$b)] <- 0

# calculate change:
cases_sf_latest_3 <- cases_sf_latest_3 %>%
  mutate(change = b - a)

Draw the polished map:

ggplot()+
  theme_tufte()+
  theme(axis.title = element_blank(),
        axis.ticks = element_blank())+
  geom_sf(data = countries50, colour = "grey", fill= "grey90", size=0.2)+
  geom_point(data = cases_sf_latest_3, aes(x = X,
                                               y = Y,
                                               size = change,
                                              colour = change, 
                                           alpha= change))+
  scale_alpha(range = c(0.3, 0.7))+
  scale_colour_gradientn(trans = "log10", colours= c("yellow", "orange", "red", "red4", "black"))+
  scale_size_continuous(trans = "log10", range = c(1, 5))+
  labs(size= "change", alpha = "change", colour = "change",
       title = "Change in COVID-19 cases",
       subtitle = paste0("period: ", dates[1], "...", dates[2]))+
  guides(alpha = F,
         size= F)

Look carefully at the previous code chunk and try to understand the function of every parameter!

Sometimes a map is not enough: we simply have too many points. The solution is aggregation of the data, or filtering to focus on interesting segments of the data.
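Aggregation could look like the sketch below, which collapses the long-format `cases` table from above to the country level before looking at it (an illustration only; provinces are summed per country and date):

```r
# sketch: aggregate cases to country level
# (assumes the long-format 'cases' table created above)
cases_country <- cases %>%
  group_by(Country_Region, date) %>%              # one row per country & date
  summarise(value = sum(value, na.rm = TRUE)) %>% # sum over provinces
  ungroup()

# quick overview: top 10 countries on the latest date
cases_country %>%
  filter(date == max(date)) %>%
  arrange(-value) %>%
  head(10) %>%
  kable()
```

The aggregated table could also be joined back to country polygons for a choropleth or country-level bubble map.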

Filter 10 places where the change is biggest:

cases_sf_latest_3_change_top10 <- cases_sf_latest_3 %>% 
  arrange(-change) %>% 
  head(n = 10)

Plot it:

ggplot()+
  geom_sf(data = countries50, fill = "white", colour = "grey", size=0.2)+
  geom_point(data = cases_sf_latest_3_change_top10, aes(x = X,
                                               y = Y,
                                               size = change,
                                              alpha = change), colour = "red")+
  #scale_colour_gradientn(colours= c("darkgreen", "purple"))+
  scale_size_continuous(trans = "log", range = c(1, 5))+
  scale_alpha(range = c(0.4, 0.8))+
  labs(size= "difference", alpha = "difference", colour = "difference")

Draw plot:

ggplot()+
  theme_minimal()+
  theme(panel.background = element_rect(fill = "slategray1"))+
  geom_segment(data = cases_sf_latest_3_change_top10, aes(x= 0, 
                                                              xend = change, 
                                                              y = reorder(place, change), 
                                                              yend = reorder(place, change)),
               colour = "black")+
  geom_point(data = cases_sf_latest_3_change_top10, aes(x = change, y = reorder(place, change)),
             shape = 21, colour = "white", fill = "firebrick", size= 3)+
  labs(title = "Change in Coronavirus cases during the last week", 
       x = "Change",
       y = "Country")

Which variant is more informative?

This was an example of how to put spatial points on a base map and how to visualize spatial patterns with points. Next, we look at an example of line maps.

3.5 Line maps

For that we use data about the flights of the Estonian national airline Nordica. The dataset covers all flights operated by Nordica from December 2 to December 3, 2018. Our aim is to create a typical destinations map.

Read the data and check file structure:

flights <- read.csv2("http://aasa.ut.ee/Rspatial/data/nordicaFlights.csv", encoding = "latin1")

glimpse(flights) 
## Observations: 94
## Variables: 8
## $ date      <fct> ...
## $ departure <fct> ...
## $ arrival   <fct> ...
## $ from      <fct> ...
## $ to        <fct> ...
## $ fligjt    <fct> ...
## $ airplane  <fct> ...
## $ status    <fct> ...
flights %>% 
  head() %>% 
  kable()
|date       |departure |arrival |from          |to              |fligjt |airplane |status            |
|:----------|:---------|:-------|:-------------|:---------------|:------|:--------|:-----------------|
|02.12.2018 |7:00      |8:25    |Tallinn (TLL) |München (MUC)   |LO8165 |CR7      |i Graafikujärgne. |
|02.12.2018 |7:00      |8:20    |Tallinn (TLL) |Viin (VIE)      |LO8193 |CR9      |i Graafikujärgne. |
|02.12.2018 |7:20      |7:20    |Tallinn (TLL) |Stockholm (ARN) |LO8121 |CR9      |i Graafikujärgne. |
|02.12.2018 |9:10      |12:30   |München (MUC) |Tallinn (TLL)   |LO8166 |CR7      |i Graafikujärgne. |
|02.12.2018 |9:20      |9:55    |Tallinn (TLL) |Varssavi (WAW)  |LO790  |CR9      |i Graafikujärgne. |
|02.12.2018 |9:25      |12:35   |Viin (VIE)    |Tallinn (TLL)   |LO8194 |CR9      |i Graafikujärgne. |

All the data is nicely imported. All columns are stored as factors, but this is currently not important. The question is: how can we add the geographical coordinates of origin and destination to the table?
As you probably know, every airport has a unique code (IATA code). Currently those codes are not in a separate column. With the help of the tidyr and stringr packages we can split the column with airport name and code into two columns:

# separate airport name and airport code:
flights <- flights %>% 
  separate(from, c("from_name" ,"from_code"), ' \\(')

flights <- flights %>% 
  separate(to, c("to_name" ,"to_code"), '\\(')

#remove ')' from string:
flights <- flights %>% 
  mutate(from_code = str_replace(from_code, "\\)", ""))

flights <- flights %>% 
  mutate(to_code = str_replace(to_code, "\\)", ""))

flights %>% 
  head() %>% 
  kable()
|date       |departure |arrival |from_name |from_code |to_name   |to_code |fligjt |airplane |status            |
|:----------|:---------|:-------|:---------|:---------|:---------|:-------|:------|:--------|:-----------------|
|02.12.2018 |7:00      |8:25    |Tallinn   |TLL       |München   |MUC     |LO8165 |CR7      |i Graafikujärgne. |
|02.12.2018 |7:00      |8:20    |Tallinn   |TLL       |Viin      |VIE     |LO8193 |CR9      |i Graafikujärgne. |
|02.12.2018 |7:20      |7:20    |Tallinn   |TLL       |Stockholm |ARN     |LO8121 |CR9      |i Graafikujärgne. |
|02.12.2018 |9:10      |12:30   |München   |MUC       |Tallinn   |TLL     |LO8166 |CR7      |i Graafikujärgne. |
|02.12.2018 |9:20      |9:55    |Tallinn   |TLL       |Varssavi  |WAW     |LO790  |CR9      |i Graafikujärgne. |
|02.12.2018 |9:25      |12:35   |Viin      |VIE       |Tallinn   |TLL     |LO8194 |CR9      |i Graafikujärgne. |

Airport codes are now nicely in separate columns. The next step is to find an additional dataset which contains, next to the airport codes, also the geographical coordinates of the airports.

airports <- read.csv("https://datahub.io/core/airport-codes/r/airport-codes.csv")

airports2 <- airports %>% 
  filter(iata_code != "") 

airports2 %>% 
  head() %>% 
  kable()
|ident |type          |name                    | elevation_ft|continent |iso_country |iso_region |municipality  |gps_code |iata_code |local_code |coordinates                             |
|:-----|:-------------|:-----------------------|------------:|:---------|:-----------|:----------|:-------------|:--------|:---------|:----------|:---------------------------------------|
|03N   |small_airport |Utirik Airport          |            4|OC        |MH          |MH-UTI     |Utirik Island |K03N     |UTK       |03N        |169.852005, 11.222                      |
|07FA  |small_airport |Ocean Reef Club Airport |            8|NA        |US          |US-FL      |Key Largo     |07FA     |OCA       |07FA       |-80.274803161621, 25.325399398804       |
|0AK   |small_airport |Pilot Station Airport   |          305|NA        |US          |US-AK      |Pilot Station |         |PQS       |0AK        |-162.899994, 61.934601                  |
|0CO2  |small_airport |Crested Butte Airpark   |         8980|NA        |US          |US-CO      |Crested Butte |0CO2     |CSE       |0CO2       |-106.928341, 38.851918                  |
|0TE7  |small_airport |LBJ Ranch Airport       |         1515|NA        |US          |US-TX      |Johnson City  |0TE7     |JCY       |0TE7       |-98.62249755859999, 30.251800537100003  |
|13MA  |small_airport |Metropolitan Airport    |          418|NA        |US          |US-MA      |Palmer        |13MA     |PMX       |13MA       |-72.31140136719999, 42.223300933800004  |
# Is airport of Tallinn (TLL) available: 
airports2 %>% 
  filter(iata_code == "TLL") %>% 
  kable()
|ident |type          |name                         | elevation_ft|continent |iso_country |iso_region |municipality |gps_code |iata_code |local_code |coordinates                           |
|:-----|:-------------|:----------------------------|------------:|:---------|:-----------|:----------|:------------|:--------|:---------|:----------|:-------------------------------------|
|EETN  |large_airport |Lennart Meri Tallinn Airport |          131|EU        |EE          |EE-37      |Tallinn      |EETN     |TLL       |           |24.832799911499997, 59.41329956049999 |

As the airport coordinates are stored in one column, we have to use the separate() function once again:

airp_crd <- airports2 %>% 
  separate(coordinates, c("lng" ,"lat"), ", ")

# convert both columns to numeric and select relevant columns:
airp_crd <- airp_crd %>% 
  mutate(lng = as.numeric(lng), lat = as.numeric(lat)) %>% 
  select(iata_code, lng, lat, municipality)

glimpse(airp_crd)
## Observations: 9,235
## Variables: 4
## $ iata_code    <fct> ...
## $ lng          <dbl> ...
## $ lat          <dbl> ...
## $ municipality <fct> ...

Next we have to join tables (see Practical session 1)

# join flights and airport table by origin:
flights2 <- left_join(flights, airp_crd, by = c("from_code" = "iata_code"))

#rename coordinates columns:
flights2 <- flights2 %>% 
  rename(from_x = lng, from_y = lat, from_mun = municipality)

#join flights and airport table by destination:
flights2 <- left_join(flights2, airp_crd, by = c("to_code" = "iata_code"))

#rename coordinates columns:
flights2 <- flights2 %>% 
  rename(to_x = lng, to_y = lat, to_mun = municipality)

glimpse(flights2)
## Observations: 104
## Variables: 16
## $ date      <fct> ...
## $ departure <fct> ...
## $ arrival   <fct> ...
## $ from_name <chr> ...
## $ from_code <chr> ...
## $ to_name   <chr> ...
## $ to_code   <chr> ...
## $ fligjt    <fct> ...
## $ airplane  <fct> ...
## $ status    <fct> ...
## $ from_x    <dbl> ...
## $ from_y    <dbl> ...
## $ from_mun  <fct> ...
## $ to_x      <dbl> ...
## $ to_y      <dbl> ...
## $ to_mun    <fct> ...

Plot the flights on the base map:

ggplot()+
  geom_sf(data = countries50, fill="gray80", col = "gray40", size=0.5) + 
  geom_segment(data = flights2, aes(x=from_x, xend= to_x, y=from_y, yend = to_y), col = "red")

Instead of a global map we can focus on Europe. For that we can use the st_crop() function, or we can filter out only the European countries:

europe <- countries50 %>% filter(REGION_UN == "Europe")

# plot:
ggplot()+
  geom_sf(data = europe)+
  geom_segment(data = flights2, aes(x=from_x, xend= to_x, y=from_y, yend = to_y), col = "red")

The base map needs some additional cropping:

europe2 <- st_crop(europe, c(xmin=-10, xmax=41, ymin=45, ymax= 70))
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
# plot:
ggplot()+
  theme_minimal()+
  geom_sf(data = europe2, fill = "grey96", colour = "grey60", size = 0.25)+
  geom_segment(data = flights2, aes(x = from_x, xend = to_x, y = from_y, yend = to_y), col = "red")

Use curves instead of straight lines:

ggplot()+
  theme_minimal()+
  geom_sf(data = europe2, fill = "grey96", colour = "grey60", size = 0.25)+
  geom_curve(data = flights2, aes(x = from_x, xend = to_x, y = from_y, yend = to_y), col = "red", curvature = 0.2, arrow = arrow(length = unit(0.03, "npc")))

4 Save visualization (png, pdf)

# png:

png("flights.png", res=300, units = "cm", height = 15, width = 15)

ggplot()+
  theme_minimal()+
  geom_sf(data = europe2, fill = "grey96", colour = "grey60", size = 0.25)+
  geom_curve(data = flights2, aes(x = from_x, xend = to_x, y = from_y, yend = to_y), col = "red", curvature = 0.2, arrow = arrow(length = unit(0.03, "npc")))

dev.off() # It is crucial to close the writing session!



# pdf:
pdf("flights.pdf", height = 15, width = 15)

ggplot()+
  theme_minimal()+
  geom_sf(data = europe2, fill = "grey96", colour = "grey60", size = 0.25)+
  geom_curve(data = flights2, aes(x = from_x, xend = to_x, y = from_y, yend = to_y), col = "red", curvature = 0.2, arrow = arrow(length = unit(0.03, "npc")))

dev.off() # It is crucial to close the writing session!
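A shorter alternative to opening and closing a graphics device is ggsave(), which by default writes the most recently drawn ggplot and infers the file format from the extension:

```r
# save the last plotted ggplot as PNG (format inferred from extension)
ggsave("flights.png", width = 15, height = 15, units = "cm", dpi = 300)

# a stored plot object can also be saved explicitly:
# p <- ggplot()+ ...
# ggsave("flights.pdf", plot = p, width = 15, height = 15, units = "cm")
```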

The last version already looks like a thematic map. But there are still some shortcomings:

  • the map scale and north arrow are missing;
  • the map title is missing.

We can try to fix these. Add the north arrow and map scale:
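With the ggspatial package (loaded at the start) this could be sketched as below; annotation_scale() and annotation_north_arrow() place the elements, and the location codes ("bl", "tr" etc.) are corner abbreviations. The title text is just an example:

```r
# sketch: scale bar, north arrow and title with ggspatial + labs()
ggplot()+
  theme_minimal()+
  geom_sf(data = europe2, fill = "grey96", colour = "grey60", size = 0.25)+
  geom_curve(data = flights2, aes(x = from_x, xend = to_x, y = from_y, yend = to_y),
             col = "red", curvature = 0.2, arrow = arrow(length = unit(0.03, "npc")))+
  annotation_scale(location = "bl")+                       # scale bar, bottom-left
  annotation_north_arrow(location = "tr",                  # north arrow, top-right
                         style = north_arrow_fancy_orienteering)+
  labs(title = "Flights operated by Nordica, 2-3 December 2018")
```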

Plot it on top of the Stamen map! In the case of ggmap you have to use geom_segment instead of geom_curve!

5 Homework number 3

In this session your task is to analyse the COVID-19 data: how many people have recovered from COVID-19 (download the recovered-cases data). Please create:

  1. a world map where you visualise how many people have recovered from COVID-19;
  2. a plot of the top 10 countries;
  3. also add the map of population density in districts to your homework!

Bind all the homeworks into one doc/pdf/html file. Add the script files separately (in the case of R Markdown the script is already in the html file). Please add your name to each file name! E-mail the results to: . The final deadline for all submissions is May 15th, 2020.

Possible results:

Author: Anto Aasa
in ISET, Tbilisi
Last update: 2020-04-22 08:57:55
