1 Introduction

Link to introductory video

Load the libraries:

library(sf)
library(tidyverse)
library(rnaturalearth)
library(ggthemes)
library(knitr)
library(units)
library(lubridate)
library(ggspatial)
library(ggmap)

options(scipen = 999, width = 100)

2 Calculate polygon area

In real life the spatial distribution of any phenomenon is usually not regular, and neither is the distribution of administrative units. This means that if we want to compare different municipalities with each other, absolute values may not be the best option (at least they do not describe the full picture). Next to absolute values, density is very often used: a value per spatial unit (e.g. per km2). Let us calculate the area of each district and then find which districts in Georgia have the highest and lowest population density.

# calculate area for each district:
ge_adm_2_pop$area <- st_area(ge_adm_2_pop)

# Show areas of first 5 municipalities:
ge_adm_2_pop$area[1:5]
## Units: [m^2]
## [1]  313647912
## [2]  793499298
## [3] 1236316014
## [4]  970579105
## [5]   12850125

We see that polygon areas are calculated in the same units as the reference system of the geolayer (here metres) and the variable type is units. Density is usually given per square kilometre. We can convert the units from square metres to square kilometres and store the result as numeric:

# st units to km2
ge_adm_2_pop$area <- set_units(ge_adm_2_pop$area, km^2)

#Check again:
ge_adm_2_pop$area[1:5]
## Units: [km^2]
## [1]  313.64791
## [2]  793.49930
## [3] 1236.31601
## [4]  970.57911
## [5]   12.85013
# sometimes it is needed that the area is numeric:
ge_adm_2_pop$area <- as.numeric(ge_adm_2_pop$area)

Where is the biggest / smallest district (by area) of Georgia?

# filter biggest and smallest
ge_adm_2_extreme <- ge_adm_2_pop %>% filter(area == min(area) | area == max(area)) #"|" corresponds to 'OR'

Create the map:

ggplot()+
  theme_minimal()+
  geom_sf(data = ge_adm_1, fill = "grey70", col = "grey80", size= 0.2)+
  geom_sf(data = ge_adm_2_pop, fill="grey96", colour = "grey80", size = 0.2)+
  geom_sf(data = ge_adm_2_extreme, aes(fill = as.factor(area), colour = as.factor(area)), size=1)+
  geom_sf_text(data = ge_adm_2_extreme, aes(label = ADM_EN))+
  guides(fill=F, colour = F)+
  scale_fill_manual(values =c("red", "dodgerblue"))+
  scale_colour_manual(values =c("red", "dodgerblue"))

What is the area of the biggest municipality? The smallest? What is the average area of the municipalities?

2.1 Spatial density

The next step is to calculate the spatial density of the population in the districts:

# it can be done in several ways
# first, you can calculate the density inside the ggplot call:
ggplot()+
  theme_minimal()+
  geom_sf(data = ge_adm_0, fill = "grey70", colour = "grey60", size= 0.2)+
  geom_sf(data = ge_adm_2_pop, aes(fill = POP_Total / area), colour = "grey60", size = 0.2)+
  scale_fill_viridis_c(na.value = "red")

# Or you can calculate the density as a separate column and then draw the map:
ge_adm_2_pop <- ge_adm_2_pop %>% 
  mutate(pop_dens = POP_Total / area)

ggplot()+
  theme_minimal()+
  geom_sf(data = ge_adm_0, fill = "grey70", colour = "grey60", size= 0.2)+
  geom_sf(data = ge_adm_2_pop, aes(fill = pop_dens), colour = "grey60", size=0.25)+
  scale_fill_viridis_c(na.value = "red")+
  labs(fill = "population\ndensity\nkm2")

In which municipality is the population density highest? Where is it smallest?

# the geometry column slows down queries when we just want a quick overview.
# Copy the layer and delete the geometry column:
temp0 <- ge_adm_2_pop %>% 
  st_drop_geometry()

# district with highest density of population:
temp0 %>% 
  filter(pop_dens == max(pop_dens)) %>% # filter max value
  kable() # format the output
|ADM_PCODE |ADM_EN    | POP_Total|     area| pop_dens|
|:---------|:---------|---------:|--------:|--------:|
|GE4711    |Gori City |     48143| 15.96035| 3016.413|
# 10 districts with lowest population density:
temp0 %>% 
  arrange(pop_dens) %>% 
  head(10) %>% 
  kable()
|ADM_PCODE |ADM_EN         | POP_Total|      area|  pop_dens|
|:---------|:--------------|---------:|---------:|---------:|
|GE3830    |Mestia         |      9316| 3068.4855|  3.036026|
|GE3526    |Lentekhi       |      4386| 1442.8955|  3.039721|
|GE3231    |Kazbegi        |      3795| 1067.2891|  3.555738|
|GE3529    |Oni            |      6130| 1363.0849|  4.497152|
|GE3523    |Ambrolauri     |      9139| 1121.3702|  8.149851|
|GE2928    |Dedoplistskaro |     21221| 2520.8410|  8.418222|
|GE3225    |Dusheti        |     25659| 3016.2177|  8.507012|
|GE3227    |Tianeti        |      9468|  939.0467| 10.082566|
|GE4125    |Aspindza       |     10372|  852.5888| 12.165302|
|GE3531    |Tsageri        |     10387|  747.7607| 13.890808|
# highest density in top10 municipalities:
temp0 %>% 
  arrange(-pop_dens) %>% 
  head(10) %>% 
  kable()
|ADM_PCODE |ADM_EN           | POP_Total|      area|  pop_dens|
|:---------|:----------------|---------:|---------:|---------:|
|GE4711    |Gori City        |     48143|  15.96035| 3016.4129|
|GE2611    |Kutaisi          |    147635|  56.92257| 2593.6111|
|GE1511    |Batumi           |    152839|  65.11924| 2347.0635|
|GE11      |Tbilisi          |   1171100| 502.51710| 2330.4679|
|GE3811    |Zugdidi City     |     42998|  21.42328| 2007.0693|
|GE2911    |Telavi City      |     19629|  12.00048| 1635.6848|
|GE4411    |Rustavi          |    125103|  79.29119| 1577.7668|
|GE4111    |Akhaltsikhe City |     17903|  12.85013| 1393.2160|
|GE3814    |Poti             |     41465|  41.29668| 1004.0759|
|GE2311    |Ozurgeti City    |     14785|  22.41720|  659.5381|

But wait! Look again at the map of population density!
It’s ugly! We can do better! Go back to the first practical session and review how to use the classInt package to change class breaks. Do it and add it to your collection of homework.
Like this?
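As a reminder, a classified version of the density map could be sketched roughly as below. This is only a sketch: it assumes the ge_adm_2_pop layer with the pop_dens column from above, and the "fisher" style and number of classes are arbitrary choices you should adapt.

```r
# sketch: class breaks with classInt, then a classified choropleth
# (assumes ge_adm_2_pop$pop_dens from the code above)
library(classInt)

# compute 5 class breaks with the Fisher-Jenks style:
brk <- classIntervals(ge_adm_2_pop$pop_dens, n = 5, style = "fisher")

# cut the continuous variable into classes using those breaks:
ge_adm_2_pop$dens_class <- cut(ge_adm_2_pop$pop_dens,
                               breaks = brk$brks,
                               include.lowest = TRUE)

ggplot()+
  theme_minimal()+
  geom_sf(data = ge_adm_2_pop, aes(fill = dens_class), colour = "grey60", size = 0.2)+
  scale_fill_viridis_d(na.value = "red")+
  labs(fill = "population\ndensity\nper km2")
```

Experiment with other styles ("quantile", "equal", "jenks" etc.) and class counts to see how the picture changes.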

3 Base map

Sometimes the problem is not only that your thematic layer doesn’t fit well with the base map; an even bigger problem is that you don’t have a proper base map at all. One nice possibility is to use geolayers from Natural Earth, and as an R user you can do it directly from R (vignette).

3.1 Import spatial data

Let us assume that we have a dataset about genetic variance collected at different locations. Your task is to map it to reveal potential spatial patterns. The dataset contains geographical coordinates, a sample ID and the variance (variable1) of some gene parameter (the content is not important right now).

Import the data:

genes <- read.csv2("http://aasa.ut.ee/tbilisi/data/geneticVariance.csv")

glimpse(genes)
## Observations: 70
## Variables: 4
## $ sample.ID <int> ...
## $ variable1 <int> ...
## $ latitude  <fct> ...
## $ longitude <fct> ...

Is everything okay?
Look again! Latitude and longitude are stored as factors. Convert them to numeric:

genes <- genes %>% 
  mutate(longitude = as.numeric(as.character(longitude)), 
         latitude = as.numeric(as.character(latitude)))

glimpse(genes)
## Observations: 70
## Variables: 4
## $ sample.ID <int> ...
## $ variable1 <int> ...
## $ latitude  <dbl> ...
## $ longitude <dbl> ...

Visualize it:

ggplot()+
  geom_point(data = genes, aes(x=longitude, y=latitude, colour=variable1))

The number of dots on the plot is 12, but the number of rows in the dataset is 70. Why is there such a big difference?
It is because several samples were collected at every location. Can we avoid this overlapping and bring out all the samples?
As usual there are several options:

  • jitter the points randomly
  • calculate average (or median) value for every location

3.2 Jitter the points

The principles of this operation are well described here (link). But we are not doing it just on the plot; we jitter the locations in the data table itself.

3.2.1 Convert data table to spatial (sf) object

In GIS the most common data format is probably SHP. It works quite well but has several disadvantages (one layer consists of several files: geometry, attributes, reference system etc.). Today the rising standard for spatial objects is simple features (sf).

For point data this conversion is very simple:

glimpse(genes)
## Observations: 70
## Variables: 4
## $ sample.ID <int> ...
## $ variable1 <int> ...
## $ latitude  <dbl> ...
## $ longitude <dbl> ...
genes_sf <- st_as_sf(genes, coords = c("longitude", "latitude"), crs = 4326)

class(genes_sf)
## [1] "sf"        
## [2] "data.frame"

And now we can jitter the points:

genes_sf_jitt <- st_jitter(genes_sf, factor = 0.02) # read from Help, how the 'factor' parameter works!

ggplot()+
  geom_sf(data = genes_sf_jitt, aes(colour = variable1))+
  scale_colour_viridis_c()

Better! But there are still two major problems: jittering doesn’t really clarify the picture (maybe we should try calculating the average for every location), and secondly, although we can see the coordinate values on the axes, that is not enough. For better spatial context we need a proper base map!

First we calculate average values for every location:

glimpse(genes)
## Observations: 70
## Variables: 4
## $ sample.ID <int> ...
## $ variable1 <int> ...
## $ latitude  <dbl> ...
## $ longitude <dbl> ...
genes_aggr <- genes %>% 
  group_by(longitude, latitude) %>% 
  summarise(var1_mean = mean(variable1)) %>% 
  ungroup()

# see the result:
genes_aggr %>% kable()
| longitude|  latitude| var1_mean|
|---------:|---------:|---------:|
|  139.6730| -5.478421|   3959071|
|  142.9462| -5.851010|   3633741|
|  143.5166| -4.186353|   4430432|
|  143.6572| -6.147770|   3913224|
|  144.9667| -6.016660|   3067700|
|  145.1788| -5.673380|   3318188|
|  145.5927| -6.542736|   3404230|
|  145.8885| -6.974360|   3936487|
|  146.9323| -6.682114|   1610366|
|  147.7822| -9.994000|   3262092|
|  151.9846| -4.639850|   2663686|
|  152.1350| -4.215275|   2069085|

The result is just a plain table. We have to convert it to an sf-object:

genes_aggr_sf <- st_as_sf(genes_aggr, coords = c("longitude", "latitude"), crs = 4326) # 4326 refers to WGS84!

3.3 Base map for spatial context

3.3.1 shp-layers as base maps

There are many different geolayers available at different scale level. Currently we download just country polygons:

countries50 <- ne_download(scale = 50, type = 'countries', category = 'cultural', returnclass = "sf")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Administrator\AppData\Local\Temp\RtmpuK8tQT", layer: "ne_50m_admin_0_countries"
## with 241 features
## It has 94 fields
## Integer64 fields read as strings:  POP_EST NE_ID

Plot the base map and sample points

ggplot()+
  geom_sf(data = countries50, fill = "white", colour = "grey", size=0.2)+
  geom_sf(data=genes_aggr_sf, colour = "red")

An experienced geographer can tell that the gene samples were collected in Papua New Guinea and Indonesia. Select those countries from the base map layer:

countries50_ii <- countries50 %>% 
  filter(ADMIN == "Papua New Guinea" | ADMIN == "Indonesia") # '|' means 'OR'!!

#plot:
ggplot()+
  geom_sf(data = countries50_ii, fill = "white", colour = "grey", size=0.2)+
  geom_sf(data=genes_aggr_sf, colour = "red")

Much better! But we should trim the base map more (crop the empty western part):

# Crop by coordinates:
countries50_iii <- st_crop(countries50_ii, c(xmin = 138, xmax = 155, ymin = -12, ymax = 0))

#plot:
ggplot()+
  geom_sf(data = countries50_iii, fill = "white", colour = "grey", size=0.2)+
  geom_sf(data = genes_aggr_sf, colour = "red")

This is already pretty good. And now the spatial pattern:

ggplot()+
  geom_sf(data = countries50_iii, fill = "white", colour = "grey", size=0.25)+
  geom_sf(data=genes_aggr_sf, aes(colour = var1_mean), size=3)+
  scale_colour_viridis_c()

3.3.2 3rd-party base maps

When you want to add good spatial context to your thematic map, the best option is usually to create your own base map. There are many good data sources available (Land Board Estonia, OSM, Natural Earth etc.).
The only problem is that it may take a lot of time to create a good, relevant and informative map. For a quicker solution we can use base maps from other service providers (Google, OSM, Stamen etc.).
Next we try the possibilities offered by OpenStreetMap. For that you have to install the Java JDK on your computer (currently it’s Windows 64bit, link) and in R you must install the package OpenStreetMap.
To download a base map from OSM, first you have to calculate a bounding box to define the extent of your map:

# calculate bounding box of map extent:
gene_base_bbox <- st_bbox(countries50_iii)

# bounding box coordinates:
gene_base_bbox
##       xmin       ymin       xmax       ymax 
## 138.000000 -11.630566 155.000000  -1.353223
# download base map:
library(OpenStreetMap)
# logic is simple: openmap(c(upper latitude, left longitude), c(lower latitude, right longitude))
# you can take the coordinates directly from bounding box: 
gene_base_map <- openmap(c(gene_base_bbox[4], gene_base_bbox[1]), c(gene_base_bbox[2], gene_base_bbox[3]), type = "nps")

# plot it:
autoplot(gene_base_map)

# Look from help what are the other available base map types. Can you use them?

In case you don’t have the Java JDK installed, you can use another option: ggmap. The syntax is slightly different, but the principles are the same:

# start library:
library(ggmap)

# calculate & define bounding box: 
gene_bbox <- st_bbox(gene_base_bbox) %>% as.numeric()

# it can be defined as an independent object:
bbox <- c(left = gene_bbox[1], 
          bottom = gene_bbox[2], 
          right = gene_bbox[3],
          top = gene_bbox[4])

# download the map:
gene_stamen_map <- get_stamenmap(bbox, maptype = "terrain-background", zoom = 7)

#plot it:
ggmap(gene_stamen_map, extent = "device")

3.3.2.1 Mapping with ggmap (source = stamen map)

genes_aggr_crd <- genes_aggr_sf %>% 
  st_coordinates() %>% # extract coordinates
  as_tibble() # store them as table

# extract attributes from sf-object:
genes_aggr_df <- genes_aggr_sf %>% 
  st_drop_geometry()

# bind columns together:
genes_aggr_df <- bind_cols(genes_aggr_df, genes_aggr_crd)

ggmap(gene_stamen_map, extent = "device")+
  theme_minimal()+
  theme(axis.title = element_blank(),
        axis.text = element_blank())+
  geom_point(data = genes_aggr_df, aes(x = X, y = Y, fill = var1_mean), size=3, shape = 22, colour = "grey30")+ # you have to define coordinates columns as well!
  scale_fill_gradientn(colours = c("navyblue", "grey", "red"))+ # define your own palette
  labs(title = "Gene map", 
       caption = "A. Aasa",
       fill = "value")

3.4 Bubble map

A bubble map uses circles of different sizes to represent a numeric value over a territory. It displays one bubble per geographic coordinate, or one bubble per region (in this case the bubble is usually placed at the centroid of the region).

Bubble maps are used with two types of dataset:

  • a list of geographic coordinates (longitude and latitude) and a numeric variable controlling the size of the bubble. In the previous example, the number of tweets at each unique pair of coordinates was used.

  • a list of regions with attributed values and known boundaries. In this case, the bubble map replaces the usual choropleth map. Note that it avoids the bias caused by differing regional areas in choropleth maps (big regions tend to carry more visual weight). (link)
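The second case can be sketched with the Georgian districts from section 2: one bubble per district, placed at the district centroid. This is only an illustration, assuming the ge_adm_2_pop layer from above; note that st_centroid() may warn when applied to longitude/latitude data.

```r
# sketch: one bubble per region, placed at the region's centroid
# (assumes ge_adm_2_pop from section 2)
ge_adm_2_cent <- st_centroid(ge_adm_2_pop)

ggplot()+
  theme_minimal()+
  geom_sf(data = ge_adm_2_pop, fill = "grey96", colour = "grey80", size = 0.2)+
  geom_sf(data = ge_adm_2_cent, aes(size = POP_Total), colour = "firebrick", alpha = 0.6)+
  scale_size_continuous(range = c(1, 8))+
  labs(size = "population")
```

Compare this with the choropleth of population density above: the bubble map no longer over-emphasises large, sparsely populated districts.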

3.4.1 Coronavirus world map

The new virus has plunged the world into chaos. Health care systems are under pressure. Human mobility has changed dramatically.
Therefore it is very important to understand how to measure the impacts, how to fight the virus and how to manage the process of economic recovery. Today we live in an information society and have access to very many different data sources. Wise decision making is based on knowledge; knowledge is based on analysis, and for the analysis we need data. In the first step the situation is usually described. Today we have access to many different datasets which describe the virus outbreak. But as you know, having the data and analysing the data are not the same thing. For the analysis we need tools. R is very powerful and there are many R-based Coronavirus dashboards available:

Let’s try to download and analyse the latest Coronavirus case data. I chose the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE (link).

covid_cases <- read_csv("https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv")

glimpse(covid_cases)
## Observations: 264
## Variables: 94
## $ `Province/State` <chr> ...
## $ `Country/Region` <chr> ...
## $ Lat              <dbl> ...
## $ Long             <dbl> ...
## $ `1/22/20`        <dbl> ...
## $ `1/23/20`        <dbl> ...
## $ `1/24/20`        <dbl> ...
## $ ...              <dbl> ... (one column per day)
## $ `4/19/20`        <dbl> ...
## $ `4/20/20`        <dbl> ...

The imported dataset is in wide format with quite many date columns. Convert it to long format, then rename the columns for faster processing. In the next step merge Country_Region and Province_State, then clean the results and convert them to an sf object:

# convert to long format:
cases <- gather(covid_cases, "key", "value", 5:ncol(covid_cases))

# rename & clean:
cases <- cases %>% 
  rename(Country_Region = `Country/Region`,
         Province_State = `Province/State`) %>% 
  mutate(date = mdy(key),
         place = paste0(Country_Region, ", ", Province_State))

cases <- cases %>% 
  mutate(place = str_replace(place, ", NA", ""))

# create sf:
cases_sf <- st_as_sf(cases, coords = c("Long", "Lat"), crs = 4326)

# check the result:
glimpse(cases_sf)
## Observations: 23,760
## Variables: 7
## $ Province_State <chr> ...
## $ Country_Region <chr> ...
## $ key            <chr> ...
## $ value          <dbl> ...
## $ date           <date> ...
## $ place          <chr> ...
## $ geometry       <POINT [arc_degree]> ...

Compare the latest (-1 day) situation with the situation 7 days ago:

cases_sf_latest <- cases_sf %>% 
  filter(date == max(date)-1 | date == max(date)-1 - days(7)) 

Did you understand everything in the previous script? ‘|’ stands for ‘or’.

Draw the map of COVID-19 cases for 2 dates:

ggplot()+
  geom_sf(data = countries50, fill = "white", colour = "grey", size = 0.2)+
  geom_sf(data = cases_sf_latest, aes(size = value, alpha = value, colour = factor(date)))+
   facet_wrap(~date, ncol = 1)

As you noticed, facet_wrap creates a separate map for each of the two dates.

An experienced eye may have noticed on the previous map that there are points with latitude = 0 & longitude = 0. Most probably this is an error and we should get rid of them. For that we have to extract the coordinates from the sf object and filter the 0,0 coordinates out.

# extract coordinates:
cases_crd_latest <- st_coordinates(cases_sf_latest) %>% 
  as_tibble()
  
# bind coordinates to sf-table:
cases_sf_latest_2 <- bind_cols(cases_sf_latest, cases_crd_latest)

# filter 00-coordinates out:
cases_sf_latest_2 <- cases_sf_latest_2 %>% 
  filter(X != 0) %>% 
  filter(Y != 0)

Plot the map:

ggplot()+
  geom_sf(data = countries50, fill = "white", colour = "grey", size=0.2)+
  geom_sf(data = cases_sf_latest_2, aes(size = value, colour = value), alpha = 0.7)+
  facet_wrap(~date, ncol = 1)+
  #scale_alpha(range = c(0.1, 0.5))+
  scale_colour_gradientn(colours = c("darkgreen", "purple"), trans = "log10")

How to visualize change during the last 7 days?

  • Select relevant columns,
  • Place them next to each other
  • calculate difference
  • plot the results:

# select relevant columns and spread them to wide format:
cases_sf_latest_3 <- cases_sf_latest_2 %>% 
  select(place, X, Y, value, date) %>% 
  st_drop_geometry() %>% 
  spread(date, value)

# extract column names where we have dates:
dates <- colnames(cases_sf_latest_3)[4:5]

# rename columns (this is needed because of the tutorial script: dates are changing and exact dates can't be used):
colnames(cases_sf_latest_3)[4:5] <- c("a", "b")

# Replace cells without values ('NA') with zero:
cases_sf_latest_3$a[is.na(cases_sf_latest_3$a)] <- 0
cases_sf_latest_3$b[is.na(cases_sf_latest_3$b)] <- 0

# calculate change:
cases_sf_latest_3 <- cases_sf_latest_3 %>%
  mutate(change = b - a)

Draw the polished map:

ggplot()+
  theme_tufte()+
  theme(axis.title = element_blank(),
        axis.ticks = element_blank())+
  geom_sf(data = countries50, colour = "grey", fill= "grey90", size=0.2)+
  geom_point(data = cases_sf_latest_3, aes(x = X,
                                               y = Y,
                                               size = change,
                                              colour = change, 
                                           alpha= change))+
  scale_alpha(range = c(0.3, 0.7))+
  scale_colour_gradientn(trans = "log10", colours= c("yellow", "orange", "red", "red4", "black"))+
  scale_size_continuous(trans = "log10", range = c(1, 5))+
  labs(size= "change", alpha = "change", colour = "change",
       title = "Change in COVID-19 cases",
       subtitle = paste0("period: ", dates[1], "...", dates[2]))+
  guides(alpha = F,
         size= F)

Look carefully at the previous code chunk and try to understand the function of every parameter!

Sometimes a map is not enough: we simply have too many points. The solution is aggregation of the data, or filtering to focus on interesting segments of the data.
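Aggregation could look like the sketch below, which collapses the long-format `cases` table from above to the country level before looking at it (an illustration only; provinces are summed per country and date):

```r
# sketch: aggregate cases to country level
# (assumes the long-format 'cases' table created above)
cases_country <- cases %>%
  group_by(Country_Region, date) %>%              # one row per country & date
  summarise(value = sum(value, na.rm = TRUE)) %>% # sum over provinces
  ungroup()

# quick overview: top 10 countries on the latest date
cases_country %>%
  filter(date == max(date)) %>%
  arrange(-value) %>%
  head(10) %>%
  kable()
```

The aggregated table could also be joined back to country polygons for a choropleth or country-level bubble map.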

Filter 10 places where the change is biggest:

cases_sf_latest_3_change_top10 <- cases_sf_latest_3 %>% 
  arrange(-change) %>% 
  head(n = 10)

Plot it:

ggplot()+
  geom_sf(data = countries50, fill = "white", colour = "grey", size=0.2)+
  geom_point(data = cases_sf_latest_3_change_top10, aes(x = X,
                                               y = Y,
                                               size = change,
                                              alpha = change), colour = "red")+
  #scale_colour_gradientn(colours= c("darkgreen", "purple"))+
  scale_size_continuous(trans = "log", range = c(1, 5))+
  scale_alpha(range = c(0.4, 0.8))+
  labs(size= "difference", alpha = "difference", colour = "difference")

Draw plot:

ggplot()+
  theme_minimal()+
  theme(panel.background = element_rect(fill = "slategray1"))+
  geom_segment(data = cases_sf_latest_3_change_top10, aes(x= 0, 
                                                              xend = change, 
                                                              y = reorder(place, change), 
                                                              yend = reorder(place, change)),
               colour = "black")+
  geom_point(data = cases_sf_latest_3_change_top10, aes(x = change, y = reorder(place, change)),
             shape = 21, colour = "white", fill = "firebrick", size= 3)+
  labs(title = "Change in Coronavirus cases during the last week", 
       x = "Change",
       y = "Country")

Which variant is more informative?

This was an example of how to put spatial points on a base map and how to visualize spatial patterns with points. Next, we look at an example of line maps.

3.5 Line maps

For that we use data about the flights of the Estonian national airline Nordica. The dataset covers all flights operated by Nordica from December 2 to December 3, 2018. Our aim is to create a typical destinations map.

Read the data and check file structure:

flights <- read.csv2("http://aasa.ut.ee/Rspatial/data/nordicaFlights.csv", encoding = "latin1")

glimpse(flights) 
## Observations: 94
## Variables: 8
## $ date      <fct> ...
## $ departure <fct> ...
## $ arrival   <fct> ...
## $ from      <fct> ...
## $ to        <fct> ...
## $ fligjt    <fct> ...
## $ airplane  <fct> ...
## $ status    <fct> ...
flights %>% 
  head() %>% 
  kable()
|date       |departure |arrival |from          |to              |fligjt |airplane |status            |
|:----------|:---------|:-------|:-------------|:---------------|:------|:--------|:-----------------|
|02.12.2018 |7:00      |8:25    |Tallinn (TLL) |München (MUC)   |LO8165 |CR7      |i Graafikujärgne. |
|02.12.2018 |7:00      |8:20    |Tallinn (TLL) |Viin (VIE)      |LO8193 |CR9      |i Graafikujärgne. |
|02.12.2018 |7:20      |7:20    |Tallinn (TLL) |Stockholm (ARN) |LO8121 |CR9      |i Graafikujärgne. |
|02.12.2018 |9:10      |12:30   |München (MUC) |Tallinn (TLL)   |LO8166 |CR7      |i Graafikujärgne. |
|02.12.2018 |9:20      |9:55    |Tallinn (TLL) |Varssavi (WAW)  |LO790  |CR9      |i Graafikujärgne. |
|02.12.2018 |9:25      |12:35   |Viin (VIE)    |Tallinn (TLL)   |LO8194 |CR9      |i Graafikujärgne. |

All the data is nicely imported. All columns are stored as factors, but this is currently not important. The question is: how can we add the geographical coordinates of origin and destination to the table?
As you probably know, every airport has a unique code (IATA code). Currently those codes are not in a separate column. With the help of the tidyr and stringr packages we can split the column with airport name and code into two columns:

# separate airport name and airport code:
flights <- flights %>% 
  separate(from, c("from_name" ,"from_code"), ' \\(')

flights <- flights %>% 
  separate(to, c("to_name" ,"to_code"), '\\(')

#remove ')' from string:
flights <- flights %>% 
  mutate(from_code = str_replace(from_code, "\\)", ""))

flights <- flights %>% 
  mutate(to_code = str_replace(to_code, "\\)", ""))

flights %>% 
  head() %>% 
  kable()
|date       |departure |arrival |from_name |from_code |to_name   |to_code |fligjt |airplane |status            |
|:----------|:---------|:-------|:---------|:---------|:---------|:-------|:------|:--------|:-----------------|
|02.12.2018 |7:00      |8:25    |Tallinn   |TLL       |München   |MUC     |LO8165 |CR7      |i Graafikujärgne. |
|02.12.2018 |7:00      |8:20    |Tallinn   |TLL       |Viin      |VIE     |LO8193 |CR9      |i Graafikujärgne. |
|02.12.2018 |7:20      |7:20    |Tallinn   |TLL       |Stockholm |ARN     |LO8121 |CR9      |i Graafikujärgne. |
|02.12.2018 |9:10      |12:30   |München   |MUC       |Tallinn   |TLL     |LO8166 |CR7      |i Graafikujärgne. |
|02.12.2018 |9:20      |9:55    |Tallinn   |TLL       |Varssavi  |WAW     |LO790  |CR9      |i Graafikujärgne. |
|02.12.2018 |9:25      |12:35   |Viin      |VIE       |Tallinn   |TLL     |LO8194 |CR9      |i Graafikujärgne. |

Airport codes are now nicely in separate columns. The next step is to find an additional dataset which contains, next to the airport codes, also the geographical coordinates of the airports.

airports <- read.csv("https://datahub.io/core/airport-codes/r/airport-codes.csv")

airports2 <- airports %>% 
  filter(iata_code != "") 

airports2 %>% 
  head() %>% 
  kable()
|ident |type          |name                    | elevation_ft|continent |iso_country |iso_region |municipality  |gps_code |iata_code |local_code |coordinates                             |
|:-----|:-------------|:-----------------------|------------:|:---------|:-----------|:----------|:-------------|:--------|:---------|:----------|:---------------------------------------|
|03N   |small_airport |Utirik Airport          |            4|OC        |MH          |MH-UTI     |Utirik Island |K03N     |UTK       |03N        |169.852005, 11.222                      |
|07FA  |small_airport |Ocean Reef Club Airport |            8|NA        |US          |US-FL      |Key Largo     |07FA     |OCA       |07FA       |-80.274803161621, 25.325399398804       |
|0AK   |small_airport |Pilot Station Airport   |          305|NA        |US          |US-AK      |Pilot Station |         |PQS       |0AK        |-162.899994, 61.934601                  |
|0CO2  |small_airport |Crested Butte Airpark   |         8980|NA        |US          |US-CO      |Crested Butte |0CO2     |CSE       |0CO2       |-106.928341, 38.851918                  |
|0TE7  |small_airport |LBJ Ranch Airport       |         1515|NA        |US          |US-TX      |Johnson City  |0TE7     |JCY       |0TE7       |-98.62249755859999, 30.251800537100003  |
|13MA  |small_airport |Metropolitan Airport    |          418|NA        |US          |US-MA      |Palmer        |13MA     |PMX       |13MA       |-72.31140136719999, 42.223300933800004  |
# Is airport of Tallinn (TLL) available: 
airports2 %>% 
  filter(iata_code == "TLL") %>% 
  kable()
|ident |type          |name                         | elevation_ft|continent |iso_country |iso_region |municipality |gps_code |iata_code |local_code |coordinates                           |
|:-----|:-------------|:----------------------------|------------:|:---------|:-----------|:----------|:------------|:--------|:---------|:----------|:-------------------------------------|
|EETN  |large_airport |Lennart Meri Tallinn Airport |          131|EU        |EE          |EE-37      |Tallinn      |EETN     |TLL       |           |24.832799911499997, 59.41329956049999 |

As the airport coordinates are stored in one column, we have to use the separate() function once again:

airp_crd <- airports2 %>% 
  separate(coordinates, c("lng" ,"lat"), ", ")

# convert both columns to numeric and select relevant columns:
airp_crd <- airp_crd %>% 
  mutate(lng = as.numeric(lng), lat = as.numeric(lat)) %>% 
  select(iata_code, lng, lat, municipality)

glimpse(airp_crd)
## Observations: 9,235
## Variables: 4
## $ iata_code    <fct> ...
## $ lng          <dbl> ...
## $ lat          <dbl> ...
## $ municipality <fct> ...

Next we have to join tables (see Practical session 1)

# join flights and airport table by origin:
flights2 <- left_join(flights, airp_crd, by = c("from_code" = "iata_code"))

#rename coordinates columns:
flights2 <- flights2 %>% 
  rename(from_x = lng, from_y = lat, from_mun = municipality)

#join flights and airport table by destination:
flights2 <- left_join(flights2, airp_crd, by = c("to_code" = "iata_code"))

#rename coordinates columns:
flights2 <- flights2 %>% 
  rename(to_x = lng, to_y = lat, to_mun = municipality)

glimpse(flights2)
## Observations: 104
## Variables: 16
## $ date      <fct> ...
## $ departure <fct> ...
## $ arrival   <fct> ...
## $ from_name <chr> ...
## $ from_code <chr> ...
## $ to_name   <chr> ...
## $ to_code   <chr> ...
## $ fligjt    <fct> ...
## $ airplane  <fct> ...
## $ status    <fct> ...
## $ from_x    <dbl> ...
## $ from_y    <dbl> ...
## $ from_mun  <fct> ...
## $ to_x      <dbl> ...
## $ to_y      <dbl> ...
## $ to_mun    <fct> ...

Plot the flights on the base map:

ggplot()+
  geom_sf(data = countries50, fill="gray80", col = "gray40", size=0.5) + 
  geom_segment(data = flights2, aes(x=from_x, xend= to_x, y=from_y, yend = to_y), col = "red")

Instead of a global map we can focus on Europe. For that we can use the st_crop() function, or we can filter out only the European countries:

europe <- countries50 %>% filter(REGION_UN == "Europe")

# plot:
ggplot()+
  geom_sf(data = europe)+
  geom_segment(data = flights2, aes(x=from_x, xend= to_x, y=from_y, yend = to_y), col = "red")

The base map needs some additional cropping:

europe2 <- st_crop(europe, c(xmin=-10, xmax=41, ymin=45, ymax= 70))
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
# plot:
ggplot()+
  theme_minimal()+
  geom_sf(data = europe2, fill = "grey96", colour = "grey60", size = 0.25)+
  geom_segment(data = flights2, aes(x = from_x, xend = to_x, y = from_y, yend = to_y), col = "red")

Use curves instead of straight lines:

ggplot()+
  theme_minimal()+
  geom_sf(data = europe2, fill = "grey96", colour = "grey60", size = 0.25)+
  geom_curve(data = flights2, aes(x = from_x, xend = to_x, y = from_y, yend = to_y), col = "red", curvature = 0.2, arrow = arrow(length = unit(0.03, "npc")))

4 Save visualization (png, pdf)

# png:

png("flights.png", res=300, units = "cm", height = 15, width = 15)

ggplot()+
  theme_minimal()+
  geom_sf(data = europe2, fill = "grey96", colour = "grey60", size = 0.25)+
  geom_curve(data = flights2, aes(x = from_x, xend = to_x, y = from_y, yend = to_y), col = "red", curvature = 0.2, arrow = arrow(length = unit(0.03, "npc")))

dev.off() # It is crucial to close the writing session!



# pdf:
pdf("flights.pdf", height = 15, width = 15)

ggplot()+
  theme_minimal()+
  geom_sf(data = europe2, fill = "grey96", colour = "grey60", size = 0.25)+
  geom_curve(data = flights2, aes(x = from_x, xend = to_x, y = from_y, yend = to_y), col = "red", curvature = 0.2, arrow = arrow(length = unit(0.03, "npc")))

dev.off() # It is crucial to close the writing session!
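A shorter alternative to opening and closing a graphics device is ggsave(), which by default writes the most recently drawn ggplot and infers the file format from the extension:

```r
# save the last plotted ggplot as PNG (format inferred from extension)
ggsave("flights.png", width = 15, height = 15, units = "cm", dpi = 300)

# a stored plot object can also be saved explicitly:
# p <- ggplot()+ ...
# ggsave("flights.pdf", plot = p, width = 15, height = 15, units = "cm")
```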

The last version already looks like a thematic map. But there are still some shortcomings:

  • the map scale and north arrow are missing;
  • the map title is missing.

We can try to fix these. Add the north arrow and map scale:
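With the ggspatial package (loaded at the start) this could be sketched as below; annotation_scale() and annotation_north_arrow() place the elements, and the location codes ("bl", "tr" etc.) are corner abbreviations. The title text is just an example:

```r
# sketch: scale bar, north arrow and title with ggspatial + labs()
ggplot()+
  theme_minimal()+
  geom_sf(data = europe2, fill = "grey96", colour = "grey60", size = 0.25)+
  geom_curve(data = flights2, aes(x = from_x, xend = to_x, y = from_y, yend = to_y),
             col = "red", curvature = 0.2, arrow = arrow(length = unit(0.03, "npc")))+
  annotation_scale(location = "bl")+                       # scale bar, bottom-left
  annotation_north_arrow(location = "tr",                  # north arrow, top-right
                         style = north_arrow_fancy_orienteering)+
  labs(title = "Flights operated by Nordica, 2-3 December 2018")
```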

Plot it on top of the Stamen map! In the case of ggmap you have to use geom_segment instead of geom_curve!

5 Homework number 3

In this session your task is to analyse the COVID-19 data: how many people have recovered from COVID-19 (download the recovered-cases data). Please create:

  1. a world map where you visualise how many people have recovered from COVID-19;
  2. a plot of the top 10 countries;
  3. also add the map of population density in districts to your homework!

Bind all the homeworks into one doc/pdf/html file. Add the script files separately (in the case of R Markdown the script is already in the html file). Please add your name to each file name! E-mail the results to: . The final deadline for all submissions is May 15th, 2020.

Possible results:

Author: Anto Aasa
in ISET, Tbilisi
Last update: 2020-04-22 08:57:55
