libraries:

library(tidyverse)
library(ggthemes) 
library(plotly)
library(sf)
library(leaflet)
library(gganimate)
library(tmap)

Today’s world is the world of the Internet. Data and information are everywhere. Our society is getting more and more data-driven. Good decisions and governance are supported by data, analysis, knowledge, and wisdom. And for that, we need excellent communication. Static visualizations are informative and unavoidable in the printed world. But the Internet offers us the platform for interactivity. Interactive documents and dashboards may help to communicate the message in a more attractive and catchy way. We cannot afford to miss this train.
In the current session, we learn how to create an (interactive) web page (*.html) in RStudio, and we try to create interactive dashboards and briefly look at the nature of shiny applications.

According to Wikipedia: “Across the many fields concerned with interactivity, including information science, computer science, human-computer interaction, communication, and industrial design, there is little agreement over the meaning of the term ‘interactivity’, although all are related to interaction with computers and other machines with a user interface.
Multiple views on interactivity exist. In the “contingency view” of interactivity, there are three levels:
- Not interactive, when a message is not related to previous messages;
- Reactive, when a message is related only to one immediately previous message; and
- Interactive, when a message is related to a number of previous messages and to the relationship between them.”

Creating interactive and especially animated visualizations often means that you have to wait, because processing and rendering take time. The waiting time depends on the efficiency of the algorithm and on computing power. Waiting is a good time to read and learn more about the possibilities for creating interactive maps and visualizations in R.

Interactive time series

First, we try to create some simple interactive time series plots. Interactivity means that we can communicate with the object. For example, we can zoom in/out and get additional information (select, mouseover).

COVID-19 infections in Estonia

Estonian COVID-19 information and data are available from the Estonian COVID-19 open-data portal. The portal offers many interactive visualizations of the current and previous situation, and the raw data can be downloaded from the same location. Datasets are refreshed weekly on Tuesdays around 12:00-12:30 Estonian local time. From this page we choose the dataset about infections in counties. Link to the dataset: https://opendata.digilugu.ee/opendata_covid19_test_county_all.csv.

Import the data into the R workspace! Name the imported dataset covid19_test_county_all. Be careful with dates: depending on the import function, date variables may be read in as character!
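A minimal sketch of a date-safe import, assuming readr (part of tidyverse) is used. The two inline rows are made-up stand-ins for the real file; with the real data, the URL given above would be passed to read_csv() instead, and the date format is assumed to be year-month-day. Declaring the column types explicitly keeps the dates as <date> and the EHAK code as text with its leading zeros.

```r
library(readr)

# Made-up two-row sample in the same layout as the real CSV (illustration only)
sample_csv <- I("StatisticsDate,County,CountyEHAK,ResultValue,DailyCases
2020-02-05,Harju maakond,0037,N,0
2020-02-05,Harju maakond,0037,P,0")

covid_sample <- read_csv(
  sample_csv,
  col_types = cols(
    StatisticsDate = col_date(format = "%Y-%m-%d"), # parse as Date, not character
    CountyEHAK     = col_character(),               # keep leading zeros ("0037")
    .default       = col_guess()                    # let readr guess the rest
  )
)

class(covid_sample$StatisticsDate)
```

If the dates still arrive as character, they can be converted afterwards, for example with as.Date().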

If the import was correct and successful, the structure of data should look like this:

glimpse(covid19_test_county_all)
## Rows: 34,408
## Columns: 11
## $ LastStatisticsDate <date> 2022-11-28, 2022-11-28, 2022-11-28, 2022-11-28, 20~
## $ StatisticsDate     <date> 2020-02-05, 2020-02-05, 2020-02-05, 2020-02-05, 20~
## $ Country            <chr> "Eesti", "Eesti", "Eesti", "Eesti", "Eesti", "Eesti~
## $ CountryEHAK        <dbl> 233, 233, 233, 233, 233, 233, 233, 233, 233, 233, 2~
## $ County             <chr> "Harju maakond", "Harju maakond", "Hiiu maakond", "~
## $ CountyEHAK         <chr> "0037", "0037", "0039", "0039", "0045", "0045", "00~
## $ ResultValue        <chr> "N", "P", "N", "P", "N", "P", "N", "P", "N", "P", "~
## $ DailyTests         <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ TotalTests         <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ DailyCases         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ TotalCases         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~

In the current example, the interesting fields in the dataset are:

  • StatisticsDate: the date for which the result is valid;
  • County: name of the county (place of residence);
  • CountyEHAK: EHAK code of the county (we need it for mapping!);
  • ResultValue: test result (Negative or Positive);
  • DailyCases: number of the new cases.

Keep only positive tests!
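One possible way to do this, sketched on a toy table. The table and its values are made up; only the column names follow the real dataset.

```r
library(dplyr)

# Toy table with the same column names as the real data (values are made up)
toy_tests <- tibble(
  County      = c("Harju maakond", "Harju maakond", "Tartu maakond", "Tartu maakond"),
  ResultValue = c("N", "P", "N", "P"),
  DailyCases  = c(0, 3, 0, 1)
)

# Keep only the rows that describe positive tests
toy_positive <- toy_tests %>%
  filter(ResultValue == "P")

toy_positive
```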

If you managed to complete it correctly, then you see that the only remaining value of ResultValue is P:

covid19_test_county_all %>% 
  distinct(ResultValue, .keep_all = FALSE)
## # A tibble: 1 x 1
##   ResultValue
##   <chr>      
## 1 P

Now plot it as a static plot:

ggplot()+
  geom_line(data = covid19_test_county_all, aes(x= StatisticsDate, y = DailyCases), size= .25)+
  facet_wrap(vars(County), scales = "free_y") # separate panel for every County. What happens without scales = "free_y"?

In the column County, every county name is followed by “ maakond”. Delete it! (Hint: Session 1).

In some cases the place of residence is unknown (NA; usually foreigners). Delete those rows as well! (Hint: Session 2).
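A sketch of both cleaning steps on made-up rows, assuming stringr and dplyr (both part of tidyverse) are used. Other approaches from the earlier sessions work just as well.

```r
library(dplyr)
library(stringr)

# Toy rows (made up): two named counties and one unknown place of residence
toy_counties <- tibble(
  County     = c("Harju maakond", "Tartu maakond", NA),
  DailyCases = c(3, 1, 2)
)

toy_clean <- toy_counties %>%
  mutate(County = str_remove(County, " maakond$")) %>% # strip the trailing " maakond"
  filter(!is.na(County))                               # drop rows with unknown county

toy_clean
```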

Let’s try to plot it as an interactive graph! First, create the ggplot object gg_cov_cases and then plot it with ggplotly() from the plotly library:

gg_cov_cases <- ggplot()+
  geom_line(data = covid19_test_county_all, aes(x = StatisticsDate, y = DailyCases), size = .25)+
  facet_wrap(vars(County), scales = "free_y", ncol = 3)

plotly::ggplotly(gg_cov_cases)
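The mouseover box can be adjusted as well. The sketch below uses made-up values and the tooltip argument of ggplotly() to show only the plotted x and y values; the object names are hypothetical.

```r
library(ggplot2)
library(plotly)

# Made-up daily values for ten days (illustration only)
toy_series <- data.frame(
  StatisticsDate = as.Date("2022-01-01") + 0:9,
  DailyCases     = c(5, 8, 12, 9, 15, 20, 18, 25, 22, 30)
)

gg_toy <- ggplot(toy_series, aes(x = StatisticsDate, y = DailyCases)) +
  geom_line()

# tooltip = c("x", "y") limits the mouseover box to the two plotted aesthetics
p_toy <- ggplotly(gg_toy, tooltip = c("x", "y"))

class(p_toy) # type p_toy in the console to display the widget
```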

In another example with the same data, we compare the dynamics of two counties with approximately the same population size (~145 000): Tartu and Ida-Viru:

# filter:
covid19_test_county_all_2 <- covid19_test_county_all %>% 
  filter(County == "Tartu" | County =="Ida-Viru")

# ggplot:
gg_cov_cases_2 <- ggplot()+
  theme_classic()+
  geom_line(data = covid19_test_county_all_2, aes(x= StatisticsDate, y = DailyCases, color = County), size= .25, alpha = .5)+
  scale_colour_manual(values = c("blue", "red"))

# plotly plot:  
plotly::ggplotly(gg_cov_cases_2)

Plotly is good, but it has limitations. Sometimes it is better to use the dygraphs library. First install it and load it. dygraphs expects the time series to be converted to an xts time-series object:

library(dygraphs)

# Select relevant variables:
covid19_test_county_all_2a <- covid19_test_county_all_2 %>% 
  select(StatisticsDate, County, DailyCases)

# For dygraph the table should be in wide format:
covid19_test_county_all_2a <- covid19_test_county_all_2a %>% 
  pivot_wider(names_from = County, values_from = DailyCases, id_cols = StatisticsDate)

# conversion to xts: the dates go to order.by (xts stores them separately as the index),
# so only the county columns are passed as data and they stay numeric:
covid19_test_county_all_2b <- xts::xts(
  select(covid19_test_county_all_2a, -StatisticsDate),
  order.by = covid19_test_county_all_2a$StatisticsDate)

# plot the dygraph:
dygraph(covid19_test_county_all_2b) %>% 
  dySeries() %>% 
  dyRangeSelector() # adds a range selector for zooming to a specific period
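To see why the date column should not end up among the data columns of the xts object, here is a small sketch with made-up values. An xts object stores its data as a matrix, and a matrix can hold only one type, so mixing dates with numbers would turn everything into text.

```r
library(xts)

# Made-up wide table: one date column and one column per county
toy_wide <- data.frame(
  StatisticsDate = as.Date("2022-01-01") + 0:4,
  Tartu          = c(5, 8, 12, 9, 15),
  `Ida-Viru`     = c(7, 6, 10, 14, 11),
  check.names    = FALSE # keep the hyphen in the column name
)

# Only the county columns go into the data matrix; the dates become the index
toy_xts <- xts(as.matrix(toy_wide[, c("Tartu", "Ida-Viru")]),
               order.by = toy_wide$StatisticsDate)

str(coredata(toy_xts)) # the data part: a numeric matrix
index(toy_xts)         # the dates, stored separately
```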

This was a very short introduction to the world of interactive plots. In reality, the range of possibilities is very wide! Read more: link.

Interactive maps

As usual, there are several options for drawing interactive maps. During the previous sessions you have seen how easy it is with tmap, but other options are available as well, for example ggplot2 with plotly, leaflet, or something else.

Prepare the data

As you already saw, the plotly library is very simple to apply: just make the plot or map in ggplot and wrap it with plotly::ggplotly(). In the next example we create an interactive map of COVID-19 cases in Estonia.
First, prepare the attributes: the number of positive cases by county for the last available date:

covid19_test_county_all_latest <- covid19_test_county_all %>% 
  select(CountyEHAK, DailyCases, StatisticsDate) %>% 
  filter(StatisticsDate == max(StatisticsDate))

Now refresh your memory and import the spatial layer of Estonian counties again!

If you managed it correctly, the structure is:

# data structure:
glimpse(counties)
## Rows: 15
## Columns: 3
## $ MNIMI    <chr> "Saare maakond", "Viljandi maakond", "Hiiu maakond", "Harju m~
## $ MKOOD    <chr> "0074", "0084", "0039", "0037", "0056", "0071", "0045", "0081~
## $ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((463065.2 64..., MULTIPOLYGON (((~

and after plotting it looks like this:

# map:
ggplot()+
  geom_sf(data = counties, fill = "slategray2", col = "blue", size= .25)

The layer of counties is very detailed, which makes processing and plotting relatively slow. As you remember from previous sessions, simplification (generalization) helps to speed up the next steps:

counties <- counties %>% 
  st_simplify(preserveTopology = TRUE, dTolerance = 200) %>% 
  st_cast("MULTIPOLYGON") # defines the type of geometry after simplification

# structure:
glimpse(counties)
## Rows: 15
## Columns: 3
## $ MNIMI    <chr> "Saare maakond", "Viljandi maakond", "Hiiu maakond", "Harju m~
## $ MKOOD    <chr> "0074", "0084", "0039", "0037", "0056", "0071", "0045", "0081~
## $ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((463065.2 64..., MULTIPOLYGON (((~
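The structure looks the same after simplification, so the effect is easier to see by counting vertices. The sketch below is an assumption-laden illustration: it uses the North Carolina sample layer shipped with sf instead of the Estonian counties, a projected CRS chosen for that layer (EPSG:32119), and a tolerance of 5000 m picked only to make the difference obvious.

```r
library(sf)
library(dplyr)

# Sample polygons shipped with sf, transformed to a projected CRS (units: metres)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE) %>%
  st_transform(32119)

nc_simple <- nc %>%
  st_simplify(preserveTopology = TRUE, dTolerance = 5000) %>%
  st_cast("MULTIPOLYGON")

# Number of vertices before and after simplification
n_before <- nrow(st_coordinates(nc))
n_after  <- nrow(st_coordinates(nc_simple))
c(before = n_before, after = n_after)
```

The number of rows (polygons) stays the same; only the number of vertices per polygon drops.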

Join the attributes with polygons!

covid19_test_county_all_latest_sf <- left_join(counties, covid19_test_county_all_latest, by = c("MKOOD" = "CountyEHAK"))

What happens if you join it this way?

covid19_test_county_all_latest_sf <- left_join(covid19_test_county_all_latest, counties, by = c("CountyEHAK" = "MKOOD"))
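The difference between the two joins can be checked on a toy example. In this sketch two made-up points stand in for the county polygons (an assumption made only to keep the example short); what matters is which table comes first.

```r
library(sf)
library(dplyr)

# Toy spatial layer: two made-up points with county codes
toy_layer <- st_as_sf(
  data.frame(MKOOD = c("0037", "0039"), x = c(24.7, 22.7), y = c(59.4, 58.9)),
  coords = c("x", "y"), crs = 4326
)

# Toy attribute table with made-up case numbers
toy_attrs <- tibble(CountyEHAK = c("0037", "0039"), DailyCases = c(10, 2))

# Spatial layer first: the result keeps the class "sf"
joined_sf  <- left_join(toy_layer, toy_attrs, by = c("MKOOD" = "CountyEHAK"))

# Attribute table first: the result is an ordinary table with a geometry column
joined_tbl <- left_join(toy_attrs, toy_layer, by = c("CountyEHAK" = "MKOOD"))

class(joined_sf)
class(joined_tbl)
```

Functions that expect an sf object, such as geom_sf(), may not recognize the second result as a spatial layer.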

The next command will probably give an error. (If it does, use the correct (first) join!)
If everything is OK, then:

gg_covid19_map <- ggplot()+
  geom_sf(data = covid19_test_county_all_latest_sf, aes(fill = DailyCases))+
  scale_fill_gradientn(colours = c("forestgreen", "grey70", "orange", "red"))

gg_covid19_map

Sometimes it’s good to add values to the map as labels. geom_sf_text() can label a polygon layer directly (try it and see what happens), but calculating the centroids explicitly gives a point layer that can be reused for labels in other tools as well:

covid19_test_county_all_latest_sf_cntr <- covid19_test_county_all_latest_sf %>% 
  st_centroid()

gg_covid19_map <- ggplot()+
  theme_void()+
  geom_sf(data = covid19_test_county_all_latest_sf, aes(fill = DailyCases))+
  geom_sf_text(data = covid19_test_county_all_latest_sf_cntr, aes(label = DailyCases))+
  scale_fill_gradientn(colours = c("forestgreen", "grey70", "orange", "red"))

gg_covid19_map

plotly

And now the interactive map with plotly. The procedure is the same as for the time series plot:

plotly::ggplotly(gg_covid19_map)

Leaflet

Another widely used option is Leaflet. Leaflet is an open source JavaScript library used to build web mapping applications. Leaflet allows developers to very easily display tiled web maps hosted on a public server, with optional tiled overlays.

The leaflet package is also available for R: link.
First you have to install it!

library(leaflet)

For leaflet the spatial object should be in geographical coordinates (CRS = 4326):

# polygons:
covid19_test_county_all_latest_sf_4326 <- covid19_test_county_all_latest_sf %>% 
  st_transform(4326)

# labels:
covid19_test_county_all_latest_sf_cntr_4326 <- covid19_test_county_all_latest_sf_cntr %>% 
  st_transform(4326)
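Whether a transformation worked can be checked directly. The sketch below uses a single made-up point and assumes that the source data are in the Estonian national coordinate system (EPSG:3301); the CRS of your own layer can be inspected with st_crs().

```r
library(sf)

# A single made-up point, assumed to be in the Estonian national CRS (EPSG:3301)
pt_3301 <- st_sfc(st_point(c(542000, 6589000)), crs = 3301)

# Transform to geographical coordinates
pt_4326 <- st_transform(pt_3301, 4326)

st_crs(pt_4326)$epsg   # EPSG code after the transformation
st_is_longlat(pt_4326) # TRUE for geographical (longitude/latitude) coordinates
```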

And the first attempt to plot counties with Leaflet:

leaflet() %>% 
  addTiles() %>% 
  addPolygons(data = covid19_test_county_all_latest_sf_4326) 

Leaflet expects the colours and the palette to be defined separately:

library(RColorBrewer)
pal <- colorBin(palette = "Purples", 
                domain = covid19_test_county_all_latest_sf_4326$DailyCases, bins = 5) # split the "Purples" palette (white to purple) into about 5 bins with "pretty" breaks
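What colorBin() returns is itself a function that turns numbers into colour codes. A small sketch with made-up values (the object names are hypothetical):

```r
library(leaflet)

toy_cases <- c(0, 3, 12, 25, 40) # made-up values

# Build the palette function; with the default pretty = TRUE the number of
# bins is approximate, because the breaks are rounded to "nice" values
pal_toy <- colorBin(palette = "Purples", domain = toy_cases, bins = 5)

# Apply it: one hex colour per input value
toy_colors <- pal_toy(toy_cases)
toy_colors
```

Values that fall into the same bin (here 0 and 3) get the same colour.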

And now the output:

leaflet() %>% 
  addTiles() %>% 
  addPolygons(data = covid19_test_county_all_latest_sf_4326, 
              label = ~DailyCases, # mouseover value
              color = "gray", # border color 
              fillColor = ~pal(DailyCases), # polygon fill color
              weight = 1.0, # border line thickness 
              opacity = 1.0, # border line opacity
              fillOpacity = 0.8) %>%  # polygon fill opacity
  addLabelOnlyMarkers(data = covid19_test_county_all_latest_sf_cntr_4326,
                      label = ~DailyCases,
                      labelOptions = labelOptions(noHide = TRUE))

The output could be polished much further, but today the purpose was just to introduce Leaflet as an alternative.

tmap

Actually, you are already familiar with tmap, and you have already seen how to use it for interactive maps. Just set tmap_mode() to “view” and the output will be interactive:

tmap_mode("view")
## tmap mode set to interactive viewing

And the map:

tm_shape(covid19_test_county_all_latest_sf_4326)+
  tm_polygons(col = "DailyCases", 
              style = "pretty",
              palette = "Reds",
              alpha = .7)+
tm_shape(covid19_test_county_all_latest_sf_cntr_4326)+
  tm_text(text = "DailyCases",
          bg.color = "grey",
          shadow = TRUE)

Interactive plots and maps are nice and informative. But for communication they cannot stand alone; they must be part of something else. Very often they are embedded in a web page. In the next example we look at how to create an html page directly from RStudio. For that we use R Markdown.

R Markdown

html

According to RStudio: “R Markdown documents are fully reproducible. Use a productive notebook interface to weave together narrative text and code to produce elegantly formatted output. Use multiple languages including R, Python, and SQL.” More: link.
It is true, at least partially. For example, all the R-related tutorials of the current course are produced with R Markdown.
Let’s try!

Before the next steps you should install the library:

install.packages("rmarkdown")

Then create the markdown file (*.Rmd) where you write the script (the script file will be converted to html): File -> New File -> R Markdown

In the next window you can enter the title and author of the document. The other options are OK by default.

A short tutorial on R Markdown is available here: link
You can explore the script of the current page as well: link. To see the highlighted and formatted script, you should download it and open it in RStudio.
For complete beginners, the visual markdown editor is available.

Try to create your own R Markdown html! Fill it, for example, with content from your previous assignments!

To process the Rmd-file to html-file, you have to “knit” it:
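In RStudio this is usually done with the Knit button. The same step can also be sketched from the console with rmarkdown::render(). The example below first writes a tiny made-up document to a temporary folder (the file name and content are hypothetical) and then knits it, provided that pandoc is available, as it is inside RStudio.

```r
# Three backticks open and close an R chunk inside the Rmd file
fence <- strrep("`", 3)

# A tiny example document (content made up for illustration)
rmd_lines <- c(
  "---",
  'title: "My first R Markdown page"',
  "output: html_document",
  "---",
  "",
  "Some narrative text.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_file)

# Knit from the console; rendering needs the rmarkdown package and pandoc
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE) # returns the path of the html
}
```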

If you have your own web page, you can upload the created html there directly. You can also use the RPubs platform: link. For RPubs you have to create a user account.

Dashboards

According to Wikipedia: A dashboard is a type of graphical user interface which often provides at-a-glance views of key performance indicators (KPIs) relevant to a particular objective or business process. In other usage, “dashboard” is another name for “progress report” or “report” and considered a form of data visualization.
The “dashboard” is often accessible by a web browser and is usually linked to regularly updating data sources.

The list of impressive dashboards created in R is very long.

A good introduction to getting started with flexdashboard is available here: https://pkgs.rstudio.com/flexdashboard/

To start with flexdashboard you should select:

File -> New File -> R Markdown -> Flex Dashboard
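The template created this way is an ordinary Rmd file with a flexdashboard output format. As a rough sketch of its structure, the code below writes a two-column skeleton to a temporary folder; the title, panel headings and file name are made up, and the chunks contain only placeholder comments.

```r
# Three backticks open and close an R chunk inside the Rmd file
fence <- strrep("`", 3)

dash_lines <- c(
  "---",
  'title: "My dashboard"',
  "output:",
  "  flexdashboard::flex_dashboard:",
  "    orientation: columns",
  "---",
  "",
  "Column {data-width=650}",
  "-----------------------------------------------------------------------",
  "",
  "### Interactive map",
  "",
  paste0(fence, "{r}"),
  "# code that returns a leaflet or tmap object goes here",
  fence,
  "",
  "Column {data-width=350}",
  "-----------------------------------------------------------------------",
  "",
  "### Interactive time series",
  "",
  paste0(fence, "{r}"),
  "# code that returns a plotly or dygraph object goes here",
  fence
)

dash_file <- file.path(tempdir(), "dashboard.Rmd")
writeLines(dash_lines, dash_file)
```

Each line of dashes starts a new column, and each heading with ### starts a new panel inside that column.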

For beginners, the first big problem is probably the layout of the dashboard. Some optional storyboard layouts are described here: link.
The individual assignment after this session is to create your own interactive dashboard! Fill the dashboard with your own content. The dashboard must contain at least 4 different interactive visualizations (at least 2 of them maps).

You can use this file as the possible template for your assignment: link


Author: Anto Aasa
Supervisors: Anto Aasa & Lika Zhvania
LTOM.02.041
Last update: 2022-11-30 10:05:37
