libraries:
library(tidyverse)
library(ggthemes)
library(plotly)
library(sf)
library(leaflet)
library(gganimate)
library(tmap)
Today’s world is the world of the Internet. Data and information are
everywhere. Our society is getting more and more data-driven. Good
decisions and governance are supported by data, analysis, knowledge, and
wisdom. And for that, we need excellent communication. Static
visualizations are informative and unavoidable in the printed world. But
the Internet offers us the platform for interactivity. Interactive
documents and dashboards may help to communicate the message in a more
attractive and catchy way. We cannot afford to miss this train.
In the current session, we learn how to create an (interactive) web page
(*.html) in RStudio, and we try to create interactive dashboards and
briefly look at the nature of shiny applications.
According to Wikipedia: “Across the many fields concerned with interactivity, including information science, computer science, human-computer interaction, communication, and industrial design, there is little agreement over the meaning of the term “interactivity”, although all are related to interaction with computers and other machines with a user interface.
Multiple views on interactivity exist. In the “contingency view” of interactivity, there are three levels:
- Not interactive, when a message is not related to previous messages;
- Reactive, when a message is related only to one immediately previous message; and
- Interactive, when a message is related to a number of previous messages and to the relationship between them.”
Creating interactive, and especially animated, visualizations very often means that you have to wait, because processing and rendering take time. The waiting time depends on the efficiency of the algorithm and on computing power. Waiting is a good time to read and learn more about the possibilities for creating interactive maps and visualizations in R.
First, we try to create some simple interactive time series plots. Interactivity means that we can communicate with the object. For example, we can zoom in/out and get additional information (select, mouseover).
Estonian COVID-19 information and data are available from the Estonian COVID-19 open-data portal. The portal offers many interactive visualizations of the current and previous situation, and the raw data can be downloaded from the same location. Datasets are refreshed weekly on Tuesdays around noon (12:00-12:30) Estonian local time. From this page we choose the dataset on infections in counties. Link to the dataset: https://opendata.digilugu.ee/opendata_covid19_test_county_all.csv.
Import the data into the R workspace! Name the imported dataset
covid19_test_county_all. Be careful with dates: depending on the import
function, date variables may be read in as character by default!
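One possible way to control the column types during import is sketched below. It is not necessarily how the course data was imported: it assumes readr (version 2.0 or newer) and ISO-formatted dates, and it reads three rows copied from the expected structure shown in this tutorial instead of downloading the real file.

```r
library(readr)

# Three rows copied from the expected structure shown in this tutorial;
# the real file has more columns and many more rows.
sample_csv <- "StatisticsDate,County,CountyEHAK,ResultValue,DailyCases
2020-02-05,Harju maakond,0037,N,0
2020-02-05,Harju maakond,0037,P,0
2020-02-05,Hiiu maakond,0039,N,0"

covid19_sample <- read_csv(
  I(sample_csv),  # I() marks the string as literal data (readr >= 2.0)
  col_types = cols(
    StatisticsDate = col_date(format = "%Y-%m-%d"), # parse as Date, not character
    CountyEHAK     = col_character(),               # keep the leading zeros of the code
    .default       = col_guess()                    # let readr guess the other columns
  )
)

class(covid19_sample$StatisticsDate)
```

For the real dataset, `I(sample_csv)` would be replaced by the link given above, and LastStatisticsDate would need the same date specification.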
If the import was correct and successful, the structure of data should look like this:
glimpse(covid19_test_county_all)
## Rows: 34,408
## Columns: 11
## $ LastStatisticsDate <date> 2022-11-28, 2022-11-28, 2022-11-28, 2022-11-28, 20~
## $ StatisticsDate <date> 2020-02-05, 2020-02-05, 2020-02-05, 2020-02-05, 20~
## $ Country <chr> "Eesti", "Eesti", "Eesti", "Eesti", "Eesti", "Eesti~
## $ CountryEHAK <dbl> 233, 233, 233, 233, 233, 233, 233, 233, 233, 233, 2~
## $ County <chr> "Harju maakond", "Harju maakond", "Hiiu maakond", "~
## $ CountyEHAK <chr> "0037", "0037", "0039", "0039", "0045", "0045", "00~
## $ ResultValue <chr> "N", "P", "N", "P", "N", "P", "N", "P", "N", "P", "~
## $ DailyTests <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ TotalTests <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ DailyCases <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ TotalCases <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
In the current example, the interesting fields in the dataset are:
- StatisticsDate: the date for which the result is valid;
- County: name of the county (place of residence);
- CountyEHAK: EHAK code of the county (we need it for mapping!);
- ResultValue: test result (Negative or Positive);
- DailyCases: number of new cases.
Keep only positive tests!
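A minimal sketch of such a filter, shown on a toy table rather than the full dataset (the rows mirror the first records of the structure above; the object names `tests` and `positive_only` are made up for illustration):

```r
library(dplyr)

# Toy rows: N = negative, P = positive
tests <- tibble(
  County      = c("Harju maakond", "Harju maakond", "Hiiu maakond", "Hiiu maakond"),
  ResultValue = c("N", "P", "N", "P"),
  DailyCases  = c(0, 0, 0, 0)
)

positive_only <- tests %>%
  filter(ResultValue == "P")  # keep rows with a positive test result only

distinct(positive_only, ResultValue)
```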
If you managed to complete it correctly, then you see that
ResultValue is P:
covid19_test_county_all %>%
distinct(ResultValue, .keep_all = FALSE)
## # A tibble: 1 x 1
## ResultValue
## <chr>
## 1 P
Now plot it as a static plot:
ggplot()+
  # linewidth needs ggplot2 >= 3.4.0; use size in older versions
  geom_line(data = covid19_test_county_all, aes(x = StatisticsDate, y = DailyCases), linewidth = .25)+
  facet_wrap(vars(County), scales = "free_y") # separate panel for every county. What happens without scales = "free_y"?
In the column County, every county name is followed by “ maakond”.
Delete it! (Hint: Session 1.)
In some cases the place of residence is unknown (NA; usually
foreigners). Delete those rows as well! (Hint: Session 2.)
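Both cleaning steps could look roughly like this. This is a sketch on a toy table, assuming stringr and dplyr; the DailyCases values are made up, and the earlier sessions may have used different functions.

```r
library(dplyr)
library(stringr)

counties_raw <- tibble(
  County     = c("Harju maakond", "Hiiu maakond", NA),
  DailyCases = c(0, 0, 0)  # made-up values, for illustration only
)

counties_clean <- counties_raw %>%
  mutate(County = str_remove(County, " maakond$")) %>%  # drop the trailing " maakond"
  filter(!is.na(County))                                 # drop rows with unknown residence

counties_clean$County
```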
Let’s try to plot it as an interactive graph! First, create the
ggplot object gg_cov_cases and then plot it with
ggplotly() from plotly library:
gg_cov_cases <- ggplot()+
  # linewidth needs ggplot2 >= 3.4.0; use size in older versions
  geom_line(data = covid19_test_county_all, aes(x = StatisticsDate, y = DailyCases), linewidth = .25)+
  facet_wrap(vars(County), scales = "free_y", ncol = 3)
plotly::ggplotly(gg_cov_cases)
In another example with the same data we want to compare the dynamics of two counties (with approximately the same population size): Tartu and Ida-Viru county (population size ~145 000):
# filter:
covid19_test_county_all_2 <- covid19_test_county_all %>%
filter(County == "Tartu" | County =="Ida-Viru")
# ggplot:
gg_cov_cases_2 <- ggplot()+
  theme_classic()+
  # linewidth needs ggplot2 >= 3.4.0; use size in older versions
  geom_line(data = covid19_test_county_all_2, aes(x = StatisticsDate, y = DailyCases, color = County), linewidth = .25, alpha = .5)+
  scale_colour_manual(values = c("blue", "red"))
# plotly plot:
plotly::ggplotly(gg_cov_cases_2)
Plotly is good, but it has limitations. Sometimes it is
better to use the dygraphs library. First install it and load
it. dygraph() expects the time series to be converted to an
xts time-series object:
library(dygraphs)
# Select relevant variables:
covid19_test_county_all_2a <- covid19_test_county_all_2 %>%
select(StatisticsDate, County, DailyCases)
# For dygraph the table should be in wide format:
covid19_test_county_all_2a <- covid19_test_county_all_2a %>%
  pivot_wider(names_from = County, values_from = DailyCases, id_cols = StatisticsDate)
# conversion to xts: pass only the value columns as data, because xts stores
# its data as a matrix and a date column would turn every value into text;
# the dates are stored separately in the index via order.by (try glimpse()):
covid19_test_county_all_2b <- xts::xts(
  select(covid19_test_county_all_2a, -StatisticsDate),
  order.by = covid19_test_county_all_2a$StatisticsDate
)
# plot the dygraph:
dygraph(covid19_test_county_all_2b) %>%
  dySeries() %>%
  dyRangeSelector() # adds a range selector for zooming to a specific period
This was a very short introduction to the world of interactive plots. In reality the horizon of possibilities is very wide! Read more: link.
As usual, there are several options for drawing interactive maps. During
the previous sessions you have seen how easy it is with
tmap. But other options are available as well: for example
ggplot2 with plotly, leaflet, tmap, or something else.
As you already saw, the plotly library is very
simple to apply. Just make the plot or map in ggplot2 and wrap
it with plotly::ggplotly(). In the next example we create an
interactive map of COVID-19 cases in Estonia.
First, prepare the attributes: number of positive cases for the last
available date by county:
covid19_test_county_all_latest <- covid19_test_county_all %>%
select(CountyEHAK, DailyCases, StatisticsDate) %>%
filter(StatisticsDate == max(StatisticsDate))
Now refresh your memory and import the spatial layer of Estonian counties again!
If you managed it correctly then the structure is:
# data structure:
glimpse(counties)
## Rows: 15
## Columns: 3
## $ MNIMI <chr> "Saare maakond", "Viljandi maakond", "Hiiu maakond", "Harju m~
## $ MKOOD <chr> "0074", "0084", "0039", "0037", "0056", "0071", "0045", "0081~
## $ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((463065.2 64..., MULTIPOLYGON (((~
After plotting, it looks like this:
# map:
ggplot()+
  # linewidth needs ggplot2 >= 3.4.0; use size in older versions
  geom_sf(data = counties, fill = "slategray2", col = "blue", linewidth = .25)
The layer of counties is very detailed, which makes processing and plotting relatively slow. As you remember from previous sessions, simplification (generalization) helps to speed up the next steps:
counties <- counties %>%
st_simplify(preserveTopology = TRUE, dTolerance = 200) %>%
st_cast("MULTIPOLYGON") # defines the type of geometry after simplification
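The effect of dTolerance can be seen on a toy geometry. The sketch below is only an illustration: the circle, its location, and the coordinate system EPSG:3301 (a metre-based Estonian grid) are assumptions, since the tutorial only shows that the layer's units are metres.

```r
library(sf)

# A toy polygon: a circle of radius 1000 m around an arbitrary point.
# EPSG:3301 (units in metres) is an assumption made for this example.
circle <- st_buffer(
  st_sfc(st_point(c(500000, 6500000)), crs = 3301),
  dist = 1000, nQuadSegs = 90
)

# dTolerance is given in the units of the coordinate system (here metres)
circle_simple <- st_simplify(circle, preserveTopology = TRUE, dTolerance = 200)

# number of vertices before and after simplification
n_before <- nrow(st_coordinates(circle))
n_after  <- nrow(st_coordinates(circle_simple))
c(before = n_before, after = n_after)
```

The larger the tolerance, the fewer vertices remain and the coarser the outline becomes, so the value is a trade-off between speed and visible detail.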
# structure:
glimpse(counties)
## Rows: 15
## Columns: 3
## $ MNIMI <chr> "Saare maakond", "Viljandi maakond", "Hiiu maakond", "Harju m~
## $ MKOOD <chr> "0074", "0084", "0039", "0037", "0056", "0071", "0045", "0081~
## $ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((463065.2 64..., MULTIPOLYGON (((~
Join the attributes with polygons!
covid19_test_county_all_latest_sf <- left_join(counties, covid19_test_county_all_latest, by = c("MKOOD" = "CountyEHAK"))
What happens, if you join it this way?
covid19_test_county_all_latest_sf <- left_join(covid19_test_county_all_latest, counties, by = c("CountyEHAK" = "MKOOD"))
The next command will probably give an error. (If it does, use the
correct (first) join!)
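Why the order of the tables matters can be checked on a toy example. This is a sketch with two points standing in for county polygons and made-up case numbers; the object names are hypothetical. The class of the result follows the first table of the join.

```r
library(sf)
library(dplyr)

# Toy spatial layer: two points standing in for county polygons
toy_counties <- st_sf(
  MKOOD    = c("0037", "0039"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)

# Toy attribute table (made-up case numbers)
toy_cases <- tibble(CountyEHAK = c("0037", "0039"), DailyCases = c(5, 1))

# spatial layer first: the result is still an sf object
sf_first  <- left_join(toy_counties, toy_cases, by = c("MKOOD" = "CountyEHAK"))
# attribute table first: the result is an ordinary tibble with a geometry column
tbl_first <- left_join(toy_cases, toy_counties, by = c("CountyEHAK" = "MKOOD"))

inherits(sf_first, "sf")
inherits(tbl_first, "sf")

# if needed, the tibble can be turned back into a spatial object
recovered <- st_as_sf(tbl_first)
```

Functions such as geom_sf() expect an sf object, which is why the second join tends to fail in the plotting step.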
If everything is ok, then:
gg_covid19_map <- ggplot()+
geom_sf(data = covid19_test_county_all_latest_sf, aes(fill = DailyCases))+
scale_fill_gradientn(colours = c("forestgreen", "grey70", "orange", "red"))
gg_covid19_map
Sometimes it’s good to add values to the map as labels. A label needs a point location rather than a polygon (you can try plotting the labels directly on the polygon layer and see what happens). An explicit way is to calculate the centroids and plot the labels at the centroids:
covid19_test_county_all_latest_sf_cntr <- covid19_test_county_all_latest_sf %>%
st_centroid()
gg_covid19_map <- ggplot()+
theme_void()+
geom_sf(data = covid19_test_county_all_latest_sf, aes(fill = DailyCases))+
geom_sf_text(data = covid19_test_county_all_latest_sf_cntr, aes(label = DailyCases))+
scale_fill_gradientn(colours = c("forestgreen", "grey70", "orange", "red"))
gg_covid19_map
And now the interactive map with plotly. The procedure is the
same as for the time series plot:
plotly::ggplotly(gg_covid19_map)
Another widely used option is Leaflet. Leaflet is an open source JavaScript library used to build web mapping applications. Leaflet allows developers to very easily display tiled web maps hosted on a public server, with optional tiled overlays.
The Leaflet package is also available for R: link.
First you have to install it!
library(leaflet)
For leaflet the spatial object should be in geographical coordinates (CRS = 4326):
# polygons:
covid19_test_county_all_latest_sf_4326 <- covid19_test_county_all_latest_sf %>%
st_transform(4326)
# labels:
covid19_test_county_all_latest_sf_cntr_4326 <- covid19_test_county_all_latest_sf_cntr %>%
st_transform(4326)
And the first attempt to plot counties with Leaflet:
leaflet() %>%
addTiles() %>%
addPolygons(data = covid19_test_county_all_latest_sf_4326)
Leaflet expects the colour palette to be defined separately:
library(RColorBrewer)
pal <- colorBin(palette = "Purples",
                domain = covid19_test_county_all_latest_sf_4326$DailyCases,
                bins = 5) # split the "Purples" palette (light to dark) into about 5 bins; the argument is bins, not n
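What such a palette function returns can be tried on a few numbers. The values below are made up for illustration, and `pal_demo` is a hypothetical name; the sketch only assumes the leaflet package.

```r
library(leaflet)

# Made-up daily case counts, for illustration only
cases <- c(0, 3, 12, 25, 48)

# bins = 5 asks for roughly five "pretty" intervals over the domain
pal_demo <- colorBin(palette = "Purples", domain = cases, bins = 5)

# one hex colour per value; values in the same bin share a colour
pal_demo(cases)
```

The palette is itself a function, which is why it can be called inside addPolygons() with the column of values as its argument.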
And now the output:
leaflet() %>%
addTiles() %>%
addPolygons(data = covid19_test_county_all_latest_sf_4326,
label= ~DailyCases, #mouseover value
color = "gray", # border color
fillColor = ~pal(covid19_test_county_all_latest_sf_4326$DailyCases), # polygons fill color
weight = 1.0, # border lines thickness
opacity = 1.0, # border lines transparency
fillOpacity = 0.8) %>% # polygons fill transparency
addLabelOnlyMarkers(data = covid19_test_county_all_latest_sf_cntr_4326,
label = ~covid19_test_county_all_latest_sf_cntr_4326$DailyCases,
labelOptions = labelOptions(noHide = TRUE))
The output can be much more polished and processed, but today the purpose was just to introduce Leaflet as an alternative.
Actually, you are already familiar with tmap, and you have
already seen how to use it for interactive maps. Just set
tmap_mode() to “view” and the output will be
interactive:
tmap_mode("view")
## tmap mode set to interactive viewing
And the map:
tm_shape(covid19_test_county_all_latest_sf_4326)+
  tm_polygons(col = "DailyCases",
              style = "pretty",
              palette = "Reds",
              alpha = .7)+
  tm_shape(covid19_test_county_all_latest_sf_cntr_4326)+
  tm_text(text = "DailyCases",
          bg.color = "grey",
          shadow = TRUE)
Interactive plots and maps are nice and informative. But for
communication they cannot stand alone; they must be part of something
else. Very often they are embedded in a web page. In the next example we
look at how to create html directly from RStudio. For that we use
R Markdown.
According to RStudio: “R Markdown documents are fully
reproducible. Use a productive notebook interface to weave together
narrative text and code to produce elegantly formatted output. Use
multiple languages including R, Python, and SQL.” More: link.
This is true, at least partially. For example, all the R-related tutorials
of the current course are produced with R Markdown.
Let’s try!
Before the next steps you should install the library:
install.packages("rmarkdown")
Then create the markdown file (*.Rmd) where you write the script (the script file will be converted to html): File -> New File -> R Markdown
In the next window you can enter the title and author of the document. The other options are fine by default.
A short tutorial on R Markdown is available here: link
You can explore the script of the current page as well: link. To see the highlighted and
formatted script you should download it and open it in RStudio.
For absolute beginners, a visual markdown editor is available.
Try to create your own R Markdown html! Fill it, for example, with content from your previous assignments!
To convert the Rmd file to an html file, you have to “knit” it.
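Knitting can also be started from the console with rmarkdown::render(). The sketch below writes a minimal document to a temporary folder and renders it; the file name, title, and text are made up for illustration, and rendering is skipped when pandoc is not available.

```r
library(rmarkdown)

fence <- strrep("`", 3)  # three backticks, the delimiter of a code chunk

# A minimal R Markdown document: YAML header, one sentence, one code chunk
rmd_lines <- c(
  "---",
  'title: "My first interactive page"',
  "output: html_document",
  "---",
  "",
  "Some narrative text.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

rmd_file <- file.path(tempdir(), "example.Rmd")  # hypothetical file name
writeLines(rmd_lines, rmd_file)

# render() needs pandoc; RStudio ships with it, other setups may not
if (pandoc_available()) {
  html_file <- render(rmd_file, quiet = TRUE)
  html_file  # path to the generated html file
}
```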
If you have your own web page, you can upload the created html there directly. You can also use the RPubs platform: link. For RPubs you have to create a user account.
According to Wikipedia: A dashboard is a type of graphical user interface which often provides at-a-glance views of key performance indicators (KPIs) relevant to a particular objective or business process. In other usage, “dashboard” is another name for “progress report” or “report” and considered a form of data visualization.
The “dashboard” is often accessible by a web browser and is usually linked to regularly updating data sources.
The list of impressive dashboards created in R is very long.
A good introduction to flexdashboard is available here: https://pkgs.rstudio.com/flexdashboard/
To start with flexdashboard you should select:
File -> New File -> R Markdown -> Flex Dashboard
For beginners, the first big problem is probably the layout of the
dashboard. Some optional storyboard layouts are described here: link.
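To make the layout idea concrete, the sketch below writes a two-column dashboard skeleton to a temporary file. The title, panel names, and column widths are made up; the structure follows the flexdashboard convention of a dashed line under each column heading and a level-3 heading for each panel. The code chunks contain only placeholder comments.

```r
fence <- strrep("`", 3)  # three backticks, the delimiter of a code chunk

dashboard_lines <- c(
  "---",
  'title: "My dashboard"',
  "output:",
  "  flexdashboard::flex_dashboard:",
  "    orientation: columns",
  "---",
  "",
  "Column {data-width=600}",
  "-------------------------------------",
  "",
  "### Map",
  "",
  paste0(fence, "{r}"),
  "# code that prints an interactive map (e.g. leaflet) goes here",
  fence,
  "",
  "Column {data-width=400}",
  "-------------------------------------",
  "",
  "### Time series",
  "",
  paste0(fence, "{r}"),
  "# code that prints an interactive plot (e.g. plotly) goes here",
  fence
)

dashboard_file <- file.path(tempdir(), "dashboard.Rmd")  # hypothetical file name
writeLines(dashboard_lines, dashboard_file)
```

Opening the written file in RStudio and knitting it (with the flexdashboard package installed) should give a page with a wider left column and a narrower right column.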
The individual assignment after this session is to create your
own interactive dashboard! Fill the dashboard with your own content.
The dashboard must contain at least 4 different interactive visualizations
(including at least 2 maps).
You can use this file as a possible template for your assignment: link
Author: Anto Aasa
Supervisors: Anto Aasa & Lika Zhvania
LTOM.02.041
Last update: 2022-11-30 10:05:37