Today we look more precisely how to transform and mutate the data.
> According to Wikipedia: Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one “raw” data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.

Let’s suppose that we are animal geographers. We have received a task to analyse the distribution of farm animals in Estonia. The data is coming from Estonian Agricultural Registers and Information Board (ARIB) and contains counts of different farmed animal species and breeds in Estonian farms.

Firstly we list all the libraries we need in current session (obviously in real life we don’t know in the beginning what tools we need and the list is formed during the process):

library(tidyverse)
library(knitr)
library(stringr)
library(readxl)
library(curl) # library to download data from web
library(sf)

You can import data directly from Excel. It’s not so comfortable as from csv, but we can manage! First you have to download the xlsx-file and then import it to R.

# define xlsx file address
url <- "http://aasa.ut.ee/Rspatial/data/FarmedAnimalsByLocation_31102018.xlsx"

# define the name for downloaded file
destfile <- "FarmedAnimalsByLocation_31102018.xlsx"

# download the data according to previous parameters
curl_download(url, destfile)

# import data to R
agrAnimal <- read_excel(destfile)

Get familiar with the data:

glimpse(agrAnimal)

## Rows: 5,041
## Columns: 13
## $ `action place`             <chr> "EE1000", "EE10009", "EE10011", "EE10015", …
## $ `estonian holstein cattle` <dbl> 1, 56, 2, 26, 170, 33, 1, 0, 0, 4, 14, 393,…
## $ `estonian red cattle`      <dbl> 1, 3, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0…
## $ `estonian native cattle`   <dbl> 0, 0, 0, 0, 17, 0, 0, 1, 0, 0, 0, 0, 1, 3, …
## $ `beef cattle`              <dbl> 0, 0, 32, 0, 2, 0, 27, 1, 3, 6, 21, 0, 0, 0…
## $ sheeps                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0…
## $ goats                      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ pigs                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `X koordinaat`             <chr> "6405512", "6565214", "6525085", "6563514",…
## $ `Y koordinaat`             <chr> "661450", "603603", "582066", "593841", "60…
## $ county                     <chr> "VÕRU MAAKOND", "LÄÄNE-VIRU MAAKOND", "JÄRV…
## $ municipality               <chr> "ANTSLA VALD", "TAPA VALD", "TÜRI VALD", "J…
## $ `admin unit`               <chr> "LUHAMETSA KÜLA", "LINNAPE KÜLA", "PALA KÜL…

or:

head(agrAnimal)

## # A tibble: 6 × 13
##   action pl…¹ eston…² eston…³ eston…⁴ beef …⁵ sheeps goats  pigs X koo…⁶ Y koo…⁷
##   <chr>         <dbl>   <dbl>   <dbl>   <dbl>  <dbl> <dbl> <dbl> <chr>   <chr>  
## 1 EE1000            1       1       0       0      0     0     0 6405512 661450 
## 2 EE10009          56       3       0       0      0     0     0 6565214 603603 
## 3 EE10011           2       0       0      32      0     0     0 6525085 582066 
## 4 EE10015          26       0       0       0      0     0     0 6563514 593841 
## 5 EE10016         170       3      17       2      0     0     0 6528313 604696 
## 6 EE10018          33       0       0       0      0     0     0 6554026 598905 
## # … with 3 more variables: county <chr>, municipality <chr>,
## #   `admin unit` <chr>, and abbreviated variable names ¹`action place`,
## #   ²`estonian holstein cattle`, ³`estonian red cattle`,
## #   ⁴`estonian native cattle`, ⁵`beef cattle`, ⁶`X koordinaat`, ⁷`Y koordinaat`

#tail(agrAnimal)

1 Delete columns

As you see, the file contains information about farmed(?) animals in Estonia. Data is aggregated on farm level. We have counts of animal species and breeds. Locational information contains geographical coordinates (looks like in Estonian official system, EPSG:3301). Additionally we have information about adminstrative units, municipalities and counties. Unfortunately we don’t have metadata (municipality ID and time). Therefore we don’t know which version of adminstrative hierarchy is used. This data would be crucial to create maps: we may join the tables (geolayer & attributes). But we have the exact coordinates of farms and we can delete the columns with descriptive locational information. There are many ways to exclude columns from dataset:

#back up the data:
agrAnimal_bu <- agrAnimal

# delete the column
agrAnimal_bu$county <- NULL

# delete with 'select':
agrAnimal_bu <- agrAnimal_bu %>% select(-municipality)
glimpse(agrAnimal_bu)

## Rows: 5,041
## Columns: 11
## $ `action place`             <chr> "EE1000", "EE10009", "EE10011", "EE10015", …
## $ `estonian holstein cattle` <dbl> 1, 56, 2, 26, 170, 33, 1, 0, 0, 4, 14, 393,…
## $ `estonian red cattle`      <dbl> 1, 3, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0…
## $ `estonian native cattle`   <dbl> 0, 0, 0, 0, 17, 0, 0, 1, 0, 0, 0, 0, 1, 3, …
## $ `beef cattle`              <dbl> 0, 0, 32, 0, 2, 0, 27, 1, 3, 6, 21, 0, 0, 0…
## $ sheeps                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0…
## $ goats                      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ pigs                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `X koordinaat`             <chr> "6405512", "6565214", "6525085", "6563514",…
## $ `Y koordinaat`             <chr> "661450", "603603", "582066", "593841", "60…
## $ `admin unit`               <chr> "LUHAMETSA KÜLA", "LINNAPE KÜLA", "PALA KÜL…

# select relevant variable
agrAnimal <- agrAnimal %>% 
  select(`action place`, 
         `estonian holstein cattle`, 
         `estonian red cattle`,  
         `estonian native cattle`, 
         `beef cattle`, 
         sheeps, 
         goats, 
         pigs, 
         `X koordinaat`, 
         `Y koordinaat`, 
         municipality)

glimpse(agrAnimal)

## Rows: 5,041
## Columns: 11
## $ `action place`             <chr> "EE1000", "EE10009", "EE10011", "EE10015", …
## $ `estonian holstein cattle` <dbl> 1, 56, 2, 26, 170, 33, 1, 0, 0, 4, 14, 393,…
## $ `estonian red cattle`      <dbl> 1, 3, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0…
## $ `estonian native cattle`   <dbl> 0, 0, 0, 0, 17, 0, 0, 1, 0, 0, 0, 0, 1, 3, …
## $ `beef cattle`              <dbl> 0, 0, 32, 0, 2, 0, 27, 1, 3, 6, 21, 0, 0, 0…
## $ sheeps                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0…
## $ goats                      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ pigs                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `X koordinaat`             <chr> "6405512", "6565214", "6525085", "6563514",…
## $ `Y koordinaat`             <chr> "661450", "603603", "582066", "593841", "60…
## $ municipality               <chr> "ANTSLA VALD", "TAPA VALD", "TÜRI VALD", "J…

2 Rename columns

If column name contains space or some specific symbols, we have to define them with help of apostrophe (‘name name’). It is annoying… Better rename the columns:

agrAnimal <- agrAnimal %>% rename(action_place = `action place`,
                                  estonian_holstein_cattle = `estonian holstein cattle`,
                                  estonian_red_cattle = `estonian red cattle`,
                                  estonian_native_cattle = `estonian native cattle`, 
                                  beef_cattle = `beef cattle`, 
                                  X_coord = `X koordinaat`,
                                  Y_coord = `Y koordinaat`)

Check the result! Is it ok?
Check again! What is the data type for coordinates?
You can’t plot data if coordinates are stored as characters…
Convert them to numeric format! Converting from character to integer or numeric is easy. If you want to convert factor to numeric you have to convert factor firstly to character and only then to integer, otherwise if you try to convert directly from factor to integer or numeric your result is a vector of the internal level representations of your factor and not the original values:

agrAnimal <- agrAnimal %>% 
  mutate(X_coord = as.numeric(X_coord), 
         Y_coord = as.numeric(Y_coord))

Check the result! Ok?
If you are satisfied, you can try to map the data. Currently its not a spatial object but flat data table which contains coordinates. In ggplot it’s not a problem, you can plot spatial objects and tabular data on same plot. Only important thing is that coordinates must be in the same reference system.

ggplot()+
  geom_point(data = agrAnimal, aes(x= X_coord, y = Y_coord))

If you are familiar with Estonian geography you probably recognize, that there is something wrong. What? Can you solve it?
The problem is, that traditionally X is used to label longitude and Y is used for latitude. In current case it’s vice versa. To avoid mistakes it’s better to rename those columns:

agrAnimal <- agrAnimal %>% 
  rename(x = Y_coord, y = X_coord)

plot:

ggplot()+
  geom_point(data = agrAnimal, aes(x = x, y = y))

Nice! Estonia is so densely covered with farms that we can already see the contour of Estonia! But to be safe, we download also the geolayer of Estonian municipalities and plot those two layers together.
Download and unzip the municipalities layer:

download.file("https://geoportaal.maaamet.ee/docs/haldus_asustus/omavalitsus_shp.zip", destfile="omavalitsus_shp.zip")
#omavalitsus means municipality!
unzip("omavalitsus_shp.zip")

Put results on plot!
First we have to import the downloaded shp-layer. SHP-layer is actually a bunch of several files (attribues, geometry, projection etc). With help of sf library the working with SHP-layer is today very simple! First import the data:

# Attention!!! Date ion shp-file name is changed after every data update!
# check the correct file name if needed:
list.files(pattern = "shp")

##  [1] "asustusyksus_20211001.shp" "asustusyksus_20211101.shp"
##  [3] "eestimaa_wgs84.shp"        "eestimaa_wgs84.shp.xml"   
##  [5] "gps_us.shp"                "gps_us_monterey.shp"      
##  [7] "maakond_20210901.shp"      "maakond_20211101.shp"     
##  [9] "maakond_20221001.shp"      "maakond_shp.zip"          
## [11] "omavalitsus_20211001.shp"  "omavalitsus_20221101.shp" 
## [13] "omavalitsus_shp.zip"       "population_2017.shp"      
## [15] "trt_cont.shp"

# import shp:
municip <-  st_read("omavalitsus_20211001.shp", quiet = T)

If the import was successful you should see some information about imported layer. It contains 79 features (municipalities) and 5 fields (columns). Important is to notice the epsg (3301), which currently refers to the estonian official CRS. If the CRS is not defined you can define it by your self:

st_crs(municip) <- 3301

Plot municipalities and animal data together. A mentioned earlier ggplot2 is very flexible and you can plot plain data table and geolayer simultaneously at the same plot. It is important to put layers in correct order!

ggplot()+
  geom_sf(data = municip)+ 
  geom_point(data = agrAnimal, aes(x=x, y=y), # in aes() are defined all the parameters which are coming from the data (coordinates, colour, shape, transparency[alpha])
             colour = "red", # but you can define previously listed parameters globally as well
             alpha = 0.5, # you can "broke"" the lines to make code more readable!
             size= 0.5, 
             shape = 2)

We can see, that all the data fits nicely on the map. To see the importance of correct order of layers, plot the map again, but change the order of layers this time!
Previously plotted map shows only the general distribution of Estonian farms. This is certainly not enough for good analysis. We should try to visualize distribution patterns of different species and breeds. Next we draw the map where we see the distribution of sheeps, goats and pigs:

ggplot()+
  geom_sf(data = municip, colour = "grey40", fill = "grey80", size=0.5)+ 
  geom_point(data = agrAnimal, aes(x=x, y=y, size=goats), colour = "blue")+
  geom_point(data = agrAnimal, aes(x=x, y=y, size=pigs), colour = "grey30")+
  geom_point(data = agrAnimal, aes(x=x, y=y, size=sheeps), colour = "red")

Not very nice? All the layers are overlapped and uppermost layer hides the other layers, proper legend is missing etc. We should create separate map for every species/breed? Currently our data is stored in so called wide format which means all the data groups are stored in different columns. To make analysis process more convenient and faster we should convert the data to long format link

https://swcarpentry.github.io/r-novice-gapminder/14-tidyr/

Convert data from wide format to long format:

agrAnimal_2 <- gather(agrAnimal, "key", "value", 2:8)

agrAnimal_2 <- agrAnimal_2 %>% 
  filter(value > 0)

Compare long and wide format:
Wide format:

agrAnimal %>% 
  head() %>% 
  kable()

action_place	estonian_holstein_cattle	estonian_red_cattle	estonian_native_cattle	beef_cattle	y	x	municipality
EE1000	1	1	0	0	6405512	661450	ANTSLA VALD
EE10009	56	3	0	0	6565214	603603	TAPA VALD
EE10011	2	0	0	32	6525085	582066	TÜRI VALD
EE10015	26	0	0	0	6563514	593841	JÄRVA VALD
EE10016	170	3	17	2	6528313	604696	JÄRVA VALD
EE10018	33	0	0	0	6554026	598905	JÄRVA VALD

Long format:

agrAnimal_2 %>% 
  head() %>% 
  kable()

action_place	y	x	municipality	key	value
EE1000	6405512	661450	ANTSLA VALD	estonian_holstein_cattle	1
EE10009	6565214	603603	TAPA VALD	estonian_holstein_cattle	56
EE10011	6525085	582066	TÜRI VALD	estonian_holstein_cattle	2
EE10015	6563514	593841	JÄRVA VALD	estonian_holstein_cattle	26
EE10016	6528313	604696	JÄRVA VALD	estonian_holstein_cattle	170
EE10018	6554026	598905	JÄRVA VALD	estonian_holstein_cattle	33

Now we can plot different groups more easily and faster:

ggplot()+
  geom_sf(data = municip, colour = "grey40", fill = "grey80", size=0.1)+ 
  geom_point(data = agrAnimal_2, aes(x = x, y = y, size = value, colour = key, alpha = value))

Now we have proper legend but general expression is still the mess… Plot every species/breed on separete submap? For that we can wrap every submap in sequenced panel In ggplot2 this is called faceting

library(ggthemes)

ggplot()+
  theme_map(base_size = 10)+
  theme(legend.position = "right",
        panel.grid = element_line(colour = "white"),
        strip.background = element_rect(fill = "white", colour = "white"),
        strip.text.x = element_text(angle = 0, hjust = 0))+
  geom_sf(data = municip, colour = "grey70", fill = "grey90", size=0.25)+ 
  geom_point(data = agrAnimal_2, aes(x = x, y = y, size = value, colour = key, alpha = value))+
  facet_wrap(~key)+
  guides(colour = F)

## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

The labels of facets are not very beautiful (instead of ‘beef_cattle* it should be ’beef cattle’ etc). We can fix it:

key_labels_uniq <- unique(agrAnimal_2$key)

key_labels <- c("estonian_holstein_cattle" = "estonian holstein cattle",
                "estonian_red_cattle" = "estonian red cattle",
                "estonian_native_cattle" = "estonian native cattle",
                "beef_cattle" = "beef cattle",
                "sheeps" = "sheeps", 
                "goats" = "goats",
                "pigs" = "pigs")



ggplot()+
  theme_map(base_size = 10)+
  theme(legend.position = "right",
        panel.grid = element_line(colour = "white"),
        strip.background = element_rect(fill = "white", colour = "white"),
        strip.text.x = element_text(angle = 0, hjust = 0))+
  geom_sf(data = municip, colour = "grey70", fill = "grey90", size=0.25)+ 
  geom_point(data = agrAnimal_2, aes(x = x, y = y, size = value, colour = key, alpha = value))+
  facet_wrap(~key, labeller = as_labeller(key_labels))+
  guides(colour = F)

## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

3 Join tables

Still not satisfied! Maybe the main problem is, that we have to much data and we should aggregate it from farm level to municipality level? Let’s try!
First we have to check the compatibility of two layers (are the municipality divisions from the same time?). To do that we have to join those two tables (link).

http://ohi-science.org/data-science-training/dplyr.html#joining-datasets

Remind the structure of tables:

#animals:
glimpse(agrAnimal_2)

## Rows: 7,152
## Columns: 6
## $ action_place <chr> "EE1000", "EE10009", "EE10011", "EE10015", "EE10016", "EE…
## $ y            <dbl> 6405512, 6565214, 6525085, 6563514, 6528313, 6554026, 652…
## $ x            <dbl> 661450, 603603, 582066, 593841, 604696, 598905, 596034, 6…
## $ municipality <chr> "ANTSLA VALD", "TAPA VALD", "TÜRI VALD", "JÄRVA VALD", "J…
## $ key          <chr> "estonian_holstein_cattle", "estonian_holstein_cattle", "…
## $ value        <dbl> 1, 56, 2, 26, 170, 33, 1, 4, 14, 393, 13, 665, 272, 144, …

# municipalities
glimpse(municip)

## Rows: 79
## Columns: 6
## $ ONIMI    <chr> "Ruhnu vald", "Muhu vald", "Viimsi vald", "Saaremaa vald", "H…
## $ OKOOD    <chr> "0689", "0478", "0890", "0714", "0205", "0303", "0907", "0897…
## $ MNIMI    <chr> "Saare maakond", "Saare maakond", "Harju maakond", "Saare maa…
## $ MKOOD    <chr> "0074", "0074", "0037", "0074", "0039", "0068", "0056", "0084…
## $ TYYP     <chr> "1", "1", "1", "1", "1", "1", "1", "4", "1", "4", "1", "4", "…
## $ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((455191.3 64..., MULTIPOLYGON (((…

In one table the names of the municipalities are stored in column municipality in another ONIMI. Problem is that join function is case sensitive. This means that municipality names must be stored in both tables in exactly the same way. Currently in attribute file the names are in UPPERCASE. Convert both of them to lower case (more information about working with strings: link)

agrAnimal_2 <- agrAnimal_2 %>% 
  mutate(municip_key = str_to_lower(municipality))

municip <- municip %>% 
  mutate(municip_key = str_to_lower(ONIMI)) # create new layer to back up the data!

Check the results (`glimpse)!

Keep in new municipality layer only the names (municip_key) and codes (OKOOD) of the municipalities. Use the select function (find from previous parts of tutorial!!!)

municip2 <- municip %>% 
  select(municip_key, OKOOD)

glimpse(municip2)

## Rows: 79
## Columns: 3
## $ municip_key <chr> "ruhnu vald", "muhu vald", "viimsi vald", "saaremaa vald",…
## $ OKOOD       <chr> "0689", "0478", "0890", "0714", "0205", "0303", "0907", "0…
## $ geometry    <MULTIPOLYGON [m]> MULTIPOLYGON (((455191.3 64..., MULTIPOLYGON …

remove geometry column from new municipalities table:

municip2 <- municip2 %>% 
  st_drop_geometry()

#or:
#st_geometry(municip2) <- NULL

Instead of joining the raw animal data we should aggregate it to municipality level! Let’s calculate the general count of all animals by municipalities. For that we have to define the grouping value and use summarise() function:

agrAnimal_2_aggr <- agrAnimal_2 %>% 
  group_by(municip_key) %>% 
  summarise(n = sum(value)) %>% 
  ungroup()

glimpse(agrAnimal_2_aggr)

## Rows: 73
## Columns: 2
## $ municip_key <chr> "alutaguse vald", "anija vald", "antsla vald", "elva vald"…
## $ n           <dbl> 1740, 1225, 14130, 14107, 2745, 7927, 1535, 9101, 3212, 13…

And now we can try to join the tables. Most probably there will be some errors. To see how many municipality names are stored in both tables exactly the same way we use inner_join:

agrAnimal_JOINED <- inner_join(agrAnimal_2_aggr, municip2, by = "municip_key")

glimpse(agrAnimal_JOINED)

## Rows: 73
## Columns: 3
## $ municip_key <chr> "alutaguse vald", "anija vald", "antsla vald", "elva vald"…
## $ n           <dbl> 1740, 1225, 14130, 14107, 2745, 7927, 1535, 9101, 3212, 13…
## $ OKOOD       <chr> "0130", "0141", "0142", "0171", "0184", "0191", "0198", "0…

In municipalities layer we have 79 units but after join only 73.

Join the result with geolayer (now we use left_join to keep all the municipalities). After that we filter out all the municipalities with no data (NA-value):

agrAnimal_2_aggr_sf <- left_join(municip, agrAnimal_JOINED, by ="municip_key")

agrAnimal_2_aggr_sf <- agrAnimal_2_aggr_sf %>% 
  filter(!is.na(n))

glimpse(agrAnimal_2_aggr_sf)

## Rows: 73
## Columns: 9
## $ ONIMI       <chr> "Ruhnu vald", "Muhu vald", "Viimsi vald", "Saaremaa vald",…
## $ OKOOD.x     <chr> "0689", "0478", "0890", "0714", "0205", "0303", "0907", "0…
## $ MNIMI       <chr> "Saare maakond", "Saare maakond", "Harju maakond", "Saare …
## $ MKOOD       <chr> "0074", "0074", "0037", "0074", "0039", "0068", "0056", "0…
## $ TYYP        <chr> "1", "1", "1", "1", "1", "1", "1", "1", "4", "1", "4", "4"…
## $ municip_key <chr> "ruhnu vald", "muhu vald", "viimsi vald", "saaremaa vald",…
## $ n           <dbl> 219, 3360, 79, 61404, 9101, 612, 537, 2263, 21, 1473, 16, …
## $ OKOOD.y     <chr> "0689", "0478", "0890", "0714", "0205", "0303", "0907", "0…
## $ geometry    <MULTIPOLYGON [m]> MULTIPOLYGON (((455191.3 64..., MULTIPOLYGON …

plot the map

ggplot()+
  geom_sf(data = agrAnimal_2_aggr_sf, aes(fill = n))+
  scale_fill_gradientn(colours = topo.colors(20), na.value = "black")

We see some small white spots on the map (e.g. Viljandi, Võru):

Well. we have to search for another way to aggregate the layer of farmed animals. One possibility is the spatial join.

4 Spatial join

If “ordinary” join is based on key-column of tables, the spatial join is based on geometry. This means both of datasets must be geolayers.
For that we have to convert animals data to sf-object more about sf

4.1 Convert plain table to sf spatial object

This kind of converting assumes, that we have in table locational information (usually coordinates).

Conversion to sf itself is easy. For that we can use function sf(). Explore the sybtax of the function:

?st_as_sf

Check the structure of data and convert it to sf:

# structure:
glimpse(agrAnimal_2)

## Rows: 7,152
## Columns: 7
## $ action_place <chr> "EE1000", "EE10009", "EE10011", "EE10015", "EE10016", "EE…
## $ y            <dbl> 6405512, 6565214, 6525085, 6563514, 6528313, 6554026, 652…
## $ x            <dbl> 661450, 603603, 582066, 593841, 604696, 598905, 596034, 6…
## $ municipality <chr> "ANTSLA VALD", "TAPA VALD", "TÜRI VALD", "JÄRVA VALD", "J…
## $ key          <chr> "estonian_holstein_cattle", "estonian_holstein_cattle", "…
## $ value        <dbl> 1, 56, 2, 26, 170, 33, 1, 4, 14, 393, 13, 665, 272, 144, …
## $ municip_key  <chr> "antsla vald", "tapa vald", "türi vald", "järva vald", "j…

agrAnimal_2_sf <- st_as_sf(agrAnimal_2, coords = c("x", "y"), crs = 3301)

#what happend?
glimpse(agrAnimal_2_sf)

## Rows: 7,152
## Columns: 6
## $ action_place <chr> "EE1000", "EE10009", "EE10011", "EE10015", "EE10016", "EE…
## $ municipality <chr> "ANTSLA VALD", "TAPA VALD", "TÜRI VALD", "JÄRVA VALD", "J…
## $ key          <chr> "estonian_holstein_cattle", "estonian_holstein_cattle", "…
## $ value        <dbl> 1, 56, 2, 26, 170, 33, 1, 4, 14, 393, 13, 665, 272, 144, …
## $ municip_key  <chr> "antsla vald", "tapa vald", "türi vald", "järva vald", "j…
## $ geometry     <POINT [m]> POINT (661450 6405512), POINT (603603 6565214), POI…

Plot the result (may take a bit more time…):

ggplot()+
  geom_sf(data = municip)+
  geom_sf(data = agrAnimal_2_sf, aes(col = key, alpha = value))

Now we can perform the spatial join. The purpose is to add to animals sf-layer the municipality information from the municipalities layer:

agrAnimal_2_sf_municip <- st_join(agrAnimal_2_sf, municip, join = st_intersects)

In my computer previous command gave error message.
Despite the fact that both layers have the same CRS:

st_crs(agrAnimal_2_sf) == st_crs(municip)

## [1] FALSE

To trick the software we can say that we want to transform both layers to new CRS:

agrAnimal_2_sf_municip <- st_join(st_transform(agrAnimal_2_sf, 3301), st_transform(municip, 3301), join = st_intersects)

agrAnimal_2_sf_municip <- agrAnimal_2_sf_municip %>% 
  st_drop_geometry()

agrAnimal_2_sf_municip_aggr <- agrAnimal_2_sf_municip %>% 
  group_by(OKOOD, key) %>% 
  summarise(sum = sum(value)) %>% 
  ungroup()

## `summarise()` has grouped output by 'OKOOD'. You can override using the
## `.groups` argument.

agrAnimal_2_sf_municip_aggr <- agrAnimal_2_sf_municip_aggr %>% 
  spread(key, sum)

glimpse(agrAnimal_2_sf_municip_aggr)

## Rows: 73
## Columns: 8
## $ OKOOD                    <chr> "0130", "0141", "0142", "0171", "0184", "0191…
## $ beef_cattle              <dbl> 595, 665, 2357, 519, 1546, 1508, 304, 4046, 1…
## $ estonian_holstein_cattle <dbl> 53, 198, 1195, 3792, 43, 2203, 1178, 799, 648…
## $ estonian_native_cattle   <dbl> 1, 22, 33, 27, NA, 1, 2, 112, 85, 15, 15, NA,…
## $ estonian_red_cattle      <dbl> 91, 26, 550, 1051, NA, 4, 4, 267, 3, 3, 2632,…
## $ goats                    <dbl> 34, 37, 13, 25, 130, 74, 15, 181, 26, 3, 89, …
## $ pigs                     <dbl> 2, NA, 8560, 8086, NA, 3569, NA, 8, NA, NA, 1…
## $ sheeps                   <dbl> 964, 277, 1422, 607, 1026, 568, 32, 3688, 779…

4.2 Join municipality counts to municipalities geolayer

agrAnimal_2_sf_municip_aggr <- left_join(municip, agrAnimal_2_sf_municip_aggr, by="OKOOD")

Plot the map:

ggplot()+
  geom_sf(data = agrAnimal_2_sf_municip_aggr, aes(fill=beef_cattle), size=0.25, colour = "grey70")+
  scale_fill_gradientn(colours = c("darkgreen", "grey80", "orange", "red", "brown"), na.value = "magenta")+
  labs(fill = "N", 
       title = "Beef cattle in Estonian municipalities",
       subtitle = "Agricultural Registers and Information Board",
       caption = "author: A. Aasa")

This kind maps can be misleading. The area of municipalities is very different. Maybe the density is better measure? Let’s try.

# calculate area:
agrAnimal_2_sf_municip_aggr$area <- st_area(agrAnimal_2_sf_municip_aggr)

# m2 => km2:
agrAnimal_2_sf_municip_aggr <- agrAnimal_2_sf_municip_aggr %>% 
  mutate(area = as.numeric(area) / 1000000) 

# plot:
ggplot()+
  geom_sf(data = agrAnimal_2_sf_municip_aggr, aes(fill = beef_cattle / area), size=0.25, colour = "grey70")+
  scale_fill_gradientn(colours = c("darkgreen", "grey80", "orange", "red", "brown"), na.value = "magenta")+
  labs(fill = "N per km2", 
       title = "Beef cattle density in Estonian municipalities",
       subtitle = "Agricultural Registers and Information Board",
       caption = "author: A. Aasa")

Quite big difference compared with previous?

You can read more closely about spatial join from here: link

Can you produce similar ma with tmaplibrary?

Author: Anto Aasa
Supervisors: Anto Aasa & Lika Zhvania
LTOM.02.041
Last update: 2022-11-06 10:11:08

logo

Data wrangling

Anto Aasa