::p_load(sf, tidyverse, tmap, spdep, funModeling) pacman
Take-home EX1
Take-home Exercise 1: Geospatial Analytics for Social Good
Setting the Scene
Water is an important resource to mankind. Clean and accessible water is critical to human health. It provides a healthy environment, a sustainable economy, reduces poverty and ensures peace and security. Yet over 40% of the global population does not have access to sufficient clean water. By 2025, 1.8 billion people will be living in countries or regions with absolute water scarcity, according to UN-Water. The lack of water poses a major threat to several sectors, including food security. Agriculture uses about 70% of the world’s accessible freshwater.
Developing countries are most affected by water shortages and poor water quality. Up to 80% of illnesses in the developing world are linked to inadequate water and sanitation. Despite technological advancement, providing clean water to the rural community is still a major development issues in many countries globally, especially countries in the Africa continent.
To address the issue of providing clean and sustainable water supply to the rural community, a global Water Point Data Exchange (WPdx) project has been initiated. The main aim of this initiative is to collect water point related data from rural areas at the water point or small water scheme level and share the data via WPdx Data Repository, a cloud-based data library. What is so special of this project is that data are collected based on WPDx Data Standard.
Objectives
Geospatial analytics hold tremendous potential to address complex problems facing society. In this study, you are tasked to apply appropriate global and local measures of spatial Association techniques to reveals the spatial patterns of Not Functional water points. For the purpose of this study, Nigeria will be used as the study country.
The Data
Apstial data
For the purpose of this assignment, data from WPdx Global Data Repositories will be used. There are two versions of the data. They are: WPdx-Basic and WPdx+. You are required to use WPdx+ data set.
Geospatial data
Nigeria Level-2 Administrative Boundary (also known as Local Government Area) polygon features GIS data will be used in this take-home exercise. The data can be downloaded either from The Humanitarian Data Exchange portal or geoBoundaries.
Getting Started
In the code chunk below, p_load()
of pacman package is used to install and load the following R packages into R environment:
sf: used for importing, managing and processing geospatial data
tidyverse: a set of packages for data science
tmap: used to generate thematic maps
spdep: used to create spatial weights matrix objects
funModeling: used for rapid Exploratory Data Analysis
Importing Geospatial Data
In this in-class data, two geospatial data sets will be used, they are:
geo_export
nga_admbnda_adm2_osgof_20190417
Importing water point geospatial data
First, we are going to import the water point geospatial data (i.e. geo_export) by using the code chunk below.
Next, write_rds()
of readr package is used to save the extracted sf data table (i.e. wp) into an output file in rds data format. The output file is called wp_nga.rds and it is saved in geodata sub-folder.
Importing Nigeria LGA boundary data
Data Wrangling
Recoding NA values into string
replace_na()
is used to recode all the NA values in status_cle field into Unknown.
EDA
freq()
of funModeling package is used to display the distribution of status_cle field in wp_nga.
There are 9 different status of data, for the convenience of analysis, we will group them into 3 categories.
Functional: Functional, Functional but needs repair, Functional but not in use.
Non functional: Non-Functional, Non-Functional due to dry season, Non functional due to dry season, Abandoned
Unknown: Unknown
Extracting funtional water point
Extracting non-funtional water point
Extracting water point with Unknown class
Performing Point-in-Polygon Count
nga_wp <- nga %>%
mutate(`total wpt` = lengths(
st_intersects(nga, wp_nga))) %>%
mutate(`wpt functional` = lengths(
st_intersects(nga, wpt_functional))) %>%
mutate(`wpt non-functional` = lengths(
st_intersects(nga, wpt_nonfunctional))) %>%
mutate(`wpt unknown` = lengths(
st_intersects(nga, wpt_unknown)))
Saving the Analytical Data Table
Now, we have the tidy sf data table subsequent analysis. We will save the sf data table into rds format.
Visualising the spatial dsitribution of water points
nga_wp <- read_rds("geodata/nga_wp.rds")
total <- qtm(nga_wp, "total wpt")
wp_functional <- qtm(nga_wp, "wpt functional")
wp_nonfunctional <- qtm(nga_wp, "wpt non-functional")
unknown <- qtm(nga_wp, "wpt unknown")
tmap_arrange(total, wp_functional, wp_nonfunctional, unknown, asp=1, ncol=2)
Coordinate Reference System:
User input: EPSG:4326
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
Visualising Regional Development Indicator
equal <- tm_shape(nga_wp) +
tm_fill("wpt non-functional",
n = 5,
style = "equal") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Equal interval classification")
quantile <- tm_shape(nga_wp) +
tm_fill("wpt non-functional",
n = 5,
style = "quantile") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Equal quantile classification")
tmap_arrange(equal,
quantile,
asp=1,
ncol=2)
To ensure the consistency of the analysis scale, the fixed distance method will be used. Determine the centroid for each area
Determine the cutoff distance for fixed distance weight matrix
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.675 12.814 20.300 22.075 27.800 72.027
The result shows that the maximum distance is 72.027.
Computing Fixed distance weight matrix
Neighbour list object:
Number of regions: 774
Number of nonzero links: 18050
Percentage nonzero weights: 3.012973
Average number of links: 23.32041
Each region have average 23 neighbors.
Visualizing the Distance weight Matrix
Assigning weights to each neighboring polygon
Characteristics of weights list object:
Neighbour list object:
Number of regions: 774
Number of nonzero links: 18050
Percentage nonzero weights: 3.012973
Average number of links: 23.32041
Link number distribution:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
5 8 12 21 34 33 33 37 28 34 28 20 19 22 15 13 13 16 12 15 16 13 11 11 6 14
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
10 5 14 14 13 7 9 9 11 9 11 14 13 11 9 5 3 5 6 9 8 9 4 5 3 3
53 54 55 56 57 58 59 60 61 62 63 64 65 67 68
2 4 5 4 6 5 2 5 5 10 7 3 3 4 1
5 least connected regions:
90 112 123 237 670 with 1 link
1 most connected region:
585 with 68 links
Weights style: W
Weights constants summary:
n nn S0 S1 S2
W 774 599076 774 130.1845 3126.732
The weight of the first polygon’s neighbors
[[1]]
[1] 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302
[7] 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302
[13] 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302
[19] 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302
[25] 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302
[31] 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302
[37] 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302
[43] 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302
[49] 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302
[55] 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302 0.01587302
[61] 0.01587302 0.01587302 0.01587302
Using Moran’s | for Global Spatial Autocorrelation
Moran I test under randomisation
data: nga_wp$`wpt non-functional`
weights: wm73_lw
Moran I statistic standard deviate = 22.517, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.3273418398 -0.0012936611 0.0002130078
Computing Monte Carlo Moran’s |
set.seed(1234)
MC_NF= moran.mc(nga_wp$`wpt non-functional`,
listw=wm73_lw,
nsim=999,
zero.policy = TRUE,
na.action=na.omit)
MC_NF
Monte-Carlo simulation of Moran I
data: nga_wp$`wpt non-functional`
weights: wm73_lw
number of simulations + 1: 1000
statistic = 0.32734, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
set.seed(1234)
MC_F= moran.mc(nga_wp$`wpt functional`,
listw=wm73_lw,
nsim=999,
zero.policy = TRUE,
na.action=na.omit)
MC_F
Monte-Carlo simulation of Moran I
data: nga_wp$`wpt functional`
weights: wm73_lw
number of simulations + 1: 1000
statistic = 0.50517, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
From the result Moran | value is > 0,the overall spatial autocorrelation tend to be similarly clustered. The p value is 0.001 which is statically significant. Therefore, null hypothesis should be rejected. There is sign of clustering.
Visualising Monte Carlo simulation of Moran’s |
Cluster and Outlier Analysis
Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
1 0.43151603 -7.191834e-04 0.008120236 4.7966255 1.613609e-06
2 0.27475350 -2.904635e-04 0.003338620 4.7601280 1.934703e-06
3 0.69235062 -8.956670e-04 0.345864093 1.1787856 2.384836e-01
4 0.05590525 -3.884365e-04 0.029702941 0.3266329 7.439455e-01
5 0.33277612 -3.884365e-04 0.004984321 4.7190630 2.369335e-06
6 0.05909213 -4.231402e-05 0.001519106 1.5172126 1.292130e-01
Mapping local Moran’s I value
Moran_NF <- tm_shape(nga_wp.localMI) +
tm_fill("white") +
tm_borders("grey", lwd = 0.5, alpha = 0.5) +
tm_shape(nga_wp.localMI[nga_wp.localMI$Pr.Ii.nf <= 0.05,]) +
tm_fill(col = "Ii",
style = "pretty",
palette = "RdBu",
title = "local moran statistics") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Local Moran I value for p < 0.05 (Non Functional)",
main.title.size = 0.7,
main.title.fontface = "bold",
title = "Fixed Distance Weight",
title.size = 0.6,
legend.height = 0.25,
legend.width = 0.35)
Moran_F <- tm_shape(nga_wp.localMI) +
tm_fill("white") +
tm_borders("grey", lwd = 0.5, alpha = 0.5) +
tm_shape(nga_wp.localMI[nga_wp.localMI$Pr.Ii.f <= 0.05,]) +
tm_fill(col = "Ii.1",
style = "pretty",
palette = "RdBu",
title = "local moran statistics") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Local Moran I value for p < 0.05 (Functional)",
main.title.size = 0.7,
main.title.fontface = "bold",
title = "Fixed Distance Weight",
title.size = 0.6,
legend.height = 0.25,
legend.width = 0.35)
tmap_arrange(Moran_NF, Moran_F, asp = 1, ncol = 2)
Mapping local Moran’s I p-values
p_value_nf <- tm_shape(nga_wp.localMI) +
tm_fill(col = "Pr.Ii.nf",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette="-Blues",
title = "local Moran's I p-values") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "P-value (Non-Functional)",
main.title.size = 0.7,
main.title.fontface = "bold",
title = "Fixed Distance Weight",
title.size = 0.6,
legend.height = 0.25,
legend.width = 0.35)
p_value_f <- tm_shape(nga_wp.localMI) +
tm_fill(col = "Pr.Ii.f",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette="-Greens",
title = "local Moran's I p-values" ) +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "P-value (Functional)",
main.title.size = 0.7,
main.title.fontface = "bold",
title = "Fixed Distance Weight",
title.size = 0.6,
legend.height = 0.25,
legend.width = 0.35)
tmap_arrange(p_value_nf, p_value_f, asp=1, ncol=2)
Importing Geospatial Data
Importing water point geospatial data
Mapping both local Moran’s I values and p-values
Moran_NF <- tm_shape(nga_wp.localMI) +
tm_fill("white") +
tm_borders("grey", lwd = 0.5, alpha = 0.5) +
tm_shape(nga_wp.localMI[nga_wp.localMI$Pr.Ii.nf <= 0.05,]) +
tm_fill(col = "Ii",
style = "pretty",
palette = "RdBu",
title = "local moran statistics") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Local Moran I value for p < 0.05 (Non Functional)",
main.title.size = 0.7,
main.title.fontface = "bold",
title = "Fixed Distance Weight",
title.size = 0.6,
legend.height = 0.25,
legend.width = 0.35)
Moran_F <- tm_shape(nga_wp.localMI) +
tm_fill("white") +
tm_borders("grey", lwd = 0.5, alpha = 0.5) +
tm_shape(nga_wp.localMI[nga_wp.localMI$Pr.Ii.f <= 0.05,]) +
tm_fill(col = "Ii.1",
style = "pretty",
palette = "RdBu",
title = "local moran statistics") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Local Moran I value for p < 0.05 (Functional)",
main.title.size = 0.7,
main.title.fontface = "bold",
title = "Fixed Distance Weight",
title.size = 0.6,
legend.height = 0.25,
legend.width = 0.35)
p_value_nf <- tm_shape(nga_wp.localMI) +
tm_fill(col = "Pr.Ii.nf",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette="-Blues",
title = "local Moran's I p-values") +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "P-value (Non-Functional)",
main.title.size = 0.7,
main.title.fontface = "bold",
title = "Fixed Distance Weight",
title.size = 0.6,
legend.height = 0.25,
legend.width = 0.35)
p_value_f <- tm_shape(nga_wp.localMI) +
tm_fill(col = "Pr.Ii.f",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette="-Greens",
title = "local Moran's I p-values" ) +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "P-value (Functional)",
main.title.size = 0.7,
main.title.fontface = "bold",
title = "Fixed Distance Weight",
title.size = 0.6,
legend.height = 0.25,
legend.width = 0.35)
tmap_arrange(Moran_NF, Moran_F, p_value_nf, p_value_f, asp=1, ncol=2)
LISA Cluster Map
Plotting Moran scatterplot
Preparing LISA map classes
nci2 <- moran.plot(nga_wp$Z.nonfunc, wm73_lw,
labels=as.character(nga_wp$ADM2_EN),
xlab="z-Non-functional Water Point",
ylab="Spatially Lag z-Non-functional Water Point")
scale waterpoint
Filter all the area with p-value<0.05 and identify significant areas.