---
output: github_document
always_allow_html: true
editor_options:
  markdown:
    wrap: 72
  chunk_output_type: console
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  message = FALSE,
  warning = FALSE,
  fig.retina = 2,
  fig.align = "center"
)
```
# portawaterperu
<!-- badges: start -->
[![License: CC BY
4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![R-CMD-check](https://github.com/openwashdata/portawaterperu/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/openwashdata/portawaterperu/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
The goal of the package `portawaterperu` is to provide access to data about community potable water systems in Peru. The data come from the SIASAR database, which consists of information and surveys about water catchments, storage systems, treatment, distribution networks, and maintenance.
## Installation
You can install the development version of portawaterperu from
[GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("openwashdata/portawaterperu")
```
```{r}
## Run the following code in console if you don't have the packages
## install.packages(c("dplyr", "knitr", "readr", "stringr", "gt", "kableExtra"))
library(dplyr)
library(knitr)
library(readr)
library(stringr)
library(gt)
library(kableExtra)
```
Alternatively, you can download the individual datasets as a CSV or XLSX
file from the table below.
```{r, echo=FALSE, message=FALSE, warning=FALSE}
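# Base URL of the exported files in the GitHub repository
extdata_path <- "https://github.com/openwashdata/portawaterperu/raw/main/inst/extdata/"
# Build one row of download links per dataset listed in the dictionary
read_csv("data-raw/dictionary.csv") |>
  distinct(file_name) |>
  # Escape the dot and anchor at the end so only a literal ".rda" suffix is removed
  dplyr::mutate(file_name = str_remove(file_name, "\\.rda$")) |>
  dplyr::rename(dataset = file_name) |>
  mutate(
    CSV = paste0("[Download CSV](", extdata_path, dataset, ".csv)"),
    XLSX = paste0("[Download XLSX](", extdata_path, dataset, ".xlsx)")
  ) |>
  knitr::kable()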
```
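The CSV can also be read straight into R without installing the package. The sketch below assumes the file follows the same URL pattern as the download links in the table (`portawaterperu.csv` under `inst/extdata/` on the `main` branch) and that network access is available. The names `csv_url` and `water_systems` are illustrative only.

```r
library(readr)

# Base URL used for the download links in the table
extdata_path <- "https://github.com/openwashdata/portawaterperu/raw/main/inst/extdata/"

# Assumption: the CSV file is named after the dataset
csv_url <- paste0(extdata_path, "portawaterperu", ".csv")

# Read the file over the network; return NULL instead of stopping if it fails
water_systems <- tryCatch(
  read_csv(csv_url, show_col_types = FALSE),
  error = function(e) {
    message("Download failed: ", conditionMessage(e))
    NULL
  }
)
```

If the download succeeds, `water_systems` is a tibble that can be used in place of the packaged `portawaterperu` object.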
## Data
The package provides access to one dataset, `portawaterperu`.
```{r}
library(portawaterperu)
```
### portawaterperu
The dataset `portawaterperu` contains data about potable water systems from 32 communities in Peru. It has
`r nrow(portawaterperu)` observations and `r ncol(portawaterperu)`
variables.
```{r}
portawaterperu |>
  head(3) |>
  gt::gt() |>
  gt::as_raw_html()
```
For an overview of the variable names, see the following table.
```{r echo=FALSE, message=FALSE, warning=FALSE}
readr::read_csv("data-raw/dictionary.csv") |>
  dplyr::filter(file_name == "portawaterperu.rda") |>
  dplyr::select(variable_name:description) |>
  knitr::kable() |>
  kableExtra::kable_styling("striped") |>
  kableExtra::scroll_box(height = "200px")
```
## Example
```{r}
library(portawaterperu)
library(ggplot2)
# Compare the population served across gravity system types.
# Note: the `outliers` argument requires ggplot2 >= 3.5.0.
portawaterperu |>
  ggplot(aes(y = pop_serviced, color = type_gravity)) +
  geom_boxplot(outliers = FALSE) +
  labs(
    title = "Population served given different gravity types",
    y = "Population"
  ) +
  theme_classic()
```
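A numeric summary can complement the boxplot. The sketch below uses a toy tibble so that it runs without the package installed: the column names `type_gravity` and `pop_serviced` match the plot, but the category labels and values are invented for illustration and are not taken from the dataset.

```r
library(dplyr)

# Toy stand-in for the packaged data; values and labels are invented
toy_systems <- tibble(
  type_gravity = c("type A", "type A", "type B", "type B"),
  pop_serviced = c(120, 80, 300, 100)
)

# Number of systems and median population served per gravity type
pop_summary <- toy_systems |>
  group_by(type_gravity) |>
  summarise(
    n_systems = n(),
    median_pop = median(pop_serviced, na.rm = TRUE),
    .groups = "drop"
  )

pop_summary
```

Replacing `toy_systems` with `portawaterperu` should give the same kind of table for the real data, assuming both columns are present as in the plot.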
## Capstone Project
This dataset is shared as part of a capstone project in Data Science for openwashdata. For more information about the project and to explore further insights, please visit the project page at https://ds4owd-001.github.io/project-laurenjudah/ (to be made publicly available).
## Methodology
The data were obtained from @SIASAR, an information system containing data on rural water supply and sanitation services. Using SIASAR's "download data by country" tool, all available data for Peru (10 Excel files) were downloaded. Of these 10 files, only 5 pertained to potable water systems. Those 5 data sets were imported into R, where empty values and unnecessary columns were removed. Finally, the 5 data sets were combined into a single data frame based on community ID. The combined, cleaned data set contains data from 32 communities.
SIASAR does not provide a matching data dictionary. An openwashdata developer went through the attachments of the original questionnaire:
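The cleaning and merging steps can be sketched roughly as follows. This is not the original processing script: the three toy tables, their column names, and the choice of an inner join are assumptions made for illustration, and only all-missing columns are dropped here.

```r
library(dplyr)

# Hypothetical stand-ins for the SIASAR exports; names and values are invented
catchment <- tibble(
  community_id = c(1, 2, 3),
  source_type = c("spring", "river", NA),
  notes = NA
)
storage <- tibble(community_id = c(1, 2), tank_volume = c(10, 25))
treatment <- tibble(community_id = c(1, 2, 4), chlorination = c(TRUE, FALSE, TRUE))

# Drop columns that contain only missing values
drop_empty_cols <- function(df) {
  select(df, where(~ !all(is.na(.x))))
}

# Clean each table, then join them one after another on the community ID.
# Assumption: an inner join, which keeps only communities present in every table.
cleaned <- lapply(list(catchment, storage, treatment), drop_empty_cols)
combined <- Reduce(
  function(x, y) inner_join(x, y, by = "community_id"),
  cleaned
)

combined
```

With an inner join, communities missing from any one table are dropped. A `full_join()` would keep them with `NA` in the missing columns.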
- https://globalsiasar.org/es/content/documentacion-tecnica
- https://globalsiasar.org/en/content/technical-documentation
The variable descriptions are our best guess, based on the information in those attachments.
## License
Data are available as
[CC-BY](https://github.com/openwashdata/portawaterperu/blob/main/LICENSE.md).
## Citation
Please cite this package using:
```{r}
citation("portawaterperu")
```