diff --git a/images/data_structures.png b/images/data_structures.png new file mode 100644 index 00000000..260b6a2c Binary files /dev/null and b/images/data_structures.png differ diff --git a/slide_r_elements_2.Rmd b/slide_r_elements_2.Rmd index 4813f2f0..0c1f8809 100644 --- a/slide_r_elements_2.Rmd +++ b/slide_r_elements_2.Rmd @@ -1,7 +1,7 @@ --- title: "Introduction To Programming in R (2)" subtitle: "R Foundations for Data Analysis" -author: "Marcin Kierczak, Sebastian DiLorenzo" +author: "Marcin Kierczak, Sebastian DiLorenzo, Guilherme Dias" keywords: bioinformatics, course, scilifelab, nbis, R output: xaringan::moon_reader: @@ -58,31 +58,40 @@ name: contents name: cplx_data_str ## Complex data structures -Using the previously discussed basic data types (`numeric`, `integer`, `logical` and `character`) one can construct more complex data structures: --- +Using the basic data types (`numeric`, `logical` and `character`) one can construct more complex data structures: + +
+
+-- + +.pull-left-50[ +![](images/data_structures.png) +] -dim | Homogenous | Heterogenous +.pull-right-50[ + +dimensions | Homogenous | Heterogenous ----|------------|----------------- -0d | n/a | n/a -1d | vectors | list -2d | matrices | data frame -nd | arrays | n/a +0 | n/a | n/a +1 | vectors | list +2 | matrices | data frame +n | arrays | n/a -- factors – special type +] --- name: atomic_vectors ## Atomic vectors -An *atomic vector*, or simply a *vector*, is a one dimensional data structure (a sequence) of elements of the same data type. +An *atomic vector*, or simply a *vector*, is a sequence of elements of the same data type. We build vectors using the function `c()` (combine). ```{r vector, echo=T} -vec <- c(7,2.3,4,12) +vec <- c(1, 2, 3) vec ``` -In R, even a single number is a one-element vector. You have to get used to think in terms of vectors... +In R, even a single number is a one-element vector. Get used to think in terms of vectors... --- name: atomic_vectors2 @@ -102,17 +111,20 @@ name: combining_vectors ## Combining two or more vectors Vectors can easily be combined: ```{r vec.comb, echo=T} -v1 <- c(1,3,5,7.56) +v1 <- c(1,2,3) v2 <- c('a','b','c') -v3 <- c(0.1, 0.2, 3.1415) +v3 <- c('do','re','mi') c(v1, v2, v3) ``` -Please note that after combining vectors, all elements became character. It is called a *coercion*. +Note that after combining numbers with characters, all elements became character. + +This is called a **coercion**. 
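
The coercion rule on the slide above can be sketched as follows. This is an illustrative example only, not part of the slide source; the names `num_chr` and `lgl_num` are hypothetical. It assumes the usual base R ordering in which `logical` is promoted to `numeric` and `numeric` to `character`.

```r
# Hypothetical example vectors (not from the slides).
# c() coerces all elements to the most flexible type present.
num_chr <- c(1, 2, 'a')        # numeric + character -> character
lgl_num <- c(TRUE, FALSE, 3)   # logical + numeric   -> numeric (TRUE becomes 1)

class(num_chr)   # "character"
class(lgl_num)   # "numeric"
num_chr          # "1" "2" "a"
lgl_num          # 1 0 3
```

Checking `class()` after combining vectors is a quick way to spot an unintended coercion.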
--- name: basic_vect_arithm ## Basic vector arithmetics +We can perform operations on vectors: ```{r vec.artihmetics, echo=T} v1 <- c(1, 2, 3, 4) v2 <- c(7, -9, 15.2, 4) @@ -128,10 +140,12 @@ name: recycling_rule ## Vectors – recycling rule ```{r vec.recycling, echo=T} v1 <- c(1, 2, 3, 4, 5) -v2 <- c(1, 2) +v2 <- c(0, 1) v1 + v2 ``` -Values in the shorter vector will be **recycled** to match the length of the longer one: v2 <- c(1, 2, 1, 2, 1) +Values in the shorter vector will be **recycled** (repeated) to match the length of the longer one. + +In this case, `v2 <- c(0, 1)` becomes `v2 <- c(0, 1, 0, 1, 0)` so that it can be added to v1. --- name: vec_indexing @@ -151,11 +165,16 @@ name: vec_indexing2 ## Vectors – indexing cted. And what happens if we want to retrieve elements outside the vector? ```{r vec.index.beyond, echo=T} +vec <- c('a', 'b', 'c', 'd', 'e') vec[0] # R counts elements from 1 -vec[78] # Index past the length of the vector +vec[10] # Positive index past the length of the vector +vec[-6] # Negative index past the length of the vector ``` -Note, if you ask for an element with index lower than the index of the first element, you will het an empty vector of the sme type as the original vector. -If you ask for an element beyond the vector's length, you get an NA value. +An index of **zero** will result in an empty vector of the same type as the original vector. + +A **positive** index beyond the vector's length will result in an `NA` value. + +A **negative** index beyond the vector's length will result in the full unchanged vector. Basically, R ignores your request. 
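
The three out-of-range cases described above can be checked programmatically. The following is a minimal sketch using the same `vec` as the slide; the variable names `zero_idx`, `past_end` and `neg_past` are hypothetical and only serve the illustration.

```r
vec <- c('a', 'b', 'c', 'd', 'e')

# Hypothetical helper variables (not from the slides)
zero_idx <- vec[0]    # index zero: empty vector of the same type
past_end <- vec[10]   # positive index past the end: NA
neg_past <- vec[-6]   # negative index past the end: nothing is dropped

length(zero_idx)           # 0
is.character(zero_idx)     # TRUE
is.na(past_end)            # TRUE
identical(neg_past, vec)   # TRUE
```

None of these cases raises an error, so an off-by-one index can go unnoticed unless the result is inspected.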
--- name: vec_indexing3 @@ -186,6 +205,7 @@ You can name elements of your vector: vec <- c(23.7, 54.5, 22.7) names(vec) # by default there are no names names(vec) <- c('sample1', 'sample2', 'sample3') +vec vec[c('sample2', 'sample1')] ``` @@ -197,7 +217,7 @@ You can return a vector without certain elements: ```{r vec.rm, echo=T} vec <- c(1, 2, 3, 4, 5) vec[-5] # without the 5-th element -vec[-(c(1,3,5))] # withoutelements 1, 3, 5 +vec[-(c(1,3,5))] # without elements 1, 3, 5 ``` --- @@ -300,12 +320,13 @@ name: seq R provides also a few handy functions to generate sequences of numbers: ```{r seq, echo=T} c(1:5, 7:10) # the ':' operator -(seq1 <- seq(from=1, to=10, by=2)) -(seq2 <- seq(from=11, along.with = seq1)) +seq1 <- seq(from=1, to=10, by=2) +seq(from=11, along.with = seq1) seq(from=10, to=1, by=-2) ``` --- +exclude: true name: printing_brackets ## A detour – printing with `()` @@ -326,8 +347,8 @@ while: --- name: seq2 -## Back to sequences -One may also wish to repeat certain value or a vector n times: +## Repeating sequences +One may also wish to repeat a value or a vector n times: ```{r rep, echo=T} rep('a', times=5) rep(1:5, times=3) @@ -338,7 +359,7 @@ rep(seq(from=1, to=3, by=2), times=2) name: random_seq ## Sequences of random numbers -There is also a really useful function `sample()` that helps with generating sequences of random numbers: +We can use `sample()` to generate sequences of random numbers: ```{r sample, echo=T} # simulate casting a fair dice 10x @@ -357,8 +378,7 @@ Now, let us see how this can be useful. We need more than 10 results. 
Let's cast ```{r dices, echo=T} # simulate casting a fair dice 10x fair <- sample(x = c(1:6), size=10e3, replace = T) -unfair <- sample(x = c(1:6), size=10e3, replace = T, - prob = myprobs) +unfair <- sample(x = c(1:6), size=10e3, replace = T, prob = myprobs) ``` --- @@ -400,6 +420,7 @@ sum(v1) # sum all the elements ``` --- +exclude: true name: vec_adv2 ## Vectors/sequences – more advanced operations 2 @@ -422,6 +443,7 @@ cummax(v1) # maximum up to i-th element ``` --- +exclude: true name: vec_pairwise_comp ## Vectors/sequences – pairwise comparisons @@ -455,12 +477,14 @@ name: factors ## Factors To work with **nominal** values, R offers a special data type, a *factor*: ```{r factor, echo=T} -vec <- c('giraffe', 'donkey', 'liger', - 'liger', 'giraffe', 'liger') +vec <- c('blue', 'yellow', 'purple', + 'yellow', 'yellow', 'blue') vec.f <- factor(vec) summary(vec.f) ``` -So donkey is coded as 1, giraffe as 2 and liger as 3. Coding is alphabetical. +The levels of a factor are coded alphabetically by default. So blue is coded as 1, purple as 2 and yellow as 3. + +Factors are really just a special type of integer vectors. ```{r factor2, echo=T} as.numeric(vec.f) ``` @@ -469,16 +493,15 @@ as.numeric(vec.f) name: factors2 ## Factors -You can also control the coding/mapping: +You can manually control the coding/mapping of factors and their labels: ```{r factor.coding, echo=T} -vec <- c('giraffe', 'donkey', 'liger', - 'liger', 'giraffe', 'liger') -vec.f <- factor(vec, levels=c('donkey', 'giraffe', - 'liger'), - labels=c('zonkey','Sophie','tigon')) +vec <- c('blue', 'yellow', 'purple', + 'yellow', 'yellow', 'blue') +vec.f <- factor(vec, levels=c('blue', 'purple', 'yellow', 'white'), + labels=c('sea','flower','sun','snow')) summary(vec.f) ``` -A bit confusing, factors... 
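
To make the level/label mapping less opaque, the sketch below inspects the factor built in the new version of the slide. It is illustrative only and assumes the same `levels` and `labels` arguments as the added lines; nothing here is part of the slide source.

```r
vec <- c('blue', 'yellow', 'purple',
         'yellow', 'yellow', 'blue')
vec.f <- factor(vec,
                levels = c('blue', 'purple', 'yellow', 'white'),
                labels = c('sea', 'flower', 'sun', 'snow'))

levels(vec.f)       # "sea" "flower" "sun" "snow": labels replace the level names
as.integer(vec.f)   # 1 3 2 3 3 1: codes follow the order given in `levels`
table(vec.f)        # 'snow' is kept as a level with zero observations
```

Each entry in `labels` is matched by position to the entry in `levels`, which is why `'white'` (absent from the data) still appears as `'snow'` with a count of zero.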
+ --- name: ordered_fac @@ -486,9 +509,14 @@ name: ordered_fac ## Ordered To work with ordinal scale (ordered) variables, one can also use factors: ```{r ordinal, echo=T} -vec <- c('tiny', 'small', 'medium', 'large') +vec <- c('small', 'tiny', 'large', 'medium') factor(vec) # rearranged alphabetically -factor(vec, ordered=T) # order as provided +``` +-- +We can control the order: +```{r ordinal2, echo=T} +factor(vec, levels = c('tiny', 'small', 'medium', 'large'), + ordered=TRUE) # ordered as provided in the levels argument ```
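
One reason to declare a factor as ordered is that comparisons and sorting then respect the level order. The following is a minimal sketch under that assumption; `sizes` and `sizes.f` are hypothetical names chosen for this illustration.

```r
# Hypothetical names; same values and levels as in the chunk above
sizes <- c('small', 'tiny', 'large', 'medium')
sizes.f <- factor(sizes,
                  levels = c('tiny', 'small', 'medium', 'large'),
                  ordered = TRUE)

sizes.f < 'medium'   # TRUE TRUE FALSE FALSE: compared by level order
max(sizes.f)         # large
sort(sizes.f)        # tiny small medium large, not alphabetical
```

With an unordered factor, `<` and `max()` are not meaningful, so `ordered = TRUE` is what enables these operations.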