Difference between revisions of "R: tidyr"
Revision as of 14:28, 23 April 2015
tidyr Package
http://vita.had.co.nz/papers/tidy-data.pdf
library(tidyr)  # provides gather(), separate(), spread()
# gather()
# Given a dataset "students" with columns "grade", "male", "female"
gather(students, sex, count, -grade)
# Returns a dataset with the columns grade, sex, count
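As a minimal, self-contained sketch of the call above: the `students` data frame below is hypothetical (its counts are made up for illustration) and only mirrors the column names mentioned in the comments.

```r
library(tidyr)

# Hypothetical data: the counts are invented for illustration only.
students <- data.frame(
  grade  = c("A", "B", "C"),
  male   = c(1, 5, 5),
  female = c(5, 0, 2)
)

# Collapse the male/female columns into key-value pairs.
# -grade means "gather every column except grade".
long <- gather(students, sex, count, -grade)
print(long)
```

Each original row contributes one output row per gathered column, so the three-row input becomes six rows with the columns grade, sex and count.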
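If a dataset has multiple variables stored in one column name, for example:
grade, male_1, female_1, male_2, female_2, where the last four columns each combine a sex with one of two different classes, gather the columns first and then split the combined key with separate():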
res <- gather(students2, sex_class, count, -grade)
separate(res, col = sex_class, into = c("sex", "class"))
# or using pipelines:
students2 %>%
  gather(sex_class, count, -grade) %>%
  separate(col = sex_class, into = c("sex", "class")) %>%
  print
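A runnable sketch of the same pipeline follows. The `students2` data frame here is a hypothetical stand-in that reuses the column names from the text; its values are invented. By default separate() splits on any non-alphanumeric character, so the underscore in names such as male_1 acts as the separator.

```r
library(tidyr)

# Hypothetical stand-in for students2; the counts are made up.
students2 <- data.frame(
  grade    = c("A", "B"),
  male_1   = c(3, 6),
  female_1 = c(4, 4),
  male_2   = c(3, 3),
  female_2 = c(4, 5)
)

res <- students2 %>%
  # Stack the four sex_class columns into key-value pairs.
  gather(sex_class, count, -grade) %>%
  # Split keys such as "male_1" into sex = "male" and class = "1".
  separate(col = sex_class, into = c("sex", "class"))

print(res)
```

Note that the resulting class column is still character; it can be converted afterwards if numeric class numbers are needed.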
If variables are stored in both rows and columns:
> students3
   name    test class1 class2 class3 class4 class5
1 Sally midterm      A   <NA>      B   <NA>   <NA>
2 Sally   final      C   <NA>      C   <NA>   <NA>
This code will tidy the data:
students3 %>%
  gather(class, grade, class1:class5, na.rm = TRUE) %>%
  print
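spread()
Spreads a key-value pair across multiple columns: each distinct value of the key column (here test) becomes its own column, filled from the value column (here grade).
students3 %>%
  gather(class, grade, class1:class5, na.rm = TRUE) %>%
  spread(test, grade) %>%
  print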
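To make the gather-then-spread step reproducible, the sketch below rebuilds only the two students3 rows printed above (the full dataset is not shown on this page, so anything beyond those two rows is not assumed) and runs the same pipeline.

```r
library(tidyr)

# Only the two rows shown in the printed output above.
students3 <- data.frame(
  name   = c("Sally", "Sally"),
  test   = c("midterm", "final"),
  class1 = c("A", "C"),
  class2 = NA_character_,
  class3 = c("B", "C"),
  class4 = NA_character_,
  class5 = NA_character_,
  stringsAsFactors = FALSE
)

wide <- students3 %>%
  # One row per (name, test, class), dropping classes with no grade.
  gather(class, grade, class1:class5, na.rm = TRUE) %>%
  # Turn the values of test into columns holding the grades.
  spread(test, grade)

print(wide)
```

After spread(), each row describes one student in one class, with the midterm and final grades side by side.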
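To change the values of class from class1, class2... to 1, 2...:
# mutate() comes from dplyr; extract_numeric() is deprecated in newer tidyr releases in favour of readr::parse_number()
students3 %>%
  gather(class, grade, class1:class5, na.rm = TRUE) %>%
  spread(test, grade) %>%
  mutate(class = extract_numeric(class)) %>%
  print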
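In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below is one possible equivalent of the pipeline above, not the method used on this page: it assumes a recent tidyr, rebuilds only the two students3 rows shown earlier, and strips the "class" prefix with `names_prefix` instead of calling extract_numeric().

```r
library(tidyr)

# Only the two rows of students3 shown earlier on this page.
students3 <- data.frame(
  name   = c("Sally", "Sally"),
  test   = c("midterm", "final"),
  class1 = c("A", "C"),
  class2 = NA_character_,
  class3 = c("B", "C"),
  class4 = NA_character_,
  class5 = NA_character_,
  stringsAsFactors = FALSE
)

tidy <- students3 %>%
  pivot_longer(
    class1:class5,
    names_to       = "class",
    values_to      = "grade",
    names_prefix   = "class",  # "class1" -> "1"
    values_drop_na = TRUE      # same role as na.rm = TRUE in gather()
  ) %>%
  pivot_wider(names_from = test, values_from = grade)

# The stripped names are still character; convert them to integers.
tidy$class <- as.integer(tidy$class)

print(tidy)
```

Using `names_prefix` avoids a separate parsing step, at the cost of assuming every gathered column name starts with the literal prefix "class".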