Difference between revisions of "R: tidyr"

From RHS Wiki

Revision as of 14:06, 23 April 2015

tidyr Package

http://vita.had.co.nz/papers/tidy-data.pdf

library(tidyr)

# gather()
# Given a dataset "students" with the columns "grade", "male", "female":
gather(students, sex, count, -grade)
# Returns a dataset with the columns grade, sex, count
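
As a self-contained illustration of the gather() call above, the following sketch builds a small stand-in for "students". The data values are made up for the example and are not from the original dataset; only the column names follow the text.

```r
library(tidyr)

# Hypothetical sample data (values invented for illustration):
# one row per grade, one column per sex
students <- data.frame(
  grade  = c("A", "B", "C"),
  male   = c(1, 5, 5),
  female = c(5, 0, 2),
  stringsAsFactors = FALSE
)

# Collapse every column except "grade" into key/value pairs:
# the old column names go into "sex", the cell values into "count"
long <- gather(students, sex, count, -grade)
print(long)
```

Each of the 3 input rows contributes one output row per gathered column, so the result has 6 rows.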

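If multiple variables are stored in one column, gather the columns first and then split the resulting key column with separate(). For example, in a dataset with the columns
grade, male_1, female_1, male_2, female_2, each column name encodes both the sex and one of two different classes.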

res <- gather(students2, sex_class, count, -grade)
separate(res, col = sex_class, into = c("sex", "class"))

# or using a pipeline:
students2 %>%
  gather(sex_class, count, -grade) %>%
  separate(col = sex_class, into = c("sex", "class")) %>%
  print
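
The pipeline above can be tried on a small stand-in for "students2". This is only a sketch: the data values are hypothetical, and it relies on separate() splitting on non-alphanumeric characters by default, which is why the "_" in the column names works as the separator.

```r
library(tidyr)

# Hypothetical sample data (values invented for illustration):
# each column name combines sex and class number, e.g. "male_1"
students2 <- data.frame(
  grade    = c("A", "B"),
  male_1   = c(3, 6),
  female_1 = c(4, 4),
  male_2   = c(3, 3),
  female_2 = c(4, 5),
  stringsAsFactors = FALSE
)

tidy2 <- students2 %>%
  # stack the four sex/class columns into key/value pairs
  gather(sex_class, count, -grade) %>%
  # split keys such as "male_1" into sex = "male" and class = "1"
  separate(col = sex_class, into = c("sex", "class"))

print(tidy2)
```

Note that "class" comes out as a character column; pass convert = TRUE to separate() if an integer is preferred.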

If variables are stored in both rows and columns:

> students3
    name    test class1 class2 class3 class4 class5
1  Sally midterm      A   <NA>      B   <NA>   <NA>
2  Sally   final      C   <NA>      C   <NA>   <NA>
# Gather the class columns into key/value pairs, dropping the NA entries
students3 %>%
  gather(class, grade, class1:class5, na.rm = TRUE) %>%
  print
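
The gather() step above only handles the variables stored in columns. As a possible continuation that is not part of the original notes, the sketch below rebuilds the two "students3" rows shown above and then uses spread() to turn the values of "test" (midterm, final) into columns of their own. Treat the spread() step as an assumption about where the example was heading.

```r
library(tidyr)

# The two rows printed above, rebuilt as a data frame
students3 <- data.frame(
  name   = c("Sally", "Sally"),
  test   = c("midterm", "final"),
  class1 = c("A", "C"),
  class2 = c(NA_character_, NA_character_),
  class3 = c("B", "C"),
  class4 = c(NA_character_, NA_character_),
  class5 = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Step 1: class1..class5 become key/value pairs; na.rm = TRUE drops
# the classes with no grade recorded
long3 <- gather(students3, class, grade, class1:class5, na.rm = TRUE)

# Step 2 (assumed continuation): one column per test type
wide3 <- spread(long3, test, grade)
print(wide3)
```

After both steps there is one row per name and class, with the midterm and final grades side by side.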