Using R: quickly calculating summary statistics (with dplyr)
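dplyr is a package in its own right, not part of plyr (the package that provides ddply), so install it before use, for example with install.packages("dplyr"). The example below also uses reshape2.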
```r
## The code for the toy data is exactly the same
data <- data.frame(sex = c(rep(1, 1000), rep(2, 1000)),
                   treatment = rep(c(1, 2), 1000),
                   response1 = rnorm(2000, 0, 1),
                   response2 = rnorm(2000, 0, 1))
## reshape2 still does its thing:
library(reshape2)
melted <- melt(data, id.vars = c("sex", "treatment"))
## This part is new:
library(dplyr)
## Group by variable as well, so response1 and response2 are summarised separately
grouped <- group_by(melted, sex, treatment, variable)
summarise(grouped, mean = mean(value), sd = sd(value))
```
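Because melt stacks response1 and response2 into a single value column, variable has to be part of the grouping; otherwise each summary would pool both responses. The sketch below counts rows per group both ways with dplyr's count(). To keep it dependent on dplyr only, it rebuilds the long format in base R instead of calling reshape2::melt; that substitution is an assumption of this sketch, not the post's code.

```r
library(dplyr)

data <- data.frame(sex = c(rep(1, 1000), rep(2, 1000)),
                   treatment = rep(c(1, 2), 1000),
                   response1 = rnorm(2000, 0, 1),
                   response2 = rnorm(2000, 0, 1))

## Same rows as reshape2::melt(data, id.vars = c("sex", "treatment")),
## except that variable is a character column here rather than a factor.
melted <- data.frame(sex       = rep(data$sex, 2),
                     treatment = rep(data$treatment, 2),
                     variable  = rep(c("response1", "response2"), each = nrow(data)),
                     value     = c(data$response1, data$response2),
                     stringsAsFactors = FALSE)

## Grouping without variable pools both responses in each group.
pooled <- count(melted, sex, treatment)
## Adding variable gives each response its own group.
per_variable <- count(melted, sex, treatment, variable)

pooled        # 4 groups, n = 1000 in each
per_variable  # 8 groups, n = 500 in each
```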
dplyr's group_by first splits the data into groups, here one group per combination of sex, treatment and variable (2 × 2 × 2 = 8 groups). Note that the column names are given without any quotes. The result of group_by looks like this:

```
Source: local data frame [4,000 x 4]
Groups: sex, treatment, variable

   sex treatment  variable       value
1    1         1 response1 -0.15668214
2    1         2 response1 -0.40934759
3    1         1 response1  0.07103731
4    1         2 response1  0.15113270
5    1         1 response1  0.30836910
6    1         2 response1 -1.41891407
7    1         1 response1 -0.07390246
8    1         2 response1 -1.34509686
9    1         1 response1  1.97215697
10   1         2 response1 -0.08145883
```

The grouped data is still a data frame, but the grouping information is now stored in it. The next command, summarise, aggregates the columns within each group. You can use whichever aggregate functions you need; here they are the mean and the standard deviation:

```
Source: local data frame [8 x 5]
Groups: sex, treatment

  sex treatment  variable         mean        sd
1   1         1 response1  0.021856280 1.0124371
2   1         1 response2  0.045928150 1.0151670
3   1         2 response1 -0.065017971 0.9825428
4   1         2 response2  0.011512867 0.9463053
5   2         1 response1 -0.005374208 1.0095468
6   2         1 response2 -0.051699624 1.0154782
7   2         2 response1  0.046622111 0.9848043
8   2         2 response2 -0.055257295 1.0134786
```

As the example shows, dplyr's commands are fairly concise. The same summary can also be written in two other ways, shown below. The nested call is nothing special. The chained form needs a closer look: dplyr's %.% operator takes the value on its left and passes it as the first argument to the function call on its right. Here melted, the data being analysed, is passed into group_by with %.%, and the grouped result is in turn passed into summarise, so the chain is simply another way of writing the nested call. Later dplyr releases deprecated %.% in favor of the %>% pipe and have since dropped it.
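```r
## Nested form
summarise(group_by(melted, sex, treatment, variable),
          mean = mean(value), sd = sd(value))

## Chained form: the original post used %.%, which later dplyr releases
## deprecated and then removed; %>% (re-exported by dplyr) is the equivalent.
melted %>% group_by(sex, treatment, variable) %>%
  summarise(mean = mean(value), sd = sd(value))
```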
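Since %.% is gone from current dplyr, here is a sketch of the chained version as it might be written today, assuming dplyr 1.0.0 or later (needed for the .groups argument of summarise). As before, the long format is rebuilt in base R so that only dplyr is required, and set.seed() is an addition of this sketch so the numbers are reproducible; neither is part of the original post.

```r
library(dplyr)

## Toy data as in the post, made reproducible with set.seed().
set.seed(42)
data <- data.frame(sex = c(rep(1, 1000), rep(2, 1000)),
                   treatment = rep(c(1, 2), 1000),
                   response1 = rnorm(2000, 0, 1),
                   response2 = rnorm(2000, 0, 1))

## Long format: same rows as reshape2::melt(data, id.vars = c("sex", "treatment")),
## with variable stored as character instead of a factor.
melted <- data.frame(sex       = rep(data$sex, 2),
                     treatment = rep(data$treatment, 2),
                     variable  = rep(c("response1", "response2"), each = nrow(data)),
                     value     = c(data$response1, data$response2),
                     stringsAsFactors = FALSE)

## .groups = "drop" returns an ungrouped result instead of keeping
## the leftover sex/treatment grouping seen in the output above.
summary_stats <- melted %>%
  group_by(sex, treatment, variable) %>%
  summarise(mean = mean(value), sd = sd(value), .groups = "drop")

summary_stats
```

In R 4.1 and later, the native pipe |> can replace %>% in this chain without other changes.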