This is a small example to illustrate my data:
> df <- data.frame(subgroup = rep(paste("s", 1:3, sep = ""), times = 3),
+                  feature = c(rep("a", 6), rep("b", 3)),
+                  var = rep(1:3, each = 3),
+                  data = c(rnorm(3, 1), rnorm(3, 2), rnorm(3, 0)))
> df
  subgroup feature var        data
1       s1       a   1  1.53152620
2       s2       a   1  1.25476445
3       s3       a   1  1.04221040
4       s1       a   2  1.68913400
5       s2       a   2  1.48290273
6       s3       a   2  1.62871854
7       s1       b   3  0.05278296
8       s2       b   3 -0.66623654
9       s3       b   3 -1.40006454
I want to check the sum of the “data” column for each feature/var combination that exists in my data set. More precisely, I want to get TRUE when the sum is greater than 3, and FALSE otherwise:
> result
  feature var   res
1       a   1  TRUE
2       a   2  TRUE
3       b   3 FALSE
I tried to use “aggregate” or “by”, but couldn’t make them fit my needs. Any ideas? Thanks in advance.
One way is to use the plyr function ddply to group by feature and var. You can pass it the summarize function to create a new data.frame whose columns are computed per group by the rules you specify.
library(plyr)
# One row per feature/var combination; res is TRUE when the group's sum exceeds 3
ddply(df, c("feature", "var"), summarize, res = sum(data) > 3)
The result is:
  feature var   res
1       a   1  TRUE
2       a   2  TRUE
3       b   3 FALSE
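Since the question mentions aggregate, here is a minimal base R sketch of the same grouping for comparison. It is an alternative, not part of the answer above. To make the output reproducible, it assumes the values printed in the question's df instead of fresh rnorm draws; the names sums and result are only illustrative.

```r
# Rebuild the example data with the values printed in the question
# (hard-coded here instead of rnorm so the result is reproducible)
df <- data.frame(
  subgroup = rep(paste0("s", 1:3), times = 3),
  feature  = c(rep("a", 6), rep("b", 3)),
  var      = rep(1:3, each = 3),
  data     = c(1.53152620, 1.25476445, 1.04221040,
               1.68913400, 1.48290273, 1.62871854,
               0.05278296, -0.66623654, -1.40006454)
)

# Sum the data column within each feature/var combination present in df
sums <- aggregate(data ~ feature + var, data = df, FUN = sum)

# Convert each group's sum into the requested logical flag
result <- data.frame(feature = sums$feature,
                     var     = sums$var,
                     res     = sums$data > 3)
print(result)
```

aggregate only returns the combinations that actually occur in the data, so there is no row for, say, feature b with var 1.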
Another method is to use data.table, which should provide some performance advantages:
library(data.table)
dt <- data.table(df)
# Unnamed expression in j, so the result column is auto-named V1
dt[, sum(data) > 3, by = c("feature", "var")]
     feature var    V1
[1,]       a   1  TRUE
[2,]       a   2  TRUE
[3,]       b   3 FALSE
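The result column above is auto-named V1. If you would rather have it called res, as in the desired output, one option is to return a named list from j. The following is a small sketch, assuming a data.table version that supports the .() alias for list() and using the values printed in the question in place of random draws; the name flagged is only illustrative.

```r
library(data.table)

# Example data with the values printed in the question (not random)
dt <- data.table(
  subgroup = rep(paste0("s", 1:3), times = 3),
  feature  = c(rep("a", 6), rep("b", 3)),
  var      = rep(1:3, each = 3),
  data     = c(1.53152620, 1.25476445, 1.04221040,
               1.68913400, 1.48290273, 1.62871854,
               0.05278296, -0.66623654, -1.40006454)
)

# Naming the element inside .() names the output column res instead of V1
flagged <- dt[, .(res = sum(data) > 3), by = .(feature, var)]
print(flagged)
```

Groups come back in order of first appearance in dt, so the rows line up with the output shown earlier.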