In R, how to use “aggregate” or “by” when not all factor combinations exist

This is a small example to illustrate my data:

> df <- data.frame(subgroup = rep(paste("s", 1:3, sep = ""), times = 3),
                   feature = c(rep("a", 6), rep("b", 3)),
                   var = rep(1:3, each = 3),
                   data = c(rnorm(3, 1), rnorm(3, 2), rnorm(3, 0)))
> df
  subgroup feature var        data
1       s1       a   1  1.53152620
2       s2       a   1  1.25476445
3       s3       a   1  1.04221040
4       s1       a   2  1.68913400
5       s2       a   2  1.48290273
6       s3       a   2  1.62871854
7       s1       b   3  0.05278296
8       s2       b   3 -0.66623654
9       s3       b   3 -1.40006454

I want to check the sum of the “data” column for each feature-var combination that exists in my data set. More precisely, I want TRUE when the sum is greater than 3, and FALSE otherwise:

> result
  feature var   res
1       a   1  TRUE
2       a   2  TRUE
3       b   3 FALSE

I tried to use “aggregate” or “by”, but couldn't get them to do what I need. Any ideas? Thanks in advance.

One way is to use the plyr function ddply to group by feature and var. Combined with plyr's summarize function, it creates a new data.frame with one row per group and one column for each expression you specify.

library(plyr)
# sum(data) > 3 already evaluates to TRUE/FALSE, so ifelse() is not needed
ddply(df, c("feature", "var"), summarize, res = sum(data) > 3)

The result is:

  feature var   res
1       a   1  TRUE
2       a   2  TRUE
3       b   3 FALSE

Another way is to use data.table, which is generally faster on large data sets:

library(data.table)
dt <- data.table(df)

# Name the result column res; sum(data) > 3 already yields TRUE/FALSE
dt[, list(res = sum(data) > 3), by = c("feature", "var")]

   feature var   res
1:       a   1  TRUE
2:       a   2  TRUE
3:       b   3 FALSE
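Since the question asks specifically about “aggregate”, here is a minimal base R sketch for comparison. It assumes the example data are rebuilt from the values printed above instead of being drawn with rnorm(), so the output is reproducible. The formula method of aggregate() only forms groups for feature-var combinations that actually occur in the data, so the missing combinations are not a problem with this approach.

```r
# Assumption: df is rebuilt from the values printed in the question so that
# the result is reproducible (the original uses rnorm(), which varies per run).
df <- data.frame(
  subgroup = rep(paste0("s", 1:3), times = 3),
  feature  = c(rep("a", 6), rep("b", 3)),
  var      = rep(1:3, each = 3),
  data     = c(1.53152620, 1.25476445, 1.04221040,
               1.68913400, 1.48290273, 1.62871854,
               0.05278296, -0.66623654, -1.40006454)
)

# Sum `data` within each feature-var group; aggregate() only returns
# the combinations that are present in df.
agg <- aggregate(data ~ feature + var, data = df, FUN = sum)

# Flag groups whose total exceeds 3, then drop the raw sums.
agg$res <- agg$data > 3
agg$data <- NULL
agg
```

This should print:

  feature var   res
1       a   1  TRUE
2       a   2  TRUE
3       b   3 FALSE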
