I am working in R with a large data frame named exp (file here). To improve performance, it was recommended that I look at the idata.frame() function from plyr, but I think I am using it incorrectly.
My original call is slow, but it works:
df.median<-ddply(exp,
.(groupname,starttime,fPhase,fCycle),
numcolwise(median),
na.rm=TRUE)
Using idata.frame, I get the error: is.data.frame(df) is not TRUE
library(plyr)
df.median<-ddply(idata.frame(exp),
.(groupname,starttime,fPhase,fCycle),
numcolwise(median),
na.rm=TRUE)
So I thought the problem might be my data, and I tried the baseball dataset. The idata.frame example works fine: dlply(idata.frame(baseball), "id", nrow). But if I try a call on baseball that is similar to what I want, it doesn't work:
bb.median<-ddply(idata.frame(baseball),
.(id,year,team),
numcolwise(median),
na.rm=TRUE)
Error: is.data.frame(df) is not TRUE
Maybe my mistake lies in how I specify the grouping? Does anyone know how to make my example work?
ETA:
I also tried:
groupVars <- c("groupname","starttime","fPhase","fCycle")
voi<-c('inadist','smldist','lardist')
i<-idata.frame(exp)
ag.median <- aggregate(i[,voi], i[,groupVars], median)
Error in i[, voi] : object of type 'environment' is not subsettable
This uses a faster way to get the median, but gives a different error. I don't think I understand how to use idata.frame at all.
Given that you are working with "big" data and looking for performance, this seems like a very good fit for data.table.
Specifically, the lapply(.SD, FUN) idiom and the .SDcols argument, used together with by.
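To make the idiom concrete before applying it to exp, here is a minimal sketch on a small made-up table. The toy data and the group_cols helper are assumptions for illustration only; the column names are borrowed from the question. It also shows one way to pass na.rm = TRUE through lapply, since the plyr call in the question does so.

```r
library(data.table)

# Hypothetical toy data standing in for `exp` (values are made up)
DT <- data.table(
  groupname = c("a", "a", "a", "b", "b"),
  fPhase    = c("x", "x", "x", "y", "y"),
  inadist   = c(1, NA, 3, 10, 20),
  smldist   = c(4, 5, 6, NA, 8)
)

# Grouping columns (illustrative subset of the ones in the question)
group_cols <- c("groupname", "fPhase")

# Numeric columns to summarise, excluding any grouping columns
numeric_columns <- setdiff(names(which(sapply(DT, is.numeric))), group_cols)

# .SD holds the columns named in .SDcols for each group;
# extra arguments to lapply (here na.rm) are passed on to median
dt.median <- DT[, lapply(.SD, median, na.rm = TRUE),
                by = group_cols, .SDcols = numeric_columns]
print(dt.median)
```

Each group yields one row with the per-column medians, and the NA values are ignored because na.rm = TRUE is forwarded to median.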
Set up the data.table
library(data.table)
DT <- as.data.table(exp)
iexp <- idata.frame(exp)
Which columns are numeric
numeric_columns <- names(which(unlist(lapply(DT, is.numeric))))
dt.median <- DT[, lapply(.SD, median), by = list(groupname, starttime, fPhase,
fCycle), .SDcols = numeric_columns]
Some benchmark tests
library(rbenchmark)
benchmark(data.table = DT[, lapply(.SD, median), by = list(groupname, starttime,
fPhase, fCycle), .SDcols = numeric_columns],
plyr = ddply(exp, .(groupname, starttime, fPhase, fCycle), numcolwise(median), na.rm = TRUE),
idataframe = ddply(exp, .(groupname, starttime, fPhase, fCycle), function(x) data.frame(inadist = median(x$inadist),
smldist = median(x$smldist), lardist = median(x$lardist), inadur = median(x$inadur),
smldur = median(x$smldur), lardur = median(x$lardur), emptyct = median(x$emptyct),
entct = median(x$entct), inact = median(x$inact), smlct = median(x$smlct),
larct = median(x$larct), na.rm = TRUE)),
aggregate = aggregate(exp[, numeric_columns],
exp[, c("groupname", "starttime", "fPhase", "fCycle")],
median),
replications = 5)
##         test replications elapsed relative user.self
## 4  aggregate            5    5.42    1.789      5.30
## 1 data.table            5    3.03    1.000      3.03
## 3 idataframe            5   11.81    3.898     11.77
## 2       plyr            5    9.47    3.125      9.45
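Before relying on timings like these, it may be worth confirming that the approaches return the same medians. The following is a rough sketch of such a check on made-up data, comparing the data.table call against aggregate. The toy data frame and the helper names group_cols and numeric_cols are assumptions, not part of the original data.

```r
library(data.table)

# Hypothetical toy data standing in for `exp` (values are made up)
toy <- data.frame(
  groupname = c("a", "a", "b", "b", "b"),
  fPhase    = c("x", "x", "x", "y", "y"),
  inadist   = c(1, 3, 5, 7, 9),
  smldist   = c(2, 4, 6, 8, 10),
  stringsAsFactors = FALSE
)
group_cols   <- c("groupname", "fPhase")
numeric_cols <- c("inadist", "smldist")

# data.table version of the grouped median
dt_res <- as.data.table(toy)[, lapply(.SD, median),
                             by = group_cols, .SDcols = numeric_cols]

# base R version of the grouped median
ag_res <- aggregate(toy[, numeric_cols], toy[, group_cols], median)

# Row order can differ between the two methods, so sort both by the group columns
dt_res <- dt_res[order(groupname, fPhase)]
ag_res <- ag_res[order(ag_res$groupname, ag_res$fPhase), ]

# Compare the summarised columns one by one
same <- all(sapply(numeric_cols, function(col) {
  isTRUE(all.equal(dt_res[[col]], ag_res[[col]]))
}))
print(same)
```

Sorting both results first matters because data.table keeps groups in order of first appearance while aggregate sorts them.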