Performance – the most efficient way to subset data frames

Can anyone suggest a more efficient way to subset data frames without using SQL, indexing, or the data.table package?

I have looked for similar questions; this one suggests indexing options.
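One base-R variant that might be worth adding to the timing comparison is to compute the row positions once with `which()` and then subset each column vector directly, skipping part of the bookkeeping that `[.data.frame` does. This is a sketch, not a measured result: the helper name `subset_rows_colwise` is hypothetical, it assumes default (automatic) row names and plain atomic columns (no matrix or list columns), and whether it is actually faster on your data would have to be checked with the same benchmark.

```r
# Hypothetical helper (not part of base R): subset rows column by column.
# Assumes `rows` is an integer vector of row positions (e.g. from which()),
# that `df` has default row names, and that every column is an atomic vector.
subset_rows_colwise <- function(df, rows) {
  # Index each column vector with the same row positions
  out <- lapply(df, function(col) col[rows])
  # Turn the named list back into a data frame without re-validating it;
  # storing the original row numbers matches what df[rows, ] returns
  class(out) <- "data.frame"
  attr(out, "row.names") <- rows
  out
}

# Small self-contained check against the which() form of the subset
small <- data.frame(x = c(10, 600, 700, 3), y = c(1, 2, 3, 4))
idx <- which(small$x > 500)
identical(subset_rows_colwise(small, idx), small[idx, ])
```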

This is how I timed the different subsetting methods:

# Dummy data: one million rows, two uniform columns
dat <- data.frame(x = runif(1000000, 1, 1000), y = runif(1000000, 1, 1000))

# Subset and time
system.time(x <- dat[dat$x > 500, ])
# user system elapsed
# 0.092 0.000 0.090
system.time(x <- dat[which(dat$x > 500), ])
# user system elapsed
# 0.040 0.032 0.070
system.time(x <- subset(dat, x > 500))
# user system elapsed
# 0.108 0.004 0.109
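Before comparing timings it is worth confirming that the three expressions return the same rows. On the NA-free dummy data they do, but they diverge when the filter column contains `NA`: logical indexing keeps an all-`NA` row for every `NA` comparison, while `which()` and `subset()` drop those rows. A minimal sketch on a tiny hand-made data frame (the names `d`, `by_logical`, `by_which` and `by_subset` are only for illustration):

```r
# Tiny example data; the third value of x is missing
d <- data.frame(x = c(100, 600, NA, 900), y = c(1, 2, 3, 4))

by_logical <- d[d$x > 500, ]          # NA in the condition yields an all-NA row
by_which   <- d[which(d$x > 500), ]   # which() keeps only the TRUE positions
by_subset  <- subset(d, x > 500)      # subset() treats NA as FALSE

nrow(by_logical)                      # 3 rows, one of them all NA
nrow(by_which)                        # 2 rows
identical(by_which, by_subset)        # which() and subset() agree

# Without NAs, logical indexing and subset() agree as well
d_complete <- d[!is.na(d$x), ]
identical(d_complete[d_complete$x > 500, ], subset(d_complete, x > 500))
```

So the timings only compare like with like when the column being filtered has no missing values, as in the dummy data.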

Edit:
As Roland suggested, I switched to microbenchmark. The which() version seems to perform best.

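library("ggplot2")
library("microbenchmark")

# Dummy data
dat <- data.frame(x = runif(1000000, 1, 1000), y = runif(1000000, 1, 1000))

# Benchmark: by default each expression is evaluated 100 times
res <- microbenchmark(dat[dat$x > 500, ],
                      dat[which(dat$x > 500), ],
                      subset(dat, x > 500))
# Summary table: min, lq, mean, median, uq, max per expression
print(res)
# Plot the timing distribution of each expression
autoplot.microbenchmark(res)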
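If installing microbenchmark is not an option, a rough base-R stand-in is to run each expression many times with `system.time()` and compare the median elapsed time. This is only a sketch: the helper name `median_elapsed` is hypothetical, and `system.time()` has coarse (typically millisecond-level) resolution, so ties and noise are more likely than with microbenchmark's higher-resolution timer.

```r
# Hypothetical helper: median elapsed seconds over `times` runs of each
# zero-argument function in the named list `fns`
median_elapsed <- function(fns, times = 20L) {
  vapply(fns, function(f) {
    # system.time() returns user, system and elapsed seconds; keep elapsed
    # (it also runs gc() before each timing by default, gcFirst = TRUE)
    elapsed <- replicate(times, system.time(f())[["elapsed"]])
    median(elapsed)
  }, numeric(1))
}

# Smaller than the question's 1e6 rows so the sketch finishes quickly
set.seed(1)
dat <- data.frame(x = runif(1e5, 1, 1000), y = runif(1e5, 1, 1000))

# The three subsetting expressions, wrapped in functions so each run re-evaluates them
timings <- median_elapsed(list(
  logical = function() dat[dat$x > 500, ],
  which   = function() dat[which(dat$x > 500), ],
  subset  = function() subset(dat, x > 500)
))
timings                     # median seconds per expression
names(which.min(timings))   # label of the fastest one on this machine
```

Wrapping each expression in a function keeps it unevaluated until the helper calls it, which is what lets the same expression be timed repeatedly.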