Performance - the most efficient way to subset data frames

Can anyone suggest a more efficient way to subset a data frame without using the SQL, indexing, or data.table options?

I looked for similar questions, and this one suggests the indexing option.

Here are the subsetting methods I tried, with timings.

# Dummy data
dat <- data.frame(x = runif(1000000, 1, 1000), y = runif(1000000, 1, 1000))

# Subset and time
system.time(x <- dat[dat$x > 500, ])
#  user  system elapsed
# 0.092   0.000   0.090
system.time(x <- dat[which(dat$x > 500), ])
#  user  system elapsed
# 0.040   0.032   0.070
system.time(x <- subset(dat, x > 500))
#  user  system elapsed
# 0.108   0.004   0.109

Edit:
As Roland suggested, I used microbenchmark. It seems that which() performs best.

library("ggplot2")
library("microbenchmark")

# Dummy data
dat <- data.frame(x = runif(1000000, 1, 1000), y = runif(1000000, 1, 1000))

# Benchmark the three subsetting expressions
res <- microbenchmark(dat[dat$x > 500, ],
                      dat[which(dat$x > 500), ],
                      subset(dat, x > 500))

# Plot the timing distributions (dispatches to the microbenchmark method)
autoplot(res)
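
Before reading too much into the timings, it may be worth checking that the three expressions return the same rows. The sketch below uses base R only. The names small, d, a, b, and k are made up for illustration and are not part of the original post, and the data is smaller than in the benchmark so it runs quickly. On NA-free data the three results should match, but logical indexing treats missing values differently from which() and subset().

# Illustrative check, not a benchmark: names and sizes here are assumptions
set.seed(1)
small <- data.frame(x = runif(100000, 1, 1000), y = runif(100000, 1, 1000))

a <- small[small$x > 500, ]          # logical indexing
b <- small[which(small$x > 500), ]   # integer indexing via which()
k <- subset(small, x > 500)          # subset()

# With no NAs in x, all three approaches select the same rows
stopifnot(identical(a, b), identical(a, k))

# With an NA in x, logical indexing keeps an all-NA row,
# while which() and subset() drop it
d <- data.frame(x = c(100, 600, NA, 900), y = 1:4)
nrow(d[d$x > 500, ])          # 3 (includes one NA row)
nrow(d[which(d$x > 500), ])   # 2
nrow(subset(d, x > 500))      # 2

So if the column being filtered can contain NA, the which() version is not a drop-in replacement for plain logical indexing, whatever the timings say.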
