I must admit that I am a big fan of Hadley Wickham and his packages. So the moment I heard that his new readr package was out on CRAN, I decided to check it out.
I thought it would be interesting to compare the file read times of readr's read_csv() function with those of data.table's fread() and base R's read.csv(). For the test, I used my Linux machine with 4 GB of RAM and a very old Core 2 Duo processor.
I imported a 67.3 MB CSV file using each of these functions. The read times and the code used are below.
library(readr)
library(data.table)
setwd("/media/ramius/E2A02905A028E1B1/Work/Jigsaw Academy")

# Read time for read.csv()
pt <- proc.time()
data <- read.csv("telecom.csv")
proc.time() - pt
##    user  system elapsed
##  15.063   0.071  15.140

# Data stored as data.frame
class(data)
## [1] "data.frame"

# Read time for readr's read_csv()
pt1 <- proc.time()
data1 <- read_csv("telecom.csv")
proc.time() - pt1
##    user  system elapsed
##   2.449   0.040   2.489

# Data stored as tbl_df (a data.frame wrapped as a tbl_df)
class(data1)
## [1] "tbl_df" "tbl" "data.frame"

# Read time for data.table's fread()
pt2 <- proc.time()
data2 <- fread(input = "telecom.csv")
proc.time() - pt2
##    user  system elapsed
##   1.604   0.040   1.644

# Data stored as data.table
class(data2)
## [1] "data.table" "data.frame"

As one can see, the file read time was lowest for data.table's fread(). (No surprises there!) Also worth noting: read_csv() was about 6 times faster than read.csv() in this test (2.5 seconds versus 15.1 seconds elapsed).
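A single proc.time() difference can be noisy, since disk caching and background load affect one-off timings. Here is a minimal sketch of a more repeatable comparison using only base R. It assumes a small generated file in place of telecom.csv, and the helper name time_reader is hypothetical, not part of the original benchmark:

```r
# Build a small synthetic CSV so the example is self-contained.
# (The original post used a 67.3 MB file named telecom.csv.)
set.seed(1)
n <- 20000
synthetic <- data.frame(
  id    = seq_len(n),
  value = rnorm(n),
  group = sample(letters, n, replace = TRUE)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(synthetic, csv_path, row.names = FALSE)

# Hypothetical helper: run a reader several times and return the
# median elapsed time in seconds.
time_reader <- function(reader, path, reps = 3) {
  elapsed <- vapply(
    seq_len(reps),
    function(i) system.time(reader(path))[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

base_time <- time_reader(read.csv, csv_path)
base_time

# Other readers can be timed the same way once their packages are installed:
# time_reader(readr::read_csv, csv_path)
# time_reader(data.table::fread, csv_path)
```

Taking the median of a few runs reduces the influence of a cold disk cache on the first read, which matters when the differences between readers are only a second or so.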
According to Hadley (https://github.com/hadley/readr), readr is fast but not as fast as fread(). The question then is: why bother using readr at all? The simple answer is that everything read by readr functions such as read_csv() is a data.frame wrapped as a tbl_df, whereas fread() produces a data.table. Why does that matter? Think of data manipulation: data frames (including tbl_df) behave quite differently from a data.table. If you are already using packages like dplyr and ggplot2 that work on data frames, then using readr is probably a better fit than loading data through fread().
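To make the behavioural difference concrete, here is a small sketch on a toy table. It assumes reasonably recent versions of the data.table and tibble packages (older data.table releases handled column selection by name differently), and the objects df, tbl, dt, path and plain are illustrative names only:

```r
library(data.table)
library(tibble)

df  <- data.frame(x = 1:3, y = c("a", "b", "c"))
tbl <- as_tibble(df)       # same structure that read_csv() returns
dt  <- as.data.table(df)   # same structure that fread() returns by default

# Selecting one column by name with [, "x"]:
class(df[, "x"])    # a plain data frame drops to a vector
class(tbl[, "x"])   # a tibble stays a tibble
class(dt[, "x"])    # a data.table stays a data.table

# data.table evaluates the second argument as an expression
# inside the table, so column names can be used directly.
dt[, sum(x)]

# If a plain data frame is wanted from fread(), its data.table
# argument can be set to FALSE.
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
plain <- fread(path, data.table = FALSE)
class(plain)
```

Code written against one of these classes does not always carry over unchanged to another, which is the practical reason to pick the reader that matches the rest of your workflow.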
It is expected that, just like his earlier packages such as dplyr and ggplot2, readr will also become an integral part of any data analyst's workflow. Life for R users would have been very dull had Hadley decided to remain just an academic!
Long live R and the "Hadleyverse"!
The complete code for this .Rmd file is available at https://github.com/Gunnvant/RFiles/blob/master/readr%20comparison.Rmd