What is R?
R is an open-source programming language developed for statistical analysis, data science and graphical representation of data.
Ross Ihaka and Robert Gentleman developed R at the University of Auckland in the 1990s to perform statistical analysis on data.
R is popular because of:
- Extensive libraries: Specialized libraries such as dplyr, data.table and readr for data wrangling and manipulation
- Data visualization: Packages and tools for developing high-quality graphs and plots
Since R is open-source, anyone can use it, fix bugs and add packages and enhancements. Packages are an important reason behind R's popularity.
What is a package?
Think of packages as reusable blocks 🔁. Instead of copying and pasting a piece of code, you can put it in a package, which saves time and lets you share your code with others who might need it!
R packages contain:
- Functions (aka the reusable blocks)
- Documentation (to help you understand how to use the package)
- Sample data
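To make these three parts concrete, here is a minimal sketch that pokes at two packages shipped with every R installation, stats and datasets. They are stand-ins chosen for illustration and are not packages named in this article.

```r
# Functions: call a function that lives in the 'stats' package
middle <- stats::median(c(2, 4, 9))
print(middle)  # 4

# Documentation: open the help page that the package provides for that function
help("median", package = "stats")

# Sample data: 'mtcars' is an example data set bundled with the 'datasets' package
cars <- datasets::mtcars
print(head(cars))
```

The `package::name` form used here reaches into a package without attaching it, which makes it easy to see which package each piece comes from.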
R packages are also easy to share. Since R is supported by a large and active community of data scientists, statisticians and analysts focused on improving and maintaining R, there are plenty of packages available on CRAN.
CRAN (Comprehensive R Archive Network) is a network of servers around the world that store updated versions of R code, packages and documentation.
Who uses R?
R is a platform for statisticians and data scientists (aka the humans of data) to perform data cleaning, analysis and visualization. It is widely used in data mining, statistical analysis, data science, machine learning and academia.
Major organizations worldwide such as Google, Mozilla, The New York Times, TechCrunch and Accenture use R for data analysis.
How can you get started with R?
Installing R and RStudio
- To download R, visit CRAN and choose the appropriate version based on your operating system (Linux, macOS or Windows).
- To write programs in R comfortably, install RStudio, a popular IDE (Integrated Development Environment) for R. R also runs without it, but RStudio makes writing and running code much easier. Download and install it from the RStudio download page.
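Once both are installed, a quick check from the RStudio console (or any R prompt) can confirm that R itself is working. This is a minimal sketch that uses only base R; the variable names are illustrative.

```r
# Store and print the version of R that this session is running
r_version <- R.version.string
print(r_version)

# Show the platform, locale and currently attached packages
session <- sessionInfo()
print(session)
```

If both commands print without errors, the installation is in working order.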
Installing R packages
After setting up R and RStudio, you should install some R packages. The tidyverse is a collection of packages that covers most everyday analysis tasks, including ggplot2, dplyr, tibble, tidyr, readr, purrr, stringr and forcats.
To install tidyverse, run the following command:
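A minimal sketch, assuming the default CRAN route and a working internet connection. `install.packages()` is base R's standard installer, and the package name must be quoted.

```r
# Download and install the tidyverse (and its dependencies) from CRAN.
# This only needs to be done once per R installation.
install.packages("tidyverse")
```

R may ask you to pick a CRAN mirror the first time; any nearby mirror should work.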
After installing the packages, run the following command to load them:
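A minimal sketch, assuming the tidyverse has already been installed. `library()` attaches the core tidyverse packages for the current session, so it has to be run again in every new session. The summary at the end is only an illustrative smoke test on the built-in `mtcars` data set.

```r
# Attach the core tidyverse packages (ggplot2, dplyr, tidyr, readr, ...)
library(tidyverse)

# Smoke test: average miles per gallon for each cylinder count in mtcars
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

print(mpg_by_cyl)
```

If this prints a small table with one row per cylinder count, the packages are loaded and working.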
You're all set to start programming in R!
To learn more about using R in data science, read the free ebook titled R for Data Science by Garrett Grolemund and Hadley Wickham.