Tidymodels for beginners
For this lesson I prepared a decision chart to help people who want to start using the tidymodels framework but don’t know where to begin. I will also answer some popular tidymodels-related questions I have come across online.
I have seen people say that they want to start using the tidymodels R framework but haven’t found tutorials that break things down to their level. If you are in that category, this lesson is for you. R is a great programming language that continues to evolve. One of the things that attracts people to R is its simplicity and the fact that there is a package for almost any task.
It might interest you to know that the brain behind R’s popular machine learning package “caret” is also part of the team behind the tidymodels framework.
The diagram below is an overview chart I created to give you a general understanding of the tidymodels workflow.

- the recipes package is an alternative method for creating and pre-processing design matrices that can be used for modeling or visualization
- the rsample package contains a set of functions to create different types of resamples and corresponding classes for their analysis
- the dials package provides a framework for defining, creating and managing tuning parameters for modeling
- the tune package contains functions and classes to be used in conjunction with other tidymodels packages for finding reasonable values of hyper-parameters in models
- the parsnip package provides a tidy, unified interface to models from many different modeling packages
- the yardstick package is used to estimate how well models are working using tidy data principles
Some popular tidymodels-related questions I have come across online:
- What is tidymodels?
- How many packages are in the tidymodels framework?
- How can I install tidymodels?
- Is tidymodels better than caret?
- Can bootstrap resampling be done with tidymodels?
- How fast is tidymodels for modeling?
Answers:
WHAT IS TIDYMODELS
Tidymodels is a collection of R packages for modeling and machine learning using tidyverse principles.
HOW MANY PACKAGES ARE IN THE TIDYMODELS FRAMEWORK
As of this writing, there are 15 packages in tidymodels v0.1.1.
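If you want to check this on your own installation, here is a minimal sketch using tidymodels_packages(), which I believe lists the packages the tidymodels meta-package depends on. Note the assumption: that list includes supporting dependencies as well as the core packages that get attached, so the number you see will probably not be exactly 15 and will vary with your installed version.

```r
library(tidymodels)

# List the packages that the tidymodels meta-package depends on.
# This includes supporting dependencies, not only the attached core packages.
pkgs <- tidymodels_packages()

length(pkgs)  # the count depends on your installed tidymodels version
head(pkgs)    # peek at the first few package names
```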
HOW CAN I INSTALL TIDYMODELS
Like any other R package available on CRAN, tidymodels can be installed using install.packages("tidymodels") or using devtools.
IS TIDYMODELS BETTER THAN CARET
People ask this question a lot. The caret package is much older and has more functionality than tidymodels. However, given the pace at which the tidymodels team is updating the framework with new packages and features, I think it will soon become the preferred choice for most people.
CAN BOOTSTRAP RESAMPLING BE DONE WITH TIDYMODELS
Yes, bootstrap resampling can be done with the rsample package inside the tidymodels framework.
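As a rough illustration, the sketch below creates bootstrap resamples of mtcars with rsample's bootstraps() function. The seed value and the number of resamples (25) are arbitrary choices of mine, not recommendations.

```r
library(rsample)

set.seed(123)  # resampling is random; the seed is an arbitrary choice
boots <- bootstraps(mtcars, times = 25)  # 25 bootstrap resamples
boots

# Each split holds a bootstrap sample (analysis set) and the
# rows that were not drawn (assessment set, also called out-of-bag).
first_split <- boots$splits[[1]]
n_analysis <- nrow(analysis(first_split))      # same number of rows as mtcars
n_assessment <- nrow(assessment(first_split))  # varies from resample to resample
```

The analysis set is sampled with replacement, so it has as many rows as the original data, while the assessment set contains whatever rows were left out of that particular draw.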
HOW FAST IS TIDYMODELS FOR MODELING
The tidymodels framework supports the use of multiple cores for processing. However, you still need to load packages like foreach, doMC, doSNOW, or doParallel independently to register the parallel backend.
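Here is a minimal sketch of registering a parallel backend with doParallel, one of the packages mentioned above. The choice of 2 workers is an assumption for illustration, so pick a number that suits your machine. Newer releases of the tune package may handle parallelism differently, so check the documentation for your installed version.

```r
library(doParallel)  # also attaches foreach and parallel

# Start a cluster with 2 worker processes (an arbitrary choice).
cl <- makePSOCKcluster(2)
registerDoParallel(cl)

# Confirm how many workers are registered.
n_workers <- foreach::getDoParWorkers()
n_workers

# ... run your resampling or tuning code here ...

# Shut the workers down and return to sequential processing.
stopCluster(cl)
registerDoSEQ()
```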
EXAMPLE
Now that we have a general understanding of the tidymodels framework, let us experiment with a simple example using the popular mtcars dataset. We will build a linear regression model the tidymodels way. Our goal is to predict fuel consumption (mpg) given displacement (disp).
Install package
install.packages("tidymodels") # skip if you have already installed it
Load package for use
library(tidymodels)
## -- Attaching packages ---------------------------------------------------------------------- tidymodels 0.1.1 --
## v broom 0.7.0 v recipes 0.1.13
## v dials 0.0.9 v rsample 0.0.8
## v dplyr 1.0.2 v tibble 3.0.3
## v ggplot2 3.3.2 v tidyr 1.1.2
## v infer 0.5.3 v tune 0.1.1
## v modeldata 0.0.2 v workflows 0.2.0
## v parsnip 0.1.3 v yardstick 0.0.7
## v purrr 0.3.4
## -- Conflicts ------------------------------------------------------------------------- tidymodels_conflicts() --
## x purrr::discard() masks scales::discard()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x recipes::step() masks stats::step()
Load the dataset
data("mtcars") # the dataset is available in R
Pre-process
The only pre-processing we will do is to remove all other variables from the dataset and keep only the variables of interest, “mpg” and “disp”.
Create a recipe and remove some variables with the step_rm() function
prep_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_rm(cyl, hp, drat, wt, qsec, vs, am, gear, carb) %>%
  prep()
See what our prep_rec looks like
prep_rec
## Data Recipe
##
## Inputs:
##
## role #variables
## outcome 1
## predictor 10
##
## Training data contained 32 data points and no missing data.
##
## Operations:
##
## Variables removed cyl, hp, drat, wt, qsec, vs, am, gear, carb [trained]
Notice the “Variables removed” part, which lists all the variables we got rid of.
See what the pre-processed data looks like with the juice() function
mtcars_juiced <- juice(prep_rec)
head(mtcars_juiced)
## # A tibble: 6 x 2
## disp mpg
## <dbl> <dbl>
## 1 160 21
## 2 160 21
## 3 108 22.8
## 4 258 21.4
## 5 360 18.7
## 6 225 18.1
juice() is used to extract pre-processed data out of a prepared recipe.
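In more recent versions of recipes, juice() is marked as superseded, and the same result can be obtained with bake() by passing new_data = NULL. The sketch below rebuilds the recipe from this lesson and extracts the training data that way. It assumes a recipes version newer than the 0.1.13 shown in the output above, so treat it as an alternative to try rather than a required change.

```r
library(recipes)

# Same recipe as in the lesson: keep only mpg and disp.
prep_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_rm(cyl, hp, drat, wt, qsec, vs, am, gear, carb) %>%
  prep()

# bake() with new_data = NULL returns the pre-processed training data,
# which is what juice() returns.
mtcars_baked <- bake(prep_rec, new_data = NULL)
head(mtcars_baked)
```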
Fit model
my_model <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ ., data = mtcars_juiced)
Make predictions
predicted <- my_model %>% predict(mtcars_juiced) %>% mutate(actual = mtcars_juiced$mpg)
head(predicted)
## # A tibble: 6 x 2
## .pred actual
## <dbl> <dbl>
## 1 23.0 21
## 2 23.0 21
## 3 25.1 22.8
## 4 19.0 21.4
## 5 14.8 18.7
## 6 20.3 18.1
View metrics
predicted %>% metrics(truth = actual, estimate = .pred)
## # A tibble: 3 x 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 rmse standard 3.15
## 2 rsq standard 0.718
## 3 mae standard 2.61
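To see what these three numbers mean, here is a small base R sketch that recomputes them by hand from the same model (mpg predicted from disp). The variable names are my own, and I am assuming rsq is the squared correlation between the actual and predicted values. The results should come out close to the values in the table above.

```r
# Fit the same model with base R and predict on the training data.
fit_lm <- lm(mpg ~ disp, data = mtcars)
pred <- unname(predict(fit_lm, mtcars))
actual <- mtcars$mpg

# Root mean squared error: square root of the average squared error.
rmse_manual <- sqrt(mean((actual - pred)^2))

# Mean absolute error: average size of the errors, ignoring sign.
mae_manual <- mean(abs(actual - pred))

# R squared, computed here as the squared correlation
# between actual and predicted values.
rsq_manual <- cor(actual, pred)^2

round(c(rmse = rmse_manual, rsq = rsq_manual, mae = mae_manual), 3)
```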
Wrap-up
The example above only gives you a glimpse of how powerful and easy to use the tidymodels framework is. There is much more that can be done with the packages it comprises. The idea was just to get you started and show off its beautiful interface.
Don’t forget to share this lesson with others if you find it helpful. Thanks.