Section 10: Random Forests & Boosted Trees

Sections 9 and 10 are on tree-based methods. There are three main methods:

Each of these methods stems from the basic decision tree algorithm. Fundamentally, tree-based methods rely on the ability to split data based on information from features. This requires a mathematical definition of information and the ability to measure it.
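As a rough illustration of what "measuring information" can mean, the sketch below assumes Shannon entropy as the measure and scores a candidate split by its information gain (the drop in entropy after splitting). The notes do not say which measure is intended; Gini impurity is another common choice. The function names, the threshold-style split, and the toy data are all made up for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(labels, feature_values, threshold):
    """Entropy reduction from splitting on `feature <= threshold`."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    # Weight each child's entropy by the fraction of samples it receives.
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


# Toy data, invented for illustration: one numeric feature, two classes.
x = [1.0, 2.0, 3.0, 4.0]
y = ["a", "a", "b", "b"]

gain_good = information_gain(y, x, threshold=2.0)  # separates the classes cleanly
gain_poor = information_gain(y, x, threshold=1.0)  # leaves a mixed right-hand group
print(gain_good, gain_poor)
```

A tree-growing procedure would evaluate many candidate splits this way and keep the one with the largest gain; here the split at 2.0 scores higher than the split at 1.0 because it leaves both child groups pure.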

Classification and Regression Trees (CART) introduce many concepts:

  • Cross validation of Trees
  • Pruning Trees
  • Surrogate Splits
  • Variable Importance Scores
  • Search for Linear Splits