300.Watts
Posts Tags About EN / 繁
Morris Tai
Rust · Systems · Infra notes
  • 📍 Canada / Taiwan
  • 🌙 Night owl · GMT-4
  • GitHub ↗
  • X ↗
  • Email
Now
Playing
Survive in Canada
Building
GPU Platform
Thinking
What is consciousness?

There is only one heroism in the world: to see the world as it is, and to love it.

- Romain Rolland
17 Aug 2017

Notes on the Kaggle Titanic Stacking Model

2017-08-17

While reading through Kaggle kernels for the Titanic challenge, I noticed that many of them use SVM, RandomForest, LogisticRegression, and so on. What makes this particular kernel interesting is that it builds a single model from six different learners:
Introduction to Ensembling/Stacking in Python (using data from Titanic: Machine Learning from Disaster)
At Level 1 it uses RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier, ExtraTreesClassifier, and SVM; at Level 2 it uses XGBoost, which learns from the Level 1 predictions. I sketched the overall flow of the model to make it easier to understand, since the raw source is hard to parse quickly. The author cleverly uses classes to keep the notebook code clean, which also makes it easier to modify and reorganize later.
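
The two-level flow can be sketched roughly as below. This is not the kernel's code: the synthetic dataset from make_classification, the hyperparameters, and the use of cross_val_predict to build out-of-fold features are all assumptions made for illustration, and scikit-learn's GradientBoostingClassifier stands in for XGBoost at Level 2 so the example only needs scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the Titanic features (assumption, not the real data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Level 1: the five learner types named in the post (hyperparameters assumed).
level1 = {
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
    "ada": AdaBoostClassifier(n_estimators=50, random_state=0),
    "gb": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "et": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "svc": SVC(kernel="linear", C=0.025),
}

# Out-of-fold predictions on the training set: each row is predicted by a
# model that never saw that row, so Level 2 does not learn from leaked labels.
oof_train = np.column_stack(
    [cross_val_predict(model, X_train, y_train, cv=5) for model in level1.values()]
)

# For the test set, refit each Level 1 model on all training rows and predict.
oof_test = np.column_stack(
    [model.fit(X_train, y_train).predict(X_test) for model in level1.values()]
)

# Level 2: trained on the five Level 1 prediction columns.
# GradientBoostingClassifier is a stand-in for XGBoost here.
level2 = GradientBoostingClassifier(n_estimators=50, random_state=0)
level2.fit(oof_train, y_train)
accuracy = level2.score(oof_test, y_test)
print(f"stacked accuracy on held-out rows: {accuracy:.3f}")
```

The key design point is that Level 2 sees one column per Level 1 learner rather than the raw features, so it only has to learn how to weigh the learners against each other.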

 Kaggle, Stacking
05 Aug 2017

Pandas cut and qcut Functions

2017-08-05

When we have continuous numerical values, we can discretize them using cut and qcut. The cut function bins values by numeric intervals, while qcut bins them by quantiles. In other words, when given only a number of bins, cut produces bins of equal width, while qcut produces bins that each hold roughly the same number of values.

The cut function

Suppose we have the ages of a group of people:
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32, 101]
If we want to discretize this list into “18 to 25”, “25 to 35”, “35 to 60”, and “60 and above”, we can use the cut function:
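
A minimal sketch of that call, with a qcut call added for contrast. The upper edge of 120 for the open-ended "60 and above" group and the label strings are assumptions chosen for illustration. By default pd.cut uses right-closed intervals, so an age of exactly 25 falls into "18 to 25".

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32, 101]

# Bin edges for (18, 25], (25, 35], (35, 60], (60, 120].
# The upper edge 120 is an assumed cap for "60 and above".
bins = [18, 25, 35, 60, 120]
labels = ["18 to 25", "25 to 35", "35 to 60", "60 and above"]

# cut: bins defined by explicit numeric intervals.
age_groups = pd.cut(ages, bins=bins, labels=labels)
group_sizes = pd.Series(age_groups.codes).value_counts().sort_index().tolist()
print(list(age_groups))
print(group_sizes)  # [5, 3, 3, 2]

# qcut: bins defined by quantiles, so each holds about the same number of ages.
quartile_groups = pd.qcut(ages, q=4)
quartile_sizes = pd.Series(quartile_groups.codes).value_counts().sort_index().tolist()
print(quartile_groups.categories)
print(quartile_sizes)
```

With cut the group sizes are uneven because the edges are fixed in advance; with qcut the edges move so that the sizes stay close to each other.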

 Pandas, Cut, Qcut
2018 - 2026 Morris | CC BY-NC 4.0