
What is cross validation?

Cross validation (CV) is a technique for evaluating how well a statistical model performs on data it has not seen. To perform CV, we set aside a sample (portion) of the data that is not used to train the model, and later use this sample to test/validate the model. Below are a few common techniques used for CV.

**1. Train-Test Split Approach**

In this approach, the complete dataset is split into a training set and a test set. The model is trained on the training set and evaluated on the test set. If the data is limited, this approach wastes a lot of information, because the observations in the test set are never used for training.
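A minimal sketch of the train-test split, assuming scikit-learn is available. The synthetic data, the 25% test fraction, and the linear regression model are illustrative choices, not part of the original post:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): y is roughly linear in x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * x[:, 0] + rng.normal(0, 1.0, size=100)

# Hold out 25% of the rows; the model never sees them during fitting
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)

model = LinearRegression().fit(x_train, y_train)
test_mse = mean_squared_error(y_test, model.predict(x_test))
print(f"train rows = {len(x_train)}, test rows = {len(x_test)}, test MSE = {test_mse:.3f}")
```

Only 75 of the 100 rows inform the fit here, which is exactly the information loss described above when data is scarce.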

**2. K-Fold Cross Validation**

K-fold CV ensures that every observation from the original dataset appears in both the training set and the test set; in particular, each observation is used for testing exactly once. This is very appealing when data is limited. The method can be described as follows:

- Randomly split the entire dataset into K folds. A higher value of K gives a less biased estimate of model performance, because each training set is closer in size to the full dataset, but the estimate can have higher variance and requires more model fits. A lower value of K behaves more like the train-test split approach described above.
- Fit the model using K-1 folds and calculate the performance (error, ROC AUC, etc.) on the remaining Kth fold.
- Repeat this process so that each of the K folds serves as the test set once. Then average the K performance scores; that average is the performance metric for the model.
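The three steps above can be sketched from scratch with NumPy. This is a minimal illustration, not a library API: `k_fold_score`, `fit_line`, and `mse` are hypothetical helper names, and the synthetic linear data is assumed for the example.

```python
import numpy as np

def k_fold_score(x, y, k, fit, score, seed=0):
    """Estimate performance with k-fold CV (hypothetical helper).

    fit(x_train, y_train) -> model and score(model, x_test, y_test) -> float
    are callables supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))       # step 1: shuffle the row indices
    folds = np.array_split(indices, k)      # ...and cut them into k folds
    scores = []
    for i in range(k):
        test_idx = folds[i]                 # the i-th fold is held out
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(x[train_idx], y[train_idx])              # step 2: fit on k-1 folds
        scores.append(score(model, x[test_idx], y[test_idx]))
    return float(np.mean(scores)), scores   # step 3: average across the k folds

def fit_line(x_train, y_train):
    # Ordinary least squares with an intercept column
    a = np.column_stack([x_train, np.ones(len(x_train))])
    coef, *_ = np.linalg.lstsq(a, y_train, rcond=None)
    return coef

def mse(coef, x_test, y_test):
    # Mean squared error of the fitted line on the held-out fold
    a = np.column_stack([x_test, np.ones(len(x_test))])
    return float(np.mean((a @ coef - y_test) ** 2))

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=(60, 1))
y = 3.0 * x[:, 0] + rng.normal(0, 1.0, size=60)

mean_mse, fold_mses = k_fold_score(x, y, k=5, fit=fit_line, score=mse)
print("per-fold MSE:", np.round(fold_mses, 3))
print("mean MSE:", round(mean_mse, 3))
```

Passing `fit` and `score` as callables keeps the fold logic independent of any particular model or metric.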

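```
import numpy as np
from sklearn.model_selection import KFold

# Six observations with two features each, and their targets
x = np.array([[1, 3], [3, 10], [3, 4], [4, 8], [5, 7], [6, 7]])
y = np.array([1, 5, 8, 9, 10, 15])

kf = KFold(n_splits=2)  # 2 folds; the data is not shuffled by default
kf.get_n_splits(x)      # returns the number of folds (2)
for train_index, test_index in kf.split(x):
    # each iteration yields the row indices of the training and test folds
    print('train_index =', train_index, 'test_index =', test_index)
```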

Output:

```
train_index = [3 4 5] test_index = [0 1 2]
train_index = [0 1 2] test_index = [3 4 5]
```

As you can see, the function splits the original data into two folds: each row appears in the test set exactly once, and the two folds swap roles between the training and test sets.
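Two details are worth noting. `KFold` does not shuffle by default, which is why the folds above are contiguous blocks of rows; passing `shuffle=True` (with `random_state` for reproducibility) matches the random split described in step 1. Also, scikit-learn's `cross_val_score` can run the fit-and-score loop for you. A sketch on the same toy data, assuming three folds and a linear regression model chosen purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

x = np.array([[1, 3], [3, 10], [3, 4], [4, 8], [5, 7], [6, 7]])
y = np.array([1, 5, 8, 9, 10, 15])

# shuffle=True randomizes fold membership; random_state makes it reproducible
kf = KFold(n_splits=3, shuffle=True, random_state=0)

# Count how many times each row lands in a test fold
test_counts = np.zeros(len(x), dtype=int)
for train_index, test_index in kf.split(x):
    test_counts[test_index] += 1
print("times each row was tested:", test_counts)

# cross_val_score fits and scores once per fold and returns one score per fold;
# scikit-learn maximizes scores, so MSE is reported as its negative
scores = cross_val_score(LinearRegression(), x, y, cv=kf,
                         scoring="neg_mean_squared_error")
print("per-fold MSE:", -scores)
print("mean MSE:", -scores.mean())
```

Every entry of `test_counts` is 1, confirming that each observation is tested exactly once.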
