< DeepLearning > The Support Vectors in DeepLearning


Does DeepLearning have ‘Support Vectors’?

As we all know, ‘support vectors’ is a notion from the SVM (Support Vector Machine). The support vectors are the data points that lie on the decision boundary (the maximum-margin boundary in an SVM), and they are the points that matter most to the classification algorithm.

We can train an SVM using only the support vectors and achieve the same accuracy as a model trained on the full dataset. So do ‘Support Vectors’ exist in DeepLearning?

Yes! They do!

Many experiments have shown that there may be a lot of redundant data in your training dataset. Deep models can reach similar accuracy using only a few ‘IMPORTANT’ data points, which can be seen as the ‘Support Vectors’.
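The SVM side of this analogy is easy to check empirically. Below is a minimal sketch with scikit-learn (the toy data and hyperparameters are illustrative assumptions): an SVM retrained on only its support vectors reproduces the predictions of the model trained on all the data.

```python
import numpy as np
from sklearn.svm import SVC

# two well-separated Gaussian blobs (illustrative toy data)
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) - 2, rng.randn(50, 2) + 2])
y = np.array([0] * 50 + [1] * 50)

# train on all 100 points, then retrain on the support vectors only
full = SVC(kernel="linear", C=1.0).fit(X, y)
sv = full.support_  # indices of the support vectors
reduced = SVC(kernel="linear", C=1.0).fit(X[sv], y[sv])

# both models make identical predictions on fresh data,
# even though the reduced one saw far fewer training points
X_test = np.vstack([rng.randn(20, 2) - 2, rng.randn(20, 2) + 2])
agree = (full.predict(X_test) == reduced.predict(X_test)).mean()
print(len(sv), agree)
```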

So the only question is: how do we determine which training data points are important, and how do we define the importance of a data point?

How to determine which training data points are more IMPORTANT?

**In a word, a sample (with a correct label) that is hard for your model to classify is more important.** So we have to measure how hard it is. There are three approaches, from three papers:

  • An Empirical Study of Example Forgetting during Deep Neural Network Learning

Toneva, Mariya and Sordoni, Alessandro and Combes, Remi Tachet des and Trischler, Adam and Bengio, Yoshua and Gordon, Geoffrey J, 2018

This paper finds that the data points that are easily forgotten during training are more likely to be important.

A sample is forgotten during training when, after once being classified correctly, it is classified incorrectly in a later training iteration. A sample is unforgettable when, once it has been classified correctly, it is never misclassified again for the rest of training.

This paper demonstrates that by removing the unforgettable data points, your model can maintain a similar test accuracy compared to training on the whole training set.
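As a concrete sketch, forgetting events can be counted from a per-epoch record of whether each example was classified correctly (the array layout here is my own illustrative assumption, not the paper's code): a forgetting event is a correct-to-incorrect transition between consecutive epochs.

```python
import numpy as np

def count_forgetting_events(acc_history):
    """acc_history: (num_epochs, num_examples) boolean array,
    True where the example was classified correctly at that epoch.
    Returns the per-example count of correct -> incorrect transitions."""
    prev = acc_history[:-1]
    curr = acc_history[1:]
    return (prev & ~curr).sum(axis=0)

# toy history over 4 epochs for 2 examples:
# example 0 is never forgotten once learned; example 1 is forgotten twice
hist = np.array([
    [False, True],
    [True,  False],
    [True,  True],
    [True,  False],
], dtype=bool)
events = count_forgetting_events(hist)
print(events)  # [0 2]
```

Examples with zero forgetting events are the ‘unforgettable’ ones the paper shows can be removed.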

  • Dataset Culling: Towards Efficient Training Of Distillation-Based Domain Specific Models

Yoshioka, Kentaro and Lee, Edward and Wong, Simon and Horowitz, Mark, 2019

To determine which training samples are more important, this paper designs a loss.

To evaluate how difficult an image is to predict, we develop a confidence loss metric. This loss (shown below) uses the model’s output confidence levels to determine whether data samples are 1) difficult-to-predict and kept or 2) easy and culled away.

Lconf = −x·log(x)·Q + (1 − x)·eˣ/(eˣ + 1) + b

Input x is the prediction confidence, b is a constant that sets the intercept to zero, and Q sets the weighting of low-confidence predictions.
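Reading the formula as quoted, it can be sketched in a few lines (the defaults Q = 1 and b = 0 are illustrative assumptions here, not values from the paper): a confident prediction scores near zero loss and is culled, while an uncertain one scores higher and is kept.

```python
import math

def confidence_loss(x, Q=1.0, b=0.0):
    """Lconf = -x*log(x)*Q + (1 - x) * e^x / (e^x + 1) + b,
    where x is the model's prediction confidence in (0, 1]."""
    return -x * math.log(x) * Q + (1 - x) * math.exp(x) / (math.exp(x) + 1) + b

# an uncertain prediction (x=0.5) scores a much higher loss
# than a confident one (x=0.99), so it is kept for training
print(confidence_loss(0.5), confidence_loss(0.99))
```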

  • Are All Training Examples Created Equal? An Empirical Study

Vodrahalli, Kailas and Li, Ke and Malik, Jitendra, 2018

This paper demonstrates that the importance of a training sample can be measured by computing the sample's gradient during back-propagation: samples with large gradient magnitudes are more important.
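A minimal sketch of this idea for a logistic-regression model (the model, data, and weights here are illustrative assumptions; for a deep network one would compute per-sample gradient norms during backprop): the sample the model gets wrong produces the larger gradient, flagging it as more important.

```python
import numpy as np

def per_sample_grad_norms(X, y, w):
    """Per-sample gradient norm of the logistic loss w.r.t. weights w.
    Larger norms flag harder, more informative samples."""
    p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
    grads = (p - y)[:, None] * X       # gradient of each sample's loss
    return np.linalg.norm(grads, axis=1)

# sample 0 is classified confidently and correctly; sample 1 is misclassified
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0])
w = np.array([1.0, 0.0])
norms = per_sample_grad_norms(X, y, w)
print(norms)  # the misclassified sample has the larger gradient norm
```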


https://zhengtq.github.io/2019/03/06/dataset-cut/

Author: Billy

Posted on 2019-03-06, updated on 2021-03-14
