Multi-Sample Dropout: A Method That Can Cut Training Time by up to 4x
Multi-Sample Dropout, introduced in the paper *Multi-Sample Dropout for Accelerated Training and Better Generalization*, extends traditional dropout by applying multiple dropout masks to the same mini-batch.

Standard dropout creates a single randomly selected subset of the input (called a dropout sample) in each training iteration, whereas multi-sample dropout creates several dropout samples from the same input. The loss is computed for each sample, and the final loss is the average of these per-sample losses.
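The idea maps directly to code. Below is a minimal PyTorch sketch, not the paper's actual implementation: the class name `MultiSampleDropoutHead` and the `num_samples`/`p` defaults are my own illustrative choices. The same features pass through several independent dropout masks, a shared linear layer produces logits for each mask, and the per-mask losses are averaged.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSampleDropoutHead(nn.Module):
    """Illustrative classifier head: applies `num_samples` independent dropout
    masks to the same features and averages the resulting losses."""
    def __init__(self, in_features, num_classes, num_samples=8, p=0.5):
        super().__init__()
        self.dropouts = nn.ModuleList([nn.Dropout(p) for _ in range(num_samples)])
        self.fc = nn.Linear(in_features, num_classes)  # weights shared across all samples

    def forward(self, features, targets=None):
        # One set of logits per dropout mask, all using the same linear layer
        logits_per_sample = [self.fc(drop(features)) for drop in self.dropouts]
        if targets is None:
            # At inference, dropout is a no-op, so averaging the logits is harmless
            return torch.stack(logits_per_sample).mean(dim=0)
        # During training, compute one loss per dropout sample and average them
        losses = [F.cross_entropy(logits, targets) for logits in logits_per_sample]
        return torch.stack(losses).mean()
```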
The paper shows that multi-sample dropout significantly accelerates training on image classification tasks by reducing the number of iterations needed to converge, but the experiments use the traditional training recipe of a constant learning rate decayed on a schedule. So I test the method with cyclic learning rates and see whether I can reproduce the results from the paper.
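For context, "cyclic learning" here means a one-cycle learning-rate schedule rather than a constant/step-decayed one. The sketch below shows roughly what that setup looks like with PyTorch's built-in `OneCycleLR` scheduler and the hypothetical head from the previous snippet; the tiny backbone, random tensors, and hyperparameter values are placeholders, not the actual ResNet-56/CIFAR-100 configuration from the notebook.

```python
import torch
import torch.nn as nn

# Placeholder stand-ins so the schedule can be shown end to end;
# in the notebook these would be ResNet-56 features and the CIFAR-100 loaders.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
head = MultiSampleDropoutHead(64, num_classes=100, num_samples=8)

images = torch.randn(512, 3, 32, 32)
labels = torch.randint(0, 100, (512,))
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(images, labels), batch_size=128)

epochs = 3
params = list(backbone.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, epochs=epochs, steps_per_epoch=len(loader))

for epoch in range(epochs):
    for x, y in loader:
        optimizer.zero_grad()
        loss = head(backbone(x), y)   # averaged multi-sample dropout loss
        loss.backward()
        optimizer.step()
        scheduler.step()              # LR follows the one-cycle curve every batch
```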
Note: if you are not familiar with cyclic learning, I wrote a Jupyter notebook explaining the four key papers by Leslie N. Smith that introduced these techniques, Reproducing Leslie N. Smith's papers using fastai.
Table of Contents:
- Load CIFAR-100 (I test on CIFAR-100 first)
- ResNet-56
- How to implement multi-sample dropout in a model
- Diversity among samples is needed
- Code for Multi-Sample Dropout
- Code for the Multi-Sample Dropout loss function
- Get a baseline without Multi-Sample Dropout
All the code and a detailed discussion of the topic are covered in this Jupyter notebook.
I am thinking about shifting to Jupyter notebooks for these tutorials, as it is easier to experiment in a notebook.
If you want to follow along and know when I publish a new notebook, check https://kushajveersingh.github.io/notebooks/. I will keep updating the list as I make new notebooks.
I will still write Medium posts, but most of the paper implementations will be done in Jupyter notebooks.