Learn everything you need to know about using Hydra in your machine learning projects. All features of Hydra are discussed with a dummy ML example.

In an effort to increase standardization across the PyTorch ecosystem, Facebook AI announced in a recent blog post that they would be leveraging Facebook’s open-source Hydra framework to handle configs, and would also offer an integration with PyTorch Lightning. This post is about Hydra.

If you are reading this post, I assume you are familiar with what config files are, why they are useful, and how they increase reproducibility. You also know what a nightmare argparse can be. …
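To give a taste of what this looks like, a minimal Hydra setup keeps all hyperparameters in a YAML file (a sketch: the `conf/config.yaml` layout and the values below are illustrative, not from the post):

```yaml
# conf/config.yaml -- illustrative hyperparameters for a dummy ML example
model:
  lr: 0.001
  batch_size: 64
trainer:
  epochs: 10
```

A training script decorated with `@hydra.main(config_path="conf", config_name="config")` then receives these values as a `cfg` object (e.g. `cfg.model.lr`), and any field can be overridden from the command line, such as `python train.py model.lr=0.01`.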

paper, my implementation

Recently, a lot of research has focused on building neural networks that can learn to reason. Some examples include simulating particle physics [1] and discovering mathematical equations from data [2].

Figure 1: What are the colors of the furthest pair of objects?

Figure 1 shows an example of a reasoning task. In these tasks, we have to learn the fundamental properties of the environment from some data. In this example, the model first has to learn the meaning of the words furthest, color, and pair.

Many reasoning tasks can be formalized as graph problems and message passing [3] has been shown to be a key component to modern graph…
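As a concrete illustration (not from the post), one round of sum-aggregation message passing on a toy directed graph can be sketched in plain Python:

```python
def message_passing_step(node_feats, edges):
    """One round of sum-aggregation message passing.

    node_feats: dict mapping node id -> scalar feature.
    edges: list of (src, dst) pairs; each src sends its feature to dst.
    Each node's new feature is its own feature plus the sum of the
    features its neighbours send along incoming edges.
    """
    new_feats = dict(node_feats)  # start from the old features
    for src, dst in edges:
        new_feats[dst] = new_feats[dst] + node_feats[src]
    return new_feats
```

Real graph networks replace the scalar features with vectors and the sum with a learned aggregation, but the pattern of "collect messages along edges, then update each node" is the same.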

This was originally posted on my personal blog at https://kushajveersingh.github.io/post_008/

You can check my website at https://kushajveersingh.github.io/ as a reference of what the site looks like.

Note:- I am not an expert in website building. I focus on deep learning research, so some things here may be unnecessary or suboptimal. But in the end I got the site running, which is what matters most to me.

My system info

  1. Ubuntu 20.04 LTS
  2. Ghost 3.15.3
  3. Yarn 1.22.4
  4. Node.js 12.16.3

Short summary of what we are going to do.

  • Install Ghost locally from source
  • Use default casper theme to make the website
  • Generate a…

paper, official code, fastai discussion thread, my notebook

The Mish activation function is proposed in the paper Mish: A Self Regularized Non-Monotonic Neural Activation Function. The experiments conducted in the paper show that it achieves better accuracy than ReLU. Many experiments have also been conducted by the fastai community, and they were likewise able to achieve better results than ReLU.

Mish is defined as x * tanh(softplus(x)), i.e. Mish(x) = x * tanh(ln(1 + e^x)).

PyTorch implementation of Mish activation function is given below.
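The embedded snippet is missing from this export, so as a stand-in, here is a plain-Python sketch of the formula (in PyTorch, `torch.tanh` and `torch.nn.functional.softplus` would replace the `math` calls):

```python
import math

def softplus(x):
    # softplus(x) = ln(1 + e^x), written in a numerically stable form
    # that avoids overflow for large positive x.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def mish(x):
    # Mish(x) = x * tanh(softplus(x))
    return x * math.tanh(softplus(x))
```

For large positive inputs Mish behaves like the identity, for large negative inputs it saturates near zero, and unlike ReLU it is smooth and slightly negative for small negative inputs.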

To build upon this activation function let’s first see the plot of the function.

Multi-Sample Dropout, introduced in the paper Multi-Sample Dropout for Accelerated Training and Better Generalization, is a new way to extend traditional Dropout by using multiple dropout masks for the same mini-batch.

The original dropout creates a randomly selected subset (called a dropout sample) from the input in each training iteration while the multi-sample dropout creates multiple dropout samples. The loss is calculated for each sample, and then the losses are averaged to obtain the final loss.
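The averaging scheme described above can be sketched in plain Python (a toy illustration, not the paper's code; in practice the masks are applied after the last feature layer of a network):

```python
import random

def dropout_mask(n, p, rng):
    # Bernoulli mask: each element is dropped with probability p,
    # and kept elements are scaled by 1 / (1 - p) (inverted dropout).
    return [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in range(n)]

def multi_sample_dropout_loss(features, loss_fn, num_samples=4, p=0.5, seed=0):
    """Apply `num_samples` independent dropout masks to the same features,
    compute the loss for each dropped-out copy, and average the losses."""
    rng = random.Random(seed)
    losses = []
    for _ in range(num_samples):
        mask = dropout_mask(len(features), p, rng)
        dropped = [f * m for f, m in zip(features, mask)]
        losses.append(loss_fn(dropped))
    return sum(losses) / len(losses)
```

The extra dropout samples share the expensive feature extraction, so the added cost per iteration is small compared to the variance reduction in the loss.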

The paper shows that multi-sample dropout significantly accelerates training by reducing the number of iterations until convergence for image classification tasks using the…

The following papers by Leslie N. Smith are covered

  1. A disciplined approach to neural network hyper-parameters: Part 1 — learning rate, batch size, momentum, and weight decay. paper
  2. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. paper
  3. Exploring loss function topology with cyclical learning rates. paper
  4. Cyclical Learning Rates for Training Neural Networks. paper

Although the main aim is to reproduce the papers, a lot of research has been done since then, so where needed I change some things to match state-of-the-art practices. …
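As an example of the kind of technique these papers cover, the triangular schedule from Cyclical Learning Rates (paper 4) can be sketched as follows (the `base_lr` and `max_lr` values are illustrative defaults, not from the papers):

```python
def cyclical_lr(step, step_size, base_lr=1e-4, max_lr=1e-2):
    """Triangular cyclical learning rate.

    The LR ramps linearly from base_lr up to max_lr over `step_size`
    steps, then back down over the next `step_size` steps, repeating.
    """
    cycle = step // (2 * step_size)          # which cycle we are in
    x = abs(step / step_size - 2 * cycle - 1)  # position in cycle, in [0, 1]
    return base_lr + (max_lr - base_lr) * (1 - x)
```

Plugging this into a training loop as a per-step LR schedule is what the LR range test and super-convergence experiments build on.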

In this post, I focus on all of the theoretical knowledge you need for the latest trends in NLP. I made this reading list as I learned new concepts. In the next post, I will share the things I use to practice these concepts, including fine-tuning and models from rank 1 on competition leaderboards. Use this link to get to part 2 (still to be made).

For the resources, I include papers, blogs, and videos.

It is not necessary to read most of it. Your main goal should be to understand that in this paper this thing was introduced…

This post follows the paper High-Resolution Network for Photorealistic Style Transfer. I discuss the paper's details and the PyTorch code. My implementation can be found in this repo. The official code release for the paper can be found here.

Use this model as your de-facto model for style transfer.


  1. What is Style Transfer?
  2. So why another paper?
  3. Gram Matrix
  4. High-Resolution Models
  5. Style Transfer Details
  6. Hi_Res Generation Network
  7. Loss Functions
  8. Difficult Part
  9. Conclusion

What is Style Transfer?

We have two images as input: one is the content image and the other is the style image.
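Since the Gram matrix (item 3 above) is central to how style is captured, here is a plain-Python sketch (normalization conventions vary between implementations; this one divides by the number of spatial positions):

```python
def gram_matrix(features):
    """Gram matrix of a feature map.

    features: C lists of length N (C channels, each flattened to N
    spatial positions). G[i][j] is the normalized inner product of
    channels i and j, capturing which channels co-activate -- i.e.
    the style statistics of the feature map, independent of where
    things appear in the image.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]
```

Style losses then compare the Gram matrices of the generated image and the style image at several layers of a pretrained network.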

Nvidia released a new paper, Semantic Image Synthesis with Spatially-Adaptive Normalization. The official code can be found here (PyTorch); my implementation can be found here (PyTorch).

Nvidia has been pushing the state of the art in GANs for quite some time now. This paper builds on their earlier work, pix2pixHD, and pushes it even further. For some motivation for this paper, see the demo released by Nvidia.

The original demo released by Nvidia.

Table of Contents

  1. What is Semantic Image Synthesis:- Brief overview of the field.
  2. New things in the paper
  3. How to train my model?:- How Semantic Image Synthesis models work
  4. Then I dive into the…

Recently, a new normalization technique was proposed, not for the activations but for the weights themselves, in the paper Weight Standardization.

In short, they combined Batch Normalization and Weight Standardization to get new state-of-the-art results. So in this post, I discuss what weight standardization is and how it helps the training process, and I show my own experiments on CIFAR-10, which you can follow along with.
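The operation itself is simple to state: each output channel's weights are shifted to zero mean and scaled to unit standard deviation before being used in the convolution. A plain-Python sketch (treating each row as the flattened weights of one output channel; `eps` guards against division by zero):

```python
def standardize_weights(w, eps=1e-5):
    """Weight Standardization for a 2-D weight matrix.

    w: list of rows, one row per output channel (flattened weights).
    Each row is standardized to zero mean and (approximately) unit
    standard deviation.
    """
    out = []
    for row in w:
        n = len(row)
        mean = sum(row) / n
        var = sum((x - mean) ** 2 for x in row) / n
        std = (var + eps) ** 0.5
        out.append([(x - mean) / std for x in row])
    return out
```

Unlike Batch Normalization, this reparameterizes the weights rather than the activations, so it works even with very small batch sizes.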

Figure 1: Taken from the paper; shows a clear comparison of all the normalizations.

The notebook for this post is at this link. For my experiments, I use cyclic learning. As the paper discusses training with constant learning rates, I would use…

Kushajveer Singh

Deep Learning Researcher with interest in Computer Vision and Natural Language Processing https://kushajveersingh.github.io/blog/
