In an effort to increase standardization across the PyTorch ecosystem, Facebook AI announced in a recent blog post that they would leverage Facebook’s open-source Hydra framework to handle configs, and would also offer an integration with PyTorch Lightning. This post is about Hydra.
If you are reading this post, I assume you are familiar with what config files are, why they are useful, and how they increase reproducibility. You also know what a nightmare argparse can be. …
Recently, a lot of research has focused on building neural networks that can learn to reason. Some examples include simulating particle physics and discovering mathematical equations from data.
Figure 1 shows an example of a reasoning task. In these tasks, we have to learn the fundamental properties of the environment from some data. In this example, the model has to first learn the meaning of the words “furthest”, “color”, and “pair”.
Many reasoning tasks can be formalized as graph problems, and message passing has been shown to be a key component of modern graph…
This was originally posted on my personal blog at https://kushajveersingh.github.io/post_008/
You can check my website at https://kushajveersingh.github.io/ as a reference of what the site looks like.
Note: I am not an expert in website building. I focus on deep learning research, so some of these steps may be unnecessary or suboptimal. But in the end I got the site running, which is what matters most to me.
My system info
The Mish activation function is proposed in the paper Mish: A Self Regularized Non-Monotonic Neural Activation Function. The experiments conducted in the paper show that it achieves better accuracy than ReLU. Many experiments have also been conducted by the fastai community, and they too achieved better results than ReLU.
Mish is defined as x * tanh(softplus(x)), or equivalently f(x) = x · tanh(ln(1 + eˣ)).
PyTorch implementation of Mish activation function is given below.
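The implementation is not shown in this excerpt, so here is a minimal PyTorch sketch written directly from the definition above (the class name `Mish` is my own choice):

```python
import torch
import torch.nn.functional as F
from torch import nn

class Mish(nn.Module):
    """Mish activation: f(x) = x * tanh(softplus(x))."""
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))
```

It can be dropped into a model anywhere you would otherwise use `nn.ReLU()`.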
To build intuition for this activation function, let’s first look at its plot.
Multi-Sample Dropout, introduced in the paper Multi-Sample Dropout for Accelerated Training and Better Generalization, is a new way to extend traditional Dropout by using multiple dropout masks for the same mini-batch.
The original dropout creates a randomly selected subset (called a dropout sample) of the input in each training iteration, while multi-sample dropout creates multiple dropout samples. The loss is calculated for each sample, and the losses are then averaged to obtain the final loss.
The paper shows that multi-sample dropout significantly accelerates training by reducing the number of iterations until convergence for image classification tasks using the…
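To make the idea concrete, here is a minimal PyTorch sketch of a classifier head with multi-sample dropout; the class and parameter names (`MultiSampleDropoutHead`, `num_samples`, etc.) are my own illustrative choices, not from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSampleDropoutHead(nn.Module):
    """Apply num_samples independent dropout masks to the same features,
    compute the loss for each dropout sample, and average the losses."""
    def __init__(self, in_features, num_classes, num_samples=4, p=0.5):
        super().__init__()
        self.dropouts = nn.ModuleList(nn.Dropout(p) for _ in range(num_samples))
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, features, targets):
        # One loss per dropout sample; the classifier weights are shared.
        losses = [F.cross_entropy(self.fc(d(features)), targets)
                  for d in self.dropouts]
        return torch.stack(losses).mean()
```

Note that the extra dropout samples only duplicate the cheap head computation, not the expensive feature extractor, which is why the overhead per iteration is small.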
The following papers by Leslie N. Smith are covered
In this post, I will focus on all of the theoretical knowledge you need for the latest trends in NLP. I made this reading list as I learned new concepts. In the next post, I will share the resources I use to practice these concepts, including fine-tuning and models that ranked first on competition leaderboards. Use this link to get to part 2 (still to be written).
For the resources, I include papers, blogs, and videos.
You do not need to read most of them. Your main goal should be to understand that in this paper this thing was introduced…
This post follows the paper High-Resolution Network for Photorealistic Style Transfer. I discuss the details of the paper and the PyTorch code. My implementation can be found in this repo, and the official code release for the paper can be found here.
Use this model as your de-facto model for style transfer.
We have two images as input: one is the content image and the other is the style image.
Nvidia released a new paper, Semantic Image Synthesis with Spatially-Adaptive Normalization. The official code can be found here (PyTorch); my implementation can be found here (PyTorch).
Nvidia has been pushing the state of the art in GANs for quite some time now. This paper builds on their earlier work, pix2pixHD, and pushes it even further. For some motivation, see the demo released by Nvidia.
Recently, a new normalization technique was proposed in the paper Weight Standardization, applied not to the activations but to the weights themselves.
In short, to get new state-of-the-art results, they combined Batch Normalization and Weight Standardization. So in this post, I discuss what weight standardization is and how it helps the training process, and I show my own experiments on CIFAR-10, which you can follow along with.
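As a preview of the idea, here is a minimal sketch of a weight-standardized convolution in PyTorch. This is my own illustrative implementation (not the official code): each output filter’s weights are standardized to zero mean and unit standard deviation before the convolution is applied.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d with Weight Standardization: standardize the weights of
    each output filter (over its input-channel and spatial dims) before
    running the usual convolution."""
    def forward(self, x):
        w = self.weight
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5  # avoid div by zero
        return F.conv2d(x, (w - mean) / std, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

Since only the forward pass changes, it is a drop-in replacement for `nn.Conv2d` and pairs naturally with Batch Normalization, as the paper combines them.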
The notebook for this post is at this link. For my experiments, I use cyclical learning rates. As the paper discusses training with constant learning rates, I would use…