**A Bayesian Perspective on the Deep Image Prior**
[Zezhou Cheng](http://people.cs.umass.edu/~zezhoucheng), [Matheus Gadelha](http://mgadelha.me), [Subhransu Maji](http://people.cs.umass.edu/~smaji/), [Daniel Sheldon](https://people.cs.umass.edu/~sheldon/)
_University of Massachusetts - Amherst_
The [deep image prior](https://dmitryulyanov.github.io/deep_image_prior) was recently introduced as a prior for natural images. It represents images as the output of a convolutional network with random inputs. For “inference”, gradient descent is performed to adjust network parameters to make the output match observations. This approach yields good performance on a range of image reconstruction tasks. We show that the deep image prior is asymptotically equivalent to a stationary Gaussian process prior in the limit as the number of channels in each layer of the network goes to infinity, and derive the corresponding kernel. This informs a Bayesian approach to inference. We show that by conducting posterior inference using stochastic gradient Langevin dynamics (SGLD) we avoid the need for early stopping, which is a drawback of the current approach, and improve results for denoising and inpainting tasks. We illustrate these intuitions on a number of 1D and 2D signal reconstruction tasks.
Publication
==========================================================================================
**A Bayesian Perspective on the Deep Image Prior**

Zezhou Cheng, Matheus Gadelha, Subhransu Maji, Daniel Sheldon

Computer Vision and Pattern Recognition (CVPR), 2019

[arXiv](https://arxiv.org/abs/1904.07457), [pdf](./gp-dip.pdf), [supplementary](./gp-dip-supp.pdf), [poster](./gp-dip-poster.pdf), [bibtex](./gp-dip.bib)
Code
==========================================================================================
[GitHub Link](https://github.com/ZezhouCheng/GP-DIP)
Main Discovery
==========================================================================================
**1. Deep Image Prior (DIP) is asymptotically equivalent to a stationary Gaussian Process (GP) prior**
* **We derive the analytical form of the GP kernel and analyze the effect of convolution, upsampling, downsampling and skip connections in the resulting GP kernel.**
![Priors and posterior with 1D convolutional networks. The covariance function $\cos\theta_{t_1, t_2} = K(t_1 - t_2)/K(0)$ for the (a) AutoEncoder and (b) Conv architectures estimated empirically for different values of depth and input covariance. For the Conv architecture we also compute the covariance function analytically, shown as dashed lines in panel (b). The empirical estimates were obtained with networks with 256 filters. Panel (c) shows samples from the prior of the Conv architecture with two different configurations, and panel (d) shows the posterior means and variances estimated using SGLD.](figs/1D.png)
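The stationarity claim can be checked numerically on a toy model. The sketch below is not the architecture from the paper: it assumes a two-layer 1D convolutional network with ReLU, circular padding, i.i.d. Gaussian weights, and a white-noise input, and the names `sample_outputs` and `stationary_kernel` are made up for illustration. It draws many random networks, estimates the output covariance, and compares it with a matrix that depends only on $t_1 - t_2$.

```python
import numpy as np

def sample_outputs(n_samples=2000, length=32, channels=64, ksize=3, seed=0):
    """Draw outputs of a toy random two-layer 1D conv net (circular padding)."""
    rng = np.random.default_rng(seed)
    offsets = np.arange(ksize) - ksize // 2
    # idx[t, k] is the position read by kernel tap k at output position t.
    idx = (np.arange(length)[:, None] + offsets[None, :]) % length
    outs = np.empty((n_samples, length))
    for s in range(n_samples):
        x = rng.standard_normal(length)                # white-noise input
        w1 = rng.standard_normal((ksize, channels)) / np.sqrt(ksize)
        h = np.maximum(x[idx] @ w1, 0.0)               # hidden layer, (length, channels)
        w2 = rng.standard_normal((ksize, channels)) / np.sqrt(ksize * channels)
        outs[s] = np.einsum("tkc,kc->t", h[idx], w2)   # conv down to one output channel
    return outs

def stationary_kernel(outs):
    """Return the empirical covariance and its average along circular diagonals."""
    n, length = outs.shape
    cov = outs.T @ outs / n                            # outputs have zero mean
    t = np.arange(length)
    k = np.array([cov[t, (t + d) % length].mean() for d in range(length)])
    return cov, k

outs = sample_outputs()
cov, k = stationary_kernel(outs)
t = np.arange(cov.shape[0])
toeplitz = k[(t[None, :] - t[:, None]) % len(k)]       # K(t1 - t2) laid out as a matrix
print("normalized kernel K(d)/K(0), d = 0..4:", np.round(k[:5] / k[0], 3))
print("max deviation from stationarity:", np.abs(cov - toeplitz).max())
```

In this toy setting the deviation from the stationary matrix should shrink as `n_samples` grows, since it is only Monte Carlo error. The width `channels` controls how close the outputs are to Gaussian, not whether the covariance is stationary.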
* **The samples drawn from the DIP and GP prior with equivalent stationary kernel are shown below.**
* **The posterior mean estimated by SGD with the DIP matches the GP posterior mean as the number of channels in the network increases. However, posterior inference with long-tail GP kernels is slow for large images compared to SGD inference with the DIP.**
![Inpainting with a Gaussian process (GP) and deep image prior (DIP). Top (a) Comparison of the Radial basis function (RBF) kernel with the length scale learned on observed pixels in (c) and the stationary DIP kernel. Bottom (a) PSNR of the GP posterior with the DIP kernel and DIP as a function of the number of channels. DIP approaches the GP performance as the number of channels increases from 16 to 512. (d - f) Inpainting results (with the PSNR values) from GP with the RBF (GP RBF) and DIP (GP DIP) kernel, as well as the deep image prior. The DIP kernel is more effective than the RBF.](figs/GP-DIP.png)
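For readers unfamiliar with GP inpainting, the following is a minimal 1D sketch of the posterior computation. It uses an RBF kernel with a hand-picked length scale as a stand-in, since the stationary DIP kernel derived in the paper is not reproduced here. The function names, the sine test signal, and the noise level are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.1, variance=1.0):
    """Stationary RBF kernel: depends only on the difference a - b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(t_obs, y_obs, t_query, kernel, noise_var=1e-4):
    """Posterior mean and pointwise variance of a zero-mean GP."""
    K = kernel(t_obs, t_obs) + noise_var * np.eye(len(t_obs))
    Ks = kernel(t_query, t_obs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))   # K^{-1} y
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(kernel(t_query, t_query)) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)                          # clip round-off negatives

t = np.linspace(0.0, 1.0, 100)
signal = np.sin(2.0 * np.pi * t)
mask = np.ones(100, dtype=bool)
mask[35:65] = False                                            # "missing pixels" to inpaint

mean, var = gp_posterior(t[mask], signal[mask], t, rbf_kernel)
print("max error on observed points:", np.abs(mean[mask] - signal[mask]).max())
print("max error inside the hole   :", np.abs(mean[~mask] - signal[~mask]).max())
print("variance mid-hole vs. observed:", var[50], var[20])
```

The Cholesky factorization costs cubic time in the number of observed points, which is one reason exact GP inference becomes slow for large images.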
**2. SGLD: a Bayesian inference method for deep image prior**
* **Inference with SGD requires early stopping: the MSE with respect to the noisy input eventually goes to zero, so the output overfits the noise. SGLD does not require early stopping, and its posterior samples provide a notion of uncertainty.**
![Denoising and inpainting results with the deep image prior. (a) Mean Squared Error (MSE) of the
inferred image with respect to the noisy input image as a function of iteration for two different noise levels. SGD converges to zero MSE
resulting in overfitting while SGLD roughly converges to the noise level in the image. This is also illustrated in panel (b) where we plot
the MSE of SGD and SGLD as a function of the noise level $\sigma^2$ after convergence. (c) An inpainting result where parts of the image
inside the blue boundaries are masked out and inferred using SGLD with the deep image prior. (d) An estimate of the variance obtained from
posterior samples visualized as a heat map. Notice that the missing regions near the top left have lower variance as the area is uniform.](figs/abstract.png)
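The contrast between the two inference schemes can be illustrated on a toy problem with a known posterior. The sketch below is not the paper's setup: it treats the pixel values themselves as the parameters, with an assumed i.i.d. Gaussian prior and Gaussian noise, and it uses the full gradient at every step (no minibatching). All names and constants are illustrative. Gradient descent on the squared error alone reproduces the noisy input, while the Langevin samples concentrate around the analytic posterior mean and their spread gives a variance estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, tau2 = 50, 0.25, 1.0              # pixels, noise variance, prior variance
clean = rng.normal(0.0, np.sqrt(tau2), n)    # "clean" signal drawn from the prior
y = clean + rng.normal(0.0, np.sqrt(sigma2), n)

def grad_log_posterior(theta):
    # Gaussian likelihood N(y | theta, sigma2) plus Gaussian prior N(theta | 0, tau2).
    return (y - theta) / sigma2 - theta / tau2

# Plain gradient descent on the squared error alone: converges to y (fits the noise).
theta_sgd = np.zeros(n)
for _ in range(2000):
    theta_sgd -= 0.1 * (theta_sgd - y)

# Langevin updates: half a gradient step on the log posterior plus N(0, eps) noise.
eps, n_steps, burn_in = 0.01, 20000, 2000
theta = np.zeros(n)
samples = np.empty((n_steps - burn_in, n))
for i in range(n_steps):
    noise = np.sqrt(eps) * rng.standard_normal(n)
    theta = theta + 0.5 * eps * grad_log_posterior(theta) + noise
    if i >= burn_in:
        samples[i - burn_in] = theta

post_mean = samples.mean(axis=0)             # Monte Carlo posterior mean
post_var = samples.var(axis=0)               # per-pixel uncertainty estimate
exact_mean = y * tau2 / (tau2 + sigma2)      # closed form for this toy model
exact_var = tau2 * sigma2 / (tau2 + sigma2)

print("MSE to noisy input, SGD :", np.mean((theta_sgd - y) ** 2))
print("MSE to noisy input, SGLD:", np.mean((post_mean - y) ** 2))
print("mean sample variance vs. exact:", post_var.mean(), exact_var)
```

With a fixed step size the Langevin chain has a small discretization bias in its variance; a decaying step size or a smaller `eps` reduces it at the cost of slower mixing.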
* **SGLD performs better than vanilla gradient descent on image denoising and inpainting tasks. PSNR values (in dB) for various images are shown below. See the paper for details.**
| | House | Peppers | Lena | Baboon | F16 | Kodak1 | Kodak2 | Kodak3 | Kodak12 | Avg. |
|:------:|:-------:|:---------:|:-------:|:--------:|:-------:|:--------:|:--------:|:--------:|:---------:|:---------:|
| SGD | 26.74 | 28.42 | 29.17 | 23.50 | 29.76 | 26.61 | 28.68 | 30.07 | 29.78 | 28.08 |
| SGLD | **30.86** | **30.82** | **32.05** | **24.54** | **32.90** | **27.96** | **32.05** | **33.29** | **32.79** | **30.81** |
[Image denoising task]
Example (images not shown): Input, SGD (19.23 dB), SGLD (21.86 dB)
| | Barb. | Boat | House | Lena | Peppers | C.man | Couple | Finger | Hill | Man | Mont. | Avg. |
|:----:|:-------:|:-----:|:-----:|:-----:|:-------:|:-----:|:------:|:------:|:-----:|:-----:|:-------:|:-------:|
| SGD | 28.48 | 31.54 | 35.34 | 35.00 | 30.40 | 27.05 | 30.55 | 32.24 | 31.37 | 31.32 | 30.21 | 31.23 |
| SGLD | **33.82** | **34.26** | **40.13** | **37.73** | **33.97** | **30.33** | **33.72** | **33.41** | **34.03** | **33.54** | **34.65** | **34.51** |
[Image inpainting task]
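As a reminder of what the numbers in these tables measure, here is a small sketch of the standard PSNR definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The helper names and the assumption that images are scaled to $[0, 1]$ are illustrative; the evaluation code in the repository may differ in such details.

```python
import numpy as np

def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB; max_val is the assumed peak intensity."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")                      # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

def mse_ratio(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)

reference = np.zeros((4, 4))
estimate = np.full((4, 4), 0.1)                  # constant error of 0.1, so MSE = 0.01
print("PSNR (dB):", psnr(reference, estimate))   # about 20 dB
print("MSE ratio for the average denoising gain:", mse_ratio(30.81 - 28.08))
```

Because PSNR is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.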
Acknowledgements
==========================================================================================
This research was supported in part by NSF grants #1749833, #1749854, and #1661259, and by the MassTech Collaborative, which funded the UMass GPU cluster.