**Working Paper Series, Sveriges Riksbank (Central Bank of Sweden)**
# No 297:

SPEEDING UP MCMC BY EFFICIENT DATA SUBSAMPLING

*Matias Quiroz, Mattias Villani and Robert Kohn*

**Abstract:** The computing time for Markov Chain Monte Carlo (MCMC)
algorithms can be prohibitively large for datasets with many observations,
especially when the data density for each observation is costly to
evaluate. We propose a framework where the likelihood function is estimated
from a random subset of the data, resulting in substantially fewer density
evaluations. The data subsets are selected using an efficient Probability
Proportional-to-Size (PPS) sampling scheme, where the inclusion probability
of an observation is proportional to an approximation of its contribution
to the log-likelihood function. Three broad classes of approximations are
presented. The proposed algorithm is shown to sample from a distribution
that is within $O(m^{-1/2})$ of the true posterior, where m is the subsample
size. Moreover, the constant in the $O(m^{-1/2})$ error bound of the likelihood
is shown to be small and the approximation error is demonstrated to be
negligible even for a small m in our applications. We propose a simple way
to adaptively choose the sample size m during the MCMC to optimize sampling
efficiency for a fixed computational budget. The method is applied to a
bivariate probit model on a data set with half a million observations, and
on a Weibull regression model with random effects for discrete-time
survival data.
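
The PPS idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes sampling with replacement and the standard Hansen-Hurwitz estimator of a population total, and the model (`normal_logpdf`), the reference point `theta_ref` used to build the approximation, and the function name `pps_loglik_estimate` are all hypothetical choices made for illustration. The paper's own estimator and its three classes of approximations are not reproduced here.

```python
import math
import random


def normal_logpdf(y, theta):
    """Log-density of N(theta, 1) at y (illustrative model, not from the paper)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (y - theta) ** 2


def pps_loglik_estimate(loglik_fn, approx_terms, m, rng):
    """Hansen-Hurwitz estimate of sum_i loglik_fn(i) from m PPS draws.

    approx_terms[i] is a cheap approximation of observation i's
    log-likelihood contribution; draws are made with replacement with
    probability proportional to |approx_terms[i]|.
    """
    weights = [abs(a) for a in approx_terms]
    total = sum(weights)
    if total <= 0.0 or min(weights) <= 0.0:
        raise ValueError("every observation needs a positive selection weight")
    probs = [w / total for w in weights]
    draws = rng.choices(range(len(probs)), weights=probs, k=m)
    # Only the m sampled observations require an exact density evaluation.
    return sum(loglik_fn(i) / probs[i] for i in draws) / m


rng = random.Random(1)
y = [rng.gauss(1.0, 1.0) for _ in range(1000)]
theta, theta_ref = 1.0, 0.8  # theta_ref: hypothetical point where the approximation is built
approx = [normal_logpdf(yi, theta_ref) for yi in y]
exact = sum(normal_logpdf(yi, theta) for yi in y)
estimate = pps_loglik_estimate(lambda i: normal_logpdf(y[i], theta), approx, m=100, rng=rng)
print(f"exact={exact:.2f}  estimate={estimate:.2f}")
```

The closer the approximation tracks each observation's true contribution, the smaller the variance of the estimate; in the limiting case where the approximation equals the true terms and all terms share one sign, every draw returns the exact total.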

**Keywords:** Bayesian inference; Markov Chain Monte Carlo; Pseudo-marginal MCMC; Big Data; Probability Proportional-to-Size sampling; Numerical integration

**JEL-Codes:** C11; C13; C15; C55; C83

46 pages, March 1, 2015

**Full text versions of the paper:**

rap_wp297_150330.pdf
