Honglei Xie

Computational Inference in Logistic Regression

January 11, 2016 | 2 Minute Read

Logistic regression is one of the most commonly used techniques for analyzing binary data. The classical way to estimate the parameters is the Newton-Raphson method. Here I demonstrate an alternative, the Bayesian approach (MCMC), and compare the two on a real dataset.

The sediment dataset is composed of 1884 sediment samples. Samples were divided into 2 classes on the basis of their toxicity (the toxic class is coded as 1 and the non-toxic class as 0) and described by 9 chemical variables. Since we are interested in making statistical inference on the parameters, we only used the training set (1414 samples).

To build the benchmark model, I used the glm function in R to estimate the parameters. We know that the Newton-Raphson (NR) algorithm is used in this function. However, NR has limitations: it requires the Hessian matrix to be non-singular, and the initial values have a large effect on the convergence rate. Thus, I considered implementing Markov chain Monte Carlo (MCMC) to estimate the parameters instead.
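For reference, a minimal sketch of such a benchmark fit is shown below; the file name sediment_train.csv and the column name toxicity are placeholders, since the post does not list the actual variable names in the sediment data.

```r
# Benchmark: logistic regression fit with glm (Fisher scoring / Newton-Raphson-type updates).
# "sediment_train.csv" and "toxicity" are placeholder names for illustration only.
train <- read.csv("sediment_train.csv")
fit   <- glm(toxicity ~ ., data = train, family = binomial(link = "logit"))
summary(fit)$coefficients   # point estimates and standard errors for beta
```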

Proposed model

\[log(\frac{p_i}{1-p_i})=x_i' \boldsymbol{\beta}=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+...+\beta_9x_{i9}\]

\[Y_i{\sim}Bernoulli(p_i)\quad L(\boldsymbol{\beta})= \prod\limits_{i=1}^n {p_i}^ {y_i} (1-p_i)^{(1-y_i)}\] \[ l(\boldsymbol{\beta})=\sum\limits_{i=1}^n y_i log(p_i)+(1-y_i)log(1-p_i)\]
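As a minimal sketch (not the code used for the post), the log-likelihood above can be written as an R function; here X is assumed to be the design matrix with a leading column of ones and y the 0/1 response vector.

```r
# Log-likelihood l(beta) of the logistic regression model.
# X: n x (k+1) design matrix including an intercept column; y: 0/1 response vector.
loglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)        # linear predictor x_i' beta
  p   <- 1 / (1 + exp(-eta))     # p_i = exp(eta_i) / (1 + exp(eta_i))
  sum(y * log(p) + (1 - y) * log(1 - p))
}
```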

We can show that \(Var(\widehat{\boldsymbol{\beta}})=(X'WX)^{-1}\) where \(W\) is a diagonal matrix of weights with entries \[w_{ii}=\widehat{\mu}_i(n_i-\widehat{\mu}_i)/n_i\] Thus, with a flat prior on \(\boldsymbol{\beta}\), \(log(P(\beta \vert Y,X))=\sum\limits_{i=1}^n y_i log(p_i)+(1-y_i)log(1-p_i)\) up to an additive constant.
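One way to form the proposal precision \(H=X'WX\) is sketched below; it assumes the weights are evaluated at the glm estimates from the benchmark fit above (the post does not say where \(W\) is evaluated) and uses the Bernoulli case \(n_i=1\), so \(w_{ii}=\widehat{p}_i(1-\widehat{p}_i)\).

```r
# Proposal precision H = X'WX, evaluated at the benchmark glm estimates (an assumption).
X     <- model.matrix(fit)          # design matrix from the glm fit above
p_hat <- fitted(fit)                # fitted probabilities p_i-hat
W     <- diag(p_hat * (1 - p_hat))  # diagonal weights, Bernoulli case (n_i = 1)
H     <- t(X) %*% W %*% X
```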

At each iteration we draw a proposal \(\boldsymbol{\beta}^{\ast}\) from \(N(\boldsymbol{\beta}^{(t-1)},\mathbf{H}^{-1})\) where \(\mathbf{H}=X'WX\). We then apply the Metropolis-Hastings algorithm to sample \(\boldsymbol{\beta}\), which means that acceptance = min\((1,T(\beta^{(t-1)} \vert \beta^{\ast}) P(\beta ^{\ast})\Big/T(\beta^{\ast} \vert \beta^{(t-1)})P(\beta ^{(t-1)}))\). Since this Gaussian proposal is symmetric, the ratio \(T(\beta^{(t-1)} \vert \beta^{\ast})\big/T(\beta^{\ast} \vert \beta^{(t-1)})\) equals 1, so the acceptance probability reduces to the posterior ratio. Then we generate \(U{\sim}Unif(0,1)\); if \(U<\) acceptance, \(\beta^{(t)}=\beta^{\ast}\), otherwise \(\beta^{(t)}=\beta^{(t-1)}\).

To avoid the non-invertible matrix problem, we used the Cholesky decomposition in our code, which gives \(H=R'R\) and \(\beta^{\ast}=\beta^{(t-1)}+R^{-1}q\) where \(q{\sim}N(\mathbf{0},\mathbf{I_k})\).
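Putting the pieces together, a minimal random-walk Metropolis sketch in R is given below (the post's actual implementation is in Matlab); it reuses loglik and H from the sketches above, and under the flat prior and symmetric proposal the acceptance ratio reduces to a likelihood ratio. The column name toxicity is again a placeholder.

```r
# Random-walk Metropolis for beta with fixed proposal N(beta^(t-1), H^{-1}).
# A sketch only; not the Matlab code referenced in the post.
mh_logistic <- function(X, y, H, n_iter = 5000, beta_start = rep(0, ncol(X))) {
  k     <- ncol(X)
  R     <- chol(H)                         # H = R'R with R upper triangular
  beta  <- beta_start
  ll    <- loglik(beta, X, y)
  draws <- matrix(NA_real_, n_iter, k)
  for (t in seq_len(n_iter)) {
    q         <- rnorm(k)                  # q ~ N(0, I_k)
    beta_prop <- beta + backsolve(R, q)    # beta* = beta^(t-1) + R^{-1} q
    ll_prop   <- loglik(beta_prop, X, y)
    # Symmetric proposal + flat prior: acceptance = min(1, exp(ll_prop - ll)).
    if (log(runif(1)) < ll_prop - ll) {
      beta <- beta_prop
      ll   <- ll_prop
    }
    draws[t, ] <- beta
  }
  draws
}

# Example usage: posterior means after discarding a burn-in period.
draws <- mh_logistic(X, train$toxicity, H, n_iter = 10000)
colMeans(draws[-(1:1000), ])
```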

Results

Here are the results from the MCMC implementation in Matlab and the Newton-Raphson approach in R's glm function. If interested, please check out the code.

Thoughts of Mine

  • Estimates in logistic regression from the Newton-Raphson and MCMC algorithms are very close to each other. It turns out that the deterministic method and the Bayesian method agree quite well in this case.
  • The likelihood function converges after about 200 iterations, which shows that the parameters estimated by MCMC are stable.
  • MCMC is essentially a Bayesian method, so it has a few nice properties; for example, it can handle complex models that deterministic methods cannot solve, though at the expense of high computational cost and a relatively slow convergence rate.
  • Newton-Raphson is considered the best method for making inferences in ordinary logistic regression since it is fast and highly accurate. I demonstrated that we actually have an alternative approach to estimating the parameters in the logistic regression framework besides the routine Newton-Raphson. Moreover, when it comes to generalized linear mixed-effects models with a non-identity link, where there is no closed form for the full likelihood function, the Bayesian method seems to be the only choice.