Shota Gugushvili and Frank van der Meulen and Moritz Schauer and Peter Spreij (2019). Fast and scalable non-parametric Bayesian inference for Poisson point processes. RESEARCHERS.ONE, https://www.researchers.one/article/2019-06-6.

We study the problem of non-parametric Bayesian estimation of the intensity function of a Poisson point process. The observations are $n$ independent realisations of a Poisson point process on the interval $[0,T]$. We propose two related approaches. In both approaches we model the intensity function as piecewise constant on $N$ bins forming a partition of the interval $[0,T]$. In the first approach the coefficients of the intensity function are assigned independent gamma priors, leading to a closed form posterior distribution. On the theoretical side, we prove that as $n\rightarrow\infty,$ the posterior asymptotically concentrates around the ``true", data-generating intensity function at an optimal rate for $h$-H\"older regular intensity functions ($0 < h\leq 1$).
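The closed-form posterior in the first approach follows from standard Poisson-gamma conjugacy. The following is a minimal sketch and not the authors' implementation; it assumes equal-width bins, a Gamma(alpha, beta) prior in the shape-rate parameterisation, and the hypothetical function name `gamma_posterior`.

```python
import numpy as np

def gamma_posterior(counts, n, T, alpha=1.0, beta=1.0):
    """Posterior Gamma(shape, rate) parameters for each bin height.

    counts      : events per bin, pooled over the n realisations
    n           : number of independent realisations
    T           : length of the observation interval [0, T]
    alpha, beta : prior shape and rate (illustrative defaults, an assumption)
    """
    counts = np.asarray(counts, dtype=float)
    width = T / counts.size        # equal-width bins (an assumption)
    shape = alpha + counts         # prior shape plus observed events
    rate = beta + n * width        # prior rate plus total exposure of the bin
    return shape, np.full_like(counts, rate)

counts = np.array([12, 30, 18, 5])             # made-up pooled counts
shape, rate = gamma_posterior(counts, n=10, T=4.0)
post_mean = shape / rate                       # posterior mean of each bin height
```

Because each bin is updated independently, the cost is linear in $N$ once the counts are available.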

In the second approach we employ a gamma Markov chain prior on the coefficients of the intensity function. The posterior distribution is no longer available in closed form, but inference can be performed using a straightforward version of the Gibbs sampler. Both approaches scale well with sample size, but the second is much less sensitive to the choice of $N$.
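A Gibbs sampler of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the chain is parameterised here as $\psi_1 \sim \mathrm{Gamma}(\alpha_1, \beta_1)$, $\zeta_k \mid \psi_{k-1} \sim \mathrm{InvGamma}(a, a\psi_{k-1})$, $\psi_k \mid \zeta_k \sim \mathrm{Gamma}(a, a/\zeta_k)$ (shape-rate for gamma, shape-scale for inverse gamma), the smoothing parameter `a` is held fixed rather than given a hyperprior, bins have equal width, and all names are hypothetical.

```python
import numpy as np

def gmc_gibbs(counts, n, T, a=2.0, alpha1=1.0, beta1=1.0, iters=2000, seed=0):
    """Gibbs sampler for bin heights psi under a gamma Markov chain prior (N >= 2)."""
    rng = np.random.default_rng(seed)
    H = np.asarray(counts, dtype=float)
    N = H.size
    expo = n * T / N                     # exposure of each bin
    psi = (H + 1.0) / (expo + 1.0)       # strictly positive starting values
    zeta = np.ones(N)                    # zeta[k] links psi[k-1] and psi[k]; zeta[0] unused
    draws = np.empty((iters, N))
    for it in range(iters):
        # zeta_k | psi_{k-1}, psi_k ~ InvGamma(2a, scale = a * (psi_{k-1} + psi_k))
        scale = a * (psi[:-1] + psi[1:])
        zeta[1:] = scale / rng.gamma(2.0 * a, 1.0, size=N - 1)
        # psi_k | zeta, data ~ Gamma(shape_k, rate_k), built up term by term
        shape = H.copy()
        rate = np.full(N, expo)
        shape[0] += alpha1               # first bin: its own gamma prior
        rate[0] += beta1
        shape[1:] += a                   # link to the left neighbour via zeta_k
        rate[1:] += a / zeta[1:]
        shape[:-1] += a                  # link to the right neighbour via zeta_{k+1}
        rate[:-1] += a / zeta[1:]
        psi = rng.gamma(shape, 1.0 / rate)   # NumPy uses shape-scale
        draws[it] = psi
    return draws

counts = [50, 55, 60, 200, 210]          # made-up pooled counts
draws = gmc_gibbs(counts, n=10, T=5.0, iters=500)
post_mean = draws[100:].mean(axis=0)     # discard the first 100 draws as burn-in
```

Every full conditional is a gamma or inverse gamma distribution, so each sweep costs $O(N)$ and needs no tuning.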

Practical performance of our methods is first demonstrated via synthetic data examples. We compare our second method with other existing approaches on the UK coal mining disasters data. Furthermore, we apply it to the US mass shootings data and Donald Trump's Twitter data.

October 10, 2019 12:00 am

Dear Shota,

Sorry for such a long delay in sending my feedback; I see that it took me so long that you have already uploaded a revision. After looking at the feedback you got from Professor Samson and your subsequent revision, I think my comments are still relevant, so I'll share them with you.

Before getting into details, I want to say that this is a really nice paper. Despite discussing a challenging problem and proposing a non-trivial solution, the presentation is very clear. Speaking for those of us who read Bayesian nonparametrics only from time to time, I very much appreciate your efforts toward clarity.

In terms of the content, I really liked the first method proposed -- independent gamma priors -- because of how simple everything is. When you have a closed form for the posterior distribution, it is easy to do computations and to prove posterior concentration results with elementary arguments, e.g., Chebyshev's inequality.
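To illustrate why a closed-form posterior makes such arguments elementary, here is a per-bin sketch; it is not the proof given in the paper. The symbols are chosen for illustration: $\psi_k$ is the height of bin $k$, $\Delta_k$ its width, $H_k$ the number of events in it pooled over the $n$ realisations, and the prior is $\mathrm{Gamma}(\alpha, \beta)$ with rate $\beta$.

```latex
\psi_k \mid \text{data} \sim \mathrm{Gamma}(\alpha + H_k,\ \beta + n\Delta_k),
\qquad
\mathbb{E}[\psi_k \mid \text{data}] = \frac{\alpha + H_k}{\beta + n\Delta_k},
\qquad
\operatorname{Var}(\psi_k \mid \text{data}) = \frac{\alpha + H_k}{(\beta + n\Delta_k)^2},

\Pi\bigl( \lvert \psi_k - \mathbb{E}[\psi_k \mid \text{data}] \rvert \ge \varepsilon \,\big|\, \text{data} \bigr)
\le \frac{\alpha + H_k}{\varepsilon^2 (\beta + n\Delta_k)^2}.
```

Since $H_k$ grows proportionally to $n$, the bound is of order $1/(n\varepsilon^2)$ for a fixed bin, so the posterior for that bin concentrates around its mean.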

One quick question: Is it right that you can get the advertised rates without an extra logarithmic term? Usually, in these nonparametric problems, the minimax rate (often polynomial in n) gets corrupted by a logarithmic term, but you don't have this here. My guess is that it's because you're effectively assuming the Hölder smoothness "h" is KNOWN, since the bin number being used depends on that "h". Am I correct?

Anyway, I liked the independent gamma approach, so I was sad when you so quickly abandoned it for something (a little) more complicated. My understanding is that you don't stick with the independent gamma because there was too much sensitivity to the choice of the bin number. Am I right? What if you treat N as unknown, put a prior on it, and try to adapt to it? People often think that this class of models (mixing over a discrete model index) is computationally too expensive, but I haven't experienced that. And, in your case, the conditional posterior, given N, is so simple that you won't have any computational difficulties. I was able to do something like this a while back, but in the high-dimensional sparse linear regression setting:

http://arxiv.org/abs/1406.7718

In that regression context, people have now mostly given up on discrete model priors, but I found it to work really well, even in problems with (what I consider to be) very high dimensions. My student and I are finishing a paper soon on a similar approach to sparse Gaussian graphical models, where the graph that determines where the non-zero entries in the precision matrix go is sparse in the sense of having relatively few edges. There are some genuinely nonparametric problems, like the one you're considering, discussed here:

http://arxiv.org/abs/1604.05734

I think this would be an interesting thing to try -- we've been able to get adaptive results to unknown structure, e.g., smoothness, very much like what you're trying to do. I'd be happy to discuss more and/or send additional references if you're interested.

I should also say that the approach I've been considering in these contexts is not really "Bayesian" because I'm letting the data inform the prior in a specific way. This could also be of some interest in your Poisson process application.

October 29, 2019 12:00 am

**Summary**. In this manuscript the authors propose two Bayesian procedures to estimate the intensity function of an inhomogeneous Poisson process. The underlying assumption on the intensity function is that it is Hölder continuous. These procedures stem from two choices of priors supported on piecewise constant functions: a) i.i.d. Gamma priors on the values of the constant pieces, and b) Gamma Markov Chain prior on the values of the constant pieces. The former leads to a conjugate posterior and so inference is straightforward; the latter requires sampling to be carried out via Gibbs sampling.

In both cases the number of bins (constant pieces) needs to be selected. The authors derive the marginal likelihood for the prior in a), while for the prior in b) the authors select the number of bins to be fixed and instead put a hyper prior on one of the parameters of the prior to induce regularity.

For the posterior corresponding to the prior in a) the authors establish two theoretical results: they bound the risk of the posterior mean and they derive the posterior contraction rate. For the posterior corresponding to the prior in b) they present no theoretical results, but for both cases they report some numerical experiments.

The numerical results include both simulated data as well as three real data examples. For the synthetic data examples the authors exemplify how the estimates perform and report point-wise confidence intervals for the heights of the constant pieces as well. They exemplify how different choices for the number of bins and values of the hyper-parameters affect the results. For the real data examples there are some comparisons with existing methods and some interpretation of the results.

**Clarity**. The manuscript is written in a very clear way, and has appropriate length. The proofs are clear as well. The authors also provide enough context for the problem, and interpretation and intuition behind their approach.

**Originality**. The results are very closely related to already existing results in the literature (which you cite), particularly those about spline priors: the prior in Assumption 1 is just a spline prior of order 1 (or degree 0). I don’t think you should omit the results (since the proofs are very direct), but I think a better focus for the paper could have been on uncertainty quantification or, as you note in the discussion, on studying the posterior corresponding to the GMC prior, which has a correlation structure, or on including covariates in the model. Another possibility would be to look into spatial adaptivity, where the smoothness h of λ might depend on x, or perhaps change point detection.

Given how close the Poisson model is to a Gaussian model, and since uncertainty quantification results are known for the Gaussian model, it would have been interesting to report on whether the posterior corresponding to the Gamma prior in the Poisson model provides good uncertainty quantification or not (I suspect it might not, because most of the credible intervals seem way too narrow).

I think that if the posterior corresponding to the gamma prior does not provide good uncertainty quantification, then reporting that a conjugate prior has this intrinsic limitation would have been of interest. The message would then be that conjugate priors (which might seem like a default choice) are actually limited (qua uncertainty quantification.) And if it turns out that they are appropriate, then conditions under which this is true are also something worth reporting.

Having said that, the applications are interesting and well treated.

**Quality**. The presentation is good and polished and the results are presented in a clear fashion. The problem is interesting but somewhat limited in scope with the reasonable, but still somewhat limiting, assumption on the smoothness of λ. I would personally have focused on different aspects, such as uncertainty quantification. There is perhaps too much focus on the purported computational efficiency of the approach; the way I see it, under your prior, the binned observations {H_k, k = 1, ..., N} are sufficient statistics for estimating a stepwise approximation of λ (which is the best reconstruction of λ that you can build under your prior) and so, given that the heuristic for picking N seems to work well, any estimator based on the H_k should be fast.
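The reduction to binned counts can be illustrated with a short sketch. It assumes equal-width bins on [0, T] and event times stored as one list per realisation; the function name `binned_counts` and the data are made up for illustration.

```python
import numpy as np

def binned_counts(realisations, T, N):
    """Pool the event times of all realisations and count them per bin."""
    edges = np.linspace(0.0, T, N + 1)                  # N equal-width bins on [0, T]
    pooled = np.concatenate([np.asarray(r, dtype=float) for r in realisations])
    counts, _ = np.histogram(pooled, bins=edges)        # H_k for k = 1, ..., N
    return counts, edges

# Three made-up realisations on [0, 4]
realisations = [[0.2, 1.1, 3.7], [0.5, 2.9], [1.4, 1.6, 3.1, 3.9]]
counts, edges = binned_counts(realisations, T=4.0, N=4)
```

After this single pass over the data, any estimator that depends only on the H_k works with N numbers regardless of how many events were observed.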

**Conclusion**. To close, the paper is nicely written, but I think that it would make a better contribution if it focused on uncertainty quantification, inclusion of covariates, or the study of the asymptotics of the empirical Bayes choice of N. I think that these would have higher appeal.

September 28, 2019 4:06 am

Dear Adeline, thank you for your feedback; we have updated the manuscript. Here I attach a point-by-point response to your review.

December 17, 2019 3:56 pm

Dear Ryan,

Thanks for your interesting comments and for pointing out related references. Our approach aligns well with the viewpoint in https://arxiv.org/pdf/1604.05734.pdf that in high-dimensional problems the role of the prior is simply to facilitate efficient posterior inference. Below we provide responses to your specific queries.

1) In our experience, the log factor is often an artefact of the proof that goes via the entropy and prior mass route (the powerful and general-purpose Ghosal-Ghosh-van der Vaart machinery). With direct arguments, like those in our paper, it is at times possible to avoid a superfluous log factor in the polynomial rates. For another instance where log terms do not appear, see https://arxiv.org/pdf/1706.07449.pdf

2) Indeed, the results with an independent gamma prior are more sensitive to the choice of the number of bins. Furthermore, the gamma Markov chain prior gives "smoother" point estimates and marginal posterior bands, which look visually more pleasing than those based on the independent prior. This feature reflects better the smoothly varying character of our benchmark intensity functions.

3) Since for every fixed N the posterior based on the independent gamma prior is known explicitly, it is indeed not hard to implement an extension where N is equipped with a prior (the marginal likelihood is tractable). In the revised version of the manuscript, we added Appendix C dealing with this issue. Our preference is still with the GMC prior.
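As a rough illustration of why this extension is cheap, the sketch below computes a posterior over candidate values of N under the independent gamma prior. It is not the procedure of Appendix C and may differ from it. Assumptions: equal-width bins, a Gamma(alpha, beta) prior with rate beta, a uniform prior over the candidate values, and the point-process likelihood $\prod_k \psi_k^{H_k} e^{-n\Delta\psi_k}$ taken up to a factor that does not depend on N; all function names are hypothetical.

```python
import math
import numpy as np

def log_marginal(events, n, T, N, alpha=1.0, beta=1.0):
    """Log marginal likelihood of N bins, up to an N-independent constant."""
    counts, _ = np.histogram(events, bins=np.linspace(0.0, T, N + 1))
    exposure = n * T / N                       # n * Delta for equal-width bins
    per_bin = [
        alpha * math.log(beta) - math.lgamma(alpha)
        + math.lgamma(alpha + h) - (alpha + h) * math.log(beta + exposure)
        for h in counts
    ]
    return float(sum(per_bin))

def posterior_over_N(events, n, T, candidates, alpha=1.0, beta=1.0):
    """Posterior probabilities of the candidate N under a uniform prior."""
    logm = np.array([log_marginal(events, n, T, N, alpha, beta) for N in candidates])
    w = np.exp(logm - logm.max())              # subtract the max for numerical stability
    return w / w.sum()

# Made-up pooled event times from a single realisation on [0, 4]
events = np.array([0.10, 0.15, 0.20, 0.30, 0.35, 2.50, 3.80])
candidates = [1, 2, 4, 8]
probs = posterior_over_N(events, n=1, T=4.0, candidates=candidates)
```

Each candidate N requires only one histogram and a sum of log-gamma terms, so scanning a grid of values is inexpensive.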

As far as the posterior contraction rates are concerned, we agree that by equipping N with a prior it should be possible to get adaptive results for H\"{o}lder-smooth intensity functions. This, however, lies somewhat outside the scope of our predominantly computations-oriented paper. Manuscript length is also a consideration: in its present state, the paper is 45 pages long.

Sincerely,

Shota Gugushvili, Frank van der Meulen, Moritz Schauer, and Peter Spreij

December 17, 2019 4:00 pm

Dear Paulo,

Thanks for your interesting comments. A detailed response to your questions is given in the attached pdf file. We also revised and extended the manuscript.

Best wishes,

Shota, Frank, Moritz, and Peter

January 2, 2020 12:27 am

The paper is really nicely written and I'm tempted to try out your methods on some data of my own which I'm fascinated by, and which is of great importance right now, perhaps for correcting a miscarriage of justice; see https://www.math.leidenuniv.nl/~gill/Untitled_extended.pdf for a paper in the course of being written, and my recent talk https://www.math.leidenuniv.nl/~gill/bengeentalk.pdf

So I looked in the paper for a link to an R package but instead had to click several times and only then got to a GitHub page... I guess you are working on it, but please work on it fast! In the meantime maybe you can analyse my data yourself and show us what you see?

Anybody else interested? Let me know and I'll send you the data I have and give you more information, if you like. Google "Ben Geen serial killer nurse" in order to find out about the case - both sides of the story, so to speak.

The paper is interesting; I gave a few suggestions. I would be very interested in the generalization to the inclusion of covariates.