## Description

Fits an autoregressive Poisson or negative binomial model to a univariate or multivariate time series of counts. The characteristic feature of `hhh4`

models is the additive decomposition of the conditional mean into *epidemic* and *endemic* components (Held et al, 2005). Log-linear predictors of covariates and random intercepts are allowed in all components; see the Details below. A general introduction to the `hhh4`

modelling approach and its implementation is given in the `vignette("hhh4")`

. Meyer et al (2017, Section 5, available as `vignette("hhh4_spacetime")`

) describe `hhh4`

models for areal time series of infectious disease counts.

## Usage

`hhh4(stsObj, control = list( ar = list(f = ~ -1, offset = 1, lag = 1), ne = list(f = ~ -1, offset = 1, lag = 1, weights = neighbourhood(stsObj) == 1, scale = NULL, normalize = FALSE), end = list(f = ~ 1, offset = 1), family = c("Poisson", "NegBin1", "NegBinM"), subset = 2:nrow(stsObj), optimizer = list(stop = list(tol=1e-5, niter=100), regression = list(method="nlminb"), variance = list(method="nlminb")), verbose = FALSE, start = list(fixed=NULL, random=NULL, sd.corr=NULL), data = list(t = stsObj@epoch - min(stsObj@epoch)), keep.terms = FALSE ), check.analyticals = FALSE)`

## Value

`hhh4`

returns an object of class `"hhh4"`

, which is a list containing the following components:

- coefficients
named vector with estimated (regression) parameters of the model

- se
estimated standard errors (for regression parameters)

- cov
covariance matrix (for regression parameters)

- Sigma
estimated variance-covariance matrix of random effects

- Sigma.orig
estimated variance parameters on internal scale used for optimization

- Sigma.cov
inverse of marginal Fisher information (on internal scale), i.e., the asymptotic covariance matrix of

`Sigma.orig`

- call
the matched call

- dim
vector with number of fixed and random effects in the model

- loglikelihood
(penalized) loglikelihood evaluated at the MLE

- margll
(approximate) log marginal likelihood should the model contain random effects

- convergence
logical. Did optimizer converge?

- fitted.values
fitted mean values \(\mu_{i,t}\)

- control
control object of the fit

- terms
the terms object used in the fit if

`keep.terms = TRUE`

and`NULL`

otherwise- stsObj
the supplied

`stsObj`

- lags
named integer vector of length two containing the lags used for the epidemic components

`"ar"`

and`"ne"`

, respectively. The corresponding lag is`NA`

if the component was not included in the model.- nObs
number of observations used for fitting the model

- nTime
number of time points used for fitting the model

- nUnit
number of units (e.g. areas) used for fitting the model

- runtime
the

`proc.time`

-queried time taken to fit the model, i.e., a named numeric vector of length 5 of class`"proc_time"`

## Arguments

- stsObj
object of class

`"sts"`

containing the (multivariate) count data time series.- control
a list containing the model specification and control arguments:

`ar`

Model for the autoregressive component given as list with the following components:

- f = ~ -1
a formula specifying \(\log(\lambda_{it})\)

- offset = 1
optional multiplicative offset, either 1 or a matrix of the same dimension as

`observed(stsObj)`

- lag = 1
a positive integer meaning autoregression on \(y_{i,t-lag}\)

`ne`

Model for the neighbour-driven component given as list with the following components:

- f = ~ -1
a formula specifying \(\log(\phi_{it})\)

optional multiplicative offset, either 1 or a matrix of the same dimension as `observed(stsObj)`

a non-negative integer meaning dependency on \(y_{j,t-lag}\)

neighbourhood weights \(w_{ji}\). The default corresponds to the original formulation by Held et al (2005), i.e., the spatio-temporal component incorporates an unweighted sum over the lagged cases of the first-order neighbours. See Paul et al (2008) and Meyer and Held (2014) for alternative specifications, e.g., `W_powerlaw`

. Time-varying weights are possible by specifying an array of `dim()`

`c(nUnits, nUnits, nTime)`

, where `nUnits=ncol(stsObj)`

and `nTime=nrow(stsObj)`

.

optional matrix of the same dimensions as `weights`

(or a vector of length `ncol(stsObj)`

) to scale the `weights`

to `scale * weights`

.

logical indicating if the (scaled) `weights`

should be normalized such that each row sums to 1.

`end`

Model for the endemic component given as listwith the following components

- f = ~ 1
a formula specifying \(\log(\nu_{it})\)

optional multiplicative offset \(e_{it}\), either 1 or a matrix of the same dimension as `observed(stsObj)`

`family`

Distributional family -- either `"Poisson"`

,or the Negative Binomial distribution. For the latter, theoverdispersion parameter can be assumed to be the same for all units (`"NegBin1"`

), to vary freely over all units(`"NegBinM"`

), or to be shared by some units (specified by a factor of length `ncol(stsObj)`

such that its number of levels determines the number of overdispersion parameters). Note that `"NegBinM"`

is equivalent to `factor(colnames(stsObj), levels = colnames(stsObj))`

.

`subset`

Typically `2:nrow(obs)`

if model containsautoregression

`optimizer`

a list of three lists of control arguments.

The `"stop"`

list specifies two criteria for the outeroptimization of regression and variance parameters: the relative`tol`

erance for parameter change using the criterion `max(abs(x[i+1]-x[i])) / max(abs(x[i]))`

,and the maximum number `niter`

of outer iterations.

Control arguments for the single optimizers are specified in thelists named `"regression"`

and `"variance"`

.`method="nlminb"`

is the default optimizer for both (takingadvantage of the analytical Fisher information matrices), however,the `method`

s from `optim`

may also be specified(as well as `"nlm"`

but that one is not recommended here).Especially for the variance updates, Nelder-Mead optimization(`method="Nelder-Mead"`

) is an attractive alternative.All other elements of these two lists are passed as`control`

arguments to the chosen `method`

, e.g., if`method="nlminb"`

, adding `iter.max=50`

increases the maximum number of inner iterations from 20 (default) to 50.For `method="Nelder-Mead"`

, the respective argument iscalled `maxit`

and defaults to 500.

`verbose`

non-negative integer (usually in the range`0:3`

) specifying the amount of tracing information to beoutput during optimization.

`start`

a list of initial parameter values replacinginitial values set via `fe`

and `ri`

.Since surveillance 1.8-2, named vectors are matchedagainst the coefficient names in the model (where unmatchedstart values are silently ignored), and need not be complete,e.g., `start = list(fixed = c("-log(overdisp)" = 0.5))`

(default: 2) for a `family = "NegBin1"`

model.In contrast, an unnamed start vector must specify the full setof parameters as used by the model.

`data`

a named list of covariates that are to beincluded as fixed effects (see `fe`

) in any of the 3component formulae.By default, the time variable `t`

is available and used forseasonal effects created by `addSeason2formula`

.In general, covariates in this list can be either vectors oflength `nrow(stsObj)`

interpreted as time-varying butcommon across all units, or matrices of the same dimension asthe disease counts `observed(stsObj)`

.

`keep.terms`

logical indicating if the terms objectused in the fit is to be kept as part of the returned object.This is usually not necessary, since the terms object isreconstructed by the `terms`

-method for class`"hhh4"`

if necessary (based on `stsObj`

and`control`

, which are both part of the returned`"hhh4"`

object).

The auxiliary function `makeControl`

might be useful to create such a list of control parameters.

logical (or a subset of `c("numDeriv", "maxLik")`

), indicating if (how) the implemented analytical score vector and Fisher information matrix should be checked against numerical derivatives at the parameter starting values, using the packages numDeriv and/or maxLik. If activated, `hhh4`

will return a list containing the analytical and numerical derivatives for comparison (no ML estimation will be performed). This is mainly intended for internal use by the package developers.

## Author

Michaela Paul, Sebastian Meyer, Leonhard Held

## Details

An endemic-epidemic multivariate time-series model for infectious disease counts \(Y_{it}\) from units \(i=1,\dots,I\) during periods \(t=1,\dots,T\) was proposed by Held et al (2005) and was later extended in a series of papers (Paul et al, 2008; Paul and Held, 2011; Held and Paul, 2012; Meyer and Held, 2014). In its most general formulation, this so-called `hhh4`

(or HHH or \(H^3\) or triple-H) model assumes that, conditional on past observations, \(Y_{it}\) has a Poisson or negative binomial distribution with mean $$\mu_{it} = \lambda_{it} y_{i,t-1} + \phi_{it} \sum_{j\neq i} w_{ji} y_{j,t-1} + e_{it} \nu_{it} $$ In the case of a negative binomial model, the conditional variance is \(\mu_{it}(1+\psi_i\mu_{it})\) with overdispersion parameters \(\psi_i > 0\) (possibly shared across different units, e.g., \(\psi_i\equiv\psi\)). Univariate time series of counts \(Y_t\) are supported as well, in which case `hhh4`

can be regarded as an extension of `glm.nb`

to account for autoregression. See the Examples below for a comparison of an endemic-only `hhh4`

model with a corresponding `glm.nb`

.

The three unknown quantities of the mean \(\mu_{it}\),

\(\lambda_{it}\) in the autoregressive (

`ar`

) component,\(\phi_{it}\) in the neighbour-driven (

`ne`

) component, and\(\nu_{it}\) in the endemic (

`end`

) component,

are log-linear predictors incorporating time-/unit-specific covariates. They may also contain unit-specific random intercepts as proposed by Paul and Held (2011). The endemic mean is usually modelled proportional to a unit-specific offset \(e_{it}\) (e.g., population numbers or fractions); it is possible to include such multiplicative offsets in the epidemic components as well. The \(w_{ji}\) are transmission weights reflecting the flow of infections from unit \(j\) to unit \(i\). If weights vary over time (prespecified as a 3-dimensional array \((w_{jit})\)), the `ne`

sum in the mean uses \(w_{jit} y_{j,t-1}\). In spatial `hhh4`

applications, the “units” refer to geographical regions and the weights could be derived from movement network data. Alternatively, the weights \(w_{ji}\) can be estimated parametrically as a function of adjacency order (Meyer and Held, 2014), see `W_powerlaw`

.

(Penalized) Likelihood inference for such `hhh4`

models has been established by Paul and Held (2011) with extensions for parametric neighbourhood weights by Meyer and Held (2014). Supplied with the analytical score function and Fisher information, the function `hhh4`

by default uses the quasi-Newton algorithm available through `nlminb`

to maximize the log-likelihood. Convergence is usually fast even for a large number of parameters. If the model contains random effects, the penalized and marginal log-likelihoods are maximized alternately until convergence.

## References

Held, L., Höhle, M. and Hofmann, M. (2005): A statistical framework for the analysis of multivariate infectious disease surveillance counts. *Statistical Modelling*, **5** (3), 187-199. tools:::Rd_expr_doi("10.1191/1471082X05st098oa")

Paul, M., Held, L. and Toschke, A. M. (2008): Multivariate modelling of infectious disease surveillance data. *Statistics in Medicine*, **27** (29), 6250-6267. tools:::Rd_expr_doi("10.1002/sim.4177")

Paul, M. and Held, L. (2011): Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. *Statistics in Medicine*, **30** (10), 1118-1136. tools:::Rd_expr_doi("10.1002/sim.4177")

Held, L. and Paul, M. (2012): Modeling seasonality in space-time infectious disease surveillance data. *Biometrical Journal*, **54** (6), 824-843. tools:::Rd_expr_doi("10.1002/bimj.201200037")

Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. *The Annals of Applied Statistics*, **8** (3), 1612-1639. tools:::Rd_expr_doi("10.1214/14-AOAS743")

Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. *Journal of Statistical Software*, **77** (11), 1-55. tools:::Rd_expr_doi("10.18637/jss.v077.i11")

## See Also

See the special functions `fe`

, `ri`

and the examples below for how to specify unit-specific effects.

Further details on the modelling approach and illustrations of its implementation can be found in `vignette("hhh4")`

and `vignette("hhh4_spacetime")`

.

## Examples

`######################## Univariate examples######################### weekly counts of salmonella agona cases, UK, 1990-1995data("salmonella.agona")## convert old "disProg" to new "sts" data classsalmonella <- disProg2sts(salmonella.agona)salmonellaplot(salmonella)## generate formula for an (endemic) time trend and seasonalityf.end <- addSeason2formula(f = ~1 + t, S = 1, period = 52)f.end## specify a simple autoregressive negative binomial modelmodel1 <- list(ar = list(f = ~1), end = list(f = f.end), family = "NegBin1")## fit this model to the datares <- hhh4(salmonella, model1)## summarize the model fitsummary(res, idx2Exp=1, amplitudeShift=TRUE, maxEV=TRUE)plot(res)plot(res, type = "season", components = "end")### weekly counts of meningococcal infections, Germany, 2001-2006data("influMen")fluMen <- disProg2sts(influMen)meningo <- fluMen[, "meningococcus"]meningoplot(meningo)## again a simple autoregressive NegBin model with endemic seasonalitymeningoFit <- hhh4(stsObj = meningo, control = list( ar = list(f = ~1), end = list(f = addSeason2formula(f = ~1, S = 1, period = 52)), family = "NegBin1"))summary(meningoFit, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE)plot(meningoFit)plot(meningoFit, type = "season", components = "end")########################## Multivariate examples########################### bivariate analysis of influenza and meningococcal infections### (see Paul et al, 2008)plot(fluMen, same.scale = FALSE) ## Fit a negative binomial model with## - autoregressive component: disease-specific intercepts## - neighbour-driven component: only transmission from flu to men## - endemic component: S=3 and S=1 sine/cosine pairs for flu and men, respectively## - disease-specific overdispersionWfluMen <- neighbourhood(fluMen)WfluMen["meningococcus","influenza"] <- 0WfluMenf.end_fluMen <- addSeason2formula(f = ~ -1 + fe(1, which = c(TRUE, TRUE)), S = c(3, 1), period = 52)f.end_fluMenfluMenFit <- hhh4(fluMen, control = list( ar = list(f = ~ -1 + fe(1, unitSpecific = TRUE)), ne = list(f = ~ 1, weights = WfluMen), end = list(f = f.end_fluMen), family = "NegBinM"))summary(fluMenFit, idx2Exp=1:3)plot(fluMenFit, type = "season", components = "end", unit = 1)plot(fluMenFit, type = "season", components = "end", unit = 2)# \dontshow{ ## regression test for amplitude/shift transformation of sine-cosine pairs ## coefficients were wrongly matched in surveillance < 1.18.0 stopifnot(coef(fluMenFit, amplitudeShift = TRUE)["end.A(2 * pi * t/52).meningococcus"] == sqrt(sum(coef(fluMenFit)[paste0("end.", c("sin","cos"), "(2 * pi * t/52).meningococcus")]^2)))# }### weekly counts of measles, Weser-Ems region of Lower Saxony, Germanydata("measlesWeserEms")measlesWeserEmsplot(measlesWeserEms) # note the two districts with zero cases## we could fit the same simple model as for the salmonella cases abovemodel1 <- list( ar = list(f = ~1), end = list(f = addSeason2formula(~1 + t, period = 52)), family = "NegBin1")measlesFit <- hhh4(measlesWeserEms, model1)summary(measlesFit, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE)## but we should probably at least use a population offset in the endemic## component to reflect heterogeneous incidence levels of the districts,## and account for spatial dependence (here just using first-order adjacency)measlesFit2 <- update(measlesFit, end = list(offset = population(measlesWeserEms)), ne = list(f = ~1, weights = neighbourhood(measlesWeserEms) == 1))summary(measlesFit2, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE)plot(measlesFit2, units = NULL, hide0s = TRUE)## 'measlesFit2' corresponds to the 'measlesFit_basic' model in## vignette("hhh4_spacetime"). See there for further analyses,## including vaccination coverage as a covariate,## spatial power-law weights, and random intercepts.if (FALSE) {### last but not least, a more sophisticated (and time-consuming)### analysis of weekly counts of influenza from 140 districts in### Southern Germany (originally analysed by Paul and Held, 2011,### and revisited by Held and Paul, 2012, and Meyer and Held, 2014)data("fluBYBW")plot(fluBYBW, type = observed ~ time)plot(fluBYBW, type = observed ~ unit, ## mean yearly incidence per 100.000 inhabitants (8 years) population = fluBYBW@map$X31_12_01 / 100000 * 8)## For the full set of models for data("fluBYBW") as analysed by## Paul and Held (2011), including predictive model assessem*nt## using proper scoring rules, see the (computer-intensive)## demo("fluBYBW") script:demoscript <- system.file("demo", "fluBYBW.R", package = "surveillance")demoscript#file.show(demoscript)## Here we fit the improved power-law model of Meyer and Held (2014)## - autoregressive component: random intercepts + S = 1 sine/cosine pair## - neighbour-driven component: random intercepts + S = 1 sine/cosine pair## + population gravity with normalized power-law weights## - endemic component: random intercepts + trend + S = 3 sine/cosine pairs## - random intercepts are iid but correlated between componentsf.S1 <- addSeason2formula( ~-1 + ri(type="iid", corr="all"), S = 1, period = 52)f.end.S3 <- addSeason2formula( ~-1 + ri(type="iid", corr="all") + I((t-208)/100), S = 3, period = 52)## for power-law weights, we need adjaceny orders, which can be## computed from the binary adjacency indicator matrixnbOrder1 <- neighbourhood(fluBYBW)neighbourhood(fluBYBW) <- nbOrder(nbOrder1)## full model specificationfluModel <- list( ar = list(f = f.S1), ne = list(f = update.formula(f.S1, ~ . + log(pop)), weights = W_powerlaw(maxlag=max(neighbourhood(fluBYBW)), normalize = TRUE, log = TRUE)), end = list(f = f.end.S3, offset = population(fluBYBW)), family = "NegBin1", data = list(pop = population(fluBYBW)), optimizer = list(variance = list(method = "Nelder-Mead")), verbose = TRUE)## CAVE: random effects considerably increase the runtime of model estimation## (It is usually advantageous to first fit a model with simple intercepts## to obtain reasonable start values for the other parameters.)set.seed(1) # because random intercepts are initialized randomlyfluFit <- hhh4(fluBYBW, fluModel)summary(fluFit, idx2Exp = TRUE, amplitudeShift = TRUE)plot(fluFit, type = "fitted", total = TRUE)plot(fluFit, type = "season")range(plot(fluFit, type = "maxEV"))plot(fluFit, type = "maps", prop = TRUE)gridExtra::grid.arrange( grobs = lapply(c("ar", "ne", "end"), function (comp) plot(fluFit, type = "ri", component = comp, main = comp, exp = TRUE, sub = "multiplicative effect")), nrow = 1, ncol = 3)plot(fluFit, type = "neweights", xlab = "adjacency order")}########################################################################## An endemic-only "hhh4" model can also be estimated using MASS::glm.nb########################################################################## weekly counts of measles, Weser-Ems region of Lower Saxony, Germanydata("measlesWeserEms")## fit an endemic-only "hhh4" model## with time covariates and a district-specific offsethhh4fit <- hhh4(measlesWeserEms, control = list( end = list(f = addSeason2formula(~1 + t, period = measlesWeserEms@freq), offset = population(measlesWeserEms)), ar = list(f = ~-1), ne = list(f = ~-1), family = "NegBin1", subset = 1:nrow(measlesWeserEms)))summary(hhh4fit)## fit the same model using MASS::glm.nbmeaslesWeserEmsData <- as.data.frame(measlesWeserEms, tidy = TRUE)measlesWeserEmsData$t <- c(hhh4fit$control$data$t)glmnbfit <- MASS::glm.nb( update(formula(hhh4fit)$end, observed ~ . + offset(log(population))), data = measlesWeserEmsData)summary(glmnbfit)## Note that the overdispersion parameter is parametrized inversely.## The likelihood and point estimates are all the same.## However, the variance estimates are different: in glm.nb, the parameters## are estimated conditional on the overdispersion theta.# \dontshow{stopifnot( all.equal(logLik(hhh4fit), logLik(glmnbfit)), all.equal(1/coef(hhh4fit)[["overdisp"]], glmnbfit$theta, tolerance = 1e-6), all.equal(coef(hhh4fit)[1:4], coef(glmnbfit), tolerance = 1e-6, check.attributes = FALSE), all.equal(c(residuals(hhh4fit)), residuals(glmnbfit), tolerance = 1e-6, check.attributes = FALSE))# }`

Run the code above in your browser using DataLab