Out-of-sample state decoding using a frozen HMM
Source: R/run_out_of_sample_decoding.R
This function applies the parameters of a previously fitted hidden Markov model
(an epiquest_hmm object) to new data. It freezes the HMM parameters
(transition probabilities, emission distributions and initial state
distribution) and decodes the hidden states for the new data without
re-estimating the model.
Arguments
- obs_data_new
A data.frame containing the new observations. Must contain an index column and the same response columns (e.g., rate or num/denom) used to fit hmm_frozen. If seasonal = TRUE, it must also contain a season column.
- hmm_frozen
An object of class epiquest_hmm. This is the fitted model object containing the parameters to be applied to the new data.
- seasonal
A logical. If TRUE, the function respects season boundaries in obs_data_new to prevent transitions between different seasons.
Value
An object of class epiquest_decode. This is a list containing:
- states: The input obs_data_new with added columns for the predicted state and method used (see Details below).
- probs: For methods 'local' and 'filtering', the posterior probabilities for each of the states.
Details
The function initializes a new depmix model using the structure of
obs_data_new and then uses depmixS4::setpars to hard-code the
parameters from hmm_frozen.
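
The freezing step described above can be sketched directly with depmixS4. This is a minimal illustration, not the package's internal code: the simulated data, the helper sim_rates(), the two-state Gaussian model and all object names are assumptions made for the example. Only depmix(), fit(), getpars(), setpars(), viterbi() and forwardbackward() come from depmixS4. In depmixS4, ntimes may hold one length per independent series, which is one plausible way to keep seasons separate, but whether this function does so internally is not stated here.

```r
library(depmixS4)

set.seed(1)

# Hypothetical stand-in for surveillance data: low, high, then low rates
sim_rates <- function(n) {
  s <- rep(c(1, 2, 1), times = c(n %/% 3, n %/% 3, n - 2 * (n %/% 3)))
  data.frame(index = seq_len(n),
             rate  = rnorm(n, mean = c(1, 5)[s], sd = c(0.5, 1)[s]))
}
train <- sim_rates(150)  # historical data used for fitting
new   <- sim_rates(30)   # new data to be decoded

# 1. Fit a 2-state Gaussian HMM on the historical data
mod_train <- depmix(rate ~ 1, data = train, nstates = 2,
                    family = gaussian(), ntimes = nrow(train))
fit_train <- fit(mod_train, verbose = FALSE)

# 2. Build an unfitted model with the same structure on the new data
mod_new <- depmix(rate ~ 1, data = new, nstates = 2,
                  family = gaussian(), ntimes = nrow(new))

# 3. Copy the fitted parameters into the new model (no call to fit())
mod_new <- setpars(mod_new, getpars(fit_train))

# 4. Decode the new data with the frozen parameters
global_states <- viterbi(mod_new)$state          # most likely state sequence
local_probs   <- forwardbackward(mod_new)$gamma  # smoothing probabilities
```

Because fit() is never called on mod_new, the new observations cannot alter the transition, emission or initial-state parameters.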
This is particularly useful for prospective surveillance, where you want to classify new weekly data using a model trained on historical seasons without the new data points influencing the previously trained model.
The states can be decoded according to different methods:
- 'global': Uses the Viterbi algorithm to find the single most likely sequence of states across the entire time series. This approach outputs a single state for each data point. There are no associated probabilities.
- 'local': Assigns the state with the highest posterior smoothing probability at each specific time point.
- 'filtering': Similar to 'local', but performs real-time decoding, where the state at time t is determined only by observations from time 1 to t.
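
To make the three methods concrete, the following base R sketch computes all of them for a toy two-state Gaussian HMM with fixed parameters. The parameter values, the observations and all object names are invented for illustration. It is a textbook forward-backward and Viterbi implementation, not the code this function runs.

```r
# Hypothetical frozen parameters (illustrative values only)
init  <- c(0.9, 0.1)                             # initial state distribution
trans <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE)  # trans[i, j] = P(j at t+1 | i at t)
mu    <- c(1, 5)                                 # emission means per state
sigma <- c(1, 1.5)                               # emission SDs per state

obs <- c(0.8, 1.4, 2.9, 4.6, 5.2, 3.1, 1.2)
n <- length(obs)
K <- length(mu)

# dens[t, k]: density of obs[t] under state k
dens <- outer(obs, seq_len(K), function(y, k) dnorm(y, mu[k], sigma[k]))

# Forward pass: alpha[t, ] = P(state at t | obs[1:t]), the filtering probabilities
alpha <- matrix(0, n, K)
a <- init * dens[1, ]
alpha[1, ] <- a / sum(a)
for (t in seq_len(n)[-1]) {
  a <- as.vector(alpha[t - 1, ] %*% trans) * dens[t, ]
  alpha[t, ] <- a / sum(a)
}

# Backward pass (rescaled at each step; the scaling cancels in gamma)
beta <- matrix(1, n, K)
for (t in rev(seq_len(n - 1))) {
  b <- as.vector(trans %*% (dens[t + 1, ] * beta[t + 1, ]))
  beta[t, ] <- b / sum(b)
}

# gamma[t, ] = P(state at t | all obs), the smoothing probabilities
gamma <- alpha * beta
gamma <- gamma / rowSums(gamma)

# Viterbi in log space: single most likely state sequence
delta <- matrix(-Inf, n, K)
back  <- matrix(0L, n, K)
delta[1, ] <- log(init) + log(dens[1, ])
for (t in seq_len(n)[-1]) {
  for (j in seq_len(K)) {
    cand <- delta[t - 1, ] + log(trans[, j])
    back[t, j]  <- which.max(cand)
    delta[t, j] <- max(cand) + log(dens[t, j])
  }
}
path <- integer(n)
path[n] <- which.max(delta[n, ])
for (t in rev(seq_len(n - 1))) path[t] <- back[t + 1, path[t + 1]]

decoded <- data.frame(
  obs       = obs,
  global    = path,                       # Viterbi sequence
  local     = apply(gamma, 1, which.max), # argmax of smoothing probabilities
  filtering = apply(alpha, 1, which.max)  # argmax of filtering probabilities
)
```

At the last time point no later observations exist, so the filtering and smoothing probabilities coincide there.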
Each of 'local' and 'filtering' has advantages and disadvantages. Filtering assigns the state at week t without information from later weeks, so it does not use all available information; in return, when it is recomputed every week as new data arrive, earlier state assignments do not change. Local state assignments can change: they are only stable once the data for the following week are available.
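
The stability difference can be checked with a small simulation of prospective use, in which decoding is repeated each week on the data available so far. The sketch below is written in base R under invented parameters and observations; hmm_probs() and revised() are hypothetical helpers, not package functions.

```r
# Hypothetical frozen parameters for a 2-state Gaussian HMM (illustrative values)
init  <- c(0.9, 0.1)
trans <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE)
mu    <- c(1, 5)
sigma <- c(1, 1.5)

# Filtering and smoothing probabilities for the observations available so far
hmm_probs <- function(obs) {
  n <- length(obs)
  K <- length(mu)
  dens <- outer(obs, seq_len(K), function(y, k) dnorm(y, mu[k], sigma[k]))
  alpha <- matrix(0, n, K)
  beta  <- matrix(1, n, K)
  a <- init * dens[1, ]
  alpha[1, ] <- a / sum(a)
  for (t in seq_len(n)[-1]) {                 # forward pass
    a <- as.vector(alpha[t - 1, ] %*% trans) * dens[t, ]
    alpha[t, ] <- a / sum(a)
  }
  for (t in rev(seq_len(n - 1))) {            # backward pass
    b <- as.vector(trans %*% (dens[t + 1, ] * beta[t + 1, ]))
    beta[t, ] <- b / sum(b)
  }
  gamma <- alpha * beta
  list(filtering = alpha, local = gamma / rowSums(gamma))
}

obs <- c(1.0, 1.2, 3.0, 5.0, 5.3, 4.8, 1.1, 0.9)
n <- length(obs)

# hist[w, t]: state assigned to week t when data through week w are available
filt_hist  <- matrix(NA_integer_, n, n)
local_hist <- matrix(NA_integer_, n, n)
for (w in seq_len(n)) {
  p <- hmm_probs(obs[seq_len(w)])
  filt_hist[w, seq_len(w)]  <- apply(p$filtering, 1, which.max)
  local_hist[w, seq_len(w)] <- apply(p$local, 1, which.max)
}

# Weeks whose assignment differed between any two decoding runs
revised <- function(hist) {
  which(apply(hist, 2, function(x) length(unique(na.omit(x))) > 1))
}
filtering_revised <- revised(filt_hist)   # always empty: alpha at t uses obs[1:t] only
local_revised     <- revised(local_hist)  # may be non-empty for ambiguous weeks
```

Each column of filt_hist is constant once filled, because the filtering probability at week t never depends on later observations. Columns of local_hist may differ between rows when a later observation shifts the smoothing probability of an ambiguous week.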
The plots generated by create_hmm_plots() and create_loop_plots() all use 'local'
state assignments.