Robust and resilient hidden sequence decoding in noisy data with enhanced HMM and dynamic NN ensembles

Document Type

Article

Publication Date

7-1-2025

Publication Title

International Journal of Data Science and Analytics

Abstract

The hidden Markov model (HMM) is a powerful tool for modeling sequential data in fields such as bioinformatics, speech recognition, natural language processing, and finance. However, sequential data are often noisy and incomplete, posing significant challenges for traditional HMMs. To address these issues, we propose a novel approach that integrates a Dynamic-Weighted Sequential Ensemble of Neural Network models (DWSE-NN) and an Enhanced Adaptive-HMM (EA-HMM). Our DWSE-NN framework reduces bias from previous ensemble predictions, while the EA-HMM, based on modified Baum–Welch (MBW) and Posterior-Viterbi (PV) algorithms, dynamically adjusts transition and emission probabilities to achieve improved decoding accuracy and robustness.

Our ensemble model demonstrates substantial improvements over AdaBoost sequential ensembles across multiple datasets and metrics. For instance, on Dataset 1, DWSE-RNN achieved 98.34% accuracy, 98.30% precision, 98.33% recall, and an F1 score of 98.25% under noisy test conditions, with a loss of 0.57, significantly surpassing AdaBoost's 74.18% accuracy, 80.47% precision, 74.18% recall, and 75.95% F1 score with a loss of 8.92. Similarly, on Dataset 2, DWSE-RNN outperformed AdaBoost, achieving 89.02% accuracy and F1 score, compared to AdaBoost's 66.57% accuracy and 67.23% F1 score. On Dataset 3, DWSE-RNN maintained resilience, achieving 50.04% accuracy, 48.76% precision, 52.27% recall, and a 49.81% F1 score under noisy conditions, compared to AdaBoost's 33.85% accuracy and 34.05% F1 score.

To further assess decoding performance, we introduce a 'reconstruction loss' metric, which evaluates how well the EA-HMM captures underlying state transitions and emission probabilities compared to traditional HMM models. For instance, DWSE-DNN with the MBW and PV configuration demonstrated resilience with a reconstruction loss of 0.05 on Dataset 1, 0.07 on Dataset 2, and 0.10 on Dataset 3. These results validate our model's robustness, making it highly suitable for applications requiring accurate and resilient sequence decoding in challenging data environments.
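As background for readers unfamiliar with HMM decoding: the EA-HMM's Posterior-Viterbi variant builds on the classical Viterbi algorithm, which recovers the most likely hidden state path given transition and emission probabilities. A minimal log-space sketch of standard Viterbi decoding follows; this is not the paper's EA-HMM, and the toy two-state parameters are illustrative assumptions only.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Classical Viterbi decoding: most likely hidden state path.

    obs : sequence of observation indices
    pi  : (S,) initial state distribution
    A   : (S, S) transitions, A[i, j] = P(state j | state i)
    B   : (S, O) emissions, B[i, k] = P(observation k | state i)
    """
    S, T = len(pi), len(obs)
    # Work in log space for numerical stability on long sequences.
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]   # best log-prob of a path ending in each state
    psi = np.zeros((T, S), dtype=int)      # backpointers
    for t in range(1, T):
        scores = delta[:, None] + logA     # scores[i, j]: come from i, move to j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Hypothetical 2-state example (state 0 = rainy, state 1 = sunny;
# observations 0 = walk, 1 = shop, 2 = clean).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
print(viterbi([0, 1, 2], pi, A, B))  # → [1, 0, 0]
```

The paper's EA-HMM differs by dynamically adjusting `A` and `B` during decoding and by scoring states with posterior probabilities rather than the single best path shown here.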
