Inside Prometheus v4: A 5-Model Deep Learning Ensemble for Threat Detection
Why One Model Is Not Enough
Cybersecurity threats are not a single classification problem. A brute force attack looks nothing like DNS tunneling. Ransomware file mutations have no behavioral overlap with beaconing callbacks. A single neural network trained on all threat types inevitably makes trade-offs: it learns the most common patterns well but struggles with edge cases and novel attack variants.
Prometheus v4 addresses this with a 5-model deep learning ensemble. Each model is a specialist, trained to excel at a specific aspect of threat analysis. Their outputs are combined by a meta-learner that produces the final classification. The result is a system that is simultaneously a deep generalist and a collection of focused experts.
The Five Models
Model 1: DeepThreatClassifier (Residual Network). The backbone of the ensemble is a deep residual network that classifies events into 8 threat categories: brute force, beaconing, ransomware, DDoS, malware, phishing, data exfiltration, and benign. Residual connections (skip connections) allow the network to learn identity mappings, mitigating the vanishing-gradient problem that plagues very deep networks. This model processes 25-dimensional feature vectors and outputs class probabilities. Test accuracy: 99.83% (on honeypot test data).
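The effect of a skip connection can be sketched in a few lines of NumPy. This is an illustrative forward pass only, not the production architecture; the layer widths and weight values here are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, b1, W2, b2):
    """One residual block computing relu(F(x) + x).

    The identity path (+ x) lets gradients flow through the block
    unchanged, which is what mitigates vanishing gradients when many
    such blocks are stacked."""
    h = relu(x @ W1 + b1)        # first transform
    h = h @ W2 + b2              # second transform (same width as x)
    return relu(h + x)           # skip connection: add the input back

# Illustrative 25-dim input, matching the feature vector size above
dim = 25
x = rng.standard_normal(dim)
W1 = rng.standard_normal((dim, dim)) * 0.1
W2 = rng.standard_normal((dim, dim)) * 0.1
b1, b2 = np.zeros(dim), np.zeros(dim)

out = residual_block(x, W1, b1, W2, b2)
```

Because the identity path preserves dimensionality, blocks like this can be stacked dozens deep without the output collapsing toward zero.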
Model 2: BehavioralSequenceModel (LSTM). Threats are not isolated events — they are sequences of behaviors. The LSTM (Long Short-Term Memory) model analyzes temporal patterns across event sequences. It excels at detecting slow-and-low attacks that unfold over hours: gradual credential stuffing, periodic beaconing, and staged data exfiltration. The LSTM's gated memory cells allow it to remember relevant context from much earlier in a sequence while ignoring noise. Test accuracy: 99.83% (on honeypot test data).
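The gated memory cells mentioned above follow the standard LSTM update equations. The single-step sketch below uses NumPy with illustrative sizes (25-dim events, 16 hidden units); the actual model's dimensions and weights are not published here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D) input weights, U: (4H, H) recurrent
    weights, b: (4H,) bias; gates stacked as [input, forget, cell, output]."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])           # input gate: how much new content to write
    f = sigmoid(z[H:2*H])         # forget gate: how much old memory to keep
    g = np.tanh(z[2*H:3*H])       # candidate cell content
    o = sigmoid(z[3*H:4*H])       # output gate: how much memory to expose
    c_new = f * c + i * g         # gated memory carries long-range context
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(1)
D, H = 25, 16                     # illustrative sizes, not the production model
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for _ in range(10):               # walk a 10-event sequence
    x = rng.standard_normal(D)
    h, c = lstm_step(x, h, c, W, U, b)
```

The forget gate is what makes slow-and-low detection possible: when it stays near 1, the cell state carries evidence from events hours earlier while the input gate discards irrelevant noise.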
Model 3: MalwareFeatureClassifier. A specialized feed-forward network optimized for static feature analysis. While the other models focus on behavioral patterns, this model analyzes intrinsic properties of events: entropy scores, file sizes, permission patterns, and process trees. It catches threats that look behaviorally normal but have anomalous static characteristics, such as a process with unusual memory allocations or a file with suspiciously high entropy. Test accuracy: 99.83% (on honeypot test data).
Model 4: EnsembleScorer (Stacking Meta-Learner). The meta-learner takes the outputs of the first three models as input and learns the optimal way to combine their predictions. It is trained using out-of-fold stacking: each training sample's meta-features come from a model that never saw that sample during training, preventing information leakage. The meta-learner has 49,000 parameters and learns which specialist to trust for which type of event. Test accuracy: 99.84% (on honeypot test data).
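Out-of-fold stacking can be summarized in a short NumPy sketch. The base model here is a trivial placeholder (it predicts the training-set mean); the point is the fold structure, which guarantees each sample's meta-feature comes from a model that never trained on it:

```python
import numpy as np

def out_of_fold_meta_features(X, y, train_fn, predict_fn, n_folds=5):
    """Build meta-features where each sample's prediction comes from a
    model fit without that sample, so no sample contributes to both its
    own meta-features and the meta-learner's training data."""
    n = len(X)
    meta = np.zeros(n)
    folds = np.array_split(np.arange(n), n_folds)
    for fold in folds:
        mask = np.ones(n, dtype=bool)
        mask[fold] = False                       # hold this fold out
        model = train_fn(X[mask], y[mask])       # train on the other folds
        meta[fold] = predict_fn(model, X[fold])  # predict only the held-out fold
    return meta

# Toy base model: predicts the mean label of its training data (illustrative)
train_fn = lambda X, y: y.mean()
predict_fn = lambda model, X: np.full(len(X), model)

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 25))
y = rng.integers(0, 2, 100).astype(float)
meta = out_of_fold_meta_features(X, y, train_fn, predict_fn)
```

In the real ensemble, `train_fn`/`predict_fn` would stand in for the three specialist networks, and the resulting meta-features train the 49,000-parameter meta-learner.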
Model 5: AnomalyAutoencoder. The autoencoder takes a fundamentally different approach. Instead of classifying events into categories, it learns what “normal” looks like. Events that deviate significantly from the learned normal pattern produce high reconstruction error and are flagged as anomalous. This model catches zero-day attacks and novel threat types that none of the classifiers have been trained on. Detection rate: 65.4% attack true positive, 97.9% benign true negative (2.1% FP), with a reconstruction error threshold of 0.004704.
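The decision rule is simple: score an event by how poorly the autoencoder reconstructs it, and flag anything above the threshold. The sketch below uses the published threshold but substitutes a fixed linear bottleneck for the trained network, purely to make the mechanism concrete:

```python
import numpy as np

THRESHOLD = 0.004704   # reconstruction-error cutoff cited in the article

def reconstruction_error(x, encode, decode):
    """Mean squared error between an event and its reconstruction."""
    x_hat = decode(encode(x))
    return float(np.mean((x - x_hat) ** 2))

def is_anomalous(x, encode, decode, threshold=THRESHOLD):
    """Events the autoencoder cannot reconstruct well are flagged,
    regardless of what the classifiers say."""
    return reconstruction_error(x, encode, decode) > threshold

# Illustrative stand-in for the trained model: a linear 25 -> 8 -> 25
# bottleneck built from a fixed orthonormal projection
rng = np.random.default_rng(3)
P, _ = np.linalg.qr(rng.standard_normal((25, 8)))
encode = lambda x: x @ P                # project to the 8-dim bottleneck
decode = lambda z: z @ P.T              # map back to 25 dims

normal = P @ rng.standard_normal(8)     # lies inside the learned subspace
novel = rng.standard_normal(25)         # generic event, mostly outside it
```

An event the model has "seen the shape of" reconstructs almost perfectly; a novel event leaves a large residual, which is exactly how zero-days that fool every classifier still get flagged.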
Leakage-Free Validation
Model accuracy numbers are meaningless if the evaluation is contaminated. Data leakage — where information from the test set inadvertently influences training — is the most common source of inflated accuracy in ML security research. We eliminate leakage through strict separation: training, validation, and test sets are split before any preprocessing, feature extraction, or balancing. The ensemble meta-learner uses out-of-fold predictions exclusively, ensuring no sample ever contributes to both its own meta-features and the meta-learner's training data.
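The split-before-preprocessing discipline looks like this in miniature (a hypothetical helper, not the production pipeline): the split happens first, and the normalization statistics are computed from training data only, then applied to the test set.

```python
import numpy as np

def leakage_free_split(X, y, test_frac=0.1, seed=0):
    """Split BEFORE any preprocessing. The scaler (standing in for any
    transform: feature extraction, balancing, etc.) is fit on training
    data only, so test-set statistics never influence training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    X_train, X_test = X[train_idx], X[test_idx]
    mu = X_train.mean(axis=0)           # statistics from training data only
    sd = X_train.std(axis=0) + 1e-9
    return (X_train - mu) / sd, (X_test - mu) / sd, y[train_idx], y[test_idx]

rng = np.random.default_rng(4)
X = rng.standard_normal((1000, 25)) * 3 + 2
y = rng.integers(0, 8, 1000)
X_tr, X_te, y_tr, y_te = leakage_free_split(X, y)
```

Doing the scaling (or balancing, or deduplication) before the split would let test-set statistics leak into training, which is precisely the contamination described above.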
Our training pipeline processes 987,549 unique samples (deduplicated from 17.6M raw events), balanced to 1,088,664 through class-aware sampling. The held-out test set of 108,867 samples provides the accuracy figures reported above.
How They Work Together
When an event enters the detection pipeline, feature extraction produces a 25-dimensional vector. This vector is fed simultaneously to all five models. The DeepThreatClassifier, BehavioralSequenceModel, and MalwareFeatureClassifier each produce class probability distributions. These distributions become meta-features for the EnsembleScorer, which outputs the final classification and confidence score. In parallel, the AnomalyAutoencoder computes reconstruction error. If the reconstruction error exceeds the threshold, the event is flagged as anomalous regardless of the classifier outputs.
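The decision flow just described can be condensed into one function. Every callable below is a toy placeholder (random-weight softmax models over 8 classes, a mean-square anomaly score, an arbitrary threshold), shown only to make the data flow explicit:

```python
import numpy as np

def classify_event(features, base_models, meta_learner, anomaly_score, threshold):
    """Sketch of the pipeline: three specialists produce probability
    distributions, the meta-learner combines them, and the anomaly
    check runs independently of the classifiers."""
    probs = [m(features) for m in base_models]        # three specialist opinions
    meta_features = np.concatenate(probs)             # stacked into meta-features
    final = meta_learner(meta_features)               # meta-learner combines them
    label = int(np.argmax(final))
    confidence = float(final[label])
    anomalous = anomaly_score(features) > threshold   # independent anomaly flag
    return label, confidence, anomalous

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy stand-ins: 3 base models and a meta-learner over 8 classes
rng = np.random.default_rng(5)
base_models = [lambda x, W=rng.standard_normal((8, 25)): softmax(W @ x)
               for _ in range(3)]
meta_learner = lambda m, W=rng.standard_normal((8, 24)): softmax(W @ m)
anomaly_score = lambda x: float(np.mean(x ** 2))

label, conf, flagged = classify_event(rng.standard_normal(25), base_models,
                                      meta_learner, anomaly_score, threshold=5.0)
```

Note that the anomaly flag is computed from the raw features, not from the classifier outputs, so an event can be flagged even when the meta-learner reports a confident benign classification.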
This architecture provides defense in depth at the model level. A novel attack might fool one classifier, but fooling three classifiers, a meta-learner, and an anomaly detector simultaneously is orders of magnitude harder. The total parameter count across all five models is 7,015,729 — large enough to capture complex patterns but small enough for real-time inference on GPU.
The ensemble achieves 99.63% accuracy on our leakage-free honeypot test set, with false-positive rates monitored continuously via drift detection. Behind every detection in Prometheus is not a single model making a guess, but five specialized neural networks reaching consensus.