A hybrid LLM and machine learning framework for early fire detection in subway tunnels

Sensitivity analysis of operational parameters

Table 8 presents the sensitivity analysis of the LLM-augmented classifiers across different decision thresholds (\(\tau\)) and temporal persistence windows (\(k\)) on the Total dataset. The analysis evaluates how F1 Score, Detection Delay, and Pre-Alarm Rate (PAR) vary as the alarm-formation rule becomes more permissive or more conservative. The default operating setting used in the main experiments is \(\tau = 0.50\) and \(k = 1.0\,\text{s}\).
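As an illustration of how a threshold and a persistence window can combine into an alarm-formation rule, the following is a minimal Python sketch. It assumes the alarm fires once the classifier's fire probability stays at or above \(\tau\) for \(k\) consecutive frames, and it assumes 10 frames per second (consistent with the Table 8 caption, where 10 frames correspond to 1.0 s). The exact rule used in the experiments is not spelled out in this section, and the function names `seconds_to_frames` and `first_alarm_frame` are hypothetical.

```python
from typing import Optional, Sequence


def seconds_to_frames(window_s: float, fps: float = 10.0) -> int:
    """Convert a persistence window in seconds to a frame count (at least 1).

    The default of 10 frames per second is an assumption taken from the
    Table 8 caption (10 frames = 1.0 s).
    """
    return max(1, round(window_s * fps))


def first_alarm_frame(
    probs: Sequence[float], tau: float, k_frames: int
) -> Optional[int]:
    """Return the index of the frame at which the alarm fires, or None.

    Assumed rule (not confirmed by the source): the alarm fires at the first
    frame that completes a run of k_frames consecutive frames whose
    probability is >= tau.
    """
    run = 0  # length of the current run of above-threshold frames
    for i, p in enumerate(probs):
        run = run + 1 if p >= tau else 0
        if run >= k_frames:
            return i
    return None


if __name__ == "__main__":
    # Toy per-frame probabilities, made up purely for illustration.
    probs = [0.2, 0.6, 0.7, 0.4, 0.8, 0.9, 0.95]
    print(first_alarm_frame(probs, tau=0.50, k_frames=2))  # 2
    print(first_alarm_frame(probs, tau=0.50, k_frames=3))  # 6
    print(first_alarm_frame(probs, tau=0.99, k_frames=1))  # None
```

Under this assumed rule, lowering \(\tau\) or shortening \(k\) can only make the alarm fire at the same frame or earlier, which matches the permissive-versus-conservative framing of the analysis.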

Table 8. Sensitivity analysis of LLM-augmented classifiers on the Total dataset across decision thresholds (\(\tau\)) and temporal persistence windows. Results are presented as mean (± standard deviation) across 5 seeds. Bold values indicate the default operating setting (\(\tau = 0.50\) and \(k = 10\) frames, equivalent to 1.0 s) used in the main experiments; these results are kept consistent with the main tables.

Overall, the results show a clear trade-off between detection sensitivity and premature-alarm suppression. Lower thresholds generally preserve higher F1 scores but substantially increase PAR, indicating more aggressive alarm behavior. For example, at \(\tau = 0.25\) and \(k = 1.0\,\text{s}\), SVM + LLM and GBM + LLM achieve high F1 scores of 87.45% and 90.85%, respectively, but their PAR values rise to 99.25% and 51.32%. In contrast, higher thresholds suppress premature alarms more strongly but often degrade F1. For instance, increasing the threshold to \(\tau = 0.75\) at \(k = 1.0\,\text{s}\) reduces PAR to 8.68% for SVM + LLM, 6.04% for RF + LLM, and 12.08% for GBM + LLM, but lowers their F1 scores to 64.78%, 69.21%, and 69.66%, respectively.
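To make the two alarm-side metrics in this trade-off concrete, the sketch below computes a premature-alarm rate and a mean detection delay from per-sequence alarm frames. The definitions are assumptions, since this section does not define them formally: an alarm counts as premature when it fires before the annotated fire-onset frame, PAR is the percentage of sequences with a premature alarm, and detection delay is averaged over sequences whose alarm fires at or after onset. The function name `pre_alarm_rate_and_delay` and the 10 fps default are likewise hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def pre_alarm_rate_and_delay(
    alarm_frames: Sequence[Optional[int]],
    onset_frames: Sequence[int],
    fps: float = 10.0,
) -> Tuple[float, Optional[float]]:
    """Return (PAR in percent, mean detection delay in seconds).

    Assumed definitions (not taken from the source):
      * an alarm is premature if it fires before the onset frame;
      * PAR = 100 * (sequences with a premature alarm) / (all sequences);
      * delay = (alarm frame - onset frame) / fps, averaged over sequences
        whose alarm fires at or after onset; None if there are none.
    A value of None in alarm_frames means no alarm fired in that sequence.
    """
    if len(alarm_frames) != len(onset_frames):
        raise ValueError("alarm_frames and onset_frames must have equal length")
    if not alarm_frames:
        raise ValueError("at least one sequence is required")

    premature = 0
    delays: List[float] = []
    for alarm, onset in zip(alarm_frames, onset_frames):
        if alarm is None:
            continue  # missed detection: neither premature nor delayed
        if alarm < onset:
            premature += 1
        else:
            delays.append((alarm - onset) / fps)

    par = 100.0 * premature / len(alarm_frames)
    mean_delay = sum(delays) / len(delays) if delays else None
    return par, mean_delay


if __name__ == "__main__":
    # Toy alarm and onset frames, made up purely for illustration.
    par, delay = pre_alarm_rate_and_delay([5, 30, None, 12], [20, 20, 20, 10])
    print(par, delay)
```

With these assumed definitions, a lower threshold tends to move alarm frames earlier, which shortens the delay for correctly timed alarms but pushes more alarms ahead of onset and so raises PAR.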

The temporal persistence window further modulates this trade-off. Increasing the persistence window from 1.0 to 2.0 s or 3.0 s generally reduces PAR, but excessive smoothing can substantially reduce F1, especially for SVM + LLM and RF + LLM. GBM + LLM is comparatively more robust under moderate smoothing: at \(\tau = 0.50\) and
