Author: Mark Ludwikowski <markl02us@yahoo.com> · INTERNAL / PREVIEW — ADRIZ self-assessment for INGV review. Use your browser’s Print → Save as PDF for a PDF copy.

A Public-Data-Driven Wildfire and Volcanic-Confounder Visual Monitoring System for Sicily / Mount Etna

A detector-first, VLM-veto, multi-feed-corroboration architecture, evaluated with operational honesty

Author: Mark Ludwikowski <markl02us@yahoo.com> Deliverable: ADRIZ → INGV (Istituto Nazionale di Geofisica e Vulcanologia), Mount Etna visual monitoring Frozen evaluation baseline: commit 4d1b0cca79bf396e99b9a49e8477ae3a36ecfd33 (4d1b0cc), branch master Verification window (UTC): 2026-06-29 ~02:31 → ~02:56 Operational-classification labels used throughout: LIVE_OPERATIONAL · LIVE_STALE · STAGED_NOT_LIVE · RESEARCH_ONLY · PLANNED · BROKEN_OR_BLOCKED · UNKNOWN_NEEDS_VERIFICATION

Evidence discipline (binding). Every quantitative claim in this thesis traces to a committed Phase-1 evidence artifact in flank_wildfire/reports/thesis/. No number is invented or softened. Where evidence is incomplete the text says so plainly, scopes the claim, or labels the metric UNKNOWN. The system is not claimed to solve early wildfire detection generally; it is a reproducible, evidence-backed architecture for multi-source environmental monitoring, reported with detector-alone-versus-two-stage metrics, confidence intervals, latency, false-positive and false-negative behaviour, and explicit failure modes.

Abstract

Mount Etna is one of the most challenging environments in the world for camera-based wildfire detection: vegetated, populated flanks that genuinely burn sit directly beneath a persistently active volcano whose degassing plumes, ash columns, lava incandescence, and strombolian ejecta are constant visual confounders, and whose summit is routinely obscured by meteorological cloud and twilight glare. This work presents and evaluates ADRIZ, a public-data-driven visual monitoring system for the Etna / Sicily setting that combines four components: (i) a camera wall ingesting multiple public webcam and institutional video sources on a 75-second model-resident cadence with honest per-source health reporting; (ii) a frozen YOLO11s detector (weights sha256 c0a3d0ead257…cf0a20) that generates per-frame candidates; (iii) a crop-level Qwen3-VL semantic veto (qwen3-vl-32b, temperature 0.0) invoked only on detector-routed hot/bright crops, made recall-safe at night by a durable satellite-corroboration override; and (iv) a 64-feed public-data board with operational staleness classification plus a per-class satellite/weather/geospatial corroboration layer (WF36).

On a held-out, leakage-controlled evaluation (RESEARCH_ONLY, small-n), the two-stage system held the volcanic false-alarm rate at 8.1% (5/62, 95% Wilson CI [3.5–17.5%]) versus 9.7% detector-alone, while preserving 94.4% daytime genuinely-visible wildfire recall (17/18, 95% CI [74.2–99.0%]). The veto's measured effect was strictly one-directional (McNemar b=2, c=0, exact p=0.50, not significant at this sample size): it removed exactly one sensor-artifact false alarm and one borderline daytime ground-lights frame, and improved no genuine-fire decision. At night the veto originally vetoed two true vegetation fires whose flame is single-frame-indistinguishable from lava incandescence; a durable night-safety guard now withholds the volcanic veto on any uncorroborated wildfire-class night detection (surfacing it as uncertain_night rather than dropping it), taking the night true-fire silent false-negative count from 2 to 0 while leaving the daytime numbers and the volcanic false-alarm rate unchanged. A separately-reported quantum due-diligence track found, on the same Etna volcanic-versus-wildfire task, that simulated quantum classifiers are beaten by matched classical baselines with confidence intervals that exclude zero (Q-kernel −0.099 [−0.161, −0.038]) — a clean publishable negative, with the one genuine novelty (volcanic source-inversion as a QUBO) preserved as a research line, no quantum hardware used. We report the system with full operational classification, identify the limitations honestly (3-of-5 cameras online at verification, seven stale feeds, latency tails not instrumented, alert dispatch staged-off), and lay out the next build directions: domain adaptation, IR/thermal fusion, MTG-FCI event tracking, and a fine-tuned wildfire/volcano VLM.

Chapter 1 — Motivation and Problem

1.1 Wildfire early detection and the false-alarm budget

Fixed-camera early smoke detection is an established and operationally valuable lineage. Govil et al. [1] demonstrated, on the HPWREN / AlertWildfire camera network in Southern California, a deep-learning system scanning hundreds of cameras every minute that detects smoke typically within roughly fifteen minutes of ignition at under one false positive per camera per day. That result frames the problem this work inherits: detection value comes not from recall alone but from recall under a strict false-alarm budget. A monitoring system that alarms constantly is operationally useless regardless of its sensitivity, because human reviewers stop trusting it. Dewangan et al. [2] (FIgLib / SmokeyNet) and the PyroNear-2025 benchmark [3] established that single-frame camera detection saturates on confounders and that the task remains hard across camera domains — which is precisely why this work layers a semantic veto and multi-source corroboration on top of a fast detector rather than relying on the detector alone.

1.2 Sicily and Mount Etna as a hard monitoring environment

Sicily experiences a severe Mediterranean wildfire season; its vegetated terrain, including the populated lower and middle flanks of Mount Etna, genuinely burns. The deliverable target for this system is INGV, Italy's national geophysics and volcanology institute, whose Etna monitoring concern spans both volcanic activity and the wildfire risk on the edifice's flanks. The defining difficulty is co-location: a vegetation wildfire on Etna's flank and the volcano's own thermal and plume activity occupy the same cameras, the same satellite pixels, and frequently the same frame.

1.3 Volcanic and meteorological confounders

The confounder set at Etna is unusually rich and adversarial:

Volcanic plumes and degassing — persistent white/blue summit degassing and fumarolic steam plumes that, in a single camera frame, resemble wildfire smoke columns.
Ash plumes and eruption columns — episodic, darker columns that can be mistaken for dense wildfire smoke.
Lava incandescence, lava fountains, strombolian explosions, incandescent ejecta — red/orange glowing thermal sources whose single-frame appearance is near-identical to a nighttime vegetation-fire flame line. This is the single hardest confounder in the system and the source of its only true recall residual (Chapter 6).
Meteorological cloud and fog/haze obscuring the summit, and sunset/twilight glow painting warm, fire-coloured light across the edifice.
Sensor and lens artifacts — vertical streaks, lens flare, and reflections that a detector can box as flame.

WARP [12] found that both CNN and transformer wildfire detectors fail to distinguish cloud-like patches from real smoke under local adversarial perturbation; Etna supplies that adversarial confounder set naturally, every day. A system for this environment must therefore be engineered around wildfire-versus-volcanic disambiguation, not generic smoke detection.

1.4 The need for a public-feed, multi-source, low-cost system

No single sensor resolves the confounder problem. A camera sees a plume but cannot tell smoke from steam at the summit; a satellite thermal product sees heat but at a pixel far coarser than the camera's region of interest and cannot tell flank wildfire from crater lava when the heat is on-crater; a gas sensor or SO₂ retrieval supports a "volcanic" reading but is a coarse atmospheric column, not pixel-level event proof. The architecture this thesis evaluates therefore treats the problem as a system of independent public data sources — camera candidate detection, semantic VLM reasoning, and satellite/weather/geospatial corroboration — built entirely from public feeds and low-cost commodity compute (a local workstation detector plus remote serverless VLM inference; no edge, Hailo, or DGX hardware is assumed). The contribution is the reproducible, operationally-honest integration of these sources, with every component classified by its true operational status.

This chapter situates each design choice in current, verified literature. Every reference was confirmed to exist against arXiv, the publisher, IEEE/ScienceDirect, or the Hugging Face papers index before citation; all 35 references are VERIFIED (the two highest-stakes, SmokeBench [7] and FCI-FireDyn [26], were independently spot-re-confirmed). Numbers in square brackets index the References section.

2.1 Camera-based wildfire / smoke detection foundations

The camera-wall lineage is defined by Govil et al. [1] (HPWREN/AlertWildfire, ~15-minute detection, <1 FP/camera/day), Dewangan et al. [2] (FIgLib's ~25,000 labelled fixed-camera smoke images and the spatiotemporal SmokeyNet CNN that exploits frame-to-frame information), and the PyroNear-2025 benchmark [3] (a geographically diverse web-scraped camera dataset, ~150k annotations over 640 wildfires, showing the task remains hard across domains). These define the detector stage's job — per-frame candidate generation under a strict false-alarm budget — and motivate the temporal and multi-source layers added on top.

2.2 YOLO-style detectors and transformer challengers

The incumbent detector is a YOLO11s-class single-stage model chosen for model-resident real-time inference. Newer detectors are treated as future directions, not requirements: YOLOv12 [4] (attention-centric, Area-Attention/R-ELAN, improved mAP at comparable latency but with training-instability/CPU-throughput costs); RT-DETR [5] (the first real-time end-to-end NMS-free detection transformer, CVPR 2024); and for open-vocabulary candidate generation, Grounding DINO 1.5 [6], particularly the TensorRT-optimised Edge variant (~75 FPS). RT-DETR's NMS-free property and Grounding DINO Edge's text-promptable open-set detection are the two most relevant upgrade paths for a camera wall that must add confounder classes without full retraining.

2.3 VLMs / MLLMs for wildfire smoke — strong semantically, weak as primary localizers

This is the central evidence base for the detector-first + VLM-veto choice. SmokeBench [7] (Qi, Li, Barnes; WACV 2026) evaluates MLLMs (Qwen2.5-VL, InternVL3, GPT-4o, Gemini-2.5 Pro, Grounding DINO, Idefics2, Unified-IO 2) on smoke classification, localization, and detection; its headline finding is that models can often classify large-area smoke but all struggle with accurate localization, especially early-stage, with performance strongly tied to smoke volume. Earth-observation VLM benchmarks corroborate the pattern: GPT-4V-on-EO ("good at captioning, bad at counting") [8] and GEOBench-VLM [9] both show strong open-ended scene knowledge but poor spatial localization/counting (GPT-4o ~40% on GEOBench-VLM MCQs, ~2× chance). This directly supports using a VLM not as the primary localizer but as a second-stage semantic veto on already-localized crops — the regime where MLLMs are strong.

2.4 Detector-first + VLM-veto versus VLM-only monitoring

No single canonical paper names "detector-first + VLM-veto for wildfire cameras"; the claim is supported by converging verified evidence and this is stated honestly rather than attributed to an invented source. SmokeBench [7] and the EO-VLM benchmarks [8,9] establish VLM localization weakness; the camera-detector lineage [1,2,3] establishes that fast single-stage detectors localize well but over-fire on confounders. FireCLIP [10] is the closest direct evidence that a vision-language stage adds value specifically as a false-alarm discriminator (cooking smoke, industrial emissions), reporting ≥12.45% zero-shot improvement and better regional generalization via prompt tuning. The two-stage decomposition — fast detector for recall/localization, VLM for precision/semantic veto — is exactly what the WF25 evaluation (Chapter 5) tests empirically.

2.5 Prompt design as an evaluated system component

FireCLIP [10] demonstrates prompt tuning as the mechanism delivering its false-alarm and generalization gains; TuneVLSeg [11] benchmarks textual/visual/multimodal prompt-tuning under domain shift (textual prompts degrade under large shift, visual prompting is a competitive cheaper first attempt); WARP [12] shows prompt/threshold-adjacent design controls the recall-versus-false-alarm operating point. Accordingly, the Qwen3-VL veto prompts in this work are versioned and frozen (Chapter 3, model_prompt_freeze.json) and reported as a tested variable, not an afterthought.

2.6 Domain adaptation for camera networks

Identified as a likely core next-build pillar. Verified sources: the MCAF multilevel-feature-alignment UDA smoke detector [13]; EDIF [14] for enhanced domain-invariant cross-domain forest-fire smoke detection; a synthetic-to-real UDA study on the AlertWildfire network [15]; and Pesonen et al. [16] on zero-shot foundation-model supervision training small real-time camera segmenters from box labels. Each ADRIZ camera (INGV, Windy, EtnaWalk) is a distinct visual domain (lighting, angle, weather, Etna's plume backdrop), which scopes the Chapter-7 adaptation plan.

2.7 Robustness and adversarial / hard-negative testing

WARP [12] (Ide & Yang) is the first model-agnostic framework for adversarial robustness of wildfire detection models, injecting global (Gaussian) and local (cloud-PNG-patch) noise; transformers showed >70% precision degradation under global noise, and both CNN and transformer models failed to distinguish cloud-like patches from real smoke under local attacks. This is the template for the failure-appendix hard-negative battery (Chapter 6) and the direct literature motivation for the VLM veto + satellite corroboration as mitigations.

2.8 IR / thermal fusion (future capability)

Verified RGB-thermal fusion sources: a UAV multi-scenario RGB-Thermal forest-fire dataset and fusion model [17]; the MCDet target-aware RGB-T fusion model [18]; a visible+thermal-infrared flame-detection method [19]; and at the VLM level WildFireVQA [20], a large radiometric-thermal VQA benchmark finding RGB remains the strongest single modality for current MLLMs while retrieved thermal context helps stronger models. IR is directly relevant to Etna's lava-versus-flame ambiguity, but WildFireVQA keeps the claim honest: thermal is a contextual gain, not a solved modality for VLMs.

2.9 Optical-flow / temporal plume tracking and segmentation

For temporal smoke-motion consistency and plume-growth segmentation: a spatiotemporal bag-of-features early-smoke detector using histogram of oriented optical flow exploiting upward thermal convection [21]; spatiotemporal/dynamic-texture forest-fire smoke video detection [22]; and for modern segmentation, SAM 2 [23] (promptable video segmentation with streaming memory) plus a fire-specific SAM2 study [24] (Box+MP best, mIoU ~0.64). Temporal consistency is the most promising future mitigation for the night-fire↔lava residual (lava is steady; wildfire flickers and spreads).

2.10 Satellite corroboration and geostationary event tracking (MTG-FCI)

The corroboration layer's geostationary basis. MTG-FCI detects fires ~4 h earlier than SEVIRI, ~2 h before MODIS, and finds ~5× more active-fire pixels than SEVIRI [25]; the FCI-FireDyn / Fire-Event-Tracker algorithm [26] (Paugam et al., 2026) spatio-temporally clusters FCI hotspots at 10-minute cadence to derive fire-arrival maps, rate of spread, and burnt-area evolution, validated on Southern-European 2024–2025 fires; a feasibility study [27] explores unsupervised MTG-FCI wildfire detection. The directive's insistence that FCI be treated as event-tracking / early-candidate data, not perfect ground truth is grounded here: FCI's strength is temporal evolution and early timing, while its ~1–2 km pixel makes it too coarse for camera-event-level ground truth — exactly the "supports / too coarse" labelling WF36 applies.

2.11 Point corroboration: FIRMS VIIRS/MODIS and Sentinel-3 SLSTR

Schroeder et al. [28] is the canonical VIIRS 375 m active-fire algorithm behind NASA FIRMS; Xu & Wooster [29] describe the operational SLSTR daytime active-fire / FRP product with global intercomparison to MODIS/VIIRS/Landsat (daytime product operational since March 2022), building on pre-launch algorithm work [30]. These supply the published detection limits that justify the asymmetric corroboration logic in WF36: a thermal hotspot near a camera candidate elevates confidence, while absence is treated as non-disconfirming (sub-pixel early smoke is below the satellite detection floor).

2.12 Sentinel-5P/TROPOMI and CAMS as contextual gas/plume evidence

Theys et al. [31] (global TROPOMI volcanic-SO₂ degassing), an Italy-specific Stromboli SO₂ study [32], and the TROPOMI SO₂ retrieval ATBD [33] anchor the gas-context layer. TROPOMI SO₂ supports a "volcanic degassing" classification but its coarse footprint and overpass cadence make it supporting context, never per-pixel camera-event proof — the precise labelling WF36 enforces. CAMS/GFAS plays the analogous aerosol/emission role; TROPOMI is cited as the verified anchor and CAMS-specific event-level proof is flagged context-only.

2.13 Operational data-feed reliability and staleness classification

This is an engineering/operational contribution rather than an academic finding, and that is stated honestly: it is grounded in official documentation — NASA FIRMS product/latency documentation [34] and EUMETSAT MTG instrument documentation [35] — which defines the upstream cadences against which the staleness thresholds are derived. The system's feed-health monitor classifies every feed operationally, with a degraded-response guard ensuring a degraded upstream cannot masquerade as "live-with-zero."

Chapter 3 — System Architecture

Status of this chapter's claims: every component is labelled with its verified operational status from operational_state_verification.md and system_performance_spec.md, re-verified from current evidence in the 2026-06-29 02:31–02:56 UTC window at commit 4d1b0cc.

3.1 Overview

ADRIZ is a four-layer pipeline:

  Public camera sources (5 configured)
        │  75 s model-resident cycle
        ▼
  [Stage 1]  YOLO11s detector  (frozen, sha256 c0a3d0ea…)
        │  per-frame candidate boxes, 19-class head
        ├── PASS_THROUGH classes (smoke / ash / degassing) ──────────────┐
        │                                                                │
        └── ROUTE classes (lava / incandescence / flame) ──► hot/bright crop
                                                                │
        ▼                                                       ▼
  [Stage 2]  Crop-level Qwen3-VL veto (qwen3-vl-32b, temp 0.0)
        │  WILDFIRE | VOLCANIC | NEITHER   (+ NIGHT-SAFETY override)
        ▼
  [Stage 3]  WF36 multi-feed corroboration  (FIRMS / SLSTR / FCI / SEVIRI / TROPOMI / CAMS)
        │  on-crater = volcanic;  off-crater fresh FIRMS = independent wildfire support
        ▼
  [Stage 4]  Alert taxonomy + feed-health board (64 feeds) + bilingual dashboard

The detector and VLM run live in the EtnaCameraWall scheduled task; the feed board refreshes hourly via EtnaFeedsRefresh. Both tasks were Running at verification (schtasks /query @ 02:32 UTC). Status: LIVE_OPERATIONAL for the running pipeline; alert dispatch is STAGED_NOT_LIVE (Section 3.8).

3.2 The 64-feed public-data board and feed-health monitoring

The data-feeds board inventories 64 public sources. At verification (curl https://adr-etna-ingv.pages.dev/data/feeds.json @ 2026-06-29T02:31:33Z, HTTP 200), an independent recount of the 64 group-level entries matched the published summary exactly:

Feed status	Count	Meaning (from the live banner)	Status label
live	46	data returned now	LIVE_OPERATIONAL
stale	7	real pull, but upstream archive/outage lag	LIVE_STALE
catalogued	10	reachable but needs a token / no scalar-point API	STAGED_NOT_LIVE
auth_pending	1	our key not yet configured	STAGED_NOT_LIVE
error	0	—	—
total	64		LIVE_OPERATIONAL (board)

Two engineering guarantees make this a defensible operational claim rather than a vanity count:

Honest taxonomy. The 7 stale and 10 catalogued feeds are not live data and are never described as such; the corroboration logic treats a feed older than the 24 h threshold as no current evidence, never as contradiction.
Overpass degradation guard (LIVE_OPERATIONAL). osm_roads_rail.py (L64–72, commit 4d1b0cc, originally 398e94d) applies a plausibility guard: an empty/zero road count is flagged stale with validated_pull=False, so a degraded Overpass response cannot masquerade as live-with-a-bogus-zero. The current live read is 23,379 roads / 341 rail → status=live.

A representative live numeric is the Fire Weather Index: the most recent EFFIS daily FWI analysis (CEMS EWDS GEFF 4.1) at the Etna-summit cell was 13.06 (moderate) at 02:31:33Z. Honest latency caveat: GEFF 4.1 daily analysis carries ~3–4 day latency, so this is the most recent daily analysis, not an instantaneous reading.

The seven stale feeds at verification are surfaced as failures, not hidden (full table in Chapter 6 / failure_appendix.md §8): cams_gfas_fire (archive 208 days behind), effis_active_fire (EFFIS WFS Oracle backend failure, self-heals), era5_land (~5-day production latency, stale-by-design), gwis (JRC WFS Oracle backend failure), ingv_oe_bulletin (no Etna item in this week's GVP RSS), meteostat (bulk-archive lag), opensky_adsb (HTTP 429 anonymous rate-limit).

3.3 Camera ingestion — including the multi-source reality

Five camera sources are configured (/api/cams sources[] @ 2026-06-29T02:44:37Z): the INGV EtnaTVChn mosaic (garr.tv PeerTube HLS), three Windy webcams (Milo East 9.1 km, Trecastagni 16.8 km, Catania Jonio 26.8 km from summit), and the EtnaWalk YouTube live stream. The wall runs a 75-second model-resident cycle (cadence_s:75, confirmed in both /api/cams and the local publisher artifact cameras_wall.json).

Honest multi-source caveat (the camera wall is not "5 live cameras"). At the verification timestamp only 3 of the 5 sources were online (n_online:3) — the three Windy cameras (all CLEAR, DAY_RGB). The INGV EtnaTVChn mosaic and the EtnaWalk stream were OFFLINE (online=False, badge OFFLINE). The wall reports OFFLINE honestly rather than serving a frozen frame. Thesis-wide wording is therefore scoped to "5 configured camera sources, 3 online at the verification timestamp," never "5 live cameras." Status: LIVE_OPERATIONAL (3/5 online); the two OFFLINE sources are a LIVE_STALE sub-component honestly flagged.

Camera health detection is itself a LIVE_OPERATIONAL feature: per-source online:false / badge OFFLINE is emitted truthfully (multi_cam_service.py L251–261), and a stale/frozen-frame watchdog enforces an age-based guard (STALE_FRAME_MAX_S = 1800 s). A per-frame perceptual-hash identical-frame check (to catch a recent but frozen camera) is not yet implemented and is queued in the roadmap. The /api/cams endpoint additionally showed transient TLS resets (two of three fetches) during verification before succeeding; the locally published cameras_wall.json corroborated the same content. This is recorded as a monitoring flag (Chapter 6), not a hard stop.

3.4 The detector — frozen YOLO11s, exact provenance

The Stage-1 detector is frozen for the thesis (WF19 KEEP-INCUMBENT decision):

Parameter	Value
Architecture	YOLO11s, 19-class head
Weights	`models/ingv_v1b_best.pt`, 19,261,267 bytes
Weights sha256	`c0a3d0ead257d318e70bec3bb84feaec7b99e9e3d55b132fc5f1ffd405cf0a20`
Inference size / confidence	imgsz 960 / conf 0.25
Crop pad / max side	0.25 fraction / 768 px
Dedup / cooldown	IoU 0.4 / 1800 s per class-location
Device	auto (CPU or GPU; no edge-hardware assumption)

The model-resident detector loop (multi_cam_task.py, PID 31700, ~308 MB resident) was running at verification. The 19-class head emits five wildfire/volcanic-relevant buckets; note that data.yaml shows nc:5, which is stale — the operative head is 19-class, confirmed against the weights and service/config.py CLASS_NAMES. Status: LIVE_OPERATIONAL.

Crucially, the detector's class is used to route:

PASS_THROUGH (never sent to the VLM in Config A): wildfire_smoke, ash_plume, forced_ashladen_degassing, fumarolic_steam, passive_degassing_steam. Smoke is passed through to preserve wildfire recall — a grey distant plume is not red-hot and must never be vetoable.
ROUTE to VLM: wildfire_flame, active_lava_flow, incandescent_ejecta, lava_fountain, lava_incandescence, strombolian_explosion. The hot/bright classes are exactly where the lava-versus-flame disambiguation lives.

3.5 Crop-level Qwen3-VL veto — exact model, prompt, and settings

The Stage-2 veto is the exact, frozen configuration recorded in model_prompt_freeze.json (no thesis result references "Qwen-VL" generically):

Field	Value
Model	`qwen3-vl-32b` (Qwen3-VL 32B)
Serving	PHOENIX Model-Vault RunPod serverless, OpenAI-compatible endpoint (remote; crash-resilient by design)
Routing	crop-level — a padded crop of each routed detector box; the full frame is not sent for the veto
Temperature	0.0 (deterministic)
max_tokens	120
top_p	server default (not pinned in the request payload — a reproducibility gap, see Chapter 6)
Retries / backoff / timeout	3 / 8.0 s / 180 s
Image encoding	crop downscaled to long-side ≤768 px, JPEG q88, base64 data-URL
Output	strict JSON `{"label":"WILDFIRE\|VOLCANIC\|NEITHER","confidence":0.0-1.0,"reason":"<=14 words"}`; on parse failure, label `PARSE_FAIL`
Cache policy	deterministic (temp 0) verdicts cached per crop; WF25 reproduced Config A bit-for-bit from cache (0 new calls, 245/245 decisions match, 0 phash leakage)

Two prompts are frozen. The CROP_PROMPT (hot/bright disambiguation, primary veto) explicitly instructs the model "Do NOT assume it is volcanic just because Etna is a volcano — vegetation wildfires occur on Etna's flanks," and forces a one-of-three choice WILDFIRE / VOLCANIC / NEITHER (the last covering sunset glow, sunlit cloud, artificial lights, lens flare, reflection, sensor artifact). The SMOKE_PROMPT (degassing-versus-plume, used for ambiguous large/summit smoke) distinguishes a denser browner/greyer wildfire column rising from vegetated ground from a white/blue crater-rooted degassing plume from diffuse sky-wide cloud/haze. The exact verbatim text of both prompts is in model_prompt_freeze.json (vlm_prompt_exact_text_CROP_PROMPT, vlm_prompt_exact_text_SMOKE_PROMPT).

The VLM is invoked only on detector-routed hot/bright crops: measured at ~0.151 calls/frame over the held-out set and 0 on quiet frames (vlm_call_rate), corroborated live by vlm_calls_this_cycle: 0 on a quiet cycle. This is the one performance figure safe to classify LIVE_OPERATIONAL for the rate itself. Status: LIVE_OPERATIONAL (detector + crop-veto run live in EtnaCameraWall; WF25 metrics are RESEARCH_ONLY).

3.6 The NIGHT-SAFETY corroboration override

The VLM veto's most consequential design element is its night-safety override, a durable guard (not a one-off) in service/crop_veto.py + service/config.py. It exists because the volcano-context prompt that gives the system its low volcanic false-alarm rate is exactly what mis-routes a bright nighttime vegetation fire — whose flame is single-frame-indistinguishable from lava incandescence — to VOLCANIC.

Rule as implemented. When a wildfire-class detection (wildfire_flame / wildfire_smoke) is routed and the VLM verdict would SUPPRESS it (VOLCANIC or NEITHER):

Daytime (panel mean-grey > NIGHT_PANEL_MEAN_MAX = 12): unchanged — daytime recall was already preserved; lava confusion is a night problem.
Night/thermal (panel mean-grey ≤ 12): the volcanic suppression is honoured only if there is independent volcanic corroboration consistent with the vent — on-crater/summit-proximal FIRMS-SLSTR/FRP (firms_corroborated), or the hot crop sits inside the summit ROI (inside_summit_roi), reusing the WF36 on-crater logic. With no such corroboration the alarm is not silently dropped: it is downgraded to a still-surfaced WILDFIRE_UNCERTAIN_NIGHT / needs_review state (alert feed + tile).

A real off-crater night fire can therefore never be erased by the VLM alone. The re-scored effect is quantified in Chapter 5: night true-fire silent false-negative 2 → 0, daytime and volcanic-FA unchanged. Status: LIVE_OPERATIONAL guard logic (in the live service); the WF25 re-score demonstrating its effect is RESEARCH_ONLY.

3.7 WF36 multi-feed corroboration

Stage 3 places a candidate in independent context via service/corroboration.py (corroborate_decision, gate_alerts, _volcanic_scene), evaluated against a live feed snapshot (snapshot_utc 2026-06-29T02:56:39Z). The corroboration logic implements rules for the five genuinely-corroborable detector classes:

Wildfire smoke / flame. Take firms_near_summit_km (min of fresh VIIRS/MODIS). A hit in the annulus CRATER_KM(3) < near ≤ NEAR_SUMMIT_KM(15) → corroborated (independent wildfire signal). A hit ≤3 km is treated as the volcano itself → not a wildfire confirmation. >15 km or no fresh FIRMS → uncorroborated; a fresh FRP granule with no co-located value contributes only granule-recency (supports). Load-bearing assumption: on-crater FIRMS is volcanic, not wildfire — FIRMS cannot distinguish lava from a wildfire on the crater itself. Because FIRMS/SLSTR have hours-scale latency, a real early wildfire will routinely be uncorroborated (camera-only early warning), so the system must alarm on high detector+VLM confidence in that window rather than wait for satellite.
Lava / incandescence. On-crater fresh FIRMS (≤3 km), else fresh FRP, else fresh FCI coverage → corroborated as VOLCANIC with explains_volcanic=True, which suppresses any wildfire alert for the same scene.
Volcanic ash plume. Fresh CAMS AOD value → corroborated; else fresh SEVIRI coverage → corroborated (ash/IR-window context). This is the weakest corroborated verdict in the module: SEVIRI coverage existing is not evidence a plume is present, so the thesis-safe wording is "ash context available (geostationary coverage)," not "ash plume confirmed."
Volcanic steam / degassing. Fresh TROPOMI SO₂ ≥ 5e-4 mol/m², else fresh CAMS SO₂ ≥ 5e-5 kg/m², else fresh EMIT granule → corroborated. The rule and thresholds are real, but on the verification cycle no SO₂ value was usable (TROPOMI summit-box mean null + CAMS stale), so degassing was uncorroborated this cycle — correctly.

The worked examples in wf36_corroboration_matrix.md §3 are the actual output of python service/corroboration.py against the live snapshot. On that cycle, firms_near_summit_km = 0.4 km (on-crater): lava_incandescence was corroborated VOLCANIC, while wildfire_flame/wildfire_smoke were correctly held as uncorroborated (the 0.4 km hit is Etna's own crater thermal, not an independent wildfire — exactly the trap WF36 exists to avoid).

The honest 5-corroborable / 8-STAGED split. Eight Gate-C classes are STAGED_NOT_LIVE corroboration targets, not live detection or corroboration, and the thesis must not imply otherwise: meteorological cloud, glare/sun/reflection, and black/frozen/stale frame are detector context labels only (they never alarm and have no corroboration branch); fog/haze, industrial smoke, dust/quarry, camera artifact, and unknown are not detector classes at all. For industrial smoke and dust, the OSM industrial/power and roads/rail data are on the board but not wired into corroboration.py — they are available-but-unwired columns, not corroboration the thesis can claim. Status: LIVE_OPERATIONAL for the five rule-backed classes (with their cycle-level qualifiers); STAGED_NOT_LIVE for the other eight.

3.8 Alert taxonomy, human review, and the bilingual dashboard

Alerts are deduplicated spatially (IoU 0.4) with an 1800 s per-class/location cooldown; frames older than 1800 s and feeds older than 24 h are treated as stale. The output surface is an internal/preview self-assessment dashboard that explicitly carries a "Not a public product" banner. Automated public alert dispatch is STAGED_NOT_LIVE: email dispatch was gated off (alert_email.enabled=false) at verification, and the human-in-loop review workflow is not yet a live operational pipeline. This is stated plainly: the system is not claimed to operate an alerting pipeline.

The dashboard is fully bilingual (Italian default, English toggle) with browser-detected language and localStorage persistence. Translation-key parity is exact: the en: and it: dictionaries in public/i18n.js each contain exactly 240 keys (240/240). Per-string translation quality was not separately audited; the LIVE_OPERATIONAL claim is key-count parity. Status: LIVE_OPERATIONAL (bilingual UI); STAGED_NOT_LIVE (alert dispatch / human-in-loop).

Chapter 4 — Dataset and Evaluation Design

Operational classification of all Chapter-4/5 metrics: RESEARCH_ONLY (held-out offline evaluation; small-n flagged throughout). The headline system performance is a held-out benchmark, not a live alert-dispatch measurement.

4.1 Held-out sets

Two disjoint held-out sets are used, both real frames only:

Volcanic false-alarm (FA) set, n = 62. Bulletin-confirmed volcanic frames from MarkL02/ingv-etna-camera-historical, all ground-truth-negative for wildfire (summit degassing, ash, lava glow, strombolian activity, cloud, glare/sunset, snow). These are the confounders the system must not alarm on.
Daytime recall set, n = 18 (with an alternative n = 20). Detector-alarmed, genuinely-visible camera-fire frames from clean-source datasets — D-Fire (box-annotated) plus Roboflow/HPWREN/web — used to measure that the veto does not destroy recall. The Pyronear-sequence data is reported separately and never blended.

Operating-point confusion definition: GT-negative = the 62 volcanic frames; GT-positive = the 18 daytime genuinely-visible camera-fire frames (the conservative denominator).

4.2 The daytime recall denominator (both reported, honest reconciliation)

The headline conservatively excludes the entire four-frame night↔lava residual category from the daytime denominator → n = 18, 94.4% (17/18), matching the committed WF25_system_scoring. Under Config A only 2 of those 4 night frames are actually lost (07871, 07875); the other 2 (07723, 07773) pass through and alarm. Excluding only the 2 genuinely-lost frames gives the alternative n = 20, 95.0% (19/20), CI [76.4–99.1%]. Both are disclosed; the headline uses the conservative 17/18.

4.3 No-leakage controls

pHash leakage: 0 collisions between the external volcanic set and the v1b detector training images (eval_external_v1b.json). Honest caveat: the v1b train images live off-repo (DGX/RunPod workspace), so the train↔held-out diff is taken on documented provenance; the 62 held-out frames are independently confirmed internally distinct (62 unique pHashes). To fully close it, the train↔held-out collision list (expected empty) should be committed.
Deterministic VLM cache (temperature 0). Every routed-crop verdict is read from the committed temperature-0 cache (reports/crop_veto_outputs/te_crop_level_cpu_configA.json). WF25 scoring spent 0 new VLM calls, a stub was wired to raise on any cache miss (none occurred), and 245/245 per-frame decisions matched the stored Config A result bit-for-bit. Determinism is scoped to the served qwen3-vl-32b build (temperature 0 gave ±0 swing across three fresh-query repeats), not guaranteed in perpetuity if the served model changes.
Contamination handling. Two non-camera contamination frames (a false-colour Landsat-8 pan crop and a painting) were correctly rejected by the VLM and are excluded from the recall denominator rather than counted as true wildfire misses (CONTAM_016, CONTAM_017).

4.4 Taxonomy and confounders

The evaluation is built around the wildfire/volcanic confounder taxonomy: wildfire smoke and flame (positives), against volcanic ash, lava/incandescence, strombolian, degassing/steam, plus meteorological cloud, fog/haze, glare/sunset, snow, sensor/lens artifact. A representative hard-negative library is exported (Chapter 6), with web hard-negatives (dust, fog, industrial smoke, glare) listed for category coverage even where the images live off-repo (paths left blank, not fabricated).

Chapter 5 — Results

All Chapter-5 metrics are RESEARCH_ONLY (held-out offline, small-n), reproduced cache-only from the post-night-guard artifacts. The detector is frozen; the VLM verdicts are deterministic temperature-0 cache reads.

5.1 The full WF25 Gate-A table (detector-alone vs two-stage)

The complete performance specification for the shipping two-stage system — detector → crop-level Qwen3-VL veto, Config A (smoke pass-through) — measured end-to-end on the same real held-out frames, with Wilson 95% CIs:

Metric	Detector alone	Two-stage system	Δ	95% CI (two-stage, Wilson)	Evidence
wildfire smoke recall (daytime)	100% (17/17)	100% (17/17)	0.0 pp	[81.6 – 100%]	`per_frame_recall`, wildfire_smoke
wildfire flame recall (daytime)	100% (10/10)	90.0% (9/10)	−10.0 pp	[59.6 – 98.2%]	`per_frame_recall`, wildfire_flame
volcanic-plume FP rate	9.7% (6/62)	8.1% (5/62)	−1.6 pp	[3.5 – 17.5%]	survivors = summit-degassing smoke
steam/cloud/fog FP rate	0% (0/62)	0% (0/62)	0.0 pp	[0 – 5.8%]	no steam/cloud frame alarmed
artifact FP rate	1.6% (1/62)	0% (0/62)	−1.6 pp	[0 – 5.8%]	`9243ab` lens/sensor artifact removed
overall volcanic FP rate	9.7% (6/62)	8.1% (5/62)	−1.6 pp	[3.5 – 17.5%]	`A_external_volcanic_FA`
precision / PPV (operating)	0.750	0.7727	+0.023	—	§5.3 confusion
recall / sensitivity (daytime, n=18)	100% (18/18)	94.4% (17/18)	−5.6 pp	[74.2 – 99.0%]	`recall_daytime_only`
F1 (operating)	0.857	0.850	−0.007	—	§5.3 confusion
F2, recall-first (operating)	0.9375	0.9043	−0.033	—	§5.3 confusion
specificity (operating)	0.9032	0.9194	+0.016	—	§5.3 confusion
false negatives introduced by VLM	—	1 (borderline ground-lights, not a true fire)	—	—	§5.3 paired change
latency p50 / p90 / p99 (detector CPU)	243.6 / ~310 / ~335 ms	+ amortised VLM	—	n=245	`detector_latency.json`
VLM per routed crop (p50 / p95 / max)	—	992 / 1421 / 2053 ms	—	n=37	`vlm_per_routed_crop_ms`
VLM calls per frame / quiet / active	—	0.151 / 0 / 0.032	—	measured	`vlm_call_rate`
estimated cost per alert	—	~$5–15/mo all-in ($0 quiet, scale-to-zero)	—	—	`cost_model.json`

Reading note on per-class denominators. A frame can carry both a smoke box and a flame box, so the class counts (17 smoke + 10 flame) exceed the 18 unique daytime frames. The single daytime recall loss (dfire_AoF07872) is a flame box (settlement ground-lights), which is why flame recall shows −10 pp while smoke recall is untouched.

Reading note on latency tails. p90/p99 were not separately computed; the committed cache stores detector-CPU p50/p95/max (243.6 / 334.5 / 1876.5 ms). p90 ≈ 310 ms by interpolation; p99 ≈ the max-tail (the 1876 ms max is a single GC/IO outlier). These tail estimates and the frame-capture success rate are UNKNOWN_NEEDS_VERIFICATION and queued in the roadmap. End-to-end CPU mean ≈ 426 ms/frame (detector mean + 0.151 × VLM mean); a frame with one routed crop ≈ 1756 ms p95; quiet frames add 0.

5.2 Reproduction and no-leakage (re-confirmed)

Quantity	Value	Status
Routed hot crops in held-out set	37	reproduced
Served from temperature-0 cache	37	✅
New VLM calls this run	0	✅
Cache-miss stub raised	No	✅ deterministic
pHash leakage (external volcanic ↔ v1b train)	0 collisions	✅

Reproduce: python service/thesis_wf25_scorecard.py.

5.3 Paired-change confusion (detector-alone → two-stage) and McNemar

Volcanic FA set (n=62, GT-negative):

Transition	Count	Files
FP → TN (veto suppressed a false alarm)	1	`9243ab…` (sensor/lens artifact, VLM ruled NEITHER)
FP → FP (false alarm survived)	5	the 5 summit-degassing `wildfire_smoke` plumes (pass-through by design)
TN → FP (veto created a false alarm)	0	—
TN → TN (unchanged)	56	—

Recall set (GT-positive):

Transition	Count	Files
TP → TP (fire kept)	17 (daytime)	—
TP → FN (veto vetoed a fire)	0 daytime · 2 night-residual (BEFORE guard) → 0 silent FN (AFTER guard)	`dfire_AoF07871`, `dfire_AoF07875` — now surfaced `uncertain_night`, not dropped
FN → TP (veto recovered a fire)	0	—
FN → FN (unchanged)	0	—

McNemar (exact, two-sided) over the full paired decision set (62 volcanic + 18 daytime recall frames): discordant b (det-alone alarm, two-stage no-alarm) = 2; discordant c (det-alone no-alarm, two-stage alarm) = 0; exact two-sided p = 0.50 — not significant. All discordant pairs are one-directional (b>0, c=0): the veto only ever removes alarms, never adds one. Its entire measured effect on this held-out set is the removal of 2 alarms — 1 volcanic sensor-artifact FP and 1 borderline daytime ground-lights frame.

5.4 Plain finding (as the directive requires)

On this held-out set the Qwen3-VL veto changes exactly 2 of 80 paired decisions, both removals. It suppresses 1 volcanic sensor-artifact false alarm (9243ab: 6/62 → 5/62 FA) and drops 1 borderline daytime ground-lights frame (dfire_AoF07872: 18/18 → 17/18 recall). It does not improve precision or recall on any genuine smoke or flame case, it creates no new false alarm, and it does not reach 0% volcanic FA — the 5 survivors are summit-degassing smoke that passes through by design. The veto's honest, measured value is the removal of one artifact false alarm; on every genuine-fire and genuine-degassing decision it leaves the detector unchanged. A larger FA reduction (8.1% → 3.2%) is available only by also routing smoke (Config B), at the cost of routing the contamination frames and incurring 2 additional night-lava recall losses; the prior whole-frame veto reached 0% FA but cost ~20.8% recall. Config A is the recommended default.

5.5 Recall preservation (Gate B) and the night-safety re-score

Gate-B question: does the VLM ever veto a true wildfire smoke/flame case?

Daytime genuinely-visible fire: NO — recall preservation HOLDS. The VLM vetoes 0 true daytime smoke/flame frames. The only daytime veto (dfire_AoF07872) is settlement ground-lights, not a fire.
Night (BEFORE guard): YES — 2 true vegetation-fire frames were vetoed VOLCANIC (dfire_AoF07871 @0.818, dfire_AoF07875 @0.878). By single-frame appearance their flame is near-identical to lava incandescence, and the volcano-primed VLM sided VOLCANIC.
Night (AFTER guard): NO silent FN — the residual is MITIGATED. Both frames are detected at panel mean-grey 3.0 / 3.2 (well below the 12 threshold), off-crater, with no FIRMS → the guard fires → each is surfaced as uncertain_night / needs_review instead of being dropped.

Re-scored result (same temperature-0 cache, 0 new VLM calls):

	Night true-fire FN	Daytime true-fire FN	Volcanic FA (n=62)
Before guard	2 (07871, 07875 silently vetoed VOLCANIC)	0	8.1% (5/62)
After guard	0 (both surfaced as `uncertain_night`)	0	8.1% (5/62) — unchanged

The volcanic FA is provably unchanged: 0 of the 62 volcanic frames are dark enough (all bright daytime, mean > 12) to trip the night guard, so the guard structurally cannot touch the 5 daytime-degassing survivors. The veto is consequently a recall-safe veto at night via the corroboration override — a daytime advisory precision layer AND a night veto that is corroboration-gated so it can never produce a silent off-crater night false-negative. (Reproduce: python service/rescore_wf25_night_guard.py.)

False-negative bound: daytime true-fire FN introduced by the VLM = 0 (Wilson upper bound on the observed 1/18 daytime FN — a non-fire frame — is 25.8%, which the small sample cannot tighten).

5.6 WF36 per-class corroboration matrix (the 5-corroborable / 8-STAGED split)

Cell legend: ✔ confirms · ◐ supports · ✗ contradicts · – unavailable · ∅ not-applicable · ⏳ stale · ≈ too-coarse. (Source-column keys as in wf36_corroboration_matrix.md §1.)

Class (Gate-C)	detector class?	FCI/SEV	FIRMS	SLSTR	TROPOMI	CAMS	OSM	CAM
wildfire smoke	YES (`wildfire_smoke`)	–	✔/◐	◐	∅	∅	–	–
wildfire flame	YES (`wildfire_flame`)	–	✔/◐	◐	∅	∅	–	–
volcanic ash plume	YES (`ash_plume`)	✔(SEV)	∅	∅	∅	◐⏳	∅	–
volcanic steam/degassing	YES (3 classes)	–	∅	∅	✔(SO₂)	◐⏳	∅	–
lava / incandescence	YES (5 classes)	◐(FCI)	✔(on-crater)	◐(FRP)	∅	∅	∅	–
meteorological cloud	partial (context-only)	–	∅	∅	∅	–	∅	∅
fog / haze	NO	–	∅	∅	∅	–	∅	∅
industrial smoke	NO	–	–	–	–	–	(avail, NOT wired)	–
dust / quarry / road dust	NO	–	∅	∅	∅	≈	(avail, NOT wired)	–
glare / sun / reflection	partial (context-only)	∅	∅	∅	∅	∅	∅	∅
camera artifact	NO	∅	∅	∅	∅	∅	∅	(target, NOT wired)
black / frozen / stale frame	partial (context-only)	∅	∅	∅	∅	∅	∅	(target, NOT wired)
unknown	NO	–	–	–	–	–	–	–

Genuinely corroborable now (5): wildfire smoke, wildfire flame (LIVE rules; uncorroborated this cycle — on-crater 0.4 km FIRMS, correctly not a wildfire confirmation), lava/incandescence (corroborated VOLCANIC this cycle), volcanic ash plume (corroborated via SEVIRI coverage only — weak/contextual, CAMS AOD was stale), volcanic steam/degassing (LIVE rule; uncorroborated this cycle — TROPOMI null + CAMS stale + value below the elevated floor). STAGED_NOT_LIVE (8): meteorological cloud, glare, frozen-frame (context labels only), fog/haze, industrial smoke, dust, camera artifact, unknown (no rule / no detector class). Per the directive's §8 hard-stop check, no over-claim is required and no hard-stop is triggered, provided the thesis restricts corroboration claims to the five rule-backed classes with their cycle-level qualifiers and labels the other eight STAGED — which it does.

5.7 Operational spec snapshot (Gate E)

Key LIVE_OPERATIONAL rows (full table in system_performance_spec.md): 5 configured cameras / 3 online; 75 s cadence; VLM trigger = routed hot/bright crops only; 64 feeds (46 live / 7 stale / 10 catalogued / 1 key-pending / 0 error); FWI 13.06 moderate; hourly EtnaFeedsRefresh Running; dedup IoU 0.4 / cooldown 1800 s; 240/240 bilingual parity; OFFLINE camera health honest; Overpass degradation guard active. STAGED_NOT_LIVE: human-review / alert-email dispatch. UNKNOWN_NEEDS_VERIFICATION: frame-capture success rate; p90/p99 latency.

5.8 Quantum evaluation (edge-of-research due diligence — honest negative)

Operational classification: RESEARCH_ONLY. SIMULATION ONLY — no QPU was contacted, the IBM Quantum key was not read. All compute was light local CPU (statevector simulation of ≤4 qubits, ~46 s wall). A quantum win is asserted only where a paired-difference CI excludes zero.

A fresh, real-data quantum-versus-classical benchmark was run directly on the INGV task: discriminate VOLCANIC vs WILDFIRE thermal-anomaly events near Etna — the populated-flank problem INGV's own literature calls spectrally hard. Data: 33 Etna-edifice volcanic events (GVP/INGV weekly state oracle + FIRMS FRP) and 404 vegetated-flank wildfire events (real FIRMS active fire in the ≤25 km annulus), n = 437, 182 date groups, GroupKFold by date (leakage-guarded). The hard near-field intrinsic regime uses thermal magnitude / FIRMS multiplicity only (log_frp_max, log_frp_sum, log_firms_count, n_firms_sensors), with no geometry and no source-availability proxies (which correlate perfectly with class by construction and are stripped). With 4 features = 4 qubits, the quantum map sees the full signal with no PCA information loss — the fairest possible footing.

Out-of-fold AUC (grouped-by-date, n=437):

Model	Type	OOF AUC	95% CI
Classical RBF-SVM (matched)	classical	0.936	[0.889, 0.973]
Classical HistGB (matched)	classical	0.918	[0.867, 0.964]
Classical HistGB (full features)	classical	0.892	[0.820, 0.947]
Quantum fidelity kernel (ZZ, 4q)	quantum	0.837	[0.767, 0.896]
Quantum VQC (4q, 2-layer)	quantum	0.702	[0.608, 0.790]

Paired AUC deltas (quantum − classical, bootstrap 95% CI):

Comparison	Δ AUC	95% CI	Read
Q-kernel − RBF (matched)	−0.099	[−0.161, −0.038]	quantum worse, CI excludes 0
Q-kernel − HistGB (matched)	−0.081	[−0.139, −0.028]	quantum worse, CI excludes 0
VQC − RBF (matched)	−0.234	[−0.325, −0.150]	quantum much worse, CI excludes 0
Q-kernel − HistGB (full)	−0.055	[−0.129, +0.021]	tie/worse (CI brackets 0)

Honest interpretation. This is a clean, CI-backed publishable negative: on the same real features and the same leakage-guarded split, the quantum fidelity kernel (0.837) is beaten by matched classical RBF-SVM (0.936) by −0.099 [−0.161, −0.038]; the VQC (0.702) is the worst model tested. Notably the loss is not a dimensionality-truncation artifact — with only 4 features the 4-qubit map sees the full signal — it is the encoding/kernel-geometry mismatch and the VQC generalisation ceiling themselves. Because statevector simulation is exact, the classification verdict will not improve on real hardware (device noise only hurts). The right tool for this classification task is classical.

The one genuine novelty (preserved as a research line, not an operational claim): the formulation of volcanic deformation source inversion (Mogi/Okada) as a QUBO/Ising problem. On a synthetic-realistic Etna GNSS geometry, the multi-source / model-selection variant solved by simulated annealing matches the exact optimum 100% [89–100%] where multi-start Levenberg–Marquardt traps at 60% and greedy/OMP at 0%. Honest caveat: this win is shared by a classical simulated-annealing sampler — it is a QUBO-formulation success, not quantum-hardware advantage. The Mogi-single-source-QUBO and Dozier-sub-pixel-as-QAOA mappings are, to our knowledge, literature firsts (pending peer confirmation). Overall quantum verdict: worth evaluating and worth formulating, but on the classification tasks that actually run the monitor it does not add operational value — classical wins with CIs that exclude zero. The QPU gate remains BLOCKED pending explicit approval; no quantum hardware was used anywhere. (Reproduce: python quantum_disambiguator.py, ~46 s, statevector sim only.)

Chapter 6 — Failure Modes and Limitations

This chapter is exhaustive by design (Gate F). The machine-readable manifest is failure_case_manifest.csv (38 rows); 24 source frames + 24 thumbnails are exported to failure_crops/. Read-only build: the live multi_cam_service was not disturbed.

6.1 The night-fire↔lava residual (the only true wildfire-recall loss)

Two frames are the only true wildfire-recall losses in the shipping Config A, reported separately and excluded from the daytime denominator:

case_id	frame	detector	VLM verdict	crop
FN_007	`dfire_pos_AoF07871`	wildfire_flame @0.818	VOLCANIC "glowing, irregularly shaped incandescence consistent with lava flow or vent activity"	`failure_crops/FN_007.jpg`
FN_009	`dfire_pos_AoF07875`	wildfire_flame @0.878	VOLCANIC "bright, diffuse glow … consistent with summit incandescence or strombolian activity"	`failure_crops/FN_009.jpg`

A bright nighttime vegetation-fire line and lava incandescence are not separable from single-frame appearance; the volcano-context prompt that gives the system its low volcanic FA is exactly what mis-routes these two. This is mitigated, not solved. The night-safety guard (Chapter 3.6 / 5.5) converts these from silent false-negatives into surfaced uncertain_night alerts (night silent FN 2 → 0), but the underlying single-frame ambiguity remains. Recommended next steps (future work, not claimed operational): multi-frame temporal persistence (lava is steady; wildfire flickers and spreads) and a hard FIRMS/SLSTR night co-location override. Config B incurs 2 additional night-lava losses (AoF07723, AoF07773; FNB_010, FNB_011) — the documented recall price of pushing volcanic FA from 8.1% to 3.2%, and the reason Config A is the default.

6.2 The borderline town-lights veto

FN_008 (dfire_pos_AoF07872, wildfire_flame @0.858/0.713 → NEITHER, "artificial ground lights, likely from settlement") is the single frame that moves daytime recall 100% → 94.4% (17/18). It is an arguably-correct rejection of distant settlement/ground lights, counted conservatively as a recall loss so the headline is not inflated. Crop: failure_crops/FN_008.jpg.

6.3 The five surviving volcanic false alarms

The Config-A veto suppresses exactly one volcanic FP (FP_003 / 9243ab, sensor-artifact/lens-flare boxed as flame, VLM ruled NEITHER, 9.7% → 8.1%). The 5 survivors (FP_001, FP_002, FP_004, FP_005, FP_006) are summit passive-degassing steam / cloud / ash mis-boxed as wildfire_smoke; smoke-class crops pass through by design (this is what preserves wildfire recall), so the veto cannot suppress them in Config A. All six source frames are exported (failure_crops/FP_001.jpg … FP_006.jpg).

6.4 Prior frame-level-veto losses (motivating the crop-level design)

The prior whole-frame veto reached 0/62 volcanic FA but destroyed ~20.8% of genuinely-visible fire recall (FRAMEVETO_012–015: HPWREN/Roboflow faint distant horizon smoke called "haze/cloud" by the volcano-context whole-frame prompt). The shipping crop-level Config A recovers every one — a grey distant plume is not red-hot, is never routed to the VLM, and can never be vetoed. Retained as the honest upper-bound failure mode of the alternative architecture.

6.5 Representative hard negatives

Volcanic confounder library (exported real frames): HN_018 steam/degassing, HN_019 ash plume, HN_020 lava glow, HN_021 strombolian, HN_022 cloud obscuration, HN_023 glare/sunset, HN_024 snow cover. Web hard negatives (dust, fog, industrial smoke, glare, other; HNWEB_025–029) carry blank frame paths because the images live on the eval pod, not in this checkout — stated as fact, not fabricated. Categories with 0 locally-exportable instances (industrial smoke, dust, compression-artifact) are stated as such rather than invented.

6.6 Honesty flags and known limitations (consolidated)

Cameras: 3 of 5 online at verification. INGV EtnaTVChn and EtnaWalk were OFFLINE; the system reports OFFLINE honestly. No claim of "5 live cameras."
7 stale feeds (Chapter 3.2 / failure_appendix.md §8) are excluded from any "live" count and treated as no-evidence by corroboration. The 10 catalogued + 1 key-pending feeds are not live.
p90 / p99 latency and frame-capture success rate are UNKNOWN_NEEDS_VERIFICATION — not instrumented; only detector-CPU p50/p95/max exist. They are queued in the roadmap; the thesis quotes p95/max as the available bounds.
Alert email dispatch is STAGED_NOT_LIVE (ALERT_EMAIL_ENABLED=0); the dashboard is internal/preview ("Not a public product"); human-in-loop review is not a live workflow.
/api/cams transient TLS resets observed (2 of 3 fetches; recovered on retry, local publisher corroborated). A monitoring item, not a hard stop.
The VLM is an advisory precision layer, made recall-safe at night by the corroboration override — it is not a general early-smoke detector (consistent with SmokeBench [7]).
8 STAGED corroboration classes must not be claimed as detected or corroborated.
Small-n CIs. n=62 (FA) and n=18 (daytime recall) give wide Wilson CIs; the McNemar is underpowered (2 discordant pairs). Every point estimate is directional; the veto's only confirmed benefit (1 artifact FP) is within CI noise. The thesis does not over-claim a system-level FP improvement.
Leakage is recorded as 0 but the train↔held-out collision list is not committed (train images off-repo); fully closing it is a roadmap item.
top_p not pinned in the VLM request payload (server default) — a reproducibility gap to close.
Determinism is scoped to the served qwen3-vl-32b build, not guaranteed in perpetuity.
0 frozen-frame and 0 satellite-contradiction incidents were observed in the held-out set (FROZEN_037, SATCON_038); none is invented. The night dark-RGB condition is handled as a true quiet-scene negative, not mis-read as a frozen failure.

Chapter 7 — Future Work

Every item here is PLANNED until it has live health evidence; nothing is described as operational. The thesis baseline stays frozen at 4d1b0cc / ingv_v1b_best.pt / qwen3-vl-32b temp 0.0 — new models are challengers, not baseline swaps. No edge/Hailo/DGX hardware is assumed (ops-room target: detector CPU/GPU auto, VLM remote serverless).

7.1 Phase 0–2 weeks — close Phase-1 gaps and harden the freeze

Instrument frame-capture success rate (rolling per-source online/offline + frame-age counter, replacing the point-in-time "3 of 5 online"); compute p90/p99 latency from existing per-frame arrays including the routed-crop VLM tail, separately for CPU and GPU; pin top_p in the VLM request; harden /api/cams with a retry/health probe and a freshness-stamped fallback to cameras_wall.json; add a per-frame pHash-identical-consecutive-frame guard (catch a frozen-but-recent camera); lock WF36 STAGED wording.

7.2 Phase 1 month — evaluation rigor

Enlarge the held-out sets (more bulletin-confirmed volcanic frames and clean-source daytime wildfire frames; keep pHash-leakage-zero and the Pyronear-sequence separation); commit the train↔held-out collision list; run the WARP-style hard-negative robustness battery [12] (Gaussian noise, JPEG compression, blur, cloud-like patches, fog/haze, glare, rain-on-lens, timestamp overlays, black/frozen frames) and report detector + two-stage degradation curves.

7.3 Phase 3 months — domain adaptation and segmentation challengers

Domain adaptation [13,14,15,16] is the core next pillar: collect unlabeled frames from every live camera, mine detector positives + detector–VLM disagreements + VLM vetoes, human-review a small hard set, train with UDA/self-training, and test on held-out camera/date/weather/volcanic episodes, guarding against catastrophic false-positive drift. Detector/segmentation challengers (RESEARCH_ONLY): YOLO11/YOLO12 [4], RT-DETR [5], Grounding DINO 1.5/Edge [6], SAM 2 video plume masks [23,24], optical-flow smoke-motion-consistency [21,22], sky/terrain masking — benchmarked against the frozen YOLO11s, not swapped in. Prompt-as-evaluated-component study [10,11]: measure the recall-versus-FA tradeoff across CROP/SMOKE prompt variants, SmokeBench-style [7].

7.4 Phase 6 months — fine-tuned VLM, IR/thermal fusion, FCI event tracking

Fine-tuned Etna/wildfire VLM [20]: fine-tune a Qwen2.5/Qwen3-VL-style model on Etna degassing-vs-wildfire-vs-cloud crops, keeping the frozen 32B as the thesis baseline. IR/thermal fusion [17,18,19,20]: RGB+IR fusion, thermal/night detection, fire/lava/industrial-heat discrimination, satellite-thermal corroboration — directly attacking the night-fire↔lava residual. MTG-FCI event tracking [25,26,27]: move FCI from coverage-only corroboration to FireDyn-style fire-pixel extraction, rate-of-spread, FRP evolution, and fire-arrival maps, treating FCI as early-candidate + event-tracking data, not perfect ground truth. Active-learning dashboard + data flywheel: detector fires → candidates, VLM disagreements → hard examples, human decisions → labels, satellite corroboration → weak labels, INGV bulletins → volcanic labels, stale/frozen frames → camera-health labels.

7.5 Research-only (no committed date)

VLM/MLLM early-smoke localization limits (keep detector-first + VLM-veto per current evidence [7]); FireCLIP-style multimodal prompt tuning [10]; TROPOMI SO₂ + CAMS as contextual-only evidence [31,32,33]; FCI-vs-SEVIRI sensitivity study [25] and SLSTR active-fire characterization [29,30]; multi-season dataset expansion across lighting/weather/angle/episode shift; operational human-in-loop alerting protocol design (currently STAGED) before any public-facing claim; and the volcanic-source-inversion-as-QUBO research line (Chapter 5.8) — a parallel research thread, QPU BLOCKED.

References

Verification status: all 35 references VERIFIED (paper/source confirmed to exist via arXiv / publisher / IEEE / ScienceDirect / Hugging Face papers index, matching title and authors); [34] and [35] are official documentation (verified, non-paper), cited deliberately for the operational feed-reliability concept.

Govil, K.; Welch, M.L.; Ball, J.T.; Pennypacker, C.R. (2020). Preliminary Results from a Wildfire Detection System Using Deep Learning on Remote Camera Images. Remote Sensing 12(1):166. https://doi.org/10.3390/rs12010166
Dewangan, A.; Pande, Y.; Braun, H.-W.; Vernon, F.; Perez, I.; Altintas, I.; Cottrell, G.W.; Nguyen, M.H. (2021). FIgLib & SmokeyNet: Dataset and Deep Learning Model for Real-Time Wildland Fire Smoke Detection. arXiv:2112.08598. https://arxiv.org/abs/2112.08598
Lostanlen, M.; Veith, F.; Buc, C.; Barriere, V. (2024). Constructing a Real-World Benchmark for Early Wildfire Detection (PyroNear-2025 Dataset). arXiv:2402.05349. https://arxiv.org/abs/2402.05349
Tian, Y.; Ye, Q.; Doermann, D. (2025). YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv:2502.12524 (NeurIPS 2025). https://arxiv.org/abs/2502.12524
Zhao, Y.; Lv, W.; Xu, S.; et al. (2024). DETRs Beat YOLOs on Real-time Object Detection (RT-DETR). CVPR 2024; arXiv:2304.08069. https://arxiv.org/abs/2304.08069
Ren, T.; et al. / IDEA-Research (2024). Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection. arXiv:2405.10300. https://arxiv.org/abs/2405.10300
Qi, T.; Li, W.; Barnes, N. (2025). SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection. WACV 2026; arXiv:2512.11215. https://arxiv.org/abs/2512.11215
Zhang, C.; Wang, S. (2024). Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data. arXiv:2401.17600. https://arxiv.org/abs/2401.17600
Danish, M.S.; Munir, M.A.; Shah, S.R.A.; Kuckreja, K.; Khan, F.S.; Fraccaro, P.; Lacoste, A.; Khan, S. (2024). GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks. arXiv:2411.19325. https://arxiv.org/abs/2411.19325
FireCLIP: Enhancing Forest Fire Detection with Multimodal Prompt Tuning and Vision-Language Understanding. Fire (MDPI) 8(6):237, 2025. https://www.mdpi.com/2571-6255/8/6/237
Adhikari, R.; Thapaliya, S.; Dhakal, M.; Khanal, B. (2024). TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models. arXiv:2410.05239. https://arxiv.org/abs/2410.05239
Ide, R.; Yang, L. (2024/2025). Adversarial Robustness for Deep Learning-based Wildfire Detection Models (WARP). arXiv:2412.20006; Fire (MDPI) 8(2):50. https://arxiv.org/abs/2412.20006
Multilevel feature cooperative alignment and fusion for unsupervised domain adaptation smoke detection. Frontiers in Physics 11:1136021, 2023. https://www.frontiersin.org/articles/10.3389/fphy.2023.1136021/full
EDIF: boosting unsupervised cross-domain forest fire smoke detection with enhanced domain-invariant features. Geomatics, Natural Hazards and Risk, 2025. https://www.tandfonline.com/doi/full/10.1080/19475705.2025.2556144
Generative AI for Enhanced Wildfire Detection: Bridging the Synthetic-Real Domain Gap. arXiv:2511.16617, 2025. https://arxiv.org/abs/2511.16617
Pesonen, J.; Hakala, T.; Karjalainen, V.; Koivumäki, N.; Markelin, L.; Raita-Hakola, A.-M.; Suomalainen, J.; Pölönen, I.; et al. (2024). Detecting Wildfires on UAVs with Real-time Segmentation Trained by Larger Teacher Models. arXiv:2408.10843. https://arxiv.org/abs/2408.10843
A UAV-Based Multi-Scenario RGB-Thermal Dataset and Fusion Model for Enhanced Forest Fire Detection. Remote Sensing 17(15):2593, 2025. https://www.mdpi.com/2072-4292/17/15/2593
MCDet: Target-Aware Fusion for RGB-T Fire Detection. Forests 16(7):1088, 2025. https://www.mdpi.com/1999-4907/16/7/1088
A Study on Flame Detection Method Combining Visible Light and Thermal Infrared Multimodal Images. Fire Technology, 2024. https://link.springer.com/article/10.1007/s10694-024-01676-9
Habibpour, M.; Alipour Talemi, N.; Spodnik, J.; Khoury, C.J.; Afghah, F. (2026). WildFireVQA: A Large-Scale Radiometric Thermal VQA Benchmark for Aerial Wildfire Monitoring. arXiv:2604.20190. https://arxiv.org/abs/2604.20190
Yuan, F. (2014). Spatiotemporal bag-of-features for early wildfire smoke detection (HOOF temporal feature). Image and Vision Computing 32(1):24–33. https://doi.org/10.1016/j.imavis.2013.08.001
Zhao, Y.; et al. (2015). Forest Fire Smoke Video Detection Using Spatiotemporal and Dynamic Texture Features. Journal of Electrical and Computer Engineering 2015:706187. https://onlinelibrary.wiley.com/doi/10.1155/2015/706187
Ravi, N.; Gabeur, V.; Hu, Y.-T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; et al. (2024). SAM 2: Segment Anything in Images and Videos. arXiv:2408.00714. https://arxiv.org/abs/2408.00714
Ugwu, E.U.; Xinming, Z. (2025). Promptable Fire Segmentation: Unleashing SAM2's Potential for Real-Time Mobile Deployment with Strategic Bounding Box Guidance. arXiv:2510.21782. https://arxiv.org/abs/2510.21782
Major improvements in spaceborne early fire detection and small-fire FRP retrieval with the Meteosat Third Generation Flexible Combined Imager. Science of Remote Sensing, 2026. https://www.sciencedirect.com/science/article/pii/S2666017226000040
Paugam, R.; Filippi, J.-B.; Benali, A.; Gomes, J.; Xu, W.; Dutra, E.; Andre, F.; Boulanger, D.; Retornard, V.; Meraner, A.; Harvie, J.; Penot, V.; Denjean, C. (2026). Leveraging MTG-FCI fire observations for event-based fire behavior monitoring (FCI-FireDyn / Fire Event Tracker). arXiv:2606.06016. https://arxiv.org/abs/2606.06016
Unsupervised Wildfire Detection Using Multispectral MTG-FCI Data: A Feasibility Study. Journal of Imaging 12(6):229, 2026. https://doi.org/10.3390/jimaging12060229
Schroeder, W.; Oliva, P.; Giglio, L.; Csiszar, I.A. (2014). The New VIIRS 375 m active fire detection data product: Algorithm description and initial assessment. Remote Sensing of Environment 143:85–96. https://doi.org/10.1016/j.rse.2013.12.008
Xu, W.; Wooster, M.J. (2023). Sentinel-3 SLSTR Active Fire (AF) Detection and FRP Daytime Product — Algorithm Description and Global Intercomparison to MODIS, VIIRS and Landsat AF Data. Science of Remote Sensing 7:100087. https://www.sciencedirect.com/science/article/pii/S2666017223000123
Wooster, M.J.; Xu, W.; Nightingale, T. (2012). Sentinel-3 SLSTR active fire detection and FRP product: pre-launch algorithm development and performance evaluation using MODIS and ASTER datasets. Remote Sensing of Environment 120:236–254. https://doi.org/10.1016/j.rse.2011.09.033
Theys, N.; Hedelt, P.; De Smedt, I.; Lerot, C.; Yu, H.; Vlietinck, J.; Pedergnana, M.; et al. (2019). Global monitoring of volcanic SO2 degassing with unprecedented resolution from TROPOMI onboard Sentinel-5 Precursor. Scientific Reports 9:2643. https://www.nature.com/articles/s41598-019-39279-y
Exploiting Sentinel-5P TROPOMI and Ground Sensor Data for the Detection of Volcanic SO2 Plumes and Activity in 2018–2021 at Stromboli, Italy. Sensors 21(21):6991, 2021. https://www.mdpi.com/1424-8220/21/21/6991
Theys, N.; et al. Sulfur dioxide retrievals from TROPOMI onboard Sentinel-5 Precursor: Algorithm Theoretical Basis. Atmospheric Measurement Techniques. https://amt.copernicus.org/articles/10/119/2017/
NASA FIRMS — Fire Information for Resource Management System: product and latency documentation (VIIRS/MODIS active fire). NASA Earthdata. https://www.earthdata.nasa.gov/data/tools/firms/faq
EUMETSAT — Meteosat Third Generation Instruments (FCI) documentation. https://www.eumetsat.int/meteosat-third-generation-instruments

Appendices

Appendix A — Evidence ledger summary

Every headline claim traces to thesis_evidence_ledger.csv (claim_id, operational_status, evidence_path, commit, command/query, timestamp, metric, CI, limitation, thesis-safe wording). Summary of the 16 ledger rows:

Claim	Status	Metric	Evidence
C01 Frozen baseline	LIVE_OPERATIONAL	commit `4d1b0cc` (master)	`git rev-parse HEAD` @ 02:31Z
C02 64-feed board	LIVE_OPERATIONAL	46/7/10/1/0 of 64	feeds.json @ 02:31:33Z
C03 Overpass guard	LIVE_OPERATIONAL	23,379 roads / 341 rail (live)	`osm_roads_rail.py` L64-72
C04 Live FWI	LIVE_OPERATIONAL	13.06 (moderate)	feeds.json `effis_fwi`
C05 Camera wall	LIVE_OPERATIONAL (3/5)	75 s cadence, 3 online	`/api/cams` @ 02:44:37Z
C06 Camera sources	LIVE_OPERATIONAL	5 (INGV + 3 Windy + EtnaWalk)	`/api/cams` `sources[]`
C07 VLM trigger	LIVE_OPERATIONAL	~0.151/frame, 0 quiet	`/api/cams` + WF25
C08 WF25 volcanic FA	RESEARCH_ONLY	8.1% (5/62), CI [3.5–17.5%]	WF25_system_scoring.json
C09 WF25 daytime recall	RESEARCH_ONLY	94.4% (17/18), CI [74.2–99.0%]	WF25_system_scoring.json
C10 Deterministic cache	RESEARCH_ONLY	245/245 match, 0 leakage	WF25 provenance
C11 WF36 matrix	LIVE_OPERATIONAL	13 classes; >24 h = no-evidence	`corroboration.py` @ 02:56:39Z
C12 Bilingual parity	LIVE_OPERATIONAL	240 EN = 240 IT	`i18n.js`
C13 Scheduled jobs	LIVE_OPERATIONAL	both Running; feeds hourly	`schtasks` @ 02:32Z
C14 Model/prompt freeze	LIVE_OPERATIONAL	sha256 c0a3d0ea…; temp 0.0	`model_prompt_freeze.json`
C15 Dedup/staleness guards	LIVE_OPERATIONAL	IoU 0.4; 1800 s; 24 h	`config.py`
C16 Alert dispatch	STAGED_NOT_LIVE	`alert_email.enabled=false`	banner + cameras_wall.json

Appendix B — Failure appendix and crop references

failure_case_manifest.csv (38 rows); failure_crops/ (24 source frames + 24 thumbnails). Key illustrative crops: night-fire↔lava residual FN_007.jpg, FN_009.jpg; borderline town-lights FN_008.jpg; surviving volcanic FA FP_001.jpg … FP_006.jpg (FP_003 = the one vetoed sensor-artifact); Config-B-only night losses FNB_010.jpg, FNB_011.jpg; prior frame-level-veto losses FRAMEVETO_012–015.jpg; volcanic confounder library HN_018 … HN_024; contamination CONTAM_016 (false-colour satellite), CONTAM_017 (painting) — both correctly VLM-rejected and excluded from the recall denominator. Web hard-negatives HNWEB_025–029 carry blank paths (images off-repo, stated not fabricated). Vetoed-crop originals: crops/VETOED_dfire_pos_AoF07871_…_VOLCANIC.jpg, …AoF07875_…_VOLCANIC.jpg, …AoF07872_…crop0/crop1_NEITHER.jpg.

Appendix C — Model and prompt freeze manifest

Full manifest: model_prompt_freeze.json (commit 4d1b0cc, frozen 2026-06-29T02:45:00Z). Detector: YOLO11s 19-class, models/ingv_v1b_best.pt, sha256 c0a3d0ead257d318e70bec3bb84feaec7b99e9e3d55b132fc5f1ffd405cf0a20, 19,261,267 bytes, imgsz 960, conf 0.25, crop pad 0.25 / max 768 px, dedup IoU 0.4, cooldown 1800 s. VLM: qwen3-vl-32b, Model-Vault RunPod serverless OpenAI-compatible endpoint, crop-level routing, temperature 0.0, max_tokens 120, top_p server-default, retries 3 / backoff 8.0 s / timeout 180 s, image crop long-side ≤768 px JPEG q88 base64. Two frozen prompts (CROP_PROMPT primary veto, SMOKE_PROMPT degassing-vs-plume) with verbatim text in the manifest; strict-JSON output {label, confidence, reason} with PARSE_FAIL fallback. Cache: deterministic temp-0 per-crop (reports/crop_veto_outputs/te_crop_level_cpu_configA.json). Hardware: local OFFICE workstation (CPU/GPU auto) + remote serverless VLM; no DGX, no PHOENIX prod, no edge/Hailo assumed.

Appendix D — Full corroboration matrix

Full per-class matrix, per-class implemented rule + worked example on current live feeds + honest caveat: wf36_corroboration_matrix.md (implementation service/corroboration.py @ 4d1b0cc, feed snapshot 2026-06-29T02:56:39Z). Five rule-backed classes (wildfire smoke, wildfire flame, lava/incandescence, ash plume, steam/degassing) with cycle-level qualifiers; eight STAGED classes (cloud, glare, frozen-frame, fog/haze, industrial smoke, dust, camera artifact, unknown). Freshness rule: feeds older than FEED_MAX_AGE_H = 24 h are treated as no current evidence (uncorroborated, never contradicted). Reproduce: ETNA_FEEDS_OUT=../etna_dashboard/feeds/out python corroboration.py.

Prepared for the INGV deliverable. Frozen baseline commit 4d1b0cc. Every quoted number traces to a committed Phase-1 evidence artifact in flank_wildfire/reports/thesis/. The system is reported as an evidence-backed multi-source monitoring architecture with explicit operational classification, confidence intervals, and failure modes; it is not claimed to solve early wildfire detection generally.