Evidence-backed system specification & evaluation
Proof of value — check us against your own reports
In plain terms. Imagine a fire lookout on Mount Etna. A wildfire on the mountain's
flank and the volcano's own glowing lava look almost identical — both are hot, bright and
smoky — and telling them apart by colour alone is essentially impossible. A naive
smoke-camera, pointed at an erupting volcano, cries “fire!” at every lava glow,
fumarole and ash puff (our raw detector did exactly that). The skill shown here is
teaching the system to do what a seasoned observer does: learn what lava, ash and degassing
actually look like, and confirm a real wildfire only when an independent witness —
a fire satellite, a burn scar, or the geometry of the heat relative to the vent — agrees.
The grades below are the false-alarm rate (how often it cries “fire” at volcanic heat when
there is no wildfire), recall (how many real wildfires it catches), and agreement with INGV's own
reports. Each comes with its sample size n and a confidence interval, and the cases we missed
are included, because for a scientific partner the misses are the credibility.
This page exists to show, not claim. For each event we place what ADRIZ
independently produced next to INGV's own public report for the same date, with a live
link so you can verify every number yourself. We include the cases where we
missed or disagreed — credibility comes from showing those too. Every rate carries a
sample size n and a 95% confidence interval; read the CI, not the point estimate.
Primary INGV source for the volcanic rows: the INGV-OE weekly bulletin,
mirrored verbatim in English by the Smithsonian GVP (Etna 211060). Navigate the GVP Etna archive
to the week label in each row. Reproduction method per claim: GVP · Etna 211060 · ingv_proof_of_value.md · raw data: proof_of_value.json.
1. Event timeline — ADRIZ vs INGV's own report (with live links)
3. Model proof — tested against INGV's own camera frames
4. Honest limits — where we miss
Stated plainly, not buried
5. Current literature & alignment — does this support or contradict published science?
Claim status: SUPPORTED — the central results agree with the peer-reviewed
volcano-remote-sensing literature; where an earlier internal figure overstated, we retract it
below and the corrected value matches what the literature would predict. Every statement is
reconcilable against a public source.
The volcano-hardened detector AGREES with the literature. Torrisi et al.
(2025; 2024) classify Etna volcanic clouds (ash / SO2 / mixed) from multispectral
geostationary imagery, and Corradino et al. (2023) apply deep learning to subtle
volcanic thermal anomalies — both treat the problem as multi-class, multi-cue plume/heat
disambiguation rather than a single spectral threshold. Our camera detector is the same problem
moved onto INGV's ground-camera frames: it learns lava / ash / degassing as their own classes
and confirms a wildfire only with independent off-vent corroboration (NASA FIRMS, Sentinel-2
SWIR). [21][22][23]
“Spectrally impossible” is grounded, not rhetorical. Lava is incandescent
— it shares the thermal and short-wave-infrared signature of fire — so the
volcano-thermal literature establishes that visible/SWIR spectrum alone cannot separate
lava from wildfire. Our residual flame-vs-lava confusion, and our reliance on geometry +
independent corroboration to resolve it, are consistent with that literature; the contribution
is the disambiguation method, not a claim to beat physics. [21][23]
SO2: we DISPROVE our own earlier over-claim, matching the literature. An
earlier internal note cited a single-date “~290×” summit SO2
enhancement. A full-population EMIT/S5P re-evaluation showed that was a single-scene artifact:
SO2 is a plume-presence cue (clear plume in ~78% of overpasses) but
not an eruptive-state classifier (active-vs-quiescent AUC 0.45, no better than chance). This
retraction is exactly what the Sentinel-5P/TROPOMI documentation implies — the product is
coarse atmospheric context, not pixel-level event classification. [24]
Burn-scar dNBR is used as the literature uses it. Our wildfire dNBR separability
(AUC 0.867 [0.842, 0.887]) supports Sentinel-2 for burned-area mapping,
and we explicitly do not use it for per-detection alerting — consistent with
geospatial-foundation burn-mapping work and with our own satellite research log. [17]
What this page does NOT claim. No specificity (false-positive) rate against
INGV-quiescent weeks (the graded overlap has zero quiet weeks — the volcanic-activity
metric is recall only; quote the 83.2% CI lower bound, not 100%); no eruptive-magnitude
or sub-weekly latency grading (the public INGV oracle is weekly); single site (Etna), small
samples (n = 5–62) — read the CIs. These are data-access limits, not
algorithm limits, and match the generalization cautions in the deep-learning fire-detection
reviews. [1]
Next, with INGV data: add INGV-quiescent weeks (to measure specificity), a calibrated
thermal feed (MIR+TIR) to enable sub-pixel intensity and lava/fire separation, per-camera vent
geometry, and per-site SO2/MultiGAS — each unlocks a grade we deliberately do
not claim today.
6. References
Ghali & Akhloufi (2023). Deep Learning Approaches for Wildland Fires Using Satellite Remote Sensing Data. Fire.
Shibli, Nascetti & Ban (2026). LoRA of Geospatial Foundation Models for Wildfire Mapping using Sentinel-2.
Torrisi (2025). Integrated ML for volcanic cloud tracking: Etna lava fountains 2020–2022. Annals of Geophysics.
Torrisi et al. (2024). Deep Learning & Geostationary Remote Sensing for Volcanic Cloud Monitoring (ABI/SEVIRI Ash RGB).
Corradino et al. (2023). Detection of Subtle Thermal Anomalies: Deep Learning on the ASTER Global Volcano Dataset. IEEE TGRS.
Copernicus Data Space. Sentinel-5P / TROPOMI documentation (SO2, aerosol index).
Battaglia, Cervelli & Murray (2013). dMODELS: a MATLAB software package for modeling crustal deformation near active faults and volcanic centers (Mogi point source and related models). USGS Techniques and Methods 13-B1.
Full annotated reference set and the PHOENIX/INGV literature reconciliation:
shared bibliography in the research repository (references/REFERENCES.md).
Tell us where this is wrong. Check these figures against your own INGV reports and tell us where they disagree: corrections, missing context, or a better test are all welcome.
This is a conversation, not a lecture. We want you to contradict us, add evidence, flag a source we missed, or suggest a hazard or test we should consider. Add your name to be credited, or stay anonymous; comments reach the ADRIZ team directly.
Research backbone for the INGV / Etna work. These entries are a mix of INGV-native studies (run on real Etna/INGV data, at the top) and foundational science copied from
the PHOENIX open research log (research.adr-wildfire.com) —
they cover the science this Etna monitoring rests on: how we separate a real wildfire from a volcano's
own heat (Etna, Stromboli) and other persistent “furnaces,” how independent satellites
corroborate a fire, and what the timeliness of each feed actually is. Every entry carries a
plain-language analogy, a claim-status tag, and a block reconciling the result against the
peer-reviewed literature (agreement vs. honest correction), with linked references — the same
standard as the public log. They live in both places by design. Rigor pass: 2026-06-28.
Detailed scientific prose in each entry below is presented in English (verbatim from the PHOENIX open research log) so that figures, claim-status tags and citations remain identical across both sites.
INGV-native research — run on real Etna / INGV data
Studies below were run directly on Etna/INGV-relevant data (FCI detections near the summit, GNSS geometry, flank-camera frames, multi-sensor thermal). Same standard as the public log: plain-language analogy, a claim-status tag, literature reconciliation, and confidence intervals. Honest negatives and corrections are first-class. Published 2026-06-28.
MTG-FCI over Etna: parallax correction is mandatory at altitude — and the “FCI is faster than polar” claim is survivorship bias
Date: 2026-06-28 Status: SUPPORTED (parallax, operational) + CORRECTED (lead-time) — real FCI detections over Etna, DEM-based geometry, polar-matched fires
The analogy. A geostationary satellite watches Etna from far out over the equator, so it sees the 3.3 km-high summit at a slant. If you pin a hot pixel to sea level, a vent right on the summit is displaced more than three kilometres sideways on the map, far enough to look like a fire on the flank instead of the crater. Correcting for the mountain's height (a DEM-based “parallax” fix) puts the hot spot back where it really is. We also checked the popular claim that the geostationary feed beats the polar satellites on speed. For Etna specifically, it doesn't: that claim only counts the fires the geostationary sensor happened to catch.
BLUF. Over 1,617 FCI detections within 20 km of the Etna summit (34-day window), DEM-based parallax correction flipped 100 veto decisions: 96 CAUTION→VOLCANIC_VETO (upper-flank emitters correctly pulled into the volcanic veto ring), 3 CAUTION→WILDFIRE, 1 VOLCANIC_VETO→CAUTION. Raw FCI ellipsoid parallax error is elevation-dependent: 3.54 km at the summit (3357 m), 2.63 km at 2500 m, 0.21 km on the plain (200 m). On timeliness, a circulated “+21-min FCI lead over polar” is survivorship-biased; on 177 matched fires the measured median FCI−polar gap is −14.0 min (95% CI [−40, −1]) — polar (VIIRS/MODIS) detected first roughly twice as often, and FCI missed ~86% of polar fires entirely.
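The elevation dependence above follows from viewing geometry. A hot spot at height h, seen at viewing zenith angle θ, lands about h·tan θ away from its true position when it is geolocated at zero height. The sketch below makes several assumptions: a spherical Earth (not the ellipsoid the pipeline uses), a sub-satellite longitude of 0° for MTG-I, and approximate summit coordinates of 37.75°N, 14.99°E. Under those assumptions it gives 3.54, 2.63 and 0.21 km for the three elevations. The function names are illustrative.

```python
import math

R_EARTH_KM = 6371.0   # mean Earth radius; spherical approximation (assumption)
R_GEO_KM = 42164.0    # geostationary orbit radius measured from Earth's centre

def view_zenith_deg(lat_deg: float, lon_deg: float, sat_lon_deg: float = 0.0) -> float:
    """Viewing zenith angle of a geostationary satellite seen from a ground point."""
    lat = math.radians(lat_deg)
    dlon = math.radians(lon_deg - sat_lon_deg)
    cos_psi = math.cos(lat) * math.cos(dlon)  # Earth-central angle to the sub-satellite point
    slant = math.sqrt(R_EARTH_KM**2 + R_GEO_KM**2 - 2 * R_EARTH_KM * R_GEO_KM * cos_psi)
    cos_vza = (R_GEO_KM * cos_psi - R_EARTH_KM) / slant  # local vertical vs line of sight
    return math.degrees(math.acos(cos_vza))

def parallax_shift_km(elev_m: float, lat_deg: float, lon_deg: float, sat_lon_deg: float = 0.0) -> float:
    """Horizontal displacement when a target at elev_m is geolocated at zero height."""
    vza = math.radians(view_zenith_deg(lat_deg, lon_deg, sat_lon_deg))
    return (elev_m / 1000.0) * math.tan(vza)

for elev in (3357, 2500, 200):  # summit, upper flank, plain (elevations from the BLUF)
    print(elev, round(parallax_shift_km(elev, 37.75, 14.99), 2))  # 3.54, 2.63, 0.21 km
```

The shift scales linearly with height, which is why the fix matters at the summit and barely at all on the plain.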
Complementarity (34-day matched window). ADRIZ events 1,264 · FCI events 230 · matched both 176 · ADRIZ-only 1,088 · FCI-only 54. FCI is a genuine complement (it sees some fires the others miss and adds a fast geostationary cadence), but it is not a faster-than-polar replacement for Etna.
Why it matters operationally. Without the parallax fix, summit degassing and crater incandescence are systematically displaced onto the flanks, where they masquerade as wildfire candidates; the correction is what lets the volcanic veto ring do its job at altitude. The honest lead-time picture means we fuse FCI with polar rather than advertising it as the earliest source.
Claim status: SUPPORTED (parallax correction, operational on real FCI detections) + CORRECTED-SUPERSEDED (the “+21-min FCI lead” figure, replaced by a polar-matched median of −14 min).
Current literature & alignment. Verdict: PARTIAL AGREEMENT with a local correction. Xu et al. [5] show MTG-FCI generally detects more active-fire pixels and earlier than SEVIRI; Paugam et al. [6][7] derive event-based fire products from FCI; EUMETSAT [8] documents the instrument and its geostationary geometry.
What the external literature reinforces: that FCI is a valuable event-tracking and fusion input, and that geostationary fire geolocation requires viewing-geometry / parallax handling at terrain height.
Where ADRIZ is stricter or diverges: for Etna specifically, polar sensors win on first detection (median −14 min, n=177); the apparent FCI lead in the wild is survivorship over the fires FCI caught. We require DEM-based parallax before any summit/flank veto decision.
What this entry does not claim: it does not claim FCI is slow or useless — it is complementary (54 FCI-only events) and high-cadence; it claims only that “earliest source for Etna” is not supported.
Next research test: measure parallax-corrected geolocation error against ground-surveyed vent positions; quantify FCI's marginal first-detection contribution once fused with polar in the live voter.
Public data sources: EUMETSAT Data Store (MTG-FCI L1c / FCI-AF L2) · NASA FIRMS (VIIRS/MODIS polar truth) · Copernicus DEM (terrain height for parallax). The parallax computation and matched-fire clustering (5 km / 24 h) are reproducible from these public sources.
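The page states only the clustering thresholds (5 km / 24 h). Below is a minimal sketch of one way to pair FCI and polar detections under those thresholds. It assumes each event is a (lat, lon, datetime) tuple and uses a greedy closest-in-time pairing. The pairing rule, the names and the sign convention are assumptions. The sign convention takes lead as polar time minus FCI time, so a negative value means polar saw the fire first, as in the −14 min median.

```python
import math
from datetime import datetime, timedelta

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a 6371 km sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def match_leads(fci, polar, max_km=5.0, max_dt=timedelta(hours=24)):
    """Pair each FCI event with the closest-in-time unused polar event inside both thresholds.

    Returns FCI lead times in minutes, t_polar - t_fci (positive = FCI first).
    """
    used, leads = set(), []
    for lat, lon, t_fci in sorted(fci, key=lambda e: e[2]):
        best = None
        for j, (plat, plon, t_pol) in enumerate(polar):
            if j in used or abs(t_pol - t_fci) > max_dt:
                continue
            if haversine_km(lat, lon, plat, plon) > max_km:
                continue
            if best is None or abs(t_pol - t_fci) < abs(polar[best][2] - t_fci):
                best = j
        if best is not None:
            used.add(best)
            leads.append((polar[best][2] - t_fci).total_seconds() / 60.0)
    return leads

t0 = datetime(2025, 7, 1, 12, 0)
fci = [(37.70, 15.00, t0 + timedelta(minutes=20)), (38.00, 14.50, t0)]
polar = [(37.71, 15.01, t0), (37.30, 13.90, t0)]
print(match_leads(fci, polar))  # [-20.0]: polar first; the second FCI event has no match
```

Unmatched FCI events are the "FCI-only" count and unmatched polar events the "polar-only" count in a complementarity table.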
Statistical reporting: the lead-time median is quoted with a 95% CI over n=177 matched fires; veto-flip counts are exact over the 1,617-detection population. Read the interval and n, not the point estimate.
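A percentile bootstrap is one simple way to put an interval on a median lead time. Below is a minimal sketch that assumes the matched leads are already in minutes. The resample count, the seed and the plain percentile variant are assumptions; the page does not state which bootstrap variant it used.

```python
import numpy as np

def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Median of `values` with a percentile-bootstrap (1 - alpha) interval."""
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of idx is one resample (with replacement) of the original cases.
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot_medians = np.median(x[idx], axis=1)
    lo, hi = np.percentile(boot_medians, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(np.median(x)), float(lo), float(hi)

# Toy lead times in minutes (negative = polar first); not the page's 177 matched fires.
leads = [-45, -30, -22, -14, -9, -3, 2, 6, 15]
print(bootstrap_median_ci(leads))
```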
INGV-native research
Locating the magma source (Mogi inversion): quantum optimisation loses to classical — clean negative, with a first-of-kind formulation
The analogy. When magma collects under a volcano, the ground above it swells a few centimetres. Working backwards from that swelling to where and how deep the magma pocket sits is a search problem — and people ask whether a quantum computer could do it faster or better. We built the test honestly and ran it: today, ordinary classical methods win, and by a wide margin.
BLUF. Mogi point-source inversion on a synthetic-but-physical Etna GNSS network (23 stations; scenarios: 2021 inflation, deep deflation, shallow inflation; 12 noise draws each). A QUBO solved by simulated annealing reached 100% success but took ~4,077 ms and inherits the grid-discretisation tax (depth error 1.292 km [CI 1.028–1.556]). Classical Levenberg–Marquardt least-squares hit sub-grid accuracy — depth error 0.585 km [CI 0.352–0.854] in 4.2 ms. CP-SAT returned the exact grid optimum in 0.027 ms. Quantum annealing is 10²–10⁴× slower with no accuracy benefit; QAOA on a simulator matches the optimum only with feasibility post-selection and is slower still.
The honest novelty. To our knowledge this is the first published formulation of volcanic-deformation (Mogi) source inversion as a quantum-optimisation (QUBO) problem — pending peer confirmation. But a first formulation is not an advantage: no quantum speed-up or accuracy gain was demonstrated, and we do not claim one.
Why this is the right answer. Mogi inversion is low-dimensional and smooth; continuous least-squares exploits that directly, and an exact constraint solver settles the discrete version in microseconds. The QUBO mapping pays a discretisation tax that classical continuous methods simply avoid.
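Below is a minimal sketch of the classical baseline: a Mogi point-source forward model and a Levenberg–Marquardt fit with SciPy. The model is u = (1−ν)ΔV/π · (Δx, Δy, d)/R³ with R² = Δx² + Δy² + d². The sketch assumes a homogeneous elastic half-space with ν = 0.25, a hypothetical network of 23 randomly placed stations (not the Etna geometry), and 2 mm Gaussian noise. Parameters are (x, y, depth) in km and ΔV in 10⁶ m³. It illustrates the technique; it is not the benchmark code.

```python
import numpy as np
from scipy.optimize import least_squares

NU = 0.25  # Poisson's ratio (assumption)

def mogi(params, xs, ys):
    """East, north and up surface displacements (mm) of a Mogi point source.

    params = (x0_km, y0_km, depth_km, dv) with dv in 1e6 m^3 and station coords in km;
    with these units (1 - nu) * dv / pi * d / R^3 comes out directly in metres.
    """
    x0, y0, depth, dv = params
    dx, dy = xs - x0, ys - y0
    r3 = (dx**2 + dy**2 + depth**2) ** 1.5
    c = (1.0 - NU) * dv / np.pi * 1000.0  # metres -> millimetres
    return np.concatenate([c * dx / r3, c * dy / r3, c * depth / r3])

rng = np.random.default_rng(42)
xs, ys = rng.uniform(-15, 15, 23), rng.uniform(-15, 15, 23)  # hypothetical 23-station network
truth = np.array([1.0, -2.0, 5.0, 5.0])                       # 5 km deep, +5e6 m^3
obs = mogi(truth, xs, ys) + rng.normal(0.0, 2.0, 3 * 23)      # 2 mm noise (assumption)

# Continuous least squares: no grid, so no discretisation tax on the depth estimate.
fit = least_squares(lambda p: mogi(p, xs, ys) - obs, x0=[0.0, 0.0, 3.0, 1.0], method="lm")
print("estimated (x0, y0, depth, dV):", np.round(fit.x, 2))
print("depth error (km):", round(abs(fit.x[2] - truth[2]), 3))
```

The fit has only four smooth parameters and 69 observations, which is the regime where a local least-squares solver is hard to beat.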
Claim status: NEGATIVE (clean) — classical wins on accuracy and runtime; QPU remains BLOCKED for any advantage claim.
Current literature & alignment. Verdict: AGREES with the cautious quantum-remote-sensing literature. Misra et al. [25] and Dent et al. [26] caution that quantum-annealing / QUBO mappings rarely beat strong classical baselines on small structured problems; Rainjonneau et al. [27] show quantum EO-optimisation is feasible but not yet advantageous; a mandatory classical baseline (CP-SAT [28]) is exactly the control we ran. The Mogi forward model follows the standard deformation-source formulation [33].
What the external literature reinforces: that demonstrating a problem can be cast as QUBO says nothing about advantage without a strong classical control — which here decisively wins.
Where ADRIZ is stricter or diverges: we require both a continuous optimiser (LM) and an exact solver (CP-SAT) as baselines before any quantum claim, and we report runtime alongside accuracy so the 10²–10⁴× slowdown is visible.
What this entry does not claim: it does not claim quantum can never help geophysical inversion — only that for low-dimensional Mogi inversion, today, classical dominates.
Next research test: distributed / finite-volume sources and joint InSAR+GNSS inversions (higher-dimensional, non-convex) where a quantum or hybrid approach has a more plausible footing — still gated behind CP-SAT / LM baselines.
Public data sources: the Mogi forward model and the synthetic Etna GNSS network geometry (23 stations) are fully specified; classical baselines use Google OR-Tools CP-SAT and standard Levenberg–Marquardt. The deformation-source modelling reference is the USGS dMODELS package [33]. No proprietary data is required to reproduce the benchmark.
Statistical reporting: depth/horizontal errors are quoted with 95% CIs across 12 noise draws × 3 scenarios; runtimes are means across instances. Read the CI and the instance count.
INGV-native research
Flank-camera wildfire-vs-volcanic veto: a crop-level second look cuts false volcanic alarms while keeping real fires
Date: 2026-06-28 Status: SUPPORTED (small-n, with CIs) + night residual MITIGATED by a durable safety guard — real held-out frames, 0 perceptual-hash leakage
The analogy. A camera on Etna's flank sees two things that look almost identical to a smoke detector: a real wildfire plume, and the volcano's own degassing or night-time glow. A single-glance detector confuses them. So we add a second look — a vision-language model re-examines just the cropped region the detector flagged and asks “is this wildfire smoke, or the volcano being a volcano?” This is an operations-room (ops-room) two-stage cascade, not an edge device.
BLUF. On real held-out frames (62 bulletin-confirmed volcanic frames, 0 perceptual-hash leakage; 20 daytime visible-fire frames), the detector alone has a volcanic false-alarm rate of 9.7% [95% CI 4.5–19.5]. Adding the crop-level veto: Config A (recommended) → FA 8.1% [3.5–17.5], daytime fire recall 95.0% [76.4–99.1]; Config B (stricter) → FA 3.2% [0.9–11.0], recall 85.0% [64.0–94.8]. The veto reduces volcanic false alarms at a recall cost you can dial.
What it does NOT claim — and the honest residual. Not “zero false alarms.” The 5 residual false positives under Config A are summit degassing plumes the detector labels wildfire smoke — and smoke is, by design, never blindly vetoed (vetoing all smoke near a volcano would suppress a real flank fire). At night, lava incandescence and a night fire glow are genuinely ambiguous to a single-frame look (2 night fires in the slice were originally reclassified volcanic by the veto). The single-frame disambiguation is still hard — but the safety failure it caused is now closed by a durable guard (below).
Durable night-safety guard (shipped). The veto may never silently drop a wildfire-class alarm at night. When the visible panel is dark, a VLM “volcanic” suppression of a wildfire-class detection is honoured only if there is independent volcanic corroboration consistent with the vent (on-crater / summit-proximal FIRMS·SLSTR FRP, or the hot crop inside the summit ROI). With no such corroboration the alarm is not dropped — it is downgraded to a still-surfaced uncertain_night / needs_review state (alert feed + tile). Re-scored on the same temperature-0 cache (0 new model calls): the night true-fire silent false-negative count goes from 2 to 0, while daytime recall and the 8.1% (5/62) volcanic false-alarm rate are unchanged (none of the 62 volcanic frames are dark enough to trip the guard, so the daytime-degassing survivors are untouched). A real off-crater night fire therefore can no longer be erased by the veto alone.
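The guard reduces to a short decision rule. Below is a minimal sketch as described above, with hypothetical function and argument names. It assumes the daytime path simply honours the VLM veto. It leaves out how darkness and summit proximity are measured.

```python
def vent_corroborated(summit_frp_hit: bool, hot_crop_in_summit_roi: bool) -> bool:
    """Independent volcanic evidence consistent with the vent (either cue suffices)."""
    return summit_frp_hit or hot_crop_in_summit_roi

def resolve_wildfire_alarm(vlm_says_volcanic: bool, panel_is_dark: bool, corroborated: bool) -> str:
    """Fate of a wildfire-class detection after the crop-level VLM second look.

    Returns "alert", "suppressed" or "uncertain_night" (still surfaced for review).
    """
    if not vlm_says_volcanic:
        return "alert"            # the second look does not call it volcanic: keep the alarm
    if not panel_is_dark:
        return "suppressed"       # daytime: the VLM veto is honoured
    if corroborated:
        return "suppressed"       # night, but vent-consistent evidence backs the veto
    return "uncertain_night"      # night with no corroboration: downgrade, never drop

print(resolve_wildfire_alarm(True, True, vent_corroborated(False, False)))  # uncertain_night
```

The key property is that no night-time path without corroboration ends in "suppressed".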
Claim status: SUPPORTED (false-alarm reduction with CIs on real held-out frames) + night fire / lava-incandescence ambiguity MITIGATED (durable safety guard: night true-fire silent FN 2→0, volcanic FA unchanged) — the disambiguation itself remains hard, but the veto is now recall-safe at night.
Current literature & alignment. Verdict: AGREES that volcanic thermal/visual signatures are learnable and separable. Torrisi [21][22] and Corradino et al. [23] show deep learning separates volcanic activity in remote-sensing imagery; ADRIZ adds an explicit crop-level vision-language veto and a conservative “never blindly veto smoke” rule.
What the external literature reinforces: that a learned model can distinguish volcanic from non-volcanic thermal/visual features, supporting a confounder-aware second stage.
Where ADRIZ is stricter or diverges: we audit for perceptual-hash leakage between train and test (0 here), report the operating point with CIs at small n, and refuse to veto the smoke class outright.
What this entry does not claim: no claim of zero false alarms, no claim the night single-frame disambiguation is solved (the guard makes the failure mode safe, it does not classify the night scene with certainty), and no claim of generalisation beyond the tested frame sets.
Next research test: push the night case from “safe (uncertain_night surfaced)” toward “confidently classified” with a thermal-temporal cue (persistence + ROI motion) and a larger night-frame held-out set; expand the volcanic confuser set beyond summit degassing.
Public data sources: clean-source fire frames from public sets (HPWREN / D-Fire / Roboflow); volcanic frames from INGV public bulletin imagery. The crop-routing rule and the per-decision model reasoning are recorded in the evaluation outputs; the perceptual-hash leakage audit is reproducible.
Statistical reporting: false-alarm and recall rates are quoted with Wilson 95% CIs at the stated n (62 volcanic, 20 fire); read the interval — at this n the bounds are wide and the point estimates are indicative, not final.
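The Wilson interval is short enough to check by hand. Below is a minimal sketch. The counts 6/62, 2/62, 19/20 and 17/20 are back-derived from the quoted percentages (only 5/62 is stated explicitly). With them, the function reproduces the intervals quoted in this entry to one decimal place.

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

for label, k, n in [("detector-only FA", 6, 62), ("Config A FA", 5, 62), ("Config B FA", 2, 62),
                    ("Config A recall", 19, 20), ("Config B recall", 17, 20)]:
    lo, hi = wilson_ci(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%} [{lo:.1%}, {hi:.1%}]")
```

Unlike the simple normal interval, Wilson stays inside [0, 1] and does not collapse to zero width at 0% or 100%, which matters at n = 20.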
INGV-native research
Multi-source thermal fusion: combining heat sensors helps a little — but not provably at our sample size; one fused rule gives zero false alarms at half recall
Date: 2026-06-28 Status: SUPPORTED (operating point) + NEGATIVE (fusion gain not significant at current n) — real multi-sensor land-surface-temperature data, bootstrap CIs
The analogy. Several satellites carry “heat cameras” (VIIRS, Landsat, ECOSTRESS). Stacking them ought to tell a real fire from warm background better than any single one. Honest answer: at the number of cases we have, the combination is a touch better but not provably better. What is useful right now is a simple combined rule that raises no false alarms while catching about half the fires.
BLUF. Volcano task (fused vs bulletin active/quiescent, n=33 dates): fused AUC 0.831 [95% CI 0.675–0.959] vs best single sensor (VIIRS S-NPP) 0.768 [0.587–0.922]; the gain is +0.048, p(Δ>0)=0.79, which is NOT statistically significant. Wildfire task (25 real fires / 20 controls): Landsat surface-temperature AUC 0.746 [0.594–0.892], fused (ECOSTRESS+Landsat) 0.740 [0.646–0.841]; the operating point (≥8 K anomaly, any sensor) gives precision 1.00, recall 0.48 (12/25 fires, 0 false alarms on controls). The highest fused-LST result, on FIRMS truth (n=22), is AUC 0.93 [0.810–1.000].
Why we report it this way. An easy mistake is to headline “fusion wins.” The bootstrap CIs cross zero (the fused−single gain is within noise at n=22–33), so we explicitly do not claim a fusion win. What survives scrutiny is a conservative, operationally honest operating point: a high-confidence ≥8 K multi-sensor rule that fired on no control and caught roughly half the fires — useful as a precision-first corroborator, not a recall solution.
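The operating point is an OR rule across sensors. Below is a minimal sketch that assumes each case carries one land-surface-temperature anomaly per sensor, in kelvin above its background, with None where a sensor has no observation. The page does not say how the background is defined, so the sketch leaves it out. The toy values are made up.

```python
from typing import Optional, Sequence

def any_sensor_fires(anomalies_k: Sequence[Optional[float]], threshold_k: float = 8.0) -> bool:
    """True if any available sensor shows an anomaly of at least threshold_k kelvin."""
    return any(a is not None and a >= threshold_k for a in anomalies_k)

def precision_recall(fire_cases, control_cases, threshold_k: float = 8.0):
    """Precision and recall of the OR rule over labelled fire and control cases."""
    tp = sum(any_sensor_fires(c, threshold_k) for c in fire_cases)
    fp = sum(any_sensor_fires(c, threshold_k) for c in control_cases)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return precision, tp / len(fire_cases)

# Toy cases as (VIIRS, Landsat, ECOSTRESS) anomalies in K.
fires = [(9.5, None, 4.0), (3.0, 2.0, None), (None, 12.1, 8.4), (1.0, None, None)]
controls = [(2.0, 1.5, None), (None, None, 6.0)]
print(precision_recall(fires, controls))  # (1.0, 0.5)
```

On the page's wildfire task the same arithmetic gives recall 12/25 = 0.48 with no control firing, hence precision 1.00.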
Claim status: SUPPORTED (precision-1.00 operating point) + NEGATIVE (multi-sensor fusion gain not statistically significant at current n).
Current literature & alignment. Verdict: AGREES in direction with multi-source fusion work, but we decline the unsupported win. Multi-source / high-temporal fusion [2] is the endorsed direction; ADRIZ adds the discipline of refusing to claim a fusion advantage the confidence intervals do not support.
What the external literature reinforces: that combining complementary thermal sensors is a sound strategy; the direction is right even where our n is too small to prove the increment.
Where ADRIZ is stricter or diverges: we report p(Δ>0) and bootstrap CIs for the fusion increment and treat “not significant” as the headline, not a footnote.
What this entry does not claim: no claim that fusion beats the best single sensor at current n, and no recall claim — the usable result is a precision-first corroboration rule.
Next research test: grow n (more fire/control dates, more co-observations) to test whether the +0.048 increment becomes significant; add SLSTR and the geostationary cadence to the fusion stack.
Public data sources: Copernicus / USGS Landsat L2 surface temperature · ECOSTRESS land-surface temperature · NASA FIRMS (VIIRS, external truth). Per-sensor 8 K anomaly thresholds and per-sensor weighting are specified; bootstrap CIs use 2000 resamples.
Statistical reporting: AUCs are quoted with bootstrap (2000×) 95% CIs and the fusion increment with p(Δ>0); read the CI and n — the fusion gain is reported as not significant by design.
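A fusion increment and p(Δ>0) can be estimated with a paired bootstrap. Each iteration resamples the cases once and scores both models on that same resample. Below is a minimal sketch with a rank-based AUC helper. The synthetic scores and the names are assumptions; the page states only the 2000-resample count.

```python
import numpy as np
from scipy.stats import rankdata

def auc(y, s):
    """Mann-Whitney AUC: P(positive score > negative score), ties counted as 1/2."""
    y = np.asarray(y, dtype=bool)
    r = rankdata(s)  # average ranks handle ties
    n_pos, n_neg = y.sum(), (~y).sum()
    return (r[y].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def paired_bootstrap_delta(y, s_fused, s_single, n_boot=2000, seed=0):
    """Point delta-AUC, its 95% percentile CI, and the share of resamples with delta > 0."""
    y, s_fused, s_single = map(np.asarray, (y, s_fused, s_single))
    rng = np.random.default_rng(seed)
    deltas = []
    while len(deltas) < n_boot:
        idx = rng.integers(0, len(y), len(y))
        if y[idx].all() or not y[idx].any():
            continue  # AUC needs both classes in the resample
        deltas.append(auc(y[idx], s_fused[idx]) - auc(y[idx], s_single[idx]))
    deltas = np.array(deltas)
    lo, hi = np.percentile(deltas, [2.5, 97.5])
    return auc(y, s_fused) - auc(y, s_single), (lo, hi), float(np.mean(deltas > 0))

rng = np.random.default_rng(1)
y = np.r_[np.ones(15, int), np.zeros(18, int)]  # 33 hypothetical dates
single = rng.normal(size=33) + 1.0 * y
fused = single + rng.normal(scale=0.8, size=33) + 0.3 * y
delta, ci, p_pos = paired_bootstrap_delta(y, fused, single)
print(f"delta AUC {delta:+.3f}, 95% CI [{ci[0]:+.3f}, {ci[1]:+.3f}], p(delta>0) = {p_pos:.2f}")
```

Pairing the resamples cancels the case-to-case variation the two models share. An unpaired comparison of two separate AUC intervals would overstate the uncertainty of the increment.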
INGV-native research
Does an SO₂ plume tell a volcano from a wildfire? A live Sentinel-5P test — a real but moderate cue, not a standalone veto
Date: 2026-06-28 Status: SUPPORTED (moderate cue) — supersedes the earlier SO₂ placeholder; live CDSE Sentinel-5P SO₂ over Etna vs Sicilian wildfire locations
The analogy. Volcanoes breathe out sulphur dioxide; wildfires barely do. So in principle, if a satellite sees an SO₂ plume sitting over a hot spot, that hot spot is probably the volcano, not a fire — a natural “veto” cue. We stopped assuming and actually measured it: we pulled real Sentinel-5P SO₂ over Etna's summit and over dozens of real Sicilian wildfire sites on their fire days, and asked how well SO₂ alone tells them apart. The honest answer: it helps, clearly more than a coin flip — but it is a supporting cue, not a decision-maker.
BLUF. Sentinel-5P/TROPOMI SO₂ total column (CDSE Sentinel Hub Statistical API) over a small Etna-summit box on 113 valid days (May–Sep 2021) vs over 45 Sicilian wildfire locations (NASA FIRMS, summer, >30 km from Etna) on their fire dates. Etna median SO₂ = 3.2×10⁻⁴ mol/m² (~0.72 DU) vs wildfire median = 5.6×10⁻⁵ mol/m² (~0.13 DU, essentially the retrieval noise floor). Separation: AUC = 0.71 (95% CI 0.62–0.80) on that first sample — and a larger re-test (160 wildfire events) firms the headline to AUC 0.78 (95% CI 0.72–0.84) (see the robustness Update below) — significantly above chance, but well below the ~0.9+ you would want from a standalone veto. The cue is available on ~74.3% of days (95% CI 66.9–80.6) over Etna; clouds and the daily revisit remove the rest.
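The Dobson-unit equivalents follow from the standard conversion 1 DU = 2.6867×10²⁰ molecules/m² ≈ 4.4615×10⁻⁴ mol/m². A one-line check of the two medians:

```python
DU_MOL_PER_M2 = 4.4615e-4  # 1 Dobson unit expressed in mol/m^2

def mol_m2_to_du(column_mol_m2: float) -> float:
    """Convert a vertical column from mol/m^2 to Dobson units."""
    return column_mol_m2 / DU_MOL_PER_M2

print(round(mol_m2_to_du(3.2e-4), 2))  # Etna median: 0.72
print(round(mol_m2_to_du(5.6e-5), 2))  # wildfire median: 0.13
```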
Method & controls (for reproduction). SO₂ total column from Sentinel-5P L2 via the CDSE Sentinel Hub Statistical API (collection sentinel-5p-l2, band SO2), daily aggregation over a ~0.08° box at 0.01° resolution. Volcanic positives = the Etna-summit box on every day with valid (cloud-cleared) coverage. Wildfire negatives = NASA FIRMS VIIRS detections >30 km from Etna, summer (Jun–Sep), confidence nominal/high, FRP ≥ 2, queried over a ±1-day window to catch an overpass (best-valid-day taken). Discrimination is the Mann–Whitney AUC (volcanic vs wildfire) with a 2000× bootstrap 95% CI; day-availability with a Wilson 95% CI. Labels are location-derived priors (summit-above-tree-line is unambiguously volcanic; distant summer vegetation is wildfire), stated as such — no SO₂ value is ever used to assign a label, so the cue cannot be circular.
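The discrimination statistic can be reproduced with SciPy's Mann–Whitney U. With the volcanic sample listed first, AUC = U / (n₁·n₂). Below is a minimal sketch with a percentile bootstrap. Resampling each class separately (a stratified bootstrap) is an assumption, since the page states only the 2000× count. The sketch relies on recent SciPy (1.7 and later), where the returned statistic is U for the first sample.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def mw_auc(volcanic, wildfire):
    """AUC that a volcanic-day SO2 value exceeds a wildfire-day value (ties count 1/2)."""
    u = mannwhitneyu(volcanic, wildfire, alternative="two-sided").statistic
    return u / (len(volcanic) * len(wildfire))

def auc_bootstrap_ci(volcanic, wildfire, n_boot=2000, seed=0):
    """Point AUC and a 95% percentile interval, resampling each class at its own size."""
    v, w = np.asarray(volcanic, float), np.asarray(wildfire, float)
    rng = np.random.default_rng(seed)
    aucs = [mw_auc(rng.choice(v, v.size), rng.choice(w, w.size)) for _ in range(n_boot)]
    lo, hi = np.percentile(aucs, [2.5, 97.5])
    return mw_auc(v, w), (float(lo), float(hi))

# Toy columns in mol/m^2 (made up): three volcanic days, three wildfire events.
print(auc_bootstrap_ci([3.2e-4, 4.0e-4, 5.0e-4], [1.0e-5, 5.6e-5, 3.5e-4], n_boot=500))
```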
How to read it. Three honest takeaways: (1) wildfires sit at the SO₂ noise floor — vegetation fires do not produce a TROPOMI-visible SO₂ column, so a positive SO₂ reading is genuinely informative; (2) Etna's everyday passive degassing (~0.72 DU median) is only moderately above that floor and overlaps it on quiet/noisy days, which is why the AUC is 0.71 and not higher; (3) the cue is missing a quarter of the time. So SO₂ belongs as a weighted corroborator inside a fusion veto (it raises confidence that a thermal anomaly is volcanic), never as the sole arbiter.
What this corrects. SO₂ had been an untested placeholder / stale-literature prior in our planning. This entry replaces that with a live, reproducible measurement and a bounded verdict. It also retires the earlier over-stated SO₂ claim (the single-date “~290×” enhancement), which never had population-level backing: there is no large multiplicative SO₂ veto effect, and the real effect is a modest, useful AUC 0.71.
Update (2026-06-28, robustness re-test — the plume-peak “lift” did NOT replicate). An initial single sample (n=45 wildfire events) hinted the plume peak (AOI max) beat the area mean (AUC 0.74 vs 0.71). It does not hold. Re-run on a larger, fresh sample (113 Etna days × 160 wildfire events) and across hundreds of random draws at small/medium/large n: the area-MEAN gives AUC 0.783 (95% CI 0.717–0.843), the plume-MAX 0.769 (0.709–0.827), and the difference is Δ = −0.014 (paired bootstrap 95% CI [−0.071, +0.045]; max beats mean in 0% of full-size draws and only ~31–39% of small draws). The earlier +0.03 was a small-n sampling artifact — at small n the AUC swings ~0.05. Net: the plume-peak is not better than the area-mean; if anything the mean is marginally more stable. The larger sample also tightens the headline SO₂-mean cue to AUC ~0.78 (0.72–0.84), so the “moderate corroborator, not standalone veto” verdict stands and is better-supported. Code/data: so2_peak_vs_mean_robustness.py · results.
Claim status: SUPPORTED (moderate, quantified cue) — supersedes EXPLORATORY/placeholder; explicitly NOT a standalone veto.
Current literature & alignment. Verdict: AGREES with satellite SO₂ remote sensing, with a sober SNR caveat. Copernicus/TROPOMI documentation [24] establishes S5P SO₂ total-column retrieval and its sensitivity limits; Kurchaba et al. [32] show satellite plume detection (TROPOMI NO₂) is feasible but low-SNR at small scales, consistent with our moderate AUC from daily small-AOI means.
What the external literature reinforces: that a volcanic SO₂ column is detectable from space and that small-area, single-overpass plume signals are noise-limited — both borne out here (Etna detectable; the cue moderate, not decisive).
Where ADRIZ is stricter or diverges: we quantify the cue's discrimination (AUC + bootstrap CI) AND its day-to-day availability, and we refuse to treat SO₂-presence as a hard veto; it is a weighted corroborator only.
What this entry does not claim (threats to validity): not a strong/standalone veto; the summer-2021 window is paroxysm-rich, so the AUC is likely an optimistic bound for the everyday cue; daily-mean small-AOI SO₂ is noisy (S5P retrieval admits near-zero/negative columns); labels are location priors rather than independently adjudicated per event; only a single year (2021) is tested. Each is a stated, bounded limitation, not a hidden one.
Next research test: max-column was tested and did NOT replicate on a larger set (see Update above); remaining = a true spatial plume mask + wind-advected footprint, testing across quiet (non-paroxysm) years, and fusing SO₂ as a weighted feature in the thermal/camera veto (done — see the thermal-fusion entry: small ~+0.02 lift, directionally robust).
Statistical reporting: AUC is quoted with a bootstrap (2000×) 95% CI; availability with a Wilson 95% CI; n = 113 Etna valid-days against 45 wildfire events in the first sample and 160 in the robustness re-test. Read the interval and n.
INGV-native research
Does SO₂ add to a thermal volcano-vs-wildfire veto? A small but directionally consistent lift — and a clean contrast with the plume-peak null
Date: 2026-06-29 Status: SUPPORTED (small, directionally-robust lift; single-sample 95% CI grazes zero) — answers the “fuse SO₂ into the thermal veto” question from the SO₂ veto entry
The analogy. The thermal signal alone (how hot and how bright a hot pixel is) already does most of the work in telling Etna's summit lava from a vegetation fire. The open question: once you already have the heat features, does adding the SO₂ reading buy you anything extra? We trained a simple model on the heat features alone, then the same model with SO₂ added, and measured the difference honestly — with the same multi-draw stress test that just killed our plume-peak idea.
BLUF. On 110 Etna-summit thermal detections (volcanic) and 75 Sicilian wildfire detections, a cross-validated logistic model on thermal features only (FIRMS brightness Ti4, Ti5, Ti4−Ti5, log FRP) scores AUC 0.909 (95% CI 0.865–0.948). Adding SO₂ gives AUC 0.926 (0.886–0.960) — a lift of Δ ≈ +0.02. SO₂ alone scores 0.834 (0.766–0.894). The lift's direction is robust (positive in 100% of CV-fold seeds, Δ 0.026 [0.021–0.032]; positive in 100% of medium/large subsample draws, 81% at small n), but its magnitude is small and the single-sample paired bootstrap CI grazes zero: Δ 0.017 [−0.009, +0.043], P(Δ>0)=0.91.
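Below is a minimal sketch of the thermal-only vs thermal+SO₂ comparison with scikit-learn. It uses out-of-fold probabilities from a standardised logistic regression and repeats the comparison across CV seeds to check direction stability. The data are synthetic stand-ins, not the page's detections. The columns only loosely mimic Ti4, Ti5, Ti4−Ti5, log FRP and SO₂, and the fold count and seeds are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def cv_auc(X, y, seed=0, folds=5):
    """Out-of-fold AUC of a standardised logistic regression."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)

# Synthetic stand-in data with the BLUF's class sizes: 110 volcanic (1), 75 wildfire (0).
rng = np.random.default_rng(0)
y = np.r_[np.ones(110, int), np.zeros(75, int)]
thermal = rng.normal(size=(185, 4)) + 0.9 * y[:, None]  # four "thermal" columns
so2 = rng.normal(size=(185, 1)) + 0.6 * y[:, None]      # one "SO2" column
X_thermal, X_both = thermal, np.hstack([thermal, so2])

# Direction-stability check: does adding SO2 raise the AUC under every CV shuffle?
deltas = [cv_auc(X_both, y, seed=s) - cv_auc(X_thermal, y, seed=s) for s in range(10)]
print("thermal-only AUC:", round(cv_auc(X_thermal, y), 3))
print("share of CV seeds with a positive lift:", np.mean(np.array(deltas) > 0))
```

Judging the lift by its sign across seeds, rather than by one interval, is the same test that rejected the plume-peak idea.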
How to read it — and why it is NOT the plume-peak mistake. Two SO₂ ideas were stress-tested the same way. The plume-peak (max vs mean column) flipped sign on a larger sample and beat the baseline in 0% of full-size draws — noise (see the SO₂ entry Update). This one is different: the sign is stable positive across seeds and sizes. So SO₂ does add a small, real, corroborating increment on top of thermal — consistent with its “weighted corroborator” role — even though the increment is too small to clear the strict 95% bar on one sample.
Why the lift is small (honest framing). Thermal alone is already strong here (0.91) because Etna's summit lava/vents are thermally distinct from vegetation fire, and our labels are location priors (summit=volcanic, distant-vegetation=wildfire) — an “easy” regime. The headroom for any extra cue is therefore small. SO₂'s marginal value should be larger exactly where thermal is ambiguous — upper-flank anomalies that could be either — which is the next, harder test.
Claim status: SUPPORTED (small, directionally-consistent lift of ~+0.02 AUC on top of thermal) — with the explicit caveat that the single-sample 95% paired-bootstrap CI includes zero; the support comes from cross-seed and cross-size direction stability, not from one CI.
Current literature & alignment. Verdict: AGREES with multi-source fusion, honestly bounded. Multi-source thermal/chemical fusion [2] is the endorsed direction; S5P SO₂ retrieval and its sensitivity limits [24] and the low-SNR nature of small-scale satellite plume signals [32] explain why the SO₂ increment is small but real.
What the external literature reinforces: that adding a complementary chemical cue to a thermal classifier is sound, and that the increment from a noisy small-AOI SO₂ column will be modest.
Where ADRIZ is stricter or diverges: we judge the lift by direction-stability across CV seeds and subsample sizes (the test that killed the plume-peak), not by a single sample's CI, and we report the increment as small rather than headline a fusion win.
What this entry does not claim: not a large or strongly-significant lift; not generalisation beyond this summit-vs-distant-vegetation labeling; the easy thermal regime caps the visible benefit.
Next research test: repeat on ambiguous upper-flank thermal anomalies (where thermal alone is weak), add the camera veto as a third feature, and test across a quiet (non-paroxysm) year.
Statistical reporting: AUCs are 5-fold cross-validated (out-of-fold) with 3000× bootstrap 95% CIs; the lift is reported three ways — paired bootstrap (CI + P>0), 50 CV-fold seeds (mean, sd, fraction>0), and small/medium/large subsamples (fraction>0). n = 110 volcanic, 75 wildfire. Labels are location priors; SO₂ is never used to assign a label.
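A minimal, dependency-free sketch of the paired-bootstrap part of this reporting. It assumes you already hold out-of-fold scores from the thermal-only and thermal+SO₂ models for the same detections, with label 1 for one class and 0 for the other. The function names are illustrative rather than the project's code, and the percentile CI is one simple choice among several. The 50-seed and subsample checks would wrap this in outer loops.

```python
import random


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j over a block of tied scores
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def paired_bootstrap_delta(base, fused, labels, n_boot=3000, seed=0):
    """Paired bootstrap of AUC(fused) - AUC(base) on the same out-of-fold scores.

    Each resample draws detections with replacement and scores BOTH models on
    the same draw. Returns (point_delta, ci_low, ci_high, share_of_draws_gt_0).
    """
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the draw
            deltas.append(auc([fused[i] for i in idx], ys)
                          - auc([base[i] for i in idx], ys))
    deltas.sort()
    lo = deltas[int(0.025 * n_boot)]
    hi = deltas[int(0.975 * n_boot) - 1]
    p_pos = sum(d > 0 for d in deltas) / n_boot
    return auc(fused, labels) - auc(base, labels), lo, hi, p_pos
```

Scoring both models on the same resampled detections makes the interval describe the lift itself. Two independently bootstrapped AUCs would ignore the correlation between the models and typically give a wider, less informative band.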
INGV-native research
Foundational science — copied from the PHOENIX open research log
The entries below are copied verbatim from research.adr-wildfire.com because the Etna monitoring rests on them: separating real wildfire from a volcano's own heat, independent corroboration, and feed timeliness. They live in both places by design.
The analogy. A city keeps a list of the chimneys and furnaces that always set off the smoke alarm — volcanoes, refineries, greenhouses, quarries — so that when the alarm rings at one of those known spots you can safely ignore it instead of calling the fire brigade every time.
PHOENIX publishes an open-data catalog of persistent thermal anomalies in Sicily — volcanoes, refineries, glasshouses, solar farms, and quarries — that repeatedly cause false-positive wildfire detections. It is released under CC-BY-4.0 (data) + MIT (scripts) at `github.com/markl02us/persistent-thermal-sources-sicily` and is permanently citable via DOI 10.5281/zenodo.20369891.
How the catalog is built (6 steps):
1. Mine the last 30 days of PHOENIX `internal_fires` + `external_fires`, flagging cells with ≥6 hits, ≥3 distinct days and no Sentinel-2-verified burn scar.
2. Download a 250 m Esri World Imagery tile per candidate.
3. Classify each tile with Claude Sonnet 4.5 into categories (volcanic vent, industrial, glasshouse, solar farm, quarry, urban, ag-burn, fire scar, other) with a confidence score; auto-promote at confidence ≥0.85 in the auto-annotate categories.
4. Enrich with OpenStreetMap Overpass tags + Wikidata.
5. Emit a per-source JSON card.
6. Route candidates with confidence <0.85 to daily human review.
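The candidate-mining rule in step 1 and the confidence routing in steps 3 and 6 can be sketched as below. This is an illustration under assumptions, not the catalog's scripts: hits are taken as (cell_id, date) pairs, the burn-scar lookup is a plain set, and the auto-annotate category list is passed in because the text does not enumerate it.

```python
from collections import defaultdict
from datetime import date, timedelta

AUTO_PROMOTE_CONF = 0.85  # step 3 threshold; below it, step 6 sends to review


def candidate_cells(hits, scarred_cells, today: date, window_days=30,
                    min_hits=6, min_days=3):
    """Step 1: cells with >= min_hits hits on >= min_days distinct days in the
    trailing window and no Sentinel-2-verified burn scar.

    hits: iterable of (cell_id, date) from internal_fires + external_fires.
    scarred_cells: set of cell_ids with a verified burn scar.
    """
    cutoff = today - timedelta(days=window_days)
    n_hits = defaultdict(int)
    days = defaultdict(set)
    for cell, d in hits:
        if d >= cutoff:
            n_hits[cell] += 1
            days[cell].add(d)
    return sorted(c for c in n_hits
                  if n_hits[c] >= min_hits
                  and len(days[c]) >= min_days
                  and c not in scarred_cells)


def route(category, confidence, auto_categories):
    """Steps 3 and 6. Assumption: a confident label outside the auto-annotate
    categories also goes to human review; the text does not cover that case."""
    if confidence >= AUTO_PROMOTE_CONF and category in auto_categories:
        return "auto-promote"
    return "human-review"
```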
Known anchor sources include Mt. Etna summit craters (15 km radius mask, FP-confidence 1.0), Stromboli, Vulcano (La Fossa), the Augusta-Priolo-Melilli petrochemical complex, and the Gela and Milazzo refineries. A full end-of-day re-review classified the catalog as 19 mask (2 glasshouse + 17 water) / 19 real-fire / 64 ag-burn / 12 unsure on origin, with 14 burn-scar sources wired in. The catalog feeds PHOENIX's `land_mask` FP suppression and is maintained by autonomous scheduled jobs (MODIS daily, FCI 6h, OLCI proxy daily, borderline-recheck daily, weekly SemVer bump).
Claim status: SUPPORTED.
Current literature & PHOENIX alignment. Verdict: AGREEMENT. NASA documentation and recent analysis corroborate our 'do not filter on confidence' doctrine.
What the external literature reinforces: NASA's VIIRS active-fire documentation [9] defines confidence as an intermediate-quantity quality flag (low/nominal/high) and attributes many low-confidence daytime pixels to sun-glint and weaker relative MIR anomalies, not to false fire; Dhage 2025 [11] documents systematic day/night structure in low-confidence labels.
Where PHOENIX is stricter or diverges: Confidence is one feature, never a drop rule; persistent-source history, cross-sensor agreement and multi-day recurrence are the directly relevant fire-vs-furnace signals.
What this post does not claim: FIRMS confidence is not a calibrated wildfire probability; 'low' is not 'false', and persistent false sources can sit in 'nominal'.
Next research test: Validate the 3-signal tiering against final PHOENIX grades once grade semantics are confirmed; measure new-source learning lag for the persistence mask; stratify static false positives (volcanic / industrial flare / offshore / urban / sensor artifact).
Public data sources: NASA FIRMS active-fire archive + area API (VIIRS/MODIS/SLSTR truth). Every figure in this entry is reproducible from these public sources with no access to PHOENIX infrastructure; the method is stated above and in any linked code.
Statistical reporting: proportions are quoted with Wilson 95% confidence intervals and ranking metrics (AUC) with bootstrap 95% CIs; read the interval and the sample size n, not the point estimate. A shuffled-label placebo (≈0.5) accompanies learned separability claims.
entry 0011
Anatomy of our false positives — the raw candidate stream, and why multi-sensor agreement is near-perfect
The analogy. When the system's raw, unfiltered hunches are checked against the actual scorched ground, only about a third turn out to be real fires, and most false alarms are fleeting tricks of cloud, dust or sun-glint rather than factory heat — but a hunch a second independent satellite also sees is right 99% of the time, which is why two witnesses beat one.
BLUF. This entry looks at where PHOENIX's *raw* satellite fire-candidates go wrong, using Sentinel-2 as the burn arbiter. Important framing first: these are raw candidates — the input our voting, persistence, weather and validator gates filter — not our shipped detections. Of the raw candidates that get Sentinel-2-checked, about 69% come back as no-burn, and crucially those false positives are transient one-offs (cloud, dust, sun-glint, warm bare soil), not industrial flares — only ~3% sit at recurring thermal sites. The standout positive: candidates corroborated by an independent satellite (FIRMS) are 99% real, which makes multi-sensor agreement our single strongest precision lever.
Method. We used the Sentinel-2-adjudicated truth table (a detection is "real" if a post-fire differenced-NBR burn scar is found, "false" if the surface is unburned). We computed the real-vs-false split for the raw candidate stream overall and per reporting source, the severity breakdown of the false ones, and how often false vs real events sit at recurring (industrial-like) hotspots. Read-only.
Result.
- Raw S2-checked candidates: 2,467 real vs 5,255 no-burn — i.e. the raw candidate stream is ~31% real before filtering. Again: this is the gate *input*, not the shipped output.
- By source: independent-satellite (FIRMS) corroborated candidates are 99% real (80/81). The bulk internal-detector candidate stream is ~31% real on its own — which is exactly why it is gated, not shipped directly.
- False positives are dominated by "unburned" surfaces (3,612) and "negative" (1,130) — transient warm/bright pixels, not persistent heat. Only 3% of false positives are at recurring hotspots (vs 7% of real fires), so industrial flares are a small part of the problem.
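The interval arithmetic behind statements like "99% real (80/81)" is short enough to show. This sketch implements the standard Wilson score interval and applies it to the counts in the Result list above; it is an illustration, not the pipeline's own reporting code.

```python
import math


def wilson(k, n, z=1.96):
    """Wilson score interval for a proportion k/n (z=1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# FIRMS-corroborated candidates: 80 real out of 81 checked
lo, hi = wilson(80, 81)
# Whole raw S2-checked stream: 2,467 real out of 2,467 + 5,255
raw_lo, raw_hi = wilson(2467, 2467 + 5255)
```

With only 81 corroborated candidates, the 80/81 interval runs from roughly 93% to 99.8%, so the careful reading is "very high, with a lower bound near 93%" rather than a flat 99%.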
Why it matters. Two clear implications. (1) Multi-sensor agreement is the highest-value precision signal — a candidate seen by an independent satellite is almost always real. This is the principle behind the polar-anchored prior [0019] and the surfacing safety-net [0020], and it's now quantified. (2) The persistent-source filter we added [0020] only addresses ~3% of false positives (the industrial ones); the majority are transient atmospheric/surface confusers (cloud edges, dust, glint, hot bare soil) that the literature attacks with spectral dust/smoke discrimination. Building that needs per-pixel spectral data, which isn't in our event database — so it's a data-acquisition step, not just an algorithm.
Caveat (load-bearing). The 31% figure is the raw-candidate validation rate, not PHOENIX's public detection accuracy; the gating stack (voting, persistence, weather plausibility, satellite validator) exists precisely to convert this noisy candidate stream into high-precision shipped detections. Nothing here changes a shipped number.
Independence caveat (anchor circularity). The 99% "FIRMS-corroborated" figure is only meaningful if the corroboration is genuinely independent of the candidate, i.e. FIRMS (a separate polar instrument we don't own, only process) saw the fire *on its own*, not because we told our geostationary detector where to look. The polar-anchored prior cited above [0019] can *relax* our geostationary detectors' thresholds at a location FIRMS has already flagged; where that happens, "our detector + FIRMS agree" is partly FIRMS confirming a FIRMS-seeded detection, and counting it as independent would inflate the number. The 80/81 was measured over a window in which that anchor was inactive (it was created in an already-expired state and stayed inactive until 17 June), so the figure stands as independent agreement. With the anchor now live, however, the honest forward number must discount any geostationary vote produced under an active FIRMS anchor. Quantifying that anchor-discounted corroboration rate is an open audit, not a settled number.
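One way the anchor-discounted audit could be counted, sketched under assumptions: each candidate carries a hypothetical `anchor_active` flag marking a geostationary vote cast while a FIRMS anchor was live, and "discounting" here means dropping those candidates from the corroborated pool rather than down-weighting them. The field names are illustrative; the real event schema is not described here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    firms_match: bool    # an independent FIRMS detection agrees in space and time
    s2_real: bool        # Sentinel-2 adjudication found a burn scar
    anchor_active: bool  # hypothetical flag: geostationary vote cast under an active FIRMS anchor


def corroboration_counts(cands, discount_anchored=True):
    """Return (real, total) over FIRMS-corroborated candidates.

    With discount_anchored=True, candidates whose geostationary vote was cast
    under an active FIRMS anchor are dropped, since their agreement with FIRMS
    is not independent of FIRMS.
    """
    pool = [c for c in cands
            if c.firms_match and not (discount_anchored and c.anchor_active)]
    return sum(c.s2_real for c in pool), len(pool)
```

Reporting both counts side by side, with and without the discount, makes the size of any circularity visible instead of folding it into a single rate.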
Claim status: SUPPORTED.
Current literature & PHOENIX alignment. Verdict: AGREEMENT. Handling persistent false sources by location/time persistence rather than single-frame radiometry matches FIRMS false-source guidance.
What the external literature reinforces: NASA FIRMS/VIIRS documentation [9][10] frames confidence as a quality flag, not a wildfire filter, consistent with our reliance on persistence and known-source masks.
Where PHOENIX is stricter or diverges: A too-tight flare/persistence filter can suppress a real fire that recurs near a static source; PHOENIX deliberately refuses filters that would hide real events.
What this post does not claim: Radiometry alone cannot separate fire from furnace; motion alone cannot either.
Next research test: Grow the Sicily flare/persistent-source catalog; use a temporal-signature discriminator (steady-in-time false source vs space-time-anomalous real fire) instead of blanket radius exclusion.
Public data sources: NASA FIRMS active-fire archive + area API (VIIRS/MODIS/SLSTR truth) · Element84 Earth Search / Copernicus Data Space Sentinel-2 L2A. Every figure in this entry is reproducible from these public sources with no access to PHOENIX infrastructure; the method is stated above and in any linked code.
Statistical reporting: proportions are quoted with Wilson 95% confidence intervals and ranking metrics (AUC) with bootstrap 95% CIs; read the interval and the sample size n, not the point estimate. A shuffled-label placebo (≈0.5) accompanies learned separability claims.
entry 0026
A 128 MW "fire" with no scar: adding an industrial-flare filter to the safety-net
Date: 2026-06-18 Status: defensible (shadow; precision 75% → 100% on the labeled set; not promoted)
The analogy. A fire that pours out hundreds of megawatts at the exact same spot day after day yet never leaves a single burn scar isn't a wildfire — it's an industrial gas flare, like a stove burner left on; a simple "too hot, too often, in one place" rule spots it and stops the map from crying wolf.
BLUF. While evaluating whether to promote our multi-sensor safety-net to the live map, we saw it flag a cluster on the far-west Sicilian coast reporting 128 MW of fire power but leaving no burn scar. That is not a wildfire; it is an industrial gas flare (the same spot registered 884 MW, 368 MW and 72 MW on other days that week, with thousands of detections at the same pixel). Real fires don't sustain tens-to-hundreds of megawatts at one location for a week, and they leave a scar. We added a physically grounded filter, *a location with very high power (>50 MW) on multiple days is a persistent flare, not a fire*, which removes it while keeping every real fire and lifts the safety-net's precision from 75% to 100% on the labeled set.
Method. The persistent-source filter we already had excludes locations active more than five distinct days, which catches steady industrial sources. But this flare was intermittent enough to slip just under that threshold at the cluster centroid. We added a second, intensity-based test: count the days a location shows a >50 MW detection in the trailing 60 days; two or more such days marks it a flare and excludes it. The threshold is well clear of real fires — the genuine fires in our evaluation peaked under 12 MW (the Raffadali windmill fire was 11.7 MW), whereas this flare ran 72–884 MW.
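The two exclusion rules can be written as one small predicate. This is a sketch assuming each location's history arrives as (date, FRP in MW) pairs; the function and constant names are illustrative. The text does not give a look-back window for the five-distinct-day persistence rule, so here it simply applies to whatever history is passed in.

```python
from datetime import date, timedelta

PERSISTENT_MAX_DAYS = 5              # existing rule: active on MORE than five distinct days
FLARE_MW = 50.0                      # intensity rule: a detection above 50 MW ...
FLARE_MIN_DAYS = 2                   # ... on two or more distinct days ...
FLARE_LOOKBACK = timedelta(days=60)  # ... within the trailing 60 days


def is_excluded(detections, today: date) -> bool:
    """detections: list of (date, frp_mw) for one location (hypothetical shape).

    True if the location is a persistent source or a high-power flare.
    """
    active_days = {d for d, _ in detections}
    hot_days = {d for d, frp in detections
                if frp > FLARE_MW and timedelta(0) <= today - d <= FLARE_LOOKBACK}
    persistent = len(active_days) > PERSISTENT_MAX_DAYS
    flare = len(hot_days) >= FLARE_MIN_DAYS
    return persistent or flare
```

Counting distinct hot *days* rather than hot detections matters: a single large real fire can produce many >50 MW pixels on one day without tripping the flare rule.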
Result. Re-evaluated, the safety-net's flagged clusters went from 7 (3 real / 1 false / 3 pending) to 6 (3 real / 0 false / 3 pending) — precision 100% on the labeled set, with 36 persistent industrial sources now correctly excluded (up from 32). All three confirmed real fires survived the new filter (the citizen-reported Raffadali fire, one independently confirmed by our own detector, one with a genuine post-fire burn scar). The change is isolated to the shadow tier; nothing is live.
Caveat (load-bearing). The labeled set is still small (three confirmed cases), and we tightened the filter immediately after observing the one false positive — so "100%" is on thin, recently-adjusted evidence. The flare filter itself is general and physically sound (it would catch any persistent high-power source, not just this one), but before promoting the safety-net to the public map we want the three still-maturing clusters to resolve and confirm the precision holds on a larger sample. Promotion stays gated.
Why it matters. A wildfire alerting system that cries "128 MW fire!" at a gas flare loses trust fast. Distinguishing industrial heat from wildfire is one of the oldest false-alarm problems in fire remote sensing; here a simple, interpretable rule grounded in fire physics (intensity-persistence plus the absence of a burn scar) does the job without machine learning or extra data.
Claim status: SUPPORTED.
Current literature & PHOENIX alignment. Verdict: AGREEMENT. Handling persistent false sources by location/time persistence rather than single-frame radiometry matches FIRMS false-source guidance.
What the external literature reinforces: NASA FIRMS/VIIRS documentation [9][10] frames confidence as a quality flag, not a wildfire filter, consistent with our reliance on persistence and known-source masks.
Where PHOENIX is stricter or diverges: A too-tight flare/persistence filter can suppress a real fire that recurs near a static source; PHOENIX deliberately refuses filters that would hide real events.
What this post does not claim: Radiometry alone cannot separate fire from furnace; motion alone cannot either.
Next research test: Grow the Sicily flare/persistent-source catalog; use a temporal-signature discriminator (steady-in-time false source vs space-time-anomalous real fire) instead of blanket radius exclusion.
Public data sources: NASA FIRMS active-fire archive + area API (VIIRS/MODIS/SLSTR truth) · Element84 Earth Search / Copernicus Data Space Sentinel-2 L2A. Every figure in this entry is reproducible from these public sources with no access to PHOENIX infrastructure; the method is stated above and in any linked code.
Statistical reporting: proportions are quoted with Wilson 95% confidence intervals and ranking metrics (AUC) with bootstrap 95% CIs; read the interval and the sample size n, not the point estimate. A shuffled-label placebo (≈0.5) accompanies learned separability claims.
entry 0030
Telling fire from furnace: for Sicily's static hot sources, persistence beats radiometry
Date: 2026-06-25 Status: defensible (complete VIIRS FIRMS archive, 35,008 detections 2019–2024; radiometry-only vs radiometry-plus-persistence discrimination of vegetation fire from volcanic / industrial / offshore hot sources, leave-one-year-out) — a positive result with a clear operational lesson for the false-alarm filter
The analogy. Sicily doesn't only burn — Etna glows, refineries flare, ships light up — and a heat-sensing satellite sees them all. We asked what best separates a real wildfire from these permanent ‘furnaces’: the brightness numbers, or the fact that a furnace is always there? Across six years, knowing where heat persistently recurs beats radiometry hands down. The grade is discrimination of vegetation fire from static sources, validated leave-one-year-out.
Sicily does not only burn — it also glows in places that are not wildfire at all, and a thermal satellite sees all of them. Mount Etna and Stromboli put out volcanic heat year-round; the petrochemical complexes at Priolo–Augusta, Gela and Milazzo flare gas day and night; offshore platforms and the occasional ship light up over water. In the complete fire archive these persistent sources are not rare: 4,259 of 35,008 VIIRS detections (12%) are tagged by NASA as volcano, static land source, or offshore rather than vegetation fire. For a wildfire system every one of them is a potential false alarm, and the question this entry asks is a clean one: can the satellite tell a furnace from a fire by the *radiometry of a single detection* — how hot, how bright at 4 versus 11 microns, how much radiative power — or does it fundamentally need to know that *something is always burning in that spot*?
The raw signatures say the answer will be mixed, and say why. Real vegetation fires are hot and mostly daytime: 4-micron brightness 336 K, a 4-to-11-micron split of 33 K, radiative power 9 MW, and only 28% detected at night. The false sources are almost entirely nocturnal — volcano, industrial and offshore all sit at 95% night — because a persistent warm spot stands out against a cool night background and is easier to flag once the sun is not heating everything around it. That nocturnal skew is a real and usable clue. But it is not enough on its own, because the individual classes overlap fire in exactly the ways that matter: industrial flares are *weak* anomalies (4-to-11 split of only 17 K, power 2 MW) that look like small fires, and volcanic detections are *strong* ones (split 29 K, near fire's 33) that look like big ones. The thermal signature blurs into the fire distribution at both ends.
Trained on radiometry alone, a gradient-boosted classifier posts a deceptively healthy AUC of 0.926 — and then falls apart at the only threshold that matters operationally. If we insist on sacrificing no more than 1% of real vegetation fires (a wildfire filter that throws away more than that is unusable), radiometry alone flags just 30% of the false sources: 36% of volcano, 38% of offshore, and a near-useless 5.7% of industrial flares. The high AUC was hiding the operational truth — that to catch flares by their heat signature you would have to start discarding real fires, because a small gas flare and a small grass fire are, in one infrared frame, nearly the same object. Radiometry can rank, but it cannot *separate* at the precision a fire service needs.
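The operating-point metric differs from AUC in one way that matters: the threshold is fixed by the fire class alone. A minimal sketch, assuming `fire_scores` and `false_scores` are a classifier's scores (higher means more furnace-like) for real vegetation fires and for false sources; the function name and the conservative tie handling are illustrative choices.

```python
def catch_at_fire_loss(fire_scores, false_scores, max_fire_loss=0.01):
    """Share of false sources flagged when at most max_fire_loss of real fires
    may be flagged. Returns (catch_rate, threshold)."""
    ranked = sorted(fire_scores, reverse=True)
    k = int(max_fire_loss * len(ranked))  # number of real fires we may sacrifice
    threshold = ranked[k] if k < len(ranked) else float("-inf")
    # flag strictly above the (k+1)-th highest fire score, so ties with it
    # are never flagged and at most k real fires are lost
    flagged = sum(s > threshold for s in false_scores)
    return flagged / len(false_scores), threshold
```

Because the threshold sits in the top 1% tail of fire scores, a classifier can rank well overall (high AUC) and still flag few false sources if their scores mostly fall below that tail.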
Adding a single persistence feature, namely how many times that ~1 km cell was detected in the *training* years (never the test year, so a test-year detection can never contribute to its own feature), changes the picture completely. The AUC rises to 0.984, and at the same strict 1%-fire-loss threshold the false-source catch jumps from 30% to 91%: volcano 98%, industrial static 98%, offshore 76%. The reason is exactly the one the radiometry could not exploit: Etna does not move, and neither does the Priolo flare stack. A source that lit up in other summers and lights up again is almost certainly not a wildfire, regardless of how fire-like its single-frame temperature looks. Where radiometry was blind (it caught only 6% of the weak industrial flares), persistence is nearly perfect, catching 98%, because a refinery is the most spatially stable hot object in the scene. Offshore is the honest residual at 76%: platforms and ships are more scattered and less perfectly recurrent than a volcano or a refinery, so the location prior is weaker there.
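The leakage-safe persistence feature can be sketched as a count over the other years only. The 0.01° floor grid (about 1 km north-south, somewhat less east-west at Sicily's latitude) and the (year, lat, lon) input shape are assumptions for illustration, not the study's exact binning.

```python
import math
from collections import Counter

CELL_DEG = 0.01  # hypothetical grid size, roughly 1 km


def cell_of(lat, lon, size=CELL_DEG):
    """Floor-binned grid cell for a detection."""
    return (math.floor(lat / size), math.floor(lon / size))


def loyo_persistence(detections, test_year):
    """detections: list of (year, lat, lon).

    For each test-year detection, return how many detections its cell had in
    the OTHER years; test-year detections never count toward the feature.
    """
    train = Counter(cell_of(lat, lon)
                    for year, lat, lon in detections if year != test_year)
    return [train[cell_of(lat, lon)]
            for year, lat, lon in detections if year == test_year]
```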
The lesson is concrete and it confirms the design the false-alarm filter already leans on. For persistent hot sources the discriminating variable is where, not how hot — and a wildfire system should carry an explicit persistence/location mask rather than hope a radiometric classifier will tell fire from furnace, because at any usable fire-loss rate it will not. The one limit worth stating plainly is built into what persistence *is*: the prior can only flag a source it has already watched recur, so a brand-new flare or a first-season eruption vent needs a season of detections before the mask learns it, and until then it falls back to the weak radiometric signal. That is the correct failure mode to design around — seed the mask from known industrial and volcanic locations up front, let it accrete the rest — and it is a far better place to stand than a single-frame classifier that quietly waves 94% of gas flares through as fire.
Claim status: SUPPORTED.
Current literature & PHOENIX alignment. Verdict: AGREEMENT. NASA documentation and recent analysis corroborate our 'do not filter on confidence' doctrine.
What the external literature reinforces: NASA's VIIRS active-fire documentation [9] defines confidence as an intermediate-quantity quality flag (low/nominal/high) and attributes many low-confidence daytime pixels to sun-glint and weaker relative MIR anomalies, not to false fire; Dhage 2025 [11] documents systematic day/night structure in low-confidence labels.
Where PHOENIX is stricter or diverges: Confidence is one feature, never a drop rule; persistent-source history, cross-sensor agreement and multi-day recurrence are the directly relevant fire-vs-furnace signals.
What this post does not claim: FIRMS confidence is not a calibrated wildfire probability; 'low' is not 'false', and persistent false sources can sit in 'nominal'.
Next research test: Validate the 3-signal tiering against final PHOENIX grades once grade semantics are confirmed; measure new-source learning lag for the persistence mask; stratify static false positives (volcanic / industrial flare / offshore / urban / sensor artifact).
Public data sources: NASA FIRMS active-fire archive + area API (VIIRS/MODIS/SLSTR truth). Every figure in this entry is reproducible from these public sources with no access to PHOENIX infrastructure; the method is stated above and in any linked code.
Statistical reporting: proportions are quoted with Wilson 95% confidence intervals and ranking metrics (AUC) with bootstrap 95% CIs; read the interval and the sample size n, not the point estimate. A shuffled-label placebo (≈0.5) accompanies learned separability claims.
entry 0072
Two satellites agreeing is a near-perfect fire confirmation — for the third of fires both happen to see
Date: 2026-06-25 Status: defensible (complete FIRMS archive, 35,008 VIIRS + 6,773 MODIS detections 2019–2024; cross-sensor agreement as a confidence signal, with a verified check on the volcano result) — a positive safety-net result with a hard coverage limit and the usual rule attached
The analogy. If two independent witnesses describe the same event, you believe it. Two different fire satellites flagging the same spot at the same time is almost always a real fire — near-perfect confirmation. The catch is coverage: both satellites only happen to catch the same fire about a third of the time. So agreement confirms strongly, but its absence must never reject — a positive result with an honest limit.
The last entry showed that telling a real fire from a furnace needs to know *where the furnace always is* — a location-persistence prior. That prior is powerful but it has a cost: it needs a catalog, and a catalog can only flag sources it has already watched recur. So it is worth asking whether there is an independent confirmation signal that needs no catalog at all — and the obvious candidate is a second satellite. Sicily is watched by two different thermal instruments on different platforms: the 375-metre VIIRS imager on NOAA-20 and the 1-kilometre MODIS imagers on Terra and Aqua. If both independently flag a hot spot at the same place on the same day, that agreement ought to mean something. The question this entry asks is whether it means what you would naively hope — and we went in expecting the *opposite* of what we found.
The worry was this: persistent sources are always hot, so two sensors should agree on them *more* often than on a transient fire that one sensor happens to catch between the other's overpasses, which would make naive "both agree → high confidence" fusion quietly *upweight* the volcano and the refinery. The data say the reverse, and emphatically. Of the 35,008 VIIRS detections, those that have a same-day MODIS detection within about a kilometre are 33.9% of vegetation fires but 0.0% of volcano, 1.8% of static-industrial, and 2.7% of offshore detections. Among everything VIIRS flags, 12.2% are non-fire sources; among the subset that a second satellite confirms, that falls to 0.5%, a more than twenty-fold reduction in contamination (purity rises from about 88% to 99.5%). Two satellites agreeing is, in this archive, a 99.5%-pure fire signal, and it requires no location database whatsoever.
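The same-day, roughly 1 km match can be sketched with a haversine distance and a per-day lookup. The 1.0 km radius, the date keys and the brute-force search are assumptions for illustration; an archive of this size would normally use a spatial index. The second helper computes the kind of before/after contamination comparison quoted in the paragraph above.

```python
import math

EARTH_R_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_R_KM * math.asin(math.sqrt(a))


def corroborated(viirs, modis, radius_km=1.0):
    """viirs, modis: lists of (date, lat, lon). One bool per VIIRS detection:
    is there a same-day MODIS detection within radius_km?"""
    by_day = {}
    for d, lat, lon in modis:
        by_day.setdefault(d, []).append((lat, lon))
    return [any(haversine_km(lat, lon, mlat, mlon) <= radius_km
                for mlat, mlon in by_day.get(d, []))
            for d, lat, lon in viirs]


def non_fire_share(is_non_fire, mask=None):
    """Share of non-fire sources among all detections, or only where mask is True."""
    if mask is None:
        mask = [True] * len(is_non_fire)
    sel = [nf for nf, m in zip(is_non_fire, mask) if m]
    return sum(sel) / len(sel)
```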
The volcano number — exactly zero out of 1,881 — was suspicious enough to verify before trusting, because a clean zero is as often a bug as a fact. It is a fact. When VIIRS flags Etna's volcanic heat, the nearest same-day MODIS detection anywhere in Sicily sits a median of 49 kilometres away, and not one is within two kilometres. MODIS detects the Etna area only 87 times in six summers against VIIRS's thousands, because Etna's persistent anomaly is a small, largely nocturnal hot spot that a 1-kilometre nighttime pixel simply does not register. That is the mechanism behind the whole result: the persistent false sources are *weak* and *nocturnal* — 95% of them are night detections, as the previous entry found — and a coarser instrument on a different orbit misses them. A real vegetation fire is hot, often daytime, and big enough that when the two overpasses overlap both sensors see it. Agreement filters furnaces not because it knows they are furnaces, but because furnaces are too faint for two independent eyes to catch at once.
The limit is just as important as the result, and it is a limit of *coverage*, not of trust. Only about a third of real fires are corroborated, and the reason is orbital, not physical: VIIRS and MODIS cross Sicily at different times, so a fire that is burning during one pass and not the other, or that flares between overpasses, is seen once and confirmed never. The two-thirds of vegetation fires with no second-sensor match are overwhelmingly real fires the other satellite's orbit missed, not false alarms — which means cross-sensor agreement can only ever be a signal that promotes confidence *upward*, never one that rejects. To treat a single-sensor detection as suspect because it lacks a partner would be to throw away most of the real fires on the island, the exact failure this project refuses. Agreement is a confirm-up tier; silence from the second sensor means nothing.
Put beside the previous entry, the two results compose into a clean tiered-confidence design that uses each signal for what it is good at. A detection that two satellites confirm is a near-certain fire, instantly, with no catalog — the highest-confidence tier, covering about a third of fires. A single-sensor detection that does *not* sit on a known persistent-source location is a probable fire to be acted on. A single-sensor detection that *does* sit on a recurring hot-spot is the one to treat as a likely furnace. Persistence catches the false sources by remembering where they are; cross-sensor agreement confirms the real ones by catching them twice at once; and neither is asked to do the other's job. The catalog-free confirmation is the genuinely new piece — a way to mark a third of Sicily's fires as high-confidence in real time from two public feeds, before any location prior has had a chance to learn anything.
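The tiering reduces to a short decision function. In this sketch the inputs are assumed booleans, and a detection that two sensors confirm is treated as confirmed even on a catalogued hot-spot; that follows the order of the text but is our reading rather than a stated rule. No branch rejects a detection for lacking a second sensor.

```python
def confidence_tier(second_sensor_match: bool, on_persistent_source: bool) -> str:
    """Confirm-up tiering: a missing second sensor never rejects a detection."""
    if second_sensor_match:
        return "confirmed-fire"   # two independent satellites agree
    if on_persistent_source:
        return "likely-furnace"   # single sensor on a recurring hot-spot
    return "probable-fire"        # single sensor, off-catalog: act on it
```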
Claim status: SUPPORTED.
Current literature & PHOENIX alignment. Verdict: AGREEMENT. Multi-sensor fusion and event-based confirmation are exactly the directions the current literature endorses.
What the external literature reinforces: High-temporal multi-source fusion [2] and FCI event tracking [5][7] support cross-instrument confirmation; PHOENIX adds the operational 'confirm-up only' rule.
Where PHOENIX is stricter or diverges: Agreement between two sensors confirms strongly, but coverage is partial (only a fraction of fires are co-observed), so absence of a second sensor must never reject a candidate.
What this post does not claim: Multi-sensor precision figures (~99%) apply only to the subset of fires multiple sensors happen to see; they are not a system-wide recall claim.
Next research test: Wire higher-cadence FCI/SLSTR/MODIS into the voter; quantify the co-observation coverage fraction per fire class; add provenance/source-independence audits to the corroboration logic.
Public data sources: NASA FIRMS active-fire archive + area API (VIIRS/MODIS/SLSTR truth) · EUMETSAT Data Store MSG-SEVIRI L1.5 · EUMETSAT Data Store MTG-FCI L1c / FCI-AF L2 · Copernicus Data Space Sentinel-3 SLSTR. Every figure in this entry is reproducible from these public sources with no access to PHOENIX infrastructure; the method is stated above and in any linked code.
Statistical reporting: proportions are quoted with Wilson 95% confidence intervals and ranking metrics (AUC) with bootstrap 95% CIs; read the interval and the sample size n, not the point estimate. A shuffled-label placebo (≈0.5) accompanies learned separability claims.
entry 0073
A fire-danger map that peaks on a volcano: furnace contamination in the climatology, and the small ceiling correction it was hiding
Date: 2026-06-25 Status: correction + defensible (complete archive; the pre-ignition climatology and cross-year model rebuilt with the persistent-source mask applied to the labels) — the false-positive doctrine turned back on our own training data, with a measurable consequence
The analogy. Our ‘where fire starts’ map had a tell-tale flaw: its hottest spot was Mount Etna — a volcano, not a wildfire zone. The pre-ignition and false-positive threads were built from the same raw data but never compared notes, letting furnace heat leak into the danger map. Masking the persistent sources fixes the map and slightly corrects the ceiling. Data hygiene, caught by turning our own false-positive doctrine on our training labels.
The pre-ignition work and the false-positive work were built on the same raw material — the FIRMS detection archive — but they never talked to each other, and that gap turns out to hide a mistake. The false-positive thread established that roughly an eighth of the detections are not wildfires at all but persistent hot sources: Etna and Stromboli, the Priolo–Augusta and Gela and Milazzo industrial sites. The pre-ignition thread built its single strongest feature, the per-cell climatology, by counting *all* detections in each cell. Putting those two facts together for the first time produces an uncomfortable question: if the climatology counts a volcano's daily lava glow as "fire," what does the fire-danger map actually rank highest? The answer is exactly what you would fear. The top two cells in the raw climatology are Mount Etna — 1,979 and 1,252 detections, 98% and 97% of them volcanic — and the third is a contaminated coastal-industrial cell. The model's most important feature, asked where Sicily is most fire-prone, points first and most confidently at an active volcano.
Fixing it is a one-line application of the doctrine the false-positive work already validated: before building the climatology, drop the detections that fall in the persistent-furnace cells — the roughly one-kilometre locations that light up on fifteen or more distinct days, the always-on signature that the stationarity work showed is cleanly distinct from the few-days-a-year fire backbone. Twenty-six such cells exist; they hold 3,824 detections, 9% of the archive. With them removed, the climatology's top five cells become entirely genuine wildfire ground — the western-Sicily and Palermo-hinterland cells that carry zero false sources — and Etna drops out of the danger map entirely. The map now ranks fire country by fire, not by lava. For any downstream use of this layer — a danger overlay, a pre-positioning prior, a public-facing risk map — that correction is the whole point: a wildfire product should not tell a fire service that the single most dangerous place on the island is the one place that burns for reasons no fire service can do anything about.
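The masking rule reduces to counting distinct detection days per grid cell before anything else is counted. The sketch below makes several assumptions. Detections are (lat, lon, date) tuples. A 0.01° grid stands in for the roughly one-kilometre cells. The function names and sample coordinates are hypothetical. Only the fifteen-distinct-day threshold comes from the text.

```python
from collections import defaultdict
from datetime import date, timedelta

CELL_DEG = 0.01         # hypothetical grid: 0.01 deg is ~1.1 km north-south over Sicily
MIN_DISTINCT_DAYS = 15  # "lights up on fifteen or more distinct days"

def cell_of(lat, lon, cell_deg=CELL_DEG):
    """Snap a detection to an integer grid-cell index."""
    return (int(lat // cell_deg), int(lon // cell_deg))

def persistent_cells(detections, min_days=MIN_DISTINCT_DAYS):
    """Cells with detections on at least `min_days` distinct calendar days."""
    days_by_cell = defaultdict(set)
    for lat, lon, day in detections:
        days_by_cell[cell_of(lat, lon)].add(day)
    return {c for c, days in days_by_cell.items() if len(days) >= min_days}

def masked_climatology(detections, min_days=MIN_DISTINCT_DAYS):
    """Per-cell detection counts with persistent-source (furnace) cells dropped first."""
    furnace = persistent_cells(detections, min_days)
    counts = defaultdict(int)
    for lat, lon, _day in detections:
        cell = cell_of(lat, lon)
        if cell not in furnace:
            counts[cell] += 1
    return dict(counts), furnace

# Synthetic example: a vent glowing on 20 days, a wildfire seen (twice daily) on 3 days.
start = date(2025, 7, 1)
vent = [(37.7555, 14.9955, start + timedelta(days=i)) for i in range(20)]
fire = [(37.9055, 13.2055, start + timedelta(days=i)) for i in range(3)] * 2
counts, furnace = masked_climatology(vent + fire)
```

Without the mask the vent cell would top this toy climatology with 20 counts. With it, only the wildfire cell remains.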
The contamination was also quietly inflating the headline number, and the honest accounting matters. Etna's cells do not just rank high in the climatology; they appear in the training data as cells that are detected as "burning" nearly every single day, which makes them *trivially* predictable positives — the model scores them correctly with no skill required, and that free accuracy props up the cross-year AUC. Rebuilding the model on the furnace-cleaned labels moves the score from 0.803 to 0.793, a drop of 0.010, as 1,386 of those easy always-positive cell-days leave the positive class. It is a small correction, but it is real and it runs in the honest direction: the genuine difficulty of predicting *wildfire* over Sicily is very slightly higher than the contaminated number implied, because some of that 0.80 was the model being rewarded for "predicting" a volcano that erupts on schedule. Stacked on the earlier, larger correction — the sparse-sampling fix that brought the ceiling down from an inflated 0.86 to 0.80 — the fully honest pre-ignition ceiling for next-day *wildfire* ignition settles at about 0.79.
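A toy example with hypothetical scores shows why always-burning cells flatter the AUC. AUC is the share of (positive, negative) pairs the model orders correctly. Positives that are trivially scored above every negative can only add winning pairs.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Share of (positive, negative) pairs ordered correctly; ties count half."""
    wins = sum((p > q) + 0.5 * (p == q) for p in pos_scores for q in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical toy scores, not PHOENIX data.
genuine_pos = [0.6, 0.4]     # real wildfire cell-days: one is hard to rank
negatives = [0.5, 0.3]       # non-fire cell-days
furnace_pos = [0.99, 0.99]   # always-burning volcano cell-days: trivially top-ranked

clean = pairwise_auc(genuine_pos, negatives)                       # 3 of 4 pairs won
contaminated = pairwise_auc(genuine_pos + furnace_pos, negatives)  # 7 of 8 pairs won
```

The toy gap (0.75 vs 0.875) is exaggerated on purpose. In the entry, removing 1,386 such cell-days moved the real score from 0.803 to 0.793.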
The wider lesson is the one worth keeping. A false-positive filter is usually thought of as a thing you apply to the live feed, at the output end, to keep furnaces out of alerts. But the same furnaces are sitting in the *training* data, in the climatology, in every per-cell statistic a model learns from, and there they do their damage silently — not as a visible bad alert but as a mis-ranked map and a flattering metric. The persistent-source mask earns its keep twice over: once at the output, where it removes 89% of the false-source contamination from the feed, and once at the input, where it stops a volcano from teaching the model what a wildfire looks like. Cleaning the data the model learns from is the same job as cleaning the alerts it emits, and until this pass the pre-ignition side of the system had only been doing half of it.
Claim status: CORRECTED / SUPERSEDED.
Current literature & PHOENIX alignment. Verdict: AGREEMENT on covariates and method, plus an explicit self-correction the literature would demand. Regional ML occurrence models [12][29] use the same land-cover / weather / human-geography covariates and temporally held-out evaluation we use.
What the external literature reinforces: Mediterranean/North-African occurrence ML [12][29] validates climatology + weather + human geography as separable layers with temporal holdout and SHAP-style attribution.
Where PHOENIX is stricter or diverges: Earlier PHOENIX susceptibility AUCs were inflated by non-burnable sea, bare-rock and urban cells. On comparable burnable-land cells the useful signal is real but modest (AUC ~0.80 vs climatology ~0.76), validated leakage-free by a shuffled-feature placebo (~0.50). Fuel-moisture reviews [13][14] indicate the next gain needs real spatial fuel/fuel-moisture, not more model complexity.
What this post does not claim: The susceptibility layer is a useful static prior for triage and sensor/node placement, NOT a breakthrough location predictor; report the all-cell metric as diagnostic and the burnable-land metric for claims.
Next research test: Add live/dead fuel-moisture (LFMC), fuel load and crop/stubble seasonality; report calibration (Brier, reliability, decile lift); keep 'where fire can happen' separate from 'when it happens'.
Public data sources: NASA FIRMS active-fire archive + area API (VIIRS/MODIS/SLSTR truth) · Copernicus CDS ERA5 / ERA5-Land · ESA WorldCover 10 m land cover · JRC GHSL built-up + population · Hansen Global Forest Change tree cover · OpenStreetMap / GRIP roads. Every figure in this entry is reproducible from these public sources with no access to PHOENIX infrastructure; the method is stated above and in any linked code.
Statistical reporting: proportions are quoted with Wilson 95% confidence intervals and ranking metrics (AUC) with bootstrap 95% CIs; read the interval and the sample size n, not the point estimate. A shuffled-label placebo (≈0.5) accompanies learned separability claims.
entry 0080
The truth arrives nine hours late
Date: 2026-06-26 Status: defensible (end-to-end detection→ingest latency measured on ~110k real feed records across 16 sources; comparison drawn between timezone-unambiguous UTC-stamped external feeds)
The analogy. A fire alarm's worth is decided less by how accurate it is than by how late it rings. We measured, for 110,000 real records across 16 feeds, the gap between when a fire was detected and when PHOENIX actually got the data. The answer reorganises the whole stack: the most trusted ground-truth (polar FIRMS) arrives about nine hours late. The grade is pure latency — two timestamps, no modelling.
PHOENIX exists to raise the alarm early. That goal quietly decides which sensors can do which job, and the deciding factor is not how *accurate* a sensor is but how *late* its data arrives. So we measured it directly: for every fire record from every feed, the gap between the fire's physical detection time and the moment PHOENIX actually receives the data. No modelling, no labels — just two timestamps per record across roughly 110,000 of them. The answer reorganizes how you should think about the whole sensor stack, because the feeds everyone treats as "ground truth" turn out to be the slowest things in the building.
The cleanest comparison is between two *external* feeds whose timestamps are both unambiguous UTC, so no clock convention can confound them. EUMETSAT's geostationary active-fire product, mtg_af_l2, reaches PHOENIX a median of 23 minutes after detection (10th–90th percentile 20–29 min). It watches Sicily continuously from about 36,000 km and ships a detection within the half-hour. NASA's FIRMS VIIRS, the polar-orbiting product that the wildfire community (and much of our own validation) treats as the reference answer, arrives a median of 9.1 hours after detection for NOAA-20, 8.8 h for SNPP and 8.6 h for NOAA-21. The two products look at the same island and both find real fires; one is useful for a first alert and the other simply is not. The difference is more than an order of magnitude in time: 9.1 h is about 546 minutes, roughly 24 times the 23-minute median. That gap is not a flaw in VIIRS. It is the price of a 375-metre polar instrument that must overpass, downlink, and run NASA's near-real-time processing before anyone sees it.
Sort every feed this way and a clear three-tier structure falls out, defined purely by latency. First-alert tier (minutes): the geostationary products — mtg_af_l2 at 23 min, and PHOENIX's own in-house detectors that run on the live MTG feed and surface within their processing cycle (their raw timestamps are locally-stamped so we won't quote a false-precision number, but they sit firmly in this fast class, which is the entire reason we run them). Ground truth from the Vigili del Fuoco also lands here — it's logged the moment it's phoned in. Confirmation tier (hours): FIRMS MODIS at 6.6 h, Sentinel-3 SLSTR at 7.1–7.4 h, FIRMS VIIRS at ~9 h, the TROPOMI atmospheric products at 16 h. Forensic tier (days): Sentinel-1 SAR change at a median of 12.5 days, Landsat-8 at 15.8 days, MAIAC smoke at 17.7 days. These last three are the burn-scar and change-detection sensors we lean on to *confirm* a fire happened — and they cannot, by their orbital nature, tell you anything until the fire is long out.
This is the operational backbone behind a result we already published — that the high-resolution sensors are a *confirmation* net, not a detection net (the multi-sensor safety-net thread). Now we can say *why* in hard numbers: it is not mainly that they miss small fires, it is that they arrive hours to weeks late. An architecture that waited for FIRMS to declare a fire before acting would be, on average, nine hours behind the event — long enough for a Sicilian summer fire to run from a roadside ignition to hectares of burned ground. PHOENIX's design answer is the only one the latencies allow: detect on the geostationary feed in minutes, then let the slower, sharper, higher-resolution instruments roll in over the following hours and days to confirm, grade and learn from what was already flagged. The fast sensor sounds the alarm; the slow sensors write the history.
Two honesties bound the claim. These are *end-to-end* latencies as PHOENIX experiences them — they fold the producer's processing delay together with our own polling cadence, so the ~9 h for VIIRS is "time until PHOENIX can act," not VIIRS's intrinsic spec (NASA's NRT target is tighter; our pull schedule adds to it, and tightening that schedule is a concrete lever worth pulling). And the in-house detectors are deliberately excluded from the precise ranking because their timestamps are locally-stamped rather than UTC; the timezone-safe comparison that carries the argument is geostationary-external (23 min) versus polar-external (9 h), and that gap is real, large, and not an artifact.
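The measurement is two timestamps per record, which makes it easy to sketch. The version below assumes each record is a (detected_at, received_at) pair. It refuses naive timestamps so that locally stamped feeds cannot slip into the ranking, and it reports the median with the 10th and 90th percentiles. The tier cut-offs (under an hour, under a day) and the sample records are hypothetical illustrations of the minutes/hours/days grouping, not PHOENIX measurements.

```python
from datetime import datetime, timedelta, timezone
from statistics import median, quantiles

def latency_minutes(detected_at, received_at):
    """Detection-to-ingest latency in minutes; rejects naive (locally stamped) times."""
    if detected_at.tzinfo is None or received_at.tzinfo is None:
        raise ValueError("both timestamps must be timezone-aware (UTC)")
    return (received_at - detected_at).total_seconds() / 60.0

def summarize(pairs):
    """Median, 10th and 90th percentile latency (minutes) for one feed."""
    lat = [latency_minutes(d, r) for d, r in pairs]
    cuts = quantiles(lat, n=10)   # nine cut points: cuts[0] ~ p10, cuts[-1] ~ p90
    return median(lat), cuts[0], cuts[-1]

def tier(median_min):
    """Hypothetical cut-offs for the minutes / hours / days grouping."""
    if median_min < 60:
        return "first-alert"
    if median_min < 24 * 60:
        return "confirmation"
    return "forensic"

# Synthetic records, not PHOENIX measurements.
t0 = datetime(2025, 7, 1, 12, 0, tzinfo=timezone.utc)
geo = [(t0, t0 + timedelta(minutes=m)) for m in (20, 22, 23, 24, 29)]
polar = [(t0, t0 + timedelta(hours=h)) for h in (8.5, 9.0, 9.1, 9.3, 9.8)]
geo_med, _, _ = summarize(geo)
polar_med, _, _ = summarize(polar)
```

Rejecting naive datetimes outright, rather than guessing a zone, mirrors the decision above to keep locally stamped detectors out of the precise ranking.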
Claim status: SUPPORTED.
Current literature & PHOENIX alignment. Verdict: strong AGREEMENT; the newest 2025–2026 FCI literature directly corroborates PHOENIX's FCI-first strategy.
What the external literature reinforces: Xu et al. [5] report MTG-FCI detects fires earlier and finds many more active-fire pixels than SEVIRI, with improved small-fire FRP; Paugam et al. [6][7] derive fire-arrival maps, rate-of-spread and persistent event IDs from FCI — the event-tracking direction PHOENIX has queued; EUMETSAT [8] confirms the operational mandate.
Where PHOENIX is stricter or diverges: Literature reports the instrument's potential; PHOENIX additionally requires demonstrable ingest freshness (latest product timestamp, candidate-creation proof) before any operational claim — a system-health condition the papers do not address.
What this post does not claim: FCI should not be framed merely as a SEVIRI replacement, nor as operational while ingest is stale; it is an event tracker, early-detection source, FRP/ROS source and fusion input.
Next research test: Build the FCI event tracker (space-time clustering, persistent IDs, FRP time-series, growth direction); add a time-to-first-candidate metric per real event; compare against FIRMS/VIIRS/MODIS/SLSTR and EFFIS/Copernicus EMS perimeters.
Public data sources: EUMETSAT Data Store MTG-FCI L1c / FCI-AF L2 · EUMETSAT Data Store MSG-SEVIRI L1.5 · NASA FIRMS active-fire archive + area API (VIIRS/MODIS/SLSTR truth). Every figure in this entry is reproducible from these public sources with no access to PHOENIX infrastructure; the method is stated above and in any linked code.
Statistical reporting: proportions are quoted with Wilson 95% confidence intervals and ranking metrics (AUC) with bootstrap 95% CIs; read the interval and the sample size n, not the point estimate. A shuffled-label placebo (≈0.5) accompanies learned separability claims.
entry 0084
Geostationary doesn't see the fire sooner — it tells us sooner
Date: 2026-06-26 Status: defensible (symmetric episode-matched detection-time comparison on real PHOENIX vs FIRMS detections, robust across detector subsets; a biased first cut is shown and discarded)
The analogy. It sounds obvious that a satellite watching Sicily non-stop must spot each fire earlier than one passing twice a day — but obvious isn't the same as true. On matched fires we tested it carefully (and threw out a biased first cut). Geostationary doesn't see fires earlier — it tells us sooner, because its data is delivered in minutes, not hours. The grade is detection-time difference on the same fires; the advantage is delivery, not vision.
The previous entry (0084) showed that PHOENIX's value is speed: its geostationary detections are *delivered* in minutes while polar FIRMS arrives some nine hours later. That invites an intuitive next claim — that geostationary, watching Sicily continuously, must also *see* each fire earlier than a polar satellite that only passes twice a day. It is the kind of claim that sounds obviously true and is worth testing precisely because of that. We tested it on matched fires, and it is false: on the fires both systems detect, PHOENIX does not see them sooner. The early-warning advantage is real, but it lives entirely in the delivery pipeline, not in the moment of detection.
First, the trap, because we nearly fell into it. A naive matching — for each FIRMS fire, take the *earliest* PHOENIX detection within a day before it — reports that PHOENIX leads 79% of the time by a median of 8.4 hours. That number is an artifact. A fire that burns for hours generates a stream of PHOENIX detections, and reaching back up to 24 hours to grab the earliest one, while only looking 6 hours forward, manufactures a positive lead out of an asymmetric window. The honest test compares each sensor's *first* detection of the *same* fire episode, symmetrically: cluster all detections — PHOENIX and FIRMS together — into space-time episodes, and for each episode that both sensors caught, subtract PHOENIX's earliest detection time from FIRMS's earliest. No reach-back, no asymmetry.
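The symmetric test can be sketched as follows, assuming each detection carries a sensor tag, a position and a time in hours. Detections from both sensors are pooled and linked into space-time episodes with a union-find. Within each episode that both sensors caught, the lead is FIRMS's first detection minus PHOENIX's first detection, so neither sensor gets a privileged search window. The linking thresholds (5 km, 12 h), the equirectangular distance and the toy detections are assumptions, not the parameters used in this entry.

```python
import math
from dataclasses import dataclass
from statistics import median

@dataclass
class Det:
    sensor: str     # "PHOENIX" or "FIRMS"
    lat: float
    lon: float
    t_hours: float  # detection time in hours since an arbitrary UTC epoch

def km_between(a, b):
    """Equirectangular distance in km; adequate at the few-km scale of one episode."""
    x = math.radians(b.lon - a.lon) * math.cos(math.radians((a.lat + b.lat) / 2))
    y = math.radians(b.lat - a.lat)
    return 6371.0 * math.hypot(x, y)

def episodes(dets, max_km=5.0, max_hours=12.0):
    """Pool both sensors and link detections close in space AND time (union-find)."""
    parent = list(range(len(dets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(dets)):
        for j in range(i + 1, len(dets)):
            if (abs(dets[i].t_hours - dets[j].t_hours) <= max_hours
                    and km_between(dets[i], dets[j]) <= max_km):
                parent[find(i)] = find(j)
    groups = {}
    for i, d in enumerate(dets):
        groups.setdefault(find(i), []).append(d)
    return list(groups.values())

def first_times(episode):
    """Earliest detection time per sensor within one episode."""
    first = {}
    for d in episode:
        first[d.sensor] = min(first.get(d.sensor, math.inf), d.t_hours)
    return first

def leads(dets, **kw):
    """Per shared episode: FIRMS first minus PHOENIX first (positive = PHOENIX earlier)."""
    out = []
    for ep in episodes(dets, **kw):
        first = first_times(ep)
        if "PHOENIX" in first and "FIRMS" in first:
            out.append(first["FIRMS"] - first["PHOENIX"])
    return out

def firms_capture(dets, **kw):
    """Share of FIRMS episodes that PHOENIX also detected."""
    sensor_sets = [set(first_times(ep)) for ep in episodes(dets, **kw)]
    firms_eps = [s for s in sensor_sets if "FIRMS" in s]
    return sum("PHOENIX" in s for s in firms_eps) / len(firms_eps)

# Toy detections (hypothetical): episode A PHOENIX-first, B FIRMS-first, C FIRMS-only.
dets = [
    Det("PHOENIX", 37.5, 14.0, 10.0), Det("PHOENIX", 37.5, 14.0, 11.0),
    Det("PHOENIX", 37.5, 14.0, 12.0), Det("FIRMS", 37.501, 14.001, 11.5),
    Det("FIRMS", 38.0, 13.5, 30.0), Det("PHOENIX", 38.0, 13.5, 31.0),
    Det("FIRMS", 37.2, 15.0, 50.0),
]
lead_list = leads(dets)
share_phoenix_first = sum(x > 0 for x in lead_list) / len(lead_list)
median_lead = median(lead_list)
```

`firms_capture` is the same matching read the other way. It is the quantity behind the "about 16–20% of FIRMS episodes" figure in this entry.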
Done that way, the lead collapses to nothing. Across the shared fire episodes PHOENIX's detection came first just 49% of the time (a coin flip), with a median lead of zero hours and a mean within an hour of zero. It is robust. Dropping the two anomalous detector-flood days gives 44% with a slightly *negative* median. Splitting by detector gives the same picture for every one (subpixel 40%, FCI 41%, wind-diff 49%): all at or below an even split, none showing a real head start. If anything, FIRMS detects marginally first. The reason is a genuine physical trade-off. PHOENIX's geostationary detectors watch without blinking but through a coarse three-kilometre pixel. Each polar satellite behind FIRMS passes overhead only about twice a day, but it sees through a 375-metre pixel that catches a fire while it is still small. The continuous-watch advantage and the sharp-pixel advantage very nearly cancel, and the net detection-time difference is zero.
That same matching tells us something we already expected from the sensitivity floor: PHOENIX's own detectors independently catch only about 16–20% of the FIRMS fire episodes. The other ~80% sit below the geostationary pixel's reach — the small fires that a 375-metre instrument resolves and a three-kilometre one cannot, exactly the floor we have characterized before. So the matched-fire population here is the *larger* fires, the ones PHOENIX can see at all — and even on those, it does not see them earlier.
Put 0084 and this entry together and the picture is sharp and a little counterintuitive. The fire becomes visible to PHOENIX and to FIRMS at about the same moment. What differs by nine hours is not when each *sees* it but when each can *act* on it: PHOENIX runs its detector in-house on the live geostationary feed and has an answer in minutes, while the FIRMS detection has to overpass, downlink, process and be polled before it reaches anyone. The early-warning win is a *delivery* win, not a *detection* win — and that matters operationally, because it says the lever for catching fires earlier is not a sharper or faster-staring sensor (the detection moment is already as early as the physics allows) but a tighter delivery pipeline: our own low-latency detectors, and a faster pull of the external feeds we depend on. Two caveats bound it: this covers only the fires PHOENIX detects at all (the ~20% above its floor), and it is one early-summer window over Sicily — but within those bounds the result is clean, and it corrects a claim we would have been tempted to make.
Claim status: SUPPORTED.
Current literature & PHOENIX alignment. Verdict: strong AGREEMENT; the newest 2025–2026 FCI literature directly corroborates PHOENIX's FCI-first strategy.
What the external literature reinforces: Xu et al. [5] report MTG-FCI detects fires earlier and finds many more active-fire pixels than SEVIRI, with improved small-fire FRP; Paugam et al. [6][7] derive fire-arrival maps, rate-of-spread and persistent event IDs from FCI — the event-tracking direction PHOENIX has queued; EUMETSAT [8] confirms the operational mandate.
Where PHOENIX is stricter or diverges: Literature reports the instrument's potential; PHOENIX additionally requires demonstrable ingest freshness (latest product timestamp, candidate-creation proof) before any operational claim — a system-health condition the papers do not address.
What this post does not claim: FCI should not be framed merely as a SEVIRI replacement, nor as operational while ingest is stale; it is an event tracker, early-detection source, FRP/ROS source and fusion input.
Next research test: Build the FCI event tracker (space-time clustering, persistent IDs, FRP time-series, growth direction); add a time-to-first-candidate metric per real event; compare against FIRMS/VIIRS/MODIS/SLSTR and EFFIS/Copernicus EMS perimeters.
Public data sources: EUMETSAT Data Store MTG-FCI L1c / FCI-AF L2 · EUMETSAT Data Store MSG-SEVIRI L1.5 · NASA FIRMS active-fire archive + area API (VIIRS/MODIS/SLSTR truth). Every figure in this entry is reproducible from these public sources with no access to PHOENIX infrastructure; the method is stated above and in any linked code.
Statistical reporting: proportions are quoted with Wilson 95% confidence intervals and ranking metrics (AUC) with bootstrap 95% CIs; read the interval and the sample size n, not the point estimate. A shuffled-label placebo (≈0.5) accompanies learned separability claims.
entry 0085
References
Primary sources and Consensus links used in the literature-alignment blocks above. Full annotated mapping in the repository (references/LITERATURE_ADDENDUM_v1.md).
[1] Ghali & Akhloufi (2023). Deep Learning Approaches for Wildland Fires Using Satellite Remote Sensing Data: Detection, Mapping, and Prediction. Fire.
[2] Zhang et al. (2024). 10-minute forest early wildfire detection: fusing multi-type and multi-source information via recursive transformer. Neurocomputing.
[3] Wang et al. (2024). FASDD: an open flame and smoke detection dataset for deep learning. Geo-spatial Information Science.
[4] Dong & Wang (2025). HybriDet: a hybrid CNN+Transformer for wildfire detection. Remote Sensing.
[5] Xu et al. (2026). Major Improvements in Spaceborne Early Fire Detection and Small-Fire FRP Retrieval with MTG-FCI. Science of Remote Sensing.
[6] Paugam et al. (2025). Fire behaviour monitoring using Meteosat Third Generation (FCI-FireDyn algorithm).
[7] Paugam et al. (2026). Leveraging MTG-FCI fire observations for event-based fire behaviour monitoring.
[8] EUMETSAT. Meteosat Third Generation (FCI + Lightning Imager).
[9] NASA Earthdata. VIIRS I-Band 375 m Active Fire Data (confidence field definition).
[10] NASA FIRMS. Fire Information for Resource Management System.
[11] Dhage (2025). Systematic Absence of Low-Confidence Nighttime Fire Detections in the VIIRS Active Fire Product.
[12] Ouazri et al. (2026). Machine-learning wildfire occurrence prediction (ERA5/FWI/FIRMS, northern Morocco).
[13] Han et al. (2026). A Comparative Review of Wildfire Danger Rating Systems: fuel-moisture modeling frameworks. Forests.
[14] McNorton & Di Giuseppe (2024). A global fuel characteristic model and dataset for wildfire prediction. Biogeosciences.
[15] Jakubik et al. (2023). Prithvi: foundation models for generalist geospatial AI (HLS; wildfire-scar fine-tune).
[16] Szwarcman et al. (2024). Prithvi-EO-2.0: a versatile multitemporal EO foundation model. IEEE TGRS.
[17] Shibli, Nascetti & Ban (2026). Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping using Sentinel-2.
[18] Hong et al. (2023). SpectralGPT: spectral remote-sensing foundation model. IEEE TPAMI.
[19] Liu et al. (2023). RemoteCLIP: a vision-language foundation model for remote sensing. IEEE TGRS.
[20] Li et al. (2025). FlexiMo: a flexible remote-sensing foundation model (sensor/resolution heterogeneity). IEEE TGRS.
[21] Torrisi (2025). Integrated ML for volcanic cloud tracking: Etna lava fountains 2020-2022. Annals of Geophysics.
[22] Torrisi et al. (2024). Deep learning + geostationary remote sensing for volcanic-cloud monitoring (ABI/SEVIRI Ash RGB).
[23] Corradino et al. (2023). Detection of subtle thermal anomalies: deep learning on the ASTER global volcano dataset. IEEE TGRS.
[24] Copernicus Data Space. Sentinel-5P / TROPOMI documentation (SO2, aerosol index).
[25] Misra, Moorthi & Dhar (2026). Quantum annealing for remote-sensing data processing: a review of optimization applications.
[26] Dent et al. (2026). Network separation modeling and quantum computing for wildfire fuelbreak strategy. Communications Engineering.
[27] Rainjonneau et al. (2023). Quantum algorithms applied to satellite mission planning for Earth observation. IEEE JSTARS.
[28] Google OR-Tools. CP-SAT solver documentation (mandatory classical baseline).
[29] Purnama et al. (2024). Mediterranean forest-fire vulnerability ML in Turkiye (land cover, roads, population, weather, terrain).
[30] Zhang, Gao & Shi (2025). Lightning-ignited wildfire prediction in Texas (contrast region).
[31] Bountzouklis et al. (2023). Explainable AI for wildfire ignition causes in Southern France.
[32] Kurchaba et al. (2022). TROPOMI ship-plume NO2 segmentation (satellite plume ML is feasible but low-SNR).
Emergency management — multi-hazard situational awareness for INGV
In plain terms. INGV does not only watch for fire — it is Italy's multi-hazard
watch floor: earthquakes, eruptions, ash, ground swelling and tsunamis, feeding Civil
Protection around the clock. ADRIZ is not a replacement for that authoritative network.
It is a fusion + independent-corroboration layer on top of it: it pulls third-party
satellites, cameras and (optionally) hyperlocal ground nodes into one auditable picture,
faster, and cross-checks each hazard signal against independent evidence. Where INGV's
instruments are the truth, ADRIZ adds speed, a second witness, cross-modal correlation, and
coverage in the gaps — with the same honesty as the rest of this site: proven,
scoped, and roadmap items are labelled, and every number carries its source and confidence.
Scope. ADRIZ's contribution is situational awareness / decision support, independent
corroboration, and hyperlocal ground sensing. It deliberately does not duplicate INGV's
authoritative seismic, volcanic-surveillance or tsunami-warning networks; it consumes their
public products and adds value around them. Tsunami warning is explicitly out of scope.
Multi-hazard capability map — what ADRIZ adds, and how honestly
| Hazard lane | What ADRIZ adds | Status | Public source | Honest limit |
|---|---|---|---|---|
| Volcanic: lava effusion extent | 20 m quantitative active-lava footprint from Sentinel-2 SWIR (NHI), e.g. 18.5 ha on 2025-08-21, a number the weekly bulletin narrative does not carry. | PROVEN | Copernicus / Element84 Sentinel-2 L2A | 5-day optical revisit; a short paroxysm seen >2–3 days late is missed (the Jun-2025 +3 d miss). |
| Volcanic vs wildfire (camera) | Wildfire-vs-volcanic discrimination on INGV's own camera frames: 9.7% volcanic false-alarm rate [4.5%, 19.5%], n=62; 4.8–6.5% at operating confidence; wildfire recall 96.8%. | PROVEN | INGV-OE EtnaTVChn (CC BY 4.0) | Flame vs lava needs an orthogonal cue (geometry + off-vent corroboration); real on-camera wildfire positives are rare (synthetic-augmented). |
| Ground deformation (InSAR) | Sentinel-1 line-of-sight deformation (HyP3/GAMMA) as an independent inflation/deflation + coseismic signal. | PRODUCT | Copernicus Sentinel-1 · ASF HyP3 | Atmospheric noise + revisit; automated alerting and time-series inversion are roadmap, not yet claimed. |
| Seismic & cascading impacts | Hyperlocal ground seismic node (P-wave puck) + post-earthquake fire-ignition awareness (broken gas/power). INGV's seismic network is authoritative; ADRIZ adds a hyperlocal node and the cross-hazard fire-after-quake link. | SCOPED | ground node (build not authorized) · NASA FIRMS | Does not replace INGV seismology; the node is scoped only. We MEASURED the quake→wildfire correlation for Sicily (INGV catalogue × FIRMS, case-crossover) and it is NULL: no cascade (research entry 0089). So this lane is hyperlocal shaking detection plus the urban/structural post-quake fire link from the literature, NOT a vegetation-fire correlation. |
| Multi-hazard situational awareness | One auditable, fused picture (the “wildfire state” engine generalized) merging satellite + camera + ground + bulletin evidence for the surveillance room, with provenance and confidence on every layer. | ROADMAP | fusion of the above public sources | Conceptual / design stage; value is decision support and speed, not new authoritative measurement. |
| Tsunami | — | OUT OF SCOPE | — | No sea-level sensing; this is CAT-INGV's authoritative remit and ADRIZ makes no claim here. |
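The lava-footprint figure in the first row can be approximated by counting hot 20 m pixels. This sketch assumes the Normalized Hotspot Indices formulation commonly used with Sentinel-2, applied to L2A reflectance: NHI_SWIR from B12 and B11, NHI_SWNIR from B11 and B8A, with a pixel flagged hot when either index is positive. The page does not state the exact bands, thresholds or extra false-positive filters ADRIZ uses, and operational versions typically add such filters. At 0.04 ha per pixel, 18.5 ha corresponds to roughly 460 pixels.

```python
import numpy as np

PIXEL_AREA_HA = 20 * 20 / 10_000   # one 20 m Sentinel-2 pixel = 0.04 ha

def nhi_hot_mask(b8a, b11, b12):
    """Hot-pixel mask (assumed NHI formulation, no extra filters):
    NHI_SWIR  = (B12 - B11) / (B12 + B11)
    NHI_SWNIR = (B11 - B8A) / (B11 + B8A)
    A pixel is flagged hot if either index is positive."""
    b8a, b11, b12 = (np.asarray(b, dtype=float) for b in (b8a, b11, b12))
    with np.errstate(divide="ignore", invalid="ignore"):
        nhi_swir = (b12 - b11) / (b12 + b11)
        nhi_swnir = (b11 - b8a) / (b11 + b8a)
    return (nhi_swir > 0) | (nhi_swnir > 0)   # NaN (0/0) compares False

def lava_footprint_ha(b8a, b11, b12):
    """Footprint area in hectares = hot-pixel count x pixel area."""
    return float(nhi_hot_mask(b8a, b11, b12).sum()) * PIXEL_AREA_HA

# Hypothetical 2x2 reflectance tile: one lava pixel (top-left), three vegetated pixels.
b8a = [[0.20, 0.40], [0.40, 0.40]]
b11 = [[0.30, 0.20], [0.20, 0.20]]
b12 = [[0.60, 0.10], [0.10, 0.10]]
area = lava_footprint_ha(b8a, b11, b12)
```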
Claim status: EXPLORATORY/positioning — the volcanic and camera lanes are
PROVEN with the numbers shown elsewhere on this site; InSAR is a built PRODUCT; the seismic
node and multi-hazard fusion are SCOPED/ROADMAP and make no measured claim yet.
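The null quake-to-wildfire result in the seismic lane rests on a case-crossover design. Below is a minimal sketch of one common variant. It assumes time-stratified referents (the other days in the same month that share the case day's weekday) and a binary exposure: a catalogue earthquake on the day or within `lag_days` before it. It then computes a Mantel-Haenszel odds ratio over the matched sets. The lag, the exposure definition and the dates are hypothetical; entry 0089's actual referent scheme and magnitude threshold are not given on this page.

```python
from datetime import date, timedelta

def referent_days(case_day):
    """Time-stratified referents: same year, month and weekday as the case, case excluded."""
    d = case_day.replace(day=1)
    refs = []
    while d.month == case_day.month:
        if d.weekday() == case_day.weekday() and d != case_day:
            refs.append(d)
        d += timedelta(days=1)
    return refs

def exposed(day, quake_days, lag_days=2):
    """Hypothetical exposure: a catalogue quake on `day` or in the `lag_days` before it."""
    return any(day - timedelta(days=k) in quake_days for k in range(lag_days + 1))

def mantel_haenszel_or(case_days, quake_days, lag_days=2):
    """Mantel-Haenszel odds ratio over matched sets (1 case day : M referent days)."""
    num = den = 0.0
    for case in case_days:
        refs = referent_days(case)
        n = 1 + len(refs)
        a = int(exposed(case, quake_days, lag_days))              # case exposed (0/1)
        b = sum(exposed(r, quake_days, lag_days) for r in refs)   # exposed referents
        num += a * (len(refs) - b) / n    # exposed case x unexposed referents
        den += (1 - a) * b / n            # unexposed case x exposed referents
    if den == 0:
        return float("inf") if num > 0 else float("nan")
    return num / den

# Toy example (hypothetical dates): two fire days, two quake days, same-day exposure only.
or_mh = mantel_haenszel_or([date(2025, 7, 16), date(2025, 8, 13)],
                           {date(2025, 7, 16), date(2025, 8, 6)}, lag_days=0)
```

A null cascade would show up as an odds ratio near 1 with an interval spanning 1. This sketch stops at the point estimate.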
Current literature & alignment. The volcanic lanes agree with the volcano remote-sensing
literature (Torrisi 2025/2024; Corradino 2023) and the Sentinel-5P documentation
already cited on the Proof-of-value tab. The deformation and multi-hazard lanes rest on
operational European services — Copernicus Sentinel-1 InSAR and the Copernicus Emergency
Management Service — rather than on ADRIZ measurements; peer-reviewed InSAR-deformation
and multi-hazard-EO references are queued to be added (citation source temporarily rate-limited).
Statistical reporting: every proven rate on this page carries a sample size and a
Wilson 95% confidence interval; read the interval, not the point estimate.
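The Wilson interval quoted for the camera false-alarm rate can be checked by hand. The sketch below assumes 6 false alarms out of n = 62. That count is our reading of 9.7% of 62 and is not printed on this page. With it, the sketch reproduces the published [4.5%, 19.5%].

```python
import math

def wilson_ci(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z ~ 1.96 for 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)

p, lo, hi = wilson_ci(6, 62)
print(f"{p:.1%} [{lo:.1%}, {hi:.1%}]")   # 9.7% [4.5%, 19.5%]
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and is asymmetric for small counts. That is why it is the safer default at n = 62.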
What hazard or lane are we missing? This emergency-management map is a draft for INGV review — tell us which hazards, data sources, or cascading risks to add or drop.
Live data feeds: every public source we pull for Etna fire and volcano tasks
Public Etna cameras (verified)
Camera wall — live flame / smoke watch
Every public Etna viewpoint runs the same YOLO
detector + crop-level Qwen3-VL veto, refreshed about every 75 s. The big badge is
the verdict an operator needs at a glance:
🔥 FLAME = a hot/bright crop the
veto confirmed as wildfire (not lava/volcanic);
💨 SMOKE = pass-through
early-warning smoke (never second-guessed);
✓ CLEAR = no wildfire-class hit.
Any FLAME/SMOKE tile floats to the top. Windy cams are low-resolution and serve only as coarse corroboration. Offline
sources are shown honestly, never faked.
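The badge logic above can be sketched as a small pure function plus a stable sort. The sketch assumes each tile exposes its detector crops with an optional veto verdict, which is None when the crop was never sent to the VLM. The class names, the OFFLINE badge and the camera ids are hypothetical. Large smoke that the veto rejects (for example, as degassing) does not raise SMOKE, consistent with the routing rule described for the alert feed.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PRIORITY = {"FLAME": 0, "SMOKE": 1, "CLEAR": 2, "OFFLINE": 3}   # lower sorts first

@dataclass
class Crop:
    label: str                      # assumed detector classes: "flame", "lava", "smoke"
    veto_wildfire: Optional[bool]   # VLM verdict; None = crop was not sent to the veto

@dataclass
class Tile:
    camera: str
    online: bool = True
    crops: List[Crop] = field(default_factory=list)

def badge(tile):
    """Operator-facing verdict for one camera tile."""
    if not tile.online:
        return "OFFLINE"            # shown honestly, never faked
    if any(c.label in ("flame", "lava") and c.veto_wildfire is True for c in tile.crops):
        return "FLAME"              # hot/bright crop confirmed as wildfire by the veto
    if any(c.label == "smoke" and c.veto_wildfire is not False for c in tile.crops):
        return "SMOKE"              # pass-through or confirmed smoke
    return "CLEAR"

def order_wall(tiles):
    """Stable sort so FLAME, then SMOKE tiles float to the top."""
    return sorted(tiles, key=lambda t: PRIORITY[badge(t)])

wall = [
    Tile("cam-a", crops=[Crop("lava", False)]),   # lava vetoed as volcanic -> CLEAR
    Tile("cam-b", crops=[Crop("smoke", None)]),   # small smoke, never second-guessed
    Tile("cam-c", online=False),
    Tile("cam-d", crops=[Crop("flame", True)]),
]
ordered = [t.camera for t in order_wall(wall)]
```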
Layers: Volcanic (SO2) · Wildfire active · Quiet AOI · FIRMS fire (size = FRP) · Live camera (day RGB / night thermal)
What you are seeing (for the INGV reader)
This is the system running live on real data over Etna. The orange dots on
the map are NASA FIRMS NRT active-fire detections (VIIRS/MODIS, ~3–6 h source latency),
refreshed every 30 min by a serverless worker — not a static snapshot. The coloured AOI
markers are our per-area verdict (volcanic / wildfire / quiet) with our score, confidence and
source timestamp. Below, the system auto-grades itself against your own weekly bulletin
(via GVP 211060). For the curated, verifiable head-to-head — our detections next to your own
reports, with links and honest misses — open the “Proof of value” tab above.
Per-AOI status — OUR assessment
Live camera — INGV's own feed (EtnaTVChn)
Live wildfire alerts — crop-level Qwen3-VL veto
Hot/bright crops (flame, lava) and large/diffuse smoke are routed to a
vision-language verifier per-crop to disambiguate wildfire from volcanic
incandescence/degassing; small distant smoke is never second-guessed (early-warning). Each
alert carries the veto verdict + reasoning and full provenance.
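The per-crop routing rule can be written as one decision function. The sketch assumes the class names "flame", "lava" and "smoke". It uses the smoke box's share of the frame as a proxy for "small distant" versus "large/diffuse". The 1% threshold is hypothetical, and diffuseness itself is not modelled.

```python
HOT_CLASSES = {"flame", "lava"}   # assumed detector class names
SMALL_SMOKE_FRACTION = 0.01       # hypothetical: smoke box under 1% of frame = small, distant

def route_to_veto(label: str, box_area_px: float, frame_area_px: float) -> bool:
    """True if the crop should be checked by the vision-language verifier."""
    if label in HOT_CLASSES:
        # Hot/bright crops are ambiguous: wildfire flame vs volcanic incandescence.
        return True
    if label == "smoke":
        # Small distant smoke is early warning and passes straight through;
        # large plumes may be volcanic degassing, so they are checked.
        return box_area_px / frame_area_px >= SMALL_SMOKE_FRACTION
    return False
```

The asymmetry is deliberate. A wrong veto on a small early plume costs a missed early warning, while a wrong veto on a large plume mostly costs a false alarm.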
ADRIZ vs INGV — auto self-grading
Live agreement against the INGV-OE bulletin
We ingest INGV-OE's public weekly reporting on Etna (via Smithsonian GVP,
volcano 211060) into a machine-readable ground-truth feed and AUTO-COMPARE it against our own
satellite-derived calls. Each rate carries a sample size n and a Wilson 95% CI.
Read the CI, not the point estimate — the overlap is thin and historical.
Agreement summary (columns): Dimension · Agreement · 95% CI · n
Per-date detail, our call vs the INGV weekly window (columns): Obs date · INGV week / state · Our SO2 / fire · Match
What we ingest from INGV-OE
SO2 here is a plume-presence cue, not an eruptive-state
classifier. Active fire is FIRMS NRT (~3–6 h). INGV bulletins are weekly;
our signals are per-overpass, so per-day calls and the weekly window legitimately differ at
episode boundaries.
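The self-grading step maps each dated satellite call onto the INGV bulletin week that contains it and scores agreement with a Wilson interval. A minimal sketch follows. It assumes each bulletin week is reduced to a binary active/not-active state and leaves calls outside any bulletin window unscored. The dataclass, dates and states are hypothetical. The mismatch on the week boundary in the example is the kind of legitimate disagreement described above.

```python
import math
from dataclasses import dataclass
from datetime import date

@dataclass
class BulletinWeek:
    start: date      # first day covered by the weekly bulletin (inclusive)
    end: date        # last day covered (inclusive)
    active: bool     # INGV state reduced to a binary (assumption)

def covering_week(day, weeks):
    """The bulletin week containing `day`, or None if no bulletin covers it."""
    for w in weeks:
        if w.start <= day <= w.end:
            return w
    return None

def agreement(calls, weeks, z=1.959964):
    """Share of dated calls matching the covering INGV week, with a Wilson 95% CI."""
    matches = n = 0
    for day, our_active in calls:
        week = covering_week(day, weeks)
        if week is None:
            continue                 # outside every bulletin window: not scored
        n += 1
        matches += int(our_active == week.active)
    if n == 0:
        raise ValueError("no calls fall inside a bulletin week")
    p = matches / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, (max(0.0, centre - half), min(1.0, centre + half)), n

weeks = [BulletinWeek(date(2025, 8, 11), date(2025, 8, 17), True),
         BulletinWeek(date(2025, 8, 18), date(2025, 8, 24), False)]
calls = [(date(2025, 8, 12), True), (date(2025, 8, 16), True),
         (date(2025, 8, 18), True),                             # boundary day: activity tailing off
         (date(2025, 8, 21), False), (date(2025, 9, 1), False)]  # last one: no bulletin
rate, (lo, hi), n = agreement(calls, weeks)
```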
Capability slots
Data latency / freshness (columns): Layer · Observed · Age · Typical
Operating model
Real-time (auto-refreshed):
Active fire (FIRMS NRT) — pulled by a serverless Worker on a 30-min cron and
served from edge storage; the page re-checks it every 5 min. Source latency ~3–6 h.
SO2 plume status — refreshed each Sentinel-5P overpass (~daily) by the feed
processor (this preview reads an archived per-scene series).
File-fed (updated when their result file is published): detector alerts, multi-source thermal/FRP, escalation flag, burn-scar, fuel-danger.
These are the heavy model layers — they show pending until their result file lands,
then light up. They are not live-polled.
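The freshness columns (Observed, Age, Typical) suggest a simple status rule for each layer. This sketch makes two assumptions. A layer with no result file is "pending". A layer older than a multiple of its typical latency is "stale". The `stale_factor` of 2 and the example ages are hypothetical. The 6 h typical value is the upper end of the ~3–6 h FIRMS source latency quoted above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def layer_status(observed: Optional[datetime], typical: timedelta,
                 now: datetime, stale_factor: float = 2.0) -> str:
    """'pending' until a result lands; 'stale' if older than stale_factor x typical latency."""
    if observed is None:
        return "pending"            # file-fed layer whose result file has not landed yet
    age = now - observed
    return "stale" if age > stale_factor * typical else "fresh"

now = datetime(2025, 8, 21, 12, 0, tzinfo=timezone.utc)
firms_typical = timedelta(hours=6)
statuses = {
    "active fire (recent)": layer_status(now - timedelta(hours=4), firms_typical, now),
    "active fire (old)": layer_status(now - timedelta(hours=13), firms_typical, now),
    "burn-scar (no file yet)": layer_status(None, timedelta(days=5), now),
}
```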