
Comparing deep learning models for phase picking tasks

Sérgio Oliveira¹, Luís Gomes², João Luís Gaspar¹, Sérgio Moro³

  • Affiliations: ¹Instituto de Investigação em Vulcanologia e Avaliação de Riscos, Universidade dos Açores, Ponta Delgada, Portugal; ²Faculdade de Ciências e Tecnologia, Universidade dos Açores, Ponta Delgada, Portugal; ³Information Sciences, Technologies and Architecture Research Centre, ISCTE-IUL, Lisboa, Portugal

  • Presentation type: Poster

  • Presentation time: Monday 16:30 - 18:30, Room Poster Hall

  • Poster Board Number: 168

  • Programme No: 3.1.60

  • Theme 3 > Session 1


Abstract

Automating the determination of an earthquake's hypocentral location and the calculation of its magnitude is essential to ensure the timely activation of earthquake early warning systems. In volcanic regions this is even more important, as automated processing can also help locate and characterize possible magmatic sources and track, in near real time, the direction and propagation of the fracturing that typically precedes an eruption. Deep learning approaches have been shown to outperform classical methods, achieving performance that rivals human analysts. However, because recent studies differ in their datasets and evaluation tasks, it is unclear how the various models compare to one another when applied to unknown data. Most models are evaluated on only a single dataset, and their performance on new, different data is hard to predict. This study offers a comparison of five previously published models (BasicPhaseAE, EQTransformer, PhaseNet, PhaseNetLight and GPD) trained with four different datasets (ETHZ, GEOFON, INSTANCE, STEAD). The data used to test the models belong to the seismo-volcanic crisis ongoing since 2022 on Terceira Island, in the Azores archipelago. Here we show that the same model can perform considerably differently when trained with a different dataset, and even between phases, making the choice of any particular model harder. The best performer for P phases was GPD trained with the STEAD dataset, with an F1 score of 79%; for S phases it was PhaseNet trained with the INSTANCE dataset, with an F1 score of 77%.
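
As a point of reference, model-versus-training-set comparisons of this kind are typically run through the SeisBench framework, which distributes these pickers with weights pretrained on the benchmark datasets named above. The minimal sketch below is not taken from the study: the weight name, the waveform file, the reference pick times, and the 0.5 s matching tolerance are all illustrative assumptions.

    import seisbench.models as sbm
    from obspy import read, UTCDateTime

    # Load a picker with pretrained weights; SeisBench publishes weights for
    # PhaseNet, EQTransformer, BasicPhaseAE and GPD trained on benchmark sets
    # such as "ethz", "geofon", "instance" and "stead".
    model = sbm.PhaseNet.from_pretrained("instance")

    # Hypothetical waveform file from the Terceira sequence.
    stream = read("terceira_event.mseed")

    # In recent SeisBench versions classify() returns an object whose .picks
    # list holds a phase label, peak time and confidence for each detection.
    output = model.classify(stream)

    # Score against analyst picks: a detection counts as a true positive if it
    # falls within a tolerance window of a reference pick of the same phase.
    # Times and tolerance below are placeholders, not values from the study.
    reference = {"P": UTCDateTime("2022-06-25T00:00:10"),
                 "S": UTCDateTime("2022-06-25T00:00:18")}
    tolerance = 0.5  # seconds

    tp = fp = 0
    for pick in output.picks:
        ref = reference.get(pick.phase)
        if ref is not None and abs(pick.peak_time - ref) <= tolerance:
            tp += 1
        else:
            fp += 1
    fn = max(0, len(reference) - tp)  # unmatched reference picks
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    print(f"F1 = {f1:.2f}")

Repeating such a loop over each (model, training dataset) pair and each phase yields per-phase F1 scores like those reported in the abstract.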