Benchmark Dataset for Catalysis on 2D MXenes

May 30, 2026 · Pavlo Melnyk*, Anmar Karmush*, Mårten Wadenbäck, Ania Beatriz Rodríguez-Barrera, Johanna Rosen, Michael Felsberg, Jonas Björk*
Abstract
Merging first-principles calculations with machine learning (ML), we aim to accelerate the exploration of catalytic behaviour in novel materials. We focus on two-dimensional (2D) Ti$_2$CT$_y$ MXenes, whose versatile surface chemistry makes them particularly compelling candidates for catalysis. Resolving their composition and structure under realistic conditions exceeds the reach of standard density functional theory (DFT) due to computational cost.

To address this challenge, we generate a comprehensive dataset of 50,000 DFT calculations for training and 10,000 for testing, encompassing both Ti$_2$CT$_y$ MXene configurations and molecular systems, along with an additional test set of 1,000 genuinely new, larger systems to investigate how well models generalise. We train and validate widely used and competitive machine learning interatomic potential (MLIP) models (EquiformerV2, MACE, MatRIS, and UPET) that accurately predict atomic forces and formation energies for these 2D materials, quantities that DFT must repeatedly compute in structural and catalytic investigations.

This combined DFT–ML framework achieves a computational speed-up on the order of ∼(1–4) × 10$^3$ (on a CPU) while maintaining the desired accuracy (∼±10 meV/Å for forces and ∼±1 meV for per-atom energies), paving the way for more efficient investigations of MXene catalytic behaviour. Moreover, we perform an extensive qualitative evaluation of the trained models, showcasing the importance of a comprehensive simulation-based comparison beyond benchmark metrics. The dataset, the trained models, and the code are available at https://huggingface.co/datasets/CatalystAnonymous/catalyst_mxenes.
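
To make the accuracy figures above concrete, here is a minimal sketch of how force and per-atom energy errors might be computed for an MLIP against DFT references. It assumes a mean absolute error (MAE) metric, which the abstract does not specify; the function names `force_mae` and `per_atom_energy_mae` and all numbers are hypothetical and purely illustrative, not taken from the dataset or the released code.

```python
from statistics import fmean


def force_mae(pred_forces, ref_forces):
    """MAE over all Cartesian force components (same units as the input)."""
    errors = [
        abs(p - r)
        for pred_atom, ref_atom in zip(pred_forces, ref_forces)
        for p, r in zip(pred_atom, ref_atom)
    ]
    return fmean(errors)


def per_atom_energy_mae(pred_energies, ref_energies, n_atoms):
    """MAE of total energies, each error divided by that structure's atom count."""
    errors = [
        abs(p - r) / n
        for p, r, n in zip(pred_energies, ref_energies, n_atoms)
    ]
    return fmean(errors)


# Made-up toy values (forces in eV/Å, energies in eV); NOT real data.
pred_f = [[0.010, -0.020, 0.000], [0.005, 0.000, -0.015]]  # model forces, 2 atoms
ref_f = [[0.000, -0.010, 0.000], [0.000, 0.000, -0.010]]   # reference (DFT) forces

# Convert eV-based errors to meV by multiplying by 1000.
force_mae_mev = 1000 * force_mae(pred_f, ref_f)
energy_mae_mev = 1000 * per_atom_energy_mae(
    [-10.002, -20.004],  # model total energies for two structures
    [-10.000, -20.000],  # reference (DFT) total energies
    [2, 4],              # number of atoms in each structure
)

print(f"force MAE: {force_mae_mev:.2f} meV/Å")
print(f"per-atom energy MAE: {energy_mae_mev:.2f} meV")
```

Dividing each energy error by the atom count before averaging keeps the metric comparable across structures of different sizes, which matters when evaluating on the larger generalisation test systems.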
Publication: In arXiv
