Pavlo Melnyk

Paulus • Molinarius

Postdoctoral Researcher

CVL, Linköping University

De Geometria Discenda

I study symmetry, structure, and equivariance in machine learning, developing geometric representations that bridge learning systems with the structure of the physical world.

I’m currently a postdoctoral researcher in the division of Computer Vision and Learning Systems, Linköping University, where I previously earned my PhD in Electrical Engineering with a specialisation in Computer Vision, focusing on Geometric Deep Learning. I was supervised by Michael Felsberg and funded by WASP. Further details can be found in my CV.

Research Themes

Geometric Deep Learning
Scientific Machine Learning
Robust Perception Systems
3D Vision

Education

PhD Computer Vision
Linköping University, Sweden

MEng Computer Science and Technology
Hunan University, China

BSc Information Security Systems
DonNTU, Ukraine

Featured Publications

E$(n)$-Equivariant Spherical Decision Surfaces

Equivariance

We present a constructive derivation of exactly E(n)-equivariant spherical decision surfaces by extending prior O(n)-equivariant hypersphere neurons to include translations. To achieve this, we present a decomposition of the features of the O(n)-equivariant neurons and provide explicit representations for translation and E(n)-transformations to fulfil the respective equivariance constraints. The resulting decision surfaces are exactly E(n)-equivariant without input centring or explicit pairwise differences, and admit explicit closed-form matrix representations. In addition, we numerically verify the correctness of the derivations and perform a downstream check of the resulting geometric primitives.

Mar 2, 2026

QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture

Human Pose

Vision-based 3D human motion capture from videos remains a challenge in computer vision. Traditional 3D pose estimation approaches often ignore the temporal consistency between frames, causing implausible and jittery motion. The emerging field of kinematics-based 3D motion capture addresses these issues by estimating the temporal transitioning between poses instead. A major drawback of current kinematics approaches is their reliance on Euler angles. Despite their simplicity, Euler angles suffer from discontinuity that leads to unstable motion reconstructions, especially in online settings where trajectory refinement is unavailable. In contrast, quaternions have no discontinuity and can produce continuous transitions between poses. In this paper, we propose QuaMo, a novel Quaternion Motions method using quaternion differential equations (QDE) for human kinematics capture. We utilize the state-space model, an effective system for describing real-time kinematics estimations, with a quaternion state and the QDE describing quaternion velocity. The corresponding angular acceleration is computed from a meta-PD controller with a novel acceleration enhancement that adaptively regulates the control signals as the human quickly changes to a new pose. Unlike previous work, our QDE is solved under the quaternion geometric constraints, which results in more accurate estimations. Experimental results show that our novel formulation of the QDE with acceleration enhancement accurately estimates 3D human kinematics with no discontinuity and minimal implausible artifacts. QuaMo outperforms comparable state-of-the-art methods on multiple datasets, namely Human3.6M, Fit3D, SportsPose and a subset of AIST. The code is available at https://github.com/cuongle1206/QuaMo.
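To make the idea of solving a quaternion differential equation under the unit-norm constraint concrete, here is a minimal sketch. It is not QuaMo's solver: it assumes a body-frame angular velocity that stays constant within each step and integrates q' = ½ q ⊗ (0, ω) with the exponential map, so the state remains a unit quaternion by construction. The function names are hypothetical.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def step_quaternion(omega, dt):
    """Unit quaternion of the rotation produced by angular velocity omega over dt."""
    rate = math.sqrt(sum(w * w for w in omega))
    angle = rate * dt
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)  # no measurable rotation in this step
    scale = math.sin(angle / 2.0) / rate
    return (math.cos(angle / 2.0), omega[0] * scale, omega[1] * scale, omega[2] * scale)

def integrate(q, omega, dt, steps):
    """Advance q by repeated right-multiplication (body-frame convention, assumed)."""
    for _ in range(steps):
        q = quat_mul(q, step_quaternion(omega, dt))
    return q

# Quarter turn about z: pi/2 rad/s for one second, split into 100 steps.
q_final = integrate((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, math.pi / 2.0), 0.01, 100)
q_norm = math.sqrt(sum(c * c for c in q_final))
```

Because every step multiplies by a unit quaternion, no renormalisation is needed, and there is no angle wrap-around of the kind an Euler-angle state would hit.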

Jan 26, 2026

On the Role of Rotation Equivariance in Monocular 3D Human Pose Estimation

Equivariance

Estimating 3D from 2D is one of the central tasks in computer vision. In this work, we consider the monocular setting, i.e. single-view input, for 3D human pose estimation (HPE). Here, the task is to predict a 3D point set of human skeletal joints from a single 2D input image. While by definition this is an ill-posed problem, recent work has presented methods that solve it with up to several-centimetre error. Typically, these methods employ a two-step approach, where the first step is to detect the 2D skeletal joints in the input image, followed by the step of 2D-to-3D lifting. We find that common lifting models fail when encountering a rotated input. We argue that learning a single human pose along with its in-plane rotations is considerably easier and more geometrically grounded than directly learning a point-to-point mapping. Furthermore, our intuition is that endowing the model with the notion of rotation equivariance without explicitly constraining its parameter space should lead to a more straightforward learning process than one with equivariance by design. Utilising the common HPE benchmarks, we confirm that the 2D rotation equivariance per se improves the model performance on human poses akin to rotations in the image plane, and can be efficiently and straightforwardly learned by augmentation, outperforming state-of-the-art equivariant-by-design methods.
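Learning rotation equivariance by augmentation can be pictured with a short sketch. The abstract does not spell out the augmentation pipeline, so this is an assumption-laden illustration: 2D joints are taken to be centred on the principal point, the 3D target is expressed in the camera frame with z as the optical axis, and the same random in-plane angle is applied to both. All function names are hypothetical.

```python
import math
import random

def rotate_2d(joints_2d, angle):
    """Rotate 2D joints (x, y) about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in joints_2d]

def rotate_3d_about_z(joints_3d, angle):
    """Rotate 3D joints about the z-axis, assumed here to be the optical axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in joints_3d]

def augment_pair(joints_2d, joints_3d, rng=random):
    """Draw one random in-plane angle and apply it to input and target alike."""
    angle = rng.uniform(-math.pi, math.pi)
    return rotate_2d(joints_2d, angle), rotate_3d_about_z(joints_3d, angle), angle
```

A lifting model trained on such pairs is encouraged, though not forced, to satisfy f(R x) = R f(x) for in-plane rotations R, which is the contrast with equivariance by design.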

Jan 20, 2026

Equivariant Modelling for Catalysis on 2D MXenes

Materials Science

Merging advanced computations with machine learning, we aim to accelerate the exploration of catalytic behaviour in novel materials. We focus on two-dimensional (2D) Ti$_2$CT$_y$ MXenes, whose versatile surface chemistry makes them particularly compelling candidates for catalysis. However, resolving their composition and structure under realistic conditions requires going beyond the systems typically studied with density functional theory (DFT), as the computational cost of such calculations limits accessible system sizes and timescales, calling instead for more efficient approaches. To address this challenge, we generate a comprehensive dataset of 50,000 DFT calculations for training and 10,000 for testing, encompassing both Ti$_2$CT$_y$ MXene configurations and molecular systems, along with an augmented dataset where systems are artificially repeated to investigate how well models generalise to larger systems. Employing advances in geometric deep learning, we train and validate an equivariant (i.e., symmetry-aware) model (EquiformerV2) that accurately predicts atomic forces and formation energies (quantities that DFT must repeatedly compute for structural and catalytic investigations) for these 2D materials. This combined DFT–ML framework achieves computational acceleration of the order ${\sim}10^3$–$10^4$ (on a CPU) while maintaining DFT-level accuracy (${\sim} {\pm} 45$ meV/Å for forces and ${\sim} {\pm} 6$ meV for per-atom energies), paving the way for more efficient investigations of MXene catalytic behaviour. Moreover, we confirm that the total energy prediction error of the model grows linearly with the number of atoms in an input system, while the force error remains the same, which, along with the equivariant model design, is a necessity for a robust model. The dataset and the trained models with the code are available at https://github.com/CataLiUst.

Nov 24, 2025

O$n$ Learning Deep O$(n)$-Equivariant Hyperspheres

Equivariance

In this paper, we utilize hyperspheres and regular $n$-simplexes and propose an approach to learning deep features equivariant under the transformations of $n$D reflections and rotations, encompassed by the powerful group of $\text{O}(n)$. Namely, we propose $\text{O}(n)$-equivariant neurons with spherical decision surfaces that generalize to any dimension $n$, which we call Deep Equivariant Hyperspheres. We demonstrate how to combine them in a network that directly operates on the basis of the input points and propose an invariant operator based on the relation between two points and a sphere, which, as we show, turns out to be a Gram matrix. Using synthetic and real-world data in $n$D, we experimentally verify our theoretical contributions and find that our approach is superior to the competing methods for $\text{O}(n)$-equivariant benchmark datasets (classification and regression), demonstrating a favorable speed/performance trade-off.
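Why a Gram matrix is a natural O(n)-invariant can be checked numerically. The sketch below does not reproduce the paper's operator (which is built from the relation between two points and a sphere); it only illustrates the general fact that the matrix of pairwise inner products is unchanged when every point is rotated or reflected. The point set, angle, and helper names are illustrative assumptions.

```python
import math

def gram_matrix(points):
    """Matrix of pairwise inner products of the given points."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]

def transform(matrix, points):
    """Apply a square matrix to every point."""
    return [tuple(sum(m * v for m, v in zip(row, p)) for row in matrix) for p in points]

def max_abs_diff(a, b):
    """Largest entry-wise difference between two equally sized matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

theta = 0.7  # arbitrary rotation angle
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
reflection = [[1.0, 0.0], [0.0, -1.0]]  # mirror across the x-axis

points = [(1.0, 2.0), (-0.5, 0.3), (2.0, -1.0)]
reference = gram_matrix(points)
rotation_error = max_abs_diff(reference, gram_matrix(transform(rotation, points)))
reflection_error = max_abs_diff(reference, gram_matrix(transform(reflection, points)))
```

Both errors stay at floating-point round-off, since (Rp)·(Rq) = p·q for any orthogonal R.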

Jul 22, 2024

TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis

Point Clouds

In many practical applications, 3D point cloud analysis requires rotation invariance. In this paper, we present a learnable descriptor invariant under 3D rotations and reflections, i.e., the O(3) actions, utilizing the recently introduced steerable 3D spherical neurons and vector neurons. Specifically, we propose an embedding of the 3D spherical neurons into 4D vector neurons, which leverages end-to-end training of the model. In our approach, we perform TetraTransform (an equivariant embedding of the 3D input into 4D, constructed from the steerable neurons) and extract deeper O(3)-equivariant features using vector neurons. This integration of the TetraTransform into the VN-DGCNN framework, termed TetraSphere, increases the number of parameters by a negligible amount (less than 0.0002%). TetraSphere sets a new state-of-the-art performance classifying randomly rotated real-world object scans of the challenging subsets of ScanObjectNN. Additionally, TetraSphere outperforms all equivariant methods on randomly rotated synthetic data: classifying objects from ModelNet40 and segmenting parts of the ShapeNet shapes. Thus, our results reveal the practical value of steerable 3D spherical neurons for learning in 3D Euclidean space.

Jun 17, 2024

Steerable 3D Spherical Neurons

Steerability

Emerging from low-level vision theory, steerable filters found their counterpart in prior work on steerable convolutional neural networks equivariant to rigid transformations. In our work, we propose a steerable feed-forward learning-based approach that consists of neurons with spherical decision surfaces and operates on point clouds. Such spherical neurons are obtained by conformal embedding of Euclidean space and have recently been revisited in the context of learning representations of point sets. Focusing on 3D geometry, we exploit the isometry property of spherical neurons and derive a 3D steerability constraint. After training spherical neurons to classify point clouds in a canonical orientation, we use a tetrahedron basis to quadruplicate the neurons and construct rotation-equivariant spherical filter banks. We then apply the derived constraint to interpolate the filter bank outputs and, thus, obtain a rotation-invariant network. Finally, we use a synthetic point set and real-world 3D skeleton data to verify our theoretical findings.

Jul 19, 2022

Embed Me If You Can: A Geometric Perceptron

Conformal Embedding

Solving geometric tasks involving point clouds by using machine learning is a challenging problem. Standard feed-forward neural networks combine linear or, if the bias parameter is included, affine layers and activation functions. Their geometric modeling is limited, which motivated the prior work introducing the multilayer hypersphere perceptron (MLHP). Its constituent part, i.e., the hypersphere neuron, is obtained by applying a conformal embedding of Euclidean space. By virtue of Clifford algebra, it can be implemented as the Cartesian dot product of inputs and weights. If the embedding is applied in a manner consistent with the dimensionality of the input space geometry, the decision surfaces of the model units become combinations of hyperspheres and make the decision-making process geometrically interpretable for humans. Our extension of the MLHP model, the multilayer geometric perceptron (MLGP), and its respective layer units, i.e., geometric neurons, are consistent with the 3D geometry and provide a geometric handle of the learned coefficients. In particular, the geometric neuron activations are isometric in 3D, which is necessary for rotation and translation equivariance. When classifying the 3D Tetris shapes, we quantitatively show that our model requires no activation function in the hidden layers other than the embedding to outperform the vanilla multilayer perceptron. In the presence of noise in the data, our model is also superior to the MLHP.
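The statement that a hypersphere neuron reduces to a plain dot product can be illustrated with a small sketch. The component ordering and scaling of the embedding below are assumptions chosen so that the product equals -½(‖x − c‖² − r²); the papers may normalise the vectors differently. Function names are hypothetical.

```python
def embed_point(x):
    """Embed a point x in R^n as (x_1, ..., x_n, -1, -||x||^2 / 2)."""
    sq_norm = sum(v * v for v in x)
    return list(x) + [-1.0, -0.5 * sq_norm]

def embed_sphere(center, radius):
    """Represent a hypersphere as (c_1, ..., c_n, (||c||^2 - r^2) / 2, 1)."""
    sq_norm = sum(v * v for v in center)
    return list(center) + [0.5 * (sq_norm - radius ** 2), 1.0]

def hypersphere_neuron(x, center, radius):
    """Dot product of the embedded point and the embedded sphere."""
    return sum(a * b for a, b in zip(embed_point(x), embed_sphere(center, radius)))

center, radius = (1.0, 2.0, 3.0), 2.0
inside = hypersphere_neuron((1.0, 2.0, 3.0), center, radius)      # at the centre
on_surface = hypersphere_neuron((3.0, 2.0, 3.0), center, radius)  # distance 2
outside = hypersphere_neuron((4.0, 2.0, 3.0), center, radius)     # distance 3
```

The sign of the activation tells whether the input lies inside (positive), on (zero), or outside (negative) the sphere, which is what makes the decision surface directly readable from the learned weights.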

Sep 21, 2021

A High-Performance CNN Method for Offline Handwritten Chinese Character Recognition and Visualization

OCR

Recent research has introduced fast, compact and efficient convolutional neural networks (CNNs) for offline handwritten Chinese character recognition (HCCR). However, many of these works did not address the problem of network interpretability. We propose a new architecture of a deep CNN with high recognition performance which is capable of learning deep features for visualization. A special characteristic of our model is the bottleneck layers, which enable us to retain its expressiveness while reducing the number of multiply-accumulate operations and the required storage. We introduce a modification of global weighted average pooling (GWAP): global weighted output average pooling (GWOAP). This paper demonstrates how they allow us to calculate class activation maps (CAMs) in order to indicate the most relevant input character image regions used by our CNN to identify a certain class. Evaluating on the ICDAR-2013 offline HCCR competition dataset, we show that our model enables a relative 0.83% error reduction while having 49% fewer parameters and the same computational cost compared to the current state-of-the-art single-network method trained only on handwritten data. Our solution outperforms even recent residual learning approaches.

May 30, 2020

Recent Publications

Cuong Le, Pavlo Melnyk, Bastian Wandt, Mårten Wadenbäck (2026). Flow Matching for Probabilistic Monocular 3D Human Pose Estimation. TMLR.

Pavlo Melnyk, Michael Felsberg, Kostas Daniilidis (2026). E$(n)$-Equivariant Spherical Decision Surfaces. ICLR 2026 Workshop GRaM.

Cuong Le, Pavlo Melnyk, Urs Waldmann, Mårten Wadenbäck, Bastian Wandt (2026). QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture. ICLR 2026.

Pavlo Melnyk, Cuong Le, Urs Waldmann, Per-Erik Forssén, Bastian Wandt (2026). On the Role of Rotation Equivariance in Monocular 3D Human Pose Estimation. arXiv preprint.

Pavlo Melnyk*, Anmar Karmush*, Ania Beatriz Rodríguez-Barrera, Mårten Wadenbäck, Michael Felsberg, Johanna Rosen, Jonas Björk* (2025). Equivariant Modelling for Catalysis on 2D MXenes. EurIPS 2025 Workshop on SIMBIOCHEM, Spotlight (non-archival).

News

Paper accepted at TMLR

May 25, 2026

“Flow Matching for Probabilistic Monocular 3D Human Pose Estimation” has been accepted at TMLR.

Paper accepted at GRaM @ ICLR'26 (PMLR track)

Mar 2, 2026

“E$(n)$-Equivariant Spherical Decision Surfaces” has been accepted at ICLR 2026 Workshop on Geometry-grounded Representation Learning and Generative Modeling.

Paper accepted at ICLR'26

Jan 26, 2026

“QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture” has been accepted at ICLR'26.

WASP 10-Year Anniversary Feature

Oct 24, 2025

I was delighted to be interviewed and featured in the WASP 10-year anniversary article series.

PhD Graduation Ceremony

May 24, 2025

I had the honor of participating in the centuries-old tradition along with some extraordinary honorary doctors.
