Privacy-Preserving Feature Valuation in Vertical Federated Learning Using Shapley-CMI and PSI Permutation

Laskurain, Unai; Aguirre, Aitor; Zurutuza, Urko

Authors

Laskurain, Unai

Aguirre, Aitor

Zurutuza, Urko

Research group

Data Analysis and Cybersecurity

Other institutions

Mondragon Unibertsitatea

Version

Postprint

Document type

Conference contribution

Language

English

Rights

Access

Open access

URI

https://hdl.handle.net/20.500.11984/13975

Published in

IEEE International Conference on Federated Learning Technologies and Applications 3. Dubrovnik (Croatia), 15-17 October 2025

Publisher

IEEE

Keywords

Federated Learning
Security and privacy
SDG 9 Industry, innovation and infrastructure

Subject (UNESCO Thesaurus)

Data protection

Abstract

Federated Learning (FL) is an emerging machine learning paradigm that enables multiple parties to collaboratively train models without sharing raw data, thereby protecting data privacy. In Vertical FL (VFL), where each party holds different features for the same users, a key challenge is to evaluate the feature contribution of each party before any model has been trained. To address this, the Shapley-CMI method was recently proposed as a model-free, information-theoretic approach to feature valuation using Conditional Mutual Information (CMI). However, its original formulation did not provide a practical implementation capable of computing the required permutations and intersections securely. This paper presents a novel privacy-preserving implementation of Shapley-CMI for VFL. Our system introduces a private set intersection (PSI) server that performs all necessary feature permutations and computes encrypted intersection sizes across discretized and encrypted ID groups, without the need for raw data exchange. Each party then uses these intersection results to compute Shapley-CMI values, which quantify the marginal utility of its features. Initial experiments confirm the correctness and privacy of the proposed system, demonstrating its viability for secure and efficient feature contribution estimation in VFL. This approach preserves data confidentiality, scales across multiple parties, and enables fair data valuation without requiring raw data sharing or model training.
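
The idea summarized in the abstract can be illustrated with a small plaintext sketch. It is not the authors' implementation: there is no encryption, and the hypothetical `intersection_sizes` function merely stands in for the PSI server by intersecting ID groups in the clear. The sketch also assumes that the marginal contribution of feature X_i given a feature subset X_S is the conditional mutual information I(Y; X_i | X_S), and it uses the subset form of the Shapley value, which is equivalent to averaging over all feature permutations. All function names are illustrative.

```python
from itertools import combinations
from math import comb, log2


def group_ids(column):
    """Group sample IDs (row positions) by the discretized value they hold."""
    groups = {}
    for sample_id, value in enumerate(column):
        groups.setdefault(value, set()).add(sample_id)
    return list(groups.values())


def intersection_sizes(groupings, n_samples):
    """Plaintext stand-in for the PSI server (assumption): sizes of all
    non-empty intersections of one ID group per grouping."""
    cells = [set(range(n_samples))]
    for grouping in groupings:
        cells = [cell & group for cell in cells for group in grouping if cell & group]
    return [len(cell) for cell in cells]


def entropy(groupings, n_samples):
    """Joint entropy in bits, estimated from intersection sizes alone."""
    sizes = intersection_sizes(groupings, n_samples)
    return -sum(s / n_samples * log2(s / n_samples) for s in sizes)


def cmi(label, feature, context, n_samples):
    """I(Y; X_i | X_S) = H(X_S, X_i) + H(X_S, Y) - H(X_S, X_i, Y) - H(X_S)."""
    def h(groupings):
        return entropy(groupings, n_samples)
    return (h(context + [feature]) + h(context + [label])
            - h(context + [feature, label]) - h(context))


def shapley_cmi(feature_columns, label_column):
    """Shapley value of each feature with CMI as the marginal contribution."""
    n_samples = len(label_column)
    features = [group_ids(col) for col in feature_columns]
    label = group_ids(label_column)
    m = len(features)
    values = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):
            # Shapley weight |S|! (m - |S| - 1)! / m! for subsets of this size
            weight = 1.0 / (m * comb(m - 1, size))
            for subset in combinations(others, size):
                context = [features[j] for j in subset]
                total += weight * cmi(label, features[i], context, n_samples)
        values.append(total)
    return values
```

For example, with two independent binary features whose XOR is the label and a third constant feature, `shapley_cmi([[0, 0, 1, 1], [0, 1, 0, 1], [7, 7, 7, 7]], [0, 1, 1, 0])` splits the one bit of label information equally between the first two features and assigns nothing to the constant one. Because only group intersection sizes enter the entropy estimates, the same arithmetic could in principle run on sizes returned by a PSI service rather than on raw IDs.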

Funder

Basque Government

Programme

Ikertalde Call 2022-2025

Number

IT1676-22

Grant URI

No information

Project

Intelligent systems for industrial systems group (IKERTALDE 2022-2025)

Collections

Conferences - Engineering [469]

eBiltegia