Bridging efficacy and efficiency: Innovations in Shapley value estimation for model-agnostic data valuation in machine learning
1 April 2024
Yilu Yang
Proceedings Volume 13077, Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024); 1307708 (2024) https://doi.org/10.1117/12.3027116
Event: 4th International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 2024, Chicago, IL, United States
Abstract
The rapid advancement of generative AI models increases the need for effective data valuation techniques. Among the many available methodologies, Shapley value estimation techniques such as Data Shapley have attracted attention for their data valuation capabilities, despite the computational challenges they face on large datasets. This paper introduces an empirically driven batch method that aims to speed up data valuation while preserving precision. The method tunes training batch sizes and testing subsets to balance computational efficiency against valuation accuracy, which matters given the substantial volume of data processed in contemporary machine learning tasks. Different Shapley value estimation techniques are evaluated, and TMC-Shapley stands out for its notable efficacy. The paper also explores the model-agnostic nature of Shapley value estimation by using diverse machine learning models across distinct training phases. This practice demonstrates the versatility of Shapley value methods and highlights their adaptability and generalizability across varied model architectures. The approach and findings presented here provide a foundation for future work on optimizing data valuation, paving the way for more nuanced and efficient methodologies.
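For readers unfamiliar with TMC-Shapley, the following is a minimal sketch of the general truncated Monte Carlo idea from the Data Shapley literature: sample random permutations of the training points, add points one at a time, record each point's marginal gain in test performance, and stop retraining once the score is within a tolerance of the full-data score. It is not the batch method proposed in this paper, whose details the abstract does not give. The function names, the toy 1-D dataset, the 1-nearest-neighbour utility, and the chance-level score of 0.5 for an empty training set are all assumptions made for illustration.

```python
import random


def tmc_shapley(n_points, utility, n_permutations=200, tolerance=0.01, seed=0):
    """Truncated Monte Carlo estimate of per-point Shapley values.

    utility(indices) must return the test score of a model trained on
    the training points with the given indices (hypothetical interface).
    """
    rng = random.Random(seed)
    full_score = utility(list(range(n_points)))
    values = [0.0] * n_points
    for t in range(1, n_permutations + 1):
        perm = list(range(n_points))
        rng.shuffle(perm)
        prev_score = utility([])
        for j, idx in enumerate(perm):
            if abs(full_score - prev_score) < tolerance:
                # Truncation: the score is already close to the full-data
                # score, so skip retraining and treat the gain as zero.
                new_score = prev_score
            else:
                new_score = utility(perm[: j + 1])
            marginal = new_score - prev_score
            # Running mean of this point's marginal contributions.
            values[idx] += (marginal - values[idx]) / t
            prev_score = new_score
    return values


# Toy data (assumed): (feature, label) pairs; the last training point
# carries a deliberately wrong label.
TRAIN = [(0.0, 0), (0.2, 0), (1.0, 1), (1.2, 1), (0.12, 1)]
TEST = [(0.05, 0), (0.15, 0), (0.25, 0), (0.9, 1), (1.1, 1), (1.3, 1)]


def knn_utility(indices):
    """Test accuracy of a 1-nearest-neighbour classifier on a training subset."""
    if not indices:
        return 0.5  # assumed chance-level score for an empty training set
    correct = 0
    for x, y in TEST:
        nearest = min(indices, key=lambda i: abs(x - TRAIN[i][0]))
        correct += TRAIN[nearest][1] == y
    return correct / len(TEST)


estimates = tmc_shapley(len(TRAIN), knn_utility)
for (x, y), v in zip(TRAIN, estimates):
    print(f"point x={x:.2f} label={y} estimated value={v:+.3f}")
```

Because `utility` is passed in as a plain function, the same estimator can be paired with any model, which is the sense in which Shapley-based valuation is model-agnostic. The cost is dominated by the number of `utility` calls, so truncation and the number of permutations are the main levers for trading accuracy against run time.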
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Yilu Yang "Bridging efficacy and efficiency: Innovations in Shapley value estimation for model-agnostic data valuation in machine learning", Proc. SPIE 13077, Fourth International Conference on Signal Processing and Machine Learning (CONF-SPML 2024), 1307708 (1 April 2024); https://doi.org/10.1117/12.3027116
KEYWORDS
Data modeling
Education and training
Machine learning
Data analysis
Performance modeling
Data processing
Matrices