Presentation + Paper
6 June 2024 Machine learning-based real-time task scheduling for Apache Storm
Abstract
Apache Storm is a popular open-source distributed computing platform for real-time big-data processing. However, the existing task scheduling algorithms for Apache Storm do not adequately account for the heterogeneity and dynamics of node computing resources and task demands, leading to high processing latency and suboptimal performance. In this work, we propose a machine learning-based task scheduling scheme tailored for Apache Storm. The scheme uses machine learning models to predict task performance and assigns each task to the computation node with the lowest predicted processing latency. In our design, each node runs a machine learning-based monitoring mechanism. When the master node schedules a new task, it queries the computation nodes, obtains their available resources and processing latency predictions, and uses them to make the optimal assignment decision. We explored three machine learning models: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Deep Belief Networks (DBN). In our experiments, LSTM achieved the most accurate latency predictions. The evaluation results show that Apache Storm with the proposed LSTM-based scheduling scheme significantly reduces task processing delay and improves resource utilization compared to the existing algorithms.
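The assignment step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Apache Storm API: the `NodeReport` fields, the `pick_node` function, and the resource feasibility filter are all hypothetical, and each node's latency prediction is assumed to be already computed by its local model (for example an LSTM) and passed in as a plain number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeReport:
    """Hypothetical reply a computation node sends when the master queries it."""
    node_id: str
    free_cpu: float              # fraction of CPU currently available, 0.0 to 1.0
    free_mem_mb: float           # memory currently available, in MB
    predicted_latency_ms: float  # node-local model's latency estimate for the new task


def pick_node(reports: List[NodeReport],
              cpu_needed: float,
              mem_needed_mb: float) -> Optional[str]:
    """Return the id of the feasible node with the lowest predicted latency.

    The feasibility filter is an assumption of this sketch; the abstract only
    states that the master obtains available resources and latency predictions.
    """
    feasible = [
        r for r in reports
        if r.free_cpu >= cpu_needed and r.free_mem_mb >= mem_needed_mb
    ]
    if not feasible:
        return None  # no node can currently host the task
    return min(feasible, key=lambda r: r.predicted_latency_ms).node_id


# Illustrative values only, not measurements from the paper.
reports = [
    NodeReport("node-a", free_cpu=0.50, free_mem_mb=2048, predicted_latency_ms=12.5),
    NodeReport("node-b", free_cpu=0.10, free_mem_mb=4096, predicted_latency_ms=8.0),
    NodeReport("node-c", free_cpu=0.75, free_mem_mb=1024, predicted_latency_ms=9.7),
]
chosen = pick_node(reports, cpu_needed=0.25, mem_needed_mb=512)
```

In this example `node-b` has the lowest predicted latency but too little free CPU, so the sketch selects `node-c`, the feasible node with the next-lowest prediction.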
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Cheng-Ying Wu, Qi Zhao, Cheng-Yu Cheng, Yuchen Yang, Muhammad A. Qureshi, Hang Liu, and Genshe Chen "Machine learning-based real-time task scheduling for Apache Storm", Proc. SPIE 13062, Sensors and Systems for Space Applications XVII, 130620I (6 June 2024); https://doi.org/10.1117/12.3021842
KEYWORDS
Machine learning
Distributed computing
Performance modeling
Data processing
Systems modeling
Design
Computing systems