Explainable Gradient-Boosting Classifier for SQL Query Performance Anomaly Detection

Authors

  • Srikanth Gorle, Foot Locker, USA
  • Jessy Christadoss, SiriusXM Radio, USA
  • Swaminathan Sethuraman, Visa, USA

Keywords:

SQL performance, execution plans, anomaly detection, XGBoost, SHAP values, join order

Abstract

This article shows how to detect SQL query performance problems by extracting structured features from execution plans and classifying them with XGBoost, a gradient-boosting classifier, whose predictions are made explainable through SHAP values. The proposed technique builds a high-dimensional feature space from execution-plan parameters, including join order, predicate selectivity, estimated cardinality, memory grants, operator parallelism, and cost-based optimiser choices, to capture subtle performance signals. These features train an XGBoost model to distinguish baseline executions from anomalous, underperforming queries. Because ensemble models are otherwise opaque, SHAP values attribute each prediction to specific plan attributes, which helps database administrators locate and fix the root cause. Evaluated on telemetry logs from CVS-scale distributed data infrastructure, the model achieves an F1 score of 0.92 and cuts human triage effort in half. The findings show that explainable machine learning can be integrated into database performance monitoring pipelines.
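The feature-extraction step described above can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' pipeline: the plan format (a nested dict with the hypothetical keys `op`, `est_rows`, `actual_rows`, `dop`, `table_rows`, `memory_grant_kb`, and `children`) and the operator names are assumptions, and real engines expose plans in their own XML or JSON formats that would have to be mapped onto it. Cardinality misestimation is summarised here as the q-error, max(est/actual, actual/est), which is one common way to quantify it; the abstract does not say which measure the authors use.

```python
from typing import Any, Dict, Iterator, Tuple

# Hypothetical plan format (an assumption for illustration): each node is a dict
# with an operator name "op", optimiser-estimated rows "est_rows", observed rows
# "actual_rows", an optional degree of parallelism "dop", an optional base-table
# size "table_rows" on scans, and a list of "children". The root may also carry
# "memory_grant_kb".
JOIN_OPS = {"HashJoin", "MergeJoin", "NestedLoop"}
SCAN_OPS = {"SeqScan", "IndexScan"}


def walk(node: Dict[str, Any], depth: int = 1) -> Iterator[Tuple[Dict[str, Any], int]]:
    """Yield (node, depth) pairs in pre-order; the root has depth 1."""
    yield node, depth
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


def q_error(est: float, actual: float) -> float:
    """Symmetric cardinality-estimation error (always >= 1); rows floored at 1."""
    est, actual = max(est, 1.0), max(actual, 1.0)
    return max(est / actual, actual / est)


def extract_features(plan: Dict[str, Any]) -> Dict[str, float]:
    """Flatten one execution plan into a fixed set of numeric features."""
    nodes = list(walk(plan))
    joins = [n for n, _ in nodes if n["op"] in JOIN_OPS]
    scans = [n for n, _ in nodes if n["op"] in SCAN_OPS]
    # Join order: a plan is left-deep if no join has another join as its
    # second (right/inner) input.
    left_deep = all(
        len(j.get("children", [])) < 2 or j["children"][1]["op"] not in JOIN_OPS
        for j in joins
    )
    # Estimated predicate selectivity per scan: estimated rows / table rows.
    selectivities = [s["est_rows"] / s["table_rows"] for s in scans if s.get("table_rows")]
    return {
        "n_operators": float(len(nodes)),
        "max_depth": float(max(d for _, d in nodes)),
        "n_joins": float(len(joins)),
        # Cost-based optimiser choice: how often nested-loop joins were picked.
        "n_nested_loops": float(sum(j["op"] == "NestedLoop" for j in joins)),
        "left_deep": float(left_deep),
        "min_selectivity": min(selectivities, default=1.0),
        "max_q_error": max(q_error(n["est_rows"], n["actual_rows"]) for n, _ in nodes),
        "max_dop": float(max(n.get("dop", 1) for n, _ in nodes)),
        "memory_grant_kb": float(plan.get("memory_grant_kb", 0)),
    }


# Example plan with a badly underestimated nested-loop branch.
plan = {
    "op": "HashJoin", "est_rows": 500, "actual_rows": 60000, "dop": 4,
    "memory_grant_kb": 2048,
    "children": [
        {"op": "NestedLoop", "est_rows": 100, "actual_rows": 8000, "children": [
            {"op": "SeqScan", "est_rows": 100, "actual_rows": 8000, "table_rows": 100000},
            {"op": "IndexScan", "est_rows": 1, "actual_rows": 1, "table_rows": 50000},
        ]},
        {"op": "SeqScan", "est_rows": 2000, "actual_rows": 2000, "table_rows": 2000, "dop": 4},
    ],
}
features = extract_features(plan)
print(features)
```

Each feature dictionary can be turned into a fixed-order vector, for example `[features[k] for k in sorted(features)]`. Stacking such rows with baseline or anomalous labels would give a training matrix for a classifier such as `xgboost.XGBClassifier`, whose predictions `shap.TreeExplainer` can attribute back to individual plan features. Those two steps are not shown here, and the abstract does not specify the model's hyperparameters or the exact feature set.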

Published

01-01-2025

How to Cite

[1]
Srikanth Gorle, Jessy Christadoss, and Swaminathan Sethuraman, “Explainable Gradient-Boosting Classifier for SQL Query Performance Anomaly Detection,” American J Cognit Comput AI Syst, vol. 9, pp. 54–87, Jan. 2025, Accessed: May 30, 2026. [Online]. Available: https://ajccai.org/index.php/publication/article/view/36