Publications / CSCE/CRC 2025 - Montreal, Canada
Water distribution systems deliver drinking water to households through service lines (SLs). Some of these lines contain lead, which poses significant health risks, especially for children. Because of its durability and corrosion resistance, lead was widely used in residential service lines in the U.S., and many municipalities remain uncertain about the number and locations of their remaining lead service lines (LSLs). With updated health regulations and growing public concern, municipalities must replace LSLs, but high replacement costs, complex tap water testing, and incomplete pipe inventories hinder these efforts. This study addresses these uncertainties by applying a data mining approach to predict LSL locations. A DBSCAN clustering algorithm was used to identify priority areas, and three predictive models were then developed and evaluated: Decision Tree (DT), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). KNN performed strongly, with an F1-score of 99.0% in two of the three scenarios, while DT achieved the best result in one scenario, with an F1-score of 99.8%. Given its consistency and high recall, KNN was integrated into a decision-making tool that helps the municipality of Aurora, Illinois predict LSLs efficiently and accurately. The study provides a scalable framework for municipalities to identify and replace lead pipes, improving public health and resource allocation.
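As a rough illustration of the classification and evaluation steps named in the abstract, the following minimal sketch implements a plain K-Nearest Neighbors vote and an F1-score calculation in standard-library Python. It is not the study's implementation: the feature columns (year built, line diameter), the toy records, the choice of `k=3`, and the helper names `knn_predict` and `f1_score` are all assumptions made for the example, and the DBSCAN prioritization step is omitted.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy records, NOT data from the study:
# (year built, service line diameter in inches); label 1 = lead, 0 = non-lead.
# Features are left unscaled here, so the year dominates the distance.
train_X = [(1925, 0.75), (1938, 0.75), (1942, 1.0), (1978, 1.0), (1995, 1.0), (2004, 1.5)]
train_y = [1, 1, 1, 0, 0, 0]
test_X = [(1930, 0.75), (1990, 1.0), (1950, 1.0), (2010, 1.5)]
test_y = [1, 0, 1, 0]

pred_y = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
score = f1_score(test_y, pred_y)
print(f"predictions={pred_y}, F1={score:.3f}")
```

Because F1 combines precision and recall, it penalizes both missed lead lines and false alarms, which is presumably why it suits an inventory problem where either error is costly. In practice the features would be scaled before computing distances.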