Assessment of compressive strength of eco-concrete reinforced using machine learning tools - Scientific Reports
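R-squared (R²): R-squared measures the proportion of variance in the target variable that is explained by the model. Its maximum value is 1, and higher values indicate a better fit. A value of 0 corresponds to always predicting the mean of the target, and the value becomes negative when the model does worse than that.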
As outlined earlier, a diverse range of machine learning models was implemented using the PyCaret library, and the results are presented in Table 3. The Extra Trees Regressor emerged as the best performer among the algorithms, showing high accuracy and precision as evidenced by the evaluation metrics. For the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and root mean squared logarithmic error (RMSLE), it obtained 0.1899, 0.2410, 0.4909, and 0.0988, respectively. In addition, its coefficient of determination (R²) of 0.9444 indicates the high reliability of the model.
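$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\lvert y_i-\hat{y}_i\rvert,\qquad \mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2,\qquad \mathrm{RMSE}=\sqrt{\mathrm{MSE}}$$
$$\mathrm{RMSLE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log(1+\hat{y}_i)-\log(1+y_i)\right)^2},\qquad R^2=1-\frac{\mathrm{SSR}}{\mathrm{SST}}$$
$$\mathrm{SSR}=\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2,\qquad \mathrm{SST}=\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2$$
where RMSLE is the root mean squared logarithmic error, MSE is the mean squared error, RMSE is the root mean squared error, MAE is the mean absolute error, and $R^2$ is the coefficient of determination; $n$ is the number of samples in the dataset, $y_i$ is the real (true) value of the target variable for the $i$-th sample, $\hat{y}_i$ is the predicted value for the $i$-th sample, $\bar{y}$ is the mean of the true values, $\lvert x\rvert$ denotes the absolute value of $x$, and $\log(\cdot)$ denotes the natural logarithm. SSR is the sum of squared residuals, a measure of the discrepancy between the actual and predicted values, and SST is the total sum of squares, a measure of the total variation in the data.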
The R² value can also be calculated using the following formula:
In both cases, R² equals 1 for a perfect fit and 0 for a model that does no better than predicting the mean; higher values indicate a better fit, and the value can fall below 0 for a model that does worse than the mean.
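The metric definitions above translate directly into code. The following is a minimal, dependency-free Python sketch of the five metrics, assuming the common log(1 + y) form of RMSLE; the function names are illustrative and are not PyCaret's API (PyCaret computes and reports these metrics itself), and the toy values are hypothetical.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |y_i - yhat_i|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average of (y_i - yhat_i)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))


def rmsle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared logarithmic error, assuming the log(1 + y) convention
    (all values must be greater than -1)."""
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred))
        / len(y_true)
    )


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SSR / SST."""
    mean_y = sum(y_true) / len(y_true)
    ssr = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    sst = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    return 1.0 - ssr / sst


# Toy values (hypothetical, not from the study's dataset).
y_true = [2.0, 3.0, 4.0, 5.0]
y_pred = [2.5, 3.0, 3.5, 5.0]
print(mae(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))  # 0.25 0.125 0.9
```

Predicting a constant far from the data, for example `r2(y_true, [10.0] * 4)`, gives a negative R², which is why R² is bounded above by 1 but not below by 0.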
Evaluation metrics in regression models provide different insights into how well the model is performing in terms of predicting continuous numerical outcomes. Each metric captures different aspects of model accuracy and error, and considering multiple metrics is crucial for obtaining a comprehensive understanding of the model's performance.
Using R² (the coefficient of determination) as the primary metric for comparing model performance in predictive modeling tasks has both advantages and limitations. The following discussion critically examines the use of R² and its potential limitations in the context of predictive modeling.
The table reports performance indicators for the different regression models; these criteria are commonly used to assess the performance of regression models. A brief summary of each metric and of the results for the various models follows:
The Mean Absolute Error (MAE) is the average of the absolute differences between the predicted and actual values; lower values indicate better performance. XGBoost performs best (0.5077) and LLAR worst (1.5969).
The Mean Squared Error (MSE) is the average of the squared differences between the actual and predicted values; lower values indicate better performance. The best model is XGBoost (0.6670) and the worst is LLAR (4.0673).
The R-squared (R²) statistic gives the proportion of the dependent variable's variance that can be predicted from the independent variables; higher values indicate a better fit. Lasso Least Angle Regression (LLAR) is the worst model (-0.0851), and the Extra Trees Regressor is the best (0.9444).
The Root Mean Squared Logarithmic Error (RMSLE) measures prediction error on a logarithmic scale; lower values indicate better performance. XGBoost performs best (0.1710) and LLAR worst (0.4849).
The Mean Absolute Percentage Error (MAPE) measures the average absolute percentage difference between the predicted and actual values; lower values indicate better performance. XGBoost is the best model (0.2103) and LLAR the worst (1.1185).
TT (Sec) is the training time of each model in seconds. XGBoost trains fastest (0.1270 s) and linear regression slowest (1.2200 s). Across MAE, MSE, RMSE, RMSLE, MAPE, and R², XGBoost appears to be among the top-performing models. When selecting a model, however, it is crucial to also consider model interpretability, computational resources, and the particular needs of the application.
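Because lower is better for every error metric while higher is better for R², any ranking of models must respect each metric's direction. The minimal sketch below uses only values quoted in this section, plus the XGBoost R² of 0.8191 given in the comparison with prior studies; the dictionary layout and the function name are illustrative assumptions.

```python
# Scores quoted in the text; XGBoost's R2 (0.8191) is the value given in the
# comparison with prior studies. Layout and names are illustrative.
SCORES = {
    "XGBoost": {"MAE": 0.5077, "MSE": 0.6670, "RMSLE": 0.1710, "MAPE": 0.2103, "R2": 0.8191},
    "LLAR": {"MAE": 1.5969, "MSE": 4.0673, "RMSLE": 0.4849, "MAPE": 1.1185, "R2": -0.0851},
}

# Only R2 is "higher is better"; every error metric is "lower is better".
HIGHER_IS_BETTER = {"R2"}


def best_and_worst(scores: dict, metric: str) -> tuple:
    """Return (best_model, worst_model) for one metric, honoring its direction."""
    ranked = sorted(
        scores,
        key=lambda model: scores[model][metric],
        reverse=metric in HIGHER_IS_BETTER,
    )
    return ranked[0], ranked[-1]


for metric in ("MAE", "MSE", "RMSLE", "MAPE", "R2"):
    print(metric, best_and_worst(SCORES, metric))
```

Sorting an error metric in descending order by mistake would list LLAR as the best model on MAE, which is the kind of slip the direction table guards against.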
The Extra Trees Regressor builds an ensemble of decision trees whose split thresholds are drawn at random (optionally from a random subset of candidate features) and averages the trees' predictions. Unlike a random forest, it does not use bootstrap sampling by default: each tree is grown on the full training set, and it is the extra randomization of the split points that reduces variance and overfitting. In predicting compressive strength in clay concrete, its ability to handle nonlinear relationships, its robustness to noisy data, and its efficient training make it a suitable choice. Through ensemble averaging and randomization, the Extra Trees Regressor tends to perform well when dataset sizes are moderate and predictive accuracy and generalization are paramount. It is therefore well suited to predicting compressive strength in clay concrete given the characteristics and requirements of the dataset.
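To make the mechanism concrete, here is a from-scratch, pure-Python sketch of the Extra Trees idea: random split thresholds, the best of a few random candidates kept by variance reduction, no bootstrap, and averaging across trees. It is an illustrative toy, not the scikit-learn `ExtraTreesRegressor` that PyCaret wraps; the class and helper names are hypothetical and the data are synthetic.

```python
import random
from statistics import fmean


def _sse(ys):
    """Sum of squared deviations from the mean (the impurity used to score splits)."""
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys)


def _grow(X, y, idx, rng, max_features, min_samples_split):
    """Recursively grow one extremely randomized tree on the rows listed in idx."""
    ys = [y[i] for i in idx]
    if len(idx) < min_samples_split or len(set(ys)) == 1:
        return fmean(ys)  # leaf: predict the mean target of its samples
    n_features = len(X[0])
    best = None
    for f in rng.sample(range(n_features), min(max_features, n_features)):
        values = [X[i][f] for i in idx]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # feature is constant in this node, cannot split on it
        t = rng.uniform(lo, hi)  # random threshold: no search over cut points
        left = [i for i in idx if X[i][f] < t]
        right = [i for i in idx if X[i][f] >= t]
        if not left or not right:
            continue
        score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
        if best is None or score < best[0]:
            best = (score, f, t, left, right)  # keep the best random candidate
    if best is None:
        return fmean(ys)
    _, f, t, left, right = best
    return (
        f,
        t,
        _grow(X, y, left, rng, max_features, min_samples_split),
        _grow(X, y, right, rng, max_features, min_samples_split),
    )


def _predict_one(node, x):
    """Follow split nodes (tuples) down to a leaf value (float)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] < t else right
    return node


class TinyExtraTrees:
    """Minimal Extra Trees regressor (illustrative only)."""

    def __init__(self, n_estimators=50, max_features=None, min_samples_split=2, seed=0):
        self.n_estimators = n_estimators
        self.max_features = max_features
        self.min_samples_split = min_samples_split
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        k = self.max_features or len(X[0])
        idx = list(range(len(X)))  # no bootstrap: every tree sees every sample
        self.trees_ = [
            _grow(X, y, idx, rng, k, self.min_samples_split)
            for _ in range(self.n_estimators)
        ]
        return self

    def predict(self, X):
        # Ensemble prediction = average of the individual trees' predictions.
        return [fmean(_predict_one(tree, x) for tree in self.trees_) for x in X]


# Synthetic data (hypothetical): two inputs with a simple additive target.
X = [[i / 10, (i * 7 % 11) / 11] for i in range(40)]
y = [3 * a + 0.5 * b for a, b in X]
model = TinyExtraTrees(n_estimators=25, seed=1).fit(X, y)
preds = model.predict(X)
```

Because the trees are grown until their leaves are pure, this toy reproduces the training targets exactly; its generalization comes only from averaging many differently randomized partitions, which is the design choice that distinguishes Extra Trees from bagged trees.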
Because XGBoost and comparably sophisticated models are good at capturing complex relationships in the data, they can overfit when estimating the compressive strength of clay concrete. To make sure that the model generalizes to new data, it is essential to monitor performance metrics on both training and validation sets, use cross-validation, and apply regularization. Careful feature selection, hyperparameter tuning, and thorough validation reduce the risk of overfitting and yield more accurate compressive strength predictions in real-world applications.
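The advice to compare training and validation performance can be illustrated with a small pure-Python sketch of k-fold cross-validation. The 1-nearest-neighbour "model" and the synthetic data are hypothetical stand-ins, chosen because the model memorizes its training data: zero training error next to non-zero validation error is the overfitting signature described above. PyCaret performs k-fold cross-validation itself when comparing models; this only shows the idea.

```python
import random
from statistics import fmean


def kfold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mse(y_true, y_pred):
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Return (mean training MSE, mean validation MSE) over k folds."""
    train_err, val_err = [], []
    for fold in kfold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        train_err.append(mse([y[i] for i in train], predict(model, [X[i] for i in train])))
        val_err.append(mse([y[i] for i in fold], predict(model, [X[i] for i in fold])))
    return fmean(train_err), fmean(val_err)


# Hypothetical memorizing model: 1-nearest neighbour on a single input.
def fit_1nn(X, y):
    return list(zip(X, y))


def predict_1nn(model, X):
    return [min(model, key=lambda pair: abs(pair[0] - x))[1] for x in X]


X = [float(i) for i in range(20)]
y = [0.1 * x * x for x in X]  # synthetic nonlinear target
train_mse, val_mse = cross_validate(fit_1nn, predict_1nn, X, y)
print(train_mse, val_mse)
```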
While complex models like XGBoost offer superior predictive performance, simpler models such as linear regression remain highly interpretable. PyCaret bridges this gap by providing tools and visualizations that enhance the interpretability of complex models like XGBoost through feature importance, partial dependence plots, and SHAP values. This interpretability is crucial in applications such as predicting compressive strength in clay concrete, where insights into the factors influencing predictions drive informed decisions and improvements in construction and materials engineering.
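Partial dependence, one of the interpretability tools mentioned above, is simple to compute: fix one feature at a grid value in every row and average the model's predictions. The sketch below assumes a generic `predict` callable and uses a hypothetical linear toy model in place of a trained XGBoost model; PyCaret's interpretation utilities can produce such plots directly.

```python
from statistics import fmean


def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is set to each grid value in every row."""
    curve = []
    for value in grid:
        modified = [row[:feature] + [value] + row[feature + 1:] for row in X]
        curve.append(fmean(predict(modified)))
    return curve


# Hypothetical toy model standing in for a trained regressor.
def toy_predict(rows):
    return [2.0 * r[0] + r[1] for r in rows]


X = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
print(partial_dependence(toy_predict, X, feature=0, grid=[0.0, 1.0, 2.0]))  # [3.0, 5.0, 7.0]
```

For an additive model the curve recovers the feature's own effect (a slope of 2 here); when features interact, it shows only their average effect.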
Selecting the best model for predicting compressive strength in clay concrete involves navigating trade-offs between accuracy, training time, computational resources, and real-world application requirements. By carefully evaluating these factors and leveraging PyCaret's capabilities for model comparison and optimization, practitioners can identify models that strike an optimal balance between predictive performance and practical constraints.
By using PyCaret's capabilities in model selection, tuning, and ensemble integration, together with careful implementation and validation, practitioners can manage the drawbacks of ensemble methods while harnessing their benefits to significantly improve the prediction of compressive strength in clay concrete.
Understanding the trade-offs between Extra Trees Regressor, XGBoost, and Linear Regression in terms of predictive power, computational efficiency, and ease of implementation is crucial for selecting the most suitable model for predictive modeling tasks in both research and practical applications. By leveraging these insights, researchers and practitioners can make informed decisions to optimize model performance while meeting specific application requirements and constraints.
The results provide valuable insights into the performance of various regression models evaluated using different performance metrics. Here are some insights that can be drawn from the results:
We selected R² as the primary metric in the analysis that follows because it offers a practical and intelligible way to compare the performance of different ML models. R² measures the proportion of the variance in the target that the model explains, so a high value indicates good predictive performance. When the values are less than 0.04 for the compressive strength, the machine learning model fits the data satisfactorily.
The Extra Trees Regressor (ET) outperformed all other models in predicting concrete compressive strength (CCS), achieving the highest coefficient of determination (R² = 0.9444) and the lowest error metrics (RMSE = 0.4909, MAE = 0.1899). While these metrics emphasize its accuracy and reliability, the reasons behind its superior performance warrant a closer examination of its underlying mechanisms.
The compressive strength of concrete depends on highly nonlinear interactions among its constituents, such as cement, clay, sand, and fibers. The ET model excels in capturing these interactions due to its use of randomized decision trees that partition the data space iteratively. This allows the model to uncover subtle patterns and complex relationships that simpler models (e.g., linear regression) or less sophisticated ensemble methods struggle to detect.
The ET model inherently ranks the importance of input features, allowing it to focus on dominant predictors such as cement content and fiber properties. The high correlation observed between fiber content and cement (correlation coefficient = 0.9444) indicates their combined effect on CCS. ET's ability to account for these interactions enables it to provide robust and precise predictions.
ET draws candidate features and split thresholds at random during the construction of decision trees, which reduces the risk of overfitting, a common issue in predictive modeling of small or highly variable datasets. This robustness is critical for modeling CCS, where experimental noise and material variability are common challenges.
By averaging predictions across multiple trees, ET reduces sensitivity to outliers and ensures stability in its predictions. This property is particularly beneficial for datasets with high variability in feature distributions, such as those involving varying proportions of concrete constituents.
The superior performance of ET underscores its ability to serve as a reliable tool for CCS prediction, reducing reliance on resource-intensive laboratory experiments. This capability is particularly valuable for optimizing concrete formulations and quality control in real-world applications.
By emphasizing feature importance and capturing nonlinearity, ET provides insights into the critical factors driving compressive strength. For instance, the model's performance highlights the significant roles of cement content, fiber characteristics, and their interactions, offering a pathway for targeted material optimization.
The results affirm the utility of advanced machine learning methods like ET in addressing complex material science problems. This contributes to a growing body of evidence supporting the integration of AI-driven techniques in predictive modeling for construction materials.
This study not only demonstrates the applicability of ET in CCS prediction but also provides a framework for evaluating and understanding the underlying relationships between input features and material properties. By revealing the strengths of ensemble methods in handling nonlinearity, feature interactions, and variability, it paves the way for more sophisticated approaches in modeling other material properties. Future research could explore integrating domain knowledge into feature engineering or hybrid models to further enhance predictive accuracy and interpretability.
By providing these insights, the findings not only validate the use of the Extra Trees Regressor as a robust predictive tool but also advance our understanding of the factors governing concrete compressive strength, offering potential pathways for optimization in material design and construction.
In summary, the results provide valuable guidance for selecting an appropriate regression model for predicting compressive strength in earthen concrete. While XGBoost stands out as a top-performing model, it's essential to consider various factors and trade-offs to make an informed decision based on the specific requirements and constraints of the problem at hand.
Table 4 shows a dataset with various input variables and output variables to predict compressive strength using machine learning. Here is a breakdown of the columns in our dataset.
Output variable (target, Compressive strength Y1): the compressive strength of earth concrete, which is the variable we want to predict using machine learning. Label (prediction using ML): the compressive strength predicted by the machine learning model; this column contains the model's actual predictions. The dataset contains different samples (for example, samples 177, 69, and 273) with the corresponding values of the input features and the target variable. Using these input features, a machine learning model can be trained on this dataset to predict compressive strength. The "FIBER TYPE" column suggests that the type of fiber used may be an important categorical feature in the predictive model. When working with this dataset, it must be split into a training set and a testing set so that the effectiveness of the machine learning model can be evaluated. Data preprocessing and feature engineering are also required before training the model.
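The preprocessing steps mentioned above, one-hot encoding the categorical "FIBER TYPE" column and holding out a test set, can be sketched in plain Python. The rows below are synthetic placeholders, not samples from Table 4, and all column names other than "FIBER TYPE" and "Compressive strength Y1" are hypothetical. In practice PyCaret's setup step performs categorical encoding and the train/test split itself.

```python
import random

TARGET = "Compressive strength Y1"

# Synthetic placeholder rows (NOT the study's data); numeric column names are hypothetical.
rows = [
    {
        "age_days": 7 * (i % 4 + 1),
        "cement_pct": 10 + i % 6,
        "FIBER TYPE": "natural" if i % 2 else "artificial",
        TARGET: 1.0 + 0.1 * i,
    }
    for i in range(10)
]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}={c}"] = 1.0 if r[column] == c else 0.0
        encoded.append(new)
    return encoded


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(one_hot(rows, "FIBER TYPE"))
print(len(train), len(test))
```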
The following outcomes are displayed in the tables as performance measures for the Extra Trees Regressor regression model:
RMSE (root mean squared error): 0.4909; R² (R-squared): 0.9444; MAE (mean absolute error): 0.1899; MSE (mean squared error): 0.2410; and RMSLE (root mean squared logarithmic error): 0.0988. These criteria are commonly used to assess regression models, and the model appears to have performed well in this instance: the low MAE and MSE values indicate that its predictions are, on average, close to the actual values.
The RMSE provides a measure of prediction error, and a value of 0.4909 suggests relatively small errors. The low RMSLE indicates that the predictions are also accurate on a logarithmic scale; this value is lower than or similar to those reported in previous studies. The high R² of 0.9444 indicates that the model explains a large portion of the variance in the target variable and fits the data well. Overall, these results suggest that the Extra Trees Regressor performed well in this context (see Table 5). However, the specific problem and domain must be considered when interpreting these results, as different applications may have different requirements for model performance.
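As a quick consistency check on the reported Extra Trees figures (a worked calculation added for illustration), RMSE is by definition the square root of MSE, and RMSE can never be smaller than MAE:

```latex
\mathrm{RMSE}=\sqrt{\mathrm{MSE}}=\sqrt{0.2410}\approx 0.4909,
\qquad
\mathrm{MAE}=0.1899\le\mathrm{RMSE}=0.4909,
\qquad
\frac{\mathrm{RMSE}}{\mathrm{MAE}}=\frac{0.4909}{0.1899}\approx 2.59
```

RMSE equals MAE only when all absolute errors are the same size, so a ratio well above 1 hints that a limited number of comparatively large errors contribute disproportionately to the RMSE.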
The results in Table 5 provide a comprehensive evaluation of the effectiveness of the Extra Trees Regressor approach in estimating compressive strength. They offer insights into the model's predictive accuracy, variability, and overall fit to the data, helping stakeholders make informed decisions about its suitability for the task at hand.
The results obtained in this study are comparable to those of prior studies. Situating them within the broader context of research on concrete compressive strength (CCS) prediction clarifies the novel contributions and improvements this work offers, so a comparison with the existing literature follows.
Previous studies have consistently demonstrated the superior performance of ensemble methods like Random Forest (RF) and Gradient Boosting Regressor (GBR) for CCS prediction. For instance, reported an R² of 0.82 using RF, which aligns with the performance of the RF model in this study (R² = 0.7780). Similarly, the performance of Gradient Boosting (R² = 0.7901) is consistent with values reported in comparable works, reinforcing the reliability of ensemble methods.
Studies that employed nonlinear models such as XGBoost and Support Vector Machines (SVM) have reported robust performance metrics, particularly in datasets with complex interactions among features. This aligns with the findings in this study, where XGBoost achieved a strong R² of 0.8191, comparable to results reported by.
This study identifies the Extra Trees Regressor (ET) as the best-performing model, achieving an R² of 0.9444. This significantly surpasses the performance metrics reported for ensemble models in prior research. The novelty lies in demonstrating ET's capability to not only match but exceed the predictive accuracy of more commonly used ensemble methods like RF and GBR. This is a notable contribution, as ET's potential in this domain has been underexplored.
While earlier works have highlighted the importance of individual features (e.g., cement content or water-to-cement ratio) in CCS prediction, this study emphasizes the interactions between multiple factors, such as fiber content and cement proportion. The high correlation coefficient (0.9444) between these features, combined with ET's ability to capture their nonlinear interplay, offers deeper insights into the determinants of CCS.
The integration of PyCaret to streamline and optimize the model selection and evaluation process is a novel approach in this context. While prior studies often rely on manually tuned models, this study demonstrates the advantages of leveraging automated tools to enhance efficiency and reproducibility.
The R² of 0.9444 achieved by ET in this study is among the highest reported in the literature for CCS prediction, indicating a significant improvement in model precision and reliability.
The use of automated hyperparameter tuning via PyCaret and genetic algorithms (GA) reduces the time and effort required for model development compared to traditional manual approaches.
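Genetic-algorithm tuning can be sketched in a few lines: keep the fitter half of a population of hyperparameter vectors, recombine and mutate them, and repeat. This is a generic illustration, not the study's actual tuning code. The objective is a hypothetical smooth stand-in for a cross-validated R², and integer-valued hyperparameters (such as the number of trees) would need rounding.

```python
import random


def genetic_search(objective, bounds, pop_size=20, generations=40,
                   mutation_scale=0.1, seed=0):
    """Maximize `objective` over the box `bounds` = [(lo, hi), ...] with a simple GA."""
    rng = random.Random(seed)

    def clip(value, lo, hi):
        return max(lo, min(hi, value))

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        parents = sorted(population, key=objective, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [rng.choice(genes) for genes in zip(a, b)]
            # Gaussian mutation scaled to each parameter's range, then clipped.
            child = [
                clip(g + rng.gauss(0.0, mutation_scale * (hi - lo)), lo, hi)
                for g, (lo, hi) in zip(child, bounds)
            ]
            children.append(child)
        population = parents + children
    return max(population, key=objective)


def toy_objective(params):
    """Hypothetical stand-in for cross-validated R2; peaks at (0.3, 7.0)."""
    x, depth = params
    return -((x - 0.3) ** 2) - ((depth - 7.0) / 10.0) ** 2


BOUNDS = [(0.0, 1.0), (1.0, 20.0)]
best = genetic_search(toy_objective, BOUNDS)
print(best)
```

Keeping the parents unchanged (elitism) guarantees that the best score never decreases between generations, which matters when each objective evaluation is an expensive cross-validation run.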
By comparing multiple algorithms and emphasizing feature importance, this study provides a more comprehensive framework for CCS prediction, applicable to various concrete formulations and experimental conditions.
By situating these findings alongside existing research, this study highlights its contributions to the growing field of machine learning applications in civil engineering. The demonstration of ET's superior performance, coupled with the methodological innovations in AutoML, underscores the potential for further advancements in predictive modeling. Such a detailed comparison not only validates the findings but also positions this work as a valuable reference for future research on CCS prediction.
Feature importance is a key component of developing predictive models such as those that forecast the compressive strength of concrete. Knowing which features (also referred to as variables or predictors) have the greatest influence on the target variable, here compressive strength, aids feature selection and makes the model easier to interpret. The following stages outline how to assess the importance of each feature in order to forecast the compressive strength of compressed earth blocks (Fig. 5).
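One common, model-agnostic way to quantify feature importance is permutation importance: shuffle one input column and measure how much the model's score drops. The sketch below is a plain-Python illustration of that technique, which is not necessarily the procedure summarized in Fig. 5. The toy model and data are hypothetical, and for tree ensembles such as Extra Trees, impurity-based importances are another common choice.

```python
import random
from statistics import fmean


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SSR/SST."""
    mean_y = fmean(y_true)
    ssr = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ssr / sst


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R2 when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = r2(y, predict(X))
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)  # break the link between feature f and the target
            shuffled = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2(y, predict(shuffled)))
        importances.append(fmean(drops))
    return importances


# Hypothetical toy model that ignores feature 1 entirely.
def toy_predict(rows):
    return [3.0 * r[0] for r in rows]


X = [[float(i), float(i * 5 % 7)] for i in range(20)]
y = [3.0 * r[0] for r in X]
print(permutation_importance(toy_predict, X, y))
```

A feature the model never uses gets an importance of exactly zero, while shuffling a feature the model relies on produces a large drop in R².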
Figure 5 illustrates the process of determining feature importance for predicting the compressive strength of compressed earth blocks. Each step depicted in Fig. 5 is elaborated below:
A number of variables, such as the production process and the composition of the earth mixture, affect the compressive strength of compressed earth blocks (CEBs); in particular, the materials used and their ratios can greatly influence it. The following points are important to keep in mind when evaluating how the various materials affect the compressive strength of CEBs. Histograms of each variable's effect on compressive strength are shown in Fig. 6.
The insights gained from Fig. 6 can guide construction practices toward producing high-quality, durable, and cost-effective compressed earth blocks suitable for a wide range of construction applications.
The heatmap shows the correlation coefficients between all pairs of variables in the dataset as a color scale: the darker the color, the stronger the association, and the lighter the color, the weaker the association. The diagonal of the heatmap shows the correlation of each variable with itself, which is always 1.
The key points of the heatmap are:
An effective tool for comprehending the intricate correlations between the various variables used to estimate compressive strength is the correlation heatmap. Through the application of these insights, practitioners can enhance the precision and dependability of predictive models for compressive strength estimation by making well-informed decisions during the model building, feature selection, and data analysis phases.
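The coefficients in such a heatmap are typically Pearson correlations. A minimal plain-Python sketch of the computation follows; the column names echo variables discussed in this study, but the numbers are invented for illustration only. Pearson coefficients range from -1 to 1, so a strong negative association is as informative as a strong positive one.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def correlation_matrix(columns):
    """All pairwise correlations, keyed by (name, name); the diagonal is always 1."""
    names = list(columns)
    return {(p, q): pearson(columns[p], columns[q]) for p in names for q in names}


# Invented illustrative values, not the study's data.
columns = {
    "cement": [5.0, 10.0, 15.0, 20.0],
    "strength": [1.0, 2.0, 3.0, 4.0],
    "silt": [20.0, 15.0, 10.0, 5.0],
}
corr = correlation_matrix(columns)
print(round(corr[("cement", "strength")], 6), round(corr[("cement", "silt")], 6))
```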
With this study, we can arrive at the following most important and striking results:
Cement plays a very important role in increasing the compressive strength of CEBs, and for the best result the cement content should be limited to 10 to 15% of the CEB mass. Fibers also play a very important role in the cohesion of CEBs and in increasing their compressive strength. Artificial fibers perform better than natural fibers because they are durable and last for decades, whereas natural fibers have a limited lifespan; however, natural fibers remain the most widely used because they are readily available and cheaper. The best fiber content for reinforcement lies between 1 and 2%; above this percentage, fibers can play a more negative than positive role. The soil components should be in the following proportions to obtain an ideal earth concrete: clay 5 to 25%, sand 50 to 75%, and silt 5 to 15%. These data can be represented as a pyramid ordered by priority and by the magnitude of each component's effect on the earth concrete, with the most important effect at the top. The results are summarized in Fig. 8.
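The recommended ranges above can be encoded as a simple screening check for candidate mixes. This is an illustrative sketch: the function name is hypothetical, and it assumes each percentage is expressed on the basis the text gives (cement as a share of block mass, clay, sand, and silt as soil proportions), so no check on the total is attempted.

```python
# Ranges (percent) as recommended in the text: cement 10-15 % of CEB mass,
# fiber 1-2 %, clay 5-25 %, sand 50-75 %, silt 5-15 %.
RECOMMENDED_RANGES = {
    "cement": (10.0, 15.0),
    "fiber": (1.0, 2.0),
    "clay": (5.0, 25.0),
    "sand": (50.0, 75.0),
    "silt": (5.0, 15.0),
}


def check_mix(mix):
    """Return {component: problem} for components missing or outside their range."""
    problems = {}
    for name, (lo, hi) in RECOMMENDED_RANGES.items():
        if name not in mix:
            problems[name] = "missing"
        elif not lo <= mix[name] <= hi:
            problems[name] = f"{mix[name]} % is outside {lo}-{hi} %"
    return problems


# Hypothetical candidate mix.
candidate = {"cement": 12.0, "fiber": 1.5, "clay": 15.0, "sand": 62.0, "silt": 9.5}
print(check_mix(candidate))  # {}
print(check_mix({**candidate, "cement": 18.0, "fiber": 3.0}))
```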
The priority pyramid depicted in Fig. 8 offers a structured framework for deciding on the ideal ratios of components in earth concrete to achieve optimal compressive strength. The pyramid aids this process as follows:
Overall, the priority pyramid in Fig. 8 serves as a valuable tool for guiding decisions regarding the ideal ratios of components in earth concrete for optimal compressive strength. It helps stakeholders focus their efforts on the most influential factors while providing flexibility for adaptation and customization to meet project-specific requirements.
The optimal percentage of each material shown in Fig. 8 was obtained through artificial intelligence (AI) algorithms, by analyzing the tables, figures, and data produced by the model used in this study.
Since the machine learning model showed great potential for determining compressive strength without requiring laboratory experiments, we conclude that artificial intelligence has become indispensable in civil engineering, particularly for determining the compressive strength of concrete. This conclusion is supported both by previous research and by the present study. Notwithstanding these benefits, there are still gaps in the laws and policies governing the application of AI in civil engineering.
Selecting appropriate input variables is crucial for predictive modeling of compressive strength in clay concrete (or any material). Each variable chosen should contribute meaningfully to the model's ability to accurately predict compressive strength. Here's how the chosen variables -- age, fiber percentage, fiber length, sand, silt, cement, and fiber tensile strength -- contribute to the accuracy and reliability of the PyCaret model:
The development of precise and dependable models for predicting compressive strength in clay concrete depends critically on the appropriate selection of input variables. The selected variables include age, fiber properties, sand, silt, and cement content, each of which plays an important role in the mechanical and structural behavior of concrete. PyCaret's ability to incorporate these variables and assess their influence on model predictions helps ensure that the resulting models are not only accurate but also offer practical insights into optimizing concrete mix designs for particular strength requirements. Thorough evaluation of these factors therefore improves both the accuracy and the dependability of the PyCaret models for compressive strength in clay concrete applications.
Each technique (PyCaret's interactive explanation tool, SHAP, and LIME) has its own strengths and suitability depending on the interpretability needs of the application. PyCaret's tool is convenient for users within its ecosystem, while SHAP and LIME offer broader model-agnostic capabilities for detailed interpretability needs in various contexts. The choice often depends on the specific goals of interpretability and the complexity of the model and data being analyzed.
Enhanced Compressive Strength: By identifying the parameters that most affect compressive strength, such as cement content, fiber reinforcement, and material proportions, the study offers useful recommendations for improving the structural performance of earth concrete. Optimizing these variables within their recommended ranges can raise compressive strength and thereby increase the durability and load-bearing capacity of earth-based constructions.
Although using a small dataset to predict the compressive strength of clay concrete has drawbacks, such as limited data representation and the possibility of overfitting, these drawbacks can be mitigated by carefully applying regularization strategies, ensemble methods, data augmentation, and robust validation techniques. These techniques can yield machine learning models with good generalization, prediction accuracy, and dependability while requiring relatively little training data. Nevertheless, it is imperative to acknowledge the underlying limitations of the dataset and to evaluate the model's predictions in light of these constraints.
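Regularization, one of the mitigation strategies listed above, can be seen in its simplest form in one-dimensional ridge regression, where a penalty λ shrinks the fitted slope toward zero. The sketch below (closed form on centred data, with an unpenalized intercept) is a generic illustration with made-up numbers, not a model from this study.

```python
from statistics import fmean


def ridge_1d(x, y, lam):
    """Ridge fit y ~ a + b*x with penalty lam on the slope b only.

    On centred data the slope is b = sum(xc * yc) / (sum(xc**2) + lam).
    """
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / (sxx + lam)
    intercept = my - slope * mx  # the intercept is not penalized
    return intercept, slope


# Made-up data lying exactly on y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
for lam in (0.0, 5.0, 50.0):
    print(lam, ridge_1d(x, y, lam))
```

With λ = 0 the fit is ordinary least squares (slope 2); increasing λ trades a little bias for lower variance, which is what helps on small datasets.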
Feature engineering plays a pivotal role in enhancing the performance of machine learning models by transforming raw data into more informative features that better capture the underlying relationships and patterns in the data. In the context of predicting compressive strength in clay concrete using PyCaret models,
Cost-Effectiveness: By striking a balance between performance needs and cost considerations, the study's recommendations for ideal material compositions and proportions can aid in the optimization of construction processes. For instance, construction projects can reach required strength levels while minimizing material costs by defining the appropriate ranges for the quantities of clay, sand, and silt.
Sustainability: The study's recommended ideal material compositions and proportions can help promote sustainable building techniques. It is possible to make earth concrete building more resource- and environmentally-friendly by using locally accessible resources to the fullest extent possible and reducing the usage of cement and other large-impact products.
Standardization and Quality Control: The results of the study offer a foundation for uniform material requirements and quality assurance procedures in the building of earth concrete structures. Earth-based structure manufacturing can be made consistent and reliable by setting explicit rules for material selection and proportioning in construction methods.
Making Well-Informed Decisions: The study provides construction professionals with important knowledge on the performance traits of various material compositions and proportions. Better results are achieved overall when this knowledge is used to inform decisions made during the design, building, and maintenance stages of earth concrete structures.
All things considered, the study's conclusions provide useful recommendations for enhancing building procedures in earth concrete applications. Construction projects can achieve enhanced performance, cost-effectiveness, sustainability, and risk mitigation in the building of earth-based structures by putting into practice the suggested material compositions and proportions.
Overall, the study's conclusions show how machine learning, more specifically the Extra Trees Regressor approach, can improve building materials by enabling precise forecasting, optimization of material composition, efficient design iterations, identification of crucial variables, and generalizable insights. By using machine learning, construction professionals may create more durable, affordable, and readily available building materials that meet the changing demands of the built environment.
Predictive modeling of compressive strength in clay concrete offers substantial benefits to civil engineering and construction by optimizing mix designs, improving structural integrity, and advancing construction practices. By leveraging these insights, stakeholders can achieve sustainable, resilient infrastructure that meets both current and future demands effectively.