1 INTRODUCTION
The tunnel boring machine (TBM) is widely used in tunnel excavation, especially for long and deep tunnels, due to its high efficiency and safety. To further improve safety and efficiency, it is important to develop rapid and accurate methods of acquiring rock mass parameters. However, owing to the restrictions posed by the complex machine structure and the narrow space inside a TBM, in-field rock mass parameter testing methods that work on the surface are difficult to apply in this environment. Although many researchers have studied rock mass parameter testing methods applicable to the TBM environment, such as Poisel et al. (2010), Naeimipour et al. (2016), Wang, Gao et al. (2020), Goh et al. (2011), Kong and Shang (2018), Liu et al. (2018), Xu et al. (2023), Lussu et al. (2019), and Cordes et al. (2019), most of these methods focus on the testing or analysis process, which makes it difficult to acquire data at the speed of TBM excavation.
To solve this problem, mappings are built between real-time TBM tunneling data and rock mass parameters with machine learning. For example, Fattahi and Babanouri (2017) compared TBM performance prediction models established by data mining algorithms such as the gravitational search algorithm, the differential evolution algorithm (DE), artificial bee colony, and support vector regression (SVR), and found that the combination of SVR and DE shows higher prediction accuracy. Salimi et al. (2018) used the classification and regression tree algorithm (CART) to build a prediction model of the specific rock mass boreability index (SRMBI), and compared this method with multivariable regression analysis. Their study proved that CART has significant advantages in predicting SRMBI. Li, Hu et al. (2022) and Li, Zhang et al. (2022) studied the influence of the tunneling parameters of a single cutter, such as penetration and cutter spacing, on rock-cutting efficiency through full-scale linear cutting tests and numerical simulation. In addition, Armaghani et al. (2017), Zare and Naghadehi (2017), Zare et al. (2018), Mahdevari et al. (2014), Liu et al. (2019, 2020), Yagiz and Karahan (2011), and Minh et al. (2017) applied machine learning algorithms such as artificial neural networks, particle swarm optimization, fuzzy logic, and gene expression programming to rock-machine mapping, which yielded good results.
Previous studies generally used multiple field-measured data as training data. The prediction or evaluation targets are taken as output and known data as input. With regression or data mining, mappings between the input and output are established as the basis for evaluation. By including newly collected data in the mappings, the outputs, namely, the evaluated results of the targets, can be calculated.
These research results have been widely used to predict rock mass parameters and have yielded acceptable accuracy. However, problems remain in constructing mappings between rock mass parameters and TBM driving data. A TBM is always exposed to complex rock mass conditions, which are reflected in the field-collected data. However, the relationships between tunneling data and rock mass parameters vary with rock conditions. In this case, a consensus has been reached that a single mapping, or a limited number of mappings, may not be suitable for different rock mass conditions. Hence, some researchers have categorized known rock conditions and established mappings accordingly. For example, Gong et al. (2007, 2009, 2020) established multiple mappings between TBM penetration and the brittleness index under different joint orientations, volumetric joint counts, and uniaxial compressive strengths. On this basis, they further modified the China hydropower classification (HC) method and obtained good evaluation results. Xue et al. (2018) proposed a dynamic rock mass classification method according to uniaxial compressive strength, intactness index, groundwater state, and initial geo-stress state, which was verified on the YHJW project. As mentioned above, field sample classification is a feasible supplementary method for improving the applicability of a single regression model under multiple complex rock conditions. In the geotechnical field, researchers have attempted to evaluate rock mass parameters by clustering methods and have proved their feasibility. This kind of research can be classified into the following three groups according to the research targets.
The first group of studies classified rock conditions directly into groups by clustering. Kitzig et al. (2017) tested the petrophysical and geochemical properties of multiple rock samples as clustering criteria, and proposed a rock classification method based on fuzzy C-means (FCM). Saeidi et al. (2014) proposed an adaptive neuro-fuzzy inference system (ANFIS) based on fuzzy C-means to evaluate the rock mass diggability index; compared with tested values, the ANFIS showed higher accuracy than traditional regression. Rad and Jalali (2019) modified the rock mass rating (RMR), a rock classification system, using a fuzzy clustering algorithm, which was validated on the Sangan Iron Ore project.
The second group of studies recognized discontinuities or structural planes on the rock surface. Liang et al. (2012) used K-means clustering to evaluate the attitude elements of rock mass discontinuities, and obtained good accuracy. Cui and Yan (2020) proposed an improved clustering method based on differential evolution, which performed well in identifying rock discontinuities. Wang, Zheng et al. (2020), Li et al. (2021, 2014), and Gao et al. (2018) achieved good performance in rock discontinuity and structural plane recognition using methods such as multidimensional clustering, fuzzy spectral clustering, ant colony ATTA clustering, and clustering by fast search and find of density peaks (CFSFDP).
The third group of studies, the most relevant to this research among the three, focused on predicting the mechanical parameters of rock mass. Majdi and Beiki (2019) established a data set of 205 field samples. Based on differences in the elasticity modulus, rock quality designation, and the geological strength index, a deformation modulus prediction model was well established. On the basis of the above research, Fattahi (2016b) used RMR, uniaxial compressive strength (UCS), buried depth, and elasticity modulus as inputs for predicting the deformation modulus. Moreover, Bashari et al. (2011) included joint frequency (m−1), porosity, and density as inputs.
All of the above research studies obtained acceptable results and therefore provide good references. However, most of them classify rock conditions or predict related parameters from several known rock mass parameters. This is almost impossible in TBM tunneling due to the difficulty of acquiring real-time rock mass parameters, which is one of the reasons why clustering methods are rarely used in TBM tunneling to evaluate rock mass parameters. This paper makes full use of TBM tunneling data and proposes a grouped prediction method of rock mass parameters based on rock-TBM mappings and clustering. In brief, field rock-TBM data are grouped into multiple clusters using a clustering method. With the samples in different clusters, multiple submodels are obtained, and each submodel shows a better ability to predict the rock mass parameters of its own cluster. By weighting the prediction results of the multiple submodels, the prediction accuracy can be improved. In detail, the proposed method requires a series of field test data, usually dozens to hundreds of samples, each containing target rock mass parameters, such as uniaxial compressive strength and joint frequency, together with the corresponding TBM tunneling data collected at the same mileage as the rock mass data. After data preparation and pretreatment, fuzzy C-means clustering is used to group field-measured samples with similar tunneling data. After clustering, a membership degree matrix is obtained, which provides the membership degree of each sample to each cluster. Samples in the same cluster have similar tunneling data distributions, and the geological conditions they are exposed to are correspondingly similar. On this basis, each submodel is trained by the samples weighted by their membership degree to a certain cluster, so multiple submodels are trained in a targeted manner, with TBM tunneling data as input and rock mass data as output.
For test samples or newly encountered conditions, the field tunneling data can be used to calculate the membership degree of the sample to each cluster, which determines the weights of the results predicted by the submodels. The weighted prediction results always show higher accuracy than a rock-TBM mapping established by a pure machine learning method. In this study, 100 training samples and 30 test samples are collected from the C1 part of the Pearl Delta water resources allocation project. Fuzzy C-means clustering is combined with the BP neural network (BPNN), SVR, and random forest (RF), and the corresponding submodels and weighted models are trained on the 100 training samples. By comparing the accuracies on the 30 test samples predicted, respectively, by pure machine learning models and by models combined with fuzzy C-means clustering, the accuracy improvement effect of fuzzy C-means clustering on multiple machine learning methods is verified. In particular, the main novelty of this paper is the use of clustering for grouping samples and improving the prediction accuracy through weighted submodels, instead of improving the performance of the machine learning method itself. The proposed method is a supplementary method for machine learning, and it can be used in conjunction with various existing machine learning methods.
The paper is organized as follows. The 2nd section introduces the proposed method, combining fuzzy C-means clustering with machine learning methods for improving the prediction accuracy of rock mass parameters. In the 3rd section, the collected field data used for training and testing the proposed method and their pretreatment process are introduced. In the 4th section, the BP neural network combined with fuzzy C-means clustering is used as an example, and the corresponding models are built and tested on the field data. The 5th section discusses the application of the proposed method to different machine learning methods and hard clustering methods. The 6th section presents the conclusion.
3 CASE STUDY
3.1 Project overview
This research is based on the C1 part of the Pearl Delta water resources allocation project. The area is located in Dongguan, Guangdong Province, China, extending from the Shaxi reservoir to the SL02# work well in Yangwu county. The tunnel runs from west to east. The 2# main cave was excavated by TBM, with a total length of 9.75 km and a diameter of 8.2 m, from mileage SL14+958 to SL5+213. Low mountains and hills are the main landforms along the tunnel, which passes under multiple reservoirs and highways. The depth of the tunnel ranges from 50 to 270 m. The tunnel slope ranges from 20° to 30°, as this area is much higher in the east. According to the hydropower classification (HC) method, the surrounding rocks along the tunnel mainly consist of class II and III rocks. The geological profile of the area under study is shown in Figure 2.
Geological profile of the area under study.
The tunnel passes through strata with multiple lithologies, and the rock conditions of different areas vary sharply. Along the tunnel, 130 samples were collected in different strata, and the uniaxial compressive strength, joint frequency, muck size, and corresponding TBM operating parameters of each sample were measured or tested in the field. Among the 130 samples, 100 were collected from mileage SL 10+560 to SL 8+780 and used as the training set for the proposed method. To verify the method and its results, the other 30 samples, collected from mileage SL 8+240 to SL 7+810, compose the testing set.
In this paper, rock mass parameters including UCS and joint frequency (Jf) are used as the prediction targets. The two rock mass parameters were collected by field tests. UCS ranged from 5.2 to 135.6 MPa, and Jf ranged from 1.21 to 4.22 m−1. Owing to the complex rock mass conditions, the corresponding TBM tunneling parameters showed sharp fluctuations; these were recorded and used as the input of the prediction model. The TBM tunneling data were recorded from July 20, 2021 to January 10, 2022 at a frequency of 1 Hz, covering 276 features; more than 15.2 × 10^6 records were collected. The basic information of the rock mass and the main TBM tunneling parameters is listed in Table 1.
Table 1. Basic information of the rock mass and the main tunnel boring machine (TBM) tunneling parameters.

| Type | Parameter | Maximum | Minimum | Average | Standard deviation |
| --- | --- | --- | --- | --- | --- |
| Rock mass parameters | Uniaxial compressive strength (MPa) | 135.6 | 5.2 | 60.8 | 29.2 |
| | Joint frequency (m−1) | 4.22 | 1.21 | 2.71 | 0.89 |
| Main TBM tunneling data | Thrust (kN) | 10 214.3 | 2561.5 | 5435.2 | 1532.1 |
| | Torque (kN · m) | 1824.0 | 199.3 | 840.4 | 332.2 |
| | Penetration rate (mm/min) | 82.6 | 8.1 | 46.7 | 17.6 |
| | Revolutions per minute (r/min) | 5.80 | 4.00 | 5.10 | 0.39 |
3.2 Pretreatment of the TBM tunneling data
Compared with the field rock mass parameters taken as the output of the prediction model, which comprise only two values per sample, the TBM tunneling data taken as input are more complex. The complexity of the TBM tunneling data arises from their two dimensions. The first can be called the time dimension: the TBM driving data are time-series data collected at a frequency of 1 Hz, so each sample always contains more than 1000 series of tunneling data. The other can be called the feature dimension: the tunneling data comprise 276 features, which is too many for data mining. Considering this complexity, the field tunneling data should be pretreated and simplified. Essentially, the pretreatment process is dimensionality reduction in the time and feature dimensions of the TBM tunneling data.
First, in the time dimension, the pretreated tunneling data should contain only one time node. In addition, they should satisfy three requirements: (1) the retained TBM tunneling data should be collected from, or near, the same area as the rock mass data; (2) the retained data corresponding to each rock mass sample should be collected within a limited range, which helps avoid sharp changes in rock condition within that range; and (3) the retained data should be collected during TBM excavation, not during TBM stoppage.
According to the above requirements, this research collected multiple samples of TBM tunneling data, each at the same location as the corresponding rock mass sample. In this way, a long continuous area in which the TBM tunneling data were collected was divided into 100 parts with a length of 1 m each, and the rock mass parameters were tested and collected at the center of each part. This satisfies the first two requirements, but each part still includes both excavation and stoppage data, so the third requirement remains to be met. Accordingly, the penetration rate, that is, the driving speed of the TBM, was used as the criterion for judging TBM excavation or stoppage. If the penetration rate of a series of TBM tunneling data is lower than 10 mm/min, the data are regarded as having been collected during TBM stoppage or trial driving, and are therefore discarded. By contrast, data with penetration rates of 10 mm/min or higher are retained. Figure 3 shows an example of data reduction and retention.
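The stoppage-filtering rule above can be sketched as a boolean mask over the per-second records; the 10 mm/min threshold comes from the text, while the function and variable names are illustrative assumptions:

```python
import numpy as np

def retain_excavation_records(records_pr, threshold=10.0):
    """Keep only records collected during excavation.

    records_pr: 1-D sequence of per-second penetration rates (mm/min).
    Returns a boolean mask: True = retained (excavation),
    False = reduced (stoppage or trial driving).
    """
    records_pr = np.asarray(records_pr, dtype=float)
    return records_pr >= threshold

# Example: a short run of per-second penetration rates
pr = [0.0, 3.2, 12.5, 48.1, 46.9, 7.8, 0.0]
mask = retain_excavation_records(pr)
print(mask.sum())  # 3 retained seconds
```

In practice the mask would be applied to all 276 feature columns of the same seconds, so excavation and stoppage records are separated consistently across features.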
Diagram of the data reduction and retention method.
After redundant tunneling data reduction, rock mass data and valid TBM tunneling data collected in the matched 100 areas are obtained. However, there is only one series of rock mass data in each 1-m part, while there are usually more than 1000 series of valid tunneling data, because they are sequential with a frequency of 1 Hz, and tunneling a 1-m-long area always takes more than 1000 s. In the proposed method, each tunneling parameter in a 1-m-long part is used as a feature of the sample, which should be a concrete value instead of sequential data including more than one value. Therefore, the average value is used to represent the sequential tunneling data of each part, that is,

x̄_ij = (1/n) Σ_{k=1}^{n} x_ijk,  (7)

where x̄_ij is the handled result of the jth feature of the ith sample; x_ijk represents the value of x_ij measured at the kth second; and n is the total number of seconds spent in the ith part.
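As an illustrative sketch of this per-part averaging (the reconstructed Equation 7), assuming the valid records of one 1-m part are stored as an array of n seconds by m features:

```python
import numpy as np

def average_part_features(valid_records):
    """Collapse the n-second time series of one 1-m part into a single
    feature vector by averaging each column (Equation 7)."""
    valid_records = np.asarray(valid_records, dtype=float)
    return valid_records.mean(axis=0)  # one averaged value per feature

# Example: 4 seconds of records with 2 features (e.g., thrust, torque)
part = [[5000.0, 800.0],
        [5200.0, 820.0],
        [5100.0, 810.0],
        [5300.0, 830.0]]
print(average_part_features(part))  # [5150.  815.]
```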
To reduce the feature dimension, two steps are conducted. First, normalization is used to transform the different features of the in-field collected rock mass and TBM tunneling parameters to a similar range so as to remove their dimensions. However, there are large differences between the distributions of the 276 features. For example, data directly related to tunneling, such as thrust and torque, usually take large positive values of inconsistent magnitude. Data used to control the tunneling direction, such as horizontal and vertical angles, take both positive and negative values, representing the different directions. Some data, such as the motor current of the cutterhead, remain stable and close to a certain value with small fluctuations.
Obviously, a simple normalization method might not be suitable for such complex and varying data. Therefore, a two-step method is proposed. The magnitudes and dimensions of all TBM tunneling data are first removed by zero-mean normalization, after which all features share the same range. Logistic normalization is then used to balance the fluctuation differences between features. Zero-mean and logistic normalization can be expressed by the below equations:

x′_i = (x_i − µ_i)/σ_i,  (8)

x″_i = 1/(1 + e^(−x′_i)),  (9)

where x_i, x′_i, and x″_i represent the variable of the ith sample before normalization, after zero-mean normalization, and after logistic normalization, respectively; µ_i and σ_i denote the average value and standard deviation of the ith sample, respectively.
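A minimal sketch of the two-step normalization, assuming the standard zero-mean and logistic forms (the paper's exact equations are reconstructed, so the logistic form is an assumption):

```python
import numpy as np

def two_step_normalize(x):
    """Two-step normalization sketch:
    1) zero-mean normalization (Equation 8) removes magnitude/dimension;
    2) logistic squashing (Equation 9) balances fluctuation differences.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()      # zero-mean normalization
    return 1.0 / (1.0 + np.exp(-z))   # logistic normalization -> (0, 1)

# Example: min/average/max thrust values from Table 1 (kN)
thrust = [2561.5, 5435.2, 10214.3]
out = two_step_normalize(thrust)
print(out)  # three values squashed into the open interval (0, 1)
```

Because the logistic function is monotonic, the ordering of the raw values is preserved while every feature ends up on the same bounded scale.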
After normalization, principal component analysis (PCA) is used to reduce the total of 276 tunneling data features. The basic principle of PCA is to reorganize original variables to form new and independent variables through orthogonal transformation (Wu et al., 2020). First, the covariances among the 276 features are calculated to form a covariance matrix. Among them, covariances among some main features, including torque (Tor), revolution speed of the cutterhead (R), penetration (P), thrust (Th), and penetration rate (PR), are shown in Figure 4.
Covariances of some main features.
Figure 4 shows the covariances between some important tunneling parameters. The greater the absolute value of the covariance, the greater the influence of the two features on each other. A positive covariance means that the two features increase or decrease together, while a negative covariance means that when one of the two features increases, the other decreases. In total, a 276-dimensional covariance matrix is obtained and recorded as C. It has 276 eigenvalues and corresponding eigenvectors, recorded as λ_i and u_i, respectively, including repeated ones. The matrix C, λ_i, and u_i satisfy the below equation:

C u_i = λ_i u_i.  (10)
Therefore, r eigenvalues with high absolute values are selected, and the corresponding eigenvectors are regarded as the new and independent variables used in clustering. The value of r is a key factor in this procedure. If r is too small, serious loss of feature information may result, while too large a value produces redundant variables and increases the computational cost. The proportion of accumulated information is used to determine the number of handled features, and can be calculated by the following equation (Wu et al., 2020):

PI_l = (Σ_{i=1}^{l} k_i)/(Σ_{i=1}^{n} k_i),  (11)

where PI_l is the proportion of accumulated information from the 1st to the lth handled feature; k_i represents the eigenvalue of the ith handled feature; and n denotes the dimension of the covariance matrix, which is also the total number of original features. Usually, as few handled features as possible should be reserved while keeping the PI value higher than 0.95. The eigenvalues of each handled feature and the proportion of accumulated information are shown in Figure 5.
Eigenvalues of each handled feature.
As shown in Figure 5, only the three features with the highest eigenvalues need to be reserved, and the dimension of the tunneling data is reduced from 276 to 3. This shows that the method can effectively simplify the TBM tunneling data and provide a processable data set for clustering.
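The PCA reduction described by Equations (10) and (11) can be sketched as follows; the data here are synthetic, and only the 0.95 PI criterion is taken from the text:

```python
import numpy as np

def pca_reduce(X, pi_threshold=0.95):
    """PCA sketch following Equations (10)-(11): eigendecompose the
    covariance matrix and keep the fewest components whose accumulated
    information proportion PI exceeds the threshold."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)                    # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)            # C u_i = lambda_i u_i
    order = np.argsort(eigvals)[::-1]               # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pi = np.cumsum(eigvals) / eigvals.sum()         # Equation (11)
    r = int(np.searchsorted(pi, pi_threshold) + 1)  # smallest l with PI_l >= threshold
    return Xc @ eigvecs[:, :r], r

# Synthetic example: 100 samples, 5 features driven by 2 latent directions
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(100, 5))
reduced, r = pca_reduce(X)
print(reduced.shape)  # (100, r), with r much smaller than 5
```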
4 RESULTS
4.1 Fuzzy C-means
After data pretreatment, the number of TBM tunneling data features is reduced to three, and these three features are used as the clustering basis of the samples. Before conducting fuzzy C-means, the cluster number c should be determined according to the characteristics of the data set. Considering the clustering target, namely, making the distance between clusters as large as possible and the distance between samples in the same cluster as small as possible, the below equations are used to determine the most suitable cluster number c:

a_j = (1/(N_i − 1)) Σ_{x_k ∈ C_i, k ≠ j} d(x_j, x_k),  (12)

b_j = min_{l ≠ i} (1/N_l) Σ_{x_k ∈ C_l} d(x_j, x_k),  (13)

S = (1/n) Σ_{j=1}^{n} (b_j − a_j)/max(a_j, b_j).  (14)

To evaluate the clustering effect and determine the most suitable c, Equations (12)–(14) are constructed using the silhouette coefficient of K-means clustering (Sinaga & Yang, 2020; Yang & Sinaga, 2019). In Equations (12)–(14), a_j represents the average distance between the jth sample and the other samples in the same group; b_j represents the average distance between the jth sample and the samples in different groups; and N_i represents the number of samples in the ith cluster. Different from the silhouette coefficient calculation in K-means clustering, fuzzy C-means is a soft clustering method, and the membership degree u_ij of the jth sample belonging to the ith cluster is included in the calculation of a_j and b_j. S is the silhouette coefficient of the total data set with n samples; it is also the average of the silhouette coefficients of the n samples. A value of S closer to 1 indicates better clustering results, because then there is a large distance between samples in different groups and a small distance between samples in the same group.
To determine the most suitable cluster number c, initial centroids among the 100 training samples are randomly selected, and the optimal silhouette coefficients for cluster numbers from 2 to 10 are compared. To avoid local optima, fuzzy C-means is conducted five times with different random initial centroids for each c, and the maximum of the resulting silhouette coefficients is used to assess the rationality of c. The maximum silhouette coefficient for each value of c is shown in Figure 6, from which it can be seen that the silhouette coefficient reaches its maximum of 0.711 when c is 4. The fuzzy C-means algorithm is then programmed in Python (Anaconda 2.7); the fuzzification parameter m is set to 2, and the maximum number of iterations is set to 200. The code structure is shown in Figure 7. The coded fuzzy C-means algorithm is used to group the 100 training samples into four clusters, and the change of the loss function J over the iterations is shown in Figure 8.
Silhouette coefficient under different numbers of clusters c.
Code structure of the fuzzy C-means algorithm.
Loss function changing process.
After 54 iterations, the centroids of each cluster remain stable, and the loss function J reaches its minimum of 5.25. The 100 training samples are divided into four clusters, while each training sample is assigned a combination of membership degree of the four clusters, which is shown in Figure 9.
Membership degree of each training sample.
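The fuzzy C-means procedure used in this section can be sketched as follows, with the fuzzification parameter m = 2 and the 200-iteration cap from the text; the convergence tolerance and the toy data are assumptions for illustration:

```python
import numpy as np

def fuzzy_c_means(X, c=4, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Fuzzy C-means sketch: alternate membership and centroid updates
    until the loss function J stabilizes."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = len(X)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                     # memberships sum to 1 per sample
    prev_j = np.inf
    for _ in range(max_iter):
        W = U ** m
        centroids = (W @ X) / W.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centroids[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)              # guard against division by zero
        U = d ** (-2.0 / (m - 1.0))        # standard membership update
        U /= U.sum(axis=0)
        j = np.sum((U ** m) * d ** 2)      # loss function J
        if abs(prev_j - j) < tol:
            break
        prev_j = j
    return centroids, U

# Two well-separated 3-D blobs: memberships should approach 0/1
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 3)),
               np.random.default_rng(2).normal(5, 0.1, (20, 3))])
centroids, U = fuzzy_c_means(X, c=2)
print(U.shape)  # (2, 40) membership degree matrix
```

The returned matrix U plays the role of the membership degree matrix described above: each column gives one sample's degrees of belonging to the c clusters.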
4.2 Prediction results of the four BP neural network-based submodels
In Section 4.1, the 100 training samples were grouped into four clusters. In the clustering results, samples with a similar distribution of membership degrees are close to each other, and the corresponding submodel is, in theory, relatively suitable for similar testing samples. Therefore, on the basis of the clustering results, weighting multiple submodels may improve the prediction accuracy of the machine learning method. To verify this hypothesis, the back propagation (BP) neural network is taken as an example. Specifically, the BP neural network is programmed in Weka 3.8.6, a machine learning workbench that provides an extensive collection of machine learning algorithms and data preprocessing methods. There are three key hyperparameters in a BP neural network, namely, the activation function, the number of layers, and the number of neurons in each layer. Considering the limited sample size and feature size of the data, a network with only one hidden layer, and thus three layers in total, is used. The numbers of input layer, hidden layer, and output layer neurons are represented by I, H, and O, respectively. I and O are consistent with the input and output features, and thus equal 3 and 2, respectively, and H is set to 6. The activation function determines the relationship of the variables between the input and hidden layers, and between the hidden and output layers. This research follows the widely used activation function shown in Equation (15). With networks of the mentioned structure and hyperparameters, the 100 training samples are directly used to train a BP neural network-based model with equal sample weights, which is then tested on the 30 samples of the testing set. The prediction results are shown in Figure 10.

f(x) = 1/(1 + e^(−x)).  (15)
Prediction results of the 30 testing samples by a pure back propagation neural network-based model. (a) Results of uniaxial compressive strength. (b) Results of Jf.
In this paper, the average percentage error E is used to measure the prediction accuracy:

E_i = |y_i − ŷ_i|/y_i × 100%,  (16)

E = (1/n) Σ_{i=1}^{n} E_i,  (17)

where E_i represents the percentage error of the ith sample, and y_i and ŷ_i represent the actual and predicted values of the ith sample, respectively. As shown in Figure 10, the trends of the actual-value and predicted-value broken lines are relatively consistent. The average percentage errors of UCS and Jf predicted by the BP neural network without fuzzy C-means are 13.62% and 12.38%, respectively. The highest percentage errors among all samples are 23.6% (24# sample, UCS) and 24.2% (11# sample, Jf). On the whole, the BP neural network-based model performs well on the 30 testing samples and achieves acceptable accuracy.
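The accuracy metric of Equations (16) and (17) can be sketched as follows (the absolute-error form is a reconstruction of the elided equations):

```python
import numpy as np

def average_percentage_error(actual, predicted):
    """Average percentage error E (Equations 16-17): the mean of the
    per-sample absolute percentage errors."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    per_sample = np.abs(actual - predicted) / np.abs(actual) * 100.0
    return per_sample.mean()

# Example: two samples, each 10% off
print(average_percentage_error([100.0, 50.0], [90.0, 55.0]))  # 10.0
```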
To verify the advantages of fuzzy C-means, the training samples are weighted according to their membership degree to each cluster, and four corresponding BP neural network-based submodels are built according to Equations (5) and (6). The four submodels are used separately on the 30 testing samples, and the UCS and Jf prediction results are shown in Figures 11 and 12, respectively. In particular, to clearly observe the effect of the different submodels on samples with different membership degrees, the testing samples listed in Figures 11 and 12 are reordered according to their membership degrees. In detail, the first six samples (1#–6#) have the highest membership degree in the 1st cluster, the next eight samples (7#–14#) and seven samples (15#–21#) have the highest membership degree in the 2nd and 3rd clusters, respectively, and the last nine samples (22#–30#) have the highest membership degree in the 4th cluster. The average prediction accuracies of the four submodels on the four kinds of testing samples are listed in Table 2.
Uniaxial compressive strength (
UCS) prediction results of the testing samples by the four back propagation (BP) neural network-based submodels. (a) Results of the 1st submodel. (b) Results of the 2nd submodel. (c) Results of the 3rd submodel. (d) Results of the 4th submodel.
Joint frequency prediction results of the testing samples by the four back propagation (BP) neural network-based submodels. (a) Results of the 1st submodel. (b) Results of the 2nd submodel. (c) Results of the 3rd submodel. (d) Results of the 4th submodel.
Table 2. Average percentage error of the four submodels on testing samples.

| Prediction target | Serial number | 1–6# samples | 7–13# samples | 14–20# samples | 21–30# samples | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Uniaxial compressive strength (MPa) | 1st | 6.94 | 12.22 | 17.35 | 15.77 | 13.54 |
| | 2nd | 22.10 | 3.72 | 11.41 | 16.95 | 13.60 |
| | 3rd | 10.06 | 15.62 | 6.31 | 18.40 | 13.26 |
| | 4th | 12.04 | 11.68 | 18.67 | 7.93 | 12.13 |
| Jf (m−1) | 1st | 5.10 | 16.58 | 15.54 | 12.88 | 12.81 |
| | 2nd | 19.82 | 6.38 | 15.06 | 12.28 | 13.06 |
| | 3rd | 19.69 | 14.13 | 7.41 | 14.09 | 13.66 |
| | 4th | 10.21 | 14.70 | 10.35 | 6.52 | 10.06 |

Note: All values are average percentage errors (%).
As shown in Figures 11 and 12, there is relatively good consistency between the actual values and the values predicted by the four submodels. In addition, the UCS average percentage errors predicted by the four BP neural network-based submodels are 13.54%, 13.60%, 13.26%, and 12.13%, while the Jf average percentage errors predicted by the four submodels are 12.81%, 13.06%, 13.66%, and 10.06%, respectively. Compared with the model built without fuzzy C-means, there is no significant improvement in the prediction accuracy of the four submodels. Owing to the significant imbalance in sample error, the prediction results vary considerably, although the average percentage errors of the four submodels are close to each other. In detail, each submodel is relatively good at predicting samples with a high membership degree in the corresponding cluster. The 2nd submodel can be taken as an example. Samples 7# to 13# have higher membership degrees in the 2nd cluster, and the errors of their UCS and Jf predicted by the 2nd submodel are not more than 6.35% (13# sample) and 10.20% (9# sample), respectively. The prediction results of the other three submodels also verify this rule. In other words, each submodel is good at predicting samples with a high membership degree in the corresponding cluster, but the total accuracy still needs to be improved. However, the imbalance in sample error provides the possibility of improving accuracy through weighted submodels.
After the four submodels have been weighted, a grouped BP neural network-based model is established. Taking the ith sample as an example, its membership degrees to the four clusters are recorded as u_1i, u_2i, u_3i, and u_4i, and the output of the ith sample predicted by the grouped model is calculated using Equation (6). The prediction results of the 30 testing samples by the grouped BP neural network-based model are shown in Figure 13.
Prediction results of the hybrid model weighted by the four submodels. (a) Results of uniaxial compressive strength (UCS) and (b) results of Jf.
As shown in Figure 13, the average percentage errors of the 30 testing samples predicted by the hybrid model are 7.66% (UCS) and 6.40% (Jf). Compared with the model without fuzzy C-means clustering, in which the errors reach 13.62% (UCS) and 12.38% (Jf), the accuracy of the model built by the proposed method shows a significant improvement. As mentioned above, there are considerable differences between the errors of different samples predicted by the four submodels. By contrast, the results predicted by the hybrid model are relatively stable; the largest UCS and Jf errors among the 30 samples are 12.29% (24# sample) and 13.18% (17# sample), respectively. The results confirm that the hybrid model can make full use of the respective advantages of the four submodels in predicting different samples, and that the prediction accuracy and stability are improved compared with the model without fuzzy C-means clustering.
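The membership-weighted combination of submodel outputs used by the hybrid model can be sketched as follows; the submodels here are hypothetical constant functions for illustration, standing in for the trained BPNN submodels:

```python
import numpy as np

def hybrid_predict(tunneling_features, submodels, membership):
    """Membership-weighted combination of submodel outputs (the
    weighting step the text attributes to Equation 6).

    submodels: list of callables, one per cluster.
    membership: the sample's membership degrees to each cluster
    (non-negative, summing to 1).
    """
    membership = np.asarray(membership, dtype=float)
    preds = np.array([model(tunneling_features) for model in submodels])
    return membership @ preds  # weighted average of submodel outputs

# Toy example with two hypothetical constant submodels
submodels = [lambda x: np.array([60.0, 2.0]),    # cluster-1 model: [UCS, Jf]
             lambda x: np.array([120.0, 4.0])]   # cluster-2 model
u = [0.75, 0.25]                                 # membership degrees
print(hybrid_predict(None, submodels, u))        # [75.   2.5]
```

A test sample dominated by one cluster thus inherits mostly that cluster's submodel output, which is the mechanism behind the accuracy gain reported above.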
5 DISCUSSION
5.1 Adaptability of fuzzy C-means to multiple machine learning algorithms
In Section 4.1, the 100 training samples are grouped into four clusters. Theoretically, the clustering results can be combined with multiple machine learning algorithms to improve their accuracy. In Section 4 of this paper, BP neural networks are used to verify the effect of fuzzy C-means clustering in improving the prediction accuracy. In this section, the applicability of fuzzy C-means to other machine learning algorithms is further discussed using two of them: SVR and RF. The verification process of the two algorithms is similar to that of the BP neural network. First, the 100 training samples are directly solved by SVR and RF with equal weights in Weka 3.8.6; the resulting original models are recorded as "SVR" and "RF." Then, according to the clustering results obtained by fuzzy C-means, four SVR-based and four RF-based submodels are established from the samples weighted by their membership degrees. Finally, the four submodels are combined into hybrid models, which are recorded as "FCM-SVR" and "FCM-RF." The SVR- and RF-based original models, submodels, and hybrid models are tested with the 30 testing samples, and their prediction accuracies are calculated (Figures 14 and 15).
Comparison between results predicted by the normal support vector regression (SVR)-based model and the FCM-SVR-based model. (a) Percentage error of uniaxial compressive strength (UCS), (b) R² of UCS, (c) percentage error of Jf, and (d) R² of Jf.
In terms of SVR, there are three key hyperparameters, namely, the kernel function, the penalty parameter (C), and the insensitive parameter (ε). In this research, the most widely used kernel function, the poly kernel, is adopted to build the models. C controls the trade-off between the accuracy and the complexity of the model: a higher C generally leads to a more accurate but more complex model, which is more prone to overfitting. ε determines the smoothness of the model, as only samples with an error higher than ε are used to update the model. By exhaustive search, C and ε are set to 50 and 0.05, respectively.
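The exhaustive search over C and ε can be sketched as follows. This is an illustrative scikit-learn version (the paper uses Weka); the candidate grids and the hold-out split are assumptions, not values from the paper.

```python
import numpy as np
from sklearn.svm import SVR

# synthetic stand-in for the training samples (3 input features)
rng = np.random.default_rng(1)
X = rng.random((60, 3))
y = X @ np.array([2.0, 1.0, 0.5])

# exhaustive search over candidate (C, epsilon) pairs on a held-out split
X_tr, y_tr, X_val, y_val = X[:45], y[:45], X[45:], y[45:]
best = None
for C in [1, 10, 50, 100]:
    for eps in [0.01, 0.05, 0.1]:
        m = SVR(kernel="poly", C=C, epsilon=eps).fit(X_tr, y_tr)
        err = np.mean(np.abs(m.predict(X_val) - y_val) / np.abs(y_val)) * 100
        if best is None or err < best[0]:
            best = (err, C, eps)
# best[1], best[2] hold the (C, epsilon) pair with the lowest validation error
```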
Both the pure SVR-based model and the SVR-based model improved by the fuzzy C-means results show acceptable predictability. However, the UCS and Jf percentage errors predicted by the pure SVR-based model are 12.73% and 10.23%, while those predicted by the improved model are 6.61% and 6.17%, respectively. In particular, 18 UCS samples and 16 Jf samples have errors of more than 10% under the pure SVR-based model. In contrast, the errors of most samples predicted by the fuzzy C-means-improved SVR-based model remain below 10%: only five UCS samples and eight Jf samples have errors of more than 10%. In addition, the R² between the actual and predicted values of the pure SVR-based model is 0.915 (UCS) and 0.760 (Jf), while that of the improved SVR-based model is higher, at 0.970 (UCS) and 0.901 (Jf), respectively. These results show that the fuzzy C-means results do improve the prediction performance of the SVR model (Figure 14).
Comparison between results predicted by the normal random forest (RF)-based model and the FCM-RF-based model. (a) Percentage error of uniaxial compressive strength (UCS), (b) R² of UCS, (c) percentage error of Jf, and (d) R² of Jf.
The main hyperparameters of RF include the number of decision trees, the minimum number of samples to split, and the maximum number of features. Among them, the maximum number of features is bounded by the number of input features, and it is set as 3 after pretreatment. The number of decision trees and the minimum number of samples to split are set according to the sample size. Because only 100 samples are used, the number of decision trees is set to 3, and the minimum number of samples to split is set to 10. The application of fuzzy C-means clustering to RF shows characteristics similar to those observed for SVR. Combined with the clustering results of fuzzy C-means, the R² of the predicted UCS is improved from 0.868 to 0.966, while the R² of Jf is improved from 0.740 to 0.911. In addition, the average percentage errors of UCS and Jf are reduced from 13.73% to 5.72% and from 15.74% to 7.09%, respectively (Figure 15). These results confirm the effect of fuzzy C-means clustering in improving the prediction accuracy of multiple machine learning models, including but not limited to BP neural networks, SVR, and RF.
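The RF configuration described above can be sketched as follows, using scikit-learn in place of Weka; the synthetic data stands in for the 100 training samples and is illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# synthetic stand-in for the 100 training samples with 3 input features
rng = np.random.default_rng(2)
X = rng.random((100, 3))
y = X @ np.array([1.0, 2.0, 3.0])

rf = RandomForestRegressor(
    n_estimators=3,        # number of decision trees, kept small for ~100 samples
    min_samples_split=10,  # minimum number of samples required to split a node
    max_features=3,        # bounded by the 3 input features
    random_state=0,
)
rf.fit(X, y)
```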
5.2 Differences between hard clustering and fuzzy clustering methods
The above results have proved that the fuzzy C-means clustering results have a positive influence on the prediction accuracy of machine learning models. It remains to be discussed whether hard clustering methods can replace the fuzzy C-means clustering method. Different from fuzzy clustering, in which a membership degree matrix expresses the clustering results, in hard clustering each sample belongs unambiguously to a single cluster. Combined with a hard clustering method, the training and testing processes of the machine learning models change in part. According to the hard clustering results, the training and testing samples are divided into C clusters; in this section, C is set to 4 to maintain consistency with the fuzzy C-means clustering used in this paper. The C clusters of training samples are used to train C submodels by machine learning methods. Correspondingly, the testing samples are also grouped into the C clusters, and each sample is tested by its corresponding submodel. The average prediction errors of the 30 testing samples are used to measure the accuracy of the models with hard clustering and to compare them with those obtained by the fuzzy C-means clustering method.
In this section, K-means clustering, one of the typical hard clustering methods, is used, and its clustering results are combined with the above-mentioned machine learning methods; its principle is not described in this paper. To verify the K-means clustering results, the clustering results obtained by fuzzy C-means are hardened, and the 100 training samples and 30 testing samples are listed according to their clusters. In detail, for each sample, the cluster with the largest membership degree is regarded as the cluster to which the sample belongs. Taking the 15th sample in Figure 9 as an example, its membership degrees to the four clusters are 0.62, 0.08, 0.09, and 0.21; therefore, the 15th sample is assigned to the first cluster, and the same rule applies to the other samples. The hardened fuzzy C-means results for the training set are as follows: samples from the 1st to the 25th belong to cluster 1, samples from the 26th to the 49th belong to cluster 2, samples from the 50th to the 78th belong to cluster 3, and samples from the 79th to the 100th belong to cluster 4. Correspondingly, the 30 testing samples are also divided into the four clusters by hardened fuzzy C-means: samples from the 1st to the 6th belong to cluster 1, samples from the 7th to the 13th belong to cluster 2, samples from the 14th to the 20th belong to cluster 3, and samples from the 21st to the 30th belong to cluster 4.
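The hardening rule above (assign each sample to the cluster with its largest membership degree) is a simple argmax over the membership matrix, sketched here with the 15th sample's membership degrees from the text as a check:

```python
import numpy as np

def harden(memberships):
    """Assign each sample to the cluster with its largest membership degree."""
    return np.argmax(memberships, axis=1)

# first row: the 15th sample from the text (0.62, 0.08, 0.09, 0.21)
# second row: a made-up sample for contrast
u = np.array([[0.62, 0.08, 0.09, 0.21],
              [0.10, 0.55, 0.20, 0.15]])
clusters = harden(u)   # [0, 1] with 0-indexed clusters, i.e., clusters 1 and 2
```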
By K-means clustering, the 100 training samples and 30 testing samples are divided into four clusters, and the clustering results are shown in Figure 16. There is considerable similarity between the clustering results of hardened fuzzy C-means and those of K-means. Among the 100 training samples, only six samples (the 29th, 43rd, 82nd, 83rd, 85th, and 95th) are clustered differently, while among the 30 testing samples, only one sample (the 30th) differs.
Clustering results of training and testing samples by K-means clustering.
The above clustering results are used to replace the membership degree matrix used earlier in this paper, and they are combined with the machine learning methods to measure the difference between hard and fuzzy clustering in improving the prediction accuracy. According to the hard clustering results, the 100 training samples are used to train four submodels, and the corresponding testing samples are tested by them. Taking the hardened fuzzy C-means results as an example, the training samples from the 1st to the 25th all belong to cluster 1 and are used to train a submodel, called the 1st submodel. The 1st submodel is then applied to test the testing samples from the 1st to the 6th, and the other testing samples are tested by their corresponding submodels. The average percentage error of the 30 testing samples is used to measure the accuracy of the prediction model. The prediction results of the BP neural network combined with K-means clustering are shown in Figure 17.
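The hard-clustering pipeline described above can be sketched as follows: each submodel is trained only on its own cluster's samples, and each testing sample is routed to its cluster's submodel. This is an illustrative scikit-learn version with SVR as the learner and synthetic data; the function name and cluster labels are assumptions.

```python
import numpy as np
from sklearn.svm import SVR

def train_and_route(X_tr, y_tr, c_tr, X_te, c_te, n_clusters=4):
    """Train one submodel per hard cluster on that cluster's training
    samples only, then predict each testing sample with the submodel of
    the cluster it belongs to."""
    preds = np.empty(len(X_te))
    for k in range(n_clusters):
        model = SVR(kernel="poly", C=50, epsilon=0.05)
        model.fit(X_tr[c_tr == k], y_tr[c_tr == k])
        mask = c_te == k
        if mask.any():
            preds[mask] = model.predict(X_te[mask])
    return preds

# synthetic data: 80 training and 8 testing samples, 4 hard clusters
rng = np.random.default_rng(3)
X_tr = rng.random((80, 3)); y_tr = X_tr.sum(axis=1)
c_tr = np.repeat(np.arange(4), 20)        # 20 training samples per cluster
X_te = rng.random((8, 3));  c_te = np.repeat(np.arange(4), 2)
preds = train_and_route(X_tr, y_tr, c_tr, X_te, c_te)
```

Note that, unlike the membership-weighted scheme, each submodel here sees only a quarter of the training data, which is the sample-size drawback discussed in Section 5.2.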
Prediction results by the back propagation (BP) neural network combined with K-means clustering. (a) Results of uniaxial compressive strength (UCS) and (b) results of Jf.
As shown in Figure 17, the actual and predicted UCS and Jf show relatively high consistency on the whole, and the prediction abilities of the four submodels are similar. The average percentage errors of the four clusters are 12.54%, 11.27%, 13.64%, and 11.08% for UCS, and 12.38%, 12.78%, 10.75%, and 10.95% for Jf, giving overall averages of 12.02% and 11.68%, respectively. Compared with the normal BP neural network-based model (whose average percentage errors of UCS and Jf are 13.62% and 12.38%, respectively), the accuracy is improved only slightly. In contrast, the BP neural network-based model combined with fuzzy C-means clustering yields much higher accuracy (average percentage errors of 7.66% for UCS and 6.40% for Jf). This proves that fuzzy clustering methods, including fuzzy C-means, are more suitable than hard clustering methods, such as K-means, for combination with machine learning methods to improve their accuracy.
To further verify the above conclusion, K-means clustering is also combined with SVR and RF, and the prediction results are compared with those of the pure machine learning methods and of the methods combined with fuzzy C-means clustering. The average percentage errors of the prediction results are listed in Table 3.
Table 3. Average percentage errors of the prediction results.

| Method | Prediction target | Pure method (%) | Combined with K-means: Cluster 1 (%) | Cluster 2 (%) | Cluster 3 (%) | Cluster 4 (%) | Total (%) | Combined with fuzzy C-means (%) |
|---|---|---|---|---|---|---|---|---|
| BPNN | Uniaxial compressive strength (UCS) (MPa) | 13.62 | 12.54 | 11.27 | 13.64 | 11.08 | 12.02 | 7.66 |
| BPNN | Jf (m−1) | 12.38 | 12.38 | 12.78 | 10.75 | 10.95 | 11.68 | 6.40 |
| Support vector regression | UCS (MPa) | 12.73 | 10.06 | 13.20 | 12.94 | 12.86 | 12.41 | 6.61 |
| Support vector regression | Jf (m−1) | 10.23 | 10.03 | 10.27 | 12.09 | 12.70 | 11.38 | 6.17 |
| Random forest | UCS (MPa) | 13.73 | 11.52 | 10.42 | 14.01 | 11.20 | 11.71 | 5.72 |
| Random forest | Jf (m−1) | 15.74 | 14.68 | 14.57 | 14.68 | 10.21 | 13.31 | 7.09 |
As can be seen from Table 3, the results of SVR and RF are similar to those of the BP neural networks. After the pure machine learning models are combined with K-means clustering, the accuracy is mostly improved by only 1-3 percentage points, and the prediction accuracy of some models is even reduced: for the SVR-based joint frequency prediction model, the error increases from 10.23% to 11.38% when combined with K-means clustering. In contrast, the accuracies of all the machine learning models combined with fuzzy C-means clustering are significantly improved.
Theoretically, after K-means clustering, the 100 training samples in this paper are divided into four clusters, so each submodel is trained by only about 20-30 samples. Although the clustered training samples are more pertinent for training the submodels, the number of training samples is reduced, which may have a negative influence on the prediction accuracy. Field rock mass samples, which are hard to collect, usually number only in the tens to hundreds. Therefore, hard clustering methods may leave too few samples for training the submodels. Different from hard clustering methods, fuzzy C-means only changes the training weight of each sample, and the number of training samples is not reduced. Therefore, in theory, fuzzy clustering methods are more suitable than hard clustering methods for improving the prediction accuracy of machine learning methods, especially when the number of training samples is limited, which has also been proved by the results presented in this section.