Grouped machine learning methods for predicting rock mass parameters in a tunnel boring machine-driven tunnel based on fuzzy C-means clustering-深地科学

Grouped machine learning methods for predicting rock mass parameters in a tunnel boring machine-driven tunnel based on fuzzy C-means clustering

Highlights

The traditional rock mass parameter predicting method based on machine learning and tunnel boring machine driving data is improved by fuzzy C-means clustering.
Using fuzzy C-means clustering, samples are divided into multiple clusters and multiple targeted submodels are trained. The prediction results weighted by the multiple submodels are more accurate than those obtained by the traditional method.
The proposed method is verified by multiple machine learning algorithms, including back propagation neural network, support vector regression, random forest, and 130 series of field data. The results show that the proposed method has advantages in terms of accuracy compared with the traditional method.

1 INTRODUCTION

The tunnel boring machine (TBM) is widely used in tunnel excavation, especially for long and deep tunnels, due to its high efficiency and safety. To improve safety and efficiency, it is important to develop rapid and accurate methods to acquire the rock mass parameters. However, due to the restrictions posed by the complex machine structure and the narrow space of TBM, it might be difficult to use the in-field rock mass parameter testing methods applicable to the surface in this environment. Although many researchers have studied rock mass parameter testing methods applicable to the TBM environment, such as Poisel et al. (2010), Naeimipour et al. (2016), Wang, Gao et al. (2020), Goh et al. (2011), Kong and Shang (2018), Liu et al. (2018), Xu et al. (2023), Lussu et al. (2019), and Cordes et al. (2019), most of these methods focus on testing or analysis process, which makes it difficult to acquire the data at the speed of TBM excavation.

To solve this problem, mapping is built between real-time TBM tunneling data and rock mass parameters with machine learning. For example, Fattahi and Babanouri (2017) compared the TBM performance prediction models established by data mining algorithms such as the gravity search algorithm, the differential evolution algorithm (DE), artificial bee colony, and support vector regression (SVR), and found that the combination algorithm of SVR and DE shows higher prediction accuracy. Salimi et al. (2018) used the tree regression algorithm (CART) to build the prediction model of the specific rock mass boreability index (SRMBI), and compared this method with multiple variable regression analysis. Their study proved that CART had significant advantages in predicting SRMBI. Li, Hu et al. (2022) and Li, Zhang et al. (2022) studied the influence of tunneling parameters of a single cutter, such as penetration and cutter space, on the rock-cutting efficiency, by full-scale linear cutting tests and numerical simulation. In addition, Armaghani et al. (2017), Zare and Naghadehi (2017), Zare et al. (2018), Mahdevari et al. (2014), Liu et al., (2019, 2020), Yagiz and Karahan (2011), and Minh et al. (2017) applied machine learning algorithms such as artificial neural network, particle swarm optimization, fuzzy logic, and gene expression program for rock machine mapping, which yielded good results.

Previous studies generally used multiple field-measured data as training data. The prediction or evaluation targets are taken as output and known data as input. With regression or data mining, mappings between the input and output are established as the basis for evaluation. By including newly collected data in the mappings, the outputs, namely, the evaluated results of the targets, can be calculated.

These research results have been widely used to predict rock mass parameters and yielded acceptable accuracy. However, there are still problems in constructing mappings between rock mass parameters and TBM driving data. TBM is always exposed to complex rock mass conditions, which might be included in field-collected data. However, the changing rules between tunneling data and rock mass parameters vary with rock conditions. In this case, a consensus has been reached that a single or limited number of mappings may not be suitable for different rock mass conditions. Hence, some researchers have categorized known rock conditions and established mappings. For example, Gong et al. (2007, 2009, 2020) established multiple mappings between TBM penetration and the brittleness index, under different joint orientations, volumetric joint counts, and uniaxial compressive strengths. On this basis, they further modified the China hydropower classification (HC) method and obtained good evaluation results. Xue et al. (2018) proposed a dynamic rock mass classification method according to uniaxial compressive strength, intactness index, groundwater state, and initial geo-stress state, which is verified based on the YHJW project. As mentioned above, field sample classification is a feasible supplementary method to improve the applicability of a single regression model in multiple complex rock conditions. In the geotechnical field, researchers have attempted to evaluate rock mass parameters by clustering methods, and proved their feasibility. This kind of research can be classified into the following three groups according to the research targets.

The first group of research studies included direct classification of the rock condition into groups by clustering. Kitzig et al. (2017) tested the petrophysical and geochemical properties of multiple rock samples as clustering criteria, and proposed a rock classification method based on fuzzy C-means (FCM). Saeidi et al. (2014) proposed an adaptive neuro-fuzzy inference system (ANFIS) based on fuzzy C-means to evaluate the rock mass diggability index. On comparing results with the tested value, the ANFIS system showed higher accuracy than traditional regression. Rad and Jalali (2019) modified the rock mass rating (RMR), a rock classification system, by a fuzzy clustering algorithm, which was validated by the Sangan Iron Ore project.

The second group of research studies recognized discontinuities or structural plane on the rock surface. Liang et al. (2012) used K-means clustering to evaluate rock mass discontinuity attitude elements, and obtained good accuracy. Cui and Yan (2020) proposed an improved clustering method based on differential evolution, which performed well in terms of identification of rock discontinuity. Wang, Zheng et al. (2020), Li et al. (2021), (2014), and Gao et al. (2018) realized good performance on rock discontinuities and structural plane recognition using methods such as multidimensional clustering, fuzzy spectral clustering, Ant colony ATTA clustering, and clustering by fast searches and finding density peaks (CFSFDP).

The third group of research studies, which is also the most relevant to this research among the three groups of research studies mentioned, focused on predicting the mechanical parameters of rock mass. Majdi and Beiki (2019) established a data set including 205 field samples. According to the difference of the elasticity modulus, rock mass quality designation, and the geological strength index, a deformation modulus prediction model is well established. On the basis of the above-mentioned research, Fattahi (2016b) used RMR, uniaxial compressive strength (UCS), buried depth, and elasticity modulus as input for the deformation modulus. Moreover, Bashari et al. (2011) included joint frequency (m−1), porosity, and density as input.

All of the above research studies obtained acceptable results and therefore provide good references. However, most of them classify rock conditions or predict related parameters with several known rock mass parameters. This is almost impossible in TBM tunneling due to the difficulty in acquiring real-time rock mass parameters. This is one of the reasons why clustering methods are rarely used in TBM tunneling to evaluate the rock mass parameters. This paper makes full use of TBM tunneling data, and proposes the grouped prediction method of rock mass parameters based on rock-TBM mappings and clustering. In brief, field rock-TBM data are grouped into multiple clusters using the clustering method. By using samples in different clusters, multiple submodels are obtained, and each submodel shows better ability to predict the rock mass parameters of certain clusters. With the weighting of predicting results of multiple submodels, the prediction accuracy can be improved. In detail, the proposed method requires a series of field test data, usually with dozens to hundreds of samples, each containing target rock mass parameters, such as uniaxial compressive strength and joint frequency, and corresponding TBM tunneling data collected at the same mileage as the rock mass data. After data preparation and pretreatment, fuzzy C-means clustering is used to classify field-measured samples with similar tunneling data. After clustering, a membership degree matrix is obtained, which provides the membership degree of each sample belonging to each cluster. Samples in the same cluster always have similar tunneling data distribution, while the corresponding geological conditions that the samples are exposed to are always similar. On this basis, each submodel is trained by samples with the weights of their membership degree of a certain cluster, and multiple submodels are trained with pertinence, with TBM tunneling data as input and rock mass data as output. For test samples or newly encountered conditions, the field tunneling data can be used to calculate the membership degree of the sample to each cluster, which is used to determine the weights of results predicted by submodels. The weighted prediction results always show higher accuracy than rock-TBM mapping established by the pure machine learning method. In this study, 100 training samples and 30 test samples are collected from the C1 part of the Pearl Delta water resources allocation project. Fuzzy C-means clustering is combined with BP neural network (BPNN), SVR, and random forest (RF), and corresponding submodels and weighted models are trained by the 100 training samples. By comparing the accuracies of the 30 test samples that are predicted, respectively, by pure machine learning models and by models combined with fuzzy C-means clustering, the accuracy improvement effect of fuzzy C-means clustering on multiple machine learning methods is verified. In particular, the main novelty of this paper is use of clustering for grouping samples and improving the prediction accuracy by weighted submodels, instead of improving the performance of the machine learning method itself. The proposed method is a supplementary method for machine learning, and it can be used in conjunction with various existing machine learning methods.

The paper is organized as follows. The 2nd section introduces the proposed method, combining fuzzy C-means clustering with machine learning methods for improving the prediction accuracy of rock mass parameters. In the 3rd section, the collected field data used for training and testing the proposed method and their pretreatment process are introduced. In the 4th section, BP neural network combined with fuzzy C-means clustering is used as an example, and the corresponding models are built and tested by the field data. The 5th section discusses the application of the proposed method to different machine learning methods and harden clustering methods. The 6th section presents the conclusion.

2 METHOD

A widely used method is to predict the rock mass parameters by mapping between them and TBM tunneling data, which can be established by the machine learning method. In detail, field samples including rock mass parameters and TBM tunneling data are prepared for training. Then, machine learning methods, such as RF, SVR, and artificial neural networks, are used to build mappings with the input of pretreated TBM driving data and the output of rock mass parameters. However, field samples always show different data characteristics, which interfere with each other and affect the prediction accuracy. To overcome this drawback, on the basis of the above overall method, field samples are first grouped by the clustering method, and then each sample group is used to establish mutually independent mappings. Finally, a grouped predicting method is proposed with three steps, which are as follows:

1.
Fuzzy C-means algorithm is used to distinguish the input differences between the field samples and to group them into multiple clusters. In the clustering results, the relationship between each sample and each cluster is not absolute, but expressed by membership degree. The higher the degree, the more similar the sample is to the cluster.
2.
With samples of each cluster as training samples, multiple submodels can be trained by the machine learning method with the input of TBM tunneling parameters and the output of rock mass parameters. In particular, in building a submodel corresponding to a certain cluster, the similarity between each sample and the cluster varies. The higher the similarity of the sample, the higher the training weight that should be obtained by the sample. Therefore, the membership degree is used as the training weights, which can improve the specificity of submodels for the samples in the corresponding clusters.
3.
For a sample to be predicted, whether it is a test sample or a newly encountered tunneling condition, each submodel can render a predicted result. Meanwhile, the membership degree of the sample can be obtained according to the clustering results. By adding results predicted by all submodels with the weight of the membership degree, the predicted result of the sample to be predicted can be obtained. In particular, the aforementioned machine learning methods are not limited to a specific algorithm, but can also be artificial neural networks, SVR, or other algorithms. The proposed method is introduced in the 2nd section of this paper.

In particular, this research does not improve the machine learning method itself as mentioned in step 2, but combines existing machine learning methods. The main novelty of this research is that the focus is on the first and third steps, that is, using the fuzzy-C means to classify samples and weighting multiple submodels to improve the accuracy. The flow chart of the proposed method is shown in Figure 1.

Details are in the caption following the image — Figure 1
Open in figure viewer PowerPoint

Flow chart of the proposed method.

2.1 Fuzzy C-means algorithm

Different from hard classification methods such as K-means, in fuzzy C-means, each sample is not completely classified into a specific group. In fuzzy C-means, the grouping results are expressed by the membership matrix, and each sample has a membership vector (Shi et al., 2019; Song et al., 2019). For example, a data set with N samples is grouped into K groups, and its fuzzy C-means grouping result is an matrix, which is obtained by using the objective function as follows (Fattahi, 2016a):

� = \sum_{� = 1}^{�} �_{�} = \sum_{� = 1}^{�} (\sum_{� = 1}^{�} {�_{ij}}^{�} {‖ �_{�} - �_{�} ‖}^{2}),

(1)

where X j and C i represent the jth sample and the centroid of the ith group; w ij denotes the membership that represents the degree to which the jth sample belongs to the ith group; and m is a fuzzification parameter that is larger than 1. The larger the w ij, the closer the jth sample is to the data features of the ith group. The membership w ij satisfies the below equation.

\sum_{� = 1}^{�} �_{ij} = 1 .

(2)

The centroids of the K groups can be calculated as (Fattahi, 2016a)

�_{�} = \frac{\sum_{� = 1}^{�} {�_{ij}}^{�} �_{�}}{\sum_{� = 1}^{�} {�_{ij}}^{�}} .

(3)

Given a known weight matrix, when the centroids are obtained, the weight matrix can be updated by (Fattahi, 2016a)

�_{ij} = \frac{1}{\sum_{� = 1}^{�} {(\frac{‖ �_{�} - �_{�} ‖}{‖ �_{�} - �_{�} ‖})}^{\frac{2}{� - 1}}} .

(4)

During multiple iterations, the objective function value continues to decreases, and iterations continue until each reduction is less than a specific value. As sufficient iterations are needed to avoid local optimization, a small value of 0.01 is selected in this paper.

2.2 Building method for grouped mappings

Generally, samples in the same groups obtained by fuzzy C-means are relatively similar, whereas samples in different groups show relatively huge differences. Theoretically, multiple machine learning mappings are used to solve different group samples. As each mapping is established by similar samples, there may be less interference of abnormal samples on the mapping. Different from hard clustering methods, such as K-means clustering, each sample in fuzzy C-means partially belongs to all groups with a certain membership, instead of completely belonging to a specific group. The higher the membership, the closer the sample is to the data characteristic of the group, and the greater the influence on the mapping built according to the group samples. Therefore, the membership is included in the objective function of machine learning method to adjust the influence of each sample on the mappings. A data set with N samples and K groups can be taken as an example: the grouped mapping consists of K submodels, and the loss function can be expressed as

{Loss}_{�} = \sum_{� = 1}^{�} �_{ij} {(�_{ij} - �_{�}^{'})}^{2},

(5)

where Loss i represents the objective function of the submodel established by the ith group samples; and represent the predicted by the ith submodel and the actual rock mass parameters of the jth sample. With iterations, the objective function is reduced to an acceptable level, and K submodels are built. For new working conditions with known TBM driving data, the rock mass parameters can be predicted by the K submodels as

�_{�}^{''} = \sum_{� = 1}^{�} �_{ij} ∙ �_{ij},

(6)

where is the predicted result of the jth sample by the grouped multiple mappings. Similar to training samples, the test samples partially belong to all groups instead of a specific group. Correspondingly, their prediction results are given by all weighted submodels. The higher the membership of the sample is to the group, the higher the weight of submodel result. In the grouped prediction method, for each test sample, the weight of the submodel, which is established by training samples close to it, is larger. Compared with the method of solving all samples with a single mapping, in which the weights of all training samples are the same, the proposed method is less affected by the differences between samples.

3 CASE STUDY

3.1 Project overview

This research is based on the C1 part of the Pearl Delta water resources allocation project. This area is located in Dongguan, Guangdong Province, China, extending from Shaxi reservoir to the SL02# work well in Yangwu country. The tunnel goes from west to east. The 2# main cave was excavated by TBM, with a total length of 9.75 km and a diameter of 8.2 m, from mileage SL14+958 to SL5+213. Low mountains and hills are the main landform along the tunnel, which passes under multiple reservoirs and highways. The depth of the tunnel ranges from 50 to 270 m. The tunnel slope ranges from 20° to 30°, as this area is very high in the east. According to the hydropower classification (HC) method, the surrounding rocks along the tunnel mainly consist of class II and III rocks. The geological profile of the area under study is shown in Figure 2.

The tunnel goes through a stratum with multiple lithologies, and the rock conditions of different areas vary sharply. Along the tunnel, 130 samples were collected in different strata, and the uniaxial compressive strength, joint frequency, muck size, and corresponding TBM operating parameters of each sample were measured or tested in the field. Among the 130 samples, 100 samples were collected from mileage SL 10+560 to SL 8+780 and used to obtain the evaluated results by the proposed method as a training set. To verify the method and its results, the other 30 samples were collected from mileage SL 8+240 to SL 7+810, which composed the testing set.

In this paper, rock mass parameters, including UCS and joint frequency (Jf), are used as the prediction target. The two rock mass parameters were collected by field tests. UCS increased from 5.2 to 135.6 MPa and Jf ranged from 1.21 to 4.22 m−1. Considering the complex rock mass condition, the corresponding TBM tunneling parameters showed a sharp fluctuation, and this was recorded and used as the input of the prediction model. The TBM tunneling data were recorded from July 20, 2021 to January 10, 2022, with a frequency of 1 s−1, including 276 features. More than 15.2 × 106 records were collected. The basic information of the rock mass and the main TBM tunneling parameters is listed in Table 1.

Table 1. Basic information of the rock mass and the main tunnel boring machine (TBM) tunneling parameters.

Type	Parameter	Maximum	Minimum	Average	Standard deviation
Rock mass parameters	Uniaxial compressive strength (MPa)	135.6	5.2	60.8	29.2
Rock mass parameters	Joint frequency (m−1)	4.22	1.21	2.71	0.89
Main TBM tunneling data	Thrust (kN)	10 214.3	2561.5	5435.2	1532.1
	Torque (kN · m)	1824.0	199.3	840.4	332.2
	Penetration rate (mm/min)	82.6	8.1	46.7	17.6
	Revolutions per minute (r/min)	5.80	4.00	5.10	0.39

3.2 Pretreatment of the TBM tunneling data

Compared with the method of taking the field rock mass parameters as the output of the prediction model, which has only two values per sample, the method of taking the TBM tunneling data as input is more complex. The complexity of the TBM tunneling data arises due to their two dimensions. The first dimension can be called the time dimension. The TBM driving data are time-series data, which are collected at a frequency of 1 Hz. Therefore, each sample always contains more than 1000 series of tunneling data. The other dimension can be called the feature dimension. All tunneling data consist of 276 samples, which are too much for data mining. Considering the complexity of the field tunneling data, they should be pretreated and simplified. Essentially, the pretreatment process involves dimensionality reduction in the time and feature dimensions of the TBM tunneling data.

First, in the time dimension, the pretreated tunneling data should only contain one time node. In addition, it should satisfy three requirements: (1) the retained TBM tunneling data should be collected from or near the same area as that of the rock mass data; (2) the retained data corresponding to each rock mass sample should be collected in a limited range, and it is helpful to avoid sharp changes of the rock condition within this range; and (3) the retained data should be collected during TBM excavation, not at TBM stoppage.

According to the above requirements, this research collected multiple samples of TBM tunneling data, and each sample is collected with the same location with the corresponding rock mass samples. In this way, a long continuous area, in which the TBM tunneling data were collected, were divided into 100 parts with a length of 1 m, and the rock mass parameters were tested and collected in the center position of each part. This satisfies the first two requirements, but each part still includes both excavation and stoppage data, and the third requirement still needs to be met. Accordingly, the penetration rate, that is, the driving speed of a TBM, was used as the criterion for judging TBM excavation or stoppage. If the penetration rate of a series of TBM tunneling data is lower than 10 mm/min, the data are regarded to have been collected during TBM stoppage or trial driving. In this sense, these data are reduced. By contrast, the data with penetration rates higher than 10 mm/min would be retained. Figure 3 shows an example of data reducing and retaining.

After redundant tunneling data reduction, rock mass data and valid TBM tunneling data collected in the matched 100 areas are obtained. However, there is only one series of rock mass data in each 1-m part, while there are usually more than 1000 series of valid tunneling data, because they are sequential with a frequency of 1 s − 1, and tunneling a 1-m-long area always takes more than 1000 s. In the proposed method, each tunneling parameter in a 1-m-long part is used as a feature of the sample, which should be a concrete value instead of sequential data including more than one value. Therefore, the average value is used to represent the sequential tunneling data of each, that is,

�_{ij}^{'} = \frac{1}{�} \sum_{� = 1}^{�} �_{ijk},

(7)

where is the handle results of the jth feature of the ith sample; x ijk represents the value of x ij measured at the kth second; and n is the total number of seconds spent in the ith part.

To reduce the feature dimension, two steps are conducted. First, normalization is used to transform the different features of in-field collected rock mass and TBM tunneling parameters to a similar range so as to remove their dimensions. However, there are large differences between the distribution rules of the 276 features. For example, data directly related to tunneling, such as thrust and torque, usually have a large positive value, and their magnitude is inconsistent. Data that are used to control the tunneling directions, such as horizon and vertical angles, have both positive and negative values, representing the different directions. Some data such as the motor current of the cutterhead are always stable and close to a certain value with small fluctuations.

Obviously, a simple normalization method might not be suitable for such complex and varying data. Therefore, a two-step method is proposed. The magnitudes and dimensions of all TBM tunneling data are removed by zero-mean normalization, and then they have the same range. Logistic normalization is then used to balance the fluctuation difference between features. Zero-mean and logistic normalization can be expressed by the below equations:

�_{�}^{'} = \frac{�_{�} - µ_{�}}{�_{�}},

(8)

�_{�}^{'} = \frac{1}{1 + e^{- �_{�}}},

(9)

where and represent variables before and after normalization of the ith sample; µ i and σ i denote the average value and variance of the ith sample, respectively.

After normalization, principal component analysis (PCA) is used to reduce the total of 276 tunneling data features. The basic principle of PCA is to reorganize original variables to form new and independent variables through orthogonal transformation (Wu et al., 2020). First, the covariances among the 276 features are calculated to form a covariance matrix. Among them, covariances among some main features, including torque (Tor), revolution speed of the cutterhead (R), penetration (P), thrust (Th), and penetration rate (PR), are shown in Figure 4.

Figure 4 shows covariances between some important tunneling parameters. The greater the absolute value of the covariance, the greater the influence of the two features on each other. A positive covariance means that the two features increase or decrease together, while a negative covariance means that one of the two features increases, while the other decreases. In total, a 276-dimensional covariance matrix is obtained and recorded as C. It has 276 eigenvalues and corresponding eigenvectors, which are recorded as λ i and u i, respectively, including the repeated one. Matrix C, λ i, and u i, satisfies the below equation.

� � = � � .

(10)

Therefore, r eigenvalues with high average absolute values are selected, and the corresponding eigenvectors are regarded as the new and independent variables used in clustering. The value of r is a key factor in this procedure. If r is too small, this might lead to serious loss of feature information, while too large a value results in redundant variables and increases the computational process. The proportion of accumulated information is used to determine the number of handled features, and can be calculated by the following equation (Wu et al., 2020):

{PI}_{�} = \frac{\sum_{� = 1}^{�} �_{�}}{\sum_{� = 1}^{�} �_{�}},

(11)

where PI l is the proportion of accumulated information from the 1st to lth handled features; k i represents the eigenvalues of the ith handled features; and n denotes the dimension of the covariance matrix, and also the total number of original features. Usually, as few handled features as possible should be reserved, and the information and the PI value should be higher than 0.95. The eigenvalues of each handled feature and the proportion of accumulated information are shown in Figure 5.

As shown in Figure 5, only three features with the highest eigenvalues should be reserved, and the dimension of the tunneling data is reduced from 276 to 3. This shows that this method can effectively simplify the TBM tunneling data and provide a processable data set for clustering.

4 RESULTS

4.1 Fuzzy C-means

After data pretreatment, the number of TBM tunneling data features is reduced to three, and the three features are used as the clustering basis of the samples. Before conducting fuzzy C-means, the cluster number c should be determined according to the data set characters. Considering the clustering target, making the distance between clusters as large as possible and the distance between samples in the same cluster as short as possible, the below equations are used to determine the most suitable cluster number c.

�_{�} = \sum_{� = 1}^{�} \sum_{� = 1}^{�_{�}} {�_{ij} ∙ �_{ik} ∙ (�_{�} - �_{�})}^{2},

(12)

�_{�} = \sum_{� = 1}^{�} \sum_{� = 1}^{�_{�}} {(1 - �_{ij}) ∙ �_{ik} ∙ (�_{�} - �_{�})}^{2},

(13)

� = \frac{1}{�} \sum_{� = 1}^{�} \frac{�_{�} - �_{�}}{{\max (�_{�}, �}_{�})} .

(14)

To evaluate the clustering effect and determine the most suitable c, Equations (12–14) are constructed using the silhouette coefficient of K-means clustering (Sinaga & Yang, 2020; Yang & Sinaga, 2019). In Equations (12–14), aj represents the average distance between the jth sample and other samples in the same group; bj represents the average distance between the jth sample and samples in different groups; and Ni represents the sample number in the ith cluster. Different from the silhouette coefficient calculation in K-means clustering, fuzzy C-means is a soft clustering method, and the membership degree of the jth sample belonging to the ith cluster is recorded as , which is included in the calculation of aj and bj. S is the silhouette coefficient of the total data set with n samples. It is also the average of the silhouette coefficients of the n samples. A value of S closer to 1 indicates better clustering results, because there is a large distance between samples in different groups and a small distance between samples in the same group.

To determine the most suitable cluster number c, initial centroids among the 100 training samples are randomly selected and the optimal silhouette coefficients with cluster numbers from 2 to 10 are used. To avoid local optimization, fuzzy C-means are conducted five times with different random initial centroids for each c, and the maximum of their silhouette coefficients is used to assess the rationality of c. The maximum silhouette coefficient of each c value is shown in Figure 6, from which it can be seen that the silhouette coefficient reaches its maximum of 0.711 when c is selected as 4. Then, the fuzzy C-means algorithm is coded by anaconda; the fuzzification parameter m is set as 2; and the maximum iteration is set as 200. The fuzzy C-means algorithm is programmed by Anaconda 2.7, and the code structure is shown in Figure 7. The coded fuzzy C-means algorithm is used to group the 100 training samples into four clusters, and the changing process of the loss function J with the iterations is shown in Figure 8.

After 54 iterations, the centroids of each cluster remain stable, and the loss function J reaches its minimum of 5.25. The 100 training samples are divided into four clusters, while each training sample is assigned a combination of membership degree of the four clusters, which is shown in Figure 9.

4.2 Prediction results of the four BP neural network-based submodel

In Section 4.1, the 100 training samples are grouped into four clusters. In the clustering results, samples with a similar distribution of membership degree are close to each other, and the corresponding submodel is relatively suitable for similar testing samples in theory. Therefore, on the basis of the clustering results, weighted multiple submodels may improve the prediction accuracy of the machine learning method. To verify the above hypothesis, the back propagation (BP) neural network is taken as an example. To be specific, BP neural network is programmed by Weka 3.8.6, a machine learning workbench, which provides an extensive collection of machine learning algorithms and data preprocessing methods. There are three key hyperparameters in BP neural networks, namely, activation functions, number of layers, and neuron number of each layer. Considering the limited sample size and feature size of the data, a network with only one hidden layer, and three layers in total, is used. The numbers of input layer, hidden layers, and output layer neurons are represented by I, H, and O, respectively. I and O are consistent with input and output features, which equal (3) and (2), respectively, and H is set to 6. The activation function determines the relationship of variables between input and hidden layers, and hidden and output layers. This research follows the widely used activation function, which is shown in Equation ( 15). With networks with the mentioned structure and hyperparameters, the 100 training samples are directly used to train a BP neural network-based model with same weights, and it is tested by the 30 samples in the testing set. The prediction results are shown in Figure 10.

\tan ℎ (�) = \frac{1 - e^{- 2 �}}{1 + e^{- 2 �}} .

(15)

In this paper, the average percentage error E of each sample is used to measure the predicted accuracy as

�_{�} = \frac{| �_{�} - �_{�}^{'} |}{�_{�}},

(16)

� = \sum_{� = 1}^{�} �_{�},

(17)

where represents the percentage error of ith samples; and represent the actual and predicted value of ith samples, respectively. As shown in Figure 10, the trend between the actual value and the predicted value broken line is relatively consistent. The average percentage error of UCS and J f predicted by the BP neural network without fuzzy C-means is 13.62% and 12.38%, respectively. The highest percentage error of all samples is 23.6% (24# sample of UCS) and 24.2% (11# sample of J f). On the whole, the BP neural network-based model performs well on the 30 testing samples and achieves acceptable accuracy.

To verify the advantages of fuzzy C-means, training samples are weighted according to their membership degree to each cluster, and four corresponding BP neural network-based submodels are built, according to Equations (5)–(6). The four submodels are used separately on the 30 testing samples, and the UCS and Jf prediction results are shown in Figures 11 and 12, respectively. In particular, to clearly observe the effect of different submodel on samples with different membership degrees, the testing samples listed in Figures 11 and 12 are reordered according to their membership degrees. In detail, the first six samples (from 1# to 6# sample) have the highest membership degree on the 1st cluster, the next eight samples (from 7# to 14# sample) and seven samples (from 15# to 21# sample) have the highest membership degree on the 2nd and 3rd cluster, and the last nine samples (from 22# to 30# sample) have the highest membership degree on the 4th clusters. The average prediction accuracies of the four submodels on the four kinds of testing samples are listed in Table 2.

Table 2. Average percentage error of the four submodels on testing samples.

Prediction target	Serial number	Average percentage error (%)
Prediction target	Serial number	1–6 # samples	7–13 # samples	14–20 # samples	21–30 # samples	Total
Uniaxial compressive strength (MPa)	1st	6.94	12.22	17.35	15.77	13.54
	2nd	22.10	3.72	11.41	16.95	13.60
	3rd	10.06	15.62	6.31	18.40	13.26
	4th	12.04	11.68	18.67	7.93	12.13
Jf (m−1)	1st	5.10	16.58	15.54	12.88	12.81
	2nd	19.82	6.38	15.06	12.28	13.06
	3rd	19.69	14.13	7.41	14.09	13.66
	4th	10.21	14.70	10.35	6.52	10.06

Note: The most suitable results, also the lowest percentage error are in bold.

As shown in Figures 11 and 12, there is relatively good consistency between the actual value and the value predicted by the four submodels. In addition, the UCS average percentage errors predicted by the four BP neural network-based submodels are 13.54%, 13.60%, 13.26%, and 12.13%, while the Jf average percentage errors predicted by the four submodels are 12.81%, 13.06%, 13.66%, and 10.06%, respectively. Compared with the model built without fuzzy C-means, there are no significant improvement in the prediction accuracy of the four submodels. Due to the significant imbalance in sample error, the prediction results vary considerably, although the average percentage errors of the four submodels are close to each other. In detail, each submodel is relatively good at predicting samples with high membership degree to the corresponding clusters. The 2nd submodel can be taken as an example. Samples from the 7th to 13rd clusters have higher membership degree to the 2nd cluster, while the error predicted by the 2nd submodel of their UCS and Jf is not more than 6.35% (13rd sample) and 10.20% (9th sample), respectively. The prediction results of the other three submodels also verify this rule. In other words, it is found that each submodel is good at predicting samples with high membership degree to the corresponding cluster, but the total accuracy still needs to be improved. However, the imbalance in sample error provides the possibility of improving accuracy through the weighted submodels.

After the four submodels have been weighted, a grouped BP neural network-based model is established. The ith sample can be taken as an example. Its membership degrees to the four clusters are recorded as , , , and , while the output of the ith sample predicted by the grouped model is calculated using Equation (6), and the prediction results of the 30 testing samples by the grouped BP neural networks-based model are shown in Figure 13.

As shown in Figure 13, the average percentage errors of the 30 testing samples predicted by the hybrid model are 7.66% (UCS) and 6.40% (Jf). Compared with the model without fuzzy C-means clustering, in which the error reaches 13.62% (UCS) and 12.38% (Jf), the accuracy of the model built by the proposed method shows a significant improvement. As mentioned above, there are considerable differences between the errors of different samples predicted by the four submodels. On the contrary, the results predicted by the hybrid model are relatively stable; the largest UCS and Jf error of the 30 samples are 12.29% (24th sample) and 13.18% (17th sample), respectively. The results confirm that the hybrid model can make full use of the advantages for the four submodels predicting different samples, and the prediction accuracy and stability have been improved, compared with the model without fuzzy C-means clustering.

5 DISCUSSION

5.1 Adaptability of fuzzy C-means to multiple machine learning algorithms

In Section 4.1, the 100 training samples are grouped into four clusters. Theoretically, the clustering results can be used in combination with multiple machine learning algorithms to improve the accuracy. In the 4th section of this paper, BP neural networks are used to verify the effect of the fuzzy C-means clustering in improving the prediction accuracy. In this section, the applicability of the fuzzy C-means to other machine learning algorithms is further discussed using two machine learning algorithms: SVR and RF. The verification process of the two algorithms is similar to that of the BP neural network. First, 100 training samples are directly solved by SVR and RF with equal weights in Weka 3.8.6, which is regarded as the original model and recorded as models “SVR” and “RF.” Then, according to the clustering results obtained by fuzzy C-means, four SVR- and RF-based submodels are established by samples with the weights of their membership degree. Finally, the four submodels are combined with the hybrid models, which are recorded as models “FCM-SVR” and “FCM-RF.” The SVR- and RF-based original models, submodels, and hybrid models are tested by the 30 testing samples, and their prediction accuracies are calculated (Figures 14 and 15).

In terms of SVR, there are three key hyperparameters, namely, the kernel function, the penalty parameter (C), and the insensitive parameter ( ). In this research, the most widely used kernel function, poly kernel, is adapted to build the models. In addition, C is used to control the weight between the accuracy and complexity of the model. Higher C always leads to a model with higher accuracy and complexity, and for this model, overfitting is easier. determines the smooth degree of the model, and only samples with error higher than this are used to update the model. By exhaustion, C and are set as 50 and 0.05, respectively.

The pure SVR-based model and the SVR-based model improved by the fuzzy C-means results show acceptable predictability. However, the UCS and Jf percentage errors predicted by the pure SVR-based model are 12.73% and 10.23%, while those predicted by the SVR-based model improved by fuzzy C-means results are 6.61% and 6.17%, respectively. In particular, there are 18 UCS samples and 16 Jf samples whose errors predicted by the pure SVR-based model are more than 10%. On the contrary, the errors of most samples predicted by the SVR-based model, which is improved by fuzzy C-means results, remain below 10%. Specifically, only five UCS samples and eight Jf samples have errors more than 10%. In addition, the R2 between the actual and predicted value of the pure SVR-based model is 0.915 (UCS) and 0.760 (Jf), while the R2 of the improved SVR-based model is higher, which is 0.970 (UCS) and 0.901 (Jf), respectively. These results show that the fuzzy C-means results do improve the prediction effect of the SVR model (Figure 14).

The main hyperparameters of RF include the number of decision tree, minimum samples to split, and maximum of features. Among them, the maximum of features consists of the features number, and it is set as 3 after pretreatment. The number of decision tree and minimum samples to split are set according to the sample size. Due to the fact that only 100 samples are used, the number of decision tree is determined to be 3, and the minimum samples to split is determined to be 10. The application of the fuzzy C-means clustering to the RF also shows similar characteristics as in the case of SVR. Combined with the clustering results of fuzzy C-means, the R2 of the UCS predicted result is improved from 0.868 to 0.966, while the R2 of Jf is improved from 0.740 to 0.911. In addition, the average percentage errors of UCS and Jf are reduced from 13.73% to 5.72% and from 15.47% to 7.09%, respectively (Figure 15). The results proved the effect of the fuzzy C-means clustering in improving the prediction accuracy of multiple machine learning models, including but not limited to BP neural network, SVR, and RF.

5.2 Differences between harden clustering and fuzzy clustering methods

The above results have proved that the fuzzy C-means clustering results have a positive influence on the prediction accuracy by machine learning models. It needs to be discussed whether harden clustering methods can replace the fuzzy C-means clustering method. Different from the fuzzy clustering method, in which the membership degree matrix is used to express the clustering results, in the harden clustering method, each sample clearly belongs to a certain cluster. Combined with the harden clustering method, the training and testing processes of the machine learning are partly changed. According to the results of harden clustering, the training and testing samples are divided into C clusters. In this section, the number of C is taken as 4 in order to maintain consistency with that of the fuzzy C-means clustering method used in this paper. In total, C clusters of training samples are trained to form C submodels by machine learning methods. Correspondingly, the testing samples are also grouped into C clusters, and each sample is tested by the corresponding submodel, while the average predicted errors of the 30 testing samples are used to measure the accuracy of models with harden clustering methods and compare with the one obtained by the fuzzy C-means clustering method.

In this section, K-means clustering is used, and its clustering results are combined with the above-mentioned machine learning methods. K-means clustering is one of the typical harden clustering methods, and its principle is not described in this paper. To verify the K-means clustering results, the clustering results obtained by fuzzy C-means are hardened, and the 100 training samples and 30 testing samples are listed according to their clusters. In detail, for each sample, the cluster with the largest membership degree is regarded as the cluster that the sample belongs to. As shown in Figure 9, for the 15th sample, which belongs to four clusters as an example, the membership degrees of the sample are 0.62, 0.08, 0.09, and 0.21. Therefore, the 15th sample is divided into the first clusters, and also for other samples. The training set results of harden fuzzy C-means are as follows: samples from the 1st to the 25th belong to cluster 1, samples from the 26th to the 49th belong to cluster 2, samples from the 50th to the 78th belong to cluster 3, and samples from the 79th to the 100th belong to cluster 4. Correspondingly, the 30 testing samples are also divided into the four clusters by harden fuzzy C-means. Specifically, samples from the 1st to the 6th belong to cluster 1, samples from the 7th to the 13th belong to cluster 2, samples from the 14th to the 20th belong to cluster 3, and samples from the 21st to the 30th samples belong to cluster 4.

By K-means clustering, the 100 training samples and 30 testing samples are divided into 4 clusters, and the clustering results are shown in Figure 16. There is considerable similarity between the cluster results of harden fuzzy C-means and those of K-means methods. In the 100 training samples, only the results of six samples (29th, 43rd, 82nd, 83rd, 85th, and 95th sample) are different, while in the 30 testing samples, only the results of one sample (30th) are different.

The above clustering results are used to replace the membership degree matrix in this paper. The results are combined with machine learning methods in order to measure the difference between harden clustering methods and the fuzzy clustering one in improving the prediction accuracy. According to the harden clustering result, the 100 training samples are used to train four submodels, and corresponding testing samples are tested by them. The harden fuzzy C-means can be taken as an example. The training samples from the 1st to the 25th all belong to cluster 1, and are used to train a submodel, called the 1st submodel. Then, the 1st submodel is applied to test the testing samples from the 1st to the 6th, and the other test samples are tested by corresponding submodels. The average percentage error of the 30 testing samples is used to measure the accuracy of the prediction model. The prediction results by the BP neural network combined with the K-means clustering are shown in Figure 17.

As shown in Figure 17, the actual and predicted UCS and Jf show relatively high consistency on the whole. In addition, the prediction abilities of the four submodels are similar. The average percentage error of the four clusters of UCS is 12.54%, 11.27%, 13.64%, and 11.08%, respectively, while that of Jf is 12.38%, 12.78%, 10.75%, and 10.95%. The average percentage error of predicted UCS and Jf is 12.02% and 11.68%, respectively. Compared with the normal BP neural network-based model (the average percentage error of UCS and Jf being 13.62% and 12.38%, respectively), the accuracy has been improved slightly, but not significantly. However, the BP neural network-based model combined with fuzzy C-means clustering yields higher accuracy (the average percentage error of UCS and Jf being 7.66% and 6.40%, respectively). This proves that the fuzzy clustering methods, including the fuzzy C-means clustering, are more suitable to be combined with machine learning methods so as to improve their accuracy than harden clustering methods, such as K-means clustering.

To further verify the above conclusion, it is attempted to combined K-means clustering with SVR and RF, and their prediction results are compared with the one predicted by pure machine learning methods and methods combined with fuzzy C-means clustering. The average percentage errors of the prediction results are listed in Table 3.

Table 3. Average percentage errors of the prediction results.

Method	Prediction target	Pure method (%)	Cluster 1	Cluster 2	Cluster 3	Cluster 4	Total	Combined with fuzzy C-means (%)
			Combined with K-means (%)
BPNN	Uniaxial compressive strength (UCS) (MPa)	13.62	12.54	11.27	13.64	11.08	12.02	7.66
BPNN	Jf (m−1)	12.38	12.38	12.78	10.75	10.95	11.68	6.40
Support vector regression	UCS (MPa)	12.73	10.06	13.20	12.94	12.86	12.41	6.61
Support vector regression	Jf (m−1)	10.23	10.03	10.27	12.09	12.70	11.38	6.17
Random forest	UCS (MPa)	13.73	11.52	10.42	14.01	11.20	11.71	5.72
Random forest	Jf (m−1)	15.74	14.68	14.57	14.68	10.21	13.31	7.09

Note: Bold values are significant for proving the effect of the proposed method.

As can be seen from the results listed in Table 3, the results of SVR and SF are similar to those of BP neural networks. After the pure machine learning models were combined with K-means clustering, the accuracy is mostly improved by 1–3 percentiles; even the prediction accuracy of some models is reduced, such as the SVR-based joint frequency prediction model. When combined with K-means clustering, the error increases from 10.23% to 11.38%. On the contrary, the accuracies of all the machine learning models combined with fuzzy C-means clustering are significantly improved.

Theoretically, after K-means clustering, the 100 training samples are divided into four clusters in this paper, and each submodel is trained by only about 20–30 samples. Although the training samples after clustering have higher pertinence for training submodels, the training sample number is reduced, which may have a negative influence on prediction accuracy. The field rock mass samples, which are hard to collect, are always only tens and hundreds of series. Therefore, harden clustering methods may result in too few samples for training submodels. Different from harden clustering methods, fuzzy C-means only changes the training weight of each sample, but the training sample number is not reduced. Therefore, in theory, fuzzy clustering methods are more suitable for improving the prediction accuracy of machine learning methods than harden clustering methods, especially when the number of training samples is limited, which has also been proved by the results presented in this section.

6 CONCLUSION

A clustered method learning method to predict rock mass parameters based on fuzzy C-means clustering is proposed in this paper. Using this method, target stratum samples are divided into several clusters by fuzzy C-means clustering, and multiple submodels are trained by samples in different clusters with the input of pretreated TBM tunneling data and the output of rock mass parameters. Each testing sample or newly encountered tunneling condition can be predicted by multiple submodels with the weight of the membership degree of the sample to each cluster. The proposed method is realized by 100 training samples and verified by 30 testing samples collected from the C1 part of the Pearl Delta water resources allocation project. The main conclusions are summarized as follows:

1.
After multiple attempts, the samples used in this paper are divided into four clusters, and the membership degree of each sample to each cluster is obtained. Training samples are weighted according to their membership degree to each cluster, and four corresponding BP neural network-based submodels are built and used separately on the 30 testing samples. According to the results, it is found that each submodel is good at predicting samples with high membership degree to the corresponding cluster, but the total accuracies are close to the accuracy of the model built without fuzzy C-means clusters, and still need to be improved.
2.
With the four submodels weighted, a hybrid BP neural network-based model is established. The average percentage errors of the 30 testing samples predicted by the hybrid model are 7.66% (UCS) and 6.40% (Jf), while the error of pure BP neural network models are 13.62% (UCS) and 12.38% (Jf). The results show that the hybrid model can make full use of the advantages of the four submodels predicting different samples, and the prediction accuracy and stability are improved compared with the model without fuzzy C-means clustering.
3.
The applicability of the fuzzy C-means to other machine learning algorithms, with SVR and RF as examples, is studied in this paper. Similar results are obtained on SVR and RF as the BP neural network. After being combined with fuzzy C-means clustering, the UCS and Jf percentage error predicted by SVR is reduced from 12.73% to 6.61% and from 10.23% to 6.17%, while those of RF are also reduced from 13.73% to 5.72% and from 15.47% to 7.09%, respectively. The results prove the effect of the fuzzy C-means clustering results in improving the prediction accuracy of multiple machine learning models, including but not limited to BP neural network, SVR, and RF.
4.
To compare the effect of harden and fuzzy clustering in improving the prediction accuracy of machine learning algorithms, a typical harden clustering method, K-means clustering, is used and combined with the mentioned three machine learning methods. According to the tested results, compared with pure machine learning models, the prediction accuracy of models combined with K-means clusters is improved slightly, but not significantly; the accuracy of some models (Jf prediction model based on SVR) is even slightly reduced. This proves that the fuzzy clustering methods are more suitable for combining with machine learning methods to improve their accuracy than harden clustering methods.

AUTHOR CONTRIBUTIONS

Ruirui Wang: Field data collection and main manuscript organization. Yaodong Ni: Proposed algorithm programming. Lingli Zhang: Result processing and image rendering. Boyang Gao: Inspection and correction of the proposed program.

ACKNOWLEDGMENTS

This research was supported by the Natural Science Foundation of Shandong Province (No. ZR202103010903) and the Doctoral Fund of Shandong Jianzhu University (No. X21101Z).

CONFLICT OF INTEREST STATEMENT

All author of this manuscript disclosed no relevant relationships.

Biography

Ruirui Wang was graduated from Shandong University in December 2020, with a doctorate in civil engineering. Currently, he is a lecturer in the Department of Intelligent Construction at the School of Civil Engineering, Shandong Jianzhu University, mainly engaged in research of tunnel construction and intelligent tunneling of TBM, covering multiple disciplines such as geotechnical engineering, artificial intelligence, mechanical manufacturing, and numerical simulation. So far, he has published 10 SCI papers, including six papers as the first author, and eight national invention patents have been approved in China.