A Study on a Slope Stability Evaluation Model Using Machine Learning

Author(s)
노정두
Issued Date
2022
Abstract
In Korea, where more than 70% of the land is mountainous, slopes are vulnerable to topographical and meteorological influences. Traditional methods for determining slope stability include theoretical, statistical, and numerical methods. Recently, technologies such as artificial intelligence and big data have been introduced, and many studies using them have been conducted. However, most of these studies are difficult to apply generally because they use few data and variables. Therefore, this study proposes a prediction model for slope stability using more than 30,000 slope records and XGBoost, LightGBM, and CatBoost, which have been recognized for their performance in various artificial intelligence competitions. To this end, first, the input variables and the output variable were selected through data pre-processing and statistical verification. Second, to understand the characteristics of the slope investigation data and to improve the performance of the prediction model, analysis cases were set according to the data types of the input variables. Third, prediction models for slope stability using XGBoost, LightGBM, and CatBoost were compared. Finally, the results were summarized, and the prediction model for slope stability was cross-checked and proposed.
The slope data, composed of 84 variables, can be divided into basic surveys, which are objective, and detailed surveys, which are subjective. From these data, 31 input variables and 1 output variable were selected by performing outlier removal, binning, correlation analysis, statistical verification, and logistic regression. To analyze and compare feature importance and the prediction models for slope stability, the analysis was divided into Case-1, Case-2, and Case-3 according to the input data. Case-1 consisted of 9 numerical variables, Case-2 of 22 categorical variables, and Case-3 of all 31 numerical and categorical variables. For each analysis case and machine learning model, training performance, AUC, prediction performance, and feature importance were estimated. The ratio of training data to test data was set to 7:3, and Bayesian optimization was used for hyperparameter tuning. To prevent overfitting, k-fold cross-validation was used.
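The 7:3 split and k-fold cross-validation described above can be illustrated with a minimal sketch in plain Python. The function names, the random seed, the sample count of 30,000, and the choice of k = 5 are assumptions made for illustration only; the abstract does not state the value of k or how the split was implemented.

```python
import random

def train_test_split_indices(n_samples, test_ratio=0.3, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_ratio)
    return indices[n_test:], indices[:n_test]

def k_fold_indices(indices, k=5):
    """Yield (fit, validation) index lists; every index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation

# Hypothetical sizes: 30,000 records split 7:3, then 5 folds on the training part.
train_idx, test_idx = train_test_split_indices(30000)
folds = list(k_fold_indices(train_idx, k=5))
```

In such a setup the test indices are held out entirely, and hyperparameter candidates (for example, those suggested by a Bayesian optimizer) would be scored only on the validation folds of the training part.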
The prediction model for slope stability using XGBoost yielded an AUC of 0.668 for Case-1, 0.771 for Case-2, and 0.770 for Case-3. Training and prediction performance were good for Case-1, Case-2, and Case-3. However, an imbalance between precision and recall occurred in the predictions. In Case-1, feature importance was similar across most variables, except for berm and valley. In Case-2, feature importance was relatively higher for topography, bedrock shape, stability method 2, and weathering than for other variables. In Case-3, feature importance was higher for slope length, angle of upper slope, and soil depth than for other variables.
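To make the reported metrics concrete, the following minimal sketch computes precision, recall, and AUC in plain Python. The labels and scores are made-up toy values, not data from the thesis; the function names, the 0.5 threshold, and the convention that 1 marks the positive class are assumptions for illustration.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical values): four positives, four negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.3, 0.7, 0.35, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Here precision and recall differ at the chosen threshold even though AUC, which does not depend on a threshold, summarizes the ranking quality in a single number. A gap between precision and recall of this kind is what the abstract refers to as prediction imbalance.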
The prediction model for slope stability using LightGBM yielded an AUC of 0.681 for Case-1, 0.766 for Case-2, and 0.783 for Case-3. Training and prediction performance were favorable for Case-1, Case-2, and Case-3. Additionally, no prediction imbalance occurred. In Case-1, feature importance was similar across most variables, except for berm and valley. In Case-2, feature importance was relatively higher for bedrock shape and topography than for other variables. In Case-3, feature importance was higher for soil depth, angle of upper slope, slope angle, slope length, slope height, and distance from the road than for other variables.
The prediction model for slope stability using CatBoost yielded an AUC of 0.672 for Case-1, 0.777 for Case-2, and 0.794 for Case-3. Training and prediction performance for Case-1, Case-2, and Case-3 were the highest among the three models. In Case-1, feature importance was higher for soil depth, angle of upper slope, and slope height than for other variables. In Case-2, feature importance was higher for soil depth, the direction of discontinuities, the shape of the slope side, and the type of discontinuities than for other variables.
Although XGBoost showed good performance in training and prediction, there was an imbalance between precision and recall. With XGBoost, feature importance was relatively higher for the numerical data than for the categorical data. LightGBM showed good performance in training and prediction, and no imbalance between precision and recall occurred. With LightGBM, feature importance was also relatively higher for the numerical data than for the categorical data. CatBoost showed the best performance in training and prediction, but it overfitted the training data. With CatBoost, feature importance was relatively higher for the categorical data than for the numerical data.
Taken together, the prediction performance of the models shows that categorical data played a more important role than numerical data in evaluating slope stability. However, categorical data were also more likely to cause overfitting during training. It is therefore considered that XGBoost, LightGBM, and CatBoost can all be used as prediction models for slope stability; among them, LightGBM is deemed to predict slope stability most stably, and CatBoost most accurately.
Alternative Title
Model for stability evaluation on a slope using machine learning
Alternative Author(s)
Jeongdu Noh
Affiliation
Chosun University, General Graduate School
Department
General Graduate School, Department of Advanced Energy and Resources Engineering
Advisor
강성승
Awarded Date
2022-08
Table Of Contents
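Contents
List of tables
List of figures
Abstract

1. Introduction
1.1 Research Background and Purpose
1.2 Research Trends in Slope Stability

2. Analysis Models
2.1 Machine Learning
2.1.1 Machine Learning Algorithms
2.1.2 Evaluation Metrics for Classification Models
2.1.3 Hyperparameter Optimization
2.2 XGBoost
2.2.1 Theoretical Background of XGBoost
2.2.2 XGBoost Training Parameters
2.3 LightGBM
2.3.1 Theoretical Background of LightGBM
2.3.2 LightGBM Training Parameters
2.4 CatBoost
2.4.1 Theoretical Background of CatBoost
2.4.2 CatBoost Training Parameters

3. Data Analysis
3.1 Slope Survey Data
3.2 Variable Binning and Encoding of Categorical Data
3.2.1 Variable Binning
3.2.2 Variable Labeling
3.3 Outlier Removal and Scaling of Numerical Data
3.3.1 Outlier Removal
3.3.2 Scaling
3.4 Cross-Tabulation Analysis
3.5 Correlation Analysis
3.6 Setting the Analysis Steps

4. Results
4.1 XGBoost Prediction Model
4.1.1 Hyperparameters of the XGBoost Slope Stability Prediction Model
4.1.2 XGBoost Slope Stability Prediction Model: Case-1
4.1.3 XGBoost Slope Stability Prediction Model: Case-2
4.1.4 XGBoost Slope Stability Prediction Model: Case-3
4.1.5 Results of the XGBoost Slope Stability Prediction Model
4.2 LightGBM Prediction Model
4.2.1 Hyperparameters of the LightGBM Slope Stability Prediction Model
4.2.2 LightGBM Slope Stability Prediction Model: Case-1
4.2.3 LightGBM Slope Stability Prediction Model: Case-2
4.2.4 LightGBM Slope Stability Prediction Model: Case-3
4.2.5 Results of the LightGBM Slope Stability Prediction Model
4.3 CatBoost Prediction Model
4.3.1 Hyperparameters of the CatBoost Slope Stability Prediction Model
4.3.2 CatBoost Slope Stability Prediction Model: Case-1
4.3.3 CatBoost Slope Stability Prediction Model: Case-2
4.3.4 CatBoost Slope Stability Prediction Model: Case-3
4.3.5 Results of the CatBoost Slope Stability Prediction Model

5. Discussion
5.1 Results of the Slope Stability Prediction Models by Machine Learning Method
5.2 Slope Management Measures through Visualization of Slope Survey Data

6. Conclusions

References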
Degree
Doctor
Publisher
Chosun University Graduate School
Citation
노정두 (Jeongdu Noh). (2022). A Study on a Slope Stability Evaluation Model Using Machine Learning.
Type
Dissertation
URI
https://oak.chosun.ac.kr/handle/2020.oak/18517
http://chosun.dcollection.net/common/orgView/200000632366
Appears in Collections:
General Graduate School > 4. Theses(Ph.D)