Abstract:
Structural equation modeling (SEM) is a method for establishing, estimating, and testing causal relationships. It can take the place of multiple regression, path analysis, factor analysis, analysis of covariance, and other methods, and it allows a clear analysis of both the effect of individual indicators on the whole and the relationships among individual indicators. SEM is a multivariate statistical modeling technique applied mainly to confirmatory factor analysis models. Because it measures latent variable scores through observable variables and analyzes the synergistic effects among latent variables through different sub-models, SEM is widely used for data modeling and analysis in psychology, behavioral science, and marketing, where it provides a mature workflow: propose the concept, design the model, obtain the data, and verify the model. Geoscience data modeling has long been a research hotspot in geoscience. Its purpose is to extract valuable model structures and latent variables from massive, multi-dimensional, high-dimensional, and multi-temporal geodata, and to study the interactions among geo-variables and latent variables, in support of applications and research such as environmental governance, disaster prevention, resource prospecting, and ecological evaluation. As geoscience data have grown in scale and modeling tools have continued to develop, geoscience data modeling has gradually shifted in four respects: in data, from sampling to the full sample; in method, from modeling guided by geological models to unconstrained or weakly constrained modeling; in basis, from causality between variables to correlation between variables; and in complexity, from a single model or process to comprehensive multi-model, multi-process modeling. SEM is a comprehensive modeling method that can incorporate multiple analysis techniques, such as factor analysis, latent variable estimation, and path analysis.
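To make the measurement side of SEM concrete, the following minimal sketch fits the simplest confirmatory factor model: one latent variable measured by three observed indicators. It is an illustration under stated assumptions, not the authors' procedure. The factor variance is fixed at 1, all loadings are assumed positive, and the helper names (`implied_cov`, `ml_discrepancy`, `fit_three_indicator_factor`) are hypothetical. With three indicators the model is just-identified (6 variances and covariances, 6 free parameters), so each loading follows in closed form from the off-diagonal covariances (s_ij = λ_i λ_j). The maximum-likelihood discrepancy between the sample and model-implied covariance matrices is then zero.

```python
import numpy as np


def implied_cov(loadings, factor_var, unique_vars):
    """Model-implied covariance of a one-factor CFA: Sigma = phi * lam lam^T + diag(theta)."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    return factor_var * (lam @ lam.T) + np.diag(np.asarray(unique_vars, dtype=float))


def ml_discrepancy(S, Sigma):
    """ML fit function F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p (0 means perfect fit)."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p


def fit_three_indicator_factor(S):
    """Closed-form estimates for a just-identified one-factor model with 3 indicators.

    Assumes factor variance phi = 1 and positive loadings, so that the
    off-diagonal covariances satisfy s_ij = lam_i * lam_j.
    """
    s12, s13, s23 = S[0, 1], S[0, 2], S[1, 2]
    lam = np.sqrt(np.array([
        s12 * s13 / s23,
        s12 * s23 / s13,
        s13 * s23 / s12,
    ]))
    theta = np.diag(S) - lam ** 2  # unique (error) variances of the indicators
    return lam, theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_lam = np.array([0.8, 0.7, 0.6])
    eta = rng.standard_normal(5000)                                      # latent variable scores
    noise = rng.standard_normal((5000, 3)) * np.sqrt(1 - true_lam ** 2)  # unit total variance
    X = eta[:, None] * true_lam + noise                                  # observed indicators
    S = np.cov(X, rowvar=False)
    lam_hat, theta_hat = fit_three_indicator_factor(S)
    print("loadings:", lam_hat.round(3))
    print("unique variances:", theta_hat.round(3))
    print("F_ML:", ml_discrepancy(S, implied_cov(lam_hat, 1.0, theta_hat)))
```

In a full SEM the same discrepancy function is minimized numerically over all free parameters of the measurement and structural sub-models. The closed form here works only because this toy model has zero degrees of freedom.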
As a multi-level, multi-branch modeling method, SEM combines the characteristics of knowledge-driven and data-driven modeling. When applied to geoscience data, SEM generally faces three challenges, which are also three shifts: from a method oriented mainly toward confirmatory modeling and analysis to one suited to exploratory modeling and analysis; from construction under complete geological model constraints to data modeling under weak or no geological model constraints; and from modeling statistical variables without spatial attributes to modeling spatial statistical variables. These shifts place new requirements on both the model itself and the data modeling method. To address these three issues, this article reviews the concept and development of SEM and presents three applications of SEM to geological data modeling. The first uses lake sediment geochemical data to extract endogenous mineralization factors for gold deposits, modeled under weak constraints. The second uses the comprehensive parameter optimization of SEM to weaken and correct the conditional independence (CI) problem of the weights-of-evidence method when calculating posterior probabilities for gold prospecting, by matching the calculated posterior probability to the observed posterior probability. The third uses SEM to study forest protection strategies in the Magdalena watershed in Mexico: by numbering the forest blocks in different regions, the spatially distributed data are transformed into conventional statistical variables without spatial attributes, and the effects of different environmental strategies on forest protection are analyzed.
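The second application concerns the conditional independence (CI) assumption of weights of evidence. As background, the sketch below implements the standard binary weights-of-evidence calculation and a simple overall CI check. It does not reproduce the SEM-based correction, whose details are not given in this abstract. It assumes that evidence layers and known deposits are Boolean arrays over equal-area unit cells and that every layer has nonzero counts in each class. The function names are hypothetical. The posterior logit is the prior logit plus W+ or W- for each layer. Summing the posterior probabilities over all cells gives the predicted number of deposits T, and an observed-to-predicted ratio n/T noticeably below 1 is commonly read as a sign that CI is violated.

```python
import numpy as np


def weights_of_evidence(evidence, deposits):
    """Return (W+, W-) for one binary evidence layer over unit cells."""
    B = np.asarray(evidence, dtype=bool)
    D = np.asarray(deposits, dtype=bool)
    p_b_given_d = (B & D).sum() / D.sum()        # P(B | D)
    p_b_given_nd = (B & ~D).sum() / (~D).sum()   # P(B | not D)
    w_plus = np.log(p_b_given_d / p_b_given_nd)
    w_minus = np.log((1 - p_b_given_d) / (1 - p_b_given_nd))
    return w_plus, w_minus


def posterior_probability(layers, deposits):
    """Posterior P(D | evidence) per cell, assuming layers are conditionally independent given D."""
    D = np.asarray(deposits, dtype=bool)
    prior = D.mean()
    logit = np.full(D.shape, np.log(prior / (1 - prior)))  # prior log-odds in every cell
    for layer in layers:
        B = np.asarray(layer, dtype=bool)
        w_plus, w_minus = weights_of_evidence(B, D)
        logit += np.where(B, w_plus, w_minus)               # add W+ where present, W- where absent
    return 1.0 / (1.0 + np.exp(-logit))


def ci_ratio(posterior, deposits):
    """Observed deposits n divided by predicted deposits T = sum of posterior probabilities."""
    return np.asarray(deposits, dtype=bool).sum() / posterior.sum()


if __name__ == "__main__":
    deposits = np.zeros(100, dtype=bool)
    deposits[:10] = True                 # 10 known deposits in 100 cells
    layer = np.zeros(100, dtype=bool)
    layer[:6] = True                     # layer covers 6 deposit cells ...
    layer[10:24] = True                  # ... and 14 barren cells
    one = posterior_probability([layer], deposits)
    two = posterior_probability([layer, layer.copy()], deposits)
    print("n/T with one layer:", ci_ratio(one, deposits))
    print("n/T with a duplicated layer:", ci_ratio(two, deposits))
```

With a single layer, n/T equals 1 up to floating-point rounding, because the posterior reproduces the observed deposit frequency in each class. Duplicating the layer, an extreme CI violation, inflates T to about 14.4 and lowers n/T to about 0.69.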