
Research on a geological entity relation extraction model for gold mine based on BERT

黄徐胜 朱月琴 付立军 刘雨江 唐珂珂 李金

Citation: HUANG Xusheng, ZHU Yueqin, FU Lijun, et al., 2021. Research on a geological entity relation extraction model for gold mine based on BERT. Journal of Geomechanics, 27 (3): 391-399. DOI: 10.12090/j.issn.1006-6616.2021.27.03.035


doi: 10.12090/j.issn.1006-6616.2021.27.03.035
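Funds: 

National Natural Science Foundation of China 41872253

National Key Research and Development Program of China 2018YFC1505501

Geological Survey Project of the China Geological Survey DD20190318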

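Details
    About the author:

    HUANG Xusheng (born 1994), male, master's student, working on natural language processing and applications of artificial intelligence in geology. E-mail: huangxusheng18@mails.ucas.ac.cn

    Corresponding author:

    ZHU Yueqin (born 1987), female, professor-level senior engineer, working on geological big data and on research and applications of artificial intelligence in the geosciences. E-mail: yueqinzhu@163.com

  • CLC number: P628.4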


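  • Abstract: Intelligent recognition of gold-mine entity relations is an important way to improve the analysis and mining of gold-mine literature and the extraction of knowledge from it. To address the core problems of gold-mine entity relation extraction, namely complex entity relations and scarce manually annotated data, this study proposes a distantly supervised relation extraction model based on BERT (Bidirectional Encoder Representations from Transformers). Optimizing the modules for gold-mine geological data encoding, gold-mine classification, and gold-mine geological entity filtering improves the accuracy of gold-mine geological entity relation extraction. Finally, entity relation extraction experiments on gold-mine literature data verify the effectiveness of the method.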
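The distant-supervision idea named in the abstract can be illustrated with a small sketch. It shows only the general labeling heuristic (a sentence that mentions both entities of a known triple is labeled with that triple's relation), not the authors' pipeline. The knowledge-base triple, the sentences, and the names `Triple` and `label_sentences` are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def label_sentences(sentences, triples):
    """Label every sentence that mentions both entities of a known triple.

    Sentences matching no triple are skipped. Wrong labels are possible,
    because co-occurrence does not guarantee the relation is expressed;
    handling that noise is left to the downstream model.
    """
    labeled = []
    for sentence in sentences:
        for t in triples:
            if t.head in sentence and t.tail in sentence:
                labeled.append((sentence, t.head, t.tail, t.relation))
    return labeled


# Hypothetical knowledge base and corpus, for illustration only.
kb = [Triple("Jiaojia gold deposit", "located_in", "Jiaodong Peninsula")]
corpus = [
    "The Jiaojia gold deposit is in the Jiaodong Peninsula.",
    "Gold mineralization is often controlled by fault structures.",
]

examples = label_sentences(corpus, kb)
for sentence, head, tail, relation in examples:
    print(f"{relation}({head}, {tail}) <- {sentence}")
```

Plain substring matching keeps the sketch short; a real system would match entities found by a recognizer rather than raw strings.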

     

  • Figure 1. Framework of distant supervision

    Figure 2. Distantly supervised relation extraction model

    Figure 3. Ontology diagram

    Figure 4. Categories of entity relations

    Figure 5. Precision-recall (PR) curves of the models on the NYT dataset

    Figure 6. Precision-recall (PR) curves of the models on the geological dataset

    Figure 7. Extraction results of the proposed BERT-based model

    Table 1. Experimental parameters

    | Parameter | Symbol | Value |
    |---|---|---|
    | Batch size (Batch_size) | B | 8 |
    | Learning rate, Adam (Learning_rate) | λ | 2e-5 |
    | Number of epochs | E | 6 |
    | Dropout rate | P | 0.1 |
    | Max sentence length | ML | 384 |
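For reference, the settings in Table 1 can be collected in a small configuration object. This is only a sketch: the class and field names are illustrative, and how the authors passed these values to their training code is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters as listed in Table 1 (field names are illustrative)."""

    batch_size: int = 8             # B
    learning_rate: float = 2e-5     # lambda, listed with the Adam optimizer
    num_epochs: int = 6             # E
    dropout_rate: float = 0.1       # P
    max_sentence_length: int = 384  # ML


config = ExperimentConfig()
print(config)
```

Freezing the dataclass keeps the values from being changed by accident during a run.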

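    Table 2. Extraction performance of the models on the NYT dataset

    AUC: area under the receiver operating characteristic curve. P@N: precision of the top N predictions (%). Avg Prec: average precision (%).

    | Model | AUC | P@100 | P@200 | P@300 | P@500 | P@1000 | P@2000 | P@5000 | Avg Prec, Top 300 | Avg Prec, Top 1000 | Avg Prec, Top 5000 |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | DenseNet | 0.34 | 81.0 | 69.5 | 68.7 | 61.4 | 51.6 | 39.5 | 22.4 | 73.1 | 66.4 | 56.3 |
    | ResNet | 0.10 | 54.0 | 50.0 | 48.0 | 43.0 | 31.0 | 19.0 | 9.9 | 50.7 | 45.2 | 36.4 |
    | PCNN+ATT | 0.32 | 74.0 | 67.5 | 64.3 | 59.8 | 48.7 | 37.2 | 22.3 | 68.6 | 62.9 | 53.4 |
    | Proposed model | 0.65 | 98.0 | 96.0 | 94.3 | 92.6 | 91.1 | 80.9 | 67.0 | 96.1 | 94.4 | 88.6 |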
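The average-precision columns in Table 2 appear to be the arithmetic mean of the P@N values up to the given cutoff (Top 300 averages P@100, P@200, and P@300, and so on). That reading is an inference from the numbers and is not stated on this page. The check below, with hypothetical names, compares the recomputed means with the reported values.

```python
CUTOFFS = [100, 200, 300, 500, 1000, 2000, 5000]

# P@N values (%) copied from Table 2, in the order of CUTOFFS.
P_AT_N = {
    "DenseNet": [81.0, 69.5, 68.7, 61.4, 51.6, 39.5, 22.4],
    "ResNet": [54.0, 50.0, 48.0, 43.0, 31.0, 19.0, 9.9],
    "PCNN+ATT": [74.0, 67.5, 64.3, 59.8, 48.7, 37.2, 22.3],
    "Proposed model": [98.0, 96.0, 94.3, 92.6, 91.1, 80.9, 67.0],
}

# Reported Avg Prec (%) for Top 300, Top 1000, Top 5000, copied from Table 2.
REPORTED = {
    "DenseNet": [73.1, 66.4, 56.3],
    "ResNet": [50.7, 45.2, 36.4],
    "PCNN+ATT": [68.6, 62.9, 53.4],
    "Proposed model": [96.1, 94.4, 88.6],
}


def mean_precision(values, top):
    """Mean of the P@N values whose cutoff N does not exceed `top`."""
    kept = [p for n, p in zip(CUTOFFS, values) if n <= top]
    return sum(kept) / len(kept)


def max_deviation():
    """Largest absolute gap between recomputed and reported averages."""
    worst = 0.0
    for model, values in P_AT_N.items():
        for top, reported in zip((300, 1000, 5000), REPORTED[model]):
            worst = max(worst, abs(mean_precision(values, top) - reported))
    return worst


print(f"largest deviation: {max_deviation():.3f} percentage points")
```

A deviation below the rounding step of the table (0.05) is what one would expect if the columns were computed this way.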

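    Table 3. Extraction performance of the methods on the geological dataset

    AUC: area under the receiver operating characteristic curve. P@N: precision of the top N predictions (%). Avg Prec: average precision (%).

    | Model | AUC | P@100 | P@200 | P@300 | P@500 | P@1000 | P@2000 | P@5000 | Avg Prec, Top 300 | Avg Prec, Top 1000 | Avg Prec, Top 5000 |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | DenseNet | 0.40 | 88.0 | 54.5 | 39.7 | 33.8 | 23.5 | 15.8 | 8.0 | 60.7 | 47.9 | 37.6 |
    | ResNet | 0.34 | 70.0 | 52.5 | 40.3 | 30.6 | 23.6 | 15.3 | 7.8 | 54.3 | 43.4 | 34.3 |
    | PCNN+ATT | 0.60 | 99.0 | 81.0 | 63.3 | 50.0 | 30.5 | 18.3 | 8.1 | 81.1 | 64.8 | 50.0 |
    | Proposed model | 0.75 | 100.0 | 100.0 | 99.3 | 98.4 | 98.6 | 96.1 | 93.1 | 99.8 | 99.3 | 97.9 |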
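Precision at N (P@N), the metric reported in Tables 2 and 3, is the share of correct predictions among the N highest-scoring ones. The sketch below shows that standard definition on made-up scores. The function name and data are hypothetical and do not come from the paper's evaluation code.

```python
def precision_at_n(scored_predictions, n):
    """Return P@N in percent.

    scored_predictions: iterable of (confidence, is_correct) pairs.
    If fewer than n predictions exist, all of them are used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Rank by confidence, highest first, and keep the top n.
    ranked = sorted(scored_predictions, key=lambda pair: pair[0], reverse=True)
    top = ranked[:n]
    if not top:
        raise ValueError("no predictions to evaluate")
    correct = sum(1 for _, is_correct in top if is_correct)
    return 100.0 * correct / len(top)


# Made-up predictions: (model confidence, whether the relation is correct).
predictions = [
    (0.60, True),
    (0.95, True),
    (0.30, False),
    (0.90, True),
    (0.80, False),
]

for n in (2, 4, 5):
    print(f"P@{n} = {precision_at_n(predictions, n):.1f}%")
```

Because the list is sorted inside the function, the input order of the predictions does not matter.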
Publication history
  • Received: 2020-11-20
  • Revised: 2021-01-10
  • Published: 2021-06-28
