Construction exploration and application prospect of the large model in mining industry (矿山行业大模型建设路径探索与应用展望)

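Coal is the ballast stone of energy security. In the current context of accelerating the development of the digital economy and actively yet prudently advancing the "dual carbon" goals, the coal industry urgently needs to deepen its digital transformation and intelligent construction. Against this background, introducing large model technology to empower coal industry applications, making full use of the industry's massive body of knowledge and data, and accelerating the industry's digital development have become a focus of attention. This paper reviews the development status of general-purpose large model technology, describes the application status and results of large models across multiple fields, and introduces key technologies for industry large models, including data processing (cleaning, balancing, augmentation, etc.), text tokenization, pre-training and fine-tuning, prompt optimization, vector embedding, alignment, and retrieval-augmented generation. It shows that industry large models inherit the "general" strengths of general-purpose large models while also being "specialized", ...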

Bibliographic Details
Published in Coal Science and Technology (煤炭科学技术), Vol. 52, no. 11, pp. 45-59
Main Author WANG Haijun (王海军)
Format Journal Article
Language Chinese
Published 2024-11-01
煤炭智能开采与岩层控制全国重点实验室, Beijing 100013
煤炭科学研究总院有限公司矿山人工智能研究院, Beijing 100013
天地科技股份有限公司, Beijing 100013
ISSN 0253-2336
DOI 10.12438/cst.2024-1382

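Abstract Coal is the ballast stone of energy security. In the current context of accelerating the development of the digital economy and actively yet prudently advancing the "dual carbon" goals, the coal industry urgently needs to deepen its digital transformation and intelligent construction. Against this background, introducing large model technology to empower coal industry applications, making full use of the industry's massive body of knowledge and data, and accelerating the industry's digital development have become a focus of attention. This paper reviews the development status of general-purpose large model technology, describes the application status and results of large models across multiple fields, and introduces key technologies for industry large models, including data processing (cleaning, balancing, augmentation, etc.), text tokenization, pre-training and fine-tuning, prompt optimization, vector embedding, alignment, and retrieval-augmented generation. It shows that industry large models inherit the "general" strengths of general-purpose large models while also being "specialized", and that they play an important role in driving productivity innovation and industrial upgrading. The paper analyzes in depth the challenges facing large model applications in the coal industry, including high R&D investment costs, the difficulty of collecting high-quality data, and the technical difficulty of multimodal data fusion. It then summarizes in detail, across six layers (infrastructure; data resources; algorithms and models; application services; security, trustworthiness and testing; and industry ecosystem), the construction path that the SolStone Mine Large Model (太阳石矿山大模型) has taken to meet these challenges and the phased results achieved so far. Finally, it looks ahead to the production and technological changes that large model technology will bring to the coal industry, and argues that large model construction in the mining industry should follow a path that combines open-source models with industry data, use large models as tools to empower business scenarios, and build an application ecosystem that integrates industry, academia, research, and application, thereby supporting the development of new quality productive forces in the mining industry.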
Abstract_FL Coal is the cornerstone of energy security. Against the current background of accelerating the development of the digital economy and actively and steadily promoting the "dual carbon" goal, the coal industry urgently needs to deepen digital transformation and intelligent construction. In this context, exploring the introduction of large model technology to empower coal industry applications, making full use of the industry's massive knowledge data, and accelerating the digital development of the coal industry have become the focus of industry attention. On this basis, this paper sorts out the development status of generative large model technology, expounds the application status and effectiveness of large model technology in multiple fields, and introduces the key technologies of industry large models, such as data processing (cleaning, balancing, enhancement, etc.), text tokenization, pre-training and fine-tuning, prompt optimization, vector embedding, alignment, and retrieval-augmented generation. It demonstrates that the industry large model inherits the "general" advantages of the general large model while also having the characteristics of "specialization". This paper deeply analyzes the challenges that large model technology faces in the coal industry: high R&D investment cost, difficulty in collecting high-quality data, and the high technical difficulty of multimodal data fusion. It summarizes in detail the construction path adopted by the SolStone Mine Large Model to cope with these challenges, and the phased results achieved, from six aspects: the infrastructure layer, the data resource layer, the algorithm model layer, the application service layer, the security, trustworthiness and testing layer, and the industry ecosystem layer. Finally, it looks forward to the production and technological changes that the development of large model technology will bring to the coal industry. It is pointed out that the construction of large models in the mining industry should follow the path of combining open-source models with industry data, give full play to the tool attributes of large models to empower application scenarios, and build an application ecosystem combining "production, learning, research, and application", so as to help the development of new quality productivity in the mining industry.
Author WANG Haijun (王海军)
AuthorAffiliation 煤炭科学研究总院有限公司矿山人工智能研究院, Beijing 100013; 煤炭智能开采与岩层控制全国重点实验室, Beijing 100013; 天地科技股份有限公司, Beijing 100013
ClassificationCodes TP18
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
DatabaseName Wanfang Data Journals - Hong Kong
WANFANG Data Centre
Wanfang Data Journals
China Online Journals (COJ)
Discipline Engineering
DocumentTitle_FL Construction exploration and application prospect of the large model in mining industry
EndPage 59
ExternalDocumentID mtkxjs202411004
GrantInformation_xml – fundername: National Key Research and Development Program (国家重点研发计划); Key Special Project of the Science and Technology Innovation and Entrepreneurship Fund of 天地科技股份有限公司 (天地科技股份有限公司科技创新创业资金专项重点资助项目)
ISSN 0253-2336
IngestDate Thu May 29 04:07:34 EDT 2025
IsPeerReviewed false
IsScholarly true
Issue 11
Keywords SolStone Mine Large Model (太阳石矿山大模型)
large-scale pre-trained model (大规模预训练模型)
large model in the mining industry (矿山行业大模型)
knowledge labeling system (知识标签体系)
retrieval-augmented generation (检索增强生成)
Language Chinese
LinkModel OpenURL
PageCount 15
ParticipantIDs wanfang_journals_mtkxjs202411004
PublicationCentury 2000
PublicationDate 2024-11-01
PublicationDateYYYYMMDD 2024-11-01
PublicationDate_xml – month: 11
  year: 2024
  text: 2024-11-01
  day: 01
PublicationDecade 2020
PublicationTitle 煤炭科学技术
PublicationTitle_FL Coal Science and Technology
PublicationYear 2024
SourceID wanfang
SourceType Aggregation Database
StartPage 45
URI https://d.wanfangdata.com.cn/periodical/mtkxjs202411004
Volume 52
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVAON
  databaseName: Directory of Open Access Journals (DOAJ)
  issn: 0253-2336
  databaseCode: DOA
  dateStart: 20210101
  customDbUrl:
  isFulltext: true
  dateEnd: 99991231
  titleUrlDefault: https://www.doaj.org/
  omitProxy: true
  ssIdentifier: ssj0037581
  providerName: Directory of Open Access Journals
linkProvider Directory of Open Access Journals