An LCN-Based Medical Knowledge Base Question Answering Model

Man-fu MA; Yuan-zhe LIU; Yong LI; Xia WANG; Hai JIA; Yan-bin SHI; Xiao-kang ZHANG

doi:10.13718/j.cnki.xdzk.2020.10.004

2020 Volume 42 Issue 10

Man-fu MA, Yuan-zhe LIU, Yong LI, et al. An LCN-Based Medical Knowledge Base Question Answering Model[J]. Journal of Southwest University Natural Science Edition, 2020, 42(10): 25-36. doi: 10.13718/j.cnki.xdzk.2020.10.004

1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
2. The People's Hospital of Gansu Province, Lanzhou 730000, China
3. College of Pharmacy, Lanzhou University, Lanzhou 730000, China
4. Lanzhou Qidu Data Technology Co., Ltd., Lanzhou 730070, China

Received Date: 29/09/2020
Available Online: 20/10/2020
MSC: TP391

Abstract

Word segmentation of Chinese medical text is difficult, so existing algorithms extract the features of medical questions inadequately. This paper proposes a medical knowledge base question answering model based on LCN (Lattice Convolutional Neural Network) that is designed around the characteristics of Chinese word segmentation. First, 15 000 electronic medical records provided by a Grade III, Class A hospital are used to train medical word vectors with the GloVe model. Then, a large number of medical terms and the relations among them are collected from major medical websites to construct a medical knowledge graph. The relation words in the knowledge graph are extracted and combined with the trained word vectors to obtain relation vectors. Finally, the medical word vectors are used as the model input and the LCN is used to extract the features of medical questions. The model is trained by computing the similarity between question features and relation vectors. Experiments show that the LCN model reaches an accuracy of 89.0%, an improvement of 2% over comparable question answering models.
Keywords:
- medical knowledge base question answering
- GloVe
- LCN (Lattice Convolutional Neural Network)
- medical knowledge graph
- electronic medical record

References

[1] BEN ABACHA A, DEMNER-FUSHMAN D. A Question-entailment Approach to Question Answering [J]. BMC Bioinformatics, 2019, 20(1): 511-520. doi: 10.1186/s12859-019-3119-4
[2] ZHANG S, ZHANG X, WANG H, et al. Multi-Scale Attentive Interaction Networks for Chinese Medical Question Answer Selection [J]. IEEE Access, 2018(6): 74061-74071.
[3] DENG W, GUO P P, YANG J D. Medical Entity Extraction and Knowledge Graph Construction [C]//2019 16th International Computer Conference on Wavelet Active Media Technology and Information Processing, IEEE, 2019: 41-44.
[4] LI X, LIU H Y, ZHAO X, et al. Automatic Approach for Constructing a Knowledge Graph of Knee Osteoarthritis in Chinese [J]. Health Information Science and Systems, 2020, 8(1): 1-8.
[5] CHAI X Q. Diagnosis Method of Thyroid Disease Combining Knowledge Graph and Deep Learning [J]. IEEE Access, 2020(8): 149787-149795.
[6] YUAN J B, JIN Z W, GUO H, et al. Constructing Biomedical Domain-specific Knowledge Graph with Minimum Supervision [J]. Knowledge and Information Systems, 2020, 62(1): 317-336. doi: 10.1007/s10115-019-01351-4
[7] LIU H I, NI C C, HSU C H, et al. Attention Based R&CNN Medical Question Answering System in Chinese [C]//2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), IEEE, 2020: 341-345.
[8] NGUYEN V, KARIMI S, XING Z C. ANU-CSIRO at MEDIQA 2019: Question Answering Using Deep Contextual Knowledge [C]//Proceedings of the 18th BioNLP Workshop and Shared Task, Florence, Italy. Stroudsburg, PA, USA: Association for Computational Linguistics, 2019: 478-487.
[9] ZHANG S, ZHANG X, WANG H, et al. Chinese Medical Question Answer Matching Using End-to-End Character-Level Multi-Scale CNNs [J]. Applied Sciences, 2017, 7(8): 767-775. doi: 10.3390/app7080767
[10] ZOU Y, HE Y, LIU Y. Research and Implementation of Intelligent Question Answering System Based on Knowledge Graph of Traditional Chinese Medicine [C]//2020 39th Chinese Control Conference (CCC), IEEE, 2020: 4266-4272.
[11] ZHU W, NI Y, XIE G T, et al. The Dr-KGQA System for Automatically Answering Medication Related Questions in Chinese [C]//2019 IEEE International Conference on Healthcare Informatics (ICHI), IEEE, 2019: 1-6.
[12] SADID A H, ZHAO S Y, VIVEK D, et al. Clinical Question Answering using Key-Value Memory Networks and Knowledge Graph [C]//TREC, 2016.
[13] ZHANG Y Y, QIAN S S, FANG Q, et al. Multi-modal Knowledge-aware Hierarchical Attention Network for Explainable Medical Question Answering [C]//Proceedings of the 27th ACM International Conference on Multimedia, Nice France. New York, NY, USA: ACM, 2019: 1089-1097.
[14] LAI Y X, FENG Y S, YU X H, et al. Lattice CNNs for Matching Based Chinese Question Answering [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33: 6634-6641. doi: 10.1609/aaai.v33i01.33016634
[15] YUSUF N, YUNUS M A M, WAHID N, et al. Enhancing Query Expansion Method Using Word Embedding [C]//2019 IEEE 9th International Conference on System Engineering and Technology (ICSET), IEEE, 2019: 232-235.
[16] CRAFT R C, LEAKE C. The Pareto Principle in Organizational Decision Making [J]. Management Decision, 2002, 40(8): 729-733. doi: 10.1108/00251740210437699
[17] EL-GANAINY N O, BALASINGHAM I, HALVORSEN P S, et al. On the Performance of Hierarchical Temporal Memory Predictions of Medical Streams in Real Time [C]//2019 13th International Symposium on Medical Information and Communication Technology (ISMICT), IEEE, 2019: 1-6.
[18] WANG D Y, SU J L, YU H B. Feature Extraction and Analysis of Natural Language Processing for Deep Learning English Language [J]. IEEE Access, 2020(8): 46335-46345.
[19] YIN W P, YU M, XIANG B, et al. Simple Question Answering by Attentive Convolutional Neural Network [EB/OL]. [2020-08-26]. https://arxiv.org/abs/1606.03391.
[20] WANG X, DU Y T, LI X L, et al. Embedded Representation of Relation Words with Visual Supervision [C]//2019 Third IEEE International Conference on Robotic Computing (IRC), IEEE, 2019: 409-412.
[21] IHM S Y, LEE J H, PARK Y H. Skip-Gram-KR: Korean Word Embedding for Semantic Clustering [J]. IEEE Access, 2019(7): 39948-39961.
[22] WANG Z, HUANG Z, GAO J. Chinese Text Classification Method Based on BERT Word Embedding [C]//Proceedings of the 2020 5th International Conference on Mathematics and Artificial Intelligence, 2020: 66-71.

Figures(9) / Tables(5)

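As information resources in the medical field grow ever richer, consumers need dedicated solutions that accommodate the heterogeneity and particular characteristics of health-related information^[1]. Online healthcare communities can give users remote medical support, which is convenient for users and also helps accumulate large amounts of data. At the same time, the number of doctors is quite limited compared with the explosive growth in data volume^[2]. Medical question answering can consolidate and analyze the questions patients raise, train an intelligent question answering model with machine learning algorithms, and then use that model to answer patients' questions automatically, thereby reducing doctors' workload.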

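This paper addresses the problems above by building a medical question answering system on a knowledge base question answering model, which differs from the traditional document-based model. DBQA (Document-based Question Answering) accepts questions in natural language and returns documents that contain the answer; users must read those documents to find it. KBQA (Knowledge Base Question Answering) instead interprets the intent of the question, uses manually crafted syntactic parse trees to convert the natural-language question into a Select statement, and returns the answer directly by querying a database. A medical KBQA model built this way therefore requires new vocabulary and mapping rules to be added to the parse trees by hand on an ongoing basis, which is too costly. KBQA models based on short-text matching have developed in recent years, but medical KBQA models in China still face challenges and limitations. First, no Chinese medical knowledge question answering dataset exists yet. Second, Chinese medical lexicons are still immature, and no Chinese word segmentation tool handles specialized domain text well. Because medical text is segmented poorly, deep learning models in existing work such as disease prediction or medical question answering have difficulty extracting features from medical text.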

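To address these problems, this paper crawls a large number of medical questions and general disease knowledge from major medical websites and stores the disease knowledge as a medical knowledge graph in "entity-relation-entity" form^[3]. From the medical questions and their corresponding medical relations, it builds a medical knowledge question answering dataset with a one-to-one "question-relation" mapping. The paper also uses the LCN model to extract question features: the lattice in LCN captures every candidate segmentation of the question and converts them into feature vectors, summarizing the feature information of the question fully and resolving the blurred feature extraction caused by immature segmentation tools. The LCN-based model turns medical question answering into a text matching problem of selecting the best relation, and the model is trained to raise the similarity between each question and its labeled answer. When a new question arrives, the answer with the highest similarity to the question's features is selected. No manually crafted syntactic parse tree is needed, which saves labor cost. In the experiments, the LCN model reaches an accuracy of 89.0%, which is 2% higher than comparable knowledge base question answering models.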
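The "select the best relation" step described above can be illustrated with a minimal sketch. It assumes the question has already been encoded into a feature vector and that each relation has a vector of the same dimension; the relation names, the toy three-dimensional vectors, and the use of cosine similarity are illustrative assumptions, not details confirmed by the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_relation(question_vec, relation_vecs):
    """Return the name of the relation whose vector is most similar to the question."""
    return max(relation_vecs, key=lambda name: cosine(question_vec, relation_vecs[name]))


# Hypothetical relation vectors (in the paper these come from trained word vectors).
relation_vecs = {
    "symptom": [0.9, 0.1, 0.0],
    "recommended_drug": [0.1, 0.9, 0.2],
    "department": [0.0, 0.2, 0.9],
}

# Hypothetical question feature vector (in the paper this is produced by the LCN).
question_vec = [0.2, 0.8, 0.1]

chosen = best_relation(question_vec, relation_vecs)
print(chosen)
```

At inference time only the arg-max over relations is needed; during training the same similarity score would feed a loss that pushes the labeled relation above the others.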

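A medical knowledge question answering model first requires a medical knowledge graph. In recent years several researchers have built medical knowledge graphs with different methods and used them to solve medical problems. Li X et al. built a medical knowledge graph from the electronic medical record text of knee osteoarthritis patients^[4] to support intelligent medical applications such as knowledge retrieval and decision making and to promote the sharing of medical resources. Chai X Q extracted the relations between biomedical entities to build a biomedical knowledge graph^[5] and used a knowledge graph embedding method to convert the entities and relations in the graph into low-dimensional continuous vectors; known pathology-disease relation data were then used to train a disease diagnosis model based on a bidirectional long short-term memory network (Bi-LSTM). Yuan J B et al. used a weakly supervised method to extract entities and relation words from medical text and build a medical knowledge graph^[6]. Existing work mainly uses natural language processing to extract information from electronic medical records, and some of the extracted entities or relations are wrong. This paper instead uses a web crawler to extract entities and relations from medical websites and stores them in a Neo4j database, which gives relatively high accuracy.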
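As a rough illustration of how "entity-relation-entity" triples answer a question once the entity and relation are known, the sketch below uses a small in-memory index. It is a stand-in for the Neo4j database mentioned above, not the authors' storage code; the class name `TripleStore` and the example triples are hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory index of (head, relation, tail) triples."""

    def __init__(self):
        # Maps (head entity, relation) to the list of tail entities.
        self._index = defaultdict(list)

    def add(self, head, relation, tail):
        self._index[(head, relation)].append(tail)

    def query(self, head, relation):
        """Return all tails for (head, relation); an empty list if none are stored."""
        return list(self._index.get((head, relation), []))


# Hypothetical triples for illustration only.
store = TripleStore()
store.add("influenza", "symptom", "fever")
store.add("influenza", "symptom", "cough")
store.add("influenza", "recommended_drug", "oseltamivir")

print(store.query("influenza", "symptom"))
```

With this layout, a question is answered by recognizing its entity, predicting its relation, and reading the tails stored under that pair.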

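Once the knowledge graph is built, medical questions must be matched with their corresponding knowledge, and the similarity between the two is computed to build the medical knowledge question answering model. Existing medical question answering systems fall mainly into text question answering systems and knowledge question answering systems. Several researchers have studied text question answering in recent years. Liu H I et al.^[7] proposed a Chinese medical question answering system built on a CNN-based self-attention embedding model, which uses an LSTM to analyze question features, obtains feature maps through CNN convolution kernels, and finally improves accuracy through the CNN pooling layer. Nguyen V et al.^[8] used textual inference to identify questions with similar semantics, improved natural language inference and question entailment methods, and thereby refined medical question answering; they proposed MEDIQA, a question answering system that combines the open domain and the biomedical domain to improve semantic understanding and disambiguation. Zhang S et al.^[9] took an answer matching approach and proposed cMedQA, an end-to-end character-level multi-scale convolutional neural framework that uses CNNs to extract contextual information from questions or answers at different scales and then matches medical questions with answers by computing similarity.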

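Text question answering systems return unstructured text answers. These are conceptual, partial passages that suit questions such as "why does this disease occur"; users must analyze scattered answers further to find the information they want, so their needs are not met directly. Knowledge question answering systems, by contrast, analyze the medical question directly, retrieve the entity words related to the question from the knowledge base, and use natural language processing to compose a short sentence that contains the answer. Users can read the needed information straight from the returned answer, which suits questions such as "what disease is this" or "what medicine should be taken". Knowledge question answering systems are therefore clearly better suited to the medical domain.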

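Existing medical knowledge question answering systems are of considerable significance and reference value for spreading medical knowledge and for doctors' clinical medication decisions, and in recent years neural network methods have been proposed for computing the similarity between questions and answers. Zou Y et al.^[10] worked in the field of traditional Chinese medicine, took the open data of the traditional Chinese medicine website "Compendium of Materia Medica" (《本草纲目》) as the data source, built a Chinese herbal medicine knowledge graph, and used it to implement automatic question answering and prescription assistance. Zhu W et al.^[11] proposed the question answering system Dr-KGQA, which uses BiLSTM-CRF to extract entities and relations from medical text, builds a medical knowledge graph, and uses TextCNN to match medical questions with the relation words in the graph. Sadid A H et al.^[12], working from the T-Know traditional Chinese medicine knowledge graph service system, extracted triples from electronic medical records and, on top of the knowledge graph, developed deep learning algorithms for single-question understanding and multi-turn dialogue. Zhang Y Y et al.^[13] built MKHAN, a multi-modal knowledge-aware hierarchical attention network that solves medical questions with a multi-modal knowledge graph (MKG): it generates paths by combining entity structure, linguistic, and visual information, and infers the underlying rationale of question-answer interactions from the sequential dependencies within MKG paths.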

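Similarity-based methods represent the knowledge graph in a relatively uniform RDF form and, after mapping the result of semantic understanding onto the ontology of the graph, generate SPARQL queries to answer questions. Through the ontology, a user question can be mapped to a query expression identified by a concept topology graph, which corresponds to a subgraph of the knowledge graph. The similarity algorithm keeps refining the features extracted from the user question in order to find the most reasonable mapping from the question to a subgraph. Although these methods capture the information in a question comprehensively, they do not solve the difficulty of word segmentation in specialized domains. Lai Y X et al.^[14] proposed the LCN model, which improves matching accuracy by capturing the multi-granularity information in a question through a word lattice. This paper uses the multi-granularity feature capture of LCN to address the difficulty of segmenting medical text and then trains the medical knowledge question answering model.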
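The idea of a word lattice that keeps several segmentation granularities at once can be sketched as follows. This is a simplified illustration rather than the construction used in the LCN paper: it assumes a small hypothetical vocabulary, admits every single character, and adds an edge for every substring found in the vocabulary. The function name `build_lattice` and the `max_len` limit are assumptions made for the example.

```python
def build_lattice(sentence, vocab, max_len=4):
    """Return lattice edges (start, end, text) over character positions.

    Every single character is kept, and every substring of up to max_len
    characters that appears in vocab is added, so overlapping candidate
    words coexist instead of one segmentation being chosen.
    """
    edges = []
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            piece = sentence[start:end]
            if end - start == 1 or piece in vocab:
                edges.append((start, end, piece))
    return edges


# Hypothetical vocabulary; the sentence means "What medicine should be taken for diabetes?"
vocab = {"糖尿", "糖尿病", "什么"}
sentence = "糖尿病吃什么药"

edges = build_lattice(sentence, vocab)
for edge in edges:
    print(edge)
```

Both "糖尿" and "糖尿病" remain in the lattice, so a downstream encoder can draw on either granularity even when a segmentation tool would have picked only one.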

3. Conclusion

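This paper builds a medical knowledge graph by crawling medical websites and from it builds a medical knowledge question answering dataset. The GloVe model is trained on electronic medical records to produce medical word vectors, which serve as the input; the LCN model extracts question features, and the similarity between question features and answers is computed to train the model for medical question answering. This removes the manual rule definition required by traditional question answering models, and in experiments the model outperforms the other deep learning question answering models it was compared with. LCN nevertheless has the following shortcomings:

1) The core of LCN is to extract question features with CNN convolution kernels, which ignores the sequential features of the question.

2) The model is accurate for the one-to-one "question-relation" pattern, but performs poorly for the one-to-n pattern in which one question has several answers.

Future work will incorporate an RNN into the model to pay more attention to the sequential features of questions, and will update the model to handle the one-to-n case.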
