Occupation and Health ›› 2023, Vol. 39 ›› Issue (20): 2819-2825.

• Original Article •

Establishment of a trajectory tracking technology for epidemiological investigation data based on a graph database

HE Haiyan1, WANG Yuming2, WANG Yongxin1, ZHANG Guoping1, ZHAO Ying1, WU Weishen1

  1. Institute of Infectious Disease Prevention and Control, Tianjin Centers for Disease Control and Prevention, Tianjin 300011, China;
  2. School of Public Health, Tianjin Medical University, Tianjin 300007, China
  • Received: 2023-02-19 Revised: 2023-03-14 Published: 2026-03-26
  • Corresponding author: WU Weishen, chief physician, E-mail: wuweishen@live.cn
  • About the author: HE Haiyan, female, associate chief physician, works mainly on the prevention and control of infectious diseases.
  • Funding:
    Tianjin Health Science and Technology Project (ZC20019); Tianjin Key Medical Discipline (Specialty) Construction Project (TJYXZDXK-050A); Tianjin Major Science and Technology Project for Public Health (21ZXGWSY00010)

Abstract: Objective To visualize the activity trajectories of infected persons using the trajectory information contained in epidemiological investigation data from major public health emergencies. Methods A method for in-depth mining of epidemiological investigation data was built on the Neo4j graph database and applied to the investigation records of infected persons from a major public health emergency in an outer suburban district of Tianjin in early 2020. Deep learning techniques from natural language processing (NLP), including text classification and named entity recognition, were used to decompose the records into six types of entities (infected person, infection type, location, location type, activity event, and activity event type) and the relationships among them, so as to track the activity trajectories of infected persons and assist in delineating risk areas. Results The degree of aggregation of 60 infected persons within an arbitrarily chosen time period was analyzed, yielding 7 390 node and relationship records (1 643 nodes and 5 747 relationships); the results were basically consistent with the actual clustering areas. In further validation with 2022 epidemic data from an inner suburban district of Tianjin, the activity heat map of infected persons, generated from their activity trajectories and activity ranges in the investigation data, was basically consistent with the areas officially placed under lockdown. Conclusion The heat map distribution of infected persons can be used to assess the area an epidemic may affect and to assist in delineating risk areas, which supports more precise epidemiological investigation and more precise government control measures.

Key words: Major public health emergencies, Neo4j graph database, Trajectory tracking, Visual analysis
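
As an illustration of the six-entity model described in the Methods, the following minimal Python sketch stores entities and relationships in an in-memory graph and counts distinct infected persons per location as a simple stand-in for heat map intensity. It uses plain Python in place of a Neo4j database. The label names, relationship names (`VISITED`, `HAS_TYPE`, and so on), and sample records are all hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# The six entity types named in the abstract; these label strings are assumptions.
ENTITY_TYPES = {
    "Person", "InfectionType", "Location",
    "LocationType", "Event", "EventType",
}


@dataclass(frozen=True)
class Node:
    label: str  # one of ENTITY_TYPES
    name: str


class TrajectoryGraph:
    """Tiny in-memory property graph: a set of nodes plus a list of edges."""

    def __init__(self):
        self.nodes = set()
        self.edges = []  # (source node, relationship name, target node)

    def add_edge(self, src, rel, dst):
        for node in (src, dst):
            if node.label not in ENTITY_TYPES:
                raise ValueError(f"unknown entity type: {node.label}")
        self.nodes.update((src, dst))
        self.edges.append((src, rel, dst))

    def location_heat(self):
        """Count distinct persons linked to each location by a VISITED edge."""
        visitors = defaultdict(set)
        for src, rel, dst in self.edges:
            if rel == "VISITED" and src.label == "Person" and dst.label == "Location":
                visitors[dst.name].add(src.name)
        return Counter({loc: len(people) for loc, people in visitors.items()})


# Hypothetical sample records, invented for this sketch.
case_a = Node("Person", "case A")
case_b = Node("Person", "case B")
market = Node("Location", "Market X")
clinic = Node("Location", "Clinic Y")
shopping = Node("Event", "shopping trip")

graph = TrajectoryGraph()
graph.add_edge(case_a, "HAS_INFECTION_TYPE", Node("InfectionType", "confirmed"))
graph.add_edge(market, "HAS_TYPE", Node("LocationType", "market"))
graph.add_edge(case_a, "VISITED", market)
graph.add_edge(case_b, "VISITED", market)
graph.add_edge(case_a, "VISITED", clinic)
graph.add_edge(case_a, "PARTICIPATED_IN", shopping)
graph.add_edge(shopping, "HAS_TYPE", Node("EventType", "shopping"))

heat = graph.location_heat()
print(heat.most_common())
```

In this toy data, the location visited by both cases receives the higher count. A real pipeline would additionally need the NLP extraction step and geocoded locations before a map-based heat map could be drawn.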

CLC number: