Occupation and Health (职业与健康), 2021, Vol. 37, Issue (1): 92-96. DOI: 10.13329/j.cnki.zyyjk.20201028.003


Construction of an early warning model for college students' physique based on multi-dimensional data fusion

  1. Sports Department, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016, China

  • Received: 2020-07-13 Accepted: 2020-09-21 Online: 2021-01-01 Published: 2021-03-03
  • About the author: KOU Lei, male, lecturer, doctoral candidate, mainly engaged in physical education and health promotion.
  • Funding:
    Fundamental Research Funds for the Central Universities (NR2018033); Jiangsu Social Science Fund, Later-stage Funding Project (19HQ021)

Abstract: Objective To propose an approach to constructing an early warning model for college students' physique and, by comparing the predictive performance of different models, to help students reasonably predict their grade on each physical fitness test item, thereby achieving early warning. Methods Based on physical fitness test data collected at Nanjing University of Aeronautics and Astronautics from 2015 to 2018, data mining techniques were applied; after data understanding and data preparation, three machine learning algorithms (random forest, gradient boosting decision tree, and neural network) were used to build and evaluate the models. Results For male students' final physique grade, prediction accuracy ranged from 90% to 97%, and the gradient boosting decision tree algorithm was consistently more accurate than the other two, reaching its highest accuracy of 96.19% when the training set proportion was 80%. For female students' final physique grade, the accuracy of the random forest and gradient boosting decision tree algorithms ranged from 90% to 96%, whereas that of the neural network algorithm ranged from 80% to 87%; with a training set proportion of 80%, the gradient boosting decision tree algorithm achieved the highest accuracy, 95.06%. Conclusion Constructing an early warning model for college students' physique is feasible; the prediction accuracy for the final physique grade can exceed 93%, with the gradient boosting decision tree algorithm performing best.

Key words: Physique health, Data mining, Feature engineering, Machine learning
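
The comparison described in the abstract (three classifiers evaluated at different training set proportions) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code. It assumes scikit-learn, and it uses synthetic four-class data from `make_classification` as a stand-in for the physical fitness records, which are not available here. The function name `compare_models`, the hyperparameters, and the set of training ratios are all hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, train_ratios=(0.6, 0.7, 0.8), seed=0):
    """Return {train_ratio: {model_name: hold-out accuracy}}."""
    results = {}
    for ratio in train_ratios:
        # Stratified hold-out split so every grade class appears in both parts.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=ratio, stratify=y, random_state=seed
        )
        # Hyperparameters are illustrative assumptions, not the paper's settings.
        models = {
            "random_forest": RandomForestClassifier(
                n_estimators=100, random_state=seed
            ),
            "gradient_boosting": GradientBoostingClassifier(
                n_estimators=50, random_state=seed
            ),
            # The neural network is sensitive to feature scale, so standardize first.
            "neural_network": make_pipeline(
                StandardScaler(),
                MLPClassifier(
                    hidden_layer_sizes=(32,), max_iter=500, random_state=seed
                ),
            ),
        }
        results[ratio] = {
            name: accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
            for name, model in models.items()
        }
    return results


# Synthetic stand-in data: 8 numeric features, 4 classes (assumed grade labels).
X, y = make_classification(
    n_samples=600,
    n_features=8,
    n_informative=5,
    n_redundant=0,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)
results = compare_models(X, y)
for ratio, scores in results.items():
    print(ratio, {name: round(acc, 3) for name, acc in scores.items()})
```

The accuracies printed on synthetic data bear no relation to the figures reported in the abstract. The sketch only shows the shape of the evaluation loop: one hold-out split per training ratio, and one accuracy score per model.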