Machine Learning-Based Aging Gene Feature Selection and Classification

  • Abstract: We design an experiment on machine learning-based feature selection and classification of aging genes, intended as lab content for the “Fundamentals of Machine Learning” course offered to students of intelligent medical engineering and related majors. The dataset is obtained by mapping aging genes to the Gene Ontology. Feature selection methods are used to handle the redundancy within the Gene Ontology, and classification models such as naive Bayes and support vector machines are used to classify the aging genes. The experiment is implemented in Python with the Scikit-learn framework. Beyond the framework's built-in feature selection methods, a custom hierarchical feature selection method is designed, based on the statistical properties of the dataset and the specific characteristics of each test sample, to better eliminate hierarchical redundancy among features. The experimental results show that effective feature selection can significantly improve aging gene classification.
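The workflow the abstract describes (binary Gene Ontology features, a built-in feature selection step, then a naive Bayes or support vector machine classifier) might look roughly like the sketch below. It is only an illustration under stated assumptions: the data are synthetic stand-ins rather than the aging gene dataset, univariate chi-squared selection is just one of Scikit-learn's built-in options and may not be the one used in the experiment, and the custom hierarchical method is not reproduced. The names `evaluate`, `n_genes`, and `n_terms` are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (an assumption, not the paper's dataset):
# each row is a gene, each column a binary "annotated with this GO term" flag.
n_genes, n_terms = 200, 50
y = rng.integers(0, 2, size=n_genes)             # hypothetical class labels
X = rng.integers(0, 2, size=(n_genes, n_terms))  # uninformative noise features

# Make the first five terms informative: each agrees with the label ~90% of the time.
for j in range(5):
    flip = rng.random(n_genes) < 0.1
    X[:, j] = np.where(flip, 1 - y, y)


def evaluate(classifier, k):
    """Mean 5-fold accuracy of chi-squared feature selection followed by a classifier."""
    # Selection sits inside the pipeline so it is refit on each training fold.
    model = make_pipeline(SelectKBest(chi2, k=k), classifier)
    return cross_val_score(model, X, y, cv=5).mean()


results = {
    "naive Bayes, all terms": evaluate(BernoulliNB(), k="all"),
    "naive Bayes, top 5 terms": evaluate(BernoulliNB(), k=5),
    "SVM, all terms": evaluate(SVC(kernel="linear"), k="all"),
    "SVM, top 5 terms": evaluate(SVC(kernel="linear"), k=5),
}

for name, accuracy in results.items():
    print(f"{name}: {accuracy:.3f}")

# Which columns does the selector keep when fit on the full synthetic data?
selected = np.flatnonzero(SelectKBest(chi2, k=5).fit(X, y).get_support())
print("selected term indices:", selected.tolist())
```

Placing the selector inside the pipeline keeps the test folds out of the selection step, which matters when comparing selection methods by cross-validated accuracy.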

     
