A Processing and Re-Screening Method for Time Series with Missing Values and Outliers

  • Abstract: Multidimensional time series data are widely used, but missing values and outliers can make them unreliable. This paper proposes a multidimensional time series data processing and re-screening mechanism (MTSM). The method imputes missing values with a Transformer, detects outliers by combining the 3σ rule with box plot detection and corrects them hierarchically, and then re-screens the imputed and corrected data by applying multi-scale fuzzy entropy, boundary mixed resampling, or Gaussian mixture clustering sampling, depending on the data type. A comparative analysis on COVID-19 data from the World Health Organization shows that MTSM outperforms GRU, RNN, and LATC across different missing rates and outlier rates, and performs well in both accuracy and robustness.
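The outlier detection step named in the abstract (the 3σ rule combined with box plot detection) can be illustrated with a minimal sketch. The abstract does not say how the two detectors are combined, so taking the union of their flags is an assumption made here for illustration, as are the function names `three_sigma_mask`, `boxplot_mask`, and `detect_outliers`, the conventional 1.5×IQR whisker, and the toy series. It is not the paper's implementation.

```python
import numpy as np


def three_sigma_mask(x, k=3.0):
    """Flag points farther than k standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    mu, sd = np.nanmean(x), np.nanstd(x)
    if sd == 0:
        # A constant series has no 3-sigma outliers.
        return np.zeros(x.shape, dtype=bool)
    return np.abs(x - mu) > k * sd


def boxplot_mask(x, whisker=1.5):
    """Flag points outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - whisker * iqr) | (x > q3 + whisker * iqr)


def detect_outliers(x):
    """Assumed combination: a point is an outlier if either rule flags it."""
    return three_sigma_mask(x) | boxplot_mask(x)


# Toy series (hypothetical data): stable values with one spike at the end.
series = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11,
          10, 9, 11, 10, 12, 10, 9, 11, 10, 200]
mask = detect_outliers(series)
```

On this toy series only the final spike is flagged by both rules. On short series the 3σ rule can miss extreme points because the outlier itself inflates the standard deviation, which is one reason to pair it with the quartile-based box plot rule.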

     
