Data Mining, Chapter 1

Knowledge Discovery (KDD) Process

A view from database systems

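  1. Data cleaning: remove noise and inconsistent data
  2. Data integration: combine multiple data sources
  3. Data selection: select the data relevant to the task
  4. Data transformation: transform or consolidate the data into a form suitable for mining
  5. Data mining
  6. Pattern evaluation: identify the truly interesting patterns that represent knowledge, based on some interestingness measure
  7. Knowledge presentation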
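
The seven steps above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the records, field names, age bands, and the record-count threshold used as an "interestingness measure" are all hypothetical, and the "mining" step is a simple group-by summary standing in for a real algorithm.

```python
from collections import defaultdict

# Hypothetical toy records from two sources (all names and values are made up).
SOURCE_A = [
    {"id": 1, "age": 25, "income": 3000, "city": "A"},
    {"id": 2, "age": None, "income": 4200, "city": "A"},  # missing value
    {"id": 3, "age": 31, "income": 3900, "city": "B"},
]
SOURCE_B = [
    {"id": 4, "age": 47, "income": 5200, "city": "B"},
    {"id": 4, "age": 47, "income": 5200, "city": "B"},  # exact duplicate
    {"id": 5, "age": 52, "income": 6100, "city": "A"},
    {"id": 6, "age": 28, "income": 3300, "city": "B"},
]


def clean(rows):
    """1. Data cleaning: drop rows with missing values and exact duplicates."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if None in row.values() or key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


def integrate(*sources):
    """2. Data integration: combine several sources into one list."""
    return [row for source in sources for row in source]


def select(rows, fields):
    """3. Data selection: keep only the task-relevant fields."""
    return [{f: row[f] for f in fields} for row in rows]


def transform(rows):
    """4. Data transformation: discretize age into two (assumed) bands."""
    return [
        {"age_band": "under_40" if r["age"] < 40 else "40_plus",
         "income": r["income"]}
        for r in rows
    ]


def mine(rows):
    """5. Data mining (stand-in): (record count, mean income) per age band."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["age_band"]].append(r["income"])
    return {band: (len(v), sum(v) / len(v)) for band, v in groups.items()}


def evaluate(patterns, min_count):
    """6. Pattern evaluation: keep patterns backed by enough records."""
    return {band: p for band, p in patterns.items() if p[0] >= min_count}


def present(patterns):
    """7. Knowledge presentation: render patterns as readable lines."""
    return [f"{band}: n={n}, mean income={m:.1f}"
            for band, (n, m) in sorted(patterns.items())]


rows = integrate(clean(SOURCE_A), clean(SOURCE_B))
rows = transform(select(rows, ["age", "income"]))
patterns = mine(rows)
interesting = evaluate(patterns, min_count=3)
for line in present(interesting):
    print(line)
```

Note that cleaning each source separately, as done here to follow the listed order, would miss duplicates that only appear across sources. A real pipeline may need a second cleaning pass after integration.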

A view from machine learning

Input data -> Data Pre-Processing -> Data Mining -> Post-Processing -> Knowledge

Data Pre-Processing: Data integration, Normalization, Feature selection, Dimension reduction
Post-Processing: Pattern evaluation, Pattern selection, Pattern interpretation, Pattern visualization
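
Two of the pre-processing steps named above, normalization and feature selection, can be sketched as follows. This is a minimal illustration, not the only way to do either step: it assumes min-max scaling for normalization and a variance threshold for feature selection, and the column names, values, and threshold are hypothetical.

```python
from statistics import pvariance


def min_max_normalize(column):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def select_features(columns, min_variance):
    """Keep only columns whose population variance reaches the threshold."""
    return {name: col for name, col in columns.items()
            if pvariance(col) >= min_variance}


# Hypothetical feature columns: one informative, one constant.
columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "constant": [1.0, 1.0, 1.0, 1.0],
}

normalized = {name: min_max_normalize(col) for name, col in columns.items()}
kept = select_features(normalized, min_variance=0.01)
print(list(kept))
```

Normalizing before filtering puts every non-constant column on the same [0, 1] scale, so one variance threshold can be applied to all of them.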

Categories of data mining

Data mining functions

  1. Generalization
  2. Association and Correlation Analysis
  3. Classification
  4. Cluster Analysis
  5. Outlier Analysis
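
Two of the functions above can be made concrete with small sketches. For association analysis, the standard support and confidence measures of a rule are computed over a hypothetical set of transactions. For outlier analysis, a simple z-score rule is assumed as one possible detector. The data and the threshold are made up for illustration.

```python
from statistics import mean, pstdev

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


def zscore_outliers(values, threshold):
    """Values whose absolute z-score exceeds threshold (an assumed rule)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
print(zscore_outliers([10, 11, 9, 10, 12, 10, 11, 50], threshold=2.0))
```

Here the rule bread -> milk holds in 2 of the 4 transactions (support) and in 2 of the 3 transactions that contain bread (confidence).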

Confluence of Multiple Disciplines

Machine Learning, Pattern Recognition, Statistics, Visualization, High-Performance Computing, Database Technology, Algorithms, Applications, Information Retrieval

Major Issues

  • Mining Methodology
    • Mining various and new kinds of knowledge
    • Mining knowledge in multi-dimensional space
    • Data mining: An interdisciplinary effort
    • Boosting the power of discovery in a networked environment
    • Handling noise, uncertainty, and incompleteness of data
    • Pattern evaluation and pattern- or constraint-guided mining
  • User Interaction
    • Interactive mining
    • Incorporation of background knowledge
    • Presentation and visualization of data mining results
  • Efficiency and Scalability
    • Efficiency and scalability of data mining algorithms
    • Parallel, distributed, stream, and incremental mining methods
  • Diversity of data types
    • Handling complex types of data
    • Mining dynamic, networked, and global data repositories
  • Data mining and society
    • Social impacts of data mining
    • Privacy-preserving data mining
    • Invisible data mining
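
As a small illustration of the "stream and incremental mining methods" item under Efficiency and Scalability, the sketch below keeps a running mean and variance that are updated one value at a time, without storing the stream. It uses Welford's online algorithm, chosen here only as an example. The class name and the sample values are hypothetical.

```python
class RunningStats:
    """Incrementally maintained mean and population variance (Welford)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the statistics in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far (0.0 if none)."""
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # values arrive one at a time
    stats.update(value)
print(stats.n, stats.mean, stats.variance)
```

The same idea, updating a summary as each item arrives rather than rescanning all data, is what makes a method incremental.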