Knowledge discovery (KDD) Process
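A view from database systems
- Data cleaning: remove noise and inconsistent data
- Data integration: combine multiple data sources
- Data selection: retrieve the data relevant to the task
- Data transformation: transform or consolidate data into forms suitable for mining
- Data mining
- Pattern evaluation: identify the truly interesting patterns that represent knowledge, based on some interestingness measure
- Knowledge presentation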
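The steps above can be sketched as a chain of small functions. This is only a toy illustration, not a reference implementation: the purchase records, the function names, and the use of item support as the interestingness measure are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy data: two sources of (customer, item) purchase records.
source_a = [("ann", "milk"), ("bob", "bread"), ("ann", None), ("cat", "milk")]
source_b = [("bob", "milk"), ("cat", "bread"), ("cat", "milk"), ("dan", " Bread ")]

def clean(records):
    """Data cleaning: drop records with a missing item, normalize spelling."""
    return [(c, i.strip().lower()) for c, i in records if i is not None]

def integrate(*sources):
    """Data integration: merge the sources and skip exact duplicates."""
    merged = []
    for src in sources:
        for rec in src:
            if rec not in merged:
                merged.append(rec)
    return merged

def select(records, items_of_interest):
    """Data selection: keep only the records relevant to the task."""
    return [r for r in records if r[1] in items_of_interest]

def transform(records):
    """Data transformation: group items into one basket per customer."""
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets

def mine(baskets):
    """Data mining: count how many baskets contain each item."""
    return Counter(item for basket in baskets.values() for item in basket)

def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep items whose support meets the threshold."""
    return {item: n / n_baskets
            for item, n in counts.items() if n / n_baskets >= min_support}

def run_kdd(min_support=0.6):
    records = integrate(clean(source_a), clean(source_b))
    records = select(records, {"milk", "bread"})
    baskets = transform(records)
    return evaluate(mine(baskets), len(baskets), min_support)

if __name__ == "__main__":
    # Knowledge presentation: report the surviving patterns.
    for item, support in sorted(run_kdd().items()):
        print(f"{item}: support={support:.2f}")
```

Raising `min_support` makes pattern evaluation stricter, so fewer patterns reach the presentation step.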
A view from ML
Input data -> Data Pre-Processing -> Data Mining -> Post-Processing -> Knowledge
Data Pre-Processing: data integration, normalization, feature selection, dimension reduction
Post-Processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization
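Two of the pre-processing steps, normalization and feature selection, can be sketched as follows. Min-max scaling and a variance threshold are assumed here as simple stand-ins; the notes do not name a specific method, and the table and function names are hypothetical.

```python
from statistics import pvariance

def min_max_normalize(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def select_features(table, min_variance):
    """Keep features whose normalized values vary more than min_variance."""
    normalized = {name: min_max_normalize(col) for name, col in table.items()}
    return {name: col for name, col in normalized.items()
            if pvariance(col) > min_variance}

# Hypothetical input table, one list of values per feature.
table = {
    "age": [20, 30, 40, 50],
    "income": [1000, 3000, 2000, 4000],
    "constant_flag": [1, 1, 1, 1],  # carries no information
}

features = select_features(table, min_variance=0.01)
print(list(features))
```

Normalizing before computing the variance puts all features on the same scale, so one threshold can be applied to every column.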
Data mining categories
Data mining functions
- Generalization
- Association and Correlation Analysis
- Classification
- Cluster Analysis
- Outlier Analysis
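Two of the functions listed above, association analysis and outlier analysis, can be illustrated with a short sketch. The baskets, the readings, and the z-score threshold of 2.0 are assumptions chosen for the example; z-scores are only one of many ways to flag outliers.

```python
from statistics import mean, pstdev

def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for b in baskets if antecedent <= b)
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = n_both / len(baskets)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical market baskets for the association example.
baskets = [{"milk", "bread"}, {"milk"}, {"milk", "bread", "eggs"}, {"bread"}]
support, confidence = rule_stats(baskets, {"milk"}, {"bread"})

# Hypothetical sensor readings for the outlier example.
readings = [10, 12, 11, 13, 12, 11, 50]
outliers = zscore_outliers(readings)

print(f"milk -> bread: support={support:.2f}, confidence={confidence:.2f}")
print("outliers:", outliers)
```

Support measures how often the whole rule occurs in the data, while confidence measures how often the consequent appears given the antecedent.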
Confluence of Multiple Disciplines
Machine Learning, Pattern Recognition, Statistics, Visualization, High-Performance Computing, Database Technology, Algorithms, Applications, Information Retrieval
Major Issues
- Mining Methodology
  - Mining various and new kinds of knowledge
  - Mining knowledge in multi-dimensional space
  - Data mining: an interdisciplinary effort
  - Boosting the power of discovery in a networked environment
  - Handling noise, uncertainty, and incompleteness of data
  - Pattern evaluation and pattern- or constraint-guided mining
- User Interaction
  - Interactive mining
  - Incorporation of background knowledge
  - Presentation and visualization of data mining results
- Efficiency and Scalability
  - Efficiency and scalability of data mining algorithms
  - Parallel, distributed, stream, and incremental mining methods
- Diversity of Data Types
  - Handling complex types of data
  - Mining dynamic, networked, and global data repositories
- Data Mining and Society
  - Social impacts of data mining
  - Privacy-preserving data mining
  - Invisible data mining
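As a small illustration of the stream and incremental mining methods listed under Efficiency and Scalability, the sketch below keeps a running mean and variance without storing the data. It uses Welford's online algorithm, chosen here as an assumed example; the class name and the sample stream are hypothetical.

```python
class RunningStats:
    """Incrementally track the mean and population variance of a stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the statistics in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.n if self.n else 0.0

stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # values arrive one at a time
    stats.update(value)

print(stats.n, stats.mean, stats.variance)
```

Each update touches only three numbers, so the same object can summarize a stream that is too large to hold in memory.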