Data Mining
Graduate course, Quaid-i-Azam University, 2013
Offered: Fall 2013
Aims and Objectives
The purpose of this course is to give students hands-on experience with data mining. The course covers basic topics in data mining: data processing, pattern mining, classification, and cluster analysis. To get the most out of the course, students are given real-life datasets on which to perform the data mining tasks they learn. After completing the course, students should be able to perform general data mining tasks on real-life datasets.
Weekly Contents
- Introduction to course and data mining (data mining process, data mining frameworks, types of data, data mining functions, applications of data mining, issues in data mining)
- Introduction to data mining (data mining functions, applications of data mining, issues in data mining), Data discovery (data objects, attributes, types of attributes, nominal, binary, ordinal, numeric, discrete, continuous, statistical descriptions of data, central tendency in data, dispersion in data, box plots, histograms, quantile plots, q-q plots, scatter plots)
- Data visualization (visualization techniques, pixel-oriented, geometric projections, icon-based, hierarchical, visualizing complex data)
- Data similarity (similarity, dissimilarity, proximity, proximity measures for nominal, binary, ordinal, numeric data types, Jaccard coefficient, coherence, z-score, mean absolute deviation, Minkowski distance, L1, L2, L-max distances, cosine similarity, proximity of mixed data types), Data preprocessing (data cleaning, data integration, correlation analysis)
- Data preprocessing (data reduction, wavelet transforms, principal component analysis, attribute selection, linear/multiple regression, histogram analysis, clustering, sampling, data transformation, discretization, binning, correlation analysis, concept hierarchies)
- Classification (introduction, decision trees, decision tree induction, information gain, Gini index, ID3, C4.5), First Midterm Exam
- Classification (decision trees, AVC groups, bootstrapping, Bayes theorem, naïve Bayesian classifier, Laplacian correction)
- Classification (IF-THEN rules, rule extraction from decision trees, sequential covering), Model evaluation (confusion matrix, accuracy, error rate, precision, recall, F-measure, hold-out method, cross-validation, bootstrapping, statistical significance, t-test, ROC curves), Classification ensembles (bagging, boosting, AdaBoost, random forest)
- Clustering (cluster analysis, applications, clustering quality, clustering challenges, clustering approaches, partitioning methods, k-means, k-medoids, PAM)
- Second Midterm Exam, Clustering (hierarchical methods, agglomerative clustering, divisive clustering, single linkage, complete linkage, average linkage, mean linkage, clustering features, BIRCH algorithm, Chameleon algorithm, probabilistic hierarchical clustering)
- Clustering (density-based clustering, DBSCAN, OPTICS, DENCLUE)
- Clustering (grid-based methods, STING, CLIQUE), Clustering evaluation (measuring clustering tendency using the Hopkins statistic, determining the number of clusters, measuring clustering quality, intrinsic and extrinsic methods)
- Mining Frequent Patterns (market basket analysis, frequent itemsets, closed itemsets, association rules, mining frequent itemsets, Apriori algorithm, FP-growth, mining itemsets in vertical data format)
- Mining Frequent Patterns (evaluating quality of frequent itemsets, correlation analysis, null-invariant correlation methods)
- Applications of data mining in social media
Textbook(s)
- Data Mining: Concepts and Techniques (3rd edition) by Jiawei Han, Micheline Kamber, and Jian Pei (2011)