Data Mining: Techniques and Applications
Course Description
This course will introduce data mining techniques, including frequent pattern and association rule mining, basic background on classification and clustering, and applications of data mining techniques in specific domains. The emphasis will be on applications in specific domains rather than on fundamental methodologies.
IST 557 Data Mining: Techniques and Applications (3)
The course will begin with an introduction to the data mining field: why data mining is needed, what data mining is, what kinds of data can be mined, what kinds of patterns can be mined, an overview of technologies, the major issues in data mining, and a brief history of the data mining community.
The three key lecture topics are: (1) mining frequent patterns and association rules; (2) classification: basic concepts and techniques; and (3) cluster analysis: basic concepts.
For topic (1), we will introduce frequent itemset mining methods, including Apriori and FP-Growth. We will also teach advanced frequent pattern mining methods such as pattern mining in multi-dimensional space, constraint-based frequent pattern mining, mining high-dimensional data, sequential pattern mining, and graph pattern mining.
For topic (2), we will teach how to formulate a real-world problem as a classification problem, how to apply classification models to real data, and how to analyze the results. The classification models covered in class include decision trees, random forests, boosting, support vector machines and kernels, the naïve Bayes classifier, and k-nearest neighbors (KNN). Students will learn how to evaluate classification methods using different measures. We will be brief on the fundamental classification methods and will focus more on applying them to various kinds of data.
For topic (3), we will cover clustering topics including partitioning methods, hierarchical methods, density-based methods, grid-based methods, and the evaluation of clustering results. We will be brief on the fundamental clustering methods and will focus more on applying them to various kinds of data.
Four weeks will be used for lectures on special topics such as text mining, time series mining, spatial data mining, graph mining, image mining, and emerging subjects in data mining. The purpose of the special topics is to help students learn about real-world data mining problems and apply state-of-the-art solutions to them. The instructor will select a few topics based on students’ project proposals. The instructor and students will work together on the literature survey and prepare for the presentation. Potential key special topics include:
Mining text data. We will introduce basic preprocessing methods such as tokenization, stemming, and stopword filtering, and basic textual features such as tf-idf. We will teach text mining topics including sentiment analysis, topic modeling, and entity extraction.
Mining temporal data. We will introduce basic techniques for mining temporal data, such as measuring time series similarity, periodicity analysis, and trend prediction.
Mining spatial data. We will introduce basic spatial models, clustering of spatial locations, spatial outliers, co-location patterns, and location prediction.
There will be five discussion classes. The instructor will use these classes to talk with individual students and teams, help them with the problems they encounter in assignments and projects, and better personalize the learning experience.