Indiana University Bloomington

Computer Science B565
Data Mining

Contact: Mehmet Dalkilic
Offered: Spring, 2016
Class Time: 01:00pm-02:15pm
Class Days: M, W
Capacity: 50
Algebra Required: Basic.
Calculus Required: Basic.
Instructor: Predrag Radivojac
Days Per Week Offered: Two.
Recommended follow-up classes: CSCI-B555
Syllabus: No Syllabus Available
Keywords: Prediction, clustering, association rule mining, data exploration, data visualization, anomaly detection.
Description: The course objective is to study algorithmic and practical aspects of discovering patterns and relationships in large databases. The course introduces basic concepts of data mining and provides hands-on experience in data analysis, clustering, and prediction. Data mining is a dynamic field with wide applications in areas such as finance, the life sciences, the social sciences, and medicine. This is a core Computer Science course.

The course covers about 75% of the following topics, depending on the year:
  • basic concepts (introduction to data mining, origins of data mining, data mining tasks, relational databases, transactional databases, data warehouses)
  • data (types of data, data quality, similarity metrics, summary statistics, data preprocessing: cleaning, normalization, reduction, transformation, integration)
  • data warehouse and OLAP technology for data mining (multidimensional data model and OLAP operations, warehouse architecture, implementations and relationship with data mining)
  • association rule mining (basic concepts: frequent itemset generation, rule generation, Apriori and FP-growth algorithms; advanced concepts: graph data, sequential patterns, infrequent patterns, concept hierarchies)
  • classification and regression algorithms (Bayesian classification, k-nearest neighbor, neural networks, classification and regression trees, support vector machines, ensemble methods, handling biased and class-imbalanced data)
  • clustering (partitioning methods: k-means and k-medoids, and hierarchical methods: agglomerative/divisive clustering; density-based, graph-based, prototype-based, and model-based clustering, clustering with constraints)
  • anomaly detection (statistical approaches to outlier detection, density-based, proximity-based, clustering-based techniques)
  • mining complex types of data (mining spatial, text, time-series and multimedia data, mining web data, mining graphs, mining streaming data)
  • human factors and social issues (ethics of data mining and social impacts, privacy-preserving data mining, user interfaces, data and result visualization)

Books: Textbook:
  • Data Mining: Concepts and Techniques - by J. Han et al., Morgan Kaufmann 2006.
  • Introduction to Data Mining - by P.-N. Tan et al., Pearson 2006.
Applied/Theoretical: Applied.
Formal Computing Lab: No
Comments: The course has several programming tasks.