Knowledge Discovery from Data Bases

Code: CC5015     Acronym: KDD     Level: 500

Keywords

Classification | Keyword
Official | Informatics

Instance: 2022/2023 - 1S

Active? Yes
Responsible unit: Department of Computer Science
Course/CS Responsible: Doctoral Program in Computer Science - MAP joint programme

Cycles of Study/Courses

Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time
PDMAPI | 8 | Official Study Plan since 2020/2021 | 1 | - | 6 | 30 | 162

Teaching language

English

Objectives

At the end of the semester the students should be able to:

  1. Formulate a decision problem as a data mining problem;
  2. Identify the basic tasks in knowledge discovery from data bases;
  3. Identify and use the main methods in solving data mining problems;
  4. Apply the main methods and algorithms for each mining task;
  5. Apply the main methods and algorithms to real-world problems and adapt them to new contexts.

Learning outcomes and competences

Knowledge of how to formulate a problem as a knowledge extraction problem. Ability to apply methods and algorithms to a new data analysis problem, evaluate the results, and understand how the methods studied work.

Working method

In person

Program

  • Introductory Concepts
      – Introduction to Knowledge Discovery in Data Bases
          ∗ From OLAP to On-Line Analytical Mining;
          ∗ Data Mining tasks;
      – Cluster Analysis
          ∗ Cluster Analysis: concepts and methods;
          ∗ Partitioning and Hierarchical Methods;
      – Association Analysis
          ∗ Frequent pattern mining;
          ∗ Frequent Sequence mining;
      – Predictive Data Mining: Classification and Regression.
          ∗ Optimization Methods: Artificial Neural Networks; Support Vector Machines.
          ∗ Probabilistic Methods: Bayesian Classifiers;
          ∗ Search based Methods: Decision Trees and Rules.
      – Evaluation in Predictive Data Mining.
          ∗ Evaluation: goals and perspectives;
          ∗ Loss Functions and Cost-benefit analysis;
          ∗ Bias-Variance analysis;
      – Ensembles and Multiple Models
          ∗ Concepts and methods;
          ∗ Combining Homogeneous Models;
          ∗ Combining Heterogeneous models;

  • Advanced Topics
      – Social Network Analysis
          ∗ Concepts and methods;
          ∗ Evolution of Networks;
      – Text Mining
          ∗ Concepts and methods;
          ∗ Information retrieval;
          ∗ Document classification;
      – Web Mining and Link Analysis
          ∗ Concepts and methods;
          ∗ Web and Structure mining;
          ∗ Link analysis;
      – Big Data and Data stream Mining
          ∗ Big Data: Applications and tools;
          ∗ Concepts and methods;
          ∗ Summarizing data streams;
          ∗ Knowledge discovery from data streams;
      – Data Mining Standards and Processes

Mandatory literature

J. Gama, A. Carvalho, K. Faceli, A. Lorena, M. Oliveira; Extração de Conhecimento de Dados - Data Mining, Sílabo, 2012
Jiawei Han and Micheline Kamber; Data Mining: Concepts and Techniques, Morgan Kaufmann, 2006
J. Gama; Knowledge Discovery from Data Streams, CRC Press, 2010

Teaching methods and learning activities

The teaching method consists of theoretical-practical classes. 

Evaluation Type

Distributed evaluation without final exam

Assessment Components

Designation | Weight (%)
In-person participation | 10.00
Practical or project work | 90.00
Total: | 100.00

Amount of time allocated to each course unit

Designation | Time (hours)
Project development | 28.00
Independent study | 28.00
Class attendance | 28.00
Total: | 84.00

Eligibility for exams

Students must submit the assignment.

Calculation formula of final grade

The evaluation consists of homework assignments.
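
Read together with the weights listed under Assessment Components, the final grade can be sketched as a weighted sum. This is an illustration rather than an official formula: the symbols P (grade for in-person participation) and W (grade for the practical or project work) are introduced here as assumptions, and both are assumed to be expressed on the same grading scale, which this page does not state.

```latex
% Assumed symbols: P = participation grade, W = practical/project work grade.
% Weights (10% and 90%) are taken from the Assessment Components table.
\[
  \text{Final grade} = 0.10 \times P + 0.90 \times W
\]
```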