IDA - Intelligent Data Analysis Research Group

Projects

Running

SUPREME: SUstainable PREdictive Maintenance for manufacturing Equipment

2012 - 2015

The objective of SUPREME is to provide new tools for predictive maintenance, which will also reduce energy consumption. Our mission in the project is to apply machine learning algorithms to develop predictive models.

Principal investigator: 
Filip Železný
Participants: 
Radomír Černoch, Matěj Holec, Jiří Kléma
External number: 
FP7-314311
Internal number: 
602/122401C000/13133

Predicting protein properties with spatial statistical machine learning

2012 - 2014

(Czech Science Foundation) The project aims to create algorithms able to learn predictive models of protein function from protein spatial structures complemented with sequence, phylogeny, interaction and expression data as well as evolutionarily conserved motif information. To this end we will develop novel statistical-relational learning algorithms that leverage the spatial nature of 3D protein data through spatially inspired search heuristics (such as dynamic resolution switching), spatial reasoning, location-dependent evaluation of logical formulas (patterns), and sampling-based approximate integration of the latter over subspaces of the R^3 protein conformation space, enabling active-site detection.

Principal investigator: 
Filip Železný
External number: 
P202/12/2032

Creating and using domain knowledge for classification in bioinformatics applications

2011 - 2013

(Czech Technical University internal grant) The project aims at developing methods for boosting predictive classification performance in gene expression data analysis by exploiting relational domain knowledge concerning properties of and relations among genes and their products. Such background knowledge is sourced from specialized databases or learned through machine learning algorithms.

Principal investigator: 
Filip Železný
External number: 
SGS11/155/OHK3/3T/13
Internal number: 
10-811550

Finished

SEVENPRO: Semantic Virtual Engineering Environment for Product Design

2006 - 2008

(European Commission IST STREP FP6-027473) SEVENPRO will develop technologies and tools supporting deep mining of product engineering knowledge from multimedia repositories and enabling semantically enhanced 3D interaction with product knowledge in integrated engineering environments. Our group's mission in SEVENPRO is to apply relational data mining algorithms to engineering repositories, mainly CAD data. Our project partners are Semantic systems (Spain), INRIA (France), IFF Fraunhofer (Germany), Living Solids (Germany), IDG (Italy), Estanda (Spain).

Principal investigator: 
Filip Železný
Participants: 
Petr Křemen
External number: 
027473-SEVENPRO
Internal number: 
80-06002/13133

Learning from Theories

2010 - 2012

(Czech Science Foundation) The goal of the project is to develop algorithms that accept multiple theories (possibly incomplete and/or mutually contradictory) along with observational data, and from these inputs produce a single theory that most likely captures the truth. The algorithms will combine first-order logic reasoning with statistical inference. Applications are envisioned in 1) data mining, where scattered pieces of formal knowledge (such as rules in ontologies found on the web) are put on an equal footing with data samples, and 2) learning in multi-agent systems.

Principal investigator: 
Filip Železný
Participants: 
Radomír Černoch
External number: 
103/10/1875
Internal number: 
13/100004/13133

METOD: MetaTool for Educational platform Design

2005 - 2006

(EU Leonardo da Vinci project) Rapid advances in ICT, including multimedia, virtual reality, intelligent systems and telecommunications (e.g. the Internet), have enabled the development of computer supported educational platforms (CSET), which can improve the quality, user friendliness and accessibility of training, education and lifelong learning. The specific aims of the project are (a) to design a general paradigm for developing computer supported educational platforms, (b) to develop a framework for integrating machine learning and machine intelligence into the educational platforms, enabling self-adaptation to special educators' and trainees' needs and characteristics, (c) to design a metatool enabling all kinds of educators (including parents or senior family members) to develop, adapt and maintain their own educational platforms without great effort, (d) to provide courses, conferences and workshops for target groups on the above three topics and (e) to disseminate the project achievements to a wide audience using both ICT and conventional approaches.

Principal investigator: 
Jiří Kléma
Participants: 
Filip Karel, Jiří Kubalík, Jiří Kléma
External number: 
LDVX SI/04/B/F/PP/176004
Internal number: 
84-05001/13133

Predictive Data Modeling for Effective Gene Therapy and Bone Marrow Transplantation

2010 - 2012

(Czech Ministry of Education, joint project with the Univ. of Minnesota) The project aims at using machine learning algorithms to help improve the outcomes of novel gene therapy methods and bone marrow transplantation. A specific task of the project is to learn models for the prediction of interaction of zinc finger proteins with DNA using scalable relational machine learning algorithms.

Principal investigator: 
Filip Železný
Participants: 
Andrea Szabóová
External number: 
ME10047
Internal number: 
18/100003/13133

Data Mining with Distributed Computing

2010 - 2011

(Czech Ministry of Education) An exchange project with the University of Mendoza aiming at the GRIDification of some of IDA's data mining algorithms, mainly in the tool XGENE.ORG.

Principal investigator: 
External number: 
MEB111005
Internal number: 
181-100003

ProLearn: Bridging the Gap between Systems Biology and Machine Learning

2009 - 2011

(Czech Science Foundation) This project aims at bridging the gap between systems biology and machine learning by devising novel algorithms able to propose biological theories by integrating and learning from multi-platform high-throughput data and background knowledge on the structure and dynamics of cellular processes such as signalling, metabolic and transcription pathways. The University of Minnesota Blood and Marrow transplant division is our primary consultant for this project. The project partly funds the ongoing development of the XGENE.ORG tool.

Principal investigator: 
Filip Železný
Participants: 
Matěj Holec, Jiří Bělohradský
External number: 
201/09/1665
Internal number: 
13/090080/13133

Transferring ILP techniques to SRL

2011 - 2012

(Czech Science Foundation) The project aims to translate and exploit techniques and concepts established in inductive logic programming, such as redundancy or reducibility, to statistical relational learning.

Principal investigator: 
Filip Železný
Participants: 
Ondřej Kuželka
External number: 
103/11/2170
Internal number: 
13-110007

ML/BIO: Relational machine learning for analysis of biomedical data

2005 - 2009

(Czech Academy of Sciences) This project seeks to apply relational machine learning methods to biomedical databases. It covers an extensive set of both fundamental research goals (efficient inductive logic programming, propositionalization, ontology exploitation in machine learning) and applications (knowledge discovery from gene expression and gene ontology data as well as mining from clinical genetic data). This is a joint project with the Nature Inspired Technologies group at the Gerstner lab. The 2nd Faculty of Medicine of Charles University in Prague is a project partner.

Principal investigator: 
Olga Stepankova (NIT Group)
Participants: 
Monika Žáková, Filip Karel, Matěj Holec, Jiří Kubalík, Jiří Kléma, Filip Železný
External number: 
1ET101210513
Internal number: 
12-05001/13133

MLSC: Machine learning methods for solution construction in evolutionary algorithms

2008 - 2010

(Grant Agency of the Czech Republic, Grant No. 102/08/P094) Evolutionary algorithms (EAs) are very popular optimization techniques because they are conceptually simple and do not require any prior knowledge of the problem. They usually construct new candidate solutions by random recombination of the promising solutions at hand. This method of solution construction is static and is given implicitly by the crossover and mutation operators used in the algorithm. For large problem instances, this approach leads to very slow progress. This project aims to investigate more sophisticated and intelligent ways of creating new candidate solutions (using machine learning methods) that would allow the algorithm to increase the probability of generating good individuals. Preservation of population diversity will also be taken into account, since it is necessary for the successful operation of the algorithms.

Principal investigator: 
Petr Pošík
Participants: 
Petr Pošík
External number: 
102/08/P094
Internal number: 
13/08008/13133

LeCoS: merging machine LEarning and COnstraint Satisfaction

2008 - 2010

(Czech Science Foundation, joint project with the Faculty of Mathematics and Physics, Charles University) This project aims to merge advanced techniques of two mature research fields: relational machine learning on the one hand, and constraint satisfaction (CS) on the other. Firstly, advanced constraint satisfaction algorithms, including randomized techniques, will be employed at the heart of inductive logic programming (ILP) algorithms, mainly for subsumption testing. Secondly, ILP algorithms will be used to learn new heuristics from solved CS problems such that these heuristics can be automatically deployed in larger, unsolved CS problem instances.

Principal investigator: 
Filip Železný
Participants: 
Ondřej Kuželka
External number: 
201/08/0509
Internal number: 
13/08012/13133

LOGenom: Logic-Based Machine Learning for Genomic Data Analysis

2005 - 2006

(Grant Agency of the Czech Academy of Sciences) Despite the current rise of interest in mining gene expression data by means of machine learning and data mining, logic-based relational machine learning (LBRML) algorithms receive little or no attention. This contrasts with their successes in related biological applications, their strong theoretical foundations, the availability of a plethora of implementations, and above all the understandability and direct biological interpretability of their outputs. Their limited adoption is due to the fact that, in comparison to the statistical approaches currently favored in this application field, LBRML exhibits insufficient robustness against data imperfection, inefficiency in attribute-rich genetic domains and insufficient uncertainty modeling features. This project aims to eliminate these deficiencies by incorporating stochastic inductive techniques into LBRML and to demonstrate experimentally its power in genomic data mining.

Principal investigator: 
Filip Železný
Participants: 
Filip Železný
External number: 
KJB201210501
Internal number: 
12-05006/13133

RML/StatSearch: Methods of Statistical Search for Improving the Efficiency of Relational Machine Learning Algorithms

2005 - 2006

(Czech Ministry of Education [CZ costs]) Bilateral project with the University of Wisconsin in Madison, Dept. of Biostatistics. Through travel funds, this project supports the principal investigator's collaboration with the US partner on methods of statistical search for improving the efficiency of relational machine learning algorithms.

Principal investigator: 
Filip Železný
Participants: 
Filip Železný
External number: 
1P05ME755
Internal number: 
14-05003/13133

OntoExpres: Using Gene Ontologies and Annotations for the Interpretation of Gene Expression Data through Relational Machine Learning Algorithms

2007 - 2010

(Czech Ministry of Education - CZ expenses) Bilateral project with the University of Minnesota (Dept. of Pediatrics / Blood and Marrow Transplantation, principal investigator Jakub Tolar). The goal of the project is to establish a principled framework for the relational integration of gene expression, gene ontology and gene annotation data, supporting all basic cases of relational inductive knowledge discovery tasks (with special emphasis on predictive classification, association discovery and outlier detection).

Principal investigator: 
Filip Železný
Participants: 
Filip Železný
External number: 
ME910
Internal number: 
18/07001/13133

Adding aggregation operators to the subgroup discovery systems

2010 - 2011

(SGS ČVUT) Most current methods for subgroup discovery use first-order features, which do not seem appropriate for certain types of applications. To overcome this shortcoming, the project focuses on integrating statistical methods into systems for subgroup discovery. The main objective is to adopt aggregation operators and to design algorithms that can exploit them efficiently.

Principal investigator: 
Radomír Černoch
Participants: 
Radomír Černoch, Filip Železný

SA: State analysis of annotated gene sets in order to improve predictive classification of machine-learning methods on gene expression data

2010 - 2011

(SGS ČVUT) The most common approach to analysing high-throughput gene expression data with predefined gene sets is based on functional enrichment methods such as GSEA. This project aims at improving the predictive accuracy of machine-learning algorithms on such data by exploiting natural modes of the predefined annotated gene sets. This is a key step towards wider use of machine-learning techniques that can propose biological theories based on data and background knowledge.

Principal investigator: 
Matěj Holec
External number: 
SGS10/071/OHK4/1T/13
Internal number: 
10/800710/13133

SRLG: Statistical Relational Learning for Modelling Gene Expression Data

2010

(SGS ČVUT) The aim of this project is to develop a statistical relational learning (SRL) framework suitable for modelling gene expression data. Existing SRL frameworks are not appropriate for this task. Our envisioned SRL framework should combine ideas from Bayesian Logic Programming and Markov Logic Networks but, unlike these frameworks, it will aim primarily at modelling numerical data described using a subset of first-order logic.

Principal investigator: 
Ondřej Kuželka
Internal number: 
10/800730/13133
Content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Czech Republic License.