Events

  • Conferences:
    • WSBioComp, Workshop on Bioinformatics and Computational Biology, Tunis, Tunisia
    • PAKDD 2013, The 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Gold Coast, Australia
    • HSB 2012, First International Workshop on Hybrid Systems and Biology, co-located with CONCUR 2012, Newcastle upon Tyne, UK, September 3, 2012
    • MCEB 2012, Mathematical and Computational Evolutionary Biology, June 18-22, 2012, Hameau de l'Etoile (France)
    • DILS 2012, Eighth International Conference on Data Integration in the Life Sciences, June 28-29, 2012, University of Maryland, College Park, MD, USA
  • PhD Thesis Defense of Sabeur Aridhi:
    • Title: Distributed Subgraph Mining in the Cloud
    • Place: ISIMA, Campus des Cézeaux (Clermont-Ferrand), Room E05
    • Date: Friday, November 29th, at 9 am.
    • Jury members:
      • Reviewers:
      • Pr. Anne LAURENT, LIRMM, University of Montpellier 2, France
      • Pr. Takeaki UNO, National Institute of Informatics, Japan
      • Examiners:
      • Pr. Jérôme DARMONT, ERIC, University of Lyon 2, France
      • Pr. Mohamed Mohsen GAMMOUDI, University of Manouba, Tunisia
      • Co-Supervisors:
      • Dr. Laurent D'ORAZIO, LIMOS, University of Clermont-Ferrand II, France
      • Pr. Mondher MADDOURI, LIPAH, University of Manouba, Tunisia
      • Supervisor:
      • Pr. Engelbert MEPHU NGUIFO, LIMOS, University of Clermont-Ferrand II, France
    • Abstract:

      Recently, graph mining approaches have become very popular, especially in domains such as bioinformatics, chemoinformatics and social networks. One of the most challenging tasks in this setting is frequent subgraph discovery. This task has been highly motivated by the tremendously increasing size of existing graph databases. As a result, there is an urgent need for efficient and scalable approaches to frequent subgraph discovery, especially given the high availability of cloud computing environments. This thesis deals with distributed frequent subgraph mining in the cloud. First, we provide the material required to understand the basic notions of our two research fields, namely graph mining and cloud computing. Then, we present the contributions of this thesis. In the first axis, we propose a novel approach for large-scale subgraph mining using the MapReduce framework. The proposed approach provides a data partitioning technique that considers data characteristics: it uses the densities of graphs to partition the input data. Such a partitioning technique allows balanced computational loads over the distributed collection of machines and replaces the default arbitrary partitioning technique of MapReduce. We experimentally show that our approach significantly decreases the execution time and scales the subgraph discovery process to large graph databases. In the second axis, we address the multi-criteria optimization problem of tuning thresholds related to distributed frequent subgraph mining in cloud computing environments while optimizing the global monetary cost of storing and querying data in the cloud. We define cost models for managing and mining data with a large-scale subgraph mining framework over a cloud architecture. We present an experimental validation of the proposed cost models in the case of distributed subgraph mining in the cloud.
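
      As a rough illustration of the density-based partitioning idea mentioned in the abstract, the sketch below sorts graphs by density and deals them round-robin into partitions, so that each partition receives a similar mix of sparse and dense graphs. This is only a minimal sketch under stated assumptions, not the method of the thesis: the (vertices, edges) graph representation, the function names `density` and `partition_by_density`, and the round-robin assignment are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical minimal representation: (number of vertices, number of edges).
Graph = Tuple[int, int]


def density(graph: Graph) -> float:
    """Density of a simple undirected graph: 2|E| / (|V| * (|V| - 1))."""
    n_vertices, n_edges = graph
    if n_vertices < 2:
        return 0.0  # density is undefined for fewer than two vertices; use 0
    return 2.0 * n_edges / (n_vertices * (n_vertices - 1))


def partition_by_density(graphs: List[Graph], n_parts: int) -> List[List[Graph]]:
    """Sort graphs by density, then deal them round-robin into n_parts groups.

    Assumption: spreading graphs of similar density across partitions is a
    reasonable proxy for balancing the mining workload per machine.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    ordered = sorted(graphs, key=density)
    parts: List[List[Graph]] = [[] for _ in range(n_parts)]
    for index, graph in enumerate(ordered):
        parts[index % n_parts].append(graph)
    return parts


if __name__ == "__main__":
    database = [(4, 6), (4, 3), (3, 3), (5, 4), (2, 1), (6, 3)]
    for i, part in enumerate(partition_by_density(database, 2)):
        print(i, [(g, round(density(g), 2)) for g in part])
```

      In a MapReduce setting, each resulting partition would be handed to one mapper in place of the framework's default split of the input; how the thesis actually assigns graphs to partitions is not specified in the abstract.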