Presentation: Dynamic Query Optimization in Large-Scale Distributed Environments

We address query processing and optimization in large-scale parallel and distributed environments (cluster, grid, and cloud computing), targeting very large volumes of distributed data (“Big Data”).

Our current research focuses on designing and developing new elastic resource allocation models for query optimization. These models build on fundamental results from parallel and distributed systems, particularly the types of parallelism (partitioned, independent, and pipelined) and the minimization of inter-operation communication costs.

Our approach seeks the best trade-off between (i) efficiency, i.e., the satisfaction of multiple tenants in terms of Quality of Service (QoS), and (ii) profitability for PaaS providers. The originality of these elastic resource allocation models lies in (i) introducing an economic model that brings the profitability dimension, including the providers’ pricing, into the objective function; (ii) decentralizing control to ensure scalability, through a proactive migration policy based on mobile agents; and (iii) revisiting cost estimation methods and search strategies for finding an optimal or near-optimal execution plan.

The two main research issues addressed by the Pyramid team are described below.

I1: Elastic Resource Allocation for Query Optimization

The objective is to design and develop elastic resource allocation models for query optimization. In cloud computing environments, the resources allocated on the provider side should grow or shrink with the demand for services on the tenant side, so as to maintain QoS and meet the Service Level Agreements (SLAs). The main QoS criteria considered are performance (e.g., query response time) and service availability. The SLA, a contract between provider and consumer, specifies the constraints to meet and the objectives to reach with respect to these QoS criteria. The main challenge is to find the best trade-off between the tenants’ satisfaction in terms of QoS and the providers’ gain from profitable resource management.
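The trade-off described above can be illustrated with a small sketch. It is not the team's model: the cost formula, the linear communication overhead, and every name and parameter (`work_units`, `comm_overhead`, `node_cost`, `penalty_rate`, and so on) are assumptions made only for illustration. The sketch picks the number of nodes that maximizes the provider's profit for one query, where profit is the price paid by the tenant minus the resource cost and minus a penalty for exceeding the SLA response time.

```python
def estimated_response_time(work_units, nodes, comm_overhead):
    # Assumption: work is split evenly across nodes (partitioned parallelism),
    # and each extra node adds a fixed communication overhead.
    return work_units / nodes + comm_overhead * (nodes - 1)


def provider_profit(nodes, work_units, price, node_cost,
                    sla_deadline, penalty_rate, comm_overhead):
    # Hypothetical objective: revenue - resource cost - SLA penalty.
    rt = estimated_response_time(work_units, nodes, comm_overhead)
    resource_cost = nodes * node_cost * rt      # nodes are billed while busy
    lateness = max(0.0, rt - sla_deadline)      # time beyond the SLA deadline
    return price - resource_cost - penalty_rate * lateness


def choose_allocation(max_nodes, **params):
    # Exhaustive search over node counts; a real optimizer would use a
    # smarter search strategy over a much larger plan space.
    return max(range(1, max_nodes + 1),
               key=lambda n: provider_profit(n, **params))


params = dict(price=10.0, node_cost=0.01, sla_deadline=20.0,
              penalty_rate=1.0, comm_overhead=0.5)
light = choose_allocation(40, work_units=100.0, **params)
heavy = choose_allocation(40, work_units=200.0, **params)
print(light, heavy)
```

Under these assumed parameters the heavier query receives more nodes than the lighter one, which is the elastic behaviour described above: the allocation follows demand, but only as far as it remains profitable for the provider.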

I2: Data Replication in Cloud Systems

The data replication strategies proposed for parallel, distributed, and grid systems are difficult to adapt to cloud systems. The objective is to propose data replication strategies that integrate an economic model of the provider’s profitability, taking possible penalties into account. The main challenge is to define a dynamic mechanism that adjusts the number of replicas toward its optimal value, so as to enable elastic resource management.
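As a rough illustration of such a dynamic mechanism, the sketch below adjusts the replica count one step at a time toward the most profitable neighbouring value. The profit formula, the fixed per-replica capacity, and all names and parameters are hypothetical simplifications, not a strategy taken from this work.

```python
def replica_profit(replicas, request_rate, revenue_per_request=0.1,
                   replica_cost=5.0, capacity_per_replica=100.0,
                   penalty_per_violation=0.2):
    # Assumption: each replica serves a fixed number of requests per period;
    # requests beyond total capacity violate the SLA and incur a penalty.
    served_ok = min(request_rate, replicas * capacity_per_replica)
    violations = request_rate - served_ok
    return (request_rate * revenue_per_request
            - replicas * replica_cost
            - violations * penalty_per_violation)


def adjust_replicas(current, request_rate, min_replicas=1, max_replicas=10):
    # Compare keeping, removing, or adding one replica; listing `current`
    # first means ties are resolved in favour of not migrating data.
    candidates = [r for r in (current, current - 1, current + 1)
                  if min_replicas <= r <= max_replicas]
    return max(candidates, key=lambda r: replica_profit(r, request_rate))


replicas = 1
for load in (250.0, 250.0, 250.0, 80.0, 80.0):
    replicas = adjust_replicas(replicas, load)
    print(load, replicas)
```

Moving one replica per period is a deliberate simplification: it limits the cost of creating or deleting copies in a single step while still letting the replica count follow the load up and down.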