Resource allocation with observable and unobservable environments

Santiago Duran

Abstract

This thesis studies resource allocation problems in large-scale stochastic networks. We work on problems where the availability of resources fluctuates over time, a situation that one may encounter, for example, in load balancing systems or in wireless downlink scheduling systems. The time fluctuations are modelled with two types of processes: controllable processes, whose evolution depends on the action of the decision maker, and environment processes, whose evolution is exogenous. The stochastic evolution of the controllable process depends on the current state of the environment. Depending on whether the decision maker observes the state of the environment, we say that the environment is observable or unobservable. The mathematical framework used is that of Markov Decision Processes (MDPs). The thesis follows three main research axes. In the first problem we study the optimal control of a multi-armed restless bandit problem (MARBP) with an unobservable environment. The objective is to characterise the optimal policy for the controllable process in spite of the fact that the environment cannot be observed. We consider the large-scale asymptotic regime in which the number of bandits and the speed of the environment both tend to infinity. In our main result we establish that a set of priority policies is asymptotically optimal. We show that, in particular, this set includes the Whittle index policy of a system whose parameters are averaged over the stationary behaviour of the environment. In the second problem, we consider an MARBP with an observable environment. The objective is to leverage information on the environment to derive an optimal policy for the controllable process. Assuming that the technical condition of indexability holds, we develop an algorithm to compute Whittle's index. We then apply this result to the particular case of a queue with abandonments. We prove indexability, and we provide closed-form expressions of Whittle's index.
In the third problem we consider a model of a large-scale storage system, in which files are distributed across a set of nodes. Each node breaks down following a law that depends on the load it handles. Whenever a node breaks down, all the files it held are reallocated to other nodes. We study the evolution of the load of a single node in the mean-field regime, as the numbers of nodes and files grow large. We prove the existence of the process in the mean-field regime. We further show the convergence in distribution of the load in steady state as the average number of files per node tends to infinity.
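As an illustration of the averaging step mentioned in the first problem, the following minimal sketch computes the stationary distribution of a small environment chain and uses it to average a parameter of the controllable process. It is not taken from the thesis: the discrete-time two-state environment, the transition matrix `P`, the per-state `service_rates` and both function names are assumptions chosen only for the example.

```python
def stationary_distribution(P, max_iterations=10_000, tol=1e-12):
    """Approximate the row vector pi satisfying pi = pi P by power iteration.

    P is a row-stochastic matrix given as a list of lists. Convergence is
    assumed (for example, an irreducible and aperiodic chain).
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iterations):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def averaged_parameter(pi, values):
    """Average a per-environment-state parameter under the distribution pi."""
    return sum(p * v for p, v in zip(pi, values))


# Hypothetical two-state environment (assumption, not from the thesis).
P = [[0.9, 0.1],
     [0.3, 0.7]]

# Hypothetical parameter of the controllable process in each environment state.
service_rates = [1.0, 3.0]

pi = stationary_distribution(P)
mean_rate = averaged_parameter(pi, service_rates)
print(pi, mean_rate)
```

In this sketch the averaged value `mean_rate` plays the role of a parameter of the averaged system; an index policy would then be computed for that system instead of for each environment state separately.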

This thesis studies resource allocation problems in large-scale stochastic networks whose parameters fluctuate over time. We assume that the state of the system consists of two processes: a controllable part, whose evolution depends on the action of the decision maker, and an environment part, whose evolution is exogenous. The stochastic evolution of the controllable process depends on the current state of the environment. Depending on whether the decision maker observes the state of the environment, we say that the environment is observable or unobservable. The thesis follows three main research axes. In the first problem, we study the optimal control of a multi-armed restless bandit problem (MARBP) with an unobservable environment. The objective is to characterise the optimal policy for controlling the controllable process even though the environment cannot be observed. We consider the large-scale asymptotic regime in which the number of bandits and the speed of the environment both tend to infinity. In our main result, we establish that a set of priority policies is asymptotically optimal. We show that this set includes, in particular, the Whittle index policy of a system whose parameters are averaged over the stationary behaviour of the environment. In the second problem, we consider an MARBP with an observable environment. The objective is to leverage information on the environment to derive an optimal policy for the controllable process. Assuming that the technical condition of indexability holds, we develop an algorithm to compute Whittle's index numerically. We then apply this result to the particular case of a queue with abandonments. We establish indexability, and we obtain closed-form characterisations of Whittle's index.
In the third problem, we consider a file allocation model for a large storage system, in which files are distributed across a set of nodes. Each node breaks down according to a law that depends on the load it handles. Whenever a node breaks down, all the files it held are reallocated according to a fixed allocation strategy, and the node restarts empty. We study the evolution of the load of a node in the mean-field regime, as the number of files and the number of nodes grow large. We prove the existence and uniqueness of the stationary probability measure of the process, and the convergence in distribution of this measure.

Allocation de ressources avec environnements observables et non-observables