Fast and Accurate Repeated Decision Making - ANITI - Artificial and Natural Intelligence Toulouse Institute Accéder directement au contenu
Pré-Publication, Document De Travail Année : 2021

Fast and Accurate Repeated Decision Making

Résumé

We study a setting in which a learner faces a sequence of decision tasks and is required to make good decisions as quickly as possible. Each task n is associated with a pair (Xn, µn), where Xn is a random variable and µn is its (unknown and potentially negative) expectation. The learner can draw arbitrarily many i.i.d. samples of Xn but its expectation µn is never revealed. After some sampling is done, the learner can decide to stop and either accept the task, gaining µn as a reward, or reject it, getting zero reward instead. A distinguishing feature of our model is that the learner's performance is measured as the expected cumulative reward divided by the expected cumulative number of drawn samples. The learner's goal is to converge to the per-sample reward of the optimal policy within a fixed class. We design an online algorithm with data-dependent theoretical guarantees for finite sets of policies, and analyze its extension to infinite classes of policies. A key technical aspect of this setting, which sets it aside from stochastic bandits, is the impossibility of obtaining unbiased estimates of the policy's performance objective.
Fichier principal
Vignette du fichier
main.pdf (356.32 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-02976864 , version 1 (23-10-2020)
hal-02976864 , version 2 (15-02-2021)
hal-02976864 , version 3 (30-06-2021)
hal-02976864 , version 4 (22-12-2021)

Identifiants

Citer

Nicolò Cesa-Bianchi, Tommaso R Cesari, Yishay Mansour, Vanney Perchet. Fast and Accurate Repeated Decision Making. 2021. ⟨hal-02976864v2⟩
329 Consultations
229 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More