Safe online bid optimization with return-on-investment and budget constraints subject to uncertainty.


In online marketing, the advertisers’ goal is usually a tradeoff between achieving high volumes and high profitability. The companies’ business units customarily address this tradeoff by maximizing the volumes while guaranteeing a lower bound to the Return On Investment (ROI). Technically speaking, such a task can be naturally modeled as a combinatorial optimization problem subject to ROI and budget constraints to be solved online since the parameter values are uncertain and need to be estimated during the sequential arrival of data. In this picture, the uncertainty over the constraints’ parameters plays a crucial role. Indeed, these constraints can be arbitrarily violated during the learning process due to an uncontrolled algorithms’ exploration, and such violations represent one of the major obstacles to the adoption of automatic techniques in real-world applications as often considered unacceptable risks by the advertisers. Thus, to make humans trust online learning tools, controlling the algorithms’ exploration so as to mitigate the risk and provide safety guarantees during the entire learning process is of paramount importance. In this paper, we study the nature of both the optimization and learning problems. In particular, when focusing on the optimization problem without uncertainty, we show that it is inapproximable within any factor unless P = NP, and we provide a pseudo-polynomial-time algorithm that achieves an optimal solution. When considering uncertainty, we prove that no online learning algorithm can violate the (ROI or budget) constraints during the learning process a sublinear number of times while guaranteeing a sublinear pseudo-regret. Thus, we provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations. We also design its safe version, namely GCBsafe, guaranteeing w.h.p. a constant upper bound on the number of constraints violations at the cost of a linear pseudo-regret. More interestingly, inspired by the previous two algorithms, we provide an algorithm, namely GCBsafe(ψ, φ), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances ψ and φ in the satisfaction of the ROI and budget constraints, respectively. This algorithm actually mitigates the risks due to the constraints violations without precluding the convergence to the optimal solution. Finally, we experimentally compare our algorithms in terms of pseudo-regret/constraint-violation tradeoff in settings generated from real-world data, showing the importance of adopting safety constraints in practice and the effectiveness of our algorithms.


Matteo Castiglioni, Alessandro Nuara, Giulia Romano, Giorgio Spadaro, Francesco Trovò, Nicola Gatti


arXiv preprint arXiv:2201.07139


January 18, 2022