diff --git a/_posts/2024-06-11-bai24b.md b/_posts/2024-06-11-bai24b.md
index 6a9e25a..f7c34af 100644
--- a/_posts/2024-06-11-bai24b.md
+++ b/_posts/2024-06-11-bai24b.md
@@ -12,7 +12,7 @@ abstract: We consider a multi-task learning problem, where an agent is presented
   contribution is to provide theoretical results to characterize the performance
   of the proposed method. In particular, we show that incremental policy gradient
   methods converge to the optimal value of the multi-task reinforcement learning objectives
-  at a sublinear rate $\mathcal{O}(1/\sqrt{k})$, where $k$ is the number of iterations.
+  at a sublinear rate $O(1/\sqrt{k})$, where $k$ is the number of iterations.
   To illustrate its performance, we apply the proposed method to solve a simple
   multi-task variant of GridWorld problems, where an agent seeks to find a policy
   to navigate effectively in different environments.
diff --git a/_posts/2024-06-11-jongeneel24a.md b/_posts/2024-06-11-jongeneel24a.md
index 763a87e..6c960d2 100644
--- a/_posts/2024-06-11-jongeneel24a.md
+++ b/_posts/2024-06-11-jongeneel24a.md
@@ -2,7 +2,7 @@
 title: A large deviations perspective on policy gradient algorithms
 abstract: Motivated by policy gradient methods in the context of reinforcement learning,
   we derive the first large deviation rate function for the iterates generated by
-  stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-{Ł}ojasiewicz
+  stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-Łojasiewicz
   condition. Leveraging the contraction principle from large deviations theory, we
   illustrate the potential of this result by showing how convergence properties of
   policy gradient with a softmax parametrization and an entropy regularized objective
diff --git a/_posts/2024-06-11-liao24a.md b/_posts/2024-06-11-liao24a.md
index aace8aa..9821927 100644
--- a/_posts/2024-06-11-liao24a.md
+++ b/_posts/2024-06-11-liao24a.md
@@ -7,7 +7,7 @@ abstract: Many machine learning problems lack strong convexity properties. Fortu
   conditions for convex and smooth functions is well understood, it is not the case
   for the nonsmooth setting. In this paper, we go beyond convexity and smoothness,
   and clarify the connections among common regularity conditions (including strong
-  convexity, restricted secant inequality, subdifferential error bound, Polyak-{Ł}ojasiewicz
+  convexity, restricted secant inequality, subdifferential error bound, Polyak-Łojasiewicz
   inequality, and quadratic growth) in the class of weakly convex functions. In addition,
   we present a simple and modular proof for the linear convergence of the proximal
   point method (PPM) for convex (possibly nonsmooth) optimization using these regularity
diff --git a/_posts/2024-06-11-mitra24a.md b/_posts/2024-06-11-mitra24a.md
index a68976e..72d90b1 100644
--- a/_posts/2024-06-11-mitra24a.md
+++ b/_posts/2024-06-11-mitra24a.md
@@ -6,15 +6,15 @@ abstract: 'Given the success of model-free methods for control design in many pr
   problem has analogies with the formulations studied under the rubric of networked
   control systems, the rich literature in that area has typically assumed that the
   model of the system is known. As a step towards bridging the fields of model-free
-  control design and networked control systems, we ask: \textit{Is it possible to
+  control design and networked control systems, we ask: Is it possible to
   solve basic control problems - such as the linear quadratic regulator (LQR) problem
-  - in a model-free manner over a rate-limited channel?} Toward answering this question,
+  - in a model-free manner over a rate-limited channel? Toward answering this question,
   we study a setting where a worker agent transmits quantized policy gradients (of
   the LQR cost) to a server over a noiseless channel with a finite bit-rate. We propose
-  a new algorithm titled Adaptively Quantized Gradient Descent (\texttt{AQGD}), and
-  prove that above a certain finite threshold bit-rate, \texttt{AQGD} guarantees exponentially
-  fast convergence to the globally optimal policy, with \textit{no deterioration of
-  the exponent relative to the unquantized setting}. More generally, our approach
+  a new algorithm titled Adaptively Quantized Gradient Descent (AQGD), and
+  prove that above a certain finite threshold bit-rate, AQGD guarantees exponentially
+  fast convergence to the globally optimal policy, with no deterioration of
+  the exponent relative to the unquantized setting. More generally, our approach
   reveals the benefits of adaptive quantization in preserving fast linear convergence
   rates, and, as such, may be of independent interest to the literature on compressed
   optimization.'
diff --git a/_posts/2024-06-11-turan24a.md b/_posts/2024-06-11-turan24a.md
index 974a446..6073f34 100644
--- a/_posts/2024-06-11-turan24a.md
+++ b/_posts/2024-06-11-turan24a.md
@@ -14,7 +14,7 @@ abstract: This paper introduces the Safe Pricing for NUM with Gradual Variations
   leveraging an estimate of the users’ price response function. By tuning the amount
   of shrinkage to account for the error between the desired and the induced demand,
   we prove that the induced demand always belongs to the feasible set. In addition,
-  we prove that the regret incurred by the induced demand is ${\cal O}(\sqrt{T(1+V_T)})$
+  we prove that the regret incurred by the induced demand is $O(\sqrt{T(1+V_T)})$
   after $T$ iterations, where $V_T$ is an upper bound on the total gradual variations
   of the users’ utility functions. Numerical simulations demonstrate the efficacy
   of SPNUM-GV and support our theoretical findings.
diff --git a/_posts/2024-06-11-zhang24a.md b/_posts/2024-06-11-zhang24a.md
index 19d36b2..e963865 100644
--- a/_posts/2024-06-11-zhang24a.md
+++ b/_posts/2024-06-11-zhang24a.md
@@ -15,7 +15,7 @@ abstract: Designing stabilizing controllers is a fundamental challenge in autono
   including a high-fidelity F-16 jet model featuring a 16D state space and a 4D input
   space. Experiments indicate that, compared to prior works in reinforcement learning,
   imitation learning, and neural certificates, LYGE reduces the distance to the goal
-  by 50% whil
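The hunks above all apply the same kind of edit: TeX markup in the plain-text `abstract:` front matter (`\mathcal{O}`, `{\cal O}`, `{Ł}`, `\textit{...}`, `\texttt{...}`) is replaced with its plain-text equivalent. Below is a minimal sketch of how such a substitution pass could be scripted; the `_posts` glob and the replacement table are illustrative assumptions, not part of this change set.

```python
import pathlib
import re

# Hypothetical cleanup pass (not part of this change set): strip common TeX
# markup from the affected post files so that plain-text front-matter fields
# such as `abstract:` render cleanly. Glob and replacements are assumptions.
REPLACEMENTS = [
    (re.compile(r"\\mathcal\{O\}"), "O"),           # \mathcal{O}  -> O
    (re.compile(r"\{\\cal O\}"), "O"),              # {\cal O}     -> O
    (re.compile(r"\{Ł\}"), "Ł"),                    # {Ł}          -> Ł
    (re.compile(r"\\textit\{([^{}]*)\}"), r"\1"),   # \textit{...} -> ...
    (re.compile(r"\\texttt\{([^{}]*)\}"), r"\1"),   # \texttt{...} -> ...
]

def clean(text: str) -> str:
    for pattern, replacement in REPLACEMENTS:
        text = pattern.sub(replacement, text)
    return text

if __name__ == "__main__":
    # Run from the repository root, where the _posts directory lives.
    for post in sorted(pathlib.Path("_posts").glob("2024-06-11-*.md")):
        original = post.read_text(encoding="utf-8")
        cleaned = clean(original)
        if cleaned != original:
            post.write_text(cleaned, encoding="utf-8")
            print(f"cleaned {post}")
```

A whole-file substitution like this is only safe under the assumption that the targeted macros appear solely inside plain-text front-matter fields, as in the posts touched here.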