
Commit

Merge pull request #12 from markcannon/gh-pages
Fixed abstracts
markcannon authored Jun 25, 2024
2 parents fb72177 + c3a151c commit bcda3f5
Showing 6 changed files with 11 additions and 11 deletions.
2 changes: 1 addition & 1 deletion _posts/2024-06-11-bai24b.md
@@ -12,7 +12,7 @@ abstract: We consider a multi-task learning problem, where an agent is presented
contribution is to provide theoretical results to characterize the performance of
the proposed method. In particular, we show that incremental policy gradient methods
converge to the optimal value of the multi-task reinforcement learning objectives
-at a sublinear rate $\mathcal{O}(1/\sqrt{k})$, where $k$ is the number of iterations.
+at a sublinear rate $O(1/\sqrt{k})$, where $k$ is the number of iterations.
To illustrate its performance, we apply the proposed method to solve a simple multi-task
variant of GridWorld problems, where an agent seeks to find an policy to navigate
effectively in different environments.
2 changes: 1 addition & 1 deletion _posts/2024-06-11-jongeneel24a.md
@@ -2,7 +2,7 @@
title: A large deviations perspective on policy gradient algorithms
abstract: Motivated by policy gradient methods in the context of reinforcement learning,
we derive the first large deviation rate function for the iterates generated by
-stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-{Ł}ojasiewicz
+stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-Łojasiewicz
condition. Leveraging the contraction principle from large deviations theory, we
illustrate the potential of this result by showing how convergence properties of
policy gradient with a softmax parametrization and an entropy regularized objective
2 changes: 1 addition & 1 deletion _posts/2024-06-11-liao24a.md
@@ -7,7 +7,7 @@ abstract: Many machine learning problems lack strong convexity properties. Fortu
conditions for convex and smooth functions is well understood, it is not the case
for the nonsmooth setting. In this paper, we go beyond convexity and smoothness,
and clarify the connections among common regularity conditions (including strong
-convexity, restricted secant inequality, subdifferential error bound, Polyak-{Ł}ojasiewic
+convexity, restricted secant inequality, subdifferential error bound, Polyak-Łojasiewic
inequality, and quadratic growth) in the class of weakly convex functions. In addition,
we present a simple and modular proof for the linear convergence of the proximal
point method (PPM) for convex (possibly nonsmooth) optimization using these regularity
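For orientation only, the standard smooth, unconstrained forms of the regularity conditions named in this abstract are recalled below (with modulus $\mu > 0$, minimizer set $X^*$, and optimal value $f^*$); the paper itself treats weakly convex, possibly nonsmooth functions, where subgradients and the subdifferential error bound take the place of these gradient-based versions.

```latex
\begin{align*}
\text{strong convexity:} \quad & f(y) \ge f(x) + \langle \nabla f(x),\, y - x\rangle + \tfrac{\mu}{2}\|y - x\|^2, \\
\text{restricted secant inequality:} \quad & \langle \nabla f(x),\, x - \Pi_{X^*}(x)\rangle \ge \mu\, \mathrm{dist}(x, X^*)^2, \\
\text{error bound:} \quad & \|\nabla f(x)\| \ge \mu\, \mathrm{dist}(x, X^*), \\
\text{Polyak-{\L}ojasiewicz:} \quad & \tfrac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,\bigl(f(x) - f^*\bigr), \\
\text{quadratic growth:} \quad & f(x) - f^* \ge \tfrac{\mu}{2}\, \mathrm{dist}(x, X^*)^2.
\end{align*}
```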
12 changes: 6 additions & 6 deletions _posts/2024-06-11-mitra24a.md
@@ -6,15 +6,15 @@ abstract: 'Given the success of model-free methods for control design in many pr
problem has analogies with the formulations studied under the rubric of networked
control systems, the rich literature in that area has typically assumed that the
model of the system is known. As a step towards bridging the fields of model-free
-control design and networked control systems, we ask: \textit{Is it possible to
+control design and networked control systems, we ask: Is it possible to
solve basic control problems - such as the linear quadratic regulator (LQR) problem
-- in a model-free manner over a rate-limited channel?} Toward answering this question,
+- in a model-free manner over a rate-limited channel? Toward answering this question,
we study a setting where a worker agent transmits quantized policy gradients (of
the LQR cost) to a server over a noiseless channel with a finite bit-rate. We propose
-a new algorithm titled Adaptively Quantized Gradient Descent (\texttt{AQGD}), and
-prove that above a certain finite threshold bit-rate, \texttt{AQGD} guarantees exponentially
-fast convergence to the globally optimal policy, with \textit{no deterioration of
-the exponent relative to the unquantized setting}. More generally, our approach
+a new algorithm titled Adaptively Quantized Gradient Descent (AQGD), and
+prove that above a certain finite threshold bit-rate, AQGD guarantees exponentially
+fast convergence to the globally optimal policy, with no deterioration of
+the exponent relative to the unquantized setting. More generally, our approach
reveals the benefits of adaptive quantization in preserving fast linear convergence
rates, and, as such, may be of independent interest to the literature on compressed
optimization.'
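The abstract only describes AQGD at a high level. As a rough, generic illustration of adaptive gradient quantization (not the paper's algorithm: the uniform quantizer, constants, and quadratic stand-in for the LQR cost below are all assumptions), a worker can send gradients at a fixed bit budget over a range that shrinks as the iterates converge:

```python
import numpy as np

def quantize(g, R, bits=8):
    """Uniformly quantize each coordinate of g to `bits` bits on [-R, R]."""
    levels = 2 ** bits - 1
    clipped = np.clip(g, -R, R)
    return np.round((clipped + R) / (2 * R) * levels) / levels * (2 * R) - R

# Quadratic stand-in for the LQR cost: f(x) = 0.5 * x^T A x.
A = np.diag([1.0, 4.0, 10.0])
x = np.array([3.0, -2.0, 1.0])
R, eta, shrink = 10.0, 0.05, 0.995   # quantizer range, step size, range decay

for k in range(2000):
    g = A @ x                 # worker computes the gradient
    g_hat = quantize(g, R)    # ...and transmits a finite-bit-rate version
    x = x - eta * g_hat       # server applies the quantized gradient step
    R *= shrink               # adapt: shrink the range as iterates converge

print(f"final cost: {0.5 * x @ A @ x:.2e}")
```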
2 changes: 1 addition & 1 deletion _posts/2024-06-11-turan24a.md
@@ -14,7 +14,7 @@ abstract: This paper introduces the Safe Pricing for NUM with Gradual Variations
leveraging an estimate of the users’ price response function. By tuning the amount
of shrinkage to account for the error between the desired and the induced demand,
we prove that the induced demand always belongs to the feasible set. In addition,
-we prove that the regret incurred by the induced demand is ${\cal O}(\sqrt{T(1+V_T)})$
+we prove that the regret incurred by the induced demand is $O(\sqrt{T(1+V_T)})$
after $T$ iterations, where $V_T$ is an upper bound on the total gradual variations
of the users’ utility functions. Numerical simulations demonstrate the efficacy
of SPNUM-GV and support our theoretical findings.
2 changes: 1 addition & 1 deletion _posts/2024-06-11-zhang24a.md
@@ -15,7 +15,7 @@ abstract: Designing stabilizing controllers is a fundamental challenge in autono
including a high-fidelity F-16 jet model featuring a 16D state space and a 4D input
space. Experiments indicate that, compared to prior works in reinforcement learning,
imitation learning, and neural certificates, LYGE reduces the distance to the goal
-by 50% whil<e requiring only 5% to 32% of the samples. Furthermore, we demonstrate
+by 50% while requiring only 5% to 32% of the samples. Furthermore, we demonstrate
that our algorithm can be extended to learn controllers guided by other certificate
functions for unknown systems.
layout: inproceedings
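All six edits follow the same pattern: LaTeX-only markup in the abstract front matter (\mathcal{O}, {\cal O}, \textit{...}, \texttt{...}, and braced characters like {Ł}) is replaced with plain text so the abstracts render cleanly. A rough sketch of how the same cleanup could be scripted is below; the helper name, the regexes, and the _posts glob are illustrative assumptions, not part of this commit.

```python
import re
from pathlib import Path

def strip_latex_markup(text: str) -> str:
    """Remove simple LaTeX markup from abstract text (illustrative only)."""
    text = re.sub(r"\\(?:textit|texttt|textbf)\{([^}]*)\}", r"\1", text)  # \textit{x} -> x
    text = re.sub(r"\{\\cal\s+O\}", "O", text)                            # {\cal O}    -> O
    text = text.replace(r"\mathcal{O}", "O")                              # \mathcal{O} -> O
    text = text.replace("{Ł}", "Ł")                                       # {Ł}         -> Ł
    return text

# Apply to every post (paths assumed; adjust for the actual repository layout).
for post in Path("_posts").glob("*.md"):
    post.write_text(strip_latex_markup(post.read_text(encoding="utf-8")), encoding="utf-8")
```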

