
Commit

Merge pull request #12 from markcannon/gh-pages
Fixed abstracts
markcannon authored Jun 25, 2024
2 parents fb72177 + c3a151c commit bcda3f5
Showing 6 changed files with 11 additions and 11 deletions.
2 changes: 1 addition & 1 deletion _posts/2024-06-11-bai24b.md
@@ -12,7 +12,7 @@ abstract: We consider a multi-task learning problem, where an agent is presented
contribution is to provide theoretical results to characterize the performance of
the proposed method. In particular, we show that incremental policy gradient methods
converge to the optimal value of the multi-task reinforcement learning objectives
-at a sublinear rate $\mathcal{O}(1/\sqrt{k})$, where $k$ is the number of iterations.
+at a sublinear rate $O(1/\sqrt{k})$, where $k$ is the number of iterations.
To illustrate its performance, we apply the proposed method to solve a simple multi-task
variant of GridWorld problems, where an agent seeks to find an policy to navigate
effectively in different environments.
2 changes: 1 addition & 1 deletion _posts/2024-06-11-jongeneel24a.md
@@ -2,7 +2,7 @@
title: A large deviations perspective on policy gradient algorithms
abstract: Motivated by policy gradient methods in the context of reinforcement learning,
we derive the first large deviation rate function for the iterates generated by
-stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-{Ł}ojasiewicz
+stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-Łojasiewicz
condition. Leveraging the contraction principle from large deviations theory, we
illustrate the potential of this result by showing how convergence properties of
policy gradient with a softmax parametrization and an entropy regularized objective
2 changes: 1 addition & 1 deletion _posts/2024-06-11-liao24a.md
@@ -7,7 +7,7 @@ abstract: Many machine learning problems lack strong convexity properties. Fortu
conditions for convex and smooth functions is well understood, it is not the case
for the nonsmooth setting. In this paper, we go beyond convexity and smoothness,
and clarify the connections among common regularity conditions (including strong
-convexity, restricted secant inequality, subdifferential error bound, Polyak-{Ł}ojasiewic
+convexity, restricted secant inequality, subdifferential error bound, Polyak-Łojasiewic
inequality, and quadratic growth) in the class of weakly convex functions. In addition,
we present a simple and modular proof for the linear convergence of the proximal
point method (PPM) for convex (possibly nonsmooth) optimization using these regularity
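For orientation only, the standard smooth, unconstrained forms of the regularity conditions named in this abstract are recalled below (with modulus $\mu > 0$, minimizer set $X^*$, and optimal value $f^*$); the paper itself treats weakly convex, possibly nonsmooth functions, where subgradients and the subdifferential error bound take the place of these gradient-based versions.

```latex
\begin{align*}
\text{strong convexity:} \quad & f(y) \ge f(x) + \langle \nabla f(x),\, y - x\rangle + \tfrac{\mu}{2}\|y - x\|^2, \\
\text{restricted secant inequality:} \quad & \langle \nabla f(x),\, x - \Pi_{X^*}(x)\rangle \ge \mu\, \mathrm{dist}(x, X^*)^2, \\
\text{error bound:} \quad & \|\nabla f(x)\| \ge \mu\, \mathrm{dist}(x, X^*), \\
\text{Polyak-{\L}ojasiewicz:} \quad & \tfrac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,\bigl(f(x) - f^*\bigr), \\
\text{quadratic growth:} \quad & f(x) - f^* \ge \tfrac{\mu}{2}\, \mathrm{dist}(x, X^*)^2.
\end{align*}
```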
12 changes: 6 additions & 6 deletions _posts/2024-06-11-mitra24a.md
@@ -6,15 +6,15 @@ abstract: 'Given the success of model-free methods for control design in many pr
problem has analogies with the formulations studied under the rubric of networked
control systems, the rich literature in that area has typically assumed that the
model of the system is known. As a step towards bridging the fields of model-free
-control design and networked control systems, we ask: \textit{Is it possible to
+control design and networked control systems, we ask: Is it possible to
solve basic control problems - such as the linear quadratic regulator (LQR) problem
-- in a model-free manner over a rate-limited channel?} Toward answering this question,
+- in a model-free manner over a rate-limited channel? Toward answering this question,
we study a setting where a worker agent transmits quantized policy gradients (of
the LQR cost) to a server over a noiseless channel with a finite bit-rate. We propose
-a new algorithm titled Adaptively Quantized Gradient Descent (\texttt{AQGD}), and
-prove that above a certain finite threshold bit-rate, \texttt{AQGD} guarantees exponentially
-fast convergence to the globally optimal policy, with \textit{no deterioration of
-the exponent relative to the unquantized setting}. More generally, our approach
+a new algorithm titled Adaptively Quantized Gradient Descent (AQGD), and
+prove that above a certain finite threshold bit-rate, AQGD guarantees exponentially
+fast convergence to the globally optimal policy, with no deterioration of
+the exponent relative to the unquantized setting. More generally, our approach
reveals the benefits of adaptive quantization in preserving fast linear convergence
rates, and, as such, may be of independent interest to the literature on compressed
optimization.'
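The abstract only describes AQGD at a high level. As a rough, generic illustration of adaptive gradient quantization (not the paper's algorithm: the uniform quantizer, constants, and quadratic stand-in for the LQR cost below are all assumptions), a worker can send gradients at a fixed bit budget over a range that shrinks as the iterates converge:

```python
import numpy as np

def quantize(g, R, bits=8):
    """Uniformly quantize each coordinate of g to `bits` bits on [-R, R]."""
    levels = 2 ** bits - 1
    clipped = np.clip(g, -R, R)
    return np.round((clipped + R) / (2 * R) * levels) / levels * (2 * R) - R

# Quadratic stand-in for the LQR cost: f(x) = 0.5 * x^T A x.
A = np.diag([1.0, 4.0, 10.0])
x = np.array([3.0, -2.0, 1.0])
R, eta, shrink = 10.0, 0.05, 0.995   # quantizer range, step size, range decay

for k in range(2000):
    g = A @ x                 # worker computes the gradient
    g_hat = quantize(g, R)    # ...and transmits a finite-bit-rate version
    x = x - eta * g_hat       # server applies the quantized gradient step
    R *= shrink               # adapt: shrink the range as iterates converge

print(f"final cost: {0.5 * x @ A @ x:.2e}")
```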
2 changes: 1 addition & 1 deletion _posts/2024-06-11-turan24a.md
@@ -14,7 +14,7 @@ abstract: This paper introduces the Safe Pricing for NUM with Gradual Variations
leveraging an estimate of the users’ price response function. By tuning the amount
of shrinkage to account for the error between the desired and the induced demand,
we prove that the induced demand always belongs to the feasible set. In addition,
-we prove that the regret incurred by the induced demand is ${\cal O}(\sqrt{T(1+V_T)})$
+we prove that the regret incurred by the induced demand is $O(\sqrt{T(1+V_T)})$
after $T$ iterations, where $V_T$ is an upper bound on the total gradual variations
of the users’ utility functions. Numerical simulations demonstrate the efficacy
of SPNUM-GV and support our theoretical findings.
2 changes: 1 addition & 1 deletion _posts/2024-06-11-zhang24a.md
@@ -15,7 +15,7 @@ abstract: Designing stabilizing controllers is a fundamental challenge in autono
including a high-fidelity F-16 jet model featuring a 16D state space and a 4D input
space. Experiments indicate that, compared to prior works in reinforcement learning,
imitation learning, and neural certificates, LYGE reduces the distance to the goal
-by 50% whil<e requiring only 5% to 32% of the samples. Furthermore, we demonstrate
+by 50% while requiring only 5% to 32% of the samples. Furthermore, we demonstrate
that our algorithm can be extended to learn controllers guided by other certificate
functions for unknown systems.
layout: inproceedings
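All six edits follow the same pattern: LaTeX-only markup in the abstract front matter (\mathcal{O}, {\cal O}, \textit{...}, \texttt{...}, and braced characters like {Ł}) is replaced with plain text so the abstracts render cleanly. A rough sketch of how the same cleanup could be scripted is below; the helper name, the regexes, and the _posts glob are illustrative assumptions, not part of this commit.

```python
import re
from pathlib import Path

def strip_latex_markup(text: str) -> str:
    """Remove simple LaTeX markup from abstract text (illustrative only)."""
    text = re.sub(r"\\(?:textit|texttt|textbf)\{([^}]*)\}", r"\1", text)  # \textit{x} -> x
    text = re.sub(r"\{\\cal\s+O\}", "O", text)                            # {\cal O}    -> O
    text = text.replace(r"\mathcal{O}", "O")                              # \mathcal{O} -> O
    text = text.replace("{Ł}", "Ł")                                       # {Ł}         -> Ł
    return text

# Apply to every post (paths assumed; adjust for the actual repository layout).
for post in Path("_posts").glob("*.md"):
    post.write_text(strip_latex_markup(post.read_text(encoding="utf-8")), encoding="utf-8")
```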

