How Anc-VI Helps AI Learn Faster with Optimality Operators

14 Jan 2025

Authors:

(1) Jongmin Lee, Department of Mathematical Science, Seoul National University;

(2) Ernest K. Ryu, Department of Mathematical Science, Seoul National University and Interdisciplinary Program in Artificial Intelligence, Seoul National University.

Abstract and 1 Introduction

1.1 Notations and preliminaries

1.2 Prior works

2 Anchored Value Iteration

2.1 Accelerated rate for Bellman consistency operator

2.2 Accelerated rate for Bellman optimality operator

3 Convergence when γ = 1

4 Complexity lower bound

5 Approximate Anchored Value Iteration

6 Gauss–Seidel Anchored Value Iteration

7 Conclusion, Acknowledgments and Disclosure of Funding and References

A Preliminaries

B Omitted proofs in Section 2

C Omitted proofs in Section 3

D Omitted proofs in Section 4

E Omitted proofs in Section 5

F Omitted proofs in Section 6

G Broader Impacts

H Limitations

2.2 Accelerated rate for Bellman optimality operator

We now present the accelerated convergence rate of Anc-VI for the Bellman optimality operator. Our analysis uses what we call the Bellman anti-optimality operator, defined analogously to the Bellman optimality operator but with the maximum over actions replaced by a minimum.
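For concreteness, the sketch below illustrates the two operators on a tabular MDP. The array names and shapes (P, r, gamma, V) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def bellman_optimality(V, P, r, gamma):
    """Bellman optimality operator: maximize the one-step lookahead over actions.

    P: transition tensor of shape (A, S, S), r: rewards of shape (A, S),
    gamma: discount factor in (0, 1). All names and shapes are illustrative.
    """
    # Q[a, s] = r[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
    Q = r + gamma * (P @ V)
    return Q.max(axis=0)

def bellman_anti_optimality(V, P, r, gamma):
    """Bellman anti-optimality operator: same one-step lookahead, but minimized over actions."""
    Q = r + gamma * (P @ V)
    return Q.min(axis=0)
```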

Anc-VI with the Bellman optimality operator exhibits the same accelerated convergence rate as Anc-VI with the Bellman consistency operator. As in Theorem 1, the rate of Theorem 2 also becomes O(1/k) when γ ≈ 1, while VI has an O(1) rate.
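To make the iteration concrete, here is a minimal sketch of anchored value iteration next to plain VI. The anchoring coefficients beta_k = 1/(k+1) below are the classical Halpern-style choice, used purely for illustration; the coefficients in the paper's theorems depend on γ and should be taken from the paper itself.

```python
def anchored_value_iteration(T, V0, num_iters):
    """Anc-VI sketch: each iterate is a convex combination of the anchor V0
    and the operator applied to the previous iterate.

    T: callable implementing a Bellman (optimality) operator.
    V0: starting value vector, reused as the anchor at every step.
    beta_k = 1/(k+1) is an illustrative Halpern-style schedule,
    not necessarily the schedule analyzed in the paper.
    """
    V = V0.copy()
    for k in range(1, num_iters + 1):
        beta_k = 1.0 / (k + 1)
        V = beta_k * V0 + (1.0 - beta_k) * T(V)
    return V

def value_iteration(T, V0, num_iters):
    """Plain VI for comparison: V_{k+1} = T(V_k)."""
    V = V0.copy()
    for _ in range(num_iters):
        V = T(V)
    return V
```

With the tabular operators sketched earlier, T can be instantiated as, for example, `lambda V: bellman_optimality(V, P, r, gamma)`.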

Proof outline of Theorem 2. The key technical challenge of the proof comes from the fact that the Bellman optimality operator is non-linear. As in the Bellman consistency operator case, we have

This paper is available on arXiv under the CC BY 4.0 DEED license.