Lean
Lean project: Formalizing my own research paper
A running log of a Lean formalization of (a part of) one of my research papers.
What is Lean? 
Lean is a programming language designed for formalizing and proving mathematical statements, initiated by Leonardo de Moura in 2013. Here are descriptions from the official Lean Programming Language homepage and the Lean Wikipedia page:
- Lean is an open-source programming language and proof assistant that enables correct, maintainable, and formally verified code.
- Lean is a proof assistant and a functional programming language. It is based on the calculus of constructions with inductive types. It is an open-source project hosted on GitHub. Development is currently supported by the non-profit Lean Focused Research Organization (FRO).
To share some of my experience: Lean code is somewhat human-readable, but writing and reading formal proofs in Lean feels much slower than working with handwritten proofs, because every step of a proof must be broken down explicitly. For example, in Lean we need the rfl tactic to say $x = x$ is true, and the Nat.add_comm theorem to show that $x + y = y + x$ is true.
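As a minimal sketch of both points (assuming a Lean 4 toolchain; both rfl and Nat.add_comm live in core Lean, so no Mathlib is needed):

example (x : Nat) : x = x := rfl
-- rfl: even x = x requires an explicit proof term
example (x y : Nat) : x + y = y + x := Nat.add_comm x y
-- Nat.add_comm: commutativity of addition is a named theorem, not automatic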
The official Lean homepage provides some learning resources, including interactive games and tutorials. The Natural Number Game is an excellent starting point for learning basic tactics and syntax. This game reminded me of an undergraduate-level set theory and mathematical logic course.
Community is a very important component of Lean. On Zulip, there is an official public chat room dedicated to Lean. Reservoir is a registry of Lean packages; notably, the Mathlib package is one of the most popular. A (non-exhaustive) list of topics in Mathlib is given in this document, though many topics from undergraduate mathematics are still missing.
How I got interested in Lean
Lean's ability to formalize and check a full chain of mathematical reasoning was fascinating.
Terence Tao has been actively engaged with Lean, including a collaborative project to formalize one of his co-authored papers. More recently, he created a Lean companion to Analysis I, a real analysis textbook he wrote in 2006. You can find several Lean-related blog posts on his website: here1; here2; here3.
In recent years, there has been huge interest in the AI community in achieving benchmark results in mathematical problem solving. The progress has been significant, which is personally very interesting but bittersweet. I explore some of this in the (forthcoming) note The ‘gold rush’ in Erdős problems: My early participation and failure.
About this Lean project
Inspired by these works, I started a small project to formalize some core results from my work, Basic Inequalities for Optimization Algorithms with Applications to Statistical Risk Analysis, whose manuscript is on the Research tab. Theorems 1 and 2, called basic inequalities, are foundational for the other optimization and statistical analyses in the paper, so I begin by formalizing these two theorems.
Four sections follow: 1. Lemmas for gradient descent (GD); 2. Main proof for the GD basic inequality; 3. Lemmas for mirror descent (MD); 4. Main proof for the MD basic inequality. The first two sections cover Theorem 1, and the last two cover Theorem 2.
I’d like to emphasize that this writing is mainly focused on the Lean language itself, so I purposely omit LaTeX math notation unless necessary. For complete details about the theorems and their proofs, please refer to the manuscript.
On the other hand, the full Lean code is presented for each section with extensive comments. I hope these comments help readers (including me) who do not have experience with Lean trace through the proofs and learn Lean effectively. There are, however, some notes regarding LLM usage in producing the code.
Notes on LLMs in this project
Note 1. I actively interacted with LLMs and coding assistants throughout the workflow: Lean project initialization; environment setup; Lean code generation; code explanation and comments; and final proof verification via Lean. I initially used ChatGPT-5 (both Thinking and Pro), but after many iterations I transitioned to Claude Sonnet 4.5 via Claude Code in my local VS Code. Claude Code was more effective for iterative development: thanks to its terminal integration, it can execute Lean proof checks, read and debug the output messages, and so on.
Note 2. While LLMs handled most of the code generation accurately (say, 98% correct), understanding Lean and Mathlib's tactics and theorems through the official documentation and community resources was the most challenging part for me. This may raise an interesting question for some people: if an automated proof checker such as Lean can verify the correctness and rigor of a proof, do we still need to understand the proof, and how much? I explore this a little more in the (forthcoming) note mentioned above.
1. Lemmas for GD
The following Lean code formalizes and proves two key lemmas for Theorem 1, the basic inequality for gradient descent (GD).
Each lemma’s statement is presented in both LaTeX inline math notation and Lean syntax. I preferred the calc tactic, which displays the intermediate steps of the logic in the Lean code, similar to \begin{align} in LaTeX (see the toy example below). Also, it is quite possible that the proofs can be simplified via other packages or Mathlib modules.
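As a toy illustration (a hypothetical example, not from the paper), calc chains equalities step by step, with each step justified by its own proof:

-- Toy calc example (core Lean): chain a = b and b = c into a = c
example (a b c : Nat) (h₁ : a = b) (h₂ : b = c) : a = c := by
  calc a = b := h₁
    _ = c := h₂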
Statement 1. Squared norm difference for the gradient descent iterates.
- $x, z, g\in \mathbb R^d$; $\eta\in\mathbb R$; $x':=x-\eta g$. Then $\|x-z\|_2^2 - \|x' -z\|_2^2 = 2\eta\langle g, x-z \rangle -\eta^2\|g\|_2^2.$
-
‖x - z‖^2 - ‖(x - η • g) - z‖^2 = 2 * η * @inner ℝ E _ g (x - z) - (η^2) * ‖g‖^2
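For orientation, this identity is the standard expansion of a squared norm, which the Lean proof below mirrors step by step: $\|x'-z\|_2^2 = \|(x-z)-\eta g\|_2^2 = \|x-z\|_2^2 - 2\eta\langle g, x-z\rangle + \eta^2\|g\|_2^2$, and subtracting this from $\|x-z\|_2^2$ gives the claim.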
Statement 2. Telescoping sum.
- $\sum_{t=0}^{T-1} (a_t - a_{t+1}) = a_0 - a_T$.
-
(∑ t ∈ Finset.range T, (a t - a (t + 1))) = a 0 - a T
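For orientation, the Lean proof below proceeds by induction on $T$: the inductive step peels off the last term of the sum (via Finset.sum_range_succ), $\sum_{t=0}^{T}(a_t - a_{t+1}) = \sum_{t=0}^{T-1}(a_t - a_{t+1}) + (a_T - a_{T+1})$, and the induction hypothesis then reduces the claim to $(a_0 - a_T) + (a_T - a_{T+1}) = a_0 - a_{T+1}$.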
The following Lean code passes Lean's proof check via lake build in the terminal.
-- Lemmas.lean
-- ---------- Setup and imports ----------
import Mathlib.Tactic
-- Provides: tactics (simp, rw, calc, exact, set, have, ring, ring_nf, congr, etc;
-- ac_rfl (associativity-commutativity))
-- Transitively imports: Mathlib.Algebra.BigOperators (for ∑ notation)
import Mathlib.Analysis.InnerProductSpace.Basic
-- Provides: InnerProductSpace, inner, norm_add_sq_real, inner_smul_right, real_inner_comm
import Mathlib.Data.Real.Basic
-- Provides: ℝ (real numbers), Real.norm_eq_abs, sq_abs
set_option warningAsError true
-- Make any warning fail the build
namespace LeanBasicineq
namespace GD
-- Wrap everything in project-specific namespaces:
-- Create a namespace so every declaration inside is qualified as LeanBasicineq.GD.<name>.
-- This prevents name clashes and groups related code.
-- Later, we close these with end GD, end LeanBasicineq.
open BigOperators
-- Enable ∑ notation (from Mathlib.Algebra.BigOperators, imported by Mathlib.Tactic)
-- Provides: ∑ x ∈ s, f x notation for Finset.sum, Finset.sum_range_succ lemma
variable {E : Type*} [NormedAddCommGroup E] [InnerProductSpace ℝ E]
-- Work over a real inner-product space E, saying,
-- there will be an implicit type parameter E from here on.
-- [InnerProductSpace ℝ E] says "assume there is a real inner-product on E".
-- ---------- Lemma 1 ----------
-- Squared-norm difference under a GD update. Statement:
-- For any x, z, g ∈ E and η ∈ ℝ,
-- ‖x - z‖^2 - ‖(x - η·g) - z‖^2 = 2η ⟨g, x - z⟩ - η^2 ‖g‖^2.
lemma normSq_update (x z g : E) (η : ℝ) :
    ‖x - z‖^2 - ‖(x - η • g) - z‖^2 = 2 * η * @inner ℝ E _ g (x - z) - (η^2) * ‖g‖^2 := by
  -- 0) Abbreviations
  set a := x - z with ha -- set tactic from Mathlib.Tactic
  set b := (-η) • g with hb
  -- 1) (x - η g) - z = (x - z) + (-η) g
  have h_ab : (x - η • g) - z = a + b := by
    calc (x - η • g) - z
        = (x + (-(η • g))) + (-z) := by
          simp [sub_eq_add_neg, add_assoc] -- simp: simplification tactic (Mathlib.Tactic)
      _ = (x + (-z)) + (-(η • g)) := by
          ac_rfl -- ac_rfl: associativity-commutativity reflexivity (Mathlib.Tactic)
      _ = (x - z) + (-η) • g := by
          simp [sub_eq_add_neg, neg_smul]
          -- neg_smul: scalar multiplication distributes over negation
      _ = a + b := by
          simp [ha, hb]
  -- 2) ‖a + b‖^2 expansion
  -- norm_add_sq_real: ‖a + b‖^2 = ‖a‖^2 + 2*inner a b + ‖b‖^2 (InnerProductSpace.Basic)
  -- cf. No 'by' because the lemma directly produces the proof we need.
  -- If we used 'by' instead, we could use 'exact norm_add_sq_real a b'.
  have hExp : ‖a + b‖^2 = ‖a‖^2 + 2 * @inner ℝ E _ a b + ‖b‖^2 :=
    norm_add_sq_real a b
  -- 3) ‖a‖^2 - ‖a+b‖^2 = - (2 * @inner ℝ E _ a b + ‖b‖^2)
  have hSqDiff : ‖a‖^2 - ‖a + b‖^2 = - (2 * @inner ℝ E _ a b + ‖b‖^2) := by
    rw [hExp] -- rw: rewrite tactic (Mathlib.Tactic)
    ring -- ring: solve polynomial ring equalities (Mathlib.Tactic)
  -- 4) @inner ℝ E _ a b = (-η) * @inner ℝ E _ (x - z) g
  have hInner : @inner ℝ E _ a b = -η * @inner ℝ E _ (x - z) g := by
    calc @inner ℝ E _ a b
        = @inner ℝ E _ (x - z) ((-η) • g) := by
          rw [ha, hb]
      _ = (-η) * @inner ℝ E _ (x - z) g := by
          rw [inner_smul_right]
          -- inner_smul_right: ⟨x, c•y⟩ = c*⟨x,y⟩ (InnerProductSpace.Basic)
  -- 5) ‖b‖^2 = η^2 ‖g‖^2
  -- Since b = (-η)•g, we have ‖b‖^2 = |-η|^2 ‖g‖^2 = η^2 ‖g‖^2
  have hB : ‖b‖^2 = (η^2) * ‖g‖^2 := by
    rw [hb, norm_smul, norm_neg, Real.norm_eq_abs] -- norm properties (Data.Real.Basic)
    rw [mul_pow] -- (a*b)^n = a^n * b^n
    congr 1
    -- congr: congruence tactic (Mathlib.Tactic)
    -- congr ("congruence") is used when you have a goal of the form f a = f b,
    -- and you want to reduce it to proving a = b.
    -- i.e., "If two function applications are equal, their arguments must be equal."
    -- Here, '1' in congr 1 means "apply congruence at depth 1", thus,
    -- it looks at the top-level operation (multiplication) and reduces the problem.
    exact sq_abs η -- sq_abs: |x|^2 = x^2 (Data.Real.Basic)
  -- 6) Swap inner-product arguments (inner product is symmetric on ℝ)
  have hComm : @inner ℝ E _ (x - z) g = @inner ℝ E _ g (x - z) := by
    rw [real_inner_comm]
    -- real_inner_comm: ⟨x,y⟩ = ⟨y,x⟩ for real spaces (InnerProductSpace.Basic)
  -- 7) Assemble: combine all intermediate results to reach the goal
  -- calc: calculation mode for chained equalities (Mathlib.Tactic)
  calc ‖x - z‖^2 - ‖(x - η • g) - z‖^2
      = ‖a‖^2 - ‖a + b‖^2 := by
        rw [← ha, ← h_ab]
    _ = - (2 * @inner ℝ E _ a b + ‖b‖^2) := by
        rw [hSqDiff]
    _ = - (2 * (-η) * @inner ℝ E _ (x - z) g + ‖b‖^2) := by
        rw [hInner]; ring_nf -- ring_nf: ring normalization (Mathlib.Tactic)
    _ = 2 * η * @inner ℝ E _ (x - z) g - ‖b‖^2 := by
        ring
    _ = 2 * η * @inner ℝ E _ g (x - z) - ‖b‖^2 := by
        rw [hComm]
    _ = 2 * η * @inner ℝ E _ g (x - z) - (η^2) * ‖g‖^2 := by
        rw [hB]
-- ---------- Lemma 2 ----------
-- Telescoping sum identity. Statement:
-- For any sequence a : ℕ → ℝ and any T : ℕ, ∑_{t=0}^{T-1} (a(t) - a(t+1)) = a(0) - a(T).
lemma telescoping_sum {a : ℕ → ℝ} (T : ℕ) :
    (∑ t ∈ Finset.range T, (a t - a (t + 1))) = a 0 - a T := by
  -- Proof by induction on T (induction tactic from Mathlib.Tactic)
  induction T with
  | zero =>
    -- Base case: T = 0, sum over empty range equals 0
    simp -- simp proves: empty sum = 0 and a 0 - a 0 = 0
  | succ T ih =>
    -- Inductive step: assume true for T, prove for T + 1
    -- sum_{t=0}^{T} (a t - a(t+1)) = sum_{t=0}^{T-1} (...) + (a T - a(T+1))
    -- Finset.sum_range_succ: expands sum to include the T-th term (Mathlib.Tactic)
    rw [Finset.sum_range_succ]
    rw [ih] -- Apply induction hypothesis: sum_{t=0}^{T-1} = a 0 - a T
    -- Goal: (a 0 - a T) + (a T - a (T+1)) = a 0 - a (T+1)
    ring -- Algebraic simplification: (a-b)+(b-c) = a-c
end GD
end LeanBasicineq
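As a quick sanity check (a hypothetical usage sketch, not part of Lemmas.lean), once the file builds, the lemmas can be applied like any other declaration; for instance, telescoping_sum specializes immediately:

-- Hypothetical usage sketch: assumes Lemmas.lean above builds and is imported
-- from another file in the same package (e.g., import LeanBasicineq.Lemmas).
example (a : ℕ → ℝ) :
    (∑ t ∈ Finset.range 5, (a t - a (t + 1))) = a 0 - a 5 :=
  LeanBasicineq.GD.telescoping_sum 5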
2. Main proof for GD basic inequality
Coming soon.
3. Lemmas for MD
Coming soon.
4. Main proof for MD basic inequality
Coming soon.