1 option

A Dynamical Systems Perspective on Optimization Algorithms Orlando Romero

Dissertations & Theses @ University of Pennsylvania Available online

Format:: Book; Thesis/Dissertation
Author/Creator:: Romero, Orlando, author.
Contributor:: University of Pennsylvania. Electrical and Systems Engineering., degree granting institution.
Language:: English
Subjects (All):: Applied mathematics.; Computer science.; Engineering.; Mathematics.; Information science.; 0364.; 0984.; 0537.; 0723.; 0405.
Local Subjects:: Applied mathematics.; Computer science.; Engineering.; Mathematics.; Information science.; 0364.; 0984.; 0537.; 0723.; 0405.
Physical Description:: 1 electronic resource (171 pages)
Contained In:: Dissertations Abstracts International 86-07A
Place of Publication:: Ann Arbor : ProQuest Dissertations and Theses, 2024
Language Note:: English
Summary:: The intersection of machine learning (ML) - including deep learning and reinforcement learning - and dynamical systems and control (S&C) has become a prominent area of research in recent years, as more problems naturally blur the lines between these fields. However, the majority of research at this intersection has focused on applying ML techniques to S&C problems, whereas this dissertation explores the reverse direction: utilizing tools from S&C in ML and optimization algorithms.Part I (Ch 2 and 3): Continuous-Time and Discrete-Time OptimizationWe investigate the relationship between continuous-time dynamical systems and traditional iterative optimization algorithms. By reinterpreting iterative methods like gradient descent as continuous-time systems described by differential equations or inclusions, we gain insights into their qualitative and quantitative behaviors, with convergence rates often matching in both time domains. However, the fundamental limits of this interplay are not well understood. To address this, we show that a simple modification (rescaling) of the gradient flow can provably achieve finite-time convergence - a property fundamentally incompatible with iterative algorithms. Notably, this is possibly in continuous time only as this domain allows instantaneous corrections in velocity without risk of overshooting. When discretizing, we instead risk overshooting if step sizes are too large, as is directly dictated by the geometric landscape of the function. Furthermore, we demonstrate that under certain regularity conditions, any high-order off-the-shelf ordinary differential equation (ODE) solver can approximate the convergence rate of any optimization flow. Notably, certain ODE solvers - most prominently the Forward Euler -preserve exponential stability and thus linear convergence.Part II (Ch 4): Generalizing Stability and SmoothnessWe re-examine the connections between convexity and smoothness in convergence rates, motivated by the abrupt transition between the sublinear rate \uD835\uDCAA(L/k) and the linear rate \uD835\uDCAA(e − µ/L k) in gradient descent for the class FL of L-smooth convex functions and the subclass Fµ,L of µ-strongly convex functions. Using and expanding tools from Part I, we show that this abrupt transition is actually smooth if we quantify convexity in terms of the Bregman divergence, namely Df (x, y) ≥ µ/p ∥x−y∥p , and consider the classes FL and Fµ,L as the edge cases p → ∞ and p = 2. More generally, introducing the modulus of convexity (ϕ) and the modulus of smoothness (σ), characterized by ϕ(∥x − y∥) ≤ Df (x, y) ≤ σ(∥x − y∥), we derive convergence rates in terms of these functions for a proposed algorithm called σ-rescaled gradient descent (σ-RGD), motivated by the earlier finite-time convergence findings. In particular, we show that linear convergence with a rate \uD835\uDCAA((1 − 1/κ)k) is achieved, provided that the generalized condition number κ = sups>0 ϕ∗ (s)/σ∗(s) is finite. Part III (Ch 5 and 6): Applications to Machine LearningWe use tools from S&C for two ML problems. The first one is to re-analyze the convergence of the Expectation-Maximization (EM) algorithm, now using Lyapunov stability theory. The second is to use standard tools from linear systems theory to a provide preliminary bias-variance tradeoff bound for a novel method we propose to reduce the variance of the ConfTr (conformal training) method proposed by Stutz and others (2022). This method consists of simulating a quantile-threshold calibration procedure during training in order to learn conformal predictors that are more length efficient. We identify a source of sample inefficiency, propose a fix for it, and use in key places tools from S&C to aid in the theoretical analysis. Further, the insights gained from Part II are used to guide the training regime, particularly as some of the hyperparameters directly influence (in a predictable way) the smoothness of the cost function in the problem studied
Notes:: Source: Dissertations Abstracts International, Volume: 86-07, Section: A.; Advisors: Pappas, George J. Committee members: Vidal, Rene; Altschuler, Jason; Benosman, Mouhacine; Ph.D. University of Pennsylvania 2024
Local Notes:: School code: 0175
ISBN:: 9798302182777
Access Restriction:: Restricted for use by site license

The Penn Libraries is committed to describing library materials using current, accurate, and responsible language. If you discover outdated or inaccurate language, please fill out this feedback form to report it and suggest alternative language.

1 option

A Dynamical Systems Perspective on Optimization Algorithms Orlando Romero

Find

My Account

Guides