[2603.00888] Probabilistic Learning and Generation in Deep Sequence Models
Computer Science > Machine Learning
arXiv:2603.00888 (cs)
[Submitted on 1 Mar 2026]

Title: Probabilistic Learning and Generation in Deep Sequence Models
Authors: Wenlong Chen

Abstract: Despite the exceptional predictive performance of deep sequence models (DSMs), the main concern around their deployment is their lack of uncertainty awareness. In contrast, probabilistic models quantify the uncertainty associated with unobserved variables using the rules of probability. Notably, Bayesian methods leverage Bayes' rule to express beliefs about unobserved variables in a principled way. Since exact Bayesian inference is computationally infeasible at scale, approximate inference is required in practice. Two major bottlenecks of Bayesian methods, especially when applied to deep neural networks, are prior specification and approximation quality. In Chapters 3 and 4, we investigate how the architectures of DSMs themselves can inform the design of priors or approximations in probabilistic models. We first develop an approximate Bayesian inference method tailored to the Transformer, based on the similarity between attention and sparse Gaussian processes. Next, we exploit the long-range memory preservation capability of HiPPOs (High-order Polynomial Projection Operators) to construct interdomain inducing points for Gaussian processes, which successfully memorize the hi...
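The parallel the abstract draws between attention and sparse Gaussian processes can be illustrated with a minimal sketch. This is not the paper's construction, only a hedged illustration of the shared algebraic form: if keys are treated as inducing inputs and values as inducing outputs, the sparse GP predictive mean K_xz K_zz^{-1} u is, like softmax attention, an input-dependent mixing of the value rows. The RBF kernel, dimensions, and jitter level below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 8, 5, 4               # feature dim, num queries, num keys / inducing points

Q = rng.normal(size=(n, d))     # queries   ~ test inputs
K = rng.normal(size=(m, d))     # keys      ~ inducing inputs
V = rng.normal(size=(m, d))     # values    ~ inducing outputs

# Softmax attention: each output row is a convex combination of value rows,
# with weights given by a normalized exponentiated similarity.
A = np.exp(Q @ K.T / np.sqrt(d))
attn_out = (A / A.sum(axis=1, keepdims=True)) @ V

# Sparse GP predictive mean: K_xz @ inv(K_zz) @ u, here with an RBF kernel.
def rbf(X, Z, lengthscale=1.0):
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

Kxz = rbf(Q, K)
Kzz = rbf(K, K) + 1e-6 * np.eye(m)      # jitter for numerical stability
gp_mean = Kxz @ np.linalg.solve(Kzz, V)

# Both outputs are (n, d): value rows mixed by input-dependent weights.
print(attn_out.shape, gp_mean.shape)    # (5, 8) (5, 8)
```

The structural correspondence (kernel-weighted combinations of a small set of "summary" vectors) is what makes the Transformer's attention layers a natural place to anchor a sparse-GP-style approximate posterior.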