Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

• From Trajectories to Operators: A Unified Flow Map Perspective on Generative Modeling

25 minute read

Published:

In this post, we reframe continuous-time generative modeling from integrating trajectories to learning two-time operators (flow maps). This operator view unifies diffusion, flow matching, and consistency models, and suggests a practical diagnostic: semigroup-consistent jumps yield both step-robust generation and low compositional drift. We derive Eulerian/Lagrangian distillation objectives and use inpainting experiments to show why this holds in practice.
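
To make the diagnostic concrete, here is a toy sketch (not from the post itself) using a linear ODE whose exact two-time flow map is known, so the semigroup property can be checked numerically; for a learned map, the same gap serves as a compositional-drift measure.

```python
# Minimal sketch: the two-time semigroup property for a flow map
# Phi(x, t, s) transporting states from time t to time s. For the
# linear ODE dx/dt = -x the exact map is known in closed form, so we
# can verify Phi(t -> s) == Phi(u -> s) o Phi(t -> u) numerically.
import numpy as np

def flow_map(x, t, s):
    """Exact two-time flow map of dx/dt = -x."""
    return x * np.exp(-(s - t))

x0 = np.random.randn(5)
t, u, s = 0.0, 0.4, 1.0

direct = flow_map(x0, t, s)                    # one big jump t -> s
composed = flow_map(flow_map(x0, t, u), u, s)  # t -> u, then u -> s

# Semigroup consistency: the two routes must agree; for a learned map,
# the gap below is exactly the compositional-drift diagnostic.
print(np.max(np.abs(direct - composed)))       # ~0 up to float error
```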

• Discrete Visual Tokenizers for Multimodal LLMs: Bridging Vision and Language

61 minute read

Published:

How do we convert continuous images into discrete tokens that large language models can understand? This survey traces the complete evolution of discrete image tokenization—from the groundbreaking VQ-VAE in 2017 to today’s FSQ, LFQ, and TiTok designed for multimodal LLMs. We reveal a compelling narrative of three paradigm shifts: the initial quest for better generative models, the existential challenge from diffusion models, and the remarkable renaissance driven by vision-language unification. Whether you’re building multimodal AI systems or seeking to understand the technical foundations behind models like Chameleon and Emu3, this comprehensive survey provides the complete picture.
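
As a taste of the core mechanism, the following is a minimal, illustrative sketch of VQ-VAE-style nearest-neighbor quantization; the shapes and codebook size are assumptions for illustration, not details of any particular model.

```python
# Core VQ-VAE step: snap each continuous encoder vector to its nearest
# codebook entry, yielding discrete token ids an LLM can consume.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))   # K=512 codes of dimension 64 (assumed)
z_e = rng.normal(size=(16 * 16, 64))    # encoder output, flattened 16x16 grid

# Pairwise squared distances via ||z||^2 - 2 z C^T + ||c||^2
d2 = (z_e**2).sum(1, keepdims=True) - 2 * z_e @ codebook.T \
     + (codebook**2).sum(1)
tokens = d2.argmin(axis=1)              # discrete token ids
z_q = codebook[tokens]                  # quantized vectors fed to the decoder

print(tokens[:8])                       # a 16x16 feature grid becomes 256 tokens
```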

• Visual Representation Learning

93 minute read

Published:

This post provides a systematic, mechanics-first tour of visual representation learning in the foundation-model era—framing representation as the internal currency that enables native multimodality and the unification of understanding and generation. It connects the major paradigms—contrastive learning, negative-free self-distillation, masked modeling (pixel/token/feature targets), generative representations (VAE/AR/Diffusion/GAN), multimodal learning (alignment vs fusion), and JEPA-style joint-embedding prediction—into one coherent map, emphasizing objectives, anti-collapse mechanisms, and what each family’s representations are actually good for.

• Inverse Problems with Generative Priors

57 minute read

Published:

Inverse problems, which aim to recover a signal of interest from indirect and often corrupted measurements, are a cornerstone of computational science and engineering. These problems are typically ill-posed, necessitating prior knowledge to regularize the solution space and ensure a unique, stable reconstruction. This post provides a structured exposition of the evolution of priors in solving inverse problems, from classical formulations to the modern paradigm of deep generative models, aiming to bridge the conceptual gap between classical and modern techniques and offer a unified view of the role of priors.
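
To ground the idea, here is a tiny illustrative example of an ill-posed linear inverse problem stabilized by a simple quadratic (Tikhonov) prior; the setup is assumed purely for illustration, and the generative priors discussed in the post replace this hand-crafted term.

```python
# Recover x from y = A x + noise. With more unknowns than measurements
# the problem is ill-posed; a prior (here L2/Tikhonov regularization)
# restores a unique, stable MAP solution in closed form.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 100))            # 30 measurements, 100 unknowns
x_true = np.zeros(100); x_true[:5] = 1.0
y = A @ x_true + 0.01 * rng.normal(size=30)

lam = 0.1                                 # prior strength (assumed)
# MAP estimate: argmin_x ||A x - y||^2 + lam ||x||^2
x_hat = np.linalg.solve(A.T @ A + lam * np.eye(100), A.T @ y)

print(np.linalg.norm(x_hat - x_true))     # stable, regularized estimate
```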

• Accelerating Diffusion Sampling: From Multi-Step to Single-Step Generation

124 minute read

Published:

This article takes a deep dive into the evolution of diffusion model sampling techniques, tracing the progression from early score-based models with Langevin Dynamics, through discrete and non-Markov diffusion processes, to continuous-time SDE/ODE formulations, specialized numerical solvers, and cutting-edge methods such as consistency models, distillation, and flow matching. Our goal is to provide both a historical perspective and a unified theoretical framework to help readers understand not only how these methods work but why they were developed.

• Fast Generation with Flow Matching

31 minute read

Published:

Fast sampling has become a central goal in generative modeling, enabling the transition from high-fidelity but computationally intensive diffusion models to real-time generation systems. While diffusion models rely on tailored numerical solvers to mitigate the stiffness of their probability flow ODEs, flow matching defines dynamics through smooth interpolation paths, fundamentally altering the challenges of acceleration. This article provides a comprehensive overview of fast sampling in flow matching, with emphasis on path linearization strategies (e.g., Rectified Flow, ReFlow, SlimFlow, InstaFlow), the integration of consistency models, and emerging approaches such as flow generators.
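
As a flavor of the path-linearization idea, here is a hedged sketch of a Rectified-Flow-style training loss; the network stand-in and shapes are placeholders, not code from the post.

```python
# Train a velocity field v(x_t, t) on straight-line interpolation paths
# x_t = (1 - t) x0 + t x1 with constant target velocity x1 - x0.
# Straight paths are what make very-few-step Euler sampling viable.
import torch

def rectified_flow_loss(v_net, x0, x1):
    t = torch.rand(x0.shape[0], 1)        # uniform time in [0, 1]
    x_t = (1 - t) * x0 + t * x1           # linear interpolation path
    target = x1 - x0                      # the path's constant velocity
    return ((v_net(x_t, t) - target) ** 2).mean()

# Toy usage with a hypothetical stand-in for a small velocity network:
v_net = lambda x, t: torch.zeros_like(x)
x0 = torch.randn(8, 2)                    # noise samples
x1 = torch.randn(8, 2)                    # data samples
print(rectified_flow_loss(v_net, x0, x1))
```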

• From Diffusion to Flow: A New Generative Paradigm

33 minute read

Published:

In this post, we uncover the foundations of Flow Matching: the limitations of diffusion models, the constraints of continuous flows, and the transformative idea of directly learning the path between distributions. From the intuition of Rectified Flow to the unifying lens of Stochastic Interpolants, Flow Matching emerges as more than a method — it is a paradigm that reframes generation as learning currents of transformation. With this conceptual map in hand, we are ready to move from theory to practice.

• High-Order PF-ODE Solver in Diffusion Models

40 minute read

Published:

Diffusion sampling can be cast as integrating the probability flow ODE (PF-ODE), but dropping it into a generic ODE toolbox rarely delivers the best speed–quality trade-off. This post first revisits core numerical-analysis ideas. It then explains why vanilla integrators underperform on the semi-linear, sometimes stiff PF-ODE in low-NFE regimes, and surveys families that exploit diffusion-specific structure: pseudo-numerical samplers (PLMS/PNDM) and semi-analytic/high-order solvers (DEIS, DPM-Solver/++/UniPC). The goal is a practical, unified view of when and why these PF-ODE samplers work beyond “just use RK4.”
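
For orientation, the sketch below contrasts a plain Euler step with a Heun (second-order) step on a stand-in ODE drift; the drift function is an assumption for illustration, and the diffusion-specific solvers surveyed in the post go further by exploiting the PF-ODE's semi-linear structure.

```python
# A single Euler step vs. a Heun (2nd-order) step on dx/dt = f(x, t).
# Multistep/exponential solvers such as DPM-Solver refine this idea.
import numpy as np

def f(x, t):                    # stand-in for a learned PF-ODE drift
    return -x / (1.0 + t)

def euler_step(x, t, h):
    return x + h * f(x, t)

def heun_step(x, t, h):
    x_pred = x + h * f(x, t)                           # Euler predictor
    return x + 0.5 * h * (f(x, t) + f(x_pred, t + h))  # trapezoidal corrector

x = np.ones(3)
print(euler_step(x, 1.0, 0.5), heun_step(x, 1.0, 0.5))
```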

• Consistency Theory: From Discrete Constraints to Continuous Flows

25 minute read

Published:

Consistency models (CMs) have recently emerged as a powerful paradigm for accelerating diffusion sampling by directly learning mappings that preserve consistency across noisy representations of the same data. This post provides a comprehensive study of the Consistency Family, tracing its evolution from discrete consistency models to continuous-time formulations and trajectory-based extensions. We begin by revisiting the foundational motivation behind CMs and systematically derive their discrete and continuous objectives under both consistency distillation and consistency training paradigms.
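
The following is an illustrative sketch of the discrete consistency-distillation objective described above; the function names and toy stand-ins are assumptions, not the post's actual code.

```python
# Enforce f(x_{t_{n+1}}, t_{n+1}) ~ f_ema(x_hat_{t_n}, t_n), where
# x_hat_{t_n} comes from one ODE step of a frozen teacher sampler.
import torch

def consistency_distill_loss(f, f_ema, teacher_step, x_next, t_next, t_cur):
    x_cur = teacher_step(x_next, t_next, t_cur)  # one teacher ODE step back
    online = f(x_next, t_next)                   # student at the noisier time
    with torch.no_grad():
        target = f_ema(x_cur, t_cur)             # frozen EMA target
    return ((online - target) ** 2).mean()

# Toy stand-ins just to make the sketch executable:
f = f_ema = lambda x, t: x / (1.0 + t)
teacher_step = lambda x, t_next, t_cur: x * (t_cur / t_next)
x = torch.randn(8, 2)
print(consistency_distill_loss(f, f_ema, teacher_step, x, 1.0, 0.5))
```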

• Diffusion Architectures Part I: Stability-Oriented Designs

86 minute read

Published:

This article explores how network architectures shape the stability of diffusion model training. We contrast U-Net and Transformer-based (DiT) backbones, analyzing how skip connections, residual scaling, and normalization influence gradient propagation across noise levels. By surveying stability-oriented innovations such as AdaGN, AdaLN-Zero, and skip pathway regulation, we reveal why architectural choices can determine whether training converges smoothly or collapses. The discussion provides both theoretical insights and practical design rules for building robust diffusion models.
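
As one concrete example of these stability mechanisms, here is a simplified sketch of an AdaLN-Zero-style block (real DiT blocks also modulate the MLP branch, which is omitted here); the zero-initialized gate makes each block start as an identity map, which is exactly the stability property discussed above.

```python
# Conditioning predicts shift/scale/gate; zero init on the modulation
# layer means gate = 0 at the start of training, so every block begins
# as an identity function and gradients propagate cleanly.
import torch
import torch.nn as nn

class AdaLNZeroBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.mod = nn.Linear(dim, 3 * dim)   # -> shift, scale, gate
        nn.init.zeros_(self.mod.weight)      # "zero" init: block starts
        nn.init.zeros_(self.mod.bias)        # as the identity map

    def forward(self, x, cond):
        shift, scale, gate = self.mod(cond).unsqueeze(1).chunk(3, dim=-1)
        h = self.norm(x) * (1 + scale) + shift
        h, _ = self.attn(h, h, h)
        return x + gate * h                  # gated residual update

x = torch.randn(2, 16, 64)                   # (batch, tokens, dim)
cond = torch.randn(2, 64)                    # timestep/class embedding
print(AdaLNZeroBlock(64)(x, cond).shape)     # torch.Size([2, 16, 64])
```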

• Analysis of the Stability and Efficiency of Diffusion Model Training

83 minute read

Published:

While diffusion models have revolutionized generative AI, their training challenges stem from a combination of resource intensity, optimization intricacies, and deployment hurdles. A stable training process ensures that the model produces high-quality samples and converges efficiently, without suffering from numerical instabilities.

• Unifying Discrete and Continuous Perspectives in Diffusion Models

20 minute read

Published:

Diffusion models have proven to be a highly promising approach to image generation. They treat image generation as two complementary processes: the forward process, which transforms a complex data distribution into a known prior distribution (typically a standard normal distribution) by gradually injecting noise; and the reverse process, which transforms the prior distribution back into the complex data distribution by gradually removing that noise.
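
For a concrete picture of the forward process, here is a minimal sketch of its closed-form Gaussian marginal under a common linear noise schedule; the schedule values are illustrative assumptions.

```python
# Closed-form marginal q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I),
# so any noise level along the forward process is reachable in one jump.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # common linear schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)

def forward_noise(x0, t, rng):
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4,))
print(forward_noise(x0, T - 1, rng))    # ~ standard normal by t = T
```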

awards

books

patents

projects

publications

services

talks