Jen-Yuan (Joy) Huang

Hello! I am a Master's student at the School of Intelligence Science and Technology of Peking University, where I am supervised by Professor Tong Lin. My research focuses on Generative Modeling, particularly in computer vision, with the long-term goal of building models that can effectively represent and understand our physical world.

Currently, I am a remote research intern at Harvard University, working on compositional generalization under the supervision of Professor Yilun Du. Previously, I collaborated with with Professor Huan Wang at Westlake University on text-to-image diffusion model compression.

To bridge the gap between research and real-world applications, I have gained industry experience as a student researcher at Tencent Hunyuan, Xiaohongshu, and SenseTime. I am also an active contributor to the open-source community as a member of the InstantX Team, where I develop AI-generated content (AIGC) models.

I received my Bachelor's degree in Industrial Engineering from Nanjing University, with a specialization in optimization and operation research. During my undergraduate studies, I had the opportunity to work closely with Professor Xiaopeng Luo.

profile photo

Publications

project image
project image

InstantIR: Blind Image Restoration with Instant Generative Reference


Jen-Yuan Huang, Haofan Wang, Qixun Wang, Xu Bai, Hao Ai, Peng Xing, Jen-Tse Huang
arXiv, 2024
arXiv / demo / website / code

In this work, we implement an auto-regressive image restoration process utilizing pre-trained text-to-image diffusion model. At each denoising step, we first generate a restoration reference from current diffusion latent, which is then used to condition the sub-sequent diffusion step.

project image

Long-Text-to-Image Generation via Compositional Prompt Decomposition


Jen-Yuan Huang, Tong Lin, Yilun Du
under review, 2025

We develop an unsupervised prompt decomposition module, which decompose paragraph-long text prompt into manageable sub-prompts. A pre-trained text-to-image diffusion model process these sub-prompts in parallel with the outputs merged into a composed prediction through product-of-expert sampling.

project image

CSGO: Content-Style Composition in Text-to-Image Generation


Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Jen-Yuan Huang, Zechao Li
NeurIPS, 2025
arxiv / demo / website / code

In this work, we devlop an image stylization model named CSGO, which transfers the style presented by an input reference image to a source image. To enable end-to-end training, we introduce an automatic construction pipeline and IMAGStyle, first large-scale style transfer dataset with 210K {content;style;target}-triplet.

project image

Remedy Text-to-image Diffusion Model Compression via Concept Knowledge Distillation


Jen-Yuan Huang, Jen-Tse Huang, Huan Wang, Tong Lin
work in process, 2025

We identify the performance degradation in textual-inversion of T2I diffusion models after compression. By reformuolating the concept discovery process like textual-inversion as an implicit text generation model’s inference, we propose to a knowledge-distillation based retraining method via aligning gradients with respect to text condition.

project image

InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation


Haofan Wang, Peng Xing, Jen-Yuan Huang, Hao Ai, Qixun Wang, Xu Bai
arXiv, 2024
arxiv / demo / website / code

In this paper, we explore natural style transfer while maintaining content integrity. Through analyzing different components of the Stable Diffusion UNet, we identify layers that specialize in processing style and content. Furthermore, we introduce a style discriminator to enhance the stylization of the output.

Contact

E-mail: jenyuan@stu.pku.edu.cn
Address: Peking University, Beijing, China


Design and source code from: Jon Barron's website