📑 About Me
🧠 Interests
Artificial General Intelligence, Multimodal Large Language Models, Vision-Language-Speech Generation, Object-Centric Representation Learning, Text-to-Image Generation, Multimodal Alignment
🎓 Education
MS in Artificial Intelligence, Korea University
2025.03 – Present
GPA: 4.44 / 4.5
BS in Computer Science and Engineering, Korea University
2019.03 – 2025.02
GPA: 4.11 / 4.5
Leave of absence for mandatory military service, Republic of Korea Air Force (2021.01 – 2023.01)
📄 Publications
- Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
Donghwan Chi*, Hyomin Kim*, Yoonjin Oh, Yongjin Kim, Donghoon Lee, Daejin Jo, Jongmin Kim, Junyeob Baek, Sungjin Ahn, Sungwoong Kim
Submitted to IEEE Transactions on Multimedia
Introduced Slot Q-Former to improve multimodal LLMs on object-centric visual understanding, generation, and editing tasks.
- Object-centric Self-improving Preference Optimization for Text-to-Image Generation
Yoonjin Oh, Yongjin Kim, Hyomin Kim, Donghwan Chi, Sungwoong Kim
Accepted to CVPR 2026
Proposed the OSPO framework to mitigate object-level hallucinations in text-to-image generation models.
🔬 Research Experience
WAVLab, Carnegie Mellon University
Visiting Collaborator (Advisor: Prof. Shinji Watanabe)
2025.10 – Present
- Conducting research on vision-speech-language omni-modal models for unified multimodal understanding and generation
- Developing alignment-preserving training methods that extend large vision-language models with discrete speech tokenizers, targeting ITS2ITS (image, text, and speech as both input and output) omni-modal understanding and generation
AGI Lab, Korea University
Graduate Researcher / Undergraduate Intern (Advisors: Prof. Sungwoong Kim and Prof. Sungbin Lim)
2023.09 – Present
- Led research on object-centric visual tokenization for multimodal large language models, resulting in Slot-MLLM
- Designed a Slot Q-Former-based visual tokenizer to improve object-level grounding, generation, and editing
- Conducted large-scale multimodal training on image-text and interleaved datasets using multi-GPU distributed training
- Evaluated object-centric reasoning and generation on benchmarks covering spatial relation understanding, text-to-image generation, and image editing
- Managed lab GPU servers and supported large-scale training infrastructure
👨‍🏫 Activities
Artificial Intelligence Korea University (AIKU)
First Cohort Senior Member, Technical & Academic Management Team (2023.01 – 2024.01)
- Managed the club’s servers, ensuring reliable and efficient operation
- Served on the problem-design committee for multiple hackathons
- Carried out a team project on diffusion models
- Organized and taught introductory courses for new members
🏅 Honors & Awards
- Chief of Naval Operations Award, Republic of Korea Navy, Open Source Academy Military (OSAM) Hackathon, 2021
- Excellence Award, Capstone Design Competition, Korea University, Fall 2024
- Academic Excellence Scholarship, Korea University, Spring 2024
- Special Scholarship by Student Affairs Office, Korea University, Fall 2023 & Spring 2024
- Semester High Honors, Korea University, 2019, 2020, 2023, 2024
💻 Skills
- Languages: Python, Bash, LaTeX
- ML Frameworks: PyTorch, PyTorch Lightning, Hugging Face Transformers
- ML Systems: Distributed Training, Multi-GPU Training, Mixed Precision Training, DeepSpeed, Weights & Biases
- Research Areas: Multimodal LLMs, Vision-Language Models, Diffusion Models, Speech Tokenization
- Language Proficiency: TOEIC 905