Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
Published in Empirical Methods in Natural Language Processing (EMNLP), 2024
This paper addresses the high inference cost of deploying LLMs in multilingual settings. It proposes a speculative decoding method built on small, language-specialized drafter models, each optimized through pre-training followed by fine-tuning. Compared with existing methods, the approach significantly improves LLM inference speed in multilingual contexts.
Recommended citation: Euiin Yi*, T. Kim*, H. Jeung, DS Chang, and S-Y. Yun. (2024). "Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters." Empirical Methods in Natural Language Processing (EMNLP).
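For readers unfamiliar with speculative decoding, the draft-then-verify loop can be sketched as follows. This is a minimal illustration of the general greedy variant, not the paper's implementation: `toy_target` and `toy_drafter` are hypothetical stand-ins for the large model and a small drafter, and the verification step, written here as a Python loop, would be a single batched forward pass of the target model in a real system.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to its next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: NextToken, drafter: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the drafter proposes up to k tokens,
    the target keeps the longest prefix it agrees with."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1) Draft: the cheap model proposes a short continuation.
        draft: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            draft.append(drafter(tokens + draft))

        # 2) Verify: compare each proposal with the target's own choice.
        accepted: List[int] = []
        for proposed in draft:
            expected = target(tokens + accepted)
            if proposed == expected:
                accepted.append(proposed)
            else:
                accepted.append(expected)  # replace the first mismatch and stop
                break
        else:
            # Every draft token matched; take one extra target token if room remains.
            if len(tokens) + len(accepted) < goal:
                accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens


# Hypothetical toy models (assumptions for illustration only).
def toy_target(seq: List[int]) -> int:
    return (3 * seq[-1] + len(seq)) % 17


def toy_drafter(seq: List[int]) -> int:
    guess = toy_target(seq)
    # Disagree with the target whenever the context length is a multiple of 3.
    return guess if len(seq) % 3 else (guess + 1) % 17


if __name__ == "__main__":
    out = speculative_decode(toy_target, toy_drafter, prompt=[1, 2], max_new_tokens=10)
    assert out == greedy_decode(toy_target, [1, 2], 10)
    print(out)
```

Because every accepted token is checked against the target, the output matches plain greedy decoding from the target; the speedup depends on how often the drafter's proposals are accepted, which is the reason a drafter specialized for the input language can help.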

This research focuses on detecting road floods in nighttime driving videos. The study proposes a deep learning model that learns spatiotemporal representations from vehicle black-box footage to detect flooded road conditions effectively, even in low-light and poor-visibility environments. The work contributes to the safety of intelligent transportation systems.