Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters

Published in the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Recommended citation: Euiin Yi*, T. Kim*, H. Jeung, D.-S. Chang, and S.-Y. Yun. (2024). "Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters." Conference on Empirical Methods in Natural Language Processing (EMNLP).