
BayLing - an instruction-following large language model

BayLing is an instruction-following large language model with multilingual and multi-turn interaction capabilities aligned with human preferences.

Large language models (LLMs) have demonstrated remarkable abilities in language understanding and generation. In the step from a base LLM to an instruction-following LLM, instruction fine-tuning plays a crucial role in aligning the model with human preferences. However, existing LLMs typically focus on English, which leads to poorer performance in non-English languages. Improving performance in a non-English language normally requires collecting language-specific training data for the base LLM and constructing language-specific instructions for fine-tuning, both of which come at significant cost. To minimize the manual workload, we propose transferring the capabilities of language generation and instruction following from English to other languages through interactive translation tasks.
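To make the idea of interactive translation instructions concrete, the sketch below assembles one hypothetical two-turn training sample: a translation request followed by a revision request. The field names, roles, and turn structure are illustrative assumptions, not BayLing's actual data format.

```python
def build_sample(source_sentence, draft_translation, revised_translation):
    """Assemble a hypothetical two-turn interactive translation sample:
    the user asks for a translation, then requests a revision.

    The schema here is an illustrative assumption, not BayLing's
    actual training format."""
    return {
        "conversations": [
            {"role": "user",
             "content": f"Translate into Chinese: {source_sentence}"},
            {"role": "assistant", "content": draft_translation},
            {"role": "user",
             "content": "Please make the translation more formal."},
            {"role": "assistant", "content": revised_translation},
        ]
    }

sample = build_sample(
    "The weather is nice today.",
    "今天天气很好。",
    "今日天气晴好。",
)
print(len(sample["conversations"]))  # 4 turns: two user, two assistant
```

Because each sample interleaves translation with follow-up instructions, fine-tuning on such data exercises both cross-lingual generation and multi-turn instruction following at once.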

We have released BayLing, an instruction-following LLM built on LLaMA and fine-tuned on automatically constructed interactive translation instructions. Extensive evaluation shows that BayLing achieves performance comparable to GPT-3.5-turbo despite its much smaller parameter count (only 13 billion). On translation tasks, BayLing reached 95% of GPT-4's single-turn translation ability under automatic evaluation and 96% of GPT-3.5-turbo's interactive translation ability under human evaluation. To assess performance on general tasks, we created a multi-turn instruction test set, BayLing-80. On BayLing-80, BayLing achieves 89% of the performance of GPT-3.5-turbo. BayLing also performs well on knowledge-assessment test sets drawn from the Chinese Gaokao (college entrance examination) and the English SAT, ranking second only to GPT-3.5-turbo among a range of instruction-following LLMs.
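The 95%, 96%, and 89% figures above are relative scores against a reference system. As a minimal sketch (the numbers below are made up for illustration and are not from the evaluation), such a relative score can be computed as the ratio of the model's metric score to the reference system's score:

```python
def relative_score(model_score, reference_score):
    """Relative performance of a model against a reference system,
    expressed as a percentage of the reference's score."""
    if reference_score <= 0:
        raise ValueError("reference score must be positive")
    return 100.0 * model_score / reference_score

# Illustrative numbers only: if the reference system scores 40.0 on some
# automatic metric and the evaluated model scores 38.0, the model
# reaches 95% of the reference's performance.
print(round(relative_score(38.0, 40.0)))  # 95
```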

We have publicly released our training, inference, and evaluation code on GitHub, along with the model weights of BayLing-7B and BayLing-13B on HuggingFace. We have also built an online demo system so the research community can try the model easily. In addition, we have released our BayLing-80 test set, which contains 80 two-turn instructions in both Chinese and English and can be used to comprehensively evaluate the multilingual and multi-turn interaction capabilities of LLMs. For more experiments and more detailed findings, please refer to our blog and paper.
