AUTOMATIC TRANSLATION OF SQUAD AND RACE QUESTION ANSWERING DATASETS IN BULGARIAN LANGUAGE
Abstract
There are many question answering (QA) datasets used in different natural language processing (NLP) tasks, with SQuAD being one of the most popular. The RACE dataset is a popular dataset for the multiple-choice question answering (MCQA) task and is used to train and evaluate MCQA models. These datasets are available in English only. We took these two datasets and translated them into Bulgarian using automated translation techniques. We then evaluated the translated datasets on extractive QA and MCQA tasks. Experimental results show that our datasets can be used effectively to improve the performance of transformer models on QA tasks in Bulgarian.