
Electronic linguistic resource collection

The Lwazi project, and future speech-based applications in South Africa, depend on the creation of extensive electronic linguistic resources, both to generate and to recognise speech. For each South African language, a pronunciation dictionary, an automatic speech recognition (ASR) corpus, and a text-to-speech (TTS) corpus are generated. An electronic repository allows these valuable resources to be shared with the wider human language technology (HLT) research and development community.

Pronunciation dictionary (DictionaryMaker)

Speech and language technologies require phone sets, word lists, and recorded phonemes to be developed for each language. The HLT team developed DictionaryMaker, a tool that uses bootstrapping to create pronunciation dictionaries for new languages quickly.

Download DictionaryMaker
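The page does not describe how DictionaryMaker's bootstrapping works internally, so the following is only a minimal sketch of the general idea and not the tool's actual algorithm. A model trained on the words verified so far predicts pronunciations for new words, a verifier corrects the predictions, and the model is retrained. The functions `train`, `predict` and `bootstrap`, the toy entries and the click symbols are all hypothetical. The letter-to-phone model is deliberately naive and assumes a one-to-one alignment between letters and phones.

```python
from collections import Counter, defaultdict


def train(dictionary):
    """Learn the most frequent phone for each letter.

    Simplifying assumption: every entry is aligned one letter to one
    phone, i.e. len(word) == len(phones).
    """
    counts = defaultdict(Counter)
    for word, phones in dictionary.items():
        for letter, phone in zip(word, phones):
            counts[letter][phone] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}


def predict(rules, word):
    """Guess a pronunciation; unseen letters are copied through as-is."""
    return [rules.get(letter, letter) for letter in word]


def bootstrap(seed, word_list, verify, batch_size=1):
    """Grow a dictionary by predicting, verifying and retraining.

    `verify(word, guess)` stands in for the human verifier and returns
    the correct pronunciation. Returns the finished dictionary and the
    number of guesses the verifier had to correct.
    """
    dictionary = dict(seed)
    corrections = 0
    pending = [w for w in word_list if w not in dictionary]
    while pending:
        rules = train(dictionary)  # retrain on all verified entries so far
        batch, pending = pending[:batch_size], pending[batch_size:]
        for word in batch:
            guess = predict(rules, word)
            correct = verify(word, guess)
            if correct != guess:
                corrections += 1
            dictionary[word] = correct
    return dictionary, corrections


# Toy entries with illustrative phone symbols ("|" and "!" for clicks).
gold = {
    "cela": ["|", "e", "l", "a"],
    "qala": ["!", "a", "l", "a"],
    "cece": ["|", "e", "|", "e"],
    "qaqa": ["!", "a", "!", "a"],
}


def oracle(word, guess):
    """Simulated human verifier: always returns the reference answer."""
    return gold[word]


lexicon, fixes = bootstrap({"cela": gold["cela"]}, list(gold), oracle)
_, fixes_one_batch = bootstrap({"cela": gold["cela"]}, list(gold), oracle,
                               batch_size=len(gold))
print(fixes, fixes_one_batch)  # 1 2
```

Retraining after every verified word lets the model learn the "q" click from the first correction, so the simulated verifier fixes one guess instead of two. That reduction in verification effort is the point of bootstrapping rather than transcribing every word by hand.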


Speech recognition corpus

Automatic Speech Recognition (ASR) systems depend on a corpus of transcribed speech for each new language. Electronic linguistic resource collection also requires the development of phonemic and orthographic transcription conventions, prompt sheets, and phonetically balanced sentences for each language.

The HLT team developed an ASR data collection protocol to gather the extensive speech data needed to build ASR systems for each of South Africa's official languages.
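One common way to choose phonetically balanced prompt sentences is greedy coverage: repeatedly take the candidate sentence that adds the most phone types not yet covered. The sketch below assumes the candidate sentences have already been converted to phone sequences, for example with a pronunciation dictionary. `select_prompts` and the toy data are hypothetical, and the actual Lwazi protocol may balance phones differently, for instance by matching natural phone frequencies rather than maximising coverage.

```python
def select_prompts(candidates, max_prompts):
    """Greedily choose sentences that add the most unseen phone types.

    `candidates` maps a sentence (or sentence ID) to its phone sequence.
    Returns the chosen sentences and the set of phones they cover.
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)
    while remaining and len(chosen) < max_prompts:
        # Score each sentence by how many new phone types it contributes;
        # ties go to the earliest candidate in insertion order.
        best = max(remaining, key=lambda s: len(set(remaining[s]) - covered))
        gain = set(remaining.pop(best)) - covered
        if not gain:
            break  # no remaining sentence adds coverage
        chosen.append(best)
        covered |= gain
    return chosen, covered


# Toy candidates: sentence IDs mapped to illustrative phone sequences.
candidates = {
    "sentence 1": ["a", "b", "a"],
    "sentence 2": ["a", "b", "k", "d"],
    "sentence 3": ["k", "e"],
    "sentence 4": ["e", "f"],
}
chosen, covered = select_prompts(candidates, max_prompts=3)
print(chosen)  # ['sentence 2', 'sentence 4']
```

Greedy selection does not guarantee the smallest possible prompt set, but it is simple and tends to reach full phone coverage with few sentences, which keeps recording sessions short.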


Text-to-speech corpus

Our concatenative TTS systems generate speech dynamically by joining diphones taken from speech recorded by a voice artist. Creating the TTS corpus for each language requires the development of a text-to-speech protocol and a set of phonetically balanced sentences. A voice artist is then recruited and recorded to produce the TTS corpus for each language.
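A diphone spans from the middle of one phone to the middle of the next, so a diphone synthesiser needs one recorded unit for every adjacent pair of phones it has to produce. The sketch below shows how a phone sequence maps to diphone labels and how one might check which diphones a recorded inventory still lacks. The helper names, the `left-right` label format and the `pau` silence symbol are assumptions for illustration, not part of the Lwazi systems.

```python
def to_diphones(phones, silence="pau"):
    """Split a phone sequence into the diphone labels a concatenative
    synthesiser would look up, padding with silence at both ends."""
    padded = [silence, *phones, silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def missing_diphones(utterances, inventory):
    """Return, sorted, the diphones needed by `utterances` (a list of
    phone sequences) that the recorded `inventory` does not contain."""
    needed = set()
    for phones in utterances:
        needed.update(to_diphones(phones))
    return sorted(needed - set(inventory))


recorded = {"pau-m", "m-a", "a-pau"}
print(to_diphones(["m", "a"]))  # ['pau-m', 'm-a', 'a-pau']
print(missing_diphones([["m", "a"], ["b", "a"]], recorded))  # ['b-a', 'pau-b']
```

The same coverage check can guide the choice of phonetically balanced sentences for the voice artist: sentences that contribute missing diphones are the most useful ones to record.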


Electronic repository

The linguistic resources developed for the Lwazi project (e.g. multi-speaker speech corpora, orthographic conventions, and pronunciation dictionaries) will be of significant value to South Africa's HLT research and development community. The HLT team is currently developing a searchable electronic resource repository to consolidate these tools and resources and distribute them as they become available.
