Meraka Institute

   

Human Language Technologies (HLT) – Projects

Telephony-based service delivery

Project Lwazi

Project Lwazi is a 3-year project (completing in September 2009) that will enable efficient and affordable information service delivery over the telephone. As part of this project a comprehensive suite of tools is being developed, including speech recognition and text-to-speech systems in all of South Africa's official languages, integrated into a multilingual telephony platform. The technologies are being evaluated in a number of pilots across the country.
Read more about Project Lwazi

Telephony platform development

Many information services can be provided efficiently over the telephone, even in remote areas where access to alternative information infrastructure is limited. The current telephony platform development effort builds on Asterisk but adds an application development layer that makes it easier to develop telephony applications, provides additional monitoring and logging capability, and integrates multilingual ASR and TTS services into the standard platform.
Read more about telephony platform development

Human factors in speech-based user interface design

Our Human Factors research aims to understand how people from diverse cultural and linguistic backgrounds, and with varying levels of literacy and familiarity with technology, interact with telephone-based systems and other speech technologies.
Read more about Human Factors research

OpenPhone

The OpenPhone project was completed in 2008 and demonstrated the use of telephony-based information services in providing health information in a rural setting. In collaboration with the Botswana Baylor Children's Clinical Centre of Excellence and the University of Botswana, a health information system was developed that provides information to caregivers looking after HIV-positive children living in the vicinity of Gaborone in Botswana.
Read more about the OpenPhone project

Speech and language technologies

Automatic speech recognition

Automatic speech recognition (ASR) systems provide computers with the capability to process and “understand” human speech. The technology has been very successful in two specific areas: understanding speech from any speaker on a limited set of topics (speaker-independent speech recognition), and understanding speech on a much wider range of topics but from one speaker only (speaker-dependent speech recognition).
Read more about ASR

Text-to-speech

In order to provide timely information in an affordable way, text-to-speech (TTS) systems are required that automatically transform digital text into speech. Such systems have only been developed for a limited number of languages internationally, and prior to 2004 no freely available text-to-speech system existed for any indigenous South African language.
Read more about Text-to-speech technologies

Speaker and language identification

In addition to the actual content, much can be said about even a small segment of speech: What language was spoken? What do we know about the identity of the speaker? Was the speaker angry or sad? A number of very accurate speaker identification, gender identification and language identification systems have been built in the group. Our work focuses on the use of novel speech features (such as speech timing) and on the development of systems for resource-scarce languages.

Intonation modelling

A detailed understanding of the intonation (or “prosody”) of a language is important for both theoretical and practical reasons: theoretical, since intonation is a deep regularity of languages (and their dialects), and practical, since high-quality speech synthesis (and, possibly, recognition) is impossible without such an understanding.
Read more about intonation modelling

Pronunciation modelling

Given the spelling of a word in a specific language, what would that word sound like? A pronunciation model describes this process of letter-to-sound conversion. A pronunciation modelling component is required by many speech processing tasks – including general domain speech synthesis and large vocabulary speech recognition – and is often one of the first resources required when developing speech technology in a new language.
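As a rough illustration of letter-to-sound conversion, the sketch below maps a spelling to a phoneme sequence by greedy longest-match lookup in a small rule table. The rules and phoneme symbols are invented for this example and do not describe any real language or the group's own models; practical pronunciation models are typically learned from data.

```python
# Toy grapheme-to-phoneme rules (hypothetical, for illustration only).
RULES = {
    "sh": "S",
    "ch": "tS",
    "a": "a",
    "b": "b",
    "i": "i",
    "k": "k",
    "n": "n",
    "o": "o",
    "p": "p",
    "s": "s",
    "t": "t",
}


def letter_to_sound(word, rules=RULES):
    """Convert a spelling to a list of phonemes by greedy longest-match lookup."""
    word = word.lower()
    max_len = max(len(grapheme) for grapheme in rules)
    phonemes = []
    i = 0
    while i < len(word):
        # Try the longest grapheme first, so "sh" wins over "s" followed by "h".
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phonemes.append(rules[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phonemes


print(letter_to_sound("ship"))  # ['S', 'i', 'p']
print(letter_to_sound("chat"))  # ['tS', 'a', 't']
```

Trying the longest grapheme first is what lets multi-letter spellings such as "sh" map to a single sound; a word containing a letter with no rule raises an error rather than guessing.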
Read more about pronunciation modelling

Topic modelling

Today, large collections of digital data are widely available and continue to grow in size at an increasing pace. Trying to understand the meaning of such data is a difficult task and in general the first option is to perform keyword searches. The results of keyword searches do not always describe the meaning of the data collection in a satisfactory way, especially if the user has limited insight into the collection. A summary of the data would be very useful and would ideally encapsulate the main topics within the data.
Read more about topic modelling

Statistical pattern recognition

Classification of patterns in data is employed in numerous applications, including speech recognition, text analysis, machine vision, astronomy, medical research and many more. In relation to Human Language Technologies, classification is used in many applications of speech technology as well as text classification and text-based language identification.
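Text-based language identification can be sketched as a small classification problem. The example below is a minimal sketch, assuming character trigram counts as features and a naive Bayes style score with add-one smoothing; the two training sentences are toy data chosen for illustration, and this is not a description of the systems built in the group.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of the lower-cased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(samples, n=3):
    """Build one n-gram count profile per language label."""
    return {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}


def identify(text, profiles, n=3):
    """Return the label whose profile gives the text the highest log-probability."""
    vocab = set().union(*profiles.values())
    scores = {}
    for lang, counts in profiles.items():
        total = sum(counts.values())
        # Add-one smoothing keeps unseen n-grams from producing log(0).
        scores[lang] = sum(
            math.log((counts[gram] + 1) / (total + len(vocab)))
            for gram in char_ngrams(text, n)
        )
    return max(scores, key=scores.get)


# Toy training data (assumption: one short sentence per language).
samples = {
    "english": "the quick brown fox jumps over the lazy dog",
    "afrikaans": "die vinnige bruin jakkals spring oor die lui hond",
}
profiles = train(samples)

print(identify("the lazy dog", profiles))   # english
print(identify("die lui hond", profiles))   # afrikaans
```

With realistic amounts of training text per language, the same structure extends to any number of labels; only the `samples` dictionary changes.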
Read more about pattern recognition

Tools and products

DictionaryMaker

DictionaryMaker allows users to create electronic pronunciation dictionaries quickly and effectively. After an initial set-up phase, a mother-tongue speaker of a target language can create a pronunciation dictionary without requiring specialist skills. This is an especially useful tool for speech technology developers working with resource-scarce languages.
Read more about DictionaryMaker

OpenSpell

OpenSpell is a wacky, educational game that targets spelling skills and is designed especially for kids in developing regions. Version 1.0 was localized to support the 11 official languages of South Africa. OpenSpell is easy to edit, so teachers can modify the game to suit their classroom, curriculum, and dialect.
Read more about OpenSpell

ASR-builder

ASR-builder supports the rapid development of HTK-based speech recognition systems in new languages. A series of customisable “recipes” encode the typical steps when developing such systems. Given a set of audio files, transcriptions and a pronunciation dictionary, it is easy to create your own ASR system using this tool.
Read more about ASR-builder

   
  Contact: Marelie Davel +27 12 841 2466 mdavel@csir.co.za
   
Copyright © Meraka Institute 2007