Version 1. New: Support for exceptions. Note: Corpora are regularly improved and transcribers retrained.
This free version only allows transcriptions of up to 40 words per submission. View performance statistics for Danish, German and English. The transcription tool is based on a Decision Tree derived from a training lexicon (a list of orthographic forms and their phonemic counterparts).
It does not look up words in a lexicon; instead, it transcribes them according to the general rules it "learned" from the training lexicon. More specifically, the Decision Tree is machine-generated code that decides how a grapheme should be transcribed phonemically, given its left and right context. It was generated by a program that first aligns the graphemes and phonemes of the training lexicon using an Expectation-Maximization algorithm and then builds the tree structure from these alignments.
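The page does not say which tree-induction algorithm is used, so the following is only a minimal sketch of the idea, assuming an ID3-style tree (information gain over the left, centre and right graphemes) and training pairs that have already been aligned one grapheme to one phoneme symbol, as the EM alignment step would produce. The alignment step itself is not shown. The toy lexicon, the `#` padding and `_` "no phoneme" symbols, and all function names are hypothetical, and the phoneme symbols are simplified.

```python
from collections import Counter
from math import log2

PAD = "#"  # hypothetical word-boundary symbol used to pad the context window
EPS = "_"  # hypothetical "no phoneme" symbol for a grapheme aligned to nothing

# Toy, pre-aligned training lexicon (hypothetical): one phoneme symbol per grapheme.
TRAINING = [
    ("cat", ["k", "a", "t"]),
    ("cot", ["k", "o", "t"]),
    ("cut", ["k", "u", "t"]),
    ("cell", ["s", "e", "l", EPS]),
    ("city", ["s", "i", "t", "i"]),
    ("cent", ["s", "e", "n", "t"]),
]


def windows(word, size=1):
    """Return one (left..., centre, right...) context tuple per grapheme."""
    padded = PAD * size + word + PAD * size
    return [tuple(padded[i - size:i + size + 1]) for i in range(size, size + len(word))]


def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def split(rows, feature):
    """Group (context, phoneme) rows by the grapheme at one context position."""
    groups = {}
    for x, ph in rows:
        groups.setdefault(x[feature], []).append((x, ph))
    return groups


def info_gain(rows, feature):
    labels = [ph for _, ph in rows]
    remainder = sum(
        len(g) / len(rows) * entropy([ph for _, ph in g])
        for g in split(rows, feature).values()
    )
    return entropy(labels) - remainder


def build_tree(rows, features):
    """ID3-style induction: split on the context position with the highest gain."""
    labels = [ph for _, ph in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: a single phoneme symbol
    best = max(features, key=lambda f: info_gain(rows, f))
    if info_gain(rows, best) <= 0:
        return majority
    rest = [f for f in features if f != best]
    return {
        "feature": best,
        "default": majority,  # used for context graphemes never seen in training
        "branches": {v: build_tree(g, rest) for v, g in split(rows, best).items()},
    }


def predict(tree, context):
    """Walk from the root to a leaf, following the grapheme at each tested position."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(context[tree["feature"]], tree["default"])
    return tree


def train(lexicon, size=1):
    rows = [(x, ph) for word, phones in lexicon for x, ph in zip(windows(word, size), phones)]
    return build_tree(rows, list(range(2 * size + 1)))


def transcribe(tree, word, size=1):
    phones = [predict(tree, x) for x in windows(word, size)]
    return [p for p in phones if p != EPS]


tree = train(TRAINING)
print(transcribe(tree, "cite"))  # ['s', 'i', 't', 'e']
```

On this toy data the tree first tests the centre grapheme and, for "c", then tests the right neighbour, which is how it "learns" that "c" before "e" or "i" becomes "s" without any explicit rule being written.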
The transcription tool is not error-free.
For "normal" native words it mostly produces correct results; however, for words of foreign origin, some proper names, abbreviations, etc., errors are more likely. Other systems may be mainly lexicon-based and resort to machine-generated transcriptions only when words are not found in the lexicon.
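A lexicon-based design of the kind described here can be sketched as a dictionary lookup with a rule-based fallback. This is a minimal illustration only, assuming exact lower-case lookup; the function names, the lexicon entries, and the letter-by-letter fallback are hypothetical placeholders, not part of this tool.

```python
from typing import Callable, Dict


def make_transcriber(lexicon: Dict[str, str], fallback: Callable[[str], str]) -> Callable[[str], str]:
    """Return a transcriber that prefers lexicon entries and falls back to rules."""
    def transcribe(word: str) -> str:
        key = word.lower()
        if key in lexicon:
            return lexicon[key]  # stored transcription, e.g. for an irregular word
        return fallback(key)     # machine-generated transcription
    return transcribe


def letter_by_letter(word: str) -> str:
    """Hypothetical stand-in for a rule-based transcriber such as a decision tree."""
    return " ".join(word)


# Hypothetical lexicon with simplified, illustrative transcriptions.
LEXICON = {"colonel": "k ɜː n ə l", "yacht": "j ɒ t"}

transcribe = make_transcriber(LEXICON, letter_by_letter)
print(transcribe("Yacht"))  # j ɒ t  (found in the lexicon)
print(transcribe("cat"))    # c a t  (fallback)
```

The trade-off is that the lexicon fixes known irregular words such as proper names or loanwords, while the fallback still handles any word not listed.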
Since version 1. This excludes languages such as Chinese, with its syllable-based orthography, and Hebrew, with its consonantal orthography.
Moreover, among languages with alphabetic orthographies, the problem of mapping graphemic symbols to phonemic ones is not equally complex. There are extremely "easy" languages like Turkish, where the problem can largely be solved by simply substituting orthographic symbols with phonemic ones, without considering the context.
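Context-free substitution for an "easy" orthography can be sketched as a single lookup table applied letter by letter. The table below is a simplified, illustrative assumption covering only the Turkish letters whose IPA symbol differs from the letter; it expects lower-case input and ignores ğ, palatalised k, g and l, vowel length and stress, so it is not a complete transcriber.

```python
# Simplified, context-free Turkish letter-to-phoneme table (illustrative only).
# Letters not listed are assumed to map to the identical IPA symbol.
TURKISH_MAP = {
    "c": "dʒ",
    "ç": "tʃ",
    "ş": "ʃ",
    "j": "ʒ",
    "y": "j",
    "ı": "ɯ",
    "ö": "ø",
    "ü": "y",
}


def transcribe_turkish(word: str) -> str:
    """Substitute each letter independently, without looking at its neighbours."""
    return "".join(TURKISH_MAP.get(letter, letter) for letter in word)


print(transcribe_turkish("çiçek"))  # tʃitʃek
print(transcribe_turkish("şeker"))  # ʃeker
```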
And there are "difficult" languages like Danish, where certain historical sound changes (weakening of plosives, lowering of vowels in certain contexts, etc.) are not reflected in the spelling, so the correct transcription of a grapheme depends heavily on its context.
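Unlike a context-free substitution, even a single Danish letter needs its left neighbour to be transcribed. The Danish "soft d" is one commonly cited result of plosive weakening; the hypothetical rule below is a heavily simplified sketch for the letter d only, assuming three rough cases and ignoring the many exceptions a real system would have to learn.

```python
VOWELS = set("aeiouyæøå")


def transcribe_danish_d(word: str, i: int) -> str:
    """Return a simplified phoneme for the letter 'd' at position i of word.

    Illustrative rules only (many exceptions are ignored):
      * after l, n or r, the d is usually silent (e.g. "land", "guld");
      * after a vowel, it is usually the "soft d" [ð] (e.g. "mad");
      * otherwise it is a plosive [d] (e.g. word-initially in "dag").
    """
    if word[i] != "d":
        raise ValueError("expected the letter 'd'")
    left = word[i - 1] if i > 0 else ""
    if left in ("l", "n", "r"):
        return ""  # silent d
    if left in VOWELS:
        return "ð"  # soft d
    return "d"


for w in ("dag", "mad", "land"):
    print(w, [transcribe_danish_d(w, i) for i, ch in enumerate(w) if ch == "d"])
```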