TEACHING CHILDREN WITH DOWN SYNDROME PRONUNCIATIONS USING SPEECH RECOGNITION

Suad Al Shamsi, Informatics Institute, The British University in Dubai, Knowledge Village, Block 17, Dubai, UAE, [email protected]

Habib Talhami, Informatics Institute, The British University in Dubai, Knowledge Village, Block 17, Dubai, UAE, [email protected]


ABSTRACT
Several applications based on speech recognition have been developed to assist people with special needs in performing their daily tasks. For example, people who are physically challenged can enter data and issue commands by dictating to a computer, and visually impaired people can listen to what is written on the screen using text-to-speech. However, applications that have been developed for children with special needs (for example, Speaking for Myself [16]) do not provide any feedback to the children. This paper proposes and compares two new approaches for teaching children with Down Syndrome (DS) pronunciations using speech recognition. These approaches make use of the major speech characteristics of children with DS to develop an educational tool that assists them in overcoming their speech communication difficulties. The tool recognizes the spoken words and provides feedback. The two approaches are: a word-based approach that handles any phonological process, and a phone-based approach that handles one phonological process at a time. The phone-based approach is more accurate than the word-based approach. However, both approaches can be improved by tuning the speech recognition parameters and using single-utterance recognition confidence scores.

In recent years researchers have become interested in identifying and improving the speech characteristics of children with DS. The majority of studies have focused on the human role (i.e. parents and speech therapists) in improving the speech and language skills of these children. Few studies have examined the advantages of using computer applications to support speech and language development, and most applications do not provide any feedback that can be used to evaluate and improve children's pronunciations. One study has examined the use of speech recognition to provide feedback for vowel production only [12]. The purpose of this project is to study the speech characteristics of children with DS, in particular phonological processes, and how these can be used to develop a teaching aid. The tool visualizes different words for the child, waits for his or her input, analyzes the input, and provides feedback based on the recognition result. This will assist in improving the pronunciations of children with DS. The project focuses initially on the American English pronunciation of single words. It uses 52 words and their variations, i.e. the different pronunciations of a single word produced by children with DS (for example, "frog" can be pronounced as "fog", "fof", "frod", "rog", etc.). The paper is organized as follows: first there is a short description of Down Syndrome; this is followed by a categorization of the main phonological processes related to DS; next we describe the specification, implementation, and evaluation of the application developed to improve speech pronunciations.

KEY WORDS
Down Syndrome, speech recognition, educational software.

1. INTRODUCTION
Children with Down Syndrome suffer from many difficulties such as brain damage, hearing loss, health problems, and learning disabilities. New technologies in the medical field are constantly contributing to improving these children's health. However, what is needed is a more concerted effort to produce teaching aids that would improve the communication skills of children with communication difficulties such as those with DS.


2. BACKGROUND
2.1 Down Syndrome
Down Syndrome is a developmental disability caused by the presence of an extra copy of chromosome 21 in each cell. Children with DS vary in their rate of progress and in the difficulties they face.

Note: a phone is a single basic speech sound.


Khaled Shaalan, Informatics Institute, The British University in Dubai, Knowledge Village, Block 17, Dubai, UAE, [email protected]


Children replace a voiceless consonant with the voiced consonant that is produced with similar movements. The voiceless consonants are (k, p, t, s, f, sh, and ch) and their voiced replacements are (g, b, d, z, v, zh, and j) respectively. Examples of the replacement are (cup to gup) and (two to do).
Velar Fronting (VF). This is the process of producing sounds that should be produced at the back of the mouth at the front of the mouth instead. The sounds produced at the back of the mouth are (k and g), and their replacements are (t and d). Examples of applying VF are (car to tar) and (dog to dod).
Stopping of fricatives and affricatives (ST). This is the process of producing the longer fricative and affricative sounds (s, z, f, v, sh, ch, and j) as short plosive sounds (t, d, p, and b). For instance, applying ST to "sun" and "jump", they become "tun" and "dump" respectively.
Cluster Reduction (CR). This is the process by which children omit one of the two or three consonants that occur together (e.g. drum to dum). The consonants that children most commonly omit are (r, s, and l).
Consonant Harmony (CH). This is the process by which children replace a consonant in a word with another that is similar to one of the other consonants in that word. For instance, "dog" could be produced as "gog".
Postvocalic Devoicing (PVD). This is the process by which children replace the final voiced consonant with one that is devoiced. The voiced consonants are (b, d, g, v, z, and j) and their devoiced forms are (p, t, k, f, s, and ch) respectively. An example of applying PVD to "dog" is "dok".
Vocalization (VOC). This is the process by which children produce the final syllabic "er" and "al" as a full vowel (e.g. table to tabo).
Gliding of liquids (GL). This is the process of replacing a liquid "r" or "l" with a glide, "w" or "j". Examples are "lamp" becoming "wamp" and "right" becoming "jight".

Children with DS fall behind their typical peer group in speech and language skills. The main factors that hinder their speech development can be categorized into mental, physical, and social factors. The mental factors are auditory discrimination difficulty and auditory short-term memory. Children with DS have difficulty discriminating between similar-sounding words such as "red" and "bread". They also have problems with short-term memory, which is responsible for holding the incoming sound pattern of a word, relating it to its meaning, and producing it again. In addition, children with DS have difficulty planning and organizing the sequence of movements needed to produce speech, which affects the accuracy and consistency of speech movements [4, 5]. The physical factors that influence the speech development of children with DS are (i) mild-to-moderate hearing loss, (ii) a small oral cavity, (iii) a large tongue or a forward tongue position, and (iv) weak facial muscles that limit lip and tongue movement, affecting sound production as well as the clarity and speed of utterances [8, 15]. The social factor is mainly the amount of verbal input that children with DS receive in their daily life. Because children with DS have difficulty producing words and are usually misunderstood by the people around them, they tend to be excluded from conversation; as a result, the amount of input they receive is reduced. Moreover, people are usually not aware of the children's verbal ability, so their input is not appropriate for the children in terms of speaking clarity and vocabulary level [4, 5, 15]. Due to these factors, children with DS might suffer from the following speech difficulties: dysarthria, dysphasia, and phonological processing problems. Dysarthria and dysphasia are beyond the scope of this project, which focuses on phonological processing problems.
Phonological processing problems result in ambiguous production of phonemes, the basic sound units of a language. Children mostly tend to simplify words through the elimination of sounds [15].

Children with DS do not necessarily apply a single phonological process while producing a word; they might apply a combination of phonological processes. For example, applying PVV to "cup" gives "gup", and applying DFC to "gup" gives "gu". As a result, the speech production of children with DS is ambiguous and hard for other people to understand.

Children with DS suffer from the same phonological processing problems as their typical peer group. However, these problems appear at later developmental stages and take a longer time to resolve. The main phonological processing problems as defined by [3, 5, 8] are:
Deletion of Final Consonants (DFC). This is the process by which children omit the final consonants in a word so that it ends with a vowel. Applying DFC can result in single or cluster consonant deletion (e.g. dog to do and drink to dri).
Prevocalic Voicing (PVV). This is the process by which children tend to produce the initial consonant in a word as a voiced consonant instead of a voiceless one.
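To make the rewrite patterns above concrete, the sketch below models three of the processes (PVV, VF, and DFC) as plain string transformations on orthographic words. This is purely illustrative and is not the paper's code, which works on Nuance pronunciation strings rather than spellings:

```java
import java.util.Map;

public class PhonologicalProcesses {

    // Prevocalic voicing (PVV): a voiceless initial consonant is voiced.
    static final Map<Character, Character> VOICED =
            Map.of('c', 'g', 'k', 'g', 'p', 'b', 't', 'd', 's', 'z', 'f', 'v');

    static String prevocalicVoicing(String word) {
        char first = word.charAt(0);
        return VOICED.containsKey(first)
                ? VOICED.get(first) + word.substring(1) : word;
    }

    // Velar fronting (VF): back sounds k/g become front sounds t/d.
    static String velarFronting(String word) {
        return word.replace('k', 't').replace('c', 't').replace('g', 'd');
    }

    // Deletion of final consonants (DFC): trailing consonants are dropped
    // so that the word ends with a vowel (e.g. "drink" -> "dri").
    static String deleteFinalConsonants(String word) {
        int end = word.length();
        while (end > 0 && "aeiou".indexOf(word.charAt(end - 1)) < 0) {
            end--;
        }
        return end > 0 ? word.substring(0, end) : word;
    }

    public static void main(String[] args) {
        System.out.println(velarFronting("car"));     // car -> tar
        System.out.println(prevocalicVoicing("cup")); // cup -> gup
        // Processes can combine, as in the text: PVV then DFC on "cup" -> "gu".
        System.out.println(deleteFinalConsonants(prevocalicVoicing("cup")));
    }
}
```

Generating the variation lists used later in the grammars amounts to applying such functions singly and in combination to each original word.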

2.2 Literature Review
Few studies have identified and explored ways to improve the speech problems of children with DS. The majority of these studies focused on the human role in overcoming the speech problems [4, 5]. Some developed computer applications to assist in teaching children with DS, but these applications did not provide any feedback.


The Centre for Spoken Language Understanding (CSLU) toolkit is a powerful tool that integrates various technologies such as speech recognition, speech generation, audio tools, and animated faces. It was developed in 1992 and has since been used in a number of research projects [9], in fields including education and language learning. However, it does not give developers the flexibility to customize the toolkit by deploying their own code, nor does it support large vocabularies.

Researchers have stated that addressing the speech difficulties of children with DS should start at an early age. Parents should be aware of best practice for improving their children's speech skills, and speech therapists can guide parents and provide sessions to enhance the children's learning process. Children with DS vary in their level of attainment, so they need to be taught individually according to their level [4]. They need to be able to hear clearly and to distinguish different sound patterns, which helps them to produce these patterns. The first step in improving speech production for children with DS is to ensure that they can hear and produce speech sounds as single sounds and in single words [9]. They can start with initial consonant sounds that they can produce based on their age and knowledge [5], and then move to words with different initial sounds, sound patterns, and numbers of syllables [9]. This method improves the children's articulation. Researchers found that presenting the printed word together with an image and its associated pronunciation fixes the word in the children's minds and enables them to produce it later [5, 9]. Moreover, involving children in real conversations gives them the opportunity to practice their speech and improve their articulation and clarity. Children with hearing loss learn better in a quiet environment, where the speaker speaks at a suitable volume and with a clear voice [9].

The goal of this project is to develop an educational application that uses speech recognition to recognize children's pronunciations and provide feedback that encourages them to produce the correct pronunciation. The application uses the Nuance large-vocabulary recognizer [17], which can be used for both English and Arabic.
2.3. Rashed Paediatric Therapy Centre
The Rashed Paediatric Therapy Centre (RPTC) in Dubai is a centre for children with special needs. It has two different groups of students: native and non-native Arabic speakers. As a first step, the study focuses on identifying and improving the speech characteristics of the non-native Arabic speakers with DS. English was chosen as the language of study since it is spoken by most of the children in the Centre.

Besides parents and speech therapists, several computer applications have been developed to assist children's learning. These applications have been developed in an error-free environment; they are usually highly organized and demonstrate a single task. "Speaking for Myself" [16] is one of the best applications that has been used as an educational tool, especially for children with DS. It is based on the idea of teaching the spoken word through reading. The application presents flash cards containing modern graphics, printed words, and clearly pronounced words. In addition, it includes a storybook that combines two or three words [16]. This computer application was found to be highly motivating for children with DS, maintaining their attention, providing a structured environment, and improving accuracy, thus improving self-esteem [1, 16].

According to the speech therapist [18] at the RPTC, there are common phonological problems from which the children with DS in the centre suffer: velar fronting, stopping of fricatives and affricatives, and cluster reduction. She stated that the main reason for these difficulties is middle ear problems. She also stated that these problems occur not only in children with DS but also in their typical peer group; however, it takes children with DS much longer to overcome them. The speech therapist also participated in validating the content of the software developed in this project, evaluating its design specifications, and testing the software during implementation.

However, these applications lack an important aspect of teaching: providing feedback to children that can evaluate and improve pronunciations. "CSLU Toolkit-based Vocabulary Tutors" is an application that was developed for students with Down Syndrome at the Jean Piaget Special Education School in México [12]. The application uses animation and speech recognition mainly to support the pronunciation of vowels. It presents the letter, waits for sound input, analyzes it, and gives feedback. It uses an animated character to present the facial movements required to produce sounds correctly [12].

2.4. Speech Recognition
The application developed in this project is based on speech recognition, the process of converting a spoken word or sentence into text. This is done by modeling the word as a sequence of phonemes; the phoneme sequence is defined in a dictionary. In order to match the phonemes to the speech waveform, statistical models known as Hidden Markov Models (HMMs) are used. Speech recognition is used in a number of applications, in particular telephony applications.

The application considers the spoken word correct as long as the child pronounces the target sound correctly. The target sound is specified based on the category (i.e. phonological problem). For example, if the selected category is cluster reduction (s-blend), the target sound is the "s" sound: the pronunciation is correct as long as the child pronounces the "s" sound, regardless of the rest of the sounds in the word. If the sound is pronounced correctly, the application congratulates the child and moves to the next word. Otherwise, the application repeats the correct pronunciation of the target sound and of the word as a whole. The application keeps looping until the correct pronunciation is produced. In this algorithm, only one rule needs to be defined for every word to provide the appropriate feedback; there is no need to include all the word's variations. This approach was suggested by the speech therapist. It is more focused, easier to code, and is expected to help overcome some speech difficulties, as stated by the speech therapist. It can be used to improve the speech characteristics of children with DS at early stages, when the focus is on producing individual sounds.
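The acceptance rule just described can be sketched as follows. This is a simplified illustration only: the recognizer output is simulated, and a contains-check stands in for matching the target phoneme in the grammar:

```java
public class PhoneBasedFeedback {
    // Returns true when the recognized pronunciation contains the
    // target sound, e.g. the "s" in an s-blend word such as "star".
    static boolean targetSoundProduced(String recognized, String targetSound) {
        return recognized.contains(targetSound);
    }

    public static void main(String[] args) {
        // Simulated recognizer outputs for the word "star" (target "s").
        String[] attempts = {"tar", "sta", "star"};
        for (String attempt : attempts) {
            if (targetSoundProduced(attempt, "s")) {
                System.out.println(attempt + ": correct, move to next word");
            } else {
                System.out.println(attempt + ": repeat target sound and word");
            }
        }
    }
}
```

Note how "sta" counts as correct here: the target "s" is produced even though the rest of the word is wrong, exactly as the rule above specifies.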

Speech recognition is also integrated in some cars and mobile phones. Moreover, it is used in the education sector, especially for students who are physically challenged: students who cannot use the keyboard for writing can dictate to the computer and perform a set of commands using speech recognition. The performance of speech recognition systems can be affected by technical and human factors. The technical factors, which include the microphone, sound card, operating system, and hardware, are easy to identify and fix. On the other hand, the human factors, which can strongly affect the accuracy of the recognition and challenge the recognition performance, are hard to overcome: different people produce the same word differently, and even the same person produces the same word differently each time it is spoken.
3. APPLICATION SPECIFICATION
In this project speech recognition has not been used for dictation or issuing commands; instead, it has been used to interpret the spoken words and provide feedback. The application corrects the pronunciation of the word in the event of the child uttering the wrong pronunciation. This could help overcome some of the communication difficulties experienced by children with DS.

4. APPLICATION DEVELOPMENT
The word-based and phone-based algorithms were developed using the Nuance Speech Recognition System [17] and the Java programming language. Developing the application using Nuance requires collecting the data (i.e. the expected input), building the dictionary that defines the words' pronunciations, developing the grammars that include all the expected inputs, and creating a Java class that deploys the speech recognition with a NuanceSpeechChannel object and defines the set of rules that provide appropriate feedback. The following is the step-by-step procedure used to implement both algorithms.

Two different algorithms were developed to provide feedback to the children. The first is a word-based algorithm, which ensures that the whole word is pronounced correctly. It provides feedback for every single variation caused by applying a single phonological process or a mixture of phonological processes. It takes an input (a spoken word via microphone) and looks for the best match in its grammar. If the word is pronounced correctly, the application congratulates the child and moves to the next word. Otherwise, the application repeats what the child has said, corrects it by emphasizing the omitted or replaced sound, and provides the right pronunciation. The application keeps looping until the correct pronunciation is produced. In this algorithm, a rule must be defined for every variation of the word in order to provide the appropriate feedback, which requires lengthy coding for a small number of words. However, the advantage of this approach is that it deals with all the phonological problems, so it can be used for any child regardless of the phonological problems that he or she suffers from. This approach can be used to improve speech production for children at later stages, when the focus is not on producing individual sounds but rather on combinations of sounds.
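A minimal sketch of the word-based rule table for one word, "frog", is shown below. The feedback messages are invented for illustration; the actual application defined around 108 such rules across five words:

```java
import java.util.Map;

public class WordBasedFeedback {
    // One feedback rule per variation of "frog" (messages are examples).
    static final Map<String, String> FROG_RULES = Map.of(
        "frog", "Well done!",
        "fog",  "You said 'fog'. Listen for the 'r': f-r-og. Try 'frog'.",
        "rog",  "You said 'rog'. Start with 'f': 'frog'.",
        "frod", "You said 'frod'. The word ends with 'g': 'frog'.");

    public static void main(String[] args) {
        String recognized = "fog"; // best match returned by the recognizer
        System.out.println(FROG_RULES.getOrDefault(recognized,
                "Let's try again: 'frog'."));
    }
}
```

The table makes the coding burden visible: every new variation needs its own entry, which is exactly why the word-based grammar grows so quickly.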

4.1. Data Gathering (Content)
The software was developed based on isolated English words. These words were selected carefully to cover all the phonological problems. They include various phoneme types, such as fricatives, affricatives, and velars, occurring in different positions (i.e. at the beginning, middle, or end of the word). The words were gathered from different sources: the speech therapist, educational websites, and a workbook on phonological processes [3]. They are related to the children's daily life activities. For the word-based approach, only five words were gathered, while for the phone-based approach, forty-eight words were gathered. The reason for this large difference in the number of words is that the first approach is not categorized.

The second algorithm is phone-based. The application is categorized according to the phonological problems. It handles one problem at a time and ensures that a single sound in a word, rather than the whole word, is produced correctly. This algorithm takes an input (a spoken word via microphone) and looks for the best match in its grammar.


Note: the system can be developed in any language that is defined in Nuance.



Applying all the possible phonological problems, singly or in combination, results in 103 variations for the five words, and as mentioned earlier this affects the performance of the recognizer. In the second approach the original words were categorized into three main phonological problems, as suggested by the speech therapist: cluster reduction, velar fronting, and stopping of fricatives and affricatives. Under each of the main categories there are subcategories. In cluster reduction the subcategories are S-Blends, R-Blends, and L-Blends. In velar fronting the subcategories are K-Velar and G-Velar, and in stopping of fricatives and affricatives the subcategories are S-Fricative, Z-Fricative, Sh-Fricative, F-Fricative, V-Fricative, Ch-Affricative, and J-Affricative. For the second approach there was also a need to find the list of variations in order to know the combinations that count as correct results. The phonological process workbook [3] was used as the fundamental source for building the variation list in both approaches. The author went through all 22 phonological process problems in order to apply them to each word, both as single problems and in combination. Part of the list was verified by the speech therapist to ensure coverage and validity.
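The category structure just described can be captured directly as data; a sketch:

```java
import java.util.List;
import java.util.Map;

// The three main categories and their subcategories, as listed above.
public class PhonologyCategories {
    static final Map<String, List<String>> CATEGORIES = Map.of(
        "Cluster Reduction",
            List.of("S-Blends", "R-Blends", "L-Blends"),
        "Velar Fronting",
            List.of("K-Velar", "G-Velar"),
        "Stopping of Fricatives and Affricatives",
            List.of("S-Fricative", "Z-Fricative", "Sh-Fricative",
                    "F-Fricative", "V-Fricative",
                    "Ch-Affricative", "J-Affricative"));

    public static void main(String[] args) {
        CATEGORIES.forEach((category, subs) ->
                System.out.println(category + ": " + subs));
    }
}
```

In the application, each subcategory groups the words whose target sound belongs to it, which is what lets the phone-based approach handle one phonological problem at a time.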

In the second approach the grammar file was divided into forty-eight top-level grammars for the same reasons. Each top-level grammar represents the correct pronunciation of the word as a mixture of phonemes combined through logical operators (i.e. "or" and "and"). As mentioned earlier, the word is considered to be pronounced correctly as long as the child produces the target sound. Because the grammar file defines the word as a mixture of phonemes, the dictionary file defines a customized mapping for the English letters. The grammar file is smaller and easier to define, in spite of the fact that it contains a larger number of top-level grammars.
4.4. Speech Recognition
The NuanceSpeechChannel object was created in a Java class. This Java class is responsible for starting a speech channel session, playing prompts to the children, waiting for input, recognizing the input, applying the defined rules, and closing the session. For the word-based approach, around 108 rules were defined, one for each variation the child might input. For the phone-based approach, around 48 rules were defined for the correct pronunciations of the target letter in each word. In both classes the system keeps prompting the target word if there is no input after a period of time.

4.2. Dictionary File
After preparing the word lists that include the original words as well as their variations, the customized dictionary file was created. The dictionary file defines the pronunciation of the words; in this project the American English pronunciation was used. In the first approach, the dictionary file includes the list of words (originals and their variations) and, against each word, its American English pronunciation in Nuance format. One reason for defining a customized dictionary file is that the word variations are not defined in the default dictionary. The second reason is to account for pronunciations that are considered correct: for example, for the word "blue", both "blue" and "balue" are considered correct, as stated by the speech therapist.
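A sketch of what the first approach's dictionary entries look like is given below. The phoneme notation here is invented for illustration; the real file uses Nuance's own pronunciation format:

```java
import java.util.Map;

// Illustrative dictionary: each original word and each variation is
// paired with a pronunciation string (notation is hypothetical).
public class CustomDictionary {
    static final Map<String, String> PRONUNCIATIONS = Map.of(
        "frog",  "f r ao g",
        "fog",   "f ao g",
        "rog",   "r ao g",
        "blue",  "b l uw",
        "balue", "b ax l uw"); // "balue" counted as correct for "blue"

    public static void main(String[] args) {
        PRONUNCIATIONS.forEach((word, pron) ->
                System.out.println(word + " -> " + pron));
    }
}
```

The point of the customization is visible in the last two entries: variations like "balue" never appear in a default dictionary, so they must be added by hand along with their pronunciations.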

4.5. Graphical User Interface
The interface was developed using the Java programming language because Nuance provides an API that allows seamless integration with the application. The graphical user interface was designed to be similar to that of "Speaking for Myself", one of the best educational products for children with DS. The interface presents the printed word, a modern picture visualizing the word's meaning, and the word's pronunciation. Both approaches have the same graphical user interface; the only difference is that in the second approach the interface is divided into categories and subcategories.

For the second approach, the dictionary file includes a customized mapping for the English letters. This is because the grammar file, as explained below, defines each word as a combination of phonemes.

The application was tested at different levels to ensure the validity of the dictionary file, the grammar file, and the Java classes. The main challenge was to identify the source of errors: whether it was the defined pronunciation or the recorded voice used for testing.

4.3. Grammar File
The grammar file defines all the possible inputs, or "spoken words", from the children. It uses both the default and the customized dictionary to determine the best match for the input based on its pronunciation.

The application should be used in a quiet environment, and the input voice should be clear and at the right volume. Moreover, the application does not currently deal with long-duration phonemes, nor with pauses between phonemes. Handling these would have to be incorporated in the training of the Hidden Markov Models used for the recognition, which is done by Nuance and to which we have no access.

In the first approach, the grammar file defines five top-level grammars that can be referenced by the application at run time. Each top-level grammar represents one of the original words and its variations; the total number of words listed in the five top-level grammars is 108. The grammar file was divided in this way to simplify the grammar and improve the accuracy, as each original word was reduced to a smaller set of similar-sounding words.


The human factors particularly affect the initial and final sounds of the words, such as "f", "v", and "g", and can result in the replacement, elimination, or addition of initial and final sounds. Moreover, some of the variations are difficult to pronounce, such as "bwu" and "bwod". In addition, if the speaker is not a native American English speaker, the accuracy might be even lower.

5. EVALUATION
The application was tested using around 800 audio files. These audio files provide complete coverage of all the possible inputs and were not used during the development phase. They were recorded using only one person's voice, the author's, and simulate how children with DS pronounce words.

The overall recognition result can be improved through proper setting of the recognition parameters, which are "ConfidenceRejectionThreshold", "PPR", and "Pruning". The best configuration of these parameters can be found by running several tests on the same testing data. Several experiments were carried out to find the best parameter settings. Changing the "Pruning" value gradually from 500 to 5000 did not affect the recognition result at all. In contrast, setting the "PPR" value to True decreased the confidence score of mismatched results. Setting the "ConfidenceRejectionThreshold" to 60 helped in finding the right match for some of the mismatches with low confidence, as well as rejecting some of the mismatches.

The word-based approach and the phone-based approach were tested using different data sets. For the word-based approach, 450 audio files were used, five for each input in the grammar file. The audio files were categorized based on the original words. Table 1 summarizes the recognition results.

Category   Total No. of Audio Files   No. of Correct Recognitions   Success Ratio (%)
Star       75                         34                            45.3
Jump       45                         29                            64.4
Blue       30                         17                            56.7
Cup        65                         29                            44.6
Frog       235                        76                            32.3
Total      450                        185                           41.1

Table 1: Word-based Recognition Results

However, the performance can be strongly improved if the acceptance of the input data is based on the confidence scores of the individual sounds rather than their average: if any sound is below the acceptance confidence score, the input is rejected. This can be accomplished by separating the sounds of the words in the grammar file so that each word is treated as a combination of sounds; the dictionary file then needs to be modified to represent the individual sounds, similar to the representation used in the phone-based approach. In order to reduce the effect of the human factors, more audio files need to be recorded from various people.

The overall success ratio of the word-based approach is 41.1%. The major factor in the low success ratio is the mismatch between the input audio files and the recognition result in all of the categories. The first type of mismatch is the replacement of some sounds, such as "d" instead of "t", and "d" or "k" instead of "g"; for example, the input "taw" is mostly recognized as "daw". The second type is the elimination of some sounds from the original input, such as "p", "f", and "v"; for example, a word like "pwok" is recognized as "wok". The third type of mismatch is the addition of some sounds, such as "r" and "w"; for example, a word like "fod" is recognized as "fwod".

The phone-based approach was tested using 348 audio files, categorized into five different categories. The first category is "Right Word": the audio files contain the correct pronunciation of all the original words in the grammar file (e.g. star and doll). The second category is "Right Letter": only the target letter in each word is pronounced correctly. For instance, the target letter in the word "doll" is "L", so words belonging to this category include "toll" and "koll". The third category is "Similar Letter": the target letter is replaced by a similar-sounding letter. For example, in the word "green" the target letter "G" is replaced with "K", so the word becomes "kreen". The fourth category is "Different Letter": the target letter is replaced by a totally different letter. For example, in a word like "cherry" the target letter "ch" is replaced by "r", so it becomes "rerry". The last category is "Eliminate Letter": the target letter is deleted from the input. The overall performance of the phone-based approach is 73.8%, which is better than the word-based approach. Table 2 and Table 3 below summarize the success and error performance of the phone-based approach respectively.

The mismatched recognitions can be attributed to both technical and human factors. The major technical factor is that the recognition uses the average confidence score of the word's sounds rather than the individual sound confidence scores. For instance, the recognition of the word "frog" is based on the average confidence score of the individual sounds (i.e. f, r, o, and g). Therefore, if the confidence scores of these sounds are (F = 23, R = 60, O = 73, and G = 56), the confidence of the word will be 53 and the word will be accepted, since this is above the default rejection threshold of 45, despite the fact that the recognition score of "f" is very low. In addition, the grammar file is complicated, as all of the input words are short and similar. The recognition result can also be strongly affected by human factors: variations in the clarity and consistency of the volume of all the sounds, especially the initial and final sounds.
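The worked example above can be checked with a few lines of code. The sketch below contrasts the average-based acceptance with the per-sound (minimum) acceptance rule discussed in this paper; the scores and the threshold of 45 are taken from the text:

```java
public class ConfidenceAveraging {
    static double average(int[] scores) {
        int sum = 0;
        for (int s : scores) sum += s;
        return (double) sum / scores.length;
    }

    static int minimum(int[] scores) {
        int min = Integer.MAX_VALUE;
        for (int s : scores) min = Math.min(min, s);
        return min;
    }

    public static void main(String[] args) {
        int[] frog = {23, 60, 73, 56}; // confidences for f, r, o, g
        int threshold = 45;
        // Word-level average: (23 + 60 + 73 + 56) / 4 = 53 -> accepted.
        System.out.println("average = " + average(frog)
                + ", accepted = " + (average(frog) >= threshold));
        // Per-sound check: minimum = 23 -> rejected, as proposed above.
        System.out.println("minimum = " + minimum(frog)
                + ", accepted = " + (minimum(frog) >= threshold));
    }
}
```

The same input that passes the averaging rule fails the per-sound rule, which is precisely why per-sound acceptance is expected to reject the faulty "f".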

Note: PPR controls whether the recognizer performs phonetic pruning.


3 below summarize the success and error performance of the phone-based approach respectively. No. of Total Audio Files

Category         | No. of Total Audio Files | No. of Correct Recognition (High Confidence) | Success Ratio
-----------------|--------------------------|----------------------------------------------|--------------
Right Word       | 47                       | 32                                           | 68.6
Right Letter     | 78                       | 49                                           | 62.8
Similar Letter   | 46                       | 22                                           | 48
Different Letter | 88                       | 71                                           | 80.7
Eliminate Letter | 89                       | 83                                           | 93.3
Total            | 348                      | 257                                          | 73.8

Table 2: The success ratio of the phone-based approach

[Figure 1: Comparison of the phone-based approach result. Bar chart titled “Comparison of Specific-Purpose Approach Results”, plotting the success ratio (0–120 scale) of the word-confidence and letter-confidence decisions for each of the five categories.]

Category         | Low Confidence | High Confidence | Rejected | Error Ratio
-----------------|----------------|-----------------|----------|------------
Right Word       | 12             | -               | 2        | 31.4
Right Letter     | 27             | -               | 2        | 37.2
Similar Letter   | 15             | 9               | -        | 52
Different Letter | 17             | -               | -        | 19.3
Eliminate Letter | 4              | -               | 2        | 6.7
Total            | 75             | 9               | 6        | 25.9

Table 3: The error ratio of the phone-based approach

The major reason for the errors is the acceptance of faulty target letters as correct. This results from the high overall confidence score of the word even when the target letter itself has low confidence. Another reason is the human factors discussed before. The solution is similar to that of the word-based approach: the decision to accept or reject the input should be based on the target letter’s confidence score instead of the overall confidence of the word. This will not only reject the mismatches but will also reject input in which the target letter is not pronounced clearly.

After implementing the above-mentioned solution, the overall success ratio increased to 84.2%. Figure 1 presents a comparison between the original approach and the modified approach for the phone-based implementation. The confidence threshold should be set to a value that does not reject a large number of correct inputs, while still requiring the child to pronounce the letter clearly and in an acceptable manner.

6. CONCLUSION AND FUTURE WORK

Speech recognition can be used to provide feedback for children with Down Syndrome. However, the accuracy of the recognition is hard to measure, as it can be affected by human and technical factors.

The performance of the phone-based approach (sounds rather than words) suggested in this paper provides a big improvement over the word-based approach to word-variation recognition, because it focuses on a single sound only and does not suffer from the ambiguity problems that result from recognizing multiple variations of a single word. Further enhancement and research are needed to improve the application’s performance and to cover a wider range of phonological processes. As a first step, the acceptance decision of the speech recognizer should be based on the individual letter confidence score. The second step would be to collect additional audio data from children with DS at the Rashed Paediatric Therapy Centre. These data will be used to evaluate the application, find appropriate recognition parameter settings, and identify more speech production problems.

In addition to the above, a methodology needs to be developed to deal with prolonged word and phone durations, pauses between the sounds in a single word, and interruptions of the prompts.
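The proposed letter-based acceptance decision can be sketched as follows. This is an illustrative sketch, not the paper’s implementation: the function name is hypothetical, and only the rejection threshold of 45 and the “frog” scores are taken from the text.

```python
# Sketch of the modified acceptance rule: base the decision on the
# target letter's own confidence score rather than the word average.
# Function name is hypothetical; threshold and scores are from the text.

REJECTION_THRESHOLD = 45

def accept_by_target_letter(phone_scores, target, threshold=REJECTION_THRESHOLD):
    """Accept the input only when the target letter itself was recognized
    with sufficient confidence, regardless of the word-level average."""
    return phone_scores[target] >= threshold

frog = {"f": 23, "r": 60, "o": 73, "g": 56}

# The word average (53) passes the threshold, but the target letter "f"
# does not, so the modified rule rejects this input.
print(accept_by_target_letter(frog, "f"))  # False
print(accept_by_target_letter(frog, "r"))  # True
```

Under this rule, an input whose target letter is mispronounced or unclear is rejected even when the remaining sounds lift the word-level average above the threshold.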

ACKNOWLEDGEMENTS

We would like to thank Dr. Eman Gaad for sharing her ideas and for her encouragement. We would also like to thank her for introducing us to the Rashed Paediatric Therapy Centre.

REFERENCES

[1] Black, B. (2005). Down’s Syndrome, Computers and the Curriculum. Retrieved November 28, 2005, from http://www.inclusive.co.uk/severe_complex_special_needs/infosite/bblack.shtml
[2] Black, B. and Gourley, J. (2005). Speaking for Myself [CD-ROM], US: Topologika.
[3] Workbook of Phonological Processes, Rashed Paediatric Therapy Centre, 2005.

[4] Buckley, S., Improving the Speech and Language Skills of Children and Teenagers with Down Syndrome, Down Syndrome Information Network, 1(3), 1999.
[5] Buckley, S., & Le Prevost, P., Speech and language therapy for children with Down syndrome, Down Syndrome Information Network, 2(2), 2002.
[6] Carlson, E. (2002). Study Sheds Light on Down Syndrome and Language. University of Wisconsin. Retrieved November 29, 2005, from http://www.news.wisc.edu/7937.html
[7] Deroo, O. (2005). A Short Introduction to Speech Recognition. Retrieved November 28, 2005, from http://www.babeltech.com/download/SpeechRecoIntro.pdf
[8] Hamilton, C., Investigation of the articulatory patterns of young adults with Down syndrome using electropalatography, Down Syndrome Information Network, 1(1), 2001.
[9] Hosom, J. (2005). The CSLU Toolkit: A Platform for Research and Development of Spoken-Language Systems. Retrieved December 3, 2005, from Oregon Health & Science University, Center for Spoken Language Understanding Web site: http://cslu.cse.ogi.edu/toolkit/Toolkit_slideshow.htm
[10] Downs Syndrome (2000). Retrieved November 11, 2005, from http://www.inclusive.co.uk/support/downs.shtml
[11] Kirriemuir, J. (2003). Speech Recognition Technologies. Retrieved December 5, 2005, from http://www.jisc.ac.uk/uploaded_documents/tsw_0303.pdf
[12] Kirschning, I., CSLU Toolkit-based Vocabulary Tutors for the Jean Piaget Special Education School. Proceedings of InSTIL/ICALL2004, Venice, 2004, 17-19.
[13] James, F. A. (2005). Lecture 12: An Overview of Speech Recognition. Retrieved December 3, 2005, from University of Rochester, Computer Science Department Web site: http://www.cs.rochester.edu/u/james/CSC248/Lec12.pdf
[14] Speech Recognition Systems. (2005). Retrieved December 5, 2005, from University of Edinburgh, Communication Aids for Language and Learning Web site: http://callcentre.education.ed.ac.uk/downloads/speech_recognition/introduction.pdf
[15] Stoel-Gammon, C., Down syndrome phonology: Developmental patterns and intervention strategies, Down Syndrome Research and Practice, 7(3), 2001, 93-100.
[16] Speaking for Myself. (2005). Retrieved December 1, 2005, from http://www.r-e-m.co.uk/cgibin/xrem/S_/U_/M_1/T_26314/G_/SN_
[17] Nuance Speech Recognition System. (Version 8.0). (2001). US: Nuance Communications.
[18] Booth, S., personal communication, December 12, 2005.
