IUI '08: Multimodal Chinese text entry with speech and keypad on mobile devices

Yingying Jiang, Chinese Academy of Sciences, Beijing, China
Xugang Wang, Chinese Academy of Sciences, Beijing, China and Ministry of Information Industry Software and Integrated Circuit Promotion Center
Feng Tian, Chinese Academy of Sciences, Beijing, China
Xiang Ao, Ministry of Information Industry Software and Integrated Circuit Promotion Center
Guozhong Dai, Chinese Academy of Sciences, Beijing, China
Hongan Wang, Chinese Academy of Sciences, Beijing, China

In this paper, Jiang et al. present a multimodal text entry system for mobile devices that combines keypad and speech input to reduce the number of key presses, the time needed to enter characters, and the number of candidate characters the user has to choose from.

Jiang et al. identify Chinese text entry on mobile keypads as slow and arduous and set out to improve it. The current method is called T9, in which the user types the romanized phonetic spelling (pinyin) of the Chinese characters on the keypad and then selects the desired characters from a list of homophones. To speed this up, Jiang et al. propose a method called "Jianpin", in which the user keys in only the initial sound of each Chinese character while simultaneously saying the word aloud.

For example, to enter "wang luo" 网络 (network) on a mobile phone using Jianpin, the user presses "95", which corresponds to "w.l", while saying "wang luo", and then selects 网络 from the list of homophones.
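To make the "95" example concrete, here is a rough Python sketch of how the key presses could narrow down a list of speech candidates. It is only an illustration, not the authors' implementation: the function names, the candidate list, and the simplification that each initial is just the first pinyin letter are all my own assumptions.

```python
# Standard phone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def initials_to_keys(syllables):
    """Map the first letter of each pinyin syllable to its keypad digit.

    Assumption: the "initial sound" is approximated by the first letter.
    """
    return "".join(LETTER_TO_DIGIT[s[0]] for s in syllables)


def filter_candidates(keys, candidates):
    """Keep only the candidates whose syllable initials match the pressed keys.

    `candidates` is a hypothetical recognizer output: (word, syllables) pairs.
    """
    return [
        word
        for word, syllables in candidates
        if len(syllables) == len(keys) and initials_to_keys(syllables) == keys
    ]


# Made-up candidate list standing in for what a speech recognizer might return.
candidates = [
    ("网络", ["wang", "luo"]),
    ("网罗", ["wang", "luo"]),
    ("往来", ["wang", "lai"]),
    ("网站", ["wang", "zhan"]),
]

print(filter_candidates("95", candidates))  # ['网络', '网罗', '往来']
```

In this toy list, 网站 ("wang zhan") is dropped because "z" sits on the 9 key, giving "99" instead of "95". The remaining words still need a final selection by the user, just as in the example above.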

Here is an overview of the input method:

A user study was run with 4 college students, who entered 50 words using both the T9 method and the "Jianpin" method. The authors measured the number of key presses it took to complete the 50 words with each method. The results are as follows:

My spiel:
The Jianpin input system sounds like a great way to reduce ambiguity in the selection set as well as speed up input.
My only bone to pick is that the input scheme requires voice input. I can imagine being on a crowded street in China with hundreds of people talking into their cell phones just so they can text.
It's just more noise pollution that way.
If they can make a faster system without voice input, I'll be impressed.

Knowing Japanese, I was really interested in how Chinese speakers enter text, since Chinese doesn't have a phonetic script like Japanese does. In the end, it really isn't all that different.

