Nothing New Under the Thumb

Mobile text input has provided a fertile field for tinkerers over the past four decades. Flurries of invention and research have followed the introduction of the Touch-Tone phone, the expansion (and contraction) of the PDA market, and the ubiquity of mobile phones with SMS. A new flurry should be expected with the advent of wireless Internet devices.

At least since Graffiti® and T9® made their commercial debuts, a continuous parade of "new and improved" input systems for pocket-sized devices has appeared. MacKenzie and others developed metrics to better compare all of those keypad input methods. [1] Numerous key layouts and input devices have been proposed and measured, with champagne prizes showing that even human performance testing may be open to innovation.

But it seems I still see, every year, at least one graduate student project determining the "optimal arrangement of letters" on a touchscreen or 12-key phone keypad! And last year I received a request for concept validation services from someone developing yet another Mobile Keyboard That Will Change The World.

Is there anything new? I don't think so. Just old ideas in new clothes, to borrow a phrase. [2]


Means to an End

The ultimate goal of mobile text input is to type, as fast as you can think, on a mobile device with limited real estate. Nearly all existing solutions include attributes that hinder or detract from that goal, and/or feature a user interface that draws some of the user's attention away from more pressing needs – such as watching where he or she is walking. [3]

(By the way, there are lots of summaries of mobile input methods available. A write-up by I. Scott MacKenzie [4] or Poika Isokoski [5] is a good place to start.)

The problem is, there are only so many ways to solve the problem. There are only so many core technologies in your toolbox to work with, and each must trade off maximum potential human performance against the inertia of existing standards and "good enough" solutions. [6]

Referring to the following table, one way to map the mobile text input invention space to date is in two dimensions: a type of input hardware and a disambiguation/correction strategy. (A sample of existing solutions is listed for each combination.)

 

 

| Input hardware |  | Per Letter | Per Word |
|---|---|---|---|
| Keypad (a small number of keys), including two-key, multi-tap, chording, and N-way key solutions |  | Touch-Tone variants, "SMART" mode, Data Bank watch, multi-function calculator, Microwriter/Twiddler/ChordTap, data gloves, hat switch | Bell Labs, T9 et al, SHK |
| Keyboard (a number of small keys) |  | BlackBerry (QWERTY), Fastap | XT9 ("SloppyType") |
| Touchscreen, trackpad, digitizing tablet | Tap | Soft keyboard (QWERTY, OPTI, Metropolis), Dasher | XT9, iPhone, Android |
|  | Gesture | Unistrokes/Graffiti, EdgeWrite, Cirrin, Quikwriting | Transcriber, SHARK, Swype, T9 Trace |
| Joystick, wheel |  | TwoStick, Nokia 7280 spinner, iPod Click Wheel, KeyStick, Dasher, nScribe, EdgeWrite | SloppyType for joysticks |
| Tilt accelerometer |  | TiltType, TiltText |  |

 

Choose one readily available (hand-operated) input device with mobile proportions:

•    keypad (a small number of keys) or keyboard (a number of small keys)

•    touchscreen or trackpad

•    joystick

•    tilt accelerometer.

Or, if you're daring [7], an up-and-coming technology like:

•    eye tracking

•    virtual reality (data gloves, etc.)

•    brainwave detection.

Then choose one disambiguation/correction strategy:

•    per-letter (explicit input)

•    per-word (ambiguous input).

I've grouped the input hardware, and alternatives like speech recognition, for further consideration below. Each disambiguation/correction strategy has its pros and cons.

Per-letter explicit input methods resemble our desktop typing experience: lock down or correct each letter before moving on. They are mostly deterministic and many are "eyes free" (like the Twiddler [8]). But due to the reduced space these methods trade off speed against accuracy, or they impose a learning curve on impatient users. And international character sets reveal the naïve simplicity of solutions optimized for only 26 letters.

Per-word ambiguous input, which typically uses the entire input sequence (so far) to disambiguate and offer the most likely word, offers benefits like automatic letter accenting. But it is challenged by limited space for dictionaries, higher visual attention requirements, and sensitivity to spelling errors and typos, and it is overall less "intuitive" than per-letter input.

Somewhere in-between, perhaps, is syllable-based input. [9] Mobile text input for Chinese, Japanese, and Korean gravitates towards this middle ground because of the morpho-syllabic [10] nature of Chinese characters, but the stenotype machine is syllable-based as well.

Let's ignore word/phrase completion [11] and abbreviation expansion. [12] Not only is word completion a well-known technology, it can be applied to almost any input method to reduce the total number of inputs. [13]


Mechanical Keys

Ironically, even after more than a decade of text messaging, keys on mobile devices are still not optimized for text input. [14] But because people still use a mobile phone mostly for voice, manufacturers are obligated to ensure that the dialing digits 1-9 and 0 are simple keypresses. That imposes another constraint on "innovative" mechanical-key-based solutions.

Here are the typical key-based approaches and a few alternatives.

5- to 20-key

Keypads with fewer keys than letters need disambiguation – whether the keypad uses the original Touch-Tone layout, the semi-QWERTY of RIM's SureType®, optimized letter arrangements like JustType®, the display-mapped keys of TNT, [15] or a one-handed keyboard like the original Microwriter/CyKey and others. [16]

For per-letter explicit input, the methods are:

•    Two-key, e.g., press the ABC key and then the 3 key for "c" [17]

•    Multi-tap / multi-press / triple-tap, e.g., press the ABC key three times quickly for "c" (typically including timeout, a challenge particularly for older users) [18]

•    Chording, e.g., press the ABC key and a (third) auxiliary key simultaneously for "c".
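The first two per-letter methods can be sketched in a few lines. This is only an illustration, assuming the common ABC/DEF/... letter assignment on keys 2-9; the function names are hypothetical and no particular handset's behavior is implied.

```python
# Assumed letter assignment for keys 2-9 (the common 12-key layout).
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def multitap_encode(word):
    """Multi-tap: press a key once per position of the letter on that key."""
    presses = []
    for ch in word.lower():
        key = LETTER_TO_KEY[ch]
        presses.append(key * (KEYPAD[key].index(ch) + 1))
    return presses


def multitap_decode(groups):
    """Decode space-separated press groups; the space stands in for the timeout."""
    letters = []
    for group in groups.split():
        options = KEYPAD[group[0]]
        letters.append(options[(len(group) - 1) % len(options)])
    return "".join(letters)


def two_key_encode(word):
    """Two-key: the letter's key, then a digit giving its position on that key."""
    return [LETTER_TO_KEY[ch] + str(KEYPAD[LETTER_TO_KEY[ch]].index(ch) + 1)
            for ch in word.lower()]


print(multitap_encode("cat"))      # ['222', '2', '8']
print(multitap_decode("222 2 8"))  # cat
print(two_key_encode("cat"))       # ['23', '21', '81']
```

Note that "c" followed by "a" lands on the same key, which is exactly where the multi-tap timeout (or an explicit "next" key) is needed to separate the two groups.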

A further alternative is to allow each key to generate multiple keycodes, e.g., when pressed in a direction other than straight down. But few manufacturers have shipped such a keypad, [19] likely due to its increased cost and lower reliability.

Word-based disambiguation, often called predictive text [20] and appearing on most mobile phones, [21] offers nearly one keypress per letter:

•    Press the ABC key followed by the ABC and TUV keys for the word "cat".
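A toy version of dictionary-based disambiguation shows why "cat" needs only three keypresses, and why the user may still have to pick among candidates. This is a minimal sketch, not T9's actual implementation; the word list and its frequency ordering are invented for illustration.

```python
from collections import defaultdict

# Assumed letter assignment for keys 2-9 (the common 12-key layout).
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word):
    """Map a word to its ambiguous key sequence, one key per letter."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_index(words_by_frequency):
    """Group words by key sequence, preserving the given (frequency) order."""
    index = defaultdict(list)
    for word in words_by_frequency:
        index[key_sequence(word)].append(word)
    return index


# Hypothetical lexicon, most frequent first.
INDEX = build_index(["cat", "act", "bat", "can"])

print(key_sequence("cat"))  # 228
print(INDEX["228"])         # ['cat', 'act', 'bat']
```

The three words sharing the sequence 228 are the reason such systems need a "next word" key and some attention to the display.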

QWERTY [22]

The original BlackBerry keyboard's design made it possible for at least some large-thumbed executives to type on the tiny keys of a complete QWERTY keyboard. The visual familiarity of its layout gives QWERTY a distinct advantage. [23]

But there are regionality problems: QWERTY is a not-quite-standard standard, including dozens of international variants. [24] Wouldn't it be a further act of North American hubris to impose the QWERTY layout on the rest of the world?


Stylus or Finger

Okay, let us say that, as far as novelty goes, mechanical keys are tapped out. [25] What about input methods for touch-sensitive surfaces? The iPhone has certainly renewed manufacturer interest in touchscreens for high-end mobile phones, though touch-sensitive input devices still face a few practical challenges. [26]

Text input methods for such devices may have started back in the era of the light pen; here are the most common approaches for smartphones and PDAs.

Touch, Tap, 2D array

The most ubiquitous input method is the soft keyboard version of QWERTY, though various algorithms have produced optimal layouts [27] for a particular language. It is simple hunt-and-peck typing on labeled keys. The very small, discrete targets, subject to Fitts' Law, result in either slow-and-careful entry or increased error rates, though near-miss letter correction is possible using, e.g., letter tri-grams. [28]
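To make the Fitts' Law point concrete, here is a small sketch using the Shannon formulation that is common in HCI work, MT = a + b * log2(D/W + 1). The constants a and b below are placeholders chosen for illustration, not measured values, and the function names are hypothetical.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation: ID = log2(D / W + 1), in bits."""
    return math.log2(distance / width + 1)


def movement_time(distance, width, a=0.0, b=0.1):
    """Predicted time MT = a + b * ID; a and b are illustrative placeholders."""
    return a + b * index_of_difficulty(distance, width)


# Same travel distance, two key widths (arbitrary but consistent units).
large_key = movement_time(distance=7.0, width=1.0)  # ID = log2(8) = 3 bits
small_key = movement_time(distance=7.0, width=0.5)  # ID = log2(15), about 3.9 bits

print(small_key > large_key)  # True
```

Halving the key width raises the index of difficulty, so the user either slows down or accepts more misses, which is the trade-off described above.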

You can also put a phone keypad on a touchscreen, as we did with the TI Avigo [29] and the Philips Nino. Though usable, and consistent across devices, the large keys didn't offer as much benefit on a PDA that required the use of a stylus. [30]

Applying what we had learned about word-based disambiguation while developing and refining T9 Text Input, we prototyped and developed a word-based auto-correcting system (informally called "SloppyType" but now the basis for XT9® Smart Input) for touchscreen and thumb keyboards and even virtual keyboards. [31]
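The general idea of word-level correction for imprecise taps can be sketched as follows. This is emphatically not the XT9 algorithm; it is a toy under stated assumptions: key centers on a unit grid with a made-up row stagger, a tiny invented lexicon, and a plain squared-distance score with no word frequencies.

```python
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.5, 1.5]  # hypothetical horizontal stagger per row

# Key centers as (x, y) on a unit grid.
KEY_CENTERS = {ch: (col + offset, row)
               for row, (letters, offset) in enumerate(zip(ROWS, ROW_OFFSETS))
               for col, ch in enumerate(letters)}


def tap_cost(ch, tap):
    """Squared distance between a tap and the center of key `ch`."""
    kx, ky = KEY_CENTERS[ch]
    return (kx - tap[0]) ** 2 + (ky - tap[1]) ** 2


def nearest_keys(taps):
    """Per-letter interpretation: take the closest key for every tap."""
    return "".join(min(KEY_CENTERS, key=lambda ch: tap_cost(ch, tap)) for tap in taps)


def best_word(taps, lexicon):
    """Per-word interpretation: the lexicon word whose keys best fit all taps."""
    candidates = [w for w in lexicon if len(w) == len(taps)]
    return min(candidates,
               key=lambda w: sum(tap_cost(ch, tap) for ch, tap in zip(w, taps)),
               default=None)


# Three hasty taps aimed at "c", "a", "t"; the first lands closer to "v".
taps = [(4.1, 1.8), (0.9, 1.2), (4.4, 0.3)]
print(nearest_keys(taps))                            # vat
print(best_word(taps, ["cat", "car", "bat", "cut"]))  # cat
```

Taken letter by letter the taps spell "vat"; scored as a whole against the lexicon, the closest word is "cat".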

Gesture, Handwriting

Gestures have long been part of keyboards married to a touch-sensitive device. [32] Synaptics even developed a capacitance trackpad layer allowing block letter writing across the top of a modern no-profile keypad design like the RAZR's.

Writing is very natural – once you've made it through grade school – but relatively slow no matter what technology is used or how accurate the recognition engine is. [33] Digital pens remove the need for a touch-sensitive surface or a digitizing tablet, but it's still handwriting.

Graffiti, EdgeWrite, and other simplified stroke alphabets [34] allow rates up to twice that of natural handwriting, once the new shapes are committed to memory. Simplifying gesture input even further: an on-screen diagram containing letters, visual cues, and boundaries that lets the novice user employ the method immediately without memorizing a new stroke alphabet. Cirrin and Quikwriting are two examples of this. Both also allow complete words to be entered by stringing the gestures together.

SHARK, now known commercially as ShapeWriter™, does for per-word entry what Cirrin et al did for per-letter entry: it employs a simple framework of rules and visual cues (based on a QWERTY soft keyboard in this case) to get the novice user started and then allows performance to increase with skill and memorization. Swype™ offers similar benefits, though its approach is slightly different. Swype's disambiguation benefits from more accuracy at each vertex (the location of each letter) whereas SHARK benefits from more accuracy along the path (the shape of the complete gesture).

Though "natural" handwriting recognition on mobile devices was tarnished by the first Newton releases, the full-word write-anywhere Transcriber for Windows CE showed that progress has been made over the years. Early computer-based (offline) handwriting recognition efforts also attempted cursive and shorthand, such as the Pitman or Gregg shorthand systems used for dictation. Like the stenotype, these shorthand systems are phonetic rather than alphabetic and enthusiasm for their performance potential is tempered by a very long learning curve. [35]


Joystick, Tilt, Wheel

Game consoles have gone online and massively multi-player, making text entry more important. If only the game controller is used, the joystick can be employed to select a letter from an on-screen array. Alternatively, in combination with a secondary joystick or other controller keys, it can support a two-step approach, typically choosing one letter set from a marking menu (pie menu) and then one of 4-8 letters in the set. [36]

The advent of tiny, affordable accelerometers produced a litter of "Gee, what can we do with these?" research studies, including selection of specific letters from ambiguous keys in a two-step approach. [37] Or, if you want to drive yourself crazy, dynamic solutions like KeyStick keep you guessing on every tilt and keypress. At least Dasher maintains the same relative position of each letter (while making you feel like you're playing a videogame).

Nokia's 7280 model "lipstick" phone features a spinner wheel, a mechanical version of the touch-sensitive Click Wheel on the iPod which provides a similar letter ribbon (known as the date stamp method) for searching music titles. [38]
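A rough count of wheel movements shows why a letter ribbon is laborious for anything beyond short searches. This sketch assumes a ribbon of just the 26 letters in alphabetical order, a cursor that starts at "a" and stays on the last letter entered, and one extra "select" action per letter; none of these details are taken from the 7280 or the iPod.

```python
import string

RIBBON = string.ascii_lowercase  # assumed ribbon: 26 letters, no space or punctuation


def wheel_steps(word, wrap=True, start="a"):
    """Count single-letter wheel steps needed to reach each letter in turn."""
    steps = 0
    position = RIBBON.index(start)
    for ch in word.lower():
        target = RIBBON.index(ch)
        distance = abs(target - position)
        if wrap:
            # A circular ribbon lets the user go the short way around.
            distance = min(distance, len(RIBBON) - distance)
        steps += distance
        position = target
    return steps


print(wheel_steps("cat"))              # 11
print(wheel_steps("cat", wrap=False))  # 23
```

Even with wrap-around, "cat" costs 11 steps plus three selections, against three keypresses with word-based disambiguation on a 12-key pad.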

Can handwriting be emulated with a joystick? A number of people have tried that too. [39] Other gesture-based approaches for input devices like joysticks and trackballs include Isokoski's device-independent method and further applications of EdgeWrite. [40]

Similarly, we applied the SloppyType word-based auto-correction technology to each of these non-keyboard input devices, using the tilt of the joystick, or the device itself, or the change in direction of a wheel, as an approximate selection of the area surrounding each letter.


Alternatives

What about other approaches? Well, there's Fastap® – David Levy's truly novel invention from the early '90s. It takes advantage of jammed-together keys on small keyboards to naturally encourage chording. [41]

One degree removed from reality, perhaps, but showing some originality: virtual keyboards, such as laser projection keyboards; detection and interpretation of muscular tension, galvanic skin response, or sign language; and others. [42] For example, Senseboard® tries to interpret hand muscle movements representing the finger positions used when typing on a PC keyboard. [43]

Moving away from the constraint of mobile device dimensions, but still enlightening:

•    Stenotype machines for court reporting and live captioning. The syllable-based chording keyboard is designed for raw speed. The system becomes personalized as each operator develops shorthand appropriate to the transcription context.

•    Assistive technologies, [44] such as orbiTouch™ two-hand chorded input.

And what about extra-thumb solutions? [45]

Speech Recognition

Speech is very natural and very fast. [46] Speech recognition is getting better as technology improves, on desktop systems at least. The low-power processors and limited memory of mobile devices, however, constrain speech recognition accuracy; ambient noise (when away from a closed-door office) is of no help either. To compensate for these limitations, the staff at Tegic (and other recent Nuance acquisitions) explored multi-modal remedies such as combining the results of speech recognition and 12-key input to resolve ambiguities. [47]

The other practical issue for speech recognition is that of privacy. Haven't you had enough of listening to one side of other people's conversations, on the bus or in the line at Starbucks, as it is? European countries like Finland have shown, though, that social etiquette does adapt to new technology once it becomes ubiquitous, so the issue of privacy may work itself out. [48]

Eye Tracking, etc.

Eye tracking systems are improving, at least for informing usability studies if not also text input. [49] Software is getting better at compensating for normal eye jitter, [50] but desktop systems use multiple infrared emitters and a high-resolution camera and they are always pointed toward the user's face; a mobile phone is not so fortunate. An easier solution for mobile devices is to position the detector close to the eye. With miniaturization, it could become integral with eyeglasses [51] – simple, if it weren't for the fact that most people don't wear glasses unless they have to. [52]

No longer a novelty, a direct interface like brainwave detection would be ideal – truly typing at the speed of thought! The initial research has been promising, even inspiring, for those dealing with severe impairments. [53] It is reasonable to assume that, through biofeedback training, the brain could become even better at generating the signals that can be detected even while the input systems get better at decoding them. [54]

Adapt to Me

Perhaps there's a different solution, a long-tail [55] solution. One size doesn't fit all – but one API could. Imagine a Personal Input Device tailored to the abilities and preferences of the user, abetted by Bluetooth and employing simple and secure sync (thus avoiding the PCjr infrared keyboard problem [56] and keylogging). Imagine starting kids early with a good alternative to QWERTY [57] that is guaranteed to be compatible with every data system or kiosk [58] they come in contact with over the course of a day. [59]

Okay, a reality check: anything carried by anyone under the age of 20 – or over the age of 50 for that matter – is going to be lost or misplaced, repeatedly. Therefore, the Personal Input Device has to be inexpensive, [60] and any accumulated data needs to be backed up onto a server and/or PIN-protected on the device.


New is Old, Again

I expected that the next text input breakthrough would be so efficient (and, ideally, one-handed) that people would be willing to learn an unfamiliar layout or technique, and finally(!) give up the QWERTY keyboard on the desktop as well. But, in spite of some progress in speech recognition and the arrival of the "two-thumbed generation", a high-performance solution for mobile text input has not been developed – or perhaps we have yet to recognize and appreciate it.

Commercial input systems vendors incrementally improve their existing technologies for products that are already on the market and successful. That makes sense; pretty-good solutions often need refinement to make them even better for even more people. Scott Berkun notes that inventions are built upon the work of others, while Peter Denning considers innovation (the adoption of a new practice in a community) more significant than mere technical invention. [61] Concurrently, academic researchers are directing the user studies that objectively measure and compare the various approaches, and refine the models that help establish best practices in this field.

But a lot of time and money is wasted reinventing the wheel. [62] Perhaps it's just that first-year engineering students need to be assigned simple programming exercises, like "How could you arrange the letters on a touchscreen keyboard to reduce stylus travel?", or an advisor is ensuring that a new graduate student knows how to execute and write up a small research study. Perhaps young entrepreneurs across the world truly think that they are the first to realize that the Touch-Tone keypad is not optimal for text entry.

There is little excuse, however, for not spending a few hours with Google to discover what is out there already before embarking on a "glorious quest" with yet another Keyboard That Will Change The World – especially before wasting other people's money, and the patent office's time, on such a futile effort. [63]

My thumb is still waiting...

 


 

ENDNOTES
 

[1] MacKenzie, "KSPC (Keystrokes per Character) as a Characteristic of Text Entry Techniques", Mobile HCI 2002; followed by Soukoreff/MacKenzie, "Metrics for text entry research: an evaluation of MSD and KSPC, and a new unified error metric", CHI 2003.

[2] But don't misquote me, like the quote erroneously attributed to an official of the U.S. Patent Office, that "everything that can be invented has been invented."

[3] And let me make my position clear: Do not text and DRIVE! Even with "keygloves" (Lee/Hong, "Chording for Text Entry and Control in Mobile Phones", MobileHCI 2004; itself a rehash of Goldstein/Chincolle, "Finger-Joint Gesture Wearable Keypad", Interact'99).

[4] e.g., http://www.yorku.ca/mack/hci3-2002.pdf .

[5] e.g., http://www.cs.uta.fi/reports/pdf/A-1999-14.pdf .

[6] The original near-alphabetic Touch-Tone key layout, insufficient for typing even English, has been extended for 12-key mobile phones by manufacturer convention and ETSI standardization to address the needs of most Latin-based languages. So the inertia of the 12-key ABC standard includes the mass of these further standardization efforts. Satisficing behavior is also a factor, especially when there is no clearly-better alternative on the horizon.

[7] Or a researcher, unconstrained by manufacturers' evident reluctance to adopt any radical approaches.

[8] Popular with the wearable computing crowd, but the Twiddler seems to be nearing its end; see http://www.handykey.com .

[9] Including, perhaps, variable N-gram approaches like the original WordWise and/or LetterWise.

[10] Not ideographic, a minor distinction argued at length with the attorneys drafting our patent applications.

[11] As featured in systems like POBox (Masui, "POBox: An Efficient Text Input Method for Handheld and Ubiquitous Computers", 1999) and WordLogic.

[12] As featured in Instant Text; see http://www.fitaly.com .

[13] Tellingly, the more onerous the input method is, the more beneficial it is to include word completion! Word completion carries some added cognitive overhead when the user attends to the display of proposed words, in spite of the user's perception of faster performance. Recent research mentions include Kamvar/Baluja "Query suggestions for mobile search: understanding usage patterns", CHI 2008. Word completion appears to offer a net benefit in, e.g., Wobbrock et al, "In-stroke Word Completion", UIST 2006.

[14] Early mobile phones used rubbery keys; jeans-pocket-sized handsets offered no hope for the thick-fingered or an aging population; and the latest wave of ultra-thin phones removes the benefits of decades of refinements on the desktop keyboard, including key spacing, tactile feedback, and full travel.

[15] Ingmarsson et al, "TNT: a numeric keypad based text input method", CHI 2004.

[16] The BAT keyboard, et al; see http://www.infogrip.com .

[17] The Touch-Tone keypad engendered many text entry solutions immediately after its introduction in the early '60s. Other two-key variations include pressing a discriminating key BEFORE the letter key.

[18] Countless people have thought of reducing the average number of keystrokes by optimizing the letter sequence on each key, like Less-Tap. Optimal letters-to-keys distributions for the telephone keypad go back at least to 1986: Levine et al, "Computer Disambiguation of Multi-Character Key Text Entry". Openwave (aka Phone.com / Unwired Planet / Libris) offered an unpopular "SMART" mode which used letter tri-grams to dynamically order the multi-tap sequence; LetterWise is similar, without the multi-tap timing issue.

[19] Sony Ericsson has tried it on a couple of models, including the P1i.

[20] Nokia gets the credit / blame for introducing that confusing term.

[21] Led by T9 Text Input; plus Motorola's iTap, Zi's eZiText, and others.

[22] Why is QWERTY still relevant in the 21st century?!? Note Everett Rogers' discussion of technology adoption, and specifically the QWERTY typewriter keyboard, in "Diffusion of Innovations" [4th Ed.]. But see also "The Fable of the Keys" at http://www.utdallas.edu/~liebowit/keys1.html .

[23] Summarized in Zhai/Kristensson, "Interlaced QWERTY: accommodating ease of visual search and input flexibility in shape writing", CHI 2008. But at least one other study showed little transference between touch-typing on the desktop QWERTY keyboard and thumbing on a mobile equivalent.

[24] e.g., http://www-01.ibm.com/software/globalization/topics/keyboards/registry_index.html .

[25] Pun not intended, but welcome nonetheless.

[26] Such as the inconvenience of pulling out the stylus, gloves and weather conditions affecting the accuracy of finger use, or that neither of the two most prevalent touchscreen technologies, i.e., resistive and capacitive, works well for both stylus input and finger input; this dichotomy is magnified if gestures are used.

[27] e.g., OPTI (MacKenzie/Zhang, "The Design and Evaluation of a High-Performance Soft Keyboard", CHI 1999) and Metropolis (Hunter/Zhai/Smith, "Physics-based Graphical Keyboard Design", CHI 2000).

[28] As described in Vargas, US Patent 5,748,512.

[29] See http://www.ti.com/organizers/avigo/docs/avother.html .

[30] The iPhone has since shown that the finger can be used effectively on a mobile device touchscreen if the rest of the user interface is designed for it.

[31] See US Patents 7,088,345, 7,030,863, and related patent filings. In practice, the system can be tuned two different ways: careful typing allows any sequence to be typed and offers good word completion candidates; hasty typing is corrected at the end of each word when the most likely interpretation is offered.

[32] On a touchscreen keyboard or keypad, for instance, a short gesture instead of a tap can mean "shift this letter" or "choose the leftmost letter on this key".

[33] For the general population, block handwriting rates are estimated around 15 wpm, with cursive ranging up to 35 wpm.

[34] Jot tried to strike a middle ground, accepting both block letters and simplified strokes.

[35] Shorthand has been taught to secretarial staff as a whole-year course; full proficiency is needed since 100+ wpm is expected for some jobs.

[36] e.g., Weegie and XNav, both based on Quikwriting; also Wilson/Agrawala, "Text Entry Using a Dual Joystick Game Controller", CHI 2006.

[37] Partridge et al, "TiltType: accelerometer-supported text entry for very small devices", UIST 2002, and Wigdor, "Chording and Tilting for Rapid, Unambiguous Text Entry to Mobile Phones", 2004.

[38] But, even with rudimentary next-letter prediction, the 7280 was panned by all for serious text entry.

[39] I haven't reviewed the studies, but given the inverted pivot point it would seem to be less natural than handwriting with a pen, and at least as slow.

[40] Wobbrock et al, "Joystick Text Entry with Date Stamp, Selection Keyboard, and EdgeWrite", CHI 2004.

[41] But it certainly looks "different", and Digit Wireless has not had an easy time getting mobile phone manufacturers to adopt it.

[42] e.g., Saponas et al, "Demonstrating the Feasibility of Using Forearm Electromyography for Muscle-Computer Interfaces", CHI 2008; also Haj-Rashid/Mohamed, "Hand Gestures Used In Manipulation System", 2001.

[43] Two-handed touch-typing in a full-size keyboard region on any flat surface seems like a great idea, but is sabotaged by the lack of tactile feedback.

[44] Mobile device input researchers have frequently rubbed elbows with assistive technology and augmentative communications (AAC) researchers. People with severe physical impairments gain great benefit from minimizing the total number of inputs; but in a sense the constraints imposed by pocket-sized devices limit the able-bodied too.

[45] By extra-thumb I mean "beyond manual entry", not growing more of them!

[46] The normal speaking rate is in the range of 120-180 wpm; slower if you have to think about what you're saying.

[47] See US patents 7,720,682, 7,881,936, and related filings.

[48] Perhaps personal "phone booths" (a la http://en.wikipedia.org/wiki/Cone_of_silence for those of an earlier generation) will start appearing everywhere.

[49] Wobbrock et al, "Longitudinal evaluation of discrete consecutive gaze gestures for text entry", ETRA 2008.

[50] Zhang et al, "Improving eye cursor's stability for eye pointing tasks", CHI 2008.

[51] See the OWL, the impetus that led to T9 Text Input, at http://www.inference.phy.cam.ac.uk/dasher/development/ .

[52] The recent fashion trend of supermodels wearing non-prescription frames contradicts this statement; the "Borg" look around MIT still hasn't caught on elsewhere, though.

[53] e.g., Doherty et al, "Improving the performance of the cyberlink mental interface with 'yes / no program'", CHI 2001.

[54] But the adaptability of the brain comes at a cost: preliminary studies indicate that use of brainwave detection systems changes, at least temporarily, how the user's brain responds afterwards.

[55] Anderson, "The Long Tail: Why the Future of Business is Selling Less of More", 2006.

[56] A classroom full of kids who need merely to point their wireless keyboard at another's PC to be able to hijack it!

[57] "Baby Stenotype"?

[58] e.g., Magerkurth/Stenzel, "A Pervasive Keyboard – Separating Input from Display", PerCom 2003.

[59] Granted, this carries the risk of further complicating the lives of teachers, when two dozen students ask, for instance, how to generate an accented letter on each of their personalized input devices...

[60] i.e., less than any $100 PC it's synched to, or about the same cost as the sneakers left behind at school (or maybe at Joey's house?).

[61] Berkun, "The Myths of Innovation", 2007; Denning, "The Social Life of Innovation", CACM 2004 (summarizing and extending Drucker's "Innovation and Entrepreneurship", 1985).

[62] Given the barriers to entry in the mobile industry, a mere clone or minor variation of an earlier product will not result in commercial success – or early retirement. See also, on Barbara Ballard's blog, "Free Advice to Keyboard Inventors".

[63] Too bad there isn't a clearinghouse, as seen in the Linux community, to direct new efforts into more promising areas of research and refinement.