Innovation seems to be the hot topic of the moment. Countries have to innovate, or their populations are doomed to a lower economic status. Companies have to innovate or die. Education has to innovate to produce educated workers and citizens who can compete in a flat world.
SRI has been innovating for 60 years. Curt Carlson, President and CEO, has written a book (well worth reading) on innovation (Innovation, The Five Disciplines for Creating What Customers Want) and he will be the keynote speaker at the SIIA’s Ed Tech Industry Summit on May 18 in San Francisco.
Recently, I had the opportunity to talk with Valerie Crawford and John Brecht of SRI’s Center for Technology in Learning. They provided me with a preview of a really cool new technology, and also with great background information on what SRI does and how it works.
The technology was built around voice recognition software developed at SRI and is designed to support building fluency in English, through either formal or informal learning approaches. The coolest part was that the software worked with Second Life (SL). Second Life is a three-dimensional world where visitors can travel, interact with each other, learn, build stuff, take classes, buy things, conduct meetings, etc. Once logged into SL, using SRI’s technology, students interact in social situations through speaking. The SRI software not only conveys the meaning into the Second Life world, it also responds and provides feedback about how well their speech compares with native speakers’. In SRI’s SL-based environment, users can engage in game-like activities, talking through a microphone and listening to responses. This allows them to practice and assess their English-language fluency, comprehension, and pronunciation, all with speech recognition technology.
The core software for the speech recognition technology was originally designed for the exact opposite function. It was written as a forgiving speech recognition system: the computer would allow for errors in speaking but would still recognize what the person was saying. In order for this to work, the developers had to understand the differences between how a native speaker would speak and how a non-native speaker would speak. The software has to select meaning from different possible interpretations.
The education application uses those same capabilities, but with the opposite purpose. It compares the user’s speech to an optimal voice pattern, scores the results, and provides feedback and possibly remediation to the speaker. For example, take the pronunciation of the letter “t,” which has different sounds depending on the context (wrestle, put, better, vacation, these, thing). One type of error would be to use a wrong but valid “t” sound in a word. A different error is to pronounce a “t” like a “d” or “z”. (My mother-in-law is Hungarian, so I always know when “ze dinner is on ze table.”) For each context, there must be trapping sequences for common speech errors.
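To make the idea of comparing speech against a reference and trapping common errors concrete, here is a minimal sketch. It is not SRI's method: it assumes the recognizer has already turned the audio into a list of phoneme symbols, and the symbols, the `COMMON_ERRORS` table, and the `score_pronunciation` function are all hypothetical names chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical table of common substitutions: (expected, spoken) -> feedback.
# A real system would need one such set of traps for each sound context.
COMMON_ERRORS = {
    ("DH", "Z"): "'th' pronounced like 'z'",
    ("DH", "D"): "'th' pronounced like 'd'",
    ("T", "D"): "'t' pronounced like 'd'",
}


def score_pronunciation(expected, spoken):
    """Return (similarity between 0 and 1, list of feedback strings)."""
    matcher = SequenceMatcher(None, expected, spoken, autojunk=False)
    feedback = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue
        # Pair up the substituted sounds and look each pair up in the trap table.
        for exp, got in zip(expected[i1:i2], spoken[j1:j2]):
            note = COMMON_ERRORS.get((exp, got))
            if note:
                feedback.append(note)
    return matcher.ratio(), feedback


# "the table" as a reference, and the same phrase spoken as "ze table".
expected = ["DH", "AH", "T", "EY", "B", "AH", "L"]
spoken = ["Z", "AH", "T", "EY", "B", "AH", "L"]
score, notes = score_pronunciation(expected, spoken)
```

The score here is only a rough sequence similarity. A production system would presumably work on acoustic features rather than on clean symbol lists.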
A second hurdle involves interfacing the technology with SL.
Second Life has an API for developers wanting to create new applications. You would think it would be a quick leap to interface the speech engine with Second Life. But the licensing agreement calls for Linden Lab to have unrestricted use of all intellectual property in SL. In order not to lose the IP, SRI had to put the speech recognition and all software architecture for the system outside of Second Life, without diminishing the user’s SL experience.
They were able to accomplish this through a combination of client software (on the user’s machine) and server software at SRI. Oversimplifying a little, the client software captures what the user is saying and communicates data over the Internet with the SRI server. SRI has also integrated into the SL environment to capture information about avatars’ actions in SL, and then communicate actions back to SL.
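The split described above can be sketched as a pair of messages. This is only an illustration of keeping the speech engine outside SL. The JSON field names, the `npc_reply` action, and both functions are assumptions, not SRI's actual protocol, and the network hop is left out so the example runs in one process.

```python
import json


def build_client_message(avatar_id, audio_bytes):
    """Client side: package captured audio for the external speech server."""
    return json.dumps({"avatar_id": avatar_id, "audio_hex": audio_bytes.hex()})


def handle_on_server(message, recognize):
    """Server side: run recognition outside SL and describe an action for SL.

    `recognize` is a stand-in for the speech engine; it takes raw audio bytes
    and returns (recognized_text, pronunciation_score).
    """
    request = json.loads(message)
    audio = bytes.fromhex(request["audio_hex"])
    text, score = recognize(audio)
    return json.dumps({
        "avatar_id": request["avatar_id"],
        "text": text,
        "score": score,
        "sl_action": "npc_reply",  # hypothetical action relayed back into SL
    })


def fake_recognizer(audio):
    """Placeholder engine so the sketch can run without real audio."""
    return "a drink please", 0.8


reply = json.loads(
    handle_on_server(build_client_message("visitor-1", b"\x00\x01"), fake_recognizer)
)
```

Only small messages like `reply` would cross into SL in this arrangement. The recognition logic itself stays on the server.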
One vision is to develop this into an SL resort island. Visitors would have the ability to practice and self-assess their language skills in a risk-free setting. Learning experts acknowledge that language use in context is the best way to build language skills, and it is a lot more fun than memorizing conjugations and verb tenses.
For example, visitors might go to a bar and ask for a drink. The bartender avatar might respond, “I think you asked for a drink, but your pronunciation needs a little practice.” They may be given a language improvement mission to complete before they get the drink. When they come back to ask again, if they still need practice, the bartender might respond, “Yes, I understood you, you want a drink. You still need some practice on pronunciation. If you want a glass of water I can give you some now, but if you want something more upscale, first go to the language gym to work on your letter ‘R.’”
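The bartender's behavior boils down to a few branches on whether the request was understood and how well it was pronounced. Here is a minimal sketch under assumed inputs. The `bartender_reply` function, the 0-to-1 score, and the `pass_mark` threshold are hypothetical and do not come from SRI.

```python
def bartender_reply(understood, score, attempts, pass_mark=0.9):
    """Pick the bartender avatar's response.

    understood: whether the recognizer could work out the request
    score:      pronunciation score between 0 and 1 (assumed scale)
    attempts:   how many times the visitor has already asked
    """
    if not understood:
        return "Sorry, I did not catch that. Could you say it again?"
    if score >= pass_mark:
        return "Coming right up!"
    if attempts == 0:
        # First try: understood, but send the visitor off to practice.
        return ("I think you asked for a drink, but your pronunciation "
                "needs a little practice.")
    # Repeat visit that still falls short: offer water, point to the gym.
    return ("Yes, I understood you, you want a drink. If you want a glass of "
            "water I can give you some now, but for something more upscale, "
            "first go to the language gym.")
```

For example, `bartender_reply(True, 0.95, 0)` serves the drink, while `bartender_reply(True, 0.6, 1)` offers water and points the visitor to the language gym.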
As Valerie made clear, the speech recognition technology that interfaces with SL is not a stand-alone product. The CTL group of SRI does not create products but rather focuses on innovating new learning technologies, designs, and resources. They perform five different types of services, generally in the disciplines of math, sciences, and language.
1. They develop assessments of student learning (devising and deploying instruments to assess different cognitive skills)
2. They evaluate methods or products to determine their effectiveness in cognitive, information, or skill transfer
3. They research and study different methods that try to improve teacher effectiveness, specifically looking at how teachers learn complex forms of pedagogy. In fact, they host one of the longest running online social networks: Tapped-In (tappedin.org)
4. They develop technologies that can become the basis of products (creating feasibility demonstrations, and then partnering with others who will produce and sell the product). Once a technology is developed, they may also help specify an infrastructure that will allow the technology or products to scale.
5. They provide consulting to commercial firms to leverage existing and new research to inform product design and product strategy, to lower innovation risk and enhance product effectiveness and traction.
The SL/Speech Recognition technology falls into the first part of the fourth service, technology development. To actually take this technology and turn it into a product that people can buy may still take $1–2 million and 9–18 months. That is where SRI looks for partners.
A partnership can take the form of a joint development effort, an outright purchase of the technology, a license, or a royalty arrangement.
So what can we learn from SRI?
First, SRI may have a technology that you can use to create a business. SRI has a lot of different IP that can be applied to education, and they are looking for partners to commercialize the technology into products.
Second, successful innovation requires overcoming a number of different hurdles. CTL started with existing voice recognition software, developed years ago at SRI’s Speech Technology Lab. There is a slew of complexities in figuring out how best to leverage this technology for learning in the current and near-future learning technology market by integrating it with learning environments such as Second Life and mobile devices. Creating successful products almost always takes substantially more resources than expected; plan to have the people and resources to continue moving forward even when obstacles slow you down.
Third, interesting ideas often come from where you least expect them. When developing a forgiving voice recognition application, who could have predicted that the product might also be used to grade the quality of speech of English language learners?