Always-On Voice Trigger Technology

“Magic Word” Provides Hands Free Triggering of Consumer Devices

The Voice Trigger idea is quite simple: the integrated product wakes up when you say the proper “magic word”. Strictly speaking, it doesn’t “wake up” at all, because voice trigger identification requires a sophisticated automatic speech recognition (ASR) engine to run continuously in the background. The engine can run for hours or days, ‘hunting’ for the voice password. This poses a power consumption challenge, especially for battery-powered devices. The engine also has to be accurate enough to reject false alarms caused by extraneous speech or sounds, yet still catch the voice trigger whenever it is actually uttered.

Recently, Voice Trigger has been praised as one of the latest advances in embedded voice recognition technology. The value of voice trigger for hands-free operation is undoubted, but the technology is not new: voice trigger was introduced to the embedded market before the year 2000. Rubidium began shipping its voice trigger technology in 2001, integrated into a funny little creature called “Nobby”. Nobby was a battery-powered, hands-free alarm clock. After being woken up by calling its name, it responded to a set of voice commands with a variety of cheeky answers. Since then, voice trigger has become part of Rubidium’s basic Voice User Interface (VUI) technology suite. It is offered for integration into any application or product, regardless of its size or computational resources. OEMs can use it in everything from car kits to home appliances, toys and smartphone applications.

Voice trigger has recently gained more attention with the increased interest in voice as a way of operating electronic devices. It has been used for years in applications such as speech analytics and surveillance. What differentiates the voice trigger offerings on the market is how many resources they consume, in both power and computation, while “waiting” to hear the voice trigger.
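
Detection systems commonly frame the accuracy requirement described above as a threshold trade-off. A detector must reject extraneous speech and sounds (false alarms) but must not ignore a genuine utterance of the trigger (misses). The Python sketch below is a generic illustration under stated assumptions, not Rubidium's algorithm. It assumes the detector produces one confidence score per audio segment and fires when that score reaches a threshold. The function name `count_errors` is hypothetical, and all scores and labels are made-up toy values.

```python
"""Toy illustration of the false-alarm vs. miss trade-off for a voice trigger.

Assumption: the detector emits one confidence score per audio segment and
fires when the score reaches a threshold. The values used below are made-up
toy data, not measurements of any real engine.
"""

from typing import List, Tuple


def count_errors(scores: List[float], is_trigger: List[bool],
                 threshold: float) -> Tuple[int, int]:
    """Return (false_alarms, misses) for one detection threshold."""
    false_alarms = 0
    misses = 0
    for score, truth in zip(scores, is_trigger):
        fired = score >= threshold
        if fired and not truth:
            false_alarms += 1  # woke up on extraneous speech or noise
        elif truth and not fired:
            misses += 1        # the magic word was spoken but ignored
    return false_alarms, misses


if __name__ == "__main__":
    # Hypothetical scores for six segments; True marks a real trigger utterance.
    scores = [0.20, 0.55, 0.65, 0.80, 0.90, 0.40]
    is_trigger = [False, False, True, True, True, False]
    for threshold in (0.5, 0.7, 0.85):
        fa, miss = count_errors(scores, is_trigger, threshold)
        print(f"threshold={threshold}: false alarms={fa}, misses={miss}")
```

The toy values give these results:

- A threshold of 0.5 yields one false alarm and no misses.
- A threshold of 0.7 yields no false alarms and one miss.
- A threshold of 0.85 yields no false alarms and two misses.

Raising the threshold removes false alarms at the cost of misses, which is why the two error types have to be tuned together.
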
This is where Rubidium’s voice trigger excels in the marketplace. We offer a low-resource, small-footprint solution with voice trigger capabilities built in. Rubidium’s Voice Trigger technology is available either as a software solution or as a standalone chip in the RDE IC series. Rubidium has accumulated vast experience with different cores and platforms, including Ceva’s TeakLite, ARM7, ARM9, CSR BC5MM and CSR8670, to mention a few. Rubidium’s Voice Trigger has been implemented on chips with as little as a few MIPS and only several Kbytes of RAM! This allows it to be integrated into virtually any type or scale of consumer product, regardless of the product's resources, complexity or price point. Rubidium’s voice trigger can be used as a self-contained application for voice wakeup. It can also be combined with other Rubidium voice interface products such as ASR, TTS, voice storage/playback and Biometric Speaker Verification.

In terms of language support, Rubidium’s ASR is language independent because it does not rely on any language model, unlike the technologies used over the cloud. Our ASR is trained from scratch for each application, so we do not rely on any language-specific tools or building blocks such as acoustic models. Rubidium has developed ASR applications in more than 12 languages. Using our proprietary development method, we can develop an application in any language in a short period of time. Previous experience with the language is not required.
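
Keeping memory bounded is one common way to listen continuously on only several Kbytes of RAM. The detector keeps a fixed-size window of recent audio features rather than an ever-growing recording. The Python sketch below models that idea. Its figures are hypothetical and chosen only for illustration:

- one feature vector per 10 ms frame,
- 13 coefficients per vector,
- 16-bit storage per coefficient,
- a one-second window.

This is not a description of Rubidium's actual design. An embedded build would use a static array in C rather than a Python `deque`.

```python
"""Bounded-memory feature window for continuous listening.

Assumptions (hypothetical figures, not Rubidium's design): one feature vector
per 10 ms frame, 13 coefficients per vector, each stored as a 16-bit integer,
and a detection window of 1 second (100 frames).
"""

from array import array
from collections import deque

FRAMES_PER_WINDOW = 100   # 1 s of history at 10 ms per frame
COEFFS_PER_FRAME = 13
BYTES_PER_COEFF = 2       # 16-bit fixed-point storage


def window_bytes(frames: int = FRAMES_PER_WINDOW,
                 coeffs: int = COEFFS_PER_FRAME,
                 bytes_per_coeff: int = BYTES_PER_COEFF) -> int:
    """Raw payload size of the feature window, ignoring container overhead."""
    return frames * coeffs * bytes_per_coeff


class FeatureWindow:
    """Keeps only the most recent frames; older frames are discarded."""

    def __init__(self, frames: int = FRAMES_PER_WINDOW) -> None:
        self._frames = deque(maxlen=frames)

    def push(self, coeffs: array) -> None:
        """Append one frame; the deque evicts the oldest frame when full."""
        self._frames.append(coeffs)

    def __len__(self) -> int:
        return len(self._frames)


if __name__ == "__main__":
    window = FeatureWindow()
    for i in range(10_000):  # simulate 100 s of listening
        window.push(array("h", [i % 100] * COEFFS_PER_FRAME))
    print(len(window), window_bytes())
```

Under these assumptions, the window's payload is 100 × 13 × 2 = 2,600 bytes. It stays at that size no matter how long the device listens.
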

Voice Trigger Usage Scenario

At Rubidium, we have found that the latest speech processing solutions combine embedded speech processing with speech processing over the cloud. The client uses voice activation and speaker ID to initiate the speech processing activity and then connects to the cloud-based ASR. The cloud-based ASR uses the strength and power of the cloud to interpret text phrases and their meaning. In this scenario, the device is always listening, consumes very few resources and can be awoken at any time. The diagram below illustrates the flow of the solution described above. Mode 1 (VOX) is the “standby mode”, in which the SRS waits for an audio cue before moving into Mode 2 or 3. When the VOX “wakes up”, there are two options:

  • Option one: the VOX kicks off Mode 2, the voice trigger mode, which continuously searches for a specific activation command. Once the command is recognized, Mode 3 (the full recognition mode) is kicked off.
  • Option two: the VOX immediately kicks off Mode 3 (the full ASR mode), represented by the red box and arrow. This transition suits products that have no voice trigger, where all voice recognition commands become active as soon as an acoustic cue is detected. This option is less recommended because Mode 1 is not very selective and is prone to false alarms. This makes the direct path more susceptible to recognition errors.
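
The mode flow above can be sketched as a small state machine. This is a minimal Python illustration, not Rubidium's API, and it rests on these assumptions:

- Audio arrives as frames of integer PCM samples.
- The VOX cue is a plain mean-energy threshold.
- The voice trigger spotter and the full recognizer are stand-in callables.

The names `VoiceFrontEnd`, `frame_energy`, `vox_threshold` and `use_trigger` are hypothetical. Setting `use_trigger=False` models option two. For simplicity, Mode 3 makes a single recognition attempt and then returns to standby.

```python
"""Sketch of the three-mode flow: VOX standby -> voice trigger -> full ASR.

Assumptions (all hypothetical, not Rubidium's API): audio arrives as frames of
integer PCM samples, VOX is a plain mean-energy threshold, and the trigger
spotter and full recognizer are stand-in callables.
"""

from enum import Enum
from typing import Callable, Optional, Sequence


class Mode(Enum):
    VOX = 1            # Mode 1: standby, waiting for an acoustic cue
    VOICE_TRIGGER = 2  # Mode 2: searching only for the activation command
    FULL_ASR = 3       # Mode 3: all voice commands active


def frame_energy(frame: Sequence[int]) -> float:
    """Mean squared amplitude of one frame, used as a simple VOX cue."""
    return sum(s * s for s in frame) / max(len(frame), 1)


class VoiceFrontEnd:
    """Hypothetical three-mode front end; not a real Rubidium interface."""

    def __init__(self,
                 spot_trigger: Callable[[Sequence[int]], bool],
                 recognize: Callable[[Sequence[int]], Optional[str]],
                 vox_threshold: float = 1000.0,
                 use_trigger: bool = True) -> None:
        self.spot_trigger = spot_trigger
        self.recognize = recognize
        self.vox_threshold = vox_threshold
        # True: option one (VOX -> Mode 2 -> Mode 3).
        # False: option two (VOX -> Mode 3 directly).
        self.use_trigger = use_trigger
        self.mode = Mode.VOX

    def process(self, frame: Sequence[int]) -> Optional[str]:
        """Feed one audio frame; return a recognized command, if any."""
        if self.mode is Mode.VOX:
            if frame_energy(frame) >= self.vox_threshold:
                self.mode = (Mode.VOICE_TRIGGER if self.use_trigger
                             else Mode.FULL_ASR)
            return None
        if self.mode is Mode.VOICE_TRIGGER:
            if self.spot_trigger(frame):
                self.mode = Mode.FULL_ASR
            elif frame_energy(frame) < self.vox_threshold:
                self.mode = Mode.VOX  # sound faded without the trigger word
            return None
        # Mode.FULL_ASR: one recognition attempt, then back to standby.
        command = self.recognize(frame)
        self.mode = Mode.VOX
        return command


if __name__ == "__main__":
    quiet, loud = [10] * 160, [2000] * 160
    # Toy stand-ins: a frame starting with 3000 counts as the magic word.
    front_end = VoiceFrontEnd(spot_trigger=lambda f: f[0] == 3000,
                              recognize=lambda f: "what time is it")
    for frame in (quiet, loud, [3000] * 160, loud):
        result = front_end.process(frame)
        print(front_end.mode.name, result)
```

The demo prints the mode after each frame, in this order:

1. `VOX None`
2. `VOICE_TRIGGER None`
3. `FULL_ASR None`
4. `VOX what time is it`, once the recognizer has handled the command.
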

All of the above takes place locally on Rubidium’s embedded ASR solution. This solution is used either when the cloud is not available or as an alternative to the full-blown ASR performed over the cloud. The embedded alternative serves the following purposes:

  1. Supporting continuous listening (both VOX and Voice Trigger modes), which is impractical to implement over the cloud because of handset power consumption, network traffic load, privacy issues and more.
  2. Operating where there is no cellular or Wi-Fi coverage, since over-the-cloud services require a connection.
  3. Handling elementary commands directly on the handset, which reduces power consumption and internet traffic and shortens the ASR's response time. Such commands include weather, time and date, invoking applications, voice dialing (name and/or number) and similar functions. Local processing also allows adaptation to the user’s speech and interface habits, which increases user satisfaction.
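
The three purposes above suggest a simple routing rule on the handset. The Python sketch below is a hypothetical illustration, not Rubidium's logic, and it makes these assumptions:

- Continuous listening (purpose 1) always runs locally before `route` is called.
- The embedded recognizer returns a command label or `None`.
- `LOCAL_COMMANDS` holds invented labels for the elementary commands of purpose 3.

The label `"book_flight"` in the demo is likewise made up.

```python
"""Sketch of deciding whether a post-trigger request runs locally or in the cloud.

Assumptions (hypothetical, for illustration): continuous listening has already
run locally, the embedded recognizer returns a command label or None, and the
labels in LOCAL_COMMANDS are invented for this example.
"""

from typing import Optional

# Hypothetical labels for elementary commands handled on the handset.
LOCAL_COMMANDS = frozenset({"weather", "time", "date", "open_app", "dial"})


def route(local_command: Optional[str], network_available: bool) -> str:
    """Return "local", "cloud" or "unavailable" for one utterance."""
    if local_command is not None and local_command in LOCAL_COMMANDS:
        # Elementary command: answer on the device to save power and
        # traffic and to shorten the response time.
        return "local"
    if not network_available:
        # No cellular or Wi-Fi coverage: cloud ASR cannot be reached,
        # and the embedded grammar does not cover this request.
        return "unavailable"
    # Everything else goes to the full-blown cloud ASR.
    return "cloud"


if __name__ == "__main__":
    print(route("time", network_available=False))         # local
    print(route(None, network_available=True))            # cloud
    print(route("book_flight", network_available=False))  # unavailable
```
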

However you decide to integrate Rubidium’s voice trigger technology, one thing is certain: voice trigger is rapidly becoming a must-have feature in today’s VUI design environment. “Hands-free” has become a catchphrase for safer and more secure operation.