*This article is a Japanese translation by Macnica of a blog written by an engineer at DSP Concepts.
Edge computing, in which services and data processing run at the "edge" of a local network rather than in a cloud provider's data centers, has driven a shift away from pure cloud computing in recent years. Information security and privacy, reduced latency, and the growth of intelligent applications are the key reasons edge devices are being adopted across the infrastructure, enterprise, and consumer market verticals. According to a study by Allied Market Research, the global edge computing market is expected to grow at roughly 33% annually.
Consumers are aware of the benefits of voice-enabled products. Voice interfaces are appealing to those who value convenience, as well as those who are reluctant to continue using touch-based controls due to hygiene concerns. However, concerns about privacy, accuracy and the device's ability to recognize different accents and dialects are hindering widespread adoption of these products.
User privacy remains one of the main barriers to overcome: 75% of consumers surveyed in 2020 reported some level of concern about voice-enabled devices, with 30% saying they were "very concerned." Language coverage is another major issue with traditional speech recognition systems, where inaccurate recognition of different accents leads to user frustration.
Designed with energy efficiency and low cost in mind, Arm Cortex-M series processors provide an ideal target platform for building voice control capabilities into a wide range of embedded applications, from fitness trackers to battery-powered remote controls to home appliances. These processors give product manufacturers the opportunity to deliver cost-effective and efficient voice-enabled products, but it is increasingly important to provide solutions that address the lack of reliability in the above areas.
Fluent.ai and DSP Concepts bring accurate, noise-resilient speech recognition to the edge on Arm-based platforms
In noisy environments, the solution overcomes interference from sources such as washing machines, HVAC systems, and range hoods, achieving extremely high wake-word and command recognition accuracy at close range (within 1 m). Thanks to a multi-microphone design combined with Fluent.ai's recognition technology, it maintains similar performance at longer ranges (beyond 3 m).
TalkTo, DSP Concepts' audio front-end (AFE), combines advanced signal processing techniques to deliver a clean audio signal to voice assistants and speech recognition engines. Its capabilities can be scaled to a wide range of design requirements, platform limitations, acoustic environments, and use cases. Used in conjunction with Fluent.ai Wakeword and Fluent.ai "Air" speech intent recognition models, the AFE delivers an accurate, reliable voice interface with robust noise rejection.
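The core idea behind a multi-microphone front end can be illustrated with a basic delay-and-sum beamformer, the simplest form of spatial filtering. This is only a conceptual sketch, not TalkTo's actual (proprietary) processing; the per-channel delays are assumed to be known in advance:

```python
def delay_and_sum(channels, delays):
    """Minimal delay-and-sum beamformer sketch.

    Time-aligns each microphone channel by dropping its leading `delay`
    samples (the per-channel arrival delay, in samples), then averages the
    aligned channels. Sound arriving from the steered direction adds up
    coherently, while noise from other directions tends to average out.
    """
    # Usable length after alignment is limited by the most-delayed channel.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]
```

In a real AFE the delays would be fractional, estimated adaptively, and combined with echo cancellation and noise suppression stages, but the alignment-then-combine principle is the same.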
Audio Weaver is a low-code/no-code audio development platform that accelerates and simplifies the process of building voice-enabled products. It allows developers to create platform-independent designs and deploy them to a variety of target products without redesigning. It also enables rapid testing and iteration of audio designs without hardware, so teams can work in parallel on individual features that later integrate seamlessly into the final product design.
Fluent.ai Air delivers accurate, noise-robust speech recognition that supports multiple accents and languages in a single system, addressing consumer concerns about accent sensitivity without demanding more memory or a faster CPU. This linguistic flexibility lets manufacturers use cost-effective processors and deploy a single SKU across a wide range of geographic markets, saving both cost and development/localization time.
Privacy is another user concern that this solution addresses. Edge-based voice-enabled devices operate offline, at the edge of the network, as sealed systems that do not require communication with the cloud. Operating locally also significantly reduces the latency of command recognition and processing, enabling a seamless user experience that is private by design.
The combined software solution from Fluent.ai and DSP Concepts can be deployed on any Arm Cortex-series processor and is well suited to the compact, low-power Cortex-M series.
How does it work?
Traditional automatic speech recognition (ASR) models convert speech to text and then use cloud-based natural language processing in the target language to determine user intent (Figure 1). This approach requires transmitting large amounts of data and using significant amounts of computing power. Moreover, this approach to ASR introduces latency that prevents natural dialogue.
Fluent.ai has developed a model that seeks to solve the compute footprint, reliability, and latency issues present in traditional models. Employing a proprietary neural network algorithm, Fluent.ai's speech-to-intent approach is language and accent agnostic, and determines intended actions based on received audio without converting speech to text or relying on cloud-based natural language processing (Figure 2). The model operates at the edge without relying on an internet connection or third-party language processing.
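The contrast between the two approaches can be made concrete with a toy classifier. This is purely illustrative, not Fluent.ai's actual neural network: a nearest-centroid lookup stands in for the model, and `intent_centroids` is a hypothetical table of per-intent reference feature vectors. The point is that the acoustic features map straight to an intent label, with no intermediate text transcript to parse:

```python
def classify_intent(features, intent_centroids):
    """Map an acoustic feature vector directly to an intent label.

    A traditional ASR pipeline first transcribes speech to text and then
    parses that text for intent; here the transcript step is skipped
    entirely, which is what makes a speech-to-intent approach language-
    and accent-agnostic in principle.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Return the intent whose reference features are closest to the input.
    return min(intent_centroids, key=lambda k: sq_dist(features, intent_centroids[k]))
```

A real system would learn this mapping end to end from audio, but the input/output contract, audio features in, intent out, is the same.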
The combination of Fluent.ai Wakeword, Fluent.ai Air, DSP Concepts Audio Weaver and DSP Concepts TalkTo constitutes a noise-robust, multilingual voice user interface (VUI) that is an ideal solution for manufacturers of white goods and other products.
Where do we go from here?
The next stage of VUI has the potential to understand commands at a “better than human” level, cut through environmental noise and determine user intent regardless of language or accent, delivering a customized, frustration-free user experience.
The flexibility of the Audio Weaver platform allows DSP Concepts to provide developers with an end-to-end solution for programming, real-time debugging and tuning, reducing the development overhead required to create edge-based VUIs for a variety of use cases.
Fluent.ai has partnered with DSP Concepts to bring this flexibility and ease of use to developers of voice-enabled solutions on Arm Cortex-M4 and -M7 based devices.