
The evolution of AI voice technology

*This article is a Japanese translation by Macnica of a blog written by an engineer at DSP Concepts.

Introduction

In this article, we take a closer look at how the technologies from DSP Concepts and Fluent.ai can work together to build better voice-enabled products, what that partnership means for current and future products, and how advances in artificial intelligence (AI) and machine learning will shape voice product development and open new opportunities in the near future.

The solution you need for today's voice-enabled products

According to a report by Market Research Future, the global voice assistant market is expected to reach USD 7.3 billion by 2025, a compound annual growth rate (CAGR) of more than 24%. Across regions and device types, voice-enabled products must be competitive in performance and adaptable to different languages and locales to succeed in their respective market segments.


Further market constraints, such as high development and manufacturing costs and the sometimes difficult integration of voice technology, can also be problematic for product manufacturers, who must spend significant development time tailoring their designs to different languages and regional accents in order to deploy competitive voice recognition capabilities.

From a consumer perspective, the main obstacles to adopting voice-enabled products are challenges with the acoustic environment, delays in interpreting and processing commands, and inaccurate voice recognition. These issues point to the need for voice products that are robust to noise, offer a flexible command set deployable across multiple geographies, and include input signal processing (known as an audio front-end) that can scale to fit physical or cost constraints.

DSP Concepts and Fluent.ai Solutions

To meet these demands, DSP Concepts offers flexible solutions including the Audio Weaver platform and the TalkTo audio front-end. Combined with Fluent.ai's edge-based Air automatic intent recognition engine, they form a toolset that enables product manufacturers to deliver noise-robust systems with multi-language support, low latency, and flexible command sets, all while reducing development costs and accelerating time to market.

Audio Weaver

Audio Weaver is a low-code, hardware-agnostic audio platform that streamlines the development workflow from prototyping to manufacturing. It has two parts: AWE Designer and AWE Core. The AWE Designer application lets you design rapidly using a drag-and-drop interface. For final testing, tuning, and manufacturing, AWE Designer designs are deployed to the target product (MCU, dedicated DSP, or SoC) running the embedded AWE Core runtime libraries. Because audio processing functions are instantiated dynamically, each function can be developed in parallel and iterated on quickly before the completed design is deployed. Integration with the end product is simplified because each aspect of the design targets a specific library already present on the device.
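To make the module-graph idea concrete, here is a minimal sketch of a signal-flow pipeline built from connectable processing blocks, the same concept Audio Weaver expresses visually. This is purely illustrative: the class names, the `Pipeline` abstraction, and the block-based `process` interface are invented for the example and do not reflect Audio Weaver's proprietary API.

```python
# Illustrative sketch only: the real Audio Weaver module API is proprietary.
# This mimics the core idea of a signal-flow graph built from modules, each
# processing a block of audio samples and feeding the next module.

class Module:
    """Base class for a processing block with one input and one output."""
    def process(self, block):
        raise NotImplementedError

class Gain(Module):
    """Scale every sample by a fixed gain factor."""
    def __init__(self, gain):
        self.gain = gain
    def process(self, block):
        return [s * self.gain for s in block]

class Clip(Module):
    """Hard limiter keeping samples within [-1.0, 1.0]."""
    def process(self, block):
        return [max(-1.0, min(1.0, s)) for s in block]

class Pipeline:
    """A linear chain of modules, analogous to wiring modules on a canvas."""
    def __init__(self, modules):
        self.modules = modules
    def process(self, block):
        for m in self.modules:
            block = m.process(block)
        return block

pipeline = Pipeline([Gain(4.0), Clip()])
print(pipeline.process([0.1, 0.3, -0.5]))  # [0.4, 1.0, -1.0]
```

In a graphical tool, swapping `Gain(4.0)` for another module or re-ordering the chain is a drag-and-drop operation rather than a code change, which is what enables the rapid, parallel iteration described above.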

TalkTo

TalkTo is a customizable Audio Front-End (AFE) that combines advanced signal processing techniques to provide a clean audio signal to voice assistants and speech recognition engines. Its extensive signal processing can be tuned to different use cases and performance footprints, and multiple microphone array topologies are available to meet the demands and constraints of many device form factors. TalkTo's capabilities can be chosen to match the processing power of different systems: scaled up for feature-rich, multi-mic designs, or scaled down for low-power, processor-efficient designs.

Fluent.ai Air

Fluent.ai Air is a spoken language understanding system that goes directly from speech to intent. Featuring edge-based universal language support, Fluent.ai Air delivers an on-device command set that understands multiple languages and accents simultaneously and with high accuracy, detecting intent without connecting to the cloud or converting speech to text. Like TalkTo, Fluent.ai Air is scalable. For the most natural voice user experience, predefined commands can be triggered with variable or synonymous phrasing using a "slot" model, in which an utterance is broken down, filtered into action/object/location slots, and mapped against a command set. For low-power devices with smaller vocabularies and less variable command phrasing, a simpler "direct intent" model can be used instead.
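The slot idea can be illustrated with a small sketch. Note the hedge: Fluent.ai Air works acoustically, never converting speech to text, so the text matching below is only a conceptual stand-in showing how different phrasings can fill the same action/object/location slots. All vocabulary tables and the `to_intent` function are hypothetical.

```python
# Conceptual sketch of the "slot" intent model. Fluent.ai Air maps speech to
# intent acoustically, without text; text tokens are used here purely to show
# how synonymous phrasings fill the same slots. Vocabulary is hypothetical.

ACTIONS = {"turn on": "activate", "switch on": "activate",
           "turn off": "deactivate", "switch off": "deactivate"}
OBJECTS = {"light": "light", "lamp": "light", "fan": "fan"}
LOCATIONS = {"kitchen": "kitchen", "bedroom": "bedroom",
             "living room": "living_room"}

def to_intent(utterance):
    """Fill action/object/location slots from an utterance, if present."""
    text = utterance.lower()
    def match(table):
        for phrase, slot in table.items():
            if phrase in text:
                return slot
        return None
    return {"action": match(ACTIONS),
            "object": match(OBJECTS),
            "location": match(LOCATIONS)}

# Two different phrasings resolve to the same intent:
print(to_intent("Turn on the kitchen lights"))
print(to_intent("Switch on the lamp in the kitchen"))
```

Both utterances above map to the same filled slots, which is the property that lets one command set accept variable phrasing.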

Approaches to AI today and tomorrow

Fluent.ai's approach to embedded AI differs from traditional cloud-based systems. While cloud connectivity is useful for certain use cases, such as performing web searches by voice, edge-based AI offers much lower latency because processing stays on the device, which also keeps it private by nature. The cloud offers access to more information and potentially more processing power, but Fluent.ai Air trades those for privacy and low latency, occupying a smaller processing footprint while still providing an intuitive voice UI.

Technologies like Fluent.ai Air point to a future where AI devices are intelligent in themselves and no longer rely on an internet connection. Beyond the appeal of more natural interactions with smart machine listeners, embedded AI offers real-world benefits that could drive wider adoption. Because processing is performed locally, such devices can be deployed with fewer geographic barriers, since they require no network infrastructure or ISP. With no dependency on third-party services such as Google Assistant or Alexa Voice Service, device integration is simplified, feedback is more responsive, and device owners keep their own data.

Advances in machine learning are also paving the way for future voice AI. Machine learning is key to the future of voice-enabled, assistive products: it improves how data is gathered from multiple sources and sensors, enabling more useful actions with less conscious intervention from the user. Products can thus become both more powerful and more user-friendly.

Tiny Machine Learning (TinyML) is an emerging field that brings deep learning to resource-constrained hardware, combining embedded software, machine learning, and on-device data analysis. Advances in this field will shrink AI models to take up as little space as possible, enabling ever smaller devices to become smarter than before.
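One common TinyML technique for shrinking models is quantization. The sketch below is a toy illustration, not any particular framework's implementation: it maps 32-bit float weights to 8-bit integers with a single shared scale factor, cutting storage by roughly 4x at a small accuracy cost. The weight values are invented for the example.

```python
# Toy illustration of 8-bit quantization, a TinyML technique that shrinks
# model weights from 32-bit floats to int8. Weights here are invented.

def quantize(weights):
    """Map float weights to int8 values with a shared scale factor."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [v * scale for v in q]

weights = [0.81, -0.27, 0.05, -1.02]
q, scale = quantize(weights)
restored = dequantize(q, scale)
print(q)        # small integers in [-128, 127]
print(restored) # close to the original weights
```

Real deployments (e.g. post-training quantization in embedded ML toolchains) use per-tensor or per-channel scales and calibrate on data, but the storage-vs-precision trade-off is the same.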

Automated Machine Learning (AutoML) seeks to automate steps of the machine learning workflow, such as data preprocessing and feature selection. AutoML can be thought of as AI that builds AI: because the process is automated, advanced expertise is not required to build and use machine learning models. This will make it possible to quickly train AI systems and adapt them to different technologies and use cases.
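At its simplest, what AutoML automates is the search over choices a human would otherwise hand-tune. The toy sketch below makes that concrete with an invented dataset and a "model" that is just a threshold on one feature, so the example stays self-contained; real AutoML systems search over preprocessing steps, architectures, and hyperparameters in the same spirit.

```python
# Toy sketch of the AutoML idea: automatically searching over a model
# "hyperparameter" (here, a decision threshold) instead of hand-tuning it.
# The dataset and model are invented for the example.

data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]  # (feature, label)

def accuracy(threshold):
    """Fraction of examples classified correctly by 'feature >= threshold'."""
    return sum((x >= threshold) == bool(y) for x, y in data) / len(data)

# Automated search over candidate thresholds:
candidates = [i / 10 for i in range(1, 10)]
best = max(candidates, key=accuracy)
print(best, accuracy(best))  # 0.5 1.0
```

The search loop, not a human, picks the best configuration; scaling that loop up to pipelines and model families is what AutoML frameworks do.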

Visions of smarter, futuristic voice assistants seem to share some commonalities. The machine listeners of the future are mostly imagined as more conversational and intelligent, with queries and responses closer to the pace and tone of human-to-human conversation. The future we envision is one in which devices use responsive AI to interpret and act on biometric data, detecting voice inflection and stress, and making recommendations and queries based on the user's tendencies. Such devices would initiate their own queries and hold conversational exchanges, remembering previous interactions and adapting accordingly, which builds an appearance of trust with the user. The ability to respond to cues and carry out real dialogue is expected to be commonplace in the virtual assistants of the future.

Future voice assistants are expected to be more proactive, taking information from a variety of sources and tailoring their behavior and recommendations accordingly, such as a smart appliance that learns your schedule and combines that data with time-of-use energy tariffs to run within your daily routine at the lowest possible cost. This is the essence of machine learning: processing information from many sources and using that data to improve task execution.
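The appliance scenario above reduces to a small optimization, sketched below. The tariff values, the allowed-hours window (standing in for a learned user schedule), and the `cheapest_hour` helper are all invented for illustration.

```python
# Toy illustration of tariff-aware scheduling: pick the cheapest hour to run
# an appliance, restricted to hours the user's learned schedule allows.
# Tariff prices and the allowed window are invented for the example.

# Price per kWh by hour of day (hypothetical time-of-use tariff):
# peak 17:00-21:00, off-peak 00:00-06:00, shoulder otherwise.
tariff = {h: 0.30 if 17 <= h <= 21 else 0.12 if 0 <= h <= 6 else 0.20
          for h in range(24)}

# Hours the learned user schedule allows the appliance to run (06:00-22:00).
allowed_hours = range(6, 23)

def cheapest_hour(tariff, allowed_hours):
    """Return the allowed hour with the lowest tariff."""
    return min(allowed_hours, key=lambda h: tariff[h])

print(cheapest_hour(tariff, allowed_hours))  # 6, the only off-peak hour allowed
```

A real assistant would fold in more signals (occupancy, forecast demand, appliance run time), but the core pattern of combining data sources to minimize a cost is the same.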

Meeting the future

When we think about advancements in these fields and imagine what's to come, how can technologies like Audio Weaver, TalkTo, and Fluent.ai Air help product manufacturers move forward?

Audio Weaver's capabilities help product manufacturers get closer to that future by helping them innovate faster and reduce risk. You design by placing signal processing building blocks called modules on a virtual canvas, connecting them with virtual wires, and adjusting module properties to fine-tune the design. You can also audition designs from within AWE Designer using your PC's sound card. Multiple team members can work on the design at once, tuning different parts and developing features in parallel before combining them into the final design. This collaboration, together with the ability to test iterations and new designs quickly and seamlessly, streamlines the entire process.

The ability to include third-party IP also allows Audio Weaver to integrate new technologies as additional customized modules. Dozens of third-party algorithms are included, such as immersive 3D audio rendering and active noise cancellation, giving developers an advanced and specialized system.

Similarly, the customizability of TalkTo's audio front-end provides the performance and flexibility needed to address future voice UI use cases. TalkTo can scale with different product demands, from single-mic designs with noise reduction to designs using 8-mic arrays with acoustic echo cancellation, beamforming, adaptive interference cancellation and more.

Finally, Fluent.ai Air's linguistic flexibility allows one product version to be deployed across a wide geographic market, reducing development overhead. The solution also has a small operating footprint, allowing it to be embedded into small, low-power devices; this efficient resource usage also lets Air coexist with other, more resource-intensive technologies, such as on-device machine learning models. Additionally, Fluent.ai's acoustics-only approach and slot-model architecture let Fluent.ai Air accurately understand voice commands across varied phrasings, giving end users a flexible, easy-to-use, and natural voice experience.

Conclusion

The flexibility and power of the technologies offered by DSP Concepts and Fluent.ai align with the future trajectory of AI advancement and can be adopted by developers who want to capture a share of the growing voice market.

