Digital assistants: a new way to test AI from RootMetrics
When Amazon introduced Alexa as the digital assistant in its Echo smart speaker in 2014, and later in the Echo Dot, few people could have foreseen how far voice-activated virtual assistants would reach into so many aspects of daily life and human activity.
Digital assistants make life more convenient today in many ways. They act as the key mechanism in controlling smart devices in home automation. They can play music, manage alarms, and provide news and other real-time information, like the weather. And they can be "trained" by users to learn new skills to boost their functionality.
Moreover, digital assistants aren't confined to consumer applications. These days, they are also finding a place in business, both in the conference room and on the factory floor.
With digital assistants becoming an integral part of daily connected life, the smart speaker market is exploding. The latest IHS Markit forecasts show global smart speaker revenue reaching $20.9 billion by 2021, up from $7.9 billion in 2018. By 2021, households around the world are expected to have installed some 482 million smart speakers with built-in digital assistants.
Benchmarking performance testing of digital assistants
Many agree that voice control is what makes digital assistants appealing and easy to use. But while all digital assistants understand natural-language voice commands and can complete tasks for users, not all are created equal.
To see which smart speaker and its accompanying digital assistant delivered the best experience under real-world conditions, RootMetrics tested the performance of the four major smart speaker platforms and their digital assistants: Amazon Echo (2nd generation) and Alexa; Apple HomePod and Siri; Google Home and Google Assistant; and Harman Kardon Invoke and Cortana.
Each digital assistant and smart speaker was tested using a broad range of everyday voice commands separated into four categories. Within each category, five voice commands were issued, and the responses were then recorded and scored.
In the Everyday Questions category, RootMetrics asked the digital assistants to tell a joke, give the latest weather forecast, and provide movie showtimes, among other tasks. In the Media category, the assistants were told to play a song or podcast and to read the news. In the Productivity category, they were asked to call someone and to set a reminder, timer, or alarm. Finally, in the Web Queries category, the assistants were asked questions like "Why is the sky blue?" and "Where do babies come from?"
Overall, the testing methodology was designed to accurately characterize digital assistant performance on smart speakers from the consumer's point of view, with all tests run in a controlled lab setting. To keep voice, inflection, and cadence consistent, commands were issued using the Google Cloud Text-to-Speech machine learning API.
Each platform was evaluated on three variables: reliability, accuracy, and speed.
Digital assistants received 100% for reliability when an action was performed, or when a response was given, for a specific voice command or request. What matters here is whether the digital assistant could be counted on to answer a question or carry out a task.
For accuracy, the assistants received 100% when they correctly performed an action or provided a correct response to a voice command or request. Because accuracy can only be measured for completed requests, the metric did not apply to tests that received a reliability score of 0%.
For speed, the median response time was used.
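The three scoring rules can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not RootMetrics' actual tooling: the `TestResult` record and `score` function are hypothetical, each test is assumed to score either 100% or 0%, scores are assumed to be simple averages over the tests (for example, the five commands in one category), and the median response time is assumed to be taken only over requests that produced a response.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class TestResult:
    """One voice command issued to one assistant (hypothetical record layout)."""
    category: str                            # e.g. "Productivity"
    responded: bool                          # an action was performed or a response given
    correct: bool                            # the action or response was correct
    response_time_s: Optional[float] = None  # seconds; None when there was no response


def score(results: List[TestResult]) -> dict:
    """Hypothetical scorer: reliability, accuracy, and median speed for a set of tests."""
    if not results:
        raise ValueError("no test results")
    completed = [r for r in results if r.responded]

    # Reliability: each test scores 100% if the assistant acted or answered,
    # 0% otherwise; averaged over all tests (an assumption).
    reliability = 100.0 * len(completed) / len(results)

    # Accuracy is only defined for completed requests, so tests with a
    # reliability score of 0% are excluded rather than counted as wrong.
    accuracy = (
        100.0 * sum(r.correct for r in completed) / len(completed)
        if completed
        else None
    )

    # Speed: median response time over requests that produced a response
    # (assumed; the exact population is not specified).
    times = [r.response_time_s for r in completed if r.response_time_s is not None]
    median_time_s = median(times) if times else None

    return {
        "reliability": reliability,
        "accuracy": accuracy,
        "median_time_s": median_time_s,
    }
```

For example, if an assistant answers four of five Productivity commands (three of them correctly) in 1.0, 2.0, 1.5, and 3.0 seconds, this sketch reports 80% reliability, 75% accuracy (3 of the 4 completed requests), and a median response time of 1.75 seconds. Keeping accuracy separate from reliability means an assistant that rarely answers is not also penalized on accuracy for the requests it never completed.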
And the winner is…?
The race was tight among the four platforms, and Productivity requests were often handled better than other tasks. Still, the testing by RootMetrics revealed surprising results.
Amazon's Echo and Alexa, for instance, garnered a perfect score in Productivity but stumbled in Media, despite the breadth of the Amazon ecosystem. Apple's Siri, running on HomePod, had its worst score in Everyday Questions.
More details on each platform's performance can be found in the recent IHS Markit webcast Hey Alexa, Siri, Google and Cortana: Who's the best AI assistant? For additional context, the webcast also presents individual results from the teardown of each smart speaker. A key finding from the IHS Markit teardowns: although Amazon Echo delivered the best performance of all the platforms, its component costs were the lowest of any device tested. Apple HomePod, meanwhile, earned the lowest score in testing, yet its component costs were the highest of the four smart speakers.
Also available with the webcast is the infographic Smart Speaker Benchmarking, which describes the testing metrics and summarizes the results.
Digital assistant benchmarking and performance analysis is the latest example of testing conducted by RootMetrics, an IHS Markit company. RootMetrics is recognized across the industry for what is considered the most comprehensive methodology for measuring the consumer's technology experience. Its scientific benchmarking provides an objective view of how consumers experience mobile networks and other technologies. More information on RootMetrics can be found on the RootMetrics website.
IHS Markit Technology Expert
Posted 13 March 2019