VUI

Challenge

Company

Amazon for DePaul Capstone Project

Team DePaul HCI

Sonali Bhurke

Catherine Layer

Yvonne Nillissen - Lead

My Roles

Manage team meetings and timeline. Research, static and interactive prototypes using Amazon Developer resources and Invocable voice software, and user testing.

Basic skills enabled in virtual assistants such as Amazon Echo 'Alexa' and Google Home currently don't provide efficient multiple medication reminders for seniors. The current 'reminder' skill for Alexa asks for and records one reminder at a time; if the user tries to add more than one reminder, the skill fails.

The team believed that this experience could be significantly improved and set out to research design recommendations for a voice user interface (VUI) skill that serves multiple medication reminders. The prototype for this new skill provides a rich and engaging conversation using the natural language processing (NLP) of Amazon Alexa.

Strategy

Amazon is currently pursuing a strategy of wide adoption for its Alexa VUI (with or without a screen). As consumers become more comfortable and the machine intelligence becomes more intuitive, Amazon continues to gather more AI data. Strong sales of Amazon's AI-related products are a win for the company, as they help secure its future leadership. To find out more about the VUI strategies of Amazon and Google, see my presentation on the topic.

Solution / Alexa Reminder Design Recommendations

Seniors want to be independent. They are open to finding assistance in daily chores. They are keenly aware of the importance of taking medications and the risk associated with missing doses. While they like the idea of VUI assistance and even companionship, they have concerns about technology intrusiveness.
A successful medication reminder script allows for multiple requests in one session with added confirmations assuring the user that they will be alerted on the correct day and at the correct time.
Users prefer clarity and detailed direction over a faster interaction with the device. That said, once the user shows mastery of the exchange, they expect the conversation to evolve intuitively.
Usability and satisfaction are highest when users can set multiple medication reminders, get feedback on the set times and dates, and engage in conversations that are polite and responsive.
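The recommendations above can be illustrated with a minimal Python sketch of the conversation logic: one session accepts several medication reminders, confirms each one with its day and time, shortens its confirmations after the first request, and reads back everything that was set. This is a hypothetical model for illustration only. `Reminder`, `ReminderSession` and `join_days` are invented names, the example values are borrowed from the test scenario in this case study, and the sketch is neither the team's Invocable script nor the Alexa Skills Kit API.

```python
from dataclasses import dataclass, field


@dataclass
class Reminder:
    medication: str
    days: list[str]
    time: str


def join_days(days: list[str]) -> str:
    """Speak a list of days naturally, e.g. 'Monday and Wednesday'."""
    if not days:
        raise ValueError("at least one day is required")
    if len(days) == 1:
        return days[0]
    return ", ".join(days[:-1]) + " and " + days[-1]


@dataclass
class ReminderSession:
    """Hypothetical session that collects several reminders in one conversation."""
    reminders: list[Reminder] = field(default_factory=list)

    def add(self, medication: str, days: list[str], time: str) -> str:
        """Store one reminder and return the spoken confirmation."""
        self.reminders.append(Reminder(medication, list(days), time))
        when = f"on {join_days(days)} {time}"
        if len(self.reminders) == 1:
            # First request: full, polite confirmation plus an explicit prompt.
            return (f"Thank you. I will remind you to take {medication} {when}. "
                    "Would you like to add another medication reminder?")
        # Later requests: shorter confirmation once the user knows the pattern.
        return f"Thank you. {medication} {when} is set. Anything else?"

    def summary(self) -> str:
        """Read back every reminder set in this session."""
        noun = "reminder" if len(self.reminders) == 1 else "reminders"
        items = "; ".join(
            f"{r.medication} on {join_days(r.days)} {r.time}" for r in self.reminders
        )
        return f"You have {len(self.reminders)} medication {noun}: {items}."


session = ReminderSession()
print(session.add("Aleve", ["Monday", "Wednesday"], "in the morning"))
print(session.add("Tylenol", ["Saturday"], "at bedtime"))
print(session.summary())
```

Keeping the first confirmation long and later ones short is one way to reflect the finding that users want detailed direction at first but expect the exchange to speed up once they have mastered it.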

Research Goals for Alexa Reminder

The objective of our research was to develop design recommendations for an Alexa Custom Skill serving multiple medication reminders. The prototype for this new skill was designed to provide a rich and engaging conversation using the natural language processing (NLP) of Amazon Alexa. The team outlined four goals to achieve this objective and used different research methods to complete them.

Research Goals and Methods

Goal 1: Explore the language and behavior our population uses when requesting multiple medication reminders.
RESEARCH METHOD: Conduct interviews with seniors to find common narratives used when requesting multiple medication reminders. Through the interviews, subjects would reveal their natural expectations, behavior and language for a question-and-answer exchange.
Goal 2: Create the best multiple medication reminder VUI script.
RESEARCH METHOD: Using a low-fidelity 'Wizard of Oz' prototype, compare the time participants need to successfully complete a VUI request for multiple medication reminders with each of two scripts. A shorter completion time and/or better satisfaction will indicate which script direction to explore further.
Goal 3: Create an effective new multiple medication reminder Amazon Alexa skill.
RESEARCH METHOD: Compare the time for successful completion of a multiple medication reminder request between the current Amazon Alexa script and a new script developed from speech patterns natural to seniors.
Goal 4: Improve the usability of a multiple medication reminder skill.
RESEARCH METHOD: Compare satisfaction levels for a multiple medication reminder request between the current Amazon Alexa skill script and a new skill script developed using the user-centered design process.

Goal 1: Explore language and behavior

RESEARCH METHOD: Interviews

Interviews were conducted to discover natural language, behaviors and environments used by seniors when they take their medications, remind themselves to take medications, or ask others to help them.

Participants

We recruited four participants through our friends, family and colleagues. Requirements were:
1. They had to be over 70 years of age.
2. They had to currently take medicine, vitamins and/or supplements daily.
The interviews were conducted in the participants' homes or in a quiet public place.

Data collection

Participants were asked to read and sign a consent form. All interviews were conducted in person following a common script and were recorded by moderators using a mobile phone or Zoom video conferencing. Interview questions explored:
1. Their use of medications
2. How they remembered to take their medications
3. Whether they had someone to ask for help remembering
4. How they would ask for help remembering to take a medication

Data analysis

Interviews were transcribed from recordings and open inductive coding was used to find common themes and verbal patterns. An Affinity Diagram was created using online whiteboard software. We organized the codes into common themes about motivation, insights, behavior and language.

Affinity board showing themes found from interviews with seniors 70+. An additional board focused on themes centered around language.

Interview findings for Alexa Reminder

Independence
This was a strong theme for almost all participants, with one saying, "It’s me. It's all on my shoulders," and another saying, "I couldn't rely on anyone. And I couldn't put pressure on a friend."
Object association reminders
Seniors often put their medications in one place, such as in the kitchen by the coffee pot or in the bedroom by the alarm clock, but said that this doesn't always work. "When I was in a hurry and ran out the door without going to the kitchen to have my coffee, I could have easily [forgotten] to take medications."
The importance of taking medications/consequence of forgetting
Seniors understood the consequences of forgetting to take medications. One said, "My friend's husband didn't take his medications for 4 days, and he had a stroke!"
Fear and frustration about Alexa technology
Seniors worry that the technology is intrusive and can be accessed by others. One said, "They say now that they can come in and find out if you sleep until 8 in the morning."
Politeness
Seniors typically made interactions and requests using phrases that included "Please" and "Thank you," and expected their actions to be acknowledged in return.

Design scripts

Goal 2: Create engaging Alexa Reminder scripts.

With interview insights in mind, the team moved on to its second goal and developed two scripts for an Alexa skill that would remind seniors to take their medications.

Script A voice flow mapping

Script B voice flow mapping

Research method for Alexa Reminder scripts

Our goal was to compare scripts and determine which felt more comfortable to seniors and could be refined during further testing. We compared our scripts for completion time and satisfaction levels.

The scripts were tested using the low-fidelity "Wizard of Oz" method: a team moderator acted as the computer while the participant interacted with them from behind a screen, roughly approximating human-computer interaction.

Participants

We recruited four participants through our friends, family and colleagues. To qualify for our study, they had to meet the same requirements as our interviewees.

Data collection

The moderator explained the structure of the Wizard of Oz experiment. A picture of an Alexa device was placed in front of the participant to help them imagine that they were talking to the device (photo below, left). The moderator acting as the computer had the voice flow scripts in front of them (photo below, right). Participants were given three scenarios and tasks to complete with each script. Script order was reversed between team members, counterbalancing this within-subjects design to avoid order bias. After each task and script, the participant answered a Likert-scale satisfaction survey question.

Wizard of Oz method: The user's view

Wizard of Oz method: The 'computer' view

Example of Scenario and Task

Scenario
Imagine your doctor has prescribed you to take a few medicines daily. He has told you to take Niacin in the mornings. And Metformin and Lisinopril at night. He has told you to make sure that you take these daily and must not forget.
Task 1
How would you set up a reminder for taking all these medications?

Data analysis of Alexa Reminder scripts

Each team member entered the qualitative and quantitative data into a shared spreadsheet to consolidate all information. A two-sample t-test (inferential statistics) was used to compare efficiency between the scripts. To compare success and failure between the scripts, we used Task 2 and conducted a Fisher's Exact Test, which is well suited to dichotomous (success/failure) data with small sample sizes. We also computed descriptive statistics (mean, standard deviation and mode). Responses to the Likert-scale satisfaction survey question were reviewed to find the percentage of participants who did or did not rate the script highly, as well as the mode and median responses.

Alexa Reminder script findings

Descriptive and inferential statistical results

The 2-sample t-test

For the efficiency comparison of time on task, task 1 was selected as being the simplest task and best for comparisons of time. We conducted a two-sample independent t-test to compare efficiency (time) between the 2 scripts when completing task 1.

We found a significant difference (t(6) = 3.23, p = 0.02; two-tailed critical value t(6) = 2.45 at α = .05) such that participants who used Script B (M = 82.50, SD = 19.57, n = 4) completed the task faster than participants who used Script A (M = 146.00, SD = 34.15, n = 4). Based on this analysis, we had reason to reject the null hypothesis that there was no difference in task 1 times between Script A and Script B. A simple descriptive comparison of mean task times also showed a difference (see graph below).

Time-on-task mean comparison of Scripts A and B
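The t value and p-value can be checked from the summary statistics alone. The sketch below is a minimal, standard-library Python version of Student's pooled-variance two-sample t-test (with equal group sizes it gives the same t as Welch's test). The function names are our own, and the two-tailed p-value is approximated by numerically integrating the t density with Simpson's rule rather than taken from a statistics library. For reference, 2.45 and 2.31 are the two-tailed critical values of t at α = .05 for 6 and 8 degrees of freedom.

```python
import math


def student_t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(|T| >= |t|) as 1 - 2 * (integral of the density from 0 to |t|).

    Uses the composite Simpson's rule; `steps` must be even.
    """
    b = abs(t)
    h = b / steps
    total = student_t_pdf(0.0, df) + student_t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


def pooled_t_test(m1, s1, n1, m2, s2, n2):
    """Student's two-sample t-test from means, SDs and group sizes (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    return t, df, two_tailed_p(t, df)


if __name__ == "__main__":
    # Low-fidelity task 1: Script A vs Script B, summary statistics from the text
    t, df, p = pooled_t_test(146.00, 34.15, 4, 82.50, 19.57, 4)
    print(f"t({df}) = {t:.2f}, p = {p:.3f}")  # t(6) = 3.23, p = 0.018
```

Passing the Seniors vs Students summary statistics for Script B gives t ≈ 1.10 (task 1) and t ≈ 0.70 (task 2) with df = 8, consistent with the reported p-values of about 0.30 and 0.50.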

Qualitative results of Alexa Reminder scripts.

Satisfaction Survey

We observed a strong social desirability response bias (or acquiescence bias), where participants' ratings did not align with their success or failure or with their time on task.

Script A Satisfaction results

There were 4 respondents to the satisfaction survey question. Of these, 3 of 4 (75%) Somewhat Agreed to Strongly Agreed that the Alexa medication reminder was easy to use, and 1 participant Neither Agreed nor Disagreed. The Mode was Strongly Agree and the Median was Agree.

Script B Satisfaction results

There were 4 respondents to the satisfaction survey question. Of these, 2 of 4 (50%) Somewhat Agreed that the Alexa medication reminder was easy to use, and the other 2 of 4 Strongly Agreed. Overall, all 4 participants (100%) Somewhat Agreed or Strongly Agreed that the reminder was easy to use. The Mode was bimodal, evenly divided between Strongly Agree and Somewhat Agree, and the Median was Agree.

Neither script was wholly successful, but we moved forward to create a script that could guide users while accepting multiple medication reminder requests at once.
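The percentage, mode and median figures in these satisfaction results can be reproduced with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: it assumes the low-fidelity survey used the same 7-point labels as the high-fidelity survey (1 = Strongly Disagree to 7 = Strongly Agree), that "rated highly" means Somewhat Agree or above, and that the median of an even number of responses is the midpoint of the two middle values, which would turn two Somewhat Agree and two Strongly Agree ratings into Agree. `SCALE`, `label_for` and `summarize` are our own names.

```python
from statistics import median, multimode

# Assumed 7-point coding, 1 = Strongly Disagree ... 7 = Strongly Agree
SCALE = {
    1: "Strongly Disagree",
    2: "Disagree",
    3: "Somewhat Disagree",
    4: "Neither Agree nor Disagree",
    5: "Somewhat Agree",
    6: "Agree",
    7: "Strongly Agree",
}


def label_for(value: float) -> str:
    """Scale label for a median, which can fall halfway between two points."""
    if value == int(value):
        return SCALE[int(value)]
    return f"between {SCALE[int(value)]} and {SCALE[int(value) + 1]}"


def summarize(responses: list[int], favorable_min: int = 5) -> dict:
    """Share of favorable ratings (Somewhat Agree or higher), mode(s) and median."""
    favorable = sum(1 for r in responses if r >= favorable_min)
    return {
        "n": len(responses),
        "favorable_pct": 100 * favorable / len(responses),
        "modes": [SCALE[m] for m in multimode(responses)],
        "median": label_for(median(responses)),
    }


# Low-fidelity Script B: two Somewhat Agree (5) and two Strongly Agree (7)
print(summarize([5, 5, 7, 7]))
# {'n': 4, 'favorable_pct': 100.0, 'modes': ['Somewhat Agree', 'Strongly Agree'], 'median': 'Agree'}
```

`multimode` returns every most-frequent value, which is how a bimodal result such as Script B's shows up.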

Alexa Reminder Usability testing

Goals 3 and 4: Improve usability of a reminder skill.

Research method: High-fidelity prototype VUI skill comparison to original Alexa skill

Our goal was to use our script to create a new skill that seniors would find more usable and efficient. To determine whether the current Alexa Reminder script could be improved for seniors, especially those taking multiple medications, we created a final script using insights gained from the interviews and low-fidelity testing. This script was tested against the current Amazon Alexa Basic Reminder script.

Script A: Original Amazon Alexa reminders skill

Script B: The new multiple medication Alexa reminders skill

Participants

We recruited five participants through our friends, family and colleagues. To qualify, they had to meet the same requirements as the participants in the first two studies. The team also tested five students in the DePaul HCI program.

Data collection

After a software review, the script was created in Invocable (formerly Storyline). The platform could share a VUI script via a URL link for remote user testing, a feature that later became a problem, as discussed below. Because voice testing was still under development in Invocable, an Amazon Alexa Developer account was used to voice test the script (see image below). The Alexa mobile app was downloaded to team phones, which let every test, including those with remote student participants, run the Amazon Alexa developer version of the script in a common VUI environment. Senior and Student data were compiled separately.

Early iteration of the Medication Reminder script created in Invocable online software (now VoiceFlow). Note: This was our first use of variables for drug names. Video demonstration

Amazon Developer portal where script was imported from Invocable and first tested by team. Video demonstration

Participants were asked to read and sign a consent form. The moderator explained that the Alexa reminder requests would be made through a phone. While not ideal, this method allowed the script to be tested on any mobile phone (iOS or Android) through the Amazon Alexa app and let us reach many participants who might not have an Alexa device. The moderator held the phone while the participant invoked the skill and completed two tasks for each script. Script order was reversed between team members, counterbalancing this within-subjects design to avoid order bias. Participants were given task "cheat sheets" listing the medications and times they would need to request, a technique that had proven helpful in the low-fidelity testing.

Example of Scenario and Task for Alexa Reminder

Scenario
Imagine your doctor has prescribed you to take some medications. He has told you to take Aleve and Aspirin in the morning on Monday and Wednesday, and Tylenol before you go to bed on Saturday.
Task 2
How would you set up a reminder for taking all these medications? Remember, start Alexa by saying "Alexa, open Basic Option Reminder."

Data Analysis of the high-fidelity testing/Alexa reminder

Once high-fidelity testing was completed with a participant, each team member entered the qualitative and quantitative data into a shared spreadsheet. A two-sample t-test (inferential statistics) was used to compare efficiency between Seniors' and Students' times on Script B tasks 1 and 2.

We also computed descriptive statistics (mean, standard deviation and mode). Responses to the Likert-scale satisfaction survey question were reviewed to find the percentage of participants who did or did not rate the script highly, as well as the mode and median responses. The qualitative general-impression data were gathered on a collaborative online whiteboard, and entries were grouped into shared affinities and themes.

High-fidelity Alexa Reminder Skill findings

The Basic Reminder Script (A) replicated the original Amazon reminder skill as closely as we could. The Medication Reminder Script (B) was written based on what we learned from the interviews and the low-fidelity Wizard of Oz prototype, and on the features available in the Invocable software.

To increase statistical power, we conducted t-tests comparing efficiency (time) between two groups, Seniors and Students, on Medication Reminder Script (B) tasks 1 and 2. If the two groups' efficiency did not differ significantly, the larger combined data set could be used to strengthen observations about differences between the two high-fidelity Alexa scripts.

Compare Seniors and Students Script B Tasks 1 & 2

The 2-sample t-test Script B Task 1

We compared efficiency (time) between Seniors and Students on Medication Reminder Script (B) task 1 and found no significant difference in time on task (t(8) = 1.10, p = 0.30; two-tailed critical value t(8) = 2.31 at α = .05): Seniors (M = 99.20, SD = 25.97, n = 5) completed the task in a time not statistically different from Students (M = 124.60, SD = 44.42, n = 5).

The 2-sample t-test Script B Task 2

We then compared efficiency (time) between Seniors and Students on Medication Reminder Script (B) task 2 and again found no significant difference (t(8) = 0.70, p = 0.50; two-tailed critical value t(8) = 2.31 at α = .05): Seniors (M = 97.70, SD = 21.70, n = 5) completed the task in a time not statistically different from Students (M = 112.60, SD = 42.23, n = 5).

With no significant difference between the groups, we could use the Student data to enlarge our data set and strengthen our comparison of the two scripts.

The Fisher's Exact Test of Alexa Reminder results

Fisher's Exact Test comparing Success and Failure between scripts "Basic Reminder A" and "Medication Reminder B" (task 1), using both Senior and Student data.

Because Seniors and Students showed no significant difference in time on task for Script B tasks 1 and 2, their success and failure data were combined and analyzed with Fisher's Exact Test, which is well suited to dichotomous (success/failure) data with small sample sizes.

Null Hypothesis: No difference in success between "Basic Reminder Script A" and "Medication Reminder Script B" (task 1).

Alternate Hypothesis: A difference in success between "Basic Reminder Script A" and "Medication Reminder Script B" (task 1).

Conclusion: The Fisher's Exact Test p-value is 0.0134, which is below the significance level of α = 0.05, so we have reason to reject the null hypothesis. There is a statistically significant difference in success between Scripts A and B, with Medication Reminder Script B showing a higher success rate.
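For readers who want to rerun the comparison, here is a minimal standard-library Python sketch of a two-sided Fisher's exact test on a 2×2 success/failure table. It follows the convention R's `fisher.test` uses: sum the hypergeometric probabilities of every table with the same row and column totals that is no more likely than the observed table. The function names are our own. The per-script success and failure counts are not listed in this case study, so the example uses R. A. Fisher's classic "lady tasting tea" table as a check rather than study data.

```python
from math import comb


def table_probability(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability of a 2x2 table whose top-left cell is `a`,
    given the first row total, first column total and grand total."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table) -> float:
    """Two-sided Fisher's exact test p-value for [[a, b], [c, d]],
    e.g. rows = scripts, columns = success / failure."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = table_probability(a, row1, col1, n)
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    total = 0.0
    for x in range(lowest, highest + 1):
        p = table_probability(x, row1, col1, n)
        # Include tables no more likely than the observed one
        # (small relative tolerance guards against floating-point error).
        if p <= p_observed * (1 + 1e-7):
            total += p
    return total


# "Lady tasting tea": 3 of 4 cups classified correctly in each group
print(fisher_exact_two_sided([[3, 1], [1, 3]]))  # 34/70 ≈ 0.4857
```

Substituting the Script A and Script B success and failure counts for task 1 as `[[successes_A, failures_A], [successes_B, failures_B]]` would reproduce the comparison reported above.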

Qualitative results of Alexa Reminder

Satisfaction Survey

Participants were given a Likert-scale satisfaction survey question after each script: "Overall, did you find the Alexa reminder easy to use?" They replied on a 7-point scale, with 1 being Strongly Disagree and 7 being Strongly Agree. We again observed a strong social desirability response bias, where participants' ratings did not align with their success or failure or with their time on task.

Basic Reminder Script A (task 1) Satisfaction results

There were 5 respondents to the Basic Reminder Script A (task 1) satisfaction survey question. Of these, 4 of 5 (80%) Somewhat Agreed to Strongly Agreed that the Basic reminder was easy to use, and 1 of 5 Disagreed. The Mode was Somewhat Agree and the Median was Somewhat Agree. One participant said of the task, "This needs improvement, but the initiative to help seniors is really great," and Strongly Agreed that the reminder was easy to use while actually failing to complete the task.

Medication Reminder Script B (task 1) Satisfaction results

There were 5 respondents to the Medication Reminder Script B (task 1) satisfaction survey question. All 5 (100%) Strongly Agreed that the Medication Reminder script was easy to use. The Mode and the Median were both Strongly Agree.

These findings show a preference for Script B, even allowing for the strong social desirability response bias toward the moderators.

Qualitative general impression results

After completing each script, participants were asked for their general impressions of it. The team assembled the responses on the online whiteboard tool 'Stormboard', where we formed clusters of similar responses (Figure 7). The following themes emerged for both scripts:

Medication Reminder Script B was much better received, though not without some problems. Student FB said that the script "works far better," and student NQ noted, "I knew exactly what Alexa was asking me and I enjoyed the feedback response after." Problems centered on script length: student NF said that the script was smoother but that the "verbiage could be changed when asking for medication," and NQ wished that "Alexa would not talk so long because I knew what the format was going to be like after the first reminder." Two students wanted Alexa to confirm all of the medications that had been set at the end of the session.

Alexa Reminder Retrospective

This project stayed focused on the initial goals of the research. Please review the following presentation for a brief overview.

Room for growth

Looking back at the process and results of our study, there are a few things the team might have done differently. As mentioned in our goals and discussion sections, the team ended up evaluating different metrics than originally planned and had to firm up the definitions of success and failure with each iteration. In future studies, it is critical to set clear parameters from the start.

Also, in this study we were limited by the high-fidelity prototyping tool. The software we had planned to use was acquired by another company, and its functionality changed. The sharing/testing method was eliminated, so if we were to do this study again, we would use a tool that allows easier sharing and can be tailored to our needs. The script was also somewhat limited by the available Amazon libraries.

Lastly, if we were to repeat the study, we would like to work with more users to get a larger data set, which would help confirm the script's functionality with the population we are designing for. We would also add a second layer of follow-up questions for participants whose qualitative responses conflicted with their quantitative results, to uncover their true feelings.

Seniors use Amazon Alexa to get a better multiple medication reminder
