
WordFinder app: Harnessing generative AI on AWS for aphasia communication | Amazon Web Services TechTricks365



In this post, we showcase how Dr. Kori Ramajoo, Dr. Sonia Brownsett, Prof. David Copland, from QARC, and Scott Harding, a person living with aphasia, used AWS services to develop WordFinder, a mobile, cloud-based solution that helps individuals with aphasia increase their independence through the use of AWS generative AI technology.

In the spirit of giving back to the community and harnessing the art of the possible for positive change, AWS hosted the Hack For Purpose event in 2023. This hackathon brought together teams from AWS customers across Queensland, Australia, to tackle pressing challenges faced by social good organizations.

The mission of the University of Queensland’s Queensland Aphasia Research Centre (QARC) is to improve access to technology for people living with aphasia, a communication disability that can impact an individual’s ability to express and understand spoken and written language.

The challenge: Overcoming communication barriers

In 2023, it was estimated that more than 140,000 people in Australia were living with aphasia. This number is expected to grow to over 300,000 by 2050. Aphasia can make everyday tasks like online banking, using social media, and trying new devices challenging. The goal was to create a mobile app that could assist people with aphasia by generating a word list of the objects that are in a user-selected image and extend the list with related words, enabling them to explore alternative communication methods.

Overview of the solution

The following screenshot shows an example of navigating the WordFinder app, including sign in, image selection, object definition, and related words.

In the preceding screenshot, the following scenario unfolds: 

  1. Sign in: The first screen shows a simple sign-in page where users enter their email and password. It includes options to create an account or recover a forgotten password.
  2. Image selection: After signing in, users are prompted to Pick an image to search. This screen is initially blank.
  3. Photo access: The next screen shows a popup requesting permission to access the user’s photos, with a grid of sample images visible in the background.
  4. Image chosen: After an image is selected (in this case, a picture of a koala), the app displays the image along with some initial tags or classifications such as Animal, Bear, Mammal, Wildlife, and Koala.
  5. Related words: The final screen shows a list of related words based on the selection of Related Words next to Koala from the previous screen. This step is crucial for people with aphasia who often have difficulties with word-finding and verbal expression. By exploring related words (such as habitat terms like tree and eucalyptus, or descriptive words like fur and marsupial), users can bridge communication gaps when the exact word they want isn’t immediately accessible. This semantic network approach aligns with common aphasia therapy techniques, helping users find alternative ways to express their thoughts when specific words are difficult to recall.

This flow demonstrates how users can use the app to search for words and concepts by starting with an image, then drilling down into related terminology—a visual approach to expanding vocabulary or finding associated words.

The following diagram illustrates the solution architecture on AWS.

In the following sections, we discuss the flow and key components of the solution in more detail.

  1. Secure access using Route 53 and Amplify 
    1. The journey begins with the user accessing the WordFinder app through a domain managed by Amazon Route 53, a highly available and scalable cloud DNS web service. AWS Amplify hosts the React Native frontend, providing a consistent cross-platform experience. 
  2. Secure authentication with Amazon Cognito 
    1. Before accessing the core features, the user must securely authenticate through Amazon Cognito. Cognito provides robust user identity management and access control, making sure that only authenticated users can interact with the app’s services and resources. 
  3. Image capture and storage with Amplify and Amazon S3 
    1. After being authenticated, the user can capture an image of a scene, item, or scenario they wish to recall words from. AWS Amplify streamlines the process by automatically storing the captured image in an Amazon Simple Storage Service (Amazon S3) bucket, a highly available, cost-effective, and scalable object storage service. 
  4. Object recognition with Amazon Rekognition 
    1. As soon as the image is stored in the S3 bucket, Amazon Rekognition, a powerful computer vision and machine learning service, is triggered. Amazon Rekognition analyzes the image, identifying objects present and returning labels with confidence scores. These labels form the initial word prompt list within the WordFinder app, kickstarting the word-finding journey. 
  5. Semantic word associations with API Gateway and Lambda 
    1. While the initial word list generated by Amazon Rekognition provides a solid starting point, the user might be seeking a more specific or related word. To address this challenge, the WordFinder app sends the initial word list to an AWS Lambda function through Amazon API Gateway, a fully managed service that securely handles API requests. 
  6. Generative AI and prompt engineering using Amazon Bedrock
    1. The Lambda function, acting as an intermediary, crafts a carefully designed prompt and submits it to Amazon Bedrock, a fully managed service that offers access to high-performing foundation models (FMs) from leading AI companies, including Anthropic’s Claude model.
    2. Amazon Bedrock generative AI capabilities, powered by Anthropic’s Claude model, use advanced language understanding and generation to produce semantically related words and concepts based on the initial word list. This process is driven by prompt engineering, where carefully crafted prompts guide the generative AI model to provide relevant and contextually appropriate word associations.

WordFinder app component details

In this section, we take a closer look at the components of the WordFinder app.

React Native and Expo

WordFinder was built using React Native, a popular framework for building cross-platform mobile apps. To streamline the development process, the team used Expo, which provides write-once, run-anywhere capabilities across the Android and iOS operating systems.

Amplify

Amplify played a crucial role in accelerating the app’s development and provisioning the necessary backend infrastructure. Amplify is a set of tools and services that enable developers to build and deploy secure, scalable, and full stack apps. In this architecture, the frontend of the word finding app is hosted on Amplify. The solution uses several Amplify components:

  • Authentication and access control: Amazon Cognito is used for user authentication, enabling users to sign up and sign in to the app. Amazon Cognito provides user identity management and access control; access to the Amazon S3 bucket and the API gateway requires an authenticated user session.
  • Storage: Amplify was used to create and deploy an S3 bucket for storage. A key component of this app is the ability for a user to take a picture of a scene, item, or scenario that they’re seeking to recall words from. The solution needs to temporarily store this image for processing and analysis. When a user uploads an image, it’s stored in an S3 bucket for processing with Amazon Rekognition. Amazon S3 provides highly available, cost-effective, and scalable object storage.
  • Image recognition: Amazon Rekognition uses computer vision and machine learning to identify objects present in the image and return labels with confidence scores. These labels are used as the initial word prompt list within the WordFinder app.
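As a rough sketch of the image recognition step (the bucket name, key, and confidence threshold below are illustrative assumptions, not the project's actual settings), the initial word list could be produced with the AWS SDK for Python (Boto3) along these lines:

```python
def labels_to_words(response: dict, min_confidence: float = 80.0) -> list[str]:
    """Keep label names at or above the confidence threshold, highest confidence first."""
    labels = [l for l in response["Labels"] if l["Confidence"] >= min_confidence]
    labels.sort(key=lambda l: l["Confidence"], reverse=True)
    return [l["Name"] for l in labels]


def get_initial_words(bucket: str, image_key: str) -> list[str]:
    """Detect labels for an image already stored in S3 and return the word list."""
    import boto3  # imported here so labels_to_words stays usable without AWS access

    rekognition = boto3.client("rekognition")
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": image_key}},
        MaxLabels=10,
        MinConfidence=80.0,
    )
    return labels_to_words(response)
```

For a koala photo, `detect_labels` would return labels similar to the screenshot shown earlier (Animal, Bear, Mammal, Wildlife, Koala), each with a confidence score between 0 and 100.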

Related words

The generated initial word list is the first step toward finding the desired word, but the labels returned by Amazon Rekognition might not be the exact word that someone is looking for. The project team then considered how to implement a thesaurus-style lookup capability. Although the project team initially explored different programming libraries, they found this approach to be somewhat rigid and limited, often returning only synonyms and not entities that are related to the source word. The libraries also added overhead associated with packaging and maintaining the library and dataset moving forward.

To address these challenges and improve responses for related entities, the project team turned to the capabilities of generative AI. By using the generative AI foundation models (FMs), the project team was able to offload the ongoing overhead of managing this solution while increasing the flexibility and curation of related words and entities that are returned to users. The project team integrated this capability using the following services:

  • Amazon Bedrock: Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI apps with security, privacy, and responsible AI. The project team was able to quickly integrate with, test, and evaluate different FMs, finally settling upon Anthropic’s Claude model.
  • API Gateway: The project team extended the Amplify project and deployed API Gateway to accept secure, encrypted, and authenticated requests from the WordFinder mobile app and pass them to a Lambda function handling Amazon Bedrock access. 
  • Lambda: A Lambda function was deployed behind the API gateway to handle incoming web requests from the mobile app. This function was responsible for taking the supplied input, building the prompt, and submitting it to Amazon Bedrock. This meant that integration and prompt logic could be encapsulated in a single Lambda function.

Benefits of API Gateway and Lambda

The project team briefly considered using the AWS SDK for JavaScript v3 and credentials sourced from Amazon Cognito to directly interface with Amazon Bedrock. Although this would work, there were several benefits associated with implementing API Gateway and a Lambda function:

  • Security: To enable the mobile client to integrate directly with Amazon Bedrock, authenticated users and their associated AWS Identity and Access Management (IAM) role would need to be granted permissions to invoke the FMs in Amazon Bedrock. This could be achieved using Amazon Cognito and short-term permissions granted through roles. Consideration was given to the potential of uncontrolled access to these models if the mobile app was compromised. By shifting the IAM permissions and invocation handling to a central function, the team was able to increase visibility and control over how and when the FMs were invoked.
  • Change management: Over time, the underlying FM or prompt might need to change. If either was hard coded into the mobile app, any change would require a new release and every user would have to download the new app version. By locating this within the Lambda function, the specifics around model usage and prompt creation are decoupled and can be adapted without impacting users. 
  • Monitoring: By routing requests through API Gateway and Lambda, the team can log and track metrics associated with usage. This enables better decision-making and reporting on how the app is performing. 
  • Data optimization: By implementing the REST API and encapsulating the prompt and integration logic within the Lambda function, the team can send just the source word from the mobile app to the API. This means less data is sent over the cellular network to the backend services. 
  • Caching layer: Although a caching layer wasn’t implemented within the system during the hackathon, the team considered the ability to implement a caching mechanism for source and related words that over time would reduce requests that need to be routed to Amazon Bedrock. This can be readily queried in the Lambda function as a preliminary step before submitting a prompt to an FM.
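Such a cache check could be sketched as follows; the in-memory dictionary stands in for a persistent store such as DynamoDB, and the function names are illustrative rather than taken from the project:

```python
from typing import Callable

# Illustrative in-memory cache; a real deployment would use a persistent
# store (for example, DynamoDB) keyed on the normalized source word.
_related_words_cache: dict[str, list[str]] = {}


def related_words_with_cache(
    word: str, fetch_from_bedrock: Callable[[str], list[str]]
) -> list[str]:
    """Return cached related words when available; otherwise call the model once."""
    key = word.strip().lower()
    if key in _related_words_cache:
        return _related_words_cache[key]  # cache hit: no Bedrock request needed
    words = fetch_from_bedrock(key)
    _related_words_cache[key] = words
    return words
```

Because users with aphasia are likely to look up common words repeatedly, even a simple cache like this would cut the number of prompts routed to Amazon Bedrock over time.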

Prompt engineering

One of the core features of WordFinder is its ability to generate related words and concepts based on a user-provided source word. This source word (obtained from the mobile app through an API request) is embedded into the following prompt by the Lambda function, replacing {word}:

prompt = "I have Aphasia. Give me the top 10 most common words that are related words to the word supplied in the prompt context. Your response should be a valid JSON array of just the words. No surrounding context. {word}"

The team tested multiple different prompts and approaches during the hackathon, but this basic guiding prompt was found to give reliable, accurate, and repeatable results, regardless of the word supplied by the user.

After the model responds, the Lambda function bundles the related words and returns them to the mobile app. Upon receipt of this data, the WordFinder app updates and displays the new list of words for the user who has aphasia. The user might then find their word, or drill deeper into other related words.
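A minimal sketch of that Lambda function might look like the following. It assumes an API Gateway proxy integration and the Anthropic Messages request format on Amazon Bedrock; the model ID, event shape, and error handling are simplified assumptions for illustration:

```python
import json

# The guiding prompt from the article; {word} is replaced per request.
PROMPT_TEMPLATE = (
    "I have Aphasia. Give me the top 10 most common words that are related "
    "words to the word supplied in the prompt context. Your response should "
    "be a valid JSON array of just the words. No surrounding context. {word}"
)


def build_request_body(word: str) -> dict:
    """Build an Anthropic Messages API request body for Amazon Bedrock."""
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 200,
        "messages": [
            {"role": "user", "content": PROMPT_TEMPLATE.format(word=word)}
        ],
    }


def handler(event, context):
    # Assumes the mobile app sends the source word as a query string parameter.
    word = event["queryStringParameters"]["word"]

    import boto3  # imported here so build_request_body stays testable without AWS

    bedrock = boto3.client("bedrock-runtime")
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
        body=json.dumps(build_request_body(word)),
    )
    payload = json.loads(response["body"].read())
    # The prompt instructs the model to reply with a bare JSON array of words.
    related = json.loads(payload["content"][0]["text"])
    return {"statusCode": 200, "body": json.dumps({"words": related})}
```

Because the prompt demands a bare JSON array, the function can parse the model output directly; a production version would validate that the response actually is valid JSON before returning it.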

To maintain efficient resource utilization and cost optimization, the architecture incorporates several resource cleanup mechanisms:

  • Lambda scaling to zero: The Lambda function responsible for interacting with Amazon Bedrock automatically scales down to zero when not in use, minimizing idle resource consumption.
  • Amazon S3 lifecycle policies: The S3 bucket storing the user-uploaded images is configured with lifecycle policies to automatically expire and delete objects after a specified retention period, freeing up storage space. 
  • API Gateway throttling and caching: API Gateway is configured with throttling limits to help prevent excessive requests, and caching mechanisms are implemented to reduce the load on downstream services such as Lambda and Amazon Bedrock.
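As an example of the lifecycle cleanup, a rule along these lines would expire uploaded images after one day; the key prefix and retention period are assumptions, not the project's actual settings:

```python
# Assumed lifecycle rule: delete user-uploaded images one day after upload.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "expire-user-uploads",
            "Status": "Enabled",
            "Filter": {"Prefix": "uploads/"},  # assumed key prefix for images
            "Expiration": {"Days": 1},
        }
    ]
}


def apply_lifecycle(bucket_name: str) -> None:
    """Attach the expiration rule to the image bucket."""
    import boto3  # imported here so the rule definition loads without AWS access

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=lifecycle_configuration,
    )
```

The same rule could equally be declared through Amplify or CloudFormation; the point is that images are needed only transiently for Rekognition analysis, so a short retention window keeps storage costs near zero.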

Conclusion

The QARC team and Scott Harding worked closely with AWS to develop WordFinder, a mobile app that addresses communication challenges faced by individuals living with aphasia. Their winning entry at the 2023 AWS Queensland Hackathon showcased the power of involving those with lived experiences in the development process. Harding’s insights helped the tech team understand the nuances and impact of aphasia, leading to a solution that empowers users to find their words and stay connected.

About the Authors

Kori Ramajoo is a research speech pathologist at QARC. She has extensive experience in aphasia rehabilitation, technology, and neuroscience. Kori leads the Aphasia Tech Hub at QARC, enabling people with aphasia to access technology. She provides consultations to clinicians and provides advice and support to help people with aphasia gain and maintain independence. Kori is also researching design considerations for technology development and use by people with aphasia.

Scott Harding lives with aphasia after a stroke. He has a background in Engineering and Computer Science. Scott is one of the Directors of the Australian Aphasia Association and is a consumer representative and advisor on various state government health committees and nationally funded research projects. He has interests in the use of AI in developing predictive models of aphasia recovery.

Sonia Brownsett is a speech pathologist with extensive experience in neuroscience and technology. She has been a postdoctoral researcher at QARC and led the aphasia tech hub as well as a research program on the brain mechanisms underpinning aphasia recovery after stroke and in other populations including adults with brain tumours and epilepsy.

David Copland is a speech pathologist and Director of QARC. He has worked for over 20 years in the field of aphasia rehabilitation. His work seeks to develop new ways to understand, assess and treat aphasia including the use of brain imaging and technology. He has led the creation of comprehensive aphasia treatment programs that are being implemented into health services.

Mark Promnitz is a Senior Solutions Architect at Amazon Web Services, based in Australia. In addition to helping his enterprise customers leverage the capabilities of AWS, he can often be found talking about Software as a Service (SaaS), data and cloud-native architectures on AWS.

Kurt Sterzl is a Senior Solutions Architect at Amazon Web Services, based in Australia. He enjoys working with public sector customers like UQ QARC to support their research breakthroughs.

