Saturday, June 7, 2025

Supercharge your development with Claude Code and Amazon Bedrock prompt caching | Amazon Web Services TechTricks365


Prompt caching in Amazon Bedrock is now generally available, delivering performance and cost benefits for agentic AI applications. Coding assistants that process large codebases represent an ideal use case for prompt caching.

In this post, we’ll explore how to combine Amazon Bedrock prompt caching with Claude Code—a coding agent released by Anthropic that is now generally available. This powerful combination transforms your development workflow by reducing inference latency for faster responses while also lowering input token costs. You’ll discover how this makes AI-assisted coding not just more efficient, but also more economically viable for everyday development tasks.

What is Claude Code?

Claude Code is Anthropic’s AI coding assistant powered by Claude Sonnet 4. It operates directly in your terminal, in your favorite IDEs such as VS Code and JetBrains, and in the background with the Claude Code SDK, understanding your project context and taking actions without requiring you to manually copy generated code into a project. Unlike traditional coding assistants, Claude Code can:

  • Write code and fix bugs spanning multiple files across your codebase
  • Answer questions about your code’s architecture and logic
  • Execute and fix tests, linting, and other commands
  • Search through git history, resolve merge conflicts, and create commits and PRs
  • Operate all of your other command line tools, like AWS CLI, Terraform, and k8s

The most compelling aspect of Claude Code is how it integrates into your existing workflow. You simply point it to your project directory and interact with it using natural language commands. Claude Code also supports Model Context Protocol (MCP), allowing you to connect external tools and data sources directly to your terminal and customize its AI capabilities with your context.

To learn more, see Claude Code tutorials and Claude Code: Best practices for agentic coding.

Amazon Bedrock prompt caching for AI-assisted development

The prompt caching feature of Amazon Bedrock dramatically reduces both response times and costs when working with large context. Here’s how it works: When prompt caching is enabled, your agentic AI application (such as Claude Code) inserts cache checkpoint markers at specific points in your prompts. Amazon Bedrock then interprets these application-defined markers and creates cache checkpoints that save the entire model state after processing the preceding text. On subsequent requests, if your prompt reuses that same prefix, the model loads the cached state instead of recomputing.

In the context of Claude Code specifically, this means the application intelligently manages these cache points when processing your codebase, allowing Claude to “remember” previously analyzed code without incurring the full computational and financial cost of reprocessing it. When you ask multiple questions about the same code or iteratively refine solutions, Claude Code leverages these cache checkpoints to deliver faster responses while dramatically reducing token consumption and associated costs.
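To make the mechanics concrete, here is a minimal sketch of where an application could place a cache checkpoint in a Bedrock Converse API request. The codebase snippet and model ID are illustrative; in a real application the assembled request would be passed to boto3’s bedrock-runtime converse call.

```python
# Sketch: where an agentic application could place a cache checkpoint in a
# Bedrock Converse API request body. The codebase text below is a stand-in
# for what would be thousands of tokens of project context. The finished
# request would be sent via boto3.client("bedrock-runtime").converse(**request).

large_codebase_context = "def add(a, b):\n    return a + b\n"  # illustrative

request = {
    "modelId": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "system": [
        {"text": "You are a coding assistant. Project source follows:"},
        {"text": large_codebase_context},
        # Everything before this marker is cached after the first request;
        # later requests that reuse the same prefix load the cached state
        # instead of recomputing it.
        {"cachePoint": {"type": "default"}},
    ],
    "messages": [
        {"role": "user", "content": [{"text": "Explain the add function."}]}
    ],
}

print(request["system"][-1])  # the application-defined checkpoint marker
```

Because the cached prefix must match exactly on subsequent requests, stable context (system instructions, codebase contents) goes before the marker and the changing user question goes after it.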

To learn more, see documentation for Amazon Bedrock prompt caching.

Solution overview: Try Claude Code with Amazon Bedrock prompt caching

Prerequisites

Prompt caching is automatically turned on for supported models and AWS Regions.

Setting up Claude Code with Claude Sonnet 4 on Amazon Bedrock

After configuring AWS CLI with your credentials, follow these steps:

  1. In your terminal, execute the following commands:
    # Install Claude Code
    npm install -g @anthropic-ai/claude-code
    
    # Configure for Amazon Bedrock
    export CLAUDE_CODE_USE_BEDROCK=1
    export ANTHROPIC_MODEL='us.anthropic.claude-sonnet-4-20250514-v1:0'
    export ANTHROPIC_SMALL_FAST_MODEL='us.anthropic.claude-3-5-haiku-20241022-v1:0'
    
    # Launch Claude Code
    claude
  2. Verify that Claude Code is running by checking for the Welcome to Claude Code! message in your terminal.
    Terminal - Welcome to Claude Code

To learn more about how to configure Claude Code for Amazon Bedrock, see Connect to Amazon Bedrock.

Getting started with prompt caching

To get started, let’s experiment with a simple prompt.

  1. In Claude Code, execute the prompt:
    build a basic text-based calculator
  2. Review and respond to Claude Code’s requests:
    1. When prompted with questions like Do you want to create calculator.py? select 1. Yes to continue.
      Example question:
      Do you want to create calculator.py?
      
      1. Yes
      2. Yes, and don't ask again for this session (shift+tab)
      3. No, and tell Claude what to do differently (esc)
    2. Carefully review each request before approving to maintain security.
  3. After Claude Code generates the calculator application, it will display execution instructions such as:
    Run the calculator with: python3 calculator.py
  4. Test the application by executing the instructed command above. Then, follow the on-screen prompts to perform calculations.

Claude Code automatically enables prompt caching to optimize performance and costs. To monitor token usage and costs, use the /cost command. You will receive a detailed breakdown similar to this:

/cost 
  ⎿  Total cost:            $0.0827
  ⎿  Total duration (API):  26.3s
  ⎿  Total duration (wall): 42.3s
  ⎿  Total code changes:    62 lines added, 0 lines removed

This output provides valuable insights into your session’s resource consumption, including total cost, API processing time, wall clock time, and code modifications.

Comparing costs without prompt caching

To understand the benefits of prompt caching, let’s try the same prompt without prompt caching for comparison:

  1. In the terminal, exit Claude Code by pressing Ctrl+C.
  2. To create a new project directory, run the command:
    mkdir test-disable-prompt-caching && cd test-disable-prompt-caching
  3. Disable prompt caching by setting an environment variable:
    export DISABLE_PROMPT_CACHING=1
  4. Execute claude to run Claude Code.
  5. Verify prompt caching is disabled by checking the terminal output. You should see Prompt caching: off under the Overrides (via env) section.
  6. Execute the prompt:
    build a basic text-based calculator
  7. After completion, execute /cost to view resource usage.

You will see a higher resource consumption compared to when prompt caching is enabled, even with a simple prompt:

/cost 
  ⎿  Total cost:            $0.1029
  ⎿  Total duration (API):  32s
  ⎿  Total duration (wall): 1m 17.5s
  ⎿  Total code changes:    57 lines added, 0 lines removed

Without prompt caching, each interaction incurs the full cost of processing your context.

Cleanup

To re-enable prompt caching, exit Claude Code and run unset DISABLE_PROMPT_CACHING before restarting Claude. Claude Code does not incur cost when you are not using it.

Prompt caching for complex codebases and efficient iteration

When working with complex codebases, prompt caching delivers significantly greater benefits than with simple prompts. For an illustrative example, consider the initial prompt: Develop a game similar to Pac-Man. This initial prompt generates the foundational project structure and files. As you refine the application with prompts such as Implement unique chase patterns for different ghosts, the coding agent must comprehend your entire codebase to be able to make targeted changes.

Without prompt caching, each iteration forces the model to reprocess thousands of tokens representing your code structure, class relationships, and existing implementations.

Prompt caching alleviates this redundancy by preserving your complex context, transforming your software development workflow with:

  • Dramatically reduced token costs for repeated interactions with the same files
  • Faster response times as Claude Code doesn’t need to reprocess your entire codebase
  • Efficient development cycles as you iterate without incurring full costs each time
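To see why, consider a rough back-of-the-envelope cost model. The per-token prices below are placeholders, not official rates (on Amazon Bedrock, cached-token reads are billed at a steep discount relative to regular input tokens); the structure of the calculation is what matters.

```python
# Rough cost model for iterating on a large codebase with and without
# prompt caching. Prices are illustrative placeholders, not official rates.
INPUT_PRICE = 3.00 / 1_000_000        # $ per regular input token (assumed)
CACHE_READ_PRICE = 0.30 / 1_000_000   # $ per cached input token (assumed discount)

codebase_tokens = 50_000   # project context included in every turn
question_tokens = 200      # the new part of each prompt
turns = 10                 # iterations in the session

# Without caching, the full context is billed at the regular rate every turn.
without_cache = turns * (codebase_tokens + question_tokens) * INPUT_PRICE

# With caching, the first turn processes the full context; later turns
# read the cached codebase and only the new question is billed in full.
with_cache = (
    (codebase_tokens + question_tokens) * INPUT_PRICE
    + (turns - 1) * (codebase_tokens * CACHE_READ_PRICE
                     + question_tokens * INPUT_PRICE)
)

print(f"without caching: ${without_cache:.2f}")
print(f"with caching:    ${with_cache:.2f}")
```

Even with placeholder prices, the cached session costs a fraction of the uncached one, and the gap widens as the codebase grows or the number of iterations increases.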

Prompt caching with Model Context Protocol (MCP)

Model Context Protocol (MCP) transforms your coding experience by connecting coding agents to your specific tools and information sources. You can connect Claude Code to MCP servers that integrate with your file systems, databases, development tools, and other productivity tools. This turns a generic coding assistant into a personalized assistant that can interact with data and tools beyond your codebase and follow your organization’s best practices, accelerating your unique development processes and workflows.

When you build on AWS, you gain additional advantages by leveraging AWS open source MCP servers for code assistants that provide intelligent AWS documentation search, best-practice recommendations, and real-time cost visibility, analysis, and insights, all without leaving your software development workflow.

Amazon Bedrock prompt caching becomes essential when working with MCP, as it preserves complex context across multiple interactions. With MCP continuously enriching your prompts with external knowledge and tools, prompt caching alleviates the need to repeatedly process this expanded context, slashing costs by up to 90% and reducing latency by up to 85%. This optimization proves particularly valuable as your MCP servers deliver increasingly sophisticated context about your unique development environment, so you can rapidly iterate through complex coding challenges while maintaining relevant context for up to 5 minutes without performance penalties or additional costs.

Considerations when deploying Claude Code to your organization

With Claude Code now generally available, many customers are considering deployment options on AWS to take advantage of its coding capabilities. For deployments, consider your foundational architecture for security and governance:

Consider leveraging AWS IAM Identity Center (formerly AWS Single Sign-On) to centrally govern identity and access to Claude Code. This helps verify that only authorized developers have access. It also lets developers access resources with temporary, role-based credentials, alleviating the need for static access keys and enhancing security. Before opening Claude Code, configure the AWS CLI to use an IAM Identity Center profile with aws configure sso --profile <profile-name>, then sign in using the profile you created with aws sso login --profile <profile-name>.

Consider implementing a generative AI gateway on AWS to track and attribute costs effectively across different teams or projects using inference profiles. For Claude Code to use a custom endpoint, configure the ANTHROPIC_BEDROCK_BASE_URL environment variable with the gateway endpoint. Note that the gateway should act as a pass-through proxy; see the example implementation with LiteLLM. To learn more about AI gateway solutions, contact your AWS account team.
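As a sketch, the environment configuration might look like the following, assuming a hypothetical internal gateway endpoint (the URL is a placeholder, not a real service):

```shell
# Route Claude Code's Bedrock traffic through a pass-through gateway.
# The URL below is a placeholder; substitute your organization's endpoint.
export CLAUDE_CODE_USE_BEDROCK=1
export ANTHROPIC_BEDROCK_BASE_URL='https://bedrock-gateway.example.internal'

# Then launch Claude Code as usual with: claude
```

With these variables set, Bedrock API calls flow through the gateway, where inference profiles can attribute cost per team or project.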

Consider automated configuration of default environment variables. This includes the environment variables outlined in this post, such as CLAUDE_CODE_USE_BEDROCK, ANTHROPIC_MODEL, and ANTHROPIC_SMALL_FAST_MODEL. This configures Claude Code to automatically connect to Amazon Bedrock, providing a consistent baseline for development across teams. Organizations can start by providing developers with self-service instructions.

Consider permissions, memory, and MCP servers for your organization. Security teams can configure managed permissions for what Claude Code is and is not allowed to do, which cannot be overwritten by local configuration. In addition, you can configure memory across all projects, which lets you auto-add common bash commands, workflows, and style conventions to align with your organization’s preferences. This can be done by deploying your CLAUDE.md file into an enterprise-managed directory or the user’s home directory at ~/.claude/CLAUDE.md. Finally, we recommend that one central team configures MCP servers and checks a .mcp.json configuration file into the codebase so that all users benefit.
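As an illustration, a project-scoped .mcp.json that a central team might check into the repository could look like the following. The server name and package shown are examples (here, the open source AWS Documentation MCP server run via uvx); substitute the servers your organization actually approves:

```json
{
  "mcpServers": {
    "aws-docs": {
      "command": "uvx",
      "args": ["awslabs.aws-documentation-mcp-server@latest"],
      "env": {
        "FASTMCP_LOG_LEVEL": "ERROR"
      }
    }
  }
}
```

Checking this file into version control gives every developer on the project the same vetted set of MCP servers on first launch.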

To learn more, see Claude Code team setup documentation or contact your AWS account team.

Conclusion

In this post, you learned how Amazon Bedrock prompt caching can significantly enhance AI applications, with Claude Code’s agentic AI assistant serving as a powerful demonstration. By leveraging prompt caching, you can process large codebases more efficiently, helping to dramatically reduce costs and response times. With this technology you can have faster, more natural interactions with your code, allowing you to iterate rapidly with generative AI. You also learned about Model Context Protocol (MCP), and how the seamless integration of external tools lets you customize your AI assistant with specific context like documentation and web resources. Whether you’re tackling complex debugging, refactoring legacy systems, or developing new features, the combination of Amazon Bedrock’s prompt caching and AI coding agents like Claude Code offers a more responsive, cost-effective, and intelligent approach to software development.

Amazon Bedrock prompt caching is generally available with Claude Sonnet 4 and Claude 3.5 Haiku. To learn more, see prompt caching and Amazon Bedrock.

Anthropic Claude Code is now generally available. To learn more, see Claude Code overview and contact your AWS account team for guidance on deployment.


About the Authors

Jonathan Evans is a Worldwide Solutions Architect for Generative AI at AWS, where he helps customers leverage cutting-edge AI technologies with Anthropic’s Claude models on Amazon Bedrock, to solve complex business challenges. With a background in AI/ML engineering and hands-on experience supporting machine learning workflows in the cloud, Jonathan is passionate about making advanced AI accessible and impactful for organizations of all sizes.

Daniel Wirjo is a Solutions Architect at AWS, focused on SaaS and AI startups. As a former startup CTO, he enjoys collaborating with founders and engineering leaders to drive growth and innovation on AWS. Outside of work, Daniel enjoys taking walks with a coffee in hand, appreciating nature, and learning new ideas.

Omar Elkharbotly is a Senior Cloud Support Engineer at AWS, specializing in Data, Machine Learning, and Generative AI solutions. With extensive experience in helping customers architect and optimize their cloud-based AI/ML/GenAI workloads, Omar works closely with AWS customers to solve complex technical challenges and implement best practices across the AWS AI/ML/GenAI service portfolio. He is passionate about helping organizations leverage the full potential of cloud computing to drive innovation in generative AI and machine learning.

Gideon Teo is a FSI Solution Architect at AWS in Melbourne, where he brings specialised expertise in Amazon SageMaker and Amazon Bedrock. With a deep passion for both traditional AI/ML methodologies and the emerging field of Generative AI, he helps financial institutions leverage cutting-edge technologies to solve complex business challenges. Outside of work, he cherishes quality time with friends and family, and continuously expands his knowledge across diverse technology domains.

