Conf42 MLOps 2025 - Online

- premiere 5PM GMT

From PoC to Production: Shipping Enterprise AI Agents with Amazon Bedrock AgentCore


Abstract

Many AI agent projects stall at proof-of-concept because production demands security, scalability, and integration. Amazon Bedrock AgentCore changes this by providing managed services for identity, gateway-based tool access, runtime isolation, memory, and observability. In this talk, you’ll learn how to operationalize agents beyond chatbots: securely connecting to enterprise systems, handling long-running tasks, and migrating PoCs into production-ready deployments. We’ll explore patterns for authentication, memory, and monitoring—plus a blueprint you can apply to ship your next enterprise AI agent with confidence.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Welcome to my session. My name is Samuel Baruffi and I am a Principal Solutions Architect with AWS. I'm very excited to talk to you today about AgentCore. AgentCore is a new AWS offering, a set of functionality and primitives that allows you to very quickly build powerful, production-grade agents. The name of my talk today is From PoC to Production: Shipping Enterprise AI Agents with Amazon Bedrock AgentCore. Without further ado, let's get started. When we talk about agentic AI, things have moved very quickly in the last couple of years. When ChatGPT and the smaller, earlier large language models were released back in 2022, one of the very common things being released was generative AI assistants: chatbots that could maybe respond to you, but do very little. Moving on from that, we have created in the industry a lot of generative AI agents that can achieve a singular goal. Maybe there is one tool or a few tools available for the agent, and you can automate specific workflows. What we are talking about now, and the future of agentic AI, is agentic AI systems that can operate fully autonomously and hold conversations across multiple agents. Rather than just automating a specific workflow, they can mimic whole human logic and reasoning. Of course, there is still a human in the loop for security verification, hallucination checks, and so forth. The whole idea is to increase productivity and business value for our customers. A very interesting stat from 2024: 33% of enterprise software applications will include agentic AI by 2028, up from less than 1% in 2024. So you can see that in four years one third of applications will include agents, and we can already see this trend evolving very quickly in 2025. Both of these pieces of research are from Gartner. Another prediction was that 15% of day-to-day work decisions will be made autonomously through agentic AI by 2028.
There are pros and cons to that, and we're gonna talk about some of the capabilities that can help you with it. Here are some agentic AI use cases across different industries. Each industry will be able to apply agents and agentic AI systems at different levels of utilization and productivity. Let's say you are part of healthcare: you can automate a lot of the patient and member engagement that before always required a lot of humans in the backend. Now you can automate those tasks and make those humans way more productive with agents. What is the AWS vision for agents? The vision is to provide the best place for you to build the most useful agents for your specific vertical, to solve the most important use cases: improving customer experience, potentially bringing down costs, and the other benefits that agentic AI brings to a system. The vision for AWS is for you to be able to deploy those agents while empowering different organizations within your company to deploy reliable, secure, and scalable agents on AWS. So that's the vision. How do we do that? First we focus on the fundamentals. There are four very important pillars. First, we wanna use state-of-the-art science: Amazon has a big science team and we put a lot of science work into building Bedrock AgentCore, and you're gonna see that a lot of the features built into AgentCore are very commonly requested features when companies are building agents in production. Secondly, in very typical AWS fashion, we wanna provide best-in-class infrastructure for you to build and run those agents, and we wanna manage that for you and make it easy to manage and invoke agents. Third, we wanna be the best place for you to deploy specialized agents. When I talk about specialized agents, I'm talking about your magic sauce: what makes your business unique?
How can you move some of that specialized knowledge into a specialized agent and boost your productivity? And fourth, provide an intuitive experience so anyone can build and use powerful agents. Those are the fundamentals AgentCore was set up to help with. Now, before we talk about AgentCore, it's important to take a step back and look at the broader set of services that AWS offers for customers to build and deploy agents. We have applications such as Amazon Q Developer and Kiro, which help developers write code and have agentic capabilities, so you can use generative AI agents as a co-programmer and accelerate developer productivity. There is also a focus on business: you have Amazon Q Business, and multiple features with generative AI capabilities within AWS services. Then you have tools for building AI agents such as Bedrock Marketplace, Bedrock Agents, Strands Agents, the Nova Act SDK, and many others. We're gonna talk about some of those as we move forward. One important thing to highlight: Bedrock Agents has existed for almost two years now, but Bedrock Agents is actually different from Bedrock AgentCore. Bedrock Agents is a more opinionated way for you to deploy agents, while AgentCore is a new set of primitives that brings a lot of flexibility and new functionality for you to build on top of. That's one important aspect you should be aware of when you look at the AI stack. We are not gonna talk about it in depth, but we have our own chipsets, we have NVIDIA chipsets, we have AMD, and we have a service called SageMaker AI. So if you wanna go and build your own models, you can absolutely do that. But you also have a lot of large language model fundamentals available on demand with foundation models on Bedrock. So you can use Anthropic Claude models.
You can use Nova models, you can use Meta models, and many other models available in Bedrock. For the rest of the presentation we are mostly gonna focus on AgentCore, which is a set of primitives that allows you to build production-grade agents. So what is AgentCore and how does it actually work? Over the last year we kept hearing from customers that they were building agent PoCs, but the real challenge was moving from a PoC running locally on a specific developer's laptop to making it widely available within the company's ecosystem, and production-ready. The challenges on the way to production are: how do you manage performance? How do you manage scalability? How do you manage security? And how do you have proper governance? If you have a way to build a solution that covers all four of those pillars, you are gonna get way closer to production AI agents. So let's go and talk about the different primitives that AgentCore makes available. The first primitive is called AgentCore Runtime. You can think of AgentCore Runtime as the compute where your agents run. You can scale from zero agent invocations to as many as you want, and it gets invoked with very low latency. When you create an agent on AgentCore Runtime, you have a way to invoke it through APIs. A single agent invocation can last for up to eight hours, and you pay only for what you consume within those session invocations. It supports different payloads across different modalities, so if you have an agent that requires modalities like images, you can provide those too. The idea is a serverless agent compute environment for you to run your agents on.
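The runtime pattern described above can be sketched with a small local stand-in: an app object registers one entrypoint function, and each invocation dispatches a payload to it. The class and method names here are invented for illustration; the real bedrock-agentcore SDK exposes a similar decorator-based shape, but this toy is not the official API.

```python
# Toy stand-in for the AgentCore Runtime invocation pattern (illustrative
# names only, not the official SDK).

class ToyAgentRuntime:
    """Registers a single entrypoint and dispatches payloads to it."""

    def __init__(self):
        self._entrypoint = None

    def entrypoint(self, fn):
        # Decorator: remember the handler that serves invocations.
        self._entrypoint = fn
        return fn

    def invoke(self, payload):
        # The managed service fronts this with an HTTPS API and true
        # session isolation; locally we just call the handler.
        if self._entrypoint is None:
            raise RuntimeError("no entrypoint registered")
        return self._entrypoint(payload)


app = ToyAgentRuntime()

@app.entrypoint
def my_agent(payload):
    # Your agent logic goes here -- any framework, any model.
    prompt = payload.get("prompt", "")
    return {"result": f"agent saw: {prompt}"}

print(app.invoke({"prompt": "hello"})["result"])  # agent saw: hello
```

The useful part of the pattern is that the handler knows nothing about HTTP, scaling, or sessions; the runtime owns all of that, which is what lets the same function scale from zero to many concurrent isolated sessions.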
One thing worth mentioning about AgentCore as a whole is the ability to pick and choose different primitives. One primitive is AgentCore Runtime, and you can adopt only the primitives you want. You are not buying into the whole ecosystem if you don't want to; you can bring the bits and pieces you care about to build your agents. With AgentCore Runtime, you can deploy any agent using pretty much any framework, either familiar open-source frameworks or your own framework. With very few lines of code you can get a runtime up and running, and then you can invoke your agent running in the cloud. You also have security by default: there is true session isolation to protect your customer data, and it integrates very easily with identity providers. We're gonna talk about AgentCore Identity, but by integrating there you can have proper authorization and authentication using OAuth 2.0 for invoking your AgentCore Runtime when you need to. How does it work behind the scenes? You build your agent. Your agent can use any framework: Strands, LangGraph, LangChain, CrewAI, or your own proprietary framework. You can use any models: Bedrock models, Gemini models, OpenAI models, it doesn't really matter. AgentCore Runtime allows you to pick whatever model you wanna use. Then you bring in the AgentCore Runtime decorator, you provide some observability configuration if you wanna use AgentCore Observability, and if you want identity, you bring that too. Behind the scenes, you create a Dockerfile, the image gets stored into a registry (an Amazon ECR repository), and then you create an agent within AgentCore Runtime. Every time you have a user, the user will invoke the endpoint where your agent is deployed, and behind the scenes we will create a compute environment for you with true session isolation. Okay. One important aspect to highlight: AgentCore Runtime allows you to host agents like Strands or LangGraph agents, which use the HTTP protocol, but AgentCore Runtime also has a different capability, which is the ability to host a remote MCP server for you automatically. So exactly like before, you'd create a Dockerfile, but rather than using an agent framework, you'd be using maybe FastMCP, and you deploy that on AgentCore Runtime. Behind the scenes, AgentCore Runtime supports speaking the MCP protocol. I don't have slides for it here, but that is one of the capabilities as well. Another important aspect of building agents is identity. Once I have an agent, there are a couple of things I wanna do with it. How can I make sure agents are used and invoked properly? Let's say I have an agent, and I wanna make sure that only specific groups within my system are able to invoke that agent. Or let's say my application is built using third-party tools such as Google, Salesforce, or Slack, and I wanna give my agent access to those specific systems. AgentCore Identity does multiple things, and I'll show you in the next slide, but the whole idea of AgentCore Identity is to minimize the amount of work it takes to secure both inbound authentication, when you are calling an agent, and outbound authentication, when the agent itself needs to call third-party tools like Google in order to grab some data from maybe your Google Drive into the agent run. It's about streamlining your authentication flows. It allows you to accelerate AI agent development, because there is a set of prebuilt primitives you can use from AgentCore Identity.
You can use your existing identity systems such as Okta, Azure AD, Amazon Cognito, and any other provider supporting the OAuth 2.0 flow. The end goal is to lower custom development effort without needing to migrate users to a specific credential or identity system. So how does it actually work? You have a user, and that user needs to authenticate to a system, using either IAM or OAuth. Once you have the permission, you provide it to your agent. The agent will verify that your permission is valid and that you have access to invoke it. Remember, because these pieces of AgentCore can be consumed independently, the agent itself can be hosted on AgentCore Runtime or elsewhere, and you can still use AgentCore Identity. That's what we call inbound authentication. Outbound authentication is for when your agent is running and needs permissions to call a resource. Let's say your agent is calling a database: AgentCore Identity lets you configure that your agent should go and talk to Identity to retrieve the IAM permission needed to call AWS resources. And there are three different types of outbound authentication. Another type is an external resource, either through an API key or through a full two-legged or three-legged OAuth flow; say you wanna go and talk to Salesforce, that would be an external resource. And I'm gonna pause here, because in a moment I'll explain what AgentCore Gateway is; the third way is to integrate outbound authentication with your AgentCore Gateway. So AgentCore Identity allows you to seamlessly create solutions for your agents' inbound and outbound authentication in production.
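The inbound/outbound split described above can be sketched in a few lines. In a real deployment the inbound check would validate an OAuth 2.0 token against an IdP (Okta, Azure AD, Cognito) and the outbound side would be a managed credential vault; here two in-memory dictionaries, made-up token strings, and an example IAM role ARN stand in for all of that.

```python
# Toy sketch of inbound/outbound identity. All tokens, groups, and ARNs
# below are invented for illustration.

# Inbound: tokens the identity provider has issued, with allowed groups.
ISSUED_TOKENS = {
    "tok-alice": {"sub": "alice", "groups": ["agents:invoke"]},
    "tok-bob": {"sub": "bob", "groups": ["read-only"]},
}

# Outbound: per-resource credentials the agent may retrieve after inbound auth.
CREDENTIAL_VAULT = {
    "salesforce": {"type": "oauth", "access_token": "sf-123"},
    "internal-db": {"type": "iam", "role": "arn:aws:iam::111122223333:role/demo"},
}

def authorize_inbound(bearer_token, required_group="agents:invoke"):
    """Return the caller's claims if the token is valid and in the right group."""
    claims = ISSUED_TOKENS.get(bearer_token)
    if claims is None or required_group not in claims["groups"]:
        raise PermissionError("inbound auth failed")
    return claims

def fetch_outbound_credential(bearer_token, resource):
    """After inbound auth succeeds, release a scoped outbound credential."""
    authorize_inbound(bearer_token)  # re-check before releasing credentials
    return CREDENTIAL_VAULT[resource]

print(authorize_inbound("tok-alice")["sub"])  # alice
print(fetch_outbound_credential("tok-alice", "salesforce")["type"])  # oauth
```

The point of the shape: the agent never stores third-party secrets itself; it exchanges its verified inbound identity for a short-lived outbound credential at call time.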
Now, we hear from a lot of customers that MCP has become very popular, but managing potentially hundreds of MCP servers can be very cumbersome. What AgentCore Gateway allows you to do is simplify tool development and integration. You can think of AgentCore Gateway as an MCP server managed for you, through which you can very easily expose different tools. You can turn existing APIs into tools: if you have an OpenAPI (Swagger) schema, you provide the endpoints through that schema to AgentCore Gateway as a target, and the gateway will automatically list them as available tools. You can also create Lambda functions as targets for AgentCore Gateway, and those Lambda functions will then behave as MCP tools. So you can access thousands of tools through a standardized interface, which is AgentCore Gateway itself. I'll show you how it works in a moment. By default, AgentCore Gateway operates as a remote MCP server, and you are required to provide authentication to it, so only callers with proper credentials can reach the MCP server, because it has an endpoint that is available through the internet (or, in the future, through your VPC). And one of the capabilities that is pretty cool: it allows you to do intelligent tool discovery, and I'll show you in a moment how that works. So you have an MCP client, which is your agent or your IDE. You create a gateway within AgentCore Gateway, and that gateway automatically behaves as an MCP server. The beauty is that behind the scenes you can create different targets: one gateway can have multiple targets, and one target can have multiple tools. One target type supported right now is the API endpoint: you provide your OpenAPI schema, and whatever endpoints it describes, AgentCore will automatically convert into tools. So as an agent or an MCP client, you can very easily consume your existing APIs as tools in your agent system. The other option is a Lambda target: you can write your own code in whatever programming language Lambda supports, if you prefer to build the tools yourself, so you can build more complex tools through Lambda functions. That is how AgentCore Gateway works at a high level. Now, one of the amazing capabilities of AgentCore Gateway is what we call semantic search, which is a very unique and cool feature. Because this is just the MCP protocol, if you do a list-tools request against your MCP server, the AgentCore Gateway, and you have hundreds of tools, in this case 360, it is gonna return all 360 tools. That is really not good practice, and I'm gonna explain why in a moment. AgentCore Gateway lets you instead do a semantic search: the gateway builds a kind of knowledge base of all the tools you have, and you can call an API on the gateway to say, hey, my agent now wants to create a social media post; go and find, across these 360 tools, the ones able to do that. In this case it returns four relevant tools, and you provide those to the large language model. What are the benefits? First, AgentCore Gateway automatically indexes the tools and gives you serverless semantic search: think of it as a small RAG system for your tools. But most importantly, the search lets you reduce the context you pass to every single agent's large language model, which is going to improve your accuracy, speed, and cost.
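The tool-discovery idea above can be sketched without any AWS dependency: score each tool's description against the task and hand only the top matches to the model. The managed gateway uses real semantic (embedding-based) search; plain word overlap stands in for that here, and the tool catalog is entirely made up.

```python
# Toy semantic tool search: return only the tools relevant to the task,
# instead of the full catalog. Word overlap stands in for embeddings.

TOOLS = {
    "create_social_post": "create and publish a social media post",
    "get_weather": "fetch the current weather forecast for a city",
    "send_email": "send an email message to a recipient",
    "schedule_meeting": "schedule a calendar meeting with attendees",
}

def search_tools(task, top_k=2):
    """Return up to top_k tool names whose descriptions best match the task."""
    task_words = set(task.lower().split())
    scored = []
    for name, desc in TOOLS.items():
        overlap = len(task_words & set(desc.lower().split()))
        scored.append((overlap, name))
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

# Only the relevant tools reach the model's context, not all of them.
print(search_tools("create a social media post"))
```

With a real catalog of hundreds of tools, this pre-filtering step is what keeps the tool list in the prompt small, which is exactly the accuracy/speed/cost benefit described above.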
Because the more tools you provide, the lower the accuracy, the slower the response, and the more costly it is, since you get charged by how many tokens you provide as input. It also improves the agent's focus, because the agent only sees the tools relevant to that specific task. Moving forward, another primitive of AgentCore is AgentCore Memory. Memory is a very important aspect of agents: it allows you to save interactions you have with an agent for future use. That might be raw history between user and agent, or potentially some preferences. On AgentCore Memory there is a differentiation between short-term memory and long-term memory, and we're gonna talk about that in a moment. AgentCore Memory is built for the enterprise by default. You have complete data privacy with dedicated storage for each customer; you can create specific namespaces that only specific customers can access; and you can bring your own encryption keys, so if you have multiple customers and you wanna encrypt their data differently, you can do that. You also have deep customization: if you bring what we call long-term memory, you can define memory patterns based on your use case, and you can customize the extraction rules, which run asynchronously. How does it work? The first concept in AgentCore Memory is short-term memory. You have your agent and you are having a conversation with it: user messages, responses from the large language model, and the specific state. You integrate with AgentCore Memory so that those messages are saved into the short-term memory space. You can store short-term memory for up to 365 days, and you can customize how long specific messages are stored.
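The short-term store described above can be sketched as an append-only event log with a retention window. The class name and method shapes are invented for illustration; the managed service handles the storage, namespacing, and retention for you.

```python
# Toy short-term memory: append raw conversation turns with a timestamp,
# and let retrieval honor a configurable retention window.

import time

class ShortTermMemory:
    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._events = []  # (timestamp, role, text)

    def add(self, role, text, now=None):
        ts = now if now is not None else time.time()
        self._events.append((ts, role, text))

    def recall(self, now=None):
        """Return events still inside the retention window, oldest first."""
        now = now if now is not None else time.time()
        cutoff = now - self.retention_seconds
        return [(role, text) for ts, role, text in self._events if ts >= cutoff]

mem = ShortTermMemory(retention_seconds=86400)   # keep one day of raw turns
mem.add("user", "I like the color blue", now=0)
mem.add("assistant", "Noted!", now=90000)        # roughly a day later
print(mem.recall(now=100000))  # only the still-retained turn survives
```

Long-term memory would sit on top of this: an asynchronous job reads the raw turns and extracts durable facts (for example, the color preference above) before the raw events expire.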
Once you have short-term memory, then of course when a new invocation of your agent spins up, you can call the AgentCore Memory API to retrieve maybe the last day, or all of the short-term memory, into your current conversation. That's one type of memory. The other type, which is very powerful, is what we call long-term memory. Raw messages are useful up to a point, but what you actually want in the long term is to extract specific long-term memories. When you have short-term memory, you can decide whether to also enable long-term memory, and behind the scenes AgentCore Memory will asynchronously and automatically extract the pieces of memory that should be saved long-term. You can specify semantic memory, where you define what type of memory you wanna extract for the long term. You can capture user preferences: let's say in my agent I always say I like the color blue; behind the scenes AgentCore Memory will extract that, and it will be saved as a preference. Or you can capture summaries: let's say every day, after the agent does some type of work, it summarizes the work it has done and saves that to long-term memory. The long-term memories can then be retrieved, and you can provide them as part of new invocations. Remember, large language models don't remember past interactions by default; you are responsible for providing that context, and AgentCore Memory allows you to do it. The next capability is AgentCore Browser. This is the ability to use automated, serverless browser infrastructure when your agent needs it. Let's say you wanna build an agent that can buy items from amazon.com.
If the agent needs to use a browser, there are different tools that support this capability of navigating a browser using a large language model; I'm gonna show you some of the models and open-source libraries that allow you to do that. But you always need compute for the browser itself. AgentCore Browser is a tool available in AgentCore that gives you a very low-latency browser. It can scale from zero to hundreds of concurrent sessions, where each session gets its own browser and does not share any data with the other sessions. You also get proper secure credential handling. You can use a live streaming URL for monitoring: if you wanna show the user what your agent is doing as it clicks buttons, you can embed the live stream into your application. You can do session replays for debugging, and you can get very extensive logging of all the browser commands through CloudTrail, if enabled. Now, how does it work? As a user, if you have an agent and you say "buy shoes on Amazon", you need a large language model. That model can be Nova Act, it can be Nova, it can be Claude, it can be many other things. You create a tool: you can build it yourself, or you can use open-source tools such as Browser Use and a few other open-source libraries out there. The tool describes to the large language model how to use the browser. You translate the tool call into instruction commands using the Playwright library or the Browser Use library. Those commands then run in the execution environment with the headless browser, which is exactly where AgentCore Browser runs, and you can run multiple session invocations right there.
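The tool-call-to-browser-command translation described above can be sketched as a small mapping layer. The action names and the plan below are invented for illustration (the command strings mimic Playwright's `page.goto`/`page.click`/`page.fill` API); no real browser is driven here.

```python
# Toy translation layer from a model's structured tool call into low-level
# browser commands, in the spirit of Browser Use / Playwright.

def translate_tool_call(call):
    """Map one structured tool call onto a Playwright-style command string."""
    action = call["action"]
    if action == "navigate":
        return f'page.goto("{call["url"]}")'
    if action == "click":
        return f'page.click("{call["selector"]}")'
    if action == "type":
        return f'page.fill("{call["selector"]}", "{call["text"]}")'
    raise ValueError(f"unsupported action: {action}")

# A model deciding to "buy shoes on Amazon" might emit a plan like this:
plan = [
    {"action": "navigate", "url": "https://www.amazon.com"},
    {"action": "type", "selector": "#search", "text": "running shoes"},
    {"action": "click", "selector": "#search-submit"},
]

for command in [translate_tool_call(step) for step in plan]:
    print(command)  # in production these run inside the managed browser session
```

The managed service's value is on the other side of this layer: it owns the headless browser those commands execute in, so the agent code only ever deals with the structured plan.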
It's very powerful, because it allows you to completely remove the need to manage the browser yourself; you hand that responsibility to AgentCore Browser and focus on the end goal of your agent, in this case navigating amazon.com. Another tool made available as part of AgentCore, which you can pick and choose whether to use, is the ability to run code in a code sandbox. Models are really good at trying to solve problems by generating code, but you always need a place to run that code. You can execute it securely using AgentCore Code Interpreter, because it creates an isolated, sandboxed compute environment where you can run code, check the results, run code again, and download the outputs. It can process gigabyte-scale datasets efficiently using S3 integration and without API limitations. It's very easy to use because it has prebuilt execution runtimes for JavaScript, TypeScript, and Python, with common libraries out of the box, and you can customize that. The way it works: you have an agent with a large language model, and you give it a tool to create a Code Interpreter session. So you create an agent, give it a tool to execute code, and that code is executed on Code Interpreter. You write some API integration with AgentCore Code Interpreter, which provides a file system, and the large language model can now generate code and run it on Code Interpreter. The beauty is that it is completely isolated and secure by default. Even if some attacker tries to exploit a vulnerability in your agent, on Code Interpreter there is no root access; there is nothing they can do to access data from any other session Code Interpreter is running. And of course, you can explore the telemetry with observability, which we're gonna talk about in a moment.
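The run/inspect/retry loop described for the code sandbox can be sketched locally by executing model-generated code in a separate interpreter process with a timeout, instead of `exec()` inside the agent's own process. This only shows the loop's shape; a real sandbox like Code Interpreter adds filesystem and network isolation on top, which a plain subprocess does not.

```python
# Toy code-interpreter loop: run generated Python in a child process with a
# timeout and capture its output for the agent to inspect.

import subprocess
import sys

def run_generated_code(code, timeout_seconds=5):
    """Execute code in a child interpreter; return (stdout, stderr, exit code)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    return result.stdout, result.stderr, result.returncode

# The model proposes code, the agent runs it and inspects the result.
stdout, stderr, rc = run_generated_code("print(sum(range(10)))")
print(stdout.strip(), rc)  # 45 0
```

If the run fails (non-zero exit code or stderr output), the agent can feed the error back to the model and ask for a corrected attempt, which is the retry half of the loop.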
Last but not least is the ability to monitor everything that is happening. AgentCore Observability provides comprehensive end-to-end visibility into your agent orchestration system. It allows you to accelerate debugging and audits: the primitives you pick expose their metrics and logs into AgentCore Observability. The good thing is, if you are already a CloudWatch user, you can very quickly use the prebuilt dashboards and traces that AgentCore Observability brings. If you don't want to use CloudWatch, or you use some other monitoring system, you can export the logs as OpenTelemetry. OpenTelemetry is an open standard that is commonly used and accepted across multiple observability systems, and with it you can send telemetry to whatever monitoring system you prefer, in case you don't wanna use CloudWatch. If you do use CloudWatch, as in the first image, the AgentCore Observability dashboards are built on CloudWatch, and you can see the traces: every single call made to your model, every single tool used, with the whole trace assembled automatically. The way it works, AgentCore Observability integrates with agents running on AgentCore Runtime, memory in AgentCore Memory, tools on AgentCore Gateway, browsers on AgentCore Browser, and Code Interpreter; whatever you pick and choose, you can integrate into AgentCore Observability and use the benefits either in CloudWatch or by exporting the logs to other systems. Now, we've talked through the features, and this is the whole ecosystem of AgentCore. It allows you to pick and choose, either the whole ecosystem or specific features like AgentCore Runtime or AgentCore Gateway, and build your agents with less burden on security, scalability, monitoring, reliability, and cost, because everything you've seen here is pay-as-you-go, and it all scales from zero to as much as you need, based on your demand. So this is the whole ecosystem of AgentCore. As of today, September 5th, as I'm recording this session, AgentCore is in preview; hopefully soon it will be GA, and you'll be able to build production-grade agents on AWS using whatever framework you want, using whatever models you want. And some things are coming soon: A2A is coming soon, so agent-to-agent integration is not supported as a protocol right now, but it will be, with examples for the supported frameworks: CrewAI, LangGraph, LangChain, Strands Agents, and many more. But of course, beyond AgentCore, the whole Bedrock ecosystem is pretty big. You have the ability to access models on demand; you can do optimization and caching of those models; you can create and fine-tune your models; you can bring guardrails; and you can create automatically managed RAG databases with Bedrock Knowledge Bases. I highly recommend you look into Knowledge Bases if you have a retrieval-augmented generation (RAG) use case. So AgentCore is very well positioned to contribute to the whole Bedrock ecosystem. And if you are interested, here are a couple of QR codes that you can pause the presentation and scan. I really recommend you look at the AgentCore samples, which redirect you to a GitHub repository where you're gonna be able to see multiple tutorials and use cases using the different capabilities of AgentCore. So if you're interested in AgentCore Gateway, you can navigate to the GitHub repository and see very specifically how to implement the code.
You can see the code, the different use cases, and the different tutorials, and of course there is the AgentCore documentation and the AgentCore blog posts. Without further ado, thank you so much for watching my presentation. If you have any questions, again, my name is Samuel Baruffi; you can find me on LinkedIn under Samuel Baruffi, and feel free to connect if you have any questions about AgentCore. I'll be available to answer them. Thank you so much. Enjoy Conf42 MLOps and I'll see you around. Bye-bye.

Samuel Baruffi

Principal Global Solutions Architect @ AWS



