Transcript
This transcript was autogenerated. To make changes, submit a PR.
Everyone, thanks for joining.
My name is Der Hoja and I'm a DevOps lead at IBM.
So today I'm going to talk about unveiling the power of observability using Fluent Bit in an AWS EKS environment.
So as part of this topic, we are going to talk about the power of observability in our daily work.
As cloud applications scale across the cluster, gaining visibility into our cloud environment becomes extremely difficult as the application load grows.
And when application complexity increases because of that scalability, it gets very difficult to understand the internal state of the application.
So in this session we will unveil how to harness the true power of observability in our AWS EKS environment using Fluent Bit, which is a lightweight log processor built for modern architectures.
And we will also learn how Fluent Bit can efficiently collect, parse, filter, analyze, and route the logs from our container platform, how it enriches them with Kubernetes metadata, and how it streams them into destinations like Amazon CloudWatch, S3, OpenSearch, or any observability backend of your choice.
We'll also break down some real-world use cases in this session and how Fluent Bit is deployed in real-world scenarios.
And we'll be covering a demo as part of this session as well.
So now we'll talk about observability.
What is observability?
Observability is the ability to measure the internal state of a system by examining its outputs.
The key pillars of observability are logs, metrics, events, and traces.
And observability is very important in modern architectures. In today's presentation we are talking in the context of an EKS cluster in AWS, running complex workloads.
Observability helps us track the performance, health, and usage patterns across the services, and also across the nodes and the pods.
It also provides visibility into the control plane as well as the data plane nodes and data plane events.
It helps us detect pod crashes and any node pressure.
And it also helps us monitor the CPU, memory, and disk-related metrics.
Observability in EKS. We'll talk about it now.
EKS is a fully managed Kubernetes service from AWS, and it supports native tools for monitoring and logging.
EKS logs everything: application logs, system logs, and Kubernetes logs.
So we have these kinds of logs in any EKS environment, and we have metrics like CPU and memory, and we can also have custom metrics.
And everything is traceable.
Whatever requests are coming in and going out, the request flows and performance, can be traced.
And basically Fluent Bit helps us collect all of these logs and performance metrics.
That is the advantage of having Fluent Bit as a log processor in an AWS EKS environment.
So now we'll talk about Fluent Bit.
What is Fluent Bit?
Fluent Bit is a lightweight log processor and forwarder that allows us to collect data and logs from different sources, enrich them, filter them, and send them to multiple destinations like CloudWatch, Kinesis Data Firehose, or an S3 bucket.
It can also route them to Amazon OpenSearch Service.
It can be used to ship logs to various destinations, but in this presentation we'll be focusing on forwarding the logs to CloudWatch in the demo.
So now we'll talk about why Fluent Bit is important.
Fluent Bit is a critical collection layer, and it performs several essential functions.
So what does Fluent Bit do?
First, Fluent Bit does the collection for us. When we say collection, it captures the logs from the different containers and endpoints.
Second, it enriches them. Enriching means we can augment the log data with Kubernetes metadata, append additional information to the logs, and add custom tags and contextual information.
Third, we can transform the logs: we can do structuring, we can do filtering, and we can also mask sensitive information in the logs.
And the fourth one is routing. We can direct the logs to the appropriate destination based on the content, namespace, or any other attribute we specify, and route them based on those attributes.
So how Fluent Bit works, that is the next topic.
First, there are the log sources. Logs basically originate from different sources like containers, log files, or systemd. These are the sources from which Fluent Bit collects the logs.
Then we have input plugins. The input plugins collect the logs.
Then we have parsers. Parsers are basically the transformers: they transform the logs into structured data, typically using JSON parsing.
Then we have filters. Filters enrich the logs, or remove specific data from the logs.
Then we have buffers. A buffer is basically temporary storage where we store the logs in memory or on disk in case of high throughput or network delays.
And then we have output plugins. Output plugins push the processed logs to external destinations like Elasticsearch, Amazon S3, Kafka, or Kinesis Firehose.
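To make that pipeline concrete, here is a minimal sketch of a Fluent Bit configuration that wires these stages together. The paths, tag, region, and log group name are illustrative assumptions, not the exact values used in this demo.

```bash
# Minimal sketch of a Fluent Bit pipeline config, written to a local file here
# purely for illustration; all concrete values below are assumed.
cat > fluent-bit.conf <<'EOF'
[INPUT]
    Name              tail                      # collect container log files
    Path              /var/log/containers/*.log
    Tag               kube.*

[FILTER]
    Name              kubernetes                # enrich records with Kubernetes metadata
    Match             kube.*

[OUTPUT]
    Name              cloudwatch_logs           # push processed logs to CloudWatch Logs
    Match             kube.*
    region            us-east-1
    log_group_name    /eks/application-logs
    log_stream_prefix from-fluent-bit-
    auto_create_group On
EOF
```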
So now we will talk about the setup. To set up Fluent Bit, we need to collect the logs from the containers.
To do that we need an IAM role, which will be associated with our cluster nodes. We will be using a service account for that, to achieve role-based access control.
And in the demo we'll see how we set up Fluent Bit as a DaemonSet to send the logs to CloudWatch Logs.
We'll be creating the log group, and we'll also have a log stream; we'll be configuring these things as part of the Helm chart.
We'll see in the demo that we will be using the Fluent Bit Helm chart to install Fluent Bit.
We'll also be using an OIDC provider, which needs to be created.
And for the IAM role, we need to create the IAM policy. That is part of the prerequisites, so we will be creating the IAM policy upfront.
I'll be using VS Code for the demo.
I have already installed eksctl, kubectl, Helm, and Git on my machine so that the demo is easier to follow.
I just wanted to make sure that once we have the prerequisites in place, it'll be easier to follow the instructions I mention here.
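As a quick sanity check before starting, the prerequisite tools can be verified from the terminal; these are standard version commands, shown only as a sketch.

```bash
# Verify the demo prerequisites are installed (versions will vary).
eksctl version
kubectl version --client
helm version
git --version
aws --version   # the AWS CLI is also assumed to be configured with credentials
```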
So now we'll come to the demo.
As part of the demo setup, we will also need a log server.
Apart from the prerequisites we discussed, once the prerequisite setup is completed, we need an application that we'll be using to generate the logs.
To do that we'll be using a log server, an NGINX server, to generate the logs, and then we will forward the service to our local environment.
And we will use the curl command to make the requests.
We will also be watching the logs locally, and we'll be routing those logs to the CloudWatch log group.
We will do that in three terminals.
In terminal one, we'll be port-forwarding the log server traffic locally.
In the second terminal, we'll be watching the logs locally.
And in the third terminal, we will be making the curl requests.
And then we will see that whatever logs we are seeing on the local machine are also forwarded to the CloudWatch log group.
This is what we are going to see.
And then once everything is completed, we'll see all the logs going to the CloudWatch log group, as shown here on the screen.
This is just a sample, but we will see the same thing in our demo.
So now let's get to the demo. We will start the demo now.
As part of the demo, I'm going to follow these instructions. There are several things we are going to do.
First, we are going to create the Kubernetes cluster.
For creating the Kubernetes cluster, we have an EKS config YAML file, which we can see here.
This YAML is the cluster configuration. We are going to create an EKS cluster with the name eks-cluster-demo.
It is just configuration, so I'm not going to go into the details of it.
So I'm going to create the EKS cluster first, and I'm going to run the eksctl command to create the EKS cluster.
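For reference, a cluster config of this shape and the eksctl command to apply it might look like the following; only the cluster name mirrors the demo, while the region and node group settings are assumptions.

```bash
# Illustrative eksctl cluster config; region and node group values are assumed.
cat > eks-config.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eks-cluster-demo
  region: us-east-1
managedNodeGroups:
  - name: demo-nodes
    instanceType: t3.medium
    desiredCapacity: 2
EOF

# Create the cluster; this provisions the CloudFormation stacks shown next.
eksctl create cluster -f eks-config.yaml
```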
As soon as I execute this command, you can see that it is creating the cluster.
We will look at the CloudFormation stack; I'll just move over to that.
It is creating all the dependencies, for example the NAT gateway, subnets, routes and route tables, the control plane, et cetera.
There are several components being created, and it is almost done, so we'll just wait a couple more minutes.
Now we can see that the creation of the cluster is completed, and the associated sub-components of the cluster are also completed.
Here you can see all the sub-components, like route tables, internet gateway, control plane, NAT gateway, et cetera.
All the dependent components have been created now.
So we are good with the cluster creation process, which is a prerequisite for the demo.
I did that as an extra step so that I can demonstrate an end-to-end demo here.
So now we will go to the next step.
The next step is to update the kubeconfig so that we can access the cluster.
I will just execute the command for that.
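The kubeconfig update is typically a single AWS CLI call; a sketch with the region assumed:

```bash
# Point kubectl at the new cluster (region assumed).
aws eks update-kubeconfig --name eks-cluster-demo --region us-east-1

# Quick check that the cluster is reachable.
kubectl get nodes
```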
Okay, I will just go here.
We'll see here, I think there are a few things happening behind the scenes.
I think we need to watch the cluster again for a bit. Let's give it a minute and look at what's happening in the console.
Yeah, so now it is trying to create the add-ons, that is, the VPC CNI and kube-proxy.
So we may have one more stack coming in.
Before we access the cluster using the CLI, we need to have this process finished; then we can continue.
As we can see, it is reflected in the CloudFormation stacks, so we need to wait a few more minutes until that is completed.
I'll just show you. We can see that the cluster dependencies are also being created.
And we can see the node group is also created now, which we can see on the screen.
That stack is also completed, which is a dependency here. So that is good.
Now, hopefully we should be able to access the cluster.
So I'll just switch to VS Code.
Yeah, we can see here that it is showing the cluster dependencies have been created.
So we will go to this point, and now we will just execute the command to access the cluster. Here we go.
You can see that the cluster was created, and we are able to list the pods which are already running.
So that's a good step.
Now, the next step is that we will create the namespace fluent-bit as part of this demo, as we are going to deploy all the resources in that namespace.
So we will go here and execute the kubectl command to create the fluent-bit namespace.
Okay, the fluent-bit namespace is created. That's great.
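The namespace step is a one-liner:

```bash
# Create the namespace that will hold all the Fluent Bit resources.
kubectl create namespace fluent-bit
```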
Now we will go ahead and create the OIDC provider, so that we can associate the service account with the OIDC provider later on.
So first we'll create the OIDC provider so that we can associate the cluster with it.
I'm going to execute the eksctl command for that. As part of this process, eksctl is a prerequisite, so we need to have eksctl installed on the machine; I have already taken care of that.
Now I'm going to execute the eksctl command to associate the IAM OIDC provider with the cluster, and this is done.
It has created an IAM OpenID Connect provider for the cluster. That's great.
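The association is done with eksctl's IAM OIDC provider utility; a sketch assuming the cluster name from the demo:

```bash
# Associate an IAM OIDC provider with the cluster so service accounts
# can assume IAM roles (IRSA).
eksctl utils associate-iam-oidc-provider \
  --cluster eks-cluster-demo \
  --approve
```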
I will go to the next step.
The next step is that we'll create an IAM service account so that we can have role-based access control in place.
I'm going to execute the command for that.
In order to do that, we need to have an IAM policy created, which is a prerequisite, and I have already taken care of that earlier.
So I'm going to just execute the command, and I will walk through the policy later on.
Here you can see I'm creating the IAM service account in the fluent-bit namespace with the name fluent-bit-sa, and I'm attaching the policy, which is the EKS cluster demo policy; I'll show you how I created it.
So I'm executing this command now.
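A sketch of that eksctl command; the account ID is a placeholder and the policy name is assumed from the name shown later in the IAM console:

```bash
# Create the Kubernetes service account and the IAM role bound to it (IRSA).
# <ACCOUNT_ID> is a placeholder; the policy name is assumed.
eksctl create iamserviceaccount \
  --cluster eks-cluster-demo \
  --namespace fluent-bit \
  --name fluent-bit-sa \
  --attach-policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/eks-cluster-demo-policy \
  --approve
```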
It'll create a stack in CloudFormation, and I will show you what it looks like.
We can see that another stack is created; it is the eks-cluster-demo IAM service account stack.
That is a new CloudFormation stack being created right now, and it is in progress.
I will quickly show you where I've created the policy.
So I go to the AWS console, go to IAM, and go to Policies.
I have created this EKS cluster demo policy, and I'm giving full access to CloudWatch Logs as part of this demo.
That is what I did, and I created it in advance so that we can attach this policy to the IAM role.
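As a hedged sketch, a policy like that could be created up front with the AWS CLI; the demo grants full CloudWatch Logs access, so the document is deliberately broad here.

```bash
# Create the prerequisite IAM policy granting CloudWatch Logs access
# (broad for the demo; scope it down in a real environment).
aws iam create-policy \
  --policy-name eks-cluster-demo-policy \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": "logs:*",
        "Resource": "*"
      }
    ]
  }'
```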
So I'll go back to the CloudFormation stack so that I can show the next steps.
We'll go back here, and you can see that the IAM service account is also created; the CloudFormation stack says CREATE_COMPLETE.
So we are good now.
I'll just go back to VS Code to execute the next commands.
Now, since we have the IAM service account created, I'm going to describe it and see whether the service account was created properly.
So I'm going to describe my service account here.
Yes, it is created, and we can see that it is associated with a role.
So the role was also created as part of this process, and the name of the service account is fluent-bit-sa. That's good.
As part of this step I just wanted to highlight that I executed a kubectl command, so kubectl also needs to be installed, similar to eksctl, as part of the prerequisites. We need to take care of that.
Now, as part of the next step, I will need to add the Helm chart repository for Fluent Bit. We'll add the Helm repo for Fluent Bit and then update it so that we can access the Helm charts.
I'm going to execute the Helm commands for that.
As part of the prerequisites, Helm also has to be installed on the machine.
So I'll go here, and I'm adding the Helm repo for Fluent Bit. That is good.
Now I'm going to update the repo. We'll go here and update it.
The update is completed. That's great.
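The repo commands are roughly the following, assuming the upstream Fluent Helm charts repository:

```bash
# Add the Fluent Helm chart repository and refresh the local index.
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
```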
So we go to the next step.
Now that the repo is updated, we'll install Fluent Bit using Helm commands.
Here is the Helm command.
It says helm upgrade --install fluent-bit, so it is installing the Fluent Bit release in the fluent-bit namespace.
We are not creating the service account, because I already took care of that, and we are just giving the service account name that already exists.
CloudWatch is going to be enabled, we are going to use the us-east-1 region, and it is going to check whether the log group exists or not: it will create the log group if it is not already there.
And then we are setting the log stream prefix for the streams coming from Fluent Bit, so all the CloudWatch log streams will have this prefix.
And we are saying the region should be us-east-1.
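A sketch of what such an install could look like, assuming the fluent/fluent-bit chart; the release, namespace, and service account names mirror the demo, while the CloudWatch output itself can be passed as chart values or, as done later here, edited into the generated ConfigMap.

```bash
# Hedged sketch of the Helm install; exact value keys depend on the chart used.
helm upgrade --install fluent-bit fluent/fluent-bit \
  --namespace fluent-bit \
  --set serviceAccount.create=false \
  --set serviceAccount.name=fluent-bit-sa
```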
So let's execute this command.
It says the fluent-bit release does not exist, so it is installing it now; we have to wait a few minutes.
Now it has executed, so that's great.
Now, as part of the next step, for demo purposes we need to create a demo namespace.
I'm going to create a new namespace called demo, where we will be installing our app.
So first we are going to create the demo namespace.
The demo namespace is created. That's good.
Now, what we need to do: since we executed the Helm commands earlier, the Helm chart has also created the ConfigMap.
We already discussed what the ConfigMap does; the Fluent Bit configuration is stored there.
So we need to update the Fluent Bit configuration, because that is how we tell Fluent Bit what to output and where it has to go.
I'll quickly show you what needs to be done.
Here I'll just execute the command to edit the ConfigMap named fluent-bit in the fluent-bit namespace.
Now we can see what it has created. I'll open it here; let me put it in Notepad so we can see how to do it.
So we can see the Fluent Bit configuration, the ConfigMap configuration, here. This is what we see.
So this is what the configuration looks like; this is the ConfigMap configuration for Fluent Bit.
I'm going to update the outputs, because I don't need Elasticsearch here.
I'm going to add an output section, because we are going to send our logs to CloudWatch.
So I'm just adding my output section there.
Here we go.
We can see that I have updated the configuration to output the logs to CloudWatch Logs.
It is going to output to us-west-2, this is the log group name, the log stream name prefix is going to start with "from", and auto-create is set to true so the group is created if it does not exist.
So that is good.
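The edit itself happens with kubectl against the chart-generated ConfigMap; the output section I added looks roughly like the commented sketch below, where the log group name is an assumption based on what appears later in CloudWatch.

```bash
# Open the ConfigMap generated by the Helm chart for editing.
kubectl edit configmap fluent-bit -n fluent-bit

# The [OUTPUT] section added in place of the Elasticsearch output looks
# roughly like this (log group name assumed):
#
#   [OUTPUT]
#       Name              cloudwatch_logs
#       Match             kube.*
#       region            us-west-2
#       log_group_name    /eks/application-logs
#       log_stream_prefix from-
#       auto_create_group On
```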
Let me just check again how it is looking.
My output section is good. So I'm good; I just checked it quickly.
Now, before I deploy my application, as part of the next step we will be restarting the DaemonSet for Fluent Bit.
We have to restart it because we updated the ConfigMap; it is important to restart it so that it picks up the updated configuration.
Now, after I restart it, I'm going to check the pods associated with it and make sure they are recreated.
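The restart and verification commands look roughly like this; the DaemonSet name and pod label assume the defaults of the Helm release used above.

```bash
# Restart the Fluent Bit DaemonSet so it picks up the edited ConfigMap.
kubectl rollout restart daemonset/fluent-bit -n fluent-bit

# Confirm the new pods are up, then tail their logs to check for errors.
kubectl get pods -n fluent-bit
kubectl logs -n fluent-bit -l app.kubernetes.io/name=fluent-bit --tail=50
```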
I am checking; yes, the new pods are created.
Now I'm going to check the logs for that, just to make sure that it is outputting and there are no errors.
I think it is good.
We can see that the log stream is being created, everything is looking good, and there are no errors.
It is outputting to the appropriate CloudWatch log group, and it is creating the log stream.
So that's great.
So I'm going to clear the screen, and now, as part of the next step, I'm going to deploy the NGINX server as a sample application.
I'm going to execute the command for that, and I already have the NGINX server configuration here, which I'm going to show.
That is the log server; it is an NGINX deployment, with match labels app: nginx.
I'm going to deploy this so that we can forward the logs and then follow the logs.
So I'm going to apply this. That's great.
And we have also exposed a service for the log server as well. That is great.
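For reference, a minimal log-server manifest of that shape might look like this; the names and the single replica are assumptions, not the demo's exact manifest.

```bash
# Illustrative NGINX "log server" Deployment and Service in the demo namespace.
cat <<'EOF' | kubectl apply -n demo -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: log-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:stable
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: log-server
spec:
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
EOF
```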
Now, as part of the next step, we will open another terminal to do the port forwarding.
I'll be executing the command for that; I am executing it here.
Here I'm doing port forwarding for this service to port 8080.
So this is done; it is forwarding traffic from local port 8080 to port 80 on the service.
So that's good.
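The port-forward command for terminal one, with the service name assumed from the manifest sketch above:

```bash
# Terminal 1: forward local port 8080 to the log-server service's port 80.
kubectl port-forward -n demo svc/log-server 8080:80
```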
Now we'll do the next step.
We'll open one more terminal to check the logs.
I've created another terminal here, and I am executing a command to follow the logs.
On this terminal, we will be watching the logs whenever we make a request.
So that is good.
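Terminal two just follows the NGINX pod logs; a sketch assuming the deployment name used above:

```bash
# Terminal 2: follow the access logs of the NGINX log server.
kubectl logs -n demo -f deployment/log-server
```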
Now we'll go to the next step.
As part of the next step, we are going to make a curl request to port 8080, and we will see whether it is able to capture the logs.
I'll be executing the command for that.
Now I'm executing the command; for that, I opened another terminal and I'm making a request.
Here we can see the "Welcome to nginx" page.
So we are able to hit the endpoint at localhost:8080, and we can see here that it is forwarding the logs.
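And terminal three makes the request against the forwarded port:

```bash
# Terminal 3: hit the forwarded endpoint; each request produces an NGINX access log line.
curl http://localhost:8080/
```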
So that is great.
Now we will go to CloudWatch and check whether it is forwarding the logs there or not.
I will be showing you the CloudWatch Logs now.
Now we can see the CloudWatch logs here. Here we can see, in us-west-2, the output location we mentioned.
It is creating the log group for the EKS application logs; I think that is the log group name we gave, and it has been created with that name.
And we specified the log stream, which starts with "from".
We can see that it is outputting the logs here.
Today it is the 27th in UTC time, so the 27th of May, and it is outputting the logs here; we can see all the logs coming in, so that's good.
Now, whenever we make a curl request, it outputs the log here.
So that is a great success.
As part of the next step, what we can do is view these CloudWatch logs in a CloudWatch dashboard.
I'll walk you through that process as well.
Here we go to the CloudWatch dashboards, and I have already created a CloudWatch dashboard as part of this demo.
EKS Logs is the dashboard name.
If I click on it, it is going to show us the logs.
We can see that we are streaming the logs for this log group name, the application logs.
This is the log group name we gave, and it is showing the logs coming into CloudWatch.
It displays them here so that we can make meaningful information out of them.
Similarly, this is just one widget, but we can create multiple filters and multiple log groups so that we can present the data accordingly.
And we can also break the view down by different metric types, for example CPU, memory, and disk; several parameters are there, so we can create different metrics.
We can send them to different log groups and display them accordingly so that we can make meaningful information out of them.
For example, we can see that at different points in time we have different types of logs and different numbers of requests hitting the server.
That is the information we can make out of it as part of this demo.
So this is the end of the demo.
Now we'll talk about some real-world examples.
Let's look at them one by one.
The first one we will talk about is a fintech client.
The fintech client needed a centralized logging system. The challenge was that there were microservices running in an EKS environment, and the bank needed a log aggregation solution to comply with financial audits and audit trails, and so that they could access these logs easily.
The solution was that Fluent Bit was deployed as a DaemonSet on the EKS nodes, configured to tail the container logs and enrich them with Kubernetes metadata, and then all the logs were forwarded to Amazon CloudWatch Logs.
This was the solution given, and the result was great.
The bank got a centralized logging system and got access to all the logs, which helped them improve troubleshooting and increased efficiency for the bank.
They got operational visibility as well, they were getting automated alerts for everything, and it gave them faster incident response.
So that was a success story for the bank.
The second use case is about streaming logs to Amazon OpenSearch.
The customer was a SaaS monitoring provider.
The company needed a scalable, low-latency solution to index the logs from their Kubernetes environment so that they could query the logs in real time. That was the main challenge they had.
The solution was that they used Fluent Bit to collect the logs, enriched all the logs with Kubernetes metadata, and then, as part of the streaming, they output the logs to Amazon OpenSearch Service and built Kibana dashboards on top of it so that they could query these logs in real time.
The result was great.
They were able to get near-real-time log availability, which improved monitoring and troubleshooting for them.
The customer also gained a very good real-time experience and analytics capabilities, and the operational overhead was reduced with this process.
The third use case is about an online gaming platform client.
They wanted to forward their logs to a third-party service.
They were initially using Datadog to route the logs from EKS to Datadog, but they wanted to make it more efficient.
They wanted to find a way for the EKS container logs to be sent to Datadog without impacting the cluster performance. That was the main challenge.
The solution given was that they deployed Fluent Bit with the HTTP output plugin configured for the Datadog API, and filters were applied to reduce the log volume in production.
They also used Kubernetes metadata to enrich the logs.
The result was that they reduced the log ingestion cost through selective filtering, and it also helped them improve the debugging experience.
That was a very good benefit they got, and they also got seamless integration with Datadog.
So that was a very good success story for them.
The fourth one is a success story of a healthcare provider.
This is a real-world example where the challenge was that the healthcare client had to maintain HIPAA compliance, which requires collecting and retaining the logs from their containerized workloads. That was the challenge.
The solution was that they deployed Fluent Bit as a DaemonSet to collect the container logs.
They also needed Linux audit logs, via the systemd input plugin, and then the logs were forwarded to a dedicated S3 bucket for long-term retention.
The security team also used Splunk to ingest from S3 for forensic analysis.
This was the solution given. The benefit was that they ensured compliance with the regulatory requirements, the HIPAA requirements they had; they were able to comply with them.
It was also very cost effective and very effective for investigating incidents.
So that worked really well for them.
The fifth one is multi-tenant logging at a company.
They had a multi-tenant Kubernetes cluster, with multiple namespaces and pods running in multiple namespaces.
They wanted to ship the logs in an isolated way, making sure they are shipped in a secure manner, per tenant.
The solution was that Fluent Bit was deployed as a DaemonSet with filters to add tenant labels based on namespace, and then routing rules were configured to send the logs to separate Grafana Loki tenants using the Loki output plugin with tenant-specific URLs.
And the result was great.
They were able to get multi-tenant log isolation without the need for separate clusters, and it simplified the platform operations, with centralized logging management for them.
So these were the real-world examples I wanted to discuss here.
And now, the references: I have referred to the official AWS documentation to prepare this.
Thank you very much for being part of this session, and please let me know if you have any questions.
Thank you.