Conf42 Cloud Native 2021 - Online

Making Logs Work for you with Fluentd

Abstract

Understanding what is happening with a distributed solution can be challenging. Whilst the solution space for monitoring and application log management is mature, organizations tend to end up with multiple overlapping tools in this space to meet different team needs, with the result that multiple agents and probes sit on every server. Many of these tools work by bulk central analysis rather than enabling events of interest to be spotted as they're logged.

Fluentd presents us with a means to simplify the monitoring landscape, address the challenges of hyper-distribution that come with microservice solutions, and allow the different tools that need log data to each play their part, without imposing a specific analysis tool on the teams involved.

In this session, we'll explore the challenges of modern log management. We'll look at how Fluentd works and what it can bring to making both development and ops activities easier. To do this we'll explore and demo some examples of Fluentd and see how it makes life easier and more efficient.

Summary

  • Phil Wilkins talks about Fluentd and how we can use it to make our logs so much more productive. He works for Capgemini in the UK and is in the process of finalizing a book called Logging in Action. If you're thinking about jumping ship, Capgemini is looking for good people.
  • What we look at when we log, and what we think about, varies quite a bit. From an infrastructure perspective, you're more likely to be worrying about things like CPU and memory resources. As you move into virtualization and containerization you're dealing with more of the logs themselves. Capacity monitoring is still something to be considered.
  • One of our key issues is looking for unexpected errors, or warnings that an error scenario is about to occur. We can use the logs to warn us or give us indications that a problem is going to occur. Whatever we do with logging, we are really monitoring, and it's all got to support the business.
  • This diagram helps us understand and visualize the way monitoring has evolved. Years ago we built single-threaded, single-machine applications. Now each part of your application is completely transient, which is a real challenge to deal with.
  • Fluentd is a highly pluggable framework. Everything is pretty much seen as a plugin within the Fluentd ecosystem. It can talk to many, many different sources, it is largely platform agnostic, and you can build your own custom elements.
  • An open source utility helps to simulate application logging. It means that we can configure and test our Fluentd pipelines without needing to run the real-life systems. We could equally route these events to an analytics engine to look for anomalous behavior.
  • In the real world we've got to deal with things like scaling. We want to bring logs together from lots of different sources to be able to analyze them properly end to end. There is a modern piece of thinking around what's known as open tracing. This could address some of our operational monitoring challenges.
  • If you want to know more then please visit Phil's website; there's plenty more information there. The slides will be available and he can be contacted through the site if you're interested in learning more.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, my name is Phil Wilkins. Thank you for joining me today for this session about Fluentd and how we can use it to make our logs so much more productive. Let me start by giving you a brief introduction to myself. I'm a family man, a blogger and an author of several books. I am both a technology evangelist for our team and a technical architect, which means not only do I have to talk about this technology, I have to actually go and make it work, so a lot of this is hard-won, proven experience. I happen to work for Capgemini in the UK, and our team has won quite a few awards over the last five to ten years, which we're very proud of as a team, and we continue to work to innovate and deliver excellence. As a team we work with Oracle technologies, so that's Oracle Cloud and Oracle middleware, as well as a lot of open source. A lot of people perhaps don't realize Oracle have really embraced open source in the last few years; there's a lot out there that's based on open source technologies as well as licensed product. And of course, if you want to know lots more, my website has blogs and lots of information, including material around this session. When I'm not writing, I also help host a meetup based in London, which at the moment is being run virtually, but hopefully some day in the not so far away future we will be back to getting together physically and, of course, talking about Fluentd. I do have to drop in a little plug: I'm in the process of finalizing a book called Logging in Action. You can get early access to it now, and a lot of the examples and content that I'm going to talk about today come from the book, so hopefully you will find this useful and perhaps go and buy a copy. I mentioned Capgemini; it's a substantial organization and thriving despite the difficult times that we're in today. I would encourage you to go and have a look, and if you're thinking about jumping ship, we are looking for good people. I always try to start my sessions with a little bit of context. Logging and monitoring are interesting animals, because we all come at them from slightly different perspectives: an application development perspective, a package integration perspective, an infrastructure perspective. What we look at when we log, and what we think about, actually varies quite a bit. I think the key message here is that it's information over time. Time is an important asset: it's about what is changing and what is happening, and helping us understand what's evolving. Now that sounds a little bit obvious, but from the different perspectives, that information and what it is differs quite significantly. So let me illustrate this a little more practically. From an infrastructure perspective, you're more likely to be worrying about things like CPU and memory resources, disk storage allocations and details like that. Whereas as an apps person or DevOps person, you're probably going to be looking more at the textual logs which characterize what's going on, rather than numerical, statistical figures such as 80% CPU usage. You're going to be seeing SNMP traps when file write errors occur and things like that, and you're going to have your application logging. If you're a Java person, you've probably been using something like SLF4J or Log4j or one of that kind of family of logging frameworks.
But of course, if you're then dealing with end-to-end stories, you've got to think about the tracing dimension: what happens at each point in the solution, not just in my little bit of code, so you can track what's going on and where something is getting held up or falling over. And we can look at the different aspects, the trace information versus the log information versus the metrics-based information, against the different layers. A hosting layer is heavily weighted towards metrics and low on the log characteristics, the textual content. As you move into virtualization and containerization you're into a bit more of the logs, because you're going to have your VMware or Kubernetes environments writing a bit about what they're doing and so on, but you've still got a strong percentage of statistical analysis going on of resource utilization and things like that. When you get into applications, we are really starting to look at a lot more of the text: what's the application doing, where is it executing, with some numerical data still, because we're still collecting metrics, numbers of users signed on, things like that. And then of course, if you're a business person, you're going to be more interested in the business application. It's still monitoring, you're still interested in what's going on, but it becomes more about who's doing what, when they have done it, and how many transactions are flowing through the environment at this time, and things like that. But they're all logs and numerical data; they all need to be monitored, and either sucked into a graphical representation of percentages used and things like that, or into a textual search engine to start analyzing. And then around the side of all this you've got security considerations and your capacity. Even in today's world of cloud you need to be looking after your capacity. You've got elastic scaling, and that's fantastic, but is somebody doing something that means your scaling is just running away and you're going to be burning a lot more money than perhaps you expected? So capacity monitoring is still something to be considered. So let's look at the applications of the logs. I've touched upon some of this, but obviously one of our key issues is looking for unexpected errors or warnings of an error scenario occurring. If our file system is getting used up to very high percentages, then we are likely to see faults start occurring because files can't be written. It's worth thinking about that, because one of the things that we'll see is that we tend to look at logs in retrospect, rather than looking at them as opportunities to be precursors and therefore become preemptive. Rather than waiting until a problem occurs, we can use the logs to warn us or give us indications that a problem is going to occur if we don't intervene. We can look at performance issues. I've talked about CPU and disk storage, but what about slow queries in databases? There are plenty of database technologies out there, and most of them give you access or insight into what's underperforming or performing badly. Some dedicated tools exist to do that, but perhaps you want to look at that in the context of how your application is performing as well: what is cause and what is effect? And obviously we've got, in this day and age, the world of security to worry about as well, through SIEM tooling.
And that's quite interesting, because quite often you'll see a completely different set of tools for analyzing that information compared to what DevOps teams or app developers might be using. And of course, as we get into bigger and bigger systems, we need to bring the picture together from lots of different components distributed across our network to be able to understand what's going on end to end. Otherwise it's very easy for one team to blame another and say, well, our system looks like it's performing and reporting okay, the problem is somewhere else, not me, guv. The bottom line is, whatever we do with logging, we actually are monitoring. It's all got to support the business, and it's all got to come back to what is going on and how it impacts our business. If you're monitoring something that's not related to the business at all, ultimately what value is it going to bring? It could be interesting, it might be a potential problem, but ultimately in most organizations everything comes back to increasing or improving business value, and we need to remember that. There are very few organizations that are going to let you do some blue sky thinking in the monitoring space. This diagram is trying to help us understand, or visualize, the way monitoring has evolved and just how complicated the problem has become. Years ago we used to have single applications: single-threaded, single-machine applications. Great, fantastic, easy to monitor, because there was just one location producing the logs, and the chances are you would be able to spot the cause of problems quite quickly. As we've evolved over time, we've started to see concurrency and asynchronous behaviors, so we start parking activities temporarily whilst we wait for something to happen and then pick them up again, which is what we see with Node; we see multithreading in the more traditional concurrency models and we get thread blocks and things like that. And then of course in the last five years or so we've seen the likes of containerization take that to a whole new level, and that's increased the distribution of solutions as well. The latest generation of that sort of thinking, serverless functions, means that the whole problem gains another scale of complexity, because now each part of your application is completely transient. It fires up, does its job and then disappears. So how are you going to go and see what was going on? Because the container or the platform that was running your solution will have gone by the time you probably get to the logs. A real challenge there to deal with. Okay, so I've set out the landscape somewhat, and now I want to talk a little bit about getting to Fluentd, what it is actually capable of, and set the scene for that. Most people have heard of Fluentd because of its involvement in the CNCF. If you're working with containerization, you've probably heard about Fluentd as an option for the log driver in Docker and its relationship with Kubernetes particularly. But not so many people understand quite what it's capable of, so let me drill in and just show you what it is. It is a highly pluggable framework; in fact, everything is pretty much seen as a plugin, or described as a plugin, within the Fluentd ecosystem. Fluentd starts with a collection of standard plugins being present, but everything operates in the same model, which makes it very extensible.
And one of Fluentd's benefits is the amount of work that's been done not just by the core open source team and the committers who work hard on Fluentd, but also by organizations beyond that. As a result, we can take in log events in many different ways. It's not just a case of polling file systems looking for logs or maybe talking to an SNMP framework; it can talk to many, many different sources. It's very flexible, and it is largely platform agnostic: it will run on Windows as comfortably as it will run on a Linux environment or a Mac. On outputs it is equally as powerful, perhaps even more powerful, in terms of plugins, because one of the beauties of it is that if you can get all these different types of information sourced in, everybody's going to come to you to get that information into their product. You've got custom plugins for Splunk, for Prometheus, for Grafana, Kafka, Jabber tools, Slack, email, and the list goes on and on. So it's very, very flexible, and this is part of the core story around Fluentd. It describes itself as a log unification layer, which means that we can bring all the logs together, unify them and then feed them to wherever they need to be. That can be more than one system, as we'll see in a minute. Of course, formatters are necessary. Different sources will describe their logs in different ways and structures, whether that's CSV or JSON or XML or some custom bespoke format, but there are many, many formatters that can cope with that, and therefore we can translate the log events into a consistent form to be consumed by the different systems. We can filter things: some log events are just not relevant beyond perhaps keeping a local copy to show that everything is running smoothly, and you might want to be told only about things that are abnormal. So we can filter out log events and say, well, that one we can just bin, because it's just the application confirming that everything is okay. We can parse, which is important not only to help do major transformations of data structures, but, importantly, to translate logs from raw data into something meaningful. We need to take what could be a huge great string of key-value pairs and translate it into something meaningful that can be acted upon or queried a lot more easily, and that requires parsing. Then there are buffers and caching: in the day of distributed solutions we need to pass these things around, and you do get transient errors in networks from time to time, so we need to buffer and cache things to stop losing information whilst those connections restore themselves. You can obviously also use your buffering and caching to optimize the process of distributing the logs, so we get efficient use of the events. And of course, storage: if you're not sending it to an active storage solution, we might just want to write it to a file for legal or archival reasons. If we're creating a record of what's happened in the system, just to show all is well and that there have been no security breaches, you might need to retain that for a long time. We just want to write that to a storage system that is very efficient, and only pull it out of storage and consume it into an analysis tool if or when there is a real problem. And then, as you may have guessed, there is a clear and strong framework for building your own custom elements, so you can build your own plugins.
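To make that plugin model a little more concrete, here is a minimal sketch of a Fluentd configuration. It is not one of the demo files; the file path, tag and field names are invented purely for illustration. Each block, source, filter and match, is just a plugin selected by its @type:

    # Input plugin: follow a log file and treat each line as JSON
    <source>
      @type tail
      path /var/log/demo/app.log            # hypothetical application log
      pos_file /var/log/fluentd/app.log.pos # remembers how far we have read
      tag app.demo                          # the tag used to route the events
      <parse>
        @type json
      </parse>
    </source>

    # Filter plugin: drop routine "all is well" events, keep everything else
    <filter app.demo>
      @type grep
      <exclude>
        key message
        pattern /heartbeat ok/              # assumed shape of a healthy event
      </exclude>
    </filter>

    # Output plugin: print surviving events, e.g.
    # 2021-03-25 10:15:00 +0000 app.demo: {"message":"disk usage at 91%"}
    <match app.demo>
      @type stdout
    </match>

Swapping the stdout match for, say, an Elasticsearch or S3 output plugin is just a change of @type and a few parameters, which is what makes the unification story work.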
The plugin framework itself is all based around Ruby, which means it's pretty portable and very flexible, and most people can get their heads around it. It's not a very weird, bespoke language; if you're an average developer you'll soon get to grips with it, because the framework makes it very easy to understand what to look at and what you need to produce to get a plugin up and running and deployed. Fluentd looks at logs in the form of three parts: a tag, which is effectively just a label so we can associate the event with something, largely a shortcut; a timestamp, which is critical because, as we said, all events are in the context of time; and then the record, which can be just about anything you like, although the more meaning that's in there the better, because it means we can process the record in more detail. As I say, to make the most of Fluentd, or of logs generally, we need to go from just capturing bits of strings or data to actually translating them into something very meaningful. And you've got a kind of lifecycle of events here, where we start at the source with the capture: just get the information in to process. We then need to structure that information to work out how to handle it and process it with a bit more meaning. Then we're into the world of potentially aggregating and analyzing it, looking for repeating events, things like that, and we're starting to move into the world where other tools are better suited, and we use Fluentd to get the data there so that we can maximize those other tools. So yes, when you get into those analytical processes, a unification tool is not the best thing; you want a data analytics engine that's looking at data you've made structured and meaningful, with Fluentd passing it all to the right places to then do the analysis. Of course, once you've done log analysis, we need some sort of visualization to make it easier to consume the information, to pull out what's important and not present every little detail. And ideally, when things are indicated as not healthy, we're generating alerts. That could be a Jira ticket giving details of an error that's been thrown in the application, that could be a service desk ticket, that could be a Slack message to someone looking after the system at that moment in time saying the file system on server XYZ is about to hit 100% use, go fix it now, mate. If you've heard of Fluentd, you've probably encountered the two common stacks, the first of which is described as the ELK stack: Elasticsearch, Logstash and Kibana, which is this. As you can see, I've taken that lifecycle and drawn it onto the stack: the log aggregation, the unification, the basic transformation to make it meaningful, which is Logstash primarily, with its baby brother Beats, which is ideal for very, very small footprint deployments. You've got Elasticsearch, which is your analytics and search engine, and then Kibana gives you the visualization. But there is an alternative stack that's coming up and getting more and more mentions, and that's EFK, which is essentially the same thing except we now replace Logstash with Fluentd. Now, the reason for doing, or considering doing, that replacement and moving from ELK to EFK is the fact that Logstash, whilst it's a good tool, has a bigger footprint. It also has the disadvantage that it is a product aligned to a vendor, in the form of Elastic.
Nothing wrong with that in itself, except Elastic perhaps don't make it easy, or encourage the development of plugins, which means that the number of systems Logstash can reach is smaller than Fluentd's by quite a margin. So that's one of the key benefits of considering moving over to the Fluentd approach. There is also the fact that if you're working with Docker, there are pre-built drivers to take Docker logs and just pump them to Fluentd. And again, there's a lot of work being made available to package up Fluentd for Kubernetes-based environments out of the box, or very close to out of the box, because you do obviously need to apply a few configuration items, like where your servers are. So those are the differences. There's nothing wrong with ELK, but there are more opportunities, let's put it that way, with the Fluentd-based stack, and they work equally well; they just need slightly different configs at the bottom. So I'm going to do a little demo in a moment. I'm going to run it for you, but let's talk about what the demo does first, just to give you a real-world sense of the art of the possible. I'm going to start with a setup where I've got, let's say, a couple of applications running in a Kubernetes environment. They could be running in a VM or running on bare metal; it really doesn't matter, Fluentd does not care. It works in legacy environments just as well as in the most modern containerization setups; it just happens that those people on the containerization end of that scale are perhaps a bit more aware of Fluentd. What we've got in our demo is what's called a pipeline, which is going to take one log file being generated, or rather simulated. I don't actually have real applications; I'm going to run what I've built as a log simulator, which is a utility that can generate log events or play back old log files and restamp them so they look like an active, live log. We'll see the log files being filled out in real time, rather than being one great big lump that can be parsed instantly. And in that file I'm going to do a little bit of manipulation, because in one file, as you'll see in a minute, the payload is described using one attribute, and in the other application's file it's described slightly differently, and what I want to do downstream is process the log events in the same way. It also just pumps the log events out to a file, so we've got an audit trail of how log events have been tweaked. And that could be one node or it could be many. Okay, so that's the aggregation, and we can scale that out; we just get more and more Fluentd nodes doing that preprocessing. Then I'm going to do the fun, funky thing with it, because what I often encounter is that a lot of people think about logs as things to search after an event, but we should be, and can be, preemptive if we've got a log event that indicates something is going to become a problem. This is a great way of dealing with legacy applications: over time you'll understand what the log messages mean, and sometimes you get these little log events that look quite innocent but actually indicate a serious issue. One of the things we can do with Fluentd is process it and say, actually, whilst that log looks innocent, it's a precursor to a very, very significant issue, and therefore you can act upon it before your system collapses because something is broken.
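As a sketch of that preemptive idea (the message text, the tags and the use of the rewrite_tag_filter plugin, which comes from the separate fluent-plugin-rewrite-tag-filter gem, are all assumptions for illustration), an innocent-looking legacy log line can be re-tagged so that downstream routing treats it as an alert:

    # Re-tag events whose wording we know, from experience, precedes a failure.
    # Assumes the fluent-plugin-rewrite-tag-filter gem is installed.
    <match legacy.app>
      @type rewrite_tag_filter
      <rule>
        key message
        pattern /connection pool recycled/   # looks harmless, usually is not
        tag alert.legacy.app
      </rule>
      <rule>
        key message
        pattern /.+/                         # everything else keeps its normal route
        tag routine.legacy.app
      </rule>
    </match>

    # Anything now carrying the alert tag can be routed to a pager or chat channel
    <match alert.**>
      @type stdout                           # stand-in for a real alerting output
    </match>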
In the demo, we're going to have just one node running the source side and one node running that alerting, both on the same machine. As I mentioned, Fluentd, and Fluent Bit, its baby brother, have tiny footprints, so you can run all of this on the most simplistic of machines if you want. What we're going to be doing on this second node is, again, filtering: we're going to write all the logs to standard out, which in a more practical use case could be something like a war board, and I'm going to send to Slack anything that has a particular piece of context in the log event, so you can see it pushing a message to me very, very quickly. And of course, that means we need to run this across a network of some sort. I have talked about and mentioned the existence of Elasticsearch and Prometheus and SIEM; we could equally route these events to an analytics engine to look for anomalous behaviors and to do other mining of log events as well. And we could separate that out so that we can do both; it's not one or the other. We're not tying our notifications and alerts to the analysis, which is what happens if you've used Splunk or tools like that: they alert off the analysis once they've ingested the log event and run their queries. But for today I'm not going to do that; I'm just going to concentrate on this little piece, because it's quite funky and it gives you something fairly visual to see. So let me show you the setup. This is the configuration file for the first server, node one, which is handling the source. As you can see, it's a combination of XML-style tags and name-value pairs. It's a bit strange to start with, but the moment you get your head round it, it's incredibly easy to work with. Here are the two sources; you can see two different files coming from two different solutions, and I've set it up so that we can track them. So if my Fluentd node was stopped temporarily, or it gets reconfigured or moved around the network, it will know where to pick up and carry on, and it can deal with all sorts of things like log rotation. It's also going to convert the payload. Let's move on. The next bit is a filter, which is going to apply a record change. All this is doing, because the two sources come in with slightly different structures, is adjusting the log events so they have the same structure. In one message I'm bringing the core payload in with a field called event, and in the other one it's called message, and I want to make them consistent; I'm also going to flag that the record has been transformed. And then I'm doing the routing and sending of the events. As you can see, I've got the file store there, and I've got a little bit of buffering going on just so the I/O is a bit more efficient, and on the right-hand side you can see I'm sending the log events to another node on my machine every 5 seconds for the second node to process.
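Paraphrased as a sketch (the paths, tags, hostnames and field names here are stand-ins rather than the book's actual demo files), that first node's pipeline comes down to something like this:

    # Two application sources, each with its own position file so a restart,
    # or log rotation, carries on from where it left off
    <source>
      @type tail
      path /var/log/app-a/*.log
      pos_file /var/log/fluentd/app-a.pos
      tag app.a
      <parse>
        @type json
      </parse>
    </source>

    <source>
      @type tail
      path /var/log/app-b/*.log
      pos_file /var/log/fluentd/app-b.pos
      tag app.b
      <parse>
        @type json
      </parse>
    </source>

    # Application A calls its payload "event"; rename it to "message" so both
    # sources look the same downstream, and flag that the record was transformed
    <filter app.a>
      @type record_transformer
      enable_ruby true
      <record>
        message ${record["event"]}
        transformed true
      </record>
      remove_keys event
    </filter>

    # Keep a buffered audit copy on disk and forward everything to node two,
    # flushing roughly every five seconds
    <match app.**>
      @type copy
      <store>
        @type file
        path /var/log/fluentd/audit
      </store>
      <store>
        @type forward
        <server>
          host node2.example.com
          port 24224
        </server>
        <buffer>
          flush_interval 5s
        </buffer>
      </store>
    </match>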
Which brings us to the configuration for the second node. You can see now I'm saying, okay, I'm going to accept from anywhere; this is a pretty promiscuous ingestion of log events that have been transmitted. I'm then going to filter, and I'm looking for anything in my log events that has a reference to computer, whether that's with a lowercase or an uppercase C. And if it's got that value, it will go to my Slack channel. And just to show you, here's the open source utility I've built to help simulate application logging. It means that we can configure and test our Fluentd pipelines without needing to run the real-life systems, and if we want to see what happens when an error occurs, we can play a log file with errors in it into the environment as if it were live, and see or confirm that the right alerts are being generated and whether our analytics engines will be able to pick up the details. As you can see, the important bit here is how I am structuring the payload. You can see I'm saying it's going to be an event, and that's the log payload, and over here I've called it message and put it in a slightly different sequence in the file, but it will find the message piece and do the analysis to determine whether it talks about computer. And this is what we're going to see in a moment. So let me drop out of the presentation deck and we'll see this happening for real. I'm going to run a little startup script which is going to fire up a number of shell windows, giving me the two log generators, which you can see here. One is going to be fairly quiet, one is going to be very chatty, because I've got different settings. Then I've got two Fluentd nodes: there's server one, here's server two, and you can see each has reported to the console the configuration file it's picked up and what it's up to. And you can see over here on Fluentd node two that it's actually receiving the log events, and they're all coming through with message in the title. Up here you can see this one is very quiet; this one is being verbose, and it's showing me the log event source file it's processing and how many times it's looping through that data set to create log events. Just to explain: rather than playing real log files or application logs, I've set my tool up to run from a mock data set which is basically full of jokes. And what we've got over here is my Slack desktop client, and you can see that periodically it will scroll as new messages come in, and it's just the messages that have a reference to computer. So you can see, there you go: node two says the oldest computer can be traced back to Adam and Eve. Whilst these are just junk jokes being pushed through, you could imagine that each of them is a genuine log event, and whilst I'm just showing what the text is, you could very easily say, okay, when I identify this event, I'm going to put a more meaningful message in; you could potentially put in a hyperlink to your ops processes, to how do I resolve this, and things like that. So that's what it's doing. And if I come into here, if I just stop that and go into the demo folder, we can see there is a log file being generated and there's a folder for it. One of the things we've got is the logs rotating, so if I go to the label pipeline and we look in here (I haven't left it running long enough, but given enough time you'll see it do log rotation), and you can see the two position files that I mentioned, so it's tracking what's going on. If I stopped and restarted, they would carry on consuming from where they left off. Okay, I can see it's just done a log rotate there for the label pipeline file output: there's one and there's two. So there we go. Let me take you back to the deck and we'll wrap up. Yep, that's a little bit Mickey Mouse, but it gives you a sense of the art of the possible.
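For completeness, the second, receiving node that produced those Slack messages boils down to roughly the following. The bind address, the Slack webhook details and the slack output type (which comes from the fluent-plugin-slack gem) are assumptions for illustration:

    # Accept forwarded events from anywhere on the network
    <source>
      @type forward
      port 24224
      bind 0.0.0.0
    </source>

    # Echo everything to standard out, and send a copy into an alerting label
    <match app.**>
      @type copy
      <store>
        @type stdout
      </store>
      <store>
        @type relabel
        @label @ALERT
      </store>
    </match>

    # Only events mentioning "computer" (upper or lower case) reach Slack
    <label @ALERT>
      <filter app.**>
        @type grep
        <regexp>
          key message
          pattern /[Cc]omputer/
        </regexp>
      </filter>
      <match app.**>
        @type slack                       # fluent-plugin-slack gem, assumed installed
        webhook_url https://hooks.slack.com/services/placeholder
        channel demo-alerts
        username fluentd
        message Alert: %s
        message_keys message
      </match>
    </label>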
Certainly in terms of making things proactive, filtering log events like that is a possibility, but in the real world we've also got to deal with things like scaling. I mentioned the complexities of highly distributed solutions and the fact that we want to bring logs together from lots of different sources to be able to analyze them properly end to end. Are we seeing, you know, API calls coming in that hit our API gateway but never seem to move beyond that? Things like that. So, as I say, lots of challenges. There is a pretty modern piece of thinking around what's known as open tracing, which is the idea that we can create a means to trace events through the different applications. We've also got to deal with the possibility that, in a lot of organizations, different teams with different jobs will want to use different tools. In that kind of situation, you either have one tool pulling all the logs together and sending them across to Splunk or whatever the DBAs are using, or whatever the infra guys are using if they're on Nagios, or you end up with your database server having three different agents, which is a bit of overkill, and of course the network setup is that much more complicated as well. Whereas if you've got a log unification engine like Fluentd, or Logstash for that matter, picking all this information up and sending it over to the right places, or sending it to a central distribution point, then your network is a lot more controlled and your security guys are going to be a lot happier. I mentioned briefly the idea that by doing this sort of thing we can address some of our operational monitoring challenges around legacy applications, where people don't want you anywhere near them. You can just start interpreting the logs and saying, well, actually this innocent-looking log message is significant, or when it reports this, the root cause is actually that. For those more delicate systems you can take more preventative action, rather than saying it's in Kubernetes, it will recover and sort itself out for us. So here's a scaled deployment where, as you can see, I've got five servers across the top. We're collecting different types of logs, different information, from those servers, and they're sending it to a concentration point, or two concentration points, which are then processing those logs and doing what's necessary: the ops alerting potentially, and at the very minimum bringing it all together in a common platform for doing the analytics. You can see the dashed lines where we could put failover and resilience into the setup as well. So if a concentrator node, in the form of one of these mid-tier servers, fails, then the sources of the logs will look for the next node that you've identified and use that, and that can then start passing the logs through so you're not losing stuff. The important thing is that this is a classic deployment: there's no virtualization here, there's no containerization going on. But I could make this containerized or virtualized. If you looked at this from the perspective of a containerization model, on the left I've got the apps now just spitting out to standard out, or to a file that's set up appropriately in the container environment. Then we have a daemon set, as Kubernetes refers to it, running Fluentd, picking up the logs from the different bits of Kubernetes, plus capturing what's going to standard out and standard error in the applications in their pods, sorting that out and getting it routed to the right places.
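Coming back to the classic scaled deployment for a second, the failover shown by those dashed lines is just configuration on the forwarding side. A sketch, with the hostnames invented for illustration:

    # Forward to the primary concentrator; if it becomes unreachable, fail over
    # to the standby, and keep a file buffer so events are not lost in between
    <match **>
      @type forward
      <server>
        host concentrator-1.example.com
        port 24224
      </server>
      <server>
        host concentrator-2.example.com
        port 24224
        standby
      </server>
      <buffer>
        @type file
        path /var/log/fluentd/forward-buffer
        retry_forever true
      </buffer>
    </match>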
Back in the container world, you can go to the next level and start colocating either Fluentd or Fluent Bit into a pod, which is what you've got on the right, and we could adapt the other pods to be more like that. Or we could even go to using containerization patterns like sidecars to deploy our Fluent Bit or Fluentd nodes. That gives us enormous amounts of scaling. The other thing to keep in mind is that a lot of people like Prometheus and Grafana for visualizing and reporting on their activities and what's going on. There's nothing to stop you from doing that; Fluentd becomes a contributor to the whole process. This is the architecture for a Prometheus setup, and you can see on the left-hand side we've got Fluentd feeding details into the core server, with the consolidated data and Prometheus monitoring data going out via an exporter. And of course, if Prometheus is filtering or doing correlation analysis, it could generate new log alerts, and you could then use Fluentd to route those to all the different systems. So in addition to the raw data, you could be taking the Prometheus alerts and combining them with your log analytics platform, for example. Not only do you end up with a history of what happened, you can also put into your core log analytics when it was recognized that there was a problem and an alert was raised, so you can then show when something was addressed, or should have been addressed, in the process. It could even be that, whilst Prometheus has done the detection, Fluentd is actually kicking off an active process to remediate. Now, this is a platform setup, shown logically, that we've established in the past; it's a real use case. Across the bottom I've put the lifecycle that I introduced earlier on into the diagram, and on the left-hand side you can see the different types of sources that we're handling. We have got Kubernetes clusters involved and we are running a daemon set, so we're using Fluentd rather than Fluent Bit because it's a little bit more usable for that kind of use case. But we've also got dedicated virtual machines, and we are running this particular use case in a multi-cloud setup, so we are using CloudWatch, which is giving us specific information from that cloud vendor, and we're pulling that in and combining it with other sources of information such as Log4j. So we've taken a legacy app that's been migrated to the cloud, and rather than trying to configure it all so that it goes into CloudWatch, we've just said, okay, we'll leave it as is, untouched, and we'll put a Fluent Bit node in play. There's also a bit of open tracing going on, so we've got the Jaeger collectors and Jaeger analytics to help with the tracing, showing the performance information from open tracing. And it's complementary, not instead of; we could easily put that into Fluentd as well to do further processing on those trace steps. And of course we want to analyze it, so we've got Elasticsearch and Kibana for visualization, and we've started to put a few small alerts into a Slack channel and a secondary email, which could easily be PagerDuty, dealing with the fact that you might be running a round-the-clock operation and need to reach whoever's on call overnight rather than just slacking everybody. And that's me done. Thank you very much, thank you for listening, and I hope that was really helpful. If you want to know more then please visit my website; there's plenty more information there. The slides will be available and I can be contacted through the site if you're interested in learning more.
Otherwise, thank you very much.
...

Phil Wilkins

Senior Consultant & Tech Evangelist @ Capgemini



