Conf42 Cloud Native 2024 - Online

Tracing Triumph: O11Y in CI/CD

Abstract

In Agile development, CI/CD is crucial. Engineers collaborate for rapid, high-quality product delivery. The CI/CD pipeline is considered mission-critical infrastructure, facilitating rapid, quality-driven releases to meet evolving consumer demands and stay competitive in the market.

Summary

  • Siddhartha Khare: Why we need observability in CI/CD pipelines. OpenTelemetry is an incubating project under the CNCF umbrella. With observability you will be able to monitor your CI/CD pipeline more effectively. This is followed by a short demo of how these things can be implemented.
  • The process is very simple here. What we have leveraged is OpenTelemetry. Once you install the plugin, go back to Manage Jenkins, inside System. You will see information about the response time, throughput, and error rate of the application. You will also see the anomalous spans flagged by the anomaly detection engine and the potential errors behind them.
  • I'll end my session on observability in CI/CD pipelines here. I hope you have enjoyed this session, and please give it a try. If you face any problems, feel free to reach out to me on my LinkedIn. Happy learning.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, thanks for joining my session today on observability in CI/CD pipelines. I'm Siddhartha Khare, and I work as a technical account manager at New Relic. Prior to joining New Relic, I was working at Citrix as a software developer. Today's agenda will revolve around a quick recap of OpenTelemetry. What is continuous integration? What is continuous delivery? Why do we need observability in CI/CD pipelines, and what are the benefits of implementing it? All of this is followed by a short demo of how these things can be implemented.
Let's discuss OpenTelemetry. OpenTelemetry is an incubating project under the CNCF umbrella. It was formed by merging OpenTracing and OpenCensus, so if you have used Jaeger or Zipkin, you have already had a taste of OpenTracing. It provides a set of APIs, libraries, and integrations so you can collect your data more efficiently. It also provides a standardized approach to collecting data from your applications, which means it defines what type of data you should collect from your applications or systems to understand how they are performing.
What is continuous integration? As the name suggests, it is the practice of merging all developers' code into the main codebase several times a day. It helps you reduce the risk to already released features, decreases the number of bugs that can occur in your application, and minimizes the integration issues anyone can face. What is continuous delivery? It is the approach where functionality is delivered frequently. With continuous delivery you can deliver to the market faster and at lower cost, and the quality is also enhanced. One can say CI/CD is a combined practice for integration and delivery. Let's discuss why you need observability in a CI/CD pipeline.
With observability you will be able to monitor your CI/CD pipeline more effectively and resolve issues before they escalate, which saves time and resources. By understanding the ins and outs of your CI/CD processes, your team can make more informed decisions about resource allocation, process changes, or tool consolidation. You can also detect anomalies, if there are any, in your system. You can detect performance issues. You can identify misconfigurations that have occurred, or teams that are working in silos, more efficiently.
Let's also look at some of the benefits you get from monitoring your CI/CD pipeline. The first one is fault isolation, the practice of designing systems such that when an error occurs, the negative outcomes are limited in scope. Limiting the scope of problems reduces the potential for damage and makes systems easier to maintain. You will get a faster MTTR (mean time to repair). This measures maintainability: it is the average time needed to repair a broken feature. You will see faster release rates. You will also see transparency and accountability improve within your team. CI/CD is a great way to get continuous feedback, and not only from your customers.
Now to the demo of how we can implement CI/CD pipeline monitoring, what the prerequisites are, and how this will help you in the longer run. Let's quickly take a look at my Jenkins server and the jobs I have. There are a couple of jobs on my server that I'm running. Let's take this particular job as an example. I'm cleaning up the workspace first, then I'm checking out my code. I'm building the Docker image, pushing that Docker image to my Docker Hub account, and deleting the images from the system. Then I'm updating the Kubernetes deployment file and eventually pushing it to a deployment. I'll also show you what the actual app looks like.
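As a side note on the MTTR benefit mentioned above: it is simply the total repair time divided by the number of repairs. The sketch below is illustrative only; the function name and the incident timestamps are hypothetical and do not come from the talk.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average of (repaired_at - broken_at) over (broken_at, repaired_at) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((fixed - broken for broken, fixed in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incidents: when a pipeline broke and when it was fixed.
incidents = [
    (datetime(2024, 3, 1, 10, 0), datetime(2024, 3, 1, 10, 30)),  # 30 minutes
    (datetime(2024, 3, 2, 14, 0), datetime(2024, 3, 2, 15, 30)),  # 90 minutes
]

mttr = mean_time_to_repair(incidents)
print(mttr)  # 1:00:00
```

Tracking this number before and after adding pipeline observability is one way to check whether repairs are actually getting faster.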
So if I go to sage you can see that I have an application. It's Kubernetes hosted, and I've just shown the view. If I go to this particular page, you can see I have a web page as well; if it produces an error, it will tell you. So this is how my application works. Now, at any point in time, let's say you want to monitor what your Jenkins server is doing and where it is failing, and you don't have access to the portal. What will you do? The process is very simple here. What we have leveraged is OpenTelemetry. So I go to Manage Jenkins and search for the OpenTelemetry plugin. In my scenario I have already installed it, so I'll show you here. This is the plugin which we will be using.
Once you install the plugin, go back to Manage Jenkins, inside System. You will see multiple locations and multiple configurations which you need to fill in. Just search for OpenTelemetry, and you need to provide the OpenTelemetry endpoint. This endpoint can be any endpoint of the backend service which you are planning to use. In my scenario I'm using New Relic, so the endpoint is otlp.nr-data.net, and 4317 is the port. The authentication which I am using is the API key. I have leveraged header authentication, I have named that header, and the value is the New Relic ingest license key. Okay. And I click on save.
As soon as I save this, automatically within my New Relic portal, if you go to APM and Services, inside OpenTelemetry, you will start seeing the data. Here you will get the information about the response time, throughput, the error rate of the application, and the instances of the application. Now if you want to see which other components your Jenkins is communicating with, you can go to the service map. You can also validate the different types of transactions which are happening. Now the catch is the entries where you see BUILD in caps: all of these are different builds which I have in my Jenkins.
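In the demo the Jenkins plugin is configured through the UI. For readers wiring up other tools, the same two settings (endpoint and auth header) can be sketched as the standard OpenTelemetry exporter environment variables. This is a minimal sketch under assumptions: the helper name is hypothetical, the endpoint is my reading of the one spoken in the talk, the header name `api-key` is what New Relic documents for OTLP ingest (the talk does not spell it out), and the key is a placeholder.

```python
def otlp_exporter_env(endpoint, header_name, license_key):
    """Build the standard OpenTelemetry OTLP exporter environment variables."""
    if not license_key:
        raise ValueError("license key must not be empty")
    return {
        # Where telemetry is sent (gRPC OTLP conventionally uses port 4317).
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        # Extra headers sent with every export, as key=value pairs.
        "OTEL_EXPORTER_OTLP_HEADERS": f"{header_name}={license_key}",
    }


env = otlp_exporter_env(
    endpoint="https://otlp.nr-data.net:4317",  # assumption: endpoint as heard in the talk
    header_name="api-key",                     # assumption: header name not stated in the talk
    license_key="YOUR_INGEST_LICENSE_KEY",     # placeholder, never commit a real key
)

for name, value in env.items():
    print(f"{name}={value}")
```

Any backend that accepts OTLP can be substituted by changing the endpoint and header; the talk makes the same point about the Jenkins setting.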
So if you see here Argo CD Gitops worker bench Argo CD Gitops Argo CDCI operation Gitops Argo CDCI. If I go back here, you can see all these names. You may want to dig deeper into what is happening, because let's say this is the main section, which is taking 93%. If I click on this particular transaction, I'll see the complete percentile graph and the throughput, and you will see the traces which are here. Let's see if there is any trace which has an error. There are none, but let's see what happens if I go to this particular trace. You will see the entity map here, just for this particular trace, and you will see a nice indicator as well. Open the drop-down and you will see all the process spans: you will see when the pipeline is starting, it starts running, the agent got initialized, and here you will see the stage-wise approach. The cleanup workspace stage took this much time, and the checkout took 1.65 seconds. Building the Docker image took 3 seconds. Then here, in pushing the Docker image, while it is running some shell command, it took some time. It is also showing me the anomaly, which says this span is 3.79 seconds slower than the average. Then it is deleting. So all these stages are coming up here.
Now, let's say things are working fine here. If I go back to distributed tracing directly, it shows the different types of jobs which are running. Let me just filter by errors. Okay, so this is the particular build where the errors are coming from. If I go to this particular build and click on one of the traces here, you will see the indicator is red and orange. If I click on the drop-down you will see the complete process span. Here you are seeing one anomalous span. This anomalous span is generated by our anomaly detection engine. Here are the errors, the potential errors due to which the problem has occurred. So if I go into the span, you will start seeing when the pipeline started.
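The "slower than average" indicator described above can be illustrated with a very simple comparison against historical stage durations. This is not New Relic's anomaly detection engine, whose internals the talk does not describe; it is a minimal sketch with a hypothetical function name, a hypothetical fixed threshold, and made-up durations chosen to echo the numbers in the demo.

```python
from statistics import mean


def slower_than_average(history, latest, threshold_seconds=1.0):
    """Return {stage: seconds over its historical mean} for stages above the threshold."""
    flagged = {}
    for stage, duration in latest.items():
        past = history.get(stage)
        if not past:
            continue  # no baseline yet, nothing to compare against
        excess = duration - mean(past)
        if excess > threshold_seconds:
            flagged[stage] = round(excess, 2)
    return flagged


# Hypothetical past durations (seconds) per pipeline stage.
history = {
    "checkout": [1.6, 1.7, 1.65],
    "build docker image": [3.0, 3.1, 2.9],
    "push docker image": [2.0, 2.2, 2.1],
}
# Hypothetical durations from the latest run.
latest = {"checkout": 1.65, "build docker image": 3.0, "push docker image": 5.89}

print(slower_than_average(history, latest))
```

A real engine would account for variance and trends rather than a fixed threshold, but the idea of comparing each span to its own baseline is the same.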
Here is the first error. Let's quickly check what that error looks like. It is trying to clone from a wrong repo. If I go to SCM, you will start seeing this data. If I go to, let's say, build Docker image, it will start giving this. So now we know that this is the problem here, and there was also an exception. If I click on this particular exception, it will show me the complete stack trace of what is happening, which will help me dig deeper into how to fix this problem. This is the information which you will start getting once you instrument it, right? You will even see the logs as well. If you go to logs, you will see the details about that. There is some change which got detected in the argocd Gitops workbench build, right? So it will tell you which particular build has changed. It will give you the details about the Jenkins URL.
Once you start seeing this data, there are certain scenarios where you don't want to go inside this every time to see what the problem is or how your Jenkins server is performing. For that we have the out-of-the-box dashboards. If you go to any of the dashboards, you can see how your CI/CD pipeline is performing. You can see dashboard details like number of builds, 40, 20 failed. You will see the executed jobs, the average job duration, the job failures, and which step took the longest duration. You will see the number of steps and the count for each of those steps, you will see the max error, that is, what type of errors are occurring most, and if there are any failed steps, you will start seeing those. Let me just increase the time frame to 3 hours and let's see. Yeah, so if you go to 3 hours, you will start seeing the steps which failed. You will start seeing the longer steps, you will see the number of builds and the failed builds, and you will see what the queue time is.
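The dashboard figures listed above (number of builds, failed builds, average job duration, longest step, most frequent error) are all simple aggregations over build records. The sketch below shows one way to compute them; the record layout, job names, and error label are hypothetical and are not the schema the plugin or the dashboard actually uses.

```python
from collections import Counter
from statistics import mean


def summarize_builds(builds):
    """Aggregate build records into dashboard-style summary numbers."""
    if not builds:
        raise ValueError("need at least one build")
    failed = [b for b in builds if b["error"] is not None]
    step_totals = Counter()
    for build in builds:
        for step, seconds in build["steps"].items():
            step_totals[step] += seconds
    error_counts = Counter(b["error"] for b in failed)
    return {
        "builds": len(builds),
        "failed": len(failed),
        "avg_duration": mean(sum(b["steps"].values()) for b in builds),
        "longest_step": max(step_totals, key=step_totals.get),
        "most_common_error": error_counts.most_common(1)[0][0] if error_counts else None,
    }


# Hypothetical build records: per-step durations in seconds and an optional error label.
builds = [
    {"job": "job-a", "steps": {"checkout": 2, "build": 4, "push": 6}, "error": None},
    {"job": "job-b", "steps": {"checkout": 2}, "error": "wrong repo"},
    {"job": "job-c", "steps": {"checkout": 1, "build": 3}, "error": "wrong repo"},
]

summary = summarize_builds(builds)
print(summary)
```

Widening the time window, as done in the demo, just means feeding more build records into the same aggregation.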
You will see all these details coming up out of the box. I'll end my session on observability in CI/CD pipelines here. I hope you have enjoyed this session, and please give it a try. If you face any problems, feel free to reach out to me on my LinkedIn. Thank you. Have a nice day. Happy learning.

Siddhartha Khare

Technical Account Manager @ New Relic

