Transcript
            
            
              This transcript was autogenerated. To make changes, submit a PR.
            
            
            
            
              Monitoring and observability. It's obviously incredibly crucial
            
            
            
              whether you're on-prem, in the cloud, on Kubernetes,
            
            
            
              standard containers, wherever you're running, you need to
            
            
            
              understand what's happening in your environment, whether it's monitoring,
            
            
            
              so graphs, alerting, seeing everything on a screen,
            
            
            
              understanding what's happening from that perspective, or observability,
            
            
            
              which is more around the idea of taking action.
            
            
            
              My name is Michael Levan. Welcome to my session at Conf42.
            
            
            
              We're going to dive into a bunch of hands-on stuff,
            
            
            
              but primarily I'm going to show two different realms of focus:
            
            
            
              the homegrown or open source style solutions and
            
            
            
              the enterprise solutions. We're going to walk through installing both.
            
            
            
              We're going to do Datadog and kube-prometheus, and we're going to talk about
            
            
            
              the differences, which should hopefully help you decide which one you're going to go with in
            
            
            
              your organization. Let's go ahead and jump right in. Let's start by
            
            
            
              diving into kube-prometheus. Alright, so I'm going
            
            
            
              to open up my terminal, do a quick kubectl get nodes. Here
            
            
            
              we can see I am on an EKS cluster.
            
            
            
              It's not going to matter, though. If you're on AKS, GKE, or
            
            
            
              on-prem, all these steps should be relatively similar.
            
            
            
              Okay, so the first thing that you're going to do, you're going to want to
            
            
            
              add the Helm chart for kube-prometheus. I essentially do everything
            
            
            
              via Helm chart. Why? It's a great package
            
            
            
              manager. It's much better than just going and calling out to a bunch of Kubernetes
            
            
            
              manifests. And instead of, again, using five, six, seven different
            
            
            
              Kubernetes manifests, everything is under one roof. So I typically
            
            
            
              go with Helm charts. Next, I'm going to go ahead and update
            
            
            
              the repo. Once that's done, we will install
            
            
            
              kube-prometheus. Okay, now as the name
            
            
            
              suggests, kube-prometheus is going to be a combination
            
            
            
              of Prometheus and Grafana.
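
The add, update, and install steps described here could look roughly like the following. This is a minimal sketch, not the exact commands from the session: the chart is assumed to be `kube-prometheus-stack` from the `prometheus-community` repository, and the release name `kube-prometheus` and the function name `install_kube_prometheus` are illustrative. The `monitoring` namespace matches the one mentioned later in the talk.

```shell
# Sketch only: the release name "kube-prometheus" is an assumption;
# adjust it and the namespace to suit your cluster.
install_kube_prometheus() {
  # Register the community chart repository and refresh the local index.
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  helm repo update

  # Install Prometheus, Grafana, and the bundled Kubernetes dashboards.
  helm install kube-prometheus prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --create-namespace

  # List the pods in the namespace to check that they are starting.
  kubectl get pods --namespace monitoring
}
```

Nothing runs until `install_kube_prometheus` is called, so the function can be pasted into a terminal and reviewed first.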
            
            
            
              Can you install these separately? Absolutely. But the reason why I
            
            
            
              actually like to do it together is because kube-prometheus gives
            
            
            
              you a bunch of dashboards out of the box that are all Kubernetes
            
            
            
              related. So let's say I just install Grafana and Prometheus separately.
            
            
            
              I'm not going to have any dashboards, but if I install kube-prometheus,
            
            
            
              it comes, again, preinstalled with all these different Kubernetes dashboards,
            
            
            
              which we'll go ahead and take a look at in a second. And once
            
            
            
              this is installed, it takes of course a little bit because there's
            
            
            
              a bunch of different pods that need to come up. You can forward and look
            
            
            
              at Prometheus via port forwarding, or you can just go ahead
            
            
            
              and hit Grafana, right? So let's go ahead and do
            
            
            
              this. That way we can get a nice visual,
            
            
            
              right? And then let me go ahead and open up a web browser.
            
            
            
              Web browser is up. We can see that here.
            
            
            
              I'm going to go ahead and just take a look here. To log
            
            
            
              in, the default username is admin, password is prom-operator.
            
            
            
              So admin, prom-hyphen-operator.
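
Reaching Grafana through a port forward might look like the sketch below. It assumes the chart was installed as a release named `kube-prometheus` in the `monitoring` namespace, in which case the Grafana Service is typically named `kube-prometheus-grafana` and listens on port 80. Both the Service name and the local port 3000 are assumptions, so confirm them with `kubectl get svc --namespace monitoring`. The function name `open_grafana` is illustrative.

```shell
# Sketch only: the Service name depends on the Helm release name you chose.
open_grafana() {
  # Forward local port 3000 to port 80 on the Grafana Service,
  # then browse to http://localhost:3000 and log in.
  kubectl port-forward --namespace monitoring svc/kube-prometheus-grafana 3000:80
}
```

The command stays in the foreground while the forward is active; stop it with Ctrl+C when you are done.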
            
            
            
              And now we're logged in. So what I was referring to
            
            
            
              before, if I go to dashboards,
            
            
            
              notice here how I have all these different Kubernetes dashboards.
            
            
            
              You will not have this by default. And of
            
            
            
              course if you want to, you can import a new one. So for example,
            
            
            
              if we just take a look here,
            
            
            
              we have the Argo CD dashboard, for example. So what I can do is I
            
            
            
              can actually copy the ID, go back,
            
            
            
              New, Import, paste that
            
            
            
              ID in, Load, you can see it is in fact Argo CD,
            
            
            
              Import, and then boom, we have the dashboard. So it's pretty
            
            
            
              straightforward. You can also write your own dashboards. I believe they're still written in
            
            
            
              Python, at least they used to be, but nonetheless you can create your own.
            
            
            
              But there are a lot out there already, so don't reinvent the wheel if you
            
            
            
              don't have to. But if I go back to dashboards here and let's
            
            
            
              say I click on Kubernetes API server. Now, I haven't made
            
            
            
              any requests or anything to this, so it's probably not the best,
            
            
            
              but we can see here again, another dashboard, compute resources,
            
            
            
              some CPU information, some memory information, et cetera. But point
            
            
            
              being is we can see the dashboards work and then if we
            
            
            
              want to, we can get alerting on various dashboards and all that
            
            
            
              fun stuff. So this is the monitoring piece,
            
            
            
              and if you want the full observability stack for logs,
            
            
            
              traces, metrics, you're gonna have to do Prometheus, which is already here,
            
            
            
              and then Tempo and Loki for traces and
            
            
            
              logging, and then you'll have the full monitoring and observability stack.
            
            
            
              But there are a couple things here, and it's not necessarily a
            
            
            
              bad thing, it's just you got to kind of figure out what
            
            
            
              option you want. So this is the homegrown solution. This is open source.
            
            
            
              I'm not paying for anything, okay? But I actually am,
            
            
            
              right? I'm paying for engineers to manage it, I'm paying for infrastructure,
            
            
            
              because this has to run somewhere, so there are still costs. And again,
            
            
            
              this isn't a bad thing. It's just all going to be dependent on your organization.
            
            
            
              If you're a startup, for example, and everybody's already working
            
            
            
              13 hours a day, adding another tool
            
            
            
              may not be the best method. Or maybe it is, again, depending on
            
            
            
              how the organization is structured. So let's say you
            
            
            
              want all these tools, monitoring and observability and even APM
            
            
            
              and alerting and a bunch of other stuff under one roof. Maybe it's
            
            
            
              a SaaS so you don't have to manage the infrastructure or anything like that.
            
            
            
              You probably want to look at an enterprise paid solution.
            
            
            
              Okay. And that's kind of what we can get with Datadog.
            
            
            
              Now, with Datadog, again, we get everything under one
            
            
            
              roof, metrics, logs, full monitors,
            
            
            
              service management, infrastructure management, APM, all of it.
            
            
            
              All we have to do for this is if I go under my
            
            
            
              and I click on API keys, right? I'm going to have an
            
            
            
              API key here. I'll go ahead and I'll just
            
            
            
              create a new one. We'll just call it conf42,
            
            
            
              Create Key, right? And then now I have this API key.
            
            
            
              So if I copy it, I'm going to head back over to VS Code.
            
            
            
              Okay. And I'll just open up a new terminal here
            
            
            
              and I'm going to paste in that API key, my cluster name,
            
            
            
              ks. Quick start. Okay,
            
            
            
              first thing you're going to want to do: you're going to want to sign up for
            
            
            
              Datadog. It's free to sign up. You're not going to be paying for anything.
            
            
            
              I've been doing demos on Datadog for a long time now and
            
            
            
              haven't got a bill because I just delete my stuff right away. Okay.
            
            
            
              But I'm going to set these environment variables.
            
            
            
              I'm going to use Helm. Okay. So if you don't have the Datadog Helm
            
            
            
              chart, you're going to want to add it and update it. And then I'm going
            
            
            
              to use this fairly large Helm installation.
            
            
            
              And the reason why is because this sets us up for high
            
            
            
              availability. So we're going to see, you know, multiple replicas,
            
            
            
              kube-state-metrics is enabled, we're enabling logging,
            
            
            
              we're enabling all the logs for the containers.
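
The exact command is not captured in this transcript, so the following is only a sketch of what a Datadog Helm install with those options could look like. The release name `datadog-agent`, the function name `install_datadog`, the environment variable names `DD_API_KEY` and `DD_CLUSTER_NAME`, and the specific `--set` keys chosen to represent "multiple replicas", kube-state-metrics, and container logging are all assumptions; check them against the values file of the Datadog chart version you install.

```shell
# Sketch only: release name, variable names, and chosen values are assumptions.
# DD_API_KEY and DD_CLUSTER_NAME must be set before calling the function.
install_datadog() {
  # Register the Datadog chart repository and refresh the local index.
  helm repo add datadog https://helm.datadoghq.com
  helm repo update

  # Install the agent with two Cluster Agent replicas, kube-state-metrics
  # collection, and log collection for all containers.
  helm install datadog-agent datadog/datadog \
    --set datadog.apiKey="${DD_API_KEY:?set DD_API_KEY first}" \
    --set datadog.clusterName="${DD_CLUSTER_NAME:?set DD_CLUSTER_NAME first}" \
    --set clusterAgent.replicas=2 \
    --set clusterAgent.createPodDisruptionBudget=true \
    --set datadog.kubeStateMetricsCore.enabled=true \
    --set datadog.logs.enabled=true \
    --set datadog.logs.containerCollectAll=true
}
```

Export the two variables (the API key copied from the Datadog UI and your cluster name), then call `install_datadog`. Passing the key on the command line keeps the sketch short; a Kubernetes Secret is the safer choice outside a demo.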
            
            
            
              So let's go ahead and run this and
            
            
            
              it may take maybe two to three minutes to actually see
            
            
            
              all the information within your environment.
            
            
            
              Right? So if I head back over here, I click finish,
            
            
            
              I'm going to go to dashboard. Oops,
            
            
            
              sorry, Infrastructure and Kubernetes Explorer.
            
            
            
              Okay. And we can actually see all this stuff in here right
            
            
            
              away, but I want to click on one other.
            
            
            
              Let's see, Kubernetes Overview. Okay, here it is. So if
            
            
            
              I check here, I can see my cluster, I can see
            
            
            
              all my namespaces. See the monitoring namespace, right. Because we deployed
            
            
            
              kube-prometheus. And then if I click on Explore,
            
            
            
              I can see everything running here. So if I
            
            
            
              look into one of these pods, maybe, you know, one of the kube-prometheus
            
            
            
              pods, we can see the cluster it's on, the service
            
            
            
              that it's in, well, in fact, the monitoring namespace,
            
            
            
              the host, the deployment, replica sets, IPs, everything. We can see
            
            
            
              everything here, even the metadata. Okay. We can see
            
            
            
              any related resources, which is actually really cool. It's a little
            
            
            
              graph here that we can see. Right. Troubleshooter.
            
            
            
              I don't think we have anything on. Status is ready. Alright, so we're
            
            
            
              good to go here. So we have the pod phase,
            
            
            
              which is actually nice. We get a little bit of different information
            
            
            
              here, logs, if we turn them on.
            
            
            
              So any logs that are coming in through the pod,
            
            
            
              okay, metrics, et cetera. So point
            
            
            
              being is this, we have everything under one
            
            
            
              roof. Of course if we install it, we have to install different things for trace
            
            
            
              and stuff. But everything is under one roof.
            
            
            
              Okay. So we can dive down. We also have a visual of
            
            
            
              this, right? So we dive down, we see our clusters
            
            
            
              running, we see our namespaces, see all of our workloads.
            
            
            
              Okay. We see our networking.
            
            
            
              And this is really solid. Now, Datadog is expensive,
            
            
            
              don't get me wrong. But again, this is a good
            
            
            
              implementation. If you want that enterprise,
            
            
            
              I don't even want to say enterprise grade feel because you can get the same
            
            
            
              feel from Grafana and the Prometheus stack.
            
            
            
              But if you want that SaaS based solution that's set up for you,
            
            
            
              you just have to run a couple of installations or even just one.
            
            
            
              You've got support behind you, all that. Datadog is
            
            
            
              a great implementation. Again, just keep in mind,
            
            
            
              you know, never think that you're not paying because I
            
            
            
              know a lot of people go open source because they don't want to pay.
            
            
            
              Either way you're paying. You're either paying engineers to manage it
            
            
            
              and the infrastructure to run it on, or you're paying a SaaS solution.
            
            
            
              It's really going to be up to you at the end of the day.
            
            
            
              Thank you so much for joining me for the session. Really do appreciate it.
            
            
            
              Hope that you enjoyed it.