Conf42 DevSecOps 2022 - Online

Debugging Schrodinger's App


Abstract

If a deployed app error occurs, but is not observed does it happen? We’ve all been here before, apps that have been deployed but not optimised not only affect performance but can also cost you resources and money. Come with me on a journey, looking at some ways to debug and optimize NodeJS apps.

Summary

  • Steve is a senior developer advocate at Lumigo. Please drop any comments, questions and emojis into the chat, if there is one. I always love connecting to folks and geeking out.
  • This is an open source joke, so please make sure you share it amongst everyone. There's a couple that will be coming up during this presentation. I always love opening with them though, because it's important that we all keep smiling.
  • DeveloperSteve Coochin is the senior developer advocate at Lumigo. He says he's been a developer advocate for many, many years now. He's also worked in digital agencies, which he describes as "organized chaos". He says scaling applications is fundamental in the early stages of any application.
  • Python is great for game development, application development and heavy lifting. It can handle a whole bunch of tasks that you can throw at it, particularly in the API space. Shout out to all the folks that help maintain and contribute to those too. Please contribute back wherever you can to keep all our applications happy.
  • Application deployment doesn't stop at that initial deployment. Monitoring, tracing and observing applications beyond deployment is equally important. There are a number of ways to identify and spot issues inside code.
  • Schrodinger's app simulates errors so we can see and understand as developers how you can identify and trace and monitor. Essentially, the app itself is built using Flask. Always looking for contributors too.
  • Tracing and monitoring are important because they let you identify this type of issue, such as a status that should have been thrown but wasn't. As a deployed application, this helps me understand how my application will handle these errors. We'll look at how that works on the cloud side of things in a minute.
  • Having that one error occur locally is completely different to having it happen on not only cloud infrastructure, but deployed cloud native application infrastructure. This is where distributed tracing is able to help. It simplifies the mapping of your application and its footprint across that cloud native service.
  • OpenTelemetry has been around since 2019 and is part of the Cloud Native Computing Foundation. It takes a vendor neutral approach to observability across application metrics and frameworks, with that whole community, industry standard approach. The OpenTelemetry community is always looking for contributors to the project.
  • Because of the industry standard approach of OpenTelemetry, the tracers are super easy to configure and install. With Python, you just use pip to install the tracer library, then configure some environment variables. The second demo shows how SQS can fit into tracing to give you a better view of what your application is doing.
  • The Lumigo OpenTelemetry tracer runs inside the ECS application. The library then adds additional trace information as the application runs. The app can appear to be working on the front end while you don't necessarily know what it's doing on the back end, because it's a distributed cloud application.
  • Always be building for scale or aps as I like to think of it from the initial onset. Make sure you trace and monitor everything you possibly can to make sure everything's working as it potentially should. Always remember to use your tech superpowers for good and be excellent to each other.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and thank you all for watching my session on debugging Schrodinger's app at Conf42. It's so amazing to be here, and I look forward to geeking out with you all today. My name is DeveloperSteve. I'm one of the senior developer advocates at Lumigo. I'm going to talk more about my background in a moment, but first, some housekeeping. First of all, hello from the past. Hope you are all well. Also, as we go through the presentation, please drop any comments, questions and emojis into the chat, if there is one, or reach out to me on social media. I always love connecting to folks and geeking out. With that in mind, I do have one thing to cover before we get underway. I have a disclaimer, and it has an asterisk, which makes it even more of a disclaimer. The disclaimer is that I love tech jokes and I have a whole bunch of them. There's a couple that will be coming up during this presentation. I always love opening with them, though, because it's important that we all keep smiling. This is an open source joke, so please make sure you share it amongst everyone. How do fallen trees check for errors? Thinking music... Via log files. There we go. I didn't say they were good jokes, but all the same, please share amongst the community, and with anyone you know who just needs a smile at, well, any time at all. You don't even need a reason. Anyway, hello again, my name is DeveloperSteve. I'm the senior developer advocate at Lumigo. I've been a developer advocate for many, many years now, and writing code for three times as long as that. No, twice as long as that. I'll have to do my math. Anyway, for a long time, let's say. But funny story, I've been doing developer advocacy. That's not a funny story, but it kind of is, because I have a whole bunch of tech jokes. Anyway, I've been a developer advocate for many, many years now.
As such, I've been fortunate enough to connect to many tech communities throughout the world. One of my favorite things is just being able to geek out, learn new things, share what I know, and help the communities with all the amazing work that they do. I've done loads and loads of events through that as well. Over the years, people started calling me DeveloperSteve. I've been using it on social media for quite some time now, so much so that when I got married in 2018 and we combined our surnames (Coochin is my married name) and I legally changed my name, I thought, everyone's just been calling me DeveloperSteve; I wonder if I can legally change my name to that. Turns out you can, and I did, and it's now my middle name. So, yeah, there you go. That was the funny story. But like I said, I've been coding for many, many years now, back to the days of QBasic, for those that remember it, on an Atari 800XL. I taught myself QBasic, which, as the name suggests, was fairly basic, but being relatively new to code myself, it did take some time to understand the fundamentals of how the language worked and what could be done with it. Just as an FYI for those that haven't encountered it before, you needed a lot of BASIC to do, well, anything basic, which I guess is why it's called that, right? Humble beginnings of the industry and also of my DeveloperSteve Coochin story. Over the years, I've gone on to work through a number of digital agencies, which I always loved because it's literally organized chaos. You have to take all sorts of requirements in zero time, build out all sorts of applications in zero time as well, and then support them beyond deployment. Whole other story and whole other talk right there.
Shout out to anyone that is in the digital agency world, because it is literally organized chaos sometimes. Hopefully that is not you. But one thing I took away from many of those projects is that scaling applications is fundamental in the very early stages of any application. You might be building that application for only ten users now, but you have to build small now with big ideas for later. By that I mean you have to build out your application so that it can scale as adoption grows. If you think about it, that application you're deploying now, or that project you're starting now with that idea, or that very early stage client project, has to be built for scale as the application requirements grow and as its user base grows. That, in a sense, is tech debt in a nutshell right there. Whether you pick the application up as a new dev or go back to a project that you've already started building, building with that scalability in mind is fundamental to the application's longevity and to avoiding that whole tech debt, or, as I like to think of it, future proofing for your future self. Now, the flip side of this, and we've all been here too, is dealing with what I call the Friday night rule. The Friday night rule is something that came about from all the hackathons I've done over the years, where developers and teams ask which language they should build a particular function or idea in.
For me, going back to digital agency days in particular, the Friday night rule is this: you might be halfway through a game you're playing, and all of a sudden you get that alert saying the application's down, or there's a problem with that deployed thing and it needs to be fixed ASAP. Being able to avoid that is the fundamental goal when it comes to deploying applications and keeping them stable and, more importantly, keeping our end users happy as well. Going back to the Friday night rule and the question of which language to use, I've always loved Python for many reasons, but in particular for its versatility and heavy lifting. I've used it previously for all of these things and a whole bunch more: game development, application development, and of course being able to build things out super quickly on a very robust and mature ecosystem of frameworks, tools and libraries. Shout out to all the folks that help maintain and contribute to those too. And please contribute back wherever you can, because contributing back keeps all our applications happy and, well, our end users happy too. That's always important. Anyway, game development is definitely one of the things I've used Python for previously, and API development as well. Given Python's versatility and its being a heavy lifting language (well, I consider it to be heavy lifting), it can handle a whole bunch of tasks that you can throw at it, particularly in the API space. And I do love APIs. That's a whole other talk on its own. Speaking volumes to its heavy lifting nature is data science, where it's very commonly used for any sort of analytical heavy lifting. I previously spent ten years as a data analyst.
So Python was definitely one of the go-to tools in my toolkit for building out any manner of reporting, and also for analytical understanding of huge data sets. It's always great for web, of course. Shout out to Django, and to one of the application components we're going to be looking at today, Flask, for light demos and light application building. Flask is one of my all-time favorites. There's MicroPython for IoT use as well, which is really cool. Actually, here's something I've used it for previously. This one time I bought a new coffee table, looked at it and thought, wow, that looks like a really big iPad; I should put 300 addressable LEDs in it. And so I did, using MicroPython on an ESP32, and you can change the color of the lights, which is amazingly cool. You can use Python on the back end of this too, to do all the amazing coloring that you can see in this GIF here, which is always a fun project to do. Anyway, going back to building for scalability, something I've been thinking about a lot lately is how application deployment doesn't stop at that initial deployment. As developers, DevOps engineers and technologists, monitoring, tracing and observing applications beyond deployment is equally important. Think of Schrodinger's rules of observability, for example, which is essentially a thought experiment around quantum superposition: if you aren't observing something, it is both happening and not happening at the same time, because that particular element isn't being observed. If we apply that same logic to application development, then if you aren't observing and monitoring that deployed application, it both has errors and doesn't have errors at the same time, because you're not observing it to know that it's not having errors.
So therefore it must also have errors too, which always hurts my head to think about, but you know what I mean. Unless you're observing these applications beyond deployment, you don't know whether an error is being thrown, and perhaps a user on the other side of that error assumes that's the way the application is supposed to work. As an end user of multiple applications, it's something I see all the time: there might be a little button that you click, a certain thing happens quickly, and then you're redirected to another screen. As a developer, you can spot that and go, oh, an error is being thrown there; perhaps they don't know that is actually occurring. These are the types of instances where observability, monitoring and tracing help you identify issues beyond deployment, so that if those things are happening, you can cater for them in the application and continue to refine your deployed app to keep your users happy and the application happy too. Additionally, that error could be using resources, which is essentially costing you money as well, so it's always something to be mindful of. Of course, when building these apps locally, there's a number of ways to identify and spot issues inside code, and as devs we do it all the time. With core Python, for example, we've got things like print, logging, warnings and pdb, which you can use to output messages or set breakpoints at certain parts of the code to identify issues as they're occurring. There's also a multitude of libraries, like pprint, which I always love for more extensive output, to find issues before my users find them and before they surface in production, because nobody wants that, particularly those of us maintaining and deploying said applications.
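A minimal sketch of the local debugging tools mentioned above (logging, warnings, pprint and pdb), using only the standard library. The to-do data and the `mark_finished` helper are hypothetical and are not taken from the demo app.

```python
import logging
import pprint
import warnings

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("todo-demo")  # hypothetical logger name


def mark_finished(todos, todo_id):
    """Flip the 'finished' flag on one to-do item and log what happened."""
    # pprint gives readable output for nested structures in debug logs.
    log.debug("todos before update:\n%s", pprint.pformat(todos))
    for todo in todos:
        if todo["id"] == todo_id:
            todo["finished"] = not todo["finished"]
            log.info("toggled todo %s -> finished=%s", todo_id, todo["finished"])
            return todo
    # Not fatal, but worth surfacing before a user runs into it.
    warnings.warn(f"todo {todo_id} not found", RuntimeWarning)
    return None


todos = [{"id": 1, "title": "feed cat", "finished": False}]
updated = mark_finished(todos, 1)
missing = mark_finished(todos, 99)
# For an interactive look at state, a call to breakpoint() drops into pdb.
```

Logging and warnings stay in the code after the bug is found, which is what makes them useful once the app is deployed; print and pdb are better suited to one-off local inspection.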
Of course, our IDEs often use those same methods to surface issues as you're building these applications. VS Code and Eclipse, for example, have built-in mechanisms to identify this sort of stuff. So that brings us to the first demo, because I thought we'd look at some ways to output and identify issues. That's essentially what Schrodinger's app, a Python app, does: it simulates these errors so we can see and understand as developers how you can identify, trace and monitor them, and keep your app nice and happy and healthy. It also brings us to another tech joke, and this is a Python one that you can see on screen: what do you call eight bits in Python? A snake byte. But yeah, I'll have the GitHub link for this application at the end of the talk so you can try it for yourself. I'm always looking for contributors too, so if you've got something you want added to the application, please open a pull request, and by all means try it for yourself. There's two demos we'll be doing today. Spoilers: the first one I'm running locally to look at some of the output from the application, and the second one I've containerized and deployed to ECS, but we'll get into that in a little bit. Essentially, the app itself is built using Flask. I like Flask. I need to make that a meme. Flask is great; shout out to that community. I'm using SQLAlchemy on the back end to do some very basic databasing. I didn't have a lot of complex database needs for it, so it's pretty light. Then there's a handful of routes to get us started. Actually there's a few other ones, but these are the basic ones. This is a to-do application, so I can enter to-do items and then interact with them really easily using some basic Flask routes: get a list of to-dos, post a new to-do, update and delete as well. That's the fundamentals of it.
There's a couple of other fun ones that I've thrown in to do more of that testing, and to understand how errors are handled not only by the infrastructure the application is running on but also by the application itself, and how to spot and identify these errors and warnings as they're flagged. This brings us to the first demo. This is the application running locally. I actually don't have it running at the moment. There we go. Kick that off. Let's make sure. Yes, all running. Okay, so this is the app. Like I said, there's a bunch of really basic routes, like add, which is handled through a POST, and the basic GET, which will get the list of to-dos from the database. You can also see the output of all the routes being called and interacted with, which is super important too, particularly in local development. We'll look at how that works on the cloud side of things in a minute. If I create my first to-do, you can see in the terminal window that it's showing the route being called and the response from it as well. From the Flask server that's running, you can see HTTP status 200, so everything's okay. We can interact with that in a number of ways. There's an update route, which changes a status flag on that particular to-do entry in the database. As you can see, it toggles between finished and not finished. And of course we can delete it as well. Now, I do have some fun, special to-dos built into this for testing. Playing on the Schrodinger's cat paradigm, I have cat as a special task item. You can see the cat button has now appeared as part of the cat entry listing. If I click that, it will start to incrementally go through the 400-series HTTP statuses.
So that should have thrown a 400. If I click it again, it will start to iterate through different HTTP statuses. Why is that not... There we go. Now it's throwing a 400. I don't know why; I think there was a redirect stuck there. Anyway, this is why tracing and monitoring are important, because you're able to identify this type of stuff that should have been thrown. Yeah, there's a 401. All right, now it's working. See, I can use tracing to delve into that a little bit further. You can see now there's a 402, there's a 403, so it's just going to incrementally step through those. As a deployed application, this helps me understand not only how my application will handle these errors, but how they appear in the infrastructure that I'm deployed on as well. In this setup it's fairly easy because, well, it's local and I can see what's going on. The other one I can do is HTTPstat, and I'm going to use HTTP status 418, one of my favorites, rarely used other than in applications like this. 418 is I'm a teapot, which is totally one of my favorites because it doesn't really mean anything but it's fun for testing. You can see that I was able to throw HTTP status 418, and I knew what was causing it because, well, I caused it to happen, which makes it fairly easy. So, first demo works. Hurrah. Well, it didn't work, because it broke, but then it somehow fixed itself. I'm pretty sure that was a caching issue inside the browser. Let's just try that quickly again. 401. Yeah, see, now it's working. I reckon that was a caching issue, because I hit a particular HTTP status which was one of the non-cached or ignored ones. So anyway, fixed now. Hurrah. See, tracing: so easy, so helpful. Anyway, I'll switch back to the deck.
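The cat button's behaviour in the demo above, stepping through 400, 401, 402, 403 and so on with each click, can be sketched with the standard library alone. This is an illustration of the idea under my own assumptions, not the demo app's actual code; `status_cycler` and `describe` are hypothetical helper names.

```python
from http import HTTPStatus
from itertools import cycle

# Every 4xx code the standard library knows about, in ascending order.
CLIENT_ERRORS = sorted(s.value for s in HTTPStatus if 400 <= s.value < 500)


def status_cycler():
    """Return an iterator yielding 400, 401, 402, ... and wrapping around."""
    return cycle(CLIENT_ERRORS)


def describe(code):
    """Build a 'code phrase' label, e.g. for a log line or response body."""
    return f"{code} {HTTPStatus(code).phrase}"


clicks = status_cycler()
first_four = [next(clicks) for _ in range(4)]  # one status per simulated click
teapot = describe(418)  # 418 is in HTTPStatus from Python 3.9 onwards
```

In a Flask route, the value from `next(clicks)` could be returned as the response status, so each request surfaces a different client error for the tracing tools to pick up.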
So that was working with the application locally. But of course deploying to a cloud is a whole other story, because once that application is deployed, you may not be able to see what is happening behind the scenes as easily as we were able to there. Think of all those times I've dug through so many server logs to find that one needle in the haystack and figure out what's going on with my application, and the amount of hair pulled doing that whole exercise. Special mention here, too, to all those times where you've got an error appearing in a deployed application, and you come back, even to yourself, to that whole "but it was working on my machine." I've been here so many times. The error we just had happen may have been a happy accident, but having that one error occur locally is completely different to having it happen on not only cloud infrastructure, but deployed cloud native application infrastructure. By that I mean a cloud has a multitude of services that your application will use as part of a holistic application deployment. You might have multiple services connected now that are working completely fine, or at least, behind the scenes, seemingly aren't throwing any errors. But as we all know, as the industry changes and continues to grow, a lot of these services not only get new functions and new approaches to how they work, but may change without you realizing and potentially cause issues with your deployed application. Again, going back to the Friday night rule, where you're half a game in and that message comes through on the messaging app saying, hey, something's down, something's not right, something's broken. Murphy's law suggests that it's probably going to happen at the most inconvenient time possible, either half a game in on a Friday night or at 2:00 a.m. Granted, it's probably going to be both, maybe, as well.
We've all been there; we've all had that happen. This is where distributed tracing is able to help, because you can take the multitude of services being consumed by your application, or running and empowering your application, and get a nice, holistic, traceable map of how transactions interact with each of those different services. If you think about it as navigating that maze of resource usage across a single transaction, it simplifies the mapping of your application and its footprint across that cloud native service. What's really important and really fun here is that the one error that appears at the most inconvenient Murphy's law moment can then be traced really easily as part of your holistic application map, which is really, really cool. Additionally, thanks to some industry standard approaches to tracing, monitoring and logging, you're now able to do that agentlessly as well. By that I mean, and we've all been here too, you don't need to run an additional service on server or cloud infrastructure anymore to monitor and map the full end-to-end tracing of those transactions across multiple cloud native application services. So you're able to get that holistic application map with an agentless approach and get that nice cloud native application footprint. Yes, agentless.
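To make the idea of a traceable map concrete, here is a toy data model: every span created within one transaction shares a trace ID, and parent links give the service-to-service edges. This is a sketch under my own assumptions, not the API of any real tracing library; the `Span` class, the helper functions and the route names are all hypothetical.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str           # shared by every span in one transaction
    service: str            # which service did the work
    operation: str          # what it was asked to do
    parent: "Span | None" = None
    status: int = 200


def start_span(service, operation, parent=None, status=200):
    """Create a span, inheriting the parent's trace ID when there is one."""
    trace_id = parent.trace_id if parent else uuid.uuid4().hex
    return Span(trace_id, service, operation, parent, status)


def service_map(spans):
    """Return the caller -> callee edges seen across the given spans."""
    return [(s.parent.service, s.service) for s in spans if s.parent]


# One transaction that crosses two services, plus one failing request.
root = start_span("flask-app", "POST /meow")
queue = start_span("sqs", "SendMessage", parent=root)
cat = start_span("flask-app", "GET /cat", status=401)

spans = [root, queue, cat]
edges = service_map(spans)
errors = [s for s in spans if s.status >= 400]
```

Because the failing span carries its own trace ID and status, the error can be found by filtering spans rather than by searching through logs on each service.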
This is in part due to, like I was saying, the industry standard approach called OpenTelemetry. OpenTelemetry has not only been around since 2019, but it's actually part of the Cloud Native Computing Foundation as well, which is really cool, because it takes a vendor neutral, open source approach to observability across application metrics and frameworks, and that whole community, industry standard approach is fundamentally great because it means you don't need to get vendor locked in with your observability solution. That is super important, because friends don't let friends get vendor locked in. Vendor neutral for the win. I won't go into this too much, because OpenTelemetry is quite extensive. The OpenTelemetry group has an amazing community, some amazing write-ups and a multitude of blog posts, and they're always looking for contributors to the project as well. They actually have a Slack group too, for anyone looking to get involved or to connect to the community. Essentially, with OpenTelemetry, even the instrumentation libraries that connect into your application are all vendor neutral, all industry standard. So you can take these and connect to whatever monitoring solution you want, including some open source ones, or even some auto-tracing, automagical ones that we'll be looking at in a little bit. It takes that industry standard approach across multiple frameworks and multiple languages and standardizes it, which is great.
I'm saying that as somebody who has not only run and deployed a number of services and cloud native applications over the years, but has also spent many hours delving through a multitude of logs looking for that one error that sometimes you just don't find, and you still have to figure out what your application is doing or was trying to do. Special mention here, too, and I've already mentioned it once, but you can never say it enough times: this is completely the OpenTelemetry community, all open source, including all the amazing instrumentations you can see on the screen here. So please make sure you are contributing back when you can as well. At Lumigo, one thing we always do is not only try to contribute back where we can to the OpenTelemetry community, but also support a number of our own open source OpenTelemetry tracers, which sometimes you need alongside auto-instrumentation, depending on how you're tracing your particular application. There are two languages I wanted to mention here: Python, of course, because we're at a Python conference, and then there's this other language which I won't talk about. Both of these tracers are completely open source. We're always looking for contributions and ideas on how we can build these out and make them not only more robust, but a lot easier to use. We're going to be looking at one of those in a moment, because they're really easy to set up and very easy to deploy. Just quickly on that too, going back to the previous slide where I said please contribute where you can: I live in rural Australia and I have sheep, so I affectionately named one Lambda recently. I've always wanted to call a lamb that, and I now have a lamb. Lambda the lamb thanks you in advance for contributing stars. And this is me holding said Lambda. Oh, isn't he cute?
He's actually a lot heavier now. He was only a couple of months old there, and he's about seven months old now, so I'm probably not picking him up anymore. This is what happens when your lambdas put on too much weight. There's probably a whole other joke there. Because of the industry standard approach of OpenTelemetry, and again going back to the vendor neutral approach, these tracers are super easy to configure and install. With Python, you just use pip to install the tracer library and drop a reference into your code. I'm going to show you what that is in a moment. Then configure some environment variables, because friends don't let friends hard code things that can be environment variables, namely OTEL_SERVICE_NAME and the Lumigo tracer token value. For this version of the demo, I've taken the same application, containerized it and put it into ECS. I've also got a different to-do command set up to demonstrate interaction with SQS, Simple Queue Service, and how that can fit into tracing to give you a better view of what your application is doing. With that in mind, let's take a look at a second demo. Like I said, I'm using the Lumigo OpenTelemetry tracer, which you can see has been imported there. Alongside that I've got some environment variables set in this next demo, namely a whole bunch of access keys and secrets, the region name and the send queue URL for the AWS SQS service, which is part of this application. Of course, I've also made sure inside the app that I can handle those values not being set. We'll have the link at the end, so stay tuned for that one and you can try it yourself. That's basically it. Just to be clear, for the Lumigo tracer, the OpenTelemetry tracer that I'm using here, that is it. Other than importing the library, I'm not adding any additional code, because I don't need to. That's pretty much it.
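The environment-variable setup described above, including handling values that are not set, might look something like the following sketch. `OTEL_SERVICE_NAME` is the standard OpenTelemetry variable; `LUMIGO_TRACER_TOKEN` is my assumed spelling of the Lumigo token variable, and `SEND_QUEUE_URL`, the default service name and the `TracerConfig` class are hypothetical names chosen for illustration.

```python
import os
from dataclasses import dataclass


@dataclass
class TracerConfig:
    service_name: str
    token: "str | None"
    queue_url: "str | None"

    @property
    def tracing_enabled(self):
        """Tracing only makes sense when a token has been provided."""
        return self.token is not None


def load_config(env=os.environ):
    """Read settings from the environment, tolerating missing values."""
    return TracerConfig(
        # Standard OpenTelemetry variable; the fallback name is hypothetical.
        service_name=env.get("OTEL_SERVICE_NAME", "schrodingers-app"),
        # Assumed variable name; empty strings are treated as unset.
        token=env.get("LUMIGO_TRACER_TOKEN") or None,
        # Hypothetical variable name for the SQS queue URL.
        queue_url=env.get("SEND_QUEUE_URL") or None,
    )


# Passing a plain dict keeps the example independent of the real environment.
config = load_config({"OTEL_SERVICE_NAME": "todo-demo"})
```

Accepting the environment as a parameter makes the missing-value paths easy to exercise locally, before the container is deployed with the real values.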
It just runs inside the application and sends all those traces through. So this is the ECS application I have running, the containerized application. It's pretty much the same as we saw before. I'll just refresh that so I can send through a bunch of basic path invocations, which will then get surfaced inside our tracing service. I have free tier Lumigo running here and, as you can see, it's already been tracing. I'm just going to refresh that screen. It's already been tracing the cluster and the app that I've got deployed there by default. All I've had to do for this monitoring to happen is connect the two platforms, which takes a second; it's a couple of screens to go through as part of the free tier setup. What the library then does is add additional trace information as the application runs. I can click through to the application screen and see some more details about the cluster that's running. But then if I click on see traces, because I've got that tracer library running, I'll be able to see more detailed information about what's happening behind the scenes. You can see the routes that I just called are already creating invocation data that surfaces inside the OpenTelemetry monitoring. Now I can level that up a little bit more by using one of the functions that we were looking at in the last demo. If I click cat again, it's going to start throwing a bunch of 400 errors like it did before. Hopefully this time. It was almost going to throw something; I think it's because it was a 402. Yeah, there we go. That's a 403. I can even do HTTPstat 418, which is I'm a teapot. It's definitely getting unhappy with I'm a teapot, so we will bail from that one. Let's go.
Like I said, I have this other one set up that sends messages through to SQS, similar to what we've already been doing. If I do meow, I get a meow button, and that will start sending messages, meow one, meow two, meow three, to the SQS queue that I've got set up. You can see there that it looks to be working. But again, going back to the idea of Schrodinger's thought experiment, it appears to be working on the front end that I can see as an end user, but I don't necessarily know what it's doing on the back end, because we've got that distributed approach to deployment, that distributed application. So if we go back to our explore view, in fact, let's go to the transactions tab, you can see some errors and invocations which have started to appear from what we've been doing inside our application. There are two entries for 401, and 402 is being thrown as part of the errors we're simulating. Then up here, those meow to-dos that we were just creating are actually picking up additional trace information, not only from our base application but from any services that they then interact with. You can see the Flask application, with all this great information that appears in it, showing as part of the transaction that was happening, with a connection straight into SQS as well. That is really handy when you start to think about really large applications and the footprint they can have across a multitude of services, not only within the same cloud but associated services too, like sending SMS or emails, or, if you're dealing with an ecommerce application, transactional systems like Square or Stripe, for example, and how those services interact.
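The meow button's behaviour, sending numbered messages to a queue, can be sketched as below. To keep the example runnable without AWS credentials, it uses a hypothetical `FakeSQS` stand-in that accepts the same `QueueUrl` and `MessageBody` keyword arguments as the boto3 SQS client's `send_message`; the `send_meows` helper and the queue URL are also made up for illustration.

```python
class FakeSQS:
    """Stand-in that mimics the keyword arguments of boto3's SQS send_message."""

    def __init__(self):
        self.sent = []

    def send_message(self, QueueUrl, MessageBody):
        self.sent.append((QueueUrl, MessageBody))
        # The real client also returns a dict that includes a MessageId.
        return {"MessageId": str(len(self.sent))}


def send_meows(sqs, queue_url, count):
    """Send 'meow 1' .. 'meow N' to the queue and collect the message IDs."""
    message_ids = []
    for n in range(1, count + 1):
        response = sqs.send_message(QueueUrl=queue_url, MessageBody=f"meow {n}")
        message_ids.append(response["MessageId"])
    return message_ids


fake = FakeSQS()
message_ids = send_meows(fake, "https://example.invalid/meow-queue", 3)
```

With boto3 installed and credentials configured, `boto3.client("sqs")` could be passed in place of `FakeSQS()`. From the front end both look the same, which is the point of the thought experiment: only the trace shows whether the message actually reached the queue.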
For ecommerce applications, you totally want to be monitoring for this sort of activity, because you want to make sure that everything's working on the back end and your users are having the best possible application experience they can. So anyway, hurrah, demo number two worked. I'm almost out of time, so I'm just going to wrap up with a few more slides. Just some takeaways to close on. Always be building for scale, or aps as I like to think of it, from the initial onset. Make sure you're building with that scale and growth mindset for your application, so that everything will handle minimal users now and maximum users later, with a little refactor in between. Always be future proofing yourself. Rinse, repeat and refine. Make sure that you're identifying issues that occur and also ways that you can improve your application, because it's going to make that experience so much better and make your application run smoother as well. And most importantly, make sure you trace and monitor everything you possibly can to make sure everything's working as it should. This is all available here on my GitHub as well, so please go check that out. I'm always looking for contributions, stars and comments, so please reach out on socials if you have any issues or anything you want to add. Just lastly, please always remember to use your tech superpowers for good and be excellent to each other. Thank you very much.

DeveloperSteve Coochin

Senior Developer Advocate @ Lumigo
