Conf42 Observability 2023 - Online

Exploring ChatGPT for Improved Observability

Abstract

In today’s fast-paced and complex landscape, observability has become a critical aspect of ensuring the stability of IT systems. DevOps teams are overwhelmed by the sheer amount of data being collected by their systems. In this session, we explore how we could potentially ease their pain using LLMs.

Summary

  • Gareth is talking about exploring ChatGPT for improved observability. Modern platforms like the hyperscalers make customers responsible for ensuring their solutions are architected in a way that achieves their required reliability. Let's take a look at some of the major outages which occurred in 2022.
  • Gartner estimated the average cost of IT downtime at $5,600 per minute in 2014. Risks of IT outages are expected to increase in 2023. Customers are looking for cost-effective and unified observability platforms.
  • Large language models are a type of AI that can process and understand human language. They are trained on massive amounts of text data, such as books, articles, and websites. These models have been used to create chat bots that can hold conversations with humans. There are some concerns around the ethical implications of large language models.
  • Prompt engineering is a new discipline. Be specific. Leave as little to the imagination as possible. Use analogies. Provide samples, and double down. Make sure the order of the prompts prioritizes what you actually want. Do I think large language models are a panacea? No, I don't.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello and welcome. My name is Gareth and I'm pleased to be here today to talk about exploring ChatGPT for improved observability. So why do we care about observability? Well, Werner Vogels, the CTO of Amazon, famously said that everything fails all the time. He was emphasizing the importance of designing systems that can handle failures. With this in mind, it's critical to plan and deploy a comprehensive observability solution. When designing, building and operating our software solutions, we need to take into account the fact that modern platforms are largely ephemeral in nature: they are highly dynamic and constantly evolving. Modern platforms like the hyperscalers make customers responsible for ensuring their solutions are architected in a way that achieves their required reliability. Let's take a look at some of the major outages which occurred in 2022 and which may have affected customers.
In January 2022, Google Cloud performed a routine maintenance event in a US west region which in the end went wrong. It caused increased latency for 3 hours and 22 minutes, affecting Google Cloud Networking, DNS, Cloud Run, Spanner and Compute Engine. In March 2022, Google Cloud again experienced an outage. This time it was Google's Traffic Director which experienced elevated service errors for 2 hours and 35 minutes, caused by a change in the Traffic Director code that processes its configuration. It impacted a number of large customers, like Spotify and Discord, amongst others. In June 2022, Azure had an outage where customers had trouble connecting to resources hosted in East US 2. According to Microsoft, this was due to an unplanned power oscillation. The issue lasted for around 12 hours and affected Application Insights, Log Analytics, Managed Identity Service, Media Services and NetApp Files. In July, a heat wave caused cooling systems to malfunction at data centers in London. This affected both Google Cloud and Oracle. Also in July, AWS suffered a power failure in its eastern region, us-east-1. The outage affected connectivity to and from the region and brought down Amazon EC2 instances, impacting the applications of customers such as Webex, Okta, Splunk and BambooHR, amongst others. In September, Google Cloud again suffered an outage, this time with its Cloud Filestore ListInstances API, which started to fail globally with error code 429. Apparently, this outage was triggered by an internal Google service which was managing a large number of Google projects; it malfunctioned and overloaded the Filestore API with requests. These are just some of the major outages, but every day there are minor outages occurring on the hyperscalers, so keep in mind that you are responsible for architecting your solution to achieve your required reliability goals.
So what does downtime cost? Gartner estimated the average cost of IT downtime at $5,600 per minute in 2014, and over the years this number has been steadily rising. According to Pingdom, today downtime can cost as much as $260,000 an hour in the manufacturing industry, $450,000 in the IT industry, $3 million in the auto industry, and up to $5 million in enterprise industries. A number of factors play a role in these costs: the size of the business, the industry vertical, and the business model.
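To put those downtime figures in perspective, here is a rough back-of-the-envelope sketch in Python. The per-minute cost is the Gartner figure cited above; the availability targets are illustrative assumptions, not numbers from the talk.

```python
# Back-of-the-envelope downtime cost estimate, using the Gartner 2014
# average of $5,600 per minute cited above. The availability targets
# below are illustrative assumptions.

COST_PER_MINUTE = 5_600          # USD, Gartner 2014 average
MINUTES_PER_YEAR = 365 * 24 * 60

for availability in (0.999, 0.9999, 0.99999):  # "three nines" to "five nines"
    downtime_minutes = MINUTES_PER_YEAR * (1 - availability)
    cost = downtime_minutes * COST_PER_MINUTE
    print(f"{availability:.3%} uptime -> {downtime_minutes:8.1f} min/year "
          f"-> ~${cost:,.0f}/year")
```

Even at four nines of availability, the 2014 figure already implies roughly $300,000 of downtime cost per year, which is why architecting for reliability matters.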
So what are today's observability challenges? The first is complexity: we've introduced multicloud environments, which are increasingly complex, and many legacy observability platforms are not able to keep up. The volume of data and subsequent alerts has also exploded in recent years, resulting in lost signals as well as alert fatigue for operations teams. We also have challenges around silos within organizations, where infrastructure, DevOps and business teams don't talk to each other, causing many key insights to become lost or to be surfaced too late. Correlation is also a challenge for many customers: we need to work out which actions, features, apps and experiences actually drive business impact, and for most customers this is a very challenging thing to do. Unfortunately, risks of IT outages are expected to increase in 2023, largely due to the current economic climate as well as the tech layoffs which are occurring. In 2023 alone, 715 companies have laid off around 200,000 employees. This results in a loss of institutional knowledge and expertise for these companies, along with other challenges driven by economic concerns and cost reductions.
So what do customers actually look for in their modern observability solutions? Customers are increasingly looking for cost-effective and unified observability platforms that can help them monitor and manage their complex IT environments. They also expect their monitoring solution to not only provide real-time visibility into their systems, but also to leverage machine learning and AI to predict potential issues before they occur. AI-driven alerting is also something that businesses are looking for, essentially to reduce alert fatigue, and this also allows them to proactively address potential issues before they become major problems. Correlation-to-causation analysis also helps customers to identify the root cause of issues and incidents, allowing them to quickly resolve problems and minimize downtime.
There's a lot of innovation happening in the AI and large language model space. So what are large language models? Large language models are a type of AI that can process and understand human language. These models are trained on massive amounts of text data, such as books, articles, and websites. They use complex algorithms and neural networks to learn the patterns and structures of the language, allowing them to generate human-like responses to text-based queries. Large language models have a wide range of applications, including natural language processing, chat bots, and language translation, amongst others. Some of the most well known language models include GPT-3, BERT and others, like ChatGPT. These models have been used to create chat bots that can hold conversations with humans, generate realistic text, and even write news articles. However, there are some concerns around the ethical implications of large language models, such as their potential to perpetuate biases and misinformation.
Okay, so large language models leverage deep neural networks, which attempt to imitate brain-like functionality. Now, neural networks have been around for some time. They are algorithms, loosely modeled after the brain, that are designed to recognize patterns. They interpret sensory data through machine perception, and the patterns they recognize are numerical in nature; this is what all real-world data must be translated into. In the diagram on the left hand side, you can see a deep neural network. Deep learning occurs when you use stacked neural networks, that is, networks composed of several layers. The layers are made of nodes, and on the right hand side, we can see one of those nodes magnified. A node is a place where computation happens. It's loosely patterned on a neuron, and it fires when it encounters sufficient stimuli. A node combines input from the data with a set of weights that either amplify or dampen that input, thereby assigning significance to inputs. So when we feed the network an image of a dog, it runs through the layers and detects that it's a dog.
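As a minimal illustration of the node computation just described, here is a short Python sketch of a single node combining inputs with weights, a bias and an activation function. All of the numbers are made up for the example.

```python
import numpy as np

# A single node: combine the inputs with a set of weights, add a bias, and
# apply a non-linear activation. A sufficiently large weighted sum makes the
# node "fire". All numbers below are illustrative, not from the talk.

def node(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    z = np.dot(inputs, weights) + bias  # weights amplify or dampen each input
    return 1.0 / (1.0 + np.exp(-z))     # sigmoid activation: "fires" near 1.0

x = np.array([0.5, 0.8, 0.2])   # e.g. numerical features extracted from an image
w = np.array([0.9, -0.4, 0.3])  # learned weights assigning significance
print(node(x, w, bias=0.1))
```

A deep network is simply many layers of such nodes stacked on top of each other, with the outputs of one layer feeding the inputs of the next.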
Large deep neural network models are pretrained on data from the whole Internet, and this requires a significant amount of effort and engineering. Initially, you need to prepare the data. This includes selecting the data that you will use, filtering it, deduplicating it, redaction (essentially removing PII), and finally tokenization. We then adjust the billions of parameters to ensure that our model returns the expected results. GPT-1 had 117 million parameters, but GPT-3, which is a fairly new model, has 175 billion parameters. Finally, we need to reinforce the learning with human feedback. In the diagram on the left hand side, we see that outputs from the neural network are rewarded by a human labeller, which incentivizes the model to favor those outputs over others. These models are extremely expensive to train in terms of energy: it can cost in excess of $20 million in energy costs, which is an important caveat.
Here we can see some of the data sets which were used to train the various OpenAI models in the GPT family. GPT-1 was trained on around 4.8 GB of unfiltered data, GPT-2 on 40 GB of human-filtered data, and finally GPT-3 on 570 GB of filtered data distilled from 45 TB of raw data. ChatGPT, which is the focus of this session, is a version of GPT-3.5, fine-tuned on dialogue, with 175 billion parameters. ChatGPT can understand and respond to a wide range of user queries, from simple questions to complex conversations. It can also generate human-like responses and adapt to the human's language and tone. ChatGPT has been used in various applications, such as customer service, education, and entertainment.
Typical language models use next-token prediction or masked language modeling to predict the next word in a sequence. There are limitations to these two approaches: they are unable to fully understand context, and inputs are processed sequentially on an individual basis. What OpenAI and GPT essentially brought to the table was a design based on an autoregressive language model, which uses the previous words to predict the next word in the sequence. They also leverage the transformer architecture, a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data. In fact, ChatGPT is able to process all of the input data simultaneously.
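To make the self-attention idea concrete, here is a minimal numpy sketch of scaled dot-product attention, the core operation of the transformer architecture. The dimensions and inputs are arbitrary toy values, not anything from a real model.

```python
import numpy as np

def self_attention(X: np.ndarray, Wq, Wk, Wv) -> np.ndarray:
    """Scaled dot-product self-attention over a whole sequence at once."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how strongly each token attends to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                        # weighted mix of the values

# Toy example: a sequence of 4 tokens, embedding dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 8)
```

Because the attention weights are computed for all tokens at once, the model can weigh every part of the input against every other part in a single step, which is the "processes all input data simultaneously" property just mentioned.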
So ChatGPT is a generative AI that has the ability to learn and make decisions. But this does not mean that it's Skynet; there are some key differences between the two. Skynet is a fictional AI system that was created to control military weapons and defense systems. It became self-aware and decided that humans were a threat to its existence, which led to a war between humans and machines. Skynet is often portrayed as a malevolent force that seeks to destroy humanity. ChatGPT and generative AI models, on the other hand, are used in a variety of applications, such as image and text generation, a far cry from what Skynet supposedly could do in Terminator. They are trained on large data sets and can generate new content that is similar to the original data. Generative AI models are not inherently good or evil, but their use can have ethical implications. While there are some similarities between the two, they are fundamentally different, and we still have a way to go to reach Skynet level. It's always important to consider the ethical implications when leveraging AI in any application, including with generative AI models.
Of course, ChatGPT is not perfect. As I mentioned, it can on occasion return nonsensical responses. It's sensitive to minor changes in prompting, it's excessively verbose and overuses certain phrases, and it's challenged by ambiguity. Another issue is that it's susceptible to prompt hacking or injection, so we have to take care when designing our prompts.
Okay, so how can we use ChatGPT in our observability solutions? There are many things that we could use it for. For instance, conversational UI: using natural language is a very comfortable way for users to query data. Also code generation: ChatGPT could support developers and operations engineers when writing scripts and code. Another area which is very interesting for us is intelligent problem remediation, where ChatGPT suggests ways to resolve problems in custom code. Finally, we could look at enriching observability context using ChatGPT. This means we enrich problem tickets or alerts with additional context, driving more effective remediation.
Keep in mind that ChatGPT's responses are non-deterministic. On the left hand side, I prompted ChatGPT with a question, and then I prompted it again with the same question. Although at first glance it looks like it produced the same output, we can see a number of differences. I would say that humans expect IT systems to be deterministic. If we integrated this into our solutions, operations engineers might be thrown off by the fact that it produces different guidance based on the same input. We need to do some work to make sure that users of our systems understand, and receive the correct output every time. Also, to make informed decisions, ChatGPT needs to build up a lot of context, in the form of prompt and completion, or question and answer. You'll see this in the chat thread that I had with ChatGPT, where I asked it why I was experiencing additional latency between layers in my application. It was very verbose initially, and as I drilled down into specific areas, it asked more and more questions. Eventually you would get to the answer, but you can't expect engineers or operations teams to do this every single time to solve every problem. It just takes too long. As a result, we need to ensure that we engineer our prompts very well, using guidance like the following, so that we get the right answers as quickly as possible.
So, as I mentioned, prompt engineering is important. This is, in my mind, a new discipline, and there are a number of things that you should keep in mind. There's a lot of guidance out there; in this case, I'm looking at guidance from Microsoft. At the top of the screen, we can see some basics for designing your prompts. Be specific, and leave as little to the imagination as possible. Use analogies, and be as descriptive as possible. Provide samples, and double down: you may need to remind the model what you actually want, because that may get lost as you proceed in your chat thread. Make sure the order of the prompts prioritizes what you actually want, because order matters. Give the model an alternative: give it an option to say "I don't know" or "I don't have enough information to do that". That translates into a number of different implementation techniques. Priming the model, for example: you make sure that the model has sufficient context, instructions and other information relevant to what you're trying to achieve, and you use a system prompt to prime the model. Providing examples is a very good technique which also gives additional context to the model; this is called few-shot learning. LLMs can be susceptible to recency bias, so make sure that you repeat the most important points at the end of your prompt. Use a few words or phrases at the end of the prompt to obtain the model response in the format that you want: if you want JSON, you can tell the model that you want JSON in a particular format, or CSV, or whatever you need. Keep in mind that large language models often perform better if the task is broken down into smaller steps. A sketch combining several of these techniques follows.
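Here is a minimal sketch of how an alert-enrichment prompt might put these techniques together, using the OpenAI Python client as it existed in 2023 (the pre-1.0 `openai.ChatCompletion` API). The alert payload, model choice and example messages are illustrative assumptions of mine, not from the talk.

```python
import openai  # pip install openai (0.x-era client, as available in 2023)

openai.api_key = "YOUR_API_KEY"  # illustrative placeholder

alert = "P95 latency on checkout-service rose from 120ms to 900ms after deploy #4711."

messages = [
    # 1. Prime the model with a system prompt: role, context and constraints,
    #    including an explicit alternative so it doesn't guess.
    {"role": "system", "content":
        "You are an SRE assistant. Be specific and concise. "
        "If you do not have enough information, answer 'insufficient data' "
        "instead of guessing."},
    # 2. Provide an example (few-shot learning) showing the expected behaviour.
    {"role": "user", "content":
        "Alert: error rate on payments-api jumped to 8% after a config change."},
    {"role": "assistant", "content":
        '{"probable_cause": "recent config change", '
        '"next_steps": ["roll back config", "check error logs"]}'},
    # 3. The real alert, with the format instruction last (recency bias).
    {"role": "user", "content":
        f"Alert: {alert}\n"
        "Respond ONLY with JSON with keys probable_cause and next_steps."},
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=0,  # reduces, but does not eliminate, non-deterministic answers
)
print(response["choices"][0]["message"]["content"])
```

Note that temperature=0 makes the responses more stable, but the model is still not strictly deterministic, so the caveats about non-determinism above still apply.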
In recent months, we've seen a very big push by the hyperscalers to incorporate generative AI into their platforms. Microsoft has invested heavily in OpenAI, some $10 billion, and Azure has announced a number of different services, like prompt flow, and support for various foundation models. They also announced that they will be adopting OpenAI's ChatGPT plugin standard. AWS also announced a number of different foundation models, along with new hardware and infrastructure for training models, to make training more efficient and reduce its cost, and they recently made CodeWhisperer generally available. Google Cloud, at their Google I/O conference, announced more than 25 products powered by PaLM 2 and Gemini models, and they also announced next-generation A3 GPU instances for training models.
So I want to leave you with some thoughts. Do I think that large language models are a panacea? No, I don't. I think we need to use them in the right way. I think prompt engineering is critical: we need this new discipline, which may require reskilling, and also the correct tooling, which may not even exist today, to support engineers. Protecting intellectual property and data, as well as addressing security concerns, can be difficult, and we need to think carefully about how we do that when we engineer our prompts. And we need to understand, in general, the risks of the GPT family of models, as well as generative AI, before we actually use it in our systems. Thank you for joining my session. I really enjoyed presenting to you. Please feel free to reach out to me anytime for further discussions.
...

Gareth Emslie

Cloud strategist @ Dynatrace
