Conf42 Large Language Models (LLMs) 2024 - Online

How to use ChatGPT without getting caught

Abstract

Explore the detection of AI-generated text! Learn from scratch how AI text generation works. Discover classifier systems, black/white box models, watermarking, and manual techniques for detection. Uncover the tricks to bypass these systems by diving into the world of detection of AI-generated texts!

Summary

  • Hi, are you ready to answer the question that everyone's a bit too afraid to ask? Let's find out how to use ChatGPT without getting caught. We're just gonna try to understand how the systems that detect AI-generated text work. That's the key to knowing how to get around them.
  • The presentation will help us understand the common techniques that exist to detect generated text. We're going to be speaking about four techniques, but actually just three. The first of them is a classifier. We should be aware that classifiers tend to make mistakes.
  • Ghostbuster is just another system to detect generated text. It's better than the OpenAI classifier, which can still be accessed anyway. And it's state of the art. If you want to use a classifier, this is the one you should probably consider.
  • We're going to be speaking about not four but three systems for detecting generated text. The first type of analysis is a thing that we call black box analysis. Let me skip over this part to speak about white box analysis, which works like this.
  • ChatGPT will be more surprised to see a human text than a machine text. If something can be surprising, what we do is normalize the thing by the expected surprise of an LLM on that text. It's a great idea, and this system can be tried online.
  • The fourth technique is watermarking. Watermarking is a system of two lists, a red list and a green list. These are the four techniques that will allow us to understand how the detection of AI text works. Here are some tips on how to attack the thing just using your common sense.
  • DeepL Write allows you to interactively select what you want as your rewritten version. Also, it's nice to use the active voice. Don't let the system choose for you. If you tell the system everything that you want and are very specific, it will pick up that way of writing.
  • But I'd love to leave you with something which is a QR code to rate the session. If you could give me some positive feedback, I really appreciate it. And if you have things that you think that I could improve, that's also very helpful.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, are you ready to answer the question that everyone's a bit too afraid to ask? Let's find out how to use ChatGPT without getting caught. And to do that, we'll start from scratch, so you don't need to know any programming or anything at all. We're just gonna try to understand how the systems that detect AI-generated text work, which is actually the key to knowing how to get around them. So, are you ready to uncover the mysteries? Thanks a lot for joining. We're gonna have three parts in the talk, but I'd say that if you really want to get into the interesting part, so that's the real answers, you could just skip to this part here. So the circumvention, that's gonna come around minute 20-something. If you want to just get answers and that's it, you can just skip to that part. But if you want to understand everything, I'd say stick with me for a little bit, and we are going to be understanding how the detection works, which explains the rest of the presentation. Up to you, but I really suggest that you stick around for the first 20 minutes or so. And the first of the things that we need to know is, what are LLMs? There's lots of ways to explain this thing, right? But in the context of this presentation, we just care about the fact that LLMs are text predictors. So it's like when you're writing a text for someone and your keyboard is predicting the next word all of the time. It's literally that. An LLM just completes the text all the time. Something like this, you see, it's writing the next word, or the next part of a word, all the time. That's an LLM. That's what we need to know here. And the only difference between ChatGPT, or something like Bard, etcetera, and your phone is that ChatGPT is, let's say, intelligent. It's a big system, lots of parameters, etcetera. And your phone is something small, something less intelligent.
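That next-word idea can be sketched in a few lines of Python. This is a toy illustration, not a real LLM: the vocabulary and the probabilities below are invented.

```python
# A toy "LLM": like your phone keyboard, it just predicts the next word.
# The words and probabilities here are made up for illustration.
NEXT_WORD = {
    "the": [("cat", 0.5), ("dog", 0.3), ("end", 0.2)],
    "cat": [("sat", 0.7), ("ran", 0.3)],
    "sat": [("down", 0.9), ("up", 0.1)],
}

def generate(word, steps=3):
    """Greedily append the most likely next word, one word at a time."""
    out = [word]
    for _ in range(steps):
        choices = NEXT_WORD.get(out[-1])
        if not choices:
            break  # the toy model has no prediction for this word
        out.append(max(choices, key=lambda wp: wp[1])[0])
    return " ".join(out)

print(generate("the"))  # the cat sat down
```

A real model does exactly this loop, just with a neural network producing the probability table over tens of thousands of tokens at every step.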
But the way that they work is the same. They just write the next word all the time. So that's what we need to know in this presentation, and that will help us to understand the common techniques that exist to detect generated text. And we're going to be speaking about four techniques, but actually just three. You'll see why. The first of them is something super simple, so I'm just gonna go over this part very quickly. The first thing is a classifier. A classifier is just the thing that tells us: yes, it's human, or yes, it's generated; no, it's not generated, so it's human. How do we do that? We just give lots of examples to the system, and we hope that the system will somehow figure out if a text is generated or not. There's an interesting system that was created by OpenAI to detect generated text. And if you go to their website, you see here, this is the OpenAI website, and in January 2023, so that's last year, they launched this website. It just gives us an error right now. So why am I showing you the website then? It's because they discontinued the system. They said our classifier is not reliable. People are trusting our system, but actually, it doesn't really work. So that's one of the things about classifiers: we should be aware that they tend to make mistakes. And specifically the OpenAI one is not very good. That's why they just retired the thing. But it also raises some questions. You're OpenAI. You created a chatbot that writes answers and people are using it, but at the same time, you're not able to create a system that really works for detecting the things that you are creating. That's more about ethics and those things that I just don't have time to get into, but, well, it raises questions, right? That's just something to speak about in a different talk.
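The classifier idea can be sketched very simply: turn each text into a few surface features and learn a cutoff from labelled examples. Everything below is a toy illustration; real classifiers like OpenAI's or Ghostbuster use far richer features and actual machine-learning models.

```python
# Toy classifier sketch: one hand-picked feature plus a learned cutoff.
def vocabulary_variety(text):
    words = text.lower().split()
    return len(set(words)) / len(words)  # 1.0 means no repeated words

def train_cutoff(human_texts, generated_texts):
    """Midpoint between the average feature value of each labelled class."""
    h = sum(vocabulary_variety(t) for t in human_texts) / len(human_texts)
    g = sum(vocabulary_variety(t) for t in generated_texts) / len(generated_texts)
    return (h + g) / 2

def classify(text, cutoff):
    # Toy assumption: repetitive (low-variety) text counts as "generated"
    return "generated" if vocabulary_variety(text) < cutoff else "human"

cutoff = train_cutoff(["a b c d"], ["a a a b"])  # varieties 1.0 vs 0.5
print(classify("x x x x y", cutoff))  # generated
```

The weakness the talk describes is visible even here: the cutoff only reflects the examples it was trained on, so texts that don't resemble the training data get misclassified.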
But I'd love to mention another thing, which is Ghostbuster. Ghostbuster is just another system to detect generated text. You see, I get some score, etcetera. It's better than the OpenAI classifier, which can still be accessed anyway. It's on Hugging Face. So if you search on Google for "Hugging Face OpenAI classifier" or something like that, you could still use the system. But this one that I'm showing you right now, Ghostbuster, is better. It's just better because they take lots of different metrics to train a normal classifier. It came out pretty recently, some months ago, and it's state of the art. So I'd say if you want to use a classifier, this is the one you should probably consider. It's a good system, works fairly well. And yeah, it's state of the art. So that's classifiers, the first of the systems that I wanted to have a note on. And just for the sake of time, let me move on to the second type of system, the second type of analysis, which is a thing that we call black box analysis. However, as I said before, we're going to be speaking about not four but three systems, and that's the reason why we're not going to be really speaking about black box analysis. That's just because this type of analysis is not very effective today. It's kind of outdated, you could imagine. It's not the best tool that we could use right now for detecting generated text. So let me just skip over this part to speak about white box analysis. And, well, to speak about white box analysis, I think that the best thing that I could do is just show you how it works. So let me show you this tool that's called GLTR, okay? That's a website. You can search for it, and it's this one here. So here we are, GLTR. I'm gonna pick, yeah, a sample text.
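The white-box coloring that GLTR does can be sketched with a toy model: for each word, ask where it ranks in the model's predicted next-word list. Rank 0 is "green" (totally unsurprising); a word the model never predicted is "red". The prediction lists below are invented for illustration.

```python
# GLTR-style rank coloring with a toy next-word model.
TOY_PREDICTIONS = {
    "i": ["have", "am", "was", "think"],
    "have": ["been", "a", "to", "never"],
    "been": ["a", "playing", "working", "told"],
}

def color_tokens(words):
    colors = []
    for prev, word in zip(words, words[1:]):
        ranked = TOY_PREDICTIONS.get(prev, [])
        if word in ranked:
            # Top choice is green; a lower-ranked choice is yellow
            colors.append("green" if ranked.index(word) == 0 else "yellow")
        else:
            colors.append("red")  # the model is very surprised: human hint
    return colors

print(color_tokens(["i", "have", "been", "told"]))
# ['green', 'green', 'yellow'] -- mostly unsurprising, machine-like
```

A wall of green suggests the model itself could have written the text; lots of yellow and red words are the human signal GLTR visualizes.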
For example, this text. It was written by GPT-2, and what we see here is an analysis of the distribution of the probabilities of each of the words, or parts of words, in the text. So for example, if you look at this word: "I've been a gamer for over ten years, during..." And then we ask the system, what do you think comes after this word here? The system tells us: I think that probably the next word is "that". If it's not "that", then it should be "those", "this", "my", "the", et cetera. So it's giving us a distribution of probabilities over the words that the system thinks should come after the first bits here. And that is, well, a great way to see if the system would write this text. So if we see lots of green things, as here, it means that the system would have written this text, because all of the words are things that the system thinks should go here. And that is something very different from a human text. Because if we take an example from a real text, we see that the system is very surprised to see things like "learned facts" or "represents" or "structure". "GAN" here, those are words that GPT-2 would have never written. If we see a text with lots of colors, probably the text is human. Now we're going to take a look at an evolution of this thing, which is called DetectGPT. And that thing is probably the number one result that comes up if you just search for "detect AI-generated text" or something like that on Google. What's the twist that DetectGPT does? I love to use an example to explain it. So when I talk, I tend to use, like, the word "like" a lot. I really use it, like, a lot. So if you see a text that contains a lot of "like"s, probably it's my text, right? You would say that the probability that that text is mine is very high. But if we take a text that I spoke or wrote containing lots of "like"s, and we change the "like"s in the text to another word...
So, for example, we take the "like"s and we just write "okay" instead, or another word, right? We just change the word. Then we can do this thing that's called, well, just a perturbation. And we take the score of the perturbation and of the original text. So the original text, the text with lots of "like"s, will have a super high score for being my text. However, the rewritten text, the text that doesn't have any "like"s, probably will have a super low score. So the detector will not think that the text is my text. So what we do is take the scores of the original text and the rewritten text, and then we compare. The question here is: was the original text much more likely to be my text compared to the rewritten version? If the answer is yes, then we conclude that, yes, probably the original text was my text. And what we do with DetectGPT, with really any LLM, is that we take a text, do some modifications to the text, so we perturb the text, and then take the score of the original text and of the perturbations, and compare again. If the text originally looked like it was written by an LLM, and then we rewrite the text and it doesn't look like it was written by an LLM anymore, that means the original text was written by an LLM. So that's how DetectGPT works. Very interesting idea, very smart. And it's one of the state of the art systems as well, a really, really good system. So that's DetectGPT, but it has a problem. And it's a problem with, I love this name, the capybara problem. I didn't come up with it, but I think it's a great name to describe something that is really important, which is that if I go to ChatGPT and I ask ChatGPT about something rather strange, something that people don't usually speak about, in this case a capybara that is an astrophysicist, of course I will get an answer about a capybara that is an astrophysicist. It makes sense, right?
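DetectGPT's core trick can be sketched with the "like" example from the talk. The score function below is a toy stand-in for an LLM's log-likelihood: here it just counts how "me-like" a text is via the word "like". Everything else is invented for illustration.

```python
# DetectGPT sketch: compare a text's score against its perturbations.
def score(text):
    # Toy likelihood: how much this text sounds like the speaker
    return text.split().count("like")

def perturbations(text):
    """Rewrite the text once per position, swapping one word for 'okay'."""
    words = text.split()
    for i in range(len(words)):
        yield " ".join(words[:i] + ["okay"] + words[i + 1:])

def detectgpt_gap(text):
    """Original score minus the mean perturbed score. A clearly positive
    gap means perturbing made the text less likely under the scored model,
    which is DetectGPT's signal for 'this model (or person) wrote it'."""
    perturbed = [score(p) for p in perturbations(text)]
    return score(text) - sum(perturbed) / len(perturbed)

print(detectgpt_gap("i like this like a lot like really") > 0)  # True
```

In the real system the score is the log-probability under an LLM and the perturbations come from a mask-filling model, but the comparison is exactly this one.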
Because I just asked about this thing. So we have a conditioned probability here for this thing being the answer to my query. It makes sense. The likelihood of this text, knowing that I just asked about this, is very high. It's probable that I would get this answer. But when I see a text in isolation, so, for example, if I see a high school report or something like that, I never get to see the prompt of the user, so I don't know what the person asked ChatGPT. Which means that if I just see a text without any context, it can be very surprising. If I knew where it was coming from, I would understand that this thing is likely. So I could say, oh, it was written by an LLM, because an LLM would likely write this thing when it's being asked to write about a capybara that is an astrophysicist. But if I just see this text and I don't know that it's been written by ChatGPT, then I would say, oh, no, an LLM would never write about a capybara that is an astrophysicist, because it's, well, a super weird idea, and an LLM would never imagine that this thing could be a real thing, you know? So it's very surprising to someone that doesn't know the context. That's the capybara problem, and it's a hard problem to solve. Well, a group of scientists tried to solve it pretty recently, some months ago, and they came up with a formula that I'm not gonna get into. But the idea is just to measure, well, an idea that I think is very interesting, which is that, as we just saw 30 seconds ago, something can be surprising. So if something can be surprising, what we do is normalize the thing by the expected surprise of an LLM on that text. That can sound confusing, so I'm just going to give you an example so we understand what this thing is. Let's take a text, right? Like the capybara text. That's confusing, or, well, that's surprising. We wouldn't expect to see this text just coming out of an LLM. So the perplexity, the surprise, the...
Just the unlikeliness of that text is very high. So we wouldn't expect to see this text. But why would we not expect to see that text? There's actually two parts to something being unlikely. The first thing is that in this case, this text is unlikely because it's unusual. The topic is something very creative, so we wouldn't expect to see this thing. But then another source of perplexity, of surprise, is that the text is written by a human person. When we write as humans, we use some words that wouldn't be the top-one choice for GPT, for example. GPT will be more surprised to see a human text than a machine text. And what we try to do is isolate these two parts of surprise. So we try to remove the part of the perplexity that just comes from the fact that the text is unusual. We remove this part. And that's what I said before, right? We normalize by the expected surprise of an LLM on that text. We remove the part of a text being surprising because it's unusual. And if we get to do that, what we're left with is a measure of how likely the text is to be written by an AI. So that's what we do. That's the formula. We get a measure, and the measure that we get is a much better measure of how likely this text is to have been written by an LLM. There's, well, a ratio that they came up with experimentally: 0.85. If we get a higher number, it's human. If we get a lower number, then it's AI. It's a great idea, really. And this system can be tried online. I'm just not going to show it because it's basically the same thing as the other systems that we just saw. But it's a very smart idea and a very smart way of trying to get around the capybara problem. So that's Binoculars. A really great system, came out very recently, so a really great system to try.
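The Binoculars idea can be sketched as a ratio: the surprise of the text under one model, normalized by the "expected surprise" (a cross-perplexity computed with a second model). The 0.85 threshold behaves as described in the talk, but the per-token probabilities below are invented for illustration.

```python
import math

# Binoculars-style score, sketched: observed surprise / expected surprise.
def avg_log_surprise(token_probs):
    """Average negative log-probability per token (log-perplexity)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def binoculars_verdict(observer_probs, expected_probs, threshold=0.85):
    score = avg_log_surprise(observer_probs) / avg_log_surprise(expected_probs)
    # Above the (empirically chosen) threshold: human. Below: likely AI.
    return "human" if score > threshold else "ai"

# Unusual topic, but no more surprising than expected -> ratio ~1 -> human
print(binoculars_verdict([0.01, 0.02], [0.01, 0.02]))  # human
# Much less surprising than expected -> low ratio -> ai
print(binoculars_verdict([0.6, 0.7], [0.05, 0.05]))    # ai
```

The normalization is what defuses the capybara problem: a weird topic raises both the numerator and the denominator, so only the machine-versus-human part of the surprise survives in the ratio.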
And the fourth technique, the last one I'm going to be talking about, is watermarking. So what's watermarking? Watermarking is just something that we embed into a text, right? We put a watermark on something and that thing sticks to it, but we can't actually see it. So when we do watermarking with text, we use a thing that's called a system of two lists, a red list and a green list. To imagine how it works, just imagine that I have a dictionary here and I start highlighting all of the words in the dictionary. The first word is red, the second word is green, the third word is red, the fourth word is green. And after doing that with all of the words in the dictionary, I have a notion of which word is red and which word is green, right? I have two lists. One list is my red words and the other list is my green words. So what I do after that is tell the system: you can never write any of the red words. If you write a red word, that's banned. So you can't choose those words in your distribution of probabilities that we saw before. That works great until we get to words that go together. So, for example, Barack Obama. That thing has to go together, right? What we do to solve this problem is a thing that's called weak watermarking, in contrast to strong watermarking. And basically the idea here is that if you have two words that are, like, 99% likely to go together, then you don't apply this thing of the red and green lists. If you have a set of possibilities that look similarly likely, then you do apply the watermark. That's a great technique, and it's especially great because it gets rid of the capybara problem. If we see a text and we see lots of red words, we know it's human. If it doesn't have any red words, it's very likely to have been written by an AI, by an AI that was using the watermark. So it's a great technique, really very interesting.
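The red/green split can be sketched with a keyed hash: split the vocabulary pseudo-randomly in half, have the generator refuse red words, and have the detector count them. The key name and the simple 50/50 split are invented for illustration; real schemes condition the lists on the previous tokens rather than fixing them once.

```python
import hashlib

# Red/green watermarking sketch: a keyed hash partitions the dictionary.
def is_green(word, key="demo-key"):
    digest = hashlib.sha256((key + word.lower()).encode()).digest()
    return digest[0] % 2 == 0  # roughly half the dictionary ends up green

def red_fraction(text, key="demo-key"):
    """Detector side: the share of red words in a text."""
    words = text.split()
    return sum(not is_green(w, key) for w in words) / len(words)

# A watermark-respecting generator never emits red words, so its texts
# have red_fraction near 0; human text should sit near 0.5.
```

Note that only someone holding the key can rebuild the lists, which is why this detector works without the capybara problem: it never needs to judge how surprising the content is, only to count colors.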
And those are the four techniques that will allow us to understand how the detection of AI text works. But I'd love to give you some little tips on how to attack the thing just using your common sense. I'd say one of the things that we should be aware of is that looking into the types of words doesn't really work well. LLMs tend to write kind of similarly to humans, so that thing doesn't really work. But if you think about other things, like your writing style: if I'm, I don't know, let's say I'm 15, I'm supposed to write like a teenager, right? I'm not supposed to write like a PhD. So the writing style can give some hints that a person is using a chatbot to write the text. Also dialect. If I'm American, I don't write as a British person, so I wouldn't use British words. Also, typos can sometimes give you a hint of, okay, that person was writing the text. Or, of course, they could just do it on purpose. But these are just ideas, you know, hints, things that could help you get ideas. Also hallucinations. If you see a hallucination, that thing is really the best hint that you can get. Something that just doesn't look real, something that you know is false. There's different types of them, but try to especially look out for things that just don't make sense at all, or facts that you know are not true, or bad math. So, when the math in a text is just incorrect, those are very evident types of hallucinations, and they really can give you a great hint that the text has been generated. However, human annotators are horrible, really bad, at detecting generated text. So you should really keep that in the back of your head. We are very bad at detecting generated text. We can try to do it, but yeah, will it work? Kind of complicated. Now, this is the real, real important part, which is how to get around the systems.
So what can we do to just evade the detectors of generated text? We've just seen how they work, so now it will make sense that we need to do a very specific thing to get around them, right? And if I ask you what's the main idea that comes to your head when you try to get around them, probably you'll tell me that what you have to do is paraphrasing, or rewriting text, or changing the words. And if you do this thing, if you take a text and change the words of the text, and if you paraphrase well, you will arrive at a text that can't be spotted anymore. This thing can be done manually, you can just rewrite the text yourself, or you can use paraphrasers. We'll look into that in a second. But the idea here is that if you paraphrase, you can really evade the detectors. In fact, how much can you evade them? Well, here I have a table where we had a system that was detecting text with 90% accuracy, which is a lot, right? It's really great. But my question is, how much do you think it drops when we paraphrase? How much can we reduce the accuracy of this detector that was working really well on, well, a set of texts? It drops a lot. It drops so much that it basically becomes a coin. A flip of a coin has a 50% chance of being correct, right? If we paraphrase the text, we make the detector behave like the flip of a coin, which means that it's not effective at all. It means that paraphrasing, or rewriting, is really the best thing that we can do, and it's really effective as a technique. We need to do it very well to be able to actually evade the detectors, because it's not just changing five words, it's really rewriting the phrases. But if we do it well, we can really evade those systems. And now I'm going to give you some tools that you could consider using if you want to rewrite your texts. You can just search for them on Google.
Really, there's new tools all the time. So you can just search "paraphrase text AI", something like that, on Google, and you will probably find all of these tools that I'm gonna give you. But if you want some names of tools: yeah, Grammarly is a good tool, Autowriter, GoCopy. There's really lots of them. And DIPPER, the one that we are seeing here, is one that you have to use on Hugging Face. So, the website: you just search for "DIPPER paraphrase Hugging Face", something like that, and you'll arrive at a Hugging Face space where you can actually go and put in your text and get the rewritten thing. T5 is another paraphraser that's hosted on Hugging Face. You just go there, you write your text and you get the paraphrased version. And you can also do a smart thing, I'd say, which is: you take a text, for example a text in English, you translate it to French, and then you take the French translation and you translate it back to English. And you should get some differences in the way that it's been translated, especially if the translator is not perfect. Then you should just, yeah, get changes in phrases, etcetera. So that's a way to rewrite. But if you're going to use DeepL, there's a great tool, and that's the one I'm going to show you right now, which is my favorite thing for rewriting: DeepL. So you see, you have the translator, or DeepL Write, which is a new thing that came out pretty recently. And what's the great thing about this? The great thing is that it allows you to interactively select what you want as your rewritten version. So I can say, I don't like "composition", I prefer "resistance". Or instead of this, you should write this thing. You see, you can interactively change the text, and that's the best way to really do rewriting well. So if you want to rewrite, that's my favorite tool. Really, go for any of them, but this one, I'd say, is top notch. So, DeepL Write. Really great tool.
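The paraphrasing attack can be sketched at the word level with a toy synonym table. Real tools like DIPPER, T5 paraphrasers, or DeepL Write rewrite whole phrases, which is what actually fools detectors; the table below is invented for illustration only.

```python
# Toy word-level paraphraser: swap words for synonyms where we have one.
SYNONYMS = {"big": "large", "quick": "fast", "begin": "start", "aid": "help"}

def toy_paraphrase(text):
    """Replace each word that has a known synonym, keep the rest."""
    return " ".join(SYNONYMS.get(w, w) for w in text.split())

print(toy_paraphrase("we begin with a quick test"))
# we start with a fast test
```

Even this crude swap shifts the token probabilities a detector sees; the talk's point is that phrase-level rewriting shifts them enough to drop a 90%-accurate detector to coin-flip performance.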
Having said that, I'd also love to give you some manual tricks that you can try to apply to, I don't know, avoid being detected. One of them is that you should give the system a lot of information about who you are. So you should tell the system: write as if you are a high school student who is, I don't know, writing about the French Revolution, and you're a student from the US, whatever. You give lots of information about that person, you, and that way you will allow the system to write a bit more like you. Also, it's nice to use the active voice. You should try to ask for that, because a lot of the examples that are given to these systems to train them are in the passive voice. That means, for example, "500 patients were used for this study." That's passive voice. In a scientific study, right, they use lots of passive voice. If you ask for active voice, not only does your text sound, well, more personal, but it also changes the distribution of the tokens, the probabilities of the tokens. It makes the capybara problem a little bit worse for the detector. So the active voice is a good thing to ask for. It's not going to change a lot, but I would try to ask for it. Also, use very specific data. If we go back to the French Revolution example, just give lots of data about what you are asking about. It started this year, it lasted for this many months, and, I don't know, this many people died, etcetera, etcetera. As much specific data as you can; that will help the system to write a text that is much more surprising. Also, avoid quotes. I'm not saying to avoid quoting other people.
What I'm trying to say is that you should try to avoid quotes like "George Washington said whatever", those phrases that everyone says. You should try to avoid them, because when the system sees one of those phrases, it will see that all of the words are green, because it's been seen a thousand times already. So it will think that the text was written by AI when actually it's not. It's just a very common phrase, but the detector will think it was written by AI because it's a thing that doesn't surprise the system at all. So if you want to not get in trouble, I would avoid using quotes like that. Quotes, and also legal texts, and texts that appear in lots of places. Just try to avoid them. Another great thing to do is to outline the structure that you want. First, I want an introduction about the effects of the French Revolution. Then I want a description of what happened in the first three months. Whatever, you know, just tell the system what you really want. Don't let the system choose for you. If you tell the system everything that you want and are very specific, for example by writing the beginning of the answer, you will get an answer that is much more surprising, hence much more difficult to detect. Also, if you write the beginning of the answer, you are allowing the system to see how you write. So, for example, I said I use lots of "like"s when I speak, though I try not to do it when I write. But everyone has a certain style when they write. So if you allow the system to see how you write, it's more likely that it will pick up that way of writing and just keep writing like that. So that's a great way to take advantage of those systems. Also, maybe consider using other LLMs. Not everything is just ChatGPT. There's Claude, there's Bard, there's lots of them. There's also Hugging Face Chat. You can just google for that. There's a lot of LLMs, so just try to experiment with other things.
Also, if you don't use English, it's better, because the detectors just, well, lose a lot of accuracy when you don't use English. You could also try changing some of the words for other things. For example, if you take a word and you switch it into an emoji, it could potentially change the distribution of the probabilities in the text and confuse the system. Doing these types of changes that you see in the table could potentially change the outcome of the detection. And if you want to rewrite your text, I'd say that you should probably focus on the start of the text, for a simple reason: it's computationally expensive to analyze the whole text. If you have, like, 500 pages and you try to run that through, I don't know, DetectGPT or some of those tools, it's very expensive to do. So usually the systems that detect generated text just scan the first paragraph or two, and they assume that if the first paragraph looks generated, then the rest of the text has been generated. Otherwise it's human. So if you rewrite, I'd say that you should probably try to focus on the first part of the text. It's not necessarily the case all the time, but many systems take this little shortcut, so I'd suggest that you focus on the start of the text. And the last tip is just to try to see if it looks generated or not. You just go to a website like this and you check if it looks like AI. It looks like AI? Okay, I have to keep rewriting. And those are the main things that you could do to, well, just use AI without getting caught, you know. Having said that, I'd love to add a final note on the fact that it's gonna get harder to detect these texts over time. What I mean by that is that as these models improve over time, the text will look, well, more and more human-like, all the time. So it's going to be much harder to detect.
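The "scan only the start" shortcut mentioned above can be sketched as a wrapper that hands a detector just the opening of a long document. The detector function and the 1000-character budget below are placeholders for illustration, not any real system's values.

```python
# Sketch of the first-paragraph shortcut many detectors take.
def detect_document(text, detect_fn, budget=1000):
    # Per-token analysis is costly, so only the opening gets checked;
    # the verdict for the opening stands in for the whole document.
    return detect_fn(text[:budget])

# Using len() as a stand-in detector shows only the opening is seen:
print(detect_document("x" * 5000, len))  # 1000
```

The consequence for evasion is exactly the talk's tip: if a detector only ever sees the opening, rewriting the first paragraph matters far more than rewriting page 400.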
Bottom line, you might be able to detect your text, especially if watermarks are applied, et cetera, but this thing of detecting generated text is really hard, and it's going to be a problem in the future. So that's where the talk ends, but with a lot of open questions regarding ethics, like what should we do about this, etcetera. I don't have time to go into that. It's a really interesting topic, just too broad to cover. But I'd love to leave you with something, which is a QR code to rate the session. If you liked the session, I'd really love it if you could give me some positive feedback. I'd really appreciate it. And if you have things that you think I could improve, that's also very helpful, because I always look at all the comments here and I try to take them into account for improving other sessions. So I'd really be super thankful if you could just take, like, 15 seconds. It's super, super short. Really, it's two questions, and you just give me, well, what you thought about the session. Also, I should mention there's a surprise when you submit the form. So when you click the submit button, there's, well, a little surprise there. That's just to encourage you to fill in the form. And having said that, I would love to give you some pointers to, well, some things you might want to read about if you're interested, or just message me, I'm super happy to speak about these things all the time. But for now I'm gonna say goodbye, and I hope to see you at another session. And I really, really wish you a super great conference full of fun stuff. So goodbye and see you soon.

Aldan Creo

Technology Research Specialist @ Accenture



