Conf42 Prompt Engineering 2025 - Online

- premiere 5PM GMT

I talk to ChatGPT about things


Abstract

The word ChatGPT has captured the imagination of the internet, but does ChatGPT truly know everything? Is it truly AGI?


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, welcome to my talk. Today I'm going to talk about how I spoke to ChatGPT about things. All of these experiments were conducted on ChatGPT running GPT-3, so this was a really long time ago, but as we go through this talk you'll realize why this was an important exercise to do, especially on the primitive (not really primitive, but comparatively primitive) versions of pure language models. Some of these exercises are fun experiments conducted with GPT-3. They were conducted on a personal free-use account, so the results will vary for you depending on whether you are using something like ChatGPT Plus or something else.

At some point I basically decided to ask ChatGPT a bunch of really weird questions that I thought it wouldn't be able to answer, to check the viability of the AGI theory. These are some of the examples. The first one is: when is a door no longer a door? The answer was, a door is no longer a door when it no longer serves the purpose of providing an entrance or exit to a room or building, or when it's unable to be opened or closed. So I decided to give it some more context, because it's quite possible it had been trained on this data before. I said: this is a riddle, so answer it. When is a door no longer a door? And that's when all the training kicked in, the predictive modeling kicked in, and it said a door is no longer a door when it's ajar, which is right. That's the right answer to the riddle.

The second one was: what tastes better than it smells? At this point ChatGPT basically understood that we were in the context of answering riddles, so it told me, this is a riddle, and the answer is a tongue, because it tastes food but does not have a smell. Which was the right answer. The next question was: what building has the most stories? It recognized that this was still a riddle, and the answer was a library, because it has many stories. What has a bottom at the top? Again, ChatGPT was like, this is a riddle, and the answer is a staircase, because it has a bottom step at the top when you reach the top of the staircase. That was not the right answer, because the right answer is actually your legs, but it did recognize that we were in riddle mode and went into a sort of story mode of thought, a little bit of imagination. I don't think it was sourcing all of this from its previous training, though I do believe for the last couple of examples it did source a lot from previous training; ChatGPT had, at some point, across all of its tasks and examples, seen those examples before, in a Q&A task or something else.

Some other examples: what has four wheels and flies? It's still in the mode where it says, this is a riddle, and the answer is a garbage truck, because it has four wheels and it moves around to collect garbage, which can be considered as flying through the streets. That's not actually the right answer, but it does make up a lot of imaginative stuff when it encounters questions it doesn't really have good answers to. What month of the year has 28 days in it? It says, this is a riddle, and all 12 months of the year have at least 28 days in them. Which is the right answer. A lot of humans get this wrong because they're like, it's just February.
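Since adding the phrase "this is a riddle" is what flipped the model into riddle mode for these questions, here is a minimal sketch of what that kind of context priming looks like as an API call today. It assumes the openai Python client and an API key in the environment; the model name is illustrative and is not the GPT-3 endpoint used in the original experiments.

```python
# Minimal sketch of context priming: the same question asked bare vs. framed
# as a riddle. Assumes the `openai` Python client and OPENAI_API_KEY are set;
# the model name is illustrative, not the GPT-3 model from the talk.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Bare question: the model tends to answer literally.
print(ask("When is a door no longer a door?"))

# Primed question: the added context nudges it toward the riddle answer ("when it's ajar").
print(ask("This is a riddle, so answer it: when is a door no longer a door?"))
```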
What can you put in a bucket to make it weigh less? This is a riddle; the answer is a hole, because a hole in a bucket decreases the weight of the bucket. That's right, and it's not as if it has a lot of intuitive understanding of what the physical world is like, that if you drill a hole at the bottom of a container a lot of its contents will spill out. What starts with a T, ends with a T, and has T in it? This is a riddle; the answer is a teapot, because it starts with a T, ends with a T, and has tea in it. It got that, but I think all of those were extremely classic examples of riddles, and based on the corpus GPT-3 had been trained on, it had probably gone through all of those examples before in its training. Where is the only place where today comes before yesterday? This is a riddle; the answer is in a dictionary, because the words are listed in alphabetical order. What goes all around the world but stays in a corner? This is a riddle; the answer is a stamp: it goes all around the world when it is used on letters and postcards, but it is physically located in a corner of the envelope.

I'm used to bat with, yet I never get a hit. I'm near a ball, yet it is never thrown. What am I? This is a riddle, and the answer is a baseball bat: it's used to bat with but never gets a hit, it's near a ball but the ball is never thrown. At some point I noticed that it gets a lot of these wrong and starts making up imaginative stories about what the real answer could be, trying to fit those stories into the narrative so that the answer it gives me seems like the right one. So I tried prompting it, saying, can there be another answer? It was basically like hinting to a friend when you're playing riddles with each other: here's a hint, why don't you think of other answers, do you think there could be another answer? And it gave me back another imaginative answer, a baseball glove, which is also not the right answer. But you can see what is basically the thinking process of a language model; you're literally seeing it think. And this was before all of the RAG or any other layers on top of ChatGPT; this is the pure language model. You can see it thinking when you say, can there be another answer? Because it has already made up a lot of imaginative storytelling to fit a certain answer to this riddle, it keeps thinking along those same lines, which is why the answer it comes back with is the baseball glove as opposed to the baseball bat. Unfortunately that was not the right answer either.

But I decided to continue and tried to change the genre, the geography basically. My question was: I heard the Arabian Nights first, my queen was the writer of the great story, guess my name. It says, this is a riddle. It's not actually a riddle, it's just a question framed in a specific manner in order to confuse the person you're seeking an answer from. But ChatGPT at this point was in riddle mode, so it keeps assuming the context is still riddle mode, and it says to me, this is a riddle, and the answer is Scheherazade, which is not the right answer, because the question is, my queen was the writer of the great story, guess my name. It does get all the context; it knows she's a queen in the collection
of Middle Eastern folk tales known as One Thousand and One Nights, also known as the Arabian Nights, and that she's the storyteller of the tales. Okay. But it did make a very human mistake here by telling me the name is Scheherazade as opposed to the king's name. So that basically worked: in my head I was like, I was able to trick ChatGPT into being confused just by using a specific way of asking tricky questions, which also works on humans.

So I decided to continue and moved on to another genre of sorts. Up to now I'd been asking it questions about physics, physical phenomena, sports, things related to actually perceiving or observing, and now I moved on to some fictional material. My question was: what do you call a rabbit that has fleas? And ChatGPT says, this is a riddle, and the answer is Bugs Bunny, as it's a play on words: Bugs Bunny is a cartoon rabbit, and bugs is also another word for fleas. That's right, and this comes from a certain source of fiction it was specifically trained on, so it knows the references. What do you get if you pour boiling hot water down a rabbit hole? This is a riddle; the answer is hot cross bunnies, as it's a play on words: hot cross bunnies sounds like hot cross buns, and bunny is another name for a rabbit. So it did, to a certain extent, understand puns. Whether it's intuitively understanding puns, whether it's capable of making puns by itself, or whether it was trained on this specific pun is something that, because we are looking at it as a black box, we don't know at this point. We would have to devise a bunch of other tests to figure out whether it was actually intuitively understanding how to make a pun, or whether it was just trained on this data and remembered, through its neural network, that the answer is hot cross bunnies.

So I just went on. What has to be broken before you can use it? It says, this is a riddle, and the answer is an egg; it has to be broken in order to use it in cooking or eating. I'm tall when I'm young and I'm short when I'm old, what am I? This is a riddle; the answer is a candle: a candle is tall when it's new and has not been burned, but it becomes short as it burns down and gets shorter with age. It's right, a classic riddle. What is always in front of you but can't be seen? This is a riddle; the answer is the future, as it's always in front of you but cannot be seen because it hasn't happened yet. Technically it's right, and this is one of those philosophical-genre riddles. At this point, to an observer treating ChatGPT as a black box, it would seem that ChatGPT did in fact understand some human philosophy, which, because it was GPT-3 at the time, was awe-inspiring, right?

So I decided to go further down this rabbit hole of sorts, because I was getting some very interesting signals. I asked: on a table there are two apples on a plate, three men are eating, how is it that one of them does not eat an apple? And ChatGPT says, this is a riddle, and the answer is that one of the men is a doctor, because the doctor would not eat an apple, as he would be the one giving advice on how to not eat the apple. I think that answer was wrong, but it's so imaginative.
It just made up a whole story about how, if you are a doctor, you're not supposed to eat an apple, though the actual saying is, an apple a day keeps the doctor away. So I went on: if five rabbits eat five apples in five minutes, how many apples will ten rabbits eat in ten minutes? This is a classic word problem that a lot of people actually have to solve when they're doing any sort of math. And it's still in riddle mode, so it says, this is a riddle, and the answer is five apples, because the problem doesn't take into account the fact that eating the apples would take a different amount of time depending on the number of rabbits; instead, the problem only relates to the number of apples and the time given. Incredibly weird math, but you have to understand that GPT-3 was never linked to any sort of math engine and would get even the most basic math questions wrong; this is an example of such a problem. As a purely language-based model, it does seem to get a lot of mathematical problems wrong, even though the way humans learn mathematics is primarily through language too, because in most school curriculums the problems people solve are closely tied to language. People specifically struggle with word problems about a man buying ten watermelons and spending x amount of money and so on. So technically you would think a large language model would be okay at understanding math, not through a math engine, but through language, like humans do.

So I went on: suppose you're touring on a bus with ten passengers. Two descended and four climbed on at the first stop; five descended and two climbed on at the second stop; two descended and three climbed on at the next stop. Now tell me how many passengers are travelling on the bus. It gave me the wrong answer, and then it got confused, because it thought it had to explain why the answer is what it thinks the answer is, went into deep math with each statement corresponding to a mathematical operation, and still got it wrong. (The step-by-step arithmetic it lost track of is sketched below.) This is a prime example of behaviour seen in human children when they're, I want to say, five or six years old, where they lose track of the mathematical operations they are conducting, especially when there are multiple steps, and they basically lose track of the history. So while it was telling me things, because this is a predictive language model, it was losing track of what it was telling me. There was no working memory in GPT-3. I don't know whether the current models have any such memory, whether, as they're telling you things, they keep track of what they're telling you in the moment, apart from all of the predictive modeling, because predictive modeling is about which word would come next, as opposed to, this is the calculation that is supposed to match up to the prediction I've made. And at that point I realized I had no way of knowing whether GPT-3 was actually trained on this data, in the sense of whether all of these riddles were ever part of its corpus, because the corpus was huge, or whether it was actually being extremely intuitive.
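For reference, here is the running arithmetic the model lost track of, using the starting count of ten and the stop-by-stop changes stated in the bus question above:

```python
# Running tally for the bus word problem: start with 10 passengers,
# then apply (descended, climbed_on) at each stop as stated above.
passengers = 10
stops = [(2, 4), (5, 2), (2, 3)]

for descended, climbed_on in stops:
    passengers = passengers - descended + climbed_on
    print(passengers)  # prints 12, then 9, then 10

# Final count: 10 passengers still travelling on the bus.
```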
Because when you're playing riddles with a friend, usually you are not allowed to look up any information, you're not supposed to cheat, you're not supposed to ask anyone else; you're just supposed to rack your brains. Sometimes you would have played the game before and would remember a riddle from a previous round, but in most cases you are supposed to be asking each other questions you've never heard before, making connections you've never made before, and basically thinking outside the box. That was the test I was performing on GPT-3. So at that point I thought, these riddles could be part of the corpus it was trained on, and I decided I was going to look up riddles in another language, in this case Hindi, translate them, and ask ChatGPT those riddles, because I was assuming it would not have been trained on data that was only available on the internet in a language other than English and had no corresponding English translation. Some of these examples are from that data.

For instance: Ajay's parents have three sons; the riddle names two of them and asks who the third son is. It does get it right; it says Ajay is the third son. This is something that actually stumps a lot of humans. Who has many words but does not speak? This is a riddle; the answer is a book, because a book contains many words but does not speak. A cow gives milk, a hen gives an egg; what gives both milk and eggs? As you can see, this is an extremely crudely translated version of a riddle that I think was in Hindi or some regional language, and it comes up with the weirdest story you've ever heard. It says, this is a riddle, and the answer is a duck: ducks are domesticated birds kept for their eggs, meat, and feathers; they're also known to give milk, which can be used to make dairy products like cheese and butter. At that point I wondered whether there was a brand called Duck Dairy Farm or something, which would trick a language model into believing there is such a thing as a duck giving both milk and eggs.

Another one: two ants ahead of an ant, two ants behind an ant; say, how many ants? It says, this is a riddle, and the answer is two ants. The actual answer is three ants, and it makes no sense that it got this wrong. But I'm pretty sure that if I had put in something like, two ducks ahead of a duck, two ducks behind a duck, how many ducks, it would say three ducks, because the classic riddle is with ducks and not ants. If you're using a predictive model of language that does not actually intuitively understand what you're saying, in the sense that it's not able to imagine in its mind that there are two ants ahead of an ant and two ants behind an ant, it can't answer an imaginative question; you have to be able to visualize the ants. It would rely on previous training and on everything people have ever said to it, its history. It would be remembering things, or, if you tell it the task is to identify a pun or to make a pun, it would be making or mimicking a pun, which means it would not necessarily get to the right answer you want it to get, which I do believe most humans are capable of; they're able to get to the right answer when they're playing riddles with each other. What is the thing that can be soaked in the rain but can never get wet? This is a riddle.
The answer, it says, is a sponge, because it absorbs water, soaks it up, but never gets wet. That's not the right answer actually; it's something like water, which never gets wet, or something along those lines. It's a classic riddle, but I don't think it's that popular in English, as opposed to the regional language it came from. "So tell me, what's in your pocket?" Sohan replied, "There is something in my pocket and nothing in my pocket." Tell me, how can this happen? And it went on a philosophical rant I was not expecting, which is very interesting to me, to be honest, because it reads almost like a philosophical essay of sorts. It says: this is a riddle. Sohan replied that there is something in his pocket and nothing in his pocket, which means he was carrying an empty pocket; an empty pocket is something because it is physically present, but it contains nothing. The actual answer was that he had a hole in his pocket: there was nothing in his pocket, but there was a hole in it, so there was actually something in his pocket. What is the thing that always increases but never decreases? This is a riddle; the answer is age, or time, because it always goes forward and never goes backward.

There are ten birds sitting on a wall. A person shoots one of them with an arrow. How many are left? It says there would be nine birds left on the wall; if one bird is shot with an arrow, there will be one fewer bird on the wall. Practically impossible. Most of the birds would just fly away, right? In physical reality, if there were ten birds sitting on a wall and you shot one of them with an arrow, all of the other birds would fly away, and that's the answer you're looking for: either all of them fly away or some of them fly away, but the answer is ambiguous. It's not as simple as there being nine birds left on the wall. At this point ChatGPT didn't know the behaviour of birds, in the sense that if one bird is shot when there are ten birds sitting on a wall, the rest fly away. It was treating the birds as if they were inanimate objects, like bolts, and not real living beings making their own decisions and reacting to stimuli.

As you can see, I was getting extremely interesting signals from this whole exercise, and I decided to go forward with everything else. What is a room that has neither a window nor a door? This is a riddle; the answer is a mushroom: a mushroom is a type of fungus that grows in the ground, and it does not have windows or doors. Now, this is a pun, because it's a mush-room. What is the thing that makes no sound even after you break it? The answer, it says, is an egg, because when you break an egg it makes no sound even though it's broken. I've never met an egg that didn't make a sound when it was broken. The real answer was silence, but this is a philosophical one of sorts: if you've ever read anything philosophical, "we broke the silence", so now it's broken, and because silence is an abstract concept, silence being broken is basic English usage. At that point it didn't get it. I think it does better now, but one can never be sure whether that's because of the corpus, whether it's just been trained on all of these examples, or whether it's cheating in the sense of using RAG,
which is where it straight up looks up answers for all of these questions on the web. That's not what you want your friend to do if you're playing riddles with them, and if you are testing a bot or a large language model for signs of AGI, you basically don't want it to be cheating. That's useful for all other sorts of tasks, and we'll get back to why the misuse of RAG could actually be fatal even for accomplishing practical tasks. But at this point I was getting some extremely interesting signals and I decided to continue.

So I thought, instead of gently prompting it, let me just negate what it said to me, because that is what you would do to a friend, right? If they gave you the wrong answer, you'd say, eggs do make a sound when broken. And it just justified itself: there are many riddles and they can all have multiple answers, depending on the interpretation of the words, et cetera, et cetera. It never actually gave me the right answer. So I decided to continue, because I already had a bunch of signals, but it's always nice to get more. I asked: you are in a running race and you cross the person who is in second place; in which place are you now? And it says, if you cross the person who is in second place during a race, you are now in second place, which is right. If you use me, you can walk through walls; what am I? This is a riddle; the answer is a key, because a key can open a door, allowing you to walk through walls. The answer was actually a door, but I guess it was in riddle mode, thinking the answer can't be that simple, so let me make it even deeper: you need a key to open a door, and that's the only way you can walk through walls. You can see exactly how the thinking is working, as opposed to when you prompt-train it and tell it, you have to tell me what you're thinking, or you have to think in layers, first step one and then step two, and then why you went to step three, and which tasks you have to do before you can come to step eight, and it gives you this longish answer. There are other ways to see what an LLM is thinking, and this is one of those cases: you treat it like a black box, you bombard it with signals, you get back some signals, and you can derive inferences from them, as long as the signals you're getting back are not ambiguous, which is why you have to probe one particular area in multiple different ways, so you can get some solid signals.

My next question, because this is not a riddle at this point, was just math: how many times can you subtract 10 from 50? And I think it was still in riddle mode, so it says you can only do it once, because the start was 50 and then it goes to 40, so you can't do it again. I would argue both would be valid answers, in the sense that someone could say five times, or someone could say that the state has changed, so it's now 40, so you can't do it again. And then I asked: I have many keys but no locks, I have space but no room, you can enter but cannot go out; what am I? It says, this is a riddle, and the answer is a keyboard, because a keyboard has many keys which are used.
And that is actually the right answer, but at this point we're not too sure whether that's because it had seen the riddle before, or whether it was actually thinking it through and getting to the right answer. So we shifted back to ambiguous-question territory as opposed to straight-up physical riddles. There's a slight shift from one to the other because I'm experimenting with different genres, and also experimenting with a lot of regional-language material and the sort of language used to ask questions there, because I was hoping that some of it would never have been translated and would never have been part of the corpus this particular language model was trained on, since that would give me better signals.

So I asked: there are two mothers and two daughters in a car, but there are only three people; is that possible? And it said: the three people are a grandmother, her daughter, and her granddaughter. The grandmother is the mother of the daughter, and the daughter is the mother of the granddaughter, so there are two mothers and two daughters but only three people. That's right, and that's the classic form of the question. But at that point it didn't matter whether you were using apples or oranges to ask ChatGPT a question, because it was relying not on an intuitive understanding of the language itself, but rather on straight-up predictive modeling of the next token in the English language. Two women are identical looking, and they say they are not twins; is that possible? It went into a philosophical sort of answer: it's possible for two women to look identical and not be twins, it could be due to genetic similarities or cosmetic procedures, or they could be sisters or cousins or unrelated but just look alike. I think this is a classic riddle with a more poetic answer of sorts, which I don't remember off the top of my head, but it does have an interesting answer, and one that has nothing to do with genetics. You can see that as I ask it questions, sometimes it decides it's in riddle context and sometimes it decides it's not, and we, the askers of the questions, have no choice; at least in this case I had no choice. It would flip into the riddle context and out of it, into the non-riddle world, all by itself. So I asked, could they not be triplets? And it said, yes, they could be triplets.

Back to it: a lady was buying fridges for 10,000 rupees and selling them for 5,000 rupees; after a few years she became a millionaire. How? And it goes back into this whole storytelling adventure of sorts: she was not actually buying and selling fridges, she was selling the shares of the company. That makes no sense, because the company is operating at a loss; she was operating at a loss, apparently. The actual answer is just as funny: she was a billionaire, and she was operating at a loss, so now she's a millionaire. It's another classic riddle, but depending on how you phrase it, and depending on what sort of tokens and what direction of tokens it has been trained on, it will either make up a story or give you the right answer. The same riddle, phrased in a different narrative of sorts, gets you the wrong answer. If it were something like, there was a man who married a woman, because that's how the classic version goes, and he used to be a billionaire and she made him a millionaire,
it's like a classic wife joke of sorts; if you had fed it that, it would have given you the right answer, actually. So, back to riddles. What can jump higher than a building? This is a riddle; the answer is everything, because a building can't jump. Which is heavier, one kilogram of iron or one kilogram of cotton? A kilogram of iron and a kilogram of cotton weigh the same, one kilogram; the weight of an object is determined by its mass, which is measured in kilograms, and one kilogram is the base unit of mass in the International System of Units, so both the iron and the cotton weigh one kilogram. Actually, I've gotten this one wrong myself, as a human being, because there used to be a famous riddle story of sorts where, in order to make the one kilogram of cotton heavier, people would soak it in water. Back to riddles: Dolly's mother had five children; she named four of them January, February, March, and April. What is the name of the fifth child? And ChatGPT says, Dolly. That's right. A man and his son met with an accident; the man died, and his son was taken to the hospital, and the doctor said, I cannot operate as he is my son. How is this possible? And ChatGPT basically said that's straight up not possible and it goes against the laws of genetics. Actually, the answer is that the doctor is the boy's mother, so the doctor is female.

Back to it: there's a bowl of six mangoes, you took four mangoes; how many are you left with? And there's mass confusion about whether you're left with two mangoes; no, the bowl is left with two mangoes, you have four mangoes, because you took four mangoes. So, an inversion of perspective. Then we move on to the next set of riddles. There are five students in a room: Ria is watching TV, Amar is playing chess, Anamika is reading, and the fourth is eating. Can you tell me what the fifth student is doing? It says it's impossible to deduce, except the answer is that she's playing chess with Amar, because the question is very physical; they're not playing chess with a computer. Your parents have six sons and each son has a sister; how many kids do they have? And it does the extremely classic mathematical thing of adding six to six and saying twelve; the answer is supposed to be seven. How many apples can you eat on an empty stomach? It says the number of apples one can eat on an empty stomach varies from person to person and depends on appetite. There's a classic trick of state here of sorts: once you've eaten one apple, you no longer have an empty stomach; even after one bite of an apple, you no longer have an empty stomach. How can you share ten mangoes equally between nine people? It says it's not possible; it basically cannot comprehend cutting them at all. What loses a head in the morning and gains one at night? This is a riddle; the answer is a pillow, because a pillow loses its head in the morning when you wake up and fluff it up, and gains one at night when you put your head back on it to sleep. Actually, very clever. What is it that has many keys but cannot open even one door? This is a riddle; the answer is a keyboard. That's true. Ramesh is the son of Rahul; Rahul is the what of Ramesh's father? It says Ramesh's father is the father of Rahul, which is wrong: Ramesh is the son of Rahul, so Rahul is Ramesh's father. It basically lost track of who is related to whom, and in what way, in the word problem. What can be as tall as you but doesn't weigh anything? The answer is your shadow. That's right.
A girl came into town on Monday, and after two days she left on Monday. Is this possible? It goes into this whole philosophy of why it could be Monday on both the day she came and the day she left, except in this case the riddle is about thinking outside the box, where Monday is either the name of the boat she came on or the name of the horse she came on. If you've studied enough riddles or have been through enough literature, which I'm pretty sure GPT-3 at this point had, you would have recognized this as that sort of riddle, so it was actually very surprising to me to get this sort of signal from it. A man was walking in the rain without any covering on his head, and yet not even one strand of hair got wet. Is this possible? Yes, because he was bald. If your uncle's sister is not your aunt, what relation is she to you? She's your mother. That's right. A girl was 13 on her last birthday and will be 15 on her next birthday; is that possible? Yes, because it's her birthday today; that's what it means. But that's only possible on her birthday, on her 14th birthday; it's not possible one day later or one day earlier, and the answer doesn't specify that it has to be her birthday on that specific date, her 14th birthday.

You are in a room that has three switches and a closed door. The switches control three light bulbs on the other side of the closed door. Once you open the door, you may never touch the switches again. How can you definitively tell which switch is connected to each of the light bulbs? It goes into a weird step-by-step of, turn on switch one and leave it on for a while, then turn it off, open the door, note which bulb is warm to the touch, et cetera, et cetera, which doesn't make a lot of sense, because you've told it that once you open the door you may never touch the switches again. The answer it gives is very close to the actual answer. I don't think it's thinking it through, but it's definitely predicting something from its training corpus that happens to be close to the actual answer, probably a question similar to this one but not quite this one. So it can't actually strategize, even when it's mimicking the strategy of a completely different problem, because at this point it's just predicting, and the prediction has almost nothing to do with what the actual question is versus what it thinks the actual question is, in the context of a bag-of-words sort of behaviour, which is really weird.

A man is looking at a photograph of someone. His friend asks who it is. The man replies, brothers and sisters I have none, but that man's father is my father's son. Who is in the photograph? This is a riddle; the man in the photograph is the man's own son. Actually, that's not true. "That man's father is my father's son." No, it is actually true. So it did get it right, and I got it wrong this time. But it does go off on a weird tangent of sorts where it says, "but the man in the photograph is his father's son, meaning he is his own son," which is wrong, because when you're thinking through these relational problems, if you don't have space to visualize exactly what you're thinking, and because this is a language model it doesn't actually have any space to visualize anything, any relations, and it doesn't have a lot of history to keep in its memory either, then while it did get it right, it eventually got it wrong, because it was second-guessing: am I thinking in the right direction?
In 1990 a person is 15 years old; in 1995 that same person is 10 years old. How can that be? It says this is a riddle, and the answer is that the person in question is a set of twins, one of them born in 1990 and the other born in 1995, so in 1990 one twin is 15 years old and in 1995 the second twin is 10 years old. Clearly that's not the right answer. The answer is that it's BC as opposed to AD, which is why in 1990 BC the person is 15 years old and in 1995 BC the person is 10 years old. It gives you this extremely science-fictiony answer that doesn't make a lot of sense. Next one: your parents have six sons, including you, and each son has one sister; how many people are in the family? That's a riddle I had asked it before, and it still got it wrong; it does the six plus six equals twelve thing, which is wrong, and the right answer is supposed to be seven.

Four of the kids in this class are girls, eight of the kids wear blue shirts, two of the kids are neither girls nor wear blue shirts, and five of the kids are girls who wear blue shirts; how many kids are in the class? This is a classic Venn diagram problem, but you don't actually need a Venn diagram to keep track of those things, because every Venn diagram is also a union and intersection of sets, which can be represented in algebra, which is in language. So you don't need to visualize this problem; it's a matter of addition and subtraction, which it's not able to do. Its answer is actually wrong, and when I tell it the right answer, it goes ahead and tries to fit its thinking to the right answer, which may or may not be right in every case. You're escaping a maze and there are three doors in front of you: the door on the left leads to a pit of lava, the door in the center leads to a room filled with a deadly gas, and the door on the right leads to a lion that hasn't eaten in three months. Which door do you choose? It tries to give you, again, a very science-fictiony, useless answer, though the answer is supposed to be that if the lion hasn't eaten in three months, the lion is dead. A man describes his daughters, saying they are all blonde but two, all brunette but two, and all redheaded but two; how many daughters does he have? Surprisingly, it gets this right, which is really weird, because then it should have gotten the ant question right; that's a version of the same kind of problem, right?

If you have a seven-minute hourglass and an eleven-minute hourglass, how can you boil an egg in exactly 15 minutes? And it says you can't. That's not true, because this is something every mathematician has seen at some point in their lives, where you start one hourglass, then the other, flip them at the right moments, and, based on all of those subtractions, make it up to exactly 15 minutes. Then I tried asking it a mathematical riddle where you're supposed to assign a bunch of operations to a bunch of symbols, and it just refused; it said, I just want to look at these as numbers and not as ciphers or any other sort of operational representation, and it refused to do the work for me. I guess that could have been avoided if I had told it, or few-shot prompted it, saying, here is an example of what happens; but then you're technically training it to respond to a certain thing, which, a lot of the time, you don't actually have to train humans to do.
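As a side note, here is roughly what that few-shot framing could look like: one worked example of the symbol-substitution game shown to the model before the real question. This is a hedged sketch, not the prompt from the original experiments; the symbol mapping and wording are illustrative.

```python
# A rough few-shot prompt of the kind described above: a single worked example
# of a symbol-substitution puzzle, followed by the actual question. The mapping
# and wording are illustrative, not the prompt used in the talk.
few_shot_prompt = """\
In these puzzles, symbols stand for arithmetic operations.

Example:
If # means addition and @ means multiplication, what is 2 # 3 @ 4?
Answer: @ is applied first, so 3 @ 4 = 12, then 2 # 12 = 14.

Now solve this one:
If $ means subtraction and % means addition, what is 10 $ 4 % 2?
Answer:"""

print(few_shot_prompt)  # this string would be sent to the model as the prompt
```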
So the point of this exercise was to get a better, more accurate definition of AGI, because what cannot be defined cannot be achieved. Historically, different people have had different interpretations of AGI, and a few of those definitions can be considered achieved by GPT; yes, that's true. A diversity of perspectives is required to define it completely, and that may also mean we will never reach 100% AGI, both for development-constraint reasons and because of the safety concerns surrounding AGI. These are some of the historical definitions.

The Turing test: a machine and a human both converse, unseen, with a second human, who must evaluate which of the two is the machine; the machine passes the test if it can fool the evaluator a significant fraction of the time. Note that Turing does not prescribe what should qualify as intelligence, only that knowing it is a machine should disqualify it. I would say that the non-PR version of GPT, which was extremely controversial, would actually have passed this test; a bunch of people would not have pegged it as a machine. So you would think it had surpassed the Turing test, except it's mimicking a bunch of things, and it's also remembering a bunch of things it was trained on, and it only passes unless you have an extremely smart human being on the other end doing the asking, because that is the test: you have to find out which one is the machine, so you cannot go in and ask the questions you would ask a human being; you have to probe it the way you would probe to figure out whether the person you're talking to is a human or a machine. If that were not the case, I think a lot of LLMs would pass the Turing test. But that's not significant at all; it doesn't mean anything. It just means the Turing test is an inadequate definition of AGI at this particular point.

The coffee test: a machine is required to enter an average American home and figure out how to make coffee: find the coffee machine, find the coffee, add water, find a mug, and brew the coffee by pushing the proper buttons. In certain cases GPT would have achieved this as well; as a matter of fact, with techniques like RAG and prompt training included, one would think that in a bunch of cases it would be marked AGI with respect to the coffee test. The robot college student test: a machine enrolls in a university, takes and passes the same classes humans would, and obtains a degree. I think this is probably the easiest one, and it has surpassed this one. The employment test: a machine performs an economically important job at least as well as humans in the same job. This means nothing to me, to be honest, because there are plenty of things deterministic machines already do, and we don't need probabilistic models or LLMs to do them; those machines sometimes do better than humans in the same job, often faster, actually. Most of our industries are already that mechanized; they're full of machines rather than humans.

So what is GPT? GPT is a language model, a large language model. Each of us has a language model inside our heads, and it changes constantly. A language model embedding is the multidimensional representation of a given sentence, based on the context or environment in which the language model was trained, similar to humans. The existence of a language model leads to NLU and NLG, which are sub-problems of all the other NLP problems. NLU is natural language understanding and NLG is natural language generation.
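To make the embedding idea concrete, here is a small sketch of turning sentences into multidimensional vectors. It assumes the sentence-transformers library and one commonly used public model; this is just one way to get sentence embeddings, not tooling from the talk itself.

```python
# Minimal sketch: sentence embeddings as multidimensional vectors.
# Assumes `pip install sentence-transformers`; the model name is a common
# publicly available choice, not anything specific to the talk.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "When is a door no longer a door?",
    "A door is no longer a door when it's ajar.",
]
embeddings = model.encode(sentences)  # one vector per sentence
print(embeddings.shape)               # (2, 384) for this particular model

# Sentences with related meanings end up close together in this vector space.
print(util.cos_sim(embeddings[0], embeddings[1]))
```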
So that was GPT-3. What I did next was go and ask the current version of GPT what it thought of our prompting history, because I do use ChatGPT heavily. It told me it was very good at creative writing for me, context-linked creative writing; that it was very good at essay or project expansion; and that it was really good at travel and museum planning. It said it was not good when I didn't provide good context, with minimal or one-line prompts, or when writing emotionally intense fiction, or with any sort of financial or tax strategy. The issue with its self-evaluation is that I have actually not been able to use ChatGPT as much as I would have expected for any of the things it thinks it does very well for me. Whenever there's been a case of context-linked creative writing, it's been extremely difficult to steer it in the direction I wanted within the prompt space I have, because if I were to give it more details, I might as well write the plot myself. It's the same with essay or project expansion, because it loses a lot of context, it loses a lot of history, and you have to keep telling it, no, we are in the context of riddles, or we are in the context of mathematical riddles as opposed to fictional riddles; you have to keep reminding it.

And the last one, travel and museum planning, is where it has not been great for me at all, because there was an itinerary it wrote for me for a specific trip where ChatGPT thought I could teleport from one place to another, covering a leg that would usually take 30 minutes in five minutes. So, as you've seen, it has no understanding of basic physics or the realities of the world. There is no way for it to scope out the observational details of whatever prompt you're asking it to perform, and it does really badly at it. ChatGPT today doesn't have a sense of time or practicality, because it really did think I could teleport when making travel plans. Those were not hallucinations; it genuinely asserted that a cab ride from point A to point B would take 10 minutes when, in reality, it takes around 30 minutes.

Using it for travel depends a lot on RAG, retrieval-augmented generation: it sources its results from Google search or Bing search or some other search engine and grounds all of its information on that basis (a rough sketch of that loop follows below). Except it's very easy for someone to start a blog and post all sorts of fake information on it, which will then be cited as a source in these search results, depending on how they gather that information. And those documents are always evolving, so there are a bunch of blogs with all sorts of conflicting information, and it doesn't do well with that. If it's bombarded with confusing information, it can give you all the wrong answers; it can make up places for you to travel to that don't really exist, and it can genuinely believe that you can teleport from place A to place B in 10 minutes when the actual trip will take you 30. It loses its train of thought, which I thought was actually very human.
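For readers who haven't seen it spelled out, here is a very small sketch of the retrieval-augmented generation loop described above: retrieve some documents for the query, stuff them into the prompt, and generate an answer grounded on them. The `search` and `generate` callables are placeholders for whatever search engine and language model you plug in; none of this is from the talk's own tooling.

```python
# A bare-bones retrieval-augmented generation (RAG) loop, as described above.
# `search` and `generate` are placeholders: in practice they would be a real
# search engine or vector store and a real language model call.
from typing import Callable, List

def rag_answer(
    question: str,
    search: Callable[[str], List[str]],   # returns snippets of retrieved text
    generate: Callable[[str], str],        # calls the language model on a prompt
    top_k: int = 3,
) -> str:
    snippets = search(question)[:top_k]
    context = "\n\n".join(snippets)
    prompt = (
        "Answer the question using only the sources below. "
        "If the sources conflict or are missing, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```

The grounding problem the talk describes lives in the `search` step: whatever the engine returns, including a blog full of fabricated details, becomes the "sources" the model is told to trust.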
Coming back to losing the train of thought: if a human did that to you repeatedly, you would just stop being friends with them or stop talking to them, and if they're your therapist or something, you're basically stuck. That has been my experience. So, not quite AGI. LLMs, Grok and ChatGPT, are very good at guessing games, probably due to RAG injection. I've had to give them some more prompts, some more hints, but usually they do get me the right answer: misheard lyrics, or figuring out which artist a photo belongs to, et cetera. I've searched for a number of wild things and they do really well with them. Except I know of a deterministic engine that used to perform the same sort of analysis: if you answered, I want to say, ten out of twenty questions, questions like, is this character fictional, is this character female, is this character in a fantasy movie, it would come upon the right answer. There were a lot of rumors that there was actually a bunch of humans behind that particular website, but that was not the case. So that's not AGI either.

It doesn't have any observational skills, for obvious reasons: it's an LLM, and it is not equipped to see or to perceive, or even to imagine, in any sort of space, apart from a bunch of things that don't require imagining. For instance, verbal algebra should be easy enough, but it's not. So it relies on available knowledge only, straight-up available knowledge, either from its own training base, as a mimic of its training base, or from information it gets from RAG sources, which are search engines. It relies on available knowledge only, which is not how humans learn. A lot of human learning happens through observation. For instance, a Greek mathematician was able to calculate the radius of the earth using two poles in two different cities; that's observational physics. He did not rely on available knowledge in the sense of memory; he relied on knowledge in the sense of observation, observing that the shadow of one pole falls in a certain direction while the other pole casts no shadow. That is observational knowledge as opposed to available knowledge.

So I just want to conclude that GPT today is actually not AGI. In fact, if you play with it using all of the riddles I have showcased today, it'll give you all of the right answers, except it's cheating: it will literally go and look up the keywords and the phrases, with extremely smart queries, and come up with the right answer from the internet, which is basically where I sourced all my riddles from. But that's not AGI. That's all I had. Thank you so much.
...

Aroma Rodrigues

Software Engineer @ Microsoft

Aroma Rodrigues's LinkedIn account


