Conf42 Machine Learning 2025 - Online

- premiere 5PM GMT

The Limits of Imagination: An Open Source Journey


Abstract

A journey filled with bold ideas, epic failures, and brilliance. Starting in 2016, I built a shoe to help the visually impaired navigate the world. Armed with an Arduino, an Android phone, and an unreasonable amount of optimism, I hacked together an obstruction-sensing prototype that actually worked!


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, welcome to my presentation, The Limits of Imagination: An Open Source Journey. Basically, this is about my open source journey and projects in natural language processing over the years, starting from 2018 till 2025, so roughly seven years of work. All of these projects are open source. My name is Aroma Rodrigues, and this is a little bit about me, my bio. I'm a Python enthusiast. I've been coding since 2014; it'll be a decade next year. I'm a multi-conference speaker: I've spoken at PyCon US, PyCon Estonia, PyCon Lithuania, PyCon Hong Kong, PyCon Sweden, EuroPython, PyOhio, PyCon South Africa, and PyCon India, and I also had a project presented at FOSSASIA. I've been on the natural language processing train since 2018. I do fun experiments with data, Python, and language models because I believe, as do most multinational companies now, that the human condition is encoded in language just as it is in science and math, and it's inevitable that one day we will be able to use computers to help us linguistically as they do mathematically. I've been using this bio since 2019, and it has basically come true. So these are some of the stories that I'm going to share with you; I'm going to share ten stories. The first one is from PyCon India 2018: a terms and conditions summarizer. The second one is from PyCon South Africa 2019: an NLP fake news detector. The third one is from PyCon US 2024, called Only Bots in the Building. So, the first one, the PyCon India 2018 terms and conditions summarizer. Everyone knows terms and conditions: they're all very clunky, not great to go through, but you basically just have to do it. I personally have not gone through a lot of terms and conditions except a couple of rental agreements, and for this project specifically I think I went through a terms and conditions document by PayPal, if I remember right. We are always drowning in unreadable legal language across emails, apps, and websites. What I was trying to do was build an NLP pipeline to extract obligations, permissions, and risks from terms and conditions documents. Basically, when I was going through a document full of terms and conditions, I wanted to see the risks associated with it; a risk had to be highlighted in red. It would have empowered users to skim legal agreements with context and not confusion, and if a line was a risk, it would be highlighted. My problem with this particular project was that there was no existing dataset, and I had to hand-annotate a bunch of sentences as obligation, permission, or risk. So I built a labeled dataset from scratch using first principles of contract structure. The outcome was an NLP pipeline to extract obligations, risks, and user rights from legal documents. Because there was no usable dataset, I could only annotate, I want to say, a couple hundred sentences, which was not a good amount to train a good classifier, but I created an SNLI-style dataset from scratch using first principles. What you could do today to improve accuracy is use some of those sentences to one-shot train an LLM, a large language model, or use an LLM to create synthetic data of around 10,000 sentences to train a more powerful classifier.
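As a rough illustration of that improvement, here is a minimal sketch (not the original 2018 pipeline) that labels terms-and-conditions sentences as obligation, permission, or risk with an off-the-shelf zero-shot classifier. The Hugging Face pipeline, the bart-large-mnli checkpoint, and the sample sentences are my assumptions, not what the talk used.

```python
# A minimal sketch: zero-shot labelling of terms-and-conditions sentences as
# obligation / permission / risk using a pre-trained NLI model.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

sentences = [
    "You may close your account at any time.",
    "We may share your data with third-party partners.",
    "You must notify us of unauthorized use within 30 days.",
]

for sentence in sentences:
    result = classifier(sentence, candidate_labels=["obligation", "permission", "risk"])
    # result["labels"] is sorted by score, so the first entry is the best guess
    print(f"{sentence!r:65} -> {result['labels'][0]} ({result['scores'][0]:.2f})")
```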
This project basically came from my work at Fidelity Investments. We were working on a bunch of 401(k) accounts, and at that time there were a lot of rules that were supposed to be applied to all of these accounts later. What happens is you have legal documents that contain these rules, and then you have the rules implemented in a language that computers can understand, but there is no mapping between the two. Because I was working with natural language at that time, I thought we could maybe create our own rule processor of sorts. The second story is from PyCon US 2024, called Only Bots in the Building. I was spending a lot of time at home during a hack week at work, where they just let you do whatever you want, and I had a couple of existential questions that I wanted answers to. For example: will forensic portrait artists keep their jobs? Can AI write my travel blog better than me? Can I build a Clueless-style fashion matcher? How do I generate clean, non-duplicate synthetic data? Basically, I was waiting at home for my H-1B visa and I was not in contact with my colleagues; they were about 12 hours away from me, so I could not collaborate with them that week. I went through an existential crisis, got into a little bit of stoicism, looked up a lot of philosophy, and that's why I had all of these philosophical questions. So, who gets to keep their jobs in this post-AI world? Will forensic portrait makers keep their jobs? What happened was we were seeing a lot of large language models being released into the world, and everyone had this existential crisis of whether they were going to keep their jobs or not. And because I'm also an artist on the side, though I don't draw a lot of portraits, I wondered what would be a good job to get into if they did take away my software engineering job. I asked myself: what is AI not good at? And the answer was, oh, maybe images, but they have to be very specific images. And when do you need very specific images? The answer that came to me was forensic portrait makers. So I went onto Kaggle and found this dataset of character actors. What happens with character actors is that they are very descriptive, right? You're always introduced to a character as, oh, he was a particular-looking man; he had a certain kind of nose, very bushy eyebrows, maybe a spot above his lip, et cetera. So the first one is a picture of the actor, and all of the rest of the pictures are created by AI. It's a man with curly hair, dark eyes, a small nose, and thin lips, wearing glasses, plus a couple of other descriptors, and this is what the AI created. I do think this is a particularly hard example, simply because if you look at the structure of his upper lip in the first one, it's a certain structure that's difficult to explain unless you gesture with your fingers or something. So the AI didn't get it right till, I want to say, the third picture. Here are a couple of other examples. This wasn't a very good example either; it didn't get it right, and it also added a lot of facial hair. This is another example with a blond character with dark eyes, a side smile, and a fringe: his eyes are dark and small, his lips are thin, his upper teeth are visible, he is smiling, and he seems excited. None of them accurately match the original actor.
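For context, here is a hedged sketch of the "forensic portrait" experiment: generating a face from a character-actor style description. The talk does not name the generator it used; this example assumes Stable Diffusion via the diffusers library, and the description text is illustrative.

```python
# A hedged sketch of generating a portrait from a textual description.
# Assumes Stable Diffusion via the diffusers library, not the tool used in the talk.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

description = (
    "portrait photo of a man with curly hair, dark eyes, a small nose "
    "and thin lips, wearing glasses"
)
image = pipe(description, num_inference_steps=30).images[0]
image.save("generated_portrait.png")
```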
So basically my conclusion was, oh, maybe forensic portrait artists will keep their jobs. And then I put in this example. I really like this example, because it is a perfect example of, how to say, a really nicely described character actor, and because of that specific sort of description, the AI almost gets it right. You should see these examples, especially the first generated image; it's very similar to the face of the actor itself. These are from, I want to say, Bing Creator, and these are from DreamWorks, and both of them did really well. I want to say even the bottom left over here looks like the character actor. What did I do then? I thought about what I would do if they took away my job. I have a couple of dream projects that I would spend a lot of time on, and one of them was to write my own travel blog, but I had a lot of logistical issues; for instance, I took 300 pictures on a day out in Chennai. So what I wanted to do was make AI do it. Basically, I tried to use a lot of AI models to write my travelogue for me. I had a lot of photos from my drive that I plugged in, used a captioning model, and I would say it did pretty well, actually. Some of the captions are like what a child would say about an image. For instance, my prompt was "On her day out in Chennai, Aroma saw...", and it's all very simplistic. It's, oh, I saw through a window. It doesn't say a lot about the image or anything descriptive, but it was factual; it was a caption. The second image: on her day out in Chennai, Aroma saw this beautiful flower. On her day out in Chennai, she spotted a car driving down the street. Now the issue with that is that if you look at images on Instagram, you'll see an image posted like the one with the car and all of those colorful buildings, and the caption wouldn't be, oh, she spotted a car driving down the street. It would be something like, oh, there were a lot of colorful buildings lining the street; that is what an artist would say, or someone posting on Instagram would say. But the AI basically decided to concentrate on the car, simply because of how it had been trained. "Aroma saw the blooms of the frangipani trees...": it basically gave me the descriptors, but it didn't do a really good job of adding poetry to the captions. I was like, there's some work to do, but it'll probably write my travelogue for me eventually.
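A minimal sketch of that travelogue captioning step is below. The talk does not name the captioning model; this assumes the BLIP checkpoint via the transformers image-to-text pipeline, with the "On her day out..." framing added afterwards, and the photo paths are hypothetical.

```python
# A minimal sketch of the travelogue captioning step, assuming BLIP via transformers.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

photos = ["photos/window.jpg", "photos/flower.jpg", "photos/street.jpg"]  # hypothetical paths
for path in photos:
    caption = captioner(path)[0]["generated_text"]
    print(f"On her day out in Chennai, Aroma saw {caption}.")
```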
And then I basically gave up and watched a movie, and the movie I watched was Clueless. There's this scene with a program that lets you match dresses and fits the dress onto your body so that you can see what you would look like in that outfit. That sounds absurd, but can AI do it now? So I went to Kaggle again, looked up a dataset that had a lot of images of tops, bottoms, dresses, et cetera, and plugged it into this Colab notebook of sorts. What I tried to do was superimpose them onto the photograph of a model wearing a completely different dress, by finding the boundaries of the dress they were wearing, like the top or the bottom, and repainting it using AI. Let's see what the AI did. This is what the original images sort of looked like, and if you look at the repainted images, I would say the third and the fourth ones look slightly normal. Everything else looks off, but the third and fourth ones look slightly normal. There's a long way to go, but we are basically on the right path. The third story is from PyCon South Africa in 2019, which was a natural language processing fake news detector. The goal was to trace who was being blamed for misinformation, not just flag what's fake. The dataset was real WhatsApp forwards: informal, noisy, multilingual. My method combined spaCy- and NLTK-based event models with POS tagging and dependency parsing, with minimal training data required, basically, because these were parsers, not trained classifiers. The improvement today is that LLMs offer richer context, a better understanding of blame dynamics, and scalable solutions. So, the impact of fake news: BuzzFeed counted around 22 million interactions on the top 50 fake stories; this is from 2018. The Knight Foundation found 10 million plus tweets from 700K plus accounts linked to fake and conspiracy news. Reuters: in India, 52% get news from WhatsApp, and rumor-fuelled violence has led to deaths. This was around the time when WhatsApp forwards would basically get you lynched back home in India. So, techniques for detection: keyword extraction and verification. I used NLTK-based keyphrase extraction; I used a news API (Google had a news API back then) to cross-check real coverage of keyword-based claims; and I used reverse image search for detecting photoshopped images. There was a lot of content verification, so I compared articles from spoof sites versus mainstream media, and I used fact-checking platforms such as Alt News or Hoax Slayer. For textual cues, we used grammar and spelling mistakes, overtly positive or negative sentiment, missing or suspicious sources, and repetition of certain words. Some other techniques that were used, and this is way before we had really good sentence classifiers, before BERT basically: all of these are plain parsers, using NLTK and the Stanford parser. Syntactic patterns: POS-tagging statements to detect blame assignments, parsing causality and active/passive voice patterns using NLTK. Entity and emotion tracking: tracking named politicians and detecting associated emotions such as fear, hatred, or sympathy. My basic goal here was to actually detect propaganda. The reason I wanted to detect propaganda was that sometimes the news stories are like, oh, X happened, and sometimes it's, X happened because of Y; so they're basically trying to blame Y for whatever happened. In the literature there is a thing called the Path Model of Blame, which codes the what and the why. In linguistics, they basically have a model to identify whether the structural features of a particular sentence would actually lead someone to assign blame. So there were certain cue words, basically agent and cause claims, and there was some sort of thresholding, like the percentage of sentences that showed propaganda structure in the text. This is an example of how the parser worked: this is a sentence, and we are just putting in a pattern. Now, these patterns at that time, because this is a parser, you would have to write yourself, based on your understanding of language and whatever is present in that text. And after parsing, you would find out whether it followed the Path Model of Blame or not.
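Here is a rough sketch of the kind of parser-based cues described above: passive-voice detection and a simple "X happened because of Y" blame pattern using spaCy dependency parses. These are illustrative patterns, not the exact rules used in the 2019 project, and the example sentences are made up.

```python
# Illustrative parser-based cues: passive voice and "because of / due to" blame targets.
import spacy

nlp = spacy.load("en_core_web_sm")

def blame_cues(text: str) -> dict:
    doc = nlp(text)
    # passive voice shows up as nsubjpass / auxpass dependencies
    passive = any(tok.dep_ in ("nsubjpass", "auxpass") for tok in doc)
    # crude "blamed entity": the object of "because of" / "due to"
    blamed = [
        tok.text
        for tok in doc
        if tok.dep_ == "pobj"
        and tok.head.text.lower() in ("of", "to")
        and tok.head.head.text.lower() in ("because", "due")
    ]
    entities = [ent.text for ent in doc.ents if ent.label_ in ("PERSON", "ORG", "NORP")]
    return {"passive_voice": passive, "blamed": blamed, "entities": entities}

print(blame_cues("The riots happened because of the opposition party."))
print(blame_cues("She was attacked near the market."))
```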
So these are some of the other stories that I'm going to share with you. The one from PyCon Estonia in 2023 is called If Your Friends Are Bullshitting, Using SNLI. Another was presented at EuroPython in 2022, PyCon Hong Kong in 2021, and PyCon Sweden in 2021, and it was about how we're conditioned to believe the news and whether the news is polarized. Another was presented at PyOhio in 2020: analyzing bias in children's educational materials. The one from PyCon Estonia in 2023, If Your Friends Are Bullshitting, Using SNLI: SNLI is one of my favorite datasets, specifically because of its entailment, contradiction, and neutral labels, basically because I think it models a lot of things that we will need to use in the future. For instance, even when we are giving an instruction to an LLM, you need to make sure it understands whether something is a contradiction, an entailment, or neutral in comparison to the instruction it was given before. So the goal was to use natural language processing to spot contradictions in statements, proving when your friends are being inconsistent. This project actually did come from a lot of bullshit that I heard from my friends; for instance, they were like, oh, we are 15 minutes away, and then they wouldn't show up for the next 30 minutes or so. That's where this project really came from; it was a real-life scenario. The models that I used at that time were BERT and GPT-2; I compared BERT and GPT-2 embeddings for detecting contradiction. My dataset approach was to leverage SNLI for building and training contradiction models. The impact was to improve LLM coherency, tackle hallucinations, and filter fake or contradictory news. So these are some of the slides from that presentation. It's like, why should you hone your bullshit detector? We're surrounded by bullshit all the time; it would be really nice to give someone else the job of detecting bullshit instead of sifting through bullshit all the time. There are a couple of articles about how to detect bullshit even on sites like LinkedIn, so it is actually a real problem. It's not just a personal problem; it would also be, I want to say, a professional problem, or how to measure someone's bullshit. And it seems to be harder for people on the spectrum, so I guess people on the spectrum need a bullshit detector more than anyone else. So there are three classifications for bullshitting using SNLI: the first one is entailment, the second one is contradiction, and the third one is neutrality. These mean exactly what they say. Entailment means two sentences basically say almost the same thing; they support each other. Contradiction is when they contradict each other, and neutrality is when they don't have anything to do with each other. For instance, "Jim rides a bike to school every morning" and "Jim can ride a bike": I would say these are lightly entailed, because they don't contradict each other and they're not entirely neutral with respect to each other; they do actually share some information. This is an example from the paper that was published with SNLI. The premise is: a dog jumping for a Frisbee in the snow.
Example one is "an animal is outside in the cold weather, playing with a plastic toy", which is an entailment: an animal, a dog, is outside in the cold weather, in the snow, playing with a plastic toy, and a plastic toy is a Frisbee, so it's an entailment. Example two is "a cat washes his face and whiskers with his front paw". Now, the animal is a dog, so clearly it's not a cat washing its face and whiskers with its front paw, so it contradicts what is said in the first sentence, which says clearly that it's a dog jumping for a Frisbee in the snow. The third example is "a pet is enjoying a game of fetch with his owner". This is neutral, because it could, I want to say, be an entailment, or it could be a contradiction, or it could be not related to the first sentence at all. If you wanted to use precise language, I would say an actual sentence that would be a direct contradiction of the premise is something like "the dog is eating a bone inside the house", and you would have to make sure it's the same dog we are speaking about as in the first sentence. SNLI is a collection of 570k human-written English sentence pairs, manually labeled for balanced classification with the labels entailment, contradiction, and neutral. We aim for it to serve both as a benchmark for evaluating representational systems for text, especially including those induced by representation-learning methods, as well as a resource for developing NLP models of any kind. These are examples from the dataset: there's text, judgments, and hypothesis, and everything in the judgment column is basically the label. So this was basically my pipeline. What's on the screen is the pipeline that I used to find out whether my friends were bullshitting, using SNLI as the dataset. We use a tokenizer to convert all of our sentences into tokens, we put them into a model, and we get pretrained embeddings out of it, and then we put those into a new model for training. Basically, what we are doing at the second step is trying to use whatever information has already been learned by a certain model, in this case BERT or GPT-2, and we are extracting a form of the sentence that carries all of that information from the context space of BERT or GPT-2, which is why what we get out is a pretrained embedding. There's a sentence going in, and the end product is a pretrained embedding, which carries the semantic, meaningful understanding of that particular sentence in the context of everything that has been learned by a model such as BERT or GPT-2. Then, in this new model, we're basically putting in all of our old training as well, so it gets to use its old training and gets to do some sort of new magic on top of it. And the last step was the inference box, basically.
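Here is a minimal sketch of the pipeline on that slide: tokenize premise/hypothesis pairs, pull pretrained BERT embeddings, and train a small new classifier on the labels. This is an illustrative reconstruction, assuming Hugging Face transformers and scikit-learn, with made-up example pairs; it is not the original notebook.

```python
# Sketch: tokenizer -> pretrained BERT embeddings -> small new classifier -> inference.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(premise: str, hypothesis: str) -> list[float]:
    # Encode the pair together; use the [CLS] vector as the pair embedding.
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = encoder(**inputs)
    return outputs.last_hidden_state[0, 0].tolist()

pairs = [
    ("We are 15 minutes away.", "We will be there soon.", "entailment"),
    ("We are 15 minutes away.", "We will not arrive for another hour.", "contradiction"),
    ("We are 15 minutes away.", "The food here is great.", "neutral"),
]
X = [embed(p, h) for p, h, _ in pairs]
y = [label for _, _, label in pairs]

clf = LogisticRegression(max_iter=1000).fit(X, y)  # the "new model" training step
print(clf.predict([embed("We are almost there.", "We are far away.")]))  # inference step
```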
Some details about the BERT model. BERT was proposed in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. It's a bidirectional transformer pretrained using a combination of a masked language modeling objective and next sentence prediction on a large corpus comprising the Toronto Book Corpus and Wikipedia. The abstract from the paper is the following: we introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful; it obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5%. This is all really old; we don't really need BERT today, but this is an old project, only two or three years old, and that shows how massive the jump has been from BERT to what we do today using GPT. So this is from the open source documentation associated with BERT. The OpenAI GPT-2 model was proposed in "Language Models are Unsupervised Multitask Learners" by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever from OpenAI. It's a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. So this is a page from its open source documentation; these are the two language models that I used. And this is a diagram to help you understand what happens when we get pretrained embeddings from a certain model. This is the semantic space, the meaningful space. What it basically means is that a certain word sits at a certain distance from the meaningful x-axis. We don't know what the meaningful x-axis means, but the model knows, and likewise for the meaningful y-axis and the meaningful z-axis. And these are the distances between two words, the semantic distances, which say that the meaning of this particular word differs from the meaning of that particular word by so-and-so much. This is basically what I did: I trained on both BERT and GPT-2 embeddings. Here is another example; it's two headlines from two different editions of a newspaper. The first one says, "Bernie Sanders Scored Victories for Years via Legislative Side Doors". The second one says, "Via Legislative Side Doors, Bernie Sanders Won Modest Victories". The first one makes it seem like the victories were really grand, and the second one makes it seem like the victories were very modest. So someone reading the first edition would think it was a great thing, it would read as praise; the second edition would make it seem like it was not such a great thing. So these slightly contradict each other, right? These are some of the labels; I do not remember exactly, but I think label two is contradiction, and you can see how the scores are higher for the first and the last sentence and slightly lower for the second. Maybe it just feels like "modest victories" and "victories" are not that different; if we had said that Bernie Sanders lost, it would say that it's contradictory on a large scale, but it does say that it is contradictory, though slightly.
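As a quick way to reproduce that "slightly contradictory" reading today, here is a hedged sketch that scores the two headlines with an off-the-shelf NLI model (roberta-large-mnli) rather than the BERT/GPT-2 embeddings trained in the talk; the model choice is my assumption.

```python
# Score premise/hypothesis headlines with a pre-trained NLI model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

premise = "Bernie Sanders Scored Victories for Years via Legislative Side Doors"
hypothesis = "Via Legislative Side Doors, Bernie Sanders Won Modest Victories"

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

# Read the label names from the model config rather than hard-coding them.
for idx, label in model.config.id2label.items():
    print(f"{label}: {probs[idx]:.3f}")
```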
So this is another project, which I presented at EuroPython 2022, PyCon Hong Kong 2021, and PyCon Sweden 2021. It was to analyze sentiment in news headlines to examine bias, comparing it to public perception from surveys. The initial approaches basically failed to capture the complexity of the sentiments and the bias in headlines; the revised approach leveraged more advanced NLP techniques to match headline sentiment with real-world reader biases. The outcome was that the failure led to a deeper understanding of contextual sentiment analysis and bias measurement, which is essential for interpreting media influence. My next project was at PyOhio 2020: analyzing bias in children's educational materials. This was specifically gender bias, such as portraying female characters predominantly as mothers and housewives while male characters were shown as breadwinners. I used NLP techniques; this was way before BERT, or maybe BERT had just come out, so I was using a lot of parser structures and rudimentary NLP techniques, mostly NLTK. We can analyze representation bias by tracking the frequency of gendered terms, identify stereotypes through adjective associations, and detect victim-blaming language in text. For instance, in some countries, the proportion of female characters in mathematics textbooks was found to be as low as 30%. To combat these biases, it is crucial to revise curricula to promote gender equality, update textbooks to be more inclusive, and use NLP tools to continually monitor and correct those biases in educational content. Some of these techniques are really easy: it's basically counting a lot of pronouns, making sure your entities are mapped well to their genders, and counting how many people of a particular gender are present. And with rudimentary techniques you were able to POS-tag a lot of adjectives, so you could associate adjectives with the entities in a particular sentence and deduce the sentiment of that adjective. You would not need, I want to say, high-tech NLP techniques; you would only need maybe a dictionary and a score of sorts. My research focus was examining how gender bias in children's educational materials shapes perceptions and roles; for example, female characters predominantly portrayed as housewives, while male characters are shown as breadwinners. My key findings: there was representation bias, so female characters were underrepresented in fields like mathematics; a lot of the people doing the math used to be men, so it would be "Tom bought 12 watermelons" and never "Daisy bought 12 watermelons". There was also stereotype bias: for instance, a doctor was usually male and a nurse was usually female. There was also a sort of cultural blame, a lot of victim-blaming language, often using passive rather than active language, like "she was raped" instead of "someone raped her". So we did a lot of analysis: techniques for bias detection, representation analysis using token frequency in text, stereotype evaluation using adjective and role association with spaCy and NLTK, and blame language detection via the Path Model of Blame, identifying causal and labeling bias. The change that came out of the study was to revise the curriculum to reflect gender equality and to remove these stereotypes.
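To make the counting idea concrete, here is an illustrative sketch that tallies gendered pronouns and collects the adjectives attached to gendered subjects with spaCy. It is not the original PyOhio code, just the same idea in a few lines, with a made-up example sentence.

```python
# Illustrative representation-bias counting: gendered pronoun frequency and
# adjectives attached to gendered subjects (e.g. "she was gentle").
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

FEMALE = {"she", "her", "hers"}
MALE = {"he", "him", "his"}

def gender_stats(text: str):
    doc = nlp(text)
    counts = Counter()
    adjectives = {"female": [], "male": []}
    for tok in doc:
        low = tok.text.lower()
        if low in FEMALE:
            counts["female"] += 1
        elif low in MALE:
            counts["male"] += 1
        # adjectives predicated of a gendered subject via a copula ("she was gentle")
        if tok.pos_ == "ADJ" and tok.head.lemma_ == "be":
            subjects = [c for c in tok.head.children if c.dep_ == "nsubj"]
            if subjects and subjects[0].text.lower() in FEMALE:
                adjectives["female"].append(tok.text)
            elif subjects and subjects[0].text.lower() in MALE:
                adjectives["male"].append(tok.text)
    return counts, adjectives

print(gender_stats("Tom bought 12 watermelons. Daisy stayed home; she was gentle."))
```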
Some of the other stories that I'm going to share with you: one is from PyCon Lithuania, again in 2023, about chatting with ChatGPT about everything and nothing at all. It was when ChatGPT first came out, and most of the talk is based on my free-time explorations with ChatGPT. It started out as a fun experiment; I used to even post on social media about, oh, this is what I asked ChatGPT and this is what it said back to me, a lot of jailbreaking and all of that, and I decided to make a whole study out of it. My next story is about AI and software development: it's about experimenting with SVMs and why SVMs are really important. And my ninth story is about shoes for the visually impaired; this was a project presented at FOSSASIA in 2016. So, PyCon Lithuania 2023, chatting with ChatGPT about everything and nothing at all. It was an overview of ChatGPT's rise and the question of whether it was true AGI or not. There was an explanation of the training process, including data usage and model architecture. I subjected it to a wit test, because at that point it was trained on a lot of data from Reddit, so it was not very good with wit, and I decided to test what it understands through various riddles. I also used riddles from specific languages like Hindi, and other culturally nuanced riddles, to see how it would perform on complex context and nuanced language. At that point it had issues with verbal math problems and with the concept of Venn diagrams; if you gave a kid a problem statement to solve using a Venn diagram, they would be able to solve it easily if they were, I want to say, in grade eight, but ChatGPT at that time could not. There were also sensory experience gaps, highlighting the model's inability to process real-world sensory experiences such as hearing an egg crack: it didn't know what happened after you heard an egg crack, because it didn't have the sensors to sense any of those things. Also metaphorical language and relationships: at that time, in 2023, ChatGPT was not really good at metaphorical language. And comparing it to human performance, basically, to answer the question of whether ChatGPT is AGI. My answer to that today, actually, is that in certain ways ChatGPT is better than humans, so it is, I want to say, AGI-plus, and at some things it's not, so in those areas it's not AGI. One of the questions I asked ChatGPT was: when is a door no longer a door? The answer it gave me was very logical, but this is a riddle, right? It said: a door is no longer a door when it no longer serves the purpose of providing an entrance or exit to a room or building, or when it is unable to be opened or closed. Logical answer. But then I told it that we were solving riddles, so it had to be in a fun mood of sorts. I said, this is a riddle, answer it: when is a door no longer a door? And then it grabbed the answer from somewhere in its training data and said: a door is no longer a door when it's ajar, which is basically what the answer was. So it was way wittier. The second question was: what tastes better than it smells? And because it had the previous context that we were in riddle mode, it knew that something that sounded that absurd was a riddle, and it told me back: this is a riddle, the answer is the tongue, because it tastes food but doesn't have a smell. What building has the most stories? It detected that this was a riddle.
It said the answer is a library, because it has many stories. It was right. It got a bunch of them right after I told it that we were in riddle mode, so maybe it had to access some sort of training memory associated with riddles and witty language, and I guess it pulled out some sort of answer. Then: what has a bottom at the top? I don't know exactly what it said, but the real answer was supposed to be legs, and it did give an explanation that could, in some contexts, be the right explanation. So it did really well on creative tasks. I asked it a bunch of questions around sensory experiences as well, and it didn't have really good answers for those at that time. My next presentation is about experimenting with SVMs; this was presented at the AI and Software Development Summit in 2024. The reason I like SVMs, and data scientists in general like SVMs: there's a technique called Support Vector Clustering, a clustering technique using SVM principles for unsupervised learning. It maps data to a high-dimensional Hilbert space via a kernel trick, finds a minimum enclosing sphere, and then maps the boundaries back to the original space. It uses a Gaussian kernel, which is a non-linear transformation. It has two key hyperparameters: q, the kernel width, which controls cluster granularity, and p, the soft margin, which controls outliers. And it does not assume the shape or the number of clusters; it adapts naturally. Going over it: q is basically the size of the circle inside which you would want your points to be placed to say that, oh, this is the space that is more similar than all the others. And p is like the soft margin between two peaks: if there are two peaks, there could be points that lie in both of those circles of sorts, so you want to make sure that those in-between points land in either circle one or circle two, and that is the factor we call p. These are the hyperparameters that we tune to fit the data. So SVMs basically handle non-convex, overlapping, noisy data; they have better accuracy than traditional clustering on complex datasets. For example, Iris is, I would say, a famous dataset that data scientists like to talk about. It's an elegant blend of theory and practical clustering power. The reason I really like SVMs, and this is an example that I give everyone, is this: say you have a line, which is only an x-axis, and on the screen there are, I want to say, 14 dots. You can see clearly that there are seven green balls and then seven red balls, and you're able to draw a line in between and say that these are two different clusters of data. But what happens if the data is mixed, like on the third line? It's a mix of red and green balls, so you can't really draw a point on the x-axis and say that this is the separation between the two clusters. Now what happens, and this is basically the principle of SVMs, is we introduce a different dimension. We take all of our data into this different dimension, basically a y-axis, and we separate it out. For instance, if all of the red balls go to the positive y-axis and all of the green balls come to the negative y-axis, we can just separate out the data by saying that the separator is the x-axis.
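Below is a small sketch of that "add a dimension" idea: 1-D points that no single threshold can split become linearly separable after adding a second, hand-crafted feature, which is what a kernel does implicitly. The numbers are made up for illustration and the lifted feature (|x|) is my choice, not the talk's slide.

```python
# 1-D points that are not separable by a threshold become separable after lifting.
import numpy as np
from sklearn.svm import SVC

# Red balls sit in the middle of the line, green balls at both ends:
# no single cut on the x-axis separates them.
x = np.array([-4, -3.5, -3, 3, 3.5, 4, -1, -0.5, 0, 0.5, 1]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # 0 = green, 1 = red

# Lift to 2-D by adding |x| as a new axis; now a horizontal line separates the classes.
lifted = np.hstack([x, np.abs(x)])
linear_svm = SVC(kernel="linear").fit(lifted, y)
print(linear_svm.score(lifted, y))  # 1.0: perfectly separable after lifting

# Equivalently, let an RBF (Gaussian) kernel do the lifting implicitly on the raw 1-D data.
rbf_svm = SVC(kernel="rbf", gamma=1.0).fit(x, y)
print(rbf_svm.score(x, y))
```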
So this is basically what SVMs do, and that was my presentation at the AI and Software Development Summit in 2024. The next project I want to talk about is the one I presented at FOSSASIA in 2016, which was shoes for the visually impaired. The purpose was to improve daily mobility and confidence for the visually impaired. They were just shoes with some basic ultrasound sensors, and what we wanted them to do was sense an obstruction either in front of the wearer, behind them, or at the sides. It was supposed to improve mobility for the visually impaired, so instead of using an ultrasound stick, they would get feedback from those ultrasound sensors. The control unit was an Arduino, and it had a Bluetooth module which transmitted sensor data to an Android device. The Android app at that time was built using MIT App Inventor and PhoneGap. Our vision was to maybe move to a MATLAB-based prototype, because this was back in 2016, and maybe change the feedback, because at the time it was really clunky and you would get feedback only through voice in your Bluetooth headset or something; a more graceful approach would be to give someone haptic feedback in the clothes or something else. My last story is Create Your Own Data. So when do you need your own data? We are in the mode of magical LLM thinking, available to all of us at all times, right? So what do we do with LLMs? Generative models are useful for creating synthetic data, even if they struggle with factual accuracy and reliability. Use case one is a tautology generator: train a model to generate tautologies, focusing on internal consistency rather than groundedness. Use case two is travel image tagging: generate image captions from personal photos using LLMs to create contextual and creative tags for journaling or curation. Synthetic data creation through LLMs: use LLMs to bootstrap datasets for tasks like tautology generation or image captioning without relying on factual correctness. I used techniques like distillation and fine-tuning to help adapt other LLMs to specialized tasks by reusing generated data and tailoring models for specific use cases. The takeaway was that synthetic data generation can unlock meaningful creator workflows in areas where factual accuracy is not a priority. So, some examples from the experiments that I performed. The original Colab prompt was something like: give 10 examples of tautologies, and do not give a reason why. Usually, what happens when you're talking to something designed to be a chatbot is that it gives you a lot of reasons back as to why the sentence it produced is a tautology. A bunch of these sentences are actually not real tautologies, but some of them are, and the trick I found to verify whether the sentences are actually tautologies is to give them back to another LLM to check. Some of the good examples I really like are the second one: water is wet, that's a tautology; fire is hot, that's a tautology. And: generate 10 examples of metaphors, and do not give reasons why. "Her voice was a CD that played in my mind": I basically gave it the prompt to use the word CD, which is straight-up crazy, right? Can you think of a lot of metaphors using the word CD? "Her voice was a CD." "His anger was a CD that scratched and scuffed the surface of our relationship." "My job was a CD that skipped and glitched, making it hard to enjoy the music." So there, this is synthetic data.
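Here is a hedged sketch of that generate-then-verify loop: ask one LLM for tautologies with no explanations, then ask a second pass to verify each candidate. The OpenAI client, the model name, and the prompt wording are my assumptions, not necessarily what the talk used.

```python
# Generate candidate tautologies with an LLM, then use a second pass as a verifier.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"  # hypothetical model choice

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

raw = ask("Give 10 examples of tautologies, one per line, and do not give a reason why.")
candidates = [line.strip(" -0123456789.") for line in raw.splitlines() if line.strip()]

# Second pass: keep only the sentences the model itself confirms as tautologies.
verified = [
    c for c in candidates
    if ask(f'Is the sentence "{c}" a tautology? Answer yes or no.').lower().startswith("yes")
]
print(verified)
```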
It's really cool data; I just want to say it's really cool data. This was also an example of caption generation using images, which is something I used in my travelogue. Thank you so much. The slides are from SlidesCarnival. Thank you so much, and goodbye.

Aroma Rodrigues

Software Engineer @ Microsoft

Aroma Rodrigues's LinkedIn account


