Conf42 Python 2022 - Online

Data data everywhere, No time to think 🤔


Abstract

Everyone says data is the new oil. But do we actually know how to use it efficiently to make our customers' lives better, or is it just another silo of information?

In this talk, we will see a beautiful approach to planning data-based projects inspired by professionals from Google, Twitter, Microsoft, and more. This talk will cover the following:

1. Planning a data project sprint
2. Establishing purpose and vision
3. What data matters and what's trash?
4. Mining the sentiments of users
5. Diminishing the silos
6. Tools

After attending this talk, you will be able to think more clearly through a data project and really get amazing results.

Summary

  • Aman Sharma: Data, data everywhere, no time to think. Finding insights out of data feels much like "water, water everywhere, and not a drop to drink." This is going to be an interesting journey that we are going to take today together.
  • Aman Sharma is cofounder and CTO, chief technology officer, at Twimbit. He is also a member at DeepLearning.AI, which helps bring technical knowledge and deep learning understanding to students. He advises startups and different organizations about their technology approach.
  • Being inside data is like sailing a boat in a deep ocean. The key objective here is to identify the key insights which can help people make decisions, and to help them navigate the tough course of finding those insights.
  • We'll see how to use visualizations to better explain what you mean by the data. We'll also see the difference between code and no-code approaches. Finally, how better documentation helps in overcoming the challenge of non-transparency in data.
  • The first problem is the problem of goal clarity. Even once the goal is clear, teams struggle with how to plan a project. This is the main reason for chaotic situations and often leads to abandonment. The solution is a better project structure.
  • Then you have the dirty data problem. As the vastness of data expands, bad data sits alongside the good data. It's very important to always clean it up. The solutions are advanced data preprocessing tools.
  • There is little transparency between the technical team working on these challenges and the nontechnical team who are there to reap the benefits of the data. The solution, again: no code is a better method to bring nontechnical people on board to any data science project.
  • The problem of silos: inside the organization, invisible walls are built between the data science team and the non-data-science team. The solution is a combination of all these strategies.
  • There are four steps to this sprint approach: clear goal, plan well, execute, and test and improvise. The whole target of a sprint is to achieve a limited scope in a limited timeline. Finally, make a map of how you are going to arrive at a solution.
  • I think diagrams are really underrated when it comes to different teams. If you diagram something, it's visually appealing and it helps people make decisions faster. It also helps set realistic expectations and timelines. Here's a small demo of how you can create diagrams.
  • Code and no-code tools are both very popular. No-code tools have a low learning curve; the drag-and-drop functions are very easy to learn. They increase productivity because you don't have to build everything yourself. The code tools are divided into three main segments.
  • Next step: visualization is also something everybody in the team needs to be aware of, knowing how to use different charts and methods. The final step is dashboarding, basically presenting your findings in a very organized manner and making them available to everybody.
  • Aman Sharma closes by showing how to use visualization to find insights out of data. This was not the code 101 or DIY you might be expecting, but more about how to bring that exposure. If you really liked the presentation, please share your feedback.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, this is Aman Sharma and today we are going to talk about a very interesting topic that has been very close to my heart, which is data, data everywhere, no time to think. Well, this is like the proverb we usually have: water, water everywhere and not a drop to drink. I think the perception of finding insights out of data is kind of the same. So this is going to be an interesting journey that we are going to take today together, and we are going to discuss a lot of things, but in a very summarized, very fun manner. So let's directly dive into it. Firstly, let me introduce myself. My name is Aman Sharma. I am cofounder and CTO, chief technology officer, at Twimbit. Twimbit is a platform where the world can create and discover research insights, and I lead the technology team overall to create the platform and the SaaS products. Also, I am a member at DeepLearning.AI, which helps bring technical knowledge and deep learning understanding to students. I am also a mentor and entrepreneur, and generally someone who can span across different knowledge and technology themes. I advise startups and different organizations about their technology approach and how to adopt new technology methods into their tech stacks. So, well, enough about me; you can find more about me on my handle, that is amantech, and my website is also actually amanintech, and you can find all the details about me and my work over there. So let's start with our first question: what basically is the key difference between data and insights? Well, the first time that I was seeing this popularity of data science, data science and what the hell it is, it was very confusing for me. Like, there is data and then you can directly see it. But actually, being inside data is like sailing a boat in a deep ocean, right? You are always surrounded by these different streams of options, different streams of data, and you are not able to identify what you are actually looking for unless there is a lighthouse which gives you a direction and shines on all the pearls and all the different treasures that are hidden deep beneath the deep ocean of data. Well, I think the big data, or any data that a company possesses, has a similar challenge. And the key objective over here is to identify the key insights which can help people make decisions, and make better decisions, and also help them navigate through the tough course of finding key insights. Well, that is going to be the agenda of the talk, and it has been divided in such a manner that we'll cover the brief problem that we have on hand, what the different scopes of these problems usually cover, what possible solutions already exist, and what else we could do about it. Then we'll also cover a new approach, that is data science sprints, that I myself have seen over the course of time, like how different organizations are adopting them. Also we'll see the difference between approaches in code and no-code tools; today there are many no-code tools available which help in data science. And then we'll see how to do visualizations for the betterment of explaining what you mean by the data. Then we'll see an approach that is called dashboarding, that is bringing everybody on board onto the same idea so everybody is aware about what data we are talking about. And then finally, documenting the steps that go along, and how better documentation helps in overcoming the challenge of non-transparency in data.
So let's begin with our problems. The first problem is the problem of goal clarity. Well, simply explained, goal clarity is when teams that are working on a similar objective don't have the main idea of what they are trying to achieve. As I've written in the definition as well, it is important to keep everybody aligned to ultimately achieve and improve service. Now, the key symptoms that you might see when you are facing this kind of challenge: your teams are often losing track. They are always asking this common question: why are we here again? Like, what was the main theme of what we were discussing? Everyone has a different perception. Somebody sees it as one challenge, somebody sees it as another challenge. And of course, when people see these goals differently, the outcomes that arrive out of them will also be different, ultimately leading to a poor ROI, which is the main reason why teams then get demotivated and don't go down the data science path as often. Now, the solution, on a very generalized level, is to first of all identify the main goal and then communicate that goal easily across the whole team, which can help everybody get onto the same page. Now, the second problem that I see is poor planning. Once your goal is clear, teams always struggle with how to plan a project which can help them achieve that particular functionality. And this is the main reason for chaotic situations, and it often also leads to abandonment. I have seen projects, even in our organization and other organizations as well, where often the deadline is too long and no decision is arrived at at the right time, and the team tends to just abandon the project and move ahead. And this also leads to a lot of wastage of resources and time, right? So the symptoms that you would commonly see in organizations for this kind of problem: they are missing deadlines all the time, the results that are yielded are not proper, and they are always questioning the resources, that the resources are not right, the technical skills are not right, and that repeating question keeps coming up. Well, the solution for this is again a three-step approach: a better project structure, understanding how to divide the project into proper timelines, and making sure that the deliverables are well defined, that they are very lean and the scope is not too broad. And then, once you come up with that, always stick to the timelines and limit the scope to them. We'll talk about this approach in sprints, of how you can make better timelines and how you can make team structures better. Then you have the dirty data problem. Well, this is something that I have seen with almost all projects: the vastness of data expands and there is bad data mixed in with the good data at the same time. Right? So you spend too much time in data preprocessing. That is one of the key things that you would see in teams, that they are always struggling, always trying to clean the data. Sometimes teams also have to do it manually, right? And also, the ratio between the whole data set that you have versus the amount of insight you gain will always be low, because your data is already polluted with so much dirty data that you are not able to get the actual insights from the main data source. So it's very important to always clean this up. And for this, the solutions are advanced tools, which are data preprocessing tools. We'll talk about them as well.
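To make that preprocessing step a bit more concrete, here is a minimal sketch of what cleaning a raw analytics export can look like in Python with pandas. The file name and column names are invented purely for illustration, and a real pipeline would adapt these rules to its own data sources.

```python
import pandas as pd

# Hypothetical raw export from a website analytics source
raw = pd.read_csv("analytics_export.csv")

# Drop exact duplicate events that ingestion sometimes produces
clean = raw.drop_duplicates()

# Remove rows missing the fields the analysis actually needs
clean = clean.dropna(subset=["user_id", "timestamp", "gender"])

# Normalise types and values so downstream queries and charts behave predictably
clean["timestamp"] = pd.to_datetime(clean["timestamp"], errors="coerce")
clean["gender"] = clean["gender"].str.strip().str.lower()

# Keep only plausible values; anything else is treated as dirty data
clean = clean[clean["gender"].isin(["male", "female", "other"])]

clean.to_csv("analytics_clean.csv", index=False)
```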
Alongside those preprocessing tools, there is also source ingestion: how you are capturing the data. If it is analytics data from a website, you have to rethink how you are calibrating it, how you are capturing the insights from that website or app. And if it's a surveying tool, how you are capturing this data. So all those source ingestion tools need to be improved. Doing these three things in parallel can help in overcoming the dirty data problem. Then we have, on the other side, challenges that are not project related but more technical. Now, how I define the technical challenges is like this: data is important for any company, whether it's a startup or a big organization. But the challenge for a small or mid-sized organization or team is that they are limited in the tools that they can use, limited in the resources that they have, and limited in the talent that they have. And for a big company, the challenge goes beyond that: they have privacy issues, like they cannot take all the decisions freely, they are dependent upon GDPR and other data protection policies, and they are not able to rectify their path through this in time. And the symptoms that you would see: people are complaining about having too few people, there is a slower turnaround time, the amount of time you are putting in versus the amount of results coming out is not very good, and you are always complaining about system inefficiencies, like how different systems are not working properly. And of course, people are complaining that there is less transparency between the technical team who is working on these challenges and the nontechnical team who is actually there to reap the benefits of the data. Ultimately, a silo builds between these two teams. Now, the solution to this, which has been a tried and tested method, at least for me, over the course of time, is this new wave of no-code tools that can be adopted by any organization, whether they have good technical bandwidth or not. So, no-code tools, and also documenting your steps along the way. It's helpful for scaling teams, but it's also helpful for people to understand how the conclusions about a certain data set were arrived at. So the documentation step is really underrated in the industry, but it needs to be highlighted over here: every step of the process needs to be documented and read by everybody. Ultimately, this brings transparency to the teams, right? And teams are more flexible about discussing different priorities and options, ultimately leading to fewer technical challenges in the team. Then you have the problem of complexity. Right. Data science is often limited to only the technical people; that was the notion before, right? And insufficiency in the representation of data also leads to poor decision making. So, for example, the person you put in charge of finding insights in the data was not very good at visualization. From his side, he has presented the insights in the right manner, but the person who is there to make decisions out of this data is not able to understand it properly. Right? And this clearly leads to judgment issues, right? You are not able to understand what this data is actually trying to tell you. So you often complain that the data is unreadable. Again, you will see poor decision making come out of it.
And then the stakeholder is always thinking, data science is too complicated, let's just skip it altogether. Right? The solution to this is, again, no code: it is a better method to bring nontechnical people on board to any data science project. And the turnaround time from technical to nontechnical people can be reduced by just using no-code tools and the better visualization techniques that we are going to emphasize and talk about in this presentation as well. And finally, a proper feedback mechanism: every time the project ends, people come together and discuss what were the good things that we did in this project and what were the bad things we did in this project. Right. Last problem, but not the least, which is kind of the culmination of all these different problems that we saw, is the problem of silos. Inside the organization, there are walls, invisible walls, that are built between the data science team and the non data science team. And often these walls create problems of poor interdepartmental communication, right? And ultimately, when there is less communication, people are not talking about the data that often, or they are not transparent about the approach. Of course, it leads to lower growth of the organization, because the person who is there to make decisions doesn't know data science, and the person who is there to do data science doesn't have the capability to take decisions. So the wall is built, and now nobody is able to reap the benefits, and the overall performance goes down. Right. Again, the solution to this is a combined solution of all these strategies. First one is dashboarding: how to have dashboards in the internal team, so the data is available 24/7 for anybody at any time. Then automation systems: how we can reduce the dependency on the technical team to always be there to present data. Feedbacks, as we discussed already: a feedback mechanism that can properly help people navigate through these steps. And then finally, a documentation method, so everybody knows how the process is going. Well, those are overall the different themes of problems. Let's dive into the solutions now. The first solution is not directly mentioned anywhere, but I kind of derived it from this book by Jake Knapp called Sprint. Now, a sprint is a method that people often use in technical teams who are into product development as well, but it has not been used that much in small teams or organizations which have data science as their bread and butter. They often tend to go with a more agile kind of methodology for how they want to work. Well, this sprint approach kind of inspired me, and it was a method of doing projects and testing ideas in just five days in different organizations, including Google and different ventures that Google invests in. So I picked some of the techniques from there, combined them with some data science approaches, and tried to come up with this sprint approach that works a lot better than before. So there are four steps to this sprint approach: clear goal, plan well, execute, and test and improvise. All the stages are divided into these four buckets. And then, in each step, everything starts with the introduction of the project. So everybody comes together and discusses: what is the main idea over here? What is the problem that we are trying to solve? So they set a long term goal.
A long term goal could be a long term question, say that they are trying to improve the user consumption on the platform, or they are trying to minimize cost. So that's a long term aspirational goal. Then you set some sprint questions: what are the questions you are trying to answer over here? Right? These could be very direct, like you are trying to understand the male versus female ratio in the data. So you are trying to be exact over here. Then, once you create a question bank of what questions you have, of course these questions should be limited. Don't try to exceed 20 or 30 or more than that, because ultimately that would lead to longer timelines. The whole target of doing a sprint is to achieve a limited scope in a limited timeline, having a fixed timeline and fixed scope to do it. Finally, make a map of how you are going to arrive at a solution. We will see how to create a diagram or a map for this as well. The kind of map you can imagine is: you have data, how the data is flowing, how you will flow it through different systems. This is something you do in the initial steps of your project discussions, just to give a sense so that everybody starts imagining what resources are required to do that. So this will, of course, help you clear the goal. The second step is now starting with planning, right? The first step within this is: talk with experts. If your team already has experts, data science engineers, experts, go talk to them. But don't always take the solution as just what they are saying, because they might be limited in their understanding of the project as well. So listen to them, keep the thoughts, but ultimately you are the decision maker in it. And you can go to other outside help as well: you can talk with other people about how somebody else would have solved that problem, go to different forums, so that could help. Then what you have to do is pick a small target. So out of the questions that we discussed, for the start you have to pick a small target, and then you have to see how you can arrive at multiple conclusions from the same small set of questions. Now, for everybody who is in the team, I am imagining that the team is usually of the size of four to five people. Two of them are pure technical, hands-on people who are writing the code, two of them are into data and visualization, and one of them could be a manager. So ideally it works well with a five person team. Now, what you have to ask everybody in the team is how they think they might go about the solution. What are the different approaches that they think they can adopt? Don't discuss it out loud. Let everybody write on a sticky note and stick it to a board, and then let people vote for these approaches. That will help us identify what approach we can go for. Once an approach is identified, the second thing you have to do is create a flow diagram. Now, this flow diagram is a little bit different from the map that we discussed in the previous step. The flow diagram is more like: now you have started to discuss that this is the data that has to come through; if the data needs preprocessing, you have to add a preprocessing step; if the data needs some more big data solutions or processing on top, that would be captured as well. We'll discuss more about how to do diagramming in the upcoming slides as well.
Now you are clear with what you are actually trying to solve, and you have created a flow diagram as well. So this was the first step, and the second is plan well. The third is the execute step. Now you have a plan in action; now you want to execute it, bring everybody on board, and arrive at a conclusion as soon as possible. So you have to set the deliverables out of it, which is similar to what you set as sprint questions: you have picked a target and now you set deliverables out of it. Fix these deliverables. Don't let anybody add more deliverables; do another sprint, or maybe a future project, to cover that. But for now, fix these deliverables and then set a clear timeline. A timeline could be one to two weeks, which ideally works for a sprint; it could be three weeks as well if you think the scope is a little big. Divide these tasks accordingly across the team; this is normal project management 101. Then meet regularly: decide in advance what the meeting points will be and what the meeting agenda will be, depending upon how your project progresses, and always do a health check of how different members of the team and different aspects of the project are doing. Then you have the last step, that is test and improvise. Once you have the data in place, you want to test whether your hypothesis is working or not. Now, instead of just going into plain dashboarding and trying to display data, first of all have a small MVP or a test report to check whether your hypotheses were right or not. Go back to the main stakeholder and ask if it's right or not. There may be some minor changes that you need; make these changes, bring back the data, and then present this data on a live dashboard. So this is the present findings step. Collect feedback in that group, improvise on this feedback if it can be done in the same sprint, and then finally document these learnings. Now, you can see there is a big blue arrow that goes back to the introduction of the project. So every time you find these learnings, discuss them again when the next sprint is going to start: these were the feedbacks, these were the learnings from the last project, this is how it's going to help. Again, I would highly recommend going through the book by Jake Knapp, that is Sprint; it will really help you understand how you can arrive at quick decisions, how you can make small projects, create this sprint approach, and add it to your organization. Now, we were discussing diagramming a lot, right? To me, diagrams are really underrated when it comes to different teams. I have not seen anybody who is very enthusiastic about, okay, let's create a diagram and let's solve the problem by creating a diagram. But what diagrams basically do is get everybody on point, get everybody clear about the thoughts, and bring everybody to the same page. It also helps set realistic expectations and timelines for how people think things take time, right? If you're looking at just a bunch of code, then it doesn't make sense and it doesn't help people estimate the resources properly. But if you diagram something, it's visually appealing and it helps people make decisions faster. And of course, once you have realistic, achievable goals that you can set from the diagrams, or timelines that you have, it also helps you in estimating the resources. So I'm going to show you a small quick demo of how you can create diagrams.
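For teams that prefer to keep such diagrams in version control alongside their code, they can also be scripted. Here is a hedged sketch using the Python graphviz package (it assumes the Graphviz system binaries are installed, and the node names are invented); the spoken demo that follows walks through the same kind of flow by hand.

```python
from graphviz import Digraph

# Describe the (hypothetical) male-vs-female analytics pipeline as a directed graph
dot = Digraph("data_pipeline", comment="Web analytics gender-ratio pipeline")
dot.node("ingest", "Web analytics (data ingestion)")
dot.node("store", "Database / data lake")
dot.node("prep", "Python preprocessing script")
dot.node("query", "SQL / BigQuery query")
dot.node("viz", "Data Studio visualization")

# Connectors are added after all the elements are placed
dot.edges([("ingest", "store"), ("store", "prep"), ("prep", "query"), ("query", "viz")])

# Renders data_pipeline.png next to the script; requires Graphviz to be installed
dot.render("data_pipeline", format="png", cleanup=True)
```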
Again, this is complete diagramming 101, but I would highly recommend going over short videos on UML diagrams or flowchart diagrams for how to do this. The way I always go about diagrams is that I first place the main components, the key findings or the key components of the whole diagram. So, for example, let's say I am trying to find the male versus female ratio out of web analytics data. Of course, what I would do is put this web analytics first: this is the data ingestion source. Then the next key step, or the next data source, is of course a database or a data lake system that is keeping this data. Now, as I go ahead, I am not putting in any arrows or connectors right now; firstly, the important thing is just to place all the elements over here. Then I would actually need a script that does the data preprocessing, so it would be a Python script or something like that. And then, once the data preprocessing is complete, I would probably run some kind of SQL queries. And let's say, if this was more of a big data situation, I could actually go with BigQuery from Google, and that would help me solve this. So BigQuery is where I write the script, and I can write the query over here. Once this query is done, of course I would have a bunch of data. Then I can use something like, let's say, Data Studio, Google Data Studio, to present this visualization. I think it is better to have visualization tools like that. Now, this is a typical setup. What I have to do when connecting it: the data is constantly updated into the database, so that is a repeating step. Now, every once in a while, I will go and pick up this data and pass it through this data pipeline process. So what I'm going to do is, every 24 hours, let's say, I will pass this data to my Python script which will do the data preprocessing and clean the data. Then I will do some querying on it, which will get me to the visualization, and this will be the ETL that I set up for the whole time. And this will be the non-tech, objective outcome that any user or any decision maker can look at. So this was a very simple example. Sometimes things get complex and you need conditional steps, right? Sometimes the query won't work, and then you have to go back to the main data and do the preprocessing again. So the brainstorming that the teams are doing and the planning that the teams are doing should be done on these diagramming steps, which can help people understand the main goals, the main objectives that they are trying to arrive at. So next, an important section of this presentation was to find approaches. Code and no-code tools, sorry, no-code and low-code tools, are very popular, so I wanted to do a small comparison between these two. Code, of course, we are all familiar with: it has the flexibility, it has the scalability and high function availability. You can pretty much do everything that you can imagine if you know the right way to code it. But of course, the con on this end is that you need to know technically how things are done. Talent acquisition is a problem these days. Again, data gets siloed because tech people are not the decision makers and the decision maker doesn't know the tech.
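As a rough illustration of what the code side of that diagrammed pipeline could look like, here is a small Python sketch using pandas and the google-cloud-bigquery client. The project, dataset, and table names are invented, and the every-24-hours trigger would come from whatever scheduler the team already uses; treat this as a sketch under those assumptions rather than a finished implementation.

```python
import pandas as pd
from google.cloud import bigquery  # assumes google-cloud-bigquery is installed and credentials are configured


def preprocess(path: str) -> pd.DataFrame:
    """Clean the raw analytics dump before loading it anywhere."""
    df = pd.read_csv(path)
    return df.drop_duplicates().dropna(subset=["user_id", "gender"])


def load_and_query(df: pd.DataFrame) -> pd.DataFrame:
    """Load the cleaned data into a (hypothetical) BigQuery table and run the ratio query."""
    client = bigquery.Client()
    table_id = "my-project.analytics.web_events"  # made-up project.dataset.table
    client.load_table_from_dataframe(df, table_id).result()

    query = """
        SELECT gender, COUNT(DISTINCT user_id) AS users
        FROM `my-project.analytics.web_events`
        GROUP BY gender
    """
    return client.query(query).to_dataframe()


if __name__ == "__main__":
    # A scheduler (cron, Airflow, etc.) would run this roughly every 24 hours
    cleaned = preprocess("analytics_export.csv")
    print(load_and_query(cleaned))  # a dashboard tool such as Data Studio reads the same table
```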
And of course, on the code side, model complexity is also a problem: if you are using any third party models to process your data with a machine learning approach, then you don't know how things work and you don't have a clear idea of how you can get things done. So there will always be a time when you get stuck and don't know what to do after that. Now, some of these challenges are overcome by no-code tools. First of all, the fun aspect of this is that it's very fast: you can arrive at a conclusion very quickly because you are not setting up the bare minimum or the base things over here. It has a low learning curve; these drag and drop functions are very easy to learn. It's fun and engaging; usually these tools are very fun to use. I have seen different tools like Google Data Studio or Intersect Labs or Parabola, so it's very fun to use and it's visually appealing. You can easily understand what is going on. It increases productivity because you don't have to do all the different things; it's just simple ingestion of data and presenting it. And also it's kind of open across the team: anybody can come and see how the different functions are working. But it also has its own cons. It's not too flexible; you cannot do everything with it. You can do only what the tools provide to you, right. And also you're limited in the sources that you can choose, which also means there are fewer options of these tools available. And also you are always dependent upon these approaches and these tools going forward. So if, let's say, the company shuts down tomorrow, your project is also shut down forever. And you also have scalability issues: if the data grows big, then there might be issues with whether you can still use the tools or not. So let's go one step deeper and look a little bit at the code tools. What I usually see is that the code tools are divided into three main segments. Data science programming languages, which include Python, R or Scala; these are the main bare minimum ways of doing data science. Then you have querying and analysis tools, which include SQL, MATLAB and BigQuery. And then you have application suites, which are like a bunch of things packed together: Apache Spark, BigML, Hadoop. Again, this is just a general example; there are other tools as well. So anybody who is a beginner can choose one track: they can choose the Python track, add SQL to the stack, and then use Spark as an application suite. If you want to go in a more generic manner, you can just go with a querying and analysis tool, like BigQuery, that I've discussed over here. Then on the other side of tools, we have no-code tools. Again, this is just a generalized way of presenting these things; the tools expand beyond that. So the first set of tools you have are easy-to-create dashboard and reporting tools. Popular ones include Google Data Studio, completely free to use; Tableau, which is limited free; and then Power BI, which is also kind of free to use. These tools don't require any technical knowledge whatsoever; you just come and drag and drop data, and then you are able to do things with it as well. Then you have tools to build and automate data science flows, right? These are more for when you want to do repeat tasks and you don't have time to go and set up the pipeline again, and you don't have the manual time for ingesting the data.
So then you can use tools such as explainti, Intersect Labs and DataRobot. And then there are, again, complete end-to-end data science flow tools, like we were discussing on the code tool side as well. You have Obviously AI and Gyana, which kind of provide all the tools in the same place and help you do everything with the data. Again, these are all no-code tools; you can explore them one by one, and that will help you identify what kind of tools would work best for you. Now you want to make the decision of which tool you should go for, code or no code. This matrix helps you understand what kind of solution would be better for you. So if your functional requirements are low and you need the results fast, then it is the perfect case to go with a no-code solution, right? Again, if your functional requirements are low and the expectations are still slow, you can still go with a no-code solution, which is ideal for individual projects. So that is the main thing: you can see that the functional requirement is the main thing that decides whether you want to go with no-code tools or not. Then, if your functional requirements are high and your expectations are fast, I don't think the no-code solution would actually make sense for you. And if the expectations are slow, you will still go with the traditional ML technique. So it all depends; what I've seen is that a low functional requirement is the key decision maker for the choice of whether you go with no-code tools or not. Next step: I think visualization is also something that everybody in the team needs to be aware of, how to use different charts and methods and different ways to visualize data when it comes to making these decisions. Some libraries that help you make visualizations are D3, Plotly and Matplotlib. This is kind of a flowchart ordered by what kind of data you have, which will help you make the decision. So a common path is: if you have more than one variable, you will go with this path. Then, are these variables similar? Yes, you will go with this part. Is there a hierarchy involved in it? If you say no, okay, are they ordered? No. Then go with this. So you can take a screenshot of this slide, and that will help you make a proper judgment of how to present data going forward. This is a more detailed diagram which helps you see what the different charts are and when they are used in certain situations. So again, you can take a screenshot of this as well and share it with your internal teams; that could help you devise the proper method of which chart to use in which situation. Again, the link to this talk will also be available, and you can use this as a kind of playbook for your future projects as well. Now you are done with the whole planning thing, you are done with the diagramming thing, and you are done with deciding the tool that you have to use. The final step in doing all these things is dashboarding: basically presenting your findings in a very organized manner and making them available for everybody. And dashboarding is divided into four main steps. The first is collecting these visualizations: you have all these one, two, three, four questions, and every question has its own visualization, different kinds of charts. You collect them all together and try to have them in one place. Then you organize them based on priority.
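Each of those per-question visualizations can be only a few lines of code. As a small example with Plotly Express (one of the libraries mentioned above), here is how the male-versus-female result might be turned into a shareable chart; the numbers are made up purely for illustration.

```python
import pandas as pd
import plotly.express as px

# Toy result of the gender-ratio query; in practice this would come from the pipeline
ratios = pd.DataFrame({
    "gender": ["male", "female", "other"],
    "users": [5200, 4700, 310],
})

# A bar chart is a reasonable default for one categorical and one numeric variable
fig = px.bar(ratios, x="gender", y="users", title="Users by gender")

# Writing to HTML makes the chart easy to share with non-technical teammates
fig.write_html("users_by_gender.html")
```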
When organizing them, the first thing to keep in mind is that the most important question always remains at the top, right? The main question always goes at the top. The second thing: if there is a data connection, like if the first question answers the second, then those graphs need to be organized together. So let's say this was one graph and there was one conclusion out of it, and you need the second graph next to it. This is plain 101 organizing and arranging, and it may not seem worth mentioning, but we always skip these steps, and when it comes to dashboarding we just place things as they come. Then you have to set an automatic schedule. The tools that I have mentioned over here, like Apache Superset or Airflow or Gramx, have an inbuilt capability of automating and pulling the data from the main source on a regular basis. So if that is the kind of need you have, you can set these things up: set up a timer for when you want this data to be refreshed. Again, tools such as Superset, Gramx, Plotly Dash and QuickSight have functions that let you include non-tech people and share access with them, and that helps them come in at any time and do this. So this is the dashboarding process that you need to make the final step of your journey. And then, finally, your dashboarding is done and you need to document stuff, right? So from day one, create a shared document. People use Notion, Excel and a whole bunch of other things; my preference, I think the simplest way to go about it, is just an Excel sheet with a point-by-point description of the different things. Create a shared document, add your progress as you go, then add the findings, the main findings from that project, and then collect feedback into the same document. Make this document available in the next sprint, and that becomes an evolutionary cycle that improves the overall steps and overall planning. Well, that is all we wanted to discuss today. So we started with the problem, with the different problems that exist in the data science space. Then we also looked into some of the solutions, which are approaches. Then we covered the code approach and the no-code approach of doing these things. We also saw how you can use visualizations, different techniques of visualization, to present your data; dashboarding, to collect all the visualizations into one organized fashion; and then finally documenting, as the final step. And we also explored a new method that is becoming popular these days, that is the sprint, and how you can use diagramming to explain what you want to know. Well, that is more or less what I could fit in this time. I hope I was able to deliver something new and open some thoughts about it. Again, it was not the code 101 or DIY that you might have been expecting, but this was more around how to bring that exposure of finding insights out of the data. This was Aman Sharma. If you really liked the presentation, please let me know your feedback on my handle. And again, the link to this presentation will be available. Until then, if you have any questions, please drop them to me, and thanks for your time.
...

Aman Sharma

CoFounder & CTO @ twimbit

Aman Sharma's LinkedIn account Aman Sharma's twitter account


