Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello, everyone.
My name is Mithun Panda, and I specialize in technology, data, and AI, helping
Fortune 500 companies solve complex business and technical challenges.
Today I'll be talking about how to build scalable AI and data solutions
using cloud native architecture.
Now, imagine this: AI systems that scale effortlessly, large data pipelines that adapt in real time, and businesses that continuously innovate faster than ever, without worrying about infrastructure management overhead.
That's the true power of cloud native technology.
So let's dive in and explore how it is transforming the way we build AI.
Now, when you look at the trends in the industry, we are always obsessed with value creation.
So when we think of digital, technology, or AI transformations, we always focus on business value creation.
Cloud adoption has definitely helped organizations accelerate on the business value side.
However, to make this happen, we must get a few things right; the foundation has to be correct.
For example, there are six key elements that I have mentioned here, starting with building a foundation for analytics: a foundational platform that will enable your use cases and help you accelerate them.
The second element is cloud native architecture, which is very important.
We are moving away from on-premise to cloud native, which helps us scale up or down based on demand and achieve fault tolerance, with high availability and self-healing mechanisms.
The third is rapid value creation.
To achieve rapid value creation, we have to understand the cost side of it as well; this is the financial operations (FinOps) side, making sure we can process large data and AI workloads while staying cost efficient.
And then there is scaling resources seamlessly.
That is also very important, because we will be deploying a lot of AI use cases and building a lot of AI models, and it is really necessary that we can scale resources up seamlessly.
Because we are on a data-driven journey, making sure the data is governed properly, accessible, reliable, and available is really important.
These are the key data enablers, and we must get them right.
And then finally, a future-proof digital advantage.
We are moving away from on-premise to cloud, and now from cloud to multi-cloud.
Again, I'm not saying that on-premise will die here, but we will definitely live in a world where our architectures have to support a very diverse kind of infrastructure: multi-cloud, SaaS-based applications, and on-premise applications.
Going forward, it is foundational to understand what cloud native means, which is to design, build, and run applications specifically optimized for cloud environments.
So instead of traditional on-premise infrastructure, cloud native applications leverage the true power of cloud capabilities such as scalability, resilience, automation, flexibility, and so on.
When you look at scalability, what it means is that the cloud has the ability to adjust resources based on demand.
For example, Netflix scales its infrastructure up during peak hours and scales it down during off-peak hours.
It's very intelligent, and this is how they scale their recommendation systems and their streaming systems as well.
Resilience is very simple: if some failure happens, make sure your system is resilient enough that it can quickly recover from failures.
For automation, we really cannot separate automation from DevOps or CI/CD, and part of it is infrastructure management, which is infrastructure as code, which is very important to automate deployments.
And finally flexibility, which is super important here, because we need to provide interoperability across various cloud providers, as we are going on the multi-cloud journey.
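The talk doesn't show infrastructure as code in action, but a minimal sketch of the idea, using Pulumi's Python SDK as one possible tool (the bucket name is a placeholder), could look like this:

```python
# Minimal infrastructure-as-code sketch using Pulumi's Python SDK, one of several
# IaC options; this would be the __main__.py of a Pulumi project.
import pulumi
import pulumi_aws as aws

# Declaring the resource in code lets CI/CD create, update, or destroy it automatically.
data_lake = aws.s3.Bucket("ai-data-lake")  # placeholder bucket name

pulumi.export("bucket_name", data_lake.id)
```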
Now, if I sum it up on the cloud native architecture side, it leverages microservices, containers, serverless computing, DevOps CI/CD, and orchestration tools to help us achieve cost efficiency, innovate faster, and optimize AI performance.
Now, just to take a thousand-foot view of each of these elements I just talked about: what exactly do microservices mean?
Microservices are nothing but applications broken down into smaller, independent, and modular services that communicate via APIs.
This is very straightforward.
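To make that concrete, here is a minimal sketch of a single microservice exposing its capability through an API, assuming FastAPI; the endpoint and the placeholder recommendations are illustrative, not something from the talk:

```python
# Minimal sketch of one microservice exposing its capability via an API.
# Assumes FastAPI and uvicorn are installed; "recommend" is a hypothetical endpoint.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="recommendation-service")

class RecommendRequest(BaseModel):
    user_id: str
    top_k: int = 5

@app.post("/recommend")
def recommend(req: RecommendRequest) -> dict:
    # In a real service this would call a model or a feature store;
    # here we return placeholder items to keep the sketch self-contained.
    items = [f"item-{i}" for i in range(req.top_k)]
    return {"user_id": req.user_id, "recommendations": items}

# Run locally with: uvicorn service:app --port 8080
```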
Then we have containers.
What are containers?
A container is nothing but a lightweight, portable environment for running applications seamlessly across different cloud setups.
Then there is container orchestration, such as Kubernetes, which is very popular, to manage, scale, deploy, and automate containerized applications.
Next is serverless computing.
This is where we do not need to manage the cloud infrastructure: AWS Lambda, Google Cloud Functions in GCP, and Azure Functions in Azure are examples of serverless computing.
It really helps us avoid scratching our heads over managing the cloud infrastructure side of it.
And then DevOps and CI/CD; I already mentioned the CI/CD side of it, which is the DevOps side: continuous integration and continuous deployment pipelines for faster deployment and innovation.
Now, we are building a lot of generative AI models as well, so CI/CD has been extended to CI/CD/CE.
What that means is we are also continuously evaluating the models; that's why it is CI/CD/CE: continuous integration, continuous deployment, and continuous evaluation.
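One way to picture that continuous-evaluation step is a small gate script in the pipeline; this is only a sketch, with an assumed accuracy threshold and hypothetical load_candidate_model / load_eval_data helpers:

```python
# Sketch of a CE gate: evaluate the candidate model and fail the CI job if it regresses.
import sys
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.90  # assumed quality bar; tune per use case

def main() -> int:
    model = load_candidate_model()      # hypothetical: pulls the newly trained model
    X_eval, y_eval = load_eval_data()   # hypothetical: pulls a frozen evaluation set
    accuracy = accuracy_score(y_eval, model.predict(X_eval))
    print(f"candidate accuracy: {accuracy:.3f}")
    # A non-zero exit code makes the CI/CD pipeline stop before deployment.
    return 0 if accuracy >= ACCURACY_THRESHOLD else 1

if __name__ == "__main__":
    sys.exit(main())
```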
Moving on.
Again, this is a deep dive on each of these components that I mentioned, looking at how microservice and container principles help us enable scalability and resilience.
There are three quick things I would like to point out here for data and AI solutions.
One is independent scaling, so scaling AI inference separately from data ingestion.
When you build AI models, you basically have to separate the front end, the back end, the data ingestion, the feature engineering, the model development and evaluation, and the inference pieces, and this is where containerization is going to help you.
Second is fault isolation: if one service goes down, making sure another stays operational, which helps us build the resilience side of it.
And then faster deployments: making sure your architecture is modular, which is really going to help you deploy faster.
Now, on the right-hand side, I've just given one example here; there are tons of examples in the industry, and in your organizations you might already be doing this: leveraging microservices and Kubernetes-based architectures.
This really helps us scale when building our AI applications or data-driven, data-centric applications.
For example, Spotify uses microservices and Kubernetes to scale its AI-powered music recommendation engine.
Netflix is another great example, where a deployment happens every 10 or 11 seconds, I would say, and they leverage a modular, microservices- and Kubernetes-based architecture.
Amazon is another one, an early adopter.
I really don't need to say much more about containerization, but looking at the AI-specific angle, it helps us manage consistency across environments.
And then Kubernetes, the container orchestration platform, helps us with scaling, orchestration, and self-healing of AI workloads.
Moving on.
Serverless computing.
Serverless computing is another really important piece when we manage our data and AI workloads.
It helps us execute code without managing infrastructure.
Now, in generative AI solutions, when we build generative AI models or large language model applications, GPU-as-a-service and inference are two really critical things.
This is where we make sure our infrastructure has the capacity and is efficient enough that we can build the model, deploy the model, and achieve low latency at inference.
This is where serverless GPUs are very important.
For inference as a service, you can definitely leverage GPUs, but you also have LPUs, such as Groq's, which provide faster inference as a service.
Now, what are the benefits we get out of leveraging serverless computing?
One is auto-scaling: it dynamically allocates resources based on usage.
Then cost efficiency: you pay only for execution time.
And faster deployment, because it eliminates the overhead of managing the infrastructure.
There are so many benefits, but for a developer, and for us as senior executives or decision makers, these are the tangible benefits we immediately see once we start leveraging serverless computing.
What are the common use cases?
There are definitely tons of use cases, but in the current scenario, AI-powered chatbots are one, along with AI model inference and real-time data processing using serverless ETL pipelines (extract, transform, and load, or extract, load, and transform, whatever you call it).
These are the kinds of use cases that help us manage data and AI workflows with on-demand resources.
So, moving on: when we look at making data and AI solutions scalable, we cannot ignore the storage and data management side of it, as well as MLOps, which is the extension of DevOps.
When you look at the storage setup, there is definitely the data lake, which we leverage as S3 in the AWS world, Azure Data Lake Storage Gen2, or Delta Lake, and the data warehouse, such as BigQuery, Snowflake, or Synapse; there are a lot of options.
And in generative AI, we cannot leave out vector databases such as Weaviate or Pinecone; there are so many, and they help us store the embeddings for generative AI applications.
For data processing tools, again there are so many; Spark has been very popular, there is Dataflow in Google Cloud (GCP), and Databricks is one of the best in the market in terms of adoption.
Now, what should our scalability strategy be?
Definitely, we can use the right databases for real-time AI workloads, along with tiered storage, and lifecycle policies must be there.
And again, it should all be surrounded by data governance principles and data management best practices.
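As one concrete reading of the tiered-storage and lifecycle-policy point, here is a sketch using AWS's boto3 SDK; the bucket name, prefix, and day counts are assumed placeholders:

```python
# Sketch: tiered storage via an S3 lifecycle policy.
# Assumes boto3 with AWS credentials; "my-data-lake" is a placeholder bucket.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-lake",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-data",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                # Move colder data to cheaper storage classes over time.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Expire raw objects after a year; curated copies live elsewhere.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```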
Now, on the right-hand side, if you look at MLOps to automate and scale AI workflows: if you really want to innovate faster and achieve faster time to market, MLOps is very important.
It basically integrates DevOps best practices into AI model development, deployment, and monitoring.
What are some key components?
When you look at the machine learning pipeline, it starts with feature engineering, storing features in feature stores, building the model, creating model versions, and then deploying, monitoring, and running bias detection, to make sure it prevents drift.
One example: if you look at the investment banking or retail banking side (I have a lot of experience on the banking side), using MLOps to continuously update fraud detection models is one example.
Risk management is another example, and hyper-personalization is another, and this is where MLOps has been really beneficial in managing large-scale AI workloads and achieving automation and scale.
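Here is a minimal sketch of one slice of that pipeline, training, logging metrics, and versioning a fraud-detection-style model, assuming MLflow and scikit-learn; the experiment name, model name, and numbers are illustrative:

```python
# Sketch of an MLOps training step: train, log metrics, and version the model.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("fraud-detection")  # illustrative experiment name
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("auc", auc)  # tracked over time to help spot drift
    mlflow.sklearn.log_model(model, "model",
                             registered_model_name="fraud-detector")  # versioning
```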
Moving on: how should we think about security and compliance?
We really cannot separate out security and compliance, and considering the generative AI adoption happening in the industry, security and compliance remain really critical.
I just want to keep it very high level here.
There are four parts when you think of security and compliance.
One is data privacy.
The second is encryption.
The third is ZTA, zero trust architecture, which is super important.
And then API vulnerabilities.
In terms of data privacy, we have to follow certain techniques, such as implementing differential privacy to prevent data leaks, along with data masking, anonymization, or pseudonymization, making sure that PII data is not exposed, and making sure you are compliant with GDPR, HIPAA for healthcare, or the CCPA, the California Consumer Privacy Act.
There are so many other regulations, and they differ based on the jurisdiction or country you are in.
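Two of those techniques can be sketched in a few lines; the salt, epsilon, and values below are placeholders, and a real deployment would use vetted privacy libraries rather than this toy code:

```python
# Sketch: pseudonymize a PII identifier and release a differentially private count.
import hashlib
import numpy as np

SALT = "rotate-me-regularly"  # placeholder; manage real salts/keys in a vault

def pseudonymize(customer_id: str) -> str:
    # One-way hash so the raw identifier never leaves the secure zone.
    return hashlib.sha256((SALT + customer_id).encode()).hexdigest()[:16]

def dp_count(true_count: int, epsilon: float = 1.0) -> float:
    # Laplace mechanism: a count has sensitivity 1, so noise scale is 1 / epsilon.
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

print(pseudonymize("cust-00123"))
print(dp_count(true_count=4821, epsilon=0.5))
```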
When you look at encryption, it definitely means implementing hardware security modules (HSMs) for key management.
For example, if you are looking at certain tools or services that you want to leverage to build your generative AI solutions, make sure they are HSM compliant.
That is very important; otherwise, your security office or the compliance team might not allow you to use that service or tool.
Zero trust architecture.
Really, this is very simple: make sure you enforce least-privilege access; role-based access control is super important here, along with continuous authentication with behavioral analytics and some sort of secure enclave computing.
Besides this, you also have MFA, and that is also very important.
Then, for API vulnerabilities: how do we secure the AI models?
Endpoint security, which means using OAuth 2.0, JSON Web Tokens (JWT), or mutual TLS/SSL.
We have the application firewall; again, this is very important, and it helps us secure API gateways to protect AI services and their endpoints.
It is also really important to regularly conduct penetration testing, and your architecture must be privacy- and security-first.
This is really very important when we are on the journey of building AI solutions on a cloud native architecture.
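To illustrate the endpoint-security piece, here is a small sketch that checks a JWT bearer token before serving an AI request, assuming the PyJWT library; the secret, issuer, and scope names are placeholders:

```python
# Sketch: verify a JWT bearer token before allowing access to an AI endpoint.
import jwt  # PyJWT

SECRET = "use-a-key-from-your-kms-or-hsm"   # placeholder; fetch from a key vault in practice
ISSUER = "https://auth.example.com"          # placeholder issuer

def authorize(token: str, required_scope: str = "model:infer") -> bool:
    try:
        claims = jwt.decode(token, SECRET, algorithms=["HS256"], issuer=ISSUER)
    except jwt.InvalidTokenError:
        return False  # expired, tampered with, or issued by someone else
    # Least privilege: the caller must hold the specific inference scope.
    return required_scope in claims.get("scope", "").split()

# Usage: reject the request unless
# authorize(headers["Authorization"].removeprefix("Bearer ")) returns True.
```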
Now, how can we optimize performance, making sure we get the right speed, the right efficiency, and the right latency in terms of inference?
I've mentioned four things here.
One is the infrastructure: you must ensure your infrastructure is sized properly, and design it to auto-scale AI workloads based on demand.
Next is model optimization: what are some techniques we can use to improve model training and inference speed?
Quantization is one approach, reducing the model's memory footprint while maintaining accuracy.
Then data locality: can the data be stored and processed in the same cloud region?
And finally, what are some frameworks we can adopt for AI acceleration, not only in terms of development but also in terms of the inference setup?
For example, NVIDIA Triton or Hugging Face Optimum for LLM performance tuning, or Groq, which uses an LPU (language processing unit) for really fast inference.
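As a quick illustration of the quantization point, here is a sketch using PyTorch dynamic quantization on a toy model; the layer sizes are arbitrary, and this is just one of several quantization approaches:

```python
# Sketch: dynamic quantization of linear layers to shrink an example model
# and speed up CPU inference; layer sizes here are arbitrary.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Convert Linear weights to int8; activations stay in floating point.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface, smaller weights
```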
Now, I have put together one architecture, a very naive RAG-based architecture, showing how we can implement RAG leveraging cloud native capabilities.
Let me spend 20 or 30 seconds on what RAG actually means.
Look, you have a lot of data in your organization, and if you take any existing LLM or pre-trained model, your data is not in it; it is trained up to a certain point in time.
So how do we ensure accuracy and make sure we get the expected results, based on correct, real-time information?
This is where RAG is very popular, because we really cannot fine-tune all the time.
So you have the unstructured reference data, as you see: a lot of PDF, Word, or text files.
You chunk it and then convert it into embeddings, leveraging embedding APIs.
There are tons of embedding APIs, such as OpenAI embeddings or Hugging Face embeddings; you can just take a look at the leaderboard, pick one, and move on.
Then you load the embeddings into a vector database such as Pinecone or Weaviate, or you can also leverage Redis in Azure; there is another tool as well that I forget, but there are a lot of vector databases, so you really don't need to worry about that.
So this is step one.
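Here is a compressed sketch of that step one, chunk, embed, and load, assuming the OpenAI embeddings API; the in-memory list is a stand-in for a real vector database such as Pinecone or Weaviate, and the document contents are placeholders:

```python
# Step one sketch: chunk documents, embed the chunks, store the vectors.
# Assumes the openai package (>= 1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def chunk(text: str, size: int = 500) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

documents = ["...contents of your PDF, Word, and text files..."]  # placeholder
vector_store = []  # in-memory stand-in for the vector database

for doc in documents:
    pieces = chunk(doc)
    resp = client.embeddings.create(model="text-embedding-3-small", input=pieces)
    for piece, item in zip(pieces, resp.data):
        vector_store.append({"text": piece, "embedding": item.embedding})
```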
Step two is the user interface side of it, and this is where multiple users will be sending their queries.
Make sure you have a load balancer, such as an API gateway or an Nginx load balancer.
The query comes to the APIs and basically runs a search against the vector database; the embedding API must be the same embedding API that was used earlier.
Then it retrieves the top k, so the top three or top five based on your configuration, and the query and the retrieved documents are passed to the LLM to get your answer.
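And here is a matching sketch of step two, using the same embedding model for the query, retrieving the top k chunks, and passing them to the LLM; the model names are illustrative, and the cosine-similarity loop stands in for the vector database search (it reuses client and vector_store from the step-one sketch):

```python
# Step two sketch: same embedding model for the query, top-k retrieval, then the LLM.
import numpy as np

def answer(question: str, top_k: int = 3) -> str:
    q = client.embeddings.create(model="text-embedding-3-small",
                                 input=[question]).data[0].embedding

    def score(entry):
        # Cosine similarity between a stored chunk and the query embedding.
        v = np.array(entry["embedding"])
        return float(np.dot(v, q) / (np.linalg.norm(v) * np.linalg.norm(q)))

    top = sorted(vector_store, key=score, reverse=True)[:top_k]
    context = "\n".join(entry["text"] for entry in top)

    chat = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content
```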
Now, how does this translate to cloud native capabilities?
Again, this is a basic, naive RAG, not an advanced RAG.
When you look at the load balancer side, it ensures even distribution of requests, which is very important; we can also use a load balancer on the LLM side as well, which I have not shown in this architecture, but a load balancer is really critical; that's number one.
Next, each of the components, the front end, the back end, the vector database, the prompt template, and the LLM services, is Docker containerized, and this is really important because we don't want to create a monolithic system; each component is containerized.
We can also leverage container orchestration like Kubernetes to manage each of these containers I just mentioned, with horizontal pod autoscaler (HPA) capabilities to ensure services scale up and down based on demand, which is really important.
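Here is a rough sketch of that HPA idea using the official Kubernetes Python client; the deployment name, replica range, and CPU target are placeholders, and in practice this is usually declared as a YAML manifest instead:

```python
# Sketch: create an HPA so a (hypothetical) "llm-service" deployment scales
# between 2 and 10 pods based on CPU load. Assumes the kubernetes client package.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside the cluster

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="llm-service-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="llm-service"
        ),
        min_replicas=2,
        max_replicas=10,
        target_cpu_utilization_percentage=70,
    ),
)
client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```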
Now, in this architecture, make sure that the security and privacy best practices I talked about earlier are applied, such as Istio for the service mesh, OAuth for API authentication, RBAC, data encryption, and so on.
When you look at the vector database, it runs as a stateful Kubernetes service with persistent storage.
LLM hosting and LLM inference are very important, based on GPU nodes or optimized inference services such as Azure ML or AWS SageMaker, among many others.
And then you can leverage Grafana for API latency and the ELK Stack as well.
Now, where are we heading?
What are some future trends?
These are the four things I would mention.
One is that we are moving toward AI-powered cloud automation through self-optimizing cloud architectures; this is really happening now.
The second is making sure we can process AI closer to the data source, which is really important.
LLMOps is another really important piece, as we are trying to optimize our generative AI and LLM-based workloads.
And finally, quantum computing in AI and the emerging potential of quantum-enhanced AI workloads; that is also another future trend that I see.
Look, this is a very small, quick 15-minute presentation, but working with so many Fortune 500 organizations, what we have seen is that adoption is really happening.
The leaders who have already established a certain foundation in their infrastructure are really accelerating their business value.
But the firms and organizations that have started to realize they have to leverage cloud native approaches to build scalable data and AI applications are also definitely investing in their capabilities, to make sure they stay ahead of the curve and remain in a competitive advantage position.
That's it.
Thank you very much.