Conf42 Python 2024 - Online

Probabilistic Programming in Python

Abstract

Unlock the power of probabilistic programming in Python with this presentation! Dive into foundational principles, explore Bayesian inference, and master PyMC3 for seamless implementation. From basic models to advanced techniques, simplify complexities for researchers and practitioners.

Summary

  • Probabilistic programming addresses critical challenges in traditional machine learning and AI techniques. It enables us to embrace uncertainty, incorporate expert knowledge, and enhance transparency in decision making. Finally, I will present the implementation of probabilistic models in Python.
  • Probabilistic programming is essentially a programming framework for Bayesian statistics. It inherently captures uncertainty within its parameters, and the whole model architecture offers transparency and more explainable models. The talk ends with a quick demo on implementing Bayesian models in Python.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
In this presentation, we'll be deep diving into probabilistic programming, a powerful modeling approach that addresses critical challenges in traditional machine learning and AI techniques. We'll explore how probabilistic programming enables us to embrace uncertainty, incorporate expert knowledge, and enhance transparency in decision making. Finally, I will present the implementation of probabilistic models in Python.

Why do we need probabilistic programming, and what does it offer over other machine learning and AI techniques? The fundamental challenge in conventional machine learning and AI techniques is the lack of uncertainty quantification. These models typically provide point estimates without accounting for the uncertainty surrounding their predictions. This limitation hampers our ability to assess the reliability of a model and undermines our confidence in the decision-making process. The second challenge is that machine learning models are data hungry, often require correctly labeled data, and tend to struggle with problems where data is limited. Conventional machine learning and AI techniques also lack a framework to encode expert domain knowledge or prior beliefs into the model. Without the ability to leverage domain-specific insights, the model might overlook crucial nuances in the data and fail to perform up to its potential. Lastly, machine learning models are becoming more and more complex and opaque, while the public demands more transparency and accountability for decisions derived from data and AI. All of this presents a need for a modeling framework that can encode expert knowledge, work with limited data, provide predictions along with their associated uncertainty, and yield models that offer more transparency and explainability. Probabilistic programming emerges as a game changer here.

To understand probabilistic programming, it is essential to grasp Bayesian statistics and how it differs from the classical frequentist approach. In frequentist statistics, model parameters are treated as fixed quantities, and uncertainty in parameter estimation is typically addressed through techniques such as confidence intervals. Frequentist methods do not assign probability distributions to parameters, and their interpretation of uncertainty is rooted in the long-run frequency properties of the estimators rather than explicit probabilistic statements about the parameter values. In contrast, in Bayesian statistics, unknown model parameters are treated as random variables and are modeled using probability distributions. This approach inherently captures uncertainty within the parameters themselves, and hence the framework offers a more intuitive and flexible way to quantify uncertainty.

How does Bayesian statistics work? Bayesian statistical methods use Bayes' theorem to compute and update probabilities as you obtain new data. It is a simple but powerful equation. We start with the prior belief, the prior distribution over the unknown parameter. The likelihood represents the new information, the observed data. The posterior represents our updated belief about the unknown parameter, which incorporates both prior knowledge and observed evidence. The term in the denominator, the marginal likelihood, is a normalizing constant, making sure that the posterior also represents a probability distribution.
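In the notation used later in the talk (theta for the unknown parameter, y for the observed data), Bayes' theorem reads:

```latex
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}
```

Here p(theta) is the prior, p(y | theta) is the likelihood, p(theta | y) is the posterior, and p(y) is the marginal likelihood that normalizes the posterior.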
Now let's look at how inference happens with Bayesian versus non-Bayesian models. In the case of non-Bayesian inference, we determine a point estimate of the unknown parameter, the value that maximizes the likelihood of the data given the parameter. It comes as a single point estimate, and for a new instance we predict using only that point estimate. In the case of Bayesian inference, we start with our prior belief about the unknown parameter, represented here as p(theta), and then we compute the posterior distribution, p(theta | evidence). It is an updated distribution over the unknown parameter, starting from our prior and given the new data set. Now, for a new instance, you compute the probability of the new instance considering the entire posterior distribution rather than a single point estimate.

This simple formulation is a lot more complex in practice. The integral here tends to be intractable, especially when we work with higher-dimensional parameter spaces, and there is no closed-form solution for the posterior distribution. So what do we do in that scenario? If we can't get a closed-form solution, can we get samples from the posterior distribution? If we are able to sample from the posterior distribution, we effectively have the posterior distribution, and we can use those samples to get inference for a new instance along with the associated uncertainty. As we touched on earlier, p(y), the normalizing constant, involves integrals that are generally not tractable, so we don't really have a closed-form solution, and numerical integration techniques also tend to be too computationally intensive here.

So how do we sample from the posterior? For this, we rely on a special class of algorithms called Markov chain Monte Carlo (MCMC) methods, through which we are able to sample from a probability distribution. If we're able to construct a Markov chain that has the desired distribution as its equilibrium distribution, we can obtain samples from the desired distribution by recording states from this Markov chain. There are different MCMC samplers, such as Metropolis-Hastings, Gibbs sampling, and so on, which can help generate samples from this distribution.
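To make the sampling idea concrete, here is a minimal sketch of a random-walk Metropolis sampler for a toy problem: inferring the mean of normally distributed data under a standard normal prior. The data, proposal scale, and iteration counts are illustrative assumptions, not part of the talk.

```python
# Minimal random-walk Metropolis sketch for a toy posterior:
# data ~ Normal(theta, 1), prior theta ~ Normal(0, 1).
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=1.5, scale=1.0, size=50)    # simulated "observed" evidence

def log_posterior(theta):
    log_prior = -0.5 * theta**2                   # Normal(0, 1) prior on theta
    log_lik = -0.5 * np.sum((data - theta) ** 2)  # Normal(theta, 1) likelihood
    return log_prior + log_lik                    # unnormalized log posterior

samples = []
theta = 0.0
for _ in range(10_000):
    proposal = theta + rng.normal(scale=0.3)      # symmetric random-walk proposal
    log_accept = log_posterior(proposal) - log_posterior(theta)
    if np.log(rng.uniform()) < log_accept:        # Metropolis accept/reject step
        theta = proposal
    samples.append(theta)

posterior_draws = np.array(samples[2_000:])       # discard burn-in draws
print(posterior_draws.mean(), posterior_draws.std())  # posterior summary, not a point estimate
```

Recording the chain's states after burn-in gives draws that approximate the posterior, which is exactly what practical samplers like Metropolis-Hastings or Gibbs do at a larger scale.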
Now, going back to what probabilistic programming, or probabilistic modeling, is: probabilistic programming is essentially a programming framework for Bayesian statistics. It inherently captures uncertainty within its parameters, so it tends to thrive in a world of uncertainty. As you define your prior distributions, you build your prior beliefs or expert domain knowledge into the model, so it tends to work well with little data as well, and those distributions can be updated as you get more and more new information. The whole model architecture offers transparency and more explainable models.

Now a bit about the workflow of probabilistic programming. The first step is to identify all the unknown parameters. We define the prior distributions, and while defining the prior distributions, we encode our prior beliefs or expert domain knowledge about the model parameters. Then we specify the likelihood, which is the probability distribution of the observed data as a function of the unknown quantities. Then we can run a suitable MCMC sampler to get the posterior distribution for all of these unknown parameters. Now, for any new instance, instead of point estimates we have the entire distribution for the unknown parameters, and we can utilize that distribution to compute an estimate along with its uncertainty for the new instance.

Now, a quick demo on implementing Bayesian, or probabilistic, models in Python. For my demo, I'm using a data set in which, from a sample of the population, I have height, weight, and gender. Gender here is in binary form, whether the candidate is female or not, and then you have the height and weight of that candidate. In a non-probabilistic world, we would fit a logistic regression model here, and for the Bayesian model I'll be using Stan. So we start with a simple logistic model with the is-female flag as the target, and we try to find the coefficients of height and weight which best fit our problem. In this case, you run a logistic regression model and you get its coefficients, but again, you're only getting point estimates; you are not getting the range of each parameter or the underlying uncertainty associated with the model coefficients.

The next thing I'm going to do is move on to running a Bayesian model. For the Bayesian model, as we discussed earlier, we start by defining the unknown parameters, we define the prior distributions, we define the likelihood, and then we run an MCMC sampler. Stan is its own language, and the first thing I have done is build a Stan model, so I'll just give a quick glimpse of it. A Stan program has several blocks: data, transformed data, parameters, transformed parameters, model, and generated quantities. Since this is a simple model, I'm only using a few of these blocks here. The data block is where you define the structure and data types of your data, and the parameters block is where you define the data types of your unknown parameters. Then we come to the model block. In the model block, the first thing I do is state my priors: what prior beliefs do I hold about the three coefficients, the intercept, the coefficient for weight, and the coefficient for height? Then comes the part where I define the likelihood; the target here is binary, so we fit a Bernoulli logit likelihood. That's it, that's how you define your Stan model.

Then I can take this model into Python and run the compiler for it. The data fed to Stan needs to be in the form of a dictionary, so the data is simply transformed into that shape. We run the MCMC sampler, and we get a series of samples for each of the unknown parameters. You can look at the mean, median, mode, and other centrality measures for your parameters, and along with that you can also see the standard deviation and the range. This gives you a lot more information about your coefficients than point estimates alone. Subsequently, as I perform predictions, instead of considering just a point estimate, I can consider the entire distribution of my parameters, the unknown quantities, my coefficients here.
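The exact code is not shown in the talk, but a hedged sketch of what this demo might look like using the CmdStanPy interface is below. The file names, column names, prior scales, and the example new instance are illustrative assumptions.

```python
# Hedged sketch of the demo: Bayesian logistic regression in Stan, run from
# Python via CmdStanPy. File names, column names, and priors are assumptions.
import numpy as np
import pandas as pd
from cmdstanpy import CmdStanModel

STAN_CODE = """
data {
  int<lower=0> N;
  vector[N] height;
  vector[N] weight;
  array[N] int<lower=0, upper=1> is_female;
}
parameters {
  real alpha;        // intercept
  real b_height;     // coefficient for height
  real b_weight;     // coefficient for weight
}
model {
  // Priors encode our beliefs about the coefficients (scales assumed here)
  alpha    ~ normal(0, 5);
  b_height ~ normal(0, 5);
  b_weight ~ normal(0, 5);
  // Bernoulli-logit likelihood for the binary target
  is_female ~ bernoulli_logit(alpha + b_height * height + b_weight * weight);
}
"""

with open("logistic.stan", "w") as f:
    f.write(STAN_CODE)

df = pd.read_csv("heights_weights.csv")           # hypothetical data file
stan_data = {                                     # Stan expects a dictionary
    "N": len(df),
    "height": df["height"].tolist(),
    "weight": df["weight"].tolist(),
    "is_female": df["is_female"].astype(int).tolist(),
}

model = CmdStanModel(stan_file="logistic.stan")   # compile the Stan program
fit = model.sample(data=stan_data, chains=4)      # run the MCMC sampler
print(fit.summary())                              # means, sd, quantiles per parameter

# Posterior-predictive probability for a hypothetical new person (170 cm, 65 kg),
# using every posterior draw rather than a single point estimate.
alpha = fit.stan_variable("alpha")
b_h = fit.stan_variable("b_height")
b_w = fit.stan_variable("b_weight")
p_new = 1 / (1 + np.exp(-(alpha + b_h * 170 + b_w * 65)))   # one probability per draw
print(p_new.mean(), np.percentile(p_new, [2.5, 97.5]))      # estimate plus uncertainty
```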
That can help us get a better prediction and, along with it, the associated uncertainty of the prediction as well. If you go to the Stan website, it has a lot of detailed documentation, and you can find out how to build more complex Stan models as well. That's about it in terms of my presentation. Thank you, everyone.

Salman Saeed Khan

Associate Director @ Afiniti Europe Technologies



