Transcript

This transcript was autogenerated. To make changes, submit a PR.
            
            
            
Hi. My name is Tempest van Schaik, and I am a machine learning engineer at Microsoft, in a team called Commercial Software Engineering. Today I'm going to speak about responsible AI in health: from principles to practice. An overview of what I'll speak about: the historical journey of responsible AI, lessons from biomedical research for responsible AI, some ethical questions to ask about projects, and some tools to help when working on AI projects.
            
            
            
So I'll start with the historical journey of responsible AI as I see it. When new technology is developed and unleashed, safety and responsibility considerations usually follow. Take, for example, the development of cars. We take car safety for granted these days, but once there was the first car on the street, and then there was the first car fatality, and then car manufacturers started adding things like windshields and headlights, traffic lights on the street, and seatbelts, and eventually a driver's test, which only came into being in 1955. We take all of that for granted, but it wasn't always there.

I feel like responsible AI is in a similar position to the very early days of cars: the technology has been released, and now there's a lot of focus on safety and responsibility. What's been interesting to observe over the last couple of years is that machine learning researchers and practitioners have moved from asking "can we do it?" to "should we do it?" For an example, look at facial recognition. A couple of years ago, this was really exciting. It's a genuinely exciting engineering breakthrough, getting computers to recognize human faces, something people have been working on for decades. So there was a lot of excitement about whether we could achieve this breakthrough. But now that we've seen the consequences of releasing facial recognition technology, people, even companies, are asking: should we be doing this? Should we be using this technology at all?
            
            
            
So I will now speak about my personal journey with responsible AI. My background is in biomedical research, because I'm a biomedical engineer, and it's been interesting to see that some of the concerns with AI today are actually quite familiar from biomedical research.

One of the most important documents written in the medical research ethics world was the Belmont Report in 1979, which established new norms for ethical research. It was written in response to some really bad research that had happened, mistakes that had happened. That was decades ago, so medical research has a couple of decades of a head start on doing things in a more ethical way, which is quite interesting.

So here are some of the lessons we can take from biomedical research, some of the standards we see there that are now considered part of responsible AI. The first is data transparency. When you publish a medical research paper with human subjects, you have to be very clear about who the human subjects were: how many people there were, what their race and sex was, what level of education they attained, what part of the country they're from, that kind of thing. You have to state that really explicitly when you publish a paper. That's now considered part of responsible AI too: being transparent about who was in your data set and who was not in your data set.

Another standard is informed consent. Initially, it was informed consent to have your data used in a study. In responsible AI, we're obviously now asking: have people consented to having their data used by this machine learning algorithm?

There's also the concept of group harm and individual harm: whether the study could harm an individual, or even harm the whole group that individual is from. Likewise, in responsible AI, we're starting to consider both of these types of harm and how to avoid them.

And then privacy. With medical research, privacy is of the utmost importance: keeping that research data private. Likewise, in responsible AI, we know it's our responsibility to keep people's personal data private.
            
            
            
Because I had this background in biomedical research, I think it had primed me; I had this kind of lens for working in AI. I came across a project to work on, but I had ethical concerns with this particular project. Thankfully, I was supported by my leadership to ask "should we do it?" kinds of questions: should we do this project? And the support that I got from my team, and the tools that I discovered for addressing responsible AI issues, have prepared me for recognizing and addressing responsible AI issues on future projects. So I wanted to share some of the learnings I've gathered along the way, in case they're helpful for you.

Now I'm going to talk a little bit about responsible AI reviews. We have a responsible AI review board on my team because we work across really diverse customer AI projects. They're very complex, they're in different industries, each one is different, and we see a huge variety. The review board is a sounding board for people to express different views and explore different ideas, and ultimately to provide responsible AI recommendations for our projects. I find that the following questions are very helpful to ask when thinking about the ethical implications of an AI project.
            
            
            
First of all, let's remember that AI is not magic. Is this problem actually solvable with AI, or can it be solved in a simpler way? For example, sometimes a SQL query will do the job. So can we just write a SQL query? Or do we need an advanced machine learning algorithm that needs a lot of maintenance and has a lot of responsibility of its own?
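To make that concrete, here's a toy sketch, with a hypothetical table and values rather than anything from the talk, of the kind of question a plain aggregate query answers with no model to train, monitor, or maintain:

```python
# Toy illustration: if the "prediction" we want is really a lookup or an
# aggregate, a plain SQL query answers it with no model to maintain.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visits (patient_id INTEGER, cost REAL)")
con.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 200.0)],  # hypothetical billing records
)

# "What does each patient spend on average?" needs no machine learning.
for patient_id, avg_cost in con.execute(
    "SELECT patient_id, AVG(cost) FROM visits GROUP BY patient_id"
):
    print(patient_id, avg_cost)
```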
            
            
            
And similar to this: does this problem have a technical solution, or is it a problem that could be solved with some kind of social intervention? So get that out of the way first. Do we need technology at all? Do we need AI at all?
            
            
            
If yes, then it's helpful to think about who the stakeholders in this project are. Think about each different group that this AI impacts, and think especially about whether there are any vulnerable groups. Vulnerable groups might be children, the elderly, immigrant groups, or any groups that have been historically oppressed. Other stakeholders might be regulators, lawmakers, even companies and their brands. Think about all the different stakeholders that could be affected by this technology. And once we've identified a map of the different stakeholders, it can be helpful to think about the possible benefits and harms to each stakeholder, and to exhaustively list the benefits and harms to each.
            
            
            
It's useful to ask: does the data used by this code contain personally identifiable information? Most of the time when we're training a model, we don't need to know people's names, addresses, and telephone numbers, so really we don't need to work with that data. If for some reason we really need it, that data needs to be handled in the appropriate ways.
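As a minimal sketch of that first step, with hypothetical column names rather than anything from the talk, direct identifiers can often be dropped before the data ever reaches the training pipeline:

```python
# Minimal sketch: strip direct identifiers before training. Column names are
# hypothetical; real de-identification (e.g. for HIPAA) requires much more
# than this, such as handling quasi-identifiers and free-text fields.
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "phone": ["555-0100", "555-0101"],
    "age": [34, 51],
    "diagnosis_code": ["J45", "I10"],
})

PII_COLUMNS = ["name", "phone"]  # direct identifiers the model never needs
train_df = df.drop(columns=PII_COLUMNS)
print(train_df)
```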
            
            
            
It's also useful to ask: does this model impact consequential decisions, like blocking people from getting jobs or loans or health care? In these situations we have to be extremely careful, and often the model needs to be explainable, so that we can explain why a decision was made.
            
            
            
A couple more questions to ask are: how could this technology be misused, and what could go wrong? I like to call this Black Mirror brainstorming, named after the UK TV series Black Mirror, which explores how technology can go very, very wrong.
            
            
            
Does the model treat different users fairly? The model might be accurate overall, but is there a particular group that it's performing very badly for? And how does the training data compare to production data? If we train a language model on tweets, it's not appropriate for doing a medical literature search, because the language is very different. It's our responsibility to make sure that those align appropriately.
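One cheap way to surface that kind of mismatch, sketched here with made-up text snippets rather than real corpora, is to check how much of the production vocabulary the training data has even seen:

```python
# Toy sketch: flag train/production distribution mismatch by measuring how
# much of the production vocabulary appears in the training text. The example
# snippets are invented; a real check would sample both corpora at scale.
def vocab(texts):
    return {word.lower().strip(".,") for text in texts for word in text.split()}

training_tweets = [
    "cant believe the game last night",
    "this new phone is amazing",
]
production_abstracts = [
    "Myocardial infarction outcomes in a randomized cohort.",
    "Pulmonary function after bronchodilator therapy.",
]

train_vocab = vocab(training_tweets)
prod_vocab = vocab(production_abstracts)
coverage = len(train_vocab & prod_vocab) / len(prod_vocab)
print(f"Production vocabulary covered by training data: {coverage:.0%}")
```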
            
            
            
Another question is: what is the environmental impact of the solution? There's been more and more interest in this topic recently. For example, if we have a huge language model that takes days or weeks to train, that's using a lot of computational power and a lot of electricity. So what's the environmental impact of that? It's worth thinking about.
            
            
            
And then: how can we address the concerns that arise? Through answering all these questions, what concerns have arisen? Do we need to reformulate the problem, rethink it? Are there some risks that we can mitigate? I'm going to discuss some tools that we might use shortly.
            
            
            
I did want to highlight some special considerations for healthcare and responsible AI. The first one that might come to mind is privacy. Health data is extremely sensitive and private, especially genetic data, which tells us so much about a person and even their family members. So it's extremely important to maintain that privacy and not let information leak through the model somehow. On a similar note is security: we need to follow the right regulatory requirements to handle data securely, and sometimes that means doing a lot of training, like HIPAA training, to be compliant with handling that kind of sensitive data.
            
            
            
I think an important aspect of responsible AI in health is collaborating with domain experts. This is crucial, I believe, for all machine learning practitioners, but especially so in healthcare. Are there doctors or nurses, or even patients, who can do a sense check? Do you have access to that domain expert? That's really important if you're working on a healthcare project.
            
            
            
And then there's the idea of open versus closed science, where we want to get the balance right. On one hand, we have open science: say we've sequenced a genome for cardiovascular research, but the data set could actually be really useful for respiratory research as well. So could we share it with those researchers? That could benefit everyone. That's open science, and we have to balance it against keeping people's data private and secure. We have to get that balance right.
            
            
            
There's also the issue of unequal access to healthcare, which is really something we have to keep at the forefront of our minds. People in wealthier parts of the world have better access to healthcare. And something I have found since recently moving to the USA is how important it is to consider the bias that's introduced by the cost of healthcare, because healthcare is so expensive in the USA. We have to be really careful with any data about billing, costs, and prices, because it can contain a lot of bias due to unequal access to health care. I'm going to show an example of that shortly.
            
            
            
And then lastly, race and sex can be extremely important disease predictors; as we've seen with COVID, it affects different groups differently. However, race and sex can also introduce a lot of bias into a model, because historically these groups have been treated unfairly in healthcare. What I've found works quite well is not just throwing away race and sex and ignoring them completely, because a model can still be biased without these features, as I'll show in the next slide. What works really well is to capture and keep that data so that you can actually audit how fair your model is for those different groups. But you can only do that if you have the data. So that's my recommendation: these features are actually really helpful to have.
            
            
            
This is a really interesting paper by Obermeyer et al., called "Dissecting racial bias in an algorithm used to manage the health of populations." They show how an algorithm that was actually used in production in the USA did not use race as a feature at all, but was still very racially biased. This paper is definitely worth checking out.
            
            
            
So now I'm going to talk about some responsible AI tools that you could find helpful.

The first tool helps us answer the question: is there a good representation of all users? This tool is called Datasheets for Datasets. I really like the idea because it comes from electronic engineering, where if you buy an electrical component, like a little microcontroller, you always get a data sheet with it, and the data sheet tells you all about that component: how to connect it, what the operating temperatures are, and so on. The idea is that when you build a data set, you should compile a data sheet too, explaining how the data set was collected, who was in the data set, who was not in the data set, and what limitations there are, so that every data set is accompanied by a data sheet.
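As a rough sketch of what capturing that alongside a data set might look like, here's a hypothetical, heavily trimmed datasheet; the data set and answers are invented, and the questions are a small subset of those proposed in the Datasheets for Datasets paper:

```python
# Hypothetical, heavily trimmed datasheet stored alongside a data set. The
# questions echo a few from the "Datasheets for Datasets" paper; a real
# datasheet covers motivation, composition, collection, uses, and more.
datasheet = {
    "dataset": "clinic_visits_v1",  # hypothetical data set
    "collected_how": "Exported from the EHR of two urban US clinics, 2020",
    "who_is_in_it": "Adult outpatients with at least one billed visit",
    "who_is_not": "Children, rural patients, people who avoided care due to cost",
    "limitations": "Billing data reflects access and cost of care, not need",
    "consent": "Secondary use approved under a hypothetical IRB protocol",
}

for question, answer in datasheet.items():
    print(f"{question}: {answer}")
```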
            
            
            
Another tool I would recommend helps us answer the question of whether a model treats different users fairly. One particular tool is called Fairlearn. Fairlearn is produced by Microsoft, and it's an open source Python package. Here I've used it to look at overall accuracy first: this was a model with an area under the ROC curve of 92%, which is great. But then Fairlearn helps you break accuracy down by different groups, so we can see how well the model performed for female and non-female people in this example. We could also break down the accuracy for different races and see how accurate the model is for each of them. You can use any sensitive feature here to check how your model is performing.
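Here's a minimal runnable sketch of that pattern, using synthetic data and a made-up sensitive feature rather than the model from the talk; the sensitive feature is kept out of the training features but retained so the model can be audited by group, as recommended earlier:

```python
# Minimal sketch with synthetic data: train without the sensitive feature,
# but keep it so Fairlearn can break the metric down by group.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from fairlearn.metrics import MetricFrame

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                   # model features only
sex = rng.choice(["female", "male"], size=1000)  # retained for auditing
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]

mf = MetricFrame(
    metrics=roc_auc_score,
    y_true=y,
    y_pred=scores,
    sensitive_features=sex,
)
print("Overall AUC:", mf.overall)  # the single headline number
print(mf.by_group)                 # AUC per group, where gaps show up
```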
            
            
            
And then the last tool I wanted to mention helps us deal with models that need to be explainable. This tool is called InterpretML. It's also an open source Python package developed by Microsoft, and it's a whole suite of functionality and visualizations, plus a model called the explainable boosting machine, which prioritizes explainability without actually sacrificing accuracy. Here's an example of the explainable boosting machine applied to the adult income data set, where we're trying to predict who earns more than $50,000 a year. You can see that it gives a weighting to each of the different features to say how important that feature was for this prediction. So for this person, we can see why the model decided whether or not they earn more than $50,000: in orange we can see what weighed for that decision, like number of years of education, and in blue we can see what weighed against that decision, for example marital status and age. We can see what positively and negatively affects the decision about whether someone earns more than $50,000, so it's very transparent and explainable, which is great.
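A minimal sketch of that workflow is below; it assumes the UCI adult census file is reachable at the URL shown, and the column names are supplied by hand:

```python
# Minimal sketch: train InterpretML's Explainable Boosting Machine on the
# adult income data set and inspect per-feature contributions for one person.
# Assumes the UCI "adult" file is reachable at the URL below.
import pandas as pd
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier

columns = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    header=None, names=columns, skipinitialspace=True,
)

X = df.drop(columns="income")
y = (df["income"] == ">50K").astype(int)

ebm = ExplainableBoostingClassifier()  # glass-box model, handles categoricals
ebm.fit(X, y)

# Per-feature contributions for a single prediction: the orange (for) and
# blue (against) bars described above.
show(ebm.explain_local(X.iloc[:1], y.iloc[:1]))
```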
            
            
            
So I wanted to share some resources. I am a machine learning practitioner, and I look to the machine learning researchers who are at the forefront of thinking about this topic. I like to read and follow Cathy O'Neil, Hanna Wallach, Timnit Gebru, Rachel Thomas, Deborah Raji, Kate Crawford, and Arvind Narayanan, just to name a few people. I would definitely recommend watching the Coded Bias documentary on Netflix; it's a great primer if you're new to this idea of responsible AI. Kate Crawford's book Atlas of AI just came out, and I'm also looking forward to reading the book Redesigning AI by Daron Acemoglu, which is a collection of essays from some of these people. And here is a link to the Obermeyer racial bias article, and also to the different Microsoft responsible AI tools.
            
            
            
Another resource is the GitHub repo where my team shares our best practices for engineering and machine learning fundamentals, including responsible AI; we've shared that at this link. And then lastly, my team, the applied machine learning team within Commercial Software Engineering, is hiring in a number of different places. You can find our open job roles at this link. Thank you very much. It's very easy to find me on Twitter and LinkedIn, and thanks very much to Conf42 for having me.