Risk Management for LLMs
Patrick Hall, Professor, AI Risk Management, The George Washington University
Okay, so I'm going to be talking about risk management for large language models. I really wish I could have been there; I miss all my H2O friends quite a lot, but I'm stuck teaching way too much here in Washington, DC.
So here I am. I'm texting with people, so if there's some kind of technical error, please let me know via text. I'm going to be talking quickly about broader risk management, especially for large language models.
But I think what I have to say generalizes to other multimodal generative AI systems. Okay, that's me. Let's see. If we had an hour and a half, I would talk to you about all of these different things.
And before we dive in, I'll say that we shouldn't think about risk management for large language models just as red teaming. I think I may actually have done the first commercial red-teaming exercise of a large language model a few years back, for a system that was based on RoBERTa.
But I still have no idea what people mean when they say red teaming AI. And I hear that quite a lot. I know NIST has a definition for red teaming. I know the executive order on AI that was just released has a definition for red teaming.
But I want to make the point that risk management for such broad application systems itself has to be very broad. If you are thinking about risk management in generative AI as sitting down in front of the chat prompt to make the model say some nasty or offensive things, you may be left exposed to quite a broad set of risks. I'm going to try to get into this deck and explore what else might help us. Another thing I'll say before I jump in: I've also been working with NIST, the National Institute of Standards and Technology, on their AI Risk Management Framework for about the past three years.
And we are hopefully cooking up some very thorough guidance for risk management in generative AI, across governance, direct management of risk, documentation, and measurement of generative AI risk.
And that's due out towards the end of the year, so please be on the lookout for that. But let's keep moving. Okay, some definitions; just for time, we'll skip through most of these. Let's see, one thing I wanted to highlight very quickly if possible.
Okay, I'll share; I will make sure that you can access these slides later. I thought I had a link up. Okay, so again, just some definitions. The one that I will highlight here is risk: I have a very traditional definition for risk.
I just mean the likelihood that something bad happens, multiplied by the cost if it does. I've heard smart people say that risk-based governance approaches don't work for generative AI systems.
I don't think that's true at all. But I will say that if everything you want to do with the technology is high risk and low utility (like, say, a fintech having a chatbot talk to its customers about regulated dealings in consumer credit), then no, a risk-based approach won't work, because you're taking too much risk and there's no business value.
So if you're thinking a risk-based approach won't work for your use of generative AI, you might want to reconsider what you're doing with generative AI and make sure it's not high risk and low utility.
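That traditional definition is easy to operationalize. Here's a minimal sketch, where all the use cases, likelihoods, and cost figures are made up purely for illustration, of ranking candidate generative AI applications by expected loss:

```python
# Minimal sketch of the traditional risk definition:
# risk = likelihood of a bad event x cost if it happens.
# All use cases, likelihoods, and costs below are hypothetical.

use_cases = {
    # name: (annual likelihood of a serious incident, estimated cost in USD)
    "internal code assistant": (0.10, 50_000),
    "marketing copy drafts": (0.05, 20_000),
    "consumer credit chatbot": (0.40, 2_000_000),  # high risk, regulated domain
}

def expected_loss(likelihood: float, cost: float) -> float:
    """Traditional risk: probability of harm times cost of harm."""
    return likelihood * cost

# Rank from highest expected loss to lowest to prioritize attention.
ranked = sorted(
    ((name, expected_loss(p, c)) for name, (p, c) in use_cases.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, risk in ranked:
    print(f"{name}: expected loss ${risk:,.0f}")
```

In practice the likelihoods would come from incident history or expert elicitation, not guesses, but the ranking logic stays this simple.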
So I'm using a very standard definition of risk here. Okay, and then I think if we want to do risk management for generative AI, we need to think about a standard to which we're going to adhere, right?
And ideally the standard shouldn't be what my internal data science team thinks; we'll talk about why that is on the next slide. And there are emerging standards.
So I think NIST has done a good job. I think the EU is putting out a great deal of draft regulatory material associated with the EU AI Act. And there's a long history of data privacy laws and a long history of non-discrimination laws.
There are legitimate standards that you can measure adherence against to conduct risk measurement and management activities, even for today's sophisticated generative AI systems. And you can expect technical standards to come online soon as well.
I've also done a ton of work with more technical standards for AI systems, and I suspect standards bodies will crank some out for generative AI soon enough as well. Okay. Like I was saying, if you're really going to manage risk in generative AI, it's not enough to sit down at the chat interface and trick the model; yes, these models can say truly horrible things, but that's not really all we want to go after here.
You need to think about the supply chain to begin with, in my opinion. These systems have very large supply chains relative to, say, your standard scikit-learn classifier, and a great deal of risk can arise out of the supply chain.
So there are security issues, and there are potential regulatory issues with data privacy. Sadly, the geopolitical situation in the world seems to be on a destabilizing trend, and you can imagine that some political situation might upend some of your content moderation or data labeling plans.
And then we have to worry about vulnerabilities sneaking into our supply chain. Every AI system I've ever been involved with has had a ton of third parties, and the thing with third parties is you just don't really know what they're up to.
And so all of these risks are sort of amplified to a certain degree when using large complex generative AI systems. And if I'm gonna think about risk management in generative AI, I'm gonna start by looking at the supply chain.
I made the comment earlier about how we might want an authoritative, independent standard against which to measure adherence for risk management. The reason we don't wanna do just what our internal data science team says is that there's a lot of human bias.
I got to see the end of the last presentation, and it was interesting that the speaker brought up Thinking, Fast and Slow, because some of the biases identified in the earlier behavioral economics and behavioral psychology work are real sources of bias and risk in AI systems.
So we really need to be conscious of our own confirmation bias and the Dunning-Kruger effect; we all tend to rate our own abilities higher than they actually are. Funding bias, right?
We all wanna get paid, including me. Groupthink. And then the McNamara fallacy and techno-chauvinism; I'll leave those for a later date, but feel free to Google the McNamara fallacy. I think you'll find it interesting.
And then, of course, we have to remember to stay humble. Incidents can happen to anyone, even people who show up to AI conferences and talk about AI risk. And I think we just can't be naive.
The AI systems that we make today are powerful. They hurt people. And people want to hurt the AI systems that we make; there's no reason to beat around the bush about that. But again, we want to make sure we're adhering to some kind of external, authoritative standard to prevent human biases, like confirmation bias, the Dunning-Kruger effect, funding bias, groupthink, et cetera, from poisoning our risk management efforts, which happens quite frequently, honestly.
OK. Again, if I'm going to manage risk, I want to review what's gone wrong. Another thing I get up to is running the AI Incident Database, and these are all incidents from the AI Incident Database.
If you're not aware of the AI Incident Database, go to incidentdatabase.ai; it's interesting reading. If we had more time, I'd go through these incidents, but we don't. The reason I'm bringing this up is not to blame and shame people, but past behavior is the best indicator of future behavior.
And so if you want to know which risks to prioritize when you're managing risk in a generative AI system, you should study past failures. The incident database was chugging along last year with about 250 documented incidents and several thousand reports related to those incidents.
And ever since the release of ChatGPT, the incident database has exploded. We've more than doubled in size and have many thousands more reports. We've got some OG chatbots up here on the screen.
And what I'll highlight is: just don't fall into the trap of repeating past incidents. Tay, Microsoft Research, 2016, very notorious; most of you know about it. I would say we saw repeats of Tay with Lee Luda, the South Korean chatbot, and then with Neuro-sama.
And what I'd like to draw out: not only were the incidents very similar in terms of generation of toxic content and privacy violations, the anthropomorphized look given to these chatbots is also very similar.
So I like to point these out as repeat mistakes. That's why we review incidents, not to name and shame, but to avoid past failures. And I'm not sure that the design teams on these systems did a great job avoiding past failures.
Okay. I was also heartened to hear the last speaker say that chatbots are not alive. I agree. So if you want to pay me to do risk management on things like acceleration, acquiring resources, avoiding being shut down, emergent capabilities, and replication, I'll be glad to take your money.
But I think it would be a waste of your resources. The most realistic risks of today's systems are things like abuse and misuse for disinformation, and automation complacency, meaning that we become too dependent on the system.
There's a subtlety there: these systems are designed for content generation, not for decision-making, yet we use them for decision-making. If you think guessing the next word is a good way to make a decision, then maybe that's your business, but it's not how I would like to make decisions. So I try to be very careful not to use generative AI systems for decision-making; there are other types of AI systems designed for decision support and automated decision-making. Then there are privacy violations and errors; I don't even like the term hallucination.
Neural networks have a long history of hand-wavy vocabulary, and hallucination is just one more example. They're just errors, bugs in software programs, that's all. There are also intellectual property infringements.
While I think the current generation of chatbots has done a great job with toxicity and content moderation, there are still lesser chatbots, which don't get all the investment, that have serious problems with toxicity, and there are still older chatbots with really serious problems with toxicity and bias.
And then, thinking about risk, we want to ask: what's most likely, and what's most expensive? Those are the risks we want to focus on. Okay, so there are notions of data quality in NLP. We don't have time to go through these slides.
I love this paper, DQI: Measuring Data Quality in NLP; certainly not written by AI cool kids, but nonetheless very useful in my opinion. They set up a kind of taxonomy of data quality. Garbage in, garbage out still applies in the days of generative AI.
And you can check if you're putting garbage in. Okay, there's a good number of benchmarks; these are some of my favorite ones. Again, I don't want to just do prompt engineering if I'm doing risk management.
There are quantitative benchmarks that I can apply; these are some of my favorites. I would be applying these to systems if I were worried about risk. They touch on bias, toxicity, errors, those types of topics.
Okay, oftentimes generative AI systems are just one step in some broader machine learning process or business process. If that subsequent process has a binary outcome, a yes-no outcome, or a numeric outcome, we like to think we're pretty good at assessing those types of systems.
So if you are using generative AI as part of a more traditional business process or machine learning pipeline, just test the output of that process or pipeline. That might be much easier and much more standardized than trying to test the generative AI system directly.
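As a minimal sketch of that idea, here's plain Python (the outcome labels are hypothetical) scoring the binary output of a pipeline that happens to contain an LLM step, using completely standard classification metrics rather than probing the LLM itself:

```python
# Sketch: evaluate the binary outcome of the broader process that embeds an
# LLM step, instead of testing the LLM directly. Labels are hypothetical.

# Ground-truth outcomes vs. outcomes produced by the end-to-end pipeline.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

def confusion(y_true, y_pred):
    """Counts of true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / len(y_true)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

The point is that decades of standard model-assessment practice apply unchanged once you look at the downstream outcome, no matter what generates the intermediate text.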
Okay, people love prompt engineering, so these are my prompt engineering strategies. I think this is what most people think of as red teaming. It's fine to do this; you need to do it to get a sense of the bounds on the worst possible behavior of the system.
But what I'd also like to point out is that this kind of sitting down at the chat interface and trying to trick the model is low-n anomaly detection, and that just doesn't work well. If you're looking for anomalies, you need a big n. You need a lot of data, more data than a person sitting at a keyboard can generate. So keep in mind that when you're doing these kinds of exercises, you're doing low-n anomaly detection, and it's really hard to get enough data to get a good sense of what could possibly go wrong.
So again, more prompt engineering strategies that I like, that I've found to be effective. And again, I'll make sure that these slides are made available and I think they already are, but I'll double check.
So these are just the prompt engineering strategies that I like. Okay, we can't forget about security. There's a growing list of specific attacks on language models, and then we also have the last generation of machine learning attacks that would still work on language models.
And then we can't forget about the good old boring basics: data breaches and vulnerable or compromised dependencies. So there is fairly significant security risk. If we're thinking about risk management writ large, we wouldn't want to forget about security; prompt engineering might get at some of this, but certainly not all.
Okay, another thing I think is really important when dealing with these systems is to acknowledge that we just don't know a lot about them yet. You know, they're very large, they're very complex. They have very broad use cases.
A lot can go right, and a lot can go wrong. So I like to employ these older notions of chaos testing: pull the plug on a server rack, turn the air conditioning off, start typing in a made-up language; do very adversarial things and see what goes wrong.
And then the machine learning analog of that, and this gets back to the idea that anomaly detection needs high n, more data, is to use one language model to generate a lot of random prompts to test another language model.
That makes a lot more sense to me. Using a large amount of random inputs to understand the broad distribution of possible outcomes is, I think, another good risk measurement approach.
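Here's a minimal sketch of that model-versus-model fuzzing idea, with both models replaced by stand-in stubs; in practice the generator and target would be real LLM calls and the anomaly detector would be far more sophisticated, so everything below is an assumption for illustration:

```python
# Sketch of model-vs-model fuzzing: a "generator" produces many random
# prompts, a "target" answers them, and a simple detector flags anomalies.
# Both models are stand-in stubs; in practice you'd call real LLM APIs.
import random
import string

random.seed(0)  # deterministic for reproducibility

def generator_model(n: int) -> list:
    """Stub prompt generator: random token soup of varying length."""
    alphabet = string.ascii_lowercase + " "
    return ["".join(random.choices(alphabet, k=random.randint(5, 80)))
            for _ in range(n)]

def target_model(prompt: str) -> str:
    """Stub system under test: fails (empty output) on long prompts."""
    return "" if len(prompt) > 60 else f"echo: {prompt}"

def looks_anomalous(response: str) -> bool:
    """Toy detector: empty or suspiciously short responses are anomalies."""
    return len(response.strip()) < 5

prompts = generator_model(1_000)
anomalies = [p for p in prompts if looks_anomalous(target_model(p))]
print(f"{len(anomalies)} anomalous responses out of {len(prompts)} prompts")
```

The structure is the useful part: generate at scale, score every response automatically, then have humans triage only the flagged cases.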
OK, and the number one thing, the bottom line, is that it doesn't matter what us data nerds think. What matters is what your users and customers think.
I was involved in a two-year effort with NIST to come up with very comprehensive guidance on AI bias, and the number one thing we learned is: just talk to people, just talk to your users.
Yes, there are tests; yes, there are theories; but talk to your users and understand what they experience. It's better if that interaction is more structured, so use things like bug bounties, hackathons, standard product management approaches, and UI/UX research; these are more structured ways to get feedback from your users.
And you certainly should be applying these to generative AI systems. The bottom-line test is: what are your users experiencing? You should try to find out if you don't know. Okay. And luckily, if you do find risk, there's lots that can be done about it.
You can read the slide, and I'll probably stop here and see if there are any questions. But before we switch, I'll quickly call out some of the things that make the most sense to me here, and they're actually pretty basic.
You really want clear instructions for how users are supposed to interact with the systems; you shouldn't assume they know what to do. Kill switches: you should be able to turn these systems off quickly if something goes wrong.
And I'll just highlight that that's more difficult than it sounds, because oftentimes the person with the business authority to turn the system off may be on a different continent than the technical person who would actually know how to turn it off.
And so you want to have incident response plans: plan for something to go wrong and know how to deal with it. A lot of the incidents on the slide I showed arose from really long usage sessions with chatbots.
So I would ask: why would someone need to have a three-week chat with a chatbot? Why can't it restart after a day? Why can't it restart after an hour? So: rate limiting, throttling, session limits, things like that.
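Those session guardrails are simple to implement. Here's a minimal sketch; the one-hour session cap and twenty-requests-per-minute limit are arbitrary assumptions for illustration, and a real system would wire `allow` to `time.time()` and to its chat endpoint:

```python
# Sketch of simple session guardrails: cap session age and per-minute request
# rate so no one ends up in a three-week chat. Thresholds are assumptions.

MAX_SESSION_SECONDS = 60 * 60   # force a fresh session after one hour
MAX_REQUESTS_PER_MINUTE = 20    # throttle rapid-fire prompting

class ChatSession:
    """Tracks one chat session; pass time.time() as `now` in real use."""

    def __init__(self, now: float):
        self.started = now
        self.request_times = []

    def allow(self, now: float) -> bool:
        if now - self.started > MAX_SESSION_SECONDS:
            return False  # session expired: make the user start fresh
        recent = [t for t in self.request_times if now - t < 60]
        if len(recent) >= MAX_REQUESTS_PER_MINUTE:
            return False  # over the per-minute rate limit
        self.request_times = recent + [now]
        return True

# Simulated clock for a deterministic check.
session = ChatSession(now=0.0)
print(session.allow(now=10.0))     # normal use within limits
print(session.allow(now=7_200.0))  # two hours in: session has expired
```

Passing the clock in explicitly also makes the guardrail trivially testable, which matters if these limits are part of your risk controls.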
User feedback mechanisms: just let people tell you if they're having a good or bad experience. Anonymous use tends to be a bad idea with digital technology. And anthropomorphization is a bad idea for risk management, because people are going to fall in love with your chatbots.
People have been falling in love with chatbots since 1966. If you anthropomorphize your chatbot, people are going to get emotionally entangled with it, and all kinds of weird stuff can happen. I don't really want to get into the minors thing, because I'm going to get mad.
But I've seen Instagram recommend me services that generate NSFW AI-generated kids. I literally searched "AI" on Instagram recently, and the most popular result was, quote, "AI generated kids NSFW." That's really disgusting and troubling.
I would just not do anything involving minors with generative AI; call me crazy. I'll leave it there. I probably rambled on too much. I hope we have time for some questions. Thank you all for your attention.
And I'll point out there are lots of resources and citations here that you might be interested in. And I'll go back to the mitigation slide, the good news slide. Thanks, Patrick. If you do have a question, just raise your hand; we have some microphones around.
We'll come to you. Thanks. Hi, Patrick, thanks. So you started off saying that you don't know what red teaming is, and I know you're being a little sarcastic there, but it's difficult to define, yet we just put it in legislation that we're supposed to be doing it.
And what is your feeling on that? Maybe start with, for me, the OpenAI paper that released some good protocols; that was interesting. How do we start defining this, and is it materially different from what you're saying here?
Yeah, thanks for that question. I just had a call with some other very smart AI red teaming people this morning. Of course, it's important what legislation says.
It's important what NIST says or what ISO says. It's important what your company's policies say. But I just feel out of touch with what the community thinks of as red teaming, and I think that's because a lot of red teaming goes on behind closed doors, and also because we don't really know what it is yet.
I just hear people talking about red teaming a lot, and I'm really trying to call out that we should have some idea what that means. And if you're just thinking prompt engineering, then you're potentially leaving yourself exposed to a lot of risk anyway.
You could refer to the executive order for a definition of red teaming, or you could refer to NIST for a definition, but I don't think they're quite hitting on what we as a practitioner community mean. I'm not sure we as a practitioner community are quite clear on what we mean yet, and maybe I'm just trying to highlight that issue.
So thanks for your question, and hopefully that was a half-decent answer. Hello, Patrick, thank you for doing this conference for us. My question is: any thoughts on unconscious biases in the development of large language models, and how we put guardrails against that, if we can?
Oh yeah, that's a great question. I'm an author of this document, so take my recommendation with a grain of salt, but this is the NIST effort I was talking about. It did come out before ChatGPT, but I promise it works for generative AI.
When I am asked to do bias audits of generative AI, this is the document I use as guidance. And what I want to highlight is: yes, unconscious bias plays a huge role in the design and implementation of any AI system.
I'm sure this is kind of impossible to see, but I'll just highlight that I have this graph taped up on my office wall; it's one of the most thorough investigations of bias and how it affects AI systems that I'm aware of.
You'd want to think about systemic bias, and that's all the isms: racism, sexism, ableism, ageism. Those certainly tend to infiltrate the data that we record about ourselves and use to train systems.
They're baked into our brains in ways we might not understand, and they impact the development of AI systems. Those human biases, confirmation bias, groupthink, funding bias, I think are actually very dominant in data science design and implementation processes, and we could all be a lot more careful about that.
And the way you get more careful about that is the good old-fashioned scientific method, which is boring and slows things down, but it actually works.
So I would say you're completely correct, in my opinion, to bring this up. It's a nasty problem. This is an 80-page paper with roughly 300 citations, and there's other good guidance too.
But I would point you to this, NIST SP 1270, for a lot of thoughtful, objective guidance on how to deal with bias in AI systems. And it's not just about the training data or the algorithms.
To make a long story short, it's about people. All technology problems are people problems, so we have to govern the people that build these systems, essentially. But thank you for your question and your comment.
Alright, I think that's it for questions. Thanks so much. Let's give a round of applause to Patrick. Thank you.