H2O GenAI World Conference, San Francisco

Applied GenAI for the Finance Vertical | Megan Kurka

 

Speaker Bio

Megan Kurka | Vice President Customer Data Science, North America

Prior to working at H2O, she worked as a Data Scientist building products driven by machine learning for B2B customers. She has experience working with customers across multiple industries, identifying common problems, and designing robust and automated solutions. Megan is based in New York City and holds a degree in Applied Mathematics. In her free time, she enjoys hiking and yoga.

Transcript

 

Megan Kurka  00:06

Hi everyone, I'm Megan Kurka. I'm a data scientist at H2O, and today I'm going to be talking a little bit about applied GenAI in the finance vertical. Okay, so first I just want to talk about a couple of different use cases more generally that I've seen and how large language models can be applied to them.
 

00:28

So the first I have here is a virtual agent. So for example, maybe an analyst wants to answer questions based on internal and external documents. We've been seeing a lot of demos around this with RAG, retrieval-augmented generation.
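
As a rough sketch of the RAG pattern itself (not the specific H2O implementation), the idea is to embed the document chunks, retrieve the ones most similar to the question, and put them into the prompt. The embedding model choice here is illustrative:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # any embedding model would do

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k document chunks most similar to the question."""
    q_vec = embedder.encode([question])[0]
    c_vecs = embedder.encode(chunks)
    sims = c_vecs @ q_vec / (np.linalg.norm(c_vecs, axis=1) * np.linalg.norm(q_vec))
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Stuff the retrieved context into the prompt sent to the LLM."""
    context = "\n\n".join(retrieve(question, chunks))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
```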

 

00:42

Another use case that I've seen in the finance vertical is content generation. So for example, I want to take all of the tweets that occurred for a specific ticker and generate some kind of report or a couple of bullet points around it. 

 

00:57

The third is analyzing content. This is the one I'm going to spend the most time on: how do we analyze content? There's so much unstructured data, especially in the finance vertical. How do I analyze it programmatically?

 

01:10

And then the last is: how do I use that to find trends and anomalies? So I'm going to be walking through that today and doing a couple of demos. So let's get started. All right. So our kind of default use case is this virtual agent.

 

01:26

I'm going to use large language models to help assist me in answering questions. So this can be run in the cloud or on-prem, and it can have access to internal or external documents, and it can basically help me answer questions automatically.

 

01:42

And we've talked a lot about that today. But one thing we didn't talk that much about yet is how it can be combined with machine learning. So Arno mentioned it a little bit in his panel. We can ask questions on these unstructured data documents, but we can also ask questions about a model.

 

02:00

So for example, maybe I have a machine learning model for recommendations or forecasting or classification. What I can do with large language models is not only tell the large language model to build a model, but also ask it to explain the model.

 

02:16

So why is this bond recommended? What's recommended and why, given to me in a human-understandable form? Another use case is content generation. So can we generate a report or an article or some summaries around this data that we have?

 

02:37

So not just answering questions, but generating something in a specific way. This is where prompt engineering can come in, where I give an example of what I want the output to look like. Just to show a quick example.

 

02:50

So here I have a bunch of tweets on Apple from a specific day. So I can go ahead and upload this document into H2O and maybe ask something like, "I'm a financial analyst creating a report; create five bullet points."

 

03:06

"Give me the most interesting bullet point first." So I can now start to tell the large language model exactly how I want its output and specify how it should be answering.
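
As a rough sketch, the prompt side of that might look like this; the OpenAI client is just a stand-in for whichever chat endpoint you actually use, and the file name is made up:

```python
from openai import OpenAI

client = OpenAI()  # stand-in; any chat-completion endpoint works the same way

def ask_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

tweets = open("apple_tweets.txt").read()  # hypothetical dump of tweets for a ticker

prompt = (
    "You are a financial analyst creating a report. "
    "Summarize the tweets below into exactly five bullet points, "
    "with the most interesting bullet point first.\n\n"
    f"Tweets:\n{tweets}"
)
print(ask_llm(prompt))
```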

 

03:28

Another use case we see often is just analyzing content. So during today's talk, I'm going to be talking about earnings calls a little bit. And we're going to talk about how I built an application to analyze a specific earnings call. So you'll see in the application, and I'll show a demo, we're splitting the transcript up by speaker and asking the large language model to tell me how defensive the speaker is and what their overall sentiment is.

 

03:52

Now, this is all done programmatically. I'm going to ask it to give me that answer in a specific way, in this case in a Python dictionary format. And what that means is that I can, without doing anything except changing a ticker, run some code and create an application.

 

04:08

So let's look at the application together. And we'll talk a little bit about trends and anomalies as well. So the way I built the application is I split the transcript up by speaker, and I'm analyzing each chunk for sentiment and defensiveness, giving it a little context about what the call is about.
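
A hedged sketch of what that per-speaker analysis step could look like; the speaker-splitting regex and the `ask_llm` helper are illustrative assumptions, and asking for JSON (rather than a literal Python dict) makes the output safer to parse:

```python
import json
import re

def ask_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever LLM endpoint you use."""
    raise NotImplementedError

def split_by_speaker(transcript: str) -> list[tuple[str, str]]:
    """Naive split on 'Speaker Name:' headers; real transcripts may need more care."""
    parts = re.split(r"\n([A-Z][\w. ]+):\n", transcript)
    return list(zip(parts[1::2], parts[2::2]))

def analyze_chunk(speaker: str, text: str, context: str) -> dict:
    prompt = (
        f"Context: {context}\n"
        f"Speaker: {speaker}\n"
        f"Remarks: {text}\n\n"
        "Rate the speaker's defensiveness and sentiment from 1 to 5 "
        "and briefly explain why. Answer only with JSON like "
        '{"defensiveness": 3, "sentiment": 4, "reason": "..."}.'
    )
    return json.loads(ask_llm(prompt))
```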

 

04:27

But I also asked the LLM to explain the reason for it. And maybe I could even ask, is there something that could have been done better? And then finally, I'm using Wave, which is our open-source Python library for dashboards, to make this an interactive application.
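
Wave itself is real (the `h2o_wave` package on PyPI); this minimal layout, with one card per analyzed chunk, is just a sketch of how the pieces could fit together:

```python
from h2o_wave import main, app, Q, ui  # `main` must be imported for `wave run`

@app('/earnings')
async def serve(q: Q):
    # `results` would come from the per-speaker LLM analysis; values here are dummies.
    results = [{'speaker': 'CEO', 'sentiment': 4, 'defensiveness': 2, 'reason': '...'}]
    q.page['header'] = ui.header_card(
        box='1 1 8 1', title='Earnings Call Analysis', subtitle='NFLX 2022 Q1')
    for i, r in enumerate(results):
        q.page[f'chunk_{i}'] = ui.markdown_card(
            box=f'1 {2 + i * 2} 8 2',
            title=f"{r['speaker']}: sentiment {r['sentiment']}/5, "
                  f"defensiveness {r['defensiveness']}/5",
            content=r['reason'],
        )
    await q.page.save()
```

Running it with `wave run` pointing at the file serves the dashboard in the browser.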

 

04:46

So let's go to the app. So here's my application. I can see it for earnings call 2022 Q1, and this is Netflix's earnings call. And I have an overall analysis here coming from the large language model.

 

05:08

And I also have for each section, or each speaker, a defensiveness ranking, a sentiment ranking, and some general metrics around each speaker. So this application has already been written. I really just need to change the ticker, and I automatically generate all this information.

 

05:25

And I can see how everyone's doing and find interesting pieces of the call. So for example, let's click on this piece. It has positive sentiment. For this section, we can see the original transcript here.

 

05:38

"It shouldn't be any more complicated than that, Doug." It's moderately positive. So it's an earnings call; the people that work at the company are gonna keep it upbeat, but they're still being defensive.

 

05:48

And I can see the LLM's reasoning for why both ratings are what they are. And I can use that to maybe find interesting points in the earnings call, maybe for self-correction, maybe as a training process, and so we're getting to automatically understand this call in more detail.

 

06:08

All right. Now, if I did this for one call, I might want to analyze a series of quarters to understand if there are any trends I can pick up with a large language model. So we have so many earnings calls available over a series of years.

 

06:26

Can I iteratively redo this process essentially to understand any correlation between these earnings calls and the stock movement? So what I'm going to do is I'm going to not just look at each speaker and each little section of the call, but I'm going to try to get a general ranking of this call. 

 

06:45

So I'm going to do a couple of things. But first I want to talk a little bit about how that would work. So I have a whole transcript, and I want to basically get a ranking for, let's say, how well did this company meet the expectations set? 

 

07:03

So that's going to be my question, ranking from one to five: five meaning they've exceeded all expectations, one meaning they've not met any expectations. So that's going to be my goal.

 

07:14

And I want to iteratively get this for every quarter for a company, over a series of companies. Now, I can really simply load up this transcript and ask the large language model: okay, let's do it, rate how well Netflix did from one to five.

 

07:32

And I want a table with the rating and the reason for the rating. Now I just got that. It gave me four out of five, and it says the reasoning is because they, you know, have strong content coming up, they beat the EPS expectation. 

 

07:47

Okay, so this makes sense to me. But if I think about this quarter, this is Q1 of 2022. If anyone's familiar with that, this is when Netflix had a really poor quarter, and there was

 

08:01

a lot of news content around it. It was the first quarter where they had a decrease in subscribers. So I understand the four, and I get what the large language model is saying, but it's not really right.

 

08:11

Does anyone know why it's not right? So this method is just using our retrieval-augmented generation, RAG, and it's figuring out the sections that talk about expectations. So probably in this call, when they talked about expectations, they were talking about the expectations they had met or exceeded.

 

08:32

They didn't talk about the subscribers. So what the large language model is talking about is only this positive section. But really, this should be, I think, a lower ranking. So let's do something different.

 

08:45

Instead we're going to ask the large language model to summarize the transcript into a couple of paragraphs with a focus on their expectations. So it gave me a summarization. It talks a little bit about the positives. 

 

09:03

It's a few paragraphs on it. Now I'm going to go and ask the question again, but this time I'm going to ask it on the summary. And let's see what it comes up with. OK. So now it gives me a two. So this is much more realistic because I first took this huge transcript, well, not huge, but big transcript, and condensed it down with a specific focus and then asked the question on the content of the summary. 
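
In code, that chaining might look like the following two-call sketch; `ask_llm` is again a hypothetical helper around whatever LLM endpoint is in use:

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical wrapper around your LLM endpoint."""
    raise NotImplementedError

def rate_call(transcript: str) -> str:
    # Step 1: condense the transcript with a specific focus.
    summary = ask_llm(
        "Summarize this earnings call transcript in a few paragraphs, focusing "
        "on whether the company met the expectations it had set:\n\n" + transcript
    )
    # Step 2: ask the rating question on the summary, not on the raw transcript.
    return ask_llm(
        "Based on this summary, rate from 1 to 5 how well the company met "
        "expectations (5 = exceeded all expectations, 1 = met none), and return "
        "a table with the rating and the reason for the rating.\n\n" + summary
    )
```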

 

09:31

So I'm starting to chain the large language models together to get better accuracy. We talked a little bit about evaluating large language models. Any idea how I could evaluate this model? Well, I'm trying to figure out if they met expectations, and I want a ranking from one to five.

 

09:54

That should be correlated with the stock price movement. If they don't meet expectations, there should be a drop. So if we start to think about it in that way, I now have a validation data set that's really easy to make.

 

10:05

It doesn't have to be any specific large language model evaluation. I can actually just do it with regular correlation or machine learning to analyze this. So one thing to think about is: how can we take this unstructured data, turn it into structured tabular data, and then have a label to kind of sanity check or validate these results?
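
For instance, a sanity check along those lines could be as simple as the following, assuming the per-quarter ratings and the stock's post-call returns have been collected into a table (all column names and values here are illustrative, not real results):

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical table: one row per quarter, with the LLM's 1-to-5 rating and the
# stock's return in the days after the call. All values are made up for illustration.
df = pd.DataFrame({
    "quarter":     ["2021Q2", "2021Q3", "2021Q4", "2022Q1"],
    "rating":      [4, 4, 3, 2],
    "post_return": [0.02, 0.01, -0.01, -0.25],
})

# Spearman fits an ordinal 1-to-5 scale; a strong positive correlation suggests
# the LLM's ratings track the market's reaction to each call.
rho, p = spearmanr(df["rating"], df["post_return"])
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```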

 

10:28

OK. So let's go back to the slideshow. So what we've just seen is that just asking the question didn't perform well. So we did a summarization of the transcript. We rated the summary. We asked the LLM to justify it.

 

10:42

And now we're checking correlations with the stock price movement. And here are the results. I'm going to show it in the interactive dashboard, let's see. But just before I do that, here is our stock price, and the vertical lines are the ratings for each quarter, so the higher up the line goes, the better it did.

 

11:10

I've highlighted this quarter. This is the rating of two; this is the 2022 Q1 quarter. And if we Google that quarter and anything about the earnings, you see pretty much only negative content. So the two rating is corresponding to what we're seeing in the headlines, and it's also corresponding to that drop in stock price.

 

11:33

Alright, so let's go back to our application and take a look at it. So here is our stock price with the earnings ratings from one to five, and then below that I have my table. This was automatically generated; I asked it to be in tabular format, and I've concatenated across each quarter, so I have my rating and then the reason for the rating.
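
A sketch of how that table could be assembled, quarter by quarter; `load_transcript` is a hypothetical loader, and `rate_call` is the summarize-then-rate chain sketched earlier:

```python
import pandas as pd

def load_transcript(ticker: str, quarter: str) -> str:
    """Hypothetical loader for an earnings-call transcript."""
    raise NotImplementedError

def rate_call(transcript: str) -> dict:
    """The summarize-then-rate chain sketched earlier, returning something
    like {'rating': 2, 'reason': '...'}."""
    raise NotImplementedError

ticker = "NFLX"  # changing this one line redoes the whole analysis for another company
rows = []
for quarter in ["2021Q1", "2021Q2", "2021Q3", "2021Q4", "2022Q1"]:
    rating = rate_call(load_transcript(ticker, quarter))
    rows.append({"quarter": quarter, **rating})

ratings = pd.DataFrame(rows)  # one row per quarter: rating plus reason, dashboard-ready
```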

 

11:57

Now, because this is something that is programmatic, I can change the company really easily and redo this. So here we have the same analysis for FedEx, and I can see the trends over time as well.

 

12:21

And I can visualize it a little bit differently if I want to and take a look at correlations. So here are the quarters that had a rating of three, moderately positive or neutral, or a moderately positive rating of four, and how they correlate to the stock price.

 

12:37

So it's another way to sanity check that my large language models are doing well. All right. And with that being said, I'll leave the, I guess, two minutes I have for questions, if anyone has any, and if not, I'll pass it to the next speaker.

 

12:54

But I hope that was interesting to see how large language models could be applied for real-life use cases. Thank you.