
Using H2O Driverless AI for Cybersecurity


Patrick:

Thank you for joining us today for our webinar titled “Using H2O Driverless AI for Cybersecurity.” I’d like to start off by introducing our speaker. Ashrith Barthur is a security scientist designing anomaly detection algorithms at H2O. He is a graduate of the Center for Education and Research in Information Assurance and Security at Purdue University with a PhD in Information Security. He specialized in anomaly detection on networks under the guidance of Dr. William S. Cleveland. Before I hand it over to Ashrith, I’d like to go over the following housekeeping items. Please feel free to send us your questions throughout the session via the question tab in your console. We’ll be happy to answer them towards the end of the webinar. This webinar is being recorded, and a copy of the recording and slide deck will be made available after the presentation is over. Without further ado, I’d like to hand it over to Ashrith.

Ashrith Barthur:

Thanks, Patrick. Today we’ll talk about machine learning for network security, and specifically about how to design and build a model for distributed denial of service attacks. This will be using one of the flagship products at H2O, which is Driverless AI. I would like to take you to the point where you can build a model and then deploy it, and showcase how you can go through the whole process.

One of the most important things that you would also want to do is try and identify the kind of features that actually are valuable while building this model. So, having said that, a quick general run-through of the company itself. H2O.ai is primarily a machine learning focused company, and we were founded in Silicon Valley in 2012. We have about 190 people right now. I think we are growing really fast, and we’ve got offices around the world. We’ve got a decent footprint in insurance, manufacturing, financial services, healthcare, retail, and ad tech.

We’re considered one of the leaders in artificial intelligence and machine learning platforms. The next thing that I do want to highlight is our flagship product, which is called Driverless AI. Driverless AI is the product that helps you build models using machine learning automatically. One of its biggest advantages is that it optimizes the model with as little input as possible from you.

Essentially, in this experiment, or in this design process, we will actually be using Driverless AI to build our models and walk you through the entire gamut of how you go about doing this. Having said that, let’s come to a bit of the basics. Even before we jump into DDoS, the fundamental origin of DDoS is the denial of service attack, or DoS. Essentially, the attack is one where an infrastructure or a targeted system is limited in the services it can provide because of some kind of attacking infrastructure.

Basically, you overwhelm the system with a lot of service requests, which eventually limits the system’s ability to actually serve you because of the attack. So, having said that, the DDoS is a variant of it, which is called a distributed denial of service attack. What the DDoS essentially does is that multiple systems attack one single target or multiple targets, and then they either slow down the services or they cause a complete loss of the service out there.

Now, this is actually a big problem that occurs over the internet. One of the reasons is that bots are marshaled to mount a DDoS attack across different targets around the internet. This is a big problem, and it’s one you find yourself trying to solve time and again. Much of the way in which we recognize DDoS right now is based off rules, and it’s based off people monitoring this traffic.

Ashrith Barthur:

For example, you would have people monitoring traffic between different ASes, trying to identify different BGP routes, trying to identify some unusual traffic that is coming within an AS, within a service, within an infrastructure, or within an organization. And there is consistent and constant monitoring that actually happens, where people are trying to recognize whether this is a significant denial of service attack or not, and based off that, filter rules are actually put in place. Now, the issue with that is that you need a large number of people to actually be sitting and monitoring these services. You need constant monitoring.

So, it’s not something that you can do like a nine-to-five job where you say, “Okay, after 5pm, I’m done.” You need monitoring 24/7. One of the big problems with this is that it becomes very difficult to actually identify the behavior of DDoS itself. Just like any other form of attack, it takes some amount of time for this attack to actually reach critical mass and then be recognized as a DDoS attack, which is why one of the first things that investigators go about doing, when looking at a system, is the investigation itself. They try to identify whether you actually are seeing a real DDoS attack. Then you follow it up with certain monitoring. So, after you try to identify it, mind you, you are marking it, highlighting it as a potential DDoS attack. Then, you monitor it for some time to see whether there is some level of service degradation or some level of service removal from the network infrastructure that you have.

Having done that, if this happens to be true as well, then you go about filtering the traffic. Now, the big problem with this entire procedure that we use today in operations is that it takes quite a reasonable amount of time, which essentially means that the infrastructure and the system that you’re using to actually provide service might have taken a lot of damage, and it might not be recoverable. That’s one of the reasons why we showcase how to use models designed based off machine learning to be able to identify this traffic much better, much faster, and much earlier.

Having said this, a lot of these decisions that the investigators make, and the designers, the scientists, the modelers who look at this, come from looking at different traffic patterns and analyzing the different sources of data that they’re getting. They are bringing in all this data to actually identify whether this is an attack or not. And, now, even though they can operate at a certain rate, there is no assurance that the next time the denial of service attack changes, they would be able to respond in much less time.

That’s primarily because there are different mechanisms by which DDoS attacks happen. There are different levels across the protocol stack where the attack can actually happen, which means that different layers of the infrastructure itself could be brought down and different traffic patterns might emerge. So, having said that, this is quite a complex situation, and it is one of the reasons why, even though you might run a really tight ship in the whole monitoring aspect, it might not necessarily be efficient. So that’s one of the reasons the whole proposal of using ML in the design comes into the picture.

Having said that, I’d actually like to take you through an entire setup, where we build a model based off DDoS traffic that we have. Building the model using Driverless AI means that much of the model building is actually automated. And then, beyond that, we actually get a model, and how do you analyze whether the outcome of the model is something useful or something valuable? What are the parameters that you look for?

If this is something that you wanted to build using Driverless AI, I would like to showcase the next steps for building it. Having said that, the data that we have is actually an initial sliver of the DDoS traffic that we had captured. And the attack was primarily towards an HTTP server, or a bunch of HTTP servers, that were basically behind a load balancer. Now, this is a very simple form of the attack, actually. There are much more complicated methods by which the attack can happen, but this is one of the simpler traffic patterns that we had. It is traffic that one of our clients actually went through, and we were able to assist them using the model to try and prevent the situation from happening again.

Now, one of the things we also do while using Driverless AI is to make use of specific packages that are built for specific use cases. You could be looking at, for example, DDoS attacks. You could be looking at identifying machine-generated domains, or you could be looking at fraud. Any of these use cases have specific packages that come with Driverless AI, which you can use to build these models. DDoS has a specific set of transformations that exist as a part of the package. Now, when you’re building these transformations, one of the most important things that you have to look for is whether these transformations are purely analytical, which means post-attack analysis, or whether they are meant for production. Do you want to be reactive, or do you want to be proactive? That’s essentially the position you have to maintain.

There is a pretty good reason why you have to identify that position before you actually go into the modeling. One of the things is we have tools. We have capabilities. We have people who are really smart, who can identify and build transformations and build models that are extremely good. They probably would identify the behavior really well, with an extremely low false negative rate. But the problem is, if those models are super heavy, then deploying those models in production becomes a problem, which means that you don’t necessarily get the same return on value that you would expect from a model which is that good. On the other hand, if you look at this as a production problem where you are basically saying, I need this model to be as fast as possible, as proactive or as reactive as possible, even before the attack actually happens, then you’re giving it some margin, I would actually say a wide margin, for the model to be much faster and to actually bring in some value.

So, as I was saying, the analytical models are very valuable from a study point of view, but production models are much more valuable when you’re actually trying to prevent something. I’m guessing we have a wide variety of audience members out here, a lot of whom work on the analytical side and a lot of whom work on the production side. But I would like to point out that identifying this as an applied, real-time, production use case is very important to actually solve this problem, because there is a real threat there. This is not an imagined threat.

Having said that, the way Driverless AI actually builds these models is it creates simple transformations. These transformations are pretty simplistic. Also, your transformations require very little pre-processing, meaning that the amount of data munging that you need to do is very light. The model is the one that actually does the heavy lifting. The model tries to dig as deep as possible into the behavior that you’re looking for, and it basically does the heavy lifting in this case. And that’s essentially how we design the whole process. Having said that, I’ll quickly switch to one of the windows that we have.

Okay. So, I’m guessing everybody is able to see our screen. What we have here is Driverless AI. This is essentially our flagship product, and this is what helps you build a model, or automate your model building process. There is a fantastic algorithm that sits in the background which helps you build the model, and which helps you choose the right set of features to optimize it, to have things as fast as possible, and to have things as light as possible.

Having said that, what we are looking at today is a very simple data set. We are looking, in this case, at a DDoS dataset. What you’re looking at here is essentially different kinds of information that you’re collecting. The reason you see very basic features, or very basic columns, in the dataset is that when you catch this attack at a very early stage, the amount of information that you have is necessarily very limited, which essentially means that you are expected to make a decision very early on to try and figure out whether this is going to propagate as an attack or not. Essentially, you’re expected to make a decision so as to prevent any further problems.

I wouldn’t say it’s rash, but you would probably be a bit proactive in how you’re identifying it. So, here we have very simple information. We have the definition, and we have the time period. We have the number of packets that are ingress. We have the number of bytes, and the inter-arrival times between packets. We have some basic flags, whether a flag has been enabled; the primary flags that exist in a TCP packet, in this case. And mind you, we have not gone very deep into the packet itself. We’re purely holding ourselves to the TCP aspect of it, because sometimes when you actually have to make a decision, you probably will not have enough time to query through. You might not be able to do any form of DPI to identify whether this is true traffic or not.

And essentially, if I am the one mounting a denial of service attack, I probably don’t care about the kind of data I’m sending you, or I’m going to try and keep the packet as light as possible so that I can bombard you with as much traffic as possible. So essentially, packet inspection doesn’t necessarily help much here. So, we keep ourselves just to the TCP headers, and that seems to work well enough. That’s a good enough representation of what we need in terms of the modeling data. So having said that, this is basically the dataset that we have. Now, what we do is design a model using this dataset. So, we’ve chosen that DDoS dataset.
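To make the kind of columns being described concrete, here is a small sketch of how such per-window features (packet counts, byte counts, inter-arrival times, TCP flag counts) might be derived with pandas from a raw packet log. The column names, file path, and the 10-second window size are illustrative assumptions, not the actual schema used in the webinar.

```python
# Hypothetical sketch: deriving the kind of shallow, header-only features
# discussed above (packet counts, bytes, inter-arrival times, TCP flag counts)
# from a raw per-packet log. Column names are made up for illustration.
import pandas as pd

packets = pd.read_csv("packets.csv")  # columns: ts, src_ip, dst_ip, bytes, syn, ack, rst, fin
packets = packets.sort_values("ts")

# Inter-arrival time per source, computed before aggregating.
packets["iat"] = packets.groupby("src_ip")["ts"].diff()

# Aggregate into fixed 10-second windows per source IP (ts in epoch seconds).
packets["window"] = (packets["ts"] // 10).astype(int)
features = packets.groupby(["src_ip", "window"]).agg(
    n_packets=("bytes", "size"),     # ingress packet count
    total_bytes=("bytes", "sum"),    # ingress byte count
    mean_iat=("iat", "mean"),        # mean inter-arrival time
    syn_count=("syn", "sum"),        # TCP flag counts: flags are Boolean,
    ack_count=("ack", "sum"),        # so they stay cheap to compute
    rst_count=("rst", "sum"),
    fin_count=("fin", "sum"),
).reset_index()
print(features.head())
```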

And in this case, we are selecting a target column. In this case, we’re selecting whether this traffic is good traffic or whether this is actually DDoS traffic. It’s as simple as that. And that essentially is what the label is. Okay? So, having said that, one of the things that happened in our case is that, while we were processing the data, we ended up with row numbers. So, that’s what we are going to drop, essentially. And that’s what the first column is. The next thing is actually very important. Driverless AI helps you build as good a model as you want, or as interpretable a model as you want.

So that is controlled by three different settings, which are accuracy, time, and interpretability. In this case, just for this experiment’s sake, we can keep the time at about four, just down halfway. The interpretability as well is below four, and the accuracy is at four. And interpretability actually helps you explain the model after the model is built. Essentially, what happens is there are lots of use cases where you probably have a risk office. You probably have your regulatory office.

Let’s say you have to keep your CISO abreast of what is happening. In those situations, if you turn the interpretability up really high, then the model that you build will not only assist you in identifying the DDoS behavior, but it will also help you identify why certain traffic actually seemed to be a denial of service. And, using that information, you can make a case about why the decision that your model took went a certain way versus another way. Cranking up the interpretability always helps. And, time, you can increase it. When you increase the time, keep in mind that time is not directly correlated to the goodness of fit of the model.

You could probably get a really good model even when you keep the time at, say, five. But, usually, with the complexity of the model, if you increase the time, it tends to improve as well. In terms of accuracy, of course, you are fitting it really well, so you can set it as high as you want. And having said that, the next thing is that whenever you’re working on any of these cybersecurity use cases, say, looking at DDoS, looking at malicious domains, looking at botnets, or looking at any kind of DLP, with any of these models that you’re building, make sure that you choose an F1 scorer.

One of the reasons for you to choose an F1 scorer is basically so that you can minimize false negatives as much as possible. It is really important to minimize false negatives as much as possible, and that is primarily because it doesn’t matter if you’re able to prevent 99 attacks into your network; if that one attack actually got through, then you’re still under the same amount of loss that you might have had. This doesn’t necessarily seem that important when it comes to denial of service, because you’re merely building a model to identify behavior for denial of service. So, if some attacks come through, it’s still fine.
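To make the false-negative point concrete, here is a tiny, self-contained illustration (with made-up labels, not the webinar data) of how recall and F1 expose missed attacks that raw accuracy hides.

```python
# Toy illustration of why accuracy alone hides missed attacks (false negatives),
# while recall and F1 make them visible. Labels here are made up.
from sklearn.metrics import accuracy_score, recall_score, f1_score, confusion_matrix

# 1 = DDoS traffic window, 0 = benign traffic window
y_true = [0] * 95 + [1] * 5            # 5 real attack windows out of 100
y_pred = [0] * 95 + [1, 0, 0, 0, 0]    # model catches only 1 of the 5 attacks

print("accuracy:", accuracy_score(y_true, y_pred))  # 0.96, looks fine
print("recall:  ", recall_score(y_true, y_pred))    # 0.20, four attacks missed
print("f1:      ", f1_score(y_true, y_pred))        # pulled down by the misses
print(confusion_matrix(y_true, y_pred))             # [[TN FP], [FN TP]]
```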

If someone is trying to build a botnet out of your organization, or in any of these situations, cranking up your scorer to F1 is actually really valuable. Having said that, one of the things that we also build along with Driverless AI is specific transformers, or specific engineering techniques, that are able to work on the data and try and identify different behaviors. In this case, we have some that are very specific to DDoS. Some of the features that we have built are very specific to the signatures of the flags themselves. One of the reasons I want to highlight that we’re using flag signatures is because, if you’ve gone through the TCP packet, flags are always Boolean, which means that it’s very fast to actually analyze them, process them, and use them in whatever model you want to use.

And they’re essentially usually very light as well. That’s one of the reasons our transformations, or the packages that you use for DDoS, are actually heavily flag-specific in this case. Having said that, let’s choose some of the features that we actually have built. Let me quickly choose all the features.

So, in this case, we have chosen about eight features. We are also looking at different packets. So what we have done now is we have chosen, using Driverless AI, a package that is very specific for us to build models purely for DDoS attacks. Essentially, what we’ll do is we’ll basically try and launch this model. What you’re seeing actually happening here is that Driverless AI is identifying the best features, combinations of features, and the best models that it can build using this dataset that you’ve provided and these features that you’ve chosen, specifically to identify the DDoS behavior.

Now, what it is doing in the background is that it’s actually building based off the different model settings that you’ve given it. And then, essentially, it will start building your model in this case. Now, what you see is that we have tried to optimize it to F1. So, here if you see, it’s still continuing to build the models. Here you see that it is actually giving you the different kinds of variable importance. We have some feature packages that actually identify different kinds of packet structures. It is looking at certain kinds of time series sequences. It is also looking at flag fingerprints, and essentially that seems to be a really good identifiable metric in this case in terms of identifying this behavior.

So, this essentially is basically how the model will be built. Once the model gets built, you’ll actually have access to the model itself. And the model itself is super light. You can get it in different formats and you can basically use that to deploy. Maybe we can answer a few questions while the experiment is running.
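For those who prefer to drive the same kind of experiment from code rather than the UI, here is a hedged sketch using the driverlessai Python client. The address, credentials, file path, column names, and dial values are placeholders, and exact parameter names can vary between client versions, so treat this as an outline rather than a recipe.

```python
# Hedged sketch: launching a comparable experiment via the driverlessai Python
# client instead of the UI. Address, credentials, paths, and column names are
# placeholders; check your client version's docs for exact parameter names.
import driverlessai

dai = driverlessai.Client(
    address="http://localhost:12345",
    username="user",
    password="password",
)

ds = dai.datasets.create(data="ddos_flows.csv", data_source="upload")

experiment = dai.experiments.create(
    train_dataset=ds,
    target_column="is_ddos",    # good traffic vs. DDoS traffic
    task="classification",
    accuracy=4,                 # the three dials from the demo
    time=4,
    interpretability=4,
    scorer="F1",                # chosen to keep missed attacks visible
    drop_columns=["row_id"],    # drop the row-number column
)
print(experiment.metrics())
```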

So one question is, what are the advantages of machine learning-based DDoS prevention solutions over the traditional DDoS protection solutions? For example, firewalls, etc.?

That’s actually a really fantastic question. So, one of the most important things that actually happens is this: the reason why DDoS is so damaging, if I can use that phrase, is because we wait until the traffic actually hits our organization, and then we realize that this is actually causing trouble for us and that we have to prevent it in some way. One of the advantages of using ML is that it disconnects the intelligence from the system that is actually telling it what to do. Essentially, the intelligence is within the model. And the model is executable, which means that it is just another piece of code that you can execute. If you happen to have access to your edge nodes (let’s say you’re an ISP and you would like to run it), or if you’re having transactions between different ASes, you can deploy this between ASes. If you are an organization, you can deploy this around your edge nodes.

You don’t necessarily have to wait for your firewall to come back and tell you, this is bad. This is good. And that essentially is one of the biggest advantages of using ML – purely the engineering aspect of it. But the traditional difference is that firewalls are heavily rules-based. They are very heavily rules-based, which means that you need to consistently and constantly update the rules of your firewall to see what is good and what is bad, which requires an expert to actually sit and curate these rules. What is good? What combinations are bad? And so on. But in ML, these are actually learned purely by the data itself.

So, you can have an expert actually build a model and this model could be updated every minute. I wouldn’t want to say every second, because it takes some time. So, it could be updated every minute and that model could be immediately deployed, which means that if you’re looking at dynamic traffic, and if you’re looking at dynamic behaviors, you can very quickly avert any kind of disaster.

Patrick:

Right. Thank you. Let’s keep the questions coming. It looks like the experiment is about to finish up.

Ashrith Barthur:

We can probably take another question while I come back to the slide.

Patrick:

So the next one is, “I would like to know your suggestion to minimize false positives and false negatives, especially when there’s no sufficient training data.”

Ashrith Barthur:

This is a fantastic problem which dogs machine learning every time. I wouldn’t think that this would be a problem especially when it comes to denial of service attacks, because for a denial of service attack to actually cause trouble for you, you need a large amount of traffic, so data would not necessarily be an issue. What I would suggest is to get in touch with someone out here at H2O, or myself for that matter, and we can talk specifically about the problem that you’re facing. We can help you try and identify what’s best, because sometimes oversampling helps.

But if you’re looking at a dataset which is actually a function of time, oversampling is actually a bad thing, because that means that you’re re-representing the data without understanding whether it’s independent or not. So, I would suggest that it’s specific to the problem at hand. Now, we’ll just come back to the experiment that we ran. So, one of the advantages that we have in Driverless AI is that it shows us how well our model performed across the different experiments. It will tell us what features were actually valuable, or what transformations were actually valuable.

So let’s say you have a briefing. You have a deep dive with your CISO and with your SOC team to identify what the bad behavior was, or what happened in that whole situation that you averted. At that time, the variable importance and how much traffic actually came in add a lot of value. You can always build a report and let them know that this is how things work, and this is how the attack actually happened.

Okay, so one of the things that you also get right out of Driverless AI is an entire report of what your model is and what the accuracy was. You learn how your model was built, what columns were used, and what kind of features were built. Let’s quickly jump to the features if we can.

Okay, so, you have different features that were actually identified for this behavior. And essentially, what these features are telling you is which one was valuable in making the decision to avert certain traffic or not. So, essentially, if you were walking into this whole deep dive over the attacks, this would be a very good report to tell you why your model decided to behave in a certain way, why this feature was important, and why that one was not. And having done that, you can also get a model out of Driverless AI, which means that if you click on the model tab, you’ll basically get a consumable piece of code which you can deploy on any of your servers or on any of the remote edge nodes to run your model.

So, this essentially gives you a general idea of how to build the model, what the things are that you need, and the packages that helped you build for the DDoS behavior itself. Having said that, I’d quickly like to switch back to the slides. I have a few more things that I would like to talk about. So, Driverless AI is probably building features itself. But in some cases, what could be happening is that you don’t necessarily subscribe to the package that exists in H2O, and you are probably building these transformers yourselves. Essentially, Driverless AI also provides you with the ability to build these transformations on your own.

And one of the ways that you can actually do it is to use the custom transformer extension, which Driverless AI provides, and you can build all kinds of transformations that you feel are valuable for you. But keep in mind, when you build these transformations, the rule that I spoke about, whether this is for analytical or for production use, still holds. It is still important for you to decide. Many times, when we come out of research, we build really amazing identifiers for this behavior, but they are not necessarily productionizable because they’re super complex. And essentially, things like that are more troubling for you than anything else, which is why I would say that it helps to actually identify the kind of behavior that you want to build using this extension.

Having said that, let’s say you want to actually jump into this and try and build transformations for yourself using Driverless AI. Then, what you need to know is that there is a clear separation of space. Everything that you build is very functional. It’s not like writing an entire script, and it’s not like filling up a pipeline notebook to try and model the whole thing. You have specific parameters that you need to fill in. You have specific functionalities that you need to fill in with certain data sources, and that essentially builds your transformation. I do also have another talk, which talks about how to build transformations specifically, so I can always offer you guys that, or you guys can get back to us and we’ll be happy to help in this process as well.

Having said that, this is just a general representation of how our code looks. We’re essentially extending the custom transformer class. Here, all I’m doing is trying to build an example of a transformer, and we have some parameters which are basically trying to check whether this is a regression problem, a binary problem, or a multi-class problem. I think in cybersecurity probably, most of the problems that we’re dealing with are usually binary or multiclass, where we’re trying to see if this attack is dangerous for me or not. And essentially that’s where it remains.
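As a rough picture of the structure being described, here is a minimal custom transformer sketch modeled on the public examples in the driverlessai-recipes repository. The SYN-ratio feature is invented purely for illustration, and attribute and method names may differ between Driverless AI versions.

```python
# Minimal sketch of a custom transformer, modeled on the public
# driverlessai-recipes examples. The SYN-ratio feature and column handling are
# invented for illustration; attribute/method names may differ by version.
from h2oaicore.transformer_utils import CustomTransformer
import datatable as dt
import numpy as np


class SynRatioTransformer(CustomTransformer):
    # Which problem types this transformer supports (the checks mentioned above).
    _regression = False
    _binary = True
    _multiclass = True
    _numeric_output = True

    @staticmethod
    def get_default_properties():
        # Operate on two numeric columns at a time (e.g. syn_count, n_packets).
        return dict(col_type="numeric", min_cols=2, max_cols=2, relative_importance=1)

    def fit_transform(self, X: dt.Frame, y: np.array = None):
        return self.transform(X)

    def transform(self, X: dt.Frame):
        a = X.to_numpy().astype(float)
        # Ratio of the first column to the second, e.g. SYN packets per packet.
        return a[:, 0] / np.maximum(a[:, 1], 1.0)
```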

I can always point you to talks where we’ve done this more extensively, and talk about how to build this. So please let us know if that’s something that you’re interested in, and we’ll be happy to guide you out there. Having said that, some of the advantages of building these features and just running them through Driverless AI is that you have preset parameters. You have preset methods which you necessarily need to fill. The effort is very minimized. And you build once. Essentially, the way the whole modeling actually happens is you train and test, and that helps you deploy the whole process through and through.

That means that you only have to build transformations once. You don’t necessarily have to build it again and again. And Driverless AI can automatically run multiple models on different feature sets just to try and optimize what’s best. And that essentially is handled internally by Driverless AI, so that gives you an advantage. I quickly want to come to the deployment architecture. The reason why I want to do that is because I know that when you’re talking about DDoS, if you can’t deploy the model, then there is no point in building the model in the first place itself.

The model itself is available as a Java MOJO, a C++ MOJO, or a Python scorer, depending on the kind of infrastructure that you are deployed on. In our model tests, we’ve seen about 1.6 million records being scored per second. That means it is pretty efficient when it comes to network traffic. I am sure that there are network nodes which carry much more traffic than this. And essentially, that’s one of the reasons why we refrain, or in the packages, why we restrain ourselves from going much deeper into the packet stack. That’s because the deeper you go, the closer you get to the applications there, or to the actual data, and you are essentially taking up more time in trying to identify the behavior.

But of course, that comes with its own limitations, which means that you don’t necessarily know that someone is sending you some kind of malicious payload or something like that. That is something that you wouldn’t be able to identify, which is one of the reasons why you might have to make this decision very subjectively based off your use cases. For example, if I’m trying to actually load up some kind of a Trojan onto your device, then essentially deep packet inspection will help.

But, if I’m trying to identify if there’s an attack or I just want to stop communication of a certain kind based off a decision, then looking at the shallow level in the stack helps. Having said that, the model itself is super light. It can be deployed at your edges, which makes it really easy. The model is independent. You don’t necessarily have to get your firewall to talk to it, which means that it can operate independently. There are times when because of the overwhelming traffic, the communication between your edge nodes and your control plane, for example, the one that you’re using to monitor your networks, might actually be slow, and because the model itself is very independent, that helps you make decisions.

The model makes decisions on its own. We can say there is some artificial intelligence out there in the edge nodes. So it can stop your malicious traffic way before we intercept it, analyze it, and filter it, which makes it a big advantage for any of these networks. You probably prevent a lot of damage to the infrastructure. And as for the services, if you have an SLA, the availability of services can be assured if you’re serving customers. So that kind of helps when you’re deploying it in a certain way.
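To give a feel for what “the model making decisions on its own at the edge” might look like, here is a purely illustrative scoring loop. The score_batch stub stands in for the exported Driverless AI scoring pipeline (or a wrapper around the Java/C++ MOJO runtime), and the threshold, feature columns, and blocking action are arbitrary choices for the sketch.

```python
# Purely illustrative edge-node scoring flow. In a real deployment, score_batch
# would call the exported Driverless AI scoring pipeline or the MOJO runtime;
# here it is stubbed out so only the structure is shown.
import pandas as pd

THRESHOLD = 0.9  # probability above which a window is treated as DDoS (arbitrary)

def score_batch(batch: pd.DataFrame) -> pd.Series:
    """Stub standing in for the exported model; returns a DDoS score per row."""
    # Trivially score by SYN-to-packet ratio, just for the sketch.
    return (batch["syn_count"] / batch["n_packets"].clip(lower=1)).clip(0, 1)

def block_source(src_ip: str) -> None:
    """Stub standing in for a local packet-filter update."""
    print(f"blocking {src_ip}")

def handle_batch(batch: pd.DataFrame) -> None:
    scores = score_batch(batch)                      # decision made locally
    flagged = batch.loc[scores > THRESHOLD, "src_ip"]
    for ip in flagged.unique():
        block_source(ip)                             # no firewall round-trip needed

# Example: one small batch of per-window features (same shape as training data).
handle_batch(pd.DataFrame({
    "src_ip": ["10.0.0.5", "10.0.0.9"],
    "n_packets": [4200, 37],
    "syn_count": [4100, 2],
}))
```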

There are some basic references. I’ll put out the slides from our talk as well, so that will make it easy, and I’ll quickly jump to any questions you want to ask; I’d be more than happy to help.

Patrick:

Thank you, Ashrith. Yes, we do have a few more questions here, and please keep them coming. So the first question is, “Is DDoS available with different cloud platforms and pipelines to use it in different servers?”

Ashrith Barthur:

Okay, let me re-read the question. Is DDoS available with different cloud platforms and pipelines to use it on a different server? Are you talking about the package itself? I’m guessing you’re talking about the package itself. So, the package itself is a part of the Driverless AI custom packages. Essentially, this builds the model specifically for this behavior using Driverless AI. But if you’re talking about the model, then yes, the model is a mere piece of code, and you can deploy it however you feel comfortable. It can be deployed on bare metal. It can be deployed on Docker, in case you want to deploy it that way. It can even be deployed on different chip architectures. So, all of those should help. If that does not answer your question, I’d be more than happy to take it further if you can just come back to us. Thank you.

Patrick:

What do you do for clustering algorithms with billions of records? They tend to not scale well with big data.

Ashrith Barthur:

Yes. That’s actually a fantastic question. With reference to DDoS, we don’t necessarily use clustering, because for you to cluster, you need to have a significant amount of data available. So in those situations, what we essentially do is try and sample from different sources. There are different grouping mechanisms by which we can sample from different sources. We tend to sample from different sources and then see if the clustering is homogeneous across these different groups, or if they’re not necessarily the same. And based off of that, we build it, but sampling is one of the best ways to do it. But try and sample as much as possible, because that gives you critical mass in terms of your model building.
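As a rough sketch of the sample-then-compare idea described above (synthetic data, arbitrary group keys and cluster counts, nothing H2O-specific):

```python
# Rough sketch of the approach described above: sample from different sources,
# cluster each sample, and compare the cluster centers for consistency.
# Data, group keys, and cluster count are synthetic and arbitrary.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

def make_source(offset: float) -> np.ndarray:
    """Synthetic two-blob sample standing in for flow features from one source."""
    a = rng.normal(loc=[0 + offset, 0], scale=0.5, size=(3000, 2))
    b = rng.normal(loc=[4 + offset, 4], scale=0.5, size=(3000, 2))
    return np.vstack([a, b])

sources = {
    "edge-1": make_source(0.0),
    "edge-2": make_source(0.1),
    "edge-3": make_source(3.0),   # deliberately different traffic profile
}

centers = {}
for name, sample in sources.items():
    km = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=10).fit(sample)
    centers[name] = np.sort(km.cluster_centers_, axis=0)  # crude canonical order

# If the centers differ a lot between sources, the clustering is not homogeneous
# across groups, and a single global model may be misleading.
for name, c in centers.items():
    print(name, np.round(c, 2))
```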

Patrick:

Right. Can we use DDoS with on-premises hosted servers?

Ashrith Barthur:

Yes. You actually can use it locally because it’s merely a piece of code that you can start running locally. You don’t necessarily need any kind of an infrastructure. Of course you need a system to be able to run it. But the model itself is deployable on any kind of system, on most architectures that we know.

Patrick:

Okay. I’ve not yet downloaded or tested H2O Driverless AI. I have a question before using it. Do we get data for testing the platform? I’m thinking of image, video, audio, or data sources that can be used in the platform.

Ashrith Barthur:

For a lot of the work that we do, especially for these custom packages or custom models, you do not get the same dataset, because of course, these are very specific to the problems that we are solving, and some of them are given to us by customers who understand that this could be showcased as an example. But you do get some sample datasets that are publicly available. One of the reasons why we put them on Driverless AI is so that it’s convenient for you to pick them up and actually start modeling.

Patrick:

Great. Well that’s the end of the questions that we have so far. If you do have any more, we can hang out on the line for another couple of minutes.

Ashrith Barthur:

I’ll just quickly go through what we use for our work. One of the things that we went with in terms of building the features is that we have an open source GitHub repo about how to write the recipe, or the transformation, that we talked about. This, of course, will be there on GitHub. It will be there on the H2O GitHub as well.

For identifying these different kinds of behaviors, there is a 10-year-old paper that we had written back at Purdue. Of course, you can just look at the name of it on the slide and you can look it up online.

One of the beautiful sources for testing out these models that we build was actually a generated dataset that was produced by the Canadian information security group based out of Toronto. That’s a fantastic data source. I can add the link before I publish the slides as well. I’m happy to do that.

Patrick:

Okay. So we have a couple of more questions. After detection of DDoS, how is H2O Driverless AI going to neutralize the attack?

Ashrith Barthur:

Oh, that’s actually a fantastic question. So, H2O Driverless AI does not neutralize the attack. What we do provide you with is the intelligence to detect the attack. All H2O provides is feedback. It will provide you with a yes or no if it’s a binary problem, or a class label if it’s a multiclass problem. Based off that yes or no, you’ll actually have to tie it back into some system that makes a decision. Or you could add it back to your IP tables or your firewall at the endpoint, which helps you act on whether this traffic should be allowed or not.
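As a purely illustrative example of tying the model’s yes/no back into a system, the sketch below turns flagged source IPs into iptables DROP rules. The flagged list is hypothetical, and a real deployment would add de-duplication, rule expiry, and an allow-list before touching a live box.

```python
# Illustrative only: turn source IPs the model flagged as DDoS into iptables
# DROP rules. The flagged list is hypothetical; a real deployment would add
# de-duplication, rule expiry, and an allow-list before touching a live box.
import subprocess

flagged_ips = ["203.0.113.7", "203.0.113.42"]  # example output from the model

for ip in flagged_ips:
    # -I INPUT 1 puts the rule at the top of the chain so it takes effect first.
    subprocess.run(
        ["iptables", "-I", "INPUT", "1", "-s", ip, "-j", "DROP"],
        check=True,
    )
    print(f"dropping traffic from {ip}")
```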

Patrick:

Excellent. The only other question we have is, “Is H2O available for image and audio as predictions?”

Ashrith Barthur:

Is H2O available for image and audio as predictions? I’m not totally sure about the question, but maybe you’re looking for packages that help you identify images and audio. Yes, there are transformations that are image- and audio-specific. You can probably try the 21-day license and let us know if that helps your cause. Thank you.

Patrick:

That does it for all the questions we have. I want to thank Ashrith for taking the time today and giving a great presentation. I’d like to say thank you to everyone who joined us today as well. The presentation slides and recording will be made available after the presentation is over on our BrightTALK channel. Have a great rest of your day.