November 19th, 2013

Machine Learning for Adtech

Characteristics of advertising data:

  • tens of thousands of columns or more (e.g., the top 100k or 1M sites)
  • highly collinear factors: e.g., demographics, where income and education are strongly correlated
  • collinearity: sports fans follow the NFL, ESPN, Bleacher Report, and Fox Sports; users of Ravelry also shop on Etsy. These features are certainly not independent, so Naive Bayes fails.
  • high-cardinality factors: e.g., BlueKai-style registration data
  • thousands to tens of thousands of campaigns
  • billions of rows: the US population of roughly 330MM, times the mean number of devices per person (laptop, phone, work machine), times cookie churn
  • because cookies churn, you have partial row duplication
  • extremely sparse (a sparse, hashed representation is sketched just after this list)
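To make the sparsity and cardinality points concrete, here is a minimal sketch of one common representation: hashing bag-of-features rows into a fixed-width sparse matrix. The column count, feature names, and toy rows are illustrative assumptions, not data from any real campaign.

```python
# Sketch: hash high-cardinality, sparse site/segment features into a
# fixed-width sparse matrix. Feature names and rows are made up.
from sklearn.feature_extraction import FeatureHasher

# Each cookie is a bag of binary features: sites visited, demographic
# segments, registration data, and so on.
rows = [
    {"site=espn.com": 1, "site=nfl.com": 1, "seg=income_100k_plus": 1},
    {"site=ravelry.com": 1, "site=etsy.com": 1, "seg=female_25_34": 1},
]

# 2**20 columns keeps hash collisions rare while bounding memory; the
# matrix stays sparse because each row has only a handful of nonzeros.
hasher = FeatureHasher(n_features=2**20)
X = hasher.transform(rows)   # scipy.sparse matrix, shape (2, 1048576)
print(X.shape, X.nnz)
```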

In addition, adtech machine learning systems have a few other quirks.
A typical campaign has a $100 cost-per-action (CPA) target, and ads cost between $0.10 and $5 per thousand impressions (a 10-cent to $5 CPM). Assuming your business needs a 50% margin, you can show between 10 thousand and 500 thousand ads per conversion. Your business is therefore sensitive to missing out on converters and insensitive to showing an ad to a non-converter: this type of system requires very high recall.
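The arithmetic behind that 10-thousand-to-500-thousand range is worth spelling out; a quick back-of-the-envelope check using only the numbers from the paragraph above:

```python
# Back-of-the-envelope: how many ads can we show per conversion?
cpa_target = 100.00          # advertiser pays $100 per conversion
margin = 0.50                # we keep 50%, leaving $50 of media budget
media_budget = cpa_target * (1 - margin)

for cpm in (0.10, 5.00):     # cost per thousand impressions
    ads_per_conversion = media_budget / cpm * 1000
    print(f"CPM ${cpm:.2f}: {ads_per_conversion:,.0f} ads per conversion")
# CPM $0.10: 500,000 ads per conversion
# CPM $5.00: 10,000 ads per conversion
```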
Modeling hundred-million- or billion-row datasets is generally beyond the capacity of anything but linear systems such as logistic regression. Nonlinear systems such as gradient boosting often produce better business results, are more capable of picking out the tiny fraction of converters, and are more robust to collinearity, but they can't be run on adtech-sized datasets. In a linear-only world, better model performance comes through careful feature engineering. Modeling speed and flexibility thus increase the capacity of your modeling scientists and engineers to experiment, and let you move along the bias-variance tradeoff without using more complex models.
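As one illustration of what that feature engineering can look like: crossing pairs of binary features lets a linear model capture simple interactions it otherwise cannot. The helper below is hypothetical, not from the post.

```python
# Sketch: hand-built feature crosses give a linear model (e.g. logistic
# regression) access to pairwise interactions.
from itertools import combinations

def add_crosses(row):
    """Return the original binary features plus all pairwise crosses."""
    crossed = dict(row)
    for a, b in combinations(sorted(row), 2):
        crossed[f"{a}&{b}"] = 1
    return crossed

print(add_crosses({"site=espn.com": 1, "seg=male_18_34": 1, "hour=20": 1}))
# adds e.g. "hour=20&site=espn.com", a conjunction a purely linear model
# cannot represent from the individual features alone
```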
Further, even linear systems will often have to be trained with mini-batch or stochastic gradient descent (SGD), either because better minimization algorithms are too expensive or because it's too difficult to scale learning across multiple machines. This is undesirable because it introduces additional tuning parameters that your results will often be sensitive to. And even with SGD, you will often be required to sample, and creating good samples is a difficult problem in its own right. A typical approach subsamples the non-converters and keeps all the converters, both to create workable dataset sizes and to help alleviate class imbalance. This alone is hard to get right. One trap: if your positive and negative time periods don't strictly coincide, a popular but transient event, such as the death of a movie star, can show up only among the converters or only among the non-converters and become a strong but meaningless signal. Another common trap: without careful sampling, the non-converters will end up with much shorter histories than the converters; in that case, your model may well give a small positive weight to every feature, and what you've really built is a system that detects users with lots of features present.
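A sketch of one way to build such a sample while keeping both traps in mind, assuming a pandas DataFrame of per-cookie rows; the column names (ts, converted), sampling rate, and function name are illustrative assumptions.

```python
# Sketch: keep all converters, downsample non-converters, and draw both
# classes from the same time window.
import pandas as pd

def build_training_set(events: pd.DataFrame, window_start, window_end,
                       neg_rate: float = 0.01, seed: int = 0) -> pd.DataFrame:
    # Same window for both classes, so a transient event (a movie star's
    # death, a breaking story) cannot appear on only one side of the
    # label and become a strong but meaningless signal.
    window = events[(events["ts"] >= window_start) & (events["ts"] < window_end)]

    converters = window[window["converted"] == 1]
    non_converters = window[window["converted"] == 0]

    # Keep every converter; sample non-converters to shrink the dataset
    # and ease class imbalance. (In practice you would also check that
    # negative histories are not systematically shorter than positive
    # ones; this sketch samples rows for brevity.)
    sampled_negs = non_converters.sample(frac=neg_rate, random_state=seed)
    return pd.concat([converters, sampled_negs]).sample(frac=1.0, random_state=seed)
```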
More next week, where I'll discuss ways to decrease model creation time, increase modeling-scientist throughput, and begin to look at running more complex models.
