November 19th, 2013

Machine Learning for Adtech

Characteristics of advertising data:

  • tens of thousands of columns or more (the top 100K or 1M sites)
  • high-collinearity factors: e.g., demographics, with strong correlations between, say, income and education
  • collinearity: sports fans follow NFL + ESPN + Bleacher Report + Fox Sports; users of Ravelry also shop on Etsy. Those features are certainly not independent, so naive Bayes fails.
  • high-cardinality factors: e.g., BlueKai-style registration data (see the feature-hashing sketch after this list)
  • thousands to tens of thousands of campaigns
  • billions of rows: the US population of ~330MM, times mean devices per person (laptop + phone + work), times cookie churn
  • because cookies churn, you have partial row duplication
  • extremely sparse
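These characteristics, especially the cardinality and sparsity, shape how rows get represented. Below is a minimal sketch using scikit-learn's FeatureHasher (the hashing trick) to turn each cookie's observed sites and segments into a sparse vector; the site and segment names are made up for illustration.

```python
# A minimal sketch of representing adtech rows as hashed sparse vectors.
# FeatureHasher keeps memory bounded even when millions of distinct
# sites/segments are possible; most of the 2**20 columns stay zero.
from sklearn.feature_extraction import FeatureHasher

hasher = FeatureHasher(n_features=2**20, input_type="string")

# Each row is the set of features observed for one cookie: sites visited
# plus registration-style segments (all names here are hypothetical).
rows = [
    ["site=espn.com", "site=nfl.com", "site=bleacherreport.com", "seg=male", "seg=age_25_34"],
    ["site=ravelry.com", "site=etsy.com", "seg=female", "seg=income_50_75k"],
]

X = hasher.transform(rows)   # scipy.sparse CSR matrix, shape (2, 1048576)
print(X.shape, X.nnz)        # nnz counts the handful of non-zero entries
```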

In addition, adtech machine learning systems have a few other quirks.
A typical campaign has a $100 cost-per-action (CPA) target, and ads cost between $0.10 and $5 per thousand impressions (a 10-cent to $5 CPM). Assuming your business needs a 50% margin, you can show between 10 thousand and 500 thousand ads per conversion. Your business is therefore sensitive to missing out on converters and insensitive to showing an ad to a non-converter. This type of system requires very high recall.
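For concreteness, here is the back-of-the-envelope arithmetic behind those figures, a sketch using the $100 CPA, 50% margin, and CPM range quoted above:

```python
# Impression budget per conversion implied by the numbers above.
cpa_target = 100.0      # $ per conversion the advertiser will pay
margin = 0.50           # the business keeps 50%, so half the CPA buys media
media_budget = cpa_target * (1 - margin)   # $50 of media per conversion

for cpm in (0.10, 5.00):                   # $ per thousand impressions
    impressions = media_budget / cpm * 1000
    print(f"${cpm:.2f} CPM -> {impressions:,.0f} ads per conversion")
# $0.10 CPM -> 500,000 ads per conversion
# $5.00 CPM -> 10,000 ads per conversion
```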
Modeling hundred-million or billion-row datasets is generally beyond the capacity of anything but linear systems such as logistic regression. Nonlinear systems such as gradient boosting often produce better business results, are more capable of picking out the tiny fraction of converters, and are more robust to collinearity, but they can't be run on adtech-sized datasets. In a linear-only world, better model performance comes through careful feature engineering. Modeling speed and flexibility thus increase the capacity of your modeling scientists and engineers to experiment, and let you move along the bias-variance tradeoff without resorting to more complex models.
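As a concrete instance of that linear-only world, here is a minimal sketch of sparse logistic regression trained with SGD via scikit-learn's SGDClassifier (assuming scikit-learn >= 1.1, where the logistic loss is named "log_loss"); the tiny matrix below stands in for a hashed, billion-row design matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import SGDClassifier

# Tiny stand-in for a hashed adtech design matrix: 4 cookies, 6 features.
X = csr_matrix(np.array([
    [1, 1, 0, 0, 1, 0],   # converter
    [0, 0, 1, 1, 0, 0],   # non-converter
    [1, 0, 0, 0, 1, 1],   # converter
    [0, 1, 1, 0, 0, 0],   # non-converter
]))
y = np.array([1, 0, 1, 0])

clf = SGDClassifier(
    loss="log_loss",          # logistic-regression objective
    alpha=1e-6,               # L2 strength -- one of the knobs SGD adds
    class_weight="balanced",  # crude counterweight to rare converters
)
clf.fit(X, y)
p_convert = clf.predict_proba(X)[:, 1]   # scores used to rank bid requests
print(p_convert)
```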
Further, even linear systems will often have to be trained with mini-batch or stochastic gradient descent (SGD), either because better minimization algorithms are too expensive or because it is too difficult to scale learning across multiple machines. This is undesirable because it introduces additional tuning parameters that your results will often be sensitive to. And even with SGD, you will often be required to sample, and creating good samples is a difficult problem in its own right. A typical approach subsamples non-converters and takes all the converters, both to create workable dataset sizes and to help alleviate class imbalance. This alone is hard to get right. One trap: if your positive and negative time periods don't strictly coincide, a popular but transient event, such as the death of a movie star, can show up only among the positives or only among the negatives and become a strong but meaningless signal. Another common trap: without careful sampling, the non-converters will have much shorter histories than the converters; in that case your model may well give a small positive weight to every feature, and what you've really built is a system that detects users with lots of features present.
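Below is a hedged sketch of that sampling discipline: keep every converter, subsample non-converters at a fixed rate, and draw both classes from the same date window so that transient events and history-length skew can't leak in as signal. The field names and window are hypothetical.

```python
import random

WINDOW = ("2013-10-01", "2013-11-01")    # identical window for both classes
NEG_RATE = 0.01                          # keep 1% of non-converters

def in_window(row):
    # ISO date strings compare correctly as plain strings.
    return WINDOW[0] <= row["event_date"] < WINDOW[1]

def sample(rows):
    for row in rows:
        if not in_window(row):           # same time filter for both classes
            continue
        if row["converted"]:
            yield row                    # keep every converter
        elif random.random() < NEG_RATE:
            yield row                    # subsample non-converters
```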
More next week, when I'll discuss ways to decrease model creation time, increase modeling scientist throughput, and begin to look at running more complex models.
