The popularity of YouTube, the world’s largest video platform, continues to soar, and with that comes an increased appetite for online video content. This comes at a time when US online video viewers are expected to hit 236 million by 2020, up from 213.2 million in 2016. For advertisers, the situation is pretty straightforward: spend more advertising dollars where there are more eyeballs. However, achieving precise content targeting at scale remains a big hurdle.
Zefr harnesses the power of machine learning – with a touch of human curation – to wade through YouTube content and classify it at the individual video level. Why is this important? By doing so, we’re able to provide brands and advertisers with a content targeting solution that is relevant (at the campaign level), brand safe, and highly scalable. Only machine learning can do this – effectively – at the truly massive scale of YouTube. (We’re talking billions of videos!)
Let’s start with the basics: what is machine learning? Machine learning is a computer’s ability to learn and improve from experience without being explicitly programmed to do so. It is a branch of artificial intelligence, which was designed, in its early days, to improve the overall speed and efficiency of data-heavy processes. Today, this allows us to tackle otherwise costly and time-consuming tasks at a manageable scale.
As you would expect, machine learning isn’t a one-size-fits-all concept. There are actually three primary types of machine learning: 1) unsupervised, 2) supervised, and 3) semi-supervised. All three types require data input – either “labeled” (manually done by humans) or “unlabeled.”
- Unsupervised Machine Learning
How an “intelligent” system learns through unlabeled data inputs. Because this isn’t human-curated learning, there is no easy way to determine learning accuracy through the output (as it’s hard to identify the relationship between input and output). If the output of unsupervised machine learning eventually delivers against a pre-stated hypothesis, you know that “learning” is taking place and delivering the desired end result(s).
- Supervised Machine Learning
How an “intelligent” system learns through a combination of labeled input and output examples. Because the inputs and outputs have been guided by humans upfront, this is the easiest way to ensure learning objectives and, once achieved, begin applying that learning to new functions.
- Semi-supervised Machine Learning
As you may have guessed, this is a hybrid of the two learning models above. This process is guided primarily by unlabeled data, but influenced by a small amount of labeled data to guide and expedite the learning process to achieve the desired outcomes. Combining the two processes has been found to be not only more effective overall, but more cost-effective as well, because it requires less time and attention from a human to curate that learning process.
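To make the semi-supervised idea concrete, here is a minimal sketch of self-training (also called pseudo-labeling), one common semi-supervised approach. It is an illustration only, not a description of Zefr's models: the single feature (video length in minutes), the labels, the `min_margin` threshold, and every function name are assumptions chosen to keep the example small.

```python
from statistics import mean

def centroids(labeled):
    """Average feature value per label, from (value, label) pairs."""
    by_label = {}
    for value, label in labeled:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(cents, value):
    """Return (label, margin): the nearest centroid's label, and how much
    closer it is than the runner-up (a crude confidence measure)."""
    ranked = sorted(cents, key=lambda label: abs(value - cents[label]))
    best, second = ranked[0], ranked[1]
    margin = abs(value - cents[second]) - abs(value - cents[best])
    return best, margin

def self_train(labeled, unlabeled, min_margin=2.0, rounds=5):
    """Grow the labeled set with the model's own confident predictions."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        cents = centroids(labeled)
        confident, uncertain = [], []
        for value in pool:
            label, margin = predict(cents, value)
            if margin >= min_margin:
                confident.append((value, label))  # trust the machine's label
            else:
                uncertain.append(value)           # keep for a later round
        if not confident:
            break
        labeled.extend(confident)
        pool = uncertain
    return centroids(labeled), pool

# Hypothetical data: a few human-labeled videos, plus unlabeled ones.
human_labeled = [(1.0, "short"), (2.0, "short"), (20.0, "long"), (22.0, "long")]
unlabeled = [1.5, 3.0, 18.0, 25.0, 11.0]

model, leftover = self_train(human_labeled, unlabeled)
print(model)     # per-label centroids after self-training
print(leftover)  # videos the model was never confident about
```

In this toy run, four human labels are enough to label four more videos automatically, while the ambiguous 11-minute video stays unlabeled. That leftover pool is the natural place to spend additional human labeling effort.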
How it Works
The team at Zefr primarily relies on supervised and semi-supervised machine learning models to make the magic happen. According to Zefr Principal Data Scientist, Ryan Deak:
“We have millions of videos to categorize in some way. If we had an infinite amount of time, we (humans) could go through them one-by-one and say, for example, ‘this video has X property, so we want to group it with other videos that also have X property.’”
Unfortunately, we don’t have an infinite amount of time, nor would anyone in their right mind want to dedicate that much budget to tackling a process like this manually. Doing so is simply not scalable in any way. That’s precisely where supervised machine learning comes into play.
“We can go through the ‘cherry-picked’ process for a select number of cases to label videos based on the properties we find. At that point, however, we pass those labeled videos over to the machine learning algorithm, and it analyzes that input to predict which unlabeled videos should be associated with a particular category. The ultimate goal of machine learning, therefore, is to be able to generalize what it’s learned, based on the labeled data provided, and accurately apply that learning to data it has never analyzed before. This is how Zefr scales a massive amount of online video content to deliver a precise content targeting solution to brands and advertisers.”
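The workflow Deak describes (label a small set by hand, train on it, then predict categories for videos the model has never seen) can be sketched with a tiny naive Bayes text classifier. This is a generic textbook technique shown under stated assumptions, not Zefr's pipeline or the Aloha framework: the video titles, the two categories, and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; real systems do far more."""
    return text.lower().split()

def train(labeled_titles):
    """Count words per category from (title, category) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for title, label in labeled_titles:
        label_counts[label] += 1
        word_counts[label].update(tokenize(title))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def classify(model, title):
    """Pick the category with the highest log-probability for the title,
    using add-one (Laplace) smoothing for unseen words."""
    word_counts, label_counts, vocab = model
    total_examples = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_examples)
        for word in tokenize(title):
            smoothed = (word_counts[label][word] + 1) / (n_words + len(vocab))
            score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical human-labeled examples (the "cherry-picked" set).
labeled = [
    ("easy pasta dinner recipe", "cooking"),
    ("baking chocolate cake recipe", "cooking"),
    ("top ten soccer goals", "sports"),
    ("basketball highlights best dunks", "sports"),
]
model = train(labeled)

# Titles the model has never seen before.
print(classify(model, "quick cake recipe for beginners"))
print(classify(model, "best soccer highlights"))
```

The point of the sketch is generalization: neither unseen title appears in the labeled set, yet words they share with labeled examples are enough to place them in a category.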
One of our key differentiators is the set of proprietary tools we use to streamline this machine learning process. We use a framework called Aloha, which makes it easy to extract data (from content) and integrate it into our learning and prediction pipelines. This is huge! All too often, as much as 75 percent of a data engineer’s time is spent on manual tasks – like extracting and cleaning data – instead of focusing on more complex, problem-solving-oriented tasks.
“In the past 5 or 10 years, there has been an explosion of higher quality open source machine learning libraries, like Spark, Vowpal Wabbit, H2O.ai, and MXNet to support our deep learning needs. The fact that we operate on a single framework (Aloha) to integrate all these libraries means both our data scientists and data engineers can do their jobs much more efficiently without being bogged down by the typical troubleshooting that comes with model deployment.”
*More information about these libraries is outlined in “Hidden Technical Debt in Machine Learning Systems” and in our own SysML 2018 conference paper, focused specifically on Aloha.
The massive volume of videos on YouTube means that machine learning is table stakes when it comes to developing and deploying a precise content targeting solution. It is the only efficient way to navigate – and make sense of – the totality of the YouTube online video ecosystem. As Deak points out:
“It’s not like TV where you have maybe 1,000 TV shows at any given time in a year vs. a billion videos on YouTube. It’s both a problem and an opportunity for us as well as for advertisers. The only hitch: it’s an opportunity that’s about a million times the size of TV.”
The truth is, without some human review mixed into this process, it’s difficult to accurately assess the relevant alignment of a particular piece of video content to a particular brand message. That’s all about brand safety and, in spite of how advanced machine learning has become, we’re not ready to let artificially intelligent algorithms take complete control of the driver’s seat here. There’s a certain ethos involved with human review and curation that rounds out this process – and all with the goal of driving the most value to brands and advertisers at the end of the day. Eliminating human review from the machine learning process raises the chances of “unsafe” content slipping through the cracks or, more generally speaking, content misalignment with brands. At this point in time, that’s not a gamble we’re willing to take.
Our machine learning process starts and ends with human review. We feed our algorithms useful data and then assess and evaluate the output from those algorithms. And it’s thanks in large part to this human-machine collaboration that we’ve been able to create highly accurate models that can blacklist unsafe content, whitelist safe content, and deliver more precise content targeting.
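One simple way to picture this human-machine collaboration is confidence-based routing: the model decides the clear cases and hands the uncertain ones to a person. The sketch below is an assumption about how such a loop could look, not Zefr's actual system; the score scale, the two thresholds, the video IDs, and the `route` function are all hypothetical.

```python
def route(predictions, allow_below=0.1, block_above=0.9):
    """Sort videos into allow, block, or human review.

    predictions: (video_id, unsafe_score) pairs, where unsafe_score is a
    model's estimated probability (0.0 to 1.0) that the video is unsafe.
    """
    decisions = {"allow": [], "block": [], "human_review": []}
    for video_id, unsafe_score in predictions:
        if unsafe_score >= block_above:
            decisions["block"].append(video_id)         # confidently unsafe
        elif unsafe_score <= allow_below:
            decisions["allow"].append(video_id)         # confidently safe
        else:
            decisions["human_review"].append(video_id)  # too close to call
    return decisions

# Hypothetical model output for four videos.
scored = [("vid_a", 0.02), ("vid_b", 0.97), ("vid_c", 0.55), ("vid_d", 0.12)]
decisions = route(scored)
print(decisions)
```

Tightening the thresholds sends more videos to reviewers and fewer to automatic decisions, which is the trade-off between review cost and the risk of unsafe content slipping through. Labels produced by reviewers can then be fed back in as new training data.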
The Zefr Touch
The team at Zefr is evolving alongside machine learning. We’re creating new ways of making machine learning more accurate, effective, and efficient vis-à-vis YouTube while simultaneously taking those learnings to make our (human) team more effective as well. The foundation of our system is based on modeling and scoring. This means that our algorithms are built specifically to provide high quality machine learning models that improve as they learn over time. Adding the human layer to this dynamic is our “secret sauce.” We know the ins and outs of the YouTube ecosystem better than anyone else (except maybe for YouTube). For this reason, Zefr is uniquely positioned to provide brands and advertisers with a precisely targeted opportunity to align their promotional messages with contextually relevant online video content – all at scale!