This is the first in a series of posts that will provide greater transparency about how we make our ads safer by detecting and removing scam ads. -Ed.
A few weeks ago, we posted here about our efforts in fighting bad ads, and we shared a video with the basics of how we do it. Today I want to delve a little deeper and give some insight into the systems we use to help prevent bad ads from showing. Our ads policies are designed with safety and trust in mind: we don’t allow ads for malicious downloads or counterfeit goods, or ads with unclear billing practices, to name a few examples. To help prevent these kinds of ads from showing, we use a combination of automated systems and human input to review the billions of ads submitted to Google each year. I’m one of many engineers whose job is to help make sure that Google doesn’t show bad ads to users.
Our approach is a three-pronged strategy, with each prong focused on a different dimension of the problem: ads, sites, and advertiser accounts. These systems are complementary, sharing signals with one another so that we can comprehensively attack bad ads.
For example, in the case of a site that is selling counterfeit goods, this three-pronged approach aims to look for patterns that would flag such a site and help prevent ads from showing. Ad review notices patterns in the ads and keywords selected by the advertiser. Site review analyzes the entire site to determine if it is selling counterfeit goods. Account review aims to determine if a new advertiser is truly new, or is simply a repeat offender trying to abuse Google’s advertising system. Here’s more detail on how we review each of these three components.
An ad is the snippet of information presented to a user, along with a link to a specific webpage, or landing page. The ads review system inspects individual ads and landing pages, and is probably the system most familiar to advertisers. When an advertiser submits an ad, our system immediately performs a preliminary examination. If there’s nothing in the ad that flags a need for further review, we tell the advertiser the ad is “Eligible” and show the ad only on google.com to users who have SafeSearch turned off. If the ad is flagged for further review, in most cases we refer to the ad as “Under Review” and don’t show the ad at all. From there, the ad enters our automated pipeline, where we employ machine learning models, a rules engine and landing page analysis to perform a more extensive examination. If our automated system determines an outcome with a high degree of confidence, we will either approve the ad to run on Google and all of our partners (“Approved”), approve the ad to show for appropriate users in specific locations (“Approved - Limited”) or reject the ad (“Disapproved”). If our automated system isn’t able to determine the outcome, we send the ad to a real person to make a final decision.
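To make the status flow above concrete, here is a minimal sketch in Python. It is an illustration only, not the actual system: the function names, the verdict strings, the `NEEDS_HUMAN` label, and the 0.95 confidence cutoff are all assumptions, since the post says only that the automated system acts when it has "a high degree of confidence."

```python
from enum import Enum


class AdStatus(Enum):
    ELIGIBLE = "Eligible"
    UNDER_REVIEW = "Under Review"
    APPROVED = "Approved"
    APPROVED_LIMITED = "Approved - Limited"
    DISAPPROVED = "Disapproved"
    NEEDS_HUMAN = "Needs human decision"  # illustrative label, not a status from the post


# Hypothetical cutoff; the post only says "a high degree of confidence".
CONFIDENCE_THRESHOLD = 0.95

# Hypothetical verdict names mapped to the statuses named in the post.
_VERDICT_TO_STATUS = {
    "approve": AdStatus.APPROVED,
    "limit": AdStatus.APPROVED_LIMITED,
    "reject": AdStatus.DISAPPROVED,
}


def preliminary_status(flagged: bool) -> AdStatus:
    """Status right after submission, before the automated pipeline runs."""
    return AdStatus.UNDER_REVIEW if flagged else AdStatus.ELIGIBLE


def pipeline_status(verdict: str, confidence: float) -> AdStatus:
    """Status after the automated pipeline; low confidence defers to a person."""
    if confidence < CONFIDENCE_THRESHOLD:
        return AdStatus.NEEDS_HUMAN
    return _VERDICT_TO_STATUS[verdict]
```

In this sketch, `pipeline_status("reject", 0.5)` defers to a human reviewer rather than disapproving the ad, which mirrors the fallback described in the paragraph above.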
A site, often known as a domain, has many different pages, each of which could be pointed to by different ads. Our site review system identifies policy issues that apply to the whole site. It aggregates sites across all ads from all advertisers and regularly crawls them, building a repository of information that’s constantly improving as new scams and new sites are examined. We store the content of advertised sites and use both machine learning models and a rules engine to analyze the sites. The magic of the site review system is that it understands the structure of language on webpages in order to classify the content of sites. Site review will determine whether or not an entire site should be disabled, which would prevent any ads leading to that site from showing, no matter which account they come from. When the automated system isn’t able to determine the outcome with a high degree of confidence, we send it to a real person to make a decision. When a site is disabled, we tell the advertiser that it’s in violation of “Site Policy.”
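The effect of disabling a site can be sketched as a filter over ads from every account. This is a hedged illustration, not the real implementation: the `Ad` class, `site_of`, and `servable_ads` are hypothetical names, and treating the URL hostname as the "site" is a simplifying assumption.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Ad:
    account_id: str
    landing_page: str  # full URL of the landing page


def site_of(url: str) -> str:
    """Treat the lowercased hostname as the 'site' (a simplifying assumption)."""
    return (urlparse(url).hostname or "").lower()


def servable_ads(ads: list[Ad], disabled_sites: set[str]) -> list[Ad]:
    """Drop every ad whose landing page is on a disabled site, whatever the account."""
    return [ad for ad in ads if site_of(ad.landing_page) not in disabled_sites]
```

The point of the design is that the check is keyed on the site rather than on the account, so a single site-level decision blocks matching ads from all advertisers at once.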
An account is one particular advertiser’s collection of ads, plus the advertiser’s selections for targeting and bidding on those ads. An account may have many ads which may point to several different sites, for example. The account review system constantly evaluates individual advertiser accounts to determine if the whole account should be inspected and shut down for policy violations. This system “listens” to a variety of signals, such as ads and keywords submitted by the advertiser, budget changes, the advertiser’s address and phone number, the advertiser’s IP address, disabled sites connected to this account, and disapproved ads. The system constantly re-evaluates all accounts, incorporating new data. For example, if an advertiser logs in from a new IP address, the account is re-evaluated to determine if that new signal suggests we should take a closer look at the content of the advertiser’s account. If the account review system determines that there is something suspect about a particular account with a high degree of confidence, it automatically suspends the account. If the system isn’t sure, it stops the account from showing any ads at all and asks a real person to decide if the account should be suspended.
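The three outcomes of account review described above can be sketched as a re-evaluation that runs whenever a new signal arrives. Everything numeric here is an assumption made for illustration: the post lists the kinds of signals but gives no weights or thresholds, and a simple additive score stands in for whatever models the real system uses.

```python
from enum import Enum


class AccountAction(Enum):
    KEEP_RUNNING = "keep running"
    PAUSE_FOR_HUMAN = "pause ads and ask a person"
    SUSPEND = "suspend automatically"


# Hypothetical points per signal; the post names signal types but no weights.
SIGNAL_POINTS = {
    "new_ip_address": 20,
    "budget_change": 10,
    "disapproved_ad": 30,
    "disabled_site": 50,
}

SUSPEND_AT = 90  # assumed "high degree of confidence" cutoff
REVIEW_AT = 50   # assumed cutoff for "the system isn't sure"


def risk_score(signals: list[str]) -> int:
    """Sum the points of observed signals, capped at 100; unknown signals add 0."""
    return min(100, sum(SIGNAL_POINTS.get(s, 0) for s in signals))


def evaluate(signals: list[str]) -> AccountAction:
    """Re-run on every new signal, e.g. a login from a new IP address."""
    score = risk_score(signals)
    if score >= SUSPEND_AT:
        return AccountAction.SUSPEND
    if score >= REVIEW_AT:
        return AccountAction.PAUSE_FOR_HUMAN
    return AccountAction.KEEP_RUNNING
```

Under these assumed numbers, a lone login from a new IP address leaves the account running, while the same login on an account that already has a disabled site and a disapproved ad tips it into automatic suspension.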
Even with all these systems and people working to stop bad ads, there can still be times when an ad that we don’t want slips through. There are many malicious players who are very persistent; they seek to abuse Google’s advertising system in order to take advantage of our users. When we shut down a thousand accounts, they create two thousand more using different patterns. It’s a never-ending game of cat and mouse.
We’ve put a great deal of effort and expense into building these systems because Google’s long-term success is based on the trust of people who use our products. I’ve focused my time and energy in this area for many years. I find it inspiring to fight the good fight, to focus on the user, and to do everything we can to help prevent bad ads from running. I’ll continue to post here from time to time with additional thoughts and more information about how we make ads safer by detecting and removing scam ads.
Posted by David W. Baker, Director of Engineering, Advertising