Inside Google’s Ad Display Network Black Box: Porn, Piracy, Fraud — ProPublica

2022-12-23 20:32:36 By : Ms. Emma Cheng

In late 2021, the right-wing site Conservative Beaver published a story falsely claiming the FBI had arrested Pfizer’s CEO for fraud.

It wasn’t Conservative Beaver’s first brush with fabricated news. The site had falsely claimed Barack Obama was arrested for espionage, Pope Francis was arrested for possession of child pornography and “human trafficking,” and the Pfizer CEO’s wife died after being compelled to take a COVID-19 vaccine. As Conservative Beaver pumped out these and other lies, Google placed ads on the site and split the revenue with its then-anonymous owner.

Its owner was eventually identified as a Canadian man, Mark Slapinski, after Pfizer threatened to sue him for defamation, and Google removed ads from the site in November of last year due to public pressure. Soon, Conservative Beaver went offline.

But today, roughly a year later, Slapinski is still making money from Google ads.

He runs the conservative political site Toronto 99 and uses the same Google publisher account he had for Conservative Beaver to collect ad revenue. Google simply allowed Slapinski to start a new site and keep earning money. It’s the equivalent of taking away an unsafe driver’s car instead of their license.

In the nearly half-trillion-dollar digital ad industry, Google sets the rules of the road. More than any other company, Google determines the online ads we see, what they cost and who gets paid for them. It runs the biggest search ad business and provides the industry’s leading tools for buying, selling and displaying ads.

And if you have a website and want to earn money from digital ads, you can join the Display Network, where Google places ads on what it has publicly said are more than 2 million websites and an untold number of mobile apps. It’s the modern equivalent of a national network of billboards on nearly every highway being controlled by a single company — and reportedly generated $31 billion in revenue for Google last year.

But if you’re Slapinski, Google’s Display Network has another benefit besides its market share: its secrecy. Google is the only major ad platform that hides the vast majority of its ad-selling partners. This means Google does not disclose all the websites and apps where it places ads or the people and companies behind them. The company conceals this information even after helping establish and publicly supporting an industry transparency standard for disclosing such sellers, which its competitors have largely adopted.

In response to questions, Slapinski denied running Conservative Beaver. “That’s fake news!” he wrote in a Facebook message, despite the large body of evidence he was behind the site. He acknowledged operating Toronto 99, but declined to explain why that site uses the same Google publisher account as Conservative Beaver. He did not respond to questions about Google ads and said he does not publish disinformation.

“I don't publish fake news,” he said. “I follow strict editorial standards.”

Google’s embrace of publisher confidentiality means roughly 1 million publishers can remain anonymous to companies and individuals who buy ads on its network to reach customers. This opens the door to a range of abuses and schemes that steal potentially billions of dollars a year and put lives and livelihoods at risk due to dangerous disinformation, fraud and scams.

Google’s ad business helps fund dangerous disinformation that puts public health and democracy at risk around the world, earns money from millions of gun ads while publicly claiming to block them, and allowed a sanctioned Russian ad tech company to harvest data on potentially millions of people, including possibly those in Ukraine, putting their security and privacy at risk.

It all makes the Display Network one of the world’s most lucrative black boxes. Ads are placed where they shouldn’t be. Money flows to someone other than the intended website or app owner. Publishers of banned sites can easily keep collecting ads and revenue from unsuspecting brands. But because of Google’s allegedly monopolistic dominance of the digital ad industry, companies ranging from mom and pop shops to the biggest brands in the world keep shoveling money into it, hoping for the best.

Google spokesperson Michael Aciman said the company uses a combination of human oversight, automation and self-serve tools to protect ad buyers and said publisher confidentiality is not associated with abuse or low quality.

“We want to see more publishers embrace greater transparency, and we conduct regular outreach to our partners to explain the benefits of opting out of confidentiality,” he said. “We do see a lag in consent among small-scale publishers, which may be because they are unaware of this option, or because their account includes personal information and they have legitimate privacy concerns.”

Aciman said the vast majority of ad revenue from Google’s systems goes to publishers who do not keep their information confidential.

ProPublica spent months trying to crack open Google’s black box ad business. We wrote thousands of lines of code to scan more than 7 million website domains looking for Google ad activity, sourced and analyzed data on millions more domains from half a dozen data partners, and spoke to some of the most knowledgeable experts about Google’s display ad business.

In the end, we matched 70% of the accounts in Google’s ad sellers list to one or more domains or apps, more than any dataset ProPublica is aware of. But we couldn’t find all of Google’s publisher partners. What we did find was a system so large, secretive and bafflingly complex that it proved impossible to uncover everyone Google works with and where it’s sending advertisers’ money.

Alongside reputable publishers and popular games and online tools, we uncovered scores of previously unreported peddlers of pirated content, porn and fake audiences that take advantage of Google’s lax oversight to rake in revenue.

In one example, a Bulgarian company helped scores of piracy sites with close to 1 billion monthly visitors earn money from Google ads. Most alarming, Google knew from its own data that these sites were engaging in mass copyright theft, yet it allowed the sites to receive ads and money from major brands such as Nike and HSBC Bank right up until we contacted Google.

As for what else lurks in the black box, only Google knows.

Each time someone visits Toronto 99, the site sends digital requests to Google asking it to place ads on the page. Each of those requests contains this series of numbers and letters: pub-5958167306013620.

It’s a unique ID that identifies Slapinski’s Google publisher account, much like how your Social Security number identifies you to the government. Google issued Slapinski the account ID when it accepted him as a publisher in the Google Display Network, greenlighting sites he launched to receive ads. The same ID was used by Conservative Beaver.
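To illustrate how a shared account ID can tie two sites to one publisher, here is a minimal Python sketch. It assumes the ID appears in a page's source as "pub-" followed by 16 digits, which matches the ID quoted above. The two HTML snippets and the ads.example URL are hypothetical stand-ins, not real pages, and this is not the code ProPublica used.

```python
import re

# Assumption: publisher IDs look like "pub-" plus 16 digits, as in the ID quoted above.
PUB_ID_PATTERN = re.compile(r"pub-\d{16}")


def extract_publisher_ids(page_source: str) -> set[str]:
    """Return the distinct publisher IDs found in a page's HTML or request URLs."""
    return set(PUB_ID_PATTERN.findall(page_source))


# Hypothetical ad tags standing in for the source of two different sites.
site_a = '<script src="https://ads.example/js?client=ca-pub-5958167306013620"></script>'
site_b = '<ins class="ad" data-ad-client="ca-pub-5958167306013620"></ins>'

# IDs that appear on both sites suggest a common account holder.
shared_ids = extract_publisher_ids(site_a) & extract_publisher_ids(site_b)
print(shared_ids)
```

A shared ID is only a lead: it shows the same account is referenced on both pages, not who controls it.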

Google has issued millions of account IDs in the more than 200 countries where its Display Network is active. Anyone operating a website or app in those countries can apply to join.

Once a publisher has an ID, they can add it to new sites and apps that they operate, as Slapinski apparently did. Google also allows publishers to register for more than one ID. The result is an ad network with millions of constantly shifting publishers, sites, apps and IDs.

To help ad buyers navigate this murky ecosystem, ad networks are supposed to disclose a list of the publisher accounts they work with. For Google, this list — which is called a sellers.json file or sellers list — should contain all the websites and apps Google has authorized to earn money in its Display Network, from big publishers like The New York Times to small bloggers. When done correctly, the list should allow advertisers to match Slapinski and the ID pub-5958167306013620 to Toronto 99 and block the site if they wish.
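A sellers list is a JSON document, so the lookup an advertiser would want to perform can be sketched in a few lines of Python. The field names below (sellers, seller_id, name, domain, is_confidential) follow the published sellers.json standard. The two sample entries are invented for illustration and do not come from Google's file.

```python
import json

# Hypothetical two-entry sellers list; real files contain many more records.
SAMPLE_SELLERS_JSON = """
{
  "version": "1.0",
  "sellers": [
    {"seller_id": "pub-0000000000000001", "seller_type": "PUBLISHER",
     "name": "Example News Co", "domain": "example-news.test"},
    {"seller_id": "pub-0000000000000002", "seller_type": "PUBLISHER",
     "is_confidential": 1}
  ]
}
"""


def load_sellers(text: str) -> dict:
    """Index the sellers array by seller_id for quick lookups."""
    document = json.loads(text)
    return {entry["seller_id"]: entry for entry in document.get("sellers", [])}


def describe(index: dict, seller_id: str) -> str:
    """Summarize what an ad buyer can learn about one account ID."""
    entry = index.get(seller_id)
    if entry is None:
        return "not listed"
    if entry.get("is_confidential") == 1:
        return "confidential"
    return f'{entry.get("name", "unknown")} ({entry.get("domain", "no domain listed")})'


index = load_sellers(SAMPLE_SELLERS_JSON)
for seller_id in ("pub-0000000000000001", "pub-0000000000000002"):
    print(seller_id, "->", describe(index, seller_id))
```

For a confidential entry, the lookup returns nothing an advertiser could use to decide whether to block the account.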

Google itself helped create this concept three years ago and publicly champions it and related standards, saying they “provide advertisers with a greater visibility into the overall supply chain, which can help them inform future buying decisions.”

But among the roughly 1.3 million IDs in Google’s sellers list, over 75% are marked “confidential” and contain only the ID, including Slapinski’s. It’s the default setting in Google’s system. ProPublica’s Google ID was also marked confidential but is being changed to disclose the organization name and affiliated domains.

As of this fall, only 23% of Google’s records listed a person or company name, and just 11% also included the domain of their organization. Google’s competitors almost always publicly list all account IDs alongside such information as the name of a person or company connected to it and the associated domain or domains.
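Shares like these can be computed directly from a sellers list. The Python sketch below groups entries into three buckets (confidential, name only, name and domain) and reports each bucket's share. The bucket names and the eight sample entries are assumptions made for illustration, and the sample proportions are not Google's actual figures.

```python
from collections import Counter


def classify(entry: dict) -> str:
    """Place one sellers.json entry into a disclosure bucket (labels are illustrative)."""
    if entry.get("is_confidential") == 1:
        return "confidential"
    if entry.get("name") and entry.get("domain"):
        return "name_and_domain"
    if entry.get("name"):
        return "name_only"
    return "other"


def disclosure_shares(sellers: list) -> dict:
    """Return each bucket's share of all entries as a fraction between 0 and 1."""
    counts = Counter(classify(entry) for entry in sellers)
    return {bucket: count / len(sellers) for bucket, count in counts.items()}


# Hypothetical sample: six confidential entries, one name-only, one fully public.
sample_sellers = [{"seller_id": f"pub-{n:016d}", "is_confidential": 1} for n in range(6)]
sample_sellers.append({"seller_id": "pub-0000000000000006", "name": "Jane Doe"})
sample_sellers.append({"seller_id": "pub-0000000000000007", "name": "Example Co",
                       "domain": "example.test"})

shares = disclosure_shares(sample_sellers)
for bucket, share in sorted(shares.items()):
    print(f"{bucket}: {share:.1%}")
```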

Google Is Less Transparent Than Its Competitors

Google’s list of the websites and apps it provides ads to has far more confidential and partially confidential entries than its competitors, meaning it hides either the name or the domain associated with the account, or both.

On their own, a list of these IDs provides no useful information — it’s like wiping the names from your phone’s contact list, leaving just the numbers.

The upshot is that the largest ad network in the world won’t reveal the identities of the vast majority of its publisher partners. The risks go beyond a lone disinformation peddler like Slapinski. Legislators, including Sen. Mark Warner, chair of the Senate Intelligence Committee, have warned that the opaque and fraud-ridden digital ad ecosystem led by Google poses a national security risk. Each layer of confidentiality further obscures where money and consumer data flows in the digital ad industry, undermining trust and exacerbating risks.

“The lack of transparency and regulation in the digital advertising space is an issue that I have been concerned about for many years,” Warner said in a statement to ProPublica. “Unfortunately, the industry hasn’t improved its practices since I first raised concerns back in 2017, as advertisers consistently appear to lack meaningful control over the types of content that is seen alongside their ads and are oftentimes completely unaware of where their advertisements are being displayed.”

Last year, Warner and a bipartisan group of senators expressed alarm that Google and other companies share data about Americans with undisclosed foreign partners as part of the ad buying and selling process, and that billions of dollars flow through Google to unknown parties around the world.

After the U.S. sanctioned several Russian websites following the invasion of Ukraine, ad tech researcher Krzysztof Franaszek showed that two months later, Google continued to allow many of them to earn money from ads. He also revealed the company placed ads on other sanctioned Russian, Iranian and Syrian sites for years. Critically, nearly 90% of the sanctioned sites earning money from Google ads contained no identifying information in Google’s master ad sellers list, according to Franaszek. Like Slapinski, their accounts were confidential, listing nothing more than a Google account ID.

Aciman said Google works to comply with all relevant sanctions and emphasized that publisher confidentiality should not be seen as nefarious.

“By no means does confidentiality indicate that a publisher is engaging in fraud or other nefarious activity,” he said. “The vast majority of our publishers, including those who are listed as confidential in their sellers.json, are well intentioned, policy compliant, and contribute to the overall vibrancy of our network.”

But industry experts and critics say there’s no way to prove that without Google meeting the same standard as its competitors.

“Google has manufactured a uniquely explosive situation: sending billions of ad dollars everyday to unknown individuals around the world. It is effectively one of the largest dark money transfers in the world — and it’s funded by all our ad campaigns,” wrote Nandini Jammi and Claire Atkin of the Check My Ads Institute, an ad industry watchdog, in a recent article.

They called upon Google to release a full deanonymized sellers file.

Google’s actions thus far suggest major changes are unlikely to happen quickly. The company waited a year after other ad networks began publishing their sellers files to release its own, overwhelmingly anonymous version in 2020. Following pushback, the company offered excuses, including having to update help center documentation, conduct training and contact all the account owners. The company also said there could be privacy and security risks to requiring all of its publisher partners to disclose the individual or company associated with an ID. It said things would improve.

Two years later, Google has increased the total number of fully public entries in its sellers file from 5% to 11% — still by far the worst in the industry. Google’s file also carries a notice not seen in its competitors’: “This file is a beta and is unverified.”

Google declined to comment on the notice. Aciman said publisher transparency is a “critical” part of the ad ecosystem, and pointed to a Google Help Center article that encourages publishers to make their information transparent.

“Google has a unique publisher base and we want to ensure we’re balancing both industry transparency and publisher confidentiality and choice,” he said.

But as of today, new publishers signing up with Google’s ad network are still confidential by default.

Over 380,000 of Google’s Partners Remain a Mystery

After months of data collection and analysis, 70% of the account IDs in Google’s sellers file were matched to one or more websites or apps (11% of these were accounts that Google provided public information on). But 30% of these accounts weren’t declared by Google or in our or our partners’ data, leaving us and Google’s advertising partners in the dark about where their money might be getting spent.

So we attempted to do what Google would not: connect the company’s list of more than 1 million account IDs to the actual sites and apps where ads appear. We were able to match almost 900,000, or 70%, of the accounts in Google’s file to one or more domains or apps and found over 5 million sites that are or were associated with Google publisher accounts. But over 380,000 account IDs remain ghosts, perhaps never used by the entity that registered them or used in a way our data couldn’t capture, perhaps active on a mobile app or site outside of the roughly 300 million available to us in our data and that of our partners.
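The matching step can be pictured as a join between the account IDs in the sellers file and the (domain, ID) pairs observed on the web. The Python sketch below is an assumption about the general shape of such an analysis, not ProPublica's code. All IDs and domains are hypothetical.

```python
def match_accounts(seller_ids: set, observations: list) -> tuple:
    """Map each listed account ID to the domains where it was observed.

    observations is a list of (domain, publisher_id) pairs gathered from scans.
    Returns (domains_by_id, unmatched_ids).
    """
    domains_by_id = {}
    for domain, publisher_id in observations:
        if publisher_id in seller_ids:
            domains_by_id.setdefault(publisher_id, set()).add(domain)
    unmatched_ids = seller_ids - set(domains_by_id)
    return domains_by_id, unmatched_ids


# Hypothetical inputs: three listed accounts and four scan observations.
listed_ids = {"pub-0000000000000001", "pub-0000000000000002", "pub-0000000000000003"}
scan_results = [
    ("site-a.test", "pub-0000000000000001"),
    ("site-b.test", "pub-0000000000000001"),  # one account seen on two sites
    ("site-c.test", "pub-0000000000000002"),
    ("site-d.test", "pub-0000000000000009"),  # ID absent from the sellers file
]

matched, ghosts = match_accounts(listed_ids, scan_results)
match_rate = len(matched) / len(listed_ids)
print(f"matched {len(matched)} of {len(listed_ids)} accounts ({match_rate:.0%})")
print("unmatched:", sorted(ghosts))
```

An unmatched ID in this sketch means only that no scan saw it. As the paragraph above notes, the account may be unused or active somewhere the data did not reach.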

Some accounts were associated with hundreds of sites, some moved from site to site like a game of whack-a-mole, and some were seen on sites before or after being publicly listed in Google’s sellers file. And thousands of accounts are added to and removed from the file every week, rendering a given week’s list of publishing partners almost immediately obsolete. This effectively prevents ad buyers from having a basic understanding of the sites and apps where their ads could appear, and who they fund as a result.

Google’s reasons for not disclosing its publisher partners are “rubbish,” according to Ruben Schreurs, the chief product officer of Ebiquity, a media research company that has worked with such brands as L’Oréal, Sony, Nestlé, and Audi. He said it’s in Google’s business interest to keep ad buyers in the dark, because the Display Network is filled with sites and apps most advertisers would not want to do business with.

“They have so many obviously nefarious or even sanctioned partners that use Google’s technology,” Schreurs said.

Our effort to deanonymize Google’s vast network of publishers revealed a bewildering array of sites and apps. There are news and sports sites in many languages, food blogs, utility sites such as spell-checkers and percentage calculators, and gaming sites. There are sources of disinformation, such as OANN and many others around the world, and the fetish site WikiFeet, which features photos of women’s feet, often without their permission.

In spite of a policy banning sexually explicit content, we found Google placing ads on adult sites like Sexlexikon.net, iSexyChat and Female Prison Pals. On the last of these, Google showed ads to us when we visited pages with photos of female inmates in the United States accompanied by their responses to a questionnaire with prompts such as their favorite sexual position and the age at which they lost their virginity.

Since Google doesn’t release a list of the sites and apps where it places ads, ad buyers ranging from major brands like Nike to small local businesses can’t exclude all of the unsuitable publishers in Google’s network. They can preemptively block problematic sites and apps they know about, but then they must await reports from Google about where their ads were placed.

Even then, Google keeps customers partially in the dark. In most campaigns, the company conceals a percentage of ad placements. This means Google does not reveal all the sites and apps that received the ads and associated revenue. Call it the black hole in Google’s black box.

In an example revealed by watchdog group Check My Ads in May, 10% of all the ads in a million-dollar campaign run via Google were listed as “anonymous” in the report generated for the advertiser. Roughly $100,000 worth of ads were placed on sites and apps, but Google wouldn’t say which ones. (The campaign data was shared with Check My Ads on the condition it not name the brand that ran the ads.)

Schreurs analyzed $1 billion worth of ads placed for his company’s clients and found that 3.6%, or $36 million worth, went to unknown websites and apps. Google isn’t the only company that conceals a percentage of advertiser placements and spending. But the company combines the practice with other methods of obfuscation, like its largely anonymized sellers file, that thwart transparency and accountability.

Google also doesn’t allow ad buyers to block by account ID. Even if buyers know that pub-5958167306013620 is the publisher account for the owner of Conservative Beaver, they can’t direct Google to block their ads from appearing on sites or apps using that ID.

Aciman said the company is currently beta-testing a tool that allows ad buyers to block by seller ID.

“This would enable buyers to block confidential sellers by adding those sellers to their blocklist,” he said. “The tool is expected to launch for general availability in 2023. This would go beyond our existing tools that provide advertisers with robust controls that lets them decide where their ads appear.”

Schreurs said Google has a financial interest in concealing which sites and apps it works with. The company earns money by taking a cut of each ad placement — the higher the volume, the more Google makes. To maintain that volume, the company needs to work with low-quality and risky publishers, he said.

“We all know that most of Google’s inventory is crap,” he said.

Aciman disputed the quality concerns and said that most of the money flowing through Google’s ad system does not go to confidential publishers. In late 2020, a Google executive said more than 90% of revenue goes to the small percentage of partners that are publicly identified in its sellers file. Aciman said the percentage is even higher now.

If that’s true, it raises the question of why Google risks working with so many sites and apps. But the concerns about Google’s ad network go beyond the hidden identities of its publishers and sites.

Last year, a marketer working for a Fortune 500 company launched a multimillion-dollar ad campaign.

The goal was to reach business owners in the U.S. by placing digital ads on websites and apps in Google’s Display Network. Using Google’s DV360 ad buying tool, the marketer entered details about their desired audience, uploaded a list of risky or otherwise inappropriate sites and apps to block from receiving ads and launched the campaign. The marketer said they were not authorized to share campaign data publicly, and did so on the condition that their name and that of the Fortune 500 company not be disclosed.

Over the next few months, Google placed more than 1.3 trillion of the company’s ads on over 150,000 different websites and apps. The biggest recipient of ads — more than 49 million — was a website called PapayAds. The company was registered in Bulgaria less than two years ago and lists one employee, CEO Andrea De Donatis, on LinkedIn. Its site is a single page that says it helps publishers increase their ad revenue. PapayAds has just one ad slot on its page, which is presented as a demo for prospective clients to see what banner ads look like. One of its customer testimonials comes from someone using a pseudonym.

That’s not the only time De Donatis used fake or misleading names. PapayAds is among the small percentage of Google partners that list both the name or names of people associated with the company and its domain in Google's sellers file. At least two of PapayAds’ sellers accounts list the name of De Donatis. But the rest are registered to his girlfriend, his brother and a set of dubious names that Google and De Donatis confirmed are also not associated with the company. One account is in the name of Luca Brasi, the famed character in the first Godfather film.

It seems impossible that 49 million ads were legitimately placed and viewed on PapayAds’ site over the span of several months. In an interview with ProPublica, even De Donatis expressed skepticism. “I don’t have an explanation for this,” he said, adding that he does not recall receiving payment for such a large volume of ads.

Google declined to comment on the campaign, rendering the 49 million ads it charged a Fortune 500 company for one of many mysteries of its black box.

But the story of Google’s relationship with PapayAds goes deeper. It also includes a possibly related scheme involving online piracy, fraudulent advertising and fake online traffic. And even after discovering at least part of the operation, Google didn’t take steps to remove PapayAds or the many piracy sites it works with from the Display Network.

Here’s how the scheme worked. First, PapayAds signed up website publishers to help them earn money from ads. At least 679 websites list PapayAds as their Google Ads partner, based on our findings and data from Well-Known, a site that tracks advertising systems. This means these sites publicly declare that they use PapayAds account IDs to help receive ads and money from Google.
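Sites typically make this kind of public declaration in a plain-text file called ads.txt, where each record names an ad system, an account ID and a relationship. The Python sketch below assumes that is the mechanism at work here and shows how such a file could be checked for a given account ID. The sample file and its IDs are hypothetical.

```python
def parse_ads_txt(text: str) -> list:
    """Parse ads.txt records into (ad_system, account_id, relationship) tuples."""
    records = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # blank lines and variable lines such as "contact=..."
        records.append((fields[0].lower(), fields[1], fields[2].upper()))
    return records


def declares_account(text: str, ad_system: str, account_id: str) -> bool:
    """True if the file authorizes the given account ID for the given ad system."""
    return any(
        system == ad_system.lower() and listed_id == account_id
        for system, listed_id, _relationship in parse_ads_txt(text)
    )


# Hypothetical ads.txt for a site that lists one direct and one reseller account.
SAMPLE_ADS_TXT = """
# ads.txt for manga-site.test (invented example)
contact=ads@manga-site.test
google.com, pub-0000000000000001, DIRECT, f08c47fec0942fa0
google.com, pub-0000000000000002, reseller
"""

print(parse_ads_txt(SAMPLE_ADS_TXT))
print(declares_account(SAMPLE_ADS_TXT, "google.com", "pub-0000000000000002"))
```

Running a check like this across many domains is one way to build a list of sites that name the same intermediary.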

Nearly all of the PapayAds client sites we examined specialize in publishing pirated versions of Japanese comics, known as manga, or Korean comics, known as manhwa. Others feature pirated Japanese animated films and shows, or pornographic manga known as hentai. Google and other ad networks ban ads from appearing on copyright infringing content. Google also bans ads from appearing on pages containing hentai.

This past summer, PapayAds used code that misled Google and ad buyers into thinking Google ads were being placed on PapayAds’ site when they in fact appeared on manga piracy sites, according to Pixalate, a digital ad fraud protection and privacy compliance company that examined PapayAds at our request.

De Donatis described this as a “test” he attempted with some manga sites, and said his company did not realize it broke Google’s rules. PapayAds is merely providing a service to clients approved by Google, he said.

“I’m just providing some IT technology,” De Donatis said. “I don’t think I did anything bad.” (His first language is Italian, but he spoke English during two phone interviews.)

Pixalate also found the operation included an element of deception to maximize profit: bots. It found that some of the web traffic on PapayAds and its manga piracy partners was automated. Bots artificially inflate the number of ads viewed on a website, thereby increasing revenue.

“I can tell you that we never used bot traffic or fake traffic,” De Donatis said.

Pixalate’s findings did not attribute the automated traffic to a particular entity. It’s possible the bot activity was connected to PapayAds’ clients or another entity.

Google detected the improper activity over the summer and withheld the associated ad revenue earned by PapayAds clients from their August and September payments, according to De Donatis. According to Google policy, that money should have been refunded to advertisers.

De Donatis didn’t say how much was withheld, but described it as a large amount relative to his and his partners’ typical earnings. (He claimed on his LinkedIn profile that PapayAds generates $400,000 in revenue per month, but removed that information after speaking with ProPublica.)

Google declined to comment on the withheld revenue and overall scheme. Speaking generally, Aciman said the company is “engaged in a comprehensive effort to detect and stop invalid traffic, which is powered by a combination of technology, operations teams, and policy.”

But what did Google do after detecting what by industry definition is an ad fraud scheme involving a set of manga piracy sites filled with stolen content? It kept placing ads on them, and kept working with PapayAds up until being contacted by ProPublica.

This occurred in spite of the fact that Google has at least two years of data showing that many manga sites working with PapayAds are serial copyright infringers.

We selected a sample of 50 manga sites from the list of more than 650 sites that publicly said they work with PapayAds to receive Google ads. Data from Google’s transparency report shows that since 2020 Google has removed 1.9 million of these manga sites’ URLs from search results due to copyright infringing content. Yet 34 of the 50 sites appeared in the Fortune 500 company ad buy under their own domains, and the full list of 50 continued to receive Google ads until very recently.

Google could see in its own data that these sites were engaging in mass piracy, and that they were working with PapayAds to receive ads and revenue. But it did not take action to kick them, or PapayAds, out of its ad system.

The 50 sites in our sample collectively received close to 750 million visits in September, according to analytics company Similarweb, and were able to make money from that traffic thanks in part to Google. We were shown ads placed by Google for major brands including Nike, Sephora and HSBC Bank when visiting manga piracy sites. The brands did not respond to requests for comment.

Jalal Nasir, the CEO of Pixalate, expressed concern that Google is directly placing ads on such obvious piracy sites.

“I’m a little surprised that Google with their big team is not able to detect this stuff happening,” he said.

Nasir also said it’s a huge red flag that PapayAds does not have a privacy policy, a requirement for any Google partner and a necessity for compliance with data protection laws. “Do they have proper due diligence in place?” he said of Google.

After speaking with ProPublica, De Donatis added a privacy policy to his site. He said he’s not responsible for the content of the sites that use his platform, and noted that nearly all of the manga sites were approved by Google to receive ads before signing on with him.

“Like 90% of them already have Google ads when they come to us,” he said.

Google also failed to take action against PapayAds and the raft of manga sites it works with after being warned about them almost two months ago. Rocky Moss, the co-founder of fraud detection company DeepSee.io, identified PapayAds as a major player helping piracy sites earn money. On Oct. 25, he emailed his contact at Google to draw their attention to the company.

“Just wanted to flag a particularly egregious pirate traffic seller,” he wrote. Moss attached an image of a concerning ad he’d seen placed on Reaper Scans, a manga piracy site working with PapayAds for which Google has received and acted on thousands of copyright infringement reports.

The advertiser in question? Google.

Moss said the tech giant’s inaction is disappointing but not surprising.

“There are good people working at Google who want to do the right thing. They just can’t get the approval to solve the problem,” he said.

After we contacted Google with our findings, the company removed all of PapayAds’ seller accounts.

“We are in the process of reviewing the specific sites shared with us by ProPublica and have already removed ads from several and have terminated the accounts associated with PapayAds,” Aciman said. “We will continue to take action as we detect any additional policy violating content.”

Nasir and Moss expressed dismay that Google failed to stop PapayAds and the piracy sites sooner. They said there are likely an untold number of companies like PapayAds operating in the Display Network.

“It’s probably a drop in the ocean of what’s happening out there,” Nasir said.

Craig Silverman is a national reporter for ProPublica covering voting, platforms, disinformation, and online manipulation.

Ruth Talbot is a News Applications Developer at ProPublica.

Creative Commons License (CC BY-NC-ND 3.0)
