Google has announced a new control for the robots.txt file that lets publishers decide whether their content will help improve Bard and Vertex AI generative APIs, including future generations of the models that power those products.
The control is a crawler user agent called Google-Extended, which publishers can add to their site’s robots.txt file to tell Google not to use their content for those two APIs. In its announcement, Google said it has heard from web publishers that they want greater choice and control over how their content is used for emerging generative AI use cases.
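Based on Google's description, opting out should follow standard robots.txt syntax; a minimal sketch (the paths shown are illustrative, not from the announcement) might look like:

```
# robots.txt — tell Google not to use this site's content
# for Bard and Vertex AI generative APIs
User-agent: Google-Extended
Disallow: /
```

A site could also scope the rule more narrowly, e.g. `Disallow: /articles/`, to exclude only part of its content while leaving the rest available.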
As generative AI chatbots grow in prevalence and become more deeply integrated into search results, the way content is digested by things like Bard and Bing AI has been of concern to publishers.
While those systems may cite their sources, they aggregate information that originates from different websites and present it to users within the conversation. This could drastically reduce the amount of traffic going to individual outlets, which would in turn significantly impact things like ad revenue and entire business models.
Google said that when it comes to training AI models, the opt-outs will apply to the next generation of models for Bard and Vertex AI. Publishers looking to keep their content out of features like Search Generative Experience (SGE) should continue to use Googlebot rules in robots.txt and the NOINDEX robots meta tag in their pages’ HTML to do so.
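For contrast with the Google-Extended control, those existing mechanisms look roughly like the following (a sketch using standard Googlebot and robots meta tag syntax, not an excerpt from Google's announcement):

```
# robots.txt — keep Googlebot from crawling a section entirely
User-agent: Googlebot
Disallow: /private/
```

```
<!-- In a page's HTML <head>: allow crawling but block indexing -->
<meta name="robots" content="noindex">
```

Note the difference: the robots.txt rule prevents crawling, while the meta tag lets the page be crawled but keeps it out of search results.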