Google just changed the wording of its privacy policy, and it’s an eye-opening adjustment, made to encompass the AI tech the firm is working on.
As TechSpot reports, in the section of the privacy policy where Google discusses how it collects information about you from publicly accessible sources, there’s now a clarifying note that reads: “For example, we may collect information that’s publicly available online or from other public sources to help train Google’s AI models and build products and features, like Google Translate, Bard and Cloud AI capabilities.”
Previously, that paragraph said the publicly available info would be used to train “language models”, and only mentioned Google Translate.
So, this section has been expanded to make it clear that training is happening with AI models and Bard.
It’s a telling change, and basically points out that anything you post online publicly may be picked up and used by Google’s Bard AI.
Analysis: So what about privacy, plagiarism, and other concerns?
We already knew that Google’s Bard (and Microsoft’s Bing AI, for that matter) is essentially a giant data hoover, extracting and crunching online content from all over the web to refine its conclusions on every topic under the sun it might be questioned on.
This change to Google’s privacy policy makes it crystal clear that its AI operates in this manner, and seeing it in cold, hard text on the screen may make some folks step back and question this a bit more.
After all, Google has had Bard out for a while now, so it has been working in this manner for some time, and has only just decided to update its policy? That in itself seems pretty sly.
Don’t want stuff you’ve posted online where other people can see it to be used to train Google’s big AI machinery? Well, tough. If it’s out there, it’s fair game, and if you want to argue with Google, good luck with that. And the concerns here go beyond basic privacy issues to plagiarism: if an AI reply uses content written by others, picked up in Bard’s training, where do the boundaries lie? Of course, it’d be impractical (or indeed impossible) to police that anyway.
There are broader issues around accuracy and misinformation when data is scraped from the web at this kind of scale, too, of course.
On top of this, there are worries recently expressed by platforms like Reddit and Twitter, with Elon Musk apparently taking a stand against “scraping people’s public Twitter data to build AI models” via those frustrating usage limitations that have just been brought in (which could ultimately be a big win for Zuckerberg and Threads).
All of this is a huge minefield, really, but the big tech outfits making big strides with their LLM (large language model) data-scraping AIs are simply forging ahead, all eyes on their rivals and the race to establish themselves at the forefront, seemingly with barely a thought for how the practical side of this equation will play out.