26 May 2025

Controlling the impact of generative AI

[Image: a small cardboard robot sitting at a desk with a laptop, mouse and calculator]

The world of artificial intelligence continues to move and change almost faster than anyone can keep up with (just ask OpenAI’s Sam Altman if you don’t believe us). And so, while we only recently laid out some of the challenges facing those who want to apply generative AI tools to their content creation, we’re already seeing important moves in the business world to address those issues.

While tools like ChatGPT can offer help in planning, structuring and drafting content to save time and resources, one of the key problems they face is the quality and sourcing of the data that makes all this content generation possible.

Generative AI tools like ChatGPT, Midjourney and DALL·E have been trained by scraping text and images from all over the internet, including a host of copyrighted material. While this hasn’t yet precipitated any major lawsuits against users of AI-generated materials, it’s surely only a matter of time before that penny drops and someone is sued by creators who can prove their work has been misappropriated.

This indiscriminate harvesting of internet data also raises serious questions about quality. AI-generated text already has a reputation for sounding convincing while sometimes being utterly wrong. Quality control over the vast volumes of data used to train these platforms is essentially impossible – an issue that will hamstring many of these popular AI tools in the long term by degrading their output, especially as AI-generated content gets fed back into their training data.

With these concerns in mind, let’s explore some of the more recent developments that we expect to shape the future of generative AI.

Publishers are blocking AI web crawlers

High-profile publishers – including the BBC, Bloomberg and The New York Times – and platforms like Reddit, X and Medium have been moving to block AI bots from crawling their websites to scrape data for AI training. Of particular concern for publishers are ‘zero-click’ searches, in which a search engine’s AI delivers an immediate answer without users having to click through to the source of the information. The risk that these will cause a decline in traffic is one big reason to keep the bots out.

These steps have actually been facilitated by OpenAI and Google, both of which have published instructions for blocking the bots that crawl for ChatGPT and Bard – by disallowing the GPTBot and Google-Extended user agents in a site’s robots.txt file – supposedly without impacting sites’ visibility in search engine results. The reasons for this have not been made explicit, but presumably the AI companies hope to head off future lawsuits by offering an opt-out, pushing the responsibility for whether data is scraped onto the publishers.
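As a minimal sketch, the relevant robots.txt entries look something like this (assuming a publisher wants to opt an entire site out of both crawlers; a real deployment might disallow only certain paths):

```
# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Block Google's AI-training control token; this does not
# affect ordinary Googlebot crawling for search indexing
User-agent: Google-Extended
Disallow: /
```

Note that robots.txt is a voluntary convention: it asks well-behaved crawlers to stay away, but it cannot technically enforce the block – one reason, as discussed below, that the efficacy of this approach has been questioned.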

While the efficacy of the blocks has been questioned, this could nevertheless lead to a future in which AI tools trained on publicly available internet content find that pool of content shrinking and the quality of what remains diminishing. OpenAI recently began to expand its training from a two-year-old data set to up-to-date information harvested from the internet in real time, but if this coincides with a reduction in what it can access, that may act as a ceiling on the reliability of the content it creates.

As for what those publishers and platforms intend to do with their data and how that might impact the future of AI…

The data giants taking gen AI in-house

The walling off of data from the current AI leaders is coinciding with an expansion into AI by organisations that might not yet have the advanced tech but control something likely to be far more valuable in the long run: a huge quantity of quality data.

For instance, the imaginatively named Generative AI by Getty Images was developed by Getty with NVIDIA and trained on its huge library of licensed photographs and illustrations. A service trained on a clearly defined data set that the tool’s owner legally controls should dispel those nagging concerns about copyright infringement, not to mention the issue of ripping off artists’ work with no credit or compensation. Getty knows this is an advantage over the likes of Midjourney, billing its service as “commercially safe”, “worry‑free” and one that “compensates creators”.

Similarly, Adobe’s Firefly is trained on “Adobe Stock images, openly licensed content, and public domain content”, and makes similar claims about compensating artists. It also promises to “protect customers from third-party IP claims about Firefly-generated outputs” – a bold statement that highlights Adobe’s confidence in the legality of its content.

This is certainly only the beginning. These organisations have the advantage of control and oversight of the data that their AI is being trained on. While AI trained on the internet runs the risk of decay and legal complaints, Getty and the like will be able to offer quality and transparency to their customers.

Companies are banning gen AI in the workplace

While there is a lot of talk about how generative AI tools can be put to use in the workplace, some very big names in business have instead been restricting or outright banning their use. Apple, Samsung, Amazon and JPMorgan Chase are part of a growing list of organisations trying to prevent their employees from using gen AI.

While the quality and reliability of gen AI content and the copyright issues outlined above may factor into these decisions, the main motivator seems to be data security. Tools like Bard are trained not only on the internet but also, in part, on the data that users include in their prompts. Upload confidential information into a gen AI tool to create a press release, for instance, and it can become part of the tool’s training data.

Samsung found that sensitive code had been uploaded to ChatGPT, while Amazon allegedly warned its employees not to use the platform after it was found to be generating copy that looked suspiciously like internal Amazon data. If gen AI is going to prove a security risk for companies, then – no matter how useful it is for employees – its ongoing use is going to be limited in a business setting.

The world of generative AI is still in its infancy – and, as we have seen, it’s also in a period of great flux. Any organisation looking to incorporate these tools into its workflows needs to consider carefully how to do so sustainably, what the consequences might be, which services are best suited to its purposes, and how ongoing changes in the field are likely to affect it further down the line. At Beettoo, we’ll be keeping a careful watch on these developments, both for our own use and to best advise our clients on whether and how to adopt gen AI tools into their marketing strategies.