Tech companies find the edge of the internet

OpenAI reportedly developed a tool that transcribes audio from YouTube videos.

By Neal Freyman
7 April 2024

In 1974, Shel Silverstein found where the sidewalk ends. In 2024, tech companies found where the internet ends.

In their frantic hunt for data to train powerful AI systems, tech giants like OpenAI have harvested almost every last bit of web content available. And they warn that high-quality data (Wikipedia entries, scientific papers, etc.) could completely run out in the next two years, the WSJ reported.


To avoid a huge drag on their growth plans, tech firms are getting creative in finding new sources of data for AI training—methods that can fall into a legal gray area, the NYT reported.

  • OpenAI allegedly developed a tool that transcribes audio from YouTube videos, opening up a new source of data that may violate the copyrights of those videos.

  • Strangely, YouTube’s parent company, Google, didn’t raise a stink about that. Why? Because Google is also transcribing YouTube videos to feed its own AI models, and it might want to keep that on the DL, per the NYT.

The internet is so tapped out of quality data for these AI companies that Meta reportedly discussed buying publisher Simon & Schuster for the information contained in its books.


It’s getting weird out there. With limited human-created content left to scrape, some companies have reportedly begun developing “synthetic” information: Yes, that means using AI to create content that will be used to…train the same AI.