Summary

  • There are many questions surrounding the training data for OpenAI’s Sora.
  • OpenAI’s CTO was unsure about the training data when interviewed last month.
  • YouTube’s CEO has made it clear that OpenAI is not permitted to use videos on the platform as training data.



While the hype surrounding OpenAI’s ChatGPT has died down considerably since its initial launch, the company is by no means resting on its laurels, and it managed to “wow” the internet a couple of months back with the introduction of its text-to-video model, Sora. From what we’ve seen so far, the technology is impressive, capable of producing lifelike videos that could easily fool someone into believing they are real.



Although the text, images, and videos these tools can produce from simple prompts are astounding, a darker side of this trend has surfaced: these systems don’t create their output out of thin air. Instead, they rely on enormous amounts of training data, which can include images, videos, and even articles. While some of these sources are appropriate for this kind of use, others are not. That’s where some companies are taking issue, as it’s not always clear where the training data is coming from.


OpenAI’s CTO is unsure whether Sora was trained on YouTube and other social platforms

Perhaps more alarming is that OpenAI’s CTO, Mira Murati, wasn’t sure where the training data for Sora came from either when she was interviewed by The Wall Street Journal just last month (via Engadget). And while it’s still unclear whether YouTube videos were or are being used for training, YouTube CEO Neal Mohan has now fired what may be a shot across the bow, warning OpenAI that using videos from the platform in this way is not allowed.


Mohan made the statement in an interview with Emily Chang on Bloomberg Originals: “It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform.” YouTube’s parent company, Google, has also been working on its own multimodal AI, Gemini, which likewise relies on training data, but Mohan stated that “Google adheres to YouTube’s individual contracts with creators before deciding whether to use videos from the platform.”

It will be interesting to see whether OpenAI responds and clarifies exactly how it trained Sora, which it may eventually need to do if it intends to make the tool available to the public. Either way, the situation will continue to evolve, and while there are plenty of uncertainties, there are also a lot of exciting possibilities.