AI giants’ stolen training data revealed
PLUS: OpenAI founder launches AI education startup
Welcome, AI enthusiasts.
YouTube's vast library just became an unexpected training ground for AI giants — with the content’s creators left completely in the dark.
With the gold rush for data continuing to grow, has the AI revolution made everything fair game, or is a digital rights reckoning approaching? Let’s investigate…
In today’s AI rundown:
Report: AI trained on YouTube without consent
OpenAI founding member pivots to AI education
Bring your images to life with Motion
Mistral releases new open-source models
6 new AI tools & 4 new AI jobs
More AI & tech news
Read time: 4 minutes
LATEST DEVELOPMENTS
OPENAI
Image source: Midjourney
The Rundown: A new investigation by Proof News just revealed that tech giants including Apple, Anthropic, Nvidia, and Salesforce used content from over 170,000 YouTube videos to train their AI models without creators’ consent.
The details:
The dataset, called “YouTube Subtitles”, contains transcripts from over 48,000 channels, including popular creators, news outlets, learning channels and more.
Nonprofit EleutherAI compiled the data as part of a larger collection called ‘The Pile’, intended to provide training materials for developers and academics.
Creators were unaware their content had been used for AI training, and YouTube’s terms of service prohibit such use without permission.
Apple reportedly used the dataset to train OpenELM, a model related to new AI features for iPhones and MacBooks.
Why it matters: While the use of these transcripts isn’t going to create the best vibes with creators, we’ve yet to see many legal ramifications for firms in these cases. With this dataset also being public through EleutherAI, it’s hard to see anything other than bad PR coming from this report, despite the ethical and moral implications it raises.
TOGETHER WITH HUBSPOT
The Rundown: Unlock the secrets of using AI to effortlessly manage your tasks and increase productivity with HubSpot’s playbook — helping you reclaim hours of time in your daily workflow.
This free kit helps users discover how to:
Leverage AI for streamlined task management to optimize efficiency
Utilize AI tools to elevate decision-making and maximize workflows across teams
Explore comprehensive templates and detailed examples for AI delegation
Learn how to evaluate measurable impacts of AI on daily outputs
Download HubSpot’s free kit and start reclaiming your time with AI today.
ANDREJ KARPATHY
Image source: Eureka Labs (@EurekaLabsAI) on X
The Rundown: OpenAI founding member and former Tesla Director of AI Andrej Karpathy just announced the launch of Eureka Labs, a new AI-integrated education platform — coming just months after his departure from OpenAI.
The details:
Karpathy’s post calls Eureka Labs an “AI-native” school, leveraging advanced AI to enhance the learning experience.
Its first offering, LLM101n, is an undergraduate-level course teaching students to train their own AI models.
The platform combines human expertise with AI teaching assistants to guide students, with plans to offer both digital and physical cohorts.
Karpathy announced his departure from OpenAI in February to pursue personal projects, and has been posting educational AI videos on YouTube over the last year.
Why it matters: From hyper-personalized learning to advanced, on-demand AI tutors, education is about to be massively reshaped in the coming years—and the sector just gained one of the most well-respected scientists in the field to help lead the way.
AI TRAINING
The Rundown: Leonardo AI’s new ‘Motion’ feature allows users to turn static images into captivating short animations for social media, web design, digital art, and more.
Step-by-step:
Sign up on Leonardo AI’s website (free account includes 150 daily credits).
From the main dashboard, click on "Image Generation" in the sidebar menu.
Generate an image using the prompt of your choice.
Pick your favorite image, hover over it, and click the ‘Motion’ button, then adjust the Motion Strength slider as desired.
Click "Generate" and check out your animated creation!
AI RESEARCH
Image source: Mistral
The Rundown: Mistral AI just launched Codestral Mamba and Mathstral, two new small, specialized language models that achieve state-of-the-art performance for open-source models on key benchmarks.
The details:
Codestral Mamba is a 7B model offering fast inference and advanced coding capabilities that surpass open-source rivals like CodeGemma and CodeLlama.
The model can handle context lengths up to 256k tokens (double that of GPT-4o), making it ideal for large, complex coding tasks and local development.
Mathstral, also a 7B parameter model, achieves SOTA performance on math reasoning benchmarks like MATH (56.6%) and MMLU (63.47%).
Both were released under the Apache 2.0 license, allowing free use, modification, and distribution, and are available via Mistral’s API and Hugging Face.
Why it matters: Mistral continues to shake up the AI landscape with innovative approaches, and with these specialized models, it joins rivals in demonstrating that bigger AI systems aren’t always better. In the future, every sector will likely have a hyper-specific, highly capable open-source model of its own.
NEW TOOLS & JOBS
👨‍💻 Claude Engineer - Interactive command-line interface powered by Claude 3.5 models
💻 Elementor AI - An AI website builder for WordPress
🎥 Haiper AI 1.5 - A free-to-try AI video generator with up to 8-second generations
🚀 Prodia - Add generative AI to your app with one API
👨‍🎤 CharacterGen - 3D character generation from a single image
🔬 Undermind - Deep scientific search for complex problems
📝 Writer - Product Manager
💻 OctoAI - Staff MLSys Engineer - Kernel Optimization
🩺 Notable - Clinical Development Lead
🔍 Character AI - UX Researcher
QUICK HITS
Join the waitlist: Section's Build an AI Product bootcamp. Design and prototype your own AI product in eight weeks, with the help of AI product executives. Join the waitlist for a 25% discount. Sign up now.*
The UK's Competition and Markets Authority launched a formal probe into Microsoft's hiring of former Inflection AI staff, mirroring a similar investigation by the US Federal Trade Commission.
Visual AI startup Haiper released Haiper 1.5, a new video generation model that doubles the platform’s output length to 8 seconds and introduces new HD upscaling, realism, and image generation features.
Anthropic announced the release of an Android app for Claude, now available to mobile users in the Google Play store.
OpenAI posted new examples of its Sora video generation model on the company’s Instagram page, saying the tool is slowly being expanded to more creatives for testing.
Cohere is partnering with Japanese tech giant Fujitsu to develop 'Takane', an advanced Japanese language AI model for enterprise use based on Cohere’s Command R+, set to launch in September 2024.
*Sponsored listing
THAT’S A WRAP
SPONSOR US
Get your product in front of over 600k AI enthusiasts
Our newsletter is read by thousands of tech professionals, investors, engineers, managers, and business owners around the world. Get in touch today.
FEEDBACK
How would you rate today's newsletter? Vote below to help us improve the newsletter for you.
If you have specific feedback or anything interesting you’d like to share, please let us know by replying to this email.