OpenAI's first AI agent arrives
PLUS: Perplexity launches agentic phone assistant
Good morning, AI enthusiasts. 2025 has already been widely declared the year of AI agents, and OpenAI just officially joined the party.
The startup’s ‘Operator’ release takes us into a new realm with mainstream AI assistants that can navigate the internet and take actions on their own — our interactions with chatbots may never be the same.
Exclusive: I got early access to test Operator, which did not disappoint. Check out my thread of demos here.
In today’s AI rundown:
OpenAI unveils its first autonomous web agent
Perplexity debuts new AI mobile assistant
How to prompt o1 models better
‘Humanity’s Last Exam’ scales up AI benchmark
4 new AI tools & 4 job opportunities
LATEST DEVELOPMENTS
OPENAI
Image source: OpenAI
The Rundown: OpenAI just launched Operator, an AI agent that can independently navigate web browsers to complete everyday tasks — marking the company's first major step into autonomous AI assistants.
The details:
Operator uses a new Computer-Using Agent (CUA) model that combines GPT-4o's vision capabilities with advanced reasoning to interact naturally with websites.
OpenAI demoed the feature during a live stream, showcasing tasks like booking reservations, grocery ordering, and buying tickets to sporting events.
OpenAI has partnered with major platforms like DoorDash, Instacart, and Uber to ensure the agent works seamlessly while respecting platform guidelines.
Built-in safety features include user approval for purchases, automated threat detection, and "takeover mode" for sensitive info like passwords and payments.
The research preview is currently limited to U.S. Pro users, with plans to expand to Plus, Team, and Enterprise after more safety and reliability testing.
Why it matters: Agentic systems have been popping up more frequently, but OpenAI’s long-awaited move is a major step toward changing how people interact with AI. There may be rough edges at first, but Operator feels like the official beginning of a new agentic era.
TOGETHER WITH WORKOS
The Rundown: WorkOS Radar is a security solution that shields your AI platform from fake signups, throwaway emails, and brute force attempts — all powered by advanced device fingerprinting and real-time detection.
With WorkOS Radar, you can:
Rapidly detect and challenge unfamiliar and suspicious devices in real time
Stop free-tier abuse and fraudulent behavior with advanced detection
Customize threat responses to fit your app’s exact security needs
PERPLEXITY
Image source: Perplexity
The Rundown: Perplexity just unveiled Perplexity Assistant, a free, agent-like tool for Android that can control phone apps and perform complex tasks with multimodal and voice capabilities, directly challenging voice assistants like Google’s Gemini and Apple’s Siri.
The details:
The new assistant integrates with popular apps like Uber and OpenTable to perform actions directly through voice commands or gesture controls.
It maintains context throughout interactions, allowing users to progress from research to action — like finding restaurants and booking a table.
The system supports multimodal interactions through voice and camera, enabling users to ask about their surroundings or about what is on their screen.
Users can replace Google's default assistant with Perplexity's solution at no cost, with the feature only available on Android for now.
Why it matters: Operator isn’t the only agent in town today, with Perplexity evolving its platform from a search/answer engine to a full-blown digital assistant. The assistant space could become a new battleground for AI firms, not just tech giants — and this Perplexity launch looks a lot like what Apple’s ‘upgraded’ Siri should actually be.
AI TRAINING
The Rundown: Using delimiters and structured prompts significantly improves AI model outputs by providing clear instructions and relevant context.
Step-by-step:
Structure prompts using XML tags (<goal>, <context>, <format>).
Fill each section with relevant, specific information and define clear parameters for your desired output.
Test, ask follow-up questions, and optimize your results.
Pro tip: Save successful prompt structures as templates for consistent results across similar tasks. The Rundown University members can access our full workshop on effectively using ChatGPT-o1 to get the best results here.
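The steps above can be sketched as a small reusable template. This is a minimal illustration, not an official recipe: the `build_prompt` helper, its parameter names, and the example values are assumptions made here, and only the three tag names (`<goal>`, `<context>`, `<format>`) come from the steps themselves.

```python
def build_prompt(goal: str, context: str, output_format: str) -> str:
    """Wrap each section in its XML-style tag and join them into one prompt."""
    sections = {"goal": goal, "context": context, "format": output_format}
    parts = []
    for tag, text in sections.items():
        text = text.strip()
        if not text:
            # An empty section usually means the prompt is underspecified.
            raise ValueError(f"<{tag}> section is empty")
        parts.append(f"<{tag}>\n{text}\n</{tag}>")
    # Blank lines between sections keep the delimiters easy to spot.
    return "\n\n".join(parts)


# Hypothetical example values, for illustration only.
prompt = build_prompt(
    goal="Summarize the attached meeting notes for an executive audience.",
    context="The notes cover a 45-minute product planning call.",
    output_format="Five bullet points, each under 20 words.",
)
print(prompt)
```

Keeping the structure in one function makes it easy to reuse the same layout across similar tasks and change only the section contents while testing and refining.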
PRESENTED BY INNOVATING WITH AI
The Rundown: Innovating with AI's new program, AI Consultancy Project, transforms AI enthusiasts into professional consultants — tapping into a market projected to reach $54.7B by 2032.
The 6-month program delivers:
Proven frameworks for client acquisition and service delivery
A step-by-step path to six-figure consulting income
Students who land their first AI client in as little as 3 days
SCALE AI & THE CENTER FOR AI SAFETY
Image source: Humanity’s Last Exam
The Rundown: The Center for AI Safety and Scale AI just introduced "Humanity's Last Exam," a new AI benchmark designed to be the final frontier for testing an LLM’s academic knowledge — as current AI systems become too strong for existing tests.
The details:
The benchmark consists of 3,000 expert-crafted questions across 100+ subjects, with contributors from over 500 institutions in 50 countries.
Current leading AI models show surprisingly low performance on HLE, with even top systems scoring under 10% accuracy.
Questions are in either exact-match or multiple-choice format, with 10% of the challenges incorporating multimodal analysis of text and images.
A $500k prize pool incentivizes high-quality submissions, with top questions earning $5,000 each and co-authorship opportunities for contributors.
Why it matters: With top models routinely scoring above 90% on many of today’s key benchmarks, tests like HLE are an important way to keep measuring increasingly capable AI systems. Given the rate of progress, though, it likely won’t be long before models post impressive results on this benchmark too.
QUICK HITS
New AI tools:
🤖 Gemini 2.0 Flash Thinking Exp - Google’s powerful new reasoning model
⚙️ Trae - Adaptive AI IDE that helps you ship faster
💻 UI-TARS - Control your computer using natural language
🪄 Spell by Spline - Generate full 3D scenes or worlds from a single image
Job opportunities:
🎯 The Rundown - Marketing/Media Buyer
👋 Lindy AI - Head of Community
📊 Snorkel - Data Annotator (Statistics Expert Contributor)
🗣️ UiPath - ASR Manager
Anthropic launched Citations, a new feature in the Claude API that enables automated source attribution and verification in responses for increased accuracy.
Google’s Imagen 3.0 debuted at No. 1 in the LM Text-to-Image Arena, giving the tech giant the top spots on image and LLM leaderboards.
ByteDance is planning a $20B investment in AI infrastructure in 2025, half of which will be allocated to international data centers and partnerships with chip suppliers.
OpenAI CEO Sam Altman revealed that the upcoming o3-mini model upgrade will be available in the free tier of ChatGPT, with usage upgrades for Plus users.
Hugging Face unveiled SmolVLM 256M and 500M, hailed as the world’s smallest vision language models that maintain competitive performance against larger rivals.
LinkedIn is facing a new class-action lawsuit, alleging that the company used the private messages of premium subscribers to train AI models.
COMMUNITY
Join our next workshop today at 4 PM EST to learn about how to use DeepSeek-R1 as a powerful, cost-effective alternative for your AI projects with Dr. Alvaro Cintas, The Rundown’s AI professor.
RSVP here. Not a member? Join The Rundown University on a 14-day free trial.
We’ll always keep this newsletter 100% free. To support our work, consider sharing The Rundown with your friends, and we’ll send you more free goodies.
That's it for today! Before you go, we’d love to know what you thought of today's newsletter to help us improve The Rundown experience for you.
See you soon,
Rowan, Joey, Zach, Alvaro, and Jason—The Rundown’s editorial team