OpenAI, the company behind ChatGPT, recently announced Operator, a generative AI service that acts as an agent and performs tasks on your behalf. Using its own browser, Operator views a webpage and interacts with it by typing, clicking and scrolling, with no input required from you.
The rollout will be gradual, and ChatGPT Pro customers in the United States will be the first to receive it.
Operator can handle a variety of repetitive browser tasks; OpenAI claims it can fill out forms, order groceries and even create memes. It uses the same interfaces and tools that people do, which should also help businesses by opening up new opportunities for engagement.
A research preview of Operator, an agent that can use its own browser to perform tasks for you. pic.twitter.com/wkBBDIlVqj
– OpenAI (@OpenAI) 23 January 2025
Operator is powered by a new model called Computer-Using Agent (CUA). It combines GPT-4o's vision capabilities with advanced reasoning gained through reinforcement learning. CUA is trained to interact with graphical user interfaces (GUIs): the buttons, menus and text fields that people see on a screen.
When the service gets stuck or needs assistance, it hands control back to you. You also have to enter sensitive data such as passwords yourself and complete any verification forms.
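For readers who want a feel for how such an agent might work, the look-act-hand-off cycle described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: `Action`, `run_agent` and `toy_policy` are hypothetical names invented here, the "screens" are plain text labels rather than real screenshots, and none of it reflects OpenAI's actual implementation or API.

```python
from dataclasses import dataclass
from typing import Callable, List

# All names below are hypothetical and for illustration only;
# they are not part of any OpenAI product or API.


@dataclass
class Action:
    kind: str                # "type", "click", "scroll", "done" or "stuck"
    target: str = ""         # label of the GUI element the action applies to
    sensitive: bool = False  # True when the step needs a password or similar


def run_agent(policy: Callable[[str], Action],
              screens: List[str],
              max_steps: int = 10) -> List[str]:
    """Act on each screen in turn until the task is done or control returns to the user."""
    log: List[str] = []
    for screen in screens[:max_steps]:
        action = policy(screen)  # the model "looks" at the screen and picks one action
        if action.kind == "done":
            log.append("task finished")
            break
        if action.kind == "stuck" or action.sensitive:
            # Hand control back to the user instead of acting.
            log.append(f"handed control to user at '{screen}'")
            break
        log.append(f"{action.kind} {action.target}")
    return log


def toy_policy(screen: str) -> Action:
    """Stand-in for the model: maps a screen description to a single action."""
    if screen == "search page":
        return Action("type", "search box")
    if screen == "results page":
        return Action("click", "first result")
    if screen == "login page":
        return Action("type", "password field", sensitive=True)
    return Action("done")


log = run_agent(toy_policy, ["search page", "results page", "login page"])
print(log)
```

In this sketch the agent types and clicks its way through the first two screens, then stops and returns control as soon as it reaches a step marked as sensitive, mirroring the hand-off behaviour described above.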
Operator works with services such as DoorDash, Etsy, Booking.com, Uber and Instacart, and it can conduct research through media partners such as the Associated Press and Reuters.