3 Sources
[1]
Cheap AI "video scraping" can now extract data from any screen recording
Researcher feeds screen recordings into Gemini to extract accurate information with ease. Recently, AI researcher Simon Willison wanted to add up his charges from using a cloud service, but the payment values and dates he needed were scattered among a dozen separate emails. Inputting them manually would have been tedious, so he turned to a technique he calls "video scraping," which involves feeding a screen recording video into an AI model, similar to ChatGPT, for data extraction purposes. What he discovered seems simple on its surface, but the quality of the result has deeper implications for the future of AI assistants, which may soon be able to see and interact with what we're doing on our computer screens. "The other day I found myself needing to add up some numeric values that were scattered across twelve different emails," Willison wrote in a detailed post on his blog. He recorded a 35-second video scrolling through the relevant emails, then fed that video into Google's AI Studio tool, which allows people to experiment with several versions of Google's Gemini 1.5 Pro and Gemini 1.5 Flash AI models. Willison then asked Gemini to pull the price data from the video and arrange it into a special data format called JSON (JavaScript Object Notation) that included dates and dollar amounts. The AI model successfully extracted the data, which Willison then formatted as a CSV (comma-separated values) table for spreadsheet use. After double-checking for errors as part of his experiment, the accuracy of the results -- and what the video analysis cost to run -- surprised him. "The cost [of running the video model] is so low that I had to re-run my calculations three times to make sure I hadn't made a mistake," he wrote. Willison says the entire video analysis process ostensibly cost less than one-tenth of a cent, using just 11,018 tokens on the Gemini 1.5 Flash 002 model. In the end, he actually paid nothing because Google AI Studio is currently free for some types of use.
[2]
AI researcher scrapes usable data from a 35-second screen recording for less than one cent via Google Gemini
This could potentially save thousands of hours in manual labor. AI researcher and data journalist Simon Willison used the Google AI Studio tool to convert a 35-second screen recording of 12 emails into a single spreadsheet. This experiment surprised Willison, who did not expect the AI to return accurate results at such a low cost. According to his blog, AI Studio used 11,018 tokens for this action, and at a cost of 7.5 cents per million tokens, this exercise amounts to less than 10% of 1 cent. Willison needed to collect numerical values from 12 different emails. Rather than spend time copying and pasting the source data into a spreadsheet, he enlisted the help of AI to review a screen capture of his emails and pluck the data from the video. The prompt that Willison provided to Google's AI Studio was a simple "Turn this into a JSON array where each item has a yyyy-mm-dd date and a floating point dollar amount for that date." Willison also provided an example of the JSON-formatted output. Willison reveals that the end cost was less than 1/10th of a cent. This is calculated from AI Studio using 11,018 tokens, of which 10,326 were for video. The Gemini 1.5 Flash 002 model, a cheaper model than Gemini 1.5 Pro, charges $0.075 per one million tokens. Willison helpfully shows us the math that led to this conclusion. But for the time being, Google AI Studio is free of charge, so Willison didn't spend a cent! While scraping data from a few messages in your inbox might seem like an easy task that doesn't require any sort of automated assistance, this is going to be a different story if you have to find data from a hundred or even a thousand emails. There are other alternatives to screen recording and feeding the data to AI, like using an API to scrape your inbox or using Google's own Gemini in Gmail tool.
However, the former requires some programming knowledge which most users likely aren't familiar with, while the latter has its own issues that might make you nervous about granting Gemini complete access to your inbox. What makes video scraping such a powerful tool is that it doesn't take much effort for anyone to use it -- all you need is a way to capture your screen and a multi-modal tool (like Gemini 1.5) and it can produce a database from the information you've recorded on your screen. Aside from not requiring any specialized knowledge, you could scrape data from potentially any source, including web pages. This is actually the same concept as the controversial Recall tool that Microsoft introduced with its Copilot+ PCs and the third-party Rewind AI tool available for macOS. However, even if these tools only process your data locally on compatible devices, they still have an inherent privacy issue because they record your screen the entire time you use your computer and store the recordings in a local folder. Even if the screenshots aren't uploaded to the cloud, the fact that they're saved in one place on your computer makes your data vulnerable. Willison's process is intriguing and will surely spark others to investigate how AI can be used to perform other such tasks.
[3]
This Expert Says AI Efficiency Soars When it Uses Video to Crunch Numbers
Willison was working on one of those everyday accounting tasks that sounds simple, but inevitably ends up being time consuming. He wanted to tally all the different charges he'd incurred for using a cloud company's services. But, as news site Ars Technica notes, Willison's data was scattered across lots of different emails, so finding it all and manually extracting the info would be one of those soul-destroying office jobs. Then inspiration struck. Willison turned on his computer's "screen recording" system, which creates a video of everything you do on the desktop, and then he navigated between all the different emails and sources of the numbers he needed, simply scrolling past the right data along with all the other info in each message. Then he put that video into Google's AI Studio system, which, as Ars explains, lets users try out "several versions of Google's Gemini 1.5 Pro and Gemini 1.5 Flash AI models." Willison prompted the AI to look at the video, telling it to pull out any of the relevant numbers it could see, and then put them in a specially formatted file that could be easily loaded into a spreadsheet, including specific information like dates and exact price amounts. The task took moments, was effectively free because of the experimental nature of AI Studio, and apparently delivered accurate data that Willison was able to verify -- saving him a lot of potentially wasted time.
AI researcher Simon Willison demonstrates a "video scraping" technique that uses Google's Gemini AI to extract data from screen recordings, an approach that could remove much of the manual work from data collection.
Simon Willison, an AI researcher and data journalist, has described a technique he calls "video scraping": feeding a screen recording into an AI model and asking it to extract the data shown on screen. The approach could save many hours of manual data entry [1].
Faced with the tedious task of compiling charges from multiple emails, Willison recorded a 35-second video of himself scrolling through the twelve relevant emails and fed the recording into Google's AI Studio tool, which provides access to several versions of Google's Gemini 1.5 Pro and Gemini 1.5 Flash models [1].
Willison prompted Gemini to extract the price data from the video and arrange it as JSON (JavaScript Object Notation), with each item holding a yyyy-mm-dd date and a floating-point dollar amount. The model completed the task, and Willison then converted the output into a CSV (comma-separated values) table for spreadsheet use [2].
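The JSON-to-CSV step can be sketched in a few lines of Python. This is only an illustration of the conversion, not Willison's own code: the field names `date` and `amount` are assumptions based on the wording of his prompt, and the sample values are invented placeholders rather than his actual charges.

```python
import csv
import io
import json

# Hypothetical stand-in for the model's JSON output. The field names and
# values are assumptions for illustration, not Willison's real data.
model_output = """
[
  {"date": "2023-01-01", "amount": 2.90},
  {"date": "2023-02-01", "amount": 3.15},
  {"date": "2023-03-01", "amount": 4.05}
]
"""


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of {date, amount} items into CSV text."""
    rows = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["date", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Format amounts with two decimals so they read as dollar values.
        writer.writerow(
            {"date": row["date"], "amount": f"{float(row['amount']):.2f}"}
        )
    return buffer.getvalue()


def total_amount(json_text: str) -> float:
    """Sum the dollar amounts, rounded to whole cents."""
    return round(sum(float(r["amount"]) for r in json.loads(json_text)), 2)


csv_text = json_to_csv(model_output)
print(csv_text)
print("Total:", total_amount(model_output))
```

Because the model's output still needs checking, as Willison did, a total like this is best compared against the original emails before it is relied on.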
Both the accuracy of the results and the low cost of the analysis surprised Willison. The entire process used just 11,018 tokens on the Gemini 1.5 Flash 002 model, which at list price would cost less than one-tenth of a cent. In practice he paid nothing, because Google AI Studio is currently free for some types of use [1].
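The cost figure can be checked with simple arithmetic. The sketch below uses the token counts and the $0.075 per million tokens rate quoted in source [2]; the variable and function names are illustrative only.

```python
# Figures quoted in source [2]; names here are illustrative.
TOKENS_USED = 11_018           # total tokens for the request
VIDEO_TOKENS = 10_326          # portion of those tokens spent on the video
PRICE_PER_MILLION_USD = 0.075  # quoted Gemini 1.5 Flash 002 rate


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in dollars for a given token count at a per-million-token rate."""
    return tokens * price_per_million / 1_000_000


cost = cost_usd(TOKENS_USED, PRICE_PER_MILLION_USD)
cents = cost * 100
video_share = VIDEO_TOKENS / TOKENS_USED

print(f"${cost:.6f} = {cents:.4f} cents")  # $0.000826 = 0.0826 cents
print(f"Video share of tokens: {video_share:.1%}")
```

At roughly 0.08 cents, the result sits just under the one-tenth-of-a-cent mark cited in the coverage.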
The technique is most useful when the data is scattered across many sources, such as a hundred or a thousand emails, or sits on web pages that can simply be shown on screen. It needs only a screen recorder and a multimodal model such as Gemini 1.5, with no programming knowledge, which makes it accessible to a wide range of users [2].
The underlying concept resembles Microsoft's controversial Recall tool for Copilot+ PCs and the third-party Rewind AI tool for macOS. Those tools, however, record screen activity continuously and store the captures in a local folder, which leaves user data vulnerable even when all processing happens on the device [2].
Willison's experiment hints at the future capabilities of AI assistants, which may soon be able to see and interact with what users are doing on screen. That could make AI assistance more direct in tasks ranging from data analysis to everyday computing [3].
Summarized by
Navi