Turn on Lights and Take Photos
https://eu.36kr.com/en/p/3678722334679937
Publish Date: 2026-02-11 08:12:00
Source Domain: eu.36kr.com
What would you do with $25 (approximately 173 Chinese Yuan)?
Buy a takeaway meal, top up your phone credit, or casually order a Bluetooth headset? For an AI hobbyist developer in the United States (referred to as Ethan in this article), $25 is enough to build an “intelligent agent that can operate in the physical world”.
He did something that sounds a bit outrageous: he ran the recently popular OpenClaw on a prepaid Android phone that costs $25 to $30 at Walmart. He set it up to receive instructions via Discord and then directly control the phone’s hardware: turn on the flashlight, take photos for recognition, read sensors, and even attempt to make a call.
What’s even more interesting is that he’s not satisfied with just one phone. Instead, he plans to set up a whole row of phones to create an Agent “phone cluster”.
From Chatbots to “Actionable” Agents
Ethan’s solution is actually not complicated. The core structure is as follows:
● Install Termux (a Linux-like terminal environment for Android) on the Android phone.
● Run the OpenClaw Agent in Termux.
● Call the Android system capabilities through the Termux API.
● Communicate with the Agent via Discord.
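The layering in the list above can be sketched in Python. This is a minimal sketch under stated assumptions, not Ethan's code or OpenClaw's implementation: it assumes the Termux:API add-on is installed so that its command-line tools (`termux-torch`, `termux-camera-photo`, `termux-sensor`, `termux-telephony-call`) are on the PATH, and the function names `build_command` and `run_capability` are hypothetical.

```python
import subprocess


def build_command(capability: str, **options) -> list[str]:
    """Translate a capability name into a Termux:API command line."""
    if capability == "flashlight":
        return ["termux-torch", "on" if options.get("on", True) else "off"]
    if capability == "photo":
        # -c selects the camera; the last argument is the output file.
        camera_id = str(options.get("camera_id", 0))
        return ["termux-camera-photo", "-c", camera_id, options["path"]]
    if capability == "list_sensors":
        return ["termux-sensor", "-l"]
    if capability == "call":
        return ["termux-telephony-call", options["number"]]
    raise ValueError(f"unsupported capability: {capability}")


def run_capability(capability: str, **options) -> str:
    """Run the command on the phone and return whatever it prints."""
    result = subprocess.run(
        build_command(capability, **options),
        capture_output=True,
        text=True,
        check=True,  # raise if the Termux:API tool reports failure
    )
    return result.stdout
```

Keeping `build_command` separate from `run_capability` means the mapping can be checked on any machine, while only the second function needs to run inside Termux on the phone.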
In other words, this $25 phone has become an always-online “hardware execution node”. For example, he can send an instruction on Discord: “Hey Claw, turn on the flashlight and then turn it off.” A few seconds later, the phone’s flashlight turns on and then off.
The process behind this is not mysterious: OpenClaw receives the Discord message, calls the Termux API, and then the API calls the Android system interface to complete the hardware operation – things that were originally only possible for apps or system processes are now done by an agent driven by a language model.
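The message-to-hardware flow just described can be illustrated with a small Python sketch. It is an assumption-laden stand-in, not how OpenClaw works internally: in the real setup a language model decides which actions to take, whereas here a simple keyword match (the hypothetical `plan_actions`) plays that role, and the Discord message is assumed to have already arrived as plain text. Only the `termux-torch on|off` command comes from Termux:API; `handle_message` and `run_on_device` are illustrative names.

```python
import re
import subprocess
from typing import Callable


def plan_actions(message: str) -> list[list[str]]:
    """Stand-in for the model: turn a chat message into ordered commands."""
    text = message.lower()
    if "flashlight" not in text:
        return []
    # Keep the on/off words in the order they appear in the message.
    states = re.findall(r"\b(on|off)\b", text)
    return [["termux-torch", state] for state in states]


def run_on_device(command: list[str]) -> None:
    """Execute one Termux:API command (only works inside Termux)."""
    subprocess.run(command, check=True)


def handle_message(
    message: str,
    runner: Callable[[list[str]], object] = run_on_device,
) -> int:
    """Plan the actions for a message, run each one, return how many ran."""
    actions = plan_actions(message)
    for command in actions:
        runner(command)
    return len(actions)
```

On the phone, `handle_message("Hey Claw, turn on the flashlight and then turn it off.")` would issue the two torch commands in sequence. Passing a different `runner` lets the same flow be exercised off-device without touching any hardware.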
In Ethan’s view, what’s really interesting is not “being able to turn on the flashlight”, but “the model starting to have physical execution capabilities”.
Photo + GPT 5.2: The Visual Ability of an Entry-Level Phone
To prove that…