Computing Now Exclusive Content — March 2010

Screen-Capture Programming: What You See is What You Script

by George Lawton

Researchers at the University of Maryland and the Massachusetts Institute of Technology have developed a screen-capture–based scripting environment that could signal a new programming paradigm, one that leverages the graphical interface as a sort of API. The Sikuli system lets users with minimal programming experience use GUI screen shots to create scripts that interact with applications. Ultimately, it could open opportunities to develop scripts that touch multiple applications without requiring any understanding of the underlying programs' APIs.

Tom Yeh is a postdoctoral researcher at the University of Maryland and one of Sikuli's developers, along with MIT graduate student Tsung-Hsiang Chang and associate professor Robert C. Miller. Yeh compares Sikuli to the "what you see is what you get" GUI metaphor: with Sikuli, "what you see is what you script." Users with a basic understanding of the Python scripting language can write programs via screen shots rather than lines of code.

"Since the release of Sikuli," Yeh said, "we've received many emails expressing gratitude for how Sikuli helped deal with certain tedious tasks that had been done by hand."

Sikuli: Seeing Pixels

Many other tools, such as AutoHotkey, help automate routine scripting tasks, but they are designed to run on only a single platform and don't support graphical interaction as Sikuli does. Miller says Sikuli's greatest value is its generality: "If it has pixels that Sikuli can see, then it's open to automation," he said. (Sikuli means "God's eye" in the language of Mexico's Huichol people.)

The technique is open to any application with a GUI that can display on a Windows, Mac, or Linux desktop. "We've already seen users apply it to not just desktop applications," Miller said, "but also Web pages, video games, mobile phone apps (running in a simulator or using a remote connection between the desktop and the phone), and applications from other platforms running in a virtual machine." 

Miller noted another benefit: programmers can use any GUI they're familiar with. "That significantly reduces the cognitive gap between what they want to do and what they can do in Sikuli," he said.

The core Sikuli Script technology supports programming through a combination of machine vision, optical character recognition, and automation technologies. It lets users interact with graphical screen elements, such as dialog boxes, specific text strings, and icons. The IDE is built on top of Jython, a Python implementation that runs on top of Java.
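At its core, locating a GUI element means searching the screen's pixels for a previously captured patch. The toy sketch below (not Sikuli's actual implementation, which uses fuzzy template matching and computer-vision techniques rather than exact comparison) illustrates the idea on a small numeric grid standing in for a screenshot:

```python
# Toy illustration of pixel-based element location: scan a "screen"
# (2D grid of pixel values) for an exact sub-grid (the target icon).
def find_on_screen(screen, target):
    th, tw = len(target), len(target[0])
    for y in range(len(screen) - th + 1):
        for x in range(len(screen[0]) - tw + 1):
            if all(screen[y + dy][x + dx] == target[dy][dx]
                   for dy in range(th) for dx in range(tw)):
                return (x, y)  # top-left corner of the match
    return None  # pattern not visible on screen

screen = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
]
icon = [[1, 2], [3, 4]]
print(find_on_screen(screen, icon))  # → (1, 1)
```

Real screen matching must tolerate anti-aliasing and rendering differences across platforms, which is why Sikuli relies on machine vision rather than exact pixel equality.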

Once the Sikuli IDE is installed, a user hits a keyboard command to activate a special box for highlighting screen elements. The user can then specify commands that use these elements to find icons, insert a cursor in a particular dialog box, or execute if-then statements.


A Sikuli-based application can interact with many types of screen elements. A simple script might automate tasks such as setting a computer's IP address by clicking on multiple icons and dialog boxes in the right order and then typing in the appropriate strings.
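A script of this kind reads like a list of the manual steps. The sketch below is illustrative only: the stand-in functions merely record actions so the flow can be shown self-contained, whereas real Sikuli Script calls such as click() and type() take captured screenshot images as arguments; the image file names and IP address here are hypothetical.

```python
# Illustrative sketch of a Sikuli-style "set the IP address" script.
actions = []

def click(target):
    """Stand-in for Sikuli's click(image): record the click target."""
    actions.append(("click", target))

def type_text(text):
    """Stand-in for Sikuli's type(text), renamed to avoid shadowing
    Python's built-in type()."""
    actions.append(("type", text))

# The script mirrors the manual steps, in order.
click("network_icon.png")       # hypothetical screenshot of the tray icon
click("properties_button.png")  # hypothetical Properties button image
click("ip_field.png")           # hypothetical IP text-field image
type_text("192.168.1.50")       # hypothetical address
click("ok_button.png")

print(actions)
```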

More complex scripts can respond to screen events. For example, an airline tracking application might respond to a colored dot moving across a map and send alerts to interested parties when the airplane comes into proximity of the destination airport. Another application might notify a user when a particular event occurs on a Webcam being monitored in a desktop window.
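Such monitoring can be sketched as a polling loop that fires a handler whenever a target pattern appears, in the spirit of Sikuli's screen-observation facilities; everything below (the frame strings, function names, and alert format) is illustrative rather than Sikuli's actual API:

```python
# Event-driven monitoring sketch: poll a sequence of simulated screen
# frames and invoke a handler each time the target pattern is present.
def watch(frames, target, on_appear):
    alerts = []
    for t, frame in enumerate(frames):
        if target in frame:     # stand-in for a visual pattern match
            on_appear(t, alerts)
    return alerts

frames = ["empty map", "empty map", "plane near airport", "plane near airport"]
alerts = watch(frames, "plane", lambda t, a: a.append("alert at frame %d" % t))
print(alerts)  # → ['alert at frame 2', 'alert at frame 3']
```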

The Sikuli team has been working on Sikuli Test, an application that helps automate testing of how a GUI responds to system changes. A test engineer can create a script that looks for particular changes to screen elements in response to user input. For example, in music player applications, an icon often toggles between play and pause after it's pressed. Sikuli Test could monitor such cases, flagging any instance in which the expected behavior didn't occur and further evaluation is required.
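The play/pause example can be sketched as a conventional assertion-based test against a simulated player; in real Sikuli Test the two states would be captured screenshots rather than file-name strings, and FakePlayer is purely hypothetical:

```python
# GUI-test sketch in the spirit of Sikuli Test: press the play button
# and assert that the visible icon toggles as expected.
class FakePlayer:
    def __init__(self):
        self.icon = "play.png"          # stand-in for the on-screen icon

    def press(self):
        # Toggle between the play and pause icons.
        self.icon = "pause.png" if self.icon == "play.png" else "play.png"

player = FakePlayer()
player.press()
assert player.icon == "pause.png", "expected pause icon after pressing play"
player.press()
assert player.icon == "play.png", "expected play icon after pressing pause"
print("toggle behavior verified")
```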

Another application, Sikuli Search, searches a database on the basis of what the screen shows. The Sikuli team found that users could retrieve help information for GUI elements faster by using Sikuli Search to select the elements on screen than by using traditional text-based searches. Users simply indicate the onscreen dialog box for which they want more information, and Sikuli Search retrieves appropriate screen shots from online tutorials, official documentation, and computer books.


Miller said that researchers have for many years been exploring the idea of software agents that look at screen pixels and understand what's going on. In the mid-1990s, Richard Potter at the University of Maryland created a system called Triggers, which used low-level pixel patterns (such as a corner or an edge) to direct an automatic macro.

In the late 1990s, Robert St. Amant, an associate professor at North Carolina State University, and his students created VisMap, an intelligent system that operated a user interface with machine-vision techniques, responding to basic patterns on the screen according to a set of templates stored in a database. VisMap could play Windows Solitaire, but the performance constraints of the hardware of the day hindered its further development.

St. Amant said he became interested in systems that could interpret visual representations because of the richness of visual interfaces and the progress in interface agents. "Most such agents worked behind the scenes," he said, citing Web recommender systems as an example. "I was more interested in ways an agent might help users work through problems they might have in the interface to applications." This meant that an agent would have to know what the user was looking at. 

St. Amant realized that a visual scripting language would bridge the gap between the way users see interfaces and the way conventional APIs and command languages handle lower-level behaviors. "The promise is a path toward end-user programming that doesn't require users to learn a huge amount for each new application," he said. "A visual scripting language would be standardized, at some level, in the same way that there are conventions for visual presentation."

St. Amant said that it took several seconds to derive useful information from a 1024 × 768 screen capture in early 2000. "This was far too slow for interactive assistance," he said. "And while we were able to build some interesting autonomous systems that walked through user interfaces, our original ideas just didn't seem practical. Systems like Sikuli show that I was wrong about the practicality of the approach."

New Challenges

Sikuli will bring new challenges for developers. For one thing, it has its own performance limitations. "Screen matching is certainly not as fast as making a direct procedure call into the application programming interface," Miller said.

Furthermore, Sikuli can't program what it can't see. For example, Yeh explained, if a window is hidden, Sikuli can't do anything with it. Because Sikuli is built on top of Java, advanced programmers can use Java APIs to work around this problem, but doing so negates the whole point of programming through the GUI, and it's not practical for beginners.

Program construction needs further simplification. Miller said the team has addressed this in several ways, such as building on Python. They also developed a specialized editor that treats screenshot images as first-class constants, which users can capture, edit, and move around as easily as numbers and strings. Additionally, they've created a recorder that automatically captures screenshots associated with a sequence of user actions, but it's not yet released.

Another challenge lies in more complex machine-processing applications. Vision systems don't recognize many patterns that are perfectly obvious to the human eye. St. Amant noted that system developers could have a hard time figuring out how an application failed when a problem arises from the machine recognizing patterns differently than a person would. A related problem arises from the way people interpret 2D representations of 3D objects. Applications that must interpret the spatial relationships of 3D objects require considerable processing power (for example, to determine the relative depth of two objects that appear near each other on the 2D screen). Consequently, Sikuli's performance might degrade in 3D gaming environments and other settings where spatial relationships between objects are important.

"Dealing with moving, animated objects was hard," St. Amant added. "Screen-capture tools work with snapshots, and it's not always easy to track objects by sampling them. These problems can be overcome, but it will take a lot of time and work." 

The Sikuli development team envisions the technology underlying a wide variety of applications. It could help improve program documentation and tutorials. It could also help automate user interactions with PCs and Web-based applications as well as the interface between the two. Applications based on Webcam monitors could, for example, alert a parent if a baby rolls onto its stomach.

St. Amant sees developers using Sikuli to create application add-ons without having to access source code. He also envisions improved accessibility for blind users through the identification of application icons on the basis of their appearance and for users with physical limitations through tracking and guiding uneven mouse movements to small targets. It could also enable more robust and capable macro recording, independent of applications.

"Sikuli is mainly a research project and aims to inspire new ways to apply computer vision in everyday tasks," said Yeh. "We'd like to keep it open source and make it free to all."

To give Sikuli a try, go to

George Lawton is a freelance technology writer based in Guerneville, California. Contact him at