JULY/AUGUST 2005 (Vol. 20, No. 4) pp. 4-9
1541-1672/05/$31.00 © 2005 IEEE
Published by the IEEE Computer Society
In the News
Minding Your Business with AI
A new crop of technology grafting AI to data mining techniques is making unstructured data—emails, blogs, business records, manufacturer warranties, and other kinds of text—more useful to companies. This utility sprouts from computers' ability to automatically understand and organize text so that they can slice and dice it.
Machine learning, natural language processing, hidden Markov models, and rules-based pattern matching methods parse text for meaning, break it into bits, tag it, and rearrange it. The newly structured text can reveal such information as workflows, instances of impropriety or legal noncompliance, cases and causes of problems, and market trends.
The hybrid tools offer companies speed and complexity in their search for valuable and previously hidden business information, says Sue Feldman, an analyst at business technology research firm International Data Corporation.
Looking at emails generated by e-commerce transactions, Tessa Lau, a research staff member at IBM's T.J. Watson Research Center, and Nicholas Kushmerick, a senior lecturer at University College Dublin, noticed that the process of shopping online was quite structured. This discovery led them to wonder whether they could create an email system to reflect that organization.
Working under the aegis of IBM, Lau and Kushmerick created EAM (email-based activity management) technology, which glues together existing algorithms that look for patterns in a sequence of emails to discover the underlying workflow. Using e-commerce emails, EAM looks for commonalities across two dimensions to reverse-engineer a finite-state machine that's a model of the e-commerce business process.
First, EAM locates unique identifiers, strings of letters and numbers that identify a particular transaction. "In the e-commerce domain that might be an invoice number or an order ID," Lau says. "We have some heuristics that look for particular sequences like that and cluster the messages into groups that all contain the same order identifier." Next, the system clusters the messages into groups of the same business process step, using a heuristic that looks for the longest common subsequence. The LCSS is text repeated in messages corresponding to a certain step in the e-commerce transaction, such as the kind that thanks customers for their orders or alerts them that their orders have shipped. Once the system organizes the groups of examples, it feeds them into a finite-state machine inference algorithm to discover the underlying workflow embedded in the emails.
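The three steps Lau describes can be illustrated with a minimal Python sketch. It is not EAM's code: the order-ID pattern, the similarity threshold, the sample emails, and every function name are assumptions made for illustration, and the last step merely records which step follows which instead of running a real finite-state machine inference algorithm.

```python
import re
from collections import defaultdict

# Hypothetical identifier heuristic: the words "Order ID" followed by a token.
ORDER_ID = re.compile(r"order\s+id[:#]?\s*([A-Za-z0-9-]+)", re.IGNORECASE)

# Made-up sample messages, listed in the order they arrived.
EMAILS = [
    "Thank you for your order. Order ID: 1001. Total: $20.",
    "Thank you for your order. Order ID: 1002. Total: $35.",
    "Your order has shipped. Order ID: 1001. Carrier: UPS.",
    "Your order has shipped. Order ID: 1002. Carrier: FedEx.",
]

def lcs(a, b):
    """Longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover the subsequence itself.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

def group_by_order(emails):
    """Step 1: group message indices by the order identifier they contain."""
    groups = defaultdict(list)
    for idx, text in enumerate(emails):
        match = ORDER_ID.search(text)
        if match:
            groups[match.group(1)].append(idx)
    return dict(groups)

def cluster_steps(emails, threshold=0.6):
    """Step 2: greedily label messages that share a long common subsequence."""
    reps, labels = [], []
    for text in emails:
        tokens = text.split()
        for step, rep in enumerate(reps):
            if len(lcs(tokens, rep)) / max(len(tokens), len(rep)) >= threshold:
                labels.append(step)
                break
        else:
            # No existing step is similar enough, so start a new one.
            reps.append(tokens)
            labels.append(len(reps) - 1)
    return labels

def observed_transitions(groups, labels):
    """Step 3 (simplified): collect the step-to-step edges seen in each order."""
    edges = set()
    for indices in groups.values():
        steps = [labels[i] for i in indices]
        edges.update(zip(steps, steps[1:]))
    return edges

groups = group_by_order(EMAILS)
labels = cluster_steps(EMAILS)
edges = observed_transitions(groups, labels)
```

On the sample messages, the confirmation emails land in one step and the shipping notices in another, leaving a single edge from "confirmed" to "shipped." A real corpus would need a more careful identifier heuristic and a proper inference algorithm to merge and generalize states.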
Kushmerick and Lau evaluated their system by taking a couple of months of one person's online shopping history and running their algorithms over the email corpus that the transactions produced. "One of the things that our experiment demonstrated is that you really only need a handful of training examples to learn a reasonably accurate process model," Kushmerick says. "That's because there is a huge amount of structure."
The results also indicate a potential problem with mission-critical workflows. "If we push this technology where you really can't miss messages, we would need to pay a lot more attention to trying to get the model—maybe a plausible strategy would be to go ask the people who built the workflow system to give us a formal description."
Lau and Kushmerick hope to expand the utility of their system, which is still in the research stage. Lau would like to scale up their system and try it on other bodies of email, such as those produced by patent registration or employee evaluation workflows. Kushmerick would like to develop the system to learn from a type of transaction from one company, such as Amazon.com, to organize workflow in another, such as eBay. "We don't have any demonstrations that that actually works," he says. "That's a very ambitious step."
Last line of defense
Making sure that emails get special attention before being released into the wild—which helps companies avoid circulation of inappropriate, offensive, and potentially illegal emails—is the province of OutBoxer. "It's a product that scans an outgoing email to determine whether or not some action should be taken with the email other than just letting it go," says InBoxer CTO Sean True.
Inspired by the enormous number of Enron emails entered into evidence against the company, InBoxer expanded its product line from technology that helps reduce spam to include a new offering that True calls "a spell-checker for liability and compliance issues." OutBoxer analyzes large numbers of examples of messages that represent dichotomies such as offensive or inoffensive, and personal or business. It breaks text into words, counts them, and decides whether the message is appropriate.
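One common way to turn word counts into a decision is a naive Bayes classifier. The Python sketch below shows the general idea only; it is not OutBoxer's engine, and the class name, the add-one smoothing, and the training messages are all assumptions.

```python
import math
from collections import Counter

class WordCountClassifier:
    """Toy naive Bayes text classifier built on per-category word counts."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training messages

    def train(self, text, label):
        """Add one labeled message; can be called again at any time."""
        self.word_counts.setdefault(label, Counter()).update(text.lower().split())
        self.doc_counts[label] += 1

    def classify(self, text):
        """Return the label with the highest log posterior score."""
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(counts.values())
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up training examples for a business/personal split.
clf = WordCountClassifier()
clf.train("please review the quarterly sales report", "business")
clf.train("meeting moved to monday with the client", "business")
clf.train("want to grab beers after the game tonight", "personal")
clf.train("happy birthday see you at the party", "personal")

business_guess = clf.classify("client sales meeting on monday")
personal_guess = clf.classify("party tonight after the game")
```

Because `train` only updates counts, a correction from a user can be folded in by calling it again with the right label, which is one simple way a system of this kind could learn as it goes.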
"The decision-making engine that deals with counts of words is like any other Bayesian-type classifier," True says. Starting with a large generic database, the system adapts to a set of categorized examples provided by the customer or is set up to learn as it goes, where a user tags messages when they're misclassified and the system trains the database to discriminate that category. "Business email rapidly converges to a definition of what business is," True says. "Offenses tend to be quite different from what your business does, usually."
Protecting companies and consumers
Billing itself as "the applications company for unstructured data," Attensity offers technology that monitors email as well as text such as product service records, manufacturer warranties, and customer complaint forms. "What we do is help companies to understand how their products are behaving based on what their customers tell them and what their service repair people tell them," says Attensity CTO David Bean. Companies from Whirlpool, the appliance manufacturer, to NACCO, a forklift manufacturer, use Attensity's technology to discover and address mechanical problems in their products.
Attensity's core technology diagrams sentences to determine the subject and direct objects and whether the verb is in active or passive voice, using a set of heuristics to do the parsing. Specializing in dealing with poorly formed input (where rules-based approaches slow down, according to Bean), Attensity's technology resolves contextual ambiguities by learning about sequences and synonymy.
"One of the big problems in dealing with natural language is anaphora, or co-reference, resolution," says Bean. The system feeds large quantities of text to an automated machine-learning-based algorithm that uses cases of anaphora resolution that it can figure out easily. From those examples, the system obtains information about context. The machine-learning-based algorithm uses a log-likelihood ratio to identify when it has acquired a certain level of knowledge. Once the knowledge base is built, the system extracts and organizes facts from the text. These structured facts can then be fed into an analytic tool that can draw a conclusion or make an automated decision from the data.
Monitoring data such as service records, customer complaints, warranties, and mechanic reports to detect product defects isn't just good business in the automotive industry; it can also help manufacturers comply with the law. The Transportation Recall Enhancement, Accountability, and Documentation (TREAD) Act was enacted in the US in 2000, after the safety debacle involving Firestone tires on Ford sport-utility vehicles. The TREAD Act requires companies to prove they've provided the means to detect and address a problem on the basis of information coming from the field. "The TREAD Act is one compelling motivator," says Jim Murphy, research director at AMR Research, a firm that analyzes business and technology. "The other is these automotive companies spend so much money on warranties and recalls anyway that the earlier they detect a quality problem, the better they can stem the costs."
Attensity helped a motorcycle manufacturer discover a hidden safety problem with the windshield on a certain model by examining both structured and unstructured data in customer complaints. First, the publicly available online complaint records from the National Highway Traffic Safety Administration Web site were fed into the software. Unstructured data entered in text fields was processed into structured factual representations. The structured information, such as vehicle identification numbers, addresses, and make and model data, was matched with the processed text data. Looking at the results, Attensity discovered that the problem of a windshield coming off the motorcycle in an accident had been initially overlooked. The case was previously coded simply as an oil leak. "What we've been able to demonstrate there is that you will not be able to recognize all of the valuable information if you do not pay attention to what's in the text," Bean says.
Finding facts and links
Inxight offers technology that can extract meaning from unstructured textual data as well—in a variety of languages. The technology searches such sources as internal business correspondence and Internet text to pull out such facts as people, company, product, and place names; dates; physical and email addresses; and distance, volume, and other units of measurement.
The technology also applies metadata to the extracted information and discovers relationships in it. "People would love to do this on text because there's a lot of valuable business information contained in it," says Ian Hersey, Inxight's senior vice president of corporate development and strategy. "But since there's no structure, there's no way for traditional data mining or predictive-modeling algorithms to work on text because you need the data organized by classes."
Based on a natural language processing platform that segments, parses, and tags text, the technology has components that extract facts and entities, code documents by category, and link events and relationships. The part-of-speech tagging phase employs probabilistic methodology and local learning to resolve ambiguity, which IDC's Feldman calls a real problem in extracting meaning from unstructured data. Inxight's disambiguation technology combines lexical transducers with hidden Markov models. The coding process employs a hybrid of three approaches to taxonomy and categorization: hierarchical relationships, rules in Boolean queries, and learning from example documents. The LXPlatform, which is used for tokenization, segmentation, stemming, and part-of-speech tagging, supports 31 languages. The entity extraction component supports seven languages.
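To see how a hidden Markov model can resolve an ambiguous word, consider a toy tagger decoded with the standard Viterbi algorithm. This Python sketch is an illustration only and says nothing about Inxight's implementation: the tag set, every probability, and the `floor` value for unseen words are invented, and no lexical transducer is involved.

```python
import math

STATES = ["DET", "NOUN", "VERB"]

# Hypothetical model parameters, chosen by hand for the example.
START = {"DET": 0.6, "NOUN": 0.2, "VERB": 0.2}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 1.0},
    "NOUN": {"report": 0.4, "managers": 0.3, "problem": 0.3},
    "VERB": {"report": 0.7, "file": 0.3},
}

def viterbi(words, states=STATES, start=START, trans=TRANS, emit=EMIT, floor=1e-6):
    """Return the most probable tag sequence for a list of words."""
    # Each column maps a state to (best log probability so far, previous state).
    best = [{
        s: (math.log(start[s]) + math.log(emit[s].get(words[0], floor)), None)
        for s in states
    }]
    for word in words[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit[s].get(word, floor)), prev)
        best.append(column)
    # Backtrack from the best final state to recover the full path.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

noun_reading = viterbi(["the", "report"])
verb_reading = viterbi(["managers", "report", "the", "problem"])
```

The word "report" comes out as a noun after "the" and as a verb after "managers," even though the model slightly prefers the verb reading in isolation. The surrounding tags, not the word alone, settle the question.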
Inxight's technology is used in counter-terrorism intelligence efforts. "Intelligence agencies have a lot of individual pieces of data that are not in a database—individual messages from an operative or a military guy stationed somewhere," Hersey says. "The task is how do you digest them—people typically look for a number of views into the data." Inxight's technology "gives them a way to see the trends and the linkages in the data," he says.
Businesses can also examine unstructured data for marketing purposes. Finding and analyzing meaning in emails, call-center transcripts, Web pages, blogs, and message board postings, Intelliseek helps companies track market trends. The company's Brandpulse software also offers sentiment analysis, a way to gauge how customers feel about a product. Brandpulse measures customers' written responses to determine whether they referred to a product positively or negatively, and it measures any changes in the references' polarity.
First, the technology gathers and discovers content on the basis of user-specified criteria. For example, a digital-camera company would indicate product aspects to monitor such as model, accessories, and features. If the user wants to learn about customer response to lens features, the software will use algorithms to determine relevant vocabulary that occurs in the context of the text, such as zoom and shutter speed. "The system will ferret these things out and pop it up for the user to pick," says Intelliseek CTO Sundar Kadayam. Brandpulse accomplishes disambiguation through a relevance filter, which is set up through a machine learning process as the user indicates good and bad content examples. The system figures out the relevant patterns that represent good examples and then, using rules or machine learning, sets up a taxonomy that specifies the features of the product the user is interested in.
Once the content is gathered, a natural language processing approach provides sentiment analysis. A lexicon that covers both positive and negative expressions of sentiment in English drives the system. A tool set configures the lexicon for a particular domain or customer, such as the phrase "this camera rocks" for a product in the photography industry. The system puts together associated product references, such as "I love the zoom," and "I don't like how the shutter is working," in a structured form, using machine learning.
The system aggregates the now structured and related comments across multiple documents or pieces of information and applies metrics. The polarity metric represents on a 1-to-10 scale the positive or negative sentiment on a topic associated with a product. Once computed, the polarity metric can be compared over time and across brands or components. The influence metric, based on citation analysis, assigns appropriate weights to various sources of commentary depending on their ability to have the greatest reach or impact on the sentiment of other users who might be reading them. Volume of posting is another measure. "A person who only blogs about politics is going to be far more influential communicating a political opinion versus talking about a product," says Kadayam. The analyzed data is finally pushed into a reporting tool.
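A lexicon-driven polarity score along these lines might look like the following Python sketch. It is not Brandpulse's method: the lexicon entries, the keyword test for product features, and the linear mapping onto the 1-to-10 scale are all assumptions, and source influence is left out entirely.

```python
import re

# Hypothetical lexicon: phrase -> sentiment weight (+1 positive, -1 negative).
LEXICON = {"love": 1, "rocks": 1, "great": 1, "don't like": -1, "terrible": -1}

# Made-up customer comments.
COMMENTS = [
    "I love the zoom",
    "This camera rocks",
    "I don't like how the shutter is working",
    "The zoom is great",
    "The zoom is terrible in low light",
]

def comment_score(text):
    """Sum the weights of lexicon phrases found in the text, clipped to [-1, 1]."""
    lowered = text.lower()
    total = sum(
        weight
        for phrase, weight in LEXICON.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    )
    return max(-1, min(1, total))

def polarity(comments, feature):
    """Mean sentiment of comments naming the feature, mapped linearly to 1..10."""
    scores = [comment_score(c) for c in comments if feature in c.lower()]
    if not scores:
        return None  # nothing was said about this feature
    mean = sum(scores) / len(scores)
    # Assumed mapping: -1 -> 1.0, 0 -> 5.5, +1 -> 10.0.
    return round(5.5 + 4.5 * mean, 1)

zoom_polarity = polarity(COMMENTS, "zoom")
shutter_polarity = polarity(COMMENTS, "shutter")
```

Two favorable zoom comments and one unfavorable one put the zoom above the midpoint, while the lone shutter complaint puts the shutter at the bottom of the scale. An influence weight per source could be added by replacing the plain mean with a weighted one.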
Feldman says the hybrid systems offer flexibility. "You start out with this sort of a reference shelf of rules and domain knowledge that defines words according to context. Then you may have an AI component in there so that as new things come along, it will recognize new things and attempt to find similar things and create new buckets."
However, she also cautions against taking these hybrid applications for granted. "AI programs are innocuous when people understand that we're looking at large volumes of data and trying to determine gross trends," Feldman says. "Brand reputation is a good example of that—if you've got lots and lots of negative comments about a brand, that may mean something. But, making some sort of deduction about one of those people making the comments, that's where it gets scary." That single data point might be an outlier, and the technology relies heavily on statistics—which is what computers do best.
This distinction matters most for privacy, particularly when names, addresses, and other personal information are collected on the Internet and generalizations drawn from the data are applied to individuals. As for email and other forms of text generated at work, Feldman says, "In the enterprise, all bets are off as far as privacy goes."
Computer Vision Strives for 20/20
Approaches to advancing machine-enabled sight have varied widely, yielding a spectacular array of possible applications, from ocean science and robotics to crime scene investigation and space exploration.
Visual search engines
In the era of Google and Yahoo, it's perhaps not surprising that much computer vision research is being directed toward visual search engines—using images to search for information. According to Paolo Pirjanian, chief scientist at Evolution Robotics, a robotics software developer, the interest in visual search engines is a response to market demands.
"As digital photography and camera phones have become more mainstream, a very high volume of images is being constantly created, so it becomes necessary to more efficiently sift through the images," says Pirjanian.
In response, Evolution developed ViPR (Visual Pattern Recognition) technology, which contains a computer vision algorithm that can recognize patterns in an image. Explains Pirjanian, "First, you build a database with images that you want the system to recognize—such as pictures of buildings, monuments, or products—and store them in a database. The next time you take a picture that contains any of these images, ViPR will analyze the image, compare it to the database, and find the correct set of matches."
Sony's AIBO robotic pet uses ViPR to support autonomous behavior. The robot recognizes its charging station and returns to it when its battery runs low. It can also recognize and respond to commands on owners' cue cards. For instance, if the owner shows a flash card representing dance, the robot will dance.
Wireless communication is another possible application, albeit farther down the road. Evolution is working on camera phone installations to enable applications for global travel. "You snap pictures with your camera phone, which are sent to an application server," Pirjanian says. "ViPR recognizes the objects in the images and returns relevant information, such as currency conversion or tour guide information."
Mission to Mars
MDA Space Missions is taking another approach to using AI-enhanced computer vision. The company's iSM (Instant Scene Modeler) device creates photorealistic 3D models automatically with a handheld stereo camera.
"A stereo camera and laptop computer are the only apparatus required, and the user can move the camera freely in all six degrees of freedom (DOF) in an unknown environment," explains Stephen Se, an MDA research scientist. The system recovers the six-DOF camera motion and automatically creates a 3D model in minutes. Se says this is preferable to wheel odometry, which is prone to slippage.
One obvious application of iSM is space exploration. Planetary rovers must create 3D models of unknown environments for navigation and obstacle avoidance. Additionally, 3D models can be sent back to Earth for mission planning. According to Se, iSM can achieve significant data reduction. "It is more efficient to send back [a 3D model] due to the limited bandwidth," he says.
Although the technology hasn't yet been deployed in space, MDA is currently proposing use of this technology in Mars rover missions.
A patchwork algorithm
Both ViPR and iSM use David Lowe's SIFT (Scale Invariant Feature Transform) algorithm, which extracts multiple features from a single image that each describe a local portion of the image.
Lowe uses a patchwork analogy to explain how SIFT works. The technology divides a larger image into hundreds of smaller images or "patches" (approximately 12 × 12 pixels) and looks for unique information in each patch. When it analyzes a new image, it accumulates information about each patch's unique features to eventually find a match against a database of training images.
"What makes the features particularly effective is that they are each invariant to rotation of the image or change in resolution, brightness, or contrast," says Lowe, a computer science professor at the University of British Columbia. "Because the features are individually matched, extra background clutter or partial occlusion of an object do not prevent matching."
Developing the algorithm has been replete with challenges, which have encouraged improvement on the original formula. "The most difficult aspect is making features invariant to more changes, such as changing illumination or 3D rotation," says Lowe. "We are currently testing a range of image properties on databases of real images to understand which ones will produce the best invariance."
Matter of probabilities
One of SIFT's most important features, according to Lowe, is that it doesn't need to analyze all the patches to make a match, just enough of them. "It uses a small number (as low as 1 percent) to make a match," he says. "So it's using probabilities."
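The idea that a handful of matching patches is enough can be shown with a simple voting scheme. The Python sketch below is not SIFT and does not compute real descriptors: the two-number feature vectors, the distance cutoff, the vote threshold, and the function name are all assumptions, and real descriptors are far higher-dimensional.

```python
import math
from collections import Counter

# Hypothetical training database: label -> list of toy feature descriptors.
DATABASE = {
    "monument": [(0.1, 0.9), (0.8, 0.2), (0.5, 0.5), (0.3, 0.7)],
    "charger": [(0.9, 0.9), (0.0, 0.0), (0.6, 0.1), (0.2, 0.4)],
}

def match_image(query_features, database=DATABASE, max_distance=0.1, min_votes=3):
    """Vote for the label whose stored features lie closest to the query's."""
    votes = Counter()
    for q in query_features:
        # Find the single nearest stored descriptor across all labels.
        label, dist = min(
            ((lbl, math.dist(q, d)) for lbl, descs in database.items() for d in descs),
            key=lambda item: item[1],
        )
        # Features with no close neighbor (clutter) cast no vote.
        if dist <= max_distance:
            votes[label] += 1
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else None

# Three features resemble the monument; two are unrelated background clutter.
cluttered_query = [(0.11, 0.88), (0.79, 0.22), (0.31, 0.69), (0.95, 0.45), (0.45, 0.95)]
result = match_image(cluttered_query)
```

Each feature is matched on its own, so the clutter features simply fail to vote and the three good matches still carry the decision. Raising `min_votes` trades missed detections for fewer false matches.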
This makes sense to Thomas J. Anastasio, associate professor of molecular and integrative physiology at the University of Illinois at Urbana-Champaign's Beckman Institute for Advanced Science and Technology. Anastasio believes that AI has been moving from rule-based methods to probabilistic methods. He says this is no less true in computer vision research.
Anastasio's interdisciplinary research team has developed a computer model of how the brain combines inputs from the various sensory systems (including sight) and transformed it into a method called IMF (Intelligent Multisensor Fusion).
"Our system is also probabilistic," he says. "It uses inputs from different sensors to estimate, at each discrete location in the environment, the probability that an event has occurred."
The team has implemented IMF in a self-aiming camera. Possible applications of this device include security (detecting and tracking intruders), teleconferencing, and auto safety (detecting oncoming vehicles in collision avoidance systems). "Our goal is to transition our ideas from the lab to the marketplace," Anastasio says. "We have not had any deployments yet, but we hope to soon."
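A textbook way to estimate, location by location, the probability that an event has occurred is to combine sensor readings with Bayes' rule. The Python sketch below illustrates that general approach and is not Anastasio's IMF model: the sensor names, hit and false-alarm rates, prior, and the assumption that sensors are conditionally independent are all invented for the example.

```python
# Hypothetical sensors: name -> (P(detect | event), P(detect | no event)).
SENSORS = {"camera": (0.8, 0.1), "microphone": (0.7, 0.2)}

def fuse(prior, readings, sensors=SENSORS):
    """Posterior probability of an event at one location.

    readings maps each sensor name to True (detected) or False (silent).
    Sensors are assumed conditionally independent given the event.
    """
    p_event, p_none = prior, 1.0 - prior
    for name, detected in readings.items():
        hit, false_alarm = sensors[name]
        p_event *= hit if detected else 1.0 - hit
        p_none *= false_alarm if detected else 1.0 - false_alarm
    return p_event / (p_event + p_none)

def aim(prior, grid_readings, sensors=SENSORS):
    """Return the index of the most probable location and all posteriors."""
    posteriors = [fuse(prior, r, sensors) for r in grid_readings]
    target = max(range(len(posteriors)), key=posteriors.__getitem__)
    return target, posteriors

# Three discrete locations with different combinations of sensor activity.
GRID = [
    {"camera": False, "microphone": False},
    {"camera": True, "microphone": False},
    {"camera": True, "microphone": True},
]
target, posteriors = aim(0.1, GRID)
```

With a prior of 0.1, a location where only the camera fires reaches a posterior of about 0.25, while agreement between both sensors pushes it to about 0.76. A self-aiming camera built on this idea would turn toward the highest-scoring location.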
The road ahead
What does the future hold for machine-enabled sight?
"The long-term goal of computer vision is to do what human vision can do," says Lowe. "The next goal is recognizing an entire category of objects, such as cars or chairs, which requires the application of machine learning methods that can determine relevant features automatically from a training set of images." According to Lowe, progress is being made on this problem, with systems for recognition of faces, pedestrians, and other categories garnering reliable results.
The biggest challenge, says Duane R. Edgington, software engineering group leader of the Monterey Bay Aquarium Research Institute, is that the field is too theoretical. "Computer vision and AI today is much focused on basic research; the leading-edge developments are rarely challenged with real problems," says Edgington. "I see work after work where the images processed are the corridors in universities or the insides of a research lab. I think the machine vision community would be well served by paying attention to opportunities for real applications."