NEWS


Computing Now Exclusive Content — January 2010

Internet Search Takes a Semantic Turn

by George Lawton

Search has become one of the Internet's most important technologies, as evidenced by Google's rise to become one of the world's most prominent technology companies.

Google built its success on its PageRank algorithm, which combines keyword search with a link-analysis technique that ranks candidate result pages largely by the number of links pointing to them.
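At its core, PageRank repeatedly redistributes each page's score along its outgoing links until the scores stabilize. The following Python sketch shows that power-iteration idea on a toy three-page graph; the graph, damping factor, and iteration count are illustrative assumptions, not Google's actual implementation.

# Minimal PageRank sketch (power iteration). The toy graph and the
# parameters below are illustrative; this is not Google's production code.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages          # dangling page: spread evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page Web: A and C link to B, and B links back to A.
print(pagerank({"A": ["B"], "B": ["A"], "C": ["B"]}))

Pages with more incoming links (here, B) end up with higher scores, which is the ranking effect described above.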

Despite this success, keyword search has many limits caused by its inability to process the meaning of queries and Web pages. Because of potential confusion over the meaning of words, traditional searches generally return large numbers of pages, including many irrelevant to a query. Furthermore, keyword-based approaches let search-optimization techniques artificially push hackers' pages and other irrelevant pages to the top of search results.

Semantic search would solve many of these problems, said Kathleen Dahlgren, chief technology officer of Cognition Technologies, a vendor of semantic-based text-processing technology. Semantic-search tools use document tags and topic-based indexes of material to create a model that represents what various pieces of content mean. This lets a search engine more precisely respond to a query by disambiguating the multiple meanings of words in a document and determining how they relate to one another within a sentence. Semantic search could be the Semantic Web's killer app, said Peter Mika, a researcher and data architect at Yahoo! Research in Barcelona. 

There are now several types of semantic search approaches. However, despite the technology's promise, it must clear numerous hurdles before it can be widely adopted. 

Pushing Semantic Search

Interest in semantic search is growing because it promises to make finding relevant information online easier, quicker, and more effective.

Traditional keyword search — in which applications look for instances of query keywords in online documents — rose to prominence because it is efficient and good at simple searches. 

In the early days of search, in the mid-1990s, Yahoo! rose to prominence with a directory in which human experts assigned various websites to topic-based categories. When there were far fewer websites than there are now, this approach was accurate and efficient, but it stopped working well as the number of websites increased.

Google's approach, on the other hand, could work well with a far larger number of documents. But a simple Google search can still yield thousands or millions of results, including many irrelevant to the original query. This occurs because many words have multiple meanings. For example, a search for "tank" could return Web pages on water containers or military vehicles.
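One way to resolve such ambiguity is to compare the words surrounding a term with a short description, or gloss, of each candidate sense and pick the sense with the most overlap, a simplified form of the classic Lesk approach. The glosses and queries below are invented for illustration.

# Simplified Lesk-style disambiguation: choose the sense whose gloss
# shares the most words with the query. Glosses are invented examples.
SENSE_GLOSSES = {
    "tank (container)": "large vessel or container for holding water or other liquid",
    "tank (vehicle)": "armored military combat vehicle that moves on tracks",
}

def disambiguate(query, glosses=SENSE_GLOSSES):
    context = set(query.lower().split())
    return max(glosses, key=lambda sense: len(context & set(glosses[sense].split())))

print(disambiguate("tank for storing drinking water"))   # tank (container)
print(disambiguate("armored tank used in combat"))       # tank (vehicle)

A production system would draw its senses from a full dictionary or ontology rather than a two-entry table, but the overlap idea is the same.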

More precise queries could help address this problem, but the lack of semantics can still make it difficult for even Google's search engine to return accurate responses with few irrelevant results, said Cognition's Dahlgren.

Under the Hood

Semantic search is most effective for complex queries, such as those involved in medical or scientific research and legal discovery. The concepts behind semantic search were formulated years ago.

History

In the 1980s, Xerox began experimenting with various automated natural-language-processing (NLP) technologies that could parse sentences and create a semantic representation of them. 

In 1998, Tim Berners-Lee laid out a plan to create a Semantic Web by adding information about the meaning of documents that could be stored with online content. 

The technology

Semantic search operates on the meaning of words rather than just how they read as text. Basically, either a human or a computer creates a semantic model of a document based on the Resource Description Framework (RDF) and the Web Ontology Language (OWL), or on proprietary formats.

RDF is a World Wide Web Consortium (W3C) standard, providing an XML-based framework for metadata description and interchange. OWL, a markup language for publishing and sharing data online using ontologies, represents the meanings of terms and the relationships between them within sentences in a way that software can process.
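As a small concrete illustration of what such machine-readable statements look like, the snippet below builds a tiny RDF graph with an OWL class hierarchy using the Python rdflib library and prints it as Turtle. The library choice and the example vocabulary are assumptions made for this sketch; the article does not prescribe a particular toolkit.

# A tiny RDF/OWL graph built with rdflib (pip install rdflib).
# The example vocabulary (ex:Tank, ex:MilitaryVehicle) is hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/vocab#")
g = Graph()
g.bind("ex", EX)
g.bind("owl", OWL)

g.add((EX.MilitaryVehicle, RDF.type, OWL.Class))
g.add((EX.Tank, RDF.type, OWL.Class))
g.add((EX.Tank, RDFS.subClassOf, EX.MilitaryVehicle))
g.add((EX.Tank, RDFS.label, Literal("tank")))

# rdflib 6 and later return a string here; older versions return bytes.
print(g.serialize(format="turtle"))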

Semantic tags create machine-readable code about Web page elements. For example, microformats — a Web-based approach to semantic markup — use HTML tags to label metadata about items on a Web page, such as whether a snippet of text refers to a displayed product's price, size, or color.
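The sketch below shows how a crawler might pull such labeled metadata out of a page using only Python's standard library; the class names price, color, and size are stand-ins for whatever property names a real microformat defines.

# Sketch: extract microformat-style metadata from HTML with the standard
# library. The property names "price", "color", and "size" are illustrative.
from html.parser import HTMLParser

class ProductMetadataParser(HTMLParser):
    PROPERTIES = {"price", "color", "size"}

    def __init__(self):
        super().__init__()
        self._current = None
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        matches = self.PROPERTIES.intersection(classes)
        if matches:
            self._current = matches.pop()

    def handle_data(self, data):
        if self._current and data.strip():
            self.metadata[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None

parser = ProductMetadataParser()
parser.feed('<li class="product">Widget <span class="price">$19.99</span> '
            '<span class="color">red</span></li>')
print(parser.metadata)   # {'price': '$19.99', 'color': 'red'}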

A document's semantic model is stored as a semantic index. This specially crafted index of all documents that a search engine has processed includes the context and meaning of words in the documents.
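Conceptually, such an index maps disambiguated concepts, rather than raw keywords, to the documents that mention them. A toy illustration, with invented concept identifiers and document names:

# Toy semantic index: word senses (concepts), not raw keywords, point to
# documents. Concept identifiers and document names are invented examples.
semantic_index = {
    "tank/container": {"doc1", "doc4"},
    "tank/vehicle": {"doc2"},
    "water/liquid": {"doc1", "doc3", "doc4"},
}

def lookup(*concepts):
    """Return the documents that mention every requested concept."""
    hits = [semantic_index.get(concept, set()) for concept in concepts]
    return set.intersection(*hits) if hits else set()

print(lookup("tank/container", "water/liquid"))   # {'doc1', 'doc4'}

Because the query names the container sense explicitly, the military-vehicle document never appears in the results.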

Interfaces. Search engines use an interface that lets users query the index to find either a relevant document or a relevant part of a document.

The simplest interfaces let users narrow their searches semantically by checking boxes that identify categories of concepts that must be included or excluded in the search. For example, a search of medical conditions for doctors would let them specify the symptoms a patient has and exclude those the patient doesn't.

Other engines let users enter search terms that the system parses via NLP techniques to determine their meaning. The system then uses this information to search an index generated by parsing a collection of documents to find those relevant to the query. 
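The snippet below sketches that first parsing step with the spaCy NLP library, one of several toolkits that could fill this role; the article does not name a specific one, and the example assumes the small English model is installed.

# Sketch: parse a query with spaCy (pip install spacy, then
# python -m spacy download en_core_web_sm). The toolkit choice is illustrative.
import spacy

nlp = spacy.load("en_core_web_sm")
query = nlp("treatments for high blood pressure in elderly patients")

for token in query:
    # surface form, lemma, part of speech, and syntactic relation to its head
    print(token.text, token.lemma_, token.pos_, token.dep_, token.head.text)

# Noun chunks roughly correspond to the query's key concepts.
print([chunk.text for chunk in query.noun_chunks])

The extracted lemmas, dependencies, and noun phrases can then be matched against a semantic index like the one sketched earlier.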

Indexing. Some tools, such as Yahoo!'s Search Monkey and the University of Maryland, Baltimore County's Swoogle, require users to manually enter tags describing a document so that it can be indexed. Newer systems use NLP techniques to parse documents and automatically convert text into a complex semantic network representing the relationships among concepts in the material.

According to Cognition's Dahlgren, semantic search engines generate indexes in four primary ways.

Manual tag-based systems such as Search Monkey and Swoogle generally use RDF- and OWL-generated information about documents and text within documents to create indexes. Document creators manually add tags to their Web pages or to data within documents, and RDF-compatible semantic-search engines such as Search Monkey can then interpret those tags.

Statistical systems such as Autonomy use Bayesian or latent-semantic-indexing techniques to guess at the meanings of words in a query or document. These techniques analyze documents and statistically identify relationships between words and sets of words in documents, thereby improving semantic accuracy and indexing; a latent-semantic-indexing sketch follows these four approaches.

Ontology-based systems like those from Cataphora, Hakia, and Stratify organize language into an ontology. They use the ontology to automatically classify text in documents, e-mails, instant messages, and other sources into semantic categories, which are then used to generate the index and help with subsequent searches.

Linguistically based systems made by companies such as Cognition Technologies and Expert System use linguistic rules and mathematical associations of words to automatically parse the meaning of text in a document, Web page, or other source into an ontology or semantic network.
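To illustrate the statistical route, the sketch below builds a small latent semantic index with scikit-learn; the three-document corpus, the number of components, and the library choice are assumptions made for illustration.

# Latent semantic indexing sketch using scikit-learn (pip install scikit-learn).
# The tiny corpus and the number of components are illustrative assumptions.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "storage tanks hold drinking water for the city",
    "the army deployed tanks and armored vehicles",
    "reservoirs and water tanks supply the town",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

# Project term-document vectors into a low-dimensional "concept" space.
lsi = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = lsi.fit_transform(tfidf)

query_vector = lsi.transform(vectorizer.transform(["water tanks"]))
scores = cosine_similarity(query_vector, doc_vectors)[0]
print(max(zip(scores, docs)))   # prints the highest-scoring document

The documents and query land near one another in the reduced space when they share related vocabulary, which is how statistical systems approximate meaning without explicit tags.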

Obstacles

Despite its promise, semantic search faces numerous obstacles, such as higher software costs and increased processing and storage overhead.

Creating detailed semantic indexes entails up to 100 times the computational overhead of building traditional search indexes, and storing them uses 10 times as much hard-drive space, said Yaniv Golan, chief technology officer of semantic-search provider Yedda. This overhead also slows performance.

Adding semantic tags to documents makes more work for website authors. Organizations have been trying without much success to get authors to tag documents for decades, Dahlgren said. Wider adoption probably won't happen until there are systems that tag documents on the fly, she added. 

Semantic search must cope with the changing semantic landscape, marked by new words, the changing meanings of words, and changing associations among words. Semantic search systems will have to be adaptable enough to understand these changes quickly, said Golan.

Other challenges include the lack of perceived need for the technology by some potential users, user unfamiliarity with the approach, concern about it being new and untested, and lack of a proven business model.

Moving Ahead

Semantic-technology use is growing quickly, according to Lehigh University associate professor Jeff Heflin. About 4 billion pieces of data have been tagged via RDF and OWL, he added. This enables users to conduct semantic search on more documents.

In his research, Heflin, who directs Lehigh's Semantic Web and Agent Technologies Lab, is studying ways to make ontologies interoperate so that machine-based agents could generate answers to queries by synthesizing information from multiple sources. Ontologies frequently don't interoperate because authors use different categories and systems for describing their elements.

Now, said Yahoo!'s Mika, "Fully integrated semantic search engines such as Sindice and SWSE (Semantic Web search engine) are implementing the entire process from crawling to indexing, ranking, and visualization." Developers are experimenting with interfaces that allow a deeper understanding of content by creating maps, timelines, charts, and tables, he added.

The W3C has proposed RDFa (RDF in attributes), which would add semantics to the Web via extensions to XHTML (extensible HTML) that embed rich metadata about words, their meanings, and their context within documents.

Google has implemented a rich snippets feature for its search engine that uses RDFa tags created by website authors to better index documents. The engine will use RDFa to extract the meaning of words in a document to provide more detail in its query responses.

Companies such as Autonomy, Cognition Technologies, and Expert System are selling semantic tools to help companies and large organizations better index their data, which could encourage wider semantic-search use. 

Organizations are using semantic search to find information to help with activities such as knowledge management, intelligence about competitors, scientific and medical R&D, and self-service customer support, said Expert System CEO Brook Aker.

Semantic search has a long way to go to reach its potential, Heflin said. For example, said Mika, significant adoption of Semantic Web standards began only during the last two years.

Semantic search's adoption might be somewhat limited if the technology depends on manually, rather than automatically, generated tags, according to Aker. User expectations and the technology itself might be limited until sophisticated language-processing techniques become more widely used, he added. 

One of the most significant questions for the field is how the technology will compete with or complement search engines from the major providers such as Google. Dahlgren said the major players have built an infrastructure that would be difficult to change and thus have a vested interest in maintaining traditional search approaches. 

Semantic-search technology that automatically parses text will be popular for complex, specialized corporate, enterprise, or scientific uses but entails too much computational and storage overhead for general Web searches. Approaches that involve manual tagging create less overhead and thus will work better for general Web searches.

In the long run, Heflin predicted, semantic search will never replace traditional search because there will always be content that is difficult to represent semantically. Thus, he said, semantic search will become a complementary technology to traditional search approaches.

Nonetheless, Mika said, "I'm hopeful that we have managed to kick-start a positive cycle."

George Lawton is a freelance technology writer based in Monte Rio, California. Contact him at glawton@glawton.com.