“Semantic” is a word with a magic ring to it in search engine circles. The way it’s hyped makes you suspect it is the second coming of search. Hype makes me skeptical, so I have been biding my time, waiting for the technology to mature. Now the time has come and I’m glad to present the top 5 semantic search engines.
What is semantic search?
A semantic search engine attempts to make sense of search results based on context. It automatically identifies the concepts that structure the texts. For instance, if you search for “election”, a semantic search engine might retrieve documents containing the words “vote”, “campaigning” and “ballot”, even if the word “election” is not found in the source document.
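To make the “election” example concrete, here is a minimal Python sketch of concept-based matching. It assumes a tiny hand-written concept map (the CONCEPTS table and both function names are made up for illustration); a real semantic engine would derive such relations from text analysis rather than a lookup table.

```python
# Hypothetical, hand-written concept map. A real engine would learn
# these relations from large amounts of text instead.
CONCEPTS = {
    "election": {"election", "vote", "campaigning", "ballot"},
}

def tokenize(text):
    """Lowercase the text and split it into a set of bare words."""
    return {word.strip(".,;:!?\"'()").lower() for word in text.split()}

def semantic_match(query, documents):
    """Return the documents that share at least one term with the query's concept."""
    terms = set()
    for word in tokenize(query):
        # Fall back to the literal word when no concept is known for it.
        terms |= CONCEPTS.get(word, {word})
    return [doc for doc in documents if terms & tokenize(doc)]

docs = [
    "Every ballot was counted by hand.",
    "The jaguar is a big cat.",
    "Campaigning ended a day before the vote.",
]
# Neither matching document contains the word "election" itself.
print(semantic_match("election", docs))
```

A plain keyword search for “election” would return nothing for these three documents; the concept map is what brings back the first and the third.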
An important part of this process is disambiguation, both of the queries and of the content on the web. What this means is that the search engine — through natural language processing — will know whether you are looking for a car or a big cat when you search for “jaguar”.
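The “jaguar” case can be sketched with a simple context-overlap heuristic: pick the sense whose cue words appear most often in the query. This is only an illustration. The SENSES table and the disambiguate function are hypothetical, and the natural language processing in the engines below is certainly more involved than this.

```python
# Hypothetical sense inventory: each sense of "jaguar" with a few cue words.
SENSES = {
    "car": {"engine", "drive", "dealer", "sedan", "price"},
    "big cat": {"jungle", "prey", "habitat", "spots", "species"},
}

def disambiguate(query):
    """Return the sense whose cue words overlap most with the query, or None."""
    words = set(query.lower().split())
    scores = {sense: len(cues & words) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # With no overlapping cue words there is nothing to decide on.
    return best if scores[best] > 0 else None

print(disambiguate("jaguar sedan price"))       # car
print(disambiguate("jaguar habitat and prey"))  # big cat
print(disambiguate("jaguar"))                   # None
```

The last call shows why context matters: with the bare word “jaguar” the heuristic has nothing to go on and declines to guess.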
The five search engines below all use semantic analysis to sift through and present data. But, as you will see, they do not do this in the same way and present five different products.
When to use semantic search engines
Semantic search has the power to enhance traditional web search, but it will not replace it. A large portion of queries are navigational and semantic search is not a replacement for these. Research queries, on the other hand, will benefit from semantic search.
Read on to see our list of the top 5 semantic search engines and learn how they can improve your search experience.
Hakia
Hakia is a general-purpose semantic search engine, as opposed to e.g. Powerset and Cognition (below), which search structured corpora (text) like Wikipedia.
Hakia search results are organized in tabs: web results, credible sites, images and news. Credible sites are results from sites that have been vetted by librarians and other information professionals invited by Hakia to identify credible web sites.
For some queries (typically popular queries and queries where there is little ambiguity), Hakia produces resumes. These are portals to all kinds of information on the subject. Every resume has an index of links to the information presented on the page for quick reference.
The elements of these resumes vary according to the nature of the query (e.g. biography, bibliography and timeline for persons; government, economy and culture for countries). Resumes are excellent for researching a topic and are my favorite Hakia feature.
Often, Hakia will propose related queries, which is also great for research.
For instance, if I search for Barack Obama, Hakia suggests I might be interested in information about Michelle Obama, Hillary Clinton, Democrats, Sarah Palin, John McCain, John Sununu and Joseph R. Biden Jr. as well.
For some queries Hakia presents really poor results, but it is still in beta and is improving rapidly.
SenseBot
SenseBot is a web search engine that summarizes search results into one concise digest on the topic of your query. The search engine attempts to understand what the result pages are about. For this purpose it uses text mining to analyze Web pages and identify their key semantic concepts.
This way SenseBot helps you get a better grasp of what the term you searched for is about, without having to go through a large number of web pages and comb through results with incomprehensible expert definitions (or no definitions at all).
The summary serves as a digest on the topic of your query, blending together the most significant and relevant aspects of the search results. It contains a tag cloud, relating your query to other relevant concepts and a list of sentences believed to define or describe your query. Each sentence is followed by a link to the source.
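The shape of such a digest can be sketched in a few lines of Python. This is not SenseBot’s text mining, only a word-counting stand-in: the digest function, the STOPWORDS list and the example pages are all assumptions made for illustration.

```python
from collections import Counter

# Small, hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "that", "it", "by"}

def digest(query, pages, top_n=3):
    """Return (tag_cloud, sentences) for a query.

    tag_cloud: the most frequent non-stopword terms across all pages.
    sentences: (sentence, source) pairs for sentences that mention the query.
    """
    query = query.lower()
    counts = Counter()
    sentences = []
    for source, text in pages.items():
        for sentence in text.split("."):
            sentence = sentence.strip()
            if not sentence:
                continue
            words = [w.strip(",;:").lower() for w in sentence.split()]
            counts.update(w for w in words if w and w not in STOPWORDS and w != query)
            if query in words:
                # Keep the source so each sentence can link back to it.
                sentences.append((sentence, source))
    tag_cloud = [term for term, _ in counts.most_common(top_n)]
    return tag_cloud, sentences

# Made-up result pages keyed by a placeholder source name.
pages = {
    "site-a.example": "A ballot is a device used to cast votes in an election. Ballots may be paper.",
    "site-b.example": "An election is a formal group decision. Voters cast votes by ballot.",
}
tags, sentences = digest("election", pages)
print(tags)
for sentence, source in sentences:
    print(f"{sentence} [{source}]")
```

Counting raw word frequency is a crude proxy for “key semantic concepts”, which goes some way to explaining why real summaries of this kind are not always intelligible.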
Not all of the summaries are informative or even intelligible, but that is likely to improve: like Hakia, SenseBot is in beta. This is bleeding-edge technology, and it is evolving as we speak.
Read our review of SenseBot.
Powerset
Powerset is at present not a regular web search engine. It works best on smaller, relatively structured corpora.
The technology offers a comprehensive view of such information. You can test it on Wikipedia, where Powerset definitely excels: it structures the information and presents it in a way that, for research purposes, is a great improvement on Wikipedia’s own search engine.
You can enter keywords, phrases, or simple questions in the search box. On the search results page, Powerset often answers questions directly. My favorite feature is the way it aggregates information from across multiple articles.
“Factz” is a box that often appears in the search results and is a set of suggestions for reference queries based on the information available. For instance, when I search for Obama, Powerset offers links to information on what Obama has said about Robert Gates, Middle East, Pakistan, trade and more. Clicking one of these links brings up a box in the search results page with the actual words said by Obama and links to the articles in which the quotes appeared.
DeepDyve
DeepDyve is a powerful, professional research tool available for free to the general public.
It is a research engine that lets you access expert content from the “Deep Web”, the part of the Internet that is not indexed by traditional search engines (e.g. databases, journals etc.).
Researchers, students, technical professionals, business users, and other information consumers can search Wikipedia or deep web resources within these categories: Life Sciences and Medical, Physical Sciences, Humanities and Social Sciences, Business and Finance, Patents, Legal, Clean Technology and Energy, IT and Engineering.
Research sites’ search engines often rely on Boolean query languages or hard-coded taxonomies, which raises the barrier to entry and makes them hard to use (or even inaccessible) for anyone but insiders. DeepDyve offers an advanced yet easy interface to these valuable sources of information.
Your query can consist of anything from a single word to 25,000 characters. The search results are presented in a complex manner with many advanced options for refining, sorting or saving your search. Despite the complexity, the search results are relatively easy to navigate.
Cognition
Cognition has a search business based on a semantic map, built over the past 24 years, which the company claims is the most comprehensive and complete map of the English language available today. It is used in support of business analytics, machine translation, document search, context search, and much more.
You can use Cognition’s technology to search one of four bodies of information:
• Public.Resource.org (currently 1,858 volumes consisting of 675,704 files of federal case law in XHTML format). The release comprises US Supreme Court Decisions and Court of Appeals decisions from 1950 on.
• MEDLINE (Medical Literature Analysis and Retrieval System Online) Abstracts: abstracts for life sciences and biomedical information from an international literature database. It covers the fields of medicine, nursing, pharmacy, dentistry, veterinary medicine, and health care, as well as fields with no direct medical connection, such as molecular evolution (currently 18,005,903 files).
• The English version of Wikipedia
• The complete New English Translation including text and translator notes of the Gospels of Matthew, Luke, John and Mark.
We tested Cognition on Wikipedia. On this huge volume of text, Cognition is especially useful for sorting out meaning in complex queries:
• Phrases like “historical houses of worship & historical temples”
• Meaning: “worker on strike” vs. “strike oil in California”
• Classes like “Indian tribes of Latin America” or “diseases of North American trees”
The technology that goes into solving queries like these is impressive, and Cognition gives you valuable control over the assignment of meanings and classes in a user-friendly way.
The presentation of the search results is less than perfect, though, and I wish the Cognition team would learn from Hakia or Powerset in this regard.
Thursday, July 15, 2010