ITPRO

Printed from www.itpro.co.uk

    Why enterprise search is not internet search

We explain why you can’t always get the best search results for your business from Google.

By Mary Branscombe, 4 Dec 2008 at 14:55

“A sophisticated enterprise search tool doesn’t only deal with the document, but it extends what we know about a document much further than you can on the web. Who it was written for, where it was published, who was the editor, who quotes from it. Even questions like 'have we worked with them in the past', 'were they under contract', and if so, 'was it delivered on time and how much were they paid'? This is far more sophisticated knowledge than just file metadata.”

Recommind uses a technique called Probabilistic Latent Semantic Analysis (PLSA), which builds statistical models from your documents.

“It will look at the language and derive meanings, themes and concepts from within content, then relate them to similar concepts in a different batch of documents,” says Carpenter.
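The article doesn’t describe Recommind’s own implementation. As a minimal sketch of the textbook technique (PLSA fitted by expectation-maximisation), the code models each document as a mixture of hidden topics and each topic as a distribution over words. The four-document corpus and the choice of two topics are hypothetical, picked only to show documents on the same theme ending up under the same latent topic.

```python
import random
from collections import Counter


def plsa(docs, n_topics, iterations=100, seed=0):
    """Fit Probabilistic Latent Semantic Analysis with expectation-maximisation.

    docs is a list of token lists. Returns (p_w_z, p_z_d, vocab) where
    p_w_z[z][i] approximates P(vocab[i] | topic z) and p_z_d[d][z]
    approximates P(topic z | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = [Counter(doc) for doc in docs]  # n(d, w): word counts per document

    def normalise(row):
        total = sum(row) or 1.0  # guard against an all-zero row
        return [x / total for x in row]

    # Random starting distributions break the symmetry between topics.
    p_w_z = [normalise([rng.random() for _ in vocab]) for _ in range(n_topics)]
    p_z_d = [normalise([rng.random() for _ in range(n_topics)]) for _ in docs]

    for _ in range(iterations):
        acc_w_z = [[0.0] * len(vocab) for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in docs]
        for d, counter in enumerate(counts):
            for word, n in counter.items():
                i = index[word]
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][i] for z in range(n_topics)]
                total = sum(joint) or 1.0
                for z in range(n_topics):
                    # M-step accumulators: expected counts n(d, w) * P(z | d, w).
                    expected = n * joint[z] / total
                    acc_w_z[z][i] += expected
                    acc_z_d[d][z] += expected
        p_w_z = [normalise(row) for row in acc_w_z]
        p_z_d = [normalise(row) for row in acc_z_d]
    return p_w_z, p_z_d, vocab


# Hypothetical toy corpus: two legal-sounding and two finance-sounding documents.
docs = [
    "contract clause liability contract".split(),
    "liability clause indemnity contract".split(),
    "invoice payment revenue invoice".split(),
    "revenue payment budget invoice".split(),
]
p_w_z, p_z_d, vocab = plsa(docs, n_topics=2)
for z, dist in enumerate(p_w_z):
    top = sorted(zip(dist, vocab), reverse=True)[:3]
    print(f"topic {z}:", [w for _, w in top])
```

The number of topics is a parameter you choose up front, and the random start means a different seed may number the topics differently. What matters is that documents sharing a theme share a dominant topic, which is what lets a system relate a document to similar concepts elsewhere even when the exact wording differs.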

Autonomy uses Bayesian statistical models, similar to those used to filter spam, to determine the categories of documents. FAST uses a semantic index that can restrict the scope of a concept to a sentence or paragraph to get a more accurate answer. For example, a document might talk about both orange (the fruit) and orange (the colour), but a paragraph is more likely to be about one or the other. It also extracts ‘entities’ like names, phone numbers, addresses and companies.
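Neither vendor’s models are described here, so as a stand-in this sketch uses a multinomial naive Bayes classifier, the textbook Bayesian approach behind many spam filters. It is trained on a few hypothetical snippets for each sense of “orange”, then applied once to a whole document and once to each paragraph separately, to illustrate why narrowing the scope helps.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training examples
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus summed log likelihoods (logs avoid numeric underflow).
            score = math.log(self.doc_counts[label] / total_docs)
            for w in words:
                # Smoothing stops an unseen word from zeroing out a whole class.
                score += math.log((counts[w] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training snippets for the two senses of "orange".
model = NaiveBayes()
for text in ["orange juice is squeezed from a ripe fruit",
             "peel the orange and eat the sweet segments",
             "citrus fruit such as orange is rich in vitamin c"]:
    model.train(text, "fruit")
for text in ["the walls were painted a bright orange shade",
             "orange paint mixes red and yellow pigment",
             "a dark orange colour on the logo"]:
    model.train(text, "colour")

document = ("We bought a crate of ripe orange fruit and squeezed the juice.\n\n"
            "The new logo uses a bright orange colour with yellow paint accents.")

print("whole document:", model.classify(document))  # -> whole document: colour
for paragraph in document.split("\n\n"):
    print(model.classify(paragraph), "<-", paragraph)
# -> fruit <- We bought a crate of ripe orange fruit and squeezed the juice.
# -> colour <- The new logo uses a bright orange colour with yellow paint accents.
```

Scored as one unit, the document gets a single “colour” label and the fruit paragraph is lost; scored paragraph by paragraph, each one gets the sense it is actually about.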

Microsoft does face the challenge of integrating FAST with SharePoint, but the core Enterprise Search Platform remains popular. Future Microsoft products may also use technology from another acquisition, Powerset. This startup uses natural language processing to extract concepts from documents so that it can build a semantic index and a conceptual graph of the relationships between them. The Powerset site (and iPhone app) searches Wikipedia, but the approach should translate to other documents.
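Powerset’s natural language processing is far more sophisticated than anything that fits here. As a deliberately crude stand-in, this sketch links concepts that appear in the same sentence and then walks the resulting graph to find related concepts. The concept list is hypothetical and supplied by hand, where a real system would extract it; the sample sentences paraphrase this article.

```python
import re
from collections import defaultdict
from itertools import combinations


def concept_graph(text, concepts):
    """Link concepts that co-occur in a sentence; edge weight = number of such sentences."""
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        # Whole-word matches only, so "fast" does not match inside "breakfast".
        found = sorted(c for c in concepts
                       if re.search(r"\b" + re.escape(c) + r"\b", lowered))
        for a, b in combinations(found, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def related(graph, start, max_hops):
    """Breadth-first walk: every concept reachable from start within max_hops edges."""
    seen, frontier = {start}, [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbour in graph.get(node, {}):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen - {start}


text = ("Microsoft acquired FAST. FAST is being integrated with SharePoint. "
        "Microsoft also acquired Powerset. Powerset searches Wikipedia.")
concepts = ["microsoft", "fast", "sharepoint", "powerset", "wikipedia"]
graph = concept_graph(text, concepts)
print(dict(graph["microsoft"]))          # -> {'fast': 1, 'powerset': 1}
print(sorted(related(graph, "sharepoint", 2)))  # -> ['fast', 'microsoft']
```

Even this simple graph lets a search for SharePoint surface Microsoft, which never appears in the same sentence, by following relationships rather than matching words.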

Although Google is experimenting with machine learning in its Google Sets feature, the results you can see on Google’s public site today are not sophisticated: a list of animals is currently likely to include terms like wood and plastic, because they’re found on web pages dealing with toy animals. The web is a rich source of training data for machine learning, though, so Google may well improve its classification in the future.
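Google hasn’t published how Sets works, so this is not its algorithm. A simple co-occurrence heuristic is enough to show how the toy-animal problem can arise: candidate terms score points for appearing on the same page as the seed terms. The pages are hypothetical and reduced to the set of terms each one lists.

```python
from collections import Counter


def expand_set(seeds, pages, top_n=5):
    """Rank terms by how strongly they co-occur with the seed terms across pages.

    Each page is a set of terms; a term scores one point per seed it shares a page with.
    """
    seeds = set(seeds)
    scores = Counter()
    for page in pages:
        overlap = len(seeds & page)
        if overlap == 0:
            continue  # the page says nothing about the seeds
        for term in page - seeds:
            scores[term] += overlap
    # Highest score first; ties broken alphabetically so the output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical pages, reduced to the set of terms each one lists.
pages = [
    {"lion", "tiger", "elephant", "giraffe", "zebra"},  # a zoo guide
    {"lion", "elephant", "wood", "plastic"},            # a toy shop listing
    {"tiger", "elephant", "wood", "plastic", "paint"},  # another toy listing
    {"zebra", "giraffe", "savannah"},                   # no seed terms at all
]
print(expand_set(["lion", "tiger", "elephant"], pages))
# -> [('plastic', 4), ('wood', 4), ('giraffe', 3), ('zebra', 3), ('paint', 2)]
```

Two toy listings are enough to push materials above real animals, because the heuristic counts co-occurrence without understanding what kind of thing each term is.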

More than the web

If you run the same query on the public web via Google and on your internal corporate systems, you should get very different results, because you want what your company knows about the topic, not just what’s common knowledge.

Enterprise search needs to integrate with document repositories, corporate databases, ERP and CRM systems, email, call centre and customer support systems, directories like LDAP and Active Directory, your HR and accounting systems and everywhere else you store business information. And it needs to find experts as well as the documents they’ve written.
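A real deployment needs a connector and security rules for each system, none of which are shown here. As a minimal sketch under those simplifying assumptions, the code treats each source as a hypothetical in-memory list of records, runs one query across all of them, and ranks people as likely experts by how many matching records they wrote. The `Record` type, the source names and the authorship-count heuristic are all illustrative choices, not any product’s design.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    source: str   # which system the record came from
    title: str
    author: str
    text: str


def tokens(s):
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def federated_search(query, sources):
    """Return records from every source whose title or body contains all query terms."""
    terms = tokens(query)
    hits = []
    for records in sources.values():
        for record in records:
            if terms <= tokens(record.title + " " + record.text):
                hits.append(record)
    return hits


def find_experts(hits):
    """Rank people by how many matching records they wrote."""
    return Counter(record.author for record in hits).most_common()


# Hypothetical in-memory stand-ins for a document store, email and a CRM system.
sources = {
    "documents": [
        Record("documents", "Data centre cooling review", "Alice",
               "Options for data centre cooling and power"),
        Record("documents", "Holiday rota", "Bob", "Staff holiday planning"),
    ],
    "email": [
        Record("email", "Re: cooling quote", "Alice",
               "Supplier quote for data centre cooling"),
    ],
    "crm": [
        Record("crm", "Acme account notes", "Carol",
               "Acme asked about data centre cooling contracts"),
    ],
}
hits = federated_search("data centre cooling", sources)
for record in hits:
    print(record.source, "|", record.title)
print(find_experts(hits))  # -> [('Alice', 2), ('Carol', 1)]
```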

InQuira offers both enterprise search and customer support systems that are used by Avis, Honda and Apple.

Serena Software is another customer. Its chief executive, Jeremy Burton, also sits on InQuira’s board of directors, and what he values is not just the natural language search engine but the ability to correlate information from different systems.
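The article doesn’t say how InQuira correlates data. As a hypothetical sketch, joining records from two separate systems on a shared supplier ID is enough to answer the kind of questions quoted earlier: have we worked with them before, were they under contract, how much were they paid and was the work delivered on time. The supplier IDs, field names and figures are invented for illustration.

```python
from datetime import date

# Hypothetical records from two separate systems, keyed by the same supplier ID.
CONTRACTS = {  # e.g. from an accounting system
    "acme": {"under_contract": True, "paid_gbp": 12000},
    "globex": {"under_contract": False, "paid_gbp": 0},
}
DELIVERIES = [  # e.g. from a project-tracking system
    {"supplier": "acme", "project": "Intranet redesign",
     "due": date(2008, 6, 1), "delivered": date(2008, 5, 28)},
    {"supplier": "acme", "project": "Search rollout",
     "due": date(2008, 9, 1), "delivered": date(2008, 9, 15)},
]


def supplier_profile(supplier_id):
    """Combine contract and delivery records into one view of a supplier."""
    contract = CONTRACTS.get(supplier_id, {"under_contract": False, "paid_gbp": 0})
    jobs = [d for d in DELIVERIES if d["supplier"] == supplier_id]
    return {
        "supplier": supplier_id,
        "worked_with_before": bool(jobs),
        "under_contract": contract["under_contract"],
        "paid_gbp": contract["paid_gbp"],
        # One flag per project: delivered on or before the due date?
        "on_time": [d["delivered"] <= d["due"] for d in jobs],
    }


print(supplier_profile("acme"))
print(supplier_profile("globex"))
```

Neither system can answer these questions on its own; the value comes from the shared key that ties the records together.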
