Search Engine Query, Parsing Improved

Saturday, February 27th, 2010

An improved search engine query parser, and with it more accurate search results: that’s what you can expect from the latest updates.

I just had the privilege of spending a day programming instead of doing marketing. I used the time to improve the search engine query parser so that it finds long key phrases in a query and removes redundant shorter phrases that are really just sub-phrases of one of the long phrases.

The problem with the old search engine query code was that it would sometimes omit common words even though they were essential to the search engine query. For example, it could omit “best” from the query “best hosting providers”, which changes the meaning of the query.

It also split search engine queries into far too many sub-queries.

For example, the old query parser split a search for “Hotels in New York City” into “hotels”, “in”, “new”, “York”, “City”, “Hotels in”, “New York”, “in New York”, and “New York City”. That is a lot of redundant queries, and it may end up giving New York too much weight at the expense of Hotels.

With the new search engine query parser in place, the query for “Hotels in New York City” will omit “in”, “city”, “New York” and even “New York City”, as they are sub-phrases of the longer “in New York City”.
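The sub-phrase removal can be sketched in a few lines of Python. This is only an illustration, not the actual parser code: the function names `is_subphrase` and `prune_subphrases` and the candidate list are assumptions made up for the example, and the sketch treats a phrase as redundant when its words appear contiguously inside a longer phrase that has already been kept.

```python
def is_subphrase(short, long):
    """Return True if the words of `short` appear contiguously inside `long`."""
    s = short.lower().split()
    l = long.lower().split()
    if len(s) >= len(l):
        return False
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def prune_subphrases(phrases):
    """Keep only phrases that are not sub-phrases of a longer kept phrase."""
    kept = []
    # Visit the longest phrases first so shorter ones are checked against them.
    for phrase in sorted(phrases, key=lambda p: len(p.split()), reverse=True):
        if not any(is_subphrase(phrase, longer) for longer in kept):
            kept.append(phrase)
    return kept


# Hypothetical candidate phrases for the query "Hotels in New York City".
candidates = [
    "hotels", "in", "new", "york", "city",
    "new york", "in new york", "new york city", "in new york city",
]
print(prune_subphrases(candidates))  # ['in new york city', 'hotels']
```

With these candidates only “in new york city” and “hotels” survive, so New York is counted once instead of several times.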

Not that the new query parser is perfect. A search for “Hotels in New York City” still fails to also search for “Location: New York City”, “Hotels in Manhattan” or an address in New York City.

The solution to this, of course, is the planned synonym logic that will make a search for “in New York City” also query alternatives like “in NYC”, “of New York City”, “in Manhattan” etc.
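As a rough idea of what synonym expansion could look like once a synonym table exists, here is a minimal Python sketch. Everything in it is an assumption for illustration: the hand-written `SYNONYMS` table stands in for synonyms that would really be derived from the main index, and `expand_query` is a hypothetical helper, not part of the search engine.

```python
# Hypothetical synonym table; in practice it would be computed from the index.
SYNONYMS = {
    "in new york city": ["in nyc", "of new york city", "in manhattan"],
}


def expand_query(phrases, synonyms=SYNONYMS):
    """Return the phrases plus any known alternatives, without duplicates."""
    expanded = []
    for phrase in phrases:
        if phrase not in expanded:
            expanded.append(phrase)
        # Look up alternatives case-insensitively; unknown phrases add nothing.
        for alternative in synonyms.get(phrase.lower(), []):
            if alternative not in expanded:
                expanded.append(alternative)
    return expanded


print(expand_query(["hotels", "in New York City"]))
```

Phrases without an entry in the table pass through unchanged, so the expansion only ever adds alternatives to a query.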

It will be some time, though, before I have synonyms running, as a fair amount of processing power is needed to determine synonyms from the main index. I currently have one Linode working on it and I expect to be able to test the algorithm in a month or two.

Update: I found an interesting article from SEO by the Sea describing a Google patent on Search Engine Query Statistics. The huge volume of search queries that Google has in its logs lets them draw a lot of useful conclusions about word relationships and user behavior. I would love to have that much data in my query log :)

Please leave your comment and tell me what you think about the new search engine query parser, and please also leave a comment if you notice any interesting phenomena with the new search query analyzer :)

Simon
Secret Search Engine Labs