Posts Tagged ‘search engine’

Search Engine Query Parsing Improved

Saturday, February 27th, 2010

An improved search engine query parser, and with it more accurate search results: that’s what you can expect from the latest updates.

I just had the privilege of spending a day programming instead of doing marketing. I used the time to improve the search engine query parser so that it finds long keyphrases in a query and removes redundant shorter phrases that are really just sub-phrases of one of the long phrases.

The problem with the old search engine query code was that it would sometimes omit common words even though they were essential to the query. It could omit “best” in the query “best hosting providers”, and that changes the meaning of the query.

It also split search engine queries into far too many sub-queries.

For example, the old query parser split a search for “Hotels in New York City” into “hotels”, “in”, “new”, “York”, “City”, “Hotels in”, “New York”, “in New York”, and “New York City”, which is a lot of redundant queries and may end up giving New York too much weight at the cost of Hotels.

With the new search engine query parser in place, the query for “Hotels in New York City” will omit “in”, “city”, “New York” and even “New York City”, as they are sub-phrases of the longer “in New York City”.
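To illustrate the idea of dropping phrases that are contained in a longer phrase, here is a minimal Python sketch. It is not the actual parser code: the function names `is_sub_phrase` and `drop_sub_phrases` and the candidate phrase list are assumptions made up for this example, and the sketch only covers the filtering step, not how candidate phrases are found in the first place.

```python
def is_sub_phrase(short, long):
    """Return True if the words of `short` occur as a contiguous run inside `long`.

    Comparison is case-insensitive. A phrase is not a sub-phrase of itself
    or of a phrase with the same number of words.
    """
    s = short.lower().split()
    l = long.lower().split()
    if len(s) >= len(l):
        return False
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def drop_sub_phrases(phrases):
    """Keep only phrases that are not contained in a longer phrase.

    Phrases are visited from longest to shortest, so checking against the
    phrases kept so far is enough. The original order is preserved.
    """
    kept = []
    for phrase in sorted(phrases, key=lambda p: len(p.split()), reverse=True):
        if not any(is_sub_phrase(phrase, k) for k in kept):
            kept.append(phrase)
    return [p for p in phrases if p in kept]


# Hypothetical candidate list for the query "Hotels in New York City".
candidates = [
    "hotels", "in", "new", "York", "City",
    "New York", "in New York", "New York City", "in New York City",
]
remaining = drop_sub_phrases(candidates)
print(remaining)
```

With this hypothetical candidate list, only “hotels” and “in New York City” survive, because every other candidate is a run of words inside “in New York City”.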

Not that the new query parser is perfect: it still fails to match pages that say “Location: New York City”, hotels in Manhattan, or an address in New York City.

The solution to this, of course, is the planned synonym logic that will make a search for “in New York City” also query alternatives like “in NYC”, “of New York City”, “in Manhattan”, etc.

It will be some time, though, before I have synonyms running, as a fair amount of processing power is needed to determine synonyms from the main index. I currently have one Linode working on it and I expect to be able to test the algorithm in a month or two.

Update: I found an interesting article from SEO by the Sea describing a Google patent on Search Engine Query Statistics. The huge volume of search queries that Google has in its logs lets them draw a lot of useful conclusions on word relationships and user behavior. I would love to have that much data in my query log :)
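As a rough illustration of what such an expansion step could look like, here is a small Python sketch. The synonym logic is only planned, so everything here is an assumption: the `SYNONYMS` table is hand-written for the example (the real synonyms are meant to be derived from the main index) and `expand_phrase` is a hypothetical helper name.

```python
# Hand-written, hypothetical synonym table; keys are lowercased phrases.
SYNONYMS = {
    "in new york city": ["in nyc", "of new york city", "in manhattan"],
}


def expand_phrase(phrase, synonyms=SYNONYMS):
    """Return the normalized phrase followed by any known alternatives."""
    key = " ".join(phrase.lower().split())  # lowercase, collapse whitespace
    return [key] + synonyms.get(key, [])


alternatives = expand_phrase("in New York City")
print(alternatives)
```

A phrase with no entry in the table is simply returned on its own, so the search falls back to the original query.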

Please leave your comment and tell me what you think about the new search engine query parser, and please also leave a comment if you notice any interesting phenomena with the new search query analyzer :)

Simon
Secret Search Engine Labs

Thanks to Server Upgrade We Can Now Index Up To 300,000 pages

Wednesday, December 16th, 2009

A couple of days ago we upgraded our VPS server from a Linode 360 to a Linode 720, which means we have now doubled our server resources, and you should see a steady growth in index size during the coming month.

In reality, even though we doubled the server resources, the indexing speed has, according to the stats, increased from about 6,000 pages per day to around 10,000 pages per day. I believe the reason we don’t see a doubling of indexing speed is that we still run the indexer on a single thread, which means we spend too much time waiting for network transfers and disk access instead of keeping the CPU busy.

Which just makes it that much more important to get the multi-threaded indexer implemented… just need to get some marketing done first.
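To show why threads help when the work is mostly waiting, here is a minimal Python sketch. It is not the indexer itself: `fetch_page` is a stand-in that only sleeps to simulate network and disk latency, the URLs are made up, and the worker count of 8 is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_page(url):
    """Stand-in for a network fetch: sleeps to simulate waiting on I/O."""
    time.sleep(0.05)
    return f"<html>content of {url}</html>"


def index_page(url):
    """Fetch one page and return (url, word_count) as a toy index entry."""
    html = fetch_page(url)
    return url, len(html.split())


def index_serial(urls):
    """Single-threaded: each fetch must finish before the next one starts."""
    return [index_page(u) for u in urls]


def index_threaded(urls, workers=8):
    """Multi-threaded: several fetches wait on I/O at the same time.

    ThreadPoolExecutor.map returns results in the same order as the input.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(index_page, urls))


urls = [f"http://example.com/page{i}" for i in range(16)]

start = time.perf_counter()
serial_result = index_serial(urls)
serial_seconds = time.perf_counter() - start

start = time.perf_counter()
threaded_result = index_threaded(urls)
threaded_seconds = time.perf_counter() - start

print(f"serial: {serial_seconds:.2f}s, threaded: {threaded_seconds:.2f}s")
```

Both versions produce the same entries, but the threaded one overlaps the waiting, so it should finish noticeably sooner whenever fetch time dominates. How much a real indexer gains depends on how much of its time goes to waiting versus actual CPU work.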

The migration to the new server went really smoothly. With the system that Linode uses, you just select the upgrade you want from the online interface, and once you have clicked the “Yes” button they instantly and automatically shut down the site and copy everything to the new, bigger system. It all took 12 minutes of downtime plus a couple of minutes to reconfigure MySQL for the larger memory size.

If you’re looking to get a VPS server that is easy to upgrade, Linode is a great alternative, and you can support this search engine by signing up for a Linode using this link.

Linode will support us with $20 for every one of you that stays a Linode customer for more than 90 days.

Please tell me what you think about the quality of the search results by leaving a comment below!

Simon Byholm
CEO and founder,
Secret Search Engine Labs