Ideas for Better Search Results

April 24th, 2010

Making a good search algorithm that can’t be gamed by greedy marketers and other SEO scumbags like myself is not easy. Many doubt it’s even possible.

Just look at Google, the biggest search engine on the planet. Many of the top results are there because they were SEOed to be there. If Google can’t make unbiased search results, then who can?

I just read a great blog post by Jonathan Leger in which he asks his readers for ideas on how to make a better search algorithm. I'm going to list the ideas I think might be useful, plus a few of my own that I came up with while reading.

Votes from web 2.0 sites

Using votes from social sites like Digg, Twitter or StumbleUpon to help rank search results was suggested, and I think it's a good idea. It's the vote of the people, and it reflects popularity.

The main problem I see is that you can only trust stories with hundreds of votes, as low vote counts are often manipulated by marketers.
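As a minimal sketch of how a vote signal could respect that threshold, the function below ignores counts under a cutoff and damps larger counts logarithmically. The cutoff of 200 votes, the log scaling and the name `social_vote_score` are hypothetical choices for illustration, not tuned values.

```python
import math

MIN_VOTES = 200  # hypothetical cutoff: below this, counts are too easy to manipulate


def social_vote_score(votes: int, min_votes: int = MIN_VOTES) -> float:
    """Return a ranking boost from a social vote count.

    Counts under min_votes are ignored entirely. Above the cutoff the boost
    grows logarithmically, so 2,000 votes is worth 2.0 rather than 10x the
    boost of 200 votes.
    """
    if votes < min_votes:
        return 0.0
    return math.log10(votes / min_votes) + 1.0
```

The log damping is one way to keep a single viral story from dominating a whole results page.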

Also, the people who vote are not always the same people you are making search results for. I, for one, do very little voting and a lot of searching, so the votes cast might not be representative of what I want to find.

Bounce Rate

The bounce rate is the percentage of visitors who click on a search result and then immediately hit the back button when they realize the site was not what they were looking for.

This is a factor already used by Google, and I think it's a good indicator of whether a search result is actually useful to those searching.

Then again, this is nothing new, but it is something to keep in mind when I make further algorithm changes.

Michael from Better ClickBank Analytics noted that the bounce rate can be manipulated using botnets, with automated scripts doing searches and clicking on search results. This would require a large number of IP addresses to work, though, since it's easy to count each IP address only once.
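Here is a minimal sketch of a bounce rate that counts each IP address only once per result, assuming each click is logged with the visitor's IP, the clicked URL and how long the visitor stayed before returning. The 10-second bounce threshold, the `Click` record and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

BOUNCE_SECONDS = 10.0  # hypothetical: coming back faster than this counts as a bounce


@dataclass
class Click:
    ip: str
    url: str
    seconds_on_site: float


def bounce_rates(clicks: Iterable[Click], threshold: float = BOUNCE_SECONDS) -> Dict[str, float]:
    """Bounce rate per result URL, counting only the first click from each IP per URL."""
    seen: Set[Tuple[str, str]] = set()
    visits: Dict[str, int] = {}
    bounces: Dict[str, int] = {}
    for c in clicks:
        key = (c.ip, c.url)
        if key in seen:
            continue  # repeated clicks from the same IP on the same URL are ignored
        seen.add(key)
        visits[c.url] = visits.get(c.url, 0) + 1
        if c.seconds_on_site < threshold:
            bounces[c.url] = bounces.get(c.url, 0) + 1
    return {url: bounces.get(url, 0) / n for url, n in visits.items()}
```

With this deduplication, a script that hammers one result from a single IP only moves that result's bounce rate as much as one real visitor would.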

Content

A lot of people suggested that great content should be the determining factor, and I agree. I just don't know of any way to determine whether content is great or not; it's really a matter of taste and of how well the content matches the demographic.

Bounce rate, inbound links and social votes are all ways to find great content without a computer that can actually review a site and determine whether it's good or not.

Link activity

I don’t know if the big engines are doing this yet, but what if you let the browser toolbar measure how often links on a specific web page actually get clicked? This way you can rank different links according to how prominently they are placed on the page and how relevant they are to the page theme.

If the footer is stuffed with keyword-rich links, nobody is going to see them or click on them, and they will be discounted. The same goes for hidden links and plain off-topic ads.
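Here is a minimal sketch of that idea, assuming a hypothetical toolbar report that gives, for one page, its number of views and the click count for each outbound link on it. Each link's share of the page's link value is proportional to its click-through rate, and links below a floor (here 0.1% of views, an arbitrary choice) get nothing. The function name and data format are illustrative.

```python
from typing import Dict


def link_weights(page_views: int, link_clicks: Dict[str, int], floor: float = 0.001) -> Dict[str, float]:
    """Share of a page's link value passed to each outbound link.

    Weight is proportional to click-through rate (clicks / page views).
    Links clicked on fewer than `floor` of page views (stuffed footers,
    hidden links, off-topic ads) get zero weight.
    """
    if page_views <= 0:
        return {url: 0.0 for url in link_clicks}
    ctr = {url: clicks / page_views for url, clicks in link_clicks.items()}
    useful = {url: rate for url, rate in ctr.items() if rate >= floor}
    total = sum(useful.values())
    return {
        url: (useful[url] / total if url in useful and total > 0 else 0.0)
        for url in link_clicks
    }
```

For example, with 10,000 views, a link clicked 300 times gets three times the weight of one clicked 100 times, and a footer link clicked twice is dropped.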

I use a variation of this in my CashRank algorithm, where I only count the first x links on a page, with x depending on how much CashRank the page has. The most important links are usually placed first, so you get a coarse emulation of actual link popularity.
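The post doesn't say how x is derived from CashRank, so the formula in this sketch (a base of 10 links plus 10 more per order of magnitude of CashRank) is purely hypothetical. It only shows the "first x links, x growing with page strength" shape of the idea.

```python
import math
from typing import List


def countable_links(links: List[str], cashrank: float, base: int = 10, per_unit: int = 10) -> List[str]:
    """Keep only the first x links on a page, in document order.

    x grows logarithmically with the page's CashRank. This scaling is a
    hypothetical stand-in, not the actual CashRank rule.
    """
    x = base + int(per_unit * math.log10(1 + max(cashrank, 0.0)))
    return links[:x]
```

Because the list is truncated in document order, links near the top of the page, where the important ones usually sit, are the ones that survive.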

A better way would be to actually render the page in a browser and measure where on the page the link is and how big it is, to determine the likelihood of it actually being clicked.

The key theme here is to improve classic link relevance by giving different links on the same page different weights based on how valuable the screen and page real estate they occupy is.
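Here is a minimal sketch of such a placement weight, assuming the page has already been rendered and each link's bounding box is known in pixels. The 600px fold, the halving per screen scrolled and the "one full-width, 20px-tall line" size reference are all hypothetical parameters chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float
    y: float       # distance from the top of the page, in pixels
    width: float
    height: float


def placement_weight(box: Box, page_width: float, fold: float = 600.0) -> float:
    """Estimate how likely a link is to be clicked from where it is and how big it is.

    Position: links above the fold get 1.0; at the fold the weight drops to
    0.5 and then halves again for every further screen height down the page.
    Size: the link's area relative to one full-width, 20px-tall line, capped at 1.0.
    """
    if box.y < fold:
        position = 1.0
    else:
        screens_down = (box.y - fold) / fold
        position = 0.5 ** (1 + screens_down)
    size = min(1.0, (box.width * box.height) / (page_width * 20.0))
    return position * size
```

A large link in the main content area scores close to 1.0, while a small link deep in the footer scores a tiny fraction of that.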

Human Reviewers

Naturally, having a real human review a site will give the best results as far as removing plain spam goes. The only problem is that I can't afford to hire 100,000 people to review all search results.

I do think, however, that there is an idea here: have people review random sites and label them SPAM/NOT SPAM or COMMERCIAL/NON-COMMERCIAL, then feed the human-reviewed results into a self-learning filter so that more pages get classified than are actually reviewed.
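One standard choice for such a self-learning filter is a naive Bayes text classifier, the same family of technique many email spam filters use. The minimal sketch below trains on (page text, human label) pairs and then labels unreviewed pages. The class name, the tokenizer and the toy training data are hypothetical; a real filter would need far more reviewed pages.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing, trained on human-reviewed pages."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()

    def train(self, labeled: List[Tuple[str, str]]) -> None:
        for text, label in labeled:
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))

    def classify(self, text: str) -> str:
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log prior + sum of smoothed log likelihoods of each word
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage: train it on the reviewers' verdicts, then call `classify` on every page in the index that no human has looked at.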

This is definitely something I will look into some day; it could work well with Amazon Mechanical Turk.

Deep Digging Tools / Categorization

This is something I'm working on, and I think it's one of the core issues with search: finding search terms related to the one searched for, and finding pages on the exact same topic that don't contain the exact search term but are still relevant to it.

Categorizing search terms and learning what they mean, or at least how they relate to other search terms, will allow the search engine to provide better search results and better tools to refine them.
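One common way to learn how terms relate, not necessarily the method being built here, is co-occurrence: two terms are related if they keep showing up on the same pages. This minimal sketch scores term pairs by the cosine similarity of the sets of pages each term appears on. The function names and toy pages are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def term_pages(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of page ids it appears on."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for term in set(text.lower().split()):
            index[term].add(page_id)
    return index


def related_terms(term: str, index: Dict[str, Set[str]], top: int = 3) -> List[Tuple[str, float]]:
    """Rank other terms by cosine similarity of their page sets with `term`."""
    base = index.get(term, set())
    if not base:
        return []
    scored = []
    for other, pages in index.items():
        if other == term:
            continue
        overlap = len(base & pages)
        if overlap:
            scored.append((other, overlap / math.sqrt(len(base) * len(pages))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]
```

On a real index, the pairwise comparison is the expensive part, which is why this kind of job is usually run in a separate batch process rather than at query time.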

Tools to Help The User Search

That's a good one. People search in different ways: some type “dog food”, some “food for dogs”, some “dogs food” and some “food dogs”. Tools that help users search in a way that gives relevant answers, some sort of easy, interactive refinement tool, would be nice to try out.
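A refinement tool first needs to recognize that those four queries mean the same thing. This minimal sketch maps them to one canonical key by dropping a few filler words, crudely stripping plurals and sorting the remaining words. The stopword list and the plural rule are hypothetical simplifications.

```python
import re
from typing import Tuple

STOPWORDS = {"for", "of", "the", "a", "an"}  # hypothetical, deliberately tiny list


def normalize_query(query: str) -> Tuple[str, ...]:
    """Map word-order and plural variants of a query to one canonical key."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    stems = []
    for w in words:
        if w in STOPWORDS:
            continue
        # crude plural stripping: "dogs" -> "dog", but leave "glass" alone
        if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
            w = w[:-1]
        stems.append(w)
    return tuple(sorted(stems))
```

All four variants above map to `("dog", "food")`. Since sorting and dropping words throws information away, a key like this is better suited to grouping queries and suggesting refinements than to replacing what the user typed.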

Email and IM content

Using links embedded in instant messages and email is another way you could boost links as it happens, but there are privacy concerns, of course, and I don't run an email service.

I'm going to investigate whether the Twitter API could be used to find out what people are tweeting about, though.

Best Domain Name Registration

March 29th, 2010

There are a lot of bad places to register domains online. There are also good places, but which one is the best domain name registration place for you depends on your needs.

To help you out I’m going to list the best domain name registration websites I’ve used or that have been recommended to me by people I trust.

Best Domain Name Registration Sites

GoDaddy has been recommended to me by several people. They are the best domain name registration company if you need cheap domains that work. Be prepared for a lot of upsells though.

eNom has also been recommended to me by trusted sources. They are regarded as more established and are the best domain name registration company for important domains.

Expect prices at least double that of GoDaddy though. Many people keep their most important domains with eNom and the less important domains with a cheap registrar.

Losing your main business domain because of a bankruptcy at the registrar may be the end of your business if you do most of your business online.

Resell.biz is the best domain name registration company if you want really cheap domains as a reseller. I use Resell.biz for all my domains.

The interface is clean and prices are around $6.50/year for a .com, but you need to have more than 10 domains to qualify.

NameCheap and NameSecure are among the best domain name registration sites for low-cost domains. I have seen good reviews of them, but I have no experience with them myself.

To Your Success,

Simon Byholm,
CEO and Founder,
Secret Search Engine Labs

Search Engine Query Parsing Improved

February 27th, 2010

An improved search engine query parser, and with it more accurate search results: that's what you can expect from the last day's updates.

I just had the privilege of spending a day programming instead of doing marketing. I used the time to improve the search engine query parser so that it finds long keyphrases in a query and removes redundant shorter phrases that are really just sub-phrases of one of the long phrases.

The problem with the old query parser was that it would sometimes omit common words even though they were essential to the query. It could omit “best” in the query “best hosting providers”, and that changes the meaning of the query.

It also split search engine queries into far too many sub-queries.

In a search for “Hotels in New York City”, for example, the old query parser split the query into “hotels”, “in”, “new”, “york”, “city”, “Hotels in”, “New York”, “in New York” and “New York City”. That is a lot of redundant queries, and it may end up giving New York too much weight at the expense of Hotels.

With the new query parser in place, the query “Hotels in New York City” will omit “in”, “city”, “New York” and even “New York City”, as they are sub-phrases of the longer “in New York City”.
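The sub-phrase pruning described above can be sketched as follows, assuming a hypothetical dictionary of known multi-word phrases (in practice such phrases would come from the index). Every word plus every known phrase found in the query becomes a candidate span; the candidates are processed longest first, and any span that lies inside an already kept span is dropped. This is an illustration of the idea, not the engine's actual parser.

```python
from typing import List, Set, Tuple


def parse_query(query: str, known_phrases: Set[str], max_len: int = 5) -> List[str]:
    """Split a query into its longest known phrases plus leftover single words."""
    tokens = query.lower().split()
    # Candidates: every single word, plus every multi-word n-gram in the phrase dictionary
    spans: List[Tuple[int, int]] = [(i, i + 1) for i in range(len(tokens))]
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            if " ".join(tokens[i:i + n]) in known_phrases:
                spans.append((i, i + n))
    # Longest first, so shorter sub-phrases can be recognized as redundant
    spans.sort(key=lambda s: s[1] - s[0], reverse=True)
    kept: List[Tuple[int, int]] = []
    for start, end in spans:
        if any(ks <= start and end <= ke for ks, ke in kept):
            continue  # fully contained in a longer kept phrase
        kept.append((start, end))
    kept.sort()  # back to query order
    return [" ".join(tokens[s:e]) for s, e in kept]
```

With the known phrases “new york”, “new york city” and “in new york city”, the query “Hotels in New York City” parses to `["hotels", "in new york city"]`. Because single words are always candidates, a word like “best” is kept unless a longer kept phrase already covers it.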

The new query parser isn't perfect, though. It still fails to search for “Location: New York City”, “Hotels in Manhattan” or an address in New York City.

The solution to this, of course, is the planned synonym logic that will make a search for “in New York City” also query alternatives like “in NYC”, “of New York City”, “in Manhattan”, etc.
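Once synonyms exist, the expansion step itself is simple. This minimal sketch produces one-substitution variants of a phrase from a synonym table. The `SYNONYMS` table here is hypothetical and hand-written; the post plans to derive real synonyms from the main index.

```python
from typing import Dict, List

# Hypothetical synonym table, for illustration only
SYNONYMS: Dict[str, List[str]] = {
    "new york city": ["nyc", "manhattan"],
    "in": ["of"],
}


def expand_phrase(phrase: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> List[str]:
    """Return variants of `phrase` with one synonym substituted (whole words only)."""
    tokens = phrase.lower().split()
    variants: List[str] = []
    for key, alternatives in synonyms.items():
        key_tokens = key.split()
        n = len(key_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == key_tokens:
                for alt in alternatives:
                    variants.append(" ".join(tokens[:i] + alt.split() + tokens[i + n:]))
    return list(dict.fromkeys(variants))  # drop duplicates, keep order
```

`expand_phrase("in New York City")` yields “in nyc”, “in manhattan” and “of new york city”. Substituting one synonym at a time keeps the number of extra sub-queries small.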

It will be some time before I have synonyms running, though, as a fair amount of processing power is needed to determine synonyms from the main index. I currently have one Linode working on it, and I expect to be able to test the algorithm in a month or two.

Update: I found an interesting article from SEO by the Sea describing a Google patent on search engine query statistics. The huge volume of search queries Google has in its logs lets them draw a lot of useful conclusions about word relationships and user behavior. I would love to have that much data in my query log :)

Please leave your comment and tell me what you think about the new query parser, and please also leave a comment if you notice any interesting phenomena with the new search query analyzer :)

Simon
Secret Search Engine Labs

Technorati Claim

February 27th, 2010

U84DG85X894C

This is something Technorati requires to accept our blog into the directory.

Simon

Start a Business on a Shoestring Budget

February 5th, 2010

For those of you considering setting up a business, or who just need to save costs, here's some free advice.

I just read a blog post by Thom Ruhe on Entrepreneurship.org where he provides links to a lot of free and low-cost resources you can use when setting up your business.

There are phone services, websites, marketing, legal advice and a lot more for free or almost free.

Don't, however, fall into the trap of trying to get everything for free. If you run a real business, a good paid service will often be the best alternative. It might give you better support, less time spent tinkering or better overall quality.

Some tools, like OpenOffice.org, are pro quality even though they are free, and there is no need to search for a paid alternative.

Search Results on Your Website, Try The Feed

January 30th, 2010

The feed I promised you last time is ready, and you can now get our search results directly on your own website, allowing you to make your own search engine or use the results as additional content on your pages.

You can test it by simply including this URL in your website as an iframe, or by fetching it directly from PHP or some other scripting language.

http://api.secretsearchenginelabs.com/html-feed.php?q=obama&start=2&num=3

In this example, q is the query “obama”; start is 2, meaning we skip the first result and start from result 2 (not page 2); and num is 3, meaning we show only three results, namely results number 2, 3 and 4.

A normal page one query for “hosting” would be

http://api.secretsearchenginelabs.com/html-feed.php?q=hosting&start=1&num=10
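Because start counts results rather than pages, page p with num results per page starts at result (p - 1) × num + 1. Here is a minimal sketch of building and fetching feed URLs, assuming only the q, start and num parameters described above. The function names are hypothetical, and the fetch helper assumes the endpoint is reachable and returns UTF-8 HTML.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

FEED_URL = "http://api.secretsearchenginelabs.com/html-feed.php"


def feed_url(query: str, page: int = 1, per_page: int = 10) -> str:
    """Build a feed URL for a results page; `start` is a 1-based result number."""
    start = (page - 1) * per_page + 1
    return FEED_URL + "?" + urlencode({"q": query, "start": start, "num": per_page})


def fetch_results_html(query: str, page: int = 1, per_page: int = 10) -> str:
    """Download the HTML snippet to embed in your own page (assumes UTF-8)."""
    with urlopen(feed_url(query, page, per_page), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

`feed_url("hosting")` reproduces the page one URL above, and `feed_url("hosting", page=2)` asks for results 11 to 20. `urlencode` also takes care of escaping multi-word queries such as “new york”.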

The results are plain HTML with CSS classes that allow you to customize the look and feel of the results.

There’s a link back to Secret Search Engine Labs at the bottom of the results which you are required to display.

Please tell me about your experience with this, as I'd be happy to improve the interface if you find any issues!

Happy searching!

Simon
CEO and founder,
Secret Search Engine Labs

Big Server Upgrade Lets Us Increase Index Size and Paves Way for New Algorithms and Public API

January 22nd, 2010

Yesterday I completed the promised server upgrades, moving the main server from a Linode 720 to a Linode 1080 and adding a new Linode 360 server to host the API (I'll tell you more about that later) and to do special data processing outside the normal indexing cycle.

So what does this mean for you?

The main thing you’ll notice is that the index will get bigger. I can’t tell you how much though as this is impossible to predict, but I really hope we will be able to reach 300,000 pages with this setup.

There are two things that will allow the index to grow:

1. With more RAM (1080 MB instead of 720 MB), updating the index database will be quicker, letting the robot index more pages per month.

2. As there are fewer VPS nodes sharing the same server, there should be more disk and CPU cycles available to us. This is not set in stone, though, as it depends a lot on the usage profile of the other nodes, and I don't know that yet.

The index will grow, great. What about that other server then?

The other server, the Linode 360, will be used to host the new API/Feed that I will announce soon. I will make a feed of the search results available for free letting you make your own search engine, use it as additional content for your directory or anywhere that you want to give your visitors some relevant websites to visit. But more about this later.

The other mission for the new server will be to do calculations of data sets that will support the main indexing.

Mission one will be to build a related keywords database so we can find sites about “New York City” when someone writes “NYC”, or about “Win XP” when someone writes “Windows XP”.

I know I promised you an index of 300,000 pages the last time we upgraded servers, but things changed, and I implemented a few new algorithms that improve quality but slow down the indexing process.

The main slowdown this year has been the addition of the search cache, a separate database arranged to make search queries ultra fast. You can now expect to get your results in under a second, sometimes two, while before the cache, searches took anywhere between 4 and 40 seconds, as they had to dig through the main index.
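The post doesn't describe how the cache database is laid out. A common way to make lookups this fast is to precompute, for every term, a short list of its best-scoring pages, so a query merges a few small lists instead of digging through the main index. The sketch below assumes that layout; `build_cache`, `cached_search`, the `top_k` cutoff and the score format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cache = Dict[str, List[Tuple[str, float]]]


def build_cache(index: Dict[str, Dict[str, float]], top_k: int = 100) -> Cache:
    """For every term, keep only its top_k highest-scoring pages, pre-sorted."""
    cache: Cache = {}
    for term, postings in index.items():
        ranked = sorted(postings.items(), key=lambda kv: kv[1], reverse=True)
        cache[term] = ranked[:top_k]
    return cache


def cached_search(cache: Cache, query: str, num: int = 10) -> List[str]:
    """Answer a query from the cache alone by summing each page's per-term scores."""
    scores: Dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        for url, score in cache.get(term, []):
            scores[url] += score
    return sorted(scores, key=lambda url: scores[url], reverse=True)[:num]
```

The speed comes from never touching the main index at query time. The trade-off is that a page outside a term's `top_k` list can never be returned for that term, and building the lists over the whole index is slow, which is how a cache like this slows down indexing.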

Now let's just wait and see how much the index grows.

Simon Byholm
CEO and founder,
Secret Search Engine Labs

P.S. If you order a Linode through the links in this post, Linode will give us $20 in free server time.

PRESS RELEASE! We Are Joining The Independent Search Engine And Directory Network (ISEDN.org)

January 20th, 2010

I have the pleasure of announcing that late yesterday the final steps were taken to integrate the advertising feeds from ISEDN.org into the search results pages. (They're the green text ads on the far right.)

On December 26th (between eating leftovers from Christmas) I signed the partner agreement with Jayde Online Inc to join the Independent Search Engine and Directory Network (ISEDN.org) also known as the ExactSeek featured listing program.

The feed is now live (for an example, see the search for webmaster), and you can get a featured listing for your website targeting a specific keyword. Your ad will show up on Secret Search Engine Labs and on over 375 other search engines in the ISEDN network. See the advertising page for details.

Joining the ISEDN will allow us to increase revenue from the search result pages, which in turn means we will be able to get bigger and better servers to increase the index size and improve the search results.

The press release about this just went live on PRWeb. Welcome to all new visitors! Please try the search results and tell me what you think by leaving a comment below.

Simon
CEO and founder,
SecretSearchEngineLabs.com

Monster Crawler – Robot Spider Combines Google, Yahoo, Bing and Ask

January 13th, 2010

I found a new (actually old, but I didn't know about it before) search engine made by some graduates from Southern Illinois University. Founded in 1999, it has had a recent facelift and looks nice, providing meta search results combined from Google, Yahoo, Ask and Bing.

They include “Did you mean…” type suggestions and related searches, and I like the clean, professional-looking interface.

See Monster Crawler in action here. It's now added to our big list of search engines.

Beta Testing Is Open, Did You See The Press Release Last Week?

January 12th, 2010

I forgot to blog about it but in case you didn’t notice we started the official beta testing period last week and we even had a press release distributed through PRWeb.

Thanks to the press release, we had about 400 extra visitors last Thursday and Friday, most of them making more than the average number of clicks on the site.

The number of visitors isn't exactly earth-shattering, but it was still great to have you all here, and I hope many of you will be back for more searches.

The Beta testing period is planned to run all year and end in a public launch in January 2011 with the target of having over a million relevant pages indexed by then.

I also want to take the opportunity to apologize if you have stumbled upon some less-than-relevant results. We had a bit of a Cuil experience when, in anticipation of a stampede of visitors from the press release, we added a new cache database to speed up searches.

Yes, searches are now about 30 times faster, with most ready in less than half a second.

The downside is that it will take a whole month before the cache is fully populated, and in the meantime the search results are based on only a subset of the whole index. For some search terms, this means junk and garbage from the bottom of the barrel will show up.

Please enjoy the new super quick search results and check out the ping tool in the webmaster tools section.