Search Engine Software
Search Engine in Pure PHP
The first version of our search engine software was done in a weekend using PHP for rapid application development. It was a fun hobby project to boost my programmer's ego. Since then the code has grown over countless hours of development from the initial 1000 lines of code to over 5000 lines of PHP, SQL and HTML.
Made for Shared Hosting
From the beginning this software was made to run easily from a shared hosting account with PHP, MySQL and crontab support.
At the time that was quite an unusual approach to building a general Internet search engine, since most search engine software was written in C++ for its perceived performance.
The Feature List
The current development target is to make the software modular enough to power other search engines.
I'm targeting niche search engines and site search engines, and I'm looking into releasing the code under the GPL to allow others to use it as a basis for development.
List of Search Engine Software
While I'm considering how to release the code in a way you can use, you can see if any of these search engine software packages will work for your project.
Apache Lucene is (according to their website) a high-performance, full-featured text search engine library written entirely in Java. It's an open source project and thus free to use. If you intend to crawl the web for pages you will also need another component called Nutch, because Lucene needs to be fed locally. Lucene
Entireweb Datafeed is a completely free XML datafeed of search results from Entireweb, a large search engine based in Sweden with an index of over 100 million pages. Entireweb Datafeed
Exalead CloudView is an advanced search solution for big companies. It's based on the search technology of the Exalead Internet search engine and can thus handle at least 8 billion documents. Exalead CloudView
Fluid Dynamics Search Engine is a site search engine in Perl with free and shareware ($40) versions available. Fluid Dynamics
FM SiteSearch Pro is a site search engine in Perl from Focalmedia.net. It seems to be targeted at smaller sites of fewer than 1000 pages and has both text file and MySQL storage support. FM SiteSearch Pro
FreeFind is a search service that lets you add search to your website using the FreeFind index. This means easy setup and no need for spiders and indexes on your own server. There's a free version with ads and a paid version from $19/month if you don't want the ads. FreeFind
Google Custom Search is a service by Google where you can make a search engine that includes only your specified sites. The free version has Google ads where the revenue is split between you and Google. There is also a corporate paid version where you can turn ads off, starting at $100/year for 1000 documents. As the custom search uses Google's database, this only works as long as your website is accessible from the Internet. For Intranets you need another solution. Google Custom Search
Google Search Appliance is a service by Google where they put a computer (the search appliance) in your corporate network so you can search the Intranet that is not accessible from the Internet. The capacity is up to a million documents with pricing depending on the number of documents indexed. Google Search Appliance
Google Mini is basically the same thing as the Google Search Appliance but indexes fewer pages and costs less. Google Mini
ht://Dig is search engine software used for site search or for vertical search engines. It was developed at San Diego State University to search the campus network. It's written in C and can be run on a Linux/Unix box or on Windows with Cygwin. It's free (LGPL) and indexes web pages, not databases. ht://Dig
IBM OmniFind Yahoo! Edition is a free enterprise search engine for up to 500,000 documents. There are more advanced editions available from IBM with a $20,000 price tag. OmniFind Yahoo! Edition
Inout is a PHP meta search engine script you can buy from Inout Scripts for $249. It uses the search results from 12 popular search engines to compile its own set of results and includes functions to monetize the search engine with ads. It seems to have a lot of features according to the marketing materials on the website. Inout
LEXST is a search application that scales to index up to 2 billion web pages using 1120 nodes (servers). There's a free version that works on up to three nodes, and if you go for the 1120-node version it will cost you $800,000. This software can handle really big intranets or even be used to make a full-sized Internet search engine. LEXST
Open Web Spider is open source search engine software written in C# that seems to be targeted at smaller installations. I haven't found any hard numbers, though. Open Web Spider
Sphider is a PHP and MySQL based site search engine. It's open source and has a good admin interface, so it's probably a good candidate for site search on a smaller site. Sphider
Sphinx is a free open-source SQL full-text search engine. Sphinx
Swish-e is an open source search engine usable up to a million documents but more commonly used for tens of thousands of documents. It's written in C, is fast and flexible but requires some programming/assembling work to integrate with your website. Swish-e
Webinator is search software from Thunderstone to spider and index documents. With a capacity of up to 200,000 pages this can take on larger sites, and the software runs on Windows and Unix. There's a free version for up to 10,000 documents and paid versions from $700 to $5800. Webinator
Zoom Search Engine is a search engine from Wrensoft that is slightly geared towards searching CDs and DVDs but it also has intranet and Internet search functionality. There's a free version that can index 50 pages (that's not much) and paid versions from $49 to $299 that index up to a million pages. Zoom Search
Zoom Master Node is scalable search software from Wrensoft that will run on multiple servers and is able to index over a million web pages. Master Node can take OpenSearch responses from other search servers to integrate searching of several different databases and indexes in a single interface. Master Node is free with the option to buy support for $250. Zoom Master Node
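Combining results from several search servers, as the meta search and multi-server products in this list do, comes down to merging ranked lists. Here is a minimal Python sketch of one possible approach, assuming each server has already returned a ranked list of URLs. The function name merge_results and the round-robin strategy are my own assumptions for illustration, not how Inout or Zoom Master Node actually work.

```python
from itertools import zip_longest


def merge_results(result_lists):
    """Interleave ranked URL lists round-robin, keeping the first copy of each URL."""
    merged = []
    seen = set()
    # zip_longest walks all lists rank by rank and pads shorter lists with None.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical result lists from two search servers.
engine_a = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
engine_b = ["http://example.com/b", "http://example.com/d"]

merged = merge_results([engine_a, engine_b])
print(merged)
```

Round-robin merging treats every server as equally trustworthy. A real meta search engine would more likely weight the sources or rescore the combined results.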
Make Your Own Search Engine Software
This is a collection of libraries and howtos that will help you make your own database search engine (easy), site search engine (medium) or Internet search engine (not easy).
The Porter Stemming Algorithm is a way to reduce English words to a common stem by stripping suffixes. Connect, connected, connecting and connection would all become just connect after stemming.
Snowball is a new stemming library by Martin Porter that allows you to create stemming functions for other languages using a script-like language. Stemmers are available for French, Spanish, Portuguese, Italian, Romanian, German, Dutch, Swedish, Norwegian, Danish, Russian, Finnish, Hungarian and Turkish.
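The basic idea behind stemming can be sketched in a few lines of Python. This is a toy suffix stripper, not the Porter algorithm and not Snowball. The suffix list, the minimum stem length and the function name simple_stem are all assumptions made for illustration.

```python
# Toy suffix list (an assumption); real stemmers use many more rules and conditions.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def simple_stem(word, min_stem=3):
    """Strip the longest matching suffix, as long as at least min_stem letters remain."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


words = ["connect", "connected", "connecting", "connection", "connections"]
stems = [simple_stem(w) for w in words]
print(stems)
```

In a search engine you would run the same stemming function on both the indexed words and the query words so that the different forms match each other. A naive stripper like this one makes plenty of mistakes on real text, which is why the libraries above are worth using.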