Open Source Search Engines with Web Crawling and HTML Indexing

The term "search engine" is particularly vague. It can mean an online search engine, software that provides something similar to the online search engines, or software that does full-text indexing and querying. I specifically wanted to find a tool that could be used to create a niche search engine without having to write the whole thing myself as a Ruby on Rails app. Here’s what I found:

  • mnoGoSearch – seems very actively maintained, packages present in Debian and Ubuntu, written in C
  • DataparkSearch – seems fairly actively maintained, apparently a fork of mnoGoSearch, written in C
  • Nutch – seems fairly actively maintained, written in Java, built on top of Lucene

And that’s really it. There is a product called ASPseek, but it has long since been abandoned. There is also ht://Dig, but it has not been updated in years and is not really in the same league as the above. So anyone trying to research this area does not have a whole lot of options to try.
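
For a rough sense of what the "HTML indexing" part of these tools involves, here is a minimal Python sketch using only the standard library. It is not how mnoGoSearch, DataparkSearch, or Nutch work internally; the names TextExtractor, html_to_words, build_index, and search are hypothetical, and it assumes the pages have already been fetched into a dict of URL to HTML. It strips the markup, builds an inverted index (word to set of URLs), and answers a query by intersecting the sets for each query word.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_words(html):
    """Return the lowercase alphanumeric words found in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())


def build_index(pages):
    """pages maps URL -> HTML; the result maps word -> set of URLs."""
    index = defaultdict(set)
    for url, html in pages.items():
        for word in html_to_words(html):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

A real engine adds the parts this sketch leaves out: a crawler that follows links and respects robots.txt, ranking, stemming, and on-disk storage. Those are the reasons to reach for one of the tools above instead of writing it yourself.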

Resources:

Alternative Search Engines
