Why Sphinx? Some benchmarks?

@John: I just wanted to know any rough numbers you have about using Sphinx to index (e.g., number of records and indexing/searching times for that installation). We are using ApacheSolr and it's pretty easy to set up and fairly quick... just wanted to do some basic comparison.

Both ApacheSolr and Sphinx are fairly painless to install. As a general rule, Lucene's search times are slightly faster on small to moderately sized data sets (under 5M records or under 1 GB), and Sphinx is slightly faster on very large data sets. Of course, your mileage will vary depending on your indexes and key attributes. So, IMO, it's a wash between the two when it comes to search speed. Sphinx blows Lucene away when it comes to indexing time, however. That's a big plus, but not the reason I chose it.

After comparing the two, it seemed to me that Sphinx offered significantly more flexibility than Solr, and it has a natively available PHP API that I could use immediately. I am also interested in the fact that Sphinx offers an integrated MySQL storage engine--something Locum does not yet take advantage of, but may in the future.

I encourage you to take a look at the Sphinx project site and feature list:

  • high indexing speed (up to 10 MB/sec on modern CPUs)
  • high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
  • high scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
  • supports distributed searching (since v.0.9.6)
  • supports MySQL natively (MyISAM and InnoDB tables are both supported)
  • supports phrase searching
  • supports phrase proximity ranking, providing good relevance
  • supports English and Russian stemming
  • supports any number of document fields (weights can be changed on the fly)
  • supports document groups
  • supports stopwords
  • supports different search modes ("match all", "match phrase" and "match any" as of v.0.9.5)
  • generic XML interface which greatly simplifies custom integration
  • pure-PHP (i.e., NO module compiling etc.) search client API
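
To put the indexing figure above in perspective, here is a rough back-of-envelope sketch in Python. It assumes the "up to 10 MB/sec" rate from the feature list holds steadily, which is a best case rather than a measurement, and the collection sizes are hypothetical examples, not numbers from any real installation.

```python
# Rough lower-bound estimate of indexing time from a quoted throughput figure.
# ASSUMPTION: 10 MB/sec is the "up to" rate from the Sphinx feature list,
# not a benchmark result. The collection sizes below are hypothetical.


def estimate_indexing_seconds(collection_mb: float,
                              throughput_mb_per_sec: float = 10.0) -> float:
    """Return the minimum indexing time in seconds at the given throughput."""
    if throughput_mb_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return collection_mb / throughput_mb_per_sec


if __name__ == "__main__":
    # Sizes expressed in MB (1 GB taken as 1024 MB).
    for label, size_mb in [("1 GB", 1024), ("4 GB", 4096), ("100 GB", 102400)]:
        seconds = estimate_indexing_seconds(size_mb)
        print(f"{label}: at least {seconds / 60:.1f} minutes")
```

At that best-case rate, a 1 GB collection would take at least about 1.7 minutes to index. Real numbers will depend on your hardware, your source queries, and your index settings, so treat this only as a floor.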

Thanks for the link to the PPT. It confirms why Locum is built on Sphinx and not Lucene. I evaluated both search engines before I built Locum (I looked at Sphinx on the basis of Casey Bisson's recommendation--Scriblio also uses Sphinx).