About
Sphinx is a full-text search engine, distributed under GPL version 2.
A commercial license is also available for embedded use.
Generally, it is a standalone search engine, meant to provide fast,
size-efficient and relevant full-text search functions to other
applications. Sphinx was specially designed to integrate
well with SQL databases and scripting languages. The currently
built-in data sources can fetch data either
via a direct connection to MySQL or PostgreSQL, or through the XML pipe
mechanism (a pipe to indexer in a special XML-based format that
Sphinx recognizes).
As for the name, Sphinx is an acronym, officially
expanded as SQL Phrase Index. Yes, I know about CMU's Sphinx project.
Key features
- high indexing speed (up to 10 MB/sec on modern CPUs)
- high search speed (average query time is under 0.1 sec on 2-4 GB text collections)
- high scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
- supports distributed searching (since v.0.9.6)
- supports MySQL natively (MyISAM and InnoDB tables are both supported)
- supports phrase searching
- supports phrase proximity ranking, providing good relevance
- supports English and Russian stemming
- supports any number of document fields (weights can be changed on the fly)
- supports document groups
- supports stopwords
- supports different search modes ("match all", "match phrase" and "match any" as of v.0.9.5)
- generic XML interface which greatly simplifies custom integration
- pure-PHP search client API (i.e., no module compiling required)
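To make the three search modes above concrete, here is a minimal Python sketch of their matching semantics on a single document. It is a toy illustration under simplifying assumptions (lowercasing, splitting on non-word characters, no stemming, no stopwords, no ranking); the function names are hypothetical, and this is neither Sphinx's internal implementation nor its client API.

```python
import re


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (assumption): lowercase and split on non-word
    # characters. Sphinx's real tokenizer also handles charset tables,
    # stemming and stopwords, which are ignored here.
    return re.findall(r"\w+", text.lower())


def match_all(query: str, doc: str) -> bool:
    # "match all": every query word must occur somewhere in the document.
    q = tokenize(query)
    words = set(tokenize(doc))
    return bool(q) and all(w in words for w in q)


def match_any(query: str, doc: str) -> bool:
    # "match any": at least one query word must occur in the document.
    words = set(tokenize(doc))
    return any(w in words for w in tokenize(query))


def match_phrase(query: str, doc: str) -> bool:
    # "match phrase": the query words must occur contiguously and in order.
    q = tokenize(query)
    d = tokenize(doc)
    if not q:
        return False
    n = len(q)
    return any(d[i:i + n] == q for i in range(len(d) - n + 1))


doc = "To be, or not to be: that is the question"
print(match_all("question be", doc))      # True
print(match_any("hamlet question", doc))  # True
print(match_phrase("not to be", doc))     # True
print(match_phrase("be not", doc))        # False
```

A real engine answers these queries from an inverted index that stores word positions rather than by scanning each document; the per-document functions above only show which documents each mode would accept.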
Distribution
Sphinx distribution includes the following programs:
- indexer: a utility to create fulltext indices;
- search: a simple (test) utility to query fulltext indices from command line;
- searchd: a daemon to search through fulltext indices from external software
(Web scripts using Sphinx API; or MySQL with SphinxSE; or your application server);
- sphinxapi: a set of API libraries for popular Web scripting languages
(there are native API ports for PHP, Python, Java, Perl, and Ruby).
History
Sphinx development started back in 2001, because I couldn't
find an acceptable search solution (for a database-driven
web site) that met my requirements. In fact, every
important aspect was a problem:
- search quality (i.e., good relevance)
  - statistical ranking methods performed rather badly,
    especially on large collections of small
    documents (forums, blogs, etc.)
- search speed
  - especially when searching for phrases that contain
    stopwords, as in "to be or not to be"
- moderate disk and CPU requirements when indexing
  - important in a shared hosting environment,
    not to mention the indexing speed.
Despite the time that has passed and the numerous improvements
made in other solutions, there is still no solution that
I personally would be eager to migrate to.
Add to that the large amount of positive feedback received from
Sphinx users over the last years, and the obvious decision is to continue
developing Sphinx (and, eventually, to take over the world).