Magento - Apache Solr Integration - Part I (intro)

Tue, 03/01/2011 - 14:33

If you have a Magento web store with a really big catalog, or with a large data collection that users can browse... I know that you've wondered about how to improve the search on your site.

Magento's search system is not bad. In fact, considering the underlying technology, it's pretty good. It makes good use of MySQL's strengths, and with some tuning effort (well, OK, a lot of effort) you can even get pretty decent performance for large datasets. You will have to put in a lot of caching and SQL tuning tricks, though, and your magic may be short-lived anyway... What would happen if your site became twice as popular due to a good marketing move?

If our only weapon is MySQL, this can feel like a curse rather than a blessing. Luckily for us, there is a better solution available! 

Enter Apache Solr! 

What is Apache Solr? The omniscient Wikipedia says:

Solr is an open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is highly scalable. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
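To give a feel for the "REST-like HTTP" part of that description, here is a minimal sketch in Python of what talking to Solr can look like: a search is just a URL with a few query parameters, and the answer comes back as JSON. The base URL, the sample response and the field names (`id`, `name`) are assumptions made up for illustration, not taken from any real setup; the sketch only builds the URL and parses a hand-written response, so it runs without a Solr server.

```python
import json
from urllib.parse import urlencode

# Assumption: a Solr instance at this base URL; adjust host, port and path.
SOLR_BASE_URL = "http://localhost:8983/solr"


def build_select_url(text, rows=10, base_url=SOLR_BASE_URL):
    """Build the URL for a simple full-text query against Solr's select handler."""
    params = {
        "q": text,     # the user's search terms
        "rows": rows,  # maximum number of documents to return
        "wt": "json",  # ask for a JSON response instead of XML
    }
    return base_url + "/select?" + urlencode(params)


def extract_hits(response_body):
    """Return (total matches, list of documents) from a Solr JSON response."""
    data = json.loads(response_body)
    response = data["response"]
    return response["numFound"], response["docs"]


# Hand-written response in the general shape Solr returns (hypothetical fields),
# so the sketch can be run offline.
SAMPLE_RESPONSE = (
    '{"response": {"numFound": 1, "start": 0, '
    '"docs": [{"id": "sku-1", "name": "Red shirt"}]}}'
)

if __name__ == "__main__":
    print(build_select_url("red shirt"))
    print(extract_hits(SAMPLE_RESPONSE))
```

Against a live server you would fetch the URL with any HTTP client and pass the body to `extract_hits`; the point is simply that no special driver is needed, which is why Solr is easy to reach from PHP, Python or anything else.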

You can find more good info about Apache Solr in these locations:

And what do we say? Solr rules! It is amazingly fast, configurable and it opens up a lot of possibilities that would be unthinkable without it. 

We have implemented Solr in several Drupal sites (check it here, here and here), and found it extremely good. It does require extra effort, but the results pay off. Also, the Drupal-Solr integration is very well crafted, has a lot of extension points (a.k.a. hooks), and comes with an increasing amount of documentation. At a pinch, you can google your way through it - it won't be the simplest thing you have done, but you won't be alone! There are some pioneers and some footsteps to follow (some of those footsteps, ahem, are ours: check here and here). 

In the world of Magento, that situation is still a way off. A web search will return lots of people asking how to do it, but we couldn't find any good answers. So we decided to roll up our sleeves and work it out for ourselves. This little series of posts is to share what we have learnt, and, hopefully, to make your life a bit easier if you also plan to integrate Magento with Solr. 

OK, let's get busy!

Managing Partner
Aldo works as a general mentor for the development teams, keeping in direct contact with programming and design.