Possible book project - open source search
March 9, 2008
I’ve had a book project in the back of my mind for a bit. Though there are never enough hours in the day to get everything done that I need to, I have one book I’m wrapping up in the next few days, and am seriously considering committing myself to this next one. No publisher is lined up yet, but if I can’t find one I’d self-publish through lulu.com.
The title obviously gives it away - I’m looking at doing a book on open source search products. I was thinking of doing an entire book on Solr last fall, but honestly I’m not sure there’s enough about Solr to fill an entire book - at least not without repeating a lot of the information already out there in tutorials and whatnot. And I’m not sure that another in-depth technical book is necessary on something that’s moving so fast. The idea is to give a moderately deep (but not overly deep, if you can make that distinction) look at setting up and using a group of open source search projects.
- Lucene is the leader in this space, without question. It’s been around for quite a while, and keeps getting better with each release. However, Lucene itself is very low-level, and Java only. Many ports of Lucene have sprung up, such as Lucene.Net and Lucy, as well as tools which build on Lucene like Solr and Nutch.
- PostgreSQL has full text search capabilities which I plan to explore in more detail.
- MySQL has had some full text search capability for years, and the Sphinx project has emerged over the last couple of years to provide even more functionality and speed. I believe Sphinx is essentially standalone but can be coupled with MySQL or PostgreSQL - again, that’s research fodder for the book.
Are there other open source search projects that you’d be interested in seeing covered in a book? Is this a topic you see any demand or interest in? Whenever I see a gap in the book market, I always wonder if it’s because there’s no interest, or just that no one has filled the gap yet. Usually something appears to fill that gap a few months after I notice it, but I’ve yet to see this gap filled after almost a year of thinking about it.
Feedback?
March 9th, 2008 at 5:55 pm
Don’t forget Xapian. What I would really like to see in a book like this is a walk through a bunch of search design patterns, such as faceting, implemented with different engines. A nice layout might be to split the book into two halves, the first half introducing each of the covered search solutions and getting the reader up to speed with them, and the second half introducing a selection of patterns and showing how each would be implemented using each of the covered solutions. A good source of search patterns is Peter Morville’s newly started Flickr collection.
In fact, I would be really interested in getting involved with this, perhaps writing a chapter on Forage.
March 10th, 2008 at 6:45 am
On first read of your post, I thought “huh? Aren’t there lots of books on open source search?” But then I realized that what you are looking at is a cross section of all the various search tools, based in Java, C, PHP, Ruby, etc., and covering a wide variety of approaches to search, from a pure library like Lucene to a SQL-based engine like Sphinx.
There are also lots of interesting approaches to different types of search, like Solr Flare for faceted search, that would be interesting to compare in a single book.
I think the person who would buy this is looking to understand the “landscape” of search, via reading about various projects, and then you could extract the patterns used by the various solutions to think about “For my project X, I need something supporting Pattern Y”.
Sign me up!
March 10th, 2008 at 7:17 am
@eric - that’s the focus. Buying a book on Lucene means you’ve likely already decided that Lucene is your choice. Diving into the details of various projects is a pain, and the idea of the book is to cover multiple projects so readers can compare them in one place.
@rob - thanks for those resources - hadn’t seen them before, and they look really cool/useful!
March 10th, 2008 at 11:45 am
BTW: You might want to tweak the book title before you publish - “Open Source Search” could be misconstrued as “The search for open source [projects]”. Maybe something like “Search with/using open source”.
March 10th, 2008 at 1:39 pm
There’s definitely a missing book here - and it’s about the application of search, versus search itself. E.g. I’ve got a database, and I need faceted search support. Or I have a high volume feed of rapidly changing content that I need to make searchable. Or my web site needs search support for all of the content (static & dynamic).
There’s a lot of scattered information out there, but most of it’s at the wrong level for somebody who needs to get to point B as quickly as possible.
Heck, even just the topic of full text search inside of databases could be a book. Do you use the built-in support, or attach something like Compass, or roll your own using Lucene?
Some specific issues that continue to come up in discussion are around high availability, scalability (size of data), faceting, and high volume (rapidly changing data sets).
Anyway, just some notes from an old presentation proposal I’d submitted to JavaOne. Good luck on the book idea, I think it would be very useful.
– Ken
March 10th, 2008 at 1:46 pm
@s - book’s not even started - haven’t finished the last one yet! But yeah, the title is not set in stone.
@ken - thanks for the ideas. What’s available and what’s possible are topics that don’t seem to get covered enough in tech books in general. Lots of manuals that show you *how* to do something, but few that give you an idea that those ‘somethings’ exist in the first place, or why they’d be useful.
Definitely some food for thought, and thanks again for the feedback guys!
March 10th, 2008 at 2:01 pm
In my experience, books that cover a number of open source tools don’t do that well.
People usually use the internet to find the tool they want to use, then buy a book on that topic… that is my opinion anyway. I think an e-book on the subject could do well though, something to help people make the decision.
March 10th, 2008 at 2:05 pm
Good point James - I was actually thinking of your behemoth when considering this project. I’m not sure I could put together anything so large (1100 pages, was it?), and I do think something focused on 4-5 tools would be in a different league. However, you’ve done it and I haven’t.
I’d think limiting the focus of a book to one problem domain - search - would help.
Perhaps this would end up being ‘just’ an ebook. I don’t know, and don’t even know if I’ll have the time to put it together. But it does seem that it’s something there’s interest in.
Thanks!
March 11th, 2008 at 6:14 am
You might want to have a section on hosted search solutions, e.g. Google and Atomz.com.