The unholy legacy of databases
While reading about the status of Qi4j on Rickard’s blog, I stumbled upon this:
Entities are really cool. We have decided to split the storage from the indexing/querying, sort of like how the internet works with websites vs Google, which makes it possible to implement really simple storages. Not having to deal with queries makes things a whole lot easier.
We had the same experience when we developed the SnipSnap wiki application several years ago. We split storage and search, each with its own Java interface (a component could of course implement both). This way we could offer Lucene, database and in-memory search on one side, and database and file (XML, plain text) storage on the other. This made us very flexible, and people could easily implement new storage backends because they no longer had to implement search. Rickard seems to have had the same experience:
We have one EntityStore based on JDBM (persistent binary hashmap), one on JGroups (replicated cluster hashmap), one on Amazon S3 (for global storage), and one on iBatis (for RDBMS storage)
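Here is a minimal sketch of such a split. The interface names `EntityStorage` and `EntitySearch` and the classes below are hypothetical, not the actual SnipSnap or Qi4j APIs. Storage only saves and loads content by id, and search only maps keywords to ids. The naive keyword index stands in for something like Lucene, and the HashMap storage stands in for a persistent hashmap such as JDBM.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical storage interface: saves and loads content by id, no queries.
interface EntityStorage {
    void store(String id, String content);

    String load(String id); // returns null if the id is unknown
}

// Hypothetical search interface: maps keywords to ids, never holds the content.
interface EntitySearch {
    void index(String id, String content);

    Set<String> find(String keyword);
}

// Storage backed by a plain HashMap (stand-in for a persistent hashmap).
class InMemoryStorage implements EntityStorage {
    private final Map<String, String> data = new HashMap<>();

    @Override
    public void store(String id, String content) {
        data.put(id, content);
    }

    @Override
    public String load(String id) {
        return data.get(id);
    }
}

// Naive inverted index: lower-cased word -> ids of entities containing it.
// It does not remove stale words when an entity is re-indexed.
class KeywordSearch implements EntitySearch {
    private final Map<String, Set<String>> postings = new HashMap<>();

    @Override
    public void index(String id, String content) {
        for (String word : content.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, w -> new HashSet<>()).add(id);
            }
        }
    }

    @Override
    public Set<String> find(String keyword) {
        // Return an immutable copy so callers cannot modify the index.
        return Set.copyOf(postings.getOrDefault(keyword.toLowerCase(Locale.ROOT), Set.of()));
    }
}
```

Neither class refers to the other. A database-backed storage or a Lucene-backed search could therefore replace either side without touching the other.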
So today SnipSnap could easily supply an S3 backend because of the split. Projects that rely on a combined storage/search layer have much more trouble supporting a storage-only backend, so they cannot support S3 or WebDAV out of the box.
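This sketch shows why a storage-only backend becomes easy once search lives elsewhere. It keeps one file per entity and stands in for a blob store such as S3 or WebDAV. The `EntityStorage` interface and `FileStorage` class are hypothetical, and the interface is repeated so the snippet compiles on its own. The class needs no query support, and it assumes Java 11 or later for `Files.readString`/`Files.writeString`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical storage interface: saves and loads content by id, no queries.
interface EntityStorage {
    void store(String id, String content);

    String load(String id); // returns null if the id is unknown
}

// One file per entity under a root directory, a stand-in for a blob store.
class FileStorage implements EntityStorage {
    private final Path root;

    FileStorage(Path root) {
        this.root = root;
    }

    private Path fileFor(String id) {
        // URL-encode the id so characters like '/' cannot create subdirectories
        // (the special ids "." and ".." are not handled in this sketch).
        return root.resolve(URLEncoder.encode(id, StandardCharsets.UTF_8));
    }

    @Override
    public void store(String id, String content) {
        try {
            Files.writeString(fileFor(id), content, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public String load(String id) {
        try {
            return Files.readString(fileFor(id), StandardCharsets.UTF_8);
        } catch (NoSuchFileException e) {
            return null; // unknown id
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An S3 or WebDAV variant would map the same two methods onto that service's put and get operations. Any search component keeps working unchanged.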
Why don’t more people split the problem of storage into storage and search? After some contemplation, I think perhaps it’s the unholy legacy of databases. Databases make it easy to solve the storage and search problems with a single technology. After 30 years of databases, the two problems have merged so thoroughly that most developers think of them as one. Splitting them again frees projects to choose better storage backends and better search solutions. Open Source projects will emerge that address each of the problems better than current databases do.
This of course breaks the DAO pattern and the use of the EntityManager as a DAO replacement. Both should give way to a Storage and Search pattern. Free your mind! Storage and search are two different things, and if you split them, you gain flexibility.
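Here is a minimal sketch of what a Storage and Search pattern could look like. The `Storage`, `Search` and `Documents` types are hypothetical, because the post does not define the pattern in code. A DAO would carry its own `findBy...` queries. Here, a small service instead writes to a storage and to an index. It resolves search hits, which are only ids, through the storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical storage: persistence only.
interface Storage {
    void store(String id, String content);

    String load(String id); // returns null if the id is unknown
}

// Hypothetical search: returns ids, never content.
interface Search {
    void index(String id, String content);

    Set<String> find(String keyword);
}

// Replaces a DAO: holds one Storage and one Search and keeps them in step.
class Documents {
    private final Storage storage;
    private final Search search;

    Documents(Storage storage, Search search) {
        this.storage = storage;
        this.search = search;
    }

    void save(String id, String content) {
        storage.store(id, content); // persist first,
        search.index(id, content);  // then make it findable
    }

    // Search yields ids; the content always comes from storage.
    List<String> findContents(String keyword) {
        List<String> result = new ArrayList<>();
        for (String id : search.find(keyword)) {
            String content = storage.load(id);
            if (content != null) { // skip stale index entries
                result.add(content);
            }
        }
        return result;
    }
}
```

Because `Documents` depends only on the two interfaces, any storage backend can be combined with any search implementation.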
Thanks for listening.
I have tried this sort of thing as well and I really really like the concept. The only thing that stops my poor, slow, small brain from really seeing it through to its logical conclusion is the sometimes-requirement to join attributes from one Thing stored with one Storage mechanism with the attributes of another Thing stored with another Storage mechanism. Do you have suggestions here?
Obviously, a Compass/Lucene-type search handles a huge number of cases: people tend to like to search by keywords, so a coarse-grained search/locate strategy like that makes a lot of sense. But some of the applications I work on also need careful, targeted queries that join bits of two entities together, a classic SQL join.
Have you found a convenient way to expose a *common* SQL-like query mechanism across items that use different Search implementations?