In spite of my 3 blog posts on NoSQL (http://bit.ly/NoSQLt http://bit.ly/NoSQL2 http://bit.ly/NoSQL3), where I had clearly stated what my concerns were with the whole NoSQL movement and what I strongly feel are the problems with the modus operandi of the architects and designers of the NoSQL systems, many people are still misinterpreting my comments/suggestions. Let me elaborate more in this post on what I strongly believe needs to be done and what I am willing to do to help in this regard.
I fully realize that there are many different types of NoSQL systems and that there are many differences between them with respect to the functionality they provide and the technologies that were invented/leveraged/implemented to realize that functionality. While not all the points I have made in my previous posts would necessarily apply to every one of the systems, every point I have made would apply to at least a reasonable subset of the systems.
Some people are expecting me to provide detailed review/criticism of each of the NoSQL systems along different dimensions (replication, data model, locking, etc). They seem to have missed my points about the descriptions of the internals of these systems being vaguely documented, and the choice of technologies/algorithms not being well specified and justified. This doesn’t mean that such details aren’t available for any aspect of any of the systems. I am sure there are a few systems for which some amount of detail is documented somewhere for some aspects of the implementation/design of those systems.
My major point is that the designers/architects/implementors (“the techies”) of the NoSQL systems have to more carefully document the design of their systems so that the above points are dealt with methodically. Then, we can have more meaningful discussions about the merits and demerits of each of those systems, and the correctness/appropriateness of the chosen approaches to solving specific technical issues. The NoSQL techies, as responsible citizens of the land of data management, owe that level of rigor to the community. It would also help them in achieving better clarity in their thinking and increase the likelihood of catching logical errors in their algorithms. The whole ecosystem would gain from this exercise. Hopefully, the NoSQL techies would realize that many of the algorithms invented for RDBMSs would be applicable even to their systems and they would learn how to design their systems so that they are extensible to accommodate new requirements. As I have said before, many features of RDBMSs which were initially considered unnecessary in the NoSQL context are now creeping back in.
I am not expecting the NoSQL techies to necessarily write research papers which are subject to the rigorous refereeing processes of conferences like VLDB, ACM SIGMOD, IEEE ICDE, EDBT, etc., even though such things would be desirable in the long run. As I detailed in my Part 2 blog post (http://bit.ly/NoSQL2), I personally had to go through a tremendous amount of evangelization of my ideas (the ARIES family of locking and recovery algorithms - http://bit.ly/RepHis http://bit.ly/ARIESi) before they became widely accepted and got adopted/adapted for implementation in many different types of systems (not just RDBMSs). I of course know that many other people working in the traditional DBMS community have also had to go through similar pains to get their ideas rationalized and accepted. I have put references to my story only because I know it very well and because, for one reason or the other, my experiences have been well documented in various ways (papers, presentations, interviews and videos). I sincerely hoped that the readers of my blog post would take the trouble to follow up on them to get a far better feel for what some people have to go through to make a long-lasting impact on the wider technical community. Impact that goes beyond money making, flashy marketing collateral, elevator pitches or industry watcher/analyst pronouncements.
Maybe I am being idealistic but I feel I should appeal to the NoSQL community about this with the hope that it gains traction! I do realize that in addition to the open source community that is a big part of the NoSQL movement, there are a number of startups and big Web 2.0 companies who have sizable internal development groups that are engaged in NoSQL work. For different reasons, this set of people might choose not to act like the traditional DBMS community in following the kinds of suggestions I am making in my blog posts to bring more order to the current chaotic situation.
The onus is on the NoSQL techies to do the needed documentation of their work and rationalization of their design choices rather than people like me having to play with their systems or dig through any available open source code to figure out such technical details and rationale. As the references I have given in the Part 2 blog post make clear, I chose to do such things in the past as part of the due diligence background work in relating my ARIES algorithms to what had been done before by others. It is worth pointing out that a typical researcher doesn’t take the trouble to do as much digging into real systems to compare with the prior art.
So, here is my humble request to the NoSQL techies: For each of your systems, please send me or point me to detailed technical information on each of the important aspects of your system. This should be documentation in the form of papers or presentations, and not pointers to source code comments and such! If some significant aspects of a system aren’t documented reasonably, I am urging the appropriate people to produce such documentation. Of course, for legal reasons, you should NOT send me any confidential or proprietary information.
Here is my offer in return for the above: Once I get hold of such documentation, I am willing to maintain a page for each significant NoSQL system where I will consolidate all the information on that system. Once I get hold of all that information, I will be able to do the comparisons between systems and make suggestions for improvements, etc. for each of the systems. I am planning a tutorial on NoSQL systems and it would be in the best interest of the techies of the different systems to get their systems featured in such a tutorial by providing accurate and complete information on their systems.
I would like to hear the readers’ reactions to my humble request and my offer in return.