After my first blog post made 2 days ago on the topic of NoSQL (see http://bit.ly/NoSQLt), which has been widely read (according to Google Analytics, 1100+ visits from 58 countries), I have been surprised to see some people’s knee-jerk reactions :-) That prompted me to post variations of the following on Facebook, Twitter, LinkedIn and Google+:
I wish people would read exactly what I wrote in http://bit.ly/NoSQLt and stop imagining stuff that I didn’t write! I didn’t say NoSQL is not needed or that everything has been invented before. It is interesting how the blogosphere and twitterati are having a field day putting words in my mouth :-) E.g., search for @seemohan on Twitter
Since most people who read my earlier post probably don’t have a clue about who I am and what my philosophy with respect to technologies and inventions over my entire career has been, I would like them to be aware of the following collateral so that my comments can be interpreted with the right perspective and frame of mind. Readers who take the trouble to follow up on the references listed below will hopefully realize that I am not an RDBMS bigot, that I am open-minded about new ways of addressing problems and their solutions, and that I have also invented and transferred technologies to non-relational systems such as the MQSeries messaging system, the Lotus Notes groupware/document system, WebSphere Application Server, the FlowMark workflow management system and the Parallel Sysplex Coupling Facility, in both mainframe and non-mainframe environments (http://bit.ly/ARIESi).
I am not trying to claim that I know everything about our industry/technologies or what matters when, or that I have definitive ideas about what the right evolutionary path for data management systems is. I am merely trying to temper some of the marketing and technical hype associated with NoSQL and related areas, and to pass on some caveats and warnings based on my 30+ years of experience in the data management field. I am a bits and bytes (or nuts and bolts) kind of guy who has worked mostly on technologies relating to the bowels of different systems which manage persistent data of different kinds in distributed and clustered environments (http://bit.ly/CMpapp). In my writings and while designing my algorithms, I have tried hard to dig into what has been done in the past and document as much of my learning about the prior art and related work in my papers, crediting the people who did the prior work.
My comments aren’t targeted merely at one NoSQL system or one set of people. I would like all sorts of people to give some attention to what I have to convey regarding NoSQL systems: entrepreneurs, end users, IT management, systems architects, designers, marketers, students, industrial researchers, academicians (pure and those who moonlight on the side as entrepreneurs and consultants), established little/big industry people, …
Now, coming back to the topic of NoSQL and some of my concerns about what is being done in that context: readers can hear me express some of my thoughts and reactions by watching a video of the panel discussion that I took part in in Sri Lanka in September 2011. In due course, I plan to document more of my views in written form.
I have closely observed or taken part in the evolution of many systems whose designers started out thinking in simple terms but later had to add more sophisticated functionality, which they found was very hard to do. Examples are System/38’s database functionality, which was embedded in the horizontal and vertical microcode of the system; Lotus Notes, which from its beginnings in 1989 (http://bit.ly/LNhist) has looked in many ways like the NoSQL systems of today; and RDBMSs like the mainframe version of DB2, Sybase and SQLServer, and OODBMSs like ObjectStore, which started out with page level locking as the smallest granularity of locking.
S/38 had a single level store and it relied on the virtual memory paging subsystem and the file system for accessing and caching data in memory. There was no buffer manager as in other RDBMSs. The granularity of latching during a call to the data manager was an entire table (locking was at record level). As the systems became more powerful and SMPs came into existence, latch conflicts became severe and the myriad things that took advantage of the table level latch became very painful to deal with.
Lotus Notes until R5 had very ad hoc ways of handling recovery, no notion of transactions and many non-scalable features. Changing that system and adding log-based recovery and transaction semantics was painful (http://bit.ly/LNotes).
Reducing the smallest granularity of locking from page size to something smaller was quite painful in RDBMSs/OODBMSs like DB2, Sybase, SQLServer and ObjectStore. The original lock granularity had been taken advantage of in many places in unobvious/subtle ways and those were very tricky to identify and fix.
I am really concerned about some of the design choices made in the case of NoSQL systems. As they mature and features that were initially considered unnecessary start creeping in (due to the slippery slope that these systems got on when they deviated significantly from the feature set of RDBMSs), they are going to suffer a lot of growing pains along the above lines. I am unsure of the extent to which the designers of such systems are conscious of these sorts of consequences of what they have chosen to do initially.
I tried to demonstrate in our original ARIES paper (http://bit.ly/CMaries) the benefits to be had and the need for concurrently thinking about storage management, locking and recovery, unlike some layered approaches advocated in some earlier work. I also discussed numerous approaches to locking and recovery implemented in relational and non-relational systems which would be worth paying attention to as NoSQL systems evolve.
While there is a lot of talk about scalability, elasticity, etc., such design criteria seem to be applied in a spotty way in the design of these systems. Even the designers of systems which support incremental updates don’t seem to think about having to scale along the concurrency dimension by supporting a finer granularity of locking/latching.
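To make the granularity point concrete, here is a minimal sketch in Python. It is not taken from any of the systems mentioned above; the names RecordId, lock_name and conflicts, and the figure of four records per page, are all made up for illustration. It only shows that the coarser the unit being locked, the more often two unrelated updates collide.

```python
from dataclasses import dataclass

# Hypothetical page capacity, chosen only to keep the example small.
RECORDS_PER_PAGE = 4


@dataclass(frozen=True)
class RecordId:
    """Identifies a record by its table and its slot number in that table."""
    table: str
    slot: int


def lock_name(rid: RecordId, granularity: str) -> tuple:
    """Return the name of the lock that protects rid at the given granularity."""
    if granularity == "table":
        return (rid.table,)
    if granularity == "page":
        return (rid.table, rid.slot // RECORDS_PER_PAGE)
    if granularity == "record":
        return (rid.table, rid.slot)
    raise ValueError(f"unknown granularity: {granularity}")


def conflicts(a: RecordId, b: RecordId, granularity: str) -> bool:
    """Two exclusive updates conflict when they need the same lock name."""
    return lock_name(a, granularity) == lock_name(b, granularity)


if __name__ == "__main__":
    # Two different records that happen to sit on the same page.
    first, second = RecordId("orders", 1), RecordId("orders", 2)
    for granularity in ("table", "page", "record"):
        print(granularity, conflicts(first, second, granularity))
```

Two writers touching different records on the same page block each other under table or page granularity but not under record granularity. The hard part described earlier is not this mapping itself but finding every place in a mature code base that quietly assumed the coarser unit.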
Way too much burden is being placed on the laps of the application writers or database administrators since even statement level atomicity isn’t guaranteed when a single statement which updates more than one object encounters a failure of some sort or the other. Of course, only some NoSQL systems support the functionality of multiple object updates in a single statement.
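As a rough illustration of what statement level atomicity means here, the following Python sketch uses a toy in-memory store. TinyStore, its two methods and the fail_after parameter are hypothetical and do not correspond to any real NoSQL API. It contrasts a multi-object update that leaves partial effects behind after a failure with one that keeps an undo list and rolls back.

```python
_MISSING = object()  # marker for "key did not exist before the statement"


class TinyStore:
    """Toy key-value store used only to illustrate the idea."""

    def __init__(self, initial=None):
        self.data = dict(initial or {})

    def update_many_unsafe(self, updates, fail_after=None):
        """Apply updates one by one; a failure leaves earlier ones in place."""
        for i, (key, value) in enumerate(updates.items()):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated failure")
            self.data[key] = value

    def update_many_atomic(self, updates, fail_after=None):
        """Apply all updates or none, using an in-memory undo list."""
        undo = []
        try:
            for i, (key, value) in enumerate(updates.items()):
                if fail_after is not None and i == fail_after:
                    raise RuntimeError("simulated failure")
                # Remember the old value before overwriting it.
                undo.append((key, self.data.get(key, _MISSING)))
                self.data[key] = value
        except Exception:
            # Roll back in reverse order, then report the failure.
            for key, old in reversed(undo):
                if old is _MISSING:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old
            raise


if __name__ == "__main__":
    store = TinyStore({"a": 1, "b": 2})
    try:
        store.update_many_unsafe({"a": 10, "b": 20}, fail_after=1)
    except RuntimeError:
        pass
    print(store.data)  # "a" was changed, "b" was not
```

Without the undo step, it is the application writer who has to notice that "a" changed while "b" did not, and then clean up. This sketch only covers a failure inside a running process; surviving a crash would additionally need something like log-based recovery.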
The lack of standards, with each NoSQL system cooking up its own APIs, is also going to be a nightmare in due course of time. Whether it is an open source system or a proprietary one, users will feel locked in.
To be continued … in future posts.