MOnagals

April 2012

3 posts

Part 2: The NoSQL hoopla … What is NonsenSQL about it?

After my first blog post made 2 days ago on the topic of NoSQL (see http://bit.ly/NoSQLt), which has been widely read (according to Google Analytics, 1100+ visits from 58 countries), I have been surprised to see some people’s knee-jerk reactions :-) That prompted me to post variations of the following on Facebook, Twitter, LinkedIn and Google+: 

I wish people would read exactly what I wrote in http://bit.ly/NoSQLt  and stop imagining stuff that I didn’t write! I didn’t say NoSQL is not needed or that everything has been invented before. It is interesting how the blogosphere and twitterati are having a field day putting words in my mouth :-) E.g., search for @seemohan on Twitter


Since most people who read my earlier post probably don’t have a clue about who I am and what my philosophy with respect to technologies and inventions over my entire career has been, I would like them to be aware of the following collateral so that my comments can be interpreted with the right perspective and frame of mind. Readers who take the trouble to follow up on the references listed below will hopefully realize that I am not an RDBMS bigot, that I am open-minded about new ways of addressing problems and their solutions, and that I have also invented and transferred technologies to non-relational systems like the MQSeries messaging system, the Lotus Notes groupware/document system, WebSphere Application Server, the FlowMark workflow management system, the Parallel Sysplex Coupling Facility, etc. in mainframe and non-mainframe environments (http://bit.ly/ARIESi). 

  • As part of the series ACM SIGMOD “Distinguished Database Profiles”, an interview of me by Marianne Winslett was published in ACM SIGMOD Record in December 2004. It is titled “C. Mohan Speaks Out on R*, Message Queues, Computer Science in India, How ARIES Came About, Life as an IBM Fellow, and More”. The transcript of the interview conducted in June 2003 is at http://bit.ly/CMddpS and the video of that interview is at http://bit.ly/CMsDDP
     
  • The presentation “The Excitement of Research and Advanced Technology Development: A Personal Journey” (slides at http://bit.ly/CMexci), which I have given in many venues, details my successes and failures, the struggles that I had to go through to get traction for my ideas, and how I resisted the sheep mentality with which many people decide what to work on and what should be accepted as good work. 
     
  • The presentation “Can India be an Innovation Superpower?” which I gave as a part of the University of California at Santa Cruz’s Series “Mapping the Future of India” discusses the innovation ecosystem and what it takes to be successful in doing high end technical work. The slides are at http://bit.ly/CMiisp and the video recording of the talk is at http://bit.ly/iisVid (streamed) and http://bit.ly/IISvid (download). 

I am not trying to claim that I know everything about our industry/technologies or what matters when, or that I have definitive ideas about what the right evolutionary path for data management systems is. I am merely trying to temper some of the marketing and technical hype associated with NoSQL and related areas, and to pass on some caveats and warnings based on my 30+ years of experience in the data management field. I am a bits and bytes (or nuts and bolts) kind of guy who has worked mostly on technologies relating to the bowels of different systems which manage persistent data of different kinds in distributed and clustered environments (http://bit.ly/CMpapp). In my writings and while designing my algorithms, I have tried hard to dig into what has been done in the past and to document in my papers as much as I could of what I learned about the prior art and related work, crediting the people who did the prior work.

My comments aren’t targeted merely at one NoSQL system or one set of people. I would like all sorts of people to give some attention to what I have to convey regarding NoSQL systems: entrepreneurs, end users, IT management, systems architects, designers, marketers, students, industrial researchers, academicians (pure and those who moonlight on the side as entrepreneurs and consultants), established little/big industry people, …

Now, coming back to the topic of NoSQL and some of the concerns that I have about what is being done in that context, watching a video of the panel discussion that I took part in in Sri Lanka in September 2011 would be a way for readers to hear me express some of my thoughts and reactions. In due course, I plan to document more of my views in written form. 

  • WSO2Con 2011 (http://bit.ly/WSO21a) Panel on “Data, data everywhere: big, small, private, shared, public and more” - Video recording at http://bit.ly/SLdatP

I have closely observed or taken part in the evolution of many systems whose designers initially thought in simple terms but later had to add more sophisticated functionality, which they found was very hard to do. Examples are System/38’s database functionality, which was embedded in the horizontal and vertical microcode of the system, Lotus Notes, which from its beginnings in 1989 (http://bit.ly/LNhist) has looked in many ways like the NoSQL systems of today, and RDBMSs like the mainframe version of DB2, Sybase and SQLServer, as well as OODBMSs like ObjectStore, all of which started out with page level locking as the smallest granularity of locking.

S/38 had a single level store and it relied on the virtual memory paging subsystem and the file system for accessing and caching data in memory. There was no buffer manager as in other RDBMSs. The granularity of latching during a call to the data manager was an entire table (locking was at record level). As the systems became more powerful and SMPs came into existence, latch conflicts became severe and the myriad things that took advantage of the table level latch became very painful to deal with. 

Lotus Notes until R5 had very ad hoc ways of handling recovery, no notion of transactions and many non-scalable features. Changing that system and adding log-based recovery and transaction semantics was painful (http://bit.ly/LNotes). 

Reducing the smallest granularity of locking from page size to something smaller was quite painful in RDBMSs/OODBMSs like DB2, Sybase, SQLServer and ObjectStore. The original lock granularity had been taken advantage of in many places in unobvious/subtle ways and those were very tricky to identify and fix. 
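To make the granularity point concrete, here is a minimal sketch of a lock table that can be switched between page level and record level locking. It is purely illustrative and is not taken from DB2, Sybase, SQLServer or ObjectStore: the names `LockTable`, `try_lock`, `release_all` and the constant `RECORDS_PER_PAGE` are hypothetical, only exclusive locks are modeled, and a record is assumed to map to a page by simple integer division.

```python
from dataclasses import dataclass, field

# Hypothetical page capacity: records 0-3 live on page 0, 4-7 on page 1, ...
RECORDS_PER_PAGE = 4


@dataclass
class LockTable:
    """Toy exclusive-lock table; granularity is "page" or "record"."""
    granularity: str
    owners: dict = field(default_factory=dict)  # lockable resource -> txn

    def _resource(self, record_id):
        # Map a record to the unit that actually gets locked.
        if self.granularity == "page":
            return ("page", record_id // RECORDS_PER_PAGE)
        return ("record", record_id)

    def try_lock(self, txn, record_id):
        # Grant the lock if the resource is free or already held by txn.
        holder = self.owners.setdefault(self._resource(record_id), txn)
        return holder == txn

    def release_all(self, txn):
        # Drop every lock held by txn (e.g. at commit or rollback).
        self.owners = {r: t for r, t in self.owners.items() if t != txn}


# Two transactions touching different records on the same page:
page_locks = LockTable("page")
page_locks.try_lock("T1", 0)
print(page_locks.try_lock("T2", 1))    # False: T2 is blocked by T1's page lock

record_locks = LockTable("record")
record_locks.try_lock("T1", 0)
print(record_locks.try_lock("T2", 1))  # True: no conflict at record level
```

The sketch shows only why finer granularity admits more concurrency. It says nothing about the hard part described above, namely finding all the places in a real system that silently depend on the coarser lock.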

I am really concerned about some of the design choices made in the case of NoSQL systems. As they mature and features that were initially considered unnecessary start creeping in (due to the slippery slope that these systems have been on ever since they deviated significantly from the feature set of RDBMSs), they are going to suffer a lot of growing pains along the above lines. I am unsure of the extent to which the designers of such systems are conscious of these sorts of consequences of what they have chosen to do initially. 

I tried to demonstrate in our original ARIES paper (http://bit.ly/CMaries) the benefits to be had and the need for concurrently thinking about storage management, locking and recovery, unlike some layered approaches advocated in some earlier work. I also discussed numerous approaches to locking and recovery implemented in relational and non-relational systems which would be worth paying attention to as NoSQL systems evolve. 

While there is a lot of talk about scalability, elasticity, etc., such design criteria seem to be applied in a spotty way in the design of these systems. Even the designers of systems which support incremental updates don’t seem to think about having to scale along the concurrency dimension by supporting finer granularity of locking/latching. 

Way too much burden is being placed in the laps of the application writers or database administrators since even statement level atomicity isn’t guaranteed when a single statement which updates more than one object encounters a failure of some sort or other. Of course, only some NoSQL systems support the functionality of multiple object updates in a single statement. 
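As a rough illustration of what statement level atomicity means for a multi-object update, the sketch below contrasts a naive loop with one that keeps before-images and undoes its partial work on failure. This is an assumption-laden toy, not how any particular NoSQL system or RDBMS implements it: the "database" is a plain in-memory dict, the function names are hypothetical, and there is no logging to stable storage, so it does not cover crashes.

```python
def update_many_non_atomic(store, keys, fn):
    """Apply fn to each object in turn; a failure leaves earlier updates in place."""
    for key in keys:
        store[key] = fn(store[key])


def update_many_atomic(store, keys, fn):
    """Apply fn to each object; on any failure, restore the before-images."""
    undo = []  # (key, old value) pairs, in the order the updates were applied
    try:
        for key in keys:
            old = store[key]
            new = fn(old)
            undo.append((key, old))
            store[key] = new
    except Exception:
        # Roll back in reverse order so the statement has no visible effect.
        for key, old in reversed(undo):
            store[key] = old
        raise


def add_bonus(balance):
    # Hypothetical per-object update that can fail partway through a statement.
    if balance < 0:
        raise ValueError("negative balance")
    return balance + 10


accounts = {"a": 100, "b": -5, "c": 50}
try:
    update_many_non_atomic(accounts, ["a", "b", "c"], add_bonus)
except ValueError:
    pass
print(accounts)  # {'a': 110, 'b': -5, 'c': 50}: "a" was updated, "c" was not

accounts = {"a": 100, "b": -5, "c": 50}
try:
    update_many_atomic(accounts, ["a", "b", "c"], add_bonus)
except ValueError:
    pass
print(accounts)  # {'a': 100, 'b': -5, 'c': 50}: all or nothing
```

Without something like the second version inside the system, every application has to detect the half-applied statement and repair it itself, which is the burden described above.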

The lack of standards, with each NoSQL system cooking up its own APIs, is also going to be a nightmare in due course. Whether it is an open source system or a proprietary one, users will feel locked in. 

To be continued … in future posts.

Apr 1, 2012 (9 notes)

March 2012

2 posts

Part 1: The NoSQL hoopla ... What is NonsenSQL about it?

After not looking at it seriously for a long time, during the last few months, I have been paying closer attention to the NoSQL phenomenon. I have been amazed at the amount of hoopla associated with it and the “anything goes” attitude of a significant fraction of the people using and/or working on such systems. Of late, it has become fashionable to diss RDBMSs, and a significant chunk of the technologies that have been laboriously thought about and worked out over the last few decades. Some inconvenient/inadequate features of RDBMSs in certain contexts have been used as arguments to throw the baby out with the bathwater while coming up with alternatives. As some of us anticipated, many features which were initially considered unnecessary/undesirable are now being retrofitted to the NoSQL systems, in many cases in ad hoc and simple-minded ways. 

Having worked in the database field for more than 3 decades with a fair amount of impact on the research and commercial sides of this field (see http://bit.ly/cmohan), it pains me to see the casual way in which some designs have been done and some supposedly new ideas get proposed/implemented. Not enough effort is being made to relate these proposals to what has been done in the past and to benefit from the lessons learnt in the context of RDBMSs. Not everything needs to be done differently just because it is supposedly a very different world now! 

As a senior citizen of the database community, I feel I need to say something on this and related topics. For a while, I have been irregular in expressing my opinions very vocally in public fora. Now, I have decided to use this blog to become somewhat more active :-)

Of course, I have to state the obvious: what I say in this blog are all my personal opinions and they don’t necessarily reflect the opinions of my employer of the last 30 years!

I did raise some hackles and asked some uncomfortable questions when my academic sibling Raghu Ramakrishnan gave what I strongly felt was a very one-sided keynote (“Cloud Data Serving: Key-Value Stores to DBMSs”) at VLDB 2009 in Lyon where he extolled the benefits of such systems without enough caveats around what he was saying. I felt the impressionable minds, who constituted a significant fraction of the huge audience, deserved to be exposed to those caveats. 

As the General Chair, I listened to a number of related presentations at the HPTS 2011 workshop in Asilomar (http://bit.ly/HPTSpr). More recently, I attended the Silicon Valley edition of the MongoDB annual conference (MongoSV - http://bit.ly/MDsv11) along with, believe it or not, 1200 other people. After listening to some of the detailed presentations, I decided to tweet my reactions. What follows is a subset of the tweets I authored during that event. They may be of interest to people who didn’t see them before or to those who didn’t jot them down for later use. 

  • seemohan #mongoSV whole bunch of papers on indexing, locking, logging, recovery, storage management, shared disks, … http://t.co/Pmy608Xh 10:56 PM Dec 9th, 2011 from web

  • seemohan #mongoSV Index locking and recovery so easy when high concurrency isn’t a goal! See what real systems have had to do http://t.co/6Ik17RPW 10:51 PM Dec 9th, 2011 from web

  • seemohan @divyagrawal I feel like I am 35+ years younger listening to age old stuff being talked about here at #mongoSV :-) Wow! 10:50 PM Dec 9th, 2011 from web

  • seemohan #mongoSV Slides of a tutorial on high performance transaction processing (covers ARIES recovery/locking/indexing) http://t.co/pslUql3T 10:50 PM Dec 9th, 2011 from web

  • seemohan #mongoSV Some war stories from original days of RDBMSs might be of interest to #mongoDB people - SQL 25 year reunion http://t.co/zwFARTHM 8:47 PM Dec 9th, 2011 from web

  • seemohan #mongoSV The main ARIES paper (ACM Trans on Database Sys) IBM Res Rep version at http://t.co/hcKMcZsJ - The Bible on recovery/locking! 8:43 PM Dec 9th, 2011 from web

  • seemohan #mongoSV Since Lotus Notes is a doc store like MongoDB what how we added industrial strength to Notes maybe of interest http://t.co/eNAEEkdV 8:05 PM Dec 9th, 2011 from web

  • seemohan #mongoSV May be people should look at my ARIES papers for what real DBMSs do http://t.co/WDdTYMdp http://t.co/Pmy608Xh http://t.co/jSl49cDz 7:16 PM Dec 9th, 2011 from web

  • seemohan #mongoSV For the ARIES guy (ME!), getting indigestion with the following: Single writer process, single DB wide lock scope in MongoDB :-)))) 7:12 PM Dec 9th, 2011 from web

  • seemohan #mongoSV For a 30 year DB guy@IBM with also PhD in DB, so much of the mongoDB internals look so basic :-((( Did these people read old stuff? 6:30 PM Dec 9th, 2011 from web

  • seemohan #mongoSV 1100 attendees, 50 sessions; 10gen: 80 employees, offices in Bay Area, NYC, London, Dublin - Microsoft working with mongoDB!! 5:31 PM Dec 9th, 2011 from web

In future posts, I hope to elaborate more on some of the points I have made above.

Mar 29, 2012 (9 notes)
Weaves by pavsmo: A Catch-22 paradox, inspired by my dad's eloquence

My daughter Pavithra Mohan, on turning 22 today, has made a resolution to get back to writing regularly through traditional media like magazines as well as, for the first time for her, Web 2.0 or social media like tumblr (Weaves by pavsmo) and twitter (@PavithraSMohan). She even credits me in this post :-)

pavsmo:

Catch-22 is likely the most fitting descriptor for this day: having just turned 22, I do believe I’m past my prime as I have officially crossed the threshold into the land of the pruned and ripened. 

Mar 29, 2012 (2 notes)