A schema means different things for an SQL and a NOSQL database management system. If you are using SQL, you have to tell the system about the format of the data before you can use – that is the schema. When it comes to NOSQL databases everything is different. You don’t need a schema, you simply present data and the system stores it in whatever format you send it.
Each document in a collection1 can be a different layout and different types of data. The flexibility of NOSQL databases suggests that data design is less important. For example I read: “… since NoSQL doesn’t necessitate the need for a schema, you avoid the expense and time of that initial design stage.”
I don’t think so
In fact I think that database planning is even more important with NOSQL. Sure you can throw any data at a NOSQL database. But that means that a program bug, like mis-spelling a fieldname can run for months before we discover it. There are other issues with NOSQL because there is more than one way of modelling the data structure, and you had better decide on your approach before you start coding.
So you still need to document the database in a format that your software can understand, even if the database management system doesn’t care. We still call this a schema, but it exists outside the database. It can then be used by your software to validate documents before they are stored, and validate them when they are read back. In my lockdown project (sudsjs.com) I used the schema to create an update form and a report layout as well.
The Mongoose software provides schema validation for MongoDB, and it is there that I discovered my little snafu.
My SNAFU
There is a standard for this type of schema called JSON-Schema, but I was blissfully unaware of it when I wrote the system. I started the project using the the Sails.js framework and an SQL database. After a few months I realised that Sails wasn’t really working for me. I started looking at other frameworks, however each had a different problem. They were all over-engineered for what I needed. I reverted back to Express (which by the way seems to be under the hood of most of the more advanced frameworks) and I used the Sails schema as a basis. Eventually extending it to take in NOSQL databases such as MongoDB and CouchDB.
Then, when I checked out Mongoose, I found that is uses the JSON Schema standard. Clearly I should be using the standard rather than my own. I set about modifying my system, which was not too hard. My schema wasn’t so different from the standard (so they obviously did it right!) and mainly I just had to rename things or move stuff around. I don’t have all the features of the standard coded, but I think I have included the most common features. It could have been a lot worse.
The JSON Schema
JSON is a formalised version of the way data is defined in JavaScript – thus the name: JavaScript Object Notation. It is a lot more fussy than JavaScript, but it is designed as a standard to pass data between computers. (Hold on, I thought that was XML – but that is for another blog post).
JSON is valid JavaScript, but the reverse is not true. A simple example of my schema written in JSON is here. The same schema written in JavaScript is here. Apart from the quote marks around all the field names, the main difference is the lack of comments in JSON. After all, if JSON is for computer to computer communication, why do you want comments.
My system accepts either and normally I would expect the schema to be written in JavaScript. JavaScript is much easier for humans. Also in JavaScript I can add code for special purposes. But If you needed to create a schema from raw data, there are sites that will create a JSON schema from an example of the data.
But sometimes a fixed schema doesn’t work
But sometimes a fixed schema it isn’t right and the flexibility of NOSQL comes into its own. For example, suppose you have a collection of medical tests. Some of the information is standard such as date, patient name and doctor id. But each test will have different data, from a single number up to a five-page report. In an SQL system you end up with thousands of secondary tables, one for each different test, or maybe store it as XML data in a big text field. (or JSON if you must). Either way this cuts down your search and reporting options.
So a NOSQL database fits the needs exactly. The data is kept in separate fields and any type of test is catered for. I catered for this in my system by having a fixed section in the document layout and a variable section which has its own set of schema which are editable and based on record type.
- ‘Rows in a Table’ in SQL-speak or ‘Records in a File’ in classic-speak. Why we need a whole new terminology is beyond me.
Leave a comment