Database management is not a new concept. The cultures in antiquity who mastered their inventory tracking had a distinct advantage over those who didn’t. Database management has not always been computer based, but throughout history, effective management of information has always been a competitive advantage.
Computer technologies tends to follow physical systems first, then they frequently exceed the physical systems – not only in speed, but in their ability to work out more advanced problems. Just as data collection has evolved from notches on sticks to scrolls to printed documents filed in folders in drawers in cabinets, so has the sophistication of data storage methods in computers. Today’s modern Database Management System software is able to produce results never imagined by physical systems.
If you are involved in building a software system that involves capturing, storing, updating and querying any kind of data, there is a range of options available. In this short article, we will start with the simple options and move up to the DBMS, explaining why each step provides a superior solution.
We are assuming that permanent storage will be maintained by a disk drive. Other options are now available, but won’t be discussed here. The term “flat file” would refer to the uniform structure of a file where each new “record” has the same length and fields within the records are found at the same offsets. The simplest way to make changes to flat files is to write to the end, or to read through it and write it back out to another file with some changes. This comes from the model of tape drives – the original computer mass storage.
Why not write somewhere in the middle of a flat file to change it? Because if there is a failure during the write, you may have no idea if the correct data has been permanently stored.
A flat file must be used by one and only one program. Otherwise data can be lost or corrupted by competing users/programs.
Structured File and File System
A structured file has the same characteristics as a flat file, except that its contents are interpreted by a specialised tool like a spreadsheet program or word processor. The tool will typically read the entire file, process it, then write back the entire file.
A file system is now common where files are grouped together into directories, frequently called folders. Here the physical system is modelled in the computer, where the location of a file among multiple folders also adds information to the file. For example, an expense report spreadsheet filed under the “reimbursed” folder means something very different than the same spreadsheet filed under the “submitted” folder.
An additional benefit of a file system is that different files can be is use by different software programs at the same time without concern for overwriting the work of the others. To make this work, the file system must provide a locking facility so that exclusive use of a file is guaranteed.
When many files may be altered during a period of time, but a summary of those changes must also be created as soon as possible, a periodic process called “posting” can take place. As in a Bank, transactions can be collected during the day in a number of files, but not posted to the accounts they affect until the end of the day, after the bank is closed and no more transactions are being created. This has been problematic when multiple withdrawals occur before the account balance can be checked, resulting in an overdrawn account!
Computer networking and multi-user operating systems allow access to computer files by any number of computers at once. This can solve the “posting” problem described above if sharing issues are solved. There are two issues: exclusive access to a region of a file, and safe changes to any location in the file.
Fortunately both problems can be solved if all programs using the file follow the same rules. First, they all must request exclusive access to a file region and honor any locks that already exist. Second, they must employ some sort of journaling or “double write” mechanism to safely update the file in a way that is recoverable if the update is interrupted.
Database Management System
The invention of the database management system was the result of the progression shown above. Since much of the functionality described above is non-trivial, it doesn’t make much sense to reinvent it every time a new application is created. Here’s a look at the minimal functionality of any DBMS:
Data Structuring Grouping fields/cells of data into rows/records and allowing various associations between rows of the same or differing types.
Transactional Safety When multiple parts of the database must be updated as a single transaction, all changes must be saved, or none of them.
Rules Some changes to a database don’t make any sense. A DBMS will help define what changes are proper and help to enforce them.
Multi-User If 2 or 2000 people or programs are accessing a common bank of data, it is essential that their views of the data and changes to the data do not affect others incorrectly.
Safely Stored If a DBMS accepts a “change” in the form of a transaction, it guarantees that the change has been permanently stored regardless of what happens afterward.
Navigation Now that data has been stored, how can it be used? A DBMS will provide a number of ways to search through it or move through it so that useful information can be derived.
Most of this functionality is represented by the acronym ACID, which stands for Atomicity (all or nothing when making changes), Consistency (defining and enforcing rules), Isolation (multiple users unaffected by other users), and Durability (one written, it stays written). ACID is the DBMS contract with its users. ACID is the hard stuff to implement.
Other essential functionality, such as defining the structure of the data and searching through the data, is often very much simplified by a DBMS. A typical implementation of the SQL language will do both. In fact, the expressiveness of SQL has led many people to ask questions of the data that would never have been asked otherwise. Physical storage systems cannot yield answers that a well written set of SQL queries can.
The DBMS was invented because data has value. The technology was essential when the largest mainframe computer was less capable than a typical smartphone of today.
Database Management software makes sense for the entire spectrum of computers, ranging from mobile and embedded computers to large supercomputers and networks of computers. But to this day, the adoption of this proven technology has stopped at the workstation or laptop level. The use of a true DBMS is rare in a smart phone, tablet, or embedded/real-time computer. Since these latter computers come from a culture of small and limited resources, the software development mentality is different than that which is common in the rest of the computing world. This will change. The glut of valuable data, often called Big Data, and the pervasiveness of data-driven devices, often characterized by the phrase Internet of Things (IoT) will force mobile and embedded computers to do their share of the work. The amount of computing power that is currently available is much greater than the computing power that is being used. Adoption of DBMS technology at all levels will improve this imbalance.
Regarding the options that are available for data storage, we are saying that the DBMS is the best option because of the problems this technology was invented to solve.
Let’s add one more very important reason to use a DBMS: to make sure your data is not left in isolation as the world moves in the direction of connectedness. Data management systems are moving in the direction of open sharing (in a secure, safe way, of course!), and new software systems should be prepared to be participants in that world. It is short-sighted, and probably more difficult, to “build your own” data storage when you can plug into past innovation and future openness with one good decision.