I find it amazing how the computing world revolves around something called “Data”. For those of us in IT, the word comes naturally. Somewhere down the line we started taking it for granted; we do not even think we need to explain it to anyone. Then one day my five-year-old daughter overheard me using the term “Data” repeatedly during an official phone call (frankly, I was in a panic because we had lost some vital transaction data) and asked me a very innocent question: “What is Data?”
Curiosity leads to knowledge
When I was a kid, my father was my encyclopedia; whatever questions I had in my mind, I used to ask him. He always took them seriously and explained the answer accurately, in a way I could understand, no matter how trivial or silly the question was. My father used to say that when a question arises in one’s mind, that person has an obligation to search for the answer. I try my best to instill that spirit in my daughter.
I never thought it would be so hard to explain the term to her in a way she could understand.
Concept of data is difficult to explain
Honestly, how many of us can actually define and explain the term “Data” to a layman, for example to a member of the Jarawa tribe of the Andaman and Nicobar Islands (assuming we can communicate effectively in a common language)? Or take someone from an earlier generation, say my grandma. She was born in British India and was still alive when I got my first PC in the 80s. She defined the computer in her own way: a typewriter that uses a television instead of paper, to save trees.
I am writing this blog after being enlightened by my daughter’s innocent question. Even after 14-15 years of IT experience, I doubt how much we really know about how the definition of “Data” has evolved, and how the tools and products that manage “Data” have developed as that definition matured. Let’s take stock!
So what is Data?
wikipedia.org | Data is a set of values of qualitative or quantitative variables; restated, pieces of data are individual pieces of information. |
WhatIs.com | In computing, data is information that has been translated into a form that is more convenient to move or process. |
businessdictionary.com | Data is information in raw or unorganized form (such as alphabets, numbers, or symbols) that refers to, or represents, conditions, ideas, or objects. Data is limitless and present everywhere in the universe. |
I could go on with many other definitions, but that is not the point. What we must notice is that the terms “Data” and “Information” go hand in hand.
For example:
150 | Allen Barrett | 56 | 70 |
101 | Dipak Shaw | 76 | 65 |
434 | Timothy Bolton | 83 | 94 |
342 | Tim Gaffney | 85 | 46 |
Do we understand anything from the above table? What does it represent? Remember, it is just some numbers and text. I think it is safe to say that it is just data.
But once we put a header on the above table:
Roll No. | Name | Mathematics | Physics |
150 | Allen Barrett | 56 | 70 |
101 | Dipak Shaw | 76 | 65 |
434 | Timothy Bolton | 83 | 94 |
342 | Tim Gaffney | 85 | 46 |
Now it makes sense. In this case it represents a mark sheet (score card).
OR
Emp ID. | Name | Base Salary (in thousands) | Allowance (in thousands) |
150 | Allen Barrett | 56 | 70 |
101 | Dipak Shaw | 76 | 65 |
434 | Timothy Bolton | 83 | 94 |
342 | Tim Gaffney | 85 | 46 |
Just by changing the header, the same details represent a completely different thing: in this case, a pay slip.
Without the header, the table alone was just some “Data”, but with the header it becomes some “Information”.
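For readers who like to see this in code, here is a minimal Python sketch of the same idea. The rows and the two headers are copied from the tables above; the helper name `label` is my own invention for this illustration.

```python
# The same raw rows used in the tables above: just numbers and text.
rows = [
    (150, "Allen Barrett", 56, 70),
    (101, "Dipak Shaw", 76, 65),
    (434, "Timothy Bolton", 83, 94),
    (342, "Tim Gaffney", 85, 46),
]

# Two different headers for the very same rows.
marks_header = ("Roll No.", "Name", "Mathematics", "Physics")
salary_header = (
    "Emp ID.",
    "Name",
    "Base Salary (in thousands)",
    "Allowance (in thousands)",
)


def label(header, data):
    """Pair every value in every row with its column name (hypothetical helper)."""
    return [dict(zip(header, row)) for row in data]


marks_sheet = label(marks_header, rows)  # the rows read as a mark sheet
pay_slip = label(salary_header, rows)    # the same rows read as a pay slip

print(marks_sheet[0])
print(pay_slip[0])
```

The values never change between the two results; only the names attached to them do, and that is what turns the data into information.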
Now recall how the computing world matured around this concept.
The Age of Flat Files
I am lucky that I started my career just a couple of years before Y2K, so I got the chance to program in COBOL at one point. In those days we used flat files, which were like the above table without the header details. The header details lived in a separate COBOL program file (remember the sweet old DATA DIVISION of a COBOL program!). This was a problem: the program, whose DATA DIVISION described the record layout, was of no use without a proper, matching data file. Conversely, the actual data file (the flat file) was useless without the program file. A lot of effort, resources and money were spent just to manage this.
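To give a feel for the problem, here is a rough sketch in Python rather than COBOL. The field names and widths in `LAYOUT` are assumptions made up for this illustration; they play the role of the record description in the DATA DIVISION, and `parse_record` is a hypothetical helper.

```python
# Assumed record layout: (field name, width in characters).
# It lives in the program, not in the data file.
LAYOUT = [("roll_no", 3), ("name", 15), ("mathematics", 3), ("physics", 3)]


def parse_record(line, layout=LAYOUT):
    """Cut one fixed-width line into named fields using the given layout."""
    record, pos = {}, 0
    for field, width in layout:
        record[field] = line[pos:pos + width].strip()
        pos += width
    return record


# One line of the flat file: no header, no separators, only characters.
raw_line = "150" + "Allen Barrett".ljust(15) + "056" + "070"

record = parse_record(raw_line)
print(record)

# A program with a slightly different layout misreads the very same line.
WRONG_LAYOUT = [("roll_no", 5), ("name", 13), ("mathematics", 3), ("physics", 3)]
garbled = parse_record(raw_line, WRONG_LAYOUT)
print(garbled)
```

The data file on its own cannot tell you where one field ends and the next begins; that knowledge sits only in the program, which is why the two had to be kept in step.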
Then came dBASE
The next generation of programming tools addressed this problem. There was a tool called dBASE, sold by a company called Ashton-Tate (see http://en.wikipedia.org/wiki/Ashton-Tate), which in those days was a big name, ranked alongside Microsoft. It popularized the concept of keeping the “Data” and the “Header” details together in one database file. Programming languages such as dBASE, Clipper and FoxPro evolved to exploit this concept, and programmers concentrated more on how to insert, update and retrieve data from the database file. A typical business utility had a handful of database files, each representing an individual table, plus some program files to perform the insert, update and retrieve operations. But how these disparate database files (tables) were related to one another was still kept in separate program files. So although the information (raw data plus the associated header details) was now together and each file made sense on its own, managing the relationships between files was still a tough job that cost organizations a lot of effort, resources and money.
The Golden Age of RDBMS
The concept of the RDBMS may already have existed on the sidelines, used mainly in the defense, academic and research communities, but Oracle must be credited with bringing it to the masses in a commercially viable way. Oracle was the first popular RDBMS product in which the relationships between the separate tables of a business application were stored along with the tables themselves.
So now a database holds the raw data and the header details (the field definitions), which together are called “Tables”, and the relationships between the Tables are also stored inside the database. This concept gave birth to a new generation of professionals, the “DBA”, and programmers were freed to concentrate on programming the business logic.
Almost all the major players came up with their own RDBMS, such as Oracle Database, Sybase, Microsoft Access and SQL Server, and MySQL (later owned by Sun Microsystems).
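Here is a small sketch of what “the relationship is stored inside the database” means, using Python’s built-in `sqlite3` module purely as a convenient stand-in for an RDBMS. The table and column names (`student`, `marks`, `roll_no`) are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

# The field definitions AND the relationship are declared in the database itself.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE marks (
        roll_no INTEGER NOT NULL REFERENCES student(roll_no),
        subject TEXT NOT NULL,
        score   INTEGER NOT NULL
    )
    """
)

conn.execute("INSERT INTO student VALUES (150, 'Allen Barrett')")
conn.execute("INSERT INTO marks VALUES (150, 'Mathematics', 56)")

# The database, not the program, refuses marks for a student that does not exist.
try:
    conn.execute("INSERT INTO marks VALUES (999, 'Physics', 70)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The stored relationship is also what lets the tables be joined back together.
joined = conn.execute(
    "SELECT s.name, m.subject, m.score "
    "FROM student s JOIN marks m ON m.roll_no = s.roll_no"
).fetchall()
print(rejected, joined)
```

In the dBASE days, a check like the rejected insert above would have lived in every program that touched the files; here it is declared once, next to the data.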
Renaissance in computing
This period was very turbulent for IT professionals; people stood at a crossroads, unsure which way to go. In the IT world, whoever controls the “Data” is the king. The kingdom was taken away from the programmers by a new breed of professionals called DBAs, who were in demand, better paid and going places. Programmers were not even called programmers any more; they were called coders.
I chose to stick with the programming stream, though there were opportunities and temptations to cross the fence. The decision was based purely on personal passion. Programming quenched my thirst for creativity; to this day I enjoy coding, and I can stay awake all night, or even skip the office, once I start creating something interesting.
Once the programming community was freed from data storage and management (because a special stream of professionals, the DBAs, had taken over the data part), programmers concentrated more on code quality, reusability, standardization, business logic, patterns and so on. The concept of Object Oriented Programming came to the masses. It was during this period that Dennis Ritchie’s C language, born at AT&T Bell Labs, was extended there by Bjarne Stroustrup into C++, which was popular and easy to use. A horde of next-generation Object Oriented Programming languages followed, such as Delphi, Java, Python, C# and VB.Net.
Logic and Data
With Object Oriented Programming, programmers created their own way of viewing the “Data”, which was very different from the world of databases and DBAs.
In the database world, the “Data” takes the form of rows and columns stored in tables, whereas in the Object Oriented world the data takes the form of objects and entities and their attributes.
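A tiny sketch may help show the two views side by side. The `Student` class, the `load_students` helper and the sample rows are all assumptions made up for this illustration; the hand-written mapping is only meant to suggest the kind of translation a programmer has to do between the two worlds.

```python
from dataclasses import dataclass, field

# Database view: flat rows in two tables, linked by roll_no.
student_rows = [(150, "Allen Barrett"), (101, "Dipak Shaw")]
marks_rows = [
    (150, "Mathematics", 56),
    (150, "Physics", 70),
    (101, "Mathematics", 76),
    (101, "Physics", 65),
]


# Object view: one entity that carries its attributes and its behaviour.
@dataclass
class Student:
    roll_no: int
    name: str
    marks: dict = field(default_factory=dict)

    def total(self):
        """Sum of the scores across all subjects."""
        return sum(self.marks.values())


def load_students(students, marks):
    """Fold the two sets of rows into Student objects (hypothetical helper)."""
    by_roll = {roll: Student(roll, name) for roll, name in students}
    for roll, subject, score in marks:
        by_roll[roll].marks[subject] = score
    return list(by_roll.values())


students = load_students(student_rows, marks_rows)
for s in students:
    print(s.name, s.total())
```

The rows answer the question “what is stored where”, while the object answers “what is a student and what can I ask of one”.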
Since this is not a technical blog, I am not explaining it in detail, but people with some understanding of the IT industry will know what I mean.
Data representation
Another way of putting it is that end users and business houses prefer to visualize and understand data in the forms they see and interact with daily: the screens, the reports, the application forms and so on. They get confused when they are told that their employee data is going to reside in 10 different tables in a database, joined by primary key and foreign key relationships.
With the Object Oriented approach, users and the business feel more comfortable, because the design relates to business entities they already know, such as the screens, reports and application forms.
Programmers were able to devise new design methods, such as design patterns and UML, to come closer to the business and user community, and once again took back control of the kingdom called “Data”. Some applications designed during this period used the database purely as a storage medium, without any relational constraints; those were handled in the Object Oriented code, because the RDBMSs of that time were not suited to such radical design concepts and patterns.
The consolidation period
As with any conflict, the aftermath was the same: “replace the old with the new”. During this time the IT industry consolidated the radical new design concepts and updated its database and storage systems to meet the new design challenges. So the next generation of programming frameworks, such as ORM and IoC frameworks, and database products, such as object-oriented database management systems (OODBMSs), came up.
Some major players in the programming world were JBoss, Spring Framework, .NET Entity Framework, Enterprise Library etc. In the database world, some major products were ConceptBase, ObjectDB, ObjectStore etc.
Many interesting periods of innovation followed. I could go on, but one blog is too small to accommodate all of it; I will share my thoughts in upcoming blogs.
Until then, give it a thought: the term “Data”, which we take for granted, can be very difficult to define, which is why the IT industry is still evolving. It is something like ZERO, which ancient Indian and Persian philosophers thought of and which we now take for granted.