Are we not obsessed with Data? Imagine how much we spend on capturing, storing and analyzing data in our day-to-day business. Is it worth investing so much in enterprise-level data? Well, it is difficult to answer these questions in a sentence or a paragraph. However, it is safe to say that data as such is of no importance; rather, it is the information that can be extracted from the data that is of value to us.
The concept of data (a recap)
In my previous blog, “Honey it’s all about DATA!”, I tried to explain the concept of simple data and how raw data can be transformed into information that gains value from a consumption point of view. I also gave a brief account of the technologies that evolved to manage this information.
This post is a continuation of that blog. Thanks for reading and for your feedback; after all, it is this feedback and these comments that encourage amateur bloggers like me to squeeze time out of our schedules and pen down our thoughts.
As mentioned in my previous blog, the journey from the simple flat file to DBMS, RDBMS and OODBMS seems a long one; in reality, however, the evolution of the database was only the first step in that journey.
There is an old saying: “Until we master the use of the knife, we may not appreciate the utility of the dagger.”
Once our IT industry matured in using databases efficiently, we felt the need for the Data Warehouse and the Data Mart. In the early days of programming, the emphasis was on logic constructs, algorithms and coding languages; data, on the other hand, was treated as a mere consequence of programming (probably because programming was still in the realm of academics and research scholars). No wonder the volume of data involved was also low at that time. But things changed once programming languages were developed for application in business environments: programs became big and complex, and took the shape of applications.
The birth of Master Data, Communal Computing and the business of Data Processing
Once people understood the concept of applications in an organization, they discovered the need to share data between applications, and thus the concept of Master Data was born.
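To make the idea concrete, here is a minimal Python sketch (purely illustrative; the record fields and the two application names are my own invention) of two applications consulting one shared master record instead of each keeping its own copy:

```python
# A single "master" store of customer records, shared by every application.
customer_master = {
    "C001": {"name": "Acme Ltd", "city": "Pune"},
}

def billing_invoice(customer_id: str, amount: float) -> str:
    # The (hypothetical) billing application looks up the shared master record...
    customer = customer_master[customer_id]
    return f"Invoice for {customer['name']}: {amount:.2f}"

def shipping_label(customer_id: str) -> str:
    # ...and so does the shipping application, so both always agree
    # on who customer C001 actually is.
    customer = customer_master[customer_id]
    return f"Ship to {customer['name']}, {customer['city']}"

print(billing_invoice("C001", 1500.0))  # Invoice for Acme Ltd: 1500.00
print(shipping_label("C001"))           # Ship to Acme Ltd, Pune
```

Change the master record once, and every application sees the change; that is the whole point of master data.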
IBM took this opportunity and developed the Mainframe and Midrange systems, which offered a single place where all of an enterprise's applications and its database could reside. Big corporations that could afford such an investment evolved a new business model called “Data Processing”: business data was captured offline by the respective line of business, fed into the Mainframe by data entry operators, processed, and turned into reports as per business requirements. It was a kind of communal computing, a centralization of IT activities, and during this time batch processing and the time-sharing model were the way things were done.
The power of Personal Computing
Once personal computers (PCs) became commercially viable and were mass produced, they were mostly used as typewriters and as a place to play computer games. Initially the PC was not even considered for data processing purposes; it was too much of a toy compared with the sophistication and size of the Mainframe.
But things changed: PC users started to do data processing at a very different scale, namely in the form of spreadsheets. Software like Lotus 1-2-3 sold like hot cakes among accountants and managers at various levels. This created a sense of independence from the centralized IT department; no more waiting for them to generate your reports. The need then emerged for more sophisticated PC technologies to control this PC-level data processing, such as the fourth-generation languages (FoxPro, Clipper, etc.). Soon, multi-dimensional analytical tools started arriving as well.
The Side effect
It is needless to say that with so much power in the hands of end users, data capturing and data processing soon started to happen at every level within the enterprise. Data integrity was lost: everyone had a copy of similar data and believed their copy was the latest. There was no data control and no authenticity of information.
Data warehouse
It was in this environment that the concept of the data warehouse evolved. Fundamentally, data warehousing means that the various individual databases within an enterprise are integrated, and that all lines of business and their respective applications agree on the meaning and content of data. Describing the data warehouse fully would take long, so I will write a separate blog on data warehousing. However, it is important to note that because a data warehouse stores data history, and because the data is granular (kept at the smallest level of transactions), it can also address future information and reporting needs.
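As a toy illustration of why granularity matters (a hedged Python sketch with made-up transactions, not a real warehouse design), keeping every individual transaction lets us answer questions that nobody had asked at design time:

```python
from collections import defaultdict

# Granular facts: one row per transaction, the smallest level of detail.
# (Sample data invented for illustration.)
transactions = [
    {"date": "2016-01-05", "store": "North", "product": "Pen",  "amount": 20.0},
    {"date": "2016-01-05", "store": "South", "product": "Pen",  "amount": 15.0},
    {"date": "2016-02-11", "store": "North", "product": "Book", "amount": 90.0},
]

def report(group_by: str) -> dict:
    # Because we kept every transaction, we can aggregate by any
    # attribute later -- store, product, date -- without re-capturing data.
    totals = defaultdict(float)
    for row in transactions:
        totals[row[group_by]] += row["amount"]
    return dict(totals)

print(report("store"))    # today's question:    {'North': 110.0, 'South': 15.0}
print(report("product"))  # tomorrow's question: {'Pen': 35.0, 'Book': 90.0}
```

Had we stored only the store-wise totals, the product-wise question would be unanswerable; granularity is what keeps the future open.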
New types of databases, data modelling techniques and data classifications
With the data warehouse, new types of databases started to emerge, such as multi-dimensional databases based on star-schema or snowflake modelling; these were also called data marts. A few other examples are online databases, analytical databases, the Operational Data Store (ODS), exploration databases and data mining databases. (We really need a separate article for a detailed explanation of the various types of databases; for now, see the sketch below.)
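For a flavour of what a star schema looks like (a simplified Python sketch; in practice this lives in a database, and my table contents are invented), a central fact table holds the measures and foreign keys pointing into small descriptive dimension tables:

```python
# Dimension tables: descriptive attributes, one row per member.
dim_product = {1: {"name": "Pen",  "category": "Stationery"},
               2: {"name": "Book", "category": "Stationery"}}
dim_store   = {10: {"region": "North"}, 20: {"region": "South"}}

# Fact table: one row per sale, holding only keys and measures.
fact_sales = [
    {"product_id": 1, "store_id": 10, "amount": 20.0},
    {"product_id": 1, "store_id": 20, "amount": 15.0},
    {"product_id": 2, "store_id": 10, "amount": 90.0},
]

# A "slice" of the cube: total sales of Pens, broken down by region.
totals = {}
for row in fact_sales:
    if dim_product[row["product_id"]]["name"] == "Pen":
        region = dim_store[row["store_id"]]["region"]
        totals[region] = totals.get(region, 0.0) + row["amount"]
print(totals)  # {'North': 20.0, 'South': 15.0}
```

The star shape (one fact table surrounded by dimensions) is what makes this kind of slicing and dicing fast and intuitive, which is exactly what a data mart is built for.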
This account would be incomplete if I did not mention one person who took data classification and modelling to a different level: Bill Inmon, who is recognized as the father of data warehousing.
There is one more person I respect in this field and whose papers I seek advice from: Ralph Kimball. His contribution is in business intelligence and the concepts behind the Data Mart, and he is rightly called the father of BI.
So, in my next blog I will write about these concepts of the Data Warehouse, the Data Mart and Business Intelligence, and how Data Architecture evolved from them. Until then, do provide your feedback; your feedback and comments give me food for thought.