A couple of decades ago, the data and information management landscape was significantly different. Though the core concepts of Analytics, in a broad sense, have not changed dramatically, adoption and the ease of analytical model development have undergone a paradigm shift in recent years. Traditional Analytics adoption has grown exponentially, and Big Data Analytics demands additional, newer skills.
For further elaboration, we need to go back in time and look at the journey of data. Before 1950, most data and information was stored in file-based systems (after the earlier discovery and use of punched cards). Around 1960, Database Management Systems (DBMS) became a reality with the introduction of hierarchical database systems like IBM Information Management System and thereafter network database systems like Raima Database Manager (RDM). Then came Dr. Codd's Normal Forms and the Relational Model. Small-scale relational databases (mostly single-user initially) like dBase, Access and FoxPro started gaining popularity.
With System R from IBM (which pioneered the Structured Query Language and from which IBM DB2 was later derived) and the ACID (Atomicity, Consistency, Isolation, Durability) compliant Ingres databases getting released, commercialization of multi-user RDBMS became a reality, with Oracle and Sybase (since acquired by SAP) databases coming into use in the following years. Microsoft had licensed Sybase on OS/2 as SQL Server and later split from Sybase to continue on the Windows platform. The open source movement, however, continued with PostgreSQL (an object-relational DBMS) and MySQL (since acquired by Oracle) being released around the mid-1990s. For over two decades, RDBMS and SQL grew to become the standard for enterprises to store and manage their data.
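To make the ACID idea concrete, here is a minimal sketch using Python's built-in sqlite3 module; the funds-transfer table and values are hypothetical and not tied to any of the products mentioned above. Either both updates in a transfer commit together or neither is applied.

```python
import sqlite3

# Minimal illustration of atomicity: a hypothetical funds transfer in which
# either both account updates are committed together or neither is applied.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100.0), ("bob", 50.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # the 'with' block commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            cur = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # triggers rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
    except ValueError:
        pass  # the failed transfer leaves both balances untouched

transfer(conn, "alice", "bob", 500.0)  # fails: balances stay at 100 / 50
transfer(conn, "alice", "bob", 30.0)   # succeeds: balances become 70 / 80
print(dict(conn.execute("SELECT id, balance FROM accounts")))
```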
From the 1980s, Data Warehousing systems started to evolve to store historical information and separate the overhead of Reporting and MIS from OLTP systems. With Bill Inmon's CIF model and later Ralph Kimball's OLAP-friendly Dimensional Model (denormalized star and snowflake schemas) gaining popularity, metadata-driven ETL and Business Intelligence tools started gaining traction, while database product strategies promoted the then lesser-used ELT approach and other in-database capabilities, like the in-database data mining released in Oracle 10g. For DWBI products and solutions, storing and managing metadata efficiently proved to be the differentiator. Data Modeling tools started to gain importance beyond desktop and web application development. Business Rules Management technologies like ILOG JRules, FICO Blaze Advisor or Pega started to integrate with DWBI applications.
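As a rough illustration of the dimensional model mentioned above, the sketch below builds a tiny star schema (a central fact table joined to denormalized dimension tables) using Python's sqlite3 module and runs a typical roll-up query; the table and column names are invented for the example.

```python
import sqlite3

# A toy star schema: a central fact table referencing denormalized dimensions.
# Table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INT, month INT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales  (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        units       INTEGER,
        revenue     REAL
    );
    INSERT INTO dim_date    VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
    INSERT INTO fact_sales  VALUES (20240101, 1, 10, 100.0),
                                   (20240201, 1,  5,  50.0),
                                   (20240201, 2,  3,  90.0);
""")

# A typical OLAP-style roll-up: revenue by month and product category.
query = """
    SELECT d.year, d.month, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.year, d.month, p.category
"""
for row in conn.execute(query):
    print(row)
```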
Once the Data Warehouses started maturing, the need for Data Quality initiatives started to rise. Most Data Warehousing development cycles would have used only a subset of production data (at times obfuscated / masked), so even where the implementation approach included Data Cleansing and Standardization, core DQ issues would emerge after the production release, at times even rendering the warehouse unusable until they were resolved.
Multi-domain Master Data Management (both Operational and Analytical) and Data Governance projects started to grow in demand once organizations began to view data as an enterprise asset, enabling a single version of the truth to increase business efficiency and supporting both internal and, at times, external data monetization. OLAP integrated with BI to provide ad-hoc reporting, besides being popular for what-if modeling and analysis in EPM / CPM implementations (Cognos TM1, Hyperion Essbase, etc.).
Analytics was primarily implemented by practitioners using SAS (1976) and SPSS (1968) for Descriptive and Predictive Analytics in production environments, and ILOG (1987) CPLEX and ARENA (2000) for Prescriptive Modeling, including Optimization and Simulation. While SAS had programming components within Base SAS, SAS/STAT and SAS/GRAPH, the strategy evolved to move SAS towards a UI-based modeling platform with the launch of Enterprise Miner and Enterprise Guide. These products were similar to SPSS Statistics and Clementine (later IBM PASW Modeler), essentially UI-based drag-drop-configure analytics model development software for practitioners usually having a background in Mathematics, Statistics, Economics, Operations Research, Marketing Research or Business Management. Models used representative sample data and a reduced set of factors / attributes, so performance was not an issue until then.
Around the middle of the last decade, anyone with knowledge and experience of Oracle, ERwin, Informatica and MicroStrategy, or competing technologies, could play the role of a DWBI Technology Lead, or even an Information Architect given additional exposure to and experience in designing for non-functional DW requirements such as scalability, security and related best practices.
Soon, enterprise data warehouses, now needing to store years of data, often without an archival strategy, started to grow exponentially in size. Even with optimized databases and queries, performance dropped. Then came Appliances, or balanced / optimized data warehouses: optimized database software, often coupled with the operating system and custom hardware. Most appliances supported only vertical scaling, but the benefits they brought included rapid accessibility, rapid deployment, high availability, fault tolerance and security.
Appliances thus became the next big thing, with Agile Data Warehouse migration projects being undertaken to move from RDBMS like Oracle, DB2 and SQL Server to query-optimized DW appliances like Teradata, Netezza, Greenplum, etc., incorporating capabilities like data compression and massively parallel processing (shared-nothing architecture), among other features. HP Vertica, which took the appliance route initially, later reverted to being a software-only solution.
Initially, Parallel Processing had three basic architectures: MPP, SMP and NUMA. MPP stands for Massively Parallel Processing and is the most commonly implemented architecture for query-intensive systems. SMP stands for Symmetric Multiprocessing and had a shared-everything (including shared disk) architecture, while NUMA stands for Non-Uniform Memory Access, essentially a combination of SMP and MPP. Over time, the architecture definitions became more amorphous as products kept improving their offerings.
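To give a feel for the shared-nothing MPP idea, here is a minimal sketch in Python: the data is partitioned across worker processes, each worker scans only its own slice, and only the small partial aggregates are combined at the end. This is purely illustrative and does not reflect how any particular appliance implements it.

```python
from multiprocessing import Pool

# Shared-nothing style aggregation: each worker owns one partition of the data,
# computes a partial result locally, and only the small partials are combined.
def partial_sum(partition):
    # Each "node" scans only its own rows (here: a list of revenue values).
    return sum(partition), len(partition)

def mpp_average(rows, n_partitions=4):
    # Hash/range partitioning is simplified to round-robin slicing here.
    partitions = [rows[i::n_partitions] for i in range(n_partitions)]
    with Pool(n_partitions) as pool:
        partials = pool.map(partial_sum, partitions)
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

if __name__ == "__main__":
    revenues = [float(i % 97) for i in range(1_000_000)]
    print(mpp_average(revenues))
```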
While industry and cross-industry packaged DWBI & Analytics solutions increasingly became a product and SI / solution-partner strategy, the end of the last decade saw increasing adoption of open source ETL, BI and Analytics technologies like Talend, Pentaho, the R libraries, etc. across industries (with the notable exceptions of the Pharma & Life Sciences and BFSI sectors), especially in organizations where essential features and functionality were sufficient to justify the ROI on DWBI initiatives, which were usually undertaken for strategic requirements rather than for day-to-day operational intelligence or insight-driven new revenue generation.
Adoption of cloud-based platforms and solutions, and even DWBI and Analytics application development on private or public cloud platforms like Amazon and Azure (IBM has now come out with Bluemix and dashDB as alternatives), also started to grow, either as part of a start-up strategy or as a cost-optimization initiative of Small and Medium Businesses, and even in some large enterprises as an exploratory initiative, given confidence in data security.
Visualization software also started to emerge and carve out a niche, growing in relevance mostly as a complementary solution to IT-dependent Enterprise Reporting platforms. The visualization products were business-driven, unlike technology-forward enterprise BI platforms that could also provide self-service, mobile dashboards, write-back, collaboration, etc., but at times involved multiple components with complex integration and pricing.
Hence, while traditional enterprise BI platforms had a data-driven "Bottom Up" product strategy, with dependence on and control by the IT team, visualization software took a business-driven "Top Down" product strategy, empowering business users to analyze data on their own and create their own dashboards with minimal or no support from the IT department.
With capabilities like geospatial visualization, in-memory analytics, data blending, etc., visualization software like Tableau is steadily growing in acceptance. Others, such as TIBCO Spotfire and, in recent years, SAS Visual Analytics, have blended visualization with out-of-the-box analytics, a capability that visualization tools otherwise achieve mostly by integrating with R.
All of the above was manageable with reasonable flexibility and continuity as long as data was more or less structured, ECM tools took care of documents, and EAI technologies were used mostly for real-time integration and complex event processing between applications / transactional systems.
But a few years ago, digital platforms, including Social, Mobile and others like IoT/M2M, started to grow in relevance, and Big Data Analytics grew beyond experimental POCs to complement the enterprise data warehouse (along with enterprise search capabilities), and at times even to replace it. The data explosion gave rise to the 3 V dilemma of volume, velocity and variety, and data was now available in all possible forms and in newer formats like JSON, BSON, etc., which had to be stored and transformed in real time.
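As a small illustration of handling such semi-structured records, the sketch below flattens nested JSON events into flat, table-like rows that could then be loaded into a warehouse or a downstream pipeline; the event fields are invented for the example.

```python
import json

# Flatten nested, semi-structured JSON events into flat dicts (tabular rows).
# The field names below are invented for illustration.
def flatten(record, parent_key="", sep="."):
    row = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            row.update(flatten(value, full_key, sep))
        else:
            row[full_key] = value
    return row

raw_event = '{"user": {"id": 42, "geo": {"country": "IN"}}, "action": "click", "ts": 1700000000}'
event = json.loads(raw_event)
print(flatten(event))
# {'user.id': 42, 'user.geo.country': 'IN', 'action': 'click', 'ts': 1700000000}
```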
Analytics now had to be done over millions of records in motion, unlike the traditional end-of-day analytics over data at rest. Business Intelligence, including Reporting, Monitoring, Alerts and even Visualization, had to become real-time. Even the consumption of analytics models needed to be real-time, as in the case of customer recommendations and personalization, which try to leverage the smallest windows of opportunity to up-sell / cross-sell to customers.
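A minimal sketch of analytics over data in motion, as opposed to end-of-day batch over data at rest: a per-customer count maintained over a sliding time window as events stream in, so an offer can be triggered while the opportunity is still open. The event structure, window size and trigger threshold are assumptions made for illustration.

```python
from collections import defaultdict, deque

# Keep a per-customer count of events within a sliding time window so that a
# recommendation or up-sell decision can be made while the event is still fresh.
WINDOW_SECONDS = 60

class SlidingWindowCounter:
    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.events = defaultdict(deque)  # customer_id -> deque of timestamps

    def add(self, customer_id, timestamp):
        q = self.events[customer_id]
        q.append(timestamp)
        # Evict anything older than the window as new events arrive.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q)  # events from this customer inside the current window

counter = SlidingWindowCounter()
stream = [("cust-1", 100), ("cust-1", 130), ("cust-2", 140), ("cust-1", 170)]
for customer, ts in stream:
    recent = counter.add(customer, ts)
    if recent >= 2:  # hypothetical trigger for a real-time offer
        print(f"{customer}: {recent} events in the last {WINDOW_SECONDS}s -> trigger offer")
```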
It is Artificial Intelligence systems, powered by Big Data, that are becoming the game changer for the near future, and it is Google, IBM and others like Honda who are leading the way in this direction.
To be continued.........