TERADATA came into existence with a goal and dream in mind, businesses that wants to build huge data wareshouses don't have to compromise because their database is unable to meet the challenges and requirements of constant change. It has the ability to quickly respond to changing conditions and is more flexible than any RDBMS. It is designed to perform parallel processing and has the capability to be flexible linearly and accommodate terabytes of data.
In short when other database vendors designed their databases around OLTP (Online Transaction Processing) and stopped thinking further, But that's where TERADATA started thinking, they came up with an idea to design a database for OLAP (Online Analytical Processing) or DSS (Decision Support Systems).
Teradata Architecture is simple and complex in itself. It is simple because the user doesn't have to worry much and complex because the RDBMS has its own complex algorithms and intelligence that knows the easiest and fastest way to retrieve data. It also has its own way of storing the data. Teradata based on the fundamentals of mathematical set theory. It is the database that loads data in parallel, backups data in parallel and process data in parallel. It is a story of parallel processing.
In my next post I'll give an overview and Architecture of the TERDATA so that it will be a jump start for all the nacent users.
....continued TERADATA Architecture
Friday, August 21, 2009
Today's world is the result of human evolution through human thought process and adapting to the ever dynamic changes in this over crowded planet. It isn't limited to just that it applies to all possible areas. It is quite amazing how things have changed over the time. I would not drive you to an antique show but every morning when you wake up you are not only thrilled and excited for the latest technology but also amazed to look behind how fast things are changing.It applies the same to Information Technology. Imagine the old way of communicating through drums and messengers to today's fiber optics communications. I'm sure this page will be too outdated when some new technology will beat/replace fiber optics. The way we do business has changed considerably. Business Managers have to keep up the pace to meet challenges to stay in this competitive business. As an author of this website my goal is to share the knowledge and experience with the outside world that I know or learned. Since this website goal is to teach the Data Warehousing, Business Intelligence and Database Concepts, I'll restrict it to purely technical concepts. You can also read my social blog http://VasudevK.blogspot.com.
In my first few chapters I'll give an overview about Data warehousing and terminology. After that I'll start with the basics of teradata and go in detail.
Introduction
In today's world, Information about the market at photon's speed helps the Industry to make strategic decisions proving Darwin's theory, Survival of the fittest. Making right decision at the right time helps the wheel of Industry to pass through hurdles. Data warehousing is Industry's heart for strategc decision making.
A data warehouse is core repository of the organization's historical data, or called community memory. It is the data, information and the knowledge used by people for interacting using these object in making decisions. For example, an organization would use the information that's stored in its data warehouse to find out what day of the week they sold the most Widgets in May 1992, or how employee sick leave the week before Christmas differed between Mumbai and Seattle from 2002 - 2006. In other words, the data warehouse contains the raw material for management's DSS(decision support system). The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis (such as data mining) on the information without slowing down the operational systems.
Origin/History of Data Warehouse
During the late 80s to Early 90s Data Warehouses became a unique type of computer database . The sole purpose to develop is to meet a growing demand for management information and analysis that could not be met by operational systems. Operational systems were unable to meet this need for a range of reasons:
The processing load of reporting reduced the response time of the operational systems,
The database designs of operational systems were not optimized for information analysis and reporting,
Most organizations had more than one operational system, so company-wide reporting could not be supported from a single system, and
Development of reports in operational systems often required writing specific computer programs which was slow and expensive
As a result, separate computer databases began to be built that were specifically designed to support management information and analysis purposes. These data warehouses were able to bring in data from a range of different data sources, such as mainframe computers, minicomputers, as well as personal computers and office automation software such as spreadsheet, and integrate this information in a single place. This capability, coupled with user-friendly reporting tools and freedom from operational impacts, has led to a growth of this type of computer system.
Advantages of using data warehouse
There are many advantages to using a data warehouse, some of them are:
Enhances end-user access to a wide variety of data.
Business decision makers can obtain various kinds of trend reports e.g. the item with the most sales in a particular area / country for the last two years.
A data warehouse can be a significant enabler of commercial business applications, most notably customer relationship management (CRM) the automation or support of customer processes involving sales or service representatives.
Concerns in using data warehouses
Extracting, cleaning and loading data is time consuming.
Data warehousing project scope must be actively managed to deliver a release of defined content and value.
Problems with compatibility with systems already in place.
Security could develop into a serious issue, especially if the data warehouse is web accessible.
Data Storage design controversy warrants careful consideration and perhaps prototyping of the data warehouse solution for each project's environment.
After reading this Chapter you should be familiar with Data Warehousing and Business Intelligence .
Overview: I would like to explain in plain simple English and then dwell into details about this chapter.
Business Intelligence(BI): You can call this a technology or an application or some practices to support better decission making. This is data driven meaning data that serves as an input to make decisions. so the system that makes these decisions is called Decision Support Systems( also called DSS). Now when you combine the technology, applications for efficient way of reading this data for better understanding its behavior and trends, we call it Business Intelligence.
Technical Savvy definition for BI: Business intelligence (BI) is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions. It is a simplification process of discovering information and analyzing that enables every decission maker in an organization to easily access, understand, analyze, collaborate and quickly act on it for better timely decission making.
BI is about extracting information from multiple sources of data, Synthesizing useful knowledge from collections of data, filtering irrelevant informationAnd setting the remaining information in the context of the business and its environment, discovering what plausible actions to be taken or decisions to be Made at different times. To sum up it is a process of using information wisely and keeps you to up to date. And again it is a data driven Decision support System.
Relationship between Business Intelligence and Data warehousing
So why are we talking these two terms together? Are they inter-related? or they both same or what is their relationship?
Many of the tool vendors who sell their products or softwares call it business Intelligence software rather than Data warehousing software. so what is it? Business Intelligence is a term commonly associated with data warehousing. Business Intelligence is a generalized term where a company initiates various activities to gather today's market information which also includes about their competitor. Today's business Intelligence systems are contrasted to more classical way of information gathering in mining and crunching the data in the most optimal manner. In short we can say BI simplifies information discovery and analysis. In this way the company will have a competitive advantage of business and intelligently using the available data in strategic and effective decision making. it has the ability to bring disparate data under one roof with a meaningful information and ready for analysis.
so what has Data warehousing to do with Business Intelligence?
Business intelligence usually refers to the information that is available for the enterprise to make decisions on. A data warehousing (or data mart) system is the backend, or the infrastructural, component for achieving business intelligence. Business intelligence also includes the insight gained from doing data mining analysis, as well as unstructured data (thus the need fo content management systems).
Let me give the path of Data warehousing. All the source data from disparate sources are used to load/Stage data. Different sources can be flat files, another database or some other process. The starting point of the Data warehouse should extract the data in order to load into its environment.This is extracting. This data may not be the expected format or size. your business demands are different or your organization business requirements are different. So the business process has to modify the data or better word is to transform the incoming data to meet requirements and objectives. This is called Transformation. Once every slicing and dicing of the data is done along with applied business rules, this data is ready for loading into the target tables. This process is called Loading. So overall till now we have done Extraction, Transformation and Loading. In short we call this ETL. There are lot of tools available in today's market which does help in achieving the ETL process. Once this data is loaded in to the database, this is ready for next processing. We call that database as Data warehouse database. The next process could be building of datamarts or directly reporting from it. There are lot of tools/software available for reporting/analysis. Some call it business reporting or analysis tool. But if you see the whole process has intelligence involved in business. we can call this or the gurus call it Data warehousing and the system involved from end to end is called business intelligence system.
To sum up the whole BI Systems or DW process. it is the ETL plus Reporting which is called the Data warehousing. Now having read this you should have an idea of what is the whole business involved in Data warehousing. In my next chapter I'll discuss the concepts and terminology used in Data warehousing.
One line difference between Data Warehouse and Business Intelligence:Data Warehousing helps you store the data while business intelligence helps you to control the data for decision making, forecasting etc.
Data warehousing using ETL jobs, will store data in a meaningful form. However, in order to query the data for reporting, forecasting, business intelligence tools were born.
Chapter 3: Concepts of Data Warehousing
Introduction: Concepts of Data warehousing and Business Intelligence is so important for professionals who are working in that subject area projects. Working on a business intelligence (BI) or data warehousing (DW) project can be overwhelming if you don't have a solid grounding in the basics. It's difficult to focus on the goals of the project when you're bogged down by unanswered questions - or don't even know what questions to ask. By arming yourself with knowledge of the concepts and fundamentals, you can hit the ground running. As we know now A data warehouse is a central repository for a subset of data that an enterprise's various business systems collect. Historically, data warehouses were most often used as a central repository to integrate, cleanse and reconcile data used for business intelligence (BI) and analysis. This chapter provides an overview that gives business and information technology professionals the confidence to dive right into their business intelligence and data warehousing activities and contribute to their successDifferent concepts of Data warehouse:
Data Warehouse: A data warehouse is a subject oriented, integrated, time variant, non volatile collection of data in support of management's decision making process.
Data mart: A data mart is a subset of data warehouse that is designed for a particular line of business, such as sales, marketing, or finance. In a dependent data mart, data can be derived from an enterprise-wide data warehouse. In an independent data mart, data can be collected directly from sources. It is specific for a business function. this database contains data summarized at multiple levels of granularity.
Dimension: A category of information. A grouping of data that defines something that we are analyzing.For example, Gender, Office.
Measure: It is the performance Indicator of the data that you are analyzing. For example, Score on a test, Number of bail-outs, percentage of foreclosures, 2008 budget for different offices.
Attribute: A unique level within a dimension. For example, Month is an attribute in the Time Dimension.
Data Cleansing: The transformation of data in its current state to a predefined standardized format using software or in other words format the incoming data to your business needs.
Data Extraction: The process of pulling data from external data sources in order to feed it to the data warehouse environment.
Dataflow diagrams(DFDs) : it is a technique which describes the data flow in a business. It could be one system data flow or a chain of systems which are inter related and inter dependent.
Decision Support System: A system that primarily focuses on supporting analytical and tactical reporting. it is a trend analyzer. it is the source of reporting the trends in the business over time with detailed information.
Dimesnion Table: A table that contains discrete values. it contains attributes used to constrain and group data when a query is performed. These attributes may change over the time. For example, the phone number of a person may change. This is referred to as slowly changing dimensions.
Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. It is usually the primary table that contains measurements(mostly numeric data like grade, wages etc). Lets say we have a column called Primary Names( this is dimension), the values in this Primary Name( Vasu, Harshitha, Rahul, Jawa, Raghu) are the facts. Similarly the salary dimension and the amounts in it are the facts.
Lookup Table: The lookup table provides the detailed information about the attributes. For example, the lookup table for the Quarter attribute would include a list of all of the quarters available in the data warehouse. Each row (each quarter) may have several fields, one for the unique ID that identifies the quarter, and one or more additional fields that specifies how that particular quarter is represented on a report (for example, first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1").
A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more lookup tables, but fact tables do not have direct relationships to one another. Dimensions and hierarchies are represented by lookup tables. Attributes are the non-key columns in the lookup tables.
In designing data models for data warehouses / data marts, the most commonly used schema types are Star Schema and Snowflake Schema.
Online Analytical Processing(OLAP): It is the system that analyzes the data to report the business trends. it gives the statistical information of the business retrieved from the Data warehouse. Also called Multidimensional analysis.
Operation Data Store: A system with database structure that is the repository for near real time operational data rather than long term trend data.
Reverse Engineering: It is a reverse process where you start engineering the existing code to write system specifications or to generate documentation.
Schema: It is a model having information about the database implemenatation
Star Schema: In the star schema design, a single object (the fact table) sits in the middle and is radially connected to other surrounding objects (dimension lookup tables) like a star. A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table.
Snowflake Schema: The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance efforts needed due to the increase number of lookup tables.
Introduction: Concepts of Data warehousing and Business Intelligence is so important for professionals who are working in that subject area projects. Working on a business intelligence (BI) or data warehousing (DW) project can be overwhelming if you don't have a solid grounding in the basics. It's difficult to focus on the goals of the project when you're bogged down by unanswered questions - or don't even know what questions to ask. By arming yourself with knowledge of the concepts and fundamentals, you can hit the ground running. As we know now A data warehouse is a central repository for a subset of data that an enterprise's various business systems collect. Historically, data warehouses were most often used as a central repository to integrate, cleanse and reconcile data used for business intelligence (BI) and analysis. This chapter provides an overview that gives business and information technology professionals the confidence to dive right into their business intelligence and data warehousing activities and contribute to their successDifferent concepts of Data warehouse:
Data Warehouse: A data warehouse is a subject oriented, integrated, time variant, non volatile collection of data in support of management's decision making process.
Data mart: A data mart is a subset of data warehouse that is designed for a particular line of business, such as sales, marketing, or finance. In a dependent data mart, data can be derived from an enterprise-wide data warehouse. In an independent data mart, data can be collected directly from sources. It is specific for a business function. this database contains data summarized at multiple levels of granularity.
Dimension: A category of information. A grouping of data that defines something that we are analyzing.For example, Gender, Office.
Measure: It is the performance Indicator of the data that you are analyzing. For example, Score on a test, Number of bail-outs, percentage of foreclosures, 2008 budget for different offices.
Attribute: A unique level within a dimension. For example, Month is an attribute in the Time Dimension.
Data Cleansing: The transformation of data in its current state to a predefined standardized format using software or in other words format the incoming data to your business needs.
Data Extraction: The process of pulling data from external data sources in order to feed it to the data warehouse environment.
Dataflow diagrams(DFDs) : it is a technique which describes the data flow in a business. It could be one system data flow or a chain of systems which are inter related and inter dependent.
Decision Support System: A system that primarily focuses on supporting analytical and tactical reporting. it is a trend analyzer. it is the source of reporting the trends in the business over time with detailed information.
Dimesnion Table: A table that contains discrete values. it contains attributes used to constrain and group data when a query is performed. These attributes may change over the time. For example, the phone number of a person may change. This is referred to as slowly changing dimensions.
Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount would be such a measure. This measure is stored in the fact table with the appropriate granularity. For example, it can be sales amount by store by day. It is usually the primary table that contains measurements(mostly numeric data like grade, wages etc). Lets say we have a column called Primary Names( this is dimension), the values in this Primary Name( Vasu, Harshitha, Rahul, Jawa, Raghu) are the facts. Similarly the salary dimension and the amounts in it are the facts.
Lookup Table: The lookup table provides the detailed information about the attributes. For example, the lookup table for the Quarter attribute would include a list of all of the quarters available in the data warehouse. Each row (each quarter) may have several fields, one for the unique ID that identifies the quarter, and one or more additional fields that specifies how that particular quarter is represented on a report (for example, first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1").
A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more lookup tables, but fact tables do not have direct relationships to one another. Dimensions and hierarchies are represented by lookup tables. Attributes are the non-key columns in the lookup tables.
In designing data models for data warehouses / data marts, the most commonly used schema types are Star Schema and Snowflake Schema.
Online Analytical Processing(OLAP): It is the system that analyzes the data to report the business trends. it gives the statistical information of the business retrieved from the Data warehouse. Also called Multidimensional analysis.
Operation Data Store: A system with database structure that is the repository for near real time operational data rather than long term trend data.
Reverse Engineering: It is a reverse process where you start engineering the existing code to write system specifications or to generate documentation.
Schema: It is a model having information about the database implemenatation
Star Schema: In the star schema design, a single object (the fact table) sits in the middle and is radially connected to other surrounding objects (dimension lookup tables) like a star. A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table.
Snowflake Schema: The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and joining smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance efforts needed due to the increase number of lookup tables.
to be continued...TERADATA
Subscribe to:
Posts (Atom)