Notes: Oracle 11g - Data Warehousing Guide
From Wikinology
Contents |
Concepts
- The characteristics of a data warehousing
- Subject Oriented
- Integrated
- Nonvolatile
- Time Variant
- OLTP v.s. Data Warehousing
- Few <-- Indexes --> Many
- Many <-- Joins --> Some
- Normalized DBMS <-- Duplicated Data --> Denormalized DBMS
- Rare <-- Derived Data and Aggregates --> Common
- Summary: unpredictable workload, batch processing, denormalized schema, large data bunch, historical data.
- Data Warehousing Architectures
- Data Warehousing Architecture: Basic
- Data Warehousing Architecture: with a Staging Area
- Data Warehousing Architecture: with a Staging Area and Data Mart
- Oracle Data Mining supports the following data mining functions:
- Classification
- Regression
- Anomaly detection
- Attribute importance
- Clustering
- Association
- Feature Extraction
Logical Design
- Results:
- A set of entities and attributes corresponding to fact tables and dimension tables.
- A model of operational data from your source into subject-oriented information in your target data warehousing schema.
- Data Warehousing Schemas
- Star Schema: center -- fact tables, end points -- dimension tables.
- Only one join establishes the relationship between the fact table and any one of the dimension tables.
- It optimizes performance by keeping quires simple and providing fast response time. All the information about each level is stored in one row.
- Other schemas: third normal form, snowflake schema (star schema with dimension tables in a tree structure), OLAP (supports dimensional data types)
- Star Schema: center -- fact tables, end points -- dimension tables.
- Data Warehousing Objects
- Fact tables: numeric facts (measurements) and foreign keys to dimension tables
- Dimension tables: hierarchies, levels
- Unique identifiers
- Relationships
Physical Design
Hardware and I/O Consideration
- Configure I/O for Brand-width not Capacity
- Stripe Far and Wide
- Use Redundancy
- Test the I/O System Before Building the Database
- Plan for Growth
Oracle Storage Management: Oracle Managed Files and Automatic Storage Management
Indexes
- Degree of cardinality: The number of unique values of a column divided by the total number of rows in the table.
- As a general rule, a cardinality of under 1% makes a good candidate for a bitmap index.
- B-tree indexes are most effective for high-cardinality data: that is, for data with many possible values, such as customer_name or phone_number.
- Bitmap indexes include rows that have NULL values, whereas B-tree indexes don't.
Four Join Models for Bitmap Join Indexes
- Bitmap Join Index: One Dimension Table Columns Joins One Fact Table
CREATE BITMAP INDEX sales_cust_gender_bjix ON sales(customers.cust_gender) FROM sales, customers WHERE sales.cust_id = customers.cust_id LOCAL NOLOGGING COMPUTE STATISTICS;
- Bitmap Join Index: Multiple Dimension Columns Join One Fact Table
CREATE BITMAP INDEX sales_cust_gender_ms_bjix ON sales(customers.cust_gender, customers.cust_marital_status) FROM sales, customers WHERE sales.cust_id = customers.cust_id LOCAL NOLOGGING COMPUTE STATISTICS;
- Bitmap Join Index: Multiple Dimension Tables Join One Fact Table
CREATE BITMAP INDEX sales_c_gender_p_cat_bjix ON sales(customers.cust_gender, products.prod_category) FROM sales, customers, products WHERE sales.cust_id = customers.cust_id AND sales.prod_id = products.prod_id LOCAL NOLOGGING COMPUTE STATISTICS;
- Bitmap Join Index: Snowflake Schema
CREATE BITMAP INDEX sales_co_country_name_bjix ON sales(countries.country_name) FROM sales, customers, countries WHERE sales.cust_id = customers.cust_id AND customers.country_id = countries.country_id LOCAL NOLOGGING COMPUTE STATISTICS;
Restrictions and Requirements -- P63
Integrity Constraints
- Constraint States
- ENABLE -- enforcement
- VALIDATE -- validation
- RELY -- belief
Materialized Views
- Applied Circumstances: Data Warehouses, Distributed Computing, Mobile Computing
- It is recommended that you try to avoid writing SQL statements that directly reference the materialized view, because then it is difficult to change them without affecting the application. Instead, let query rewrite transparently rewrite your query to use the materialized view.
- Materialized View Schema Design: 7 Guidelines -- P76
Loading Data into Data Warehouses
- A popular and efficient way to load data into a data warehouse or data mart is to use SQL*Loader with the DIRECT or PARALLEL option, Data Pump, or to use another loader tool that uses the Oracle direct-path API.
- In one-phase loading:
- Data is loaded directly into the target table, quality assurance tests are performed, and errors are resolved by performing DML operations prior to refreshing materialized views.
- If a large number of deletions are possible, then storage utilization can be adversely affected, but temporary space requirements and load time are minimized.
- You can disable query rewrite at the system level by issuing an statement until all the materialized views are refreshed.
ALTER SYSTEM SET QUERY_REWRITE_ENABLED = FALSE
Types of Materialized Views
Materialized Views with Aggregates
- For fast refresh to be possible,
- the SELECT list must contain all of the GROUP BY columns (if present), and there must be a COUNT(*) and a COUNT(column) on any aggregated columns.
- also, materialized view logs must be present on all tables referenced in the query that defines the materialized view.
- It can be defined to be refreshed ON COMMIT or ON DEMAND.
- Requirement: COUNT(*) must always be present to guarantee all types of fast refresh.
Materialized Views Containing Only Joins
- If you specify REFRESH FAST:
- A materialized view log must be present for each detail table unless the table supports PCT. Also, when a materialized view log is required, the ROWID column must be present in each materialized view log.
- The rowids of all the detail tables must appear in the SELECT list of the materialized view query definition.
- Note:
- If some of these restrictions are not met, you can create the materialized view as REFRESH FORCE to take advantage of fast refresh when it is possible.
- If the materialized view has remote tables in the FROM clause,
- all tables in the FROM clause must be located on that same site.
- ON COMMIT refresh is not supported.
Nested Materialized Views
- Nested materialized view can be created on materialized views, but all parent and base materialized views must contain joins or aggregates.
- All the underlying objects (materialized views or tables) on which the materialized view is defined must have a materialized view log.
- Nesting Materialized Views with Joins and Aggregates
- Some types of nested materialized views cannot be fast refreshed. Use EXPLAIN_MVIEW to identify those types of materialized views.
- You can refresh a tree of nested materialized views in the appropriate dependency order by specifying the nested = TRUE parameter with the DBMS_MVIEW.REFRESH parameter.
-
DBMS_MVIEW.REFRESH ('SUM_SALES_CUST_TIME', nested => TRUE)
- Nested Materialized View Usage Guidelines -- P83
- Use the REFRESH_DEPENDENT procedure with the nested parameter value set to TRUE if you want to ensure that all materialized views in a tree are refreshed.
- Restriction: You cannot create both a materialized view and a prebuilt materialized view on the same table.
Creating Materialized Views
Example:
CREATE MATERIALIZED VIEW cust_sales_mv PCTFREE 0 TABLESPACE demo STORAGE (INITIAL 16k NEXT 16k PCTINCREASE 0) PARALLEL BUILD IMMEDIATE REFRESH COMPLETE ENABLE QUERY REWRITE AS SELECT c.cust_last_name, SUM(amount_sold) AS sum_amount_sold FROM customers c, sales s WHERE s.cust_id = c.cust_id GROUP BY c.cust_last_name;
Creating Materialized Views with Column Alias Lists
- The standard method of attaching aliases in the SELECT clause for name resolution restricts the use of the full text match query rewrite and it will occur only when the text of the materialized view's defining query and the text of user input query are identical.
- You can add a column alias list to a CREATE MATERIALIZED VIEW statement.
- Note that when aliases are specified in both the SELECT clause and the new alias list clause, the alias list clause supersedes the ones in the SELECT clause.
Query Rewrite
- You can
- verify what types of query rewrite are possible by calling the procedure DBMS_MVIEW.EXPLAIN_MVIEW, or
- use DBMS_ADVISOR.TUNE_MVIEW to optimize the materialized view so that many types of query rewrite are possible.
- Materialized View Restrictions
- The defining query of the materialized view cannot contain any non-repeatable expressions (ROWNUM, SYSDATE, non-repeatable PL/SQL functions, and so on).
- The query cannot contain any references to RAW or LONG RAW datatypes or object REFs.
- If the materialized view was registered as PREBUILT, the precision of the columns must agree with the precision of the corresponding SELECT expressions unless overridden by the WITH REDUCED PRECISION clause.
- General Query Rewrite Restrictions
- A query can reference both local and remote tables (an eligible materialized view referencing the same tables is available locally).
- Neither the detail tables nor the materialized view can be owned by SYS.
- If a column or expression is present in the GROUP BY clause of the materialized view, it must also be present in the SELECT list.
- Aggregate functions must occur only as the outermost part of the expression. That is, aggregates such as AVG(AVG(x)) or AVG(x)+ AVG(x) are not allowed.
- CONNECT BY clauses are not allowed.
Refresh Options
- Refresh Modes: ON COMMIT, ON DEMAND
- Refresh Options: COMPLETE, FAST, FORCE, NEVER
- Constraints: TRUSTED CONSTRAINTS, ENFORCED CONSTRAINTS
- General Restrictions on Fast Refresh -- P89
- Restrictions on Fast Refresh on Materialized Views with Joins Only
- Restrictions on Fast Refresh on Materialized Views with Aggregates
- Restrictions on Fast Refresh on Materialized Views with UNION ALL
Materialized View Logs
- The SEQUENCE column is required in the materialized view log to support fast refresh with a combination of INSERT, UPDATE, or DELETE statements on multiple tables.
- You can, however, add the SEQUENCE number to the materialized view log after it has been created.
- FORCE option: silently ignores the existing elements and adds to the materialized view log any items that do not already exist in the log.
