Positive Attitude??
Monday, September 2, 2013
Friday, August 30, 2013
Dimensional Data Warehousing with MySQL (Using the Bitnami WAMP Stack)
PART I: CHAPTER 1: TASKS
This is the first post in the series of posts wherein the various assignments w.r.t the creation of a dimensional data warehouse using MySQL in a phpMyAdmin environment in the Bitnami WAMPStack.
I tried creating the dwid user id as given, but it threw an error, as shown:
I started off with the creation of two databases, namely "DW" and "SOURCE". The code for the same and the subsequent log file indicating success of the source-code is shown below:
CODE USED:
create database dw;
create database source;
| Code to create Database |
| Log Report
Then I used the SQL query to display all databases in the system.
CODE USED:
show databases;
|
The next step was the creation of tables in the database "DW". The tables created were as follows:
- customer_dim (to store customer details)
- product_dim (to store product details)
- order_dim (to store order details)
- date_dim (to store the dates associated with orders)
- sales_order_fact (to keep track of the sales w.r.t orders. It contained foreign keys to link the various tables)
CODE USED:
USE dw;/* creating customer_dim table */CREATE TABLE customer_dim( customer_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY, customer_number INT , customer_name CHAR (50)ip_code INT (5) , customer_city CHA, customer_street_address CHAR (50) , customer_ zR (30) , customer_state CHAR (2) , effective_date DATE , expiry_date DATE ) ;/* creating product_dim table */
CREATE TABLE product_dim( product_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY, product_code INT , product_name CHAR (30)e DATE , expiry_date DATE ), product_category CHAR (30) , effective_da t;/* creating order_dim table */
CREATE TABLE order_dim( order_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY, order_number INT , effective_date DATE, expiry_date DATE ) ;/* creating date_dim table */
CREATE TABLE date_dim( date_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY, date DATE , month_name CHAR (9) , month INT (1)DATE , expiry_dat, quarter INT (1) , year INT (4) , effective_date e DATE ) ;/* creating sales_order_fact_table */
CREATE TABLE sales_order_fact( order_sk INTNT , product_sk I, customer_sk INT , order_date_sk INT10, 2) ) ;, order_amount DECIMAL(
The next step was to check whether the tables created were in order or not. For this we used the "show" command again.
CODE USED:
show create table customer_dim;
show create table product_dim;
show create table order_dim;
show create table date_dim;
show create table sales_order_fact;
Next we inserted values into the customer_dim table.
CODE USED:
USE dw; /* since table resides in dw, hence we first access dw*/
INSERT INTO customer_dim /* the attributes/column_names are first written followed by their values*/
( customer_number /* I left out customer_sk as it is an auto generated value*/
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, effective_date
, expiry_date )
VALUES
(1, 'Big Customers', '7500 Louise Dr.', '17050',
'Mechanicsburg', 'PA', CURRENT_DATE, '9999-12-31')
, (2, 'Small Stores', '2500 Woodland St.', '17055',
'Pittsburgh', 'PA', CURRENT_DATE, '9999-12-31')
, (3, 'Medium Retailers', '1111 Ritter Rd.', '17055',
'Pittsburgh', 'PA', CURRENT_DATE, '9999-12-31')
;
| Insert_Into_Customer_Dim Log |
| Table View |
The table looked as follows on the execution of the following code:
CODE USED:
use dw;
select * from customer_dim;
Then we inserted a few more values into the table and finally used the select statement again to view the table.
CODE USED:
USE dw;
INSERT INTO customer_dim (customer_numbercustomer_street, customer_name , _address , customer_zip_codete , effective_, customer_city , customer_st adate , expiry_date ) VALUESScott St.', '17050', 'Mechanicsburg', 'PA'(4, 'Good Companies', '950 0, CURRENT_DATE, '9999-12-31')Rossmoyne Rd.', '17050', 'Mechanicsburg', 'PA', C, (5, 'Wonderful Shops', '3333 URRENT_DATE, '9999-12-31') , (6, 'Loyal Clients', '7070 Ritter Rd.', '17055','Pittsburgh', 'PA', CURRENT_DATE, '9999-12-31');
__________________________________________
use dw;
select * from customer_dim;
| Inserting Values |
| Select & Display |
| Final Table View |
I completed the first chapter with this. The next phase would be entering into dimensional modelling of the
databases.
Now, the second part of the project begins. In this phase we designed the database using
Slowly Changing Dimension Techniques. There are three variants of the SCD technique: SCD Type 1
(SCD 1), SCD Type 2 (SCD2), and SCD Type 3 (SCD3). SCD1 updates dimension records
by overwriting the existing data-no history of the records is maintained. SCD1
is normally used to directly rectify incorrect data.
SCD2 maintains dimension history by creating newer ‘versions’ of a
dimension record whenever its source changes. SCD2 does not delete or modify
existing data.
For this database we used SCD 1 and SCD 2.
First we created the customer_stg table as a .csv file named customer.csv and this file was saved and then subsequently loaded. Then the SCD1 model was applied onto the customer_dim table created last time. The screenshots for the same are as shown:
Then a similar process was followed to implement the SCD2 model wherein we used the product_dim table along with product_stg a table created by uploading a product.txt file into the database. The process was:
The final product table is shown in the last screenshot.
Next we tested for additive measures.A measure is fully additive if the total of its individual values across any one dimension is the same as the total across any other dimension and across any combination of some or all dimensions.This section explains fully additive measures by using the order_amount measure in the sales_order_fact table. The screenshot for the process is as given:The order_amount measure is fully-additive if all queryresults are the same. We will prove that all the four queries produce a total order of 58,000.The next check for additivity across_date_product_order which sums the order_amount values across the date, product, and order dimensions (adding up the order_amount values by selecting customers only). The process is as shown:The third query, the across_date_customer_order.sql script in sums the order amounts across the date, customer, and order dimensions.Next we apply dimensional queries of three specific types: aggregate, specific, and inside-out queries.First query is the daily aggregation, which gives the daily sales summary. The aggregation of the order amounts and number of orders is done by date. That the join between the sales_order_fact table and date_dim table is on their surrogate keys.Next we check for annual aggregation (code+output):Next we tried specific queries starting with MONTHLY STORAGE PRODUCT SALES and the output for the same are as follows:The next was QUARTERLY SALES IN MECHANISGBURG and the code for the same is:The next in line were the inside out queries and these were done for MONTHLY PRODUCT PERFORMER. The output is :The next analysis was the LOYAL CUSTOMER ANALYSIS:Next comes the ETL (EXTRACT, TRANSFORM, LOAD) phase wherein the first task was to check how push-by-source CDC (Change Data Capture ) works on sales order source data. Push-by-source CDC means the source system extracts only the changes since the last extraction.The code for the same is as shown below:Next we created table sales_order in the SOURCE database. The output screen for this was as shown:Then we executed the push_data.sql script.The output of select * from sales_order is then displayed here:We next moved onto PRE-POPULATING a table via a function (or procedure). We created a procedure for the same as shown:USE dw; DELIMITER // ; DROP PROCEDURE IF EXISTS pre_populate_date // CREATE PROCEDURE pre_populate_date(IN start_dt DATE, IN end_dt DATE) BEGIN WHILE start_dt <= end_dt DO INSERT INTO date_dim( date_sk , date , month_name , month , quarter , year , effective_date , expiry_date ) VALUES( NULL , start_dt , MONTHNAME(start_dt) , MONTH(start_dt) , QUARTER(start_dt) , YEAR(start_dt) , '0000-00-00' , '9999-12-31' ) ; SET start_dt = ADDDATE (start_dt, 1); END WHILE; END //DELIMITER ; //The output for the same is as shown:But on executing the script, an error was thrown which could not be resolvedThen we executed the ONE DATE EVERY DAY section wherein we ran a code to populate the date table. The code and subsequent output window was as shown:Then we displayed all values in the table:Finally created the sales_order dataThen we populated the date_dim table by loadingThe source table of sales_order followed by the date_dim table was then displayedValues were also inserted into the sales_order table at source.Next up was a script for initial population.Finally the truncate()function was used to drop the values and re-load them using the defined functions in the SQL server. After truncation, due to errors in the server (files for the procedures were not saved onto the dw folder in data folder in MySQL under Bitnami) the remaining steps couldn't be performed
Created the script confirm_initial_population.sql
Subscribe to:
Comments (Atom)