Data to Dashboard in 90 Minutes with Ajilius Data Warehouse Automation

Data to Dashboard in 90 Minutes with Ajilius Data Warehouse Automation Hero Image

This is the third video in our Data Warehouse Automation series, Data to Dashboard in 90 Minutes with Ajilius Data Warehouse Automation. In this video, we demonstrate a full end-to-end workflow for the creation of a data warehouse from the Sakila movie rental database using Ajilius.

The following steps are covered during the demonstration:

  • Connecting to data sources (MySQL and CSV)
  • Extracting data from data sources to loading tables
  • Transforming data from load tables to staging tables
  • Creating dimension and fact tables
  • Exporting metadata to Yellowfin for data analysis

We hope you find the video informative. A transcript of the first component of the video is included below.

Our second video, Introduction to Data Warehouse Automation covered some of the fundamentals of data warehouse automation and how it delivers significant value to organisations.

In this video, I use the Ajilius data warehouse automation platform to demonstrate a full end-to-end workflow to create a data warehouse from scratch. I speed the video up along the way to compress about 90 minutes of work into about 25 minutes. Future videos will show each part of the process at normal speed.

The Sakila Database

First, we’ll talk about the database that we’ll be using for this exercise. We’re going to use the ‘Sakila’ database which is a sample data set for the MySQL database platform.

The story behind the sample database is all about a DVD rental company. The company uses a system to record all the rental activity for the business, and all data entered into the system is saved a transactional database. This transactional database is the dataset we’ll be working with.

The dataset contains a variety of information such as:

  • Information about customers and staff (personal details, etc.)
  • Information about each store (location, manager, etc.)
  • Information about each film (e.g. title, language, genre)
  • Information about when when each film was rented and returned
  • Metrics such as how much revenue the rental generated

The data has been collected in a transactional database which isn’t structured to be optimised for reporting and analytics. In the first video in our data warehouse automation series, we mentioned that a star schema is the best data structure to support analytics.

Building a Data Warehouse with Ajilius

In this exercise, we’re going to transform the transactional data in the DVD rental database to a star schema. This will allow quick and easy analysis of the rental transactions at the DVD rental company.

We will build the following:

  • Fact table containing metrics such as how long each film was rented for and how much revenue was generated
  • A series of dimensions allowing this data to be sliced and diced by staff member, customer, store, film and date

Sakila Movie Rental Database Structure

The process we’ll go through is as follows:

  • Create a destination database for the warehouse in Microsoft SQL server and connect the destination data warehouse to Ajilius
  • Create a connection to the source database, which is the Sakila DVD rental database in MySQL
  • Extract data from the source database to the loading tables
  • Transform the data from the loading tables into the staging tables
  • Create dimension and fact tables
  • Create relationships between the dimensions and fact table to create a star schema and then export the metadata to Yellowfin to allow immediate analysis of the data

A diagram of the steps involve in creating the Sakila Data Warehouse using Ajilius

Summary of Timings

The table below summarises the time it took for each of the components of the data warehouse to be completed. All in all, the data warehouse was created in less than 90 minutes!

A table displaying a summary of the time taken to create the data warehouse

Stay tuned for future videos in our data warehouse automation series where we’ll use the Ajilius Data Warehouse Automation platform to further build upon the data warehouse we have just created.

What Is Data Warehouse Automation And How Does It Deliver Value?

This is the second video in our Data Warehouse Automation series, Introduction to Data Warehouse Automation. This video is ideal for people from a business background who are new to data warehouse automation and want to find out more. We’ll be covering the following topics:

  • Overview of data warehouse development
  • Problems with traditional data warehouse development methods
  • Evolutionary data warehouse development
  • How data warehouse automation delivers business value

If you haven’t already seen our first video in the series, Introduction to Data Warehousing, you may choose to watch this video first.

We hope you find the video below informative.

Notes from the video are provided below.  The slides can also be found on SlideShare.

Our first video, Introduction to Data Warehousing covered some of the basics of data warehouses and how they deliver business value.

In this video, we’re going to talk more about the development of data warehouses, and in particular, give you an introduction to data warehouse automation, and how it delivers significant value to organisations. Let’s first cover what is involved in developing a data warehouse.

Overview of traditional data warehouse development

Diagram showing the ETL / ELT process of a data warehouse

Traditional data warehouse organisations often have data sitting in various locations. This includes operational systems (where the data isn’t structured for reporting), in Excel or CSV, or in the cloud in some form. We need to get the data from these sources into our data warehouse where it is structured for reporting.

This is done by:

  • Writing code to extract the data from the sources and load it into a temporary location.
  • Transforming the data into a format suitable for reporting. This process also includes processes to improve the quality control (called cleansing), and data integration.
  • Making the data available in the data warehouse where it is structured in an optimum way for reporting purposes

This whole process is known in the industry as “ELT” or “ETL”. Traditionally, all of the code for the ELT/ETL process was written from scratch.

The problems with traditional data warehouse development methods

There are four main problems with traditional data warehouse development.

  • It’s time consuming as it requires significant manual coding – approximately 70-75% of the total project time. As a result, projects take far too long and users have to wait to get the information they need.
  • It’s error-prone. Due to the amount of manual coding required, it is easy to make mistakes and difficult to find and fix errors.
  • It’s costly due to the development resources and the effort required. Coding accountes for 70-75% of project costs. As a result, projects often result in failure – due to budget overruns, slow delivery, and low perceived value with stakeholders.
  • Users get frustrated as it takes way too long for them to get the information they need.

What’s the solution to these problems?

You might say that data warehouses are bad – let’s stop using data warehouses. However this isn’t the right solution because data warehouses provide a tremendous amount of business value. The problem was that we were using the wrong approach. There was a mismatch between traditional data warehouse development and the evolutionary nature of analytics development.

There were three main problems that are specific to analytics development.

  • Users don’t know whether they want something until they’ve actually seen it. We needed a different approach where we could release functionality rapidly in short, iterative cycles and make changes based upon user feedback. The reason is that if we build most functionality in one go, we would run a high risk that the output would have little value to the business.
  • Users requirements change once they start using the analytics system. If you don’t respond to those changes, people will stop using it.
  • Manual coding takes far too long and costs too much money.

The right solution to these problems is to use a combination of:

  • Evolutionary analytics development methods, and
  • Data warehouse automation

Evolutionary Development

Evolutionary Analytics Development: An iterative cycle of prioritisation, prototyping and feedback.

Evolutionary development involves a constant cycle of requirements gathering, rapid prototyping and review. It aims to deliver functionality incrementally in short release cycles. The evolutionary approach allows development of analytics systems that deliver value in a shorter time frame and have a tighter fit with organisational decision-making requirements.

The problem is that the evolutionary approach is extremely difficult with traditional data warehouse development methods – because manual coding takes too long. So we need to use data warehouse automation to solve these problems.

How Data Warehouse Automation delivers value

Rapid data warehouse delivery and agile changes
By automating much of the design, build and maintenance of your data warehouse. You can pull data from virtually any data source or cloud-based platform and deploy new functionality rapidly – in days and weeks, not months and years.

Dramatically reduced development costs
As all development is metadata-driven, minimal coding is required. As such, fewer ELT developers are needed and a high-quality solution can be delivered in a very short time.

Consistent high quality code and automated documentation
Data warehouse automation builds the entire extract, load and transform process by combining metadata that describes your data warehouse with industry best practice design patterns. This results in high-quality, consistent and error-free code with auto-generated documentation.

Choice of data warehouse platform
One of the problems with traditional data warehouse development was that after you’ve spent time writing the code to extract data from your source systems to your destination data warehouse, you were more or less locked into your data warehouse platform.

For example, if you were using SQL server to house your data warehouse, and you wanted to move to a cloud-based platform such as Amazon Redshift, it would take you months or perhaps years to rewrite all the code from scratch.

With data warehouse automation, all the transform logic is built in using metadata, and then the data warehouse automation platform will use this metadata and logic to write the specific code required for your target data warehouse platform. If you want to switch to a different platform, the data warehouse automation platform will re-write all the ELT code for you, rewrite it specifically so that it is optimised for the new platform.

Metadata export
Data warehouse automation tools allow you to export metadata into your data visualisation tool such as Yellowfin, Power BI, Qliq and Tableau, allowing you to analyse your data immediately and save time in report development.

And the output?

  • Functionality can be delivered rapidly, incrementally and fully support evolutionary development of analytics functionality
  • Development costs are a fraction of those using traditional methods, with less financial risk around data warehouse initiatives and a significantly higher return on investment
  • Dramatically reduced risk
  • Greater stakeholder engagement as the output is tightly aligned with their requirements
  • Far more business value and a significantly higher return on investment

Stay tuned for the next video in our data warehouse automation series where we’ll use the Ajilus data warehouse automation platform to build a data warehouse from scratch!

Do decision makers in your organisation have to wait weeks and months for changes and additions to be made to the data warehouse? Has your organisation avoided building a data warehouse because you have heard it can take too long or it’s too expensive? Minerra’s experienced consultants can help you assess your organisations needs and show you how using data warehouse automation and evolutionary development methods can deliver a cost effective data warehouse in much less time than you thought. Contact us for a casual chat to see how we can help.

What Is A Data Warehouse And How Does It Deliver Value?

We’re kicking off a series of videos to answer “What is a data warehouse?” and “Data Warehouse Automation” with our first video: Introduction to Data Warehousing. This video is ideal for people with a business background who would like to learn more about data warehousing, specifically:

  • What a data warehouse is
  • How a data warehouse delivers business value
  • How a data warehouse is structured
  • How a data warehouse is used

We hope you find the video below informative.

Notes from the video are provided below. The slides can also be found on SlideShare.

What is a data warehousing?

Diagram showing the flow of data through a data warehouse

In order to correctly understand and make use of  data warehousing, you need to understand the problem you are solving.

It may be a decision that needs to be made, or a question that needs answering. For example:

  • A marketing manager might want to understand where to invest their online advertising dollars
  • A call centre manager might want to know the optimum number of staff to hire for their call centre, or
  • A sales manager might want to identify the customers that deliver the most profit – so they can find more of these customers.

We want to make an informed, and objective, data driven decision, and we need some data to answer these questions.

At an organisation, you may have data being collected in a variety of places. This may be:

  • From internal systems, such as a finance system or ERP, customer systems such as your CRM, HR / payroll, or operations systems such as a manufacturing system
  • From web apps, such as Salesforce and Xero, Google Analytics, or social media such as Twitter and Facebook
  • From spreadsheets and flat files
  • From other cloud sources
  • Or from any other data sources within the organisation

However this data is likely to be:

  • Not structured for reporting
  • Hard to access
  • Captured in a silo and not integrated with all the other data to give a complete picture

This is where data warehousing comes in. Data warehousing allows you to:

  • Extract data from your organisational systems
  • Load it into a centralised location
  • Transform and integrate the data into a format optimised for analytics

The data warehouse can be used as a source for your data visualisation tool to provide reports & dashboards, for advanced analytics, and for a variety of other purposes.

So how does this create value for an organisation?

  • A data warehouse creates a single source of data that is consistent in format, structured in a uniform way, contains complete and accurate data that can be relied upon, and is up to date
  • It is structured and designed specifically to allow data to be accessed quickly
  • It provides a single integrated view of an organisation by combining data from multiple sources
  • Provides a complete data set, allowing you to analyse data from the past to predict the future

All of this delivers value to the business by:

  • Giving managers access to the information they need more quickly and easily, with significantly less ongoing effort to prepare the data
  • Providing information about the business environment more quickly which means that managers can respond to changes rapidly
  • Providing information more frequently

And leads to more informed managers, making data driven, objective decisions – for which you’ll see the results in your organisation’s bottom line.

How is a data warehouse structured?

At its most basic, a data warehouse is a collection of tables containing data structured in a way that is optimised for reporting and analytics.

There are different ways that the data can be modeled. The best way to model data to support decision making is a dimensional model, sometimes known as a ‘star schema’ or a Kimball model.

At a high level, the tables in a dimensional model can be categorised into two different types:

  • Fact tables (also called event tables) contain individual business events (e.g. sales transactions), or aggregated, summarised business events (e.g. sales by month). They consist of measures and calculations such as sales amount and discount rate.
  • Dimensions contain descriptive attributes – fields that describe the measures in the fact table. They allow you to slice and dice your data. For example, a given sales transaction relates to a customer, a product, a salesperson, and a date.

So you could look at, for example

  • The transactions by salesperson for a given product
  • The top customers by revenue
  • The overall trend on revenue over time
  • If there is a relationship between discounts given and sales

Data warehouses usually contain multiple star schemas for analysing organisational data.

How are data warehouses used?

A data warehouse contains data that is used to support business decisions. It can be used in a variety of ways:

  • With a Data visualisation tool such as Yellowfin, Power BI, Tableau or Qlik, to provide reports and dashboards to managers to allow them to monitor and manage organisational performance, share information around the business, drive processes or send out tasks.
  • With Advanced Analytics such as network and cluster analysis, forecasting, data mining, sentiment analysis, simulation modeling, etc. There could be a two-way feed between the data warehouse and advanced analytics.
  • As a source of data to feed into other organisational systems, for example, into CRM for future marketing after customer segmentation analysis.

Stay tuned for the next video in our data warehousing automation series: Introduction to Data Warehouse Automation!

Do you have people in your organisation who spend most of their time using Excel to manually “glue” together data from for different systems? Do you only get monthly reports because weekly reports take too long? Do you have to wait too long for ad hoc data questions to be answered? If you answered yes to one or more of these questions then your organisation may gain a lot of value from a data warehouse. Minerra’s consultants have a lot of experience working with many type of organisation to help them quickly build a cost effective data warehouse that significantly reduces the manual effort required to prepare the data for regular reports and ad hos analysis. Contact us for a casual chat to see how we can help.