What is the difference between a database and a data warehouse?

What is the difference between a database and a data warehouse?
Aren't they the same thing, or at least built on the same thing (i.e. Oracle RDBMS)?

Check out this for more information.
From a previous link:
Database
Used for Online Transactional Processing (OLTP), but can be used for other purposes such as data warehousing. It records data from users as it is entered and keeps it as history.
The tables and joins are complex because they are normalized (for an RDBMS). This is done to reduce redundant data and to save storage space.
Entity-relationship modeling techniques are used for RDBMS database design.
Optimized for write operations.
Performance is low for analytical queries.
Data Warehouse
Used for Online Analytical Processing (OLAP). It reads historical data so that users can make business decisions.
The tables and joins are simple because they are de-normalized. This is done to reduce the response time for analytical queries.
Data-modeling techniques (typically dimensional modeling) are used for data warehouse design.
Optimized for read operations.
High performance for analytical queries.
Is usually a Database.
It's important to note as well that Data Warehouses could be sourced from zero to many databases.
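To make the normalized vs. de-normalized point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customers, products, orders, sales_wide) are made up for illustration; they are not from any real system. The same question is answered once with a join over normalized tables and once from a single wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (OLTP-style): each fact is stored exactly once.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        product_id  INTEGER REFERENCES products(product_id),
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'Ada', 'London'), (2, 'Linus', 'Helsinki');
    INSERT INTO products  VALUES (10, 'Keyboard', 'Hardware'), (11, 'Editor', 'Software');
    INSERT INTO orders    VALUES (100, 1, 10, 50.0), (101, 1, 11, 20.0), (102, 2, 10, 50.0);

    -- De-normalized (warehouse-style): customer and product attributes are
    -- copied onto every row, trading redundancy for join-free reads.
    CREATE TABLE sales_wide AS
    SELECT o.order_id, c.name AS customer, c.city,
           p.name AS product, p.category, o.amount
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN products  p ON p.product_id  = o.product_id;
""")

# Revenue per city from the normalized tables: needs a join.
normalized = conn.execute("""
    SELECT c.city, SUM(o.amount)
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.city ORDER BY c.city
""").fetchall()

# The same question against the wide table: no join at all.
denormalized = conn.execute(
    "SELECT city, SUM(amount) FROM sales_wide GROUP BY city ORDER BY city"
).fetchall()

print(normalized)
print(denormalized)
```

Both queries return the same totals. The wide table repeats the city on every order row, which is exactly the redundancy a normalized design avoids.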

From a Non-Technical View:
A database is constrained to a particular application or set of applications.
A data warehouse is an enterprise level data repository. It's going to contain data from all/many segments of the business. It's going to share this information to provide a global picture of the business. It is also critical to integration between the different segments of the business.
From a Technical view:
The term "data warehouse" has no single recognized definition. Personally, I define a data warehouse as a collection of data marts, where each data mart consists of one or more databases and each database is specific to one problem set (application, data set or process).
Simply put, a database is a component of a data warehouse. There are many places to explore this concept, but because there is no agreed "definition", you will find challenges with any answer you give.

A data warehouse is a TYPE of database.
In addition to what folks have already said, data warehouses tend to be OLAP, with indexes, etc. tuned for reading, not writing, and the data is de-normalized / transformed into forms that are easier to read & analyze.
Some folks have said "databases" are the same as OLTP -- this isn't true. OLTP, again, is a TYPE of database.
Other types of "databases": Text files, XML, Excel, CSV..., Flat Files :-)

The simplest way to explain it is that a data warehouse consists of more than just a database. A database is a collection of data organized in some way, but a data warehouse is organized specifically to "facilitate reporting and analysis". This is not the entire story, however: in data warehousing, "the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system".
Data Warehouse

Data Warehouse vs Database: A data warehouse is specially designed for data analytics, which involves reading large amounts of data to understand relationships and trends across the data. A database is used to capture and store data, such as recording details of a transaction.
Data Warehouse:
Suitable workloads - Analytics, reporting, big data.
Data source - Data collected and normalized from many sources.
Data capture - Bulk write operations typically on a predetermined batch schedule.
Data normalization - Denormalized schemas, such as the Star schema or Snowflake schema.
Data storage - Optimized for simplicity of access and high-speed query performance using columnar storage.
Data access - Optimized to minimize I/O and maximize data throughput.
Transactional Database:
Suitable workloads - Transaction processing.
Data source - Data captured as-is from a single source, such as a transactional system.
Data capture - Optimized for continuous write operations as new data is available to maximize transaction throughput.
Data normalization - Highly normalized, static schemas.
Data storage - Optimized for high-throughput write operations to a single row-oriented physical block.
Data access - High volumes of small read operations.
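The row-oriented vs. columnar storage distinction in the two lists above can be sketched in plain Python. This is only a toy model of the idea, not how any real engine lays out its blocks, and the field names are invented for the example.

```python
# Row-oriented layout: each record keeps all of its fields together,
# so writing one transaction touches a single place.
rows = [
    {"order_id": 1, "customer": "Ada", "amount": 50.0},
    {"order_id": 2, "customer": "Linus", "amount": 20.0},
    {"order_id": 3, "customer": "Ada", "amount": 30.0},
]

# Column-oriented layout: each field is stored as its own sequence,
# so an aggregate over one field reads only that sequence.
columns = {
    "order_id": [1, 2, 3],
    "customer": ["Ada", "Linus", "Ada"],
    "amount": [50.0, 20.0, 30.0],
}


def append_row(row_store, record):
    """One append: cheap for a stream of small writes."""
    row_store.append(record)


def append_columnar(col_store, record):
    """One append per column: every write touches every column."""
    for field, value in record.items():
        col_store[field].append(value)


def total_amount_rows(row_store):
    """Has to visit every whole record to pick out one field."""
    return sum(record["amount"] for record in row_store)


def total_amount_columns(col_store):
    """Reads just the 'amount' column and nothing else."""
    return sum(col_store["amount"])


new_order = {"order_id": 4, "customer": "Grace", "amount": 12.5}
append_row(rows, new_order)
append_columnar(columns, new_order)
print(total_amount_rows(rows), total_amount_columns(columns))
```

The totals match either way; what differs is how much unrelated data each layout has to touch for a write versus an analytical read.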

Database:
OLTP (Online Transaction Processing).
Holds current, up-to-date, detailed data; flat, relational and isolated.
Entity-relationship modeling is used to design the database.
Database size: 100 MB to GBs; simple transactions or queries.
Data warehouse:
OLAP (Online Analytical Processing).
Holds historical data.
Star schema, snowflake schema and galaxy schema are used to design the data warehouse.
Database size: 100 GB to TBs.
Improved query performance; a foundation for data mining and data visualization.
Enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data.

Data storage for an application generally uses a database. It could be a relational database or one of the NoSQL databases that are currently trending.
A data warehouse is also a database. We can describe a data warehouse as a database specialized for the company's analytical reporting.
Its data is used for key business decisions.
The organized data helps in reporting and in making business decisions effectively.

Database:
Used for Online Transactional Processing (OLTP).
Transaction-oriented.
Application oriented.
Current data.
Detailed data.
Scalable data.
Many Users, Administrators / Operational.
Execution time: short.
Data Warehouse:
Used for Online Analytical Processing (OLAP).
Analysis-oriented.
Subject oriented.
Historical data.
Aggregated data.
Static data.
Few users, mostly managers.
Execution time: long.

Data warehousing (DW) is a process for collecting and managing data from varied sources to provide meaningful business insights. A data warehouse is typically used to connect and analyze business data from heterogeneous sources. The data warehouse is the core of the BI system, which is built for data analysis and reporting.

The source for a data warehouse can be a cluster of databases. Databases are used for online transaction processing, such as keeping the current records, whereas a data warehouse stores historical data for online analytical processing.

A Data Warehouse is a type of data structure usually housed on a Database. The term Data Warehouse refers to the data model and the type of data stored there: data that is modeled to serve an analytical purpose.
A Database can be classified as any structure that houses data. Traditionally that would be an RDBMS like Oracle, SQL Server, or MySQL. However, a Database can also be a NoSQL database like Apache Cassandra, or a columnar MPP database like Amazon Redshift.
You see, a database is simply a place to store data; a data warehouse is a specific way to store data and serves a specific purpose, which is to serve analytical queries.
OLTP vs OLAP does not tell you the difference between a DW and a Database; both OLTP and OLAP reside on databases. They just store data in a different fashion (different data model methodologies) and serve different purposes (OLTP - record transactions, optimized for updates; OLAP - analyze information, optimized for reads).

In simple words:
Data warehouse --> Huge volumes of data used for analysis, storage and copies.
Database --> CRUD operations on frequently used data.
A data warehouse is a kind of storage that you do not use on a daily basis, and a database is something you deal with frequently.
E.g. if we ask a bank for a statement, it gives us the last 3, 4, 6 or more months because that is in the database. Anything older than that is stored in the data warehouse.

Example: A house is worth $100,000, and it is appreciating at $1000 per year.
To keep track of the current house value, you would use a database as the value would change every year.
Three years later, you would be able to see the value of the house which is $103,000.
To keep track of the historical house value, you would use a data warehouse as the value of the house should be
$100,000 on year 0,
$101,000 on year 1,
$102,000 on year 2,
$103,000 on year 3.
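The house example can be sketched with Python's built-in sqlite3 module. The two table names (house_current, house_history) are hypothetical: the first is overwritten in place the way an operational table would be, the second only ever gets new rows appended, which is one simple way to keep history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational view: one row per house, overwritten on every change.
    CREATE TABLE house_current (house_id INTEGER PRIMARY KEY, value INTEGER);
    -- Historical view: one row per house per year, never overwritten.
    CREATE TABLE house_history (house_id INTEGER, year INTEGER, value INTEGER);

    INSERT INTO house_current VALUES (1, 100000);
    INSERT INTO house_history VALUES (1, 0, 100000);
""")

for year in range(1, 4):
    # The house appreciates by $1,000 per year: update the current value...
    conn.execute("UPDATE house_current SET value = value + 1000 WHERE house_id = 1")
    (value,) = conn.execute(
        "SELECT value FROM house_current WHERE house_id = 1"
    ).fetchone()
    # ...and append a snapshot so the earlier values are not lost.
    conn.execute("INSERT INTO house_history VALUES (1, ?, ?)", (year, value))

current = conn.execute("SELECT value FROM house_current").fetchall()
history = conn.execute(
    "SELECT year, value FROM house_history ORDER BY year"
).fetchall()

print(current)
print(history)
```

After three years the current table only knows about $103,000, while the history table still has all four yearly values.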

Related

Is "Snowflake Data Cloud" a good choice for a cloud-native transactional application data-store?

Currently, I generate data on a different datastore and replicate it to Snowflake staging; that data then moves to the data warehouse DB through ELT ingestion for analytics purposes. However, this approach can itself be seen as creating data silos, since we already have 3 copies of the same data:
Transactional data-store DB
Replicated snowflake staging
Snowflake Data Warehouse DB
From a technical architecture point of view, is it a good idea to use Snowflake as a direct datastore for a transactional application (an application that does many CRUD operations)? That may help in avoiding the cost of replication and ingestion.
The main problem I see with this approach is that Snowflake does not enforce any referential integrity (primary keys, foreign keys), so within the CRUD app I have to either always use a MERGE statement or somehow make sure I don't create duplicate records.
The other problem is that, being in the cloud, the distance (i.e. the network) between the app and Snowflake determines the performance of the transactions, and I want good, consistent performance for my CRUD operations.
Any thoughts/suggestions are much appreciated.
Snowflake as of today does not perform well with singleton updates and inserts, which is what we mostly see with transactional databases. I have seen performance degrade when singleton inserts are submitted against Snowflake.
On the other hand, it is very well optimized for bulk ingestion of both unstructured and structured data, and it is designed for OLAP warehouses. You can still use it for transactions, but you may see the same performance degradation. Also, primary keys can be defined but they are not enforced.
In my opinion, if you are faced with that challenge, you have the option of using a PostgreSQL database (open source) in the cloud as your transactional database; it acts as a good complement to Snowflake as the OLAP database.
No. Snowflake isn't good as a transactional / OLTP database for the reasons you've mentioned. Plus, it won't perform well with many individual CRUD operations due to how they structure the data (optimised for OLAP workloads).
Just want to point out that there are benefits to keeping separate databases. For one, you want to isolate your transactional database from your analytics database; otherwise analytical queries could significantly affect the performance of the application. Secondly, the data in the transactional database could change, and if you had to reprocess the data for whatever reason you might not be able to do so. There are many more, but I will stop here :-)

Using a regular database as a data warehouse

Can anyone tell me what the implications are when attempting to use a regular database as a data warehouse?
I understand a data warehouse is known for storing data in a more structured manner however what's the implication of using a standard database to achieve the same result? Can we not just create a regular database table with structured data as it would reside in a data warehouse?
Data structure is not the issue - optimization is.
OLTP databases like SQL Server (SQLS) are optimized to reliably record transactions. They store data as records (rows), and extensively use disk I/O.
BI databases like Redshift or Teradata are optimized to query data. They typically store data by column, and some keep data largely in memory to reduce disk I/O.
As a result, traditional databases are better at getting data in, while BI databases are better at getting data out (both platforms are trying to mitigate their weaknesses, so the difference is blurring).
Practically speaking, you can use regular databases like SQLS to build a data warehouse without any problems, unless your needs are special:
Data size is large (billions of records)
Refresh rate is high (hour/minute/real time)
You intend to use live connection from BI tools like Tableau or PowerBI (as opposed to loading data extract into them)
Your queries are highly complex and computationally intensive
You can also combine both platforms. Import, process, integrate and store data in a regular database, and then convert it into a star schema (dimensional model) and publish it to a BI database (i.e, keep normalized data in SQLS and publish star schema to Redshift).
If you intend to import data into BI tools like Tableau or PowerBI, then you can safely use any traditional database, because they rely on their internal engines and using BI database won't give you any advantages.
Data warehouses will also have redundant or duplicate data in them, which is not really what you are looking for in a regular database.

Why do operational databases not meet business challenges the way a data warehouse does?

I have a question: why do operational databases not meet business challenges the way a data warehouse does?
In an operational database I can create detailed reports about any product or anything else, and I can issue statistical reports with charts and diagrams, so why can't the operational database be used as a data warehouse?
Best Regards
Usually an operational database only keeps track of the current state of each record.
The purpose of a data warehouse is two-fold:
- Keep track of historic events without overwhelming the operational database;
- Isolate OLAP queries so that they don't impact the load on the operational datastore.
If you try to query your operational data store for sales per product line per month for the past year the amount of joins required, as well as the amount of information you need to read from storage may cause performance degradation on your operational database.
A data warehouse tries to avoid this by 1) keeping things separated and 2) denormalising the data model (Kimball approach) so that query plans are simpler.
I suggest reading The Data Warehouse Toolkit, by Ralph Kimball. The first chapter deals precisely with this question: why do we need a data warehouse if we already have an operational data store?
"I can create detailed reports about any product or anything else, and I can issue statistical reports with charts and diagrams"
Yes, you can, but a business user cannot, because they don't know SQL. And it's very difficult to put a BI tool (for business users to use) on top of an operational database, for many reasons:
The data model is not built for an end user to understand. A data warehouse data model is (e.g. there is ONE table for customers that has everything about a customer in it, rather than being split into addresses, accounts etc.)
The operational data store is probably missing important reporting reference data such as grouping levels and hierarchies
A slowly changing dimension is a method of 'transparently' modelling changes to, for example, customers. An operational data model generally doesn't do this very well: you need to understand all the tables and join them correctly, if this information is even stored
There are many other reasons but these just serve to address your points
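As a rough illustration of the slowly changing dimension point in the list above, here is a minimal Python sketch of one common variant (usually called Type 2 in Kimball's terminology), where a change closes the current row and appends a new one instead of overwriting. The class and function names are invented for this example, and a real warehouse would do this in its ETL tooling rather than in application code.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(dimension: List[CustomerVersion], customer_id: int,
                 new_city: str, change_date: date) -> None:
    """Close the current version (if any) and append a new one."""
    for row in dimension:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == new_city:
                return  # nothing changed, keep the current version
            row.valid_to = change_date
    dimension.append(CustomerVersion(customer_id, new_city, change_date))


def city_as_of(dimension: List[CustomerVersion], customer_id: int,
               as_of: date) -> Optional[str]:
    """Look up which city the customer had on a given date."""
    for row in dimension:
        if (row.customer_id == customer_id
                and row.valid_from <= as_of
                and (row.valid_to is None or as_of < row.valid_to)):
            return row.city
    return None


dim = [CustomerVersion(1, "London", date(2020, 1, 1))]
apply_change(dim, 1, "Paris", date(2022, 6, 1))

print(city_as_of(dim, 1, date(2021, 3, 1)))
print(city_as_of(dim, 1, date(2023, 1, 1)))
```

An operational table would typically just overwrite the city, so a report on 2021 sales by city could no longer be reproduced. Here both versions of the customer survive and each has its own validity range.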
When you get to the point where you are too busy to service business users' requests, and you are issuing reports that don't match from one day to the next, you'll start to see the value of a data warehouse.

Data warehouse and/or database?

Can an enterprise use both data warehouse and database in one Head Office? Is it just OK to use only one of these or is it necessary to use both in the same place?
Yes, an enterprise can use both data warehouse and database in one office. They need not be in the same physical data-center. It all depends on the needs of the organization. Generally, databases are used to support transactions as they happen and data warehouses are used to support business intelligence or the like.
Database
Transactions in an enterprise are most likely to happen in a relational database management system (aka database, aka RDBMS). Reporting can happen using the same database, but it is also possible that reporting is done off a mirror of the RDBMS. Now, an enterprise may have more than one RDBMS - one running SQL Server, one running Oracle, one running MySQL etc. All this is great for recording activities and reporting.
Warehouse
Additionally, enterprises seek to do data analysis on a regular basis. Business Intelligence, data science, big data - regardless of the term, we are talking about data analysis overall. Doing number crunching on large amounts of data stored in an RDBMS can be hard on the RDBMS. So, organizations decide to de-normalize data to some extent and store data in a warehouse. When data is extracted, transformed and loaded (ETL) from one or more RDBMS (and other sources of data) and stored in a data warehouse, it is available for some research.
Organizations may choose to move the warehouse to a different office location, or may have multiple warehouses. For example, a headquarters with 5 satellite facilities may choose to bring data from all those facilities to the warehouse at the headquarters every night, or it may choose to have a warehouse in a different datacenter. In contrast, a company with hundreds of satellite facilities may choose to have a warehouse with high-level summarized data at its headquarters and regionalize its other warehouses, with one warehouse on each continent, so that target markets are better served by satellite units on that particular continent.
Database (or databases) to Warehouse journey
Business Intelligence
Cognos, QlikView, Tableau, MicroStrategy etc. are some business intelligence/data analytics tools among many that reach out to the data warehouse and present data for analytics. They are great for presentation and reporting - data visualization, in general. These tools can also get data from an RDBMS, but it's more convenient to get it from a data warehouse, since the warehouse is architected in a way that makes it easier to showcase that data on a business intelligence dashboard.
Big data
The buzz around big data is interesting. Many of us may take a subset of data from a large pool of data, do analysis and assume that the results from the subset apply to the large pool of data. What if all the data was used for analysis? And even better - what if we took related data from elsewhere (outside of our dataset) and included it in our analysis? Yeah, you would have a giant pile of data, and if you had the means to analyze it all, you'd be doing big data. We are talking several hundred GB or even PB of data. Although Hadoop and the like are used in big data analysis, they could derive that data from the warehouse.

data warehouse and database difference in implementation

Can anyone tell me the difference between a simple database and a data warehouse in terms of implementation?
I know that a data warehouse is used for analysis rather than for keeping records, but I don't understand how they are structurally different.
In a simple database we have tables, and so do we in a data warehouse. How can we make a data warehouse out of a simple database?
In both cases we have queries, so how do the queries differ between them?
The differences are in the implementation, that is the representation (structure) of the data in tables.
A simple database is typically structured in normalized tables in order to minimize redundancy and optimize writing operations to the table. This can be achieved by dividing large tables into smaller and less redundant tables, so that data of the same kind are isolated in one place so that additions, deletions, and modifications of a field can be made in just one table. The smaller tables are then connected together via defined relationships between them (this is done by foreign keys), resulting in many joins between tables when retrieving the data.
On the other hand, a data warehouse is structured mainly for read operations, which is why it accepts some level of redundancy in the data: this makes reading faster. In a data warehouse, data is typically structured in what is called a star schema, through the use of dimensional modeling. That means you have 1 big table (the fact table) with all the relevant records and measures (e.g. sales amount in $), and then many smaller tables (called dimension tables) that describe the values in the fact table.
Dimension tables could be something like Date, SalesCountry, SalesPerson, Product etc. that all describe the sales amount from the fact table. The dimension tables are then related to the fact table with foreign keys, thereby creating the star-like figure with the fact table in the middle and all the dimension tables around it in a circle linking to it.
NB: This is a very simple introduction, and you should of course refer to some data warehouse literature to read more details. Look for books by Ralph Kimball and Bill Inmon; they are the gurus within the data warehouse field.
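A minimal sketch of such a star schema, using Python's built-in sqlite3 module. The table names (fact_sales, dim_date, dim_country, dim_product) and the sample figures are assumptions made up for this example; only the shape (one fact table, several dimension tables joined by keys) comes from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: describe the "who/what/when" of each measure.
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_country (country_key INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT);

    -- Fact table: one row per sale, holding the measure plus one
    -- foreign key per dimension (the centre of the star).
    CREATE TABLE fact_sales (
        date_key     INTEGER REFERENCES dim_date(date_key),
        country_key  INTEGER REFERENCES dim_country(country_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        sales_amount REAL
    );

    INSERT INTO dim_date    VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO dim_country VALUES (1, 'Denmark'), (2, 'Sweden');
    INSERT INTO dim_product VALUES (1, 'Bike'), (2, 'Helmet');
    INSERT INTO fact_sales  VALUES
        (20240101, 1, 1, 500.0),
        (20240101, 2, 2, 40.0),
        (20240201, 1, 2, 60.0),
        (20240201, 1, 1, 450.0);
""")

# A typical analytical query: slice the measure by dimension attributes.
# Every join goes from the fact table straight to one dimension table.
sales_by_country_month = conn.execute("""
    SELECT c.country, d.month, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_country c ON c.country_key = f.country_key
    JOIN dim_date    d ON d.date_key    = f.date_key
    GROUP BY c.country, d.month
    ORDER BY c.country, d.month
""").fetchall()

for row in sales_by_country_month:
    print(row)
```

Adding another way to slice the data (say, by sales person) would mean one more dimension table and one more key column on the fact table, while the query pattern stays the same.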
Assuming you already know something about OLTP databases, the IBM Redbooks have a couple of downloadable titles about data warehouses that are worth looking at.
Data Modeling Techniques for Data Warehousing
Dimensional Modeling: In a Business Intelligence Environment
In essence, the way that data and tables are organized -- and more...
Read
Bill Inmon "Building the Data Warehouse"
Ralph Kimball "The Data Warehouse Toolkit"
OLTP stands for Online Transaction Processing, the kind of system used in, for example, any booking system. In technical terms, "OLTP refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing".
The next question is: what is the difference between OLTP and a data warehouse?
There are many differences between the two, so we will list some of the important ones:
The most important difference of all is that an OLTP database is normally in 3NF (third normal form), whereas a data warehouse is not. Therefore, we can also infer that an OLTP database will have little data redundancy.
A data warehouse is used to store months and years of data to support historical analysis, whereas an OLTP system stores data for a few weeks or months. Therefore the sizes of the databases also differ vastly: an OLTP database is typically 100 MB to 100 GB, whereas a data warehouse is typically 100 GB to a few terabytes.
The highly normalized structure of OLTP helps it to optimize operations such as UPDATE/INSERT/DELETE, whereas a data warehouse has a very de-normalized structure (star schema) to optimize query performance.
Data in a data warehouse is pushed on a regular basis by an ETL process and end users do not update the data warehouse directly, whereas in OLTP systems end users routinely issue individual data modification statements to the database, and thus the OLTP system is up to date.
These are a few of the important differences between OLTP and a data warehouse.
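The last difference in the list, a scheduled ETL process loading the warehouse in batches, could look roughly like this in Python with the built-in sqlite3 module. Everything here is an assumption for illustration: two in-memory databases stand in for the OLTP system and the warehouse, and run_batch, orders and fact_orders are hypothetical names. The sketch tracks the highest order id already loaded so that each run only picks up new rows.

```python
import sqlite3


def run_batch(oltp, warehouse, last_loaded_id):
    """Copy orders newer than last_loaded_id into the warehouse in one batch.

    Returns the new high-water mark to pass to the next run.
    """
    # Extract: read only the rows added since the previous batch.
    rows = oltp.execute(
        "SELECT order_id, customer, amount_cents FROM orders "
        "WHERE order_id > ? ORDER BY order_id",
        (last_loaded_id,),
    ).fetchall()

    # Transform: tidy the customer name and convert cents to currency units.
    batch = [(oid, cust.strip().upper(), cents / 100.0) for oid, cust, cents in rows]

    # Load: one bulk insert inside a single transaction.
    with warehouse:
        warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", batch)

    return rows[-1][0] if rows else last_loaded_id


# Stand-ins for the operational system and the warehouse.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
)
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER, customer TEXT, amount REAL)")

# End users write to the OLTP side one statement at a time...
oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, " ada ", 5000), (2, "linus", 2000)])
watermark = run_batch(oltp, warehouse, 0)           # ...first scheduled run

oltp.execute("INSERT INTO orders VALUES (3, 'grace', 1250)")
watermark = run_batch(oltp, warehouse, watermark)   # next run picks up only order 3

loaded = warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(watermark, loaded)
```

Between runs the warehouse lags behind the OLTP system, which is the trade-off the answer describes: the operational side is always up to date, while the warehouse is refreshed on a schedule.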