How is querying a data warehouse different than querying a database? - database

Say I have a data warehouse like BigQuery, RedShift. I store data which is fit for online analytical processing (OLAP). Similarly suppose I have a database like MySQL or Microsoft SQL Server which has some data fit for online transaction processing(OTLP).
What are the different parameters on which querying a data warehouse and a database would be different?

This is a very general question nevertheless I think the following can help you make your desicion:
1. How much data you have Vs relational features
2. Cloud solution Vs on premesies
3. Payment models (derived from 2) for example bq model is per scan while other is per storage

Related

Create a Data Warehouse with the database on SQL Developer

I have a database in SQL Developer which pull data from an ERP tool and I would like to create a Data warehouse in order to connect it then to PowerBI.
It's my first time that I am doing all this process from the beginning so I am not so experienced.
So where are you suggesting to create the Data Warehouse (I was thinking on SSMS) and how can I connect it with PowerBI ?
My Data Warehouse will consist from some View of my tables and some Joins to get some data in the structure that I want since it is not possible to change anything in the DB.
Thanks in advance.
A "data warehouse" is just a database. The distinction is really more about the commonly used schema design, in the sense that a warehouse is often built along the lines of a star or snowflake design.
So if you already have a database that is extracting data from your ERP, there is nothing to stop you from pointing PowerBI directly at that and performing some analytics etc. If your intention is to start with this database, and then clone/extract/load this data into a new database which is a star/snowflake schema, then that's a much bigger exercises.

Best Options to manage large sets of data SQlserver

I am currently working on a project which involves the following:
The application I am working on is connected to a SQlserver
database.
SAP loads information into multiple tables (in a daily
and also hourly basis) into a MASTER database
There are 5 other databases(hosted on the same server) that access this information via synonyms and stored procedure calls to the MASTER database
The MASTER database purely used for storing the data and routing it to the other databases)
Master Database -
Tables:
MASTER_TABLE1 <------- SAP inserts data into this table.Triggers are used to process the valid data & insert into secondary staging tables -say MASTER_TABLE1_SEC
MASTER_TABLE1_SEC -- Holds processed data coming into MASTER_TABLE1
FIVE other databases ( for each manufacturing facility) are present in the same server. My application is connected to the facility databases ( not the Master)
FACILITY1
Facility2
....
FACILITY5
Synonyms of MASTER_TABLE1_SEC are created in each of these 5 facility databases
Stored procedures are again called from the Facility databases- in order to load data from the MASTER_TABLE1_SEC into the respective tables( within EACH facility) based on the business logic.
Is there a better architecture to handle this kind of a project? I am a beginner when it comes to advanced data management. Can anyone suggest a better architecture or tools to handle this?
There are a lot of patterns that would actually meet the needs described here. It serves that you are working with a type of Data Warehouse. I use Data Vault for my Enterprise Data Warehouses. It is an Ensemble Modeling technique designed for integration and master data preparation. You can think of it as a way to house all data from all time. You would then generate Data Marts (Kimball Method) for each of the Facilities containing only thei or whatever is required for their needs.

Represent Oracle sql Cube with Microstrategy

Hi i have a serveral cube tables on oracle 12c database. How respresent its with Microstrategy? The Object Intelligent Cube the Microstrategy don't represent correctly this cubes and It save in-memory sqls. I need execute sql realtime to cube table
A MicroStrategy cube is an in-memory copy of the results of an SQL query executed against your data warehouse. It's not intended to be a representation of the Oracle cubes.
I assume both these "cubes" organize data in a way that is easy and fast to use for dimensional queries, but I don't think you can import directly an Oracle cube into MicroStrategy IServer memory.
I'm not an expert with Oracle Cubes, but I think you need to map dimensions and facts like you would do with any other Oracle table. At the end an Oracle cube is a tool that Oracle provide to organize your data (once dimensions and metrics are defined) and speed up your query, but you still need to query it: MicroStrategy will write your queries, but also MicroStrategy needs to be aware of your dimensions and metrics (MicroStrategy facts).
At the end the a cube speeds up your queries organizing and aggregating your data, and it seems to me that you have achieved this already with your Oracle cube. A MicroStrategy cube is an in-memory structure that saves also the time required by a query against the database.
If your requirements are that you execute SQL against your database at all times, then you need to disable caching on the MicroStrategy side (this can be done on a report-by-report basis, or at a project level).
MicroStrategy Intelligent Cubes aren't going to be a good fit for you here, because they explicitly cache data, in order to decrease response time, and reduce load on your source database.

Expressing data transformation

Two different relational databases.
Your task is to write a code to transfer the data from the first database to the second database.
Some tables in the database you are transferring to are of the same structure as the table you are transferring from, the transfer of these tables is as simple as "INSERT INTO DbA.TableA (...) VALUES SELECT * FROM DbB.TableB".
Some tables in the database you are transferring to have different structures and different purposes. After proper analysis, you understand the relations and you understand the right transformation you need to code.
My question is: how do you express such knowledge? How do you express the transformational relations between two databases? Are there any tools or diagrams?
The best way I know right now is writting the list of tables of the first database and for each table describing how it is to be transformed into the second database. Is it possible to make this more formal/concise/cool?
If you are wanting a toolset and work in the Microsoft database stack then this is exactly what SQL Server Integration Services (or SSIS) is used for.
If you are wanting to document the process then you would typically write an interface definition document (IDD). There are many examples on Google but here is something to get you started.

Column store in data warehouse

I have a question about data warehousing and column oriented databases. In my project the company use a warehouse solution in visual studio SQL server, they have troubles with the performance when querying complex questions on large amount of data. I want to try to replace the database with a columnar based database. I know that you can "transform" a row oriented database in to more column based or use an open source database such as Vertica or Sybase IQ, i just wondering how it would fit in the warehouse? Do you have to have a star join schema in a warehouse or can you use the columnar approach instead, i realize this is kind of a stupid question but im just trying to understand it all before i start to explore the different databases and solutions.
I know that SQL Server 2012 have a column store but i would like to try the other open source databases as well.
Thanks in advance!
Do you have to have a star join schema in a warehouse or can you use the columnar approach instead?
The star join schema consists of the table definitions of your data warehouse. The star schema, and similar schema, trade query performance for query flexibility. Usually, query flexibility is more important than query performance in a data warehouse.
Based on the Wikipedia article you linked to in your comments, a column oriented database engine stores the actual database bytes in column order, rather than the traditional row order of relational databases.
As the article says, this can improve disk access performance.
The star schema is how you define tables. A column oriented database engine is concerned with how the database information is written to disk. The two concepts have nothing to do with one another, except that they both apply to a data warehouse.
Keep your present data warehouse schema, and see if a column oriented database engine will improve query performance.

Resources