Column store in data warehouse - sql-server

I have a question about data warehousing and column-oriented databases. In my project the company uses a data warehouse solution built with Visual Studio and SQL Server, and they have performance trouble when running complex queries over large amounts of data. I want to try replacing the database with a column-oriented one. I know that you can "transform" a row-oriented database into a more column-based one, or use a dedicated columnar database such as Vertica or Sybase IQ; I am just wondering how it would fit into the warehouse. Do you have to have a star join schema in a warehouse, or can you use the columnar approach instead? I realize this is kind of a stupid question, but I'm just trying to understand it all before I start to explore the different databases and solutions.
I know that SQL Server 2012 has columnstore indexes, but I would like to try other columnar databases as well.
Thanks in advance!

Do you have to have a star join schema in a warehouse or can you use the columnar approach instead?
The star join schema consists of the table definitions of your data warehouse. The star schema, and similar schemas, trade query performance for query flexibility. Usually, query flexibility is more important than query performance in a data warehouse.
Based on the Wikipedia article you linked to in your comments, a column oriented database engine stores the actual database bytes in column order, rather than the traditional row order of relational databases.
As the article says, this can improve disk access performance.
The star schema is how you define tables. A column oriented database engine is concerned with how the database information is written to disk. The two concepts have nothing to do with one another, except that they both apply to a data warehouse.
Keep your present data warehouse schema, and see if a column oriented database engine will improve query performance.
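To make the distinction concrete, here is a minimal Python sketch. It is an illustration only, not how SQL Server or any particular columnar engine is implemented, and the fact table with columns date_key, product_key, and amount is hypothetical. The same logical table is held once in row order and once in column order: the schema is identical in both cases, and only the physical layout differs.

```python
# Hypothetical fact table from a star schema: (date_key, product_key, amount).
rows = [
    (20240101, 1, 10.0),
    (20240101, 2, 25.5),
    (20240102, 1, 7.25),
]

# Row-oriented layout: the fields of each record are stored together.
row_store = list(rows)

# Column-oriented layout: the values of each column are stored together.
column_names = ("date_key", "product_key", "amount")
column_store = {
    name: [record[i] for record in rows]
    for i, name in enumerate(column_names)
}


def total_amount_row_store(store):
    # Walks every full record even though only one field is needed.
    return sum(record[2] for record in store)


def total_amount_column_store(store):
    # Reads only the 'amount' column and never touches the other columns.
    return sum(store["amount"])


row_total = total_amount_row_store(row_store)
column_total = total_amount_column_store(column_store)
print(row_total, column_total)
```

Both functions return the same total. The difference is how much data each one has to read, which is why a columnar engine tends to help with aggregate queries that use only a few columns of a wide table.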

Related

When to Create DB and When to Create Schema

This seems like a design question, but I wanted to know whether there is a pattern or design consideration that tells us when to create a new database rather than a new schema.
Why not create one big database with separate schemas? Under what circumstances should we create a new database?
They are just logical divisions, so for the most part it's a matter of preference. There is one place where it's not a matter of preference: replication.
As of September 2022, the unit of replication is the database. It's possible to specify which databases you want to replicate, but not which schemas within a database to replicate.
If you plan to replicate, you'll want to think about keeping only the schemas/tables that are important to replicate in one or more databases that get replicated and keep other data in databases that do not get replicated.
Another thought: in a large enterprise DWH solution, there can be a variety of table groups that you can map to different databases, for example a Sales DB, a Master DB, and a Finance DB. Inside each database, you may then want schemas for tables, views, procedures, and other objects.

SQL Performance when using different tables or different databases

Currently, I have a SQL database with a lot of customers.
I have one table that stores the data of all customers. Now I want to split the data, so that either each customer's data is stored in its own table or each customer's data is stored in its own database.
I'm confused about the performance implications. Which is the better solution: one table per customer in the same database, or one database per customer?
As explained in the comments, neither of the approaches you describe is a good idea. If you want to improve database performance, use indexes correctly. You may also want to look at partitioned tables:
https://learn.microsoft.com/ru-ru/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-ver15
https://www.cathrinewilhelmsen.net/2015/04/12/table-partitioning-in-sql-server/
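As a rough illustration of the indexing advice, the sketch below keeps all customers in one table and adds an index on the customer column. It uses Python's built-in sqlite3 module as a stand-in, so the query plan text is SQLite's and not SQL Server's, and the table and index names (customer_data, idx_customer_data_customer) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_data ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, payload TEXT)"
)
# 1000 rows spread evenly across 100 hypothetical customers.
conn.executemany(
    "INSERT INTO customer_data (customer_id, payload) VALUES (?, ?)",
    [(i % 100, f"row {i}") for i in range(1000)],
)


def plan(sql, params=()):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in plan_rows)


query = "SELECT payload FROM customer_data WHERE customer_id = ?"

# Without an index, the filter has to scan the whole table.
plan_before = plan(query, (42,))

conn.execute(
    "CREATE INDEX idx_customer_data_customer ON customer_data (customer_id)"
)

# With the index, the engine can seek straight to one customer's rows.
plan_after = plan(query, (42,))

matching = conn.execute(query, (42,)).fetchall()
print(plan_before)
print(plan_after)
print(len(matching))
```

The query and the single shared table stay the same; only the access path changes. In SQL Server the equivalent check would be done by comparing execution plans before and after creating the index.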

How is querying a data warehouse different than querying a database?

Say I have a data warehouse like BigQuery or Redshift, where I store data suited to online analytical processing (OLAP). Similarly, suppose I have a database like MySQL or Microsoft SQL Server, which holds data suited to online transaction processing (OLTP).
In what respects does querying a data warehouse differ from querying a database?
This is a very general question. Nevertheless, I think the following points can help you make your decision:
1. How much data you have vs. which relational features you need
2. Cloud solution vs. on-premises
3. Payment models (derived from 2): for example, the BigQuery model charges per scan, while others charge for storage

How to add meta data to every cell in all tables of a relational database?

I have a relational database (I am using SQL Server 2008) with scores of tables. I need to capture a lot of metadata for each cell (not just each row) in every table. Thankfully, the metadata schema is expected to be consistent across all tables.
Further, the metadata should be queryable as well. I did not come across any such direct support built in.
What is the best possible approach?
You may want to look into using SQL Server's extended properties.

Is a single table a bad starting point for OLAP cubes (SQL Server Analysis Services)?

I'm going to use a single table to aggregate historical data about our (very big) virtual infrastructure. The table will have 15 to 30 fields, and I estimate 500 to 1000 records a day.
Why a single table? A couple of reasons:
Data is extracted to CSV using PowerShell scripts, so bulk loading into a single table is very easy and fast.
I will connect Excel to the table and report through pivot tables, so a single table is perfect (otherwise I would have to create views).
Now my question:
If I plan to build cubes on top of this table in the future, is the "single-table" choice a bad solution?
Do cubes rely on relational databases, or can they easily be built on single-table databases?
Thanks for any suggestions.
I can't tell you specifically about SQL Server Analysis Services, but for OLAP you typically use denormalized and aggregated data. That means fewer tables than in a normal relational scenario. And as your data volume is not really big (at most about 365k rows per year, which is small even for OLAP), I don't see any problem with using a single table for your data.
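The volume estimate and the pivot-style reporting can be sketched in a few lines of Python. This uses the built-in sqlite3 module as a stand-in for SQL Server, and the table vm_history with its columns is a hypothetical example of such a flat history table, not the asker's actual layout.

```python
import sqlite3

# Yearly volume implied by 500 to 1000 records a day.
low_per_day, high_per_day = 500, 1000
rows_per_year = (low_per_day * 365, high_per_day * 365)

conn = sqlite3.connect(":memory:")
# One flat, denormalized table, as it might be loaded from the CSV exports.
conn.execute(
    "CREATE TABLE vm_history ("
    "sample_date TEXT, cluster TEXT, vm_name TEXT, cpu_mhz INTEGER)"
)
conn.executemany(
    "INSERT INTO vm_history VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", "A", "vm1", 1000),
        ("2024-01-01", "A", "vm2", 3000),
        ("2024-01-01", "B", "vm3", 500),
        ("2024-01-02", "A", "vm1", 2000),
    ],
)

# A pivot-table style aggregate: group by the dimension-like columns
# and aggregate the measure column.
summary = conn.execute(
    "SELECT sample_date, cluster, COUNT(*), AVG(cpu_mhz) "
    "FROM vm_history "
    "GROUP BY sample_date, cluster "
    "ORDER BY sample_date, cluster"
).fetchall()

print(rows_per_year)
for line in summary:
    print(line)
```

Columns such as sample_date and cluster play the role that dimensions would play in a cube, and cpu_mhz plays the role of a measure, which is why a flat table of this size is a workable starting point.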
