I need some advice on creating my DW. I have some experience in creating a DW (using Pentaho as the BI server): I made a set of scheduled database queries which create 4 dimension tables and 1 fact table (for sales reports).
Now there is a need to expand the DW (for import, warehouse, and logistics reports), so my question is: do I have to make more fact tables for each department (and dimension tables for those), or is there some other structuring model?
Of course, this time it will be done using an ETL tool, but I need general advice.
Thanks,
Stevan
A fact table represents a process or events that you want to analyze.
If there is no new process or event that you want to analyze, then you do not need a new fact table. If there is, the usual pattern is a new fact table at that process's grain, reusing (conforming) the dimensions you already have wherever they apply.
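For example, a warehouse-shipments fact might look like the rough T-SQL sketch below, where all names are hypothetical and DimDate/DimProduct are assumed to carry over from your sales model:

    -- New fact table for a new process (shipments), at its own grain.
    -- DimDate and DimProduct are reused from the sales star;
    -- DimWarehouse is a new dimension specific to this process.
    CREATE TABLE dbo.FactShipment
    (
        DateKey         int NOT NULL REFERENCES dbo.DimDate (DateKey),
        ProductKey      int NOT NULL REFERENCES dbo.DimProduct (ProductKey),
        WarehouseKey    int NOT NULL REFERENCES dbo.DimWarehouse (WarehouseKey),
        QuantityShipped int NOT NULL
    );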
Perhaps you could give us some more details about what you are analyzing...
I have to connect multiple tables that are spread across one or more databases; approximately 10-15 tables have to be joined in each query to generate the data for analysis, in SQL Server 2014.
I don't have access to a database diagram or architecture documentation, and these reports are to be sent out weekly. I want to understand how to approach writing these kinds of queries, both basic and advanced, how to identify the relationships between tables, and what kinds of advanced techniques I could learn or use, like CTEs, RANK with PARTITION BY, subqueries, etc.
If anybody can provide a rough flow diagram or outline of the approach, that would be really helpful.
It's very unlikely that the owners of those source systems want them queried directly every time someone runs a report. Since you already have access to SQL Server, I would suggest building a data warehouse with it.
You haven't provided a whole lot of information to go on, but SSIS packages could be created to connect to the source systems and load the data into your warehouse, and those packages can be scheduled through SQL Server Agent.
As for modeling... again, it is difficult with the lack of information, but generally the star schema works great for reporting: a fact table surrounded by dimension (or attribute) tables.
As for figuring out relationships without a diagram, this will have to be done via experimentation and tying back to existing reports to make sure your joins aren't dropping records or multiplying them.
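For example, one quick sanity check (table names here are just placeholders) is to compare row counts before and after a join:

    -- Row count of the driving table alone...
    SELECT COUNT(*) AS base_rows FROM dbo.Orders;

    -- ...versus after the join: fewer rows means the join is dropping
    -- records, more rows means it is fanning them out (duplicates).
    SELECT COUNT(*) AS joined_rows
    FROM dbo.Orders AS o
    INNER JOIN dbo.Customers AS c
        ON c.CustomerID = o.CustomerID;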
Good luck.
I need to somehow copy data from one database to another. I guess the best option will be SqlBulkCopy.
The problem is that my Product table has 5 more tables connected to it by the product ID.
So Product is connected with Product_picture (pictureID, productID), for example.
Is there any way to copy Product and all the other connected tables, keeping the IDs (the identity values) intact, from one database to another?
Invoke bulk copy multiple times, once per table. None of this is automated in the product, so you have to build it on your own.
Look into table-valued parameters. They are much more flexible and offer reasonable performance for bulk jobs; they will probably be fast enough.
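Two notes on that. SqlBulkCopy can keep the source identity values if you pass SqlBulkCopyOptions.KeepIdentity when constructing it. And if you go the table-valued parameter route, the server side is just a user-defined table type plus a procedure; a minimal T-SQL sketch for the Product_picture case, with hypothetical names:

    -- Table type mirroring the Product_picture columns.
    CREATE TYPE dbo.ProductPictureRows AS TABLE
    (
        pictureID int NOT NULL,
        productID int NOT NULL
    );
    GO

    -- Procedure the application calls with a whole batch of rows at once.
    CREATE PROCEDURE dbo.CopyProductPictures
        @rows dbo.ProductPictureRows READONLY
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.Product_picture (pictureID, productID)
        SELECT pictureID, productID FROM @rows;
    END;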
I'm going to use a single table to aggregate historical data about our (very big) virtual infrastructure. The table will have 15 to 30 fields, and I estimate 500 to 1,000 records a day.
Why a single table? A couple of reasons:
Data is extracted to CSV using PowerShell scripts, and a bulk load into a single table is very easy and fast (see the sketch after this list).
I will connect Excel to the table and report through pivot tables, and a single table is perfect for that (otherwise I would have to create views).
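For reference, the load is a single statement along these lines (table and file names are made up):

    -- Hypothetical wide table; the real one has 15-30 columns.
    CREATE TABLE dbo.VmHistory
    (
        SampleDate date          NOT NULL,
        VmName     nvarchar(128) NOT NULL,
        CpuUsedMhz int           NULL,
        MemUsedMB  int           NULL
    );

    -- Load the PowerShell-extracted CSV in one shot (header row skipped).
    BULK INSERT dbo.VmHistory
    FROM 'C:\exports\vm_history.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);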
Now my questions:
If I'm planning to build cubes on top of this table in the future, is the single-table choice a bad solution?
Do cubes rely on multi-table relational schemas, or can they easily be built on top of a single table?
Thanks for any suggestions.
Can't tell you specifically about SQL Server Analysis Services, but for OLAP you typically use denormalized and aggregated data, which means fewer tables than in a normal relational scenario. And as your data volume is not really big (at most ~365k rows/year, which is small even for OLAP), I don't see any problem with using a single table for your data.
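To make that concrete: a cube or pivot table would simply aggregate straight off the wide table, no joins needed. Assuming the hypothetical VmHistory table sketched in the question, something like:

    -- The kind of aggregate a cube or pivot table derives from the
    -- single wide table.
    SELECT SampleDate, VmName, AVG(CpuUsedMhz) AS AvgCpuMhz
    FROM dbo.VmHistory
    GROUP BY SampleDate, VmName;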
I'm working on a web-based business application where each customer will need to have their own data (think of the basecamphq.com model). For scalability and ease of upgrades, I'd prefer to have a single database where each customer gets a filtered version of the data. The problem is how to guarantee that they stay sandboxed to their own data. Trying to enforce it in application code seems like a disaster waiting to happen. I know Oracle has a way to append a WHERE clause to every query based on the login ID, but does PostgreSQL have anything similar?
If not, is there a different design pattern I could use (like creating a filtered view of each table for each customer)?
Worst-case scenario, what is the performance/memory overhead of having 1000 100M databases vs. a single 1TB database? I will need to provide backup/restore functionality on a per-customer basis, which is dead simple when each customer has their own database but quite a bit trickier when they share a database with other customers.
You might want to look into adding Veil to your PostgreSQL installation.
Schemas plus inherited tables might work for this: create your master table, then inherit a table into each per-customer schema that provides a default for the company ID or name field.
Set the permissions per schema for each customer and set the schema search path per user. Use the same table names in each schema so that the queries remain the same.
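A minimal PostgreSQL sketch of that layout, with all names hypothetical:

    -- Master table shared by all customers.
    CREATE TABLE public.orders (
        order_id   serial PRIMARY KEY,
        company_id integer NOT NULL,
        total      numeric(12,2)
    );

    -- Per-customer schema with an inherited child table that fills in
    -- the company ID by default (PostgreSQL merges the column).
    CREATE SCHEMA customer_a;
    CREATE TABLE customer_a.orders (
        company_id integer NOT NULL DEFAULT 42
    ) INHERITS (public.orders);

    -- Sandbox the customer's login to its own schema.
    CREATE ROLE customer_a_login LOGIN;
    GRANT USAGE ON SCHEMA customer_a TO customer_a_login;
    GRANT ALL ON customer_a.orders TO customer_a_login;
    ALTER ROLE customer_a_login SET search_path = customer_a;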
I was thinking of putting staging tables and the stored procedures that update those tables into their own schema, so that when importing data from SomeTable into the data warehouse, I would run an Initial.StageSomeTable procedure which would insert the data into the Initial.SomeTable table. This way all the procs and tables dealing with the Initial stage are grouped together. Then I'd have a Validation schema for that stage of the ETL, and so on.
This seems cleaner than trying to uniquely name all these very similar tables, since each table will have multiple instances of itself throughout the staging process.
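Something like this, where all the names are just examples:

    -- Schema that groups everything for the initial load stage.
    CREATE SCHEMA Initial;
    GO

    CREATE TABLE Initial.SomeTable
    (
        SomeTableID int           NOT NULL,
        SomeValue   nvarchar(100) NULL
    );
    GO

    -- Proc that stages source rows into Initial.SomeTable.
    CREATE PROCEDURE Initial.StageSomeTable
    AS
    BEGIN
        SET NOCOUNT ON;
        TRUNCATE TABLE Initial.SomeTable;
        INSERT INTO Initial.SomeTable (SomeTableID, SomeValue)
        SELECT SomeTableID, SomeValue
        FROM SourceDb.dbo.SomeTable; -- hypothetical source
    END;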
Question: Is using a user schema to group tables/procs/views together an appropriate use of user schemas in MS SQL Server? Or are user schemas supposed to be used for security, such as grouping permissions together for objects?
This is actually a recommended practice. Take a look at the Microsoft Business Intelligence ETL Design Practices from Project REAL. You will find (download the doc from the first link) that they use quite a few schemas to group and identify objects in the warehouse.
In addition to dbo and etl, they also use admin, audit, part, olap and a few more.
I think it's appropriate enough; it doesn't really matter, and you could use a separate database if you liked, which is actually what we do.
I'm not sure why you would want a Validation schema, though; what are you going to do there?
Both of the reasons you list (purpose/intent and security) are valid reasons to use schemas. Once you start using them, you should always specify the schema when referencing an object (although I'm lazy and never specify dbo).
One trick we use is to have a same-named table in each of several schemas, combined with table partitioning (available in SQL 2005 and up). Load the data into the first schema, then, once it's validated, "switch" the partition into dbo, after first switching the dbo partition into a "dumpster" copy of the table in another schema. Net production downtime is measured in seconds, and it's all carefully wrapped in an explicit transaction.
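A bare-bones sketch of that swap, assuming identically structured Sales tables in the staging, dbo, and dumpster schemas (all names hypothetical; the tables must match in structure and filegroup for the switch to work):

    BEGIN TRANSACTION;

    -- SWITCH requires the target table to be empty.
    TRUNCATE TABLE dumpster.Sales;

    -- Move current production data out of the way...
    ALTER TABLE dbo.Sales SWITCH TO dumpster.Sales;

    -- ...then promote the validated staging data into production.
    ALTER TABLE staging.Sales SWITCH TO dbo.Sales;

    COMMIT TRANSACTION;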