I am working on a Power BI report that consists of multiple dashboards. The data needed comes from a single table with 100K rows in the DWH. The table stores all the variables and values for different stores, as shown in the picture below.
Currently, we create a new table in the data mart for each separate dashboard, such as total profit in each country, total number of staff in each country, etc. However, I realized I can do the same using Power Query without adding new tables to my data mart. So I am curious: which approach is better?
This leads to another question I have always had: when we need a transformed table for a dashboard, should we create new tables in the data mart, or should we do the transformation in the BI tool, such as Power BI or Tableau? I think performance is one factor to consider, but I am not sure about the other factors.
Appreciate if anyone can share your opinion.
Given the amount of transformation that needs to occur, it would be worth doing this in the DWH. Power BI does well with a star schema, so it would be good to break out dimensions like country, store and date into their own tables.
You might also work the measures into a single fact table, or maybe two if some of the facts are transactional and others are semi-additive snapshot facts (for example, profit vs. number of staff). Designed right, the model could support all of the dashboards, so you would not need a report table for each.
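As a rough illustration of the star-schema idea above, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for the DWH. All table names, column names, and sample values are hypothetical and not taken from the original table; the point is only that one small model (two dimensions, two fact tables) can answer both "total profit per country" and "staff per country" without a separate report table for each.

```python
import sqlite3

# In-memory SQLite stands in for the DWH; all names and data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE dim_store (
    store_id   INTEGER PRIMARY KEY,
    store_name TEXT,
    country_id INTEGER REFERENCES dim_country(country_id)
);
-- Transactional (fully additive) fact: profit can be summed over any dimension.
CREATE TABLE fact_profit (store_id INTEGER, month TEXT, profit REAL);
-- Snapshot (semi-additive) fact: staff counts should not be summed across months.
CREATE TABLE fact_staff (store_id INTEGER, month TEXT, staff_count INTEGER);
""")
conn.executemany("INSERT INTO dim_country VALUES (?, ?)", [(1, "DE"), (2, "FR")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?)",
                 [(1, "Berlin", 1), (2, "Munich", 1), (3, "Paris", 2)])
conn.executemany("INSERT INTO fact_profit VALUES (?, ?, ?)",
                 [(1, "2024-01", 100), (1, "2024-02", 150),
                  (2, "2024-01", 200), (3, "2024-01", 50), (3, "2024-02", 70)])
conn.executemany("INSERT INTO fact_staff VALUES (?, ?, ?)",
                 [(1, "2024-01", 10), (1, "2024-02", 12),
                  (2, "2024-01", 20), (2, "2024-02", 20),
                  (3, "2024-01", 5), (3, "2024-02", 6)])

# Dashboard 1: total profit per country (sum over all months).
profit_by_country = conn.execute("""
    SELECT c.country_name, SUM(f.profit)
    FROM fact_profit f
    JOIN dim_store s ON s.store_id = f.store_id
    JOIN dim_country c ON c.country_id = s.country_id
    GROUP BY c.country_name
    ORDER BY c.country_name
""").fetchall()

# Dashboard 2: staff per country, taken from the latest snapshot month only.
staff_by_country = conn.execute("""
    SELECT c.country_name, SUM(f.staff_count)
    FROM fact_staff f
    JOIN dim_store s ON s.store_id = f.store_id
    JOIN dim_country c ON c.country_id = s.country_id
    WHERE f.month = (SELECT MAX(month) FROM fact_staff)
    GROUP BY c.country_name
    ORDER BY c.country_name
""").fetchall()

print(profit_by_country)
print(staff_by_country)
```

In Power BI the same two aggregations would normally be written as measures over the model rather than as SQL, but the table layout is the part that carries over.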
In our SQL Server DB, we have 800+ tables, of which 40 - 50 are business-critical. The MIS team needs to generate reports based on those 50 business tables.
Those 50 tables get updated frequently. The MIS team requires the delta records (updated/inserted/deleted).
What would be the best solution?
We have a few approaches here:
1. Always On
2. Replication
3. Mirroring
4. Introducing a new column (LastModifiedDate, with an index) in those 50 tables, pulling the changed records periodically, and populating them into the MIS environment.
The LastModifiedDate approach will require a huge code change.
A large number of stored procedures contain Insert/Update statements against those 50 tables, and each of those stored procedures would need a code change for LastModifiedDate.
What would be the best solution from the above approaches?
Please let us know if there is any other approach. Note: We are using SQL Server 2008 R2
Regards Karthik
One approach is to have insert, update and delete triggers on these tables, and for each table an archive table with exactly the same columns plus e.g. username, modifieddatetime and a bit to indicate new and old. Then the triggers simply insert into archive select from inserted/deleted + current user, current time and 1 for inserted and 0 for deleted.
Then all your MIS need to concern themselves with is the archive tables, and you will not need to make a structure change to the existing tables.
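To make the trigger idea concrete, here is a minimal sketch of the pattern. It runs on SQLite through Python's sqlite3 module purely so that it is self-contained; SQLite exposes row images as NEW and OLD rather than SQL Server's inserted and deleted tables, and it has no notion of a current user, so the username column is left out. The table and trigger names (orders, orders_archive, and so on) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical source table; it is not altered by this approach.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);

-- Archive table: same columns plus a timestamp and a new/old flag.
CREATE TABLE orders_archive (
    order_id    INTEGER,
    amount      REAL,
    modified_at TEXT,
    is_new      INTEGER   -- 1 = new row image, 0 = old row image
);

CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
BEGIN
    INSERT INTO orders_archive VALUES (NEW.order_id, NEW.amount, datetime('now'), 1);
END;

CREATE TRIGGER orders_after_update AFTER UPDATE ON orders
BEGIN
    INSERT INTO orders_archive VALUES (OLD.order_id, OLD.amount, datetime('now'), 0);
    INSERT INTO orders_archive VALUES (NEW.order_id, NEW.amount, datetime('now'), 1);
END;

CREATE TRIGGER orders_after_delete AFTER DELETE ON orders
BEGIN
    INSERT INTO orders_archive VALUES (OLD.order_id, OLD.amount, datetime('now'), 0);
END;
""")

# Existing application code keeps issuing plain DML; no changes are needed there.
conn.execute("INSERT INTO orders VALUES (1, 10.0)")
conn.execute("UPDATE orders SET amount = 12.5 WHERE order_id = 1")
conn.execute("DELETE FROM orders WHERE order_id = 1")

# The reporting side reads only the archive table.
archive_rows = conn.execute(
    "SELECT order_id, amount, is_new FROM orders_archive ORDER BY rowid"
).fetchall()
print(archive_rows)
```

On SQL Server the trigger bodies would instead select from the inserted and deleted pseudo-tables, which also handles multi-row statements in one pass.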
Apologies in advance if this is hard to follow, but my question is more conceptual than technical, for those of you out there who have some experience designing this kind of thing.
I'm trying to decide the best way to structure my feeder tables and queries to connect metrics with their measures and objectives.
My metrics database has two primary feeder tables:
tblDissem (with Organization/Project/SubProject columns being relevant)
tblVolume (with Organization/Project/SubProject/Analyzed/Shared columns being relevant)
There are approximately 50 measures organized into 10 objectives in tblMeasures, with foreign keys for OrganizationID, ProjectID, SubProjectID for each measure. The 10 objectives in themselves aren't distinct enough where I can simply create a query for each (it'd have to be a union query after the fact).
The metrics for each measure are based on one of the following:
Count of tblDissem, organized by Organization/Project/SubProject/Fiscal Quarter
Sum of tblVolume.Analyzed, organized by Organization/Project/SubProject/Fiscal Quarter
Sum of tblVolume.Shared, organized by Organization/Project/SubProject/Fiscal Quarter
As it is now, I have select queries set up for each of the two feeder tables converting dates to fiscal quarters, and crosstab queries for each of the above broken out by quarter for each organization/project/subproject.
The challenge is in getting these organized by objective. I figured I could create a query for each group of measures, then use union queries to organize each into their proper objective, OR I add a field to tblMeasures that lists the measure as either a DissemCount, AnalyzedSum, or SharedSum measure, which I'd somehow build into another query to automatically group the query results. Or, maybe use a lookup field in a way I haven't considered yet.
I'm open to any ideas, and apologies for being so abstract. Thanks in advance...I'm just not an expert when it comes to the how of relating information.
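For what it's worth, the second option (a field on tblMeasures naming the metric type) can be sketched roughly as below. This is only an illustration under several assumptions: it uses SQLite via Python rather than the poster's database, the column names MetricType, ObjectiveID, and FiscalQuarter and the view name qryMetrics are invented, the fiscal quarter is assumed to be precomputed, and the sample rows are made up. The idea is one union query that tags every aggregate with its metric type, joined to tblMeasures on the keys plus that type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblDissem (OrganizationID INT, ProjectID INT, SubProjectID INT,
                        FiscalQuarter TEXT);
CREATE TABLE tblVolume (OrganizationID INT, ProjectID INT, SubProjectID INT,
                        FiscalQuarter TEXT, Analyzed INT, Shared INT);
-- MetricType is the hypothetical new field: DissemCount, AnalyzedSum or SharedSum.
CREATE TABLE tblMeasures (MeasureID INT, ObjectiveID INT, OrganizationID INT,
                          ProjectID INT, SubProjectID INT, MetricType TEXT);

-- One union query producing every metric, tagged with its type.
CREATE VIEW qryMetrics AS
SELECT OrganizationID, ProjectID, SubProjectID, FiscalQuarter,
       'DissemCount' AS MetricType, COUNT(*) AS MetricValue
FROM tblDissem
GROUP BY OrganizationID, ProjectID, SubProjectID, FiscalQuarter
UNION ALL
SELECT OrganizationID, ProjectID, SubProjectID, FiscalQuarter,
       'AnalyzedSum', SUM(Analyzed)
FROM tblVolume
GROUP BY OrganizationID, ProjectID, SubProjectID, FiscalQuarter
UNION ALL
SELECT OrganizationID, ProjectID, SubProjectID, FiscalQuarter,
       'SharedSum', SUM(Shared)
FROM tblVolume
GROUP BY OrganizationID, ProjectID, SubProjectID, FiscalQuarter;
""")
conn.executemany("INSERT INTO tblDissem VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "FY24Q1"), (1, 1, 1, "FY24Q1"), (1, 1, 1, "FY24Q2")])
conn.executemany("INSERT INTO tblVolume VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, "FY24Q1", 10, 4), (1, 1, 1, "FY24Q1", 5, 1),
                  (1, 2, 1, "FY24Q1", 7, 3)])
conn.executemany("INSERT INTO tblMeasures VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 1, "DissemCount"),
                  (2, 1, 1, 1, 1, "AnalyzedSum"),
                  (3, 2, 1, 2, 1, "SharedSum")])

# Each measure picks up its own metric; grouping by objective is then a plain sort.
by_objective = conn.execute("""
    SELECT m.ObjectiveID, m.MeasureID, q.FiscalQuarter, q.MetricValue
    FROM tblMeasures m
    JOIN qryMetrics q
      ON  q.OrganizationID = m.OrganizationID
      AND q.ProjectID      = m.ProjectID
      AND q.SubProjectID   = m.SubProjectID
      AND q.MetricType     = m.MetricType
    ORDER BY m.ObjectiveID, m.MeasureID, q.FiscalQuarter
""").fetchall()
print(by_objective)
```

With this shape, adding a measure or moving it to another objective is a data change in tblMeasures rather than a new query.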
My new employer runs a standard ecommerce website with about 14k products. They create "microsites" that a person can log into and see custom pricing and limited products (hide some/show some products) and custom categories.
The current system is just a huge relational database. There is a products table, a sites table, and a sites_products table with the following columns:
site_id
product_id
product_price
If the microsite is supposed to show a product, it simply stores it in this table. This table is currently about 2 million rows and growing. The custom categories use a similar relational table setup, but the numbers are much lower, so I am not as worried about that.
I would appreciate any help/ideas you could provide to decrease this table size. I am confident that in the next couple years it will be at 20 million at this rate.
-Justin
In a recent project I have seen tables with anywhere from 50 to 126 columns.
Should a table hold fewer columns, or is it better to separate them out into a new table and use relationships? What are the pros and cons?
Generally it's better to design your tables first to model the data requirements and to satisfy rules of normalization. Then worry about optimizations like how many pages it takes to store a row, etc.
I agree with other posters here that the large number of columns is a potential red flag that your table is not properly normalized. But it might be fine in this case. We can't tell from your description.
In any case, splitting the table up just because the large number of columns makes you uneasy is not the right remedy. Is this really causing any defects or performance bottleneck? You need to measure to be sure, not suppose.
A good rule of thumb that I've found is simply whether or not a table keeps growing columns as a project continues.
For instance:
On a project I'm working on, the original designers decided to include site permissions as columns in the user table.
So now, we are constantly adding more columns as new features are implemented on the site. Obviously this is not optimal. A better solution would be to have a table containing permissions and a join table between users and permissions to assign them.
However, for other more archival information, or tables that simply don't have to grow or need to be cached/minimize pages/can be filtered effectively, having a large table doesn't hurt too much as long as it doesn't hamper maintenance of the project.
At least that is my opinion.
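A small sketch of the permissions layout suggested above, again using SQLite through Python only so it runs anywhere. The table names, permission names, and users are hypothetical. The thing to notice is that a new site feature becomes a new row in permissions, with no ALTER TABLE on the users table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE permissions (permission_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- Join table: one row per (user, permission) assignment.
CREATE TABLE user_permissions (
    user_id       INTEGER REFERENCES users(user_id),
    permission_id INTEGER REFERENCES permissions(permission_id),
    PRIMARY KEY (user_id, permission_id)
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO permissions VALUES (?, ?)",
                 [(1, "edit_posts"), (2, "view_reports")])
conn.executemany("INSERT INTO user_permissions VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])

# A new feature ships: add a row, not a column.
conn.execute("INSERT INTO permissions VALUES (3, 'export_data')")
conn.execute("INSERT INTO user_permissions VALUES (2, 3)")


def permissions_for(user_name):
    """Return the sorted permission names assigned to one user."""
    rows = conn.execute("""
        SELECT p.name
        FROM users u
        JOIN user_permissions up ON up.user_id = u.user_id
        JOIN permissions p ON p.permission_id = up.permission_id
        WHERE u.name = ?
        ORDER BY p.name
    """, (user_name,)).fetchall()
    return [r[0] for r in rows]


print(permissions_for("alice"))
print(permissions_for("bob"))
```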
Usually excess columns point to improper normalization, but it is hard to judge without having some more details about your requirements.
I can picture times when it might be necessary to have this many, or more columns. Examples would be if you had to denormalize and cache data - or for a type of row with many attributes. I think the keys are to avoid select * and make sure you are indexing the right columns and composites.
If you had an object detailing the data in the database, would you have a single object with 120 fields, or would you be looking through the data to extract data that is logically distinguishable? You can inline Address data with Customer data, but it makes sense to remove it and put it into an Addresses table, even if it keeps a 1:1 mapping with the Person.
Down the line you might need to have a record of their previous address, and by splitting it out you've removed one major problem refactoring your system.
Are any of the fields duplicated over multiple rows? I.e., are the customer's details replicated, one per invoice? In which case there should be one customer entry in the Customers table, and n entries in the Invoices table.
One place where you need to not fix broken normalisation is where you have a facts table (for auditing, etc) where the purpose is to aggregate data to run analyses on. These tables are usually populated from the properly normalised tables however (overnight for example).
It sounds like you have potential normalization issues.
If you really want to, you can create a new table for each of those columns (a little extreme) or group of related columns, and join it on the ID of each record.
It could certainly affect performance if people are running around with a lot of "Select * from GiantTableWithManyColumns"...
Here are the official statistics for SQL Server 2005
http://msdn.microsoft.com/en-us/library/ms143432.aspx
Keep in mind these are the maximums, and are not necessarily the best for usability.
Think about splitting the 126 columns into sections.
For instance, if it is some sort of "person" table
you could have
Person
ID, AddressNum, AddressSt, AptNo, Province, Country, PostalCode, Telephone, CellPhone, Fax
But you could separate that into
Person
ID, AddressID, PhoneID
Address
ID, AddressNum, AddressSt, AptNo, Province, Country, PostalCode
Phone
ID, Telephone, Cellphone, fax
In the second one, you could also save yourself from data replication by having all the people with the same address have the same addressId instead of copying the same text over and over.
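Here is a minimal sketch of that split, run on SQLite through Python for convenience. The column names follow the lists above; the sample people, address, and phone numbers are invented. It shows two people pointing at the same Address row, so the address text is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Address (
    ID INTEGER PRIMARY KEY,
    AddressNum TEXT, AddressSt TEXT, AptNo TEXT,
    Province TEXT, Country TEXT, PostalCode TEXT
);
CREATE TABLE Phone (
    ID INTEGER PRIMARY KEY,
    Telephone TEXT, CellPhone TEXT, Fax TEXT
);
CREATE TABLE Person (
    ID INTEGER PRIMARY KEY,
    AddressID INTEGER REFERENCES Address(ID),
    PhoneID   INTEGER REFERENCES Phone(ID)
);
""")
# One shared address row (made-up data), two separate phone rows.
conn.execute("INSERT INTO Address VALUES (1, '12', 'Main St', NULL, 'ON', 'Canada', 'A1A 1A1')")
conn.executemany("INSERT INTO Phone VALUES (?, ?, ?, ?)",
                 [(1, "555-0100", None, None), (2, "555-0101", None, None)])
# Both people reference AddressID 1 instead of repeating the address text.
conn.executemany("INSERT INTO Person VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 2)])

people = conn.execute("""
    SELECT p.ID, a.AddressSt, ph.Telephone
    FROM Person p
    JOIN Address a ON a.ID = p.AddressID
    JOIN Phone ph ON ph.ID = p.PhoneID
    ORDER BY p.ID
""").fetchall()
address_count = conn.execute("SELECT COUNT(*) FROM Address").fetchone()[0]
print(people, address_count)
```

The trade-off is the extra joins on read, which is usually acceptable when the keys are indexed.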
The UserData table in SharePoint has 201 fields but is designed for a special purpose.
Normal tables should not be this wide in my opinion.
You could probably normalize some more. And read some posts on the web about table optimization.
It is hard to say without knowing a little bit more.
Well, I don't know how many columns are possible in SQL, but one thing I am very sure of is that when you design a table, each table is an entity, meaning that each table should contain information about a person, a place, an event, or an object. So far I have never come across a thing that has that much data/information.
The second thing you should note is that there is a method called normalization, which is basically used to divide data/information into subsections so that the database is easier to maintain. I think this will clear up your idea.
I'm in a similar position. Yes, there truly is a situation where a normalized table has, as in my case, about 90 columns: a workflow application that tracks the many states a case can have, in addition to variable attributes for each state. So as each case (represented by the record) progresses, eventually all columns are filled in for that case. Now in my situation there are 3 logical groupings (15 cols + 10 cols + 65 cols). So do I keep it in one table (index is CaseID), or do I split it into 3 tables connected by one-to-one relationships?
Columns in a table (merge publication): 246
Columns in a table (SQL Server snapshot or transactional publication): 1,000
Columns in a table (Oracle snapshot or transactional publication): 995
So in a table that is part of a merge publication, we can have a maximum of 246 columns.
http://msdn.microsoft.com/en-us/library/ms143432.aspx
A table should have as few columns as possible.
In SQL Server, tables are stored on pages; 8 pages make up an extent.
A page can hold about 8060 bytes of row data, so the more rows you can fit on a page, the fewer I/Os you need to make to return the data.
You probably want to normalize (AKA vertical partitioning) your database
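To put rough numbers on the page argument above, here is a back-of-the-envelope sketch. It assumes the commonly cited sizing rule of about 8096 usable bytes per 8 KB page plus 2 bytes of slot-array overhead per row; treat those constants, the function names, and the example row sizes as assumptions for illustration rather than exact figures for any particular table.

```python
USABLE_BYTES_PER_PAGE = 8096   # assumed usable space in an 8 KB page
SLOT_ENTRY_BYTES = 2           # assumed per-row slot array overhead


def rows_per_page(row_size_bytes):
    """Estimate how many rows of a given size fit on one data page."""
    return USABLE_BYTES_PER_PAGE // (row_size_bytes + SLOT_ENTRY_BYTES)


def pages_needed(row_count, row_size_bytes):
    """Estimate the pages a full scan has to read (ceiling division)."""
    per_page = rows_per_page(row_size_bytes)
    return -(-row_count // per_page)


# Hypothetical comparison: a narrow 100-byte row vs. a wide 4000-byte row,
# one million rows each.
for size in (100, 4000):
    print(size, rows_per_page(size), pages_needed(1_000_000, size))
```

Under these assumptions a scan of the wide table touches roughly forty times as many pages as the narrow one, which is the I/O cost being described.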