Is there a way to show groups of column data for multiple rows of data? - pivot-table

I am not super good with Pivot tables, so not sure if this is even possible.
I have a spreadsheet of data that I need to show aggregated by county, with 5 different things needing to be shown in the columns.
I currently have all of the counties in the rows. I added my first column (which is race), and that is fine. I have the count for each of the races for each county working out cool.
HOWEVER, if I try to pull the next item into my columns field, it then puts that data UNDER the race that is already there --- I want to see all the columns straight across and lumped --- so all of the race columns, then all of the career levels, then the next group and so on.
Is this possible, and if so, what am I missing here?
I've tried googling and tried a few different things, but nothing gives me what I'm looking for.
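To make the layout being asked for concrete, here is a minimal Python sketch (not an Excel feature, and not a confirmed fix for the pivot table itself). It counts each field independently against county and places the groups next to each other instead of nesting one under the other. The sample rows, the field names, and the helper `side_by_side_counts` are all made up for illustration.

```python
from collections import Counter

# Hypothetical sample data; the real spreadsheet has more fields and rows.
rows = [
    {"county": "Adams", "race": "Asian", "career_level": "Entry"},
    {"county": "Adams", "race": "White", "career_level": "Senior"},
    {"county": "Adams", "race": "White", "career_level": "Entry"},
    {"county": "Brown", "race": "Black", "career_level": "Senior"},
]

def side_by_side_counts(rows, row_field, column_fields):
    """Count each column field separately per row key, then lay the groups out left to right."""
    counts = Counter()
    labels = {field: set() for field in column_fields}
    for row in rows:
        for field in column_fields:
            counts[(row[row_field], field, row[field])] += 1
            labels[field].add(row[field])
    # One header entry per (field, value); fields keep the order they were given in.
    header = [(field, value) for field in column_fields for value in sorted(labels[field])]
    table = {}
    for key in sorted({row[row_field] for row in rows}):
        table[key] = [counts[(key, field, value)] for field, value in header]
    return header, table

header, table = side_by_side_counts(rows, "county", ["race", "career_level"])
```

Each county ends up with one row of counts: all of the race columns first, then all of the career level columns, with no nesting between the two groups.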

Related

Data Modeling: Is it bad practice to store IDs from various sources in the same column?

I am attempting to merge data from various sources into an existing data model. Each source uses different types of IDs (such as GUID, Salesforce IDs, etc.). For example, if I were to merge data from two different sources, the table may look like the following (where the first two SalesPersonIDs are GUID IDs and the second two are Salesforce IDs):
Is this a bad practice? I could also imagine a table where each ID type was its own column and could be left blank if it was not applicable. Something like the following:
I apologize, I am a bit new to this. Thanks in advance for any insight, I greatly appreciate it!
The big roles of an ID column are to act as a key connecting data in different tables, and to help indexing - quickly find rows so your queries run fast.
The second solution wouldn't work well for these purposes, and will lead to big headaches in queries: every time you want to group by the ID, you'll have to combine the info from 2 columns in some way, hopefully getting a correct unique result every time.
On the other hand, all you might ever need from an ID is for it to be unique. The first solution might be fine in this respect, but are you sure you'll never, ever get data about one SalesPerson from more than one source?
I'd suggest keeping all the IDs in one column, and adding a column to say what kind of ID this is. At least this way, you won't lose any information and can do other things in the future.
One thing you might consider is making a separate table of SalesPerson with all their possible IDs, and have this keyed to other (Sales?) data by a unique ID used only in your database.
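A minimal sketch of that last suggestion, using Python's built-in `sqlite3` as a stand-in database. The table names, column names, and sample IDs are assumptions made for illustration; nothing here comes from the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per real person, keyed by an ID that exists only in this database.
CREATE TABLE sales_person (
    sales_person_key INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
-- Every source-system ID for a person, tagged with the kind of ID it is.
CREATE TABLE sales_person_source_id (
    sales_person_key INTEGER NOT NULL REFERENCES sales_person(sales_person_key),
    id_type TEXT NOT NULL,
    source_id TEXT NOT NULL,
    PRIMARY KEY (id_type, source_id)
);
""")
conn.execute("INSERT INTO sales_person VALUES (1, 'Pat Example')")
conn.executemany(
    "INSERT INTO sales_person_source_id VALUES (?, ?, ?)",
    [
        (1, "guid", "3f2504e0-4f89-11d3-9a0c-0305e82c3301"),  # made-up GUID
        (1, "salesforce", "0051a000000EXAMPLE"),              # made-up Salesforce-style ID
    ],
)

def lookup_key(id_type, source_id):
    """Map a source-system ID to the internal key, or None if it is unknown."""
    row = conn.execute(
        "SELECT sales_person_key FROM sales_person_source_id "
        "WHERE id_type = ? AND source_id = ?",
        (id_type, source_id),
    ).fetchone()
    return row[0] if row else None
```

Sales data would then reference `sales_person_key` only, so the same person arriving from two sources still groups as one person.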

SQL Database How far to go to prevent Duplication of Records?

I've been reading up on proper DB creation techniques, and I've got a large project that will have numerous items in it. I have already mapped out the tables and the various lookup tables, but then I realized that in the user table, I have first_name, middle_name, last_name columns that could be changed to first_name_id, middle_name_id and last_name_id that act as lookups to a first name table, middle name table and last name table that hold only unique names to prevent duplication of data.
The question I have is how far do I go with this process? Does every single item that could potentially be duplicate information need to be done like this? At some point this seems like it will get confusing to ME to keep track of all the relationships and cascading updates/deletes, etc on everything...
Just looking for some advice as I want to make sure this is done properly as setting a proper foundation is very important to build and scale on top of in the future, but at the same time don't want to take this to extremes if it is not needed.

Modeling Fact Tables that have direct relationships, but at a detail and not a dimension layer

This is very similar to my issue.
http://forum.kimballgroup.com/t2534-modeling-fact-tables-that-have-direct-relationships-but-at-a-detail-and-not-a-dimension-layer
I’ve got a fact table for POs, Supplier Invoices, Payments, Receipts, etc. They have some dimensions in common, others not. The problem is that if users are looking at, say, invoices by GL account (using an Excel pivot table connected to the cube), they expect to be able to drop in a column for the PO number, the buyer of the PO, etc., even though the buyer dimension is only related to the PO and the account dimension is only related to the invoice. But they say, well, the PO is related to the invoice, so you should be able to pull it in.
I do have a PO Ref field on the invoice fact table, but it is only filled out 50% of the time. Even when it is, you could have a one to many relationship in either way between a PO and an invoice, as far as I understand it at least.
Anyway, they expect to be able to throw in any measure from any measure group, and every single possible dimension to work, and then be able to drill down to the detail to see the POs, Invoices, Payments and Receipts and how they match up. Best practice is to keep the fact tables separate if they are different grains according to Kimball, but then all the business problems aren't solved this way.
The only solutions I can come up with are:
1. Tack on a bunch of detail-related columns to the degenerate dimensions when I load them, i.e. add PO to invoice and invoice to PO etc., but have it as a comma-separated list in that column when it is many to one.
2. Create every possible relationship with every fact and dimension table. This would be a lot of work though, and some still may not have a relationship to certain dimensions.
3. Create a monstrous fact table with all the current ones joined together, and somehow figure out logic to only display the measure values once for the many-to-one joins.
4. This is probably a bad idea, but thought maybe somehow I could create a relationship between every measure group and the corresponding degenerate dimension's reference field. Like create a relationship between the supplier invoice degenerate dimension PO Ref field and the purchase order line measure group PO field.
5. Lower their expectations, lol.
Here's a screen shot of the dimension usage tab to give an idea of what it looks like currently.
I tried option 3 once. The performance was terrible. The output was misleading. Never ever again.
Your best bet is to work with the business. Where the data is not readily available (invoice without PO, for example) agree what should be done. You could show a default value (PO not recorded on invoice). You could agree on a logic, implemented in the ETL, that extracts the most likely PO.
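As a rough sketch of the default-value idea, an ETL step could look something like the following. The row shape, the `po_ref` field name, and the helper `fill_po_ref` are assumptions for illustration; the actual label and any "most likely PO" logic would be whatever is agreed with the business.

```python
# Label shown in reports when the source invoice carries no PO reference.
PO_NOT_RECORDED = "PO not recorded on invoice"

def fill_po_ref(invoice_rows):
    """Return copies of the rows with a blank or missing po_ref replaced by the default label."""
    cleaned = []
    for row in invoice_rows:
        po_ref = (row.get("po_ref") or "").strip()
        cleaned.append({**row, "po_ref": po_ref or PO_NOT_RECORDED})
    return cleaned

# Hypothetical invoice rows as they might arrive from the source system.
invoices = [
    {"invoice_no": "INV-1", "po_ref": "PO-100"},
    {"invoice_no": "INV-2", "po_ref": None},
    {"invoice_no": "INV-3", "po_ref": "  "},
]
loaded = fill_po_ref(invoices)
```

Users then see an explicit "PO not recorded on invoice" member instead of a blank, which makes the gap in the source data visible rather than hidden.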
Whatever approach you choose you must discuss it. If you do not the business will make decisions based on false assumptions. The business will find itself looking at reporting it does not understand. You must help your users to avoid these outcomes.
Once the approach has been agreed, document it. When queries arise, share the documentation. Make sure the documentation highlights all calculations, difficulties and missing source data.
Work with the teams that generate your source data. If an important field is sparsely populated, arrange a meeting. See if the capture processes can be improved. Let your users know that you are investigating this area. Keep them informed of the outcome. If the source data cannot be improved (invoices continue to be raised without a PO), inform your users of the reasons for this.
Managing your customers can be challenging, especially those who hold senior positions in the company. Transparency and solid documentation will help you.

PostgreSQL: correct number of fields

I would like your opinion. I have a table with 120 VARCHAR fields into which I will have to insert about 1,000 records per month for at least 10 years, with a total number of 240,000 records.
I could divide the fields into multiple tables but I'd rather keep it that way. Do you think I will have problems in the future?
Thank you
Well, if the data in the columns follows a certain logic, keep it flat, which means I would leave it that way. Otherwise, separate it into multiple tables. It depends on your data.
I once worked with medical data where one table contained over 100 columns, but all these columns were needed to get a diagnostic result. I don't remember exactly what it was, because I worked with that data set some years ago. But in that case it would have been more complicated if the columns had been separated into multiple tables. Logically, the data in each column served a certain purpose, so it was easier to have them all in the same place (the table).
If you put the columns all together just to be lazy, so that you only have to query one table, I would recommend separating the columns into different tables to make them more comfortable to work with and to make the database schema more understandable.

Store multiple values in one database field in Access (hear me out)

So I've done extensive searching on this and I can't seem to find a good solution that actually applies to my situation.
I have a list of projects in a table, then a list of people. I want to assign multiple people to one project. Seems pretty common. Obviously, I can't make multiple columns on my projects table for each person, as the people will change fairly frequently.
I need to display this information very quickly in a continuous list of projects (the ultimate way would be a multiple-select combobox as a listbox is too tall, but they don't exist outside of the dreaded lookup fields)
I can think of two ways:
- Store multiple employee IDs delimited by commas in one field in my projects table (I know this goes against good database design). Would require some code to store and retrieve the data.
- Have a separate table for employees assigned to projects (ID, ProjectID, EmployeeID). One to many relationship between projects table and this new table. One to many relationship between employees table and this new table. If a project has 3 employees assigned, it would store 3 records in this table. It seems a bit odd joining both tables in this way, and it would also require code to store and retrieve the data in a control like the one mentioned above.
Does anyone know if there is a better way (including displaying in an easy control) or how you usually tackle this problem?
The usual way to tackle this problem would be with a Junction Table. This is what you describe where you have a separate table maybe called EmployeeProject which has an EmployeeProjectID(PK), EmployeeID(FK) and ProjectID(FK).
In this way you model a Many-to-Many relationship where each project can have many employees involved and each employee can be involved in many projects. It's not actually all that difficult to do the SQL etc. required to pull the information back together again for display.
I would definitely stay away from storing comma-delimited values as this becomes significantly more complicated when you want to display or manipulate the data.
There's a good guide here: http://en.tekstenuitleg.net/articles/software/create-a-many-to-many-relationship-in-access but if you google "many to many junction table" or similar, there are thousands of pages/articles about implementation.
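For a rough idea of what the junction table and the "pull it back together" query look like, here is a sketch using Python's built-in `sqlite3` as a stand-in for Access. The `EmployeeProject` columns follow the answer above; the `Project` and `Employee` tables, the sample rows, and the helper `employees_by_project` are assumptions for illustration. In Access the same joins would be written as an Access query, with the list-building done in VBA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project (ProjectID INTEGER PRIMARY KEY, ProjectName TEXT NOT NULL);
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT NOT NULL);
-- Junction table: one row per (employee, project) assignment.
CREATE TABLE EmployeeProject (
    EmployeeProjectID INTEGER PRIMARY KEY,
    EmployeeID INTEGER NOT NULL REFERENCES Employee(EmployeeID),
    ProjectID INTEGER NOT NULL REFERENCES Project(ProjectID),
    UNIQUE (EmployeeID, ProjectID)
);
INSERT INTO Project VALUES (1, 'Website'), (2, 'Audit');
INSERT INTO Employee VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO EmployeeProject (EmployeeID, ProjectID) VALUES (1, 1), (2, 1), (3, 1), (1, 2);
""")

def employees_by_project():
    """Return {project name: comma-separated employee names} for display in a list."""
    rows = conn.execute("""
        SELECT p.ProjectName, e.EmployeeName
        FROM Project AS p
        JOIN EmployeeProject AS ep ON ep.ProjectID = p.ProjectID
        JOIN Employee AS e ON e.EmployeeID = ep.EmployeeID
        ORDER BY p.ProjectName, e.EmployeeName
    """).fetchall()
    grouped = {}
    for project, employee in rows:
        grouped.setdefault(project, []).append(employee)
    return {project: ", ".join(names) for project, names in grouped.items()}
```

The data stays normalized (three assignment rows for a project with three employees), while the comma-separated string is built only at display time.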
