Excel to Database. Help on table design - database

This is my excel spreadsheet and I am trying to convert it to a proper database. Dry and Sensors are products/goods. I like to build a database that is flexible to add more products. Also their attributes like 'type' and 'kV' can hold multiple values.
I am not experience with designing databases. My current design is
table_company
company_id
company_name
etch...
table_dry_ctd_type
company_id
record_id
dry_ctd_type
table_dry_vtd_kV
company_id
record_id
dry_vtd_kV
I ended up with 13 tables.
business rule (if needed):
-company can have 0 to many products
-products can have 0 to many sub categories.
-product type and KV can hold multiple values.
-Type is not required to have kV

Based on the information you gave I'd create -sort of- the following 3 tables:
company data
product types (fields: [uniqueID]; ptID; pName; pSubName; subPType) so data is:
1, 1, "DRY", NULL, NULL (pSubName NULL == Product Type "definition" field)
2, 1, "DRY", "CTD", ??? --whatever 'type' field's value is
3, 1, "DRY", "VTD", ???,
...
6, 2, "Sensors", "VCS", ???
If you need to get data several times based on product sub-type, also add a ptSubID -numeric- field and get data based on that
data table storing companyID, productTypeUniqueID, KV and e.g. amount or other "value data" you may or may not have
If added productTypeUniqueID to product type table, add that here too!
Adding new company, product type or product sub-type is obvious. Getting data would look something like:
SELECT CMP.compName AS 'Company Name',PRD.pname AS 'Product',PRD.pSubName AS 'subProduct',PRD.pType AS 'Type',DTA.KV AS 'KV', DTA.amount
FROM data_table AS DTA
LEFT OUTER JOIN company_table AS CMP ON CMP.compID=DTA.compID
LEFT OUTER JOIN product_table AS PRD ON (PRD.ptID=DTA.ptID [AND PRD.ptSubID=DTA.ptSubID])
WHERE PRD.ptID=1 --all "DRY" products
/** or: **/ WHERE (PRD.ptID=1 AND PRD.ptSubID<>2) AND CMP.compID=8
But it may vary based on:
Most common data retrieval types will be used
Details you didn't share but important search keywords (e.g. sub-products' dimensions)
Frequency of product updates/changes, product type/sub-type updates/changes
Typical changes may come -based on business experience-, such as "we don't use product sub-types anymore"
Hope this helps!

Related

How to subtract a value in one table from a value in a different table based on common ID in calculation field

I'm trying to subtract a value in one field from a value in another table, depending on ID number using Filemaker Pro 19.x. I thought I'd done this before without any problems but now it isn't working and I don't know why.
First, I want to do this in a calculation field, not in a script. I don't understand scripting at all and use it only when there is no alternative. In this case, I think a calculation field should work.
I have 2 tables, I'll call them "Data" and "Categories"
Both tables have the field "CID" "Category ID".
The CID fields in both tables are linked in the Relationship Editor
The Data table has a field "Product ID"
The Categories table has several fields related to products. Two of those are "MIN PID" and "MAX PID". These are the minimum and maximum product ID numbers.
Product IDs are assigned in ranges. If it is within a certain range, it has to belong to a certain category. These categories have CID numbers.
To assign the CID number to the products listed on the Data table, I did run a script that essentially recreated all the data within the Categories table. It was inefficient (in my eyes) because the data was sitting right there in the table. I couldn't figure out how to reference it, so I gave up and ran the script instead. The other problem is that if the CID ever changes for a product, I have to rerun the script (or someone else, who might not know about the script)
That said, I now have the correct CID assigned for all 62 product categories. What I want to do now is to use the CID MIN and CID MAX values (among others) in calculation fields in the Data table.
For instance, if the product ID is "45,001", it belongs to the Category "16", which has a MIN value of "30,000" and a MAX value of "50,000". I want to subtract the "30,000" from the "45,001", (PID - CID MIN) to return the result "15,001". This allows me to locate a product within the category. Instead of being "Product 45,001", it will be "Product 16.15001".
The calculation I tried is:
If (CID = CID::CID ; PID - CID::CID MIN)
The result is a huge group of records where this field is empty.
I also tried:
PID - CID::CID MIN
Again, no results.
Both tables have the field "CID"
The two CID fields are linked in the relationship editor.
I have tried this with a test file and it works perfectly. It does not work in the database I am working within.
What am I doing wrong?

Simple database design - some columns have multiple values

Caveat: very new to database design/modeling, so bear with me :)
I'm trying to design a simple database that stores information about images in an archive. Along with file_name (which is one distinct string), I have fields like genre and starring where each field might contains multiple strings (if an image is associated with multiple genres, and/or if an image has multiple actors in it).
Right now the database is just a single table keyed on file_name, and the fields like starring and genre just have multiple comma-separated values stored. I can query it fine by using wildcards and like and in operators, but I'm wondering if there's a more elegant way to break out the data such that it is easier to use/query. For instance, I'd like to be able to find how many unique actors are represented in the archive, but I don't think that's possible with the current model.
I realize this is a pretty elementary question about data modeling, but any guidance anyone can provide or reading you can direct me to would be greatly appreciated!
Thanks!
You need to create extra tables in order to stick with the normalization. In your situation you need 4 extra tables to represent these n->m relations(2 extra would be enough if the relations were 1->n).
Tables:
image(id, file_name)
genre(id, name)
image_genres(image_id, genre_id)
stars(id, name, ...)
image_stars(image_id, star_id)
And some data in tables:
image table
id
file_name
1
/users/home/song/empire.png
2
/users/home/song/promiscuous.png
genre table
id
name
1
pop
2
blues
3
rock
image_genres table
image_id
genre_id
1
2
1
3
2
1
stars table
id
name
1
Jay-Z
2
Alicia Keys
3
Nelly Furtado
4
Timbaland
image_stars table
image_id
star_id
1
1
1
2
2
3
2
4
For unique actor count in database you can simply run the sql query below
SELECT COUNT(name) FROM stars

Strategy to design specification table for measurements data for analytical database project

I need advice on the following topic:
I am developing a DW/BI solution in SQL Server and reports are published in Power BI.
Main part of my question starts here: I have a large table which collects measurement data on product measurement for multiple attributes. Product can be of multiple type, that can be recognised by item number in this table, measurements can be done multiple times and can be identified by measurement date. Usually, we refer latest dates. If it makes things complicated, I can filter data for latest dates only. This is dense row table (multi million). Number of attribute counts about 200.
I want to include specifications for these attributes most likely in a dimension table, and there may be tens of such specifications. Intention is that user shall select in the report any one specification name and he would like to see each product with attributes passing/failing as well as the products passing if all all of specification attributes are passed.
I currently have this measurement table and a dim table with test names, I can add a table for specification if needed. Specification can define few or all test names with lower/upper spec limits:
Sample measurement table:
Sample dim table for test names:
I can add a table for specification as below and user will select any of one:
e.g. Use select ID_spec = 1 then measurement table may look like:
Some spec may contain all and some few attributes.
Please suggest strategy to design a spec table to be efficient for such large size tables. Please let me know if any further details needed.
Later, I will have to further work to calculate % of pass product if they have been tested for all needed tests in a specification selected.
For large tables, the best thing to do is choose the right key. That means dumping the "Id" column (nothing more than a row identifier) and replacing it with something that:
Guarantees uniqueness
Facilitates searches
That often means composite keys, which are fine.
It's also means dumping the whole "fact/dimension" mindset and just focusing on the relations. This is also fine.
Based on your description, this is the first draft of a data model for your warehouse. If you are unfamiliar with IDEF1X diagrams, please read this.
I've added a unique constraint to SpecCd so you could specify the value directly instead of having to check both the ProductId and SpecCd to return a result.
ProductTest exists so you can provide integrity for ProductTestCriteria and ensure tests are limited to only those products that can be measured by them. If all products are subject to all tests, this can be removed and Test can relate directly to ProductMeasurement and ProductTestCriteria.
If you want to subject the latest test of "Product A" to "Spec S" your query would look like:
SELECT
Measurement.ProductId
,Measurement.TestCd
,Measurement.TestDt
,Criteria.SpecCd
,Measurement.Value
,CASE
WHEN Measurement.Value BETWEEN Criteria.LowerValue AND Criteria.UpperValue THEN 'Pass'
ELSE 'Fail'
END AS Result
FROM
ProductMeasurement Measurement
INNER JOIN
ProductTestCriteria Criteria
ON Critera.ProductId = Measurement.ProductId
AND Criteria.TestCd = Measurement.TestCd
WHERE
Measurement.ProductId = 'A'
AND Criteria.SpecCd = 'S'
AND Measurement.TestDt =
(
SELECT
MAX(TestDt)
FROM
ProductMeasurement
WHERE
ProductId = Measurement.ProductId
)
You could remove the filters for ProductId and SpecCd and roll that into a view - users could later specify for the Products and specifications they want later.
If you want result as of a given date, the query is easily modified to this or incorporated into a TVF:
SELECT
Measurement.ProductId
,Measurement.TestCd
,Measurement.TestDt
,Criteria.SpecCd
,Measurement.Value
,CASE
WHEN Measurement.Value BETWEEN Criteria.LowerValue AND Criteria.UpperValue THEN 'Pass'
ELSE 'Fail'
END AS Result
FROM
ProductMeasurement Measurement
INNER JOIN
ProductTestCriteria Criteria
ON Critera.ProductId = Measurement.ProductId
AND Criteria.TestCd = Measurement.TestCd
WHERE
Measurement.ProductId = 'A'
AND Criteria.SpecCd = 'S'
AND Measurement.TestDt =
(
SELECT
MAX(TestDt)
FROM
ProductMeasurement
WHERE
ProductId = Measurement.ProductId
AND TestDt <= <Your Date>
)

Database Structure and Querying

Trying to figure out how to change a structure from what I currently have which is this:
tblHaulLogs
intLogID
intHaulType
intSerial
intOriginSource
intOrigin
intDestinationSource
intDestination
dtmHaulDate
ccyLogPay
intHauler
txtLogNotes
intInvoiceID
In this table, what I am doing is using the origin and destination source fields to determine which table the fk for the origin and destination comes from. This feels very wrong to me.
tblHaulTypes
intHaulTypeID
chrHaulType
intOriginSourceType
intDestinationSourceType
Data in the Haul Types Table:
LOT, 1, 1
DEL, 1, 2
RPO, 2, 1
Now let me explain:
The first type happens when an item goes from a sales lot to another sales lot.
The second type happens when an item goes from a sales lot to a customer(sale gets delivered).
The third type happens when an item returns from the customer back to the sales lot.
Then the Item can be resold/returned/resold/returned(rent-to-own system).
Now, here are the problems I have:
An Haul Log's origin will always be the destination of the last move. Therefore I thought that the origin field is redundant. However, it's the relation between the destination of the last move and the destination of the new move that defines what the shipper gets paid and what type of haul it is.
In other words, even though the first type and the third type technically have the same fields, the type of move is not the same because of the previous move type. What do I need to do here? Am I totally missing the boat on what the structure should be?
The questions I need to answer based on this data is:
How many Items do I have on my sales lots that are new inventory(have never been sold).
How many Items do I have that have been sold and returned(doesn't matter how many times).
I'm guessing at the relationship between the various fields and tables.
Your tblHaulTypes table looks fine.
intHaulTypeID
chrHaulType
intOriginSourceType
intDestinationSourceType
You're missing a haul type that accounts for deliveries from suppliers to your lots.
There has to be some table that lists your lots. I'd call it tblHaulLot.
intLotNumber
txtLotName
...
I'd make a tblHaulTransaction table that looks like this.
intTransactionID
intHaulTypeID
intHauler
intOriginOrganizationID
intDestinationOrganizationID
intOriginLot (null if origin is supplier)
intDestinationLot (null if destination is customer)
dtmHaulDate
txtLogNotes
Now, we need an tblOrganization.
intOrganizationID
txtOrganizationName
txtOrganizationAddress
...
The organization at ID 0 is your organization. Suppliers and customers would fill the rest of the table.
I'd make a tblHaulInvoice table that looks like this.
intInvoiceID
intTransactionID
ccyTransactionPay
dtmDateInvoiced
AmountInvoiced
The amount invoiced (and amount paid) have to be accounted for in some table. I don't know what ccy stands for, and I don't know your 3 letter code for a decimal (money) field.
How many Items do I have on my sales lots that are new inventory(have never been sold). How many Items do I have that have been sold and returned(doesn't matter how many times).
Nowhere in your data model is there any kind of inventory table. I'd need to know a lot more about your business to create one or more inventory tables.

Designing a multilevel company structure

I try to build a database model for the following structure:
I have companies with up to 3 hierachical levels. For each unit I have a value (these values are given randomly and duplicates between companies (not within) are possible. Let us say (1 Level: 222-Amazon, 2 Level: 441-Amazon: Germany, 542-Britan, 3 Level: 6-Distribution, 99-Shop, 124-Programming, 5-HR.
Of course for each company this is different. What I did is:
Table1:
ID_Worker
CompanyName
ID_CompanyLvL1
ID_CompanyLvL2
ID_CompanyLvL3
...
Table2:
ID_CompanyLevel1
Slot1
Slot2
...
Table3:
ID_CompanyLevel2
Slot1
Slot2
...
But with this approach I have the following problem: If two companies have the same number for a CompanyLevel1(2 or 3) unit I cannot distingush them anymore.
Another approach that is not working is
Table1:
ID_Company
ID_Worker
ID_CompanyLevel1
...
Tabel2:
ID_CompanyLevel1
Slot1
ID_CompanyLevel2
...
Table3:
ID_CompanyLevel2
Slot
ID_CompanyLevel3
...
With this approach I cannot identify which person is in e.g. which level2 unit. Could anyone help me with this i just cannot come up with the right design.
You need to decide whether the organization structure is purely hierarchical (an org unit can only belong to 0 or 1 other org unit), or whether it is graphical (an org unit can belong to 0, 1, or 1+ org units).
Your limit of three is a business rule, and should be enforced by database logic (trigger) and not the database schema.
Why the codes with the names?
If hierarchical, this is your schema:
create table organizations (
organization_id int primary key,
name varchar(whatever) not null,
parent_id int null references organizations(organization_id)
);
Use Recursive Common Table Expressions to query them.
If graphical, this is your schema:
create table organizations (
organization_id int primary key,
name varchar(whatever) not null
);
create table organizations_structure (
parent_organization_id int references organizations(organization_id),
child_organization_id int references organizations(organization_id),
primary key (parent_organization_id, child_organization_id),
check (parent_organization_id <> child_organization_id)
);
For anything like that - make sure you do not put yourself into a cornder. For example:
I have companies with up to 3 hierachical levels
No. YOu do have companies with CURRENTLY up to 3 hierarchical levels. And they do not want to scream at you when one of them decides to have 4.
I would suggest reading the Data Model Ressource Book Volume 1 - they describe all kinds of stuff and standard data schemata, among them entity organizations (entity as in "legal, human or organizatonal entity" which includes organigrams. Things are a lot more complex as you think when you do not want to put yourself into a corner that WILL make the program require a rewrite in the not too far future.

Resources