Need help with an MS SQL DB design - sql-server

Hi, I have 2 types of data entry which need to be stored in the DB so they can be used for calculations later. Each entry has a unique ID. The data entry types are -
1.
2.
So I have to save this data in the DB. With my understanding I thought of the following -
Create 3 tables - Common, Entry1 and Entry2 (multiple tables with the unique ID as the name).
The Common table will have a unique entry for each piece of data and which table to refer to for the value (Entry1/Entry2).
The Entry1 data is a single line, so it can be inserted directly. But the Entry2 data will require a complete table because of its structure, so whenever we add a type 2 entry a new table has to be created, which will create a lot of tables.
Or I could save the type 2 values in another database and fetch the values from there. So please suggest a way which is better than this.

I believe that you have 2 entry types with identical structure, but one containing a single row and one containing many.
In this case, I would suggest a single table containing the data for all entries, with a second table grouping them together. Even if your input contains a single row, it should still gain an EntryID. Perhaps something like the below:
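(A minimal T-SQL sketch of that idea - the table and column names, and the DECIMAL value type, are placeholder assumptions rather than anything from the original post:)

CREATE TABLE Entry (
    EntryID   INT IDENTITY(1,1) PRIMARY KEY,
    EntryType TINYINT NOT NULL              -- 1 or 2
);

CREATE TABLE EntryRow (
    EntryRowID INT IDENTITY(1,1) PRIMARY KEY,
    EntryID    INT NOT NULL REFERENCES Entry (EntryID),
    RowNumber  INT NOT NULL,                -- position of the row within the entry
    EntryValue DECIMAL(18, 4) NOT NULL
);

A type 1 entry gets a single EntryRow; a type 2 entry gets as many rows as it needs, all sharing one EntryID, so no per-entry tables (and no second database) are required.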

Related

Transfer of primary keys of dimension table to fact table cannot write values

My problem is understanding the relationship of the primary keys to the fact table.
This is the structure I'm working in; the transfer works, but it says the values I set as primary keys cannot be NULL.
I'm using SSIS to transfer data from a CSV file to an OLE DB destination (SQL Server 2019 via SSMS).
The actual problem is where/how I can get the values in the same task. I tried to do it in two different tasks, but then the rows end up in the table one after another (this only worked when I allowed NULLs for the primary keys, which can't be the solution, I think).
Maybe the problem is that I have three transfers from the source: first to one dimension table, then to the second dimension table, then to the fact table. I think the primary keys are generated when I transfer the data to the DB, so I think I can't get them in the same task.
(Screenshots in the original post: dataflow 1, dataflow 2, input data, output data.)
I added the column salesid to the input to use it for the saleskey. Is there a better solution maybe with the third lookup you've mentioned?
You are attempting to load the fischspezi fact table as well as the product (produkt) and location (standort) dimensions. The problem is, you don't have the keys from the dimensions.
I assume the "key" columns in your dimension are autogenerated/identity values? If that's the case, then you need to break your single data flow into two data flows. Both will keep the Flat File source and the multicast.
Data Flow Dimensions
This is the existing data flow, minus the path that leads to the Fact table.
Data Flow Fact
This data flow will populate the Fact table. Remove the two branches to the dimension tables. What we need to do here is find the translated key values given our inputs. I assume produkt_ID and steuer_id should have been defined as NOT NULL and unique in the dimensions, but the concept is that we need to take a value that comes in our file, product id 3892, and find the same row in the dimension table, which has a key value of 1.
The tool for this is the Lookup Transformation. You're going to want 2-3 of those in your data flow right before the destination. The first one will look up produktkey based on produkt_ID. The second will find standortkey based on steuer_id.
The third lookup you'd want here (and add back into the dimension load) would look up the current row in the destination table. If you ran the existing package 10 times, you'd have 10x the data (unless you have unique constraints defined). Guessing here, but I assume sales_id is a value in the source data, so I'd have a lookup here to ensure I don't double-load a row. If sales_id is a generated value, then for consistency I'd rename the suffix to key to be in line with the rest of your data model.
I also encourage everyone to read Andy Leonard's Stairway to Integration Services series. Levels 3 & 4 address using lookups and identifying how to update existing rows, which I assume will be some of the next steps in your journey.
Addressing comments
"I would place them just over the fact destination and then join with a union all to fact table"
No. There is no need to have either a join or a union all in your fact data flow. Flat File Source (Get our candidate data) -> Data Conversion(s) (Change data types to match the expected) -> Derived Columns (Manipulate the data as needed, add things like insert date, etc.) -> Lookups (Translate source values to destination values) -> Destination (Store new data).
Assume Source looks like
produkt_ID | steuer_id | sales_id | umsatz
1234       | 1357      | 2468     | 12
2345       | 3579      | 4680     | 44
After dimension load, you'd have (simplified)
Product

produktkey | produkt_ID
1          | 1234
2          | 2345

Location

standortkey | steuer_id
7           | 1357
9           | 3579
The goal is to use that original data + lookups to have a set like
produkt_ID | steuer_id | sales_id | umsatz | produktkey | standortkey
1234       | 1357      | 2468     | 12     | 1          | 7
2345       | 3579      | 4680     | 44     | 2          | 9
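For reference, the two lookups are doing roughly what this set-based query would do (staging_fischspezi is a hypothetical staging of the flat file - in the package, SSIS performs the equivalent row by row inside the data flow):

SELECT src.produkt_ID,
       src.steuer_id,
       src.sales_id,
       src.umsatz,
       p.produktkey,
       s.standortkey
FROM   staging_fischspezi AS src                      -- hypothetical staging of the CSV rows
       JOIN produkt  AS p ON p.produkt_ID = src.produkt_ID
       JOIN standort AS s ON s.steuer_id  = src.steuer_id;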
The third lookup I propose (skip it for now) is to check whether sales_id exists in the destination. If it does, then you would want to see whether that existing record is the same as what we have in the file. If it's the same, then we do nothing. Otherwise, we likely want to update the existing row because we have new information - someone miskeyed the quantity and our sales figure should really be 120, not 12. The update is beyond the scope of this question, but it's covered nicely in the Stairway to Integration Services.

SQL equivalent query in Qlikview?

In SQL we can write a query like:
Select field1,field2,field3,field4,field5,field6,field7
from table1 t1,table2 t2,table3 t3
where t1.field1 = t3.field3 and
t2.field2 = 'USD'
In QlikView, I have created QVDs for 6 tables; now I want to create a single QVD from these 6 QVDs. Unfortunately these tables don't contain primary keys, so I can't use a join. I have also tried this:
fact:
load *
from
[D:\path\fact*.qvd](qvd);
//To store all qvd's into one qvd.
store fact into [D:\path\facttable.qvd];
This query creates a facttable, but with only 2 columns, and these columns are from the first fact table. The diagram shows it more clearly:
Internally it names the fact tables fact, fact-1, fact-2 and so on, and I have written the query store fact into [D:\path\facttable.qvd];. In the diagram the fact table contains only two columns, so it creates the fact table with two columns only.
Please let me know how we can write this query in QlikView, or how we can create a fact table using all the QVDs.
Thanks in advance.
Since every QVD contains different field names, loading * will create several tables with synthetic keys.
You can use Concatenate Load to stack each QVD onto one fact table. One simple example would be to first create a Fact table by:
Fact:
Load * INLINE [
dummyField
];
Now you can concatenate the qvd's onto that Fact table:
concatenate(Fact)
load *
from
[D:\path\fact*.qvd](qvd);
//To store all qvd's into one qvd.
store Fact into [D:\path\facttable.qvd];
//drop the dummy field.
drop field dummyField;

Database design: ordered set

task_set is a table with two columns (id, task):
id | task
1  | shout
2  | bark
3  | walk
4  | run
Assume there is another table with two columns (employee, task_order).
task_order is an ordered set of tasks, for example (2,4,3,1).
Generally the task_order is unchanged, but sometimes a task may be inserted or deleted, e.g. (2,4,9,3,1), (2,4,1).
How should I design such a database? I mean, how do I realize the ordered set?
If, and ONLY if, you don't need to search inside the task_order column or update one of its values (i.e. change 4,2,3 to 4,2,1), keeping that column as a delimited string might be an easy solution.
However, if you ever plan on searches or updates for specific values inside the task_order, then you had better normalize that structure into a table that holds employee id, task id, and task order.
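A minimal SQL sketch of that normalized shape (table and column names are placeholders; it assumes the task ids reference task_set):

CREATE TABLE employee_task (
    employee_id INT NOT NULL,
    task_id     INT NOT NULL REFERENCES task_set (id),
    task_order  INT NOT NULL,                 -- position within this employee's ordered set
    PRIMARY KEY (employee_id, task_order)
);

-- employee 5 with the ordered set (2,4,3,1)
INSERT INTO employee_task (employee_id, task_id, task_order)
VALUES (5, 2, 1), (5, 4, 2), (5, 3, 3), (5, 1, 4);

Inserting or deleting a task in the middle then means renumbering task_order (or leaving deliberate gaps) for that employee.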

Multiple elements in one database cell

The question is what database design I should apply to this situation:
main table:
ID | name | number_of_parameters | parameters
parameters table:
parameter | parameter | parameter
The number of elements in the parameters table does not change. The number_of_parameters cell defines how many parameters tables should be stored in the next cell.
I have trouble moving from object thinking to database design. When we talk about an object, one row has as many parameters as number_of_parameters says.
I hope that description of the requirements is clear. What is the correct way to design such a database? If someone could provide some SQL statements to achieve it, that would be nice, but the main goal of this question is to understand how to build such an architecture.
I want to use SQLite to create this database.
The relational way is to have two tables. The main table has an ID, name and as many other universally-present parameters as possible. The parameters table contains a mapping from an ID in the main table to a parameter name and a parameter value; the main table ID should be a foreign key, and the combination of ID and name should be unique.
The number of parameters can be found by just counting the number of rows with a particular ID.
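A minimal SQLite sketch of that layout (names are placeholders; values are stored as TEXT here for simplicity):

CREATE TABLE main_table (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE parameters (
    main_id INTEGER NOT NULL REFERENCES main_table (id),
    name    TEXT NOT NULL,
    value   TEXT,
    UNIQUE (main_id, name)                    -- each parameter name appears once per main_table row
);

-- number_of_parameters is derived rather than stored:
SELECT main_id, COUNT(*) AS number_of_parameters
FROM parameters
GROUP BY main_id;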
If you can serialize the data while saving to the database and deserialize it back when you read the record, that will work. You can get the total number of objects in the serialized container, save the count in the number_of_parameters field, and save the serialized data in the parameters field.
There isn't one perfect correct way, but if you want to use a relational database, you preferably have relational tables.
If you have a key-value database, you place your serialized data as a document attached to your key.
If you want a hybrid solution, both human editable and single table, you can serialize your data to a human-readable format such as yaml, which sees heavy usage in configuration sections of open source projects.

Is using multiple tables an advisable solution to dealing with user defined fields?

I am looking at a problem which would involve users uploading lists of records with various field structures into an application. The 2nd part of this would be to also allow the users to specify fields to capture information.
This is a step beyond anything I've done up to this point, where I would have designed a static RDBMS structure myself. In some respects all records will be treated the same, so there will be some common fields required for each. Almost all queries will be run on these common fields.
My first thought would be to dynamically generate a new table for each import and another for each data capture field spec. Then have a master table with a GUID for every record in the application, along with the common fields, plus fields that specify the name of the table the data was imported to and the name of the table with the data capture fields.
Further information (metadata?) about the fields in the dynamically generated tables could be stored in xml or in a 'property' table.
This would mean that as users log into the application I would be dynamically choosing which table of data to present to the user, and there would be a large number of tables in the database if it were, say, not only multi-user but also multi-tenant.
My question is: are there other methods of solving this kind of variable-field issue, or am I going down an ill-advised path here?
I believe that EAV would require me to have a table defining the fields for each import / data capture spec and then another table with the import-field-value data, and that seems impractical.
I hate storing XML in the database, but this is a perfect example of when it makes sense. Store the user imports in XML initially. As your data schema matures, you can later decide which tables to persist for your larger clients. When the users pick which fields they want to query, that's when you come back and build a solid schema.
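As a rough sketch of that approach on SQL Server (the table, column, and XPath names are assumptions, not part of the original answer):

CREATE TABLE UserImport (
    ImportID INT IDENTITY(1,1) PRIMARY KEY,
    UserID   INT NOT NULL,
    LoadedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    Payload  XML NOT NULL                     -- the raw imported records
);

-- Once users decide which fields they care about, pull them out of the payload:
SELECT ImportID,
       Payload.value('(/records/record/@customer)[1]', 'nvarchar(100)') AS Customer
FROM   UserImport;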
What kind is each field? Could the type of field be different for each record?
I am working on a program now that does this, sort of, and the way we handle it is basically a record table which points to a recordfield table. The recordfield table contains all of the fields along with the name of the actual field in the database (the column name). We then have a recorddata table, which is where all the data goes for each record; we also store a record_id telling it which record it is holding.
This way, if each column for the record is the same type, we don't need to add new columns to the table, and if a record has more fields or fields of a different type, we add columns to the data table as appropriate.
I think this is what you are talking about... correct me if I'm wrong.
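A rough sketch of that record / recordfield / recorddata layout (names and types are assumptions about the shape described above, not the poster's actual schema):

CREATE TABLE record (
    record_id INT IDENTITY(1,1) PRIMARY KEY,
    name      NVARCHAR(100) NOT NULL
);

CREATE TABLE recordfield (
    recordfield_id INT IDENTITY(1,1) PRIMARY KEY,
    field_name     NVARCHAR(100) NOT NULL,    -- the logical field name
    column_name    NVARCHAR(100) NOT NULL     -- which column of recorddata holds the value
);

CREATE TABLE recorddata (
    recorddata_id  INT IDENTITY(1,1) PRIMARY KEY,
    record_id      INT NOT NULL REFERENCES record (record_id),
    recordfield_id INT NOT NULL REFERENCES recordfield (recordfield_id),
    string_value   NVARCHAR(MAX) NULL,        -- add further typed columns as needed
    numeric_value  DECIMAL(18, 4) NULL
);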
I think that one additional table per type of user-defined field, for each table the user can add fields to, is a good way to go.
Say you load your records into user_records(id); that table's id column becomes a foreign key in the user-defined-field tables.
User-defined string fields would go in user_records_string(id, name), where id is a foreign key to user_records(id), and name is a string, or a foreign key to a list of user-defined string fields.
Searching on them requires joining them into the base table, probably with a sub-select to filter down to one field based on the user metadata, so that the right field can be added to the query.
To simulate the user creating multiple tables, you can have a foreign key in the user_records table that points at a table list, and filter on that when querying for a single table.
This would allow your schema to be static while allowing the user to arbitrarily add fields and tables.
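A minimal sketch along those lines (the field_name and field_value columns are assumptions - the answer leaves the exact shape of user_records_string open):

CREATE TABLE user_records (
    id INT IDENTITY(1,1) PRIMARY KEY
    -- common fields shared by every record go here
);

-- One extra table per user-defined field type; strings shown here.
CREATE TABLE user_records_string (
    id          INT NOT NULL REFERENCES user_records (id),
    field_name  NVARCHAR(100) NOT NULL,       -- which user-defined field this is
    field_value NVARCHAR(MAX) NULL            -- assumed value column
);

-- Base table joined to a single user-defined field:
SELECT r.id, s.field_value AS customer_name
FROM   user_records r
       LEFT JOIN user_records_string s
              ON s.id = r.id AND s.field_name = 'customer_name';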
