I'm developing a website (using Django and MySQL). I have a Tests table and a User table.
I have 50 tests within the table and each user completes them at their own pace.
How do I store the status of the tests in my DB?
One idea that came to my mind is to create an additional column in the User table, containing test IDs separated by a comma or some other delimiter.
userid | username | testscompleted
-------+----------+---------------
1      | john     | 1, 5, 34
2      | tom      | 1, 10, 23, 25
Another idea was to create a separate table to store userid and testid. So I'll have only 2 columns but thousands of rows (number of tests x number of users), and they will keep increasing.
userid | testid
-------+-------
1      | 1
1      | 5
2      | 1
1      | 34
2      | 10
Your second option is vastly preferred... your first solution breaks normalization rules by trying to store multiple values in a single field, which will cause you headaches down the road.
The second table will not only be easier to maintain when trying to add or remove values, but will also likely perform better since you'll be able to effectively index those columns.
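A minimal sketch of that second option in MySQL (the table and column names here are illustrative, not taken from the question):

-- Junction table: one row per (user, test) pair that a user has completed.
-- Assumes existing user and test tables with integer primary keys named id.
CREATE TABLE user_test_completed (
    userid INT NOT NULL,
    testid INT NOT NULL,
    PRIMARY KEY (userid, testid),
    FOREIGN KEY (userid) REFERENCES user (id),
    FOREIGN KEY (testid) REFERENCES test (id)
);

-- All tests completed by user 1:
SELECT testid FROM user_test_completed WHERE userid = 1;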
There are two phrases that should automatically disqualify any database design idea:
"create one table per [anything]"
"a column containing [anything] separated by a comma"
Separate table, two columns--you're on the right track there.
Related
My problem is understanding how the primary keys relate to the fact table.
This is the structure I'm working in. The transfer works, but it says the values I set as primary keys cannot be NULL.
I'm using SSIS to transfer data from a CSV file to an OLE DB destination (SQL Server 2019, via SSMS).
The actual problem is where/how can I get the values in the same task? I tried to do it in two different tasks, but then the rows end up in the table one after another (this only worked when I allowed NULLs for the primary keys, which can't be the solution, I think).
Maybe the problem is that I have three transfers from the source:
To the first dimension table
To the second dimension table
To the fact table
I think the primary keys are generated when I transfer the data to the DB, so I think I can't get them in the same task.
[Screenshots in the original post: dataflow 1, dataflow 2, input data, output data]
I added the column salesid to the input to use it for the saleskey. Is there a better solution, maybe with the third lookup you've mentioned?
You are attempting to load the fischspezi fact table as well as the product (produkt) and location (standort) dimensions. The problem is, you don't have the keys from the dimensions.
I assume the "key" columns in your dimension are autogenerated/identity values? If that's the case, then you need to break your single data flow into two data flows. Both will keep the Flat File source and the multicast.
Data Flow Dimensions
This is the existing data flow, minus the path that leads to the Fact table.
Data Flow Fact
This data flow will populate the Fact table. Remove the two branches to the dimension tables. What we need to do here is find the translated key values given our inputs. I assume produkt_ID and steuer_id should have been defined as NOT NULL and unique in the dimensions, but the concept here is that we need to be able to take a value that comes in from our file, product id 3892, and find the same row in the dimension table, which has a key value of 1.
The tool for this is the Lookup Transformation. You're going to want 2-3 of those in your data flow right before the destination. The first one will look up produktkey based on produkt_ID. The second will find standortkey based on steuer_id.
The third lookup you'd want here (and add back into the dimension load) would look up the current row in the destination table. If you ran the existing package 10 times, you'd have 10x the data (unless you have unique constraints defined). Guessing here, but I assume sales_id is a value in the source data, so I'd have a lookup here to ensure I don't double-load a row. If sales_id is a generated value, then for consistency I'd rename the suffix to key to be in line with the rest of your data model.
I also encourage everyone to read Andy Leonard's Stairway to Integration Services series. Levels 3 & 4 address using lookups and identifying how to update existing rows, which I assume will be some of the next steps in your journey.
Addressing comments
I would place them just over the fact destination and then join with a union all to fact table
No. There is no need to have either a join or a union all in your fact data flow. Flat File Source (get our candidate data) -> Data Conversion(s) (change data types to match the expected) -> Derived Columns (manipulate the data as needed, add things like insert date, etc.) -> Lookups (translate source values to destination values) -> Destination (store new data).
Assume the source looks like this:
produkt_ID | steuer_id | sales_id | umsatz
-----------+-----------+----------+-------
1234       | 1357      | 2468     | 12
2345       | 3579      | 4680     | 44
After dimension load, you'd have (simplified)
Product

produktkey | produkt_ID
-----------+-----------
1          | 1234
2          | 2345

Location

standortkey | steuer_id
------------+----------
7           | 1357
9           | 3579
The goal is to use that original data + lookups to have a set like
produkt_ID | steuer_id | sales_id | umsatz | produktkey | standortkey
-----------+-----------+----------+--------+------------+------------
1234       | 1357      | 2468     | 12     | 1          | 7
2345       | 3579      | 4680     | 44     | 2          | 9
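In SQL terms, the two lookups are doing roughly this per row (just a sketch; source_file stands in for the rows coming from the flat file, and the dimension table and column names are the ones used above):

SELECT s.produkt_ID, s.steuer_id, s.sales_id, s.umsatz,
       p.produktkey, o.standortkey
FROM   source_file s
JOIN   produkt  p ON p.produkt_ID = s.produkt_ID   -- first lookup
JOIN   standort o ON o.steuer_id  = s.steuer_id;   -- second lookup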
The third lookup I propose (skip it for now) is to check whether sales_id exists in the destination. If it does, then you would want to see whether that existing record is the same as what we have in the file. If it's the same, then we do nothing. Otherwise, we likely want to update the existing row because we have new information, e.g. someone miskeyed the quantity and our sale should be 120 and not 12. The update is beyond the scope of this question, but it's covered nicely in the Stairway to Integration Services.
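That third lookup would mirror a query along these lines (a sketch only; I'm assuming the fact table is the fischspezi table mentioned above and that it stores sales_id alongside its saleskey):

SELECT saleskey
FROM   fischspezi
WHERE  sales_id = ?;   -- a match means this row has already been loaded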
Okay, context:
I have a system that requires monthly, weekly, and daily reports.
Architecture A:
3 tables:
1) Monthly reports
2) Weekly reports
3) Daily reports
Architecture B:
1 table:
1) Reports: With extra column report_type, with values: "monthly", "weekly", "daily".
Which one would be more performant and why?
The common method I use for this is two tables, similar to your approach B. One table would be as you describe, with the report data and an extra column, but instead of hard-coding the values, this column would hold an id pointing to a reference table. The reference table would then hold the names of these values. This setup allows you to easily reference the intervals from other tables should you need that later on, and it also makes name updates much more efficient. Changing the name of, say, "Monthly" to "Month" would require one update here, vs. n updates if you stored the string in your report table.
Sample structure:
report_data | interval_id
------------+------------
xxxx        | 1

interval_id | name
------------+--------
1           | Monthly
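A minimal DDL sketch of that setup (names are illustrative, not prescribed):

CREATE TABLE report_interval (
    interval_id INT PRIMARY KEY,
    name        VARCHAR(20) NOT NULL    -- 'Monthly', 'Weekly', 'Daily'
);

CREATE TABLE report (
    report_id   INT PRIMARY KEY,
    report_data TEXT,
    interval_id INT NOT NULL,
    FOREIGN KEY (interval_id) REFERENCES report_interval (interval_id)
);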
As a side note, you would rarely want to take your first approach, approach A, due to how it limits changing the interval type of entered data. If all of a sudden you wanted to change half of your Daily entries to Weekly entries, you would need to do n/2 deletes and n/2 inserts, which is fairly costly, especially once you start introducing indexes. In general, tables should describe types of data (i.e. reports) and columns should describe that type (i.e. how often a report happens).
Please help me with maintaining the relationships between more than two tables. I will explain one scenario that I am facing now.
Table 1 has all the codes I am using in the application. For example, for status codes, Active has the code 10 and Inactive has the code 20; in this way I am maintaining all the codes.
Table 2 has 5 columns:
column 1 is auto generated key
column 2 - has the id of table 1
column 3 - also has the id of table 1
column 4 - also has the id of table 1
column 5 - has description.
So, I need to perform multiple joins while retrieving the data from these two tables. My question is: is this the right way to maintain the tables?
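For illustration, the joins described above might look roughly like this (all table and column names here are made up, since the real ones aren't given):

SELECT t2.id,
       c1.code_name AS status,     -- decoded from column 2
       c2.code_name AS type,       -- decoded from column 3
       c3.code_name AS category,   -- decoded from column 4
       t2.description
FROM   table2 t2                                 -- the 5-column table
JOIN   table1 c1 ON c1.id = t2.status_code_id    -- hypothetical FK column names
JOIN   table1 c2 ON c2.id = t2.type_code_id
JOIN   table1 c3 ON c3.id = t2.category_code_id;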
Moreover, I am using Spring with Hibernate to fetch the data from the DB. Any ideas on how to do that with Hibernate?
Please advise me on this.
I have several list boxes in my web application that the user has to fill in. The administrator can add/remove/edit values in the combo boxes from a control panel. So the problem is: what is the best way to keep these combo boxes in the database?
One way is keeping a separate table for each combo box. I think this is very easy to handle, but I would have to create more than 20 tables, one for each combo/list box, and I wonder whether it is good practice to do so.
Another way is keeping one table for all combo boxes. But I am worried about deleting data in this case.
If I want to remove India from the country column in the combo box table, then I will be in trouble. I may have to update it to null or handle it some other way on the programming side.
Am I correct? Can you help me?
I think you should just create a table with 3 fields. The first field is the id, the second is the name, and the last is a foreign key identifying the combo box. For example:
combo_box_table
id - name - box
1 - Japan - 1
2 - India - 1
3 - Scotland - 2
4 - England - 3
You just have to play with the query; each box is identified by the last field. 1 represents combo box 1, 2 represents combo box 2, etc.
select * from combo_box_table where box = 1
If you want to delete India, the query is just: delete from combo_box_table where id = 2
Hope this helps.
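A possible DDL sketch for this layout (the combo_box parent table is my addition, so the foreign key has something to reference):

CREATE TABLE combo_box (
    id   INT PRIMARY KEY,
    name VARCHAR(50)             -- e.g. 'Country', 'Job Title'
);

CREATE TABLE combo_box_table (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,  -- the visible option, e.g. 'Japan'
    box  INT NOT NULL,           -- which combo box this option belongs to
    FOREIGN KEY (box) REFERENCES combo_box (id)
);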
Another possibility would be to save the combo box data as an array or a JSON string in a single field in your table, but whether you want to do this or not depends on how you want your table to function and what your application is. See Save PHP array to MySQL? for further information.
EDIT:
I'm going to assume you have a combo-box with different countries and possibly another with job titles and others.
If you create multiple tables then yes, you would have to use multiple SQL queries, but the amount of data in each table would be flexible and deleting would be a one-step process:
mysqli_query($link,"DELETE FROM Countries WHERE Name='India'");
With the JSON or array option you could have one table, and each column would hold one combo box. This would mean you only have to query the table once to populate the combo boxes, but then you would have to decode the JSON strings and iterate through them, also checking for null values, for instance if countries had 50 entries but job titles only had 20. There would be some limits on the amount of data, as the "text" type only has a finite length. (Possible, but a nightmare of code to manage.)
You may have to query multiple times to populate the boxes, but I feel that the first method would be the most organized and flexible, unless I have misinterpreted your database structure needs...
A third possible answer, though very different, could be to use AJAX to populate the combo-boxes from separate .txt files on the server, though editing them and removing or adding options to them through any way other than manually opening the file and typing in it or deleting it would be complex as well.
Unless you have some extra information at the level of the combo-box itself, just a simple table of combo-box items would be enough:
CREATE TABLE COMBO_BOX_ITEM (
COMBO_BOX_ID INT,
VALUE VARCHAR(255),
PRIMARY KEY (COMBO_BOX_ID, VALUE)
)
To get items of a given combo-box:
SELECT VALUE FROM COMBO_BOX_ITEM WHERE COMBO_BOX_ID = <whatever>
The nice thing about this query is that it can be satisfied by a simple range scan on the primary index. In fact, assuming the query optimizer of your DBMS is clever enough, the table heap is not touched at all, and you can eliminate this "unnecessary" heap by clustering the table (if your DBMS supports clustering). Physically, you'd end up with just a single B-tree representing the whole table.
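For example, in SQL Server you could make the clustering explicit (this syntax is SQL Server specific; InnoDB in MySQL clusters on the primary key automatically):

CREATE TABLE COMBO_BOX_ITEM (
    COMBO_BOX_ID INT          NOT NULL,
    VALUE        VARCHAR(255) NOT NULL,
    PRIMARY KEY CLUSTERED (COMBO_BOX_ID, VALUE)   -- the table itself becomes the B-tree
)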
Use a single table Countries and another for Job Descriptions, set up like so:
Countries
ID | Name | JobsOffered | Jobs Available
_________________________________________
1 | India | 1,2,7,6,5 | 5,6
2 | China | 2,7,5, | 2,7
etc.
Job Descriptions
ID | Name | Description
___________________________________
1 | Shoe Maker | Makes shoes
2 | Computer Analyst | Analyzes computers
3 | Hotdog Cook | Cooks hotdogs well
Then you could query your database for the country and get the jobs that are available (and offered), then simply query the Job Descriptions table for the names and display to the user which jobs are available. Then, when a job is filled or opened, all you have to do is update the country table with the new job ID.
Does this help? (In this case you will need a separate table for each combo-box, as suggested, and you have referencing IDs for the jobs available)
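If you went this route, the lookup could use MySQL's FIND_IN_SET (just a sketch; I'm assuming the tables are named countries and job_descriptions and that the column is JobsAvailable, and note that this comma-separated design is exactly what the earlier answers advise against):

SELECT jd.Name, jd.Description
FROM   countries c
JOIN   job_descriptions jd
       ON FIND_IN_SET(jd.ID, REPLACE(c.JobsAvailable, ' ', '')) > 0
WHERE  c.Name = 'India';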
I am developing a tool which may have more than a million rows of data to fill in.
Currently I have designed a single table with 36 columns. My question is: do I need to divide these into multiple tables, or keep a single one?
If single, what are the advantages and disadvantages?
If multiple, what are the advantages and disadvantages?
And which engine should I use for speed?
My concern is a large database which will have at least 50,000 queries per day.
Any help?
Yes, you should normalize your database. A general rule of thumb is that if a column that isn't a foreign key contains duplicate values, the table should be normalized.
Normalization involves splitting your database into tables, and helps to:
Avoid modification anomalies.
Minimize impact of changes to the data structure.
Make the data model more informative.
There is plenty of information about normalization on Wikipedia.
If you have a serious amount of data and don't normalize, you will eventually come to a point where you will need to redesign your database, and this is incredibly hard to do retrospectively, as it will involve not only changing any code that accesses the database, but also migrating all existing data to the new design.
There are cases where it might be better to avoid normalization for performance reasons, but you should have a good understanding of normalization before making this decision.
First and foremost, ask yourself: are you repeating fields or attributes of fields? Does your one table contain relationships or attributes that should be separated? Follow third normal form... we need more info to help, but generally speaking, one table with thirty-six columns smells like a DB fart.
If you want to store a million rows of the same kind, go for it. Any decent database will cope even with much bigger tables.
Design your database to best fit the data (as seen from your application), get it up, and optimize later. You will probably find that performance is not a problem.
You should model your database according to the data you want to store. This is called "normalization": essentially, each piece of information should only be stored once; otherwise, a table cell should point to another row or table containing the value. If, for example, you have a table containing phone numbers, and one column contains the area code, you will likely have more than one phone number with the same value in that column. Once this happens, you should set up a new table for area codes and link to its entries by referencing the primary key of the row the desired area code is stored in.
So instead of
id | area code | number
---+-----------+---------
1 | 510 | 555-1234
2 | 510 | 555-1235
3 | 215 | 555-1236
4 | 215 | 555-1237
you would have
area codes:

id | area code
---+----------
1  | 510
2  | 215

numbers:

id | number   | area code
---+----------+----------
1  | 555-1234 | 1
2  | 555-1235 | 1
3  | 555-1236 | 2
4  | 555-1237 | 2
The more occurrences of the same value you have, the more likely you are to save memory and get quicker performance if you organize your data this way, especially when you're handling string values or binary data. Also, if an area code were to change, all you would need to do is update a single cell instead of having to perform an update operation on the whole table.
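A sketch of that normalized layout in SQL, plus the join that reassembles the full numbers (names follow the example above; the exact types are my guess):

CREATE TABLE area_code (
    id        INT PRIMARY KEY,
    area_code VARCHAR(5) NOT NULL
);

CREATE TABLE phone_number (
    id           INT PRIMARY KEY,
    number       VARCHAR(10) NOT NULL,
    area_code_id INT NOT NULL,
    FOREIGN KEY (area_code_id) REFERENCES area_code (id)
);

-- Reassemble the original, denormalized view of the data:
SELECT n.id, a.area_code, n.number
FROM   phone_number n
JOIN   area_code a ON a.id = n.area_code_id;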
Try this tutorial.
Correlation does not imply causation.
Just because shitloads of columns usually indicate a bad design, doesn't mean that a shitload of columns is a bad design.
If you have a normalized model, you store whatever number of columns you need in a single table.
It depends!
Does that one table contain a single 'entity'? That is, are all 36 columns attributes of a single thing, or are there several 'things' mixed together?
If mixed, then you should normalise (separate into distinct entities with relationships between them). You should aim for at least Third Normal Form (3NF).
A best practice is to normalise as much as you can; if you later identify a performance problem, then denormalise as little as you can.