I am looking for the best way to setup my SQL tables in the following scenario.
I will do my best to explain my situation.
For instance, I have 10 tests, Test1, Test2, Test3, ...., Test10. They all use similar field but some tests will use different fields depending on the test.
Let's say Test1 uses Field1,Field2, Field3. Test2 uses Field1, Field2, Field4, Field5. I need to store the required field information into a table, but I also need to store what fields each tests use. I will be accessing this info using VB.net.
I am looking for the best way to set this up. It needs to be somewhat easy to maintain but also have pretty good performance.
My initial thought was to setup two tables. One table the would store each test results and one that would store the fields used for each test.
The one that would store each test results would have every possible column any of the tests could use. The table for which fields each test would use would also have all possible columns. In this instance, each row would be a test and each column would be which fields are used for that test. So, Test1 would have a 1 in Column1, Column2, and Column3. Test2 would have a 1 in Column1, Column2, Column4, and Column5. This would tell us what fields need to be used when selecting, Updating, or Inserting into our results table.
Hopefully that makes sense on what I am trying to accomplish. I am not sure if this is the best way to accomplish my requirements or not.
Any guidance here would be greatly appreciated.
Thanks,
Tony
UPDATE
I just want to clarify that I am using MS SQL.
UPDATE
I also wanted to clarify that my field names aren't actually Field 1, Field2, etc. I am just using that to try and explain what I am trying to accomplish.
What you are asking for I believe, is a variable column table. Especially if you consider the long-term affects of adding more tests.
One thing for certain, the way NOT to solve the problem is to add a bunch of columns with generic names (field1, 2, 3 .... 40) and use them as you need in hopes that you never need 40. This makes for a design that is highly problematic to develop around and maintain.
A less horrible approach but still problematic is to make a table that pivots fields and makes them each their own row, and associates the two.
A better solution, using modern databases, is to store your objects (tests) in a no-schema type of way. In a relational database that supports it, you can use a native XML field (or json in some databases). In this scenario, you store the meta data about the object in regular fields and store the XML-serialized objects in the xml field. This way, your object can change as needed without a change to the database schema and you can continue to use meaningful names and data types.
It is important in the relational db scenario to choose a db that has a native xml or json datatype, vs. just using a varchar or blob. The reason is that a native type will include the ability to query the data in ways that are generally more performant than regex.
Of course, this what no-sql databases such as MongoDB are great at. I've had good success with both approaches. For simple solutions, I'll generally choose Mongo. For solutions in which I need a relational database anyway, I'll use MS-SQL and SQLXml.
SteveJ
You need four tables. Tests and Fields have a many-to-many relationship, so you need these two tables plus a TestsFields junction table. Finally you need a results table with TestNumber, Fieldnumber and Result fields. This fits the information given in your question, though it's a little ambiguous. You might need to extend this schema to accomodate multiple testsessions/exams, or whatever - you've not given this context so I can't say.
EDIT:
For instance, lets take a car servicing app as an example.
(Unfortunately you've chosen 'Fields' for one of your tables, and I want to use 'fields' in what follows, so I've distinguished them by capitalisation) In this scenario, Tests table would have the fields TestID and Description, with values like 1, 'Pre-delivery Inspection', 2, '5000 mile service' etc and Fields table would have fieldID and Description, with values like 1, 'Check tyre pressure' , 2, 'Check Handbrake cable' etc.
Then the junction table TestsFields would just consist of the two primary keys.
In this scenario, you would also need another table or two to cover the individual cars, service appointments etc, but let's not get carried away!
The results table would include ServiceApptID,TestID, FieldID, and Result. Result could be free text, where the mechanic could record results, or another lookup to a set of canned responses - 1 'Adjusted', 2, 'Part Replaced' etc
Should be enough there to get the idea across.
I think the better way is use custom char as separator for example | char
like this
somthing|another field|another filed data
when you want to read data split by |
and also you can save selected value by this scenario
1|3|5|2|0|4|1|2
you need only 2 string field to store question and answer :)
Related
I am attempting to merge data from various sources into an existing data model. Each source uses different types of IDs (such as GUID, Salesforce IDs, etc.). For example, if I were to merge data from two different sources, the table may look like the following (where the first two SalesPersonIDs are GUID IDs and the second two are Salesforce IDs):
Is this a bad practice? I could also imagine a table where each ID type was its own column and could be left blank if it was not applicable. Something like the following:
I apologize, I am a bit new to this. Thanks in advance for any insight, I greatly appreciate it!
The big roles of an ID column are to act as a key connecting data in different tables, and to help indexing - quickly find rows so your queries run fast.
The second solution wouldn't work well for these purposes, and will lead to big headaches in queries: every time you want to group by the ID, you'll have to combine the info from 2 columns in some way, hopefully getting a correct unique result every time.
On the one hand, all you might ever need from an ID is for it to be unique. The first solution might be fine this respect - but are you sure you'll never, ever get data about one SalesPerson from more than one source?
I'd suggest keeping all the IDs in one column, and adding a column to say what kind of ID this is. At least this way, you won't lose any information and can do other things in the future.
One thing you might consider is making a separate table of SalesPerson with all their possible IDs, and have this keyed to other (Sales?) data by a unique ID used only in your database.
Suppose reviews can have zero or more tags.
One could implement this using three tables, Review/Tag/ReviewTagRelation.
ReviewTagRelation would have foreign key to Review and Tag table.
Or using two tables Review/Tag. Review has a json field to hold the list of tag ids.
Traditional approach seems to be the one using the three tables.
I wonder if it is ok to use the two tables approach when there's no need to reference reviews from tags.
i.e. I only need to know what tags are associated with a given review.
In my experience it is always best to keep the data in your database normalized, unless there is a clean and clear cut reason for not doing so that makes sense as per your business requirements.
With normalized data, you know that no matter what, you will always be able to write a query to receive exactly what you are looking for, and if for some reason you want to return data as json, you can do so in your select query.
Consider a situation where the schema of a database table may change, that is, the fields, number of fields, and types of those fields may vary based on, say a client ID.
Take, for example a Users table. Typically we would represent this in a horizontal table with the following fields:
FirstName
LastName
Age
However, as I mentioned, each client may have different requirements.
I was thinking that to represent a multi-schema approach to Users in a relational database like SQL Server, this would be done with two tables:
UsersFieldNames - {FieldNameId, ClientId, FieldName, FieldType}
UsersValues - {UserValueId, FieldNameId, FieldValue}
To retrieve the data (using EntityFramework DB First), I'm thinking pivot table, and the use of something like LINQ Extentions - Pivot Extensions may be useful.
I would like to know of any other approaches that would satisfy this requirement.
I'm asking this question for my own curiosity, as I recall similar conversations coming up in the past, and in relation to this question posed.
Thanks.
While I think a NoSQL data base would work best for this, I once tried something like this.
Have a table named something like METATABLES, like this
METATABLE = {table_name, field name}
and another,
ACTUAL_DATA ={table_name, field_name, actual_data_id, float_value, string_value, double_value, varchar_value}
in actual_data, the fields table_name and field_name would be foreign keys, pointing to METATABLES. In METATABLES you define the specific fields each client requires. the ACTUAL_DATA table holds the actual values of those fields, stored in the appropiate value field, depending on the data type (if the field value is a string, it would be stored in the string_Value field).
This approach is probably not the most efficient, though. Hope it helps.
I think it would be a mistake to have the schema vary. It is typically something you want to be standard.
In this case you may have users that have different attributes. In the user table you store attributes that are common across all users:
USER {id(primary key), username, first, last, DOB, etc...}
Note: Age is something that should not be stored, it should be calculated.
Then you could have a USER_ATTRIBUTE table:
{userId,key,value}
So users can have multiple attributes that are unrelated to one another without the schema changing.
Changing the schema often breaks the application.
Please, read first my previous question: T-SQL finding of exactly same values in referenced table
The main purpose of this question is to find out if this approach of storing of data is effective.
Maybe it would be better to get rid of PropertyValues table. And use additional PropertyValues nvarchar(max) column in Entities table instead of it. For example instead of
EntityId PropertyId PropertyValue
1 4 Val4
1 5 Val5
1 6 Val6
table, I could store such data in PropertyValues column: "4:Val4;5:Val5;6Val6"
As an alternative, I could store XML in PropertyValues column....
What do you think about the best approach here?
[ADDED]
Please, keep in mind:
Set of properties must be customizable
Objects will have dozens of properties (approximately from 20 to 120). Database will contain thousands of objects
[ADDED]
Data in PropertyValues table will be changed very often. Actually, I store configured products. For example, admin configures that clothes have attributes "type", "size", "color", "buttons type", "label type", "label location" etc... User will select values for these attributes from the system. So, PropertyValues data cannot be effectively cached.
You will hate yourself later if you implement a solution using multi-value attributes (i.e. 4:Val4;5:Val5;6Val6).
XML is marginally better because there are XQuery functions to help you pull out and parse the values. But the XML type is implemented as a CLR type in SQL Server and it can get extremely slow to work with.
The best solution to this problem is one like you have. Use the sql_variant type for the column if it could be any number of data types. Ideally you'd refactor this into multiple tables / entities so that the data type can be something more concrete.
I work with the similar project (web-shop generator). So every product has attribute and every attribute has set of values. It is different tables. And for all of this there are translations in several languages. (So exists additional tables for attributes and values translations).
Why we choose such solution? Because for every client there should be database with the same scheme. So such database scheme is very elastic.
So what about this solution. As always, "it depends" -))
Storage. If your value will be used often for different products, e.g. clothes where attribute "size" and values of sizes will be repeated often, your attribute/values tables will be smaller. Meanwhile, if values will be rather unique that repeatable (e.g. values for attribute "page count" for books), you will get a big enough table with values, where every value will be linked to one product.
Speed. This scheme is not weakest part of project, because here data will be changed rarely. And remember that you always can denormalize database scheme to prepare DW-like solution. You can use caching if database part will be slow too.
Elasticity This is the strongest part of solution. You can easily add/remove attributes and values and ever to move values from one attribute to another!
So answer on your question is not simple. If you prepare elastic scheme with unknown attributes and values, you should use different tables. I suggest to you remember about storing values in CSV strings. It is better to store it as XML (typed and indexed).
UPDATE
I think that PropertyValues will not change often , if comparing with user orders. But if you doubt, you should use denormalization tables or indexed views to speed up.Anyway, changing XML/CSV on large quantity of rows will have poor performance, so "separate table" solution looks good.
The SQL Customer Advisory Team (CAT) has a whitepaper written just for you: Best Practices for Semantic Data Modeling for Performance and Scalability. It goes through the common pitfalls of EAV modeling and recommends how to design a scalable EAV solution.
I am putting together a schema for a database. The goal of the database is to track applications in our department. I have a repeated problem that I am trying to solve.
For example, I have an "Applications" table. I want to keep track if any application uses a database or a bug tracking system so right now I have fields in the Applications table called
Table: Applications
UsesDatabase (bit)
Database_ID (int)
UsesBugTracking (bit)
BugTracking_ID (int)
Table: Databases:
id
name
Table: BugTracking:
id
name
Should I consolidate the "uses" column with the respective ID columns so there is only one bug tracking column and only one database column in the applications table?
Any best practice here for database design?
NOTE: I would like to run reports like "Percent of Application that use bug tracking" (although I guess either approach could generate this data.)
You could remove the "uses" fields and make the id columns nullable, and let a null value mean that it doesn't use the feature. This is a common way of representing a missing value.
Edit:
To answer your note, you can easily get that statistics like this:
select
count(*) as TotalApplications,
count(Database_ID) as UsesDatabase,
count(BugTracking_ID) as UsesBugTracking
from
Applications
Why not get rid of the two Use fields and simply let a NULL value in the _ID fields indicate that the record does not use that application (bug tracking or database)
Either solution works. However, if you think you may want to occasionally just get a list of applications which do / do not have databases / bugtracking consider that having the flag fields reduces the query by one (or two) joins.
Having the bit fields is slightly denormalized, as you have to keep two fields in sync to keep one piece of data updated, but I tend to prefer them for cases like this for the reason I gave in the prior paragraph.
Another option would be to have the field nullable, and put null in it for those entries which do not have DBs / etc, but then you run into problems with foreign key constraints.
I don't think there is any one supreme right way, just consider the tradeoffs and go with what makes sense for your application.
I would use 3 tables for the objects: Application, Database, and BugTracking. Then I would use 2 join tables to do 1-to-many joins: ApplicationDatabases, and ApplicationBugTracking.
The 2 join tables would have both an application_id and the id of the other table. If an application used a single database, it would have a single ApplicationDatabases record joining them together. Using this setup, an application could have 0 database (no records for this app in the ApplicationDatabases table), or many databases (multiple records for this app in the ApplicationDatabases table).
"Should i consolidate the "uses" column"
If I look at your problem statement, then there either is no "uses" column at all, or there are two. In either case, it is wrong of you to speak of "THE" uses column.
May I politely suggest that you learn to be PRECISE when asking questions ?
Yes using null in the foreign key fields should be fine - it seems superfluous to have the bit fields.
Another way of doing it (though it might be considered evil by database people ^^) is to default them to 0 and add in an ID 0 data row in both bugtrack and database tables with a name of "None"... when you do the reports, you'll have to do some more work unless you present the "None" values as they are as well with a neat percentage...
To answer the edited question-
Yes, the fields should be combined, with NULL meaning that the application doesn't have a database (or bug tracker).