Find the instance that makes two queries differ - database

Given any database schema and any two queries over that database, try to find the smallest instance for which the two queries return different result sets.
I can only come up with an idea of how to find the difference between the results of two queries, i.e. treat the result of each query as a table and compare the two tables to see whether they are the same. I am not sure whether this will work.
I have no clue how to find the smallest instance. Can anyone give me a hint or some inspiration?
Should I construct the instance from the information in the two queries, or from the schema of the database, or am I heading in the wrong direction?
Thanks a lot!
Update 1: a database instance is a scenario in which each table of the database has specific values for its attributes.
For instance,
schema:
table A: attr1 attr2... table B: attr1 attr2 attr3 ...
I have to find the database scenario in which two arbitrary queries return different results.

Assuming that you are using SQL Server, I believe you want to see the difference between the results of two queries.
Use EXCEPT like this:
SELECT * FROM table1
EXCEPT
SELECT * FROM table2
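Note that EXCEPT is one-directional: it returns only the rows of the first result that are missing from the second, so a full comparison needs it run both ways. The following is a minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for SQL Server; the helper name result_difference and the sample rows are assumptions made for illustration, not part of the original answer. EXCEPT also compares sets, so it will not notice rows that differ only in how many times they occur.

```python
import sqlite3


def result_difference(conn, query_a, query_b):
    """Return (rows only in A, rows only in B) for two SELECT statements."""
    only_a = conn.execute(
        f"SELECT * FROM ({query_a}) EXCEPT SELECT * FROM ({query_b})"
    ).fetchall()
    only_b = conn.execute(
        f"SELECT * FROM ({query_b}) EXCEPT SELECT * FROM ({query_a})"
    ).fetchall()
    return only_a, only_b


# Hypothetical sample data: the two tables share exactly one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, val TEXT);
CREATE TABLE table2 (id INTEGER, val TEXT);
INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
INSERT INTO table2 VALUES (2, 'y'), (3, 'z');
""")

only_a, only_b = result_difference(
    conn, "SELECT * FROM table1", "SELECT * FROM table2"
)
# The two queries agree on this instance only if both lists are empty.
queries_differ = bool(only_a or only_b)
print(only_a, only_b, queries_differ)
```

If both lists come back empty, the two queries return the same set of rows on that particular instance; this says nothing about other instances.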

Related

Table setup in MSSQL

I am looking for the best way to set up my SQL tables in the following scenario.
I will do my best to explain my situation.
For instance, I have 10 tests: Test1, Test2, Test3, ..., Test10. They all use similar fields, but some tests use different fields depending on the test.
Let's say Test1 uses Field1, Field2, Field3. Test2 uses Field1, Field2, Field4, Field5. I need to store the required field information in a table, but I also need to store which fields each test uses. I will be accessing this info using VB.NET.
I am looking for the best way to set this up. It needs to be somewhat easy to maintain but also have pretty good performance.
My initial thought was to set up two tables: one that would store each test's results and one that would store the fields used for each test.
The results table would have every possible column any of the tests could use. The table recording which fields each test uses would also have all possible columns. In that table, each row would be a test and each column would indicate whether that field is used for that test. So, Test1 would have a 1 in Column1, Column2, and Column3. Test2 would have a 1 in Column1, Column2, Column4, and Column5. This would tell us which fields need to be used when selecting from, updating, or inserting into our results table.
Hopefully that makes sense on what I am trying to accomplish. I am not sure if this is the best way to accomplish my requirements or not.
Any guidance here would be greatly appreciated.
Thanks,
Tony
UPDATE
I just want to clarify that I am using MS SQL.
UPDATE
I also wanted to clarify that my field names aren't actually Field 1, Field2, etc. I am just using that to try and explain what I am trying to accomplish.
What you are asking for, I believe, is a variable-column table, especially if you consider the long-term effects of adding more tests.
One thing for certain, the way NOT to solve the problem is to add a bunch of columns with generic names (field1, 2, 3 .... 40) and use them as you need in hopes that you never need 40. This makes for a design that is highly problematic to develop around and maintain.
A less horrible approach but still problematic is to make a table that pivots fields and makes them each their own row, and associates the two.
A better solution, using modern databases, is to store your objects (tests) in a schema-less way. In a relational database that supports it, you can use a native XML field (or JSON in some databases). In this scenario, you store the metadata about the object in regular fields and store the XML-serialized objects in the XML field. This way, your object can change as needed without a change to the database schema, and you can continue to use meaningful names and data types.
It is important in the relational db scenario to choose a db that has a native xml or json datatype, vs. just using a varchar or blob. The reason is that a native type will include the ability to query the data in ways that are generally more performant than regex.
Of course, this is what NoSQL databases such as MongoDB are great at. I've had good success with both approaches. For simple solutions, I'll generally choose Mongo. For solutions in which I need a relational database anyway, I'll use MS-SQL and SQLXml.
SteveJ
You need four tables. Tests and Fields have a many-to-many relationship, so you need these two tables plus a TestsFields junction table. Finally, you need a results table with TestNumber, FieldNumber and Result fields. This fits the information given in your question, though it's a little ambiguous. You might need to extend this schema to accommodate multiple test sessions/exams, or whatever. You've not given this context, so I can't say.
EDIT:
For instance, let's take a car servicing app as an example.
(Unfortunately you've chosen 'Fields' for one of your tables, and I want to use 'fields' in what follows, so I've distinguished them by capitalisation.) In this scenario, the Tests table would have the fields TestID and Description, with values like 1, 'Pre-delivery Inspection', 2, '5000 mile service' etc., and the Fields table would have FieldID and Description, with values like 1, 'Check tyre pressure', 2, 'Check Handbrake cable' etc.
Then the junction table TestsFields would just consist of the two primary keys.
In this scenario, you would also need another table or two to cover the individual cars, service appointments etc, but let's not get carried away!
The results table would include ServiceApptID, TestID, FieldID, and Result. Result could be free text, where the mechanic could record results, or another lookup to a set of canned responses: 1, 'Adjusted', 2, 'Part Replaced' etc.
Should be enough there to get the idea across.
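As a rough sketch of how the four tables described above could fit together, the snippet below builds them in an in-memory SQLite database through Python's sqlite3 module (SQL Server DDL would differ in data types and syntax details). The column types, the composite keys, and the sample rows beyond those quoted above are assumptions; ServiceApptID is left without a parent table because the appointment tables are out of scope here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Tests (
    TestID      INTEGER PRIMARY KEY,
    Description TEXT NOT NULL
);
CREATE TABLE Fields (
    FieldID     INTEGER PRIMARY KEY,
    Description TEXT NOT NULL
);
-- Junction table: just the two primary keys.
CREATE TABLE TestsFields (
    TestID  INTEGER NOT NULL REFERENCES Tests(TestID),
    FieldID INTEGER NOT NULL REFERENCES Fields(FieldID),
    PRIMARY KEY (TestID, FieldID)
);
-- A result may only be recorded for a field that belongs to the test.
CREATE TABLE Results (
    ServiceApptID INTEGER NOT NULL,
    TestID        INTEGER NOT NULL,
    FieldID       INTEGER NOT NULL,
    Result        TEXT,
    PRIMARY KEY (ServiceApptID, TestID, FieldID),
    FOREIGN KEY (TestID, FieldID) REFERENCES TestsFields(TestID, FieldID)
);
INSERT INTO Tests  VALUES (1, 'Pre-delivery Inspection'), (2, '5000 mile service');
INSERT INTO Fields VALUES (1, 'Check tyre pressure'), (2, 'Check Handbrake cable');
INSERT INTO TestsFields VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO Results VALUES (100, 1, 1, 'Adjusted');
""")

# Which fields does test 1 use?
fields_for_test_1 = [
    row[0]
    for row in conn.execute(
        """
        SELECT f.Description
        FROM TestsFields AS tf
        JOIN Fields AS f ON f.FieldID = tf.FieldID
        WHERE tf.TestID = ?
        ORDER BY f.FieldID
        """,
        (1,),
    )
]
print(fields_for_test_1)
```

Adding an eleventh test or a new field is then a matter of inserting rows, with no change to the table definitions.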
I think a better way is to use a custom character as a separator, for example the | character,
like this:
something|another field|another field data
When you want to read the data, split it by |.
You can also save the selected values this way:
1|3|5|2|0|4|1|2
You need only 2 string fields to store the question and the answer :)

Linking tables in the same SQL Server database to find or match data more easily

I will have multiple tables, depending on how many types of data I receive after reading a file.
So far I have created the tables and inserted all the data into the tables where it belongs.
How can I link those tables together in the same database so that I can find data repeated across different tables?
I need to match the tables against each other so that I can see how many times a value appears in different tables and locate where it appears. Is there any way to do so? My previous code was written in Python with the pyodbc module. This table linking can be done in a SQL Server query, right?
For example, when I want to know how many times the value 4 appears in the column No_Person_in_the_room in two or more tables, the query should show how many times 4 appears across all the tables.
And also
1) Honestly, there should be just one table (PersonRoleRelationship) which holds all relationships between different Person roles (because the same person can have different roles in different relationships). This structure would make the Parent - Child relationship very simple to query. A sample database structure will look like this:
2) If a database redesign is not possible, then you can add a new column holding a hash calculated over the columns you need, which can then be used to compare rows across different tables.
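Here is a minimal sketch of the hash-column idea in option 2, together with a plain UNION ALL count of the value 4 from the question. It runs against in-memory SQLite via Python's sqlite3 module; the table names TableA and TableB, the Room column, and the row_hash helper are all hypothetical. On SQL Server itself, the built-in HASHBYTES function could presumably play the role of row_hash.

```python
import hashlib
import sqlite3


def row_hash(*values):
    """Hash the chosen column values into one comparable hex string."""
    joined = "\x1f".join("" if v is None else str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("row_hash", -1, row_hash)  # -1: any number of arguments
conn.executescript("""
CREATE TABLE TableA (Room TEXT, No_Person_in_the_room INTEGER, RowHash TEXT);
CREATE TABLE TableB (Room TEXT, No_Person_in_the_room INTEGER, RowHash TEXT);
INSERT INTO TableA (Room, No_Person_in_the_room) VALUES ('R1', 4), ('R2', 2);
INSERT INTO TableB (Room, No_Person_in_the_room) VALUES ('R1', 4), ('R3', 4);
""")

# Fill the hash column from the columns that define "the same data".
for table in ("TableA", "TableB"):
    conn.execute(
        f"UPDATE {table} SET RowHash = row_hash(Room, No_Person_in_the_room)"
    )

# Rows that occur in both tables have equal hashes.
matches = conn.execute("""
    SELECT a.Room, a.No_Person_in_the_room
    FROM TableA AS a
    JOIN TableB AS b ON a.RowHash = b.RowHash
""").fetchall()

# How often does the value 4 appear across all the tables?
count_of_4 = conn.execute("""
    SELECT COUNT(*)
    FROM (
        SELECT No_Person_in_the_room FROM TableA
        UNION ALL
        SELECT No_Person_in_the_room FROM TableB
    ) AS all_rows
    WHERE No_Person_in_the_room = 4
""").fetchone()[0]
print(matches, count_of_4)
```

UNION ALL is used rather than UNION so that repeated values are kept and counted.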

Planning out basic TSQL to manipulate table data

Can someone help me get into the right way of thinking about how to fix data in SQL tables (preferably WITHOUT giving me an SQL routine I could run)?
OK, this is the situation. Suppose I have a single table which has a column called ColumnA that contains lots of duplicate values. I need to remove all the duplicate entries from the table in question. The question is: if I had to write pseudo-code as a plan, what SQL should be written?
Many thanks to anyone who can offer me any pointers.
Kind Regards
James
I think you've already articulated a very basic pseudocode for the issue you describe in stating that you wish to delete duplicate values from column A.
For this example, I would tend to:
Find all instances of duplicates.
Work out a method of determining which one to keep (Google "MAX N in Group" for ideas). There are good articles here on SO and DBA Stack Exchange, as well as external articles with examples.
Write your delete to cater for the records you identified as unwanted duplicates in the previous step.
For me, when working through these types of issues in SQL Server, I tend to write a series of Common Table Expressions (CTEs) to identify my target records and then delete based on that.
For example;
;WITH Duplicates AS (
-- Write your select query to identify which subset of your records are affected by having duplicate values in Column A
), TargetRows AS (
-- Further select from Duplicates some method of MAX N in Group to identify which of the rows are unwanted
)
-- Then DELETE from your table here, based upon your findings from above
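For readers who do want to see the skeleton filled in, here is one possible sketch, run against in-memory SQLite through Python's sqlite3 module rather than SQL Server. The table name MyTable, the Id key column, and the rule "keep the row with the lowest Id" are assumptions chosen for illustration; the right rule for which duplicate to keep depends on your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, ColumnA TEXT);
INSERT INTO MyTable (ColumnA) VALUES ('a'), ('b'), ('a'), ('c'), ('b'), ('a');
""")

conn.execute("""
WITH Duplicates AS (
    -- Step 1: values of ColumnA that occur more than once
    SELECT ColumnA
    FROM MyTable
    GROUP BY ColumnA
    HAVING COUNT(*) > 1
), TargetRows AS (
    -- Step 2: within each duplicated value, every row except the lowest Id
    SELECT t.Id
    FROM MyTable AS t
    JOIN Duplicates AS d ON d.ColumnA = t.ColumnA
    WHERE t.Id > (SELECT MIN(k.Id) FROM MyTable AS k WHERE k.ColumnA = t.ColumnA)
)
-- Step 3: delete only the rows identified as unwanted
DELETE FROM MyTable WHERE Id IN (SELECT Id FROM TargetRows)
""")
conn.commit()

remaining = conn.execute("SELECT Id, ColumnA FROM MyTable ORDER BY Id").fetchall()
print(remaining)
```

Before running a delete like this on real data, it is worth running the TargetRows query on its own as a SELECT to check that it picks exactly the rows you expect.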

T-SQL: query which joins with all dependent tables and produces a Cartesian product

I have a bunch of tables which refer to some number of other tables (zero, one, two or more).
My example tables might contain the following columns:
Id | StatementTable1Id | StatementTable2Id | Value
where StatementTable1 will contain the following columns:
Id | Name | Label
I wish to get all possible combinations and join all of them.
I found this link very useful (query which produce information about dependencies).
I would imagine my code as follows:
Prepare a list of the tables which I wish to query.
Run the linked query for all my tables and save the results into a temporary table.
Check the maximum number of dependent tables. Prepare a query template. For example, if the maximum number of dependent tables is two:
Select
Id, '%Table1Name%' as Table1Name,
'%StatementLabelTable1%' as StatementLabelTable1,
'%Table2Name%' as Table2Name,
'%StatementLabelTable2%' as StatementLabelTable2, Value
Use a cursor: for each dependent table, replace the appropriate placeholder with the dependent table's name and the label of the elements within it.
When all dependent tables have been used, replace all remaining placeholders with an empty string.
Add "UNION ALL" and proceed to the next table.
Run the query.
Could you tell me if there's any easier or better way?
What you've listed there sounds like what you'll need to do if you don't know the column details ahead of time. There's likely going to be some trial and error to get the details correct, but it's a good plan to start with.
That being said, why on earth would you want to do such a thing? It sounds like you need to narrow down your requirements on what data is actually needed. Otherwise, as you add data to your database, this query and the resulting data set are going to quickly become quite unwieldy (these data sets are the kinds you hear about becoming daily "door-stop reports"; no one uses them, but no one remembers why they were created, so they keep running the report and just use it as a door-stop).

Database design - do I need one or two database fields for this?

I am putting together a schema for a database. The goal of the database is to track applications in our department. I have a repeated problem that I am trying to solve.
For example, I have an "Applications" table. I want to keep track of whether each application uses a database or a bug tracking system, so right now I have these fields in the Applications table:
Table: Applications
UsesDatabase (bit)
Database_ID (int)
UsesBugTracking (bit)
BugTracking_ID (int)
Table: Databases
id
name
Table: BugTracking
id
name
Should I consolidate the "uses" column with the respective ID columns so there is only one bug tracking column and only one database column in the applications table?
Any best practice here for database design?
NOTE: I would like to run reports like "Percent of Applications that use bug tracking" (although I guess either approach could generate this data).
You could remove the "uses" fields and make the id columns nullable, and let a null value mean that it doesn't use the feature. This is a common way of representing a missing value.
Edit:
To answer your note, you can easily get those statistics like this:
select
count(*) as TotalApplications,
count(Database_ID) as UsesDatabase,
count(BugTracking_ID) as UsesBugTracking
from
Applications
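The query above works because COUNT(column) skips NULLs while COUNT(*) counts every row. As a small sketch of turning those counts into the requested percentage, the snippet below runs the same query on in-memory SQLite via Python's sqlite3 module; the id and name columns and the four sample applications are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Applications (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    Database_ID    INTEGER,   -- NULL means "does not use a database"
    BugTracking_ID INTEGER    -- NULL means "does not use bug tracking"
);
INSERT INTO Applications (name, Database_ID, BugTracking_ID) VALUES
    ('App1', 1,    NULL),
    ('App2', 2,    1),
    ('App3', NULL, NULL),
    ('App4', 1,    1);
""")

total, uses_database, uses_bug_tracking = conn.execute("""
    SELECT
        COUNT(*)              AS TotalApplications,
        COUNT(Database_ID)    AS UsesDatabase,
        COUNT(BugTracking_ID) AS UsesBugTracking
    FROM Applications
""").fetchone()

# Guard against an empty table before dividing.
percent_bug_tracking = 100.0 * uses_bug_tracking / total if total else 0.0
print(total, uses_database, uses_bug_tracking, percent_bug_tracking)
```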
Why not get rid of the two Uses fields and simply let a NULL value in the _ID fields indicate that the record does not use that feature (bug tracking or database)?
Either solution works. However, if you think you may want to occasionally just get a list of applications which do / do not have databases / bugtracking consider that having the flag fields reduces the query by one (or two) joins.
Having the bit fields is slightly denormalized, as you have to keep two fields in sync to keep one piece of data updated, but I tend to prefer them for cases like this for the reason I gave in the prior paragraph.
Another option would be to have the field nullable, and put null in it for those entries which do not have DBs / etc, but then you run into problems with foreign key constraints.
I don't think there is any one supreme right way, just consider the tradeoffs and go with what makes sense for your application.
I would use 3 tables for the objects: Application, Database, and BugTracking. Then I would use 2 join tables to do 1-to-many joins: ApplicationDatabases, and ApplicationBugTracking.
The 2 join tables would have both an application_id and the id of the other table. If an application used a single database, it would have a single ApplicationDatabases record joining them together. Using this setup, an application could have 0 database (no records for this app in the ApplicationDatabases table), or many databases (multiple records for this app in the ApplicationDatabases table).
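A minimal sketch of the join-table layout for the database side, again on in-memory SQLite through Python's sqlite3 module. The column names, the composite primary key, and the sample rows are assumptions; an ApplicationBugTracking table would presumably follow the same pattern. The table name Database is quoted because it can clash with a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Application (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE "Database"  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE ApplicationDatabases (
    application_id INTEGER NOT NULL REFERENCES Application(id),
    database_id    INTEGER NOT NULL REFERENCES "Database"(id),
    PRIMARY KEY (application_id, database_id)
);
INSERT INTO Application VALUES (1, 'App1'), (2, 'App2'), (3, 'App3');
INSERT INTO "Database"  VALUES (1, 'Orders'), (2, 'Reporting');
-- App1 uses two databases, App2 uses one, App3 uses none.
INSERT INTO ApplicationDatabases VALUES (1, 1), (1, 2), (2, 1);
""")

# LEFT JOIN keeps applications that have no row in the join table.
db_counts = conn.execute("""
    SELECT a.name, COUNT(ad.database_id) AS database_count
    FROM Application AS a
    LEFT JOIN ApplicationDatabases AS ad ON ad.application_id = a.id
    GROUP BY a.id, a.name
    ORDER BY a.id
""").fetchall()
print(db_counts)
```

The trade-off against the nullable-column design is an extra join in every report, in exchange for letting one application use several databases.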
"Should i consolidate the "uses" column"
If I look at your problem statement, then there either is no "uses" column at all, or there are two. In either case, it is wrong of you to speak of "THE" uses column.
May I politely suggest that you learn to be PRECISE when asking questions ?
Yes using null in the foreign key fields should be fine - it seems superfluous to have the bit fields.
Another way of doing it (though it might be considered evil by database people ^^) is to default them to 0 and add in an ID 0 data row in both bugtrack and database tables with a name of "None"... when you do the reports, you'll have to do some more work unless you present the "None" values as they are as well with a neat percentage...
To answer the edited question-
Yes, the fields should be combined, with NULL meaning that the application doesn't have a database (or bug tracker).
