Database Design For reporting comparison results [closed] - sql-server

We are going to design a table architecture. I want to compare the same kind of data coming from different sources, say Source_A and Source_B. I have to compare a few attributes and identify the cases below:
Mismatches in attribute1
Mismatches in attribute2
Data that is missing in Source_A
Data that is missing in Source_B
Finally, I have to report this in Power BI with charts. For now I have 2 tables, A_DATA and B_DATA, to store the incoming data, and both have the structure below (this is just a sample; I have a lot more columns):
+---------------+
| Columns |
+---------------+
| Material_ID |
+---------------+
| Material_Name |
+---------------+
| Material_Type |
+---------------+
| Quantity |
+---------------+
Now I'm confused whether I should create separate tables for the 4 cases (Mismatch, Source_A missing, Source_B missing), or keep everything in a single table with one more column called Status. This is for reporting in Power BI (like out of 1K rows, 5K are mismatches). Please suggest which one is better for reporting. I'm really confused.

I would say neither of your two options (an additional column, or a new table) is optimal. I think this would be best handled with a view. Something like:
CREATE VIEW MisMatches
AS
SELECT  Material_ID = ISNULL(a.Material_ID, b.Material_ID),
        Status = CASE WHEN a.Material_ID IS NULL THEN 'Missing A'
                      WHEN b.Material_ID IS NULL THEN 'Missing B'
                      WHEN a.Material_Name <> b.Material_Name THEN 'Mismatch Name'
                      WHEN a.Material_Type <> b.Material_Type THEN 'Mismatch Type'
                      WHEN a.Quantity <> b.Quantity THEN 'Mismatch Quantity'
                 END,
        MaterialName_A = a.Material_Name,
        MaterialName_B = b.Material_Name,
        Material_Type_A = a.Material_Type,
        Material_Type_B = b.Material_Type,
        Quantity_A = a.Quantity,
        Quantity_B = b.Quantity
FROM    A_Data AS a
        FULL JOIN B_Data AS b
            ON b.Material_ID = a.Material_ID
WHERE   CHECKSUM(a.Material_Name, a.Material_Type, a.Quantity) <> CHECKSUM(b.Material_Name, b.Material_Type, b.Quantity);
This short-circuits on your Status column, which may not be what you want: if the name, quantity and type all fail to match, the status will only tell you that the name is a mismatch. If you want all mismatches, you will need to extend the case expression slightly. Also, if any of your columns are nullable, you will need to handle this in the Status case expression, e.g.
WHEN a.Quantity <> b.Quantity OR a.Quantity IS NULL OR b.Quantity IS NULL THEN ...
I have also had to make an assumption about how you identify a match, but hopefully this gives the general gist of it.
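If every mismatch needs to be reported rather than just the first one, one option is a separate flag column per attribute instead of a single Status value. The following is only a sketch under the same assumption as the view (rows are matched on Material_ID); the flag column names are made up for illustration:

```sql
-- Sketch: one 0/1 flag per case, so a row that differs in several
-- attributes reports all of them. Flag names are hypothetical.
SELECT  Material_ID       = ISNULL(a.Material_ID, b.Material_ID),
        Missing_A         = CASE WHEN a.Material_ID IS NULL THEN 1 ELSE 0 END,
        Missing_B         = CASE WHEN b.Material_ID IS NULL THEN 1 ELSE 0 END,
        Name_Mismatch     = CASE WHEN a.Material_Name <> b.Material_Name THEN 1 ELSE 0 END,
        Type_Mismatch     = CASE WHEN a.Material_Type <> b.Material_Type THEN 1 ELSE 0 END,
        Quantity_Mismatch = CASE WHEN a.Quantity <> b.Quantity THEN 1 ELSE 0 END
FROM    A_Data AS a
        FULL JOIN B_Data AS b
            ON b.Material_ID = a.Material_ID;
```

Because <> against NULL never evaluates to true, rows missing from one source get 0 in every mismatch flag, and a NULL on only one side of a nullable column is not flagged either, so nullable columns would still need the extra handling described above. Each flag can then be summed directly in a report.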
Edit
There is a better way of doing this rather than CHECKSUM:
CREATE VIEW MisMatches
AS
SELECT  Material_ID = ISNULL(a.Material_ID, b.Material_ID),
        Status = CASE WHEN a.Material_ID IS NULL THEN 'Missing A'
                      WHEN b.Material_ID IS NULL THEN 'Missing B'
                      WHEN a.Material_Name <> b.Material_Name THEN 'Mismatch Name'
                      WHEN a.Material_Type <> b.Material_Type THEN 'Mismatch Type'
                      WHEN a.Quantity <> b.Quantity THEN 'Mismatch Quantity'
                 END,
        MaterialName_A = a.Material_Name,
        MaterialName_B = b.Material_Name,
        Material_Type_A = a.Material_Type,
        Material_Type_B = b.Material_Type,
        Quantity_A = a.Quantity,
        Quantity_B = b.Quantity
FROM    A_Data AS a
        FULL JOIN B_Data AS b
            ON b.Material_ID = a.Material_ID
WHERE   NOT EXISTS
        (   SELECT a.Material_Name, a.Material_Type, a.Quantity
            INTERSECT
            SELECT b.Material_Name, b.Material_Type, b.Quantity
        );
I discovered this by reading the following article: Undocumented Query Plans: Equality Comparisons
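The reason the INTERSECT form helps is that set operators treat two NULLs as equal, whereas = and <> do not. A minimal illustration, using two throwaway variables that are not part of the original schema:

```sql
-- @x and @y are hypothetical variables, used only to show the NULL handling.
DECLARE @x int = NULL, @y int = NULL;

SELECT
    -- NULL = NULL evaluates to UNKNOWN, so the ELSE branch is taken
    PlainCompare     = CASE WHEN @x = @y THEN 'equal' ELSE 'not equal' END,
    -- INTERSECT treats the two NULLs as the same value, so a row comes back
    IntersectCompare = CASE WHEN EXISTS (SELECT @x INTERSECT SELECT @y)
                            THEN 'equal' ELSE 'not equal' END;
```

With the default ANSI_NULLS setting, the first column should come back as 'not equal' and the second as 'equal', which is why the NOT EXISTS ... INTERSECT filter copes with nullable columns without extra IS NULL checks.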

Return Parts of an Array in Postgres

I have a column (text) in my Postgres DB (v.10) that holds JSON.
As far as I know, it has an array format.
Here is a fiddle example: Fiddle
If table1 = Persons and change_type = Create, then I only want to return the name and firstname concatenated as one field and clear the rest of the text.
Output should be like this:
id table1 did execution_date change_type attr context_data
1 Persons 1 2021-01-01 Create Name [["+","name","Leon Bill"]]
1 Persons 2 2021-01-01 Update Firt_name [["+","cur_nr","12345"],["+","art_cd","1"],["+","name","Leon"],["+","versand_art",null],["+","email",null],["+","firstname","Bill"],["+","code_cd",null]]
1 Users 3 2021-01-01 Create Street [["+","cur_nr","12345"],["+","art_cd","1"],["+","name","Leon"],["+","versand_art",null],["+","email",null],["+","firstname","Bill"],["+","code_cd",null]]
Disassemble the JSON array into a SETOF using the json_array_elements function, then assemble it back into the structure you want.
select m.*
     , case
         when m.table1 = 'Persons' and m.change_type = 'Create'
         then (
           select '[["+","name",' || to_json(string_agg(a.value->>2,' ' order by a.value->>1 desc))::text || ']]'
           from json_array_elements(m.context_data::json) a
           where a.value->>1 in ('name','firstname')
         )
         else m.context_data
       end as context_data
  from mutations m
modified fiddle
(Note:
relying on the alphabetical ordering of the required field names is a little bit dirty; an explicit order by case could improve readability
the resulting JSON is assembled from string literals as much as possible, since you didn't specify whether "+" should be taken from any of the original array elements
the to_json()::text is just for safety against injection
)
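The "explicit order by case" mentioned in the note could look roughly like this. It is the same query with only the ordering inside string_agg changed; treat it as a sketch rather than a tested drop-in replacement:

```sql
-- Same query as above, but with the ordering of the two fields spelled out
-- instead of relying on 'name' sorting after 'firstname'.
select m.*
     , case
         when m.table1 = 'Persons' and m.change_type = 'Create'
         then (
           select '[["+","name",'
                  || to_json(string_agg(a.value->>2, ' '
                       order by case a.value->>1
                                  when 'name' then 1       -- name first
                                  when 'firstname' then 2  -- then firstname
                                end))::text
                  || ']]'
           from json_array_elements(m.context_data::json) a
           where a.value->>1 in ('name','firstname')
         )
         else m.context_data
       end as context_data
  from mutations m
```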

Adding multiple records from a string

I have a string of email addresses. For example, "a@a.com; b@a.com; c@a.com"
My database is:
record | flag1 | flag2 | emailaddress
--------------------------------------------------------
1 | 0 | 0 | a@a.com
2 | 0 | 0 | b@a.com
3 | 0 | 0 | c@a.com
What I need to do is parse the string, and if an address is not in the database, add it.
Then, return a string of just the record numbers that correspond to the email addresses.
So, if the call is made with "A@a.com; c@a.com; d@a.com", the routine would add "d@a.com", then return "1, 3, 4", corresponding to the records that match the email addresses.
What I am doing now is calling the database once per email address to look it up and confirm it exists (adding it if it doesn't), then looping through them again from my PowerShell app, one by one, to collect the record numbers.
There has to be a way to just pass all of the addresses to SQL at the same time, right?
I have it working in PowerShell, but slowly.
I'd love a response from SQL as shown above of just the record number for each email address in a single response. That is, "1,2,4" etc.
My powershell code is:
$EmailList2 = $EmailList.split(";")
# Let's get the ID number for each email address.
foreach ($x in $EmailList2)
{
    $data = exec-query "select Record from emailaddresses where emailAddress = @email" -parameter @{email=$x.trim()} -conn $connection
    if ($($data.Tables.record) -gt 0)
    {
        $ResponseNumbers = $ResponseNumbers + "$($data.Tables.record), "
    }
}
$ResponseNumbers = $($ResponseNumbers+"XX").replace(", XX","")
return $ResponseNumbers
You'd have to do this in 2 steps. Firstly INSERT the new values and then use a SELECT to get the values back. This answer uses delimitedsplit8k (not delimitedsplit8k_LEAD) as you're still using SQL Server 2008. On the note of 2008 I strongly suggest looking at upgrade paths soon as you have about 6 weeks of support left.
You can use the function to split the values and then INSERT/SELECT appropriately:
DECLARE @Emails varchar(8000) = 'a@a.com;b@a.com;c@a.com';
WITH Emails AS(
    SELECT DS.Item AS Email
    FROM dbo.DelimitedSplit8K(@Emails,';') DS)
INSERT INTO dbo.YourTable (emailaddress) --I don't know what the other columns' values should be, so have excluded them
SELECT E.Email
FROM Emails E
     LEFT JOIN dbo.YourTable YT ON YT.emailaddress = E.Email
WHERE YT.emailaddress IS NULL; --only addresses that are not in the table yet
SELECT YT.record
FROM dbo.YourTable YT
     JOIN dbo.DelimitedSplit8K(@Emails,';') DS ON DS.Item = YT.emailaddress;
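Since the question asks for a single string such as "1, 3, 4" rather than a result set, the final SELECT could be collapsed into one value. On SQL Server 2008 a common way to do that is FOR XML PATH('') combined with STUFF. This is a sketch only: it reuses dbo.YourTable and dbo.DelimitedSplit8K from above, and the LTRIM/RTRIM call is an added assumption to cope with the "; " separators shown in the question:

```sql
DECLARE @Emails varchar(8000) = 'a@a.com; c@a.com; d@a.com';

SELECT RecordList = STUFF(
    (
        SELECT ', ' + CAST(YT.record AS varchar(12))
        FROM dbo.YourTable YT
             JOIN dbo.DelimitedSplit8K(@Emails, ';') DS
               ON LTRIM(RTRIM(DS.Item)) = YT.emailaddress  -- trim the space after each ';'
        ORDER BY YT.record
        FOR XML PATH('')   -- concatenates the rows into one string
    ), 1, 2, '');          -- strips the leading ', '
```

The same trimming would be needed in the INSERT step if the incoming string really contains spaces after the semicolons; otherwise addresses would be stored with a leading space.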

converting from file path in a column to reference another table with an id number for a filepath

Using Microsoft SQL Server 2012
I've been trying for weeks to get this sorted, following on from my original question about getting a query to work; it runs, but not correctly. The old questions table I'm getting data from has a "FilePath" column that points to the question's media file in a folder on the local machine. It is structured like this (some columns are not included):
Old question table
QuizQuestionID MasterQuestionID MasterQuestionGUID MasterTypeID MasterDifficultyID MasterCategoryID MasterDecadeID QuizQuestionTypeID QuizQuestionDifficultyID QuizQuestionCategoryID QuizQuestionDecadeID QuestionText AnswerText FilePath IsEditable IsDeletable IsDeleted IsDifficultyOverridden CreatedDate ModifiedDate AUTO_UseCount AUTO_TieBreakerUsageCount AUTO_TieBreakerLastUsed
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1222 1755 0bee472592ce78e7457d87d7a172ff7b 3 2 7 3 3 2 7 3 Name the singer. David Essex /Sounds/CHORUS David Essex - Tahiti.mp3 False True False False 2014-01-18 12:53:59.000 2014-02-07 12:28:55.000 0 NULL NULL
1223 1756 1df7bd191ef5e31b854c7de5f18982d0 1 2 11 NULL 1 2 11 NULL What is this savoury item? Green Chili /Images/General/Greeen Chili.png False True False False 2014-01-18 15:17:39.000 2014-01-26 19:46:00.000 0 NULL NULL
In the new question table, the media column references another table called Media2, which holds the local file path of each file. These are structured as follows:
New question table
id uuid type question answer media created_date modified_date created_user modified_user master_category master_decade master_difficulty is_editable is_deletable multiple_choice choice_1 choice_2 choice_3 choice_4 blur_effect related square_1 square_2 square_3 square_4 tie_breaker
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
8033 07B3D24A-6FFA-40AF-B723-B78C9899D4B4 Audio Name the singer. David Essex 21488 2015-03-23 11:51:31.000 2016-03-08 15:21:48.697 NULL NULL 7 2 1 False False False NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
17395 48E555BD-52D6-4358-89E5-8FEE0F0F3AFD Text What is this savoury item? Green chili 19459 2013-09-10 23:51:35.460 2013-09-13 12:51:53.963 NULL NULL 12 NULL 1 False False NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
Media2 table
id UUID name Path Mime/type directory used for Category folder import folder
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
21488 c25183d2-aad7-4c16-8869-9eac711cc39b CHORUS David Essex - Tahiti Quiz C:\media\audio\c25183d2-aad7-4c16-8869-9eac711cc39b.mp3 audio/mp3 Audio Quiz Audio NULL NULL
19459 642db3c3-c531-4818-9b0d-4d7ccd35e0f9 Green chili C:\media\images\642db3c3-c531-4818-9b0d-4d7ccd35e0f9.png image/png Images Quiz Images NULL NULL
Just need to add that the "UUID" is a random identifier the table uses as the file name in the media folders, so that files with the same name but different content don't collide. The files are also used by other software apart from the quiz, so I can't use the UUID as a reference when building the query.
Below is the whole query that converts the old questions table to the new questions table. The SUBSTRING part converts the format from old to new, and that works fine, but the query returns NULL in the "media" column for every image or sound question. It is supposed to match the FilePath column from the old table against the Media2 table and then import all rows into the new question table, with the "media" column referencing Media2 for file paths.
--Specify the database, table and columns we want to insert into
INSERT INTO NewDatabase.dbo.quizquestions (uuid, [type], question, answer, created_date, modified_date, master_category, master_decade, master_difficulty, is_editable, is_deletable, media)
--Get the data from the old database (old_database) and map the media filename to the media2 table in the new database
select
questions.uuid,
questions.newtype,
questions.QuestionText,
questions.AnswerText,
questions.CreatedDate,
questions.ModifiedDate,
questions.master_category,
questions.master_decade,
questions.master_difficulty,
questions.IsEditable,
questions.IsDeletable,
media.id as mediaid from
(
select NEWID() as uuid,
CASE
WHEN qqt.TypeName = 'Sound' THEN 'Audio'
ELSE qqt.TypeName
END AS newtype,
qq.QuestionText, qq.AnswerText,
CASE
WHEN qq.createdDate is not NULL THEN qq.createdDate
ELSE GETDATE()
END as createdDate,
CASE
WHEN qq.ModifiedDate is not NULL THEN qq.ModifiedDate
ELSE GETDATE()
END as ModifiedDate,
CASE
WHEN cat.id is NOT NULL THEN cat.id
ELSE 1
END as master_category,
qqdc.QuizQuestionDecadeID as master_decade,
qd.id as master_difficulty,
qq.IsEditable, qq.IsDeletable,
qq.FilePath,
SUBSTRING(
qq.FilePath,
(len(qq.FilePath) - charindex('/', reverse(qq.FilePath)) + 2),
(len(qq.FilePath) - charindex('.', reverse(qq.FilePath))) - (len(qq.FilePath) - charindex('/', reverse(qq.FilePath))) -1 ) as fpath
-- (len(qq.FilePath) - charindex('.', reverse(qq.FilePath)) as positionoflastdot),
--(len(qq.FilePath) - charindex('/', reverse(qq.FilePath)) as positionoflastslash),
from Olddatabase.dbo.QuizQuestion qq
left join Olddatabase.dbo.QuizQuestionType qqt on qq.QuizQuestionTypeID = qqt.QuizQuestionTypeID
left join Olddatabase.dbo.QuizQuestionDifficulty qqd on qq.QuizQuestionDifficultyID = qqd.QuizQuestionDifficultyID
left join Olddatabase.dbo.QuizQuestionCategory qqc on qq.QuizQuestionCategoryID = qqc.QuizQuestionCategoryID
left join Olddatabase.dbo.QuizQuestionDecade qqdc on qq.QuizQuestionDecadeID = qqdc.QuizQuestionDecadeID
left join Olddatabase.dbo.QuizQuestionCategory qqmc on qq.MasterCategoryID = qqmc.MasterQuestionCategoryID
left join Newdatabase.dbo.QuizCategories cat on qqc.CategoryName = cat.name
left join Newdatabase.dbo.QuizDifficulties qd on qd.id = qq.MasterDifficultyID
where qq.MasterCategoryID is not null
) as questions
left join Newdatabase.dbo.Media2 media on media.name = replace(fpath, 'quiz','')
I can't really change the structure of the tables, as the software relies heavily on it, and changing it manually would be painful and time consuming, as there are over 2000 audio/image questions.
I'm just getting my head around SQL databases and queries, but this has stopped me dead.
Any help changing the above query or creating something entirely different would be greatly appreciated.
Just looking at it, the code seems properly structured. Faced with something like this, I'd "cut down" the query to only the relevant parts and debug from there, tossing in extra lines like those you currently have commented out to review what each part of the function calls returns. Pick it apart, look at all the details, and you'll eventually figure out what's not lining up. The following should work as an initial "minimum query":
select
   questions.FilePath
  ,questions.fpath
  ,media.id as mediaid
 from (-- Build the necessary filepath. Running just the subquery can help figure things out too
       select
          qq.FilePath
         ,SUBSTRING(
            qq.FilePath
           ,(len(qq.FilePath) - charindex('/', reverse(qq.FilePath)) + 2)
           ,(len(qq.FilePath) - charindex('.', reverse(qq.FilePath))) - (len(qq.FilePath) - charindex('/', reverse(qq.FilePath))) -1 ) as fpath
        from Olddatabase.dbo.QuizQuestion qq
        where qq.MasterCategoryID is not null
      ) as questions
  left join Newdatabase.dbo.Media2 media
   on media.name = replace(fpath, 'quiz','')
Mess with this and it should be easier to figure things out.
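As a concrete instance of picking the expression apart, the SUBSTRING logic can be run against a literal path taken from the sample data in the question, with the two intermediate offsets exposed. The variable @FilePath and the column aliases are only illustrative:

```sql
-- @FilePath is a hypothetical variable holding one sample path from the question.
DECLARE @FilePath varchar(260) = '/Sounds/CHORUS David Essex - Tahiti.mp3';

SELECT
    -- number of characters before the last '/'
    LEN(@FilePath) - CHARINDEX('/', REVERSE(@FilePath)) AS chars_before_last_slash,
    -- number of characters before the last '.'
    LEN(@FilePath) - CHARINDEX('.', REVERSE(@FilePath)) AS chars_before_last_dot,
    -- the file name without folder or extension, exactly as the big query builds it
    SUBSTRING(
        @FilePath,
        LEN(@FilePath) - CHARINDEX('/', REVERSE(@FilePath)) + 2,
        (LEN(@FilePath) - CHARINDEX('.', REVERSE(@FilePath)))
          - (LEN(@FilePath) - CHARINDEX('/', REVERSE(@FilePath))) - 1
    ) AS fpath;
```

For this path the expression yields CHORUS David Essex - Tahiti, so the next thing to check is whether Media2.name holds exactly that text. Note also that, if the sample rows are spelled the way the real data is, the second file is named Greeen Chili in the old table but Green chili in Media2, which an equality join would not match.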
Bonus advice: code like
CASE
WHEN qq.createdDate is not NULL THEN qq.createdDate
ELSE GETDATE()
END as createdDate,
can be replaced with the shorter and more legible form
isnull(qq.createdDate, getdate()) createdDate,

How can I write a better stored procedure to return all records with a flag/bit field = 1 in which I pass the flag-field's name into the sproc?

I'm dealing with a substantially more complicated schema (of course), but I think I'm including just enough in this question to show what I need and no more. (I'll update as needed based on comments.) Let's say I have this table:
create table peeps (
id int primary key identity(1,1),
name varchar(50),
eligible bit,
lefty bit,
contractor bit
)
And I want to write a sproc that's going to return the names of all my peeps if they are eligible or lefties or contractors and I want to have a single procedure to handle all those cases.
My current approach to this (which seems to totally work) looks like this:
CREATE PROCEDURE getFlagMatchingPeeps @flagName varchar(30)
AS
select name
from peeps
where
    (eligible = 1 or eligible = case when @flagName = 'eligible' then 1 else 0 end) and
    (lefty = 1 or lefty = case when @flagName = 'lefty' then 1 else 0 end) and
    (contractor = 1 or contractor = case when @flagName = 'contractor' then 1 else 0 end)
But that feels like an ugly solution (e.g. testing whether a value is 1 twice feels wasteful). So I'm here looking for more advanced SQL people to help me craft something better.
I guess first, is there anything wrong with my solution? If not, this can be quick. But if there is:
What is it?
What's the right approach?
How should I have identified the problem?
How should I be thinking differently in order to solve this?
If you just wanted to get rid of the extra comparison, you could write the following (for simplicity, this assumes that the bit columns are non-nullable, contrary to the defined schema):
SELECT
    [name]
FROM
    [dbo].[peeps]
WHERE
    ([eligible] = CASE WHEN @flagName = 'eligible' THEN 1 ELSE [eligible] END) AND
    ([lefty] = CASE WHEN @flagName = 'lefty' THEN 1 ELSE [lefty] END) AND
    ([contractor] = CASE WHEN @flagName = 'contractor' THEN 1 ELSE [contractor] END);
Ignoring the fact that you are filtering on bit columns, in general there will be performance issues with queries using dynamic search parameters unless you use OPTION (RECOMPILE) or dynamic SQL, due to parameter sniffing. See Erland Sommarskog's article on this topic.
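For reference, a dynamic SQL version of the procedure might look something like the sketch below. It is not taken from the article; the whitelist check, the error message, and the dbo schema are assumptions added here so that only known column names can reach the generated statement:

```sql
CREATE PROCEDURE dbo.getFlagMatchingPeeps
    @flagName nvarchar(128)
AS
BEGIN
    SET NOCOUNT ON;

    -- Whitelist: only the known bit columns may be used as a filter.
    IF @flagName IS NULL
       OR @flagName NOT IN (N'eligible', N'lefty', N'contractor')
    BEGIN
        RAISERROR('Unknown flag name: %s', 16, 1, @flagName);
        RETURN;
    END;

    -- Build a statement that filters on exactly one column.
    -- QUOTENAME brackets the identifier as a second line of defence.
    DECLARE @sql nvarchar(max) =
        N'SELECT [name] FROM dbo.peeps WHERE ' + QUOTENAME(@flagName) + N' = 1;';

    EXEC sys.sp_executesql @sql;
END;
```

Each flag then gets its own simple statement and plan. The lighter alternative is to keep the static query and append OPTION (RECOMPILE) to it, at the cost of a compile on every call.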

Social Network Database Design - Friend/Block Relationships

I'm working on a social networking site and need users to be able to friend each other and/or block each other. The way I see it, the relationship between 2 users can be Friend, Pending, Block, or NULL. I'd like to have a single view that shows a single row for each confirmed relationship. My view properly shows the relationship, but I had to use a workaround to show only 1 row per relationship without unioning the table with itself and swapping the order of Requestor and Requestee.
Anybody have any ideas about how to clean this up?
Thanks,
- Greg
Relationship Table:
Requestor (int) | Requestee (int) | ApprovedTimestamp (smalldatetime) | IsBlock (bit)
vwRelationship View:
SELECT DISTINCT
    CASE WHEN f.Requestor < f.Requestee THEN f.Requestor ELSE f.Requestee END AS UserA,
    CASE WHEN f.Requestor < f.Requestee THEN f.Requestee ELSE f.Requestor END AS UserB,
    CASE WHEN b.Requestor IS NULL AND b.Requestee IS NULL
         THEN CASE WHEN f.AcceptedTimestamp IS NULL THEN 'Pending' ELSE 'Friend' END
         ELSE 'Block'
    END AS Type
FROM dbo.Relationship AS f
LEFT OUTER JOIN
    (SELECT Requestor, Requestee
     FROM dbo.Relationship
     WHERE (IsBlock = 1)) AS b
    ON (f.Requestor = b.Requestor AND f.Requestee = b.Requestee)
    OR (f.Requestor = b.Requestee AND f.Requestee = b.Requestor)
Example Query:
Select Type From vwRelationship Where (UserA = 1 AND UserB = 2) OR (UserA = 2 AND UserB = 1)
Scenario:
User 1 and User 2 don't know each other | Relationship Type = NULL
User 1 friends User 2 | Relationship Type = Pending
User 2 accepts | Relationship Type = Friend
a month later User 2 blocks User 1 | Relationship Type = Block
Here's what I ended up using:
Table - Relationship
RelationshipID, RelationshipTypeID, CreatedByUserID, CreatedTimestamp
Table - RelationshipType
RelationshipTypeID, RelationshipTypeName
Table - UserRelationship
UserID, RelationshipID, IsPending
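For illustration, the three tables listed above might be declared roughly as follows. The data types, keys, and defaults are assumptions, since only the column names are given:

```sql
-- Lookup table for the kind of relationship (e.g. Friend, Block).
CREATE TABLE dbo.RelationshipType (
    RelationshipTypeID   int         NOT NULL PRIMARY KEY,
    RelationshipTypeName varchar(50) NOT NULL UNIQUE
);

-- One row per relationship, regardless of how many users take part.
CREATE TABLE dbo.Relationship (
    RelationshipID     int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    RelationshipTypeID int           NOT NULL
        REFERENCES dbo.RelationshipType (RelationshipTypeID),
    CreatedByUserID    int           NOT NULL,
    CreatedTimestamp   smalldatetime NOT NULL DEFAULT (GETDATE())
);

-- One row per user taking part in a relationship.
CREATE TABLE dbo.UserRelationship (
    UserID         int NOT NULL,
    RelationshipID int NOT NULL
        REFERENCES dbo.Relationship (RelationshipID),
    IsPending      bit NOT NULL DEFAULT (0),
    PRIMARY KEY (UserID, RelationshipID)
);
```

With this shape there is one Relationship row per pair, so no swapping of Requestor and Requestee is needed to avoid duplicate rows.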
Anybody think of anything better?
