Find fields and tables that can make a report blank - sql-server

Consider the situation where I have a report (a stored procedure) in SQL Server. I would like to know which fields and tables need to be populated in my database for that report to return rows. For a small procedure like this:
CREATE PROCEDURE dbo.getWorkOrders @status nvarchar(10)
AS
SELECT Member.Member_Name, Member.Member_ID, WorkOrder.Technician_ID, WorkOrder.Status
FROM Member
INNER JOIN WorkOrder ON WorkOrder.Member_ID = Member.Member_ID
WHERE WorkOrder.Status LIKE @status
For the example above, I would define the required data by looking at the inner joins and the items in the WHERE clause. In this example, there must be rows in the Member table with Member_IDs, and there must also be rows in WorkOrder with Member_IDs and statuses. I'm not concerned if the procedure returns nothing because the user enters a status that doesn't exist in the WorkOrder table, but I am concerned if the WorkOrder table was loaded with rows that have no status, or if no rows were loaded for Member or WorkOrder.
To put it another way, a routine task for me is to find out what data needs to be loaded in a warehouse so that we can test the stored procedures. I'm currently doing this manually as I described in the example above, but there are many reports and tables so it's a difficult process. I would like to automate this part of my job. Does something like this already exist? If I'm going to code it, how should I start?
I was thinking about writing something in Python to extract the Inner Join and Where clauses, but if it makes sense to make a stored procedure to do this in SQL Server I would prefer that.
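As a starting point for the Python idea, here is a minimal sketch that pulls the tables from the FROM and inner JOIN clauses and the table-qualified columns from the ON and WHERE clauses. The function name `required_data` and the regular expressions are hypothetical, and the sketch assumes simple procedures like the one above: columns qualified with table names rather than aliases, no subqueries, and no string literals containing dots. Anything more complex probably calls for a real SQL parser. On SQL Server the procedure text could be read with `OBJECT_DEFINITION(OBJECT_ID('dbo.getWorkOrders'))` and passed in as a string.

```python
import re
from typing import Dict, Set

# First table after FROM (assumes a single top-level SELECT).
FROM_RE = re.compile(r"\bFROM\s+([\w.\[\]]+)", re.IGNORECASE)

# A join with its ON clause; group 1 is the join kind ("" means a plain JOIN).
JOIN_RE = re.compile(
    r"\b(?:(LEFT|RIGHT|FULL|CROSS|INNER)\s+)?(?:OUTER\s+)?JOIN\s+([\w.\[\]]+)"
    r"\s+ON\s+(.*?)(?=\b(?:LEFT|RIGHT|FULL|CROSS|INNER|JOIN|WHERE|GROUP|ORDER)\b|$)",
    re.IGNORECASE | re.DOTALL,
)

# Text of the WHERE clause up to the next major keyword or end of text.
WHERE_RE = re.compile(
    r"\bWHERE\s+(.*?)(?=\b(?:GROUP|ORDER|HAVING|UNION)\b|$)",
    re.IGNORECASE | re.DOTALL,
)

# Table-qualified column reference such as WorkOrder.Status.
COLUMN_RE = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def required_data(sql: str) -> Dict[str, Set[str]]:
    """Map each table that must have rows to the columns that must be populated."""
    required: Dict[str, Set[str]] = {}
    from_match = FROM_RE.search(sql)
    if from_match:
        required.setdefault(from_match.group(1), set())

    clauses = []
    for kind, table, on_clause in JOIN_RE.findall(sql):
        # Only inner joins can empty the result; outer joins are skipped.
        if kind.upper() in ("", "INNER"):
            required.setdefault(table, set())
            clauses.append(on_clause)
    clauses.extend(WHERE_RE.findall(sql))

    for clause in clauses:
        for table, column in COLUMN_RE.findall(clause):
            required.setdefault(table, set()).add(column)
    return required


EXAMPLE_PROC = """
Create Procedure dbo.getWorkOrders @status nvarchar(10)
As
Select Member.Member_Name, Member.Member_ID, WorkOrder.Technician_ID, WorkOrder.Status
From Member Inner Join WorkOrder
on WorkOrder.Member_ID = Member.Member_ID
Where
WorkOrder.Status like @status
"""

result = required_data(EXAMPLE_PROC)
print({table: sorted(columns) for table, columns in result.items()})
# {'Member': ['Member_ID'], 'WorkOrder': ['Member_ID', 'Status']}
```

The output is a per-table list of columns that must be populated, which matches the manual analysis described above. Running it over every procedure and merging the dictionaries would give a rough load checklist for the warehouse.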

Related

'Multiple' values for a variable

A bit of background. There are multiple tables from multiple databases that share the same schema. When I query them for all rows with the same master code (in the tables, the master code is in the column called CATMASTRCAT), the same code returns multiple rows, and the only thing those rows share is the CATMASTRCAT column. This works for a single master code (in the script below, if I set the variable to 031325-002-70, it shows multiple rows with different organizations and the same data otherwise, which is the desired result).
My question: is there a way to pass multiple master codes as input in the variable? I'm planning to turn this into a stored procedure.
This is my SQL script:
DECLARE @ProductNumber AS VARCHAR(1000)
SET @ProductNumber = ('031325-002-70')
SELECT ITEMS
,ORGANIZATION
FROM [EU].[dbo].[SOMETHING14]
WHERE ITEMS IN (@ProductNumber)
UNION
SELECT ITEMS
,ORGANIZATION
FROM [EU].[dbo].[SOMETHING12]
WHERE ITEMS IN (@ProductNumber)
UNION
SELECT ITEMS
,ORGANIZATION
FROM [EU].[dbo].[SOMETHING11]
WHERE ITEMS IN (@ProductNumber)
Feel free to clarify any other needed data. I'm fairly new to SQL, just self-learning. You can also lecture me about the wrong code haha and how to do this better.
Thanks!
P.S. Attached the picture of query result
Yes. The best way to do this is to use a table-valued parameter and change the WHERE clause to:
WHERE catmastrcat IN (SELECT catmastrcat FROM @tablevaluename)
Alternatively, you could use an inner join, which might be faster depending on indexes and other factors. The code for that would look like this:
JOIN @tablevaluename tv ON AJF_CATMASTER.catmastrcat = tv.catmastrcat

SQL stored procedure - join a software log table with a new task table to create assignments

I am new to joins and feel that a join might best suit this need. I have a software log written to a SQL table that I can't modify. I also have a task table that I created so I can assign an error to an administrator who can then investigate it.
I need a way to bring back unique/distinct errors so I initially created a stored procedure to only return distinct errors (with a date range), due to the number of common errors and narrow scope of the investigating:
CREATE PROCEDURE [dbo].[GetUniqueValueNames]
@StartDate datetime = NULL,
@EndDate datetime = NULL
AS
SELECT
ERRORMSG, MAX(ERRORDATE) AS MessageLogDate
FROM
Server.ErrorLog
WHERE
-- Filter on the base column; the alias MessageLogDate is not visible in WHERE
ERRORDATE BETWEEN Coalesce(@StartDate, ERRORDATE) AND Coalesce(@EndDate, ERRORDATE)
GROUP BY
ERRORMSG
My intent: I would like to join these two tables, keeping the distinct error message functionality and be able to tell if a log entry has a task assigned to it.
I was guessing that I need to join on the error message. I'm copying some of the values over to the task, and the error (distinct) is what's been driving this situation.
TaskTable (Database A): LA.TaskTbl
Columns: taskID, TaskDescription, TaskProcess, **ErrorMsg**, Status, ErrorClassification, Priority, SafetoRestart, AssignedUser, taskDate
Log Table (Database B): Server.ErrorLog
Columns: ID, **ERRORMSG**, ERRORDATE, ERRORITEMNAME, FOLIO, OBJECTID, PROCESSNAME, PROCID, PROCINSTID, PROCSETID
Let me know if anything else is needed.
Thank you.
Instead of a stored procedure, you'll want to use a table-valued function. Syntax varies a bit between DBMSs (it helps to tag questions with the DBMS you are using), but the advantage of table-valued functions is that they can be used in queries like regular tables:
SELECT log.*, task.* -- or whatever you are interested in
FROM LogTable log
JOIN GetUniqueValueNames(...) errors ON errors.ErrorMsg = log.ErrorMsg
LEFT OUTER JOIN TaskTable task ON task.ErrorMsg = log.ErrorMsg
or, slightly less efficient but more readable:
SELECT log.*, task.* -- or whatever you are interested in
FROM LogTable log
LEFT OUTER JOIN TaskTable task ON task.ErrorMsg = log.ErrorMsg
WHERE log.ErrorMsg IN (SELECT ErrorMsg from GetUniqueValueNames(...))
An advantage of the first query is that you can also select the MessageLogDate.

Use of inserted and deleted tables for logging - is my concept sound?

I have a table with a simple identity column primary key. I have written a 'For Update' trigger that, among other things, is supposed to log the changes of certain columns to a log table. Needless to say, this is the first time I've tried this.
Essentially as follows:
Declare Cursor1 Cursor for
select a.*, b.*
from inserted a
inner join deleted b on a.OrderItemId = b.OrderItemId
(where OrderItemId is the actual name of the primary identity key).
I then do the usual open the cursor and go into a fetch next loop. With the columns I want to test, I do:
if Update(Field1)
begin
..... do some logging
end
The columns include varchars, bits, and datetimes. It works, sometimes. The problem is that the log function is writing the a and b values of the field to a log and in some cases, it appears that the before and after values are identical.
I have 3 questions:
Am I using the Update function correctly?
Am I accessing the before and after values correctly?
Is there a better way?
If you are using SQL Server 2016 or higher, I would recommend skipping this trigger entirely and instead using system-versioned temporal tables.
Not only will it eliminate the need for (and performance issues around) the trigger, it'll be easier to query the historical data.

How to force reasonable execution plan for query with LIKE statement?

When creating ad-hoc queries to look for information in a table I have run into this issue over and over.
Let's say I have a table with a million records with fields id - int, createddatetime - timestamp, category - varchar(50) and content - varchar(max). I want to find all records in the last day that have a certain string in the content field. If I create a query like this...
select *
from table
where createddatetime > '2018-1-31'
and content like '%something%'
it may complete in a second, because in the last day there may only be 100 records, so the LIKE clause only operates on a small number of records.
However if I add one more item to the where clause...
select *
from table
where createddatetime > '2018-1-31'
and content like '%something%'
and category = 'testing'
then it could take many minutes to complete while locking up the table.
It appears to change from evaluating the straightforward WHERE clause items first and then applying the LIKE to the limited set of records, to evaluating the LIKE clause first. There are even times when there are multiple LIKE statements and adding one more causes the query to go from a split second to minutes.
The only solution I've found is to generate an intermediate table (maybe temp tables would work), insert records based on the basic WHERE clause items, then run a separate query to filter by one or more LIKE statements. I've tried various JOIN and CTE approaches, which usually bring no improvement. Alternatively, CHARINDEX also appears to work, though it is difficult to use when converting the logic of multiple LIKE statements.
Is there any hint or something that can be placed in the query statement to tell SQL Server to wait until records are filtered by the basic WHERE clause items before filtering by the LIKE?
I actually just tried this approach and it had the same issue...
select *
from (
select *, charindex('something', content) as found
from bounce
where createddatetime > '2018-1-31'
) t
where found > 0
While the subquery on its own returns in a couple of seconds, the overall query just never returns. Why is this so bad?
Not fancy, but I've had better luck with temp tables than nested select statements... It will isolate the first data set, and then you can select just from that. If you're looking for quick and dirty, which usually serves my purposes for ad-hoc, this may help. If this is a permanent stored proc, the indexing suggestions may serve you better in the long run.
select *
into #like
from table
where createddatetime > '2018-1-31'
and content like '%something%'
select *
from #like
where category = 'testing'

Preferred way to access data within XML columns in SQL Server

Background
Recently I've started to use XML a lot more as a column in SQL Server 2005. During a bit of downtime yesterday, I noticed that two of the link tables I used are really just in the way, and it bores me to tears having to write yet more supporting structure code for a couple of joins.
To actually generate the data for these two link tables, I pass in two XML fields to my stored procedure, which writes the main record, breaks the two XML variables down into #tables and inserts them into the actual tables with the new SCOPE_IDENTITY() from the master record.
After some thought, I decided to do away with those tables altogether and just store the XML in XML fields. Now I understand there are some pitfalls here, like general querying performance and the fact that GROUP BY doesn't work on XML data. The query is also generally a bit of a mess, but overall I like that I can now work with XElement when I get the data back.
Also, this stuff isn't going to get changed. It's a one shot affair, so I don't have to worry about modification.
I am wondering about the best way to actually get at this data. A lot of my queries involve getting a master record based upon the criteria of a child or even a subchild record. Most of the sprocs in the database do this but on a far more elaborate scale, usually requiring UDFs and Subqueries to work effectively but I have knocked up a trivial example to test querying some data...
INSERT INTO Customers VALUES ('Tom', '', '<PhoneNumbers><PhoneNumber Type="1" Value="01234 456789" /><PhoneNumber Type="2" Value="01746 482954" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Andy', '', '<PhoneNumbers><PhoneNumber Type="2" Value="07948 598348" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Mike', '', '<PhoneNumbers><PhoneNumber Type="3" Value="02875 482945" /></PhoneNumbers>')
INSERT INTO Customers VALUES ('Steve', '', '<PhoneNumbers></PhoneNumbers>')
Now I can see two ways of grabbing it.
Method 1
DECLARE @PhoneType INT
SET @PhoneType = 2
SELECT ct.*
FROM Customers ct
WHERE ct.PhoneNumbers.exist('/PhoneNumbers/PhoneNumber[@Type=sql:variable("@PhoneType")]') = 1
Really? sql:variable feels a bit unwholesome. It does work, but it makes it distinctly more difficult to access the data in a more meaningful way.
Method 2
SELECT ct.*, pt.PhoneType
FROM Customers ct
CROSS APPLY ct.PhoneNumbers.nodes('/PhoneNumbers/PhoneNumber') AS nums(pn)
INNER JOIN PhoneTypes pt ON pt.ID = nums.pn.value('./@Type[1]', 'int')
WHERE nums.pn.value('./@Type[1]', 'int') = @PhoneType
This is more like it. Already I can easily expand it to do joins and all other good stuff. I've used CROSS APPLY before on a table valued function, and it was very good. The execution plan for this as opposed to the previous query is seriously more advanced. Admittedly I haven't done any indexing and whatnot on these tables, but it's 97% of the entire batch cost.
Method 2 (expanded)
SELECT ct.ID, ct.CustomerName, ct.Notes, pt.PhoneType
FROM Customers ct
CROSS APPLY ct.PhoneNumbers.nodes('/PhoneNumbers/PhoneNumber') AS nums(pn)
INNER JOIN PhoneTypes pt ON pt.ID = nums.pn.value('./@Type[1]', 'int')
WHERE nums.pn.value('./@Type[1]', 'int') IN (SELECT ID FROM PhoneTypes)
Nice IN clause here. I can also do something like pt.PhoneType = 'Work'
Finally
So I'm essentially obtaining the results that I want, but is there anything I should be aware of when using this mechanism to interrogate small amounts of XML data? Will it fall down on performance during elaborate searches? And is the storage of such markup style data too much of an overhead?
Side note
I've used things like sp_xml_preparedocument and OPENXML in the past just to pass lists into sprocs, but this is like a breath of fresh air in comparison!
One approach we've taken for some of our key items of information stored inside an XML column is to "surface" them as computed, persisted properties on the "parent" table. This is done using a little stored function.
It works great, because the value is computed only once every time the XML changes - as long as it's not changing, there's no recomputation, the value is stored on the table like any other column.
It's also great since it can be indexed! So if you're searching and/or joining on such a field - that works like a charm!
So you basically need a stored function along the lines of this:
CREATE FUNCTION [dbo].[GetPhoneNo1](@DataXML XML)
RETURNS VARCHAR(50)
WITH SCHEMABINDING
AS BEGIN
DECLARE @result VARCHAR(50) -- match the declared return type to avoid truncation
SELECT
@result = @DataXML.value('(/PhoneNumbers/PhoneNumber[@Type="1"]/@Value)[1]', 'VARCHAR(50)')
RETURN @result
END
If you don't have a phone number of type 1, you'll just get back a NULL.
Then, you need to extend your parent table with a computed, persisted column:
ALTER TABLE dbo.Customers
ADD PhoneNumberType1 AS dbo.GetPhoneNo1(PhoneNumbers) PERSISTED
As you can see, it works just fine for single entries, but unfortunately you cannot surface a whole list of properties. But if you have some key items, like IDs, that you expect most of your rows to have, this can be a very nice and slick way to get at that information more easily and more efficiently.

Resources