Create report using dynamic SQL - sql-server

I have a table example as such:
State   Project   Build Type    ACTUAL0    ACTUAL1    ACTUAL2
------  --------  ------------  ---------  ---------  ---------
Ohio    154214    Residential   1/5/2013   2/25/2013  7/12/12
Utah    214356    Commercial    7/08/13    6/9/13     7/1/12
I am trying to create a report that takes the column headers beginning with the word ACTUAL and gets a count of how many dates are less than a specific date. I have a temp table that I create of the column headers beginning with the word ACTUAL. This is just an example; there are over 250 columns named ACTUAL. So that table looks like this:
MilestoneNmbr
-------------
ACTUAL1
ACTUAL2
ACTUAL3
Now what I think would work is to take each row value as the column header and pass a date into a function. Here is the function I created:
CREATE FUNCTION [dbo].[GetMSActualCount]
(
    @ACTUAL nvarchar(16),
    @DATE nvarchar(16)
)
RETURNS int
AS
BEGIN
    DECLARE @ACTUALRETURN int
    DECLARE @SQL nvarchar(255) =
        'SELECT COUNT(' + @ACTUAL + ') AS Expr1
         FROM [CASPR_MILESTONES_000-036]
         WHERE ' + @ACTUAL + ' > ' + @DATE

    EXEC sp_executesql @SQL, N''
    SET @ACTUALRETURN = @SQL

    -- Return the result of the function
    RETURN @ACTUALRETURN
END
If I run the following query:
DECLARE @DATE varchar(20)
SET @DATE = '''1/1/2013'''
SELECT MilestoneNmbr, dbo.GetMSActualCount(MilestoneNmbr, @DATE) FROM #List_CASPR_Milestones
The error I get is that I can't use dynamic SQL in a function. That being so, what can I do? Otherwise this simple query will turn into hundreds of lines. Is there another easy way to do this?
EDIT:
My results I am looking for is something like this:
MilestoneNmbr   CountofDate
-------------   -----------
ACTUAL1         200
ACTUAL2         344
ACTUAL3         400

You are right, you can't use dynamic SQL in a function. There are two answers:
First, your table with 250 columns named ACTUAL plus a number is a nightmare. It keeps you from using any of the built-in, set-based things SQL does well. You should have two tables: a Projects table that has an ID column plus columns for State, Project, and BuildType, and a ProjectDates table with a ProjectID column that references the first table and a column for ActualDate. Reporting from that should be easy.
Given that you probably can't fix the structure, try writing a stored procedure; a stored procedure can use dynamic SQL. Even better, your stored procedure can create temp tables like the one above and then use them to do the statistics.
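For illustration only, here is a minimal sketch of that idea, using sp_executesql with an OUTPUT parameter to get the count back out of the dynamic statement (the procedure name and parameter types are assumptions; the table name is taken from the question):
CREATE PROCEDURE dbo.GetMSActualCount_sp    -- hypothetical name
    @ACTUAL sysname,        -- column name, e.g. 'ACTUAL1'
    @DATE   date,
    @COUNT  int OUTPUT
AS
BEGIN
    DECLARE @SQL nvarchar(max) =
        N'SELECT @cnt = COUNT(' + QUOTENAME(@ACTUAL) + N')
          FROM [CASPR_MILESTONES_000-036]
          WHERE ' + QUOTENAME(@ACTUAL) + N' > @d';

    -- sp_executesql takes typed parameters, including OUTPUT ones
    EXEC sp_executesql @SQL, N'@d date, @cnt int OUTPUT', @d = @DATE, @cnt = @COUNT OUTPUT;
END
A caller (for example a cursor over the milestone-name table) can then collect one count per ACTUAL column into a temp table for the report.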

I agree 100% with Charles. If you CAN change the structure this is what I would do:
If possible have a build type table (ID/Build Type), don't have text columns unless you need them as text for something. Anything that can be coded, code it.
The two tables:
project header (Proj_ID (long int) / State (int or char(2)) / build_type (int)), primary key either Proj_ID by itself, or a new ID if it is not unique (Proj_ID & State together would not be too useful as a PK).
Project_date (Proj_ID (same as PK above) / Date_ID (int) / Actual_Date (DateTime))
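A minimal DDL sketch of that layout in T-SQL (names and exact types are only a guess from the description above, and it assumes Proj_ID alone works as the key):
CREATE TABLE Project_header (
    Proj_ID     int      NOT NULL PRIMARY KEY,
    State       char(2)  NOT NULL,
    Build_type  int      NOT NULL
);

CREATE TABLE Project_date (
    Proj_ID     int      NOT NULL REFERENCES Project_header (Proj_ID),
    Date_ID     int      NOT NULL,
    Actual_Date datetime NULL,
    PRIMARY KEY (Proj_ID, Date_ID)
);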
So your second example would be:
Project_Header:
214356 / UT / 2 (being 1 Residential, 2 Commercial, 3 Industrial ...)
Project_Date:
214356 / 0 / '07/08/13'
214356 / 1 / '06/09/13'
214356 / 2 / '07/01/12'
Latest build date by project would be:
SELECT TOP 1 Actual_date
FROM Project_date
WHERE Proj_ID = 'nnn'
ORDER BY Date_ID DESC;
Your query would be something like (if the dates are in incremental order):
SELECT Proj_ID, MAX(Date_ID)
FROM Project_date
WHERE Actual_date < @date
GROUP BY Proj_ID
You can see it's pretty straightforward.
If you CAN'T change the structures but you CAN make new tables, I would make an SP that takes that ugly table and generates the Project_date rows x times per day (or you could even tie it to a trigger on insert/update of the first table) and the Project_header once per day (or more often if needed). This would take considerably less time and effort than what you are attempting, plus you could use it for other queries.

To solve this I created a table housing the ACTUAL column names. I then looped through each row in the List_ACTUAL table to get the names and selected the count of dates greater than the variable I pass in. I will be converting this to a PROC. This is how:
DECLARE @MS nvarchar(16)
DECLARE MSLIST CURSOR LOCAL FOR SELECT MilestoneNmbr FROM List_ACTUAL
DECLARE @SQL nvarchar(max)
DECLARE @DATE nvarchar(16)
SET @DATE = '1/1/2013'

CREATE TABLE #TEMP (Milestones nvarchar(16), Frozen int)

OPEN MSLIST
FETCH NEXT FROM MSLIST INTO @MS
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @SQL = 'INSERT INTO #TEMP VALUES (''' + @MS + ''', (SELECT COUNT(' + @MS + ') FROM PROJECTDATA WHERE ' + @MS + ' > ''' + @DATE + '''))'
    EXEC sp_executesql @SQL, N''
    FETCH NEXT FROM MSLIST INTO @MS
END
CLOSE MSLIST
DEALLOCATE MSLIST
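The report itself then comes straight out of the temp table; a small usage example (column aliases assumed to match the desired output above):
SELECT Milestones AS MilestoneNmbr, Frozen AS CountofDate
FROM #TEMP
ORDER BY Milestones;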
Hope this helps someone.

Related

Can you cancel a BULK INSERT of all VARCHARs when a line's field count is incorrect?

I'm using a BULK INSERT to load delimited .txt files into a staging table with 5 columns. The .txt files can sometimes contain errors and have more/less than 5 fields per line. If this happens, is it possible to detect it and cancel the entire BULK INSERT?
Each table column is of type VARCHAR. This was done because header (H01) and line (L0101, L0102, etc.) rows contain fields with different types. Because of this, setting MAXERRORS = 0 doesn't seem to work, as there are technically no syntax errors. As a result the transaction is committed, the catch block never activates, and the rollback doesn't occur. Lines still get inserted into the table incorrectly, shifted or bunched.
Expected .txt file format:
H01|Order|Date|Name|Address
L0101|Order|Part|SKU|Qty
L0102|Order|Part||Qty <-- Fields can be blank
L0103|Order|Part|SKU|Qty
Incorrect .txt file example:
H01|Order|Date|Name|Address
L0101|Order||Part|SKU|Qty <-- Extra field in the middle
||L0102|Order|Part|SKU|Qty <-- Extra fields at the beginning
L0103|Order|Part|SKU|Qty|| <-- Extra fields at the end
Code:
CREATE TABLE #TempStage (
    Column1 VARCHAR(255) NULL
    ,Column2 VARCHAR(255) NULL
    ,Column3 VARCHAR(255) NULL
    ,Column4 VARCHAR(255) NULL
    ,Column5 VARCHAR(255) NULL
)

DECLARE
    @dir SYSNAME
    ,@fname SYSNAME
    ,@SQL_BULK VARCHAR(255)

SELECT
    @dir = '\\sharedfolder\'
    ,@fname = 'testOrder.txt'

SET @SQL_BULK =
    'BULK INSERT #TempStage FROM ''' + @dir + @fname + ''' WITH
    (
        FIRSTROW = 1,
        DATAFILETYPE=''char'',
        FIELDTERMINATOR = ''|'',
        ROWTERMINATOR = ''0x0a'',
        KEEPNULLS,
        MAXERRORS = 0
    )'

BEGIN TRY
    BEGIN TRANSACTION
    EXEC (@SQL_BULK)
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION
END CATCH

SELECT * FROM #TempStage
DROP TABLE #TempStage
Expected output:
Column1  Column2  Column3  Column4  Column5
-------  -------  -------  -------  -------
H01      Order    Date     Name     Address
L0101    Order    Part     SKU      Qty
L0102    Order    Part     NULL     Qty
L0103    Order    Part     SKU      Qty
Incorrect output, would like to cancel so this doesn't happen (\ = pipe):
Column1  Column2  Column3  Column4  Column5
-------  -------  -------  -------  -----------------
H01      Order    Date     Name     Address
L0101    Order    NULL     Part     SKU \ Qty
NULL     NULL     L0102    Order    Part \ SKU \ QTY
L0103    Order    Part     SKU      Qty\ |
SQL Server 2016, 13.0.1742.0
As many before have noted: BULK INSERT is fast, but not very flexible, especially when it comes to column inconsistencies.
When your input might have bad data (and technically, from a SQL standpoint, that is what you are describing), you have to employ one or more of a few different approaches:
Pre-process and "clean" the data with an external program first, or
BULK INSERT into a staging table with one big VARCHAR(MAX) column, and then parse and clean the data yourself with SQL before moving it into tables with your real columns, or
Use CLR code/tricks to effectively do (1) and/or (2) above, or
Write an external program to simultaneously clean/pre-process and SqlBulkCopy the data into your SQL Server (replacing BULK INSERT), or
Use SSIS instead (still pretty hard to deal with bad/variable columns though)
I have done all of these at one time or another during my career, and they are all somewhat difficult and time-consuming (the work was time-consuming; their run-times were pretty good).
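As a rough sketch of option (2) above (the table and file names are placeholders, and it assumes the file contains no tab characters, since no FIELDTERMINATOR is given so each whole line lands in the single column):
CREATE TABLE #RawStage (RawLine VARCHAR(MAX) NULL);

-- Load whole lines into a single wide column
BULK INSERT #RawStage
FROM '\\sharedfolder\testOrder.txt'
WITH (DATAFILETYPE = 'char', ROWTERMINATOR = '0x0a');

-- A valid 5-field line must contain exactly 4 pipes
IF EXISTS (SELECT 1
           FROM #RawStage
           WHERE LEN(RawLine) - LEN(REPLACE(RawLine, '|', '')) <> 4)
    RAISERROR('File rejected: at least one line does not have exactly 5 fields.', 16, 1);
From there the good rows can be split into the real columns and moved into the staging table.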
I've created a basic way to do what I needed:
After loading the staging table, check for any instance of '|' (or whatever delimiter you're using), and raise an error if found.
IF EXISTS(SELECT * FROM #TempStage WHERE
Column1 LIKE '%|%'
OR Column2 LIKE '%|%'
OR Column3 LIKE '%|%'
OR Column4 LIKE '%|%'
OR Column5 LIKE '%|%'
)
RAISERROR('Incorrect file formatting; Pipe character found.', 16, 1);
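Placed inside the TRY block from the question, right after the EXEC, that check makes the CATCH/ROLLBACK actually fire and cancels the whole load (a sketch using the names from the question):
BEGIN TRY
    BEGIN TRANSACTION
    EXEC (@SQL_BULK)

    -- Any leftover delimiter means some line had too many fields
    IF EXISTS (SELECT * FROM #TempStage
               WHERE Column1 LIKE '%|%' OR Column2 LIKE '%|%'
                  OR Column3 LIKE '%|%' OR Column4 LIKE '%|%'
                  OR Column5 LIKE '%|%')
        RAISERROR('Incorrect file formatting; Pipe character found.', 16, 1);

    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION   -- the whole BULK INSERT is undone
END CATCH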

Convert string date to datetime on sql server

I have something like this:
[screenshot: a table with a DATA column holding values such as '19-apr' and an ORA column holding values such as '9,00']
I need to concatenate and convert the DATA and ORA fields into one because I'll insert them into another table with just one field.
My problem is converting them, because I haven't found any format that works for it.
Also, the customer uses Italian month names, as in the photo: "apr" is "Aprile" (April).
Does someone have a possible solution?
Unfortunately I can't modify the format of the two fields.
EDIT: the table fields are VARCHAR(MAX). The point is I need to insert into another table where the "date" field is in datetime format, and the year is supposed to always be the current one.
EDIT 2: I create and drop this small table every time, and the data is brought in by a bulk insert from a .csv.
EDIT 3: I'm sorry, it's my first question =)... BTW, the output should be like this table, with the "DATA" in datetime format.
EDIT 4: DDL:
create table notaiTESTCSV(
    NUMERO_FINANZIAMENTO varchar(MAX),
    DATA varchar(MAX),
    ORA varchar(MAX)
)
EDIT 5: this is how I take the data from the csv:
bulk insert notaiTESTCSV from 'path\SPEDIZIONE NOTAI.csv' with
(firstrow = 2, fieldterminator = ';', rowterminator = '\n')
PS: sorry for my bad English it's not my first language
Thank you in advance!
SQL Server is remarkably robust in the ways it can manage datetime data. This gets ugly by the end, so I tried to break it down some to show what it's doing in steps.
Here's what each piece does by itself:
DECLARE @data varchar(100) = '19-apr',
        @ora varchar(100) = '9,00',
        @dt datetime,
        @tm datetime;

--The date component
SET @data = CONCAT(@data, '-', CAST(YEAR(GETDATE()) AS VARCHAR(4)));
SET @dt = CAST(@data as DATETIME);

--The time component
SET @ora = CONCAT(REPLACE(@ora, ',', ':'), ':00');
SET @tm = CAST(@ora as DATETIME);
Then a little help from our friends, showing that math works:
How to combine date from one field with time from another field - MS SQL Server
SELECT @dt + @tm AS [MathWorks];
Results:
+-------------------------+
| MathWorks |
+-------------------------+
| 2018-04-19 09:00:00.000 |
+-------------------------+
Bringing it all into one statement
DECLARE @data varchar(100) = '19-apr',
        @ora varchar(100) = '9,00';

SELECT CAST(CONCAT(@data, '-', CAST(YEAR(GETDATE()) AS VARCHAR(4))) as DATETIME)
       +
       CAST(CONCAT(REPLACE(@ora, ',', ':'), ':00') as DATETIME) AS [CombinedDateTime]
Results:
+-------------------------+
| CombinedDateTime |
+-------------------------+
| 2018-04-19 09:00:00.000 |
+-------------------------+
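Applied to the staging table from the question, the same expression can feed the insert into the destination table. This is only a sketch: the destination table and column names are placeholders, and month abbreviations that exist only in Italian (such as 'mag' or 'dic') would need the session language set to Italian, or an explicit mapping, before the CAST can parse them.
-- SET LANGUAGE Italian;   -- only needed if the abbreviations are Italian-only

INSERT INTO dbo.TargetTable (NUMERO_FINANZIAMENTO, DATA_ORA)   -- hypothetical destination
SELECT NUMERO_FINANZIAMENTO,
       CAST(CONCAT(DATA, '-', CAST(YEAR(GETDATE()) AS VARCHAR(4))) AS DATETIME)
       + CAST(CONCAT(REPLACE(ORA, ',', ':'), ':00') AS DATETIME)
FROM notaiTESTCSV;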

SQL use a variable as TABLE NAME in a FROM

We install our database(s) to different customers and the name can change depending on the deployment.
What I need to know is if you can use a variable as a table name.
The database we are in is ****_x and we need to access ****_m.
This code is part of a function.
I need the @metadb variable to be the table name, maybe using dynamic SQL with sp_executesql. I am just learning, so take it easy on me.
CREATE FUNCTION [dbo].[datAddSp] (
    @cal NCHAR(30)      -- calendar to use for non-working days
    ,@bDays INT         -- number of business days to add or subtract
    ,@d DATETIME
)
RETURNS DATETIME
AS
BEGIN
    DECLARE @nDate DATETIME     -- the working date
        ,@addsub INT            -- factor for adding or subtracting
        ,@metadb sysname

    SET @metadb = db_name()
    SET @metadb = REPLACE(@metadb, '_x', '_m')
    SET @metadb = CONCAT(@metadb, '.dbo.md_calendar_day')

    SET @ndate = @d

    IF @bdays > 0
        SET @addsub = 1
    ELSE
        SET @addsub = -1

    IF @cal = ' ' OR @cal IS NULL
        SET @cal = 'CA_ON'

    WHILE @bdays <> 0   -- Keep adding/subtracting a day until @bdays becomes 0
    BEGIN
        SELECT @ndate = dateadd(day, 1 * @addsub, @ndate)   -- increase or decrease @ndate
        SELECT @bdays = CASE
            WHEN (@@datefirst + datepart(weekday, @ndate)) % 7 IN (0, 1)   -- ignore if it is Sat or Sunday
                THEN @bdays
            WHEN ( SELECT 1
                   FROM @metadb   -- **THIS IS WHAT I NEED** (same for below); this table holds the holidays
                   WHERE mast_trunkibis_m.dbo.md_calendar_day.calendar_code = @cal
                     AND mast_trunkibis_m.dbo.md_calendar_day.calendar_date = @nDate
                     AND mast_trunkibis_m.dbo.md_calendar_day.is_work_day = 0
                 ) IS NOT NULL   -- ignore if it is in the holiday table
                THEN @bdays
            ELSE @bdays - 1 * @addsub   -- incr or decr @ndate
        END
    END

    RETURN @nDate
END
GO
GO
The best way to do this, if you aren't stuck with existing structures is to keep all of the table structures and names the same, simply create a schema for each customer and build out the tables in the schema. For example, if you have the companies: Global Trucking and Super Store you would create a schema for each of those companies: GlobalTrucking and SuperStore are now your schemas.
Supposing you have products and payments tables for a quick example. You would create those tables in each schema so you end up with something that looks like this:
GlobalTrucking.products
GlobalTrucking.payments
and
SuperStore.products
SuperStore.payments
Then in the application layer, you specify the default schema name to use in the connection string for queries using that connection. The web site or application for Global Trucking has the schema set to GlobalTrucking and any query like: SELECT * FROM products; would actually automatically be SELECT * FROM GlobalTrucking.products; when executed using that connection.
This way you always know where to look in your tables, and each customer is in their own segregated space, with the proper user permissions they will never be able to accidentally access another customers data, and everything is just easier to navigate.
Here is a sample of what your schema/user/table creation script would look like (this may not be 100% correct, I just pecked this out for a quick example, and I should mention that this is the Oracle way, but SQL Server should be similar):
CREATE USER &SCHEMA_NAME IDENTIFIED BY temppasswd1;
CREATE SCHEMA AUTHORIZATION &SCHEMA_NAME
CREATE TABLE "&SCHEMA_NAME".products
(
ProductId NUMBER,
Description VARCHAR2(50),
Price NUMBER(10, 2),
NumberInStock NUMBER,
Enabled VARCHAR2(1)
)
CREATE TABLE "&SCHEMA_NAME".payments
(
PaymentId NUMBER,
Amount NUMBER(10, 2),
CardType VARCHAR2(2),
CardNumber VARCHAR2(15),
CardExpire DATE,
PaymentTimeStamp TIMESTAMP,
ApprovalCode VARCHAR2(25)
)
GRANT SELECT ON "&SCHEMA_NAME".products TO &SCHEMA_NAME
GRANT SELECT ON "&SCHEMA_NAME".payments TO &SCHEMA_NAME
;
With something like the above, you only have one script that you need to keep updated for automating the addition of new customers. When you run it, the &SCHEMA_NAME variable is populated with whatever you choose for the new customer's username/schema name, and an identical table structure is created every time.
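A minimal T-SQL sketch of the same per-customer-schema idea (the schema, login, and user names here are placeholders, and the types are simplified):
-- One schema per customer, with the same table shapes in each
CREATE SCHEMA GlobalTrucking;
GO

CREATE TABLE GlobalTrucking.products
(
    ProductId      INT            NOT NULL PRIMARY KEY,
    Description    VARCHAR(50)    NULL,
    Price          DECIMAL(10, 2) NULL,
    NumberInStock  INT            NULL,
    Enabled        CHAR(1)        NULL
);

-- The customer's application login maps to a user whose default schema is theirs,
-- so an unqualified name like "products" resolves to GlobalTrucking.products
CREATE USER GlobalTruckingUser FOR LOGIN GlobalTruckingLogin
    WITH DEFAULT_SCHEMA = GlobalTrucking;

GRANT SELECT ON SCHEMA::GlobalTrucking TO GlobalTruckingUser;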

Creating custom order by SQL Server

The customer wants to display a list of values returned by my query in a specific order. The issue is that ordering simply by asc or desc doesn't give me what the customer wants, and the DBA doesn't want me to hard-code values. Is there a way to do a custom sort without hard-coding values? The values will change every year, so I would have to maintain/update the list every year.
Table Structure:
Columns: CMN_CodesID (unique), Name (which is what I'd like to display in custom order)
Something like this:
order by case when Name = 'Jack' then 1
              when Name = 'Apple' then 2
              when Name = 'Orange' then 3
              ...
         end
You could use dynamic SQL in a stored procedure and pass @MyOrderBy into it as a parameter (it is declared locally here just for illustration).
DECLARE @MyOrderBy VARCHAR(300)
SELECT @MyOrderBy = 'case when myfield = ''Jack'' then 1 when myfield = ''Apple'' then 2 when myfield = ''Orange'' then 3 else 4 end'

DECLARE @sSQL VARCHAR(300)
SELECT @sSQL = 'SELECT * FROM mytable ORDER BY ' + @MyOrderBy
EXEC (@sSQL)
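Wrapped up as a stored procedure it might look like the following sketch (the procedure, table, and column names are placeholders; note that accepting an ORDER BY fragment as a string is an injection risk, so it should only ever come from trusted code, never from user input):
CREATE PROCEDURE dbo.GetCodesSorted    -- hypothetical name
    @MyOrderBy VARCHAR(300)
AS
BEGIN
    DECLARE @sSQL VARCHAR(600)
    SELECT @sSQL = 'SELECT CMN_CodesID, Name FROM mytable ORDER BY ' + @MyOrderBy
    EXEC (@sSQL)
END
GO

-- Usage: the caller supplies this year's ordering
EXEC dbo.GetCodesSorted
    @MyOrderBy = 'case when Name = ''Jack'' then 1 when Name = ''Apple'' then 2 else 3 end';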

Speeding Up SSIS Package (Insert and Update)

Referred here by #sqlhelp on Twitter (Solved - See the solution at the end of the post).
I'm trying to speed up an SSIS package that inserts 29 million rows of new data, then updates those rows with 2 additional columns. So far the package loops through a folder containing files, inserts the flat files into the database, then performs the update and archives the file. Added (thanks to @billinkc): the SSIS order is Foreach Loop, Data Flow, Execute SQL Task, File Task.
What doesn't take long: The loop, the file move and truncating the tables (stage).
What takes long: inserting the data, and running the statements below:
UPDATE dbo.Stage
SET Number = REPLACE(Number,',','')
-- Creates temp table for State and Date
CREATE TABLE #Ref (Path VARCHAR(255))
INSERT INTO #Ref VALUES(?)

-- Variables for insert
DECLARE @state AS VARCHAR(2)
DECLARE @date AS VARCHAR(12)
SET @state = (SELECT SUBSTRING(RIGHT([Path], CHARINDEX('\', REVERSE([Path]))-1),12,2) FROM #Ref)
SET @date = (SELECT SUBSTRING(RIGHT([Path], CHARINDEX('\', REVERSE([Path]))-1),1,10) FROM #Ref)
SELECT @state
SELECT @date

-- Inserts the values into main table
INSERT INTO dbo.MainTable (Phone,State,Date)
SELECT d.Number, @state, @date
FROM Stage d

-- Clears the Reference and Stage table
DROP TABLE #Ref
TRUNCATE TABLE Stage
Note that I've toyed with upping Rows per batch on the insert and Max insert commit size, but neither have affected the package speed.
Solved and Added:
For those interested in the numbers: the original package time was 11.75 minutes; with William's technique (see below) it dropped to 9.5 minutes. Granted, with 29 million rows and on a slower server this can be expected, but hopefully that shows you the actual data behind how effective this is. The key is to keep as many processes happening in the Data Flow task as possible, as updating the data (after the data flow) consumed a significant portion of the time.
Hopefully that helps anyone else out there with a similar problem.
Update two: I added an IF statement and that reduced it from 9 minutes to 4 minutes. Final code for Execute SQL Task:
-- Creates temp table for State and Date
CREATE TABLE #Ref (Path VARCHAR(255))
INSERT INTO #Ref VALUES(?)

DECLARE @state AS VARCHAR(2)
DECLARE @date AS VARCHAR(12)
DECLARE @validdate datetime

SET @state = (SELECT SUBSTRING(RIGHT([Path], CHARINDEX('\', REVERSE([Path]))-1),12,2) FROM #Ref)
SET @date = (SELECT SUBSTRING(RIGHT([Path], CHARINDEX('\', REVERSE([Path]))-1),1,10) FROM #Ref)
SET @validdate = DATEADD(DD,-30,getdate())

IF @date < @validdate
BEGIN
    TRUNCATE TABLE dbo.Stage
    TRUNCATE TABLE #Ref
END
ELSE
BEGIN
    -- Inserts new values
    INSERT INTO dbo.MainTable (Number,State,Date)
    SELECT d.Number, @state, @date
    FROM Stage d

    -- Clears the Reference and Stage table after the insert
    DROP TABLE #Ref
    TRUNCATE TABLE Stage
END
END
As I understand it, you are reading ~29,000,000 rows from flat files and writing them into a staging table, then running a SQL script that updates (reads/writes) the same 29,000,000 rows in the staging table, and then moving those 29,000,000 records (read from staging, then write to NAT) into the final table.
Couldn't you read from your flat files, use SSIS transformations to clean your data and add your two additional columns, and then write directly into the final table? You would then work on each distinct set of data only once, rather than the three times (six if you count reads and writes as distinct) that your process does.
I would change your data flow to transform the needed items in the pipeline and write directly into the final table.
edit
From the SQL in your question, it appears you are transforming the data by removing commas from the PHONE field, retrieving the STATE and the DATE from specific portions of the path of the file currently being processed, and then storing those three data points in the NAT table. Those things can be done with a Derived Column transformation in your Data Flow.
For the State and Date columns, set up two new variables called State and Date. Use expressions in the variable definitions to set them to the correct values (like you did in your SQL). When the Path variable updates (in your loop, I assume), the State and Date variables will update as well.
In the Derived Column Transformation, drag the State Variable into the Expression field and create a new column called State.
Repeat for Date.
For the PHONE column, in the Derived Column transformation create an expression like the following:
REPLACE( [Phone], ",", "" )
Set the Derived Column field to Replace 'Phone'
For your output, create a destination to your NAT table and link Phone, State, and Date columns in your data flow to the appropriate columns in the NAT table.
If there are additional columns in your input, you can choose not to bring them in from your source, since it appears that you are only acting on the Phone column from the original data.
/edit
