Importing Grouped Report Data to Database - sql-server

My company receives data from a client that is unable to provide it in any direct format, so we have to import several reports that use a grouped layout like the one below. We have to develop in-house methods to ungroup the report and then import the detail records. Currently a member of my team uses MS Access / VBA to generate the needed detail records, but I want to move this to a server-based, automated process. We are using SQL Server 2008 R2 for storage, and I would like to use SSIS to accomplish the task. Does anyone know of a way I can generate the detail records and import the data directly into SQL Server?

Hmm - you will definitely have to do some programmatic adjustment of the data set to add the group date to each detail line. I'm unsure how you will be importing the xlsx, but my first recommendation would be an SSIS package that does the adjustments in a Script task, as the "best" way to do this. See here for how to handle Excel in SSIS Script tasks.
If you don't know SSIS, or especially if you don't know programming, your next best bet (in my opinion) is to import the data into a staging table, do the manipulation in T-SQL, and then insert from that table into your main table. I did a SQL Fiddle of this here.
CREATE TABLE ActivitySummary
(
id int identity(1,1),
activity_date date,
activity varchar(100),
paid_time decimal(5,2),
unpaid_time decimal(5,2),
total_time decimal(5,2)
)
CREATE TABLE ActivitySummary_STG
(
id int identity(1,1),
activity_date date,
activity varchar(100),
paid_time decimal(5,2),
unpaid_time decimal(5,2),
total_time decimal(5,2)
)
GO
-- Simulate import of Excel sheet into staging table
truncate table ActivitySummary_STG;
GO
INSERT INTO ActivitySummary_STG (activity_date, activity, paid_time, unpaid_time, total_time)
select '2017-08-14',null,null,null,null
UNION ALL
select null,'001 Lunch',0,4.4,4.4
UNION ALL
select null,'002 Break',4.2,0,4.2
UNION ALL
select null,'007 System Down',7.45,0,7.45
UNION ALL
select null,'019 End of Work Day',0.02,0,0.02
UNION ALL
select '2017-08-15',null,null,null,null
UNION ALL
select null,'001 Lunch',0,4.45,4.45
UNION ALL
select null,'002 Break',6.53,0,6.53
UNION ALL
select null,'007 System Down',0.51,0,0.51
UNION ALL
select null,'019 End of Work Day',0.02,0,0.02
GO
-- Code to massage data
declare @table_count int = (select COALESCE(count(id),0) from ActivitySummary_STG);
declare @counter int = 1;
declare @activity_date date,
        @current_date date;

WHILE (@table_count > 0 AND @counter <= @table_count)
BEGIN
    select @activity_date = activity_date
    from ActivitySummary_STG
    where id = @counter;

    if (@activity_date is not null)
    BEGIN
        set @current_date = @activity_date;
        delete from ActivitySummary_STG
        where id = @counter;
    END
    else
    BEGIN
        update ActivitySummary_STG SET
            activity_date = @current_date
        where id = @counter;
    END

    set @counter += 1;
END

INSERT INTO ActivitySummary (activity_date, activity, paid_time, unpaid_time, total_time)
select activity_date, activity, paid_time, unpaid_time, total_time
from ActivitySummary_STG;

truncate table ActivitySummary_STG;
GO
select * from ActivitySummary;
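If the row-by-row loop proves slow on larger files, the same massage can be done set-based. This is a sketch under the assumption that id reflects the original row order from the sheet and that header rows carry a date but no activity value:

```sql
-- Fill each detail row with the nearest header date above it,
-- then drop the header rows (set-based alternative to the WHILE loop).
UPDATE s
SET activity_date = h.activity_date
FROM ActivitySummary_STG s
CROSS APPLY (SELECT TOP 1 activity_date
             FROM ActivitySummary_STG h
             WHERE h.id < s.id
               AND h.activity_date IS NOT NULL
             ORDER BY h.id DESC) h
WHERE s.activity_date IS NULL;

-- Header rows have no activity value, so they can be deleted directly.
DELETE FROM ActivitySummary_STG
WHERE activity IS NULL;
```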

I'd do it with a script component.
Total Data Flow:
ExcelSource --> Script Component (Transformation) --> Conditional Split --> SQL Destination
In the script component:
Check ActivitySummary under Input Columns.
Add ActivityDate as an output column.
Open the script.
Outside of your row processing, add:
public DateTime dte;
Inside row processing:
DateTime parsed;
if (DateTime.TryParse(Row.ActivitySummary.ToString(), out parsed))
{ dte = parsed; }
else
{ Row.ActivityDate = dte; }
Then add a Conditional Split to remove the rows with null Activity Dates (the group-header rows).

Related

T-SQL, SSRS: Set up automatic daily Inserts into Table

I'm using SQL Server 2012.
SSMS 11.0.6248.0.
I want to create an automated way of inserting data [using a T-SQL INSERT statement] into a SQL Server table before users start using the system [a third-party business system] each morning.
I do a lot of SSRS reporting and create subscriptions; I know how to do inserts using T-SQL, and I am familiar with stored procedures, but I have not had to automate something like this strictly within SQL Server.
Can I make this happen on a schedule - strictly in the SQL Server realm [i.e. using SSRS ... or a stored procedure ... or a function]?
Example Data to read:
Declare @t Table
(
    DoctorName Varchar(1),
    AppointmentDate Date,
    Appointments Int
)
Insert Into @t select 'A','2018-10-23', 5
Insert Into @t select 'B','2018-10-23', 5
Insert Into @t select 'C','2018-10-23', 5
Insert Into @t select 'D','2018-10-23', 5
Insert Into @t select 'E','2018-10-23', 5
Insert Into @t select 'F','2018-10-23', 5
Insert Into @t select 'G','2018-10-23', 5
Insert Into @t select 'H','2018-10-23', 5
Insert Into @t select 'I','2018-10-23', 5;
Select * From @t
The value in Appointments changes through the day as Doctors see patients. Patients may cancel. Patients may walk in. Typically, at the end of the day Doctors end up seeing more patients than they have scheduled at the start of the day. [I set the number at 5 for all Doctors at the start of the above day].
I want to capture the data as it is at the start of each day - before the Clinic opens and the numbers change - and store it in another Table for historic reporting.
I hope this simplified example clarifies what I want to do.
I would appreciate any suggestions on how I might best go about doing this.
Thanks!
This sounds like a job for the SQL Server Agent. A more specific suggestion will require a more detailed description of what you're doing (with sample data, preferably).
You can use SSIS to create a job that you can then schedule. Since you are familiar with stored procedures, you would create your stored procedure first, then in SSIS add an Execute SQL Task to the Control Flow and configure it according to your needs.
If that doesn't work for you, you could create an application that runs on a timer and executes your stored procedure; however, since you want to stay in the SQL realm, SSIS is the place to look.
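To make the snapshot idea concrete, here is a minimal sketch. The table and procedure names (dbo.Appointments, dbo.AppointmentSnapshot, dbo.usp_SnapshotAppointments) are hypothetical stand-ins for your live table and history table:

```sql
-- History table: one row per doctor per day, stamped with the capture date.
CREATE TABLE dbo.AppointmentSnapshot
(
    SnapshotDate    date       NOT NULL DEFAULT (CAST(GETDATE() AS date)),
    DoctorName      varchar(1) NOT NULL,
    AppointmentDate date       NOT NULL,
    Appointments    int        NOT NULL
);
GO

-- Procedure for the scheduled job to run each morning before the clinic opens.
CREATE PROCEDURE dbo.usp_SnapshotAppointments
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.AppointmentSnapshot (DoctorName, AppointmentDate, Appointments)
    SELECT DoctorName, AppointmentDate, Appointments
    FROM dbo.Appointments;
END
GO
```

A SQL Server Agent job with a daily schedule (set to fire before opening time) then needs only a single T-SQL job step: EXEC dbo.usp_SnapshotAppointments;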

SQL use a variable as TABLE NAME in a FROM

We install our database(s) to different customers and the name can change depending on the deployment.
What I need to know is if you can use a variable as a table name.
The database we are in is ****_x and we need to access ****_m.
This code is part of a function.
I need the @metadb variable to be the table name - maybe using dynamic SQL with
sp_executesql. I am just learning, so take it easy on me.
CREATE FUNCTION [dbo].[datAddSp] (
    @cal NCHAR(30)  -- calendar to use for non-working days
    ,@bDays INT     -- number of business days to add or subtract
    ,@d DATETIME
)
RETURNS DATETIME
AS
BEGIN
    DECLARE @nDate DATETIME -- the working date
        ,@addsub INT        -- factor for adding or subtracting
        ,@metadb sysname

    SET @metadb = db_name()
    SET @metadb = REPLACE (@metadb,'_x','_m')
    SET @metadb = CONCAT (@metadb,'.dbo.md_calendar_day')
    SET @ndate = @d

    IF @bdays > 0
        SET @addsub = 1
    ELSE
        SET @addsub = -1

    IF @cal = ' ' OR @cal IS NULL
        SET @cal = 'CA_ON'

    WHILE @bdays <> 0 -- Keep adding/subtracting a day until @bdays becomes 0
    BEGIN
        SELECT @ndate = dateadd(day, 1 * @addsub, @ndate) -- increase or decrease @ndate
        SELECT @bdays = CASE
            WHEN (@@datefirst + datepart(weekday, @ndate)) % 7 IN (0, 1) -- ignore if it is Sat or Sunday
                THEN @bdays
            WHEN ( SELECT 1
                   FROM @metadb -- **THIS IS WHAT I NEED** (same for below) this table holds the holidays
                   WHERE mast_trunkibis_m.dbo.md_calendar_day.calendar_code = @cal AND mast_trunkibis_m.dbo.md_calendar_day.calendar_date = @nDate AND mast_trunkibis_m.dbo.md_calendar_day.is_work_day = 0
                 ) IS NOT NULL -- ignore if it is in the holiday table
                THEN @bdays
            ELSE @bdays - 1 * @addsub -- incr or decr @ndate
        END
    END
    RETURN @nDate
END
GO
The best way to do this, if you aren't stuck with existing structures, is to keep all of the table structures and names the same and simply create a schema for each customer, building out the tables in each schema. For example, if you have the companies Global Trucking and Super Store, you would create a schema for each of them: GlobalTrucking and SuperStore are now your schemas.
Supposing you have products and payments tables for a quick example. You would create those tables in each schema so you end up with something that looks like this:
GlobalTrucking.products
GlobalTrucking.payments
and
SuperStore.products
SuperStore.payments
Then in the application layer, you specify the default schema name to use in the connection string for queries using that connection. The web site or application for Global Trucking has the schema set to GlobalTrucking and any query like: SELECT * FROM products; would actually automatically be SELECT * FROM GlobalTrucking.products; when executed using that connection.
This way you always know where to look in your tables, and each customer is in their own segregated space, with the proper user permissions they will never be able to accidentally access another customers data, and everything is just easier to navigate.
Here is a sample of what your schema/user/table creation script would look like (this may not be 100% correct, I just pecked this out for a quick example, and I should mention that this is the Oracle way, but SQL Server should be similar):
CREATE USER &SCHEMA_NAME IDENTIFIED BY temppasswd1;
CREATE SCHEMA AUTHORIZATION &SCHEMA_NAME
CREATE TABLE "&SCHEMA_NAME".products
(
ProductId NUMBER,
Description VARCHAR2(50),
Price NUMBER(10, 2),
NumberInStock NUMBER,
Enabled VARCHAR2(1)
)
CREATE TABLE "&SCHEMA_NAME".payments
(
PaymentId NUMBER,
Amount NUMBER(10, 2),
CardType VARCHAR2(2),
CardNumber VARCHAR2(15),
CardExpire DATE,
PaymentTimeStamp TIMESTAMP,
ApprovalCode VARCHAR2(25)
)
GRANT SELECT ON "&SCHEMA_NAME".products TO &SCHEMA_NAME
GRANT SELECT ON "&SCHEMA_NAME".payments TO &SCHEMA_NAME
;
With something like the above, you only have one script to keep updated for automating the addition of new customers. When you run it, the &SCHEMA_NAME variable is populated with whatever you choose for the new customer's username/schema name, and an identical table structure is created every time.
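For reference, a rough SQL Server translation of the same idea might look like this (the names are illustrative, and the login is assumed to already exist):

```sql
CREATE SCHEMA GlobalTrucking;
GO

CREATE TABLE GlobalTrucking.products
(
    ProductId     int IDENTITY(1,1) PRIMARY KEY,
    Description   varchar(50),
    Price         decimal(10, 2),
    NumberInStock int,
    Enabled       bit
);
GO

-- Unqualified names resolve through the user's default schema,
-- so SELECT * FROM products hits GlobalTrucking.products.
CREATE USER GlobalTruckingUser FOR LOGIN GlobalTruckingLogin
    WITH DEFAULT_SCHEMA = GlobalTrucking;

GRANT SELECT ON SCHEMA::GlobalTrucking TO GlobalTruckingUser;
```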

sql server after insert trigger update 3 fields of inserted row, ONLY for specific users and ONLY if fields are blank

I don't have access to the INSERT statement, so if the fields are blank I don't know whether they were even part of the INSERT statement to begin with. Office users and field (tablet) users insert Work Order records using different applications. To keep the field users from having to populate their Crew Name, Supervisor's Name, and Shop Name on every record, I've put those values into a lookup table keyed on the INITIATEDBY field from the Work Order record (which is auto-populated by the app). Office workers may be creating work orders for anyone, but field crews only create work orders for their own crews, so when a field crew inserts a record I want to populate the three fields for them. (Actually, they cannot populate the three fields, because I have hidden them on the Work Order form they use.)
Your trigger code needs to take a set-based approach. In the answer you posted, you assume there will only ever be a single row in inserted. Something like this is more along the lines of what you want.
This would be the ENTIRE body of the trigger.
Update w
set Crew = tu.Crew
, SHOP = tu.Shop
, Supervisor = tu.Supervisor
from inserted i
join TableUsers tu on tu.EmpName = i.INITBY
join WorkOrder w on i.ID = w.WOID
I figured it out myself. I just had to read enough examples to put it all together. The only thing that scares me is the IF @EMPName <> ''. Is there a better way to check whether a record was retrieved by the second SELECT statement?
CREATE TRIGGER trgUpdateCrewShopSuper ON ESDWO
FOR INSERT
AS
BEGIN
    DECLARE @ID nvarchar(60), @InitBy nvarchar(50), @EMPName nvarchar(50), @CREW as nvarchar(50), @Shop nvarchar(100), @Super nvarchar(50);

    select @ID=i.ID, @InitBy=i.INITBY from inserted i;

    select @EMPName=EmpName, @CREW=CrewName, @Shop=SHOP, @Super=Supervisor
    From dbo.TabletUsers
    WHERE EmpName = @InitBy

    IF @EMPName <> ''
    BEGIN
        update dbo.WorkOrder
        set Crew = @Crew, SHOP = @Shop, Supervisor = @Super
        WHERE WOID = @ID
    END
END
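On the question of a better way to check whether the lookup SELECT found a row: @@ROWCOUNT (read immediately after the SELECT) is the usual check, and it also handles the edge case where the looked-up column itself contains an empty string. A sketch of that middle section:

```sql
    select @EMPName=EmpName, @CREW=CrewName, @Shop=SHOP, @Super=Supervisor
    From dbo.TabletUsers
    WHERE EmpName = @InitBy

    -- @@ROWCOUNT reflects the statement immediately above,
    -- so check it before running anything else.
    IF @@ROWCOUNT > 0
    BEGIN
        update dbo.WorkOrder
        set Crew = @Crew, SHOP = @Shop, Supervisor = @Super
        WHERE WOID = @ID
    END
```

That said, the set-based UPDATE in the answer above needs no such check at all: if the join finds no match, it simply updates zero rows.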

Speeding Up SSIS Package (Insert and Update)

Referred here by #sqlhelp on Twitter (solved - see the solution at the end of the post).
I'm trying to speed up an SSIS package that inserts 29 million rows of new data and then updates those rows with 2 additional columns. So far the package loops through a folder containing files, inserts the flat files into the database, then performs the update and archives the file. Added (thanks to @billinkc): the SSIS order is Foreach Loop, Data Flow, Execute SQL Task, File Task.
What doesn't take long: The loop, the file move and truncating the tables (stage).
What takes long: inserting the data, running the statement below this:
UPDATE dbo.Stage
SET Number = REPLACE(Number,',','')
-- Creates temp table for State and Date
CREATE TABLE #Ref (Path VARCHAR(255))
INSERT INTO #Ref VALUES(?)

-- Variables for insert
DECLARE @state AS VARCHAR(2)
DECLARE @date AS VARCHAR(12)
SET @state = (SELECT SUBSTRING(RIGHT([Path], CHARINDEX('\', REVERSE([Path]))-1),12,2) FROM #Ref)
SET @date = (SELECT SUBSTRING(RIGHT([Path], CHARINDEX('\', REVERSE([Path]))-1),1,10) FROM #Ref)
SELECT @state
SELECT @date

-- Inserts the values into main table
INSERT INTO dbo.MainTable (Phone,State,Date)
SELECT d.Number, @state, @date
FROM Stage d

-- Clears the Reference and Stage table
DROP TABLE #Ref
TRUNCATE TABLE Stage
Note that I've toyed with upping Rows per batch on the insert and Max insert commit size, but neither have affected the package speed.
Solved and Added:
For those interested in the numbers: the original package time was 11.75 minutes; with William's technique (see below) it dropped to 9.5 minutes. Granted, with 29 million rows on a slower server this is to be expected, but hopefully it shows the actual data behind how effective this is. The key is to keep as much of the processing as possible in the Data Flow task, since updating the data after the data flow consumed a significant portion of the time.
Hopefully that helps anyone else out there with a similar problem.
Update two: I added an IF statement and that reduced it from 9 minutes to 4 minutes. Final code for Execute SQL Task:
-- Creates temp table for State and Date
CREATE TABLE #Ref (Path VARCHAR(255))
INSERT INTO #Ref VALUES(?)

DECLARE @state AS VARCHAR(2)
DECLARE @date AS VARCHAR(12)
DECLARE @validdate datetime
SET @state = (SELECT SUBSTRING(RIGHT([Path], CHARINDEX('\', REVERSE([Path]))-1),12,2) FROM #Ref)
SET @date = (SELECT SUBSTRING(RIGHT([Path], CHARINDEX('\', REVERSE([Path]))-1),1,10) FROM #Ref)
SET @validdate = DATEADD(DD,-30,getdate())

IF @date < @validdate
BEGIN
    TRUNCATE TABLE dbo.Stage
    TRUNCATE TABLE #Ref
END
ELSE
BEGIN
    -- Inserts new values
    INSERT INTO dbo.MainTable (Number,State,Date)
    SELECT d.Number, @state, @date
    FROM Stage d

    -- Clears the Reference and Stage table after the insert
    DROP TABLE #Ref
    TRUNCATE TABLE Stage
END
As I understand it, you are reading ~29,000,000 rows from flat files and writing them into a staging table, then running a SQL script that updates (reads/writes) the same 29,000,000 rows in the staging table, and then moving those 29,000,000 records (read from staging, write to NAT) to the final table.
Couldn't you read from your flat files, use SSIS transformations to clean your data and add your two additional columns, and then write directly into the final table? You would then work on each distinct set of data only once, rather than the three (six, if you count reads and writes separately) times your current process does.
I would change your data flow to transform in process the needed items and write directly into my final table.
edit
From the SQL in your question, it appears you are transforming the data by removing commas from the Phone field, retrieving the State and Date from specific portions of the path of the file currently being processed, and then storing those three data points in the NAT table. All of this can be done with the Derived Column transformation in your Data Flow.
For the State and Date columns, set up two new variables called State and Date. Use expressions in the variable definitions to set them to the correct values (as you did in your SQL). When the Path variable updates (in your loop, I assume), the State and Date variables will update as well.
In the Derived Column transformation, drag the State variable into the Expression field and create a new column called State.
Repeat for Date.
For the Phone column, create an expression in the Derived Column transformation like the following:
REPLACE( [Phone], ",", "" )
Set the Derived Column field to Replace 'Phone'.
For your output, create a destination to your NAT table and link Phone, State, and Date columns in your data flow to the appropriate columns in the NAT table.
If there are additional columns in your input, you can choose not to bring them in from your source, since it appears that you are only acting on the Phone column from the original data.
/edit

SQL Server best way to calculate datediff between current row and next row?

I've got the following rough structure:
Object -> Object Revisions -> Data
The Data can be shared between several Objects.
What I'm trying to do is clean out old Object Revisions. I want to keep the first, active, and a spread of revisions so that the last change for a time period is kept. The Data might be changed a lot over the course of 2 days then left alone for months, so I want to keep the last revision before the changes started and the end change of the new set.
I'm currently using a cursor and a temp table to hold the IDs and the dates between changes so I can select out the low-hanging fruit to get rid of. This means using @LastID, @LastDate, updates and inserts to the temp table, etc...
Is there an easier/better way to calculate the date difference between the current row and the next row in my initial result set without using a cursor and temp table?
I'm on sql server 2000, but would be interested in any new features of 2005, 2008 that could help with this as well.
Here is example SQL. If you have an Identity column, you can use this instead of "ActivityDate".
SELECT DATEDIFF(HOUR, prev.ActivityDate, curr.ActivityDate)
FROM MyTable curr
JOIN MyTable prev
ON prev.ObjectID = curr.ObjectID
WHERE prev.ActivityDate =
(SELECT MAX(maxtbl.ActivityDate)
FROM MyTable maxtbl
WHERE maxtbl.ObjectID = curr.ObjectID
AND maxtbl.ActivityDate < curr.ActivityDate)
I could remove "prev", but have it there assuming you need IDs from it for deleting.
If the identity column is sequential you can use this approach:
SELECT curr.*, DATEDIFF(MINUTE, prev.EventDateTime, curr.EventDateTime) AS Duration
FROM DWLog curr
JOIN DWLog prev ON prev.EventID = curr.EventID - 1
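If EventID has gaps (deletes, reseeds), the same pairing can be built on ROW_NUMBER() instead, which is available from SQL Server 2005 on. A sketch against the same DWLog table from the snippet above:

```sql
-- Number the rows by time, then join each row to the one before it.
WITH Ordered AS
(
    SELECT EventID, EventDateTime,
           ROW_NUMBER() OVER (ORDER BY EventDateTime) AS rn
    FROM DWLog
)
SELECT curr.EventID,
       DATEDIFF(MINUTE, prev.EventDateTime, curr.EventDateTime) AS Duration
FROM Ordered curr
JOIN Ordered prev ON prev.rn = curr.rn - 1;
```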
Hrmm, interesting challenge. I think you can do it without a self-join if you use the new-to-2005 pivot functionality.
Here's what I've got so far, I wanted to give this a little more time before accepting an answer.
DECLARE @IDs TABLE
(
    ID int,
    DateBetween int
)
DECLARE @OID int
SET @OID = 6150

-- Grab the revisions, calc the datediff, and insert into temp table var.
INSERT @IDs
SELECT ID,
    DATEDIFF(dd,
        (SELECT MAX(ActiveDate)
         FROM ObjectRevisionHistory
         WHERE ObjectID=@OID AND
               ActiveDate < ORH.ActiveDate), ActiveDate)
FROM ObjectRevisionHistory ORH
WHERE ObjectID=@OID

-- Hard set DateBetween for special case revisions to always keep
UPDATE @IDs SET DateBetween = 1000 WHERE ID=(SELECT MIN(ID) FROM @IDs)
UPDATE @IDs SET DateBetween = 1000 WHERE ID=(SELECT MAX(ID) FROM @IDs)
UPDATE @IDs SET DateBetween = 1000
WHERE ID=(SELECT ID
          FROM ObjectRevisionHistory
          WHERE ObjectID=@OID AND Active=1)

-- Select out IDs however I need them
SELECT * FROM @IDs
SELECT * FROM @IDs WHERE DateBetween < 2
SELECT * FROM @IDs WHERE DateBetween > 2
I'm looking to extend this so that I can keep at most so many revisions, pruning off the older ones while still keeping the first, the last, and the active one. That should be easy enough with SELECT TOP and ORDER BY clauses, um... and tossing ActiveDate into the temp table.
I got Peter's example to work, then took it and modified it into a subselect. I messed around with both, and the SQL trace shows the subselect doing fewer reads. It works, and I'll vote him up when my rep gets high enough.