SQL query duplicates, self-join, and calculated field - sql-server

I'm putting together a report to track latency in a SQL Availability Group. I need to find the difference between two times. I have server1, database1 and time1. I need to compare it to server2, database1, and time2. The data is stored in a table, and new data will be appended to the table. The query I'm trying to create will get these new entries (the field Hardened_time_MS_Diff will be NULL) and I need to get the difference between the times returned for the Primary and Secondary servers. Here's an example of the data:
Server_name  Database_name  Last_Hardened_Time
-----------  -------------  -----------------------
Server1      ABC            2015-10-08 10:10:05.180
Server2      ABC            2015-10-08 10:10:05.643
My query to get the Hardened_time_MS_Diff values looks like this:
SELECT a1.server_name, a1.database_name,
       CASE WHEN a1.Last_Hardened_Time >= a2.Last_Hardened_Time
            THEN DATEDIFF(MS, a2.Last_Hardened_Time, a1.Last_Hardened_Time)
            ELSE DATEDIFF(MS, a1.Last_Hardened_Time, a2.Last_Hardened_Time)
       END AS Hardened
FROM [database].[dbo].[ag_latency] a1
JOIN ag_latency a2
    ON a1.database_name = a2.database_name
    AND a1.server_name <> a2.server_name
WHERE a1.Hardened_time_MS_Diff IS NULL
Right now it's returning some crazy numbers, and duplicates. The first run works fine. On the second run the data looks correct only for the servers whose Last_Hardened_Time hasn't changed; if it has changed, I get an absurdly large number for the second batch, like this:
Server_name  Database_name  Last_Hardened_Time       Hardened_time_MS_Diff
-----------  -------------  -----------------------  ---------------------
Server1      ABC            2015-10-09 12:00:05.013  26
Server2      ABC            2015-10-09 12:00:05.040  26
Server1      ABC            2015-10-09 12:15:07.843  -902803
Server2      ABC            2015-10-09 12:15:07.877  -902863
How can I get this to do what I want it to do?
Also, the part that gets the data is in PowerShell, so if there's an easier way to manipulate the data table in PoSH and then push that data down, let me know.
Thanks in advance!

To answer your question simply: your join clause won't work once there are multiple pairs of timestamps. It joins not only row1-row2 and row3-row4, but also row1-row4 and row2-row3, and the latter two pairings make no sense for your report.
One possible solution would be to eliminate any time difference greater than a threshold (say 1000 ms, going by your sample data). Your WHERE clause would then look like:
where a1.Hardened_time_MS_Diff is NULL
  AND DATEDIFF(MS, a2.Last_Hardened_Time, a1.Last_Hardened_Time) < 1000
  AND DATEDIFF(MS, a2.Last_Hardened_Time, a1.Last_Hardened_Time) > -1000
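A slightly tidier equivalent wraps the difference in ABS():
where a1.Hardened_time_MS_Diff is NULL
  AND ABS(DATEDIFF(MS, a2.Last_Hardened_Time, a1.Last_Hardened_Time)) < 1000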
Another solution would be more robust, but requires you to change your table schema. Add a column to the table that indicates which rows are paired: for example, assign 1 to the first two rows and 2 to the last two rows, and then join a1 and a2 on this column as well.
Key  Server_name  Database_name  Last_Hardened_Time       Hardened_time_MS_Diff
---  -----------  -------------  -----------------------  ---------------------
1    Server1      ABC            2015-10-09 12:00:05.013  26
1    Server2      ABC            2015-10-09 12:00:05.040  26
2    Server1      ABC            2015-10-09 12:15:07.843  -902803
2    Server2      ABC            2015-10-09 12:15:07.877  -902863
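With that in place, the self-join can no longer pair rows across batches. A sketch of the keyed query (Key is the new pairing column from the table above):
SELECT a1.server_name, a1.database_name,
       CASE WHEN a1.Last_Hardened_Time >= a2.Last_Hardened_Time
            THEN DATEDIFF(MS, a2.Last_Hardened_Time, a1.Last_Hardened_Time)
            ELSE DATEDIFF(MS, a1.Last_Hardened_Time, a2.Last_Hardened_Time)
       END AS Hardened
FROM [database].[dbo].[ag_latency] a1
JOIN [database].[dbo].[ag_latency] a2
    ON a1.database_name = a2.database_name
    AND a1.[Key] = a2.[Key]
    AND a1.server_name <> a2.server_name
WHERE a1.Hardened_time_MS_Diff IS NULL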
For the PowerShell part, could you elaborate on what you need more specifically?
Henry

Related

Split data from strings into columns

I have a column with a long string. The data needs to be split into columns; the strings are of variable length and don't always contain the same number of columns. I'm not exactly sure how to do this, so I was looking for some advice here.
Let's say I have this string:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
And in some cases the string might not have all the medical conditions, just some of them.
I need to split this into columns where the column name is the part between the tildes, i.e. MedCond1, and the value is the part to the right of the tilde but before the pipe, ending up like this:
MedCond1 MedCond2 MedCond3 MedCond4 MedCond5 MedCond6 MedCond7 MedCond8
======== ======== ======== ======== ======== ======== ======== ========
35.1 24 110 64 100 88 21 79
I need to do this for a lot of rows within a large table, and as I said, not all the columns are always present, but the names don't vary: you might have MedCond1-8 in one set, then only MedCond3, 4, and 7 in another.
Here is a query I created that is kind of what I want, but it isn't dynamic, so it picks up the values along with extra bits of the string:
select MainCol,
       case when charindex('MedCond1', MainCol) > 0
            then substring(MainCol, charindex('MedCond1', MainCol) + 9, 4)
       end as [MedCond1]
from MedTable
Will return
MedCond1
========
35.3
40.2
33.6
33|V <--- Problem
As you can see, the numeric value is sometimes picked up along with an additional part of the string, because the charindex offset and length are hard-coded. The value is sometimes 4 characters long with a decimal place, sometimes 2 long with no decimal place. I would like to make this dynamic. The pipe defines the end of the data I need, and the start is defined by the tilde at the end of the column name.
Thanks for any thoughts on making this dynamic.
Andrew
This data looks like a table itself. It could have been stored in SQL Server as XML; SQL Server supports xml columns and allows querying them. In fact, one could convert this string to XML and then query it:
declare @medTable table (item nvarchar(2000))
insert into @medTable
values ('VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|');
-- Step 1: Replace `|` with <item> tags and `~` with `tag` tags
-- This will return an xml value for each medTable row
with items as (
    select xmlField = cast('<item><tag>'
        + replace(
              replace(item, '|', '</tag></item><item><tag>'),
              '~', '</tag><tag>')
        + '</tag></item>' as xml)
    from @medTable
)
-- Step 2: Select different tags and display them as fields
select
y.item.value('(tag/text())[1]','nvarchar(20)'),
y.item.value('(tag/text())[2]','nvarchar(20)'),
y.item.value('(tag/text())[3]','nvarchar(20)')
from items outer apply xmlField.nodes('item') as y(item)
The result is:
-------------------- -------------------- -------
VS5 MedCond1 35.4
VS4 MedCond2 16
VS1 MedCond3 155
VS2 MedCond4 70
SPO2 MedCond5 100
VS3 MedCond6 64
FiO2 MedCond7 21
MAP MedCond8 98
NULL NULL NULL
It would be better to perform this conversion when loading the data, though. It's easier, for example, to make the replacements in C# or SSIS and store a complete xml value in the database.
You can modify this query too, to generate the xml value and store it in the database:
declare @medTable2 table (xmlField xml)
with items as (
    select xmlField = cast('<item><tag>' + replace(replace(item, '|', '</tag></item><item><tag>'), '~', '</tag><tag>') + '</tag></item>' as xml)
    from @medTable
)
insert into @medTable2
select items.xmlField
from items
-- Query the new table from now on
select
    y.item.value('(tag/text())[1]','nvarchar(20)'),
    y.item.value('(tag/text())[2]','nvarchar(20)'),
    y.item.value('(tag/text())[3]','nvarchar(20)')
from @medTable2 outer apply xmlField.nodes('item') as y(item)
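On SQL Server 2016 or later, the XML round-trip can be skipped entirely with STRING_SPLIT plus CHARINDEX; a minimal sketch against the same @medTable variable:
select
    left(s.value, charindex('~', s.value) - 1) as ColA,
    substring(s.value,
              charindex('~', s.value) + 1,
              charindex('~', s.value, charindex('~', s.value) + 1)
                  - charindex('~', s.value) - 1) as MedCondTitle,
    substring(s.value,
              charindex('~', s.value, charindex('~', s.value) + 1) + 1,
              4000) as MedCondValue
from @medTable
cross apply string_split(item, '|') as s
where s.value <> ''  -- skip the empty fragment after the trailing pipe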
OK, let me take a stab at this. The solution I'm outlining is not purely SQL Server; it uses a round-trip via a text file.
The approach uses the following steps:
1. Unpivot the data delimited by the pipe symbols (to create more than one line of output for each line of input)
2. Round-trip the data from SQL Server to a text file and back
3. Separate the data into columns on the tilde ~ symbol delimiter
4. Pivot the data back into columns
The key benefit of this approach is the unpivot operation, which allows you to handle missing columns like MedCond2 naturally by the absence of an equivalent row. It also eliminates nearly all string manipulation, save for the one REPLACE function in step 1 below.
Given a single row's contents like the following:
VS5~MedCond1~35.4|VS4~MedCond2~16|VS1~MedCond3~155|VS2~MedCond4~70|SPO2~MedCond5~100|VS3~MedCond6~64|FiO2~MedCond7~21|MAP~MedCond8~98|
Step 1 (Unpivot): Find and replace all instances of the pipe symbol with a newline character. So, REPLACE(column, '|', CHAR(13)) (CHAR(13) is a carriage return; if your export tool expects a CR+LF line ending, use CHAR(13) + CHAR(10) instead) will give you the following lines of text (i.e. multiple lines of text in a single database row) for a single input row:
VS5~MedCond1~35.4
VS4~MedCond2~16
VS1~MedCond3~155
VS2~MedCond4~70
SPO2~MedCond5~100
VS3~MedCond6~64
FiO2~MedCond7~21
MAP~MedCond8~98
Step 2 (Round-trip): Write the above output to a text file, using your tool of choice (SSIS, SQLCMD, etc.) and ensure that the newline character defined is the same as that used in the REPLACE command in step 1.
The purpose of this step is to turn the multiple lines embedded within a single database row into separate rows of their own.
Note that step 1 can be eliminated by defining the row delimiter for steps 2 and 3 as the pipe symbol. I've added step 1 using newlines only to make it easier to understand and debug.
Step 3 (Separate columns): Import the text file back into SQL Server using the same tool, and define the column delimiter as the tilde ~ symbol, row delimiter same as in steps 1/2.
ColA MedCondTitle MedCondValue
------ ------------- -------------
VS5 MedCond1 35.4
VS4 MedCond2 16
VS1 MedCond3 155
VS2 MedCond4 70
SPO2 MedCond5 100
VS3 MedCond6 64
FiO2 MedCond7 21
MAP MedCond8 98
Step 4 (Pivot): Now you'd have a trivially simple step of pivoting rows back into columns, which can be achieved with conditional aggregation of the form (note the END that closes the CASE; cast MedCondValue to a numeric type first if it was imported as text):
SUM(CASE WHEN MedCondTitle = 'MedCond1' THEN MedCondValue ELSE 0 END) AS MedCond1
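Put together, a sketch of the final pivot (the table name MedImport and the row-tag column SourceRowId are assumptions; tag each original row during the round-trip so its values can be regrouped):
SELECT SourceRowId,
       MAX(CASE WHEN MedCondTitle = 'MedCond1' THEN MedCondValue END) AS MedCond1,
       MAX(CASE WHEN MedCondTitle = 'MedCond2' THEN MedCondValue END) AS MedCond2,
       -- ...repeat for MedCond3 through MedCond7...
       MAX(CASE WHEN MedCondTitle = 'MedCond8' THEN MedCondValue END) AS MedCond8
FROM MedImport
GROUP BY SourceRowId
Missing conditions simply produce NULL in their column, which matches the "not always present" requirement.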

Loading Flat File into SQL Server using SSIS

New to SSIS and trying to import a flat file into my DB. There are 6 different rows in the flat file that I need to combine into one row in the database; each of these rows contains a different price for one symbol. For example:
IGBGK 21 w 47
IGBGK 21 u 2.9150
IGBGK 21 h 2.9300
IGBGK 21 l 2.9050
IGBGK 22 h 2.9300
IGBGK 22 l 2.8800
So each of these is on a different row in the flat file, but they will become one row with different columns for symbol IGBGK. I can transform the data to place each number into its own column, but I cannot get them to combine into one row.
Any help on the direction I need to go with this is greatly appreciated.
End product should look like:
Symbol | col 1 | col 2 | col 3 | col 4 | col 5 | col 6
-------+-------+-------+-------+-------+-------+-------
IGBGK | 47 | 2.915 | 29.30 | 2.905 | 2.930 | 2.880
1. Create a variable, named whatever you want, with the Object data type.
2. Add an Execute SQL Task. The query for your table (MAX rather than COUNT, so the pivot returns the prices themselves):
WITH ABC AS
(
    SELECT * FROM table  -- which gives you the original result
)
SELECT * FROM ABC
PIVOT (MAX(**4th Column Name**) FOR **1st Column Name**
       IN ([col 1],[col 2],[col 3],[col 4],[col 5],[col 6])) AS p
3. Copy the complete query into that task and set its ResultSet property to "Full result set".
4. Switch to the Result Set page, choose the variable you created, and set the result name to 0.
5. Now every time you run the package, the variable will be assigned the complete result table in your desired format above.
6. Create another 7 variables corresponding to each column (Symbol, [col 1], ...), each with the String data type.
7. Use another Execute SQL Task, with Variable as the SQL source type. On the Parameter Mapping page, choose the Object variable and set its name to 0; then on the Result Set page, map those seven variables one by one with result names 0, 1, 2, 3, 4, 5, 6.
8. From now on, every time you run the package each variable is assigned its value. To load the values into the target table, here comes the last step.
9. Use another Execute SQL Task with a query like this:
Insert into table
select ?,?,?,?,?,?,?
On its Parameter Mapping page, map the seven variables with names 0, 1, 2, 3, 4, 5, 6 so each maps to a ? in order.
There may be small details to work out yourself, such as data types, but the logic is essentially this.
Hope this helps!
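For comparison, if the six rows are loaded into a staging table first, the whole pivot can be done in a single query. A sketch, assuming a hypothetical staging table dbo.RawPrices with columns Symbol, Num, Code, Price matching the flat file:
WITH labeled AS
(
    SELECT Symbol, Price,
           -- col 1..col 6 ordered by Num then Code; adjust the ORDER BY to match your required column order
           'col ' + CAST(ROW_NUMBER() OVER (PARTITION BY Symbol ORDER BY Num, Code) AS varchar(2)) AS ColName
    FROM dbo.RawPrices
)
SELECT Symbol,
       MAX(CASE WHEN ColName = 'col 1' THEN Price END) AS [col 1],
       MAX(CASE WHEN ColName = 'col 2' THEN Price END) AS [col 2],
       MAX(CASE WHEN ColName = 'col 3' THEN Price END) AS [col 3],
       MAX(CASE WHEN ColName = 'col 4' THEN Price END) AS [col 4],
       MAX(CASE WHEN ColName = 'col 5' THEN Price END) AS [col 5],
       MAX(CASE WHEN ColName = 'col 6' THEN Price END) AS [col 6]
FROM labeled
GROUP BY Symbol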

SQL Server: Search Between Characters in UK Postcode Unit Field

As a solution to a postcode lookup, I have split a postcode database into separate columns for Area, District, Sector, and Unit. I have even split out the characters that can appear in District, like the A and B in W1A and W1B.
I want to search for a postcode like B46 3BA and have got as far as returning the details below.
The SQL query used thus far is
SELECT
StationID, FromPostcode, ToPostcode, FromPostCodeArea,
FromPostCodeDistrict, FromPostCodeDistrictChar,
FromPostCodeSector, FromPostCodeUnit,
ToPostCodeArea, ToPostCodeDistrict,
ToPostCodeDistrictChar, ToPostCodeSector, ToPostCodeUnit
FROM
Station
WHERE
(FromPostCodeArea = 'B')
AND (3 BETWEEN FromPostCodeSector AND ToPostCodeSector)
AND (46 BETWEEN FromPostCodeDistrict AND ToPostCodeDistrict)
My main issue, for legacy application reasons, is that I do not have access to the SELECT part of the query and can only change the WHERE clause. I just need it to return the single row, in this case the row with StationID 3321.
But I am totally lost on how to take this any further.
Answer: As suggested by Jamie, the SQL becomes
SELECT
StationID, FromPostcode, ToPostcode, FromPostCodeArea,
FromPostCodeDistrict, FromPostCodeDistrictChar,
FromPostCodeSector, FromPostCodeUnit,
ToPostCodeArea, ToPostCodeDistrict,
ToPostCodeDistrictChar, ToPostCodeSector, ToPostCodeUnit
FROM
Station
WHERE
(FromPostCodeArea = 'B')
AND (3 BETWEEN FromPostCodeSector AND ToPostCodeSector)
AND (46 BETWEEN FromPostCodeDistrict AND ToPostCodeDistrict)
AND ('B46 3BA' BETWEEN FromPostCode AND ToPostCode)
Seems to do the trick on a test run of 2000 random postcodes.
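One caveat: BETWEEN on varchar postcode columns compares lexicographically, character by character, so it only behaves when the stored postcodes share a consistent format and padding. A quick sanity check:
SELECT CASE WHEN 'B46 3BA' BETWEEN 'B46 3AA' AND 'B46 3ZZ'
            THEN 'in range' ELSE 'out of range' END AS range_check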

How to avoid re-inserting data into a SQL table while re-running an SSIS package that loads data from a flat file to the SQL table?

I have a flat file with the following data
Id,FirstName,LastName,Address,PhoneNumber
1,ben,afflick,xyz Address,5014123746
3,christina,smith,test address,111000110
1,ben,afflick,xyz Address,5014123746
3,christina,smith,test address,111000110
4,nash,gordon,charlotte NC ADDRESS,111200110
I have created an SSIS package that has a flat file source, an Aggregate transformation that makes sure only unique rows from the flat file are inserted (no duplicate records), and a SQL table as my destination.
Everything is fine when I run the package; I get the output below in the SQL table:
Id FName LName Address phoneNumber
1 ben afflick xyz Address 5014123746
4 nash gordon charlotte NC ADDRESS 111200110
3 christina smith test address 111000110
But when I add some new data to the flat file, as below,
Id,FirstName,LastName,Address,PhoneNumber
1,ben,afflick,xyz Address,5014123746
3,christina,smith,test address,111000110
1,ben,afflick,xyz Address,5014123746
3,christina,smith,test address,111000110
4,nash,gordon,charlotte NC ADDRESS,111200110
5,abc,xyz,New York,9999988888
and re-run the package, the data that is already present in the table gets re-inserted, as below:
1 ben afflick xyz Address 5014123746
4 nash gordon charlotte NC ADDRESS 111200110
3 christina smith test address 111000110
1 ben afflick xyz Address 5014123746
5 abc xyz New York 9999988888
4 nash gordon charlotte NC ADDRESS 111200110
3 christina smith test address 111000110
But I DO NOT want this; I don't want data to be inserted that is already present.
I want only the newly added data to be inserted into the SQL table.
Can someone please help me achieve this?
Another method is to load your file into a staging table in the database and then use a merge statement to insert data into your destination table.
In practice this would look like a data flow from your flat file to your staging table, and then an Execute SQL Task containing a merge statement.
You can then update any matching values as well, should you wish to.
merge into table_a
using stage_a
    on stage_a.key = table_a.key
when not matched then
    insert (a, b, c, d) values (stage_a.a, stage_a.b, stage_a.c, stage_a.d);
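Applied to the sample data, a sketch (the names dbo.Contacts and dbo.Contacts_Stage are placeholders; the key is Id):
MERGE INTO dbo.Contacts AS tgt
USING (
    -- deduplicate the staged file first, since it contains repeated rows
    SELECT DISTINCT Id, FirstName, LastName, Address, PhoneNumber
    FROM dbo.Contacts_Stage
) AS src
    ON src.Id = tgt.Id
WHEN NOT MATCHED THEN
    INSERT (Id, FirstName, LastName, Address, PhoneNumber)
    VALUES (src.Id, src.FirstName, src.LastName, src.Address, src.PhoneNumber);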
Your data flow task would look something like this: the Flat File Source reads the CSV file and passes the data to a Lookup transformation, which checks for existing data in the destination table. If there is no matching record, the row from the CSV file is sent to the OLE DB Destination; otherwise the row is discarded.
Lookup transformation link:
http://www.codeproject.com/Tips/574437/Term-Lookup-Transformation-in-SSIS

How to get last access/modification date of a PostgreSQL database?

On a development server I'd like to remove unused databases. To do that, I need to know whether a database is still used by someone or not.
Is there a way to get the last access or modification date of a given database, schema, or table?
You can do it by checking the last modification time of the table's file. In PostgreSQL, every table corresponds to one or more OS files, like this:
select relfilenode from pg_class where relname = 'test';
The relfilenode is the file name of table "test". You can then find the file in the database's directory.
In my test environment:
cd /data/pgdata/base/18976
ls -l -t | head
The last command lists all files ordered by last modification time.
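A handy shortcut: pg_relation_filepath() returns the table's file path relative to the data directory, saving the pg_class lookup:
select pg_relation_filepath('test');
-- returns something like base/18976/<relfilenode>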
There is no built-in way to do this - and all the approaches that check the file mtime described in other answers here are wrong. The only reliable option is to add triggers to every table that record a change to a single change-history table, which is horribly inefficient and can't be done retroactively.
If you only care about "database used" vs "database not used" you can potentially collect this information from the CSV-format database log files. Detecting "modified" vs "not modified" is a lot harder; consider a query like SELECT writes_to_some_table(...), which looks like a read but performs writes.
If you don't need to detect old activity, you can use pg_stat_database, which records activity since the last stats reset.
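For example, a query along these lines, run in psql's expanded display mode (\x), produces the record shown below:
select * from pg_stat_database where datname = 'regress';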
-[ RECORD 6 ]--+------------------------------
datid | 51160
datname | regress
numbackends | 0
xact_commit | 54224
xact_rollback | 157
blks_read | 2591
blks_hit | 1592931
tup_returned | 26658392
tup_fetched | 327541
tup_inserted | 1664
tup_updated | 1371
tup_deleted | 246
conflicts | 0
temp_files | 0
temp_bytes | 0
deadlocks | 0
blk_read_time | 0
blk_write_time | 0
stats_reset | 2013-12-13 18:51:26.650521+08
so I can see that there has been activity on this DB since the last stats reset. However, I don't know anything about what happened before the stats reset, so if I had a DB showing zero activity since a stats reset half an hour ago, I'd know nothing useful.
PostgreSQL 9.5 lets us track the last commit timestamp.
Check whether commit timestamp tracking is on or off using the following query:
show track_commit_timestamp;
If it returns "on", skip to the final queries below; otherwise modify postgresql.conf:
cd /etc/postgresql/9.5/main/
vi postgresql.conf
Change
track_commit_timestamp = off
to
track_commit_timestamp = on
Restart the PostgreSQL service, then repeat the first step to confirm the setting took effect. Note that only transactions committed after the setting is enabled will have timestamps.
Use the following queries to find the last commit:
SELECT pg_xact_commit_timestamp(xmin), * FROM YOUR_TABLE_NAME;
SELECT pg_xact_commit_timestamp(xmin), * FROM YOUR_TABLE_NAME WHERE COLUMN_NAME = VALUE;
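To reduce that to a single "last modified" value per table, one could take the max (a sketch; YOUR_TABLE_NAME is a placeholder as above):
-- Rows whose xmin has been frozen by vacuum return NULL here
SELECT max(pg_xact_commit_timestamp(xmin)) AS last_commit FROM YOUR_TABLE_NAME;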
My way to get the modification date of my tables:
Python Function
CREATE OR REPLACE FUNCTION py_get_file_modification_timestamp(afilename text)
RETURNS timestamp without time zone AS
$BODY$
import os
import datetime
return datetime.datetime.fromtimestamp(os.path.getmtime(afilename))
$BODY$
LANGUAGE plpythonu VOLATILE
COST 100;
SQL Query
SELECT
schemaname,
tablename,
py_get_file_modification_timestamp('*postgresql_data_dir*/*tablespace_folder*/'||relfilenode)
FROM
pg_class
INNER JOIN
pg_catalog.pg_tables ON (tablename = relname)
WHERE
schemaname = 'public'
I'm not sure if things like vacuum can mess with this approach, but in my tests it's a pretty accurate way to find tables that are no longer used, at least for INSERT/UPDATE operations.
I guess you should activate some logging options. You can get information about logging in PostgreSQL here.
