Combine multiple date columns into one date column - netezza

I want to combine multiple date columns into one by taking the least date value among them, excluding NULL values. I have tried various approaches, such as 'case when' and the 'Min' function, but can't weed out the NULL values. I am not looking for the first non-NULL value either. What makes matters worse is that the 'LEAST' function is not available in Netezza.
My dummy data (non-highlighted columns) and my desired output (highlighted columns) are shown in the table below:

The MIN and MAX functions in a Netezza system work as simple scalar functions (as well as columnar functions), and you can give them multiple arguments. Used that way, MIN is essentially the same as LEAST. This example covers the MIN function, but MAX works the same way:
select min(a)
from (
select min(min(1,2),NULL) a
union all
select 10
union all
select NULL
) x
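Result is 10: the scalar min with a NULL argument yields NULL, and the aggregate min then ignores the NULL rows.
To disregard possible NULL values, wrap each argument in NVL (same as the DB2 COALESCE function) with an "infinite" value of some sort. Replace line 3 of the statement above with this: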
select min(min(1,2),nvl(NULL,100)) a
Result is now 1
In your case you should be able to do this:
select patient_id,min(
nvl(INDEX_DT,'9999-12-31'),
nvl(PRE_FEVER_DT,'9999-12-31'),
nvl(POST_FEVER_DT,'9999-12-31'),
nvl(PRE_DIARR_DT,'9999-12-31'),
nvl(POST_DIARR_DT,'9999-12-31'),
nvl(PRE_COUGH_DT,'9999-12-31'),
nvl(POST_COUGH_DT,'9999-12-31')
) as signs_DT
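The sentinel technique above can be tried out in a small sketch. This is only an illustration under stated assumptions: SQLite (through Python's sqlite3 module) stands in for Netezza, coalesce stands in for nvl, and the visits table and its rows are made up. The NULLIF wrapper is an addition that is not part of the answer; it turns the sentinel back into NULL when every date in a row is NULL.

```python
import sqlite3

# Hypothetical table and data; SQLite stands in for Netezza here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table visits (patient_id integer, index_dt text,"
    " pre_fever_dt text, post_fever_dt text)"
)
conn.executemany(
    "insert into visits values (?, ?, ?, ?)",
    [
        (1, "2020-03-01", None, "2020-02-15"),
        (2, None, "2021-07-04", None),
        (3, None, None, None),  # every date is NULL
    ],
)

SENTINEL = "9999-12-31"  # the "infinite" date used to mask NULLs

# Scalar min over sentinel-masked columns; nullif restores NULL
# for rows where only the sentinel was left.
rows = conn.execute(
    """
    select patient_id,
           nullif(
               min(coalesce(index_dt, :s),
                   coalesce(pre_fever_dt, :s),
                   coalesce(post_fever_dt, :s)),
               :s
           ) as signs_dt
    from visits
    order by patient_id
    """,
    {"s": SENTINEL},
).fetchall()

for row in rows:
    print(row)
```

Without the nullif step, patient 3 would come back with the sentinel date 9999-12-31 instead of NULL, which may or may not be what you want.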

The least() function is available as part of the SQL Extensions Toolkit.
Documentation for the least() function: https://www.ibm.com/docs/en/netezza?topic=functions-least
Installation instructions: https://www.ibm.com/docs/en/psfa/7.1.0?topic=setup-installing-netezza-sql-extensions-toolkit
The toolkit can be downloaded from IBM Fix Central.

Related

How can I check and remove duplicate rows?

I have a problem with a fairly big table that has some NULL values in 3 columns: one datetime2 column and 2 float columns.
The nice, simple query from a similar question picks up only the 2 rows where datetime2 is NULL, but nothing else (the same goes for a lot of other approaches):
DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, allRemainingCols
FROM MyTable
GROUP BY allRemainingCols
) as KeepRows ON
MyTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
It seems to work when the datetime2 column has no NULLs?
There is a manual workaround, but is there any way to write a query or procedure using T-SQL only?
SELECT id,remainingColumns
FROM table
order BY remainingColumns
Compare all columns in Excel (15 in my case; I placed =ROW() in the first column as a check, put the formula next to the last column, and auto-filtered for TRUE values):
=AND(B1=B2;C1=C2;D1=D2;E1=E2;F1=F2;G1=G2;H1=H2;I1=I2;J1=J2;K1=K2;L1=L2;M1=M2;N1=N2;O1=O2;P1=P2)
Or compare 3 rows like this and select all non-unique rows:
=OR(
AND(B1=B2;C1=C2;D1=D2;E1=E2;F1=F2;G1=G2;H1=H2;I1=I2;J1=J2;K1=K2;L1=L2;M1=M2;N1=N2;O1=O2;P1=P2);
AND(B2=B3;C2=C3;D2=D3;E2=E3;F2=F3;G2=G3;H2=H3;I2=I3;J2=J3;K2=K3;L2=L3;M2=M3;N2=N3;O2=O3;P2=P3)
)
It took quite a lot of work to find the answer for my particular data.
Most of the float numbers were slightly different.
The differences are hard to find, but a simple CAST(column as binary) can reveal them.
For example, 96,6666666666667 is stored as 0x0000000000000000000000000000000000000000000040582AAAAAAAAAAD in one row and as 0x0000000000000000000000000000000000000000000040582AAAAAAAAAAB in another, etc.
And the visible value 96.6666666666667 can come back in yet another form:
0x0000000000000000000000000000000000000F0D0001AB6A489F2D6F0300
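The effect can be reproduced outside SQL Server. The sketch below is in Python and assumes that the last 8 bytes of the two binary values quoted above are big-endian IEEE 754 doubles; the helper name double_from_hex is made up for the illustration.

```python
import struct

def double_from_hex(bits: str) -> float:
    """Interpret 16 hex digits as a big-endian IEEE 754 double."""
    return struct.unpack(">d", bytes.fromhex(bits))[0]

# Trailing 8 bytes of the two binary casts: 40582 + ten A's + B or D.
a = double_from_hex("40582" + "A" * 10 + "B")
b = double_from_hex("40582" + "A" * 10 + "D")

# With 15 significant digits both values are displayed identically...
shown_a = format(a, ".15g")
shown_b = format(b, ".15g")
print(shown_a, shown_b)

# ...but they are different numbers, so an equality comparison or a
# GROUP BY on the raw float treats them as distinct values.
print(a == b)
print(repr(a), repr(b))
```

If the float columns only need a fixed precision, rounding them to that precision before comparing or grouping is one possible way to make such rows count as duplicates.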

SQL Server testing for 1 value in multiple columns

I am testing a table and would like to find out whether 10 columns of that table (integer fields) equal the value 999. Can ANY or the IN clause be used for this?
At a pure guess, and this is pseudo-SQL, but
SELECT {Columns}
FROM {YourTable}
WHERE '999' IN ({First Column},{Second Column},{Third Column},...,{Tenth Column});
To test that every column holds the value, you need something like this:
select * from table1 where i1=999 and i1=i2 and i2=i3 and i3=i4 and i4=i5 and i5=i6 and i6=i7 and i7=i8 and i8=i9;
where iX are column names. This returns only the rows in which all 9 columns equal 999.
TSQL is a MS implementation of various SQL standards.
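The two answers above test different things: the IN form matches a row when any of the listed columns equals 999, while the chained comparison matches only when all of them do. A minimal sketch of the difference, assuming SQLite (via Python's sqlite3 module) as a stand-in for SQL Server and a made-up three-column table instead of ten:

```python
import sqlite3

# Hypothetical data; three columns instead of ten to keep it short.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table table1 (id integer, i1 integer, i2 integer, i3 integer)"
)
conn.executemany(
    "insert into table1 values (?, ?, ?, ?)",
    [
        (1, 999, 999, 999),  # every column is 999
        (2, 999, 5, 7),      # only one column is 999
        (3, 1, 2, 3),        # no column is 999
    ],
)

# "value IN (columns)" is true when at least one column matches.
any_match = [r[0] for r in conn.execute(
    "select id from table1 where 999 in (i1, i2, i3) order by id"
)]

# The chained equality is true only when every column matches.
all_match = [r[0] for r in conn.execute(
    "select id from table1 where i1 = 999 and i1 = i2 and i2 = i3 order by id"
)]

print("any column equals 999:", any_match)
print("all columns equal 999:", all_match)
```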

Mixing indexed and calculated fields in a table-valued function

I work with SQL Server 2008, but can use a later version if it would matter.
I have 2 tables with pretty similar data about some people but in different formats (no intersections between these 2 sets of people).
Table 1:
int personID
bit IsOldPerson //this field is indexed
Table 2:
int PersonID
int Age
I want to have a combined view that has the same structure as Table 1. So I wrote the following script (a simplified version):
CREATE FUNCTION CombinedView(@date date)
RETURNS TABLE
AS
RETURN
select personID as PID, IsOldPerson as IOP
from Table1
union all
select personID as PID, dbo.CheckIfOld(Age,@date) as IOP
from Table2
GO
The function "CheckIfOld" returns yes/no depending on the input age at the date @date.
So I have 2 questions here:
A. If I run select * from CombinedView(TODAY) where IOP=true, will SQL Server handle the two parts separately: 1) for Table 1, use the index on the IsOldPerson field and do a "clever" index-based selection of results; 2) for Table 2, calculate CheckIfOld for all the rows, accepting or rejecting rows on a row-by-row basis during the calculation?
B. How can I check the execution plan in this particular case to understand whether my guess in question (A) is correct?
Any help is greatly appreciated! Thanks!
Yes, if the query isn't too complex, the query optimizer should "see through" the view into its constituent UNION-ed SELECT statements, evaluate them separately, and concatenate the results. If there is an index on Table1, it should be able to use it. I tested this using tables we had and the same function concepts you presented. I compared the query plan of the raw SELECT against Table1 with the plan of the SELECT against the inline table-valued function with the UNION, and the portion of the plan relevant to Table1 was the same, and it used the index.
Now if performance is a concern, I suggest you do one of two things:
If (a) Table2 is read-heavy rather than write-heavy, (b) you have the space, and (c) you can write CheckIfOld as a single CASE statement (as its name and context in your question imply), then you should consider creating a persisted calculated field in Table2 with the calculation from CheckIfOld and applying an index to it.
If Table2 is write-heavy, or you have no space for additional fields, you should at least consider converting CheckIfOld into an inline function. You will likely reap performance gains, depending on how it is used. In your case, it would be used like this:
select personID as PID, IOP.IsOldPerson from Table2 CROSS APPLY dbo.CheckIfOld(Age,@date) AS IOP

Simple SQL approach to transform columns into rows (SQL Server 2005 or 2008)

On SQL Server, executing the following SQL Statement:
SELECT 1,2,3
will return
(no column name) (no column name) (no column name)
1 2 3
Note that the columns don't have names and the number of columns is not definite (it can have 1 column or it can also have > 100 columns).
My question is - Does anybody know of a simple approach so I can get the following result:
(no column name)
1
2
3
What I'm really trying to do is come up with a SQL similar to the one below. I wish I could execute it as it is but of course we know that the Select 1,2,3 won't work, we have to somehow transform that into a table with the values in each row.
SELECT *
FROM NORTHWIND.DBO.CUSTOMERS
WHERE EMPLOYEEID IN (*Select 1,2,3*); -- *Select 1,2,3 will not work
Currently I'm thinking of creating a user defined function that returns a table by iterating through each column and dynamically creating multiple SQL statements combined by UNION similar to: SELECT 1 Col1 UNION SELECT 2 UNION SELECT 3. I'm not a fan of dynamic SQL and looping procedures in my queries as it can be expensive to process especially for an application with expected usage of 1000+ per minute. Also, there is that concern for SQL Injection Attacks with Dynamic SQL when I start using strings instead of integer values. I'm also trying to avoid temporary tables as it can even be more expensive to process.
Any ideas? Can we use UNPIVOT without the need for looping through the indefinite number of columns and dynamically creating the SQL text to execute it and transform the columnar values into rows? What about Common Table Expressions?
Get rid of the select and just specify a list of values:
SELECT * FROM NORTHWIND.DBO.CUSTOMERS
WHERE EMPLOYEEID IN (1,2,3);

How to get MAX value of a version-number (varchar) column in T-SQL

I have a table defined like this:
Column:  Version      Message
Type:    varchar(20)  varchar(100)
----------------------------------
Row 1:   2.2.6        Message 1
Row 2:   2.2.7        Message 2
Row 3:   2.2.12       Message 3
Row 4:   2.3.9        Message 4
Row 5:   2.3.15       Message 5
I want to write a T-SQL query that will get the message for the MAX version number, where the "Version" column represents a software version number. I.e., 2.2.12 is greater than 2.2.7, and 2.3.15 is greater than 2.3.9, etc. Unfortunately, I can't think of an easy way to do that without using CHARINDEX or some other complicated split-like logic. Running this query:
SELECT MAX(Version) FROM my_table
will yield the erroneous result:
2.3.9
When it should really be 2.3.15. Any bright ideas that don't get too complex?
One solution would be to use a table-valued split function to split the versions into rows and then combine them back into columns so that you can do something like:
Select TOP 1 Major, Minor, Build
From ( ...derived crosstab query )
Order By Major Desc, Minor Desc, Build Desc
Actually, another way is to use the PARSENAME function which was meant to split object names:
Select TOP 1 Version
From my_table As Z
Order By Cast(Parsename( Z.Version , 3 ) As Int) Desc
, Cast(Parsename( Z.Version , 2 ) As Int) Desc
, Cast(Parsename( Z.Version , 1 ) As Int) Desc
Does it have to be efficient on a large table? I suggest you create an indexed persisted computed column that transforms the version into a format that sorts correctly, and use the computed column in your queries. Otherwise you'll always scan the table end to end.
If the table is small, it doesn't matter. Then you can use a just-in-time ranking, using a split function, or (ab)using PARSENAME as Thomas suggested.
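The difference between string ordering and numeric ordering of the parts can be sketched outside the database. The Python snippet below uses the versions from the question; the helper names version_key and sortable, and the padding width of 5, are assumptions made for the illustration. The zero-padded form is one possible "format that sorts correctly" for a computed column, not necessarily the one the answer had in mind.

```python
versions = ["2.2.6", "2.2.7", "2.2.12", "2.3.9", "2.3.15"]

def version_key(version: str) -> tuple:
    """Split on dots and compare the parts as integers."""
    return tuple(int(part) for part in version.split("."))

def sortable(version: str, width: int = 5) -> str:
    """Zero-pad every part so plain string comparison ranks correctly."""
    return ".".join(part.zfill(width) for part in version.split("."))

# Plain string comparison: "9" sorts after "1", so 2.3.9 beats 2.3.15.
string_max = max(versions)

# Comparing the integer parts gives the intended ordering.
numeric_max = max(versions, key=version_key)

# The padded strings order the same way as the integer parts.
padded_max = max(versions, key=sortable)

print(string_max, numeric_max, padded_max)
print(sortable("2.3.15"))
```

The integer-tuple key corresponds to the PARSENAME approach (cast each part to int, then order by the parts), while the padded string corresponds to the precomputed sortable column.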

Resources