Can a Snowflake UDF be used to create MD5 on the fly? - md5

I was wondering if anyone has an example of creating an MD5 result using a UDF in Snowflake?
Scenario: I want a UDF that can take X columns, depending on the source, and create an MD5 result. So table A might have 5 columns, table B might have 10, and so on, while accounting for various data types.
Thanks,
Todd

Snowflake already provides a built-in MD5 function:
https://docs.snowflake.com/en/sql-reference/functions/md5.html
select md5('Snowflake');
+----------------------------------+
| MD5('SNOWFLAKE')                 |
|----------------------------------|
| edf1439075a83a447fb8b630ddc9c8de |
+----------------------------------+

There are many ways you can do the MD5 calculation, but I think it would be good to understand your use case first. I am assuming that you want to use MD5 to validate the data migrated to Snowflake. If that is the case, then checking each row on Snowflake with MD5 may be expensive. A more optimal way to validate is to identify each column of the table and calculate the MIN, MAX, COUNT, number of NULLs, and DISTINCT count for each column, then compare those values with the source. I have created a framework with this approach in which I use the SHOW COLUMNS query to get the list of columns. The framework also allows skipping some columns if required, and filtering the rows retrieved based on dynamic criteria. This way of validating the data is more optimal. It would definitely help to understand your use case better.
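For reference, a rough sketch of that per-column profiling for a single column could look like the following (the table and column names are only placeholders; in the framework they would come from the SHOW COLUMNS output):
-- Sketch: profile one column and compare the same numbers against the source system.
-- SALES.PUBLIC.ORDERS and ORDER_AMOUNT are illustrative names, not your schema.
SELECT COUNT(*)                        AS row_count,
       COUNT(ORDER_AMOUNT)             AS non_null_count,
       COUNT(*) - COUNT(ORDER_AMOUNT)  AS null_count,
       COUNT(DISTINCT ORDER_AMOUNT)    AS distinct_count,
       MIN(ORDER_AMOUNT)               AS min_value,
       MAX(ORDER_AMOUNT)               AS max_value
FROM SALES.PUBLIC.ORDERS;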
Does this work for you?
create or replace function md5_calc (column_name varchar)
returns varchar
LANGUAGE SQL
AS $$
select md5(column_name)
$$;
SELECT EMPLID,md5_calc(EMPLID),EMPNAME,md5_calc(EMPNAME) from employee;
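To get closer to the original scenario (different tables, different numbers of columns, mixed data types), one rough sketch, with purely illustrative column names, is to cast each column to text and concatenate with a delimiter before hashing:
-- Sketch: row-level MD5 over several columns of mixed types (illustrative names).
-- Casting to VARCHAR and using a delimiter keeps NULL handling and column order explicit.
SELECT EMPLID,
       MD5(CONCAT_WS('|',
           COALESCE(TO_VARCHAR(EMPLID), ''),
           COALESCE(TO_VARCHAR(EMPNAME), ''),
           COALESCE(TO_VARCHAR(HIRE_DATE), ''),
           COALESCE(TO_VARCHAR(SALARY), ''))) AS row_md5
FROM employee;
The column list still has to be written once per table, but the same pattern works whatever the number or types of columns.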

Related

SQL Server - Mapping table to identify fields to sum

I have a large database with around 400 fields that need to be summed in various different ways.
Currently I do this in Excel with a lookup table to identify which fields to sum, and then use a SUMIF formula to sum those columns. How would I replicate this in SQL?
I've seen some examples where you manually type out each field to calculate, but this seems very impractical if you are summing up to 300 fields, and if there are any changes to that mapping table we would have to redo it all.
So far my only solution is to copy and paste all the fields to be calculated into Excel, add the correct summation syntax, and paste back into SQL.
Thanks
Lots of ways of doing this:
Create a CASE expression that acts like your SUMIF and put the condition inside the aggregate, e.g. SUM(CASE WHEN Amount >= 100 THEN Amount END) AS [New Column] (see the sketch below).
Create a function and store all your logic in it; similar to the above, but easier to maintain should your logic change.
If you post your SUMIFs with a sample, I can help you.
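As a rough sketch of the mapping-table approach (all table and column names here are assumptions, not your schema), you can unpivot the fields and join to the mapping table, so the SUMIF logic lives in data rather than in the query:
-- Sketch: mapping-table-driven sums (illustrative names).
-- FieldMapping says which source column rolls up into which report line.
SELECT m.ReportLine,
       SUM(u.Amount) AS TotalAmount
FROM FactData AS f
CROSS APPLY (VALUES ('Field001', f.Field001),
                    ('Field002', f.Field002),
                    ('Field003', f.Field003)) AS u(FieldName, Amount)
JOIN FieldMapping AS m
    ON m.FieldName = u.FieldName
GROUP BY m.ReportLine;
The VALUES list still names each field once, but changes to the mapping table no longer require rewriting the sums.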

Is it possible in SQL Server to convert the type of a field based on the content of another field?

I have a table, DD, which is a data dictionary, with fields (say):
ColumnID (longint PK), ColumnName (varchar), Datatype (varchar)
I have another table, V, where I have sets of records in the form:
ColumnID (longint FK), ColumnValue (varchar)
I want to be able to convert sets of records from V into another table, Results, where each field will be translated based on the value of DD.Datatype, so that the destination table might be (say):
ColumnID (longint FK), ColumnValue (datetime)
To be able to do this, ISTM that I need to be able to do something like
CONVERT(value of DD.Datatype, V.ColumnValue)
Can anyone give me any clues on whether this is even possible, and if so what the syntax would be? My google-fu has proved inadequate to find anything relevant.
You could do something like this with dynamic SQL, certainly, as long as you are aware of the limitation that the datatype is a property of the COLUMN in the result set, not of each cell in the result set. So all the rows in a given column must have the same datatype.
The only way to accomplish something like CONVERT(value of DD.Datatype, V.ColumnValue) in SQL is with dynamic SQL. That has its own problems, such as basically needing to use stored procedures to keep queries efficient.
Alternately, you could fetch the datatype metadata with one query, construct a new query in your application, and then query the database again. Assuming you're using SQL Server 2012+, you could also try using TRY_CAST() or TRY_CONVERT(), and writing your query like:
SELECT TRY_CAST(value as VARCHAR(2)) FieldName
FROM table
WHERE datatype = 'VARCHAR' AND datalength = 2
But, again, you've got to know what the valid types are; you can't determine that dynamically with SQL without dynamic SQL. Variables and parameters are not allowed to be used for object or type names. However, no matter what you do, you need to remember that all data in a given column of a result set must be of the same datatype.
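As a sketch of the dynamic SQL route (following the DD/V layout from the question; the ColumnID filter and the rest of the details are assumptions):
-- Sketch: build the CONVERT from DD.Datatype at runtime (SQL Server 2012+ for TRY_CONVERT).
DECLARE @sql NVARCHAR(MAX);

SELECT @sql = N'SELECT v.ColumnID, TRY_CONVERT(' + dd.Datatype
            + N', v.ColumnValue) AS ColumnValue FROM V AS v WHERE v.ColumnID = '
            + CAST(dd.ColumnID AS NVARCHAR(20))
FROM DD AS dd
WHERE dd.ColumnID = 42;  -- one ColumnID at a time, so the result column has a single datatype

EXEC sp_executesql @sql;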
Most Entity-Attribute-Value tables like this sacrifice data integrity that strong typing brings by accepting that the data type is determined by the application and not the RDBMS. EAV does not allow you to have your cake (store data without a fixed schema) and eat it, too (enjoy DB enforced strong data typing, not having to typecast strings in the application, etc.).
EAV breaks data normalization pretty badly. It breaks First Normal Form; the most basic rule, and this is just one of the consequences. EAV tables will make querying the data anywhere from awkward to extremely difficult, and you're almost always going to sacrifice performance doing it because the RDBMS is built around the relational model.
That doesn't mean you shouldn't ever use EAV tables. They're relatively great for user defined fields. However, it does mean that they're always going to suck to query and manage. That's just the tradeoff. You broke First Normal Form. Querying and performance are going to suffer consequences of that choice.
If you really want to store all your data like this, you should look at either storing the data as blobs of XML or JSON (SQL Server 2016) -- but that's a general pain to query -- or using a NoSQL data store like MongoDB or Cassandra instead of an SQL RDBMS.

TSQL - Get maximum length of data in every column in every table without Dynamic SQL

Is there a way to get maximum length of data stored in every column in the database? I have seen some solutions which used Dynamic SQL, but I was wondering if it can be done with a regular query.
Yes. Just query the INFORMATION_SCHEMA.COLUMNS view for the database; you can get the information for all columns of all tables in the database if you desire. See the following for more details:
Information_Schema - COLUMNS
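For the declared lengths, a minimal sketch would be something like:
-- Sketch: declared maximum length of every character column in the current database.
SELECT TABLE_SCHEMA,
       TABLE_NAME,
       COLUMN_NAME,
       DATA_TYPE,
       CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE CHARACTER_MAXIMUM_LENGTH IS NOT NULL
ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME;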
If you are talking about the length of the actual data in a column and not the declared length of the column, I am afraid that is not achievable without dynamic SQL.
The reason is that there is only one way to retrieve data, and that is the SELECT statement. This statement, however, requires explicit column names, which are part of the statement itself. There is nothing like
-- This does not work
select col.Data
from Table
where Table.col.Name='ColumnName'
So the answer is: No.

Creating SQL Server JSON Parsing/Query UDF

First of all before I get into the question, I'll preface this with the fact that I know that this is a "bad" idea. But for business reasons it is something that I have to come up with a solution to, and I'm hoping that someone, somewhere might have some ideas on how to go about this.
I have a SQL Server 2008 R2 table that has an "OtherProperties" column. This column contains various other, somewhat arbitrary additional pieces of information that relate to the records. There is a business need to create a UDF that we can use to query the results, for example:
SELECT *
FROM MyTable
WHERE MyUDFGetValue(myTable.OtherProperties, 'LinkedOrder[0]') IS NOT NULL
This would find a record where there was an array of LinkedOrder entries that contained a value at index 0.
SELECT *
FROM MyTable
WHERE MyUDFGetValue(myTable.OtherProperties, 'SubOrder.OrderId') = 25
This would find a property "orderId" and use its value in a comparison.
Has anyone seen an implementation of this? I've seen implementations of functions, like this JSONParser, that parse the values into a table, which just will not get us what we need query-wise. Complexity-wise, I don't want to write a full-fledged JSON parser, but I can if I need to.
Not sure if this will suit your needs, but I read about a CLR JSON serializer/deserializer. You can find it here: http://www.sqlservercentral.com/articles/CLR/74160/
It's been a long time since you asked your question, but there is now a solution you can use: JSON Select, which provides various functions for different datatypes, for example the JsonInt() function. From one of your examples (assuming OrderId is an int; if not, you could use a different function):
SELECT *
FROM MyTable
WHERE dbo.JsonInt(myTable.OtherProperties, 'SubOrder.OrderId') = 25
DISCLOSURE:
I am the author of JSON Select, and as such have an interest in you using it :)
If you cannot use SQL Server 2016 with built-in JSON support, you would need to use CLR (e.g. JSONselect or json4sql) or custom code such as http://www.codeproject.com/Articles/1000953/JSON-for-SQL-Server-Part, etc.
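For completeness, on SQL Server 2016+ the built-in functions cover both examples from the question; a minimal sketch (the JSON paths are taken from the question, the rest is an assumption):
-- Sketch: built-in JSON querying on SQL Server 2016+.
-- JSON_VALUE extracts a scalar; it returns NULL if the path is absent.
SELECT *
FROM MyTable
WHERE JSON_VALUE(OtherProperties, '$.SubOrder.OrderId') = '25';

SELECT *
FROM MyTable
WHERE JSON_VALUE(OtherProperties, '$.LinkedOrder[0]') IS NOT NULL;
-- If LinkedOrder[0] is an object or array rather than a scalar, use JSON_QUERY instead.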

SQL Server Common function for getting max(id)

In my application we use MAX() many times. How can I write a common function where I can pass a table name and a column name and get MAX()?
I mean a single function for any table/field.
Thanks,
Tanmay.
In MSSQL there are no particularly elegant ways to use UDFs with dynamic table/field names, so personally I would just stick to a simple SELECT MAX().
If the IDs are IDENTITY columns you can just use IDENT_CURRENT('the_table').
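If you really do need a generic helper, a rough sketch of the usual workaround is a stored procedure with dynamic SQL, since UDFs cannot execute dynamic SQL (the procedure name here is made up):
-- Sketch: generic MAX via dynamic SQL in a stored procedure (illustrative names).
-- QUOTENAME guards the table/column names against injection.
CREATE PROCEDURE dbo.GetMaxValue
    @TableName  SYSNAME,
    @ColumnName SYSNAME
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) =
        N'SELECT MAX(' + QUOTENAME(@ColumnName) + N') AS MaxValue FROM ' + QUOTENAME(@TableName) + N';';
    EXEC sp_executesql @sql;
END;
Usage would be along the lines of: EXEC dbo.GetMaxValue @TableName = 'employee', @ColumnName = 'EMPLID';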
