Database based table with manually added column - maintaining values positions - sql-server

Let's consider simple Excel table associated with SQL Server's table:
ID some_data
0 a
1 b
2 c
I'd like to extend it with manually added column (not present in SQL Server's table):
ID some_data my_column
0 a some_data_for_0
1 b some_data_for_1
2 c some_data_for_2
However, when source data are changed (rows inserted / deleted / updated) the relation between my_column and ID column is not preserved. For example, when new row (3, d) is added:
ID some_data my_column
0 a some_data_for_0
1 b some_data_for_1
2 c
3 d some_data_for_2
Is there any Excel built-in solution that would allow me to specify how my_column rows should be ordered in relation to ID column or do I need to implement it by myself using VBA?

You could use an ORDER BY clause in your SQL statement, but even that's not very reliable. The only reliable way to do this is to store your additional data in its own table and use a formula to relate it to the SQL data.
On a separate worksheet, put
ID my_column
0 some_data_for_0
1 some_data_for_1
2 some_data_for_2
Now in a column adjacent to the SQL data, put
=IFERROR(VLOOKUP([@ID],tblAddtlInfo,2,FALSE),"")
However the SQL data is sorted, the additional info will be in the right row. This assumes you made your additional info list into a table and named it tblAddtlInfo.
If you want to get fancy, you can write some code in the worksheet's Change event that looks for non-formulas in the extra column. If the formula gets overwritten, grab the new value, add it to (or update it in) your additional info table, and restore the formula. That way you can type the data directly in the row, while integrity is maintained because the value is moved to the separate table.
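The idea behind the lookup formula can be sketched outside Excel as well. The following is only an illustration in Python, not anything Excel runs: the extra values are kept in their own structure keyed by ID, and they are attached to the query result by key rather than by position. The function name `attach_extra` and the sample rows are made up for this example.

```python
# Extra values live in their own table, keyed by ID (mirrors tblAddtlInfo).
additional_info = {0: "some_data_for_0", 1: "some_data_for_1", 2: "some_data_for_2"}

def attach_extra(sql_rows, extra):
    """Return (id, some_data, my_column) rows; unknown IDs get "" like IFERROR."""
    return [(row_id, data, extra.get(row_id, "")) for row_id, data in sql_rows]

# A refreshed query result, in arbitrary order and with a new row (3, d).
refreshed = [(3, "d"), (0, "a"), (2, "c"), (1, "b")]
result = attach_extra(refreshed, additional_info)
for row in result:
    print(row)
```

Because the match is made on the ID value, inserting, deleting, or reordering source rows does not move an extra value to the wrong row.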

Related

Snowflake - Keeping target table schema in sync with source table variant column value

I ingest data into a table source_table with AVRO data. There is a column in this table say "avro_data" which will be populated with variant data.
I plan to copy data into a structured table target_table where columns have the same name and datatype as the avro_data fields in the source table.
Example:
select avro_data from source_table
{"C1":"V1", "C2":"V2"}
This will result in
select * from target_table
------------
| C1 | C2 |
------------
| V1 | V2 |
------------
My question is when schema of the avro_data evolves and new fields get added, how can I keep schema of the target_table in sync by adding equivalent columns in the target table?
Is there anything out of the box in snowflake to achieve this or if someone has created any code to do something similar?
Here's something to get you started. It shows how to take a variant column and parse out the internal columns. This uses a table in the Snowflake sample data database, which is not always the same. You may need to adjust the table name and column name.
SELECT DISTINCT
-- Path with each level enclosed in double quotes (ex: "path"."to"."element"); bracket-enclosed array element references (like "[0]") are stripped
regexp_replace(regexp_replace(f.path,'\\[(.+)\\]'),'(\\w+)','"\\1"') AS path_name,
-- Column datatype: ARRAY, BOOLEAN, FLOAT, or STRING only
DECODE (substr(typeof(f.value),1,1),'A','ARRAY','B','BOOLEAN','I','FLOAT','D','FLOAT','STRING') AS attribute_type,
-- Column alias based on the path
REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\[(.+)\\]'),'[^a-zA-Z0-9]','_') AS alias_name
FROM
"SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."JCUSTOMER",
LATERAL FLATTEN("CUSTOMER", RECURSIVE=>true) f
WHERE TYPEOF(f.value) != 'OBJECT'
AND NOT contains(f.path, '[');
This is a snippet of code modified from here: https://community.snowflake.com/s/article/Automating-Snowflake-Semi-Structured-JSON-Data-Handling. The blog author attributes credit to a colleague for this section of code.
While the current incarnation of the stored procedure will create a view from the internal columns in a variant, an alternate version could create and/or alter a table to keep it in sync with changes.
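As a rough illustration of the "alter a table to keep it in sync" variant, here is a minimal Python sketch that only builds the ALTER TABLE statements. It is not the stored procedure from the linked article. It assumes the list of existing target columns has already been fetched (for example from INFORMATION_SCHEMA.COLUMNS) and that only top-level fields matter; the function names `infer_type` and `missing_column_ddl` are hypothetical.

```python
import json

def infer_type(value):
    """Map a JSON value to a coarse column type (assumption: these types suffice)."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "FLOAT"
    if isinstance(value, list):
        return "ARRAY"
    if isinstance(value, dict):          # nested objects are not flattened here
        return "OBJECT"
    return "STRING"

def missing_column_ddl(table, existing_columns, records):
    """Build ALTER TABLE statements for top-level fields not yet in the table."""
    known = {c.upper() for c in existing_columns}
    statements = []
    for record in records:
        for name, value in record.items():
            if name.upper() not in known:
                known.add(name.upper())
                statements.append(
                    f'ALTER TABLE {table} ADD COLUMN "{name}" {infer_type(value)}'
                )
    return statements

records = [json.loads('{"C1": "V1", "C2": "V2"}'),
           json.loads('{"C1": "V3", "C2": "V4", "C3": 7}')]
ddl = missing_column_ddl("target_table", ["C1", "C2"], records)
print(ddl)
```

The generated statements would still have to be executed against Snowflake, and the type of a new field is guessed from the first value seen, which may be too narrow for real data.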

How to insert data into a table such that possible extra columns in data get added to the parent table?

I'm trying to insert daily imported data into a SQL Server (2017) table. While most of the time the imported data has a fixed amount of columns, sometimes the client wants to add a new column to the data-to-be-imported.
I'm looking for a solution where, when the data gets imported (whether from another table, from R or from .csv files does not matter), SQL would automatically add the missing (extra) column to the parent table, providing the column name and assigning NULL to all previous entries.
I've tried with both UNION ALL and BULK INSERT, but both of these require the same # of columns. I'm working with SSMS2017, R3.4.1.
Next, I tried with a staging table and modifying the UNION clause as:
SELECT * FROM Table_new
UNION ALL
SELECT Tp.*, '' FROM Table_parent Tp;
But more often than not the extra column doesn't occur, so the column dimension problem occurs again.
I also thought about running the queries from R with DBI and odbc dbWriteTable() and handling the invalid column error with tryCatch(), parsing the column name from the error message and so on, but this would be the shakiest thing I've ever built and I would prefer not to.
Ultimately I thought about adding an if clause in R that, depending on the number of new columns, loops and appends a ', ""' part to the SQL query for each extra column. I'm convinced that this is too complex a solution to this problem.
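# R sketch using DBI; assumes an open connection `con` and placeholder tables "new" and "parent"
library(DBI)
# difference between the number of columns in the two tables
n_diff <- length(dbListFields(con, "new")) - length(dbListFields(con, "parent"))
# one ", ''" per missing column (empty string when n_diff is 0)
pad <- paste(rep(", ''", abs(n_diff)), collapse = "")
if (n_diff == 0) {
  dbExecute(con, "INSERT INTO parent SELECT * FROM new;")
} else if (n_diff > 0) {
  # new has extra columns: pad the parent rows
  dbGetQuery(con, paste0("SELECT * FROM new UNION ALL SELECT T1.*", pad, " FROM parent T1;"))
} else {
  # parent has extra columns: pad the new rows
  dbGetQuery(con, paste0("SELECT * FROM parent UNION ALL SELECT T2.*", pad, " FROM new T2;"))
}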
To summarize: when inserting data to SQL table, how to (automatically) append the columns in the parent table, when necessary? Thanks!
The things in your database such as tables, columns, primary keys, foreign keys, check clauses are all part of the database schema. People design the schema before adding data to the database.
If you want to add new columns then you have to redesign your schema. When you do this you will also have to rewrite some of the CRUD procedures.
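For illustration only, this is roughly what such a schema change looks like when it is made explicit in an import step. The sketch uses Python with an in-memory SQLite database rather than SQL Server, and the table names and the helper `append_with_new_columns` are hypothetical. On SQL Server the column list would come from INFORMATION_SCHEMA.COLUMNS or sys.columns, and the statement would be ALTER TABLE ... ADD with an explicit data type.

```python
import sqlite3

def column_names(con, table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in con.execute(f"PRAGMA table_info({table})")]

def append_with_new_columns(con, parent, staging):
    """Add columns that exist only in `staging` to `parent`, then copy the rows."""
    parent_cols = column_names(con, parent)
    staging_cols = column_names(con, staging)
    for col in staging_cols:
        if col not in parent_cols:
            # Existing parent rows get NULL in the new column.
            con.execute(f'ALTER TABLE {parent} ADD COLUMN "{col}"')
    cols = ", ".join(f'"{c}"' for c in staging_cols)
    con.execute(f"INSERT INTO {parent} ({cols}) SELECT {cols} FROM {staging}")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_parent (id INTEGER, val TEXT)")
con.execute("INSERT INTO table_parent VALUES (1, 'a')")
con.execute("CREATE TABLE table_new (id INTEGER, val TEXT, extra TEXT)")
con.execute("INSERT INTO table_new VALUES (2, 'b', 'x')")

append_with_new_columns(con, "table_parent", "table_new")
rows = con.execute("SELECT id, val, extra FROM table_parent ORDER BY id").fetchall()
print(rows)
```

Naming the columns in the INSERT, instead of relying on SELECT *, is what removes the "same number of columns" restriction. Anything that reads or writes the parent table still has to cope with the new column, which is the point made above about rewriting CRUD procedures.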

SQL Shift Table Column Down 1

I have a table of 15+ million rows and 36 columns, and there are two rows of data for every object to which the table refers. I need to:
Move Column 0 down one row so that the useful information from that column appears in the row below.
Here is a sample of the data with fewer columns:
Table name = ekd0310
I want to shift Column 0 down 1
Column 0 Column 1 Column 2 Column 3
B02100AA.CZE
B02100AA.CZF I MIGA0027 SUBDIREC.019
B02100AA.CZG
B02100AA.CZH I MIGA0027 SUBDIREC.019
B02100AA.CZI
B02100AA.CZJ I MIGA0027 SUBDIREC.019
B02100AA.CZK
The function that you are looking for is probably lag(), which returns the value from the previous row (its counterpart lead() returns the value from the following row). You can use it if there is a column that specifies the ordering. An example:
select e.*, lag(col) over (order by id) as prevcol
from ekd0310 e;
Although this is an ANSI standard function, not all databases support it (yet). You can do something similar with correlated subqueries. The query above only returns the shifted values, but it is possible to do this as an update as well.
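To see what a window function does to the sample data, here is a small sketch using Python's sqlite3 module. It uses lag(), which reads the previous row and therefore shifts a column down by one, whereas lead() reads the next row. It assumes SQLite 3.25 or later for window functions, and the column names `id`, `col0`, and `col1` are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ekd0310 (id INTEGER PRIMARY KEY, col0 TEXT, col1 TEXT)")
con.executemany(
    "INSERT INTO ekd0310 VALUES (?, ?, ?)",
    [(1, "B02100AA.CZE", None), (2, "B02100AA.CZF", "I"),
     (3, "B02100AA.CZG", None), (4, "B02100AA.CZH", "I")],
)

# lag(col0) gives each row the col0 value of the row before it in id order.
shifted = con.execute(
    "SELECT id, lag(col0) OVER (ORDER BY id) AS col0_shifted, col1 "
    "FROM ekd0310 ORDER BY id"
).fetchall()
for row in shifted:
    print(row)
```

The first row has no predecessor, so its shifted value is NULL. The result depends entirely on the ORDER BY column, which is why an explicit ordering column is needed.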

Get a list of columns and widths for a specific record

I want a list of properties about a given table and for a specific record of data from that table - in one result
Something like this:
Column Name , DataLength, SchemaLengthMax
...and for only one record (based on a where filter)
So what I'm thinking is something like this:
- Get a list of columns from sys.columns and also the schema-based maxlength value
- populate column names into a temp table that includes (column_name, data_length, schema_size_max)
- now loop over that temp table and for each column name, fetch the data for that column based on a specific record, then update the temp table with the length of this data
- finally, select from the temp table
sound reasonable?
Yup, that way works. I'm not sure it's the best, though, since it involves one iteration per column, each applying the where condition to the source table.
Consider this instead:
1. Get the candidate records into a temporary table after applying the where condition. Make sure to get a primary key. If there is no primary key, generate a row id (assuming SQL Server 2005 or above).
2. Create a temporary table (say, #RecValueLens) that has three columns: Primary_Key_Value, MyColumnName, MyValueLen.
3. Loop through the list of column names (after taking only the column names into another temporary table) and build the SQL statement shown in step 4 for each one.
4. The statement to build:
Insert Into #RecValueLens (Primary_Key_Value, MyColumnName, MyValueLen)
Select Max(Primary_Key_Goes_Here), Max('Column_Name_Goes_Here') as MyColumnName, Len(Max(Column_Name_Goes_Here)) as MyValueLen From Source_Table_Goes_Here
Group By Primary_Key_Goes_Here
So, if there are 10 columns, you will have 10 insert statements. You could either store them in a temporary table and run them in a loop or, if there are only a few columns, concatenate all statements into a single batch.
5. Run the SQL statement(s) from above. You now have the value lengths per record and per column. What is left is to get the column definition.
6. Get the column definition from sys.columns into a temporary table and join it with #RecValueLens to get the output.
Do you want me to write it for you ?
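For a single record, the desired output (column name, data length, schema maximum length) can also be produced in one pass by pairing the fetched row with the column metadata. The sketch below takes that simpler route instead of the temp-table loop described above. It uses Python with SQLite, where the declared type from PRAGMA table_info stands in for the max_length that sys.columns provides on SQL Server. The function `column_lengths` and the `person` table are hypothetical, and the record is assumed to exist.

```python
import re
import sqlite3

def column_lengths(con, table, where, params=()):
    """Return (column, data_length, schema_max_length) for one matching record."""
    info = con.execute(f"PRAGMA table_info({table})").fetchall()
    row = con.execute(f"SELECT * FROM {table} WHERE {where}", params).fetchone()
    result = []
    # table_info rows are (cid, name, type, notnull, dflt_value, pk),
    # in the same order as the columns returned by SELECT *.
    for (_, name, decl_type, *_), value in zip(info, row):
        m = re.search(r"\((\d+)\)", decl_type)   # e.g. VARCHAR(20) -> 20
        max_len = int(m.group(1)) if m else None
        data_len = None if value is None else len(str(value))
        result.append((name, data_len, max_len))
    return result

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (id INTEGER, name VARCHAR(20), note VARCHAR(50))")
con.execute("INSERT INTO person VALUES (1, 'Ann', NULL)")
lengths = column_lengths(con, "person", "id = ?", (1,))
for entry in lengths:
    print(entry)
```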

TSQL Comparing 2 tables

I have 2 tables in 2 databases. The schema for the tables is identical. There are no timestamps or last-updated information. Table A is a live table, that is, it's updated in "the" program. Updates, inserts and deletes all happen in Table A. Table B is a backup made weekly. Is there a quick way to compare the 2 tables and give me results similar to:
I | 54
D | 55
U | 60
So record 54 in the live table is new, record 55 in the live table was deleted, record 60 in the live table was updated.
This needs to work in SQL Server 2008 and up.
Fields: id, first_name, last_name, phone, email, address_id, birth_date, last_visit, provider_id, comments
I have no control over the schema. I have read-only access to Table A, read-write to Table B.
Would it be easier to store a hash of each of Table A's rows rather than a full copy of the table? Generally speaking I need to know which rows have been updated, inserted and deleted without a built-in timestamp. I have the weekly backup table to look at but I could create a hash table if needed.
Join the archive twice: a full join on id alone checks for id existence and identifies inserts and deletes, and a second (left) join on id plus hash checks for row equality. Rows that are unchanged are filtered out.
In the example I have used checksum for simplicity, but I recommend you read up on its drawbacks and consider alternatives such as hashbytes or checking each column for equality.
Select id, checksum(*) hash
Into #live
From live.dbo.tbl;
Select id, checksum(*) hash
Into #archive
From archive.dbo.tbl;
Select coalesce(l1.id, a1.id) id,
Case when l1.id is null then 'D'
when a1.id is null then 'I'
when a2.id is null then 'U' end change_type
From #live l1
Full Join #archive a1 On a1.id = l1.id
Left Join #archive a2 On a2.id = l1.id
And a2.hash = l1.hash
Where l1.id is null or a2.id is null;
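The same per-row hash comparison can be sketched in Python, which may help to check the logic on a few rows before running it against real tables. This is only an illustration: the rows are plain tuples whose first element is the id, SHA-256 stands in for checksum or hashbytes, and the function names are made up.

```python
import hashlib

def row_hash(row):
    """Hash every column except the id; repr keeps None distinct from ''."""
    return hashlib.sha256(repr(row[1:]).encode("utf-8")).hexdigest()

def diff_tables(live_rows, archive_rows):
    """Return (change_type, id) pairs sorted by id: I = insert, D = delete, U = update."""
    live = {row[0]: row_hash(row) for row in live_rows}
    archive = {row[0]: row_hash(row) for row in archive_rows}
    changes = []
    for row_id in live.keys() | archive.keys():
        if row_id not in archive:
            changes.append(("I", row_id))     # only in the live table
        elif row_id not in live:
            changes.append(("D", row_id))     # only in the backup
        elif live[row_id] != archive[row_id]:
            changes.append(("U", row_id))     # same id, different contents
    return sorted(changes, key=lambda c: c[1])

live = [(54, "Ann", "555-0100"), (60, "Bob", "555-0199"), (61, "Cy", None)]
archive = [(55, "Dee", "555-0111"), (60, "Bob", "555-0123"), (61, "Cy", None)]
changes = diff_tables(live, archive)
for change_type, row_id in changes:
    print(change_type, "|", row_id)
```

Storing only the id and hash of each row, as the question suggests, would be enough for this comparison, at the cost of not being able to see what the old values were.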
I'm going to recommend a tool, but it's not free, although it has a fully functioning 30-day trial period. If you're going to compare data in SQL Server tables, look at Red Gate's SQL Data Compare. It's not cheap, but it will pay for itself many times over. (If you need to compare schemas, their SQL Compare does that.)
Barring that, you could use a third table: write a compare query that selects the rows in one table and not the other (with a field indicating that), the rows in the other table and not the first, and then compares field by field to find the rows that differ. That should work too. It will take longer, but if it's just one table, the time it takes to write that code should cost less than what you'll pay for the Red Gate tools.
If there is a column or set of columns that can uniquely identify each row, then a series of sql statements could be written to identify the inserts, updates and deletes. If there isn't a unique row identifier or the unique identifier (for example, one of the columns that makes it unique) changes, then no.

Resources