Exporting Kusto table to SQL Server GENERATED ALWAYS column

I'm setting up a data pipeline to export from a Kusto table to a SQL Server table. The only problem is that the target table has two GENERATED ALWAYS columns. I'm looking for some help implementing the solution in Kusto.
This is the export statement:
.export async to sql ['CompletionImport']
h#"connection_string_here"
with (createifnotexists="true", primarykey="CompletionSearchId")
<|set notruncation;
apiV2CompletionSearchFinal
| where hash(SourceRecordId, 1) == 0
Which gives the error:
Cannot insert an explicit value into a GENERATED ALWAYS column in table 'server.dbo.CompletionImport'.
Use INSERT with a column list to exclude the GENERATED ALWAYS column, or insert a DEFAULT into GENERATED ALWAYS column.
So I'm a little unsure how to implement this solution in Kusto. Would I just add a project pipe excluding the GENERATED ALWAYS columns? Or, ideally, how could I insert a DEFAULT value into the GENERATED ALWAYS SQL Server columns using a Kusto query?
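In other words, something like the following sketch, where ValidFrom and ValidTo are hypothetical stand-ins for the two GENERATED ALWAYS columns (substitute the real names):
.export async to sql ['CompletionImport']
    h@"connection_string_here"
    with (createifnotexists="true", primarykey="CompletionSearchId")
    <| set notruncation;
    apiV2CompletionSearchFinal
    | where hash(SourceRecordId, 1) == 0
    | project-away ValidFrom, ValidTo  // hypothetical names for the GENERATED ALWAYS columns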
Edit: I'm trying to use materialize() to create a temporary table in the cache and export that cached table. However, I can't find any documentation on this and the operation is failing:
let dboV2CompletionSearch = apiV2CompletionSearchFinal
| project every, variable, besides, generated, always, ones;
let cachedCS = materialize(dboV2CompletionSearch);
.export async to sql ['CompletionImport']
h#"connect_string"
with (createifnotexists="true", primarykey="CompletionSearchId")
<|set notruncation;
cachedCS
| where hash(SourceRecordId, 1) == 0
with the following error message:
Semantic error: 'set notruncation;
cachedCS
| where hash(SourceRecordId, 1) == 0'
has the following semantic error: SEM0100:
'where' operator: Failed to resolve table or column expression named 'cachedCS'.
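The error suggests that the let/materialize statements issued before the .export command are not visible to the query after <|, so the let statement would have to live inside the command's query text instead. A sketch of that shape (untested, again with hypothetical column names):
.export async to sql ['CompletionImport']
    h@"connect_string"
    with (createifnotexists="true", primarykey="CompletionSearchId")
    <| set notruncation;
    let dboV2CompletionSearch = apiV2CompletionSearchFinal
        | project-away ValidFrom, ValidTo;  // hypothetical names for the GENERATED ALWAYS columns
    dboV2CompletionSearch
    | where hash(SourceRecordId, 1) == 0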

Related

R: problem with the dplyr::tbl() function due to restricted permissions

I work with large databases that need to be stored on a server.
So, to work with them in RStudio, I have to open a connection to my Microsoft SQL Server with the dbConnect function:
conn <- dbConnect(odbc(),"myconnection",uid="***",pwd="***",schema="dbo",access="readonly")
and in order to use dplyr, I have to create data references with the tbl function:
data <- tbl(conn, "data")
But one of the remote data frames contains a column that I can't read because I don't have access to it, although I can read everything else.
The SQL query behind the tbl() function is:
SELECT * FROM data
and this is my problem.
Even when I try to select a specific column it doesn't work (see below), so I can't create my references and I can't work.
select(tbl(conn, "data"), "columnX")
which translates to
SELECT columnX FROM data
I think it's the tbl() function and its "SELECT *" call that block me.
Do you know what I can do? Are there similar functions that could resolve my problem?
If you know the columns that you have access to, then one option is to bypass the default SELECT * FROM ... with your own SQL query.
A remote table is defined by two components:
The database connection
The query to the database
When you connect with the default approach, tbl(conn, 'data'), it defaults to the query SELECT * FROM data.
But here is another approach:
custom_query = 'SELECT columnX FROM data'
remote_table = tbl(conn, dbplyr::sql(custom_query))
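The resulting remote_table then behaves like any other lazy dbplyr table. A quick usage sketch (assuming dplyr is loaded):
library(dplyr)
remote_table %>%
  head(10) %>%   # translated to SQL and run on the server
  collect()      # pulls the result into a local data frame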

DBUnit insists on inserting null for unspecified values, but I want the DB default value to be used

I'm having this problem with DBUnit causing a SQL insert error. Say I have this in my dbunit testdata.xml file:
<myschema.mytable id="1" value1="blah" value2="foo" />
I have a table like this (Postgres):
myschema.mytable has an id, value1, value2, and a date field, say "lastmodified". The lastmodified column is a timestamp with the modifiers "not null default now()".
It appears that dbunit reads the table metadata and attempts to insert nulls for any column that isn't specified in my testdata.xml file. So the above xml results in an insert like this:
insert into myschema.mytable (id,value1,value2,lastmodified) values (1,'blah','foo',null)
When running tests (dbunit/maven plugin) I get an error like this:
Error executing database operation: REFRESH: org.postgresql.util.PSQLException: ERROR: null value in column "lastmodified" violates not-null constraint
Is there some way to tell DBUnit to NOT INSERT null values on fields that I don't specify?
Edit: Using DBUnit 2.5.3, JUnit 4.12, PostgreSQL driver 9.4.1208
Use the dbUnit "exclude column" feature:
How to exclude some table columns at runtime?
The FilteredTableMetaData class introduced in DbUnit 2.1 can be used in combination with the IColumnFilter interface to decide the inclusion or exclusion of table columns at runtime.
FilteredTableMetaData metaData = new FilteredTableMetaData(originalTable.getTableMetaData(), new MyColumnFilter());
ITable filteredTable = new CompositeTable(metaData, originalTable);
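A minimal sketch of such a filter, here hard-coded to drop the lastmodified column from the question (the class name MyColumnFilter is just the placeholder used above):
import org.dbunit.dataset.Column;
import org.dbunit.dataset.filter.IColumnFilter;

public class MyColumnFilter implements IColumnFilter {
    @Override
    public boolean accept(String tableName, Column column) {
        // Exclude lastmodified so DBUnit never inserts an explicit NULL into it
        // and the database default (now()) is used instead.
        return !"lastmodified".equalsIgnoreCase(column.getColumnName());
    }
}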

SSIS Foreach Loop failure

I have created a lookup for a list of IDs and a subsequent Foreach loop to run an SQL statement for each ID.
My variable for catching the list of IDs is called MissingRecordIDs and is of type Object. In the Foreach container I map each value to a variable called RecordID of type Int32. No fancy scripts - I followed these instructions: https://www.simple-talk.com/sql/ssis/implementing-foreach-looping-logic-in-ssis-/ (without the file-loading part - I am just running an SQL statement).
It runs fine from within SSIS, but when I deploy it to my Integration Services Catalogue in MSSQL it fails.
This is the error I get when running from SQL Management Studio (error screenshot omitted).
I thought I could just put a Precedence Constraint after MissingRecordIDs gets filled, to check for NULL and skip the Foreach loop if necessary, but I can't figure out how to check for NULL in an Object variable.
The variable declaration, the Object being enumerated, and the variable mapping are shown in screenshots (omitted here).
The SQL statement in 'Lookup missing Orders' is:
select distinct cast(od.order_id as int) as order_id
from invman_staging.staging.invman_OrderDetails_cdc od
LEFT OUTER JOIN invman_staging.staging.invman_Orders_cdc o
on o.order_id = od.order_id and o.BatchID = ?
where od.BatchID = ?
and o.order_id is null
and od.order_id is not null
In the current environment this query returns nothing - there are no missing Orders, so I don't want to go into the 'Foreach Order Loop' at all.
This is a known issue Microsoft is aware of: https://connect.microsoft.com/SQLServer/feedback/details/742282/ssis-2012-wont-assign-null-values-from-sql-queries-to-variables-of-type-string-inside-a-foreach-loop
I would suggest adding an ISNULL(RecordID, 0) to the query, as well as setting an expression on the component "Load missing Orders" so that it is enabled only when RecordID != 0.
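Applied to the lookup query above, the ISNULL part of that suggestion would look roughly like this (the enable/disable expression on "Load missing Orders" still has to be configured separately in the package):
select distinct isnull(cast(od.order_id as int), 0) as order_id
from invman_staging.staging.invman_OrderDetails_cdc od
left outer join invman_staging.staging.invman_Orders_cdc o
    on o.order_id = od.order_id and o.BatchID = ?
where od.BatchID = ?
    and o.order_id is null
    and od.order_id is not null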
In my case it wasn't NULL causing the problem. The ID value I loaded from the database was stored as nvarchar(50), even though it was an integer. I attempted to use it as an integer in SSIS and it kept giving me the same error message. This worked for me:
SELECT CAST(id as INT) FROM dbo.Table

SQLBulkCopy: Does Column Count Make a Difference?

I tried to search but didn't find an answer to a relatively simple thing. I have a CSV that doesn't have all the columns that my database table has, and it is also missing the auto-increment primary key.
All I did was read the CSV into a DataSet and then run traditional SqlBulkCopy code to write the first table of the DataSet to the database table. But it gives me the following error:
The given ColumnMapping does not match up with any column in the source or destination.
My code for the bulk copy is:
using (SqlBulkCopy blkcopy = new SqlBulkCopy(DBUtility.ConnectionString))
{
    blkcopy.EnableStreaming = true;
    blkcopy.DestinationTableName = "Project_" + this.ProjectID.ToString() + "_Data";
    blkcopy.BatchSize = 100;

    foreach (DataColumn c in ds.Tables[0].Columns)
    {
        blkcopy.ColumnMappings.Add(c.ColumnName, c.ColumnName);
    }

    blkcopy.WriteToServer(ds.Tables[0]);
    blkcopy.Close();
}
I added the mapping to test, but removing the mapping part doesn't make a difference. If the mapping is removed, it tries to match columns in order, and since the column counts differ they end up with mismatched data types, too few column values, etc. Oh, and yes, the column names from the CSV do match those in the table, and are in the same case.
EDIT: I changed the mapping code to compare the column names against the live DB. For this I simply run a SQL SELECT query to fetch one record from the database table and then do the following:
foreach (DataColumn c in ds.Tables[0].Columns)
{
    if (LiveDT.Columns.Contains(c.ColumnName))
    {
        blkcopy.ColumnMappings.Add(c.ColumnName, c.ColumnName);
    }
    else
    {
        log.WriteLine(c.ColumnName + " doesn't exist in final table");
    }
}
I would dump the results of the CSV into a staging SQL table... and then do a simple insert from the staging table to the production table.
Also try a simple import of the CSV into a SQL table; maybe there are some empty/invalid columns within the CSV file.
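A rough sketch of that staging-table approach; the table and column names below are hypothetical, and the destination stands in for the "Project_<id>_Data" table from the question:
-- staging table mirroring only the columns present in the CSV
CREATE TABLE dbo.Project_Staging
(
    ColumnA NVARCHAR(100),
    ColumnB NVARCHAR(100)
);

-- after bulk-copying the CSV into the staging table:
INSERT INTO dbo.Project_123_Data (ColumnA, ColumnB)  -- identity/PK column deliberately omitted
SELECT ColumnA, ColumnB
FROM dbo.Project_Staging;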
I once had this problem and the cause was a difference in the case of the column names. One of the columns was "Id", but in the DB it was "id".

Correct method of deleting over 2100 rows (by ID) with Dapper

I am trying to use Dapper to support my data access for my server app.
My server app has another application that drops records into my database at a rate of 400 per minute.
My app pulls them out in batches, processes them, and then deletes them from the database.
Since data continues to flow into the database while I am processing, I don't have a good way to say delete from myTable where allProcessed = true.
However, I do know the PK values of the rows to delete, so I want to do a delete from myTable where Id in @listToDelete.
The problem is that if my server goes down for even 6 minutes, then I have over 2100 rows to delete.
Since Dapper takes my @listToDelete and turns each one into a parameter, my call to delete fails. (Causing my data purging to get even further behind.)
What is the best way to deal with this in Dapper?
NOTES:
I have looked at Table-Valued Parameters, but from what I can see they are not very performant. This piece of my architecture is the bottleneck of my system and I need to be very, very fast.
One option is to create a temp table on the server and then use the bulk load facility to upload all the IDs into that table at once. Then use a join, EXISTS or IN clause to delete only the records that you uploaded into your temp table.
Bulk loads are a well-optimized path in SQL Server and it should be very fast.
For example:
Execute the statement CREATE TABLE #RowsToDelete(ID INT PRIMARY KEY)
Use a bulk load to insert keys into #RowsToDelete
Execute DELETE FROM myTable where Id IN (SELECT ID FROM #RowsToDelete)
Execute DROP TABLE #RowsToDelete (the table will also be automatically dropped if you close the session)
(Assuming Dapper) code example:
conn.Open();
var columnName = "ID";
conn.Execute(string.Format("CREATE TABLE #{0}s({0} INT PRIMARY KEY)", columnName));

using (var bulkCopy = new SqlBulkCopy(conn))
{
    bulkCopy.BatchSize = ids.Count;
    bulkCopy.DestinationTableName = string.Format("#{0}s", columnName);

    var table = new DataTable();
    table.Columns.Add(columnName, typeof(int));
    bulkCopy.ColumnMappings.Add(columnName, columnName);

    foreach (var id in ids)
    {
        table.Rows.Add(id);
    }

    bulkCopy.WriteToServer(table);
}

// or do other things with your table instead of deleting here
conn.Execute(string.Format(@"DELETE FROM myTable where Id IN
    (SELECT {0} FROM #{0}s)", columnName));
conn.Execute(string.Format("DROP TABLE #{0}s", columnName));
To get this code working, I went dark side.
Since Dapper turns my list into parameters, and SQL Server can't handle a lot of parameters (I have never needed even double-digit parameter counts before), I had to go with dynamic SQL.
So here was my solution:
string listOfIdsJoined = "("+String.Join(",", listOfIds.ToArray())+")";
connection.Execute("delete from myTable where Id in " + listOfIdsJoined);
Before everyone grabs their torches and pitchforks, let me explain.
This code runs on a server whose only input is a data feed from a Mainframe system.
The list I am dynamically creating is a list of longs/bigints.
The longs/bigints are from an Identity column.
I know constructing dynamic SQL is bad juju, but in this case, I just can't see how it leads to a security risk.
Dapper expects a list of objects with the parameter as a property, so in the above case a list of objects with Id as a property will work.
connection.Execute("delete from myTable where Id in (@Id)", listOfIds.AsEnumerable().Select(i => new { Id = i }).ToList());
This will work.
