Microsoft tech references for BULK INSERT may have some gaps ... can't get them to work - sql-server

Huge edit -- I removed the ';' characters and replaced them with 'GO', and after that the secondary key and URL worked, except I got this:
Cannot bulk load. The file "06May2013_usr_tmp_cinmachI.csv" does not exist or you don't have file access rights.
BTW, that can't be true :) -- I'm able to use PowerShell to upload the file, so I'm sure it's not my account credentials. Here is the code I'm using now; again, it won't format as a code block no matter what I do with this editor, sorry for the inconvenience.
The docs say CREATE MASTER KEY is used to encrypt the SECRET later on, but there's no obvious link between the two statements; I assumed it's all handled under the hood -- is that right? If not, maybe that's what's causing my access error.
So, the issue with the data source not existing was errant syntax -- evidently you can't terminate these blocks of SQL with ';', but 'GO' works.
The CSV file does exist:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'
GO
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'removed'
GO
CREATE EXTERNAL DATA SOURCE myDataSource
WITH (TYPE = BLOB_STORAGE, LOCATION = 'https://dtstestcsv.blob.core.windows.net/sunsource', CREDENTIAL = AzureStorageCredential)
GO
BULK INSERT dbo.ISSIVISFlatFile
FROM '06May2013_usr_tmp_cinmachI.csv'
WITH (DATA_SOURCE = 'myDataSource', FORMAT = 'CSV')

I feel obliged to post at least some info even if it's not a full answer.
I was getting this error:
Msg 4860, Level 16, State 1, Line 58
Cannot bulk load. The file "container/folder/file.txt" does not exist or you don't have file access rights.
I believe the problem might have been that I generated my SAS key with a start time of "right now", but that time is UTC, meaning that here in Australia the key wouldn't become valid for another ten hours. So I generated a new key with a start date a month earlier and it worked.
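If you generate the SAS in code rather than through the portal, you can make that start time explicit. This is only a rough sketch, assuming the azure-storage-blob v12 Python SDK; the account name and key are placeholders:

from datetime import datetime, timedelta, timezone

from azure.storage.blob import (
    AccountSasPermissions,
    ResourceTypes,
    generate_account_sas,
)

# Placeholders -- substitute your own storage account name and key.
ACCOUNT_NAME = "yourstorageaccount"
ACCOUNT_KEY = "<storage-account-key>"

# Start the token a couple of days in the past so the UTC-vs-local-time
# difference can never make it "not valid yet".
sas_token = generate_account_sas(
    account_name=ACCOUNT_NAME,
    account_key=ACCOUNT_KEY,
    resource_types=ResourceTypes(container=True, object=True),
    permission=AccountSasPermissions(read=True, list=True),
    start=datetime.now(timezone.utc) - timedelta(days=2),
    expiry=datetime.now(timezone.utc) + timedelta(days=365),
)

# The returned token starts with 'sv=' and has no leading '?',
# which is the form the SECRET clause expects.
print(sas_token)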
The SAS (Shared Access Signature) is a big string that is created as follows:
In Azure portal, go to your storage account
Press Shared Access Signature
Fill in fields (make sure your start date is a few days prior, and you can leave Allowed IP addresses blank)
Press Generate SAS
Copy the string in the SAS Token field
Remove the leading ? before pasting it into your SQL script
Below is my full script with comments.
-- Target staging table
IF object_id('recycle.SampleFile') IS NULL
CREATE TABLE recycle.SampleFile
(
Col1 VARCHAR(MAX)
);
-- more info here
-- https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/02/23/loading-files-from-azure-blob-storage-into-azure-sql-database/
-- You can use this to conditionally create the master key
select * from sys.symmetric_keys where name like '%DatabaseMasterKey%'
-- Run once to create a database master key
-- Can't create credentials until a master key has been generated
-- Here, zzz is a password that you make up and store for later use
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'zzz';
-- Create a database credential object that can be reused for external access to Azure Blob
CREATE DATABASE SCOPED CREDENTIAL BlobTestAccount
WITH
-- Must be SHARED ACCESS SIGNATURE to access blob storage
IDENTITY= 'SHARED ACCESS SIGNATURE',
-- Generated from Shared Access Signature area in Storage account
-- Make sure the start date is at least a few days before
-- otherwise UTC can mess you up because it might not be valid yet
-- Don't include the ? or the endpoint. It starts with 'sv=', NOT '?' or 'https'
SECRET = 'sv=2016-05-31&zzzzzzzzzzzz'
-- Create the external data source
-- Note location starts with https. I've seen examples without this but that doesn't work
CREATE EXTERNAL DATA SOURCE BlobTest
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://yourstorageaccount.blob.core.windows.net',
CREDENTIAL= BlobTestAccount);
BULK INSERT recycle.SampleFile
FROM 'container/folder/file'
WITH ( DATA_SOURCE = 'BlobTest');
-- If you're fancy you can use these to work out if your things exist first
select * from sys.database_scoped_credentials
select * from sys.external_data_sources
DROP EXTERNAL DATA SOURCE BlobTest;
DROP DATABASE SCOPED CREDENTIAL BlobTestAccount;
One thing that this won't do that ADF does is pick up a file based on a wildcard.
That is, if I have a file called ABC_20170501_003.TXT, I need to explicitly list it in the BULK INSERT script, whereas in ADF I can just specify ABC_20170501 and it automatically wildcards the rest.
Unfortunately there is no (easy) way to enumerate files in blob storage from SQL Server. I eventually got around this by using Azure Automation to run a PowerShell script that enumerates the files and registers them into a table that SQL Server can see. This seems complicated, but Azure Automation is a very useful tool to learn and use, and it works very reliably.
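For illustration, here is a rough Python sketch of the same idea (the real thing was an Azure Automation PowerShell runbook). It uses azure-storage-blob and pyodbc; the connection strings, the container name and the dbo.FileRegister table are all hypothetical:

import pyodbc
from azure.storage.blob import BlobServiceClient

# Hypothetical connection details -- adjust for your environment.
BLOB_CONN_STR = "<storage-account-connection-string>"
SQL_CONN_STR = ("Driver={ODBC Driver 17 for SQL Server};"
                "Server=myserver;Database=mydb;UID=loader;PWD=secret")

blob_service = BlobServiceClient.from_connection_string(BLOB_CONN_STR)
container = blob_service.get_container_client("mycontainer")

with pyodbc.connect(SQL_CONN_STR) as conn:
    cur = conn.cursor()
    # name_starts_with gives the "ABC_20170501*" style matching that
    # BULK INSERT itself cannot do.
    for blob in container.list_blobs(name_starts_with="ABC_20170501"):
        cur.execute(
            "INSERT INTO dbo.FileRegister (FileName, SizeBytes) VALUES (?, ?)",
            blob.name, blob.size)
    conn.commit()

SQL Server can then read dbo.FileRegister and issue one BULK INSERT per registered file name.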
More opinions on ADF:
I couldn't find a way to pass the filename that I loaded (or other info) into the database.
Do not use ADF if you need data to be loaded in the order it appears in the file (i.e. as captured by an identity field). ADF will try to do things in parallel. In fact, my ADF pipeline did insert rows in order for about a week (as recorded by the identity), then one day it just started inserting them out of order.
The timeslice concept is useful in limited circumstances (when you have cleanly delineated data in cleanly delineated files that you want to drop neatly into a table). In any other circumstances it is complicated, unwieldy and difficult to understand and use. In my experience real world data needs more complicated rules to work out and apply the correct merge keys.
I don't know the cost difference between importing files via ADF and via BULK INSERT, but ADF is slow. I don't have the patience to hack through Azure blades to find metrics right now, but you're talking 5 minutes in ADF vs 5 seconds with BULK INSERT.
UPDATE:
Try Azure Data Factory V2. It is vastly improved, and you are no longer bound to timeslices.

Related

How to read from one DB but write to another using Snowflake's Snowpark?

I'm SUPER new to Snowflake and Snowpark, but I do have respectable SQL and Python experience. I'm trying to use Snowpark to do my data prep and eventually use it in a data science model. However, I cannot write to the database I'm pulling from -- I need to create all tables in a second DB.
I've created code blocks to represent both input and output DBs in their own sessions, but I'm not sure that's helpful, since I have to be in the first session in order to even get the data.
I use code similar to the following to create a new table while in the session for the "input" DB:
my_table = session.table("<SCHEMA>.<TABLE_NAME>")
my_table.toPandas()
table_info = my_table.select(col("<col_name1>"),
                             col("<col_name2>"),
                             col("<col_name3>").alias("<new_name>"),
                             col("<col_name4>"),
                             col("<col_name5>"))
table_info.write.mode('overwrite').saveAsTable('MAINTABLE')
I need to save the table MAINTABLE to a secondary database that is different from the one where the data was pulled from. How do I do this?
It is possible to provide a fully qualified name:
table_info.write.mode('overwrite').saveAsTable('DATABASE_NAME.SCHEMA_NAME.MAINTABLE')
DataFrameWriter.save_as_table
Parameters:
table_name – A string or list of strings that specify the table name or fully-qualified object identifier (database name, schema name, and table name).
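If you'd rather not embed the database name in the table identifier, the session can also be pointed at the target database before saving. This is just a sketch, assuming your snowflake-snowpark-python version exposes Session.use_database/use_schema and that your role has write access there; all names are made up:

from snowflake.snowpark.functions import col

# `session` is the existing Snowpark Session from the question.
# Read with a fully-qualified name so the source still resolves
# after the current database is switched.
my_table = session.table("INPUT_DB.MY_SCHEMA.MY_TABLE")
table_info = my_table.select(col("COL1"), col("COL2").alias("NEW_NAME"))

# Point the session at the target database/schema, then save.
session.use_database("OUTPUT_DB")
session.use_schema("PUBLIC")
table_info.write.mode("overwrite").save_as_table("MAINTABLE")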

Get audit history records of any entity record as per CRM view

I want to display all audit history data in the MS CRM format.
I have imported all records from the AuditBase table in CRM into a table on another database server.
I want to query these records with SQL and present them in the same format as the Dynamics CRM audit history view.
I have done so far
select
    AB.CreatedOn as [Created On],
    SUB.FullName as [Changed By],
    SM.Value as [Event],
    AB.AttributeMask as [Changed Field],
    AB.ChangeData as [Old Value],
    '' as [New Value]
from AuditBase AB
inner join StringMap SM on SM.AttributeValue = AB.Action and SM.AttributeName = 'action'
inner join SystemUserBase SUB on SUB.SystemUserId = AB.UserId
--inner join MetadataSchema.Attribute ar on AB.AttributeMask = ar.ColumnNumber
--inner join MetadataSchema.Entity en on ar.EntityId = en.EntityId and en.ObjectTypeCode = AB.ObjectTypeCode
--inner join Contact C on C.ContactId = AB.ObjectId
where AB.ObjectId = '00000000-0000-0000-000-000000000000'
order by AB.CreatedOn desc
My problem is that AttributeMask is a comma-separated value that I need to compare with the MetadataSchema.Attribute table's ColumnNumber field. And how do I get the New Value for that entity?
I have already checked this link: Sql query to get data from audit history for opportunity entity, but it's not giving me the [New Value].
NOTE: I cannot use "RetrieveRecordChangeHistoryResponse", because I need to show this data on an external web page built from the SQL table (not the CRM database).
Well, basically Dynamics CRM does not create this Audit view (the way you see it in CRM) using a SQL query, so if you succeed in doing it, Microsoft will probably buy it from you, as it would be much faster than the way it's currently handled :)
But really, the way it works currently, SQL is used only to obtain all the relevant audit records (without any matching against attribute metadata or anything else), and all the parsing and matching with metadata is then done in a .NET application. The logic is quite complex and there are so many different cases to handle that I believe recreating it in SQL would require not just a simple "select" query but a really complex procedure (and even that might not be enough, because not everything in CRM is kept in the database; some things are simply compiled into the application's libraries), and weeks or maybe even months of work for one person (of course that's my opinion; maybe some T-SQL guru will prove me wrong).
So, I would do it differently - use RetrieveRecordChangeHistoryRequest (which was already mentioned in some answers) to get all the Audit Details (already parsed and ready to use) using some kind of .NET application (probably running periodically, or maybe triggered by a plugin in CRM etc.) and put them in some Database in user-friendly format. You can then consume this database with whatever external application you want.
Also I don't understand your comment:
I can not use "RetrieveRecordChangeHistoryResponse", because i need to
show these data in external webpage from sql table(Not CRM database)
What kind of application cannot call an external service (you can create a custom service; you don't have to use the CRM service) to get some data, but can access an external database? You should not read from the DB directly; a better approach would be to prepare a web service that returns the audit you want (using the CRM SDK under the hood) and have the external application call that service. Unless of course your external app is only capable of reading databases, not calling any custom web services...
It is not possible to reconstruct a complete audit history from the AuditBase tables alone. For the current values you still need the tables that are being audited.
The queries you would need to construct are complex, and writing them can be avoided if the RetrieveRecordChangeHistoryRequest is a suitable option for you as well.
(See also How to get audit record details using FetchXML on SO.)
NOTE
This answer was submitted before the original question was extended stating that the RetrieveRecordChangeHistoryRequest cannot be used.
As I said in the comments, the Audit table will have the old value and the new value, but not the current value. The current value is only pushed in as a new value when the next update happens.
In your OP query, AB.AttributeMask will return comma (",") separated values and AB.ChangeData will return tilde ("~") separated values.
I assume you are fine with the "~"-separated values as the Old Value column and want to show the current values of the fields in the New Value column. This is not going to work when multiple fields are enabled for audit. You have to split the AttributeMask field value into CRM fields from AttributeView using ColumnNumber and build the required result from that.
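To make the splitting and matching concrete, here is a rough Python sketch; the column-number-to-name mapping would really come from the attribute metadata, and the sample values are made up:

# AttributeMask is a ","-separated list of column numbers and
# ChangeData is the matching "~"-separated list of old values.
attribute_mask = ",10,27,55,"            # made-up sample from AuditBase
change_data = "Acme Ltd~555-0100~NSW"    # made-up sample from AuditBase

# Hypothetical mapping (ColumnNumber -> attribute name) built from the
# attribute metadata for the record's entity.
column_names = {10: "name", 27: "telephone1", 55: "address1_stateorprovince"}

column_numbers = [int(c) for c in attribute_mask.split(",") if c]
old_values = change_data.split("~")

for col_no, old_value in zip(column_numbers, old_values):
    field = column_names.get(col_no, "column %d" % col_no)
    # The "new" value is not in AuditBase -- you would look up the
    # current value of `field` on the audited record itself.
    print("%s: old value = %r" % (field, old_value))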
I would recommend the reference blog below to start with. Once you get the expected result, you can pull the current field value with an extra query, either in SQL or in C# on the front end, but you should concatenate the values with "~" again to maintain the format.
https://marcuscrast.wordpress.com/2012/01/14/dynamics-crm-2011-audit-report-in-ssrs/
Update:
From the above blog, you can tweak the stored procedure's query to use your fields, then convert the last select statement into a 'select into' to create a new table for your storage.
Modify the stored procedure to fetch the delta based on the last run. Configure a SQL job and schedule it to run every day or so to populate the table.
Then select and display the data the way you want. I did the same in Power BI in under 3 days.
Pros/cons: obviously this requirement is for reporting purposes. Reporting requirements are generally served by mirroring the database via replication or other means, so that production users and the async server are not interrupted by injected plugins or ad-hoc on-demand service calls. Moreover, you have access to the database and not CRM Online. Better not to reinvent the wheel and to build on the available solution. This is my humble opinion, based on a Microsoft internal project implementation.

DB2 row level access control: how to pass a user Id

In our web application we want to use DB2 row level access control to control who can view what. Each table would contain a column named userId containing a user id, and we want logged-in users to be able to see only the rows whose userId column matches their id. I have seen DB2 permission examples using the DB2 session id or user; for example, taking the banking example that DB2 provides:
CREATE PERMISSION EXAMPLEBANKING.IN_TELLER_ROW_ACCESS
ON EXAMPLEBANKING.CUSTOMER FOR ROWS WHERE BRANCH in (
SELECT HOME_BRANCH FROM EXAMPLEBANKING.INTERNAL_INFO WHERE EMP_ID = SESSION_USER
)
ENFORCED FOR ALL ACCESS
ENABLE;
Our tables get updated dynamically, so we don't know which rows get added or deleted, and hence we don't know all the user ids present in a table.
At any given time, different users will be logged on to the web app viewing information retrieved from the tables, but the permission declaration above only takes SESSION_USER as its input. Can I change it to something like a Java function parameter so that an arbitrary id can be passed to the permission? If not, how do I handle different logged-in users at arbitrary times? Or do I just keep changing SESSION_USER dynamically as each new user logs in (using "db2 set"?)? If so, is that the best practice for this kind of use case?
Thanks in advance.
Since the user ID in question is application-provided, not originating from the database, using SESSION_USER, which is equal to the DB2 authorization ID, would not be appropriate. Instead you might use the CLIENT_USERID variable, as described here.
This might become a little tricky if you use connection pooling in your application, as the variable must be set each time after obtaining a connection from the pool and reset before returning it to the pool.
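As an illustration of that set-on-checkout / clear-on-return pattern, here is a rough Python sketch using the ibm_db_dbi driver and the SYSPROC.WLM_SET_CLIENT_INFO procedure (assuming it is available on your DB2 LUW version); the DSN and the toy pool are placeholders:

import queue
import ibm_db_dbi

# A toy "pool": just a queue of ibm_db_dbi connections.
DSN = "DATABASE=BANKDB;HOSTNAME=dbhost;PORT=50000;UID=appuser;PWD=secret;"
pool = queue.Queue()
for _ in range(5):
    pool.put(ibm_db_dbi.connect(DSN, "", ""))

def checkout(app_user_id):
    """Take a connection and stamp it with the logged-in web user's id."""
    conn = pool.get()
    cur = conn.cursor()
    # Sets the client userid the row permission can check instead of SESSION_USER.
    cur.execute("CALL SYSPROC.WLM_SET_CLIENT_INFO(?, NULL, NULL, NULL, NULL)",
                (app_user_id,))
    cur.close()
    return conn

def give_back(conn):
    """Clear the client userid before the connection goes back into the pool."""
    cur = conn.cursor()
    # An empty string resets the field; NULL would leave it unchanged.
    cur.execute("CALL SYSPROC.WLM_SET_CLIENT_INFO('', NULL, NULL, NULL, NULL)")
    cur.close()
    pool.put(conn)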
Check out Trusted Contexts; this is exactly why they exist. The linked article is fairly old (you can use trusted contexts with PHP, Ruby, etc. now).

REST Backend with specified columns, SQL questions

I'm working with a new REST backend talking to a SQL Server. Our REST api allows for the caller to pass in the columns/fields they want returned (?fields=id,name,phone).
The idea seems very normal. The issue I'm bumping up against is resistance to dynamically generating the SQL statement. Any arguments passed in would be passed to the database using a parameterized query, so I'm not concerned about SQL injection.
The basic idea would be to "inject" the column-names passed in, into a SQL that looks like:
SELECT <column-names>
FROM myTable
ORDER BY <column-name-to-sort-by>
LIMIT 1000
We sanitize all column names and verify their existence in the table, to prevent SQL injection issues. Most of our programmers are used to having all SQL in static files, and loading them from disk and passing them on to the database. The idea of code creating SQL makes them very nervous.
I guess I'm curious if others actually do this? If so, how do you do this? If not, how do you manage "dynamic columns and dynamic sort-by" requests passed in?
I think a lot of people do it especially when it comes to reporting features. There are actually two things one should do to stay on the safe side:
Parameterize all WHERE clause values
Use user input values to pick correct column/table names, don't use the user values in the sql statement at all
To elaborate on item #2, I would have a dictionary where the Key is a possible user input and the Value is the corresponding column/table name. You can store this dictionary wherever you want: config file, database, hard-coded, etc. When you process user input, you just check whether the Key exists in the dictionary, and if it does you use the Value to add a column name to your query. This way you use the user input only to pick the required column names and never put the actual values into your SQL statement. Besides, you might not want to expose all columns; with a predefined dictionary you can easily control the list of columns available to a user.
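A minimal sketch of that whitelist idea in Python (the field names, the table name and the ?fields= parsing are just examples):

# Maps names callers may use in ?fields=... to real column names.
# Anything not in this dict is silently dropped, so user input never
# reaches the SQL text directly.
ALLOWED_FIELDS = {
    "id": "customer_id",
    "name": "full_name",
    "phone": "phone_number",
}

def build_query(requested_fields, sort_by):
    columns = [ALLOWED_FIELDS[f] for f in requested_fields if f in ALLOWED_FIELDS]
    if not columns:
        columns = list(ALLOWED_FIELDS.values())  # sensible default
    order_by = ALLOWED_FIELDS.get(sort_by, columns[0])
    # Only whitelisted identifiers are interpolated; WHERE clause values
    # would still be sent separately as bound parameters.
    return ("SELECT TOP 1000 " + ", ".join(columns) +
            " FROM dbo.Customers ORDER BY " + order_by)

# The unknown "drop table" entry is ignored rather than ever reaching SQL.
print(build_query(["id", "phone", "drop table"], "name"))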
Hope it helps!
I've done similar to what Maksym suggests. In my case, keys were pulled directly from the database system tables (after scrubbing the user request a bit for syntactic hacks and permissions).
The following query takes care of some minor injection issues through the natural way SQL handles the LIKE condition. This doesn't go as far as handling permissions on each field (as some fields are forbidden based on the log-in) but it provides a very basic way to retrieve these fields dynamically.
CREATE PROC get_allowed_column_names
    @input VARCHAR(MAX)
AS BEGIN
    -- Return only the column names of the Categories table that appear
    -- somewhere in the caller-supplied list.
    SELECT
        columns.name AS allowed_column_name
    FROM
        syscolumns AS columns,
        sysobjects AS tables
    WHERE
        columns.id = tables.id AND
        tables.name = 'Categories' AND
        @input LIKE '%' + columns.name + '%'
END
GO
-- The following only returns "Picture"
EXEC get_allowed_column_names 'Category_,Cat%,Picture'
GO
-- The following returns both "CategoryID" and "Picture"
EXEC get_allowed_column_names 'CategoryID, Picture'
GO

Web2Py - Create table from user input

I'm trying to create a table in a database in web2py. I'm new to this and trying to get a hold of the MVC structure and how the pieces call each other.
What I have done is, in /models/db.py, create a DB:
TestDB = DAL("sqlite://storage.sqlite")
Then in my /controllers/default.py I have:
def index():
    form = FORM(INPUT(_name='name', requires=IS_NOT_EMPTY()),
                INPUT(_type='submit'))
    if form.process().accepted:
        TestDB().define_table(form.vars.name, Field('testField', unique=True))
        return dict(form=form)
    return dict(form=form)
But this isn't working. Could somebody help me with understanding how to achieve this?
Thank you.
First, to define the table, it would be TestDB.define_table(...), not TestDB().define_table(...).
Second, table definitions don't persist across requests (of course, the database tables themselves persist, but the DAL table definitions do not). So, if you define a table inside the index() function, that is the only place it will exist and be accessible in your app. If you want to access the table elsewhere in your app, you'll need to store the metadata about the table definition (in this case, just the table name) somewhere, perhaps in the database or a file. You would then retrieve that information on each request (and probably cache it to speed things up) and use it to create the table definition.
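A rough sketch of that "store the metadata" option, with made-up table and field names (the registry table itself is defined statically, so it is available on every request):

# in /models/db.py
TestDB = DAL("sqlite://storage.sqlite")

# Static registry of user-created tables.
TestDB.define_table('dynamic_table', Field('name', unique=True))

# Re-declare every user-created table on each request from the registry.
for row in TestDB(TestDB.dynamic_table.id > 0).select():
    TestDB.define_table(row.name, Field('testField', unique=True))

# in /controllers/default.py
def index():
    form = FORM(INPUT(_name='name', requires=IS_NOT_EMPTY()),
                INPUT(_type='submit'))
    if form.process().accepted:
        TestDB.dynamic_table.insert(name=form.vars.name)
        TestDB.define_table(form.vars.name, Field('testField', unique=True))
    return dict(form=form)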
Another option would be to generate the table definition code and append it to a model file so it will automatically be run on each request. This is roughly how the new application wizard in the "admin" app works when generating a new application.
Finally, you could leave the table definition in the index() function as you have it, and then when you create the database connection on each request, you can use auto_import:
TestDB = DAL("sqlite://storage.sqlite", auto_import=True)
That will automatically create the table definitions for all tables in TestDB based on the metadata stored in the application's /databases/*.table files. Note, the metadata only includes database-specific metadata, such as table and field names and field types -- it does not include web2py-specific attributes, such as validators, default values, compute functions, etc. So, this option is of limited usefulness.
Of course, all of this has security implications. If you let users define tables and fields, a particular submission could mistakenly or maliciously alter existing database tables. So, you'll have to do some careful checking before processing user submissions.
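For the "careful checking" part, a minimal sketch of validating a user-supplied table name before it ever reaches define_table (the regex and the reserved list are just examples):

import re

# Example protected names; web2py's auth tables are obvious candidates.
RESERVED = {'auth_user', 'auth_group', 'auth_membership', 'dynamic_table'}

def is_safe_table_name(name):
    """Allow only short, simple lowercase identifiers that aren't taken."""
    if not re.match(r'^[a-z][a-z0-9_]{0,30}$', name or ''):
        return False
    if name in RESERVED or name in TestDB.tables:
        return False
    return True

# in the controller:
# if form.process().accepted and is_safe_table_name(form.vars.name):
#     ...define the table...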
