Encryption on the fly - SQL Server

Here is something interesting that I have been asked. It has to do with encrypting data in a non-encrypted database.
The story goes as follows. We have a database that is not encrypted, and no column in any of its tables is encrypted either. Now we'd like to control the flow of the data depending on who is asking for it. Let me explain more clearly:
We have a table with the name: table1
This table has one column with the name: SName
We'd like to achieve the following result. If a user connected through SQL Server Management Studio runs the query:
select * from table1
he/she should get no result, or if he/she does get a result, it should be scrambled.
From inside the application, on the other hand, the table should exchange data with the application in the normal way.
Do you know of a setting, an implementation, or an external tool that can provide this functionality?
I think this is quite an interesting case!
Thank you.

Use permissions to stop that person from reading the table at all.
Or use a VIEW to hide the table, with a WHERE clause that silently applies a filter: the filter could refer to another table holding a list of approved users.
This isn't really an encryption issue (obfuscation, in this case).
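A minimal sketch of the VIEW idea, assuming a hypothetical approved_users table and the caller's login as the filter criterion:

-- Approved logins; table and column names are hypothetical.
CREATE TABLE dbo.approved_users (login_name sysname PRIMARY KEY);
GO
-- The view returns rows only when the caller's login is on the list.
CREATE VIEW dbo.table1_v
AS
SELECT t.SName
FROM dbo.table1 AS t
WHERE EXISTS (SELECT 1 FROM dbo.approved_users AS u
              WHERE u.login_name = SUSER_SNAME());
GO
-- Point ad-hoc users at the view and deny direct reads on the base table.
DENY SELECT ON dbo.table1 TO some_role;     -- hypothetical role
GRANT SELECT ON dbo.table1_v TO some_role;

The application's own login stays on the approved list (or keeps direct table access), so it sees the data normally while ad-hoc SSMS users get an empty result.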

Related

Azure Data Factory - Referencing Lookup activities in Queries

I'm following a tutorial on migrating data from Azure SQL to Blob storage through Azure Data Factory pipelines. While most of the concepts make sense, the 'Copy Data' query is a bit confusing. I have a background in writing Oracle SQL, but Azure SQL on ADF is pretty different, and I'm struggling to find specific technical documentation, probably because it's not widely adopted yet.
[Pipeline configuration screenshot omitted.]
The query is posted below:
SELECT data_source_table.PersonID, data_source_table.Name, data_source_table.Age,
       CT.SYS_CHANGE_VERSION, SYS_CHANGE_OPERATION
FROM data_source_table
RIGHT OUTER JOIN CHANGETABLE(CHANGES data_source_table,
    @{activity('LookupLastChangeTrackingVersionActivity').output.firstRow.SYS_CHANGE_VERSION})
AS CT ON data_source_table.PersonID = CT.PersonID
WHERE CT.SYS_CHANGE_VERSION <=
    @{activity('LookupCurrentChangeTrackingVersionActivity').output.firstRow.CurrentChangeTrackingVersion}
Output to the sink Blob as a result of the 'Copy Data' query:
2,name2,14,4,U
7,name7,51,3,I
8,name8,31,5,I
9,name9,38,6,I
A couple of questions I had:
There's a lot of external referencing from other activities in the 'Copy Data' query, like @{activity('...').output.firstRow.CurrentChangeTrackingVersion}. Is there a way to know the appropriate syntax for referencing external activities? I can't find any good documentation on the syntax, like what .firstRow is or what the CHANGETABLE output looks like. I can't replicate this query in SSMS, which makes it a bit of a black box for me.
SYS_CHANGE_OPERATION appears in the SELECT with no table name prefix. Is this directly querying the table in SourceDataset? (It points to data_source_table, which has change tracking enabled.) My main confusion stems from how change tracking information is stored in the enabled tables. Is there a way to show all of a table's tracked changes in SSMS? I see some documentation on what the return values mean, but it's hard for me to visualize without seeing it on the table, so an output query of some return values would be nice.
The LookupLastChangeTracking activity queries all rows from a table (which, when I checked, is just one row), but the LookupCurrentChangeTracking activity uses a CHANGE_TRACKING function to pull the version of the data sink in table_store_ChangeTracking_version. Why does it use a function when the data sink's version is already recorded in table_store_ChangeTracking_version?
Sorry for the many questions, but I can't find any way to make this learning curve a bit less steep. Any guides or resources would be awesome!
There is an article that gets the same thing done from the UI, and it will help you understand it better:
https://learn.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-change-tracking-feature-portal
1. These are Lookup activities; they are very straightforward. Please read about them here:
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
2. SYS_CHANGE_OPERATION is a column returned by CHANGETABLE (the CT alias), not by data_source_table, which is why it needs no table prefix in that query. Regarding the details of how change tracking (CT) data is stored: I am not sure whether all the system tables are exposed on Azure SQL, but the on-prem version of SQL Server did have a few tables that could be queried if needed. For this exercise, though, I think that would be overkill.
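To take the black box out of the query, you can run the change-tracking pieces directly in SSMS with literal version numbers in place of the ADF expressions. A minimal sketch, assuming change tracking is enabled on the database and on data_source_table:

-- What LookupCurrentChangeTrackingVersionActivity effectively retrieves.
SELECT CHANGE_TRACKING_CURRENT_VERSION() AS CurrentVersion;

-- All tracked changes to data_source_table since version 0 (the start of
-- the retention window): one row per changed primary key, along with
-- SYS_CHANGE_VERSION and SYS_CHANGE_OPERATION (I/U/D).
SELECT CT.PersonID, CT.SYS_CHANGE_VERSION, CT.SYS_CHANGE_OPERATION
FROM CHANGETABLE(CHANGES data_source_table, 0) AS CT;

Substituting the two lookup outputs with the literal versions returned above reproduces exactly what the 'Copy Data' query sends to SQL.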

Attribute-based access control with database

I work for a software house and I'm looking for a way to authorize database access as follows:
Someone whose personal data is stored in a table needs to read his
personal row, and he has the right to access the table.
Once the requested row has been retrieved, I need an external check
which ensures he has the right to read that specific row, by checking
some of the table's fields contained in the same row.
It seems to me that the attribute-based access control is what I'm looking for, but I'm not sure. Can you confirm if it's able to do what I need?
Yes, ABAC (the model) can do that. However, the check would typically happen before you get access to the data - and that's actually better (it avoids retrieving data unnecessarily).
ABAC gives you two things:
a policy language to express what can and cannot happen. For instance
A user can view a record they own but not the credit card field
a request/response scheme, i.e. how to enforce the policies.
In the case of data-centric access control, policies are transformed into SQL filters. For instance, you would go from:
SELECT * FROM transactions;
to
SELECT amount, owner, CASE WHEN (0=1) THEN creditcard ELSE 'xxxx' END AS creditcard FROM transactions WHERE owner = 'Alice'
This is called dynamic data filtering and dynamic data masking. Some database vendors have had that capability for a while, e.g. Oracle with VPD or MySQL with FGAC. Lately the trend is to outsource this type of behavior to tools like Informatica DDM or Axiomatics ADAF MD (which is where I work).
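As a rough illustration of what the rewritten query amounts to, here is the same filter-plus-mask expressed as a plain T-SQL view (names hypothetical; a real ABAC product generates the predicate from the policy at request time rather than baking it into a view):

CREATE VIEW dbo.transactions_v
AS
SELECT amount,
       owner,
       -- Mask the card number unless the row belongs to the caller.
       CASE WHEN owner = SUSER_SNAME() THEN creditcard
            ELSE 'xxxx' END AS creditcard
FROM dbo.transactions
WHERE owner = SUSER_SNAME();   -- dynamic filtering: only your own rows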

REST Backend with specified columns, SQL questions

I'm working with a new REST backend talking to a SQL Server. Our REST api allows for the caller to pass in the columns/fields they want returned (?fields=id,name,phone).
The idea seems very normal. The issue I'm bumping up against is resistance to dynamically generating the SQL statement. Any arguments passed in would be passed to the database using a parameterized query, so I'm not concerned about SQL injection.
The basic idea would be to "inject" the column names passed in, into a SQL statement that looks like:
SELECT TOP (1000) <column-names>
FROM myTable
ORDER BY <column-name-to-sort-by>
We sanitize all column names and verify their existence in the table, to prevent SQL injection issues. Most of our programmers are used to having all SQL in static files, and loading them from disk and passing them on to the database. The idea of code creating SQL makes them very nervous.
I guess I'm curious if others actually do this? If so, how do you do this? If not, how do you manage "dynamic columns and dynamic sort-by" requests passed in?
I think a lot of people do it, especially when it comes to reporting features. There are actually two things one should do to stay on the safe side:
Parameterize all WHERE clause values.
Use the user-input values only to pick the correct column/table names; don't put the user values into the SQL statement at all.
To elaborate on item #2, I would have a dictionary where the Key is a possible user input and the Value is a corresponding column/table name. You can store this dictionary wherever you want: config file, database, hard-coded, etc. When you process user input, you just check the dictionary for the Key, and if it exists you use the Value to add a column name to your query. This way you use the user input only to pick the required column names, but never put the actual values into your SQL statement. Besides, you might not want to expose all columns; with a predefined dictionary you can easily control the list of columns available to a user.
Hope it helps!
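A minimal sketch of the whitelist idea done entirely in T-SQL, with a mapping table standing in for the dictionary (all object names hypothetical):

-- What callers may ask for, mapped to the real column names.
CREATE TABLE dbo.api_columns (
    api_name    sysname PRIMARY KEY,
    column_name sysname NOT NULL
);
INSERT INTO dbo.api_columns
VALUES ('id', 'CustomerID'), ('name', 'FullName'), ('phone', 'PhoneNumber');
GO
CREATE PROC dbo.get_customers
    @sort_by sysname
AS
BEGIN
    DECLARE @col sysname, @sql nvarchar(max);
    -- Unknown keys fail the lookup; user input never reaches the SQL text.
    SELECT @col = column_name FROM dbo.api_columns WHERE api_name = @sort_by;
    IF @col IS NULL
    BEGIN
        RAISERROR('Unknown sort column.', 16, 1);
        RETURN;
    END
    SET @sql = N'SELECT TOP (1000) CustomerID, FullName, PhoneNumber
                 FROM dbo.Customers ORDER BY ' + QUOTENAME(@col) + N';';
    EXEC sp_executesql @sql;
END

QUOTENAME is belt-and-braces here: the value already comes from the whitelist, so nothing user-supplied is ever concatenated into the statement.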
I've done something similar to what Maksym suggests. In my case, keys were pulled directly from the database system tables (after scrubbing the user request a bit for syntactic hacks and permissions).
The following query takes care of some minor injection issues through the natural way SQL handles the LIKE condition. This doesn't go as far as handling permissions on each field (as some fields are forbidden based on the log-in) but it provides a very basic way to retrieve these fields dynamically.
CREATE PROC get_allowed_column_names
    @input VARCHAR(MAX)
AS BEGIN
    SELECT
        columns.name AS allowed_column_name
    FROM
        syscolumns AS columns,
        sysobjects AS tables
    WHERE
        columns.id = tables.id AND
        tables.name = 'Categories' AND
        @input LIKE '%' + columns.name + '%'
END
GO
-- The following only returns "Picture"
EXEC get_allowed_column_names 'Category_,Cat%,Picture'
GO
-- The following returns both "CategoryID" and "Picture"
EXEC get_allowed_column_names 'CategoryID, Picture'
GO

Merging multiple Access databases into SQL Server

We have a program in which each user is given their own Access database. We'd like to merge these all together into a single SQL Server database.
The problem is that, using the SQL Server import/export wizard, the primary/foreign keys do not get updated. So for instance if one user has this table:
1 Apple
2 Banana
and another user has this:
1 Coconut
2 Cheeseburger
the resulting table looks like this:
1 Apple
2 Banana
1 Coconut
2 Cheeseburger
Similarly, anything that referenced Banana by its primary key (2) is now referencing both Banana and Cheeseburger, which will not make the vegans very happy.
Is there any way to automatically update the primary/foreign key references when importing, other than writing an extremely long and complex import-script?
If you need to keep them fully compartmentalized, you have to assign some kind of partitioning column to each table. Is there a reason you need your SQL Server to have the same referential integrity as Access? Are you just importing to SQL Server for read-only reporting? In that case, I would not bother with RI. The queries will all require a partitionid/siteid/customerid. You could enforce that for single-entity access by wrapping tables with a table-valued UDF that requires the partitionid. For cross-site access that doesn't work.
If you are just loading to SQL Server for reporting, I would also consider altering the data model to support reporting (i.e. a dimensional model is sometimes better than a normalized model) instead of worrying about transaction processing.
I think we need to know more about the underlying goals.
More information about the requirements is needed.
My basic question is: do you need to preserve the original record key? E.g. 1:apple in table T of user-database A; 1:coconut in table T of user-database B. Table T is assumed to have the same structure in all database instances. Reasons I can suppose you may want to preserve the original data: (a) you may have a requirement to reference the original data (maybe a visual for previous reporting), and/or (b) there may be a data dependency in the application itself.
If the answer is 'no,' then you are probably interested only in preserving all of the distinct data values. Let the SQL table build with a new key and constrain the SQL table field so that it contains unique data. This approach preserves the original table structure (but not the original key value or its 'location') and may suffice to meet your requirement.
If the answer is 'yes,' I do not see a way around creating an index that preserves a pointer to the original database and the key that was created in its table T. This approach would seem to require an application modification.
The best approach in this case is probably to split the incoming data into two tables: one to identify the database and original key, another to hold the distinct data values. For example: (database) table D has records such as 'A:1:a,' 'A:2:b,' 'B:1:c,' 'B:2:d,' 'B:15:a,' 'C:8:a'; (data) table T1 has records such as 'a:apple,' 'b:banana,' 'c:coconut,' 'd:cheeseburger,' where 'A' identifies the original database, 1 is the original key in database 'A,' and 'a' is the value that links records in table D to records in table T1. (Otherwise you have a lot of redundant data in the one table, e.g. A:1:apple, B:15:apple, C:8:apple.) Also, T1 has a structure similar to the original T and seems to be more directly useful in the application. A sketch of this re-keying is shown below.
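A minimal T-SQL sketch of that idea: stage each user database's rows with a source tag, let SQL Server assign new surrogate keys, and keep an old-to-new map for fixing up foreign keys (all object names hypothetical):

-- Target table with a fresh surrogate key.
CREATE TABLE dbo.Fruit (
    FruitID int IDENTITY(1,1) PRIMARY KEY,
    Name    varchar(100) NOT NULL
);
-- (source database, old key) -> new key; this plays the role of table D.
CREATE TABLE dbo.FruitKeyMap (
    SourceDb varchar(50) NOT NULL,
    OldID    int NOT NULL,
    NewID    int NOT NULL,
    PRIMARY KEY (SourceDb, OldID)
);
GO
-- staging.Fruit holds one imported Access table plus its SourceDb tag.
-- MERGE (rather than INSERT) lets OUTPUT see the source columns.
MERGE dbo.Fruit AS tgt
USING staging.Fruit AS src ON 1 = 0      -- never matches: insert every row
WHEN NOT MATCHED THEN INSERT (Name) VALUES (src.Name)
OUTPUT src.SourceDb, src.OldID, inserted.FruitID
INTO dbo.FruitKeyMap (SourceDb, OldID, NewID);
GO
-- Referencing rows then translate their foreign keys through the map.
INSERT INTO dbo.OrderLine (FruitID, Qty)
SELECT m.NewID, s.Qty
FROM staging.OrderLine AS s
JOIN dbo.FruitKeyMap AS m
  ON m.SourceDb = s.SourceDb AND m.OldID = s.FruitID;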
Ended up creating an SSIS project for this. SSIS is a visual programming tool made by Microsoft (part of the Business Intelligence Development Studio that comes with SQL Server) designed for solving exactly these sorts of problems.
Why not let Access use its replication manager to merge the databases? That would let you identify the conflicts and resolve them before importing into SQL Server. I'm fairly confident it will retain the foreign key relationships. If I understand your situation correctly, and the databases have the same structure with different data, you could load the combined database into the application and verify the data before moving it to SQL Server.
What version of Access are you using? Here's a link for Access 2000; adjust the search terms to fit your version.
http://technet.microsoft.com/en-us/library/cc751054.aspx

How would I encrypt string data in SQL server 2008 while keeping the ability to query over it?

I have a database that will be hosted by a third party. I need to encrypt strings in certain columns, but I do not want to lose the ability to query over the encrypted columns.
I have limited control over the SQL instance (I have control over the database I own, but not over any administrative functions).
I realize that I can use a .NET encryption library to encrypt the data before it is inserted into the table, but I would then lose the ability to query the data with SQL.
I like using SQL Server's key management: http://technet.microsoft.com/en-us/library/bb895340.aspx . After you have a key set up, it's really easy to use:
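For completeness, a minimal sketch of that key setup (key and certificate names hypothetical):

-- One-time setup: master key, certificate, and a symmetric key named 'secret'.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';
CREATE CERTIFICATE cert_secret WITH SUBJECT = 'Protects the secret key';
CREATE SYMMETRIC KEY secret
    WITH ALGORITHM = AES_256
    ENCRYPTION BY CERTIFICATE cert_secret;
GO
-- The key must be opened in each session before EncryptByKey/DecryptByKey.
OPEN SYMMETRIC KEY secret DECRYPTION BY CERTIFICATE cert_secret;

With the key open in the session, the insert and select below work as shown.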
To insert records you do this:
insert into PatientTable values ('Pamela','Doc1',
encryptByKey(Key_GUID('secret'),'111-11-1111'),
encryptByKey(Key_GUID('secret'),'Migraine'))
To select the record back out, decrypt in the WHERE clause:
select Id, name, DocName
from PatientTable where convert(varchar, decryptByKey(SSN)) = '111-11-1111'
Note that encryptByKey uses a random IV, so the cipher text is different on every call: you cannot match rows by re-encrypting the search value. If you need an efficient, indexable equality lookup, store a deterministic hash of the value in a separate column and compare against that instead of decrypting every row.
If you use the same encryption key and a deterministic scheme (one with no random IV or salt), you could encrypt your search string and match against that. Say my password is runrun and it encrypts to ZAXCXCATXCATXCA; when I want to search for a user with password runrun, I encrypt it first and it will match the table entry. (SQL Server's encryptByKey is not deterministic, so this needs encryption done outside the database, or a hash as suggested below.)
AFAIK, most RDBMSs do not support this. What I usually see is either:
A) The DB query API encrypts the data with a key that only the local server knows before it is sent to the remote DB, and decrypts it when it's received.
or
B) The remote database stores everything encrypted with a key that it knows (probably held at run time, given physically by an admin, or supplied along with the query).
A lets you use the database without letting the owners know what's being stored, but you won't be able to run queries over the encrypted data, other than maybe equality checks. B only protects against physical server theft (the server has to be off, though, or they can read the key from memory...).
What I assume you want is called Private Information Retrieval. It's a fairly young field; I don't think you're going to find a decent implementation at the moment.
You could generate a hash (such as MD5) and store the hash value in the DB. When you query, you can then select * from [my table] where value = {md5 hash}.
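A minimal sketch of that hash lookup in T-SQL (on SQL Server 2008, HASHBYTES supports up to SHA1; SHA2_256 needs 2012+; column names hypothetical):

-- Keep the encrypted value plus a deterministic hash of the plaintext.
ALTER TABLE PatientTable ADD SSN_hash varbinary(20);   -- SHA1 is 20 bytes

INSERT INTO PatientTable (name, DocName, SSN, SSN_hash)
VALUES ('Pamela', 'Doc1',
        EncryptByKey(Key_GUID('secret'), '111-11-1111'),
        HASHBYTES('SHA1', '111-11-1111'));

-- Equality lookups hit the (indexable) hash column; no decryption needed.
SELECT Id, name, DocName
FROM PatientTable
WHERE SSN_hash = HASHBYTES('SHA1', '111-11-1111');

For low-entropy values like SSNs, mix a secret salt into the hashed input; otherwise the hashes can be brute-forced.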
