Our organization is looking to bring on board the Alation product for data mining.
To do so, Alation requires a service account in Snowflake with access to every database and every schema.
Obviously this is very broad access and the org is concerned about security.
Has anyone else done this, and can you help quantify the risks involved?
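A minimal sketch of how such an account could be scoped in Snowflake, assuming a dedicated read-only role granted database by database rather than any account-level admin role (the role, user and database names below are hypothetical):

CREATE ROLE IF NOT EXISTS alation_role;
GRANT ROLE alation_role TO USER alation_svc;  -- the service account the tool would use

-- Repeat per database to be catalogued (sales_db is a placeholder):
GRANT USAGE ON DATABASE sales_db TO ROLE alation_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE sales_db TO ROLE alation_role;
GRANT SELECT ON ALL TABLES IN DATABASE sales_db TO ROLE alation_role;
GRANT SELECT ON FUTURE TABLES IN DATABASE sales_db TO ROLE alation_role;

Scoping the grants this way keeps the account broad but read-only and auditable, and lets you exclude any databases you consider too sensitive.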
I am exploring building on top of Datomic. I am sold on the principle of the database 'as a value'. The thing is, we need to be able to provide sanitized copies of the database to our developers to run locally. Any sensitive data which we are required to keep on the correct side of the firewall must not leak out.
With a standard SQL database this is easy: we just have a service inside the firewall which takes a snapshot of the DB and runs a script against it to update the sensitive values in place, so that my.secret.email@address.com becomes email00123@address.com, etc. Then the sanitised DB is made available to the developer to lift out of the compliance zone.
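For illustration, a sanitisation step of that kind might be no more than an UPDATE run against the restored snapshot inside the compliance zone (Postgres-flavoured; table and column names are hypothetical):

-- Replace real addresses with synthetic ones derived from the primary key
UPDATE users
SET email = 'email' || lpad(id::text, 5, '0') || '@address.com';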
However, my understanding of Datomic (and its very strength) is that nothing is ever updated in place. So how would it be possible to sanitise a Datomic DB? Thanks.
This is one use case for filtering databases in Datomic. Filtering for security reasons is also discussed in this talk by Nubank.
The operational model is a bit different from the SQL world because user access, authorization, authentication and so on are not baked into the database to the same degree. Any peer participates in the database fully: it can submit transactions, request an unfiltered database, etc., via API calls. If you want stronger security guarantees, you need an additional application layer (e.g. a client for your peer-as-a-server that only exposes endpoints for queries against the filtered database).
This is a long question, but please bear with me as I believe it raises important issues about database ownership and access.
I manage and internationally market a "universal" geothermal data management program, written in Delphi, that is a front end to a SQL Server database. The data in the database is derived from many diverse measurements generated and used by the program users over time periods of 30 years or more - i.e. they "own" the data, and the database is primarily a way to efficiently store and manage the data.
Like any database, its structure needs to be modified from time to time, including adding new tables, and these modifications are delivered by releasing a new version of the program. The program prompts for a database upgrade, which has to be carried out by a dbo user so that all new tables can be accessed by the other program users. Unfortunately, the program may be used at remote sites where IT personnel are not readily available, so the new version may get installed but the databases are not upgraded. What has frequently happened in such locations is that a program user upgrades the databases without the appropriate SQL Server permissions; the other users then cannot access the new tables and the program crashes.
One of the program customers has taken another approach. They make all program users members of the db_owner role in every database used by the program. The program has built-in permission levels that can restrict the ability to upgrade databases, so normally only one or two users have this permission. However, with everyone a member of db_owner, it doesn't matter who upgrades the database: all tables will be accessible to all program users.
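In SQL Server terms that setup amounts to roughly the following (login names are hypothetical; db_owner is a built-in fixed database role, so members are simply added to it):

-- Run in each database used by the program (SQL Server 2012+ syntax;
-- older versions use EXEC sp_addrolemember 'db_owner', 'DOMAIN\GeoUser1')
ALTER ROLE db_owner ADD MEMBER [DOMAIN\GeoUser1];
ALTER ROLE db_owner ADD MEMBER [DOMAIN\GeoUser2];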
The advantages of this approach include the following:
Access permissions can be granted by the group that uses the program and has ultimate responsibility for the database.
Knowledge and understanding of the program is passed on within the program users group when staff changes, rather than relying on the IT department as the repository of information on "how it works" (and often they do not know).
Direct data mining and back-door data modification are possible for selected expert users. While the program has extensive data-search and editing tools, sometimes these are not enough and the users need hands-on access.
The program users retain "ownership" of their data.
I would appreciate your comments. I believe that in circumstances such as these, it is important that all the database users are db_owners, and the group of users controls access. Not allowing db_owner roles (a strategy commonly employed by IT departments) fails to recognize the importance of data ownership and data accessibility, and the responsibility of the database users to manage their own data.
The way you've stated your question makes it sound like you've already arrived at a conclusion. The one question that I always ask when someone comes to me (a DBA) with this sort of situation is: if someone accidentally deletes data, am I on the hook to get it back? If the answer is "yes", then they don't get db_owner. If the answer is "no", then the db gets moved to its own server and I get the contract in writing.
The only time I wouldn't bother with access control would be with a simple app running on a local single-user database like SQL Server Express. As soon as there are multiple users on a centralised database and important data to protect, restricted access matters. I'm not familiar with your domain (geothermal data management), but surely this data is important to your customers from an integrity, tampering and even data-access point of view (stolen data could be resold to a competitor).
the program may be used in remote sites and the IT personnel may not be readily available, so that the new version may get installed but the databases are not upgraded
(I'm assuming an upgrade script currently needs to be run manually and independently against the database.) It is common nowadays for apps to check the database for schema versioning and even for static data population, e.g. Entity Framework code-first migrations on the .NET stack. The app then has the ability to perform the schema and data upgrade automatically. It should be quite straightforward for you to add the last N versions of your DB upgrade scripts into your app and have it do a version check. (Obviously the app itself would need to prompt for dbo credentials, assuming that even the app should not run with dbo access.)
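A rough sketch of that version-check idea, assuming a hypothetical SchemaVersion table that the program reads at startup:

-- One row per upgrade script that has been applied
CREATE TABLE dbo.SchemaVersion (
    VersionNumber int       NOT NULL PRIMARY KEY,
    AppliedOn     datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);

-- The program compares the highest applied version with the version it ships with...
SELECT MAX(VersionNumber) AS CurrentVersion FROM dbo.SchemaVersion;
-- ...and, if the database is behind, prompts for dbo credentials and runs the
-- pending upgrade scripts in order.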
with everyone a member of the db_owner role, it doesn't matter who upgrades the database
I believe this may place unnecessary responsibility (and power) in the hands of unqualified customer users.
Even the ad-hoc data-mining (SELECT) access should be reconsidered, as a badly formed query can cause performance degradation or block other concurrent writers. If nothing else, providing a few well-formed views will at least ensure decent query plans.
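For example, a couple of read-only views plus a dedicated role go a long way (the object names below are invented for illustration):

CREATE VIEW dbo.vw_WellTemperatures
AS
SELECT WellId, MeasurementDate, DepthMetres, TemperatureC
FROM dbo.Measurements;
GO

CREATE ROLE datamining_readers;
GRANT SELECT ON dbo.vw_WellTemperatures TO datamining_readers;
ALTER ROLE datamining_readers ADD MEMBER [DOMAIN\GeoAnalyst];  -- SQL Server 2012+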
My 10c.
We are working on rewriting our existing RIA and re-architecting our database design. We now have two opinions about the database:
(These choices are for SaaS-based hosting.)
1) Individual database for each customer.
2) Single DB for all customers.
We are expecting a good amount of data; some of our customers have database sizes ranging from 2 GB to 10 GB, and the number of tables is around 100.
Can I get an answer on which choice we should go for?
We are not thinking about a NoSQL solution as of now, but we are planning to support about 4-5 databases via JPA (Java Persistence API), including MySQL, Postgres, Oracle and MSSQL for now.
P.S.: We might leverage the Amazon cloud for hosting.
The three main techniques usually applied to the database for this kind of multi-tenant requirement are below. You have already specified some of them.
1) Separate database for each tenant:
Very high cost, easy to maintain/customize, easy to tune, easy to back up, easy to code to.
2) Shared database, separate schema per tenant:
Low cost compared to 1), may encounter issues quickly as the database grows, easy to personalize per tenant, difficult to back up/restore per tenant, easy to code to.
3) Shared database, shared schema:
Low cost, the load of one tenant will affect others, security and app development are a challenge, difficult to personalize per tenant, difficult to back up/restore.
I think the above points hold true whether you host on premises or in the cloud.
If you expect the number of tenants to grow or the data to get bigger, then 1) or 2) is better. I have used option 2) and have seen it help development and maintenance.
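A minimal sketch of option 2), shared database with a separate schema per tenant, in Postgres-style SQL (schema and table names are hypothetical; note that MySQL treats schema and database as the same thing, so there options 1) and 2) blur together):

CREATE SCHEMA tenant_acme;
CREATE TABLE tenant_acme.orders (
    order_id   integer PRIMARY KEY,
    order_date date    NOT NULL
);

CREATE SCHEMA tenant_globex;
CREATE TABLE tenant_globex.orders (
    order_id   integer PRIMARY KEY,
    order_date date    NOT NULL
);

-- Per-tenant backup/restore then has to work at the schema level rather than the
-- database level, which is where this option gets fiddly compared to option 1).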
I'm looking at an implementation of multi-tenancy in SQL Server. I'm considering the shared database, shared schema and tenant view filter approach described here. The only drawback is a fragmented connection pool...
Per http://msdn.microsoft.com/en-au/architecture/aa479086, Tenant View Filter is described as follows:
"SQL views can be used to grant individual tenants access to some of the rows in a given table, while preventing them from accessing other rows.
In SQL, a view is a virtual table defined by the results of a SELECT query. The resulting view can then be queried and used in stored procedures as if it were an actual database table. For example, the following SQL statement creates a view of a table called Employees, which has been filtered so that only the rows belonging to a single tenant are visible:
CREATE VIEW TenantEmployees AS
SELECT * FROM Employees WHERE TenantID = SUSER_SID()
This statement obtains the security identifier (SID) of the user account accessing the database (which, you'll recall, is an account belonging to the tenant, not the end user) and uses it to determine which rows should be included in the view."
Thinking this through, if we have one database storing, say, 5,000 different tenants, then the connection pool is completely fragmented: every time a request is sent to the database, ADO.NET needs to establish a new connection and authenticate (remember, connection pooling works per unique connection string), and this approach means you have 5,000 connection strings…
How worried should I be about this? Can someone give me some real world examples of how significant an impact the connection pool has on a busy multi-tenant database server (say servicing 100 requests per second)? Can I just throw more hardware at the problem and it goes away?
Thoughts?
My suggestion would be to develop a solid API over your database. Scalability, modularity, extensibility and accounting would be the main reasons. A few years down the line you may find yourself swearing at yourself for playing with SUSER_SID(). For instance, consider multiple tenants managed by one account, or situations like white labels...
Have a data access API which takes care of authentication. You can still do authorisation at the DB level, but that's a whole different topic. Have users, and perhaps groups, and grant them permissions to tenants.
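A rough sketch of that permission model (table and column names are made up): the API authenticates the caller, then checks which tenants that user, directly or via a group, is allowed to touch.

CREATE TABLE user_tenant_permission (
    user_id    int         NOT NULL,
    tenant_id  int         NOT NULL,
    permission varchar(20) NOT NULL,  -- e.g. 'read', 'admin'
    PRIMARY KEY (user_id, tenant_id)
);

-- "May user 7 read tenant 42?" becomes a lookup like:
SELECT 1
FROM user_tenant_permission
WHERE user_id = 7 AND tenant_id = 42 AND permission IN ('read', 'admin');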
For huge projects, nevertheless, you'll still find it better to have a single DB per big player.
I see I did not answer your main question about fragmented connection pool performance, but I'm convinced there are many valid arguments not to go down that path anyway.
See http://msdn.microsoft.com/en-us/library/bb669058.aspx for a hybrid solution.
See Row level security in SQL Server 2012
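To make the hybrid concrete: a commonly described variation (not from the original post) keeps one shared application login, so the connection pool stays intact, and stamps the tenant onto the session instead of relying on per-tenant logins. On SQL Server 2016 and later this pairs naturally with row-level security (table and key names below are hypothetical):

-- One-time setup: a predicate function plus a security policy on the tenant table
CREATE FUNCTION dbo.fn_TenantPredicate(@TenantID int)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS allowed
       WHERE @TenantID = CAST(SESSION_CONTEXT(N'TenantID') AS int);
GO

CREATE SECURITY POLICY dbo.TenantFilter
ADD FILTER PREDICATE dbo.fn_TenantPredicate(TenantID) ON dbo.Employees
WITH (STATE = ON);
GO

-- Per request, the app (one login, one pooled connection string) stamps the tenant:
EXEC sp_set_session_context @key = N'TenantID', @value = 42;
-- From then on, queries against dbo.Employees on this session only see TenantID = 42 rows.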
We have literally hundreds of Access databases floating around the network: some with light usage, some with quite heavy usage, and some with no usage whatsoever. What we would like to do is centralise these databases onto a managed database platform and retain as much as possible of the reports and forms within them.
The benefits of doing this would be to have some sort of usage tracking, and also the ability to pay more attention to some of the important decentralised data that is stored in these apps.
There are no real constraints on the RDBMS (Oracle, MS SQL Server) or the stack it would run on (LAMP, ASP.NET, Java), and there obviously won't be a silver bullet for this. We would like something that can remove the initial grunt work in an automated fashion.
We upsize users to SQL Server (either using the Upsizing Wizard or by hand). It's usually pretty straightforward: replace all the Access tables with linked tables to SQL Server and keep all the forms/reports/macros in Access. The investment in Access isn't lost and the users can keep going, business as usual. You get the reliability of SQL Server and centralized backups. Keep in mind, we've done this for a few large Access databases, not hundreds. I'd do a pilot of a few dozen and see how it works out.
UPDATE:
I just found this, the SQL Server Migration Assistant; it might be worth a look:
http://www.microsoft.com/sql/solutions/migration/default.mspx
Update: Yes, some refactoring will be necessary for poorly designed databases. As for how to handle Access sprawl? I've run into this at companies with lots of technical users (engineers, especially, are the worst for this... and Excel sprawl). We did an audit and (after backing up) deleted any databases that hadn't been touched in over a year. "Owners" were assigned based on the location and/or the data in the database. If the database was in "S:\quality\test_dept", then the quality manager and head test engineer had to take ownership of it or we deleted it (again, after backing it up).
Upsizing an Access application is no magic bullet. It may be that some things will be faster, but some types of operations will be real dogs. That means that an upsized app has to be tested thoroughly and performance bottlenecks addressed, usually by moving the data retrieval logic server-side (views, stored procedures, passthrough queries).
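For instance (hypothetical object names), instead of letting the Access front end drag a whole linked table across the network and filter it client-side, the heavy lifting can move into a stored procedure that the front end calls via a pass-through query:

CREATE PROCEDURE dbo.usp_GetOpenOrders
    @CustomerID int
AS
BEGIN
    SET NOCOUNT ON;
    SELECT OrderID, OrderDate, Amount
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
      AND Status = 'Open';
END;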
It's not really an answer to the question, though.
I don't think there is any automated answer to the problem. Indeed, I'd say this is a people problem and not a programming problem at all. Somebody has to survey the network and determine ownership of all the Access databases and then interview the users to find out what's in use and what's not. Then each app should be evaluated as to whether or not it should be folded into an Enterprise-wide data store/app, or whether its original implementation as a small app for a few users was the better approach.
That's not the answer you want to hear, but it's the right answer precisely because it's a people/management problem, not a programming task.
Oracle has a migration workbench to port MS Access systems to Oracle Application Express, which would be worth investigating.
http://apex.oracle.com
So? Dedicate a server to your Access databases.
Now you have the benefit of some sort of usage tracking, and also the ability to pay more attention to some of the important decentralised data that is stored in these apps.
This is what you were going to do anyway, only you wanted to use a different database engine instead of NTFS.
And now you have to force the users onto your server.
Well, you can encourage them by telling them that, because you will now own the data, you aren't going to overwrite it with old backups anymore.
Also, you can tell them that their applications will run faster now, because you are going to exclude the folder from on-access virus scanning (you don't do that for your other databases, which is why they are full of SQL-injection malware, but these databases won't be exposed to the internet), and because you are planning to turn packet signing off (you won't need that on a dedicated server: it's only for people who put their file share on their domain server).
Easy upgrade path, improved service to users, greater centralization and control for IT. Everyone's a winner.
Further to David Fenton's comments:
Your administrative rule will be something like this:
If the data in the database is used by just one user, for their own work alone, then they can keep it in their own network share.
If the data in the database is used by more than one person (even if it is only two), then that database must go on a central server under IT's management (backups, schema changes, interfaces, etc.). This is because someone experienced needs to coordinate the whole show, or we risk wasting the time and resources of the next guy down the line.