We have an automated workflow which allows database admins to create a firewall rule for their own IP address when they need to access the database via SSMS. At 11:00pm every evening we have an automated runbook which clears all these rules out.
The problem is that there is no way for me to define a fixed baseline, since the number (and names) of these rules changes dynamically. So it has become a bit of a cat-and-mouse game: Azure keeps flagging the baseline mismatch, and I just go ahead and redefine the baseline to match. This is very cumbersome and the resource drain is not sustainable. Is there any way I can keep Advanced Data Security for my database server but exclude this particular check?
The way to do this is to use Disable Rule (see screenshot). This does not disable the entire category ("Vulnerability assessment findings on your SQL databases should be remediated"); rather, it allows you to disable specific rule(s) within that overall check. This is exactly what I need.
I am developing a mobile ticketing system, and I'm reviewing my requirements against embedded SymmetricDS. The only sticking point so far is that I can't find any information directly addressing the question of row-level security.
Use Case:
Some of the mobile point-of-sale nodes will be logged in as sellers, some as managers. Everybody can view everybody else's sales. Sellers can create new sales, but never modify them. Managers can modify existing sales, but not delete them.
Problem:
I don't have strict control over the Android mobile POS units, so they aren't trustworthy -- it's not realistic to prevent a malicious seller from decompiling the APK and creating a malicious client node. My security requirements are such that a malicious seller must not be able to modify the sales table on the server. I can trust that a hypothetical malicious seller does not have access to manager credentials, and I can trust that the server software is secure.
Questions:
Is server-side row-level security a job for a load filter?
Can the filter script get access to the node_id originating the change?
Can the filter script get access to the authentication credentials used to register the originating node?
Yes. Some of this can be implemented with a writer filter, such as limiting updates to a certain group of nodes. Disabling row deletion from a node group simply means the delete trigger shouldn't be created for that node group at all.
Yes, it's possible. Methods implemented in an extension of the abstract load filter receive arguments that carry the ID of the originating node (see org.jumpmind.symmetric.io.data.Batch#getSourceNodeId and DataContext#getBatch()).
Basic authentication is the same for all nodes: there's a password that is automatically generated when the registration handshake is performed, and it's stored in the database. Have your load filter implement the SymmetricDS interface ISymmetricEngineAware, which allows for injection of an engine that can be used to access the database.
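To tie these answers together, here is a minimal sketch of a server-side writer filter, assuming the stock IDatabaseWriterFilter / DatabaseWriterFilterAdapter extension point (exact package and method names can vary a little between SymmetricDS versions). The SALES table name, the "manager" node group and the isManagerNode() helper are hypothetical; back them with your own node-group and registration data.

    import org.jumpmind.db.model.Table;
    import org.jumpmind.symmetric.ISymmetricEngine;
    import org.jumpmind.symmetric.ext.ISymmetricEngineAware;
    import org.jumpmind.symmetric.io.data.CsvData;
    import org.jumpmind.symmetric.io.data.DataContext;
    import org.jumpmind.symmetric.io.data.DataEventType;
    import org.jumpmind.symmetric.io.data.writer.DatabaseWriterFilterAdapter;

    public class SalesRowSecurityFilter extends DatabaseWriterFilterAdapter
            implements ISymmetricEngineAware {

        private ISymmetricEngine engine;

        @Override
        public void setSymmetricEngine(ISymmetricEngine engine) {
            this.engine = engine;   // access to node service, SQL templates, etc.
        }

        @Override
        public boolean beforeWrite(DataContext context, Table table, CsvData data) {
            if (!"SALES".equalsIgnoreCase(table.getName())) {
                return true;        // only guard the sales table
            }
            // node id of the client that originated this change
            String sourceNodeId = context.getBatch().getSourceNodeId();
            DataEventType type = data.getDataEventType();

            if (type == DataEventType.DELETE) {
                return false;       // nobody may delete sales rows
            }
            if (type == DataEventType.UPDATE && !isManagerNode(sourceNodeId)) {
                return false;       // sellers may insert but never modify
            }
            return true;            // allow the write
        }

        // Hypothetical check: look the originating node up by its node group
        private boolean isManagerNode(String nodeId) {
            return "manager".equals(
                    engine.getNodeService().findNode(nodeId).getNodeGroupId());
        }
    }

You would register this class as an extension point on the server node, so a modified client cannot bypass it. The delete case can additionally be handled at capture time through the sym_trigger configuration (sync_on_delete), as the first answer suggests.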
I'm working on a web-based Java project that stores end-user data in a MySQL database. I'd like to implement something that gives the user functionality similar to what I have for my source code version control (e.g. Subversion). In other words, I'd like to implement code that allows the user to commit and roll back work and return to an existing branch. Is there an existing framework for this? It seems like putting the database data into version control and exposing the version control functionality to the end user (i.e. writing code that allows the user to commit, roll back, etc.) could be a reasonable approach, but it also seems there might be some problems with it. For example, how would you allow one user to view a rolled-back version of the data (i.e. you can't just replace the data the database points to if only one user wants to look at a rolled-back version)? If given the choice of completely rebuilding the system using any persistence architecture, what could be used to store the data that would make this type of functionality easy to implement?
There are 2 very common solutions for what you need:
http://www.liquibase.org/
https://flywaydb.org/
Branching and merging the user data
Your question is about solutions for versioning the user data in an application, to give your users capabilities such as branching and merging. You considered exposing a real version control system such as SVN.
The side-effects I can foresee are:
You will have to index things by directory and filename, maybe using an abstraction where directories are entities and filenames are the primary key.
Operating systems (Linux, macOS and Windows alike) do not handle directories with millions of files well. You will have to partition the entity, usually by hashing the ID (MD5, for example) and using the beginning of the hash to create a subdirectory (see the sketch after this list). The number of hash digits to take depends on the expected size of the entity.
Operating systems (Linux, macOS and Windows alike) are not prepared for huge quantities of files. I tested this: it took me days to back up and finally remove a file tree with hundreds of millions of files.
You will not be able to have additional indexes beyond the primary key; however, you can work around that by creating a data mart, as I describe below.
You will not have database constraints, but similar functionality can be implemented through git/svn/cvs triggers.
You will not have strong transactions, but similar functionality can be implemented through git/svn/cvs triggers.
You will have a working copy for each user, which will consume space depending on the size of the repositories. Each user will be pinned to a single point in time.
Git is fast enough at switching from one branch to another, so going back in time and forward again will take only seconds (unless the user data is big, of course).
I saw a Linus Torvalds interview where he warned about poor performance in huge Git repositories. Maybe it is best to have one repository per user, or some other means of avoiding a single humongous repository for the whole application.
Resolution of the changes: I bet that if you create gazillions of versions, any version control system will complain. I do not know exactly what "gazillions" means here; you will have to test it.
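To illustrate the hash-partitioning point above, here is a small sketch in plain Java (the "customers" entity name and the .json extension are just placeholders) of deriving a two-level subdirectory from an MD5 hash of the ID:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.security.MessageDigest;

    public class RepoLayout {

        // Map an entity ID to customers/ab/cd/<id>.json so no directory grows too large.
        static Path pathFor(Path repoRoot, String entityId) throws Exception {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(entityId.getBytes(StandardCharsets.UTF_8));
            String hex = String.format("%02x%02x", digest[0] & 0xff, digest[1] & 0xff);
            return repoRoot.resolve("customers")
                    .resolve(hex.substring(0, 2))   // first bucket level
                    .resolve(hex.substring(2, 4))   // second bucket level
                    .resolve(entityId + ".json");
        }

        public static void main(String[] args) throws Exception {
            System.out.println(pathFor(Paths.get("/srv/data-repo"), "42"));
            // e.g. /srv/data-repo/customers/a1/d0/42.json (actual prefix depends on the hash)
        }
    }

Two levels of two hex digits give 65,536 buckets; take more or fewer digits depending on how many entities you expect.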
Query database
A version control working copy will be limited to primary-key lookups using the "=" operator and sequential scans. This is not enough to build good reports and statistics for any usage pattern I can think of. That is why you need to build a data mart from your application data, and you have two ways of doing that:
A batch process that reads the whole repository history and builds cubes and other views to allow easier querying.
Git/SVN/CVS triggers (hooks), which can call programs you write on file addition, modification, deletion, branch creation and merging. These could be used to update the database whenever a change happens.
The batch process is easier to implement, but the reports and statistics take time to catch up with the activity (a sketch of such a batch walk follows below). You will probably want to go that way in the 1.0 version and move to triggers over time to make things more dynamic.
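For the batch option, assuming the repository is Git, a library such as JGit can replay the whole history in one pass. The repository path and the loadIntoMart() sink below are hypothetical; in a real build you would insert rows into your reporting tables instead of printing.

    import java.io.File;
    import org.eclipse.jgit.api.Git;
    import org.eclipse.jgit.revwalk.RevCommit;

    public class MartRebuild {

        public static void main(String[] args) throws Exception {
            try (Git git = Git.open(new File("/srv/data-repo/user-42"))) {
                // walk every commit and feed each one into the data mart
                for (RevCommit commit : git.log().call()) {
                    loadIntoMart(commit.getName(),                 // commit hash
                                 commit.getCommitTime(),           // seconds since epoch
                                 commit.getAuthorIdent().getName(),
                                 commit.getFullMessage());
                }
            }
        }

        // Hypothetical sink: one row per commit in your reporting tables.
        static void loadIntoMart(String hash, int epochSeconds, String author, String message) {
            System.out.printf("%s %d %s %s%n", hash, epochSeconds, author, message.trim());
        }
    }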
Simulating constraints and transactions
Git, SVN and CVS support triggers (hooks) that execute programs when a new version is submitted. Relationships and consistency can then be checked in order to accept or reject the change.
Alternative Solutions
Since you did not specify the kind of application you have in mind, I will talk about blogs, content portals and online stores. For those kinds of applications I see little reason to reinvent the wheel and build a custom database. Most of the necessary versioning can be anticipated in the database model. A good event-oriented database design will be enough.
For example, a revision of a blog post could be modeled by marking the end date/time of the current post row and creating a new row for the revised post, increasing the version number and setting the previous version id. The same strategy can be used for the sales and catalog of an online store. If you model your application with good logs, you do not need version control.
Some developers also use a row-level trigger that records everything that changes in the database. That is harder on an auditor, who would have to reconstruct the past from badly designed logs. I personally do not like this approach because it is very difficult to index these kinds of queries. I prefer to build my whole application around a well-designed and meaningful log.
For example:
History Table
10/10/2010 [new process] process_id=1; name=john
11/10/2010 [change name] process_id=1; old_name=john; new_name=john doe
12/10/2010 [change name] process_id=1; old_name=john doe; new_name=john doe junior
Process Table after 12/10/2010.
proc_id=1 name=john doe junior
That way I can reconstruct almost everything in the past and still have my operational data in an easy-to-use format.
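As a minimal sketch of what "reconstructing the past" looks like in code (the HistoryEvent shape and the ISO dates are my own simplification of the log above):

    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class Replay {

        // One row of the history table above: an event type plus its attributes.
        record HistoryEvent(String date, String type, int processId, String newName) {}

        // Rebuild the process table as it looked on a given date by replaying events.
        static Map<Integer, String> stateAsOf(List<HistoryEvent> history, String asOfDate) {
            Map<Integer, String> processes = new LinkedHashMap<>();
            for (HistoryEvent e : history) {
                if (e.date().compareTo(asOfDate) > 0) break;   // assumes ISO dates, oldest first
                switch (e.type()) {
                    case "new process", "change name" -> processes.put(e.processId(), e.newName());
                }
            }
            return processes;
        }

        public static void main(String[] args) {
            List<HistoryEvent> history = List.of(
                new HistoryEvent("2010-10-10", "new process", 1, "john"),
                new HistoryEvent("2010-10-11", "change name", 1, "john doe"),
                new HistoryEvent("2010-10-12", "change name", 1, "john doe junior"));
            System.out.println(stateAsOf(history, "2010-10-11")); // {1=john doe}
        }
    }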
However, this is not close to the usage pattern you want (branching and merging).
Conclusion
The applicability of version control as a database seems to me very powerful on one hand and very limited and dangerous on the other. It is very appealing for auditing and error-correction purposes, but my main concerns would be scale and reliability.
It seems like you want version control for your data rather than for the database schema. I found two databases that implement most of the version control features, such as fork, clone, branch, merge, push, and pull:
https://github.com/dolthub/dolt - SQL based
https://github.com/terminusdb/terminusdb - graph based
You mentioned Subversion, which is a centralized version control system, but let us focus on Git, a decentralized version control system. A local copy of a Git repository is equivalent to a remote copy of the repository, if a remote copy exists at all (services such as GitLab and GitHub provide the remote hosting and management of Git projects). With Git you can have version control in an arbitrary directory on your machine. You can do whatever you are accustomed to doing with SVN, and more, in this arbitrary directory.
What I am getting at is that you could create per-user directories/repositories on your server programmatically, and apply version control in these directories/repositories, keeping a separate repository per user (the specifics of the architecture would be decided later, depending on the structure of the user's "work"). Your application would be in charge of adding and removing files on behalf of the user (e.g. Biography, My Sample Project, etc.), editing files, committing the changes, presenting a file history, etc., essentially issuing Git commands. Your application would thus interface with the Git repository, exploiting the advanced version control that Git provides. Your database would just make sure that the user is linked to the directory/repository that contains their "work".
To provide a concrete analogy, the GitLab project is an open-source, web-based Git repository manager with wiki and issue tracking features. GitLab is written in Ruby and uses PostgreSQL (preferably). It is a typical (as in code, database, and data directories and files) multi-user web-based application. Its purpose is to manage Git repositories, which are stored in a designated directory on the server. Part of the code is responsible for accessing the Git repositories that the logged-in user is authorized to access (as the owner or as a collaborator). One interesting use case is a user editing a file online, which results in a commit on some branch in some repository. Another is a user checking the history of a file. A final one is a user reverting a specific commit. All of these actions are performed online, via a web browser.
To provide an interesting real-world use case, Atlas by O'Reilly is an online platform for publishing-related collaboration using GitLab as the backend.
For Java there is JGit, a lightweight, pure Java library implementing the Git version control system. JGit is used by Eclipse for all actions related to managing Git repositories. Maybe you could look into it. It is an extremely active project, supported by many, Google included.
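As a rough sketch of the per-user repository workflow described above, using JGit (the directory layout, file names and branch names are placeholders, not a prescribed design):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import org.eclipse.jgit.api.Git;
    import org.eclipse.jgit.revwalk.RevCommit;

    public class UserWorkspace {

        public static void main(String[] args) throws Exception {
            Path repoDir = Path.of("/srv/workspaces/user-42");   // one repository per user
            Files.createDirectories(repoDir);

            try (Git git = Git.init().setDirectory(repoDir.toFile()).call()) {
                // the application edits files on behalf of the user...
                Files.writeString(repoDir.resolve("biography.md"), "Born in 1980...");

                // ...then stages and commits, exactly like a developer would
                git.add().addFilepattern("biography.md").call();
                git.commit().setMessage("Update biography")
                            .setAuthor("user-42", "user42@example.com").call();

                // branching gives the user an isolated line of work
                git.branchCreate().setName("draft").call();
                git.checkout().setName("draft").call();

                // presenting a file history is just a log query
                for (RevCommit c : git.log().addPath("biography.md").call()) {
                    System.out.println(c.getName() + " " + c.getShortMessage());
                }
            }
        }
    }

The database row for the user would then only need to store the path to that repository; everything version-related stays inside Git.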
All of the above makes sense if the "work" you refer to is more than some fields in a database table that the user fills in and may later change the values of. For instance, it would make sense for structured text, HTML, etc.
If this "work" is not so large-scale, doing something like what is described above may be overkill. In that case, you could employ some version control concepts in your database design, such as calculating diffs and applying patches (also in reverse, for viewing past versions / rolling back). Your tables should allow for a tree-like structure to store the diffs, so you can support branches. You could keep the active version of a file readily available, as well as the active index (what Git calls HEAD), and navigate to another indexed/hashed/tagged version in the file's history by applying patches sequentially if moving forward, or applying patches in reverse, in reverse chronological order, if moving backwards. If this "work" is really small-scale, you could even ditch the diff concept and store whole versions of the "work" in the tree-like structure.
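For that really small-scale variant (whole versions in a tree instead of diffs), a sketch with made-up names could be as simple as:

    import java.util.ArrayList;
    import java.util.List;

    public class VersionTree {

        // Each version keeps its full content plus a link to its parent; a branch is
        // simply two versions that share the same parent.
        static final class Version {
            final Version parent;
            final String content;
            final List<Version> children = new ArrayList<>();

            Version(Version parent, String content) {
                this.parent = parent;
                this.content = content;
                if (parent != null) parent.children.add(this);
            }
        }

        public static void main(String[] args) {
            Version v1 = new Version(null, "Initial draft");
            Version v2 = new Version(v1, "Second draft");          // mainline
            Version branch = new Version(v1, "Experimental cut");  // a branch off v1

            // "rolling back" is just pointing this user's HEAD at an older node;
            // nothing is replaced, so other users can keep reading the newer versions
            Version head = v2.parent;
            System.out.println(head.content);              // Initial draft
            System.out.println(branch.children.isEmpty()); // true
        }
    }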
Pure fun.
I would like to share a dilemma and hear your feedback.
As part of the new version of R&D Reporter for ClearCase, we began offering "Lines of Code" (LoC) metrics and charts when comparing baselines and composite baselines (added, modified and removed lines; attached is an illustration).
Now we've been asked to provide LoC metrics when comparing two UCM streams (e.g. how many files and code lines have been changed between an integration stream and one of its child streams).
In order to provide this, we must ask the user to provide a view context (in order to access the files inside the stream).
So far we have asked the user to provide one view only. This is not as convenient for some users**, but it's fair enough.
Now we have to ask users to provide TWO view contexts, so instead of asking twice we are considering creating temporary views (probably dynamic views) that live as long as the application is running (after which they will be removed).
Furthermore, since we also provide a "Multiple Pending Change-sets" report that compares multiple streams (e.g. an integration stream with all of its child streams), I have the same doubt there, but now multiplied by the number of streams…
I'm curious to know what you think about using temporary views:
Do you find it convenient and safe? If so—do you prefer dynamic or snapshot view?
Does your company's policy allow the creation of temporary views by a 3rd-party tool?
Thank you!
** Providing a view context may be inconvenient as the user must choose a folder from the file-system, or even create a new view.
Moreover, if the user prefers to provide a snapshot view, he or she must provide the folder where the loaded files are, and sometimes this can be quite difficult to find.
Do you find it convenient and safe? If so—do you prefer dynamic or snapshot view?
Convenient only if the view is created for the user, not if the user has to create it.
But there is a scalability issue (for views with a large number of files):
using a dynamic view won't scale well: reading the content of all the files can take too much time, considering that content has to be read over the network (not locally from disk)
using a newly created snapshot view would take too much time to initialize (loading all the files).
Does your company's policy allow the creation of temporary views by a 3rd-party tool?
You can generally create any view you want or need, temporary or otherwise.
Company policies rarely address or limit that specific point.
I would:
create two temporary non-UCM views
set their config specs to the ones of the requested stream(s) (see the sketch below)
The first initialization will be long (loading all the files).
But the subsequent initialization (when changing the config spec of one temp view to match another UCM Stream) will be much quicker (only the delta would change).
The main idea remains: the end-user shouldn't have to worry about temporary views, and shouldn't have to create/update/maintain them.
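For what it's worth, here is a hedged sketch of how a tool could create and clean up such a temporary view from Java by shelling out to cleartool. The view-tag naming, the -stgloc -auto storage option and the exported config-spec file are assumptions that may need adjusting to your site:

    import java.io.IOException;

    public class TempView {

        // Run a cleartool command and fail loudly if it does not succeed.
        static void cleartool(String... args) throws IOException, InterruptedException {
            String[] cmd = new String[args.length + 1];
            cmd[0] = "cleartool";
            System.arraycopy(args, 0, cmd, 1, args.length);
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            if (p.waitFor() != 0) {
                throw new IOException("cleartool " + args[0] + " failed");
            }
        }

        public static void main(String[] args) throws Exception {
            String tag = "rnd-reporter-tmp-" + System.currentTimeMillis();
            try {
                // dynamic, non-UCM view; -stgloc -auto picks a server-side storage location
                cleartool("mkview", "-tag", tag, "-stgloc", "-auto");
                // point it at the stream's configuration (config spec exported beforehand)
                cleartool("setcs", "-tag", tag, "/tmp/stream-a.cs");
                // ... read the files and compute the LoC metrics here ...
            } finally {
                cleartool("rmview", "-tag", tag);   // always clean up the temporary view
            }
        }
    }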
I have been developing an application in Access 2003 that uses SQL Server as the back end data store. Access is used only as a GUI and does not store any data. All the code in the application is written in VBA using ADO for data access.
In recent meetings the DBA that works in my organization has become increasingly concerned over the fact that the application logic controls what data is available for viewing and for update. The way I have been developing the application up until this point is to use a single database login for all access to the database. This database login is the only user allowed access to the database and all other databases users (other than DBA types) are restricted.
The DBA for this project is insisting that each user of the application have their account mapped to only those objects in the database to which they should have access. I can certainly see his concern and that is why I was hoping to ask two questions ...
Is having a single application-level login to the database a bad practice? I had planned to implement a role-based security model where the access users were given depended on their application role. However, the application logic determined whether certain queries/updates were allowed to proceed.
Does anyone know of some resources (articles/books) that go over how to design an application where database access is controlled from within SQL Server and not through the application?
It is bad practice, but as you have seen, that is how most applications "evolve": they start out wide open to a few users and get tightened down as more people (IT/DBAs) get involved.
Your DBA should be able to help out - almost all general SQL Server books have a chapter or two on users and roles and security. They will also be able to explain the nuances of the different security options and what works best in your environment.
A lot of the setup will depend on the environment and the application. For example, if all your users are on Windows(-based) connections, you will want to use Windows Authentication instead of SQL Authentication. If you have many different roles, you will want to use SQL Server roles. You may want to incorporate AD groups as well as roles (or instead of them). Your DBA can help you make those decisions, or even make them for you, as you explain more about your application to them.
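To make the Windows Authentication plus roles idea concrete, here is a hedged sketch. It uses JDBC rather than ADO/VBA purely for illustration, and the server, database, table, role and AD group names are all made up; the point is that the permissions live in SQL Server, not in the client code.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RoleBasedAccess {

        public static void main(String[] args) throws Exception {
            // Integrated security: SQL Server sees the actual Windows user,
            // so permissions are enforced by the server, not by the application.
            String url = "jdbc:sqlserver://dbserver;databaseName=Sales;integratedSecurity=true;";

            try (Connection con = DriverManager.getConnection(url);
                 Statement st = con.createStatement()) {

                // One-time setup a DBA would run (illustrative T-SQL):
                //   CREATE USER [DOMAIN\SalesUsersGroup] FOR LOGIN [DOMAIN\SalesUsersGroup];
                //   CREATE ROLE sales_reader;
                //   GRANT SELECT ON dbo.Sales TO sales_reader;
                //   EXEC sp_addrolemember 'sales_reader', 'DOMAIN\SalesUsersGroup';
                //
                // A user who is only in sales_reader gets a permissions error here,
                // no matter what the application code tries to do:
                st.executeUpdate("UPDATE dbo.Sales SET Amount = 0 WHERE SaleId = 1");
            }
        }
    }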
The people on SO can certainly offer opinions as well if you post more information about the environment, application, and usage.
In my opinion
Yes, it is bad practice, as a user could use the credentials to access the database another way, circumventing your application's access control and performing actions that they shouldn't be able to. If you are on a Windows domain, you can create an AD group for each of the roles and assign users to the groups, then apply permissions based on those groups so you don't have to add new users in SQL Server.
If you go down the Active Directory route, you can use an LDAP query to see what groups the user belongs to and decide whether they should have access.
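A hedged sketch of that lookup using plain JNDI; the directory host, base DN, service account and account names are assumptions to adapt to your Active Directory:

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.NamingEnumeration;
    import javax.naming.directory.Attribute;
    import javax.naming.directory.InitialDirContext;
    import javax.naming.directory.SearchControls;
    import javax.naming.directory.SearchResult;

    public class GroupLookup {

        public static void main(String[] args) throws Exception {
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, "ldap://dc01.example.local:389");
            env.put(Context.SECURITY_PRINCIPAL, "svc_app@example.local");   // service account
            env.put(Context.SECURITY_CREDENTIALS, System.getenv("LDAP_PASSWORD"));

            InitialDirContext ctx = new InitialDirContext(env);
            try {
                SearchControls sc = new SearchControls();
                sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
                sc.setReturningAttributes(new String[] { "memberOf" });

                // find the user's entry and list the AD groups it belongs to
                NamingEnumeration<SearchResult> results = ctx.search(
                        "DC=example,DC=local", "(sAMAccountName=jdoe)", sc);
                while (results.hasMore()) {
                    Attribute groups = results.next().getAttributes().get("memberOf");
                    for (int i = 0; groups != null && i < groups.size(); i++) {
                        System.out.println(groups.get(i)); // e.g. CN=App_Managers,OU=Groups,...
                    }
                }
            } finally {
                ctx.close();
            }
        }
    }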
It'll be interesting to read the other responses on this.
You don't say what the size of your database is or what the business environment looks like, so the answer is: it depends. But the presumption would be that your DBA is correct.
In a corporate environment the primary concern is usually the data, not the application used to access it. Indeed the data will often have a longer life than the application and changing business considerations may dictate that the data is used, and potentially modified by, different sources and not just 'your' application. In this situation it makes sense to build in security at the database level because you are ensuring the integrity of the database no matter how it is accessed, now or in the future, legitimately or illegitimately.
For 'departmental' level applications, that is where access is limited to half a dozen or so users, the data is not business-critical, and there will never be a need to use the data outside the original application, application-level security tends to be more convenient and the risks are often acceptable. I have clients who sell bespoke vertical application software to small businesses using this approach, and as there's no internal IT, it's difficult to imagine how else one could conveniently do it without incurring high support overheads.
However, one of the defining traits of a corporate (as opposed to a departmental) situation is that in the former there will be a dedicated DBA, and in the latter there probably won't even be dedicated IT support, so you should almost certainly view the database as a corporate asset and follow your DBA's advice. It's more work defining the database objects and security, but the end result is that you can be confident about the integrity of your database, and you'll save yourself work when the inevitable upgrade/extension comes around.