I am now working on a project which requires showing the transaction history of a customer and whether the product the customer bought is still under warranty. I need to use data from the current system, which can provide a Web API that returns .csv files. How can I make use of the current system's data?
A solution I am considering is to download all the .csv files and write scripts to insert every record into a database I have built, which contains the necessary tables and relations to hold the retrieved data. Then I would have the new database I want. Because I have never done this before, I would like to know whether it is feasible.
One more question: should I store the data locally, or use a cloud database like Firebase?
High-end databases like SQL Server and Oracle come with utilities that allow you to read directly from a .csv file; check the docs. Having done this many times, the best procedure I have found is to read the file into a single holding table first. This gives you the chance to examine the data and find any unexpected quirks or missing fields, and to correct the data where possible.
Then write the scripts to move the data from the holding table into the proper tables you have designed. This must be done in a logical order. For example, move the customer data before the purchase transactions, so that any error messages you get will not be because you tried to store a transaction before you stored the customer. (You will have referential integrity set up, yes?) This gives you more chances to correct or adjust the data, or just to identify problems, more or less at your leisure.
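For illustration, here is a minimal sketch of that approach in SQL Server, assuming hypothetical table and column names and a local file path (BULK INSERT is the kind of built-in utility referred to above; Oracle has SQL*Loader for the same job):

```sql
-- Hypothetical holding table: every column as text, so nothing is rejected on load.
CREATE TABLE staging_purchases (
    customer_id   varchar(50),
    customer_name varchar(200),
    product_id    varchar(50),
    purchase_date varchar(50),
    warranty_end  varchar(50)
);

-- Read the .csv straight into the holding table.
BULK INSERT staging_purchases
FROM 'C:\imports\purchases.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- Move customers first, then transactions, so referential integrity is never violated.
INSERT INTO customers (customer_id, customer_name)
SELECT DISTINCT customer_id, customer_name
FROM staging_purchases
WHERE customer_id IS NOT NULL;

INSERT INTO purchases (customer_id, product_id, purchase_date, warranty_end)
SELECT customer_id, product_id,
       CAST(purchase_date AS date),
       CAST(warranty_end  AS date)
FROM staging_purchases;
```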
Whether or not to store the data in the cloud is strictly according to the preferences of your employer.
I have a problem where I need audit trails (typically stored in a DB) to be non-editable and non-deletable, even for DBAs and system admins.
One way is to apply encryption and checksums, but this only allows detection of changes or prevention of snooping; it does not prevent a DBA from simply deleting a row.
Any discussion on this matter is appreciated.
If you want the audit trails to be non-editable even by the DBAs and system admins, you would need to store them outside of equipment that is under their control.
However, that would lead to the same problem: the DBAs and system admins of that system would be able to edit them.
The best bet is to have a system where you store the trails in two disparate locations that do not share an admin, and run periodic comparison checks between them.
Alternatively, you can have triggers on update/delete that fire when the change is made by a specific user or from a particular client. These triggers could be programmed to send email or text messages if such a non-application update or delete is made.
It should be known, and very well known, in the admin/DBA community that such triggers exist. You will not be able to prevent the updates or deletes, but you will definitely get people to stay away from that table.
There is still a catch however, which is the ability to remove or modify the trigger code.
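As a rough illustration of the trigger idea (SQL Server syntax, with hypothetical table and account names), a trigger can at least record, or alert on, changes that did not come through the application's own account:

```sql
CREATE TRIGGER trg_audit_tamper_alert
ON audit_trail
AFTER UPDATE, DELETE
AS
BEGIN
    -- Only react when the change did not come from the application's service account.
    IF SUSER_SNAME() <> 'app_service_account'
    BEGIN
        INSERT INTO tamper_alerts (event_time, login_name, host_name, rows_affected)
        SELECT GETDATE(), SUSER_SNAME(), HOST_NAME(), COUNT(*)
        FROM deleted;
        -- An email could also be sent here, e.g. via msdb.dbo.sp_send_dbmail.
    END
END;
```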
There exists "write-once" archival storage systems, such as Venti from Plan 9. That doesn't stop anyone with physical access taking a magnet to the hard disk or similar of course ;)
A sufficiently savvy sysadmin could create a slightly modified version of the data and replace the reference to the venti score though... and an equally savvy sysadmin could still recover the original data.
Anyway, I think you could learn a lot from studying append-only storage systems. They make a lot of sense for storing audit trails, compared to a DB.
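One technique borrowed from append-only systems, sketched here under assumed table and column names, is to hash-chain the audit rows so that any later edit or deletion breaks every hash that follows it, which your comparison checks can then detect:

```sql
-- Each row stores a hash over its own content plus the previous row's hash.
CREATE TABLE audit_chain (
    id         bigint IDENTITY PRIMARY KEY,
    event_data varchar(4000) NOT NULL,
    prev_hash  varbinary(32) NOT NULL,
    row_hash   varbinary(32) NOT NULL
);

-- Inserting a new event (application code would normally do this):
DECLARE @prev varbinary(32) =
    (SELECT TOP 1 row_hash FROM audit_chain ORDER BY id DESC);

INSERT INTO audit_chain (event_data, prev_hash, row_hash)
VALUES ('user 42 deleted invoice 17',
        ISNULL(@prev, 0x00),
        HASHBYTES('SHA2_256',
                  CONCAT('user 42 deleted invoice 17',
                         CONVERT(varchar(64), ISNULL(@prev, 0x00), 2))));
```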
There exist appliances that act as sniffers and are able to record every single command executed against the database. IBM Guardium is an example.
I have a daily process that relies on flat files delivered to a "drop box" directory on the file system. This kicks off a load of the comma-delimited data (from an external company's Excel exports, etc.) into a database via a piecemeal Perl/Bash application. That database is used by multiple applications and is also edited directly with some GUI tools. Some of the data then gets replicated by an additional Perl app into the database that I mainly use.
Needless to say, all of that is complicated and error prone: the data coming in is sometimes corrupt, or sometimes an edit breaks it. My users often complain about missing or incorrect data. Diffing the flat files and the DBs to analyse where the process breaks is time consuming, and with each passing day the data becomes more out of date and more difficult to analyse.
I plan to fix or rewrite parts or all of this data transfer process.
I am looking for recommended reading before I embark on this; websites and articles on how to write robust, failure-resistant, and auto-recoverable ETL processes, or other advice, would be appreciated.
This is precisely what Message Queue Managers are designed for. Some examples are here.
You don't say what database backend you have, but in SQL Server I would write this as an SSIS package. We have a system designed to also write data to a meta data database that tells us when the file was picked up, whether it processed successfully, and why if it did not. It also records things like how many rows the file had (which we can then use to determine whether the current row count is abnormal). One of the beauties of SSIS is that I can set up configurations on package connections and variables, so that moving the package from development to prod is easy (I don't have to go in and manually change the connections each time once I have a configuration set up in the config table).
In SSIS we do various checks to ensure the data is correct, or clean up the data before inserting it into our database. Actually, we do lots and lots of checks. Questionable records can be removed from the file processing and put in a separate location for the DBAs to examine and possibly pass back to the customer. We can also check whether the data in various columns (and the column names, if given; not all files have them) is what would be expected. So if the zipcode field suddenly has 250 characters, we know something is wrong and can reject the file before processing. That way, when the client swaps the lastname column with the firstname column without telling us, we can reject the file before importing 100,000 new incorrect records. In SSIS we can also use fuzzy logic to find existing records to match. So if the record for John Smith says his address is 213 State St., it can match our record that says he lives at 215 State Street.
It takes a lot of work to set up a process this way, but once you do, the extra confidence that you are processing good data is worth its weight in gold.
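I can't paste an SSIS package here, but a rough sketch of the kind of meta data table and row-count sanity check described above looks something like this (table and column names are made up for illustration):

```sql
-- Hypothetical meta data table recording each file pickup.
CREATE TABLE file_load_log (
    load_id        int IDENTITY PRIMARY KEY,
    file_name      varchar(260)  NOT NULL,
    picked_up_at   datetime      NOT NULL DEFAULT GETDATE(),
    row_count      int           NULL,
    succeeded      bit           NULL,
    failure_reason varchar(1000) NULL
);

-- Flag any recent file whose row count is wildly different from the historical average,
-- before any rows are moved out of the staging area.
SELECT l.load_id, l.file_name, l.row_count
FROM file_load_log AS l
WHERE l.picked_up_at >= DATEADD(day, -1, GETDATE())
  AND l.row_count NOT BETWEEN
        0.5 * (SELECT AVG(row_count * 1.0) FROM file_load_log) AND
        2.0 * (SELECT AVG(row_count * 1.0) FROM file_load_log);
```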
Even if you can't use SSIS, this should at least give you some ideas of the types of things you should be doing to get the information into your database.
I found this article helpful for the error handling aspects of running cron jobs:
IBM DeveloperWorks article: "Build intelligent, unattended scripts"
My previous job involved maintenance and programming for a very large database with massive amounts of data. Users viewed this data primarily through an intranet web interface. Instead of having a table of user accounts, each user account was a real first-class account in the RDBMS, which permitted them to connect with their own query tools, etc., as well as permitting us to control access through the RDBMS itself instead of using our own application logic.
Is this a good setup, assuming you're not on the public internet dealing with potentially millions of (potentially malicious) users, or something like that? Or is it always better to define your own means of handling user accounts, your own permissions, and your own application security logic, and only hand out RDBMS accounts to power users with special needs?
I don't agree that using the database for user access control is as dangerous as others are making it out to be. I come from the Oracle Forms development realm, where this type of user access control is the norm. Like any design decision, it has its advantages and disadvantages.
One of the advantages is that I could control select/insert/update/delete privileges for EACH table from a single setting in the database. On one system we had 4 different applications (managed by different teams and in different languages) hitting the same database tables. We were able to declare that only users with the Manager role were able to insert/update/delete data in a specific table. If we didn't manage it through the database, then each application team would have to correctly implement (duplicate) that logic throughout their application. If one application got it wrong, then the other apps would suffer. Plus you would have duplicate code to manage if you ever wanted to change the permissions on a single resource.
Another advantage is that we did not need to worry about storing user passwords in a database table (and all the restrictions that come with it).
I don't agree that "Database user accounts are inherently more dangerous than anything in an account defined by your application". The privileges required to change database-specific privileges are normally MUCH tougher than the privileges required to update/delete a single row in a "PERSONS" table.
And "scaling" was not a problem because we assigned privileges to Oracle roles and then assigned roles to users. With a single Oracle statement we could change the privilege for millions of users (not that we had that many users).
Application authorization is not a trivial problem. Many custom solutions have holes that hackers can easily exploit. The big names like Oracle have put a lot of thought and code into providing a robust application authorization system. I agree that using Oracle security doesn't work for every application. But I wouldn't be so quick to dismiss it in favor of a custom solution.
Edit: I should clarify that despite anything in the OP, what you're doing is logically defining an application even if no code exists. Otherwise it's just a public database with all the dangers that entails by itself.
Maybe I'll get flamed to death for this post, but I think this is an extraordinarily dangerous anti-pattern in security and design terms.
A user object should be defined by the system it's running in. If you're actually defining these in another application (the database) you have a loss of control.
It makes no sense from a design point of view, because if you wanted to extend those accounts with any kind of data at all (email address, employee number, MyTheme...), you're not going to be able to extend the DB user, and you're going to need to build that users table anyway.
Database user accounts are inherently more dangerous than anything in an account defined by your application because they could be promoted, deleted, accessed or otherwise manipulated by not only the database and any passing DBA, but anything else connected to the database. You've exposed a critical system element as public.
Scaling is out of the question. Imagine an abstraction where you're going to have tens or hundreds of thousands of users: that's just not going to be manageable as DB accounts, but as records in a table it's just data. The age-old argument of "well, there's only ever going to be X users" doesn't hold any water with me, because I've seen very limited internal apps become publicly exposed when the business feels it could add value to the customer, or when the company just got bought by a giant partner who now needs access. You must plan for reasonable extensibility.
You're not going to be able to share connection pooling, you're not going to be any more secure than if you had just created a handful of, say, role accounts, and you're not necessarily going to be able to make mass changes when you need to, or back up effectively.
All in all, there seem to be numerous serious problems to me, and I imagine other, more experienced SOers could list more.
I think that, generally, in your traditional database application they shouldn't be, for all the reasons already given. In a traditional database application there is a business layer that handles all the security, and this is because there is such a strong line between the people who interact with the application and the people who interact with the database.
In this situation it is generally better to manage these users and roles yourself. You can decide what information you need to store about them, and what you log and audit. Most importantly, you define access based on pure business rules rather than database rules: it's got nothing to do with which tables they access and everything to do with whether they can perform [insert business action here]. However, these are not technical issues; they are design issues. If that is what you are required to control, then it makes sense to manage your users yourself.
You have described a system where you allow users to query the database directly. In that case, why not use DB accounts? They will do the job far better than you will if you attempt to analyse the queries that users write and vet them against some rules that you have designed. That, to me, sounds like a nightmare system to write and maintain.
Don't lock things down just because you can. Explain to those in charge what the security implications are, but don't attempt to prevent people from doing things just because you can, especially not when they are used to accessing the data directly.
Our job as developers is to enable people to do what they need to do. In the situation you have described, specifically connecting to the database and querying it with their own tools, I think that anything other than database accounts is either going to be insecure or unnecessarily restrictive.
"each user account was a real first-class account in the RDBMS, which permitted them to connect with their own query tools, etc.,"
not a good idea if the RDBMS contains:
any information covered by HIPAA or Sarbanes-Oxley or The Official Secrets Act (UK)
credit card information or other customer credit info (POs, lines of credit etc)
personal information (ssn, dob, etc)
competitive, proprietary, or IP information
because when users can use their own non-managed query tools the company has no way of knowing or auditing what information was queried or where the query results were delivered.
oh and what #annakata said.
I would avoid giving any user direct database access. Later, when this starts causing problems, taking away their access becomes very difficult.
At the very least, give them access to a read-only replica of the database so they can't kill your whole company with a bad query.
A lot of database query tools are very advanced these days, and it can feel like a real shame to reimplement the world just to add restrictions. As long as the database user permissions are properly locked down, it might be okay. However, in many cases you can't do this: you should be exposing a high-level API to the database that inserts objects across many tables properly, without the user needing specific training along the lines of "just add an address into that table there; why isn't it working?".
If they only want to use the data to generate reports in Excel, etc., then maybe you could use a reporting front end like BIRT instead.
So basically: if the users are knowledgeable about databases, and resources to implement a proper front-end are low, keep on doing this. However, if the resources do come up, it is probably time to gather people's requirements for creating a simpler, task-oriented front-end for them.
This is, in a way, similar to: is sql server/AD good for anything
I don't think it's a bad idea to throw your security model, at least a basic one, in the database itself. You can add restrictions in the application layer for cosmetics, but whichever account the user is accessing the database with, be it based on the application or the user, it's best if that account is restricted to only the operations the user is allowed.
I don't speak for all apps, but there are a large number I have seen where capturing the password is as simple as opening the code in notepad, using an included dll to decrypt the configuration file, or finding a backup file (e.g. web.config.bak in asp.net) that can be accessed from the browser.
*not a good idea if the RDBMS contains:
* any information covered by HIPAA or Sarbanes-Oxley or The Official Secrets Act (UK)
* credit card information or other customer credit info (POs, lines of credit etc)
* personal information (ssn, dob, etc)
* competitive, proprietary, or IP information*
Not true: one can manage perfectly well which data a database user can see and which data they can modify. A database (at least Oracle) can also audit all activities, including selects. Having thousands of database users is also perfectly normal.
It is harder to build a good, secure application yourself because you have to program this security; a database already offers it, and you can configure it in a declarative way, no code required.
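A small Oracle-flavoured sketch of what I mean, with hypothetical schema and table names: expose only some columns through a view, grant on the view rather than the table, and audit reads of the base table declaratively:

```sql
-- Hide sensitive columns behind a view; report_user never sees the base table.
CREATE VIEW hr.employees_public AS
    SELECT employee_id, first_name, last_name, department_id
    FROM hr.employees;

GRANT SELECT ON hr.employees_public TO report_user;

-- Record every select against the base table (traditional Oracle auditing).
AUDIT SELECT ON hr.employees BY ACCESS;
```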
I know I am replying to a very old post, but I recently came across the same situation in my current project. I was also thinking along similar lines: should application users be database users?
This is what I analysed:
It definitely doesn't make sense to create that big a number of application users on the database (if your application is going to be used by many users).
Let's say you created X (a huge number) of users on the database. You are opening a clear gateway into your database.
Let's take a scenario for the solution:
There are two types of application users (Managers and Assistants). Both need access to the database for some transactions.
It's obvious you would create two roles in the database, one for each type (Manager and Assistant). But what about the database user that the application connects with? If you create one account per user, then you end up creating accounts on the database linearly with your user base.
What I suggest:
1. Create one database account per role (let's say Manager_Role_Account).
2. Let your application have business logic to map an application user to the corresponding role account (user Tom with the Manager role maps to Manager_Role_Account).
3. Use the database user corresponding to the role identified in #2 (Manager_Role_Account) to connect to the database and execute your query.
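A minimal PostgreSQL sketch of steps 1 and 3, with hypothetical role and table names (the mapping in step 2 lives in your application code):

```sql
-- One login account per application role, each holding only what that role needs.
CREATE ROLE manager_role_account   LOGIN PASSWORD 'changeme';
CREATE ROLE assistant_role_account LOGIN PASSWORD 'changeme';

GRANT SELECT, INSERT, UPDATE, DELETE ON orders TO manager_role_account;
GRANT SELECT                         ON orders TO assistant_role_account;

-- The application authenticates Tom itself, sees that he is a Manager,
-- and opens its connection as manager_role_account.
```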
Hope this makes sense!
Update: as I said, I came across a similar situation in my project (a PostgreSQL database at the back end and a Java web app at the front end), and I found something very useful called Proxy Authentication.
This means that you can login to the database as one user but limit or extend your privileges based on the Proxy user.
I found some very good links explaining the same.
For PostgreSQL: "Choice of authentication approach for financial app on PostgreSQL"
For Oracle: "Proxy Authentication"
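In PostgreSQL, the mechanism behind this is role membership plus SET ROLE: the application connects as a single login and then narrows itself to the role matching the authenticated user. A hedged sketch with made-up role names:

```sql
-- A single login role for the application, plus non-login roles carrying privileges.
CREATE ROLE app_login LOGIN PASSWORD 'changeme';
CREATE ROLE manager_role   NOLOGIN;
CREATE ROLE assistant_role NOLOGIN;

-- app_login may switch into either role.
GRANT manager_role   TO app_login;
GRANT assistant_role TO app_login;

GRANT SELECT, INSERT, UPDATE, DELETE ON orders TO manager_role;
GRANT SELECT                         ON orders TO assistant_role;

-- After connecting as app_login, the application limits or extends its privileges:
SET ROLE manager_role;
-- ...run the manager's statements...
RESET ROLE;
```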
Hope this helps!
It depends (like most things).
Having multiple database users negates connection pooling, since most libraries handle pooling based on connection strings and user accounts.
On the other hand, it's probably a more secure solution than anything you or I will build from scratch. It leaves security up to the OS and database server, which I trust much more than myself. However, this is only the case if you go to the effort of configuring the database permissions well. If you're using a bunch of OS/DB users with the same permissions, it won't help much. You'll still get an audit trail, but that's about it.
All that said, I don't know that I'd feel comfortable letting normal users connect directly to the database with their own tools.
I think it's worth highlighting what other answers have touched upon:
A database can only define restrictions based on the data, i.e. restrict select/insert/update/delete on particular tables or columns. I'm sure some databases can do somewhat cleverer things, but they'll never be able to implement business-rule-based restrictions the way an application can. What if a certain user is allowed to update a column only to certain values (say, < 1000), or only to increase prices, or to change either of two columns but not both?
I'd say unless you are absolutely sure you'll never need anything but table/column granularity, this is reason enough by itself.
This is not a good idea for any application where you store data for multiple users in the same table and you don't want one user to be able to read or modify another user's data. How would you restrict access in this case?
I've got an MS Access application that accesses an MS SQL DB through an ODBC connection. I'm trying to force my users to update the data only through the application portion, but I don't care if they read the data directly or through their own custom MS Access DB (they use it for creating ad hoc reports).
What I'm looking for is a way to make the data editable only if they are using the compiled .mde file I distribute to them. I know I can make the data read-only for the general population, and editable for select users.
Is there a way I can get MS SQL to make the data editable only if they are accessing it through my canned .mde?
Thought: is there a way to get MS Access to log into the database as a different user (or change the login once connected)?
#Jake,
Yes, it's using forms. What I'm looking to do is just have it switch users once, when my launchpad/main menu form pops up.
#Peter,
That is indeed the direction I'm headed. What I haven't determined is how to go about switching to that second ID. I'm not so worried about the password being sniffed; the users are all internal, and on an internal LAN. If they can sniff that password, they can certainly sniff the one for my privileged ID.
#no one in general,
Right now it's security by obscurity. I've given the users a special .mdb for doing reporting that lets them read data but not update it. They don't know about relinking to the tables through the ODBC connection. A slightly more MS Access/DB-literate user could bypass what I've done in seconds, and there are a few who imagine themselves to be DBAs, so they will figure it out eventually.
There is a way to do this that is effective with internal users, but it can be hacked. You create two IDs for each user. One is a reporting ID that has read-only access. This is the ID that the user knows about: Fred / mypassword.
The second is an ID that can do updates. That ID is Fred_app / mypassword_mangled. The user logs on to your app as Fred. When your application accesses data, it uses the application ID.
This can be sniffed, but for many applications it is sufficient.
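In SQL Server terms, the two-ID scheme is just two logins with different grants; a minimal sketch reusing the example names above (the table name is made up):

```sql
-- Fred is the ID the user knows about; Fred_app is the one the application uses.
CREATE LOGIN Fred     WITH PASSWORD = 'mypassword',         CHECK_POLICY = OFF;
CREATE LOGIN Fred_app WITH PASSWORD = 'mypassword_mangled', CHECK_POLICY = OFF;

CREATE USER Fred     FOR LOGIN Fred;
CREATE USER Fred_app FOR LOGIN Fred_app;

-- Fred can only read; Fred_app can also write.
GRANT SELECT                         ON dbo.orders TO Fred;
GRANT SELECT, INSERT, UPDATE, DELETE ON dbo.orders TO Fred_app;
```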
Does your app allow for linked table updates, or does it go through forms? It sounds like your idea of using a centralized user with distinct roles is the way to go. Yes, you could change users, but that may introduce more coding, and once you start adding more and more code, other solutions (stored procedures, etc.) may sound more inviting.