Translucent Databases


I am building an application with health information inside. This application will be consumer-facing, which is new for me. I would like a method to put users' privacy concerns completely at ease. As I review methods for securing sensitive data in publicly accessible databases, I have frequently come across the notion of database translucency. There is the original book on the subject and an excellent tutorial on the subject from the O'Reilly Network.
My concern is that I have seen very little information regarding this idea on what I would consider very modern programming sites (like this one). There does not seem to be an article about the idea on Wikipedia, there are no questions on the subject here, and there are no very recent tutorials or articles on it. To be uber-brief, the idea is that certain data is clear to some users of the system, while other users are cryptographically prevented from accessing that data, even if they have administrator access.
I have done substantial work on a prototype database that provides translucent data access, and I have run across a considerable problem: to be truly translucent, there can be no mechanism for password recovery. If an administrator can reset a user's password, then they can briefly gain access to that user's data. To be truly translucent, the user must never lose the password.
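A minimal sketch of why recovery is impossible, assuming (the question does not specify this) that the per-user key is derived from the password with PBKDF2 and never stored. The field tag is an HMAC, a keyed one-way function standing in for the hashed columns of a translucent database. The names `derive_user_key` and `translucent_tag`, the iteration count, and the sample value are all hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # hypothetical work factor; tune for your hardware


def derive_user_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte per-user key from the password (PBKDF2-HMAC-SHA256).

    Only the salt is stored server-side; the password and key are not.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def translucent_tag(user_key: bytes, value: str) -> str:
    """Keyed one-way tag for a sensitive field value.

    Whoever holds the user's key can recompute the tag to find matching rows;
    an administrator with only database access sees an opaque hex string.
    """
    return hmac.new(user_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


salt = os.urandom(16)
key = derive_user_key("correct horse battery staple", salt)
stored = translucent_tag(key, "diagnosis:E11.9")  # what the database holds

# An administrator who "resets" the password can only derive a different key,
# so the stored tag (or anything encrypted under the old key) is out of reach.
reset_key = derive_user_key("admin-chosen password", salt)
assert translucent_tag(reset_key, "diagnosis:E11.9") != stored
```

The flip side is exactly the support-call problem: if the user forgets the password, nobody can rebuild `key`.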
Those of us who use strong encryption to protect private data in our daily lives (technorati, to be sure) are used to this problem. If the word "blowfish" is part of your daily lexicon, that is one thing, but what about a consumer-focused website? I am concerned that users will not be willing to wrap their minds around the "truly encrypted just for you" notion implicit in true database translucency. I am afraid of the support call that begins with "I lost my password" and ends with me saying "There is nothing that I can do for you".
My question: Should I implement this method in my application? Are there other open source applications that have gone down this route that I can compare database designs with (esp. using PHP/MySQL)? Is anyone else pursuing these kinds of truly secure, but really inconvenient feature sets? Is there another database security model that is more popular and modern that I have missed? Was database translucency a fad or a legitimate database design method that I should embrace? While I always appreciate discussion, I would prefer objective answers that I can leverage in my design.

So, I've been looking at something similar to this recently, and hit upon the same issue. The solution I'm considering implementing is as follows:
Upon registration, create a unique, secure (long) key for the user and use this to encrypt their data.
Encrypt this key with the user's password using e.g. AES and store it in the database.
At this point, you're still in the situation where if the user forgets their password, they've had it.
Create a public/private key pair representing your organisation, and store the public key on the server.
Split the private portion of the key into several components and give each to people (e.g. directors of your company) who have a significant stake (preferably financial) in the continued success of your company. Do this such that any two, or any three people can get together and restore the full private key when required. Encrypt each person's key with their own password.
When a user registers, as well as encrypting their key with their password, encrypt it with the organisational public key and store it somewhere.
Create a password reset form which records a request to reset the password of a user, along with some proof that the user is who they say they are (e.g. challenge/response).
Record these reset requests (optionally encrypted using the public key again) in the database.
Once per hour/day/week/month, get the requisite key-holders together, and use their combined keys to process the accrued reset requests, decrypting the keys of users who successfully proved they are who they say they are.
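The first two steps (a random per-user key, wrapped with the user's password) and a rewrap on password change can be sketched as below. This is an illustration under stated assumptions, not a vetted design: the wrap XORs the data key with a fresh PBKDF2 output and authenticates it with HMAC, because Python's standard library has no AES; production code should use a standard wrap such as AES key wrap or AES-GCM from a vetted library. The copy wrapped with the organisational public key is omitted, since the standard library has no public-key encryption either. All function names and parameters are hypothetical.

```python
import hashlib
import hmac
import os
import secrets

ITERATIONS = 100_000  # hypothetical work factor


def _derive(password: str, salt: bytes) -> tuple[bytes, bytes]:
    """64 bytes of PBKDF2 output: a 32-byte pad and a 32-byte MAC key."""
    material = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=64)
    return material[:32], material[32:]


def wrap_key(data_key: bytes, password: str) -> dict:
    """Wrap a 32-byte data key under a password (illustrative construction).

    A fresh salt per wrap means the pad is never reused; the HMAC tag
    detects a wrong password or a tampered record.
    """
    salt = os.urandom(16)
    pad, mac_key = _derive(password, salt)
    wrapped = bytes(a ^ b for a, b in zip(data_key, pad))
    tag = hmac.new(mac_key, wrapped, hashlib.sha256).digest()
    return {"salt": salt, "wrapped": wrapped, "tag": tag}


def unwrap_key(record: dict, password: str) -> bytes:
    pad, mac_key = _derive(password, record["salt"])
    expected = hmac.new(mac_key, record["wrapped"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, record["tag"]):
        raise ValueError("wrong password or corrupted record")
    return bytes(a ^ b for a, b in zip(record["wrapped"], pad))


# Registration: a random per-user key encrypts the user's data...
data_key = secrets.token_bytes(32)
# ...and only the password-wrapped copy is stored.
user_record = wrap_key(data_key, "original password")

# A password change (or an escrow-assisted reset) rewraps the same data key,
# so the user's data itself never needs re-encrypting.
new_record = wrap_key(unwrap_key(user_record, "original password"), "new password")
assert unwrap_key(new_record, "new password") == data_key
```

The design point is the indirection: passwords only ever protect the small data key, which is what lets a reset (after the key-holders recover the escrowed copy) avoid touching the bulk data.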
There are lots of challenges and considerations in this. I've got a few thoughts on most of these, but would be interested in others' opinions too:
How to split the key safely between multiple people so that no one person can decrypt the stored keys.
How to minimise the number of keys that would be exposed if the 'master keys' genuinely fell into the wrong hands.
How to make sure that if (heaven forbid) your key-holders lost their keys, then (a) there's no risk of exposure of the data, and (b) there's no risk that suddenly the ability to reset passwords is lost forever.
How to successfully validate that someone really is who they say they are without making this a glaring hole in your whole security approach.
Anything you implement in this area WILL reduce the security of the translucent database approach, without a doubt, but this may be a worthwhile compromise depending on the nature of your data.
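For the first challenge (splitting the key so that any two or three holders can restore it, but no single person can), the standard technique is Shamir's secret sharing. A minimal sketch over a prime field, assuming the secret is a 256-bit value such as a symmetric key protecting the organisation's private-key file (a full RSA private key is larger than this field). Function names are hypothetical. Any `threshold` shares recover the secret, while fewer reveal nothing about it.

```python
import secrets

# 2**521 - 1 is a Mersenne prime, comfortably larger than a 256-bit secret.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, num_shares: int) -> list[tuple[int, int]]:
    """Shamir's secret sharing: any `threshold` of the `num_shares` points recover `secret`."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for this field")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            acc = (acc * x + c) % PRIME
        return acc

    # One point per key-holder; x = 0 is skipped because poly(0) is the secret.
    return [(x, poly(x)) for x in range(1, num_shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation evaluated at x = 0 over the prime field."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


# Hypothetical secret: a 256-bit key protecting the organisation's private-key file.
org_key = secrets.token_bytes(32)
shares = split_secret(int.from_bytes(org_key, "big"), threshold=2, num_shares=3)

# Any two of the three key-holders suffice, e.g. holders 1 and 3:
restored = recover_secret([shares[0], shares[2]]).to_bytes(32, "big")
assert restored == org_key
```

Choosing a 2-of-3 split also addresses the "key-holders lost their keys" concern: one lost share neither exposes the secret nor destroys the ability to reset passwords.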

Should I implement this method in my application?
Well, like other things in life, there is a trade-off :) It's probably more secure, but harder to build.
Are there other open source applications that have gone down this route that I can compare database designs with (esp. using PHP/MySQL)?
I don't know; I guess the tools are there to do it yourself :)
Is anyone else pursuing these kinds of truly secure, but really inconvenient feature sets?
Yes, but it seems like it's still in an immature state, like your problem you describe concerning lost passwords.
Is there another database security model that is more popular and modern that I have missed?
Basically, there are two kinds of database connections. One option gives each user a real database account; the other is for the application to sign on to the database with a single shared account. Before the web came along, both models had proponents in the client/server world, but among web developers the single shared account is the dominant approach.
Was database translucency a fad or a legitimate database design method that I should embrace?
I don't think it was a fad: the UNIX password database, for instance, is a great example of a basic translucent database ;)

Re: translucent databases. You could, I suppose, use fingerprints. But what about burn victims, or people who end up losing their fingerprints? Oops. Is it worth losing that small percentage of users?
Familiarize yourself with HIPAA, especially when it comes to technology.
Remember that no system is truly secure, except Skynet*, and look what happened with that! Humans are in charge. When you work in a medical company, you sign an NDA indicating that you won't release any of the information you learn as part of your duties because it is confidential.
There will be someone to reset people's passwords. That's the way it is, because not everyone is technologically competent, and that's the way it stays for now.
You only have to implement security to the level HIPAA requires.
* In truth, there is another truly secure system: one that is unplugged from both the network and the electricity, and turned off.

Slightly different solution, you might want to check out cryptdb:
http://css.csail.mit.edu/cryptdb/

Related

Secure database and webpage against modification

My website makes extremely sensitive information (think of bank account numbers) publicly available through webpages and web services. Customers may modify this information after authenticating with a username and a password.
Any hacking intrusion that would successfully modify the entries of the database, or modify the information displayed on the webpage, would be disastrous, as account numbers might then be incorrect and money could be directed to a malicious bank account.
Do you have any general advice about an architecture that would make such a service as robust as possible? I would not be responsible in case of a weak password, so my main concern is attacks that would simply bypass the authentication process and modify the database without triggering any alert on my side; it could also be the HTML code of the webpage that is directly modified to show different information...
Thank you

In this case I would make sure to harden the system itself as well as possible. This covers a very broad spectrum, from security roles and transaction-based use of the database, through logging, to preventing all sorts of attacks such as SQL injection and cross-site scripting. If it's that sensitive a system, maybe also use certificates and IP checks (for example, a whitelist of IPs that are allowed to send requests to the system, with requests from anywhere else refused instantly). Not to mention that your host architecture has to be protected regardless of the security features implemented inside your system (keywords: firewalls, user privileges, etc.). During development, automated code-analysis software (like Sonar) should always be running to detect logical errors and similar problems.
It could also be a good idea to have a second system just to monitor your primary system's status. This system should log and notify you of:
changes made to the system itself (for example, if someone gains access to your business logic and removes authentication logic)
changes made to the database that are not consistent with your primary system's state
suspicious actions: banks, for example, have rules that apply to your account. If you have always made payments within Europe and then, out of nowhere, make a huge payment to, say, China, you receive a notification asking you to confirm it. The payment is not executed until you give that second confirmation.
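The bank-style rule in the last item can be sketched as a simple check. This is a hypothetical rule for illustration only (real fraud systems are far more elaborate): hold a payment when it goes to a country the customer has never paid before and is much larger than their usual payment. `Payment`, `needs_confirmation`, and the factor of 5 are assumptions.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Payment:
    amount: float  # in the account's currency
    country: str   # destination bank's country code


def needs_confirmation(history: list[Payment], new: Payment, factor: float = 5.0) -> bool:
    """Hypothetical rule: hold `new` until the customer confirms it when it goes
    to a never-used country AND exceeds `factor` times their median payment."""
    if not history:
        return True  # no baseline yet, so err on the side of asking
    unseen_country = new.country not in {p.country for p in history}
    unusually_large = new.amount > factor * median(p.amount for p in history)
    return unseen_country and unusually_large


history = [Payment(120.0, "DE"), Payment(80.0, "FR"), Payment(200.0, "DE")]
print(needs_confirmation(history, Payment(15_000.0, "CN")))  # True
print(needs_confirmation(history, Payment(150.0, "DE")))     # False
```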
In the end, you already pointed out correctly that you can only harden the system as well as possible, but generally cannot make it "100%" safe (at least in theory). So to get a good level of security, part of the overall system should be the ability to detect unwanted changes, identify exactly which changes have already been made, and have information on the overall status of your system, to allow a rollback or manual correction of a corrupted state in case it has already happened.
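One way to detect the inconsistent database changes mentioned above is to store a keyed MAC next to each sensitive row and have the monitoring system recompute it. A minimal sketch, assuming the MAC key is held only by a separate signing/monitoring service (here it is a constant in one process for brevity) and using a hypothetical `accounts` table filled with commonly published example IBANs. A direct `UPDATE` that bypasses the application leaves a row whose MAC no longer verifies.

```python
import hashlib
import hmac
import sqlite3

# Assumption: in production this key lives only on the monitoring/signing host.
MONITOR_KEY = b"hypothetical-key-held-by-monitoring-system"


def row_mac(account_id: int, iban: str) -> str:
    msg = f"{account_id}|{iban}".encode("utf-8")
    return hmac.new(MONITOR_KEY, msg, hashlib.sha256).hexdigest()


def find_tampered_rows(conn: sqlite3.Connection) -> list[int]:
    """Return ids of rows whose stored MAC no longer matches their contents."""
    bad = []
    for account_id, iban, mac in conn.execute("SELECT id, iban, mac FROM accounts"):
        if not hmac.compare_digest(row_mac(account_id, iban), mac):
            bad.append(account_id)
    return bad


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, iban TEXT, mac TEXT)")
# Legitimate writes go through a component that can compute the MAC...
for account_id, iban in [(1, "DE89370400440532013000"), (2, "GB82WEST12345698765432")]:
    conn.execute("INSERT INTO accounts VALUES (?, ?, ?)", (account_id, iban, row_mac(account_id, iban)))
# ...while an attacker editing the table directly cannot produce a valid MAC.
conn.execute("UPDATE accounts SET iban = 'XX00ATTACKER0000000000' WHERE id = 2")
print(find_tampered_rows(conn))  # [2]
```

This only helps if the attacker cannot also reach the MAC key, which is why the key belongs on the separate monitoring system rather than on the web server.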
Even after implementing the techniques mentioned, you would have to continuously check for security bugs in the frameworks and libraries you use, and in the system as a whole (for example, with penetration-testing frameworks that automatically try to compromise your system).
What I want to show with my answer is what the comments already suggest: this is a very broad and complex topic with multiple layers of security concerns that you will have to either study yourself or cover with framework solutions that take care of them for you (web frameworks, for example, often include basic XSS prevention).

Without wanting to sound harsh, but if you have to ask this question on Stack Overflow, you're not really qualified to work on this project.
The financial value of your data sounds like it's enough for an attacker to expend significant resources breaching your defenses - and the consequences of such a breach would be disastrous for your organization and its customers; it could lead to the organization having to close down. You really don't want to be learning about security from strangers on the internet in this case.
One place to start learning is the established standards for managing financial information, often referred to as the "PCI standards"; these provide guidelines for hardware, software and processes for organizations that deal with payment details.
There are numerous books on IT security; I like the "Hacking Exposed" series, and "Security Engineering".
You might also bring in specialized IT security consultants; I've worked with a number of these guys, and many of them are very good at helping you engineer security into your solution.

Safe to store an unencrypted password in an HTML5 client-side database?

I am assuming the answer is that storing a password unencrypted in a WebSQL database on the client side is not safe, but I thought I would ask anyway. The reason I am asking is that I am trying to add a Dropbox uploading tool to a web app, but I need the password in plain text in order to access the user's Dropbox account. I could surely come up with some foobar way to hash the passwords client side and unhash them when needed, but if I am able to unhash them, anyone will be able to do so as well. Does anyone have a workaround if this is the case?

There is no such thing as 100% secure or safe. The goal of security is to be safe enough. You determine what is the risk, and what is the level of pain you are willing to go through and find the sweet spot.
If you have to get plain text back from a cipher, you have no choice but to use encryption, not hashing. Of course, the key then has to exist somewhere, whether the user enters it or it is stored, so the key is vulnerable.
Since this is on a client computer, it may be vulnerable to phishing attacks, social engineering attacks, trojan/keylogger/virus attacks, physical security risks, etc.
Storing clear text is a bad idea, but beyond that you have to decide what level of pain your users will put up with.
PKI tokens are a good option if the cost is worth it. Otherwise, most languages have many encryption algorithms that can be used effectively.

No, it's not safe to store plaintext passwords, period.
Assuming your users log into your web app with a password, why not use that password to encrypt their (salted) Dropbox password? That's still less than satisfactory from a security standpoint, but it's better than nothing.
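A minimal sketch of that idea, assuming the app login password is never stored alongside the ciphertext: derive keys from the login password with PBKDF2 and a random salt, encrypt the Dropbox password, and authenticate the result. Because Python's standard library has no AES, the cipher here is HMAC-SHA256 in counter mode, a stand-in for illustration only; in the browser, the Web Crypto API's PBKDF2 and AES-GCM would be the natural real choice. Function names and the blob layout are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # hypothetical work factor


def _keys(login_password: str, salt: bytes) -> tuple[bytes, bytes]:
    """Split 64 bytes of PBKDF2 output into an encryption key and a MAC key."""
    material = hashlib.pbkdf2_hmac("sha256", login_password.encode("utf-8"), salt, ITERATIONS, dklen=64)
    return material[:32], material[32:]


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 in counter mode as a stand-in stream cipher (illustration only).
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(enc_key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_secret(secret: str, login_password: str) -> bytes:
    salt, nonce = os.urandom(16), os.urandom(16)
    enc_key, mac_key = _keys(login_password, salt)
    data = secret.encode("utf-8")
    ct = bytes(a ^ b for a, b in zip(data, _keystream(enc_key, nonce, len(data))))
    tag = hmac.new(mac_key, salt + nonce + ct, hashlib.sha256).digest()
    return salt + nonce + ct + tag  # blob to keep in client-side storage


def decrypt_secret(blob: bytes, login_password: str) -> str:
    salt, nonce, ct, tag = blob[:16], blob[16:32], blob[32:-32], blob[-32:]
    enc_key, mac_key = _keys(login_password, salt)
    expected = hmac.new(mac_key, salt + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong password or tampered data")
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, nonce, len(ct)))).decode("utf-8")


blob = encrypt_secret("dropbox-password", "app-login-password")
assert decrypt_secret(blob, "app-login-password") == "dropbox-password"
```

As the answer says, this is still less than satisfactory: anyone who captures the login password can decrypt the blob.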
Using the words "foobar" and "Dropbox" in the same paragraph is a clear signal that you're asking for trouble with a home-grown solution. You're asking your users to trust you with the security of their Dropbox data, which means you're accepting an awful lot of liability. You're also asking your users to violate one of the fundamental laws of security: Never trust your security to a third party.
The best advice I can offer is to delegate all security-related tasks to an expert, and have that code audited by another expert.

Do I really need to hash passwords?

I am building a project which has a pretty basic login system. There will be NO registration system available; users will be added manually. Also, I protected the database's data input gates very well. So, after all, do I still need to hash and even salt the users' passwords?
And if your answer is yes, the next question is why?

Well, what would be the consequence of an intruder being able to impersonate another user? Weigh those consequences against the difficulty (which isn't very great) of adding hashing and salting.
One risk which you may want to consider is that if a user has the same password on multiple sites, then their security is only as safe as the weakest site. Even if you're manually assigning the passwords yourself (and not allowing the user to choose it) they may go on to use the same password in other sites.
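Adding salted hashing really is cheap. A minimal sketch with the standard library's PBKDF2 (a dedicated algorithm such as bcrypt, scrypt or Argon2 is equally reasonable); the `algorithm$iterations$salt$hash` storage format and the iteration count are assumptions. Because each user gets a random salt, two users with the same password get different stored values, and `hmac.compare_digest` avoids leaking timing information during the comparison.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # hypothetical work factor; raise it as hardware gets faster


def hash_password(password: str) -> str:
    """Return a self-describing salted hash suitable for a password column."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and iterations, then compare."""
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        raise ValueError("unsupported hash format")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)


stored = hash_password("hunter2")
print(verify_password("hunter2", stored))  # True
print(verify_password("Hunter2", stored))  # False
```

Storing the iteration count with each hash lets you raise the work factor later without invalidating existing accounts.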

Absolutely. It's one of the most important obligations to your users you have to honor - to treat their personal data very carefully.

If you generate the password for each user and do not let the user change the password, then you can make a case for not hashing them.
However:
You will have to explain to everyone that audits the system why you are not hashing the passwords.
You will have to have some way of proving that a system admin did not look at a user’s password then logon as the user.
A lot of programmers will think you don’t know what you are doing.
What if the system is changed at some point, or the code gets copied into another system?
I think of this like crossing a road: you always look both ways, even if the green man says it is OK to cross.
(It is quicker to look both ways than to explain to any watching children why you don’t need to in this case.)

In some jurisdictions/industries, storing login credentials in plain text could be a violation of data protection laws. If you're doing something like that in the US on a system that has even the slightest bit to do with medical or financial records, and you get audited, even if there's been no breach, you'll be lucky if the worst that happens is your clients and suppliers refuse to do business with you until your systems pass audit. There could be hefty fines as well. Even if your system doesn't work with sensitive data, if it's intended for use by people who routinely work with such data, the possibility that they may reuse passwords that are also used to access regulated data would at the very least make an auditor very nervous, and make their client extremely reluctant to work with you, even if you were technically in compliance.

Yes, because, e.g., people having access to the database can easily impersonate other users.

Yes, because your database is still there and a user system and its database are no more difficult to compromise without a registration form than with one.
Even if you protect your "database data input gates" very well, your database still isn't 100% attacker-proof. If someone still manages to slip through your defenses and sees everything in your database, and all the passwords are in plain text, your users' accounts are still compromised. By hashing them at least you're costing attackers more time, and at the same time protecting your users.

Yes, because there is always a risk of the database being compromised. Remember that many people use the same password for many sites, IMs, etc., so you are putting at risk not only the information in your system.

People use their same password for more than just your site as well. If an attacker gets the passwords, there are more consequences than just your site. That user's email, bank accounts, etc may also be compromised. Do the diligent thing.

Why wouldn't you hash passwords? It protects you, your staff and your users and it costs almost nothing to implement. Users have a right to expect that your system administrators / DBAs / whoever cannot see their passwords and your administrators have a right not to be exposed to that information needlessly. In any internal/external technical security audit one of the first things the auditors will do is home in on any password columns in the database and determine whether they are hashed or not.

Also, I protected the database's data input gates very well.
I bet every system designer/administrator for every compromised password file in the history of computing thought the same thing.

What DBMS is appropriate for keeping a schema private even when installed 'in the wild'

I have an application server which connects to a database server. I would like to be able to supply users with installers and, with a moderate degree of comfort, trust that the database schema is secure.
I understand that there are some risks that I will just have to accept with not controlling the computer on which it installed - a determined person with the right tools and knowledge could look directly at memory and pull out information.
Initially I thought my area of focus would simply be on adding the credentials to the installer without them being trivially viewed in a hex editor.
However, as I began to research, I learned that for PostgreSQL, even if I install the database silently and don't provide the credentials to the user, they can simply change a text-based configuration file (pg_hba.conf) and restart the server, enabling full access to the database without credentials.
Is this scenario secured in other DBMS? How do most commercial products protect their schemas in this scenario? Would most products use embedded databases?
Edit: I assume (perhaps wrongly so) that some products rely on databases that the user never actually touches directly. And I of course never see them because they have designed it in such a way that the user does not need to - probably using an embedded database.

As far as I remember, there are no commercial products that "protect" their schemas. What do you want the schema to be protected against?
Consider the following points:
After all, the only person who can protect anything in a RDBMS is the database server administrator. And you want the schema to be protected against this person?
If I were a customer and I had my data inside your schema, I would not only like, but expect, to be able to see and consume it directly.
Do you really need to protect your relational design? Is it really that interesting? Have you invented something worth hiding? I really don't think so. And I apologize in advance if you have.
EDIT: Additional comment:
I don't care about most database internals for the products I use. That's another reason I think most of them don't take any action to protect them. Most of them are not that interesting.
On one side, I strongly believe that users should not need to know or to care about the internals of the database. But at the same level, as a developer, I don't think it is worth trying to protect them. Hiding them from the user, yes. Protect them against direct access, in most cases, no. And not because I think it is wrong to protect your schema. It is because I think it is a very hard thing to do, and it is not worth your time as a developer.
But at the end, as with any security related topic, the only right answer is about what are the risks involved vs the costs of implementing the security measure.
Current database engines, embedded or server-style, are not designed to easily hide the schema of the database, and therefore, the development cost of doing it is much greater than the risk involved, for most people.
But your case might be different.

Most commercial products do not protect their schemas. They fall into one of two camps:
Either they are making use of an enterprise class database for a key component of the product (such as a payroll system), in which case there is no attempt made to hide the schema/data. In most of these cases the customer needs control over the database anyway - to configure how the database is backed up, to be able to make a clustered environment, etc.
The other case is if your "database" is nothing but a small settings or storage file for a desktop application (e.g. the history and bookmark databases in Firefox). In that case you should just use an embedded database (like SQLite, same as Firefox) and add an encryption layer (there is an official one, the SQLite Encryption Extension or SEE), or just use the embedded database and forget about the encryption layer, since the user would need to install their own database tools to read the file in the first place.
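To illustrate why an unencrypted embedded database hides nothing once someone has those tools: SQLite keeps the text of every `CREATE` statement in its built-in `sqlite_master` table (the sqlite3 command-line shell's `.schema` command reads the same thing), so anyone who can open the file can print the schema. The `payroll` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a database file on disk behaves the same way
conn.execute("CREATE TABLE payroll (employee_id INTEGER PRIMARY KEY, salary_cents INTEGER)")

# SQLite records every CREATE statement in its built-in sqlite_master table,
# so any client that can open the file can read the schema back out.
for name, sql in conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'"):
    print(name, "->", sql)
```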

What problem are you trying to solve? Nothing can stop the DBA* from doing whatever he wants to standard databases, and as others have pointed out it's actively hostile to interfere with site-specific needs like backups and database upgrades. At most you can encrypt the contents of your database, but even then you have to provide a decryption key for your application to actually run and a motivated and hostile DBA can probably subvert it.
The military and intelligence communities undoubtedly have databases where even the schema is highly classified, but I don't know if they're protected by technical means or just large men with guns.
(*) DBA or system administrator able to modify files like pg_hba.conf.

How do most commercial products protect their schemas in this scenario?
I don't believe most commercial products do anything to protect their schemas.

How can an embedded DBMS stop someone from tinkering with its storage (files, in this non-embedded hardware context) when that person has physical access to the machine where the DBMS is running? Security through obscurity is a risky proposition.

This idea will suffer from the same problems as DRM. You can't prevent access by the determined, and you will only cause general pain and suffering for your customers. Just don't do it.
SQLite wraps its entire database format into a single file, and you could conceivably encrypt and decrypt it in-place. The flaw, of course, is that users need the key to use the database now, and the only way that can happen is if you give it to them, perhaps by hard-coding it in at compile-time (security by obscurity) or a phone-home scheme (whole host of reasons why this one's a bad idea). Plus now they'll hate you because you've thwarted any attempt at a useful backup system and they get terrible performance to boot.
Besides, nobody actually cares about schemas. Hate to break it to you, but schema design isn't a hard problem, and certainly never a legitimate competitive advantage (outside of maybe a few specific areas like knowledge representation and data warehousing). Schemas are generally not worth protecting in the first place.
If it's really that important to you, do a hosted application instead.

How do I create a web application where I do not have access to the data?

Premise: The requirements for an upcoming project include the condition that no one except authorized users has access to certain data. This is usually fine, but this circumstance is not usual: the requirements state that there must be no way for even the programmer or any other IT employee to access this information. (They want me to store it without being able to see it, ever.)
In all of the scenarios I've come up with, I can always find a way to access the data. Let me describe some of them.
Scenario I: Restrict the table on the live database so that only the SQL Admin can access it directly.
Hack I: I roll out a change that sends the data to a different table for later viewing. Also, the SQL Admin can see the data, which breaks the requirement.
Scenario II: Encrypt the data so that it requires a password to decrypt. This password would be known by the users only. It would be required each time a new record is created as well as each time the data from an old record was retrieved. The encryption/decryption would happen in JavaScript so that the password would never be sent to the server, where it could be logged or sniffed.
Hack II: Roll out a change that logs keypresses in JavaScript and posts them back to the server so that I can retrieve the password. Or, roll out a change that simply stores the unencrypted data in a hidden field that can be posted to the server for later viewing.
Scenario III: Do the same as Scenario II, except that the encryption/decryption happens on a website that we do not control. This magic website would allow a user to input a password and the encrypted or plain-text data, then use JavaScript to decrypt or encrypt that data. The user could then copy the encrypted text and put it in the field for new records. They would also have to use this site to see the plain text for old records.
Hack III: Besides installing a full-fledged key logger on their system, I don't know how to break this one.
So, Scenario III looks promising, but it's cumbersome for the users. Are there any other possibilities that I may be overlooking?

If you can have javascript on the page, then I don't think there's anything you can do. If you can see it in a browser, then that means it's in the DOM, which means you can write a script to get it and send it to you after it has been decrypted.
Aren't these problems usually solved via controls:
All programmers need a certain level of clearance and background checks
They are trained to understand that rolling out code to access the data is a fireable or worse offense
Every change in certain areas needs some kind of signoff
For example -- no JavaScript on page without signoff.
If you are allowed to add any code you want, then there's always a way, IMO.

Ask the client to provide a non-disclosure agreement for you to sign, sign it, then look at as much data as you want.
What I'm wondering is, what exactly will you be able to do with encrypted data anyway? Pretty much all apps require you to do some processing of the data, whether that is moving it to a required place, modifying it, sanitizing it, or displaying it. Otherwise, you're just a glorified pipe, and you don't have to do any work.
The only way I can think of where you wouldn't be looking at the data or doing anything with it would be a simple form to table mapping with CRUD options. If you know what format the data will be coming in as you should be able to roll something out with RoR, a simple skin, put SSL into the mix, and roll it out. Test with dummy data in the same format, and you're set.
In fact, is your client unable to supply dummy data for testing? If they can, then your life is simple as all you do is provide an "installable" and tell them how to edit a config file.

I think you could still create the app in the following way:
Create a dev database and set up a user for it.
Ask them for: the data type, size, and name of each field that needs to be on the screen.
Set up the screens, create columns in the database that accept the data type and size they specify.
Deploy the app to production, hooked up to an empty database. Get someone with permission (not you) to go in and set the password on the database user and set the password for the DB user in the web app.
Authorized users can then do whatever they want and you never saw what any of the data looked like.
Of course, maintaining the app and debugging is gonna be a bitch!
--In answer to comments:
Ok, so after setting up the password for the Username in the database and in the web app's config, write a program that connects to the database, sets a randomized password, then writes that same randomized password to the web config.
Prevent any outgoing packets from the machine except to a set of authorized workstations - so you can't install your spyware.
Then set the Admin password on both servers to the same random password, then delete all other users on the servers, delete the program, and delete the program source code.
Wipe the hard drives of the developer machines with the DOD algorithm, and then toss them into an industrial shredder.
If the server ever needs debugging, toss it in the trash, buy a new one, and start back at the first step.
But seriously - this is an insolvable problem. The best answer to this really is:
Tell them they can't have an application. Write your stuff on paper. Put it in a folder. Lock it in a vault. Thrust, repeat.

Wouldn't scenario 3 just expose all the data to the magic website? This doesn't sound like a solvable problem (at least I can't think of a solution).

Go with whatever solution is easiest for you to implement. I think the requirements show that the client does not understand software development, so it should be easy to sell any approach you take.

I have to say I really don't like the idea of using JavaScript on the client to decrypt the data. That is a huge hole as any script (hacker, GreaseMonkey, IE7Pro, etc.) can access the DOM and get data out of the page.
Also, it is very hard to get around the problem of keystroke loggers. If you throw those into the mix, then your options are limited. At that point you need a security fob such as an RSA token (commonly used with corporate VPNs) to generate one-time PINs. That will probably be expensive, and it is a pain, and I have only seen it used with VPNs, but I assume it could work with websites as well.
As far as the website, I'd stick with HTTPS and find a way to encrypt/decrypt through the WebServer rather than relying on JavaScript. The SSL traffic isn't very prone to sniffing (very difficult to decrypt), so that allows the encryption and decryption to happen server-side which (IMHO) is more secure.
Look at banking scenarios and other financial institutions for a starting point, and then go from there. Try not to over-complicate if possible.

You can't guarantee against hacking into the data as long as you have access to the server it lives on. So tell the employer they have to host the data somewhere else and grant access to the client's browser via a secure HTTPS connection.
You can design your web page to dynamically load an XML data stream securely, and format it into a web page using an XSLT script on the client.
See http://www.w3schools.com/xsl/xsl_client.asp for examples
That way you produce the code, but you never have access to the data. Only the user has access to their own data.
As for how the employer is going to host the data without granting any IT people access to it, that's their problem. It's a foolish requirement.

I think that I'll just tell them that they either have to trust a couple of us to have access (and not look at it) or they don't get a project.
Thanks for the answers. Feel free to post more thoughts if you have them.

You can never have 100% security, and extra security comes at a cost of speed/price/convenience etc.
Let's suppose you take scenario 3 - one of your programmers can use social engineering to get the password from one of the users. Goodbye security.
There's no point having a high-security iron door as a gate if people can just walk around it. Just implement a decent level of security.

(They want me to store it without being able to see it, ever.)
Hey, the recording industry wants people to be able to listen to their music, but not copy it. Sounds like they should get together sometime!
Their idea won't work for the same reason DRM doesn't work: the trust chain is inherently compromised. Encryption examples often use Alice, Bob, and Charlie where Alice is trying to communicate with Bob without Charlie listening in. With DRM, the trust chain is compromised because Bob and Charlie are the same person. With your situation, Charlie is the guy writing the software that Alice and Bob use to communicate. There's an implied trust, because if you don't trust Charlie then you can't trust Charlie's software, either.
That's the root of the issue: trust. If they can't trust the programmer, the game is over before it starts.

There are lots of options based on what their goal really is, but I am confused by their paranoia, er, intent:
Is this their (and end-user) data that they wish to keep private or end-user data to be kept private from everyone?
Is it just that your (or any contracted) company is suspect?
Are they afraid of over-the-wire snooping?
Are they afraid of DOM access through JavaScript or browser plugins?
Are they planning a staged deployment? In that case you work on a test/dev server without real data but have no access to the production server with the real data, and DNS logging and/or firewall rules keep any of your hacks from working undetected.
Ultimately if the data is stored in a DB then the programmer and DB admin can, by working together, get it. Period. A good audit should uncover that, though.

If this is truly a requirement, the only way to guard against this is to hire an outside firm to audit the code prior to releasing the software, and that's going to be very expensive.
