Anonymizing your application database - database

I'd like to keep the real names, emails, and any other personal identifiable information out of my primary application database, and in another database/encrypted file. And I'm curious on if there's a best practices solution for this or if I'm totally over looking something.
Some thoughts I had were the following:
User logs in with a username and password that are both hashed in the primary database
This server then makes some sort of secure call to the member database with the user's id
And in return the member database returns the name, email, address etc.
I'm wondering if this is the right approach, and if so where the keys are stored and authenticated etc..

It's an interesting question, I think, but it needs some more context. That is, you need to be clear about who you are wishing to anonymise them against. That is, who is the threat, here? Do you want the information hidden from only the public? Clearly, that's trivial, just don't show it (don't link it). Do you want the information hidden from someone who gains access to your database? How hidden? How will they get access to your db? Can they, if they gain access to the not-anonymous one, get access to the other? OpenID may also be of interest to you (externalise the authentication, you just do role management).
I'd suggest sit down and plan that out a bit.
You don't want to introduce complexity (multiple db's, linking, etc) if they're all just on the same server anyway, and hence accessible to any successful attacker.
I'd think the number 1 solution to keeping things anonymous is to never actually collect any information yourself. It's more of a model thing (i.e. the details of your app matter).

Related

Is it beneficial to encrypt usernames stored in the database?

The first & accepted answer on this question about passwords management suggests to encrypt the user identifiers in DB.
The good point is that if anyone gets a password, he has to know how to decrypt the user login to get the full login/password pair.
Some disadvantages I see, for example:
you have to decrypt user logins every time you want to display them
if you want to do a 'begins with' search on user login to find users, you cannot simply use LIKE '...%'
ORDER BY on login field may be quite difficult too...
What would you recommend (encrypt user identifiers or not)?
As usual, the answer is "it depends".
In general, I'd say that if an attacker has access to your database, your security situation is so badly compromised that encrypting the passwords will likely do you no favours. This is different to using a one-way hash - it's likely that an attacker who has access to your database also has access to your decryption key, whereas one-way hashes, by definition, are one way.
As you already say, it's likely that you will need regular access to the userIDs (esp. if you use email addresses as user IDs); in that case, again, an attacker who can read your database likely can intercept the unencrypted data.
So, if you work for a bank, the government, or any other place where data security has to be at the very top of the list, this additional protection may just be worth it, especially if you have a strong key management system.
For other uses, I'd consider the additional security too small to merit the additional pain.
Encryption is considered to be a lesser form of secret storage than message digest functions. In fact, storing an encrypted password is a clear violation of CWE-257.
But why not hash the username? When the login the application will have the plain text. Depending on your application, you might not need to display a list of users. this would be an added layer of security, as both hashes have to be broken before the attacker can login.
That being said, if you have a plain text list of every username it will be trivial to perform a dictionary attack against any recovered hash. Further more user names are not created to be difficult to guess, often times users choose goofy names of birds or silly games like chess so that they are easy to remember.

User Privacy and Data Security in Databases

Is there a way to design a database where users' profile data can be protected in case of system intrusion?
For example, private data may be valuable, even though it may not be directly related to financial transactions or passwords. [Phone numbers, email addresses, etc may be resold]
Is there a way to design a member-based website in which personal information is stored and manipulated by an authorized agent of the site. However, in this case the data could become devalued if the database was disconnected from the system.
I realize that most of user privacy and data concerns are mostly focused on the UI and preventing unauthorized users from accessing data that does not concern them. However, my question is how can user information be protected within the backend itself. I also realize that this is a case of loss of access of the data [and you can do anything reactive (technical wise)], but what I'm trying to discover is if something can be done proactively to dampen the blow.
My question is: how can personal data be protected, without taking away from the actual purpose of the data?
I can see a need for encrypting all information, but that would prevent groups from accessing the data related to the user. For example, you could encrypt userss zip codes with a private key, but how would you retain the ability to use the zip code's location information. [Maybe to make a claim to the user "these people may be near you"]
In this hypothetical situation, the attacker is given an exploited database. They can not use the originating system to manipulate the DB.
If you only access the users' information when they are logged in, you can use their password to decrypt data from the database. During this time, you could calculate a one-way hash from the zip code, for example, and store that unecrypted. But if you develop your database during the site's lifetime, you won't be able to update the information for old users who never log in. Maybe that isn't a problem in practice.
I'm going to presume that you're indicating an intruder has gained not only access to the machine, but any MySQL passwords as well.
With that being the case, the only solution I can think of would be to use an encrypted filesystem underneath the database.
Even there, though, you've still lost physical control of the device - so if the intruder has figured out the decryption key for the fs, you're still up the proverbial creek.
Another option would be to split the application so that the web site and database do not share the same system. And, of course, typically running the site with a read-only user is a Good Idea™ :)
Storing data in the database in a hashed or encrypted form is a good start, but you still need to have access to the decryption key, or it's going to be gobbledygook.
Ultimately, of course, the best way to secure the database is to prevent access - which is not a viable solution.
This is an issue that appears in the press periodically if/when a bank or store's customer list is accessed in an unauthorized form - the data is secure only via the interface/application: it's still in cleartext at some point underneath everything else.
Proper access controls are probably the most sensible approach in your example since you're not wanting to give up, e.g., the ability to sort or query against the fields in question.

Exposing database IDs - security risk?

I've heard that exposing database IDs (in URLs, for example) is a security risk, but I'm having trouble understanding why.
Any opinions or links on why it's a risk, or why it isn't?
EDIT: of course the access is scoped, e.g. if you can't see resource foo?id=123 you'll get an error page. Otherwise the URL itself should be secret.
EDIT: if the URL is secret, it will probably contain a generated token that has a limited lifetime, e.g. valid for 1 hour and can only be used once.
EDIT (months later): my current preferred practice for this is to use UUIDS for IDs and expose them. If I'm using sequential numbers (usually for performance on some DBs) as IDs I like generating a UUID token for each entry as an alternate key, and expose that.
There are risks associated with exposing database identifiers. On the other hand, it would be extremely burdensome to design a web application without exposing them at all. Thus, it's important to understand the risks and take care to address them.
The first danger is what OWASP called "insecure direct object references." If someone discovers the id of an entity, and your application lacks sufficient authorization controls to prevent it, they can do things that you didn't intend.
Here are some good rules to follow:
Use role-based security to control access to an operation. How this is done depends on the platform and framework you've chosen, but many support a declarative security model that will automatically redirect browsers to an authentication step when an action requires some authority.
Use programmatic security to control access to an object. This is harder to do at a framework level. More often, it is something you have to write into your code and is therefore more error prone. This check goes beyond role-based checking by ensuring not only that the user has authority for the operation, but also has necessary rights on the specific object being modified. In a role-based system, it's easy to check that only managers can give raises, but beyond that, you need to make sure that the employee belongs to the particular manager's department.
There are schemes to hide the real identifier from an end user (e.g., map between the real identifier and a temporary, user-specific identifier on the server), but I would argue that this is a form of security by obscurity. I want to focus on keeping real cryptographic secrets, not trying to conceal application data. In a web context, it also runs counter to widely used REST design, where identifiers commonly show up in URLs to address a resource, which is subject to access control.
Another challenge is prediction or discovery of the identifiers. The easiest way for an attacker to discover an unauthorized object is to guess it from a numbering sequence. The following guidelines can help mitigate that:
Expose only unpredictable identifiers. For the sake of performance, you might use sequence numbers in foreign key relationships inside the database, but any entity you want to reference from the web application should also have an unpredictable surrogate identifier. This is the only one that should ever be exposed to the client. Using random UUIDs for these is a practical solution for assigning these surrogate keys, even though they aren't cryptographically secure.
One place where cryptographically unpredictable identifiers is a necessity, however, is in session IDs or other authentication tokens, where the ID itself authenticates a request. These should be generated by a cryptographic RNG.
While not a data security risk this is absolutely a business intelligence security risk as it exposes both data size and velocity. I've seen businesses get harmed by this and have written about this anti-pattern in depth. Unless you're just building an experiment and not a business I'd highly suggest keeping your private ids out of public eye. https://medium.com/lightrail/prevent-business-intelligence-leaks-by-using-uuids-instead-of-database-ids-on-urls-and-in-apis-17f15669fd2e
It depends on what the IDs stand for.
Consider a site that for competitive reason don't want to make public how many members they have but by using sequential IDs reveals it anyway in the URL: http://some.domain.name/user?id=3933
On the other hand, if they used the login name of the user instead: http://some.domain.name/user?id=some they haven't disclosed anything the user didn't already know.
The general thought goes along these lines: "Disclose as little information about the inner workings of your app to anyone."
Exposing the database ID counts as disclosing some information.
Reasons for this is that hackers can use any information about your apps inner workings to attack you, or a user can change the URL to get into a database he/she isn't suppose to see?
We use GUIDs for database ids. Leaking them is a lot less dangerous.
If you are using integer IDs in your db, you may make it easy for users to see data they shouldn't by changing qs variables.
E.g. a user could easily change the id parameter in this qs and see/modify data they shouldn't http://someurl?id=1
When you send database id's to your client you are forced to check security in both cases. If you keep the id's in your web session you can choose if you want/need to do it, meaning potentially less processing.
You are constantly trying to delegate things to your access control ;) This may be the case in your application but I have never seen such a consistent back-end system in my entire career. Most of them have security models that were designed for non-web usage and some have had additional roles added posthumously, and some of these have been bolted on outside of the core security model (because the role was added in a different operational context, say before the web).
So we use synthetic session local id's because it hides as much as we can get away with.
There is also the issue of non-integer key fields, which may be the case for enumerated values and similar. You can try to sanitize that data, but chances are you'll end up like little bobby drop tables.
My suggestion is to implement two stages of security.
"Security through obscurity": You can have integer Id as primary key and Gid as GUID as surrogate key in tables. Whereas integer Id column is used for relations and other database back-end and internal purposes (and even for select list keys in web apps to avoid unnecessary mapping between Gid and Id while loading and saving) and Gid is used for REST Urls i.e for GET,POST, PUT, DELETE etc. So that one cannot guess the other record id. This gives first level of protection against guess-based attacks. (i.e. number series guessing)
Access based control at Server side : This is most important, and you have various way to validate the request based on roles and rights defined in application. Its up to you to decide.
From the perspective of code design, a database ID should be considered a private implementation detail of the persistence technology to keep track of a row. If possible, you should be designing your application with absolutely no reference to this ID in any way. Instead, you should be thinking about how entities are identified in general. Is a person identified with their social security number? Is a person identified with their email? If so, your account model should only ever have a reference to those attributes. If there is no real way to identify a user with such a field, then you should be generating a UUID before hitting the DB.
Doing so has a lot of advantages as it would allow you to divorce your domain models from persistence technologies. That would mean that you can substitute database technologies without worrying about primary key compatibility. Leaking your primary key to your data model is not necessarily a security issue if you write the appropriate authorization code but its indicative of less than optimal code design.

Should application users be database users?

My previous job involved maintenance and programming for a very large database with massive amounts of data. Users viewed this data primarily through an intranet web interface. Instead of having a table of user accounts, each user account was a real first-class account in the RDBMS, which permitted them to connect with their own query tools, etc., as well as permitting us to control access through the RDBMS itself instead of using our own application logic.
Is this a good setup, assuming you're not on the public intranet and dealing with potentially millions of (potentially malicious) users or something? Or is it always better to define your own means of handling user accounts, your own permissions, your own application security logic, and only hand out RDBMS accounts to power users with special needs?
I don't agree that using the database for user access control is as dangerous others are making it out to be. I come from the Oracle Forms Development realm, where this type of user access control is the norm. Just like any design decision, it has it's advantages and disadvantages.
One of the advantages is that I could control select/insert/update/delete privileges for EACH table from a single setting in the database. On one system we had 4 different applications (managed by different teams and in different languages) hitting the same database tables. We were able to declare that only users with the Manager role were able to insert/update/delete data in a specific table. If we didn't manage it through the database, then each application team would have to correctly implement (duplicate) that logic throughout their application. If one application got it wrong, then the other apps would suffer. Plus you would have duplicate code to manage if you ever wanted to change the permissions on a single resource.
Another advantage is that we did not need to worry about storing user passwords in a database table (and all the restrictions that come with it).
I don't agree that "Database user accounts are inherently more dangerous than anything in an account defined by your application". The privileges required to change database-specific privileges are normally MUCH tougher than the privileges required to update/delete a single row in a "PERSONS" table.
And "scaling" was not a problem because we assigned privileges to Oracle roles and then assigned roles to users. With a single Oracle statement we could change the privilege for millions of users (not that we had that many users).
Application authorization is not a trivial problem. Many custom solutions have holes that hackers can easily exploit. The big names like Oracle have put a lot of thought and code into providing a robust application authorization system. I agree that using Oracle security doesn't work for every application. But I wouldn't be so quick to dismiss it in favor of a custom solution.
Edit: I should clarify that despite anything in the OP, what you're doing is logically defining an application even if no code exists. Otherwise it's just a public database with all the dangers that entails by itself.
Maybe I'll get flamed to death for this post, but I think this is an extraordinarily dangerous anti-pattern in security and design terms.
A user object should be defined by the system it's running in. If you're actually defining these in another application (the database) you have a loss of control.
It makes no sense from a design point of view because if you wanted to extend those accounts with any kind of data at all (email address, employee number, MyTheme...) you're not going to be able to extend the DB user and you're going to need to build that users table anyway.
Database user accounts are inherently more dangerous than anything in an account defined by your application because they could be promoted, deleted, accessed or otherwise manipulated by not only the database and any passing DBA, but anything else connected to the database. You've exposed a critical system element as public.
Scaling is out of the question. Imagine an abstraction where you're going to have tens or hundreds of thousands of users. That's just not going to manageable as DB accounts, but as records in a table it's just data. The age old argument of "well there's onyl ever going to be X users" doesn't hold any water with me because I've seen very limited internal apps become publicly exposed when the business feels it's could add value to the customer or the company just got bought by a giant partner who now needs access. You must plan for reasonable extensibility.
You're not going to be able to share conn pooling, you're not going to be any more secure than if you just created a handful of e.g. role accounts, and you're not necessarily going to be able to affect mass changes when you need to, or backup effectively.
All in there seems to be numerous serious problems to me, and I imagine other more experienced SOers could list more.
I think generally. In your traditional database application they shouldnt be. For all the reason already given. In a traditional database application there is a business layer that handles all the security and this is because there is such a strong line between people who interact with the application, and people who interact with the database.
In this situation is is generally better to manage these users and roles yourself. You can decide what information you need to store about them, and what you log and audit. And most importantly you define access based on pure business rules rather than database rules. Its got nothing to do with which tables they access and everything to do with whether they can insert business action here. However these are not technical issues. These are design issues. If that is what you are required to control then it makes sense to manage your users yourself.
You have described a system where you allow users to query the database directly. In this case why not use DB accounts. They will do the job far better than you will if you attempt to analyse the querys that users write and vet them against some rules that you have designed. That to me sounds like a nightmare system to write and maintain.
Don't lock things down because you can. Explain to those in charge what the security implications are but dont attempt to prevent people from doing things because you can. Especially not when they are used to accessing the data directly.
Our job as developers is to enable people to do what they need to do. And in the situation you have described. Specifically connect to the database and query it with their own tools. Then I think that anything other than database accounts is either going to be insecure, or unneccasarily restrictive.
"each user account was a real first-class account in the RDBMS, which permitted them to connect with their own query tools, etc.,"
not a good idea if the RDBMS contains:
any information covered by HIPAA or Sarbanes-Oxley or The Official Secrets Act (UK)
credit card information or other customer credit info (POs, lines of credit etc)
personal information (ssn, dob, etc)
competitive, proprietary, or IP information
because when users can use their own non-managed query tools the company has no way of knowing or auditing what information was queried or where the query results were delivered.
oh and what #annakata said.
I would avoid giving any user database access. Later, when this starts causing problems, taking away their access becomes very dificult.
At the very least, give them access to a read-only replica of the database so they can't kill your whole company with a bad query.
A lot of database query tools are very advanced these days, and it can feel a real shame to reimplement the world just to add restrictions. And as long as the database user permissions are properly locked down it might be okay. However in many cases you can't do this, you should be exposing a high-level API to the database to insert objects over many tables properly, without the user needing specific training that they should "just add an address into that table there, why isn't it working?".
If they only want to use the data to generate reports in Excel, etc, then maybe you could use a reporting front end like BIRT instead.
So basically: if the users are knowledgeable about databases, and resources to implement a proper front-end are low, keep on doing this. However is the resource does come up, it is probably time to get people's requirements in for creating a simpler, task-oriented front-end for them.
This is, in a way, similar to: is sql server/AD good for anything
I don't think it's a bad idea to throw your security model, at least a basic one, in the database itself. You can add restrictions in the application layer for cosmetics, but whichever account the user is accessing the database with, be it based on the application or the user, it's best if that account is restricted to only the operations the user is allowed.
I don't speak for all apps, but there are a large number I have seen where capturing the password is as simple as opening the code in notepad, using an included dll to decrypt the configuration file, or finding a backup file (e.g. web.config.bak in asp.net) that can be accessed from the browser.
*not a good idea if the RDBMS contains:
* any information covered by HIPAA or Sarbanes-Oxley or The Official Secrets Act (UK)
* credit card information or other customer credit info (POs, lines of credit etc)
* personal information (ssn, dob, etc)
* competitive, proprietary, or IP information*
Not true, one can perfectly manage which data a database user can see and which data it can modify. A database (at least Oracle) can also audit all activities, including selects. To have thousands of database users is also perfectly normal.
It is more difficult to build good secure applications because you have to program this security, a database offers this security and you can configure it in a declarative way, no code required.
I know, I am replying to a very old post, but recently came across same situation in my current project. I was also thinking on similar lines, whether "Application users be Database users?".
This is what I analysed:
Definitely it doesn't make sense to create that big number of application users on database(if your application is going to be used by many users).
Let's say you created X(huge number) of users on database. You are opening a clear gateway to your database.
Let's take a scenario for the solution:
There are two types of application users (Managers and Assistant). Both needs access to database for some transactions.
It's obvious you would create two roles, one for each type(Manager and Assistant) in database. But how about database user to connect from application. If you create one account per user then you would end up linearly creating the accounts on the database.
What I suggest:
Create one database account per Role. (Let's say Manager_Role_Account)
Let your application have business logic to map an application user with corresponding role.(User Tom with Manager role to Manager_Role_Account)
Use the database user(Manager_Role_Account) corresponding to identified role in #2 to connect to database and execute your query.
Hope this makes sense!
Updated: As I said, I came across similar situation in my project (with respect to Postgresql database at back end and a Java Web app at front end), I found something very useful called as Proxy Authentication.
This means that you can login to the database as one user but limit or extend your privileges based on the Proxy user.
I found very good links explaining the same.
For Postgresql below Choice of authentication approach for
financial app on PostgreSQL
For Oracle Proxy Authentication
Hope this helps!
It depends (like most things).
Having multiple database users negates connection pooling, since most libraries handle pooling based on connection strings and user accounts.
On the other hand, it's probably a more secure solution than anything you or I will do from scratch. It leaves security up to the OS and Database server, which I trust much more than myself. However, this is only the case if you go to the effort to configure the database permissions well. If you're using a bunch of OS/db users with the same permissions,it won't help much. You'll still get an audit trail, but that's about it.
All that said, I don't know that I'd feel comfortable letting normal users connect directly to the database with their own tools.
I think it's worth highlighting what other answers have touched upon:
A database can only define restrictions based on the data. Ie restrict select/insert/update/delete on particular tables or columns. I'm sure some databases can do somewhat cleverer things, but they'll never be able to implement business-rule based restrictions like an application can. What if a certain user is allowed to update a column only to certain values (say <1000) or only increase prices, or change either of two columns but not both?
I'd say unless you are absolutely sure you'll never need anything but table/column granularity, this is reason enough by itself.
This is not a good idea for any application where you store data for multiple users in the same table and you don't want one user to be able to read or modify another user's data. How would you restrict access in this case?

How do I create a web application where I do not have access to the data?

Premise: The requirements for an upcoming project include the fact that no one except for authorized users have access to certain data. This is usually fine, but this circumstance is not usual. The requirements state that there be no way for even the programmer or any other IT employee be able to access this information. (They want me to store it without being able to see it, ever.)
In all of the scenarios I've come up with, I can always find a way to access the data. Let me describe some of them.
Scenario I: Restrict the table on the live database so that only the SQL Admin can access it directly.
Hack 1: I rollout a change that sends the data to a different table for later viewing. Also, the SQL Admin can see the data, which breaks the requirement.
Scenario II: Encrypt the data so that it requires a password to decrypt. This password would be known by the users only. It would be required each time a new record is created as well as each time the data from an old record was retrieved. The encryption/decryption would happen in JavaScript so that the password would never be sent to the server, where it could be logged or sniffed.
Hack II: Rollout a change that logs keypresses in javascript and posts them back to the server so that I can retrieve the password. Or, rollout a change that simply stores the unecrypted data in a hidden field that can be posted to the server for later viewing.
Scenario III: Do the same as Scenario II, except that the encryption/decryption happens on a website that we do not control. This magic website would allow a user to input a password and the encrypted or plain-text data, then use javascript to decrypt or encrypt that data. Then, the user could just copy the encrypted text and put the in the field for new records. They would also have to use this site to see the plain-text for old records.
Hack III: Besides installing a full-fledged key logger on their system, I don't know how to break this one.
So, Scenario III looks promising, but it's cumbersome for the users. Are there any other possibilities that I may be overlooking?
If you can have javascript on the page, then I don't think there's anything you can do. If you can see it in a browser, then that means it's in the DOM, which means you can write a script to get it and send it to you after it has been decrypted.
Aren't these problems usually solved via controls:
All programmers need a certain level of clearance and background checks
They are trained to understand that rolling out code to access the data is a fireable or worse offense
Every change in certain areas needs some kind of signoff
For example -- no JavaScript on page without signoff.
If you are allowed to add any code you want, then there's always a way, IMO.
Ask the client to provide an Non-disclosure Agreement for you to sign, sign it, then look at as much data as you want.
What I'm wondering is, what exactly will you be able to do with encrypted data anyway? Pretty-much all apps require you to do some filtering of the data, whether it be move it to a required place, modify it, sanitize it, or display it. Otherwise, you're just a glorified pipe, and you don't have to do any work.
The only way I can think of where you wouldn't be looking at the data or doing anything with it would be a simple form to table mapping with CRUD options. If you know what format the data will be coming in as you should be able to roll something out with RoR, a simple skin, put SSL into the mix, and roll it out. Test with dummy data in the same format, and you're set.
In fact, is your client unable to supply dummy data for testing? If they can, then your life is simple as all you do is provide an "installable" and tell them how to edit a config file.
I think you could still create the app in the following way:
Create a dev database and set up a user for it.
Ask them for: the data type, size, and name of each field that needs to be on the screen.
Set up the screens, create columns in the database that accept the data type and size they specify.
Deploy the app to production, hooked up to an empty database. Get someone with permission (not you) to go in and set the password on the database user and set the password for the DB user in the web app.
Authorized users can then do whatever they want and you never saw what any of the data looked like.
Of course, maintaining the app and debugging is gonna be a bitch!
--In answer to comments:
Ok, so after setting up the password for the Username in the database and in the web app's config, write a program that connects to the database, sets a randomized password, then writes that same randomized password to the web config.
Prevent any outgoing packets from the machine except to a set of authorized workstations - so you can't install your spyware.
Then set the Admin password on both servers to the same random password, then delete all other users on the servers, delete the program, and delete the program source code.
Wipe the hard drives of the developer machines with the DOD algorithm, and then toss them into an industrial shredder.
10. If the server ever needs debugging, toss it in the trash, buy a new one, and start back at #1.
But seriously - this is an insolvable problem. The best answer to this really is:
Tell them they can't have an application. Write your stuff on paper. Put it in a folder. Lock it in a vault. Thrust, repeat.
Wouldn't scenario 3 just expose all the data to the magic website? This doesn't sound like a solvable problem (at least I can't think of a solution).
Go with whatever solution is easiest for you to implement, I think the requirements show the the client does not understand software development and so it should be easy to sell any approach you take.
I have to say I really don't like the idea of using JavaScript on the client to decrypt the data. That is a huge hole as any script (hacker, GreaseMonkey, IE7Pro, etc.) can access the DOM and get data out of the page.
Also, it is very hard to get around the problem of key stroke loggers. If you throw those into the mix, then your options are limited. At that point you need a security FOB such as RSA (commonly used with corporate VPNs) to generate truly random PINs. That will probably be expensive, and it is a pain, and I have only seen it used with VPNs but I assume it could work with websites as well.
As far as the website, I'd stick with HTTPS and find a way to encrypt/decrypt through the WebServer rather than relying on JavaScript. The SSL traffic isn't very prone to sniffing (very difficult to decrypt), so that allows the encryption and decryption to happen server-side which (IMHO) is more secure.
Look at banking scenarios and other financial institutions for a starting point, and then go from there. Try not to over-complicate if possible.
You can't guarantee against hacking into the data as long as you have access to the server it lives on. So tell the employer they have to host the data somewhere else and grant access to the client's browser via a secure HTTPS connection.
You can design your web page to dynamically load an XML data stream securely, and format it into a web page using an XSLT script on the client.
See http://www.w3schools.com/xsl/xsl_client.asp for examples
That way you produce the code, but you never have access to the data. Only the user has access to their own data.
As for how the employer is going to host the data without granting any IT people access to it, that's their problem. It's a foolish requirement.
I think that I'll just tell them that they either have to trust a couple of us to have access (and not look at it) or they don't get a project.
Thanks for the answers. Feel free to post more thoughts if you have them.
You can never have 100% security, and extra security comes at a cost of speed/price/convenience etc.
Let's suppose you take scenario 3 - one of your programmers can use social engineering to get the password from one of the users. Goodbye security.
There's no point having a high-security iron door as a gate if people can just walk around it. Just implement a decent level of security.
(They want me to store it without being able to see it, ever.)
Hey, the recording industry wants people to be able to listen to their music, but not copy it. Sounds like they should get together sometime!
Their idea won't work for the same reason DRM doesn't work: the trust chain is inherently compromised. Encryption examples often use Alice, Bob, and Charlie where Alice is trying to communicate with Bob without Charlie listening in. With DRM, the trust chain is compromised because Bob and Charlie are the same person. With your situation, Charlie is the guy writing the software that Alice and Bob use to communicate. There's an implied trust, because if you don't trust Charlie then you can't trust Charlie's software, either.
That's the root of the issue: trust. If they can't trust the programmer, the game is over before it starts.
There are lots of options based on what their goal really is, but I am confused by their paranoia, er, intent:
Is this their (and end-user) data that they wish to keep private or end-user data to be kept private from everyone?
Is it just that your (or any contracted) company is suspect?
Are they afraid of over-the-wire snooping?
Are they afraid of DOM access through JavaScript or browser plugins?
Are they planning staged deployment? In that case you work on test/dev server w/o real data but have no access to the production server with the real data, and DNS logging and/or firewall rules inhibit all of your hacks from working undetected.
Ultimately if the data is stored in a DB then the programmer and DB admin can, by working together, get it. Period. A good audit should uncover that, though.
If this is truly a requirement, the only way to guard against this is to hire an outside firm to audit the code prior to releasing the software, and that's going to be very expensive.

Resources