Hiding passwords / keys in compiled application - c

In my C application I have a decryption key that is used to decrypt sets in the database (username / password). Currently, I simply declared it with
char * key = "$$$secretSampleDecryptionKey$$$";
Shortly after that line, I prepare the SQL statement and then select from the DB. My question is: if someone were to debug my compiled application or disassemble it, would they actually see the key? What can I do to hide it from them?
EDIT:
As Mark and Aaron pointed out, I can simply use the Linux / Unix strings command
strings nameOfApplication
to print out all the strings in my application, including the "secret" key.
EDIT 2:
The app runs on my server and the database stores sensitive customer data that is encrypted. I thought I was playing it safe by compiling the key into the binary rather than keeping it in a text file for everyone to read.

An interesting link telling the story of someone retrieving a password from a binary:
Deconstructing an ELF File
This is a step-by-step description of what someone could try in order to discover a password. It will give you some idea of what "not to do". Using the strings command is the first item in the list, for example.
If you want to hide your secret string from strings, you can store it as a char array that is not terminated with a \0 character. Note that this is not reliable, because strings looks for runs of printable characters (4 or more by default) rather than for the terminator.
There is also a nice trick mentioned (which is bypassed) to prevent someone from using strace/ltrace on your binary.
Ultimately, by disassembling the code, the "hacker" manages to retrieve the password, which as others have pointed out is difficult to protect against. Basically you can't really hide anything in a binary...
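To see why a literal key is so easy to find, here is a rough Python approximation of what strings does: it reports every run of at least four printable ASCII bytes. The function name printable_runs and the sample bytes are made up for illustration, and the real tool has more options than this sketch covers.

```python
import re

# Rough stand-in for the `strings` tool: any run of 4 or more printable
# ASCII bytes is reported, whether or not a NUL byte follows it.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def printable_runs(blob: bytes) -> list:
    """Return the printable ASCII runs found in a binary blob."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(blob)]


# Hypothetical slice of a compiled binary: non-printable bytes around the key.
fake_binary = (
    b"\x7fELF\x02\x01\x00"
    + b"$$$secretSampleDecryptionKey$$$"
    + b"\xc3\x90\x00"
)
found = printable_runs(fake_binary)
print(found)
```

The key is followed by a non-NUL byte here and is still reported, which is why dropping the terminator alone does not help much.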

If the key is in your source then an attacker will be able to find it. The best you can do is to make it more difficult for them.
The stored key should not be text, but binary. That way you avoid searches for strings. Presumably if you have the key present in the code your users do not need to be able to type it in.
Store the key in at least two random looking binary arrays that are XOR'ed together to make the actual key. Alternatively, pick one of the standard text strings that is present in your application anyway, something like: "Please enter the Zipcode: ", and use that as your key, or as one component of the XOR. Hashing such a message would get it to a standard length if needed.
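A minimal sketch of the XOR idea, written in Python for brevity (the same few lines translate directly to C). The helper names split_key, join_key and key_from_prompt are hypothetical. In a real build you would run split_key once, paste the two arrays into the source, and only ship join_key. This only raises the effort for an attacker; the key still exists in memory at runtime.

```python
import hashlib
import os


def split_key(key: bytes):
    """Split a key into two random-looking arrays whose XOR is the key."""
    pad = os.urandom(len(key))                        # first stored array
    masked = bytes(k ^ p for k, p in zip(key, pad))   # second stored array
    return pad, masked


def join_key(part_a: bytes, part_b: bytes) -> bytes:
    """Rebuild the key at runtime by XOR-ing the two stored arrays."""
    return bytes(a ^ b for a, b in zip(part_a, part_b))


def key_from_prompt(prompt: str) -> bytes:
    """Hash an ordinary UI string into a fixed-length (32-byte) component."""
    return hashlib.sha256(prompt.encode("utf-8")).digest()


secret = b"$$$secretSampleDecryptionKey$$$"
part_a, part_b = split_key(secret)
restored = join_key(part_a, part_b)
prompt_component = key_from_prompt("Please enter the Zipcode: ")
```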

Using a debugger / disassembler the user will always be able to find out the password. You can make it harder (e.g. use obfuscation), but not impossible.
If you really do have a secret (i.e. a private key needed to decrypt the data), you can perform decryption on a smartcard.
In your scenario concerning usernames and passwords, you might just store the password hash in the database (see the referenced answers in Best way to store password in database)

Can someone see it?
The command strings will show the string, no need to disassemble the application.
Disassembling will just make it simpler to identify which of the 15'000 strings is used as the key.
What can I do to hide it from them?
There is only one solution: Don't put it in the code.
Instead, use a license key or similar technique where the user knows the key.

I wonder if someone could give us a real answer to this problem. From my experience as a web dev I can tell you that what you give to the client is no longer yours to control. Consider a website that uses some encryption algorithm on the server side and a hard-coded JavaScript technique on the client. The web dev, guided by his own vanity, does not want to show it to the world, but still wants clients to be able to use it as it is.
In this case, what can he do? Yes, he can put his script in an infinite loop based on setTimeout, all as an anonymous function, so it can't be tracked, but the initialisation must still be done somewhere, so the code must be visible. Furthermore, he may decide to send the code after load in encrypted form, but the client will still need the decryption key, so someone who wants the information will still have both necessary pieces of the puzzle. But our programmer is persevering, so he creates a new decryption function every time, matching only one encrypted string. It still does him no good, as the client will still have both the string and the matching function.
All he can do is find a way to use the environment so that the function can be used only once, after which the code expires along with the string and the real information is lost forever. Most importantly, the environment must be used in such a way that the execution context of the decryption function cannot be forged.
I know that I have not answered your question, but I have pinpointed some important details of the problem you mentioned. If you work with C there must be some tools you can use, such as creating a context from some memory state or an actual system operation, to get something that can't be forged.
EDIT 1:
You could create an interesting domino effect in your code that leaks bits of the encryption key as execution proceeds, so that you have the whole key when it is needed but it is never stored in a file or as a string in your compiled binary. It can then only be found at runtime, and only under specific conditions, and it might take some hard reverse engineering to get it. Might be a good solution.
With great respect,
Paul

Related

Read specific section of a line in a formatted file with low level functions

I am trying to build an authentication system using the C programming language. I have already written the functions that take user input (username & password) and insert it into the database (a .txt file) in the following format:
ID USERNAME PASSWORD
... ... ...
... ... ...
... ... ...
EOF (just showing the end of the file for the sake of question comprehensibility)
Between each string there is a \t char.
To make sure the ID (which is pseudo-randomly generated), the username and the password do not have duplicates inside the database, I want to write three functions that read just the ID, just the username and just the password, compare each with the user's input, and return values according to the result. But I don't know the correct way to do this using low-level functions (read(), lseek()).
To be sure we are on the same page: I don't want one of you to write code for me; this is unethical and would remove the fun of writing the whole thing myself. I would just like some hints that help me understand in which direction the algorithm should go.
… and the password do not have duplicate …
I hope you mean IDs, not passwords. You must never tell a user that their password is already in the database! That means they now just have to try all other user names (which might be easy to guess! Anyway, easier to guess than a password) with the password they've tried to set for themselves.
By the way, I'm assuming this is a learning experience, not a production system. In anything that actually handles user logins, you do not ever store passwords, but salted hashes of passwords. That way, someone that gets your database file still can't authenticate with that – because your system doesn't accept hashes, it accepts passwords and calculates the hashes and checks them against the database.
(If this was a production system, you'd also gladly use a well-tested library to manage your data, because then you don't have to worry about your own bugs, or about making sure two concurrent processes don't try to write to the file at the same time and corrupt it. It might sound a bit like overkill, but sqlite would be such a system: you can trivially build a compact, safe-to-use system on it and store and check password hashes there, computing the hashes in your own code or with an extension. It's really ubiquitous!)
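As a rough illustration of "store salted hashes, not passwords", here is a Python sketch using PBKDF2 from the standard library. The function names, the 16-byte salt and the iteration count are my own assumptions, not something the question prescribes; a C program would get the same primitive from a crypto library.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, tune for your hardware


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest); a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash from the typed password and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# What would be written to the database: the salt and the digest only.
salt, stored = hash_password("correct horse")
ok = verify_password("correct horse", salt, stored)
```

The system accepts a password, recomputes the hash and compares; a stolen (salt, digest) pair cannot be submitted in place of the password.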
but I don't know the correct way to do it using low level functions (read(), lseek());
You can't solve this using seek/lseek, because your text file has variable line length – before reading a line completely, you can't know where the next line starts.
So, use higher-level functions to read tab-separated strings.
The way forward here is to read each line and parse it with sscanf (or fscanf) to get ID, USERNAME and PASSWORD, ignore the password, and check against what the user entered.
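Since you asked for direction rather than finished C, here is the shape of that loop in Python: read a line, split it on tabs, compare the fields you care about. The function name find_user and the sample records are invented for the example, and I am assuming the file has no header line. In C the body of the loop would be fgets plus sscanf.

```python
import io


def find_user(lines, wanted_id=None, wanted_name=None):
    """Return True if any record has the given ID or the given username."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        user_id, username, _password = fields  # password is never compared
        if user_id == wanted_id or username == wanted_name:
            return True
    return False


# Hypothetical file contents: ID, USERNAME, PASSWORD separated by tabs.
sample_db = "1001\talice\tpw1\n1002\tbob\tpw2\n"
name_taken = find_user(io.StringIO(sample_db), wanted_name="bob")
id_taken = find_user(io.StringIO(sample_db), wanted_id="4242")
```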

One way encrypting primary key

What is the best one way permutation function I could use to digest an e-mail so I can use it as a primary key without storing personal data?
I'm getting my first F2P game ready: a simple yet (hopefully) addictive 2D casual puzzler based on aiming mechanics. It's made with Unity and will be released on Android very soon.
In order for the player to keep the same data across different devices, I have an SQL table with the device e-mail as the primary key, then another string as the savegame data.
But I don't want to store the user e-mail for privacy reasons.
So I thought of digesting it with some function that would use the original e-mail to generate a new string that:
is unique (will never collide with another string generated from a different e-mail address)
is not decipherable (there should be no way to obtain the original e-mail from the digested string - or at least it should be hard enough)
This way I could still use the Android device e-mail to retrieve the savegame data, without storing personal data from the player.
As far as I've researched, the solution seems to be called a one way permutation function. The problem is that I can't seem to find an appropriate function on the internet; instead, all answers seem to be plagued with solutions for password hashing, which is very interesting (salting, MD5, SHAXXX...) but don't meet my first requirement of no collision.
Thank you in advance for any answer on this topic.
What you need is a cryptographic hash function such as SHA-256. Such functions are designed to be collision resistant; Git uses an older one, SHA-1. Most languages and systems support this, just Google "Android SHA-256" along with your language of choice.
One option is to append a creation timestamp.
Update: A hash cannot strictly guarantee uniqueness, so if SHA-256's collision resistance is not sufficient for you, consider a GUID. From RFC 4122: "A UUID is 128 bits long, and can guarantee uniqueness across space and time." Of course you need to find a good implementation.
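For the SHA-256 route, a sketch of what the key derivation could look like, shown in Python for brevity. The function name email_to_key is made up, and trimming and lower-casing the address before hashing is my assumption so that the same account always maps to the same key. Keep in mind that anyone who can guess an address can hash it and test for a match, so this hides the e-mail from casual inspection rather than from a determined attacker.

```python
import hashlib


def email_to_key(email: str) -> str:
    """Map an e-mail address to a 64-character hex string usable as a key."""
    normalized = email.strip().lower()   # assumed normalisation rule
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


key = email_to_key("Player@Example.com ")
same_key = email_to_key("player@example.com")
```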

Should I change my License Key output from pure md5 output to a common "XXXX-YYYY-ZZZZ" type code?

I'm creating a simple license key system to "keep honest people honest". I don't care about especially stringent cryptography.
If they get too annoyed with the demo limitations, they go to my registration website, pay, and give me their email. I give them a license key.
I'm keeping things really simple, so:
license_key = md5(email + "Salt_String");
I have PHP and C# functions run that same algorithm and get the same key.
The problem is that the output of these functions is a 32-character string like:
A69761CF99316358D04771C5ECFCCDC5
Which is potentially hard to remember/type. Yes, I know about copy/paste, but I want to make it REALLY easy for all paying customers to unlock the software.
Should I somehow convert this long string into something shorter?
Let's say I use only the first 6 digits, so: A69761
There are obviously way more cryptographic collisions in that, but will it matter at all in practical use?
Any other ideas to make the thing more human readable/typeable?
Keeping 6-10 symbols will be enough: the user will not be able to guess the code anyway, and it will be easy to type in.
It would also be a good idea to register each license on your server, so that you can check that the user really is honest and didn't give the license key to another person.
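To make the truncation idea concrete, here is a sketch in Python that keeps the first 12 hex digits of the same md5(email + salt) scheme and groups them as XXXX-YYYY-ZZZZ. The helper names, the 3x4 grouping and the UTF-8 encoding are assumptions; your PHP and C# versions would need to agree on the same choices.

```python
import hashlib


def license_key(email: str, salt: str = "Salt_String",
                groups: int = 3, group_len: int = 4) -> str:
    """Shorten md5(email + salt) and format it as dash-separated groups."""
    digest = hashlib.md5((email + salt).encode("utf-8")).hexdigest().upper()
    short = digest[: groups * group_len]
    return "-".join(
        short[i:i + group_len] for i in range(0, len(short), group_len)
    )


def normalize_entry(typed: str) -> str:
    """Forgive lower case, spaces and missing dashes in what the user typed."""
    return typed.replace("-", "").replace(" ", "").upper()


key = license_key("customer@example.com")
accepted = normalize_entry(key.lower()) == normalize_entry(key)
```

Comparing the normalized forms means a customer who types the key without dashes or in lower case is still accepted.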
In my experience, asking the user to type or copy/paste a 30-character code indeed leads to frustrated customers. It's not that it's so difficult. It's simply a hurdle that people don't care for.
The solution I've used for my business is to have separate trial and purchased downloads. To get their licensed copy, the customer types in their email address and a short user ID on the download form. Entering only the email automatically resends the user ID. You didn't ask about this, but a system to automatically look up whatever code the customer needs is even more important than having a simple system. The download system looks up the user's details in the database and serves a SetupSomeProductCustomerName.exe that has the user's license embedded in it. This setup installs the customer's licensed copy without requiring any further identification or server connections.
This system has worked really well for us. The customer has only one file to back up and no serial numbers to lose to make sure they can reinstall the software in the future.
That said, if you prefer to use a system using a one-way hash, simply use an algorithm that generates a smaller hash. E.g. CRC-32 results in 8 hexadecimal digits.
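For what it's worth, the CRC-32 variant is only a couple of lines. This sketch uses Python's zlib module; the function name short_license_key is hypothetical and the salt is the one from the question.

```python
import zlib


def short_license_key(email: str, salt: str = "Salt_String") -> str:
    """Return the CRC-32 of email + salt as 8 upper-case hex digits."""
    checksum = zlib.crc32((email + salt).encode("utf-8")) & 0xFFFFFFFF
    return format(checksum, "08X")


key = short_license_key("customer@example.com")
```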
There's no point in the hash being cryptographically secure. A cracker will simply walk through your code, copy the entire block of code that mutates the email address into the license key, and paste that into their keygen. Then they can generate license keys for any email address. They can do that regardless of how complex your hashing algorithm is.
If you want to prevent this, you need to use public key encryption, which results in keys that are far too long to type in. If you go that route, you'll either need to annoy your customers with long keys to paste in or separate key files, or use the personalized download system I described above.

How to tell the database type checking the file

My friend has a system to manage customers. The program per se is terrible, and my friend lost contact with the developers.
The case is, my friend has now lost access to the program (something the developers call "locked to machine"), so when it was moved to another PC, he lost access to the program and its data.
My mission is to try to recover the database, migrate it to another database, and create a cool program for my friend.
Now I need to discover which database was used by the developers. I know that the program was made using Visual Basic, because the MSVBVM60.DLL is required.
Is there a program that can read the metadata in the .dat files and discover which database was used?
You can try Determining File Format tools.
Unfortunately, it is possible that your .dat file is a "random access file", not a database.
In that case you cannot read the data unless you know the structure of the file. The records are written in blocks, and you have to know the exact block size to be able to jump from one block to the next. Some kind of encryption may also be in use.
If the file is a random access file (the VB sense) then it shouldn't be too hard to reverse engineer the format.
The first step would be to determine the size of the records, which you should be able to do with little prior knowledge: it's just a matter of finding where strings begin and end and looking for repeats. For example, look for a string that looks like someone's first name and then scan forward until you find the next string that looks like a first name. That's your record size.
The next step would involve working out the actual fields. This will require a little more work, but basically you'll want to look up a record in the original software and then try to find the corresponding record in the file (for example, look for the first name/last name, which should be relatively easy). Then it's just a matter of matching up fields in the UI with what's in the file, for example dates, integers and the like.
Of course, that's just a general overview, and that's assuming the file is in VB's native "random access" format. Good luck!
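The first step of that overview can be tried with a few lines of Python. This is only a sketch: the record layout below (20-byte name, 4-byte number, 8 bytes of padding) is invented so there is something to run against, and the approach assumes fixed-length records where the two names you search for sit in the same field of two consecutive records.

```python
import struct


def guess_record_size(data: bytes, name_a: bytes, name_b: bytes) -> int:
    """Distance between two names assumed to be in consecutive records."""
    first = data.find(name_a)
    second = data.find(name_b)
    if first < 0 or second < 0:
        raise ValueError("one of the names was not found in the file")
    return abs(second - first)


def split_records(data: bytes, record_size: int) -> list:
    """Cut the file into fixed-size chunks, one per record."""
    return [data[i:i + record_size] for i in range(0, len(data), record_size)]


# Made-up layout for the demo, standing in for the real .dat file.
RECORD = struct.Struct("<20sI8x")
names = [b"Alice", b"Bob", b"Carol"]
sample = b"".join(RECORD.pack(n, i) for i, n in enumerate(names, start=1))

size = guess_record_size(sample, b"Alice", b"Bob")
records = split_records(sample, size)
```

Once the size is right, every chunk should start with something that looks like a name, which is a quick sanity check before you start mapping the remaining fields.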
As well as trying to reverse engineer the file itself, as suggested in other responses, you could also try reverse engineering the application (DLL or EXE.)
There are several decompilers available, for example VB P-code/native compiler. A trial version is available. I have not tried this software, but it may give you enough to understand what is being stored in the data file, or help fill in the gaps where you can't figure out the meaning of data from the data file itself.

What are some techniques for stored database keys in URL

I have read that using database keys in a URL is a bad thing to do.
For instance,
My table has 3 fields: ID:int, Title:nvarchar(5), Description:Text
I want to create a page that displays a record. Something like ...
http://server/viewitem.aspx?id=1234
First off, could someone elaborate on why this is a bad thing to do?
and secondly, what are some ways to work around using primary keys in a url?
I think it's perfectly reasonable to use primary keys in the URL.
Some considerations, however:
1) Avoid SQL injection attacks. If you just blindly accept the value of the id URL parameter and pass it into the DB, you are at risk. Make sure you sanitise the input so that it matches whatever format of key you have (e.g. strip any non-numeric characters).
2) SEO. It helps if your URL contains some context about the item (e.g. "big fluffy rabbit" rather than 1234). This helps search engines see that your page is relevant. It can also be useful for your users (I can tell from my browser history which record is which without having to remember a number).
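A small sketch of point 1, using Python and an in-memory SQLite table that mimics the one in the question. Note two choices of mine that go slightly beyond the text above: the sketch rejects a malformed id instead of stripping characters from it, and it passes the value as a bound parameter rather than building the SQL string. The function names and the sample row are made up.

```python
import sqlite3


def parse_item_id(raw: str):
    """Return the id as an int, or None if it is not a plain decimal number."""
    if raw.isascii() and raw.isdigit() and len(raw) <= 18:
        return int(raw)
    return None


def fetch_title(conn, raw_id: str):
    """Look up a title by the id taken from the URL; None if invalid or missing."""
    item_id = parse_item_id(raw_id)
    if item_id is None:
        return None
    row = conn.execute(
        "SELECT Title FROM Items WHERE ID = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Items (ID INTEGER PRIMARY KEY, Title TEXT, Description TEXT)"
)
conn.execute("INSERT INTO Items VALUES (1234, 'Bunny', 'Big fluffy rabbit')")

title = fetch_title(conn, "1234")
injected = fetch_title(conn, "1234 OR 1=1")
```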
It's not inherently a bad thing to do, but it has some caveats.
Caveat one is that someone can type in different keys and maybe pull up data you didn't want / expect them to get at. You can reduce the chance that this is successful by increasing your key space (for example making ids random 64 bit numbers).
Caveat two is that if you're running a public service and you have competitors they may be able to extract business information from your keys if they are monotonic. Example: create a post today, create a post in a week, compare Ids and you have extracted the rate at which posts are being made.
Caveat three is that it's prone to SQL injection attacks. But you'd never make those mistakes, right?
Using IDs in the URL is not necessarily bad. This site uses it, despite being done by professionals.
How can they be dangerous? When users are allowed to update or delete entries belonging to them, developers implement some sort of authentication, but they often forget to check if the entry really belongs to you. A malicious user could form a URL like "/questions/12345/delete" when he notices that "12345" belongs to you, and it would be deleted.
Programmers should ensure that a database entry with an arbitrary ID really belongs to the current logged-in user before performing such operation.
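One way to make that check hard to forget is to put the owner into the query itself. A sketch with Python and SQLite; the table name, column names and user ids are invented for the example, and current_user_id is assumed to come from your session handling, never from the URL.

```python
import sqlite3


def delete_entry(db, entry_id: int, current_user_id: int) -> bool:
    """Delete the entry only if it belongs to the logged-in user."""
    cur = db.execute(
        "DELETE FROM entries WHERE id = ? AND owner_id = ?",
        (entry_id, current_user_id),
    )
    db.commit()
    return cur.rowcount == 1  # 0 rows means missing entry or wrong owner


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, owner_id INTEGER)")
db.execute("INSERT INTO entries VALUES (12345, 1)")

blocked = delete_entry(db, 12345, current_user_id=2)   # someone else's entry
allowed = delete_entry(db, 12345, current_user_id=1)   # the real owner
remaining = db.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
```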
Sometimes there are strong reasons to avoid exposing IDs in the URL. In such cases, developers often generate random hashes that they store for each entry and use those in the URL. A malicious person tampering in the URL bar would have a hard time guessing a hash that would belong to some other user.
Security and privacy are the main reasons to avoid doing this. Any information that gives away your data structure is more information that a hacker can use to access your database. As mopoke says, you also expose yourself to SQL injection attacks, which are fairly common and can be extremely harmful to your database and application. From a privacy standpoint, if you are displaying any information that is sensitive or personal, anybody can just substitute a number to retrieve information, and if you have no mechanism for authentication, you could be putting your information at risk. Also, if it's that easy to query your database, you open yourself up to Denial of Service attacks with someone just looping through URLs against your server, since they know each one will get a response.
Regardless of the nature of the data, I tend to recommend against sharing anything in the URL that could give away anything about your application's architecture, it seems to me you are just inviting trouble (I feel the same way about hidden fields which aren't really hidden).
To get around it, we usually encrypt the parameters before passing them. In some cases, the encrypted URL also includes some form of verification/authentication mechanism so the server can decide if it's ok to process.
Of course every application is different and the level of security you want to implement has to be balanced with functionality, budget, performance, etc. But I don't see anything wrong with being paranoid when it comes to data security.
It's a bit pedantic at times, but you want to use a unique business identifier for things rather than the surrogate key.
It can be as simple as ItemNumber instead of Id.
The Id is a db concern, not a business/user concern.
Using integer primary keys in a URL is a security risk. It is quite easy for someone to post using any number. For example, through normal web application use, the user creates a user record with an ID of 45 (viewitem/id/45). This means the user automatically knows there are 44 other users. And unless you have a correct authorization system in place, they can see the other users' information by creating their own URL (viewitem/id/32).
2a. Use proper authorization.
2b. Use GUIDs for primary keys.
Showing the key itself isn't inherently bad because it holds no real meaning, but showing the means to obtain access to an item is bad.
For instance, say you had an online store that sold stuff from 2 merchants. Merchant A has items (1, 3, 5, 7) and Merchant B has items (2, 4, 6, 8).
If I am shopping on Merchant A's site and see:
http://server/viewitem.aspx?id=1
I could then try to fiddle with it and type:
http://server/viewitem.aspx?id=2
That might let me access an item that I shouldn't be accessing, since I am shopping with Merchant A and not B. In general, allowing users to fiddle with stuff like that can lead to security problems. Another brief example is employees who can look at their own personal information (id=382) but type in someone else's ID to go directly to someone else's profile.
Now, having said that.. this is not bad as long as security checks are built into the system that check to make sure people are doing what they are supposed to (ex: not shopping with another merchant or not viewing another employee).
One mechanism is to store information in sessions, but some do not like that. I am not a web programmer so I will not go into that :)
The main thing is to make sure the system is secure. Never trust data that came back from the user.
Everybody seems to be posting the "problems" with using this technique, but I haven't seen any solutions. What are the alternatives? There has to be something in the URL that uniquely defines what you want to display to the user. The only other solution I can think of would be to run your entire site off forms, and have the browser post the value to the server. This is a little trickier to code, as all links need to be form submits. Also, it's only minimally harder for users of the site to put in whatever value they wish. Also, this wouldn't allow the user to bookmark anything, which is a major disadvantage.
@John Virgolino mentioned encrypting the entire query string, which could help with this process. However it seems like going a little too far for most applications.
I've been reading about this, looking for a solution, but as @Kibbee says there is no real consensus.
I can think of a few possible solutions:
1) If your table uses integer keys (likely), add a check-sum digit to the identifier. That way, (simple) injection attacks will usually fail. On receiving the request, simply remove the check-sum digit and check that it still matches - if it doesn't then you know the URL has been tampered with. This method also hides your "rate of growth" (somewhat).
2) When storing the DB record initially, save a "secondary key" or value that you are happy to be a public id. This has to be unique and usually not sequential - examples are a UUID/Guid or a hash (MD5) of the integer ID e.g. http://server/item.aspx?id=AbD3sTGgxkjero (but be careful of characters that are not compatible with http). Nb. the secondary field will need to be indexed, and you will lose benefits of clustering that you get in 1).
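Option 1 could look something like the sketch below, in Python. I've picked the Luhn algorithm for the check digit purely as an example (the answer doesn't name one), and the function names are made up. A check digit only catches casual tampering and typos; it is not a substitute for authorization checks.

```python
def luhn_digit(number: int) -> int:
    """Compute the Luhn check digit for a non-negative integer."""
    total = 0
    for i, ch in enumerate(reversed(str(number))):
        d = int(ch)
        if i % 2 == 0:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def to_public_id(record_id: int) -> str:
    """Append the check digit to the database id for use in a URL."""
    return f"{record_id}{luhn_digit(record_id)}"


def from_public_id(public_id: str):
    """Return the database id, or None if the value was tampered with."""
    if len(public_id) < 2 or not (public_id.isascii() and public_id.isdigit()):
        return None
    record_id, check = int(public_id[:-1]), int(public_id[-1])
    return record_id if luhn_digit(record_id) == check else None


public = to_public_id(1234)
recovered = from_public_id(public)
tampered = from_public_id("12354")
```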

Resources