I have a database with a table, "Attribs", that contains a list of attributes for a user such as name, address, phone number etc., each of which is a key. My REST API uses this table to look up the attribute name and fill in another table. However, the keys in "Attribs" are being exposed in the API.
E.g. if I am trying to set the user name, which is the attribute user_name, then my REST API request will contain a line like
"user_name":"abcd"
Is it safe to expose these keys, or could it cause security issues?
We don't have enough context, but no, it is an insecure practice, and OWASP has an entire section about it: https://www.owasp.org/index.php/Top_10_2010-A4-Insecure_Direct_Object_References The principle is: do not expose IDs directly, but use an indirection layer.
The impact in your case can't be assessed from the description you provide, but I would apply the principle if I were you :)
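As a rough illustration of such an indirection layer, here is a minimal Python sketch. The public field names (displayName, postalAddress, phone) and the internal keys other than user_name are made up for the example; only user_name comes from the question.

```python
# Hypothetical mapping from public API field names to internal "Attribs" keys.
# Only "user_name" appears in the question; the other names are assumptions.
PUBLIC_TO_INTERNAL = {
    "displayName": "user_name",
    "postalAddress": "user_address",
    "phone": "user_phone",
}


def to_internal(payload: dict) -> dict:
    """Translate a client payload into internal attribute keys.

    Unknown public fields are rejected, so a client can never address
    an internal key that was not deliberately exposed.
    """
    unknown = set(payload) - set(PUBLIC_TO_INTERNAL)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {PUBLIC_TO_INTERNAL[name]: value for name, value in payload.items()}
```

With this in place the client would send "displayName":"abcd" and the server would translate it to user_name before touching the database, so the table's keys can be renamed without changing the API.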
Related
Pretty much as the title says, I wanted to check whether it is completely fine or bad practice for me to store the user's database ID in my website's Redux store.
E.g. it's a website which can have a store or customer user. If, for example, the user is logged in, they could have their favourite stores' data in an array property in Redux, and that array could hold a list of objects with:
{storeId:Guid, storeName:string, storeIndustryType:enum, etc...}
The GUID for storeId would be the primary key of that user in the store database and the aspNetUsers database.
I use these IDs for CRUD operations related to things they want to do, like adding a new favourite store, removing a favourite store etc.
It's not ideal, but also commonly done this way. I wouldn't call it a bad practice, but the "best practice" is certainly different:
Ideally you'd use a separate unique ID that is exposed to the outside (of the internal DB system), which will then be used to identify the user in API requests etc. Like an alias.
The ID shouldn't make it possible to guess other existing IDs (by incrementing a counter, for example), but you seem to have that covered with a GUID.
This question isn't specific to Redux btw, it's more about how you identify entities in your system on the client side. You can alias them as described above, or expose the internally used IDs to the client. The latter is simpler and easier but less secure.
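The aliasing idea can be sketched like this in Python. AliasRegistry and its in-memory dictionaries are hypothetical stand-ins; a real system would presumably keep the alias in an extra database column.

```python
import secrets


class AliasRegistry:
    """Hypothetical in-memory mapping between internal IDs and public aliases."""

    def __init__(self):
        self._alias_to_internal = {}
        self._internal_to_alias = {}

    def alias_for(self, internal_id):
        """Return the public alias for an internal ID, creating one if needed."""
        alias = self._internal_to_alias.get(internal_id)
        if alias is None:
            # Random token: knowing one alias gives no hint about any other.
            alias = secrets.token_urlsafe(16)
            self._internal_to_alias[internal_id] = alias
            self._alias_to_internal[alias] = internal_id
        return alias

    def resolve(self, alias):
        """Map a public alias back to the internal ID (KeyError if unknown)."""
        return self._alias_to_internal[alias]
```

The client only ever stores and sends the alias; the server calls resolve() before running the CRUD operation.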
I have a question regarding specific data vault modelling.
I have a source table which captures call center CALL information like this:
CallId (business key)
Date
Call_alert
Call_acw
etc
The same source table also has a bunch of foreign keys in it, like this:
RouteID (on which line the call eventually ends)
ConnectionType (phone, email etc)
Via each foreign key it is possible to retrieve extra information about the key (which is not linked to the CALL).
My question is how to model these foreign keys in my model. Do I keep them as attributes in my satellite or do I model them as links? Or is there another option I haven't thought about?
Thanks!!
I'll focus on one example you give (RouteID) but the discussion is probably the same for each.
The first thing to remember is that the aim in Data Vault is to model the business and its processes, not the system the data is stored in. Foreign keys may be an indication of something meaningful (a link between two hubs) or they may not (the product of normalisation within the database that you may not need to replicate).
The first step in your case is to think about what RouteID, and the data it links through to, means to the business. If route (or the line it represents) is a meaningful concept to the business in its own right, then it probably requires its own hub, satellites for data relating to it, and link tables to join it up to your call data.
On the other hand, the data might only have meaning as a categorisation of another hub (call in your case), in which case think about de-normalising it into a satellite that connects to the call hub. Remember that you can have multiple satellites connected to one hub; there's nothing preventing you from having a call route satellite, a connection type satellite and so on.
You'll need to make this decision for each foreign key and probably end up with different choices for each. Staff member receiving the call for example would almost certainly be a link to another hub as you almost certainly have other data you want to link staff to. Something like the connection type you mention is unlikely to be meaningful in its own right so is more likely to form part of a satellite.
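To make the two options concrete, here is a small Python sketch that splits one source row into a call hub, a route hub with a link (RouteID treated as a business concept), and a call satellite carrying ConnectionType as a plain attribute. The class names and column types are assumptions, and the usual Data Vault housekeeping columns (load date, record source, hash keys) are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubCall:
    call_id: str  # business key from the source table


@dataclass(frozen=True)
class HubRoute:
    route_id: str  # modelled as its own hub: meaningful to the business


@dataclass(frozen=True)
class LinkCallRoute:
    call_id: str
    route_id: str


@dataclass(frozen=True)
class SatCallDetails:
    call_id: str
    call_date: str        # assumed to arrive as text
    call_alert: int       # assumed numeric
    call_acw: int         # assumed numeric
    connection_type: str  # kept as a satellite attribute, not a hub


def split_call_row(row: dict):
    """Split one source row into hub, link and satellite records."""
    call = HubCall(call_id=row["CallId"])
    route = HubRoute(route_id=row["RouteID"])
    link = LinkCallRoute(call_id=call.call_id, route_id=route.route_id)
    details = SatCallDetails(
        call_id=call.call_id,
        call_date=row["Date"],
        call_alert=row["Call_alert"],
        call_acw=row["Call_acw"],
        connection_type=row["ConnectionType"],
    )
    return call, route, link, details
```

If route turned out to be nothing more than a label on the call, HubRoute and LinkCallRoute would disappear and route_id would move into a satellite alongside connection_type.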
I do not want to create an autogenerated key for my entities so I specify my own:
Entity employee = Entity.newBuilder().setKey(makeKey("Employee", "bobby"))
    .addProperty(makeProperty("fname", makeValue("fname").setIndexed(false)))
    .addProperty(makeProperty("lname", makeValue("lname").setIndexed(false)))
    .build();
CommitRequest request = CommitRequest.newBuilder()
    .setMode(CommitRequest.Mode.NON_TRANSACTIONAL)
    .setMutation(Mutation.newBuilder().addInsert(employee))
    .build();
datastore.commit(request);
When I check to see what the entity looks like it looks like this:
Why is this auto-generated key generated if I specified my own key (bobby)? It seems bobby was also created, but now I have bobby and this autogenerated key. What is the difference between the key and id/name?
You can't specify your own key; keys actually contain information necessary for the datastore operation. This note in the documentation gives you an idea:
Note: The URL-safe string looks cryptic, but it is not encrypted! It
can easily be decoded to recover the original entity's kind and
identifier:
key = Key(urlsafe=url_string)
kind_string = key.kind()
ident = key.id()
If you use such URL-safe keys, don't use sensitive data such as email
addresses as entity identifiers. (A possible solution would be to use
the MD5 hash of the sensitive data as the identifier. This stops third
parties, who can see the encrypted keys, from using them to harvest
email addresses, though it doesn't stop them from independently
generating their own hash of a known email address and using it to
check whether that address is present in the Datastore.)
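The trade-off described in that note can be tried out with a few lines of Python. The function names and the lowercasing step are assumptions made for this sketch; the note itself only mentions taking an MD5 hash.

```python
import hashlib


def email_identifier(email: str) -> str:
    """Derive an entity identifier from an email address via MD5.

    Lowercasing and stripping are assumptions of this sketch, so that the
    same address always yields the same identifier.
    """
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Identifiers as they might appear in exposed keys (example data only).
stored_ids = {email_identifier("alice@example.com")}


def is_registered(candidate_email: str) -> bool:
    """What a third party can still do: test a known address for presence."""
    return email_identifier(candidate_email) in stored_ids
```

Someone who sees the identifier cannot read the address off it, but is_registered("alice@example.com") still returns True, which is exactly the limitation the note points out.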
What you can specify is the ID portion of the key, either as a number or as a string:
A key is a series of kind-ID pairs. You want to make sure each entity
has a key that is unique within its application and namespace. An
application can create an entity without specifying an ID; the
Datastore automatically generates a numeric ID. If an application
picks some IDs "by hand" and they're numeric and the application lets
the Datastore generate some IDs automatically, the Datastore might
choose some IDs that the application already used. To avoid this, the
application should "reserve" the range of numbers it will use to
choose IDs (or use string IDs to avoid this issue entirely).
This is the url-safe version of your key, suitable for use in links. Use KeyFactory.stringToKey to convert it to an actual key, and you'll see that it contains your string name.
What you create with makeKey("Employee", "bobby") is a key for an Entity with the entity name Employee and the name bobby. What you see as Key in the datastore viewer is a representation for exactly that.
Generally speaking a key always consists of
optional parent key (with entity type and name/id)
entity type
entity name/id
Maybe someone here can tell you how to decode the key into its components, but rest assured that you're doing everything right and the behavior is as expected.
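To get a feel for why a key that "looks cryptic" can still contain bobby, here is a toy Python model of a key as a path of (kind, name/id) pairs that is serialised and base64-encoded. This is NOT the real Datastore encoding (which, as far as I know, serialises a protocol buffer); encode_key, decode_key and the "Company"/"acme" parent are invented for the illustration.

```python
import base64
import json


def encode_key(path):
    """Turn a list of (kind, name_or_id) pairs into a URL-safe string.

    Toy encoding for illustration only: JSON + URL-safe base64.
    """
    raw = json.dumps(path).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_key(token):
    """Recover the (kind, name_or_id) pairs from the URL-safe string."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return [tuple(pair) for pair in json.loads(raw)]


# A key without a parent, and one with a hypothetical parent entity.
plain = encode_key([("Employee", "bobby")])
nested = encode_key([("Company", "acme"), ("Employee", "bobby")])
```

The encoded strings do not show "bobby" in readable form, yet decode_key(plain) gives back the kind and name unchanged: encoding, not encryption.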
Why do many applications replace the primary key of a database with a seemingly random alternative id when revealing the record to the user?
My guess is that it prevents users from guessing other rows in the table. If so, isn't that just a false sense of security?
I guess you are talking about surrogate keys here. One of the desired or supposed advantages of surrogate keys is that they aren't burdened by any external meaning or dependency on anything outside the database. So for example the surrogate key values can safely be reassigned or the key can be refactored or discarded without any consequences for users of the system.
Generally surrogate keys are kept hidden from users so that they don't acquire any such external dependencies. Being hidden from users was in fact part of the original definition of a surrogate key as proposed by E.F.Codd. If key values reside in the user's browser cache or favourites list then they aren't much use as "surrogates" any more. So that's one common reason why you will see one key used only inside the database and a different key for the same table made visible in the application.
I think it may depend on the type of application you are working with. I work with enterprise software that is only used by the company I work for and is not generally available to the outside world. In this case, it is often critical to let the user see the surrogate key for people-related records, because the information in the person table has no uniqueness. There can be two John Smiths (we actually have over 1000 of them) who are genuinely different people. They may even have the same business address and be different people (sons are often named for fathers and work in the same medical practice, for instance). So users need to refer to the surrogate key on forms and in reporting to ensure they are using the record they thought they wanted. Otherwise, if they wanted to research further details about the John Smith that they saw in a report, how would they look it up in the application without having to go through all 1000 to find the right one? Creating a fake ID as well as the real one would be time consuming (we import millions of records at a time) and for no real gain, since the data would not be visible outside our company application.
For a web app that is open to the general public, I can see where you might not want to show this information.
I have a web app where I register users based on their email id.
From a design/ease of use/flexibility point of view, should I assign a unique number to each user or identify user based on emailid?
Advantage of assigning unique number:
I can change the login itself at a later point without losing the data of the user (flexible).
Disadvantage:
I have to deal with numbers when using the sql command line(error prone).
Which is better? Do you see any other issues that need to be considered for either scheme?
The identity of your users should be unique and immutable. Choosing the email address as identity is not a good idea for several reasons:
The email is one facet of the user's identity that can change at any point in time.
You might decide to allow more than one email.
You might decide to add other facets, like OpenID or Live ID, or even just a plain old username.
There's nothing wrong with allowing multiple identities to share the same email facet. It is a rare scenario, but not unheard of.
Normalizing the email address is hard and error prone, so you might have problems enforcing uniqueness. (Are email addresses case sensitive? Do you ignore . or + inside emails? How do you compare non-English emails?)
Btw, using the email as a public representation of the user identity can be a security and privacy problem, especially if some of your users are under 13 years old. You will need a different public facet for the user identity.
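A small Python sketch shows how quickly the normalization question gets awkward. normalize_email is a hypothetical helper that only lowercases the domain part, which is the one step that is safe in general, since domain names are case-insensitive while the local part may not be.

```python
def normalize_email(address: str) -> str:
    """Conservative normalization: trim whitespace and lowercase the domain only.

    The local part (before the last "@") is left untouched, because whether
    its case, dots or "+tag" suffixes matter is decided by the receiving
    mail system.
    """
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}@{domain.lower()}"


a = normalize_email("John.Smith@Example.COM")
b = normalize_email("john.smith@example.com")
same_after_normalizing = a == b
```

Here same_after_normalizing is False even though most providers would deliver both addresses to the same mailbox, so a uniqueness constraint on the normalized value alone would still let the "same" user register twice.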
Use both.
You have to add an id because you really don't want other tables to use the email address as a foreign key.
Make the email address unique so that you can still use it to identify a user with sql command line.
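A minimal sketch of the "use both" layout, using Python's built-in sqlite3 module with an in-memory database. The orders table and the sample addresses are hypothetical and only there to show a foreign key pointing at the numeric id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,   -- surrogate key, never changes
        email TEXT NOT NULL UNIQUE   -- still unique, so usable for lookups
    );
    CREATE TABLE orders (            -- hypothetical dependent table
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        item    TEXT NOT NULL
    );
    """
)

user_id = conn.execute(
    "INSERT INTO users (email) VALUES (?)", ("old@example.com",)
).lastrowid
conn.execute("INSERT INTO orders (user_id, item) VALUES (?, ?)", (user_id, "book"))

# The user changes their email: one row is updated, no other table is touched.
conn.execute("UPDATE users SET email = ? WHERE id = ?", ("new@example.com", user_id))

# On the command line you can still identify the user by email.
row = conn.execute(
    "SELECT o.item FROM orders o JOIN users u ON u.id = o.user_id WHERE u.email = ?",
    ("new@example.com",),
).fetchone()
```

The order stays attached to the user after the email change because it references the id, while the UNIQUE constraint keeps the email usable as a lookup key.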
Unique number - ALWAYS!
But keep the number hidden from the user.
The user should be allowed to change their email. If this is used as the primary identifier then it can cause lots of complications when the key is used in multiple tables.
You should have another identifier, other than the user's email address, which is not visible to the user and never changes. You should then enforce uniqueness on the email address so it can be used as a candidate key.
You will find that users will want to change their email address, or really anything they can see, so as good practice you should have an identifier which cannot be changed.
Dealing with numbers in a SQL command would not really be any more error prone than using the actual email address; if anything, I would think it would be less error prone.
Your disadvantage is not a disadvantage. Using numbers with SQL is no more or less of a problem than using emails or anything else, for that matter.
On the other hand your advantage is quite a strong one, you might want to associate users with each other, different emails with one user account, etc. and always using the email will make things harder.
Think also of URLs that include the user identification: an ID is much easier to handle there than an email, where you have to think about proper URL encoding.
So in favour of flexibility and ease of use, I would strongly recommend a unique userID.
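The URL point is easy to see with Python's standard urllib.parse.quote. The /users/ path and the sample values are made up for the illustration.

```python
from urllib.parse import quote

email = "first.last+tag@example.com"
user_id = 12345

# An email must be percent-encoded before it can go into a path segment.
email_url = f"/users/{quote(email, safe='')}"

# A numeric ID can be dropped in as-is.
id_url = f"/users/{user_id}"
```

email_url comes out as /users/first.last%2Btag%40example.com, and every component that reads it has to decode it consistently, whereas id_url is simply /users/12345.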
Just some points to consider.
How will you validate the email address?
How do you ensure that it is really unique? (I don't always use my real address, e.g. m.mouse at disney.com)
I like to use a unique key generated by the database to identify the record, and then add the attributes which are out of my control separately.
A person's email can change, but the ID will not.
Unique numbers. As well as the reasons identified, I think it would be less error prone than using an email address. Shorter, no funny characters, easier to validate, etc.