Generate unique id that is not easily guessed - uuid

I have a system where users can view the ids of rows in a database, e.g. https://www.example.com/user?id=1
Currently the id is auto-incremented, so it's fairly easy to guess the ids. While I don't believe this exposes any risk, I've been advised it would be best to make them less easy to guess, something like 913458ugfg444.
Now, to generate an id like this, you'd need to create it, confirm in your database that it is not already taken, and then repeat the process if it is. I'm not a big fan of this since it requires additional resources.
Is there a way I can generate unique ids that are not easily guessable, without the chance of multiple try/catches against the database?
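For illustration, a common way to sidestep the check-and-retry loop is to use a UUID, or another random token with enough entropy that collisions are practically impossible. A minimal Python sketch (assuming the id is stored in a plain string column):
import uuid
import secrets
# A version-4 UUID is built from roughly 122 random bits, so the chance of
# two generated ids colliding is negligible - no database check needed first.
row_id = str(uuid.uuid4())           # e.g. '9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d'
# If you prefer a shorter, URL-friendly token (closer to the example id in
# the question), a random string from the secrets module works the same way.
short_id = secrets.token_urlsafe(9)  # e.g. 'p3yZ_4T9qkWx'
Either way you would still keep a unique constraint on the column as a safety net, but with this much randomness the retry path is essentially never taken.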

Related

If I have any random accountId, how do I find its ultimate parent account? Looking for the best optimized solution (for a multi-level hierarchy)

Given any random accountId, how do I find its ultimate parent account? I'm looking for the best optimized solution for a multi-level hierarchy, excluding the 10-levels-of-formula-fields approach.
It depends. Optimized for what: read operations (an instant, simple answer when querying) or writes (an easy save but more work when reading)?
If you want easy reads, you need to put some effort in when saving the data. And remember you can't get away with a simple custom lookup called "Ultimate Parent", because for a standalone account SF will not let you form a cycle by creating a record that looks up to itself. You might need 2 text fields (Id and Name), or some convention that yes, you'll make a lookup to Account, but if it's blank the reading process needs to check the ParentId field too to determine what exactly is going on. (You could make a formula field to simplify reading, but still, don't think you're getting away with a simple lookup.)
How much data do you have, and how deep are the hierarchies? The basic answer is to keep track of the ultimate parent on every insert, update, delete and undelete. Write a trigger; a SOQL query can go "up" 5 "dots" max:
SELECT ParentId,
Parent.ParentId,
Parent.Parent.ParentId,
Parent.Parent.Parent.ParentId,
Parent.Parent.Parent.Parent.ParentId,
Parent.Parent.Parent.Parent.Parent.ParentId
FROM Account
WHERE Id IN :trigger.new
It gets messier if you need multiple queries (but still, this form would be most effective). You might also hit performance issues when something reparents close to the top of the tree and you're suddenly looking at having to cascade-update hundreds of accounts. Remember you have a limit of 10K rows inserted/updated/deleted in a single operation. You might have to propagate the changes down as a batch/future/queueable async process.
Another option would be to have a flat helper object aside from the account table, with its unique id set to the account id. Have a background process refresh that table periodically, even every hour, using a batch job or a reporting snapshot. Still not great if you have millions of accounts, and a waste of storage... but maybe you could use Big Objects.
Have you ever used platform cache? If the ultimate parent has to be fetched via Apex (instead of being a real field on Account), you could try to make some kind of "linked list" implementation where you store Id -> ParentId in the cache and can traverse it without wasting any queries. The cache's maximum TTL is 48h (so you might still need a nightly job to rebuild it), and you'd still have to update it on every insert/update/delete/undelete...
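Whatever the storage (platform cache or something else), the "linked list" idea boils down to walking an Id -> ParentId map until you reach an account with no parent. A rough sketch of that traversal, shown in Python purely to illustrate the idea (the real thing would be Apex against the cache; parent_of is the assumed cached map):
def ultimate_parent(parent_of, account_id):
    # parent_of maps an account Id to its ParentId
    # (a missing or None entry means the account is a top-level parent).
    current = account_id
    seen = set()
    while parent_of.get(current):
        if current in seen:
            raise ValueError("cycle detected in account hierarchy")
        seen.add(current)
        current = parent_of[current]
    return current
# Example: C -> B -> A, so the ultimate parent of 'C' is 'A'.
print(ultimate_parent({"A": None, "B": "A", "C": "B"}, "C"))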
So yeah, "it depends". Write more about your requirements.

CakePHP virtual HABTM table/relation, something like fixture

First of all, I'd like to tell you that you're a terrific audience.
I'm making an application where I have a model Foo with a table Foos, and I'd like to give Foo another parameter, a HABTM parameter, let's say Bar. But I'd rather not create a table for Bar, because Bar will have about 5 entries at the start and in 5 years it will grow to maybe 7 entries, or not at all. So I don't see a need to create another table and make CakePHP look into that table with another SELECT. Does anyone have an idea how this can be achieved?
One solution, I think, is making a fixture for the Bars table and only adding the Bars_Foos table for real (it won't be big anyway). But I can't find a way to use test fixtures in a normal controller.
The second solution is to save a JSON or serialized array in one field of Foo and move the logic to the model, but I don't know if that is the best solution. Something like a virtual field.
Real-life example:
So I have Bikes, and every Bike has its main_type, which for now is {"MTB","Road","Trekking","City","Downhill"}. I know that in the long run this list won't grow much, maybe 2 to 5 entries in a few years. Still, it will stay relatively short.
(For those who would say there may be a hundred specialized bike types: I have another parameter column, specialized_type.)
It needs to be a HABTM relation, but the main_types table will be very small, so I'd like to avoid creating it and find a simpler solution.
Because
It bothers MySQL for such a small amount of data
It complicates MySQL queries
I have to make an additional model for MainType
I have more models to unbind when I don't need most of the data and would like to use recursive
Insert here anything you'd like...
Judging from your real-life example, I'd say you're on the wrong track. The queries won't be complicated: CakePHP uses additional queries for HABTM relations, so it would be just one additional query, which shouldn't be very costly, and it's very easy to sparse it out by using the Containable behaviour. And if you really need to use recursive only (for whatever reason), then it's just one single additional model to unbind, which doesn't seem like overkill to me.
This might not be what you wanted to hear, but I really think a proper database solution is better than trying to hack in "virtual data". Also note that fixtures, as used in tests, only define data that is written to the database on the fly when running the test, so that would definitely be more costly than using data that already exists in the database.
Maybe you'll get a small performance boost for selects that do not query the main type when using an additional column to store the data, but you'll definitely lose all the flexibility that the RDBMS has to offer, including faster selects using proper indexing, affecting multiple records by updating a single related value, etc. That doesn't sound like a good trade-off to me. Think about it: how would you select all Downhill Trekking bikes when this information is stored as a string in a single column? You would probably end up using ugly LIKE selects.
Now wait, there's a SET data type in MySQL that can hold multiple values. Right, and it looks easier and less complex. Right, but in the background it isn't: while a complex-looking join query can be pretty fast with proper indexing, a query against the SET type will have to scan every single row, since the data stored in the column cannot be indexed appropriately in order to make more specific selects.
In the end it probably depends on your data, so I'd suggest testing both methods in your specific environment and seeing how they compare under workload.

When is it appropriate to use UUIDs for a web project?

I'm busy with the database design of a new project, and I'm not sure whether to use UUIDs or normal table-unique auto-increment ids.
Up to now, the sites I've built have all run on a single server, and very heavy traffic has never been too much of a concern. However, this web application will eventually run concurrently on multiple servers, serve an API, and need to process thousands of requests per second, and I want to make sure that the design I choose now doesn't cripple any of those possibilities later.
I have my suspicions, of course, and they should be clear through the way I phrased my question, but I would like to hear from those with more experience what trouble I can run into later if I do or don't have UUIDs, and what I should really be basing my decision on.
So, in short: What are the considerations I should give into deciding whether or not to use UUIDs for all database models, so that any one object can be identified uniquely by one string, and when is it appropriate to use this as the primary key, instead of table-by-table auto-increment?
Note: I've seen this question (When are you truly forced to use UUID as part of the design?), and read all the answers, but they mostly answer "How rarely do UUIDs collide", instead of "When is it appropriate to use them".
One consideration that I've used when deciding on UUIDs vs. auto-increment ids is whether they're going to be user-visible, and if so, whether I want users to know how many I have of that table. For example, if I didn't want to make public the number of registered users my site has, I wouldn't assign auto-increment user ids.
And to address one other specific point you raised, it's still possible to use auto-incrementing ids with multiple servers (though not with MySQL's built-in auto-increment). You just need to start all the ids at different offsets and increment accordingly. That is, if you had 3 servers, you could start server A at 1, server B at 2, and server C at 3, and then increment the ids by 10 each time instead of 1. That way, you could guarantee no collisions.
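As a minimal sketch of that allocation logic (hypothetical Python, just to make the arithmetic concrete; in practice this would be each server's auto-increment configuration):
class OffsetIdAllocator:
    # Each server gets its own starting offset; the shared step (10 here,
    # as in the example above) leaves room to add servers later.
    def __init__(self, server_offset, step=10):
        self.next_id = server_offset
        self.step = step
    def allocate(self):
        current = self.next_id
        self.next_id += self.step
        return current
server_a = OffsetIdAllocator(1)  # hands out 1, 11, 21, ...
server_b = OffsetIdAllocator(2)  # hands out 2, 12, 22, ...
server_c = OffsetIdAllocator(3)  # hands out 3, 13, 23, ...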
And finally, the last thing I consider is how important performance is to my application. Integers are much more easily indexed than string-based UUIDs, so indexes are smaller and more quickly searched, etc.
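One small aside (not part of the original answer): if you do pick UUIDs, the index penalty shrinks considerably if you store the 16 raw bytes rather than the 36-character text form, as this quick Python check illustrates:
import uuid
u = uuid.uuid4()
print(len(str(u)))   # 36 characters when stored as text, e.g. in a CHAR(36) column
print(len(u.bytes))  # 16 raw bytes when stored in a binary column, e.g. BINARY(16)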
UUIDs or GUIDs can be very useful, especially for the web. If you use auto-increment values to store a UserId, anyone can view the source of your web pages and see how simple the ids are. They could then try any integer value to get data they are not supposed to see.
GUIDs are not created in any sequential format; therefore, if you create them one right after the other, their sequence cannot easily be guessed.
I don't think it's necessary to use GUIDs for simple lookup-type data such as ColorId 1=Blue, 2=Red, 3=Green.
GUIDs are also very useful for session and state management.
That's my $0.02

Names of businesses keyed differently by different people

I have this table
tblStore
with these fields
storeID (autonumber)
storeName
locationOrBranch
and this table
tblPurchased
with these fields
purchasedID
storeID (foreign key)
itemDesc
In the case of stores that have more than one location, there is a problem when two people inadvertently key the same store location differently. For example, take Harrisburg Chevron. On some of its receipts it calls itself Harrisburg Chevron; some just say Chevron at the top, and under that, Harrisburg. One person may key it into tblStore as storeName Chevron, locationOrBranch Harrisburg. Person 2 may key it as storeName Harrisburg Chevron, locationOrBranch Harrisburg. What makes this bad is that the business's name is Harrisburg Chevron. It seems hard to make a rule (that would understandably cover all future opportunities for this error) to prevent people from doing this in the future.
Question 1) I'm thinking that, as the instances are found, an update query to change all records from one form to the other is the best way to fix it. Is this right?
Question 2) What would be the best way to have originally set up the db to have avoided this?
Question 3) What can I do to make future after-the-fact corrections easier when this happens?
Thanks.
edit: I do understand that better business practices are the ideal prevention, but for question 2 I'm looking for any tips or tricks that people use that could help. And questions 1 and 3 are important to me too.
This is not a database design issue.
This is an issue with the processes around using the database design.
The real question I have is: why are users entering stores ad hoc? I can think of scenarios, but without knowing your situation it is hard to guess.
The normal solution is that the tblStore table is a lookup table only. Normally users only have access to stores that have already been entered.
Then there is a controlled process to maintain the tblStore table in a consistent manner. Only a few users would have access to this process.
Of course as I alluded to above this is not always possible, so you may need a different solution.
UPDATE:
Question #1: An update script is the best approach. The best way to do this is to have a copy of the database if possible, or a close copy if not, and test the script against that data. Once you have ensured that the script runs correctly, you can run it against the real data.
If you have transactional integrity, you should use it. Use "begin" before running the script, and if the number of affected records is what you expect (along with any other tests you devise, perhaps also scripted), then you can "commit".
Do not type in SQL against a live DB.
Question #3: I suggest your first line of attack is to create processes around the creation of new stores, but this may not be within your ambit.
The second is possibly to get proactive: identify and enter new stores (if this is the problem) before the users in the field need to do so. I don't know if this works in your scenario.
Lastly, if you had a script that merged "store1" into "store2", you could standardise on that as a way of reducing time and errors. You could even build that into an admin-only screen that automated merging stores.
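As a rough sketch of what such a merge script could look like (hypothetical Python, with sqlite3 standing in for whatever database you actually use; the table and column names are taken from the question):
import sqlite3
def merge_stores(conn, duplicate_id, canonical_id):
    # Repoint purchases from the mis-keyed store to the canonical one,
    # then remove the duplicate store row. The "with" block commits on
    # success and rolls everything back if any statement fails.
    with conn:
        conn.execute(
            "UPDATE tblPurchased SET storeID = ? WHERE storeID = ?",
            (canonical_id, duplicate_id),
        )
        conn.execute("DELETE FROM tblStore WHERE storeID = ?", (duplicate_id,))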
That is all I can think of off the top of my head.

Default database IDs; system and user values

As part of our current database work, we are looking at dealing with the process of updating databases.
A point which has been brought up recurrently is that of dealing with system vs. user values; in our project, user and system values are stored together. For example...
We have a list of templates.
1, <system template>
2, <system template>
3, <system template>
These are mapped in the app to an enum (1, 2, 3)
Then a user comes in and adds...
4, <user template>
...and...
5, <user template>
Then.. we issue an upgrade.. and insert as part of our upgrade scripts...
<new id> [6], <new system template>
THEN!!... we find a bug in the new system template and need to update it... The problem is how? We cannot update the record using ID 6 (as we may have inserted it as 9, or 999), so we have to identify the record using some other mechanism.
So, we've come to two possible solutions for this.
In the red corner (speed)....
We simply start user Ids at 5000 (or some other value) and test data at 10000 (or some other value). This would allow us to make modifications to system values and test them up to the lower limit of the next ID range.
Advantage... quick and easy to implement.
Disadvantage... could run out of values if we don't choose a big enough range!
In the blue corner (scalability)...
We store system and user data separately, use GUIDs as ids, and merge the two lists using a view.
Advantage... scalable, no limits with regard to DB size.
Disadvantage... more complicated to implement (many-to-one updatable views, etc.).
I plump squarely for the first option, but I'm looking for some ammo to back me up!
Does anyone have any thoughts on these approaches, or even one(s) that we've missed?
I have never had problems (performance or development, TDD and unit testing included) using GUIDs as the IDs for my databases, and I've worked on some pretty big ones. Have a look here, here and here if you want to find out more about using GUIDs (and the potential gotchas involved) as your primary keys - but I can't recommend it highly enough, since moving data around safely and DB synchronisation become as easy as brushing your teeth in the morning :-)
For your question above, I would either recommend a third column (if possible) that indicates whether the template is user- or system-based, or you can at the very least generate GUIDs for system templates as you insert them and keep a list of those on hand, so that if you need to update a template, you can just target that same GUID in your DEV, UAT and/or PRODUCTION databases without fear of overwriting other templates. The third column would come in handy, though, for selecting all system or user templates at will, without the need to separate them into two tables (which is overkill IMHO).
I hope that helps,
Rob G
I recommend the second approach, with the modification that you store the system and user values in one table. GUIDs are quite reliable in this manner.
Another idea: use any text-based ID (not necessarily a GUID), which you assign for the system values and generate, for the user values, as a random string or a string based on some kind of custom logic.
Another idea: use the first approach, but extend the table with a flag which shows whether a value is a system or a user value. Maybe this is the easiest. OK, you have to write some kind of mechanism to update the correct system value, but it can be done easily.
+1 for Biri's text-based ID: define a "template_mnemonic" text-based column and make it the primary key. This will be a known value when you insert it, as you, the developers, will have decided on it (or auto-generated it), and you will always be able to reference a template by its mnemonic regardless of how many user-specified templates there are. It also allows users to have a meaningful naming convention for their templates.
Maybe I didn't get it, but couldn't you use GUIDs as ids and still have user and system data together? Then you can access the system data by the (unchanging) GUIDs.
I don't think that GUIDs should pose any problem.
If you want to avoid it, then use a flag:
ID int
template whatever
flag enum/int/bool
Flag shows whether the actual value is a system or a user value.
If you would like to update a system value, ask only for system values ordered by ID, and it will show you the actual order of insertion (you should have a bigint or something for the ID to make sure it doesn't fill up and doesn't reuse deleted IDs). With this list, the xth record is the xth inserted system value.
I think there is a better third solution.
It strikes me that you're storing two different things in the same table, and that you might be better off creating 2 separate tables: one for user templates and one for system templates. You might then be able to create a view over the two tables to make them appear as a single object to your application.
Obviously I don't have full knowledge of your application, and this may be impossible for you for any number of reasons, but I think it's a neater solution than GUIDs and way safer than ranges of IDs (seriously, don't do ID ranges, it'll bite you one day).

Resources