Alternative to GAE Keys for MySQL?

I am wondering whether the GAE keys (com.google.appengine.api.datastore.Key) can be used with local MySQL apps. I presume it's not possible, so if I define my primary keys in my models as longs, do I lose too much of the key functionality, like the KeyService and querying using keys?
Thanks

No, you do not lose functionality. You can still query using the long keys and the performance is the same.
The 'Key' datastore type allows you to create keys based on a string. For example, you might want to create the key based on the user's email address. You won't be able to have this functionality with long keys.
But in most cases you do not need this option.


ASP.NET Core: how to hide database ids?

Maybe this has been asked a lot, but I can't find a comprehensive post about it.
Q: What are the options when you don't want to pass the ids from database to the frontend? You don't want the user to be able to see how many records are in your database.
What I found/heard so far:
Encrypt and decrypt the Id on backend
Use a GUID instead of a numeric auto-incremented Id as PK
Use a GUID together with an auto-incremented Id as PK
Q: Do you know any other or do you have experience with any of these? What are the performance and technical issues? Please provide documentation and blog posts on this topic if you know any.
Two things:
The mere existence of an id doesn't tell you anything about how many records are in a database. Even if the id is something like 10, that doesn't mean there are only 10 records; it's just likely the tenth that was created.
Exposing ids has nothing to do with security, one way or another. Ids only have a meaning in the context of the database table they reside in. Therefore, in order to discern anything based on an id, the user would have to have access directly to your database. If that's the case, you've got far more issues than whether or not you exposed an id.
If users shouldn't be able to access certain ids, such as perhaps an edit page, where an id is passed as part of the URL, then you control that via row-level access policies, not by obfuscating or attempting to hide the id. Security by obscurity is not security.
That said, if you're just totally against the idea of sequential ids, then use GUIDs. There is no performance impact to using GUIDs. It's still a clustered index, just as any other primary key. They take up more space than something like an int, obviously, but we're talking a difference of 12 bytes per id - hardly anything to worry about with today's storage.
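To put the size difference in concrete terms, here is a minimal Python sketch (Python is only used for illustration; the question itself is about ASP.NET Core). It generates a random version 4 GUID and compares its raw size with a 4-byte integer. The variable names are made up for the example.

```python
import struct
import uuid

# A random (version 4) GUID, generated without asking the database for a value.
record_id = uuid.uuid4()

guid_size = len(record_id.bytes)    # raw size of a GUID in bytes
int_size = struct.calcsize("<i")    # size of a 4-byte signed integer

print(record_id)
print(guid_size, int_size, guid_size - int_size)
```

The printed difference is the per-id overhead mentioned above; indexes that include the key carry it too.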

Google Cloud Datastore unique autogenerated ids

I'm using Google Cloud Datastore and using namespaces to partition data. Some kinds are using autogenerated IDs from Cloud Datastore by creating keys like this:
var key = Datastore.key([
'example'
]);
This code will generate a key with kind 'example', and Cloud Datastore automatically assigns the entity an integer numeric ID. (Source: https://cloud.google.com/datastore/docs/concepts/entities#kinds_and_identifiers)
But this "unique" ID is only unique for its namespace. I have seen the same ID in different namespaces.
So my question is, is it possible to tell Cloud Datastore that autogenerated IDs must be unique for all the namespaces?
Maybe this question makes no sense, but I would prefer to have unique IDs across the whole datastore (if possible).
I have seen the "allocateIds" function in the Cloud Datastore documentation, but I would like to know whether this function takes namespaces into account, because I've seen that I can include them in the request, and I'm afraid the IDs will be the same as the ones autogenerated by Cloud Datastore.
Thank you in advance!
No: you cannot tell Datastore to allocate unique IDs across all entity groups and namespaces.
However, there is an easy fix: if you believe in statistics and correctly seeded random number generators, you will generally be better off generating your own GUIDs for keys.
If you don't believe in statistics and random numbers you can still generate a GUID and transactionally verify that it doesn't exist in your Datastore before writing the entity in question.
If you are truly desperate to have Datastore do id allocation for you, it is possible to call AllocateIds manually and ask it to allocate an id for a constant key. (For example, ask it to allocate for an arbitrary but unchanging key in the default namespace, and it will return an integer that is unique and can be used somewhere else.)
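A minimal sketch of the generate-and-verify idea in Python. The dict `store` is a hypothetical in-memory stand-in for a Datastore kind, and `put_with_unique_id` is an illustrative helper, not a Datastore API. Against the real service, the existence check and the write would have to happen inside one transaction.

```python
import uuid

def put_with_unique_id(store, entity, max_attempts=5):
    """Store `entity` under a fresh GUID key and return that key.

    `store` is a plain dict standing in for the datastore (an assumption
    for this sketch); a collision simply triggers another attempt.
    """
    for _ in range(max_attempts):
        candidate = str(uuid.uuid4())
        if candidate not in store:      # verify the id is not taken yet
            store[candidate] = entity
            return candidate
    raise RuntimeError("could not find an unused id")

store = {}
first_key = put_with_unique_id(store, {"kind": "example", "namespace": "a"})
second_key = put_with_unique_id(store, {"kind": "example", "namespace": "b"})
print(first_key, second_key)
```

Because the key no longer depends on a per-namespace counter, the same helper can be used for every namespace.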

Guid(Random ints) vs regular int primary key for chat application

I am not asking a general "guid vs ints which is better" question. I know, or at least think I know, that they both have specific scenarios where one is better than the other. But I do have a specific scenario: I am writing a chat application. I started the application with all my primary keys being ints. After doing a bit of research, I realized that I might have a security flaw in the application, as I am passing the primary key to a user's browser for different entity requests such as chatrooms, talkers and such. One would agree that a subsequent primary key could be guessed, and hence it is possible for users to alter other users' sessions and states. In trying to prevent this, I was considering using guids as my primary key, but then another thought hit me: performance. Since this is a chat application, is using a guid really going to cost me performance-wise if the application comes under heavy load, say tens or hundreds of thousands of simultaneous users? I am not really informed enough to make a choice here. My question in short is: do I need to change my primary keys to guids to make guessing primary keys harder, or is there some other way that I could secure my application without resorting to guids? If the question is not clear enough, let me know and I will clarify. Thanks.
No, you don't need to change your primary keys to GUIDs. This will not necessarily give you any additional security, and it is arguably security by obscurity.
Using GUIDs makes it infeasible for an attacker to guess another user's ID, or a chatroom's unique key. However, it doesn't stop them seeing the IDs if they're transmitted and then being able to perform whatever action you're currently worried about.
Instead, when a user makes a request on the database you must validate whether they are allowed access to that data. For example, they should only be able to retrieve their own user information, or basic user information about other users (such as handle, but not e-mail address).
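As a rough illustration of that kind of check, the Python sketch below returns a full record only to its owner and a reduced view to everyone else. `User` and `view_profile` are hypothetical names, and a real application would take the requester's identity from the authenticated session rather than from a parameter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    user_id: int
    handle: str
    email: str

def view_profile(requester_id, target):
    """Return only the fields the requester is allowed to see."""
    profile = {"user_id": target.user_id, "handle": target.handle}
    if requester_id == target.user_id:
        # Private fields are added only for the owner of the record.
        profile["email"] = target.email
    return profile

alice = User(user_id=1, handle="alice", email="alice@example.com")
print(view_profile(1, alice))   # the owner sees the e-mail address
print(view_profile(2, alice))   # another user sees the handle only
```

With a check like this in place, guessing the next integer id gains an attacker nothing beyond what any user may see anyway.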

Can someone explain how do we implement simple key/value stores with mysql?

http://royal.pingdom.com/2010/06/18/the-software-behind-facebook/:
Facebook uses MySQL, but primarily as a key-value persistent storage, moving joins and logic onto the web servers since optimizations are easier to perform there (on the “other side” of the Memcached layer).
Can someone explain how we implement simple key/value stores with MySQL? Is it simply a table with a bigint primary key plus a single LONGTEXT column?
The starting point should really be "is your data relational?"
If so, use a relational db!
Key-value is a great solution for non-relational data, but if your data is relational, use SQL and be done with it.
To answer your first question, yes, a key/value store is just that, you store a key and a value associated with that key. And you query based on the key.
The big advantage you get from this is scalability. You can now easily distribute your data across many (thousands of) machines. This is something a traditional RDBMS is not good at: joins and ACID guarantees across many machines are either impossible or very, very slow.
Facebook also has a lot of data that doesn't fit the relational model that an ordinary RDBMS uses, namely graphs. That means they query, store, and handle the graph nature of the data themselves instead of handling it with SQL.
The cost of doing it that way is complexity, and often you have to give up a few of the ACID properties.
The rest of us, who are not Facebook/Google/LinkedIn/etc. and only need to handle sites with up to a few million users, can usually just stick with a traditional database.
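For the table layout asked about in the question, a minimal sketch could look like the following. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; with MySQL the columns would be declared as BIGINT and LONGTEXT instead. The table name `kv` and the helpers `put` and `get` are assumptions made for the example, and values are serialized as JSON text.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# One integer key column and one opaque text column: no joins, no other schema.
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT NOT NULL)")

def put(key, value):
    """Serialize `value` as JSON and store it under `key`, overwriting any old value."""
    # INSERT OR REPLACE is SQLite syntax; MySQL offers REPLACE INTO for the same effect.
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, json.dumps(value)))

def get(key):
    """Return the deserialized value for `key`, or None if the key is unknown."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None

put(42, {"name": "alice", "friends": [7, 9]})
put(42, {"name": "alice", "friends": [7, 9, 11]})   # overwrite the whole value
print(get(42), get(43))
```

Everything the database would normally do with the value (filtering, joining, validating) now has to happen in application code, which is the trade-off described above.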

GUIDs as Primary Keys - Offline OLTP

We are working on designing an application that is typically OLTP (think: purchasing system). However, this one in particular has the need that some users will be offline, so they need to be able to download the DB to their machine, work on it, and then sync back once they're on the LAN.
I would like to note that I know this has been done before, I just don't have experience with this particular model.
One idea I thought about was using GUIDs as table keys. So for example, a Purchase Order would not have a number (auto-numeric) but a GUID instead, so that every offline client can generate those, and I don't have clashes when I connect back to the DB.
Is this a bad idea for some reason?
Will access to these tables through the GUID key be slow?
Have you had experience with these type of systems? How have you solved this problem?
Thanks!
Daniel
Using Guids as primary keys is acceptable and is considered a fairly standard practice for the same reasons that you are considering them. They can be overused which can make things a bit tedious to debug and manage, so try to keep them out of code tables and other reference data if at all possible.
The thing that you have to concern yourself with is the human-readable identifier. Guids cannot be exchanged by people; can you imagine trying to confirm your order number over the phone if it is a guid? So in an offline scenario you may still have to generate something, like a publisher (workstation/user) id and some sequence number, so the order number may be 123-5678.
However, this may not satisfy business requirements of having a sequential number. In fact, regulatory requirements can be an influence: some regulations (SOX maybe) require that invoice numbers are sequential. In such cases it may be necessary to generate a sort of proforma number which is fixed up later when the systems synchronise. You may land up with tables having OrderId (Guid), OrderNo (int), ProformaOrderNo (varchar), so some complexity may creep in.
At least having guids as primary keys means that you don't have to do a whole lot of cascading updates when the sync does eventually happen - you simply update the human readable number.
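The scheme described in this answer can be sketched in a few lines of Python. The class and field names (`OfflineOrderFactory`, `order_id`, `order_no`, `proforma_order_no`) are hypothetical and only mirror the columns mentioned above; the workstation id 123 and starting sequence 5678 are taken from the example order number.

```python
import itertools
import uuid

class OfflineOrderFactory:
    """Creates orders on one offline workstation (all names here are hypothetical)."""

    def __init__(self, workstation_id, first_sequence=1):
        self.workstation_id = workstation_id
        self._sequence = itertools.count(first_sequence)

    def new_order(self):
        return {
            # Primary key: generated locally, never changes after sync.
            "order_id": uuid.uuid4(),
            # Human-readable stand-in: workstation id plus a local sequence number.
            "proforma_order_no": f"{self.workstation_id}-{next(self._sequence)}",
            # Official sequential number, assigned by the server at sync time.
            "order_no": None,
        }

factory = OfflineOrderFactory(workstation_id=123, first_sequence=5678)
first = factory.new_order()
second = factory.new_order()
print(first["proforma_order_no"], second["proforma_order_no"])
```

At sync time only `order_no` needs to be filled in; rows that reference `order_id` stay untouched.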
@SqlMenace
There are other problems with GUIDs, you see GUIDs are not sequential, so inserts will be scattered all over the place, this causes page splits and index fragmentation
Not true. Primary key != clustered index.
If the clustered index is another column ("inserted_on" springs to mind) then the inserts will be sequential and no page splits or excessive fragmentation will occur.
This is a perfectly good use of GUIDs. The only drawbacks would be a slight complexity in working with GUIDs over INTs and the slight size difference (16 bytes vs 4 bytes).
I don't think either of those are a big deal.
Will access to these tables through the GUID key be slow?
There are other problems with GUIDs, you see GUIDs are not sequential, so inserts will be scattered all over the place, this causes page splits and index fragmentation
In SQL Server 2005, MS introduced NEWSEQUENTIALID() to fix this. The only problem for you might be that you can only use NEWSEQUENTIALID() as a default value for a table column.
You're correct that this is an old problem, and it has two canonical solutions:
Use unique identifiers as the primary key. Note that if you're concerned about readability you can roll your own unique identifier instead of using a GUID. A unique identifier will use information about the date and the machine to generate a unique value.
Use a composite key of 'Actor' + identifier. Every user gets a numeric actor ID, and the keys of newly inserted rows use the actor ID as well as the next available identifier. So if two actors both insert a new row with ID "100", the primary key constraint will not be violated.
Personally, I prefer the first approach, as I think composite keys are really tedious as foreign keys. I think the human readability complaint is overstated -- end-users shouldn't have to know anything about your keys, anyways!
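For comparison, the composite-key approach can be sketched like this. The example uses Python's built-in sqlite3 module purely as a runnable stand-in for the real database, and the table and column names (`purchase_order`, `actor_id`, `local_id`) are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE purchase_order (
        actor_id    INTEGER NOT NULL,   -- one numeric id per offline user
        local_id    INTEGER NOT NULL,   -- next available number for that actor
        description TEXT    NOT NULL,
        PRIMARY KEY (actor_id, local_id)
    )
    """
)

def insert_order(actor_id, local_id, description):
    conn.execute(
        "INSERT INTO purchase_order (actor_id, local_id, description) VALUES (?, ?, ?)",
        (actor_id, local_id, description),
    )

# Two actors both use local id 100; the composite key keeps the rows distinct.
insert_order(1, 100, "laptop")
insert_order(2, 100, "monitor")

# The same actor reusing an id is still rejected.
try:
    insert_order(1, 100, "duplicate")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM purchase_order").fetchone()[0]
print(row_count, duplicate_rejected)
```

Every table that references `purchase_order` would need both columns as its foreign key, which is the tedium mentioned above.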
Make sure to use guid.comb; it takes care of the indexing issues. If you are dealing with performance issues after that then you will be, in short order, an expert on scaling.
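A rough idea of what a COMB-style GUID looks like, sketched in Python. This is not NHibernate's actual guid.comb algorithm, only the general technique as I understand it: keep most of the GUID random and overwrite the last six bytes with a timestamp, since SQL Server gives those bytes the most weight when ordering uniqueidentifier values. The function name `comb_guid` and its `now_ms` parameter are made up for the example.

```python
import os
import time
import uuid

def comb_guid(now_ms=None):
    """Build a 16-byte id: 10 random bytes followed by a 6-byte millisecond timestamp."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    random_part = os.urandom(10)
    time_part = now_ms.to_bytes(6, "big")   # big-endian, so later times compare higher
    return uuid.UUID(bytes=random_part + time_part)

earlier = comb_guid(now_ms=1000)
later = comb_guid(now_ms=2000)
print(earlier, later)
```

Ids created later end with a larger suffix, so new rows tend to land at the end of the index instead of being scattered across it.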
Another reason to use GUIDs is to enable database refactoring. Say you decide to apply polymorphism or inheritance or whatever to your Customers entity. You now want Customers and Employees to derive from Person and have them share a table. Having really unique identifiers makes data migration simple. There are no sequences or integer identity fields to fight with.
I'm just going to point you to "What are the performance improvement of Sequential Guid over standard Guid?", which covers the GUID talk.
For human readability, consider assigning machine IDs and then using sequential numbers from those machines as a possibility. This will require managing the assignment of machine IDs, though. Could be done in one or two columns.
I'm personally fond of the SGUID answer, though.
Guids will certainly be slower (and use more memory) than standard integer keys, but whether or not that is an issue will depend on the type of load your system will see. Depending on your backend DB there may be issues with indexing guid fields.
Using guids simplifies a whole class of problems, but you pay for it in part with performance and also debuggability: typing guids into those test queries will get old real fast!
The backend will be SQL Server 2005
Frontend / Application Logic will be .Net
Besides GUIDs, can you think of other ways to resolve the "merge" that happens when the offline computer syncs the new data back into the central database?
I mean, if the keys are INTs, I'll basically have to renumber everything when importing. GUIDs will spare me that.
Using GUIDs saved us a lot of work when we had to merge two databases into one.
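The difference shows up even in a toy merge. In the Python sketch below, plain dicts stand in for tables, every offline row is assumed to be new, and both function names are invented for the example. With integer keys the merge has to hand out new ids and return a mapping so that foreign keys can be rewritten; with GUID keys it is a plain union.

```python
import uuid

def merge_int_keyed(central, offline):
    """Merge rows keyed by auto-increment ints; offline rows get fresh ids."""
    merged = dict(central)
    remap = {}                          # old offline id -> new central id
    next_id = max(merged, default=0) + 1
    for old_id, row in offline.items():
        remap[old_id] = next_id
        merged[next_id] = row
        next_id += 1
    return merged, remap                # the caller must still fix up foreign keys

def merge_guid_keyed(central, offline):
    """Merge rows keyed by GUIDs; the keys cannot clash, so no renumbering is needed."""
    return {**central, **offline}

int_merged, remap = merge_int_keyed({1: "a", 2: "b"}, {2: "c", 3: "d"})

central_guid, offline_guid = uuid.uuid4(), uuid.uuid4()
guid_merged = merge_guid_keyed({central_guid: "a"}, {offline_guid: "c"})
print(int_merged, remap, len(guid_merged))
```

The `remap` dictionary is the extra work: every child row that pointed at an old offline id has to be updated before it can be imported.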
If your database is small enough to download to a laptop and work with it offline, you probably don't need to worry too much about the performance differences between ints and Guids. But do not underestimate how useful ints are when developing and troubleshooting a system! You will probably need to come up with some fairly complex import/synch logic regardless of whether or not you are using Guids, so they might not help as much as you think.
@Simon,
You raise very good points. I was already thinking about the "temporary" "human-readable" numbers I'd generate while offline and recreate on sync. But I wanted to avoid doing that with foreign keys, etc.
I would start by looking at SQL Server Compact Edition for this! It helps with all of your issues.
Data Storage Architecture with SQL Server 2005 Compact Edition
It is specifically designed for field force applications (FFAs). FFAs usually share one or more of the following attributes:
They allow the user to perform their job functions while disconnected from the back-end network: on-site at a client location, on the road, in an airport, or from home.
FFAs are usually designed for occasional connectivity, meaning that when users are running the client application, they do not need to have a network connection of any kind. FFAs often involve multiple clients that can concurrently access and use data from the back-end database, both in a connected and disconnected mode.
FFAs must be able to replicate data from the back-end database to the client databases for offline support. They also need to be able to replicate modified, added, or deleted data records from the client to the server when the application is able to connect to the network.
First thought that comes to mind: Hasn't MS designed the DataSet and DataAdapter model to support scenarios like this?
I believe I read that MS changed their ADO recordset model to the current DataSet model so that it works well offline too. There's also Sync Services for ADO.NET.
I believe I have seen code that uses the DataSet model with foreign keys, and it still syncs perfectly when using the DataAdapter. I haven't tried out Sync Services, though, but I think you might be able to benefit from that too.
Hope this helps.
@Portman By default PK == clustered index: creating a primary key constraint will automatically create a clustered index, so you need to specify nonclustered if you don't want it clustered.
