How to create database table in Google App Engine
You don't. You create Entities of different kinds. Datastore is not a relational database[*].
If you want to imagine that GAE creates one "table" for each kind, the "columns" of that "table" being the properties of the entities, then you're welcome to do so. But I don't think it helps.
[*] I don't know whether it meets some technical definition, but it certainly doesn't drive like SQL-based databases.
According to http://code.google.com/appengine/docs/python/datastore/
App Engine Datastore is a schemaless object datastore providing
robust, scalable storage for your web application, with the following
features:
No planned downtime
Atomic transactions
High availability of reads and writes
Strong consistency for reads and ancestor queries
Eventual consistency for all other queries
The Python Datastore interface includes a rich data modeling API and a SQL-like query language called GQL.
In simple words: just create your model class, create an object of this class, and after the first call of the put() method for this object the "table" (I think the term here is kind) will be created on the fly. But you definitely have to read the documentation and check some examples. They will help you understand the specifics of Google Datastore and how it differs from a common RDBMS.
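As a rough illustration of "the kind appears on the first put()", here is a toy in-memory stand-in in plain Python. It is not the App Engine API: `ToyDatastore`, `put`, `get` and `kinds` are hypothetical names invented for this sketch, and the real SDK works through model classes as described in the documentation.

```python
import itertools


class ToyDatastore:
    """In-memory stand-in for a schemaless store (hypothetical, not the GAE API)."""

    def __init__(self):
        # kind name -> {entity id -> property dict}; no kind exists up front
        self._kinds = {}
        self._ids = itertools.count(1)

    def put(self, kind, **properties):
        """Store an entity; the kind comes into existence on first use."""
        entity_id = next(self._ids)
        self._kinds.setdefault(kind, {})[entity_id] = dict(properties)
        return (kind, entity_id)

    def get(self, key):
        """Look up an entity by its (kind, id) key; None if absent."""
        kind, entity_id = key
        return self._kinds.get(kind, {}).get(entity_id)

    def kinds(self):
        return sorted(self._kinds)


store = ToyDatastore()
kinds_before = store.kinds()  # nothing has been declared or created yet
key = store.put("Greeting", author="alice", content="hello")
# Entities of the same kind need not share the same properties.
store.put("Greeting", content="anonymous note", rating=5)
kinds_after = store.kinds()
```

The point of the sketch is only that no "CREATE TABLE" step exists: storing the first entity is what makes the kind show up.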
In simple words, I would say that with Google Bigtable you don't need to create your tables because there are already six Bigtables ready to store whatever you want.
Related
Let's say that we have the following three entities:
Organization
- id
Role
- id
Member
- id
A Role can be granted to a Member within an Organization, thus giving that Member certain access control rights to that Organization. I'd like to be able to answer the following two queries:
List the IDs of all the members who have a given Role within a given Organization (e.g. given a Role ID and Org ID give me the list of Members).
List all of the IDs of the Roles that a member has been granted within a given Organization (e.g. given a Member ID and Org ID give me the list of Roles).
I'm trying to find recommendations on how to model this in Bigtable (ideally with a single row for atomic mutations)... I'm also open to other technology recommendations (I'm trying to design within the constraints my company has given me).
If we model the relationship described above using the Bigtable row key org#{orgID}#role#{roleID}#member#{memberID}, I can easily answer the first question. However, it doesn't allow me to easily answer the second question. If I duplicate data and store another row key org#{orgID}#member#{memberID}#role#{roleID} then I can easily answer the second question, but now I have two rows to manage and atomic updates cannot be guaranteed between the two, so that may lead to consistency issues.
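To make the trade-off concrete, here is a minimal sketch of the two row-key layouts, using a sorted in-memory list as a stand-in for a table whose rows are ordered by key. It does not use the Bigtable client library; `SortedRows`, `grant`, `members_with_role` and `roles_of_member` are hypothetical names, and the IDs are assumed not to contain `#`.

```python
import bisect


class SortedRows:
    """Toy table that keeps row keys sorted, so a prefix maps to one key range."""

    def __init__(self):
        self._keys = []

    def write(self, row_key):
        i = bisect.bisect_left(self._keys, row_key)
        if i == len(self._keys) or self._keys[i] != row_key:
            self._keys.insert(i, row_key)

    def scan_prefix(self, prefix):
        """Return all row keys starting with prefix, read as one contiguous range."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append(key)
        return result


def grant(table, org_id, role_id, member_id):
    # Two separate row writes: nothing here makes them atomic as a pair.
    table.write(f"org#{org_id}#role#{role_id}#member#{member_id}")
    table.write(f"org#{org_id}#member#{member_id}#role#{role_id}")


def members_with_role(table, org_id, role_id):
    prefix = f"org#{org_id}#role#{role_id}#member#"
    return [k[len(prefix):] for k in table.scan_prefix(prefix)]


def roles_of_member(table, org_id, member_id):
    prefix = f"org#{org_id}#member#{member_id}#role#"
    return [k[len(prefix):] for k in table.scan_prefix(prefix)]


table = SortedRows()
grant(table, "o1", "admin", "m1")
grant(table, "o1", "admin", "m2")
grant(table, "o1", "viewer", "m1")
```

Each query becomes a single prefix scan, but `grant` shows where the consistency risk lives: if the second write fails, the two views of the relationship disagree until something repairs them.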
Has anyone in the community run into a similar problem, and if so, how did you solve it?
Cloud Bigtable doesn't natively support secondary indexes, which is what you would need in order to keep a single row and still run both of those queries efficiently without a full table scan. The alternative, which you've already identified, is to write two rows via a process that ensures eventual consistency. This might be sufficient for your needs, depending on the underlying requirements of your system.
Depending on your constraints (cloud provider, data scale, atomicity, multi-region replication, etc.), you might be better served with a standard relational database (e.g. Postgres, MySQL), or Google Cloud Spanner.
Possible approaches with Spanner to accomplish this:
Have a single table that represents the Member <-> Role relationship. Make RoleID the primary index for the row, then add a secondary index on MemberID, and you'd be able to run queries against either.
Go the traditional relational database route of having Member, Role and MemberRole joining table. With Spanner you should have atomic updates via a Transaction. When querying you could potentially have issues with reads going across multiple splits, but you'd have to do some real world testing to see what your performance would be like.
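The single-table-plus-secondary-index idea can be sketched with Python's built-in `sqlite3` module standing in for a relational engine. This is not Spanner code or Spanner DDL; the table and index names are hypothetical, and including `org_id` in the key is an assumption made here to match the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE member_role (
        org_id    TEXT NOT NULL,
        role_id   TEXT NOT NULL,
        member_id TEXT NOT NULL,
        PRIMARY KEY (org_id, role_id, member_id)
    );
    -- Secondary index so lookups by member do not scan the whole table.
    CREATE INDEX member_role_by_member
        ON member_role (org_id, member_id, role_id);
    """
)

grants = [("o1", "admin", "m1"), ("o1", "admin", "m2"), ("o1", "viewer", "m1")]
# "with conn" wraps the statements in one transaction: all commit or none do.
with conn:
    conn.executemany("INSERT INTO member_role VALUES (?, ?, ?)", grants)


def members_with_role(org_id, role_id):
    rows = conn.execute(
        "SELECT member_id FROM member_role "
        "WHERE org_id = ? AND role_id = ? ORDER BY member_id",
        (org_id, role_id),
    )
    return [r[0] for r in rows]


def roles_of_member(org_id, member_id):
    rows = conn.execute(
        "SELECT role_id FROM member_role "
        "WHERE org_id = ? AND member_id = ? ORDER BY role_id",
        (org_id, member_id),
    )
    return [r[0] for r in rows]
```

Only one row per grant is written, and the database keeps the index in step with it, which is exactly what the two-row Bigtable layout has to do by hand.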
Disclosures:
I lead product management for Cloud Bigtable.
I co-founded the JanusGraph project.
Reading through your problem statement, it sounds like you want to use either a relational database or a graph database. Each one will have its own pros and cons.
Relational DBMS approach
As Dan mentioned in his answer, you can use a managed MySQL or PostgreSQL via Google Cloud SQL, or Google Cloud Spanner, depending on your needs for scale, replication, consistency, compatibility with existing code/frameworks, etc.
Graph database approach
Alternatively, you can use a graph database which can help you model this information easily and query it efficiently.
For example, you can deploy JanusGraph on GKE with Bigtable and Elasticsearch and query the data using the Gremlin language, which is a standard graph traversal/query language supported by many graph databases.
Note that JanusGraph + Bigtable inherits the transactionality of Bigtable (which as you noted, is row-level atomic). Since JanusGraph stores each vertex in a separate row in Bigtable, only single-vertex updates will be atomic. If you want transactional updates via JanusGraph, you may need to use a different storage backend, e.g.,
BerkeleyDB (local, non-distributed storage backend)
FoundationDB (recent contribution by the JanusGraph community)
There are many other graph databases you can consider, some of which also support Gremlin or other graph query languages. For example, you can deploy Neo4j on GCP if you prefer, which supports Gremlin as well as Cypher.
I'm building a web app in GAE that needs to make use of some simple relationships between the datastore entities. Additionally, I want to do what I can from the outset to make import and exportability easier, and to reduce development time to migrate the application to another platform.
I can see two possible ways of handling relationships between entities in the datastore:
Including the key (or ID) of the related entity as a field in the entity
OR
Creating a unique identifier as an application-defined field of an entity to allow other entities to refer to it
The latter is less integrated with GAE, and requires some kind of mechanism to ensure the unique identifier is in fact unique (which in turn will rely on ancestor queries).
However, the latter may make data portability easier. For example, if entities are created on a local machine they can be uploaded (provided the unique identifier is unique) without problem. By contrast, relying on the GAE defined ID will not work as the ID will not be consistent from the development to the deployed environment.
There may be data exportability considerations too that mean an application-defined unique identifier is preferable.
What is the best way of doing this?
GAE's datastore just doesn't export well to SQL. There are often situations where data needs to be modeled very differently on GAE to support certain queries, e.g. many-to-many relationships. Denormalizing is also the right way to support some queries on GAE's datastore. Ancestor relationships are something that doesn't exist in the SQL world.
In order to import and export data, you'll need to write scripts specific to your data models.
If you're planning for compatibility with SQL, use CloudSQL instead of the datastore.
In terms of moving data between dev/production, you've already identified the ways to do it. There's no real "easy" way.
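For what it's worth, the application-defined identifier option from the question could be sketched like this in plain Python. The entity layout (`kind`, `uid`, `author_uid`) and the `new_entity` helper are hypothetical, and JSON is just one possible neutral export format. Random UUIDs are used here so that uniqueness does not depend on a lookup against existing data.

```python
import json
import uuid


def new_entity(kind, **properties):
    """Build an entity dict with an application-defined ID (hypothetical layout)."""
    return {"kind": kind, "uid": uuid.uuid4().hex, **properties}


author = new_entity("Author", name="Ann")
# The relationship is stored as the author's uid, not a platform-assigned key.
book = new_entity("Book", title="Portable Data", author_uid=author["uid"])

# Export to a neutral format, then "import" elsewhere; references still resolve.
exported = json.dumps([author, book])
imported = {e["uid"]: e for e in json.loads(exported)}
resolved_author = imported[book["author_uid"]]
```

Because the reference never mentions a platform key, the same export file can be loaded into the development server, the deployed app, or a different backend.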
I'm working on a project that needs to run on App Engine and on other Java application servers. On App Engine we use the datastore, and in other environments we will use a traditional relational database (mostly MySQL).
I want to know whether it's possible to have one JDO/JPA model that works on both.
If it's possible, how? Specifically, how do we handle the Key? The datastore requires us to use its own Key object or a "Key as encoded string"; how do we port those keys to a relational database?
If not, what would be the best practice? The idea we have right now is to define an abstract DAO and have two sets of DAO implementations. I believe the best way is to use Objectify for the datastore and JPA for the relational database. But that way we could not leverage GWT RequestFactory (another technology we are using). Or can we?
Clearly JDO is designed to work on all datastores, whether RDBMS, ODBMS, document, map-based, web-based, file-based ... blah blah. Yes, such portability is realistic. If you don't want portability you could use Objectify, but you say you want portability, so that's not an option (so no idea why you think it's the "best way"). You can use a String as PK in all datastores.
I don't know about GAE, but I know JDO should be datastore independent, so you can map your classes using JDO annotations and make sure, while you are doing that, that you aren't using any RDBMS-specific extensions (e.g. DataNucleus ones); I'm not sure if there are such extensions in the first place.
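The abstract-DAO idea from the question, combined with a plain string primary key, might look like this in outline. Python is used here only for brevity (the project in question is Java), all class names are hypothetical, the in-memory class merely stands in for a datastore-backed implementation, and `sqlite3` stands in for MySQL.

```python
import sqlite3
from abc import ABC, abstractmethod


class UserDao(ABC):
    """Storage-neutral interface; callers never see backend-specific keys."""

    @abstractmethod
    def save(self, user_id: str, name: str) -> None: ...

    @abstractmethod
    def find(self, user_id: str): ...


class InMemoryUserDao(UserDao):
    """Stand-in for a datastore-backed implementation (hypothetical)."""

    def __init__(self):
        self._rows = {}

    def save(self, user_id, name):
        self._rows[user_id] = name

    def find(self, user_id):
        return self._rows.get(user_id)


class SqlUserDao(UserDao):
    """Relational implementation; sqlite3 stands in for MySQL here."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")

    def save(self, user_id, name):
        with self._conn:  # commit on success, roll back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
                (user_id, name),
            )

    def find(self, user_id):
        row = self._conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


# The calling code is identical for both backends.
results = []
for dao in (InMemoryUserDao(), SqlUserDao()):
    dao.save("user-42", "Ann")
    results.append(dao.find("user-42"))
```

The string ID is the only thing that crosses the interface, so neither backend's native key type leaks into the rest of the application.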
For keys, well obviously you shouldn't use GAE's but again, I'm not sure if it's a must or not.
I find it really hard to match the same "persistence" model on both a relational database and hierarchical database (the datastore here) since most of the time it requires thinking/structuring your data in a different way.
For example, you might need to duplicate data across many entities in order to be able to run queries on it with the datastore.
From the little you said about your project, if you need to have it both on Google App Engine AND on traditional servers (Tomcat, JBoss, WebSphere, whatever...) I would use Google Cloud SQL to keep my data model the same...
Or if you need a hierarchical database in both cases, install an open source one with your "traditional" servers...
What kind of projects are we talking about in the first place ? :)
First off, I come from a RDBMS/SQL/C++/Java/Python background and I'm a newbie
to Gaelyk, the Google API and the Google datastore.
I like to model (using flowcharts for code and DB modeling tools for the database)
before I code.
I've used Erwin heavily in the past to do DB modeling.
In Erwin, I've designed a logical / physical data model of a database I'd like to
implement using the Google datastore and Gaelyk with the Google AppEngine SDK.
I wanted to design the data layout before coding anything.
My design tool of choice has been Erwin Data Modeler.
When I looked at the Google datastore, I saw that there
are no relational constraints, and joins are done via
WHERE clause :bind variables.
How can I map my existing model (with PKs/FKs, dependent entities, heavy relational links) to the Google datastore?
Is there a modeling tool that will allow me to design for the Google datastore?
Is the DB design supposed to flow from the Gaelyk MVC pattern and direct coding?
I'm not used to this as I come from an RDBMS background where you model heavily
and all good things come from good relational design.
Also, before coding a database client app in an imperative language (C++, C, Java, Python),
I like to write pseudocode, BUT first and foremost comes the DB design (if the app
has a DB back-end)
Am I doing this all wrong? It looks like there's a set of tools available to me
to start coding, but the design tool set is not there.
Addendum:
Here is the logical model I'm trying to map
How would I map a circular relationship
account --(1:m)-- following --(m:1)-- following_account_id --(1:1)-- account_id?
In general, the guiding principle of the App Engine datastore - and all nonrelational databases - is "optimize for reads". In short that means denormalize, denormalize, denormalize. In some cases, that will make updates harder - for example, if you make your username the primary key of your accounts table, and a user wants to change usernames - and in some cases that will require duplicating data, such as storing persistent counts. All of this is worthwhile, though, since it gives much better read performance and scalability, and in a typical webapp, reads outnumber writes by factors of hundreds to one.
Looking at your model in particular, it's very normalized - more so than most RDBMS models I've seen, even. Some suggestions:
Roll up things like 'user_name_id' into your main accounts table.
For things like 'following', use a list property if the number of people someone follows is typically small (<1000), or the fan-out pattern otherwise.
Pick a reasonable primary key for each table where practical, such as username or email, and use that as a key name. This allows looking up records with get operations instead of queries, which are substantially faster.
When a lookup table such as 'account type' is necessary, make sure the foreign key is descriptive enough that you only have to look up the corresponding record for administrative actions. Better, store small, infrequently changing details like this outside the datastore, so they can be accessed instantly.
For things like tags, use list properties to reduce the number of times you have to look up related entities, and to make indexing easier.
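A couple of the suggestions above (natural key names, list properties for 'following' and tags) can be pictured with a small in-memory sketch. This is plain Python, not datastore code: the `accounts` dict, the sample data and the helper names are all hypothetical, and the scan in `followers_of` is only a stand-in for what an indexed list property would answer.

```python
# Entities keyed by a natural key name (username), stored as plain dicts.
accounts = {
    "alice": {"email": "alice@example.com", "following": ["bob", "carol"], "tags": ["python"]},
    "bob": {"email": "bob@example.com", "following": ["alice"], "tags": []},
    "carol": {"email": "carol@example.com", "following": [], "tags": ["gae", "python"]},
}


def get_account(username):
    """Direct lookup by key name: a get, not a query."""
    return accounts.get(username)


def following_of(username):
    """The list property lives on the account itself, so this is one get."""
    account = get_account(username)
    return list(account["following"]) if account else []


def followers_of(username):
    """Accounts whose 'following' list contains username.

    This toy version scans every account; a store that indexes each list
    value would answer it from the index instead.
    """
    return sorted(
        name for name, acc in accounts.items() if username in acc["following"]
    )
```

Compared with a normalized 'following' join table, both directions of the relationship are answered without touching a second kind of record.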
This only scratches the surface, of course, and there's a lot of collected wisdom here on SO, in the groups, and on blogs like mine. Feel free to come back and ask specific questions about data modelling!
To answer your other questions, no, there are no GAE-specific data modelling tools I'm aware of, but you can use a standard diagramming tool as you already are. Models are indeed defined in code, since the datastore is schemaless, but that doesn't have to be a barrier to the order in which you implement things.
I know that App Engine is implemented on Bigtable. Can anyone describe the difference between Bigtable itself and Google's use of it in App Engine, i.e. the App Engine datastore?
Bigtable provides a basic key/value store, described in the paper here. Values are stored in rows and columns. Row and column keys are arbitrary byte strings. For more details see the paper. The basic operations Bigtable provides are lookups on individual row and column keys, and ranges of rows.
On top of Bigtable, there's an abstraction layer called Megastore. Megastore uses the Bigtable primitives to construct a more versatile database platform. It adds indexing - using separate Bigtables as indexes - and queries using those indexes. It also adds replication support. It's Megastore that provides most of what we think of as the App Engine datastore, such as composite indexes and the variety of queries the datastore provides.
Finally, App Engine implements a few things of its own on top of Megastore, such as the format of App Engine entity keys, giving each app its own datastore, and implementing certain operations like 'IN' and '!=' in an abstraction layer in each language's SDK.
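To give a feel for how 'IN' and '!=' can live in an SDK-side abstraction layer, here is a simplified sketch in plain Python. It assumes the backend offers only equality and inequality filters, and then builds the two operators from those. The function names and the list-of-dicts "backend" are hypothetical; this is not the SDK's actual code.

```python
def query_eq(entities, prop, value):
    """Primitive assumed to exist in the backend: equality filter."""
    return [e for e in entities if e.get(prop) == value]


def query_range(entities, prop, low=None, high=None):
    """Primitive assumed to exist in the backend: strict inequality filter."""
    return [
        e for e in entities
        if prop in e
        and (low is None or e[prop] > low)
        and (high is None or e[prop] < high)
    ]


def query_in(entities, prop, values):
    """IN as one equality query per value, results merged and de-duplicated."""
    seen, merged = set(), []
    for value in values:
        for e in query_eq(entities, prop, value):
            if e["id"] not in seen:
                seen.add(e["id"])
                merged.append(e)
    return merged


def query_ne(entities, prop, value):
    """!= as two inequality queries: prop < value, then prop > value."""
    return query_range(entities, prop, high=value) + query_range(entities, prop, low=value)


people = [
    {"id": 1, "age": 20},
    {"id": 2, "age": 30},
    {"id": 3, "age": 40},
]
in_ids = [e["id"] for e in query_in(people, "age", [20, 40, 20])]
ne_ids = [e["id"] for e in query_ne(people, "age", 30)]
```

One consequence visible even in the toy version: a single 'IN' or '!=' costs several underlying queries, so it is more expensive than it looks.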