Parent/Child key relationship design with app-engine data store - database

I am implementing a simple expense-reporting application on GAE. The application has several entities (classes) such as Year, Month, Day, Expenses, Account and so on. The picture is as follows: a user creates an Account, then starts declaring expenses through a simple form. The expenses are stored in the GAE Datastore. Every Year has Months, every Month has Days, and every Day has its declared Expenses. The problem is that I don't know how to arrange these entities in GAE's non-relational database. I have read several tutorials and articles on the Google Developers website, but I still don't understand the concept of parent/child relationships and entity groups. Can anyone point me to tutorials, videos, articles or books on how to design relationships and store entities in a non-relational database like the GAE Datastore? Thanks in advance. I forgot to mention that I would like to use the GAE low-level Datastore API.

If you are using Java, I would suggest using Objectify. It's just so much easier than JPA, for me at least.
You pay per read and write, so if, for instance, all of the data for a month fits in 1 MB (roughly the maximum size of a single Datastore entity), I would not have a separate entity for each day. Also, I don't understand from your requirements why Year has to be an entity rather than just a property that you filter by. I would actually think about having just a Day entity with Year and Month properties to filter by.
http://code.google.com/p/objectify-appengine/wiki/IntroductionToObjectify#Relationships
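
The "Day entity with Year and Month properties" idea can be sketched in plain Python. This is only an illustration of the shape of the data, not the App Engine SDK or Objectify API: the `Day` class, the `key_path` helper and the sample data are all hypothetical names chosen for the example. The key path mirrors how a Datastore key names a child through its parent (here the Account is the parent of each Day).

```python
from dataclasses import dataclass, field


@dataclass
class Day:
    """One entity per calendar day; year and month are plain properties."""
    account_id: str
    year: int
    month: int
    day: int
    # (description, amount_in_cents) pairs stored directly on the Day entity
    expenses: list = field(default_factory=list)

    def key_path(self):
        # Hypothetical helper: an ancestor-style key, parent Account -> child Day.
        date_name = f"{self.year:04d}-{self.month:02d}-{self.day:02d}"
        return ("Account", self.account_id, "Day", date_name)


def days_in_month(days, account_id, year, month):
    """Filter by properties instead of walking Year -> Month -> Day entities."""
    return [
        d for d in days
        if d.account_id == account_id and d.year == year and d.month == month
    ]


def month_total(days, account_id, year, month):
    """Sum every expense amount declared in the given month."""
    matching = days_in_month(days, account_id, year, month)
    return sum(amount for d in matching for _, amount in d.expenses)


# Stand-in for the datastore: a plain list of Day entities.
store = [
    Day("alice", 2012, 3, 1, [("coffee", 350), ("lunch", 1200)]),
    Day("alice", 2012, 3, 2, [("train", 2400)]),
    Day("alice", 2012, 4, 1, [("book", 1500)]),
]
```

With this layout a monthly report is one filtered query over Day entities, and Year and Month never need entities of their own.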

In MongoDB you would have "embedded" documents. I don't know whether GAE is as evolved as MongoDB; I suspect not. Perhaps you should look at another, better-documented NoSQL database if you are having problems with the documentation at this stage. I'd have a look at the MongoDB site anyway: if you have a background in SQL, you can see the mapping in terminology between the two cultures. Bear in mind that NoSQL databases generally offer weaker transactional guarantees than a relational database (the GAE Datastore, for instance, scopes transactions to entity groups), so when the app grows to track expense payments, there may be some hard-to-solve issues later.

Related

How to design Firebase database for Booking app?

I am new to databases. For learning purposes I'm creating a simple booking app (with React-Redux) for booking tennis courts. First you choose a day and time, then you see exactly which courts (the courts differ from one another) are available for that date and time.
I have read the Firebase docs and answers to similar questions on SO, but I'm still confused.
Could you tell me how I can structure my Firebase database and query only the available courts for each date and time, or at least what I should study to be able to do it myself?
Your question does not necessarily have anything to do with the Firebase platform.
Since you mentioned that you are new to working with databases in general, it is worth searching for "How to design a schemaless/NoSQL database". There is a lot of material online for you to read. Firebase offers two separate database solutions:
Firestore
Realtime Database
and both of them are NoSQL databases.
Here is a list of things I think you have to do to get up to speed:
Understand what schemaless databases are and how they differ from SQL databases.
Come up with a structure for your data that properly reflects how data flows through your application and how often each part is used, including indexes and so on.
Pick whichever of the two Firebase databases above suits you best, once you understand how they differ from each other.
The rest is a piece of cake!
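
As one possible starting point for the second step, here is a minimal sketch of a structure that answers "which courts are free at this date and time" with a single lookup. It uses plain Python dictionaries in place of a Firebase tree; the node names (`COURTS`, `bookings`), the ids and the helper functions are all hypothetical and not taken from the question or from any Firebase API.

```python
# Static description of each court, keyed by a hypothetical court id.
COURTS = {
    "court-1": {"surface": "clay"},
    "court-2": {"surface": "grass"},
    "court-3": {"surface": "hard"},
}

# bookings[date][time_slot] -> {court_id: user_id}
# Keying by date and slot first matches the way the app reads the data.
bookings = {
    "2024-06-01": {
        "10:00": {"court-1": "user-a"},
        "11:00": {"court-1": "user-b", "court-3": "user-c"},
    }
}


def available_courts(date, slot):
    """Return the ids of courts that have no booking for this date and slot."""
    taken = bookings.get(date, {}).get(slot, {})
    return sorted(court for court in COURTS if court not in taken)


def book(date, slot, court_id, user_id):
    """Record a booking; return False if the court is already taken."""
    slot_bookings = bookings.setdefault(date, {}).setdefault(slot, {})
    if court_id in slot_bookings:
        return False
    slot_bookings[court_id] = user_id
    return True
```

The design choice is to shape the data around the query the app actually makes (date, then time slot), which is the usual advice for schemaless stores, rather than around normalized tables.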

Database structure for multi-users web application

I'm undertaking a project for learning purposes. Since its topic is compelling to me, I want to build good foundations and maybe eventually put it live.
Since my project is quite complex, I'm going to explain my question using a fictional project: an agenda application.
This web application will have a calendar where the user can add events and reminders. It will be used by, let's say, 10,000 users, and those 10,000 users will add thousands of events and reminders.
My question is: which of the following two database structures would you recommend?
Should I create a separate database with reminders and events tables for each user (on user creation) and relate each database to a user in a separate database,
or should I make one table for events, one for reminders and one for users, and relate them to one another in a single database?
I haven't built any multi-user web applications so far and I am not familiar with how databases are structured when there are many users. If there are any design patterns you can think of, I would appreciate you sharing them :)
Here's my opinion:
No, you should not create a separate database for each user. It can't scale. It means that every time you add a user, you have to create a new database? Never.
One database, multiple users: that's what relational databases are made for.
10,000 users is not that large an audience. Each creating thousands of events and reminders would mean 10M events, 10M reminders. That's not considered a large relational database.
You may need to worry about partitioning and purging old records. What kind of policy will you have in place for keeping those events and reminders? What access will users have after a year? Five years? Ten years? Those would be good topics to think about, too.
Get a good book about entity/relationship modeling and read it carefully. Anything modern on Amazon will do.
I used to work with a system where each user's data was held in a separate database (your option 1), and believe me, it was a nightmare to work with. The company spent an enormous amount of resources consolidating all these databases into a single one, and it was not an easy task.
As @duffymo stated, one database with multiple users is what relational databases are for.
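
The single-database layout recommended in the answers above could look roughly like this. It is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the table and column names are assumptions made for the example, not something specified in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

# One users table; events and reminders each point back to their owner.
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE events (
    id        INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(id),
    title     TEXT NOT NULL,
    starts_at TEXT NOT NULL
);
CREATE TABLE reminders (
    id        INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(id),
    note      TEXT NOT NULL,
    remind_at TEXT NOT NULL
);
-- Indexes on the owner column keep per-user lookups fast as the tables grow.
CREATE INDEX idx_events_user ON events(user_id, starts_at);
CREATE INDEX idx_reminders_user ON reminders(user_id, remind_at);
""")

conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)
conn.executemany(
    "INSERT INTO events (user_id, title, starts_at) VALUES (?, ?, ?)",
    [
        (1, "Team meeting", "2024-06-03T09:00"),
        (1, "Dentist", "2024-06-01T15:00"),
        (2, "Flight", "2024-06-02T07:30"),
    ],
)


def events_for(user_id):
    """Return one user's event titles in chronological order."""
    rows = conn.execute(
        "SELECT title FROM events WHERE user_id = ? ORDER BY starts_at",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Adding a user is then a single `INSERT` into `users`; no new database or table has to be created.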

Graph Database Design Methodologies

I want to use a graph database for a web application (involving a web of Users, Posts, Comments, Votes, Answers, Documents and Document-Merges, plus some other transitive relationships on Users and Documents). So I started asking myself whether there is something like a design methodology for graph databases, i.e. an analogue of the design principles recommended for relational databases (like the normal forms).
Example questions (of the many that arise):
Is it a good idea to create a top node "Users" that has a relationship ("exist") to every User node in the database?
Is it a good idea to build in version management, i.e. create relationships (something like "follows") pointing to updated versions of a Document or Post, so that traversing this relationship backwards shows the changes the document went through?
etc...
So, do we need a Graph Database Design Cookbook?
The Gremlin User Group (http://tinkerpop.com/) and Neo4j User Group (https://groups.google.com/forum/?fromgroups#!forum/neo4j) are good places to discuss graph-database modeling.
You can create supernodes such as "Users", but it may be better and more performant to use indexes and create an index entry for each user with key=element_type, value="user", id=user_node_id.
A "follows" relation is often used for people/friends, as on Facebook and Twitter, so I wouldn't use that name for versioning. You can build a versioning system into Neo4j that timestamps each entry and uses a last-write-wins algorithm, and there are other database systems, such as Datomic, that have this built in.
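
To make the index-entry idea concrete, here is a toy sketch in plain Python. It is not the Neo4j, Gremlin or Bulbs API; `TinyGraph` and its methods are hypothetical names, and the only point is that a (key, value) index finds all users without any edges hanging off a "Users" supernode.

```python
from collections import defaultdict


class TinyGraph:
    """Toy node store with a (key, value) -> node ids index."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties
        self.index = defaultdict(set)   # (key, value) -> set of node ids
        self._next_id = 1

    def add_node(self, element_type, **props):
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = {"element_type": element_type, **props}
        # Index entry: key=element_type, value=<type>, id=node_id
        self.index[("element_type", element_type)].add(node_id)
        return node_id

    def lookup(self, key, value):
        """Return the ids of all nodes indexed under (key, value)."""
        return sorted(self.index.get((key, value), set()))


graph = TinyGraph()
ann = graph.add_node("user", name="ann")
post = graph.add_node("post", title="hello")
bob = graph.add_node("user", name="bob")
```

Here `graph.lookup("element_type", "user")` returns the two user ids directly, whereas a supernode would carry one relationship per user and become a hot spot as the user count grows.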
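
A timestamped, last-write-wins scheme can be sketched independently of any particular graph database. The following is a minimal in-memory illustration in plain Python; `VersionedStore` and its methods are hypothetical names, and timestamps are passed in explicitly so the behaviour is deterministic.

```python
class VersionedStore:
    """Keeps every version of a document; the latest timestamp wins."""

    def __init__(self):
        self.history = {}  # doc id -> list of (timestamp, content), oldest first

    def write(self, doc_id, content, timestamp):
        versions = self.history.setdefault(doc_id, [])
        versions.append((timestamp, content))
        # Writes may arrive out of order, so keep the list sorted by timestamp.
        versions.sort(key=lambda version: version[0])

    def current(self, doc_id):
        """Last-write-wins: the version with the highest timestamp."""
        return self.history[doc_id][-1][1]

    def versions(self, doc_id):
        """All contents in chronological order, for walking back through edits."""
        return [content for _, content in self.history[doc_id]]


docs = VersionedStore()
docs.write("doc-1", "first draft", timestamp=100)
docs.write("doc-1", "third draft", timestamp=300)
docs.write("doc-1", "second draft", timestamp=200)  # late arrival
```

In a graph database the same idea would typically be expressed as version nodes carrying a timestamp property, linked by a dedicated relationship type rather than "follows".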
See Lightbulb's model (https://github.com/espeed/lightbulb/blob/master/lightbulb/model.py) for an example blog model in Bulbs/Python (http://bulbflow.com).

Architecture for database analytics

We have an architecture where we provide each customer (an internet merchant) with Business Intelligence-like services for their website. Now I need to analyze that data internally (for algorithmic improvement, performance tracking, etc.), and the volumes are potentially quite heavy: we have up to millions of rows per customer per day, and I may want to know how many queries we had in the last month, compared week by week, and so on. That is on the order of billions of entries, if not more.
The way it is currently done is quite standard: daily scripts scan the databases and generate big CSV files. I don't like this solution for several reasons:
as is typical with those kinds of scripts, they fall into the write-once, never-touched-again category
tracking things in "real time" is necessary (we have a separate toolset to query the last few hours at the moment)
it is slow and not "agile"
Although I have some experience dealing with huge datasets for scientific use, I am a complete beginner as far as traditional RDBMSs go. It seems that using a column-oriented database for analytics could be a solution (the analytics don't need most of the data we have in the app database), but I would like to know what other options are available for this kind of problem.
You will want to google "star schema". The basic idea is to model a special data warehouse / OLAP instance of your existing OLTP system in a way that is optimized to provide the type of aggregations you describe. This instance will consist of facts and dimensions.
In a typical example, sales 'facts' are modeled to provide analytics based on customer, store, product, time and other 'dimensions'.
You will find Microsoft's Adventure Works sample databases instructive, in that they provide both the OLTP and OLAP schemas along with representative data.
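
A very small star schema for the "queries per week" question might look like the following sketch. It uses Python's built-in `sqlite3` module purely for illustration; the table names (`fact_queries`, `dim_customer`, `dim_date`), the columns and the sample rows are assumptions made up for the example, and a real warehouse would hold far more dimensions and data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive attributes to slice by.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    day     TEXT,
    week    INTEGER,
    month   TEXT
);
-- Fact table: one row per customer per day, holding the measure.
CREATE TABLE fact_queries (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    query_count INTEGER
);

INSERT INTO dim_customer VALUES (1, 'shop-a'), (2, 'shop-b');
INSERT INTO dim_date VALUES
    (1, '2024-05-06', 19, '2024-05'),
    (2, '2024-05-07', 19, '2024-05'),
    (3, '2024-05-13', 20, '2024-05');
INSERT INTO fact_queries VALUES
    (1, 1, 1000), (1, 2, 1500), (1, 3, 900),
    (2, 1, 300),  (2, 3, 700);
""")


def queries_per_week():
    """Aggregate the fact table along the date dimension."""
    rows = conn.execute("""
        SELECT d.week, SUM(f.query_count)
        FROM fact_queries AS f
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY d.week
        ORDER BY d.week
    """)
    return rows.fetchall()


def queries_per_customer():
    """The same facts, sliced by the customer dimension instead."""
    rows = conn.execute("""
        SELECT c.name, SUM(f.query_count)
        FROM fact_queries AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """)
    return rows.fetchall()
```

The same fact table answers both questions; only the dimension joined and grouped on changes, which is what makes the layout convenient for ad hoc aggregation.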
There are specialized databases for analytics, such as Greenplum, Aster Data, Vertica, Netezza, Infobright and others. You can read about them on this site: http://www.dbms2.com/
The canonical handbook on star-schema-style data warehouses is Ralph Kimball's "The Data Warehouse Toolkit". There is also "Clickstream Data Warehousing" in the same series, but that is from 2002, I think, and somewhat dated; if there is a newer edition of the Kimball book, it might serve you better. If you google for "web analytics data warehouse", there are a bunch of sample schemas available to download and study.
On the other hand, a lot of the NoSQL work that happens in real life is based around mining clickstream data, so it might be worth seeing what the Hadoop/Cassandra/[latest-cool-thing] community has in the way of case studies, to see whether your use case matches well with what they can do.

What is couchdb, for what and how should I use it?

I hear a lot about couchdb, but after reading some documents about it, I still don't get why to use it and how.
Could you clarify this mystery for me?
It's a non-relational database, open-source, distributed (incremental, bidirectional replication), schema-free. A CouchDB database is a collection of documents; each document is a bunch of string "keys" and corresponding "values" (which can be numbers, strings, lists, dates, ...). You can have indices, queries, views.
If a relational DB feels confining to you (you find schemas too rigid, you can't spread the DB engine's work across a very large number of servers, etc.), CouchDB is worth considering (it's one of the most interesting of the many non-relational DBs that are emerging these days).
But if all of your work happily fits in a relational database, that's what you probably want to continue using for production work (even though "playing around" with some non-relational DB is still well worth your time, just for personal growth and edification, that's quite different from transferring huge production systems over from a relational DB!-).
It sounds like you should be reading "Why CouchDB".
To quote from Wikipedia:
It is not a relational database management system. Instead of storing data in rows and columns, the database manages a collection of JSON documents. The documents in a collection need not share a schema, but retain query abilities via views.
CouchDB provides a different model for data storage than a traditional relational database: it does not represent data as rows within tables, but stores data as "documents" in JSON format.
This difference in data storage model is what differentiates CouchDB from products like MySQL and SQL Server.
In terms of programmatic access, CouchDB exposes a REST API, which you can use by sending HTTP requests from your code.
I hope this has been somewhat helpful, though I acknowledge it may not be, given my minimal familiarity with the product.
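
As a rough illustration of what "sending HTTP requests from your code" means, the sketch below builds (but does not send) the requests for storing and fetching a document, using only Python's standard library. The base URL assumes a local CouchDB on its default port 5984; the database name `people`, the document id and the helper function names are hypothetical, and a real server may also require authentication.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB, default port


def build_put_document(db, doc_id, doc):
    """Build the PUT request that creates or updates one JSON document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_get_document(db, doc_id):
    """Build the GET request that reads the document back."""
    return urllib.request.Request(url=f"{BASE_URL}/{db}/{doc_id}", method="GET")


put_request = build_put_document(
    "people", "alice", {"name": "Alice", "languages": ["en", "fr"]}
)
get_request = build_get_document("people", "alice")

# Sending is left out on purpose because it needs a running server:
#     with urllib.request.urlopen(put_request) as response:
#         print(response.read())
```

Because the interface is plain HTTP and JSON, the same requests can be issued from any language or from a command-line HTTP client.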
I'm far from an expert (all I've done is play around with it some...), but here's how I'm thinking of using it:
Usually when I'm designing an app I've got a bunch of app servers behind a load balancer. Oftentimes I've got sticky sessions, so that each user goes back to the same app server during that session. What I'm thinking of doing is having a CouchDB instance tied to each app server.
That way you can use that local CouchDB to access user preferences, product data... whatever data you've got that doesn't have to be perfectly up to date.
So... now you've got data on these local CouchDBs. CouchDB supports replication, so at a fixed interval (every X seconds?) each instance merges its data into its peers to keep them up to date.
On the whole you shouldn't have to worry about conflicts, because each app server has its own CouchDB and users are attached to an app server, and you've got eventual consistency because you've got replication.
Does that answer your question?
A good example is when you have to deal with, say, people data in a website or application. If you set out to design the data so that each individual's information is kept separate, that makes a good case for CouchDB, which stores data in documents rather than relational tables. In a production deployment, my users may end up adding ad hoc data for about 10% of the people and some other odd details for another 5%. In a relational context this could add up to loads of redundancy, but not in CouchDB.
And it's not just about the fact that CouchDB is non-relational: if you're too focused on that, you're missing the point. CouchDB is plugged into the web. All you need to get started is HTTP for creating documents and making queries (GET/PUT/POST/DELETE...), and it's RESTful, plus it's portable and great for peer-to-peer sharing. It can also serve web applications, in what are termed 'CouchApps', where CouchDB itself holds the images, CSS and markup as data stored in special documents called design documents.
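
The point about ad hoc fields can be illustrated with a few JSON-like documents, written here as plain Python dictionaries. The field names and the two helper functions are made up for the example; the second helper simply counts how many empty cells a single wide relational table would need to hold the same data.

```python
# Each person is one self-contained document; only some carry extra fields.
people = [
    {"_id": "p1", "name": "Ann", "email": "ann@example.com"},
    {"_id": "p2", "name": "Bob", "email": "bob@example.com", "nickname": "Bobby"},
    {"_id": "p3", "name": "Cy", "shoe_size": 44, "allergies": ["nuts"]},
]

CORE_FIELDS = {"_id", "name", "email"}


def adhoc_fields(doc):
    """Fields a user added to this document beyond the common core."""
    return sorted(set(doc) - CORE_FIELDS)


def relational_null_cells(docs):
    """Empty cells needed if every field became a column of one wide table."""
    columns = set().union(*(doc.keys() for doc in docs))
    return sum(len(columns) - len(doc) for doc in docs)
```

In the document model each record stores only the fields it actually has, so the rarely used extras cost nothing for the people who lack them.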
Check out this collection of videos introducing non-relational databases, the one on CouchDB should give you a better idea.

Resources