What is the suitable db for bulk writes - google-app-engine

My application is currently on app engine server. My application writes the records(for logging and reporting) continuously.
Scenario: Views count in the website. When we open the website it hits the server to add the record with time and type of view. Showing these counts in the users dashboard.
Seems these requests are huge now. For now 40/sec. Google App Engine writes are going heavy and cost is increasing like anything.
Is there any way to reduce this or any other db to log the views?

Google App Engine's Datastore is NOT suitable for such a requirement where you have to continuously write to datastore and read less often.
You need to offload this task to a third party service (either you write one or use existing one)
Better option for user tracking and analytics is Google Analytics (Although you wont be directly able to show the hit counters on website using analytics).
If you want to show your user page hit count use a page hit counter: https://www.google.com/search?q=hit+counter

In this case you should avoid Datastore.
For this kind of analytics it's best to do the following:
Dump data to GAE log (yes, this sounds counter-intuitive, but it's actually advice from google engineers). GAE log is persistent and is guaranteed to not loose data you write to it.
Periodically parse the log for your data and then export it to BigQuery.
BigQuery has a quite powerful query language so it's capable of doing complex analytics reports.
Luckily this was already done before: see the Mache framework. Also see related video.
Note: there is now a new BigQuery feature called streaming inserts, which could potentially replace the cumbersome middle step (files on Cloud Storage) used in Mache.

Related

Stackdriver vs ELK for app engine

Im a little confused about this because the docs say I can use stackdriver for "Request logs and application logs for App Engine applications" so does that mean like web requests? Does that mean like millions of web requests?
Stackdriver's pricing is per resource so does that mean I can log all of my web servers web request logs (which would be HUGE) for no extra cost (meaning I would not be charged by the volume of storage the logs use)?
Does stackdriver use GCP cloud storage as a backend and do I have to pay for the storage? It just looks like I can get hundreds of gigabytes of log aggregation for virtually no money just want to make sure Im understanding this.
I bring up ELK because elastic just partnered with google so it must not do everything elasticsearch does (for almost no money) otherwise it would be a competitor?
Things definitely seem to be moving quickly at Google's cloud division and documentation does seem to suffer a bit.
Having said that, the document you linked to also details the limitations -
The request and application logs for your app are collected by a Cloud
Logging agent and are kept for a maximum of 90 days, up to a maximum
size of 1GB. If you want to store your logs for a longer period or
store a larger size than 1GB, you can export your logs to Cloud
Storage. You can also export your logs to BigQuery and Pub/Sub for
further processing.
It should work out of the box for small to medium sized projects. The built in log viewer is also pretty basic.
From your description, it sounds like you may have specific needs, so you should not assume this will be free. You should factor in costs for Cloud Storage for the logs you want to retain and BigQuery depending on your needs to crunch the logs.

Daily change each page on google app engine

I put one app on google app engine.
My app has one cronjob which is parse data from Internet and store into my db.
When user using my app, it will extract data from db, and show data to users.
I found that is too time consuming and too many request from db.
I want to revise each page when the cronjob running daily.
Then user can see the page without query my database.
How can I do that in GAE ?
Thank you for your reply.
Not nearly enough info in the question to help you. For example, what does "too many request from db" mean? Is it because you have a lot of traffic? Or you are querying the db too much?
Possible solutions are:
Edge cache your page: https://groups.google.com/forum/#!topic/google-appengine/6xAV2Q5x8AU/discussion
Store your page in memcache.
Optimize your database accesses. Most likely you're doing something very inefficiently.
Use a cronjob to generate your page, store it in the blobstore, and redirect your fetches to the blobstore. You can do this, but this is a pretty dumb way to go about it, given that there are better options.
I'm afraid it's a limitation of GAE, according to this post, no matter how much caching and tricky solutions you find. They just worsen google's policies.
You can't have what you need if you don't write directly inside the html file and store it on the server every time, which is far more resource consuming and, in my opinion, just pointless. Since GAE is a free service, with the purpose of testing, you should acquaint with what you have, or pay.

Exporting/Extracting data from datastore

New in GAE development and have some question regarding extracting data.
I have an app that collects data from end users and data is stored in high availability datastore, and there is a need to send subset of data the app collected to business partners on a regular basis.
Here are my questions,
1. How can I backup data in the datastore on a regular basis, say daily incremental backup and weekly full backup?
2. what are the best practices to generate daily data dump files that can be downloaded or send to my partners in a secured approach. I expect few hundred MB data files everyday and eventually will be in few GB range.
3. Can my business partners be authenticated though basic HTTP auth, or have to use OAuth?
Google is in effect backing up your data by storing it in multiple data centres.
You can however use the bulk loader if desired and back it up manually:
Uploading and Downloading Data
You can authenticate users however you choose, it's totally up to you. The "users" service is integrated into app engine directly however so if everybody has or could have google accounts that's even easier for you to use.
The users service
Due to the size of your files unless you want to piece them together from the datastore you'll have to use something else, as the datastore has a 1MB limit per model. It's perfectly possible to do that however.
But you should probably look at The Google Cloud Storage API instead as there are no file size limits.

How can I export data from Google App Engine High Replication datastore?

I am looking into using Google App Engine for a project and would like make sure I have a way to export all my data if I ever decide to leave GAE (or GAE shuts down).
Everything I search about exporting data from GAE points to https://developers.google.com/appengine/docs/python/tools/uploadingdata. However, that page contains this note:
Note: This document applies to apps that use the master/slave
datastore. If your app uses the High Replication datastore, it is
possible to copy data from the app, but Google does not currently
support this use case. If you attempt to copy from a High Replication
datastore, you'll see a high_replication_warning error in the Admin
Console, and the downloaded data might not include recently saved
entities.
The problem is that recently the master/slave datastore was recently deprecated in favor of the High Replication datastore. I understand that the master/slave datastore is still supported for a little while, but I don't feel comfortable using something that has officially been deprecated and is on its way out. So that leaves me with the High Replication datastore and the only way it seems to export the data is the method above that is not officially supported (and thus does not provide me with a guarantee that I can get my data out).
Is there any other (officially supported) way of exporting data from the High Replication datastore? I don't feel comfortable using Google App Engine if it means my data could be locked in there forever.
It took me quite a long time to setup the download of data from GAE as the documentation is not as clear as it should be.
If you extracting data from a Unix server, you maybe could reuse the script below.
Also, if you do not provide the "config_file" parameter, it will extract all your data for this kind but in a proprietary format which can only be used for restoring data afterwards.
#!/bin/sh
#------------------------------------------------------------------
#-- Param 1 : Namespace
#-- Param 2 : Kind (table id)
#-- Param 3 : Directory in which the csv file should be stored
#-- Param 4 : output file name
#------------------------------------------------------------------
appcfg.py download_data --secure --email=$BACKUP_USERID -- config_file=configClientExtract.yml --filename=$3/$4.csv --kind=$2 --url=$BACKUP_WEBSITE/remote_api --namespace=$1 --passin <<-EOF $BACKUP_PASSWORD EOF
Currently app engine datastore supports another option also. Data backup provision can be used to copy selected data into blob store or google cloud storage. This function is available under datastore admin area in app engine console. If required, the backed up data can then be downloaded from the blob viewer or cloud storage. For doing the backup for high replication datastore, it is recommended that datastore writes are disabled before taking the backup.
You need to configure a builtin called remote_api. This article has all the information and guide you need to be able to download all your data today and in the future.

Replicating data from GAE data store

We have an application that we're deploying on GAE. I've been tasked with coming up with options for replicating the data that we're storing the the GAE data store to a system running in Amazon's cloud.
Ideally we could do this without having to transfer the entire data store on every sync. The replication does not need to be in anything close to real time, so something like a once or twice a day sync would work just fine.
Can anyone with some experience with GAE help me out here with what the options might be? So far I've come up with:
Use the Google provided bulkloader.py to export the data to CSV and somehow transfer the CSV to Amazon and process there
Create a Java app that runs on GAE, reads the data from the data store and sends the data to another Java app running on Amazon.
Do those options work? What would be the gotchas with those? What other options are there?
You could use a logic similar to what App Engine HRD migration or backup tool are doing:
Mark modified entities with a child entity marker
Run a MapperPipeline using App Engine mapreduce library iterating on those entity using a Datastore Input Reader
In your map function fetch the parent entity and serialize it to Google Storage using a File Output Writer and remove the marker
Ping the remote host to import those entity from the Google Storage url
As an alternative to 3 and 4, you could make multiple urlfetch(POST) to send each serialized entity to the remote host directly, but it is more fragile as an single failure could compromise the integrity of your data import.
You could look at the datastore admin source code for inspiration.

Resources