On the app engine development server why does the "Applying all pending transactions and saving the datastore" step take several seconds? - google-app-engine

It seems like something that should take place in a flash, at least locally. Is there some time emulation of the production server that causes it?

It depends how much data is saved in your local datastore and how fast your disk is. I've noticed if you have a lot of deleted data in your datastore, the file still ends up being big.
If you want to clear the database, it's better to delete the whole file than delete all the individual entities.

Related

Google appengine, least expensive way to run heavy datastore write cron job?

I have a Google appengine application, written in Go, that has a cron process which runs once a day at 3am. This process looks at all of the changes that have happened to my data during the day and stores some meta data about what happened. My users can run reports on this meta data to see trends that have happened over several months. The process does around 10-20 million datastore writes every night. It all works just fine, but since I have started running it I have noticed a significant increase in my monthly bill from Google (from around $50/month to around $400/month).
I have just setup a very basic taskqueue that this runs in, I have not changed the default settings at all. Is there a better way that I could be running this process at night that could save me money? I have never messed around with the backends (which are now depreciated) or the modules api, and I know they've changed a lot of this stuff recently so I'm not sure where to start looking. Any advice would be greatly appreciated.
Look at your instances at 3am. It might be that GAE spins up a lot of them to handle the job. You could configure your job to make it run less paralel so it will take longer but perhaps it will need only 1 instance then.
However, if your database writes are indeed the biggest factor this won't make a big impact.
You can try looking at your data models and indexes. Remember that each indexed field costs 2 writes extra, so see if you can remove indexes from some fields if you don't need them.
One improvement that you can do is to batch your write operations, you can use memcache for this (pay the dedicated one since it's more reliable). Write the updates to memcache, once it's about 900K, flush it to datastore. This will reduces the number of write to datastore A LOT, especially if your metadata's size is small.

Database time acces in Heroku with Play Framework

I am having a problem and I need your help.
I am working with Play Framework v1.2.4 in java, and my server is uploaded in the Heroku servers.
All works fine, I can access to my databases and all is ok, but I am experiment troubles when I do a couple of saves to the database.
I have a method who store data many times in the database and return a notification to a mobile phone. My problem is that the notification arrives before the database finish to save the data, because when it arrives I request for the update data to the server, and it returns the data without the last update. After a few seconds I have trying to update again, and the data shows correctly, therefore I think there is a time-access problem.
The idea would be that when the databases end to save the data, the server send the notification.
I dont know if this is caused because I am using the free version of the Heroku Servers, but I want to be sure before purchasing it.
In general all requests to cloud databases are always slower than the same working on your local machine. Even simply query that on your computer needs just 0.0001 sec can be as slow as 0.5 sec in the cloud. Reason is simple clouds providers uses shared databases + (geo) replications, which just... cannot be compared to the database accessed only by one program on the same machine.
Also keep in mind that free Heroku DB plans doesn't offer ANY database cache, which means that every query is fetched from the cloud directly.
As we don't know your application it's hard to say what is the bottleneck anyway almost for sure you have at least 3 ways to solve your problem. They are not an alternatives, probably you will need to use (or at least check) all of them.
You need to risk some basic plan and see how things changed with paid version, maybe it will be good enough for you, maybe not.
Redesign your application to make less queries. For an example instead sending 10 queries to select 10 different rows, you will need to send one query, which selects all 10 records at once.
Use Play's cache API to avoid repeating selecting the same set of data again and again. For an example, if you have some categories, which changes rarely, but you need category tree for each article, you don't need to fetch categories from DB every time, instead you can store a List of categories in cache, so you will need to use only one request to fetch article's content (which can be cached for some short time as well...)

Caching database table on a high-performance application

I have a high-performance application I'm considering making distributed (using rabbitMQ as the MQ). The application uses a database (currently SQLServer, but I can still switch to something else) and caches most of it in the RAM to increase performance.
This causes a problem because when one of the applications writes to the database, the others' cached database becomes out-of-date.
I figured it is something that happens a lot in the High-Availability community, however I couldn't find anything useful. I guess I'm not searching for the right thing.
Is there an out-of-the-box solution?
PS: I'm sorry if this belongs to serverfault - Since this a development issue I figured it belongs here
EDIT:
The application reads and writes to the database. Since I'm changing the application to be distributed - Now more than one application reads and writes to the database. The caching is done in each of the distributed applications, which are not aware to DB changes from another application.
I mean - How can one know if the DB was updated, if he wasn't the one to update it?
So you have one database and many applications on various servers. Each application has its own cache and all the applications are reading and writing to the database.
Look at a distributed cache instead of caching locally. Check out memcached or AppFabric. I've had success using AppFabric to cache things in a Microsoft stack. You can simply add new nodes to AppFabric and it will automatically distribute the objects for high availability.
If you move to a shared cache, then you can put expiration times on objects in the cache. Try to resist the temptation to proactively evict items when things change. It becomes a very difficult problem.
I would recommend isolating your critical items and only cache them once. As an example, when working on an auction site, we cached very aggressively. We only cached an auction listing's price once. That way when someone else bid on it, we only had to do one eviction. We didn't have to go through the entire cache and ask "Where does the price appear? Change it!"
For 95% of your data, the reads will expire on their own and writes won't affect them immediately. 5% of your data needs to be evicted when a new write comes in. This is what I called your "critical items". Things that always need to be up to date.
Hope that gives you ideas!

My redis server "crashed" and I lost all my data because it's all in memory?

I realize that all my data is gone when I log in... KEYS * shows nothing.
Luckily, I'm doing this in dev server.
What am I supposed to do if this happens in the future on production?
Am I supposed to back it up every second?
You can find a number of answers/options here:
http://redis.io/topics/persistence
From what I could gather, you should:
Configure your server instance to periodically persist its data to file every 5 minutes or so. That way at most you will lose a few minutes of data if the server goes down.
Configure your server instance to write an AOF redo log (append-only-file). You have various options to favor durability or performance.
Add at least one additional server and use that for replication. That way you will only ever lose any data if both/all of the servers go down simultaneously.
Redis is not the most durable way to store your data. With journaling mode your data is written to disk but you could still lose some data in the event of a crash.
Are you sure you have picked up the right solution for your service? Seems like you need something else than redis?
See also this; Is redis a durable datastore?

What's the best way to send pictures to a browser when they have to be stored as blobs in a database?

I have an existing database containing some pictures in blob fields. For a web application I have to display them.
What's the best way to do that, considering stress on the server and maintenance and coding efforts.
I can think of the following:
"Cache" the blobs to external files and send the files to the browser.
Read them from directly the database every time it's requested.
Some additionals facts:
I cannot change the database and get rid of the blobs alltogether and only save file references in the database (Like in the good ol' Access days), because the database is used by another application which actually requires the blobs.
The images change rarely, i.e. if an image is in the database it mostly stays that way forever.
There'll be many read accesses to the pictures, 10-100 pictures per view will be displayed. (Depending on the user's settings)
The pictures are relativly small, < 500 KB
I would suggest a combination of the two ideas of yours: The first time the item is requested read it from the database. But afterwards make sure they are cached by something like Squid so you don't have to retrieve the every time they are requested.
one improtant thing is to use proper HTTP cache control ... that is, setting expiration dates properly, responding to HEAD requests correctly (not all plattforms/webservers allow that) ...
caching thos blobs to the file system makes sense to me ... even more so, if the DB is running on another server ... but even if not, i think a simple file access is a lot cheaper, than piping the data through a local socket ... if you did cache the DB to the file system, you could most probably configure any webserver to do a good cache control for you ... if it's not the case already, you should maybe request, that there is a field to indicate the last update of the image, to make your cache more efficient ...
greetz
back2dos

Resources