What should I do when Heroku clears my database after deployment? [duplicate]

I have a small Node.js / Express app deployed to Heroku.
I'd like to use a lightweight database like NeDB to persist some data. Is it possible to periodically back up / copy a file from Heroku if I use this approach?

File-based databases aren't a good fit for Heroku due to its ephemeral filesystem:
Each dyno gets its own ephemeral filesystem, with a fresh copy of the most recently deployed code. During the dyno’s lifetime its running processes can use the filesystem as a temporary scratchpad, but no files that are written are visible to processes in any other dyno and any files written will be discarded the moment the dyno is stopped or restarted. For example, this occurs any time a dyno is replaced due to application deployment and approximately once a day as part of normal dyno management.
Depending on your use case I recommend using a client-server database (this looks like a good fit here) or something like Amazon S3 for file storage.
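A minimal sketch of the idea in Python with boto3 (the bucket name, object key, and file path are hypothetical, and AWS credentials are assumed to be set in the environment; the same pattern applies with the AWS SDK for Node.js):
import boto3

s3 = boto3.client("s3")

def backup_database(local_path="data.db", bucket="my-backup-bucket", key="nedb/data.db"):
    # Copy the on-disk database file to durable object storage.
    s3.upload_file(local_path, bucket, key)

def restore_database(local_path="data.db", bucket="my-backup-bucket", key="nedb/data.db"):
    # Pull the last backup down when the dyno boots with a fresh filesystem.
    s3.download_file(bucket, key, local_path)
You would call backup_database from a scheduled job (e.g. Heroku Scheduler) and restore_database on startup; anything left only on the dyno's disk will eventually be discarded.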

Related

Why is a cloud application's file system ephemeral?

The "Beyond 12 factor APP" and "Considerations for Designing and Running an Application in the Cloud "(https://docs.cloudfoundry.org/devguide/deploy-apps/prepare-to-deploy.html)
states file system is ephemeral.
However, I got a different result when testing with OpenStack:
create a VM using openstack server create with a CentOS qcow2 image and no external storage
SSH to the VM and create a file under /home/centos
reboot the VM
after the VM starts up, the file is still there.
Did I misunderstand something?
Quote from the book:
cloud-friendly applications don’t just run in the cloud; they embrace elastic scalability, ephemeral filesystems
In the "Logs" chapter: Cloud applications can make no assumptions about the file system on which they run, other than the fact that it is ephemeral.
quote from "Considerations for Designing and Running an Application in the Cloud " :
"Avoid Writing to the Local File System": "Local file system storage is short-lived."..."When an application instance crashes or stops, the resources assigned to that instance are reclaimed by the platform including any local disk changes made since the app started. When the instance is restarted, the application will start with a new disk image. Although your application can write local files while it is running, the files will disappear after the application restarts."
The meaning is that when running containerized applications you can't trust the file system to be long-lived between restarts: it may be purged, or the next run might land on a different instance.
It doesn't mean the data is guaranteed to disappear, just that it isn't guaranteed to stay, very much like a temp folder on a regular server.
Ephemeral (non-persistent) storage is given to guests by default; if persistent storage is required for the apps, Cinder volumes can be used.
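For example, attaching a Cinder volume to an instance looks roughly like this (the volume and server names are placeholders; the device name inside the guest may vary):
openstack volume create --size 10 app-data
openstack server add volume my-vm app-data
Inside the guest you would then format and mount the new block device (e.g. /dev/vdb); data written to that mount survives instance rebuilds, unlike the root ephemeral disk.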

Restore app-engine entities locally

I've made a backup (dump) of my App Engine datastore entities, following this tutorial. Now I wonder if there is a way to restore the data locally so I can do some testing and debugging.
On Windows, the datastore is in the directory
C:\Users\UserName\AppData\Local\Temp\AppName
On OS X, this question can help you.
In this directory is stored datastore.db (the local storage). Change its name (the app should not be running, and if the file is locked, kill all the Python processes).
Now go to the App Engine dashboard
Click on your app link
Click on Blob Viewer (I'm assuming that you did the backup into a blobstore)
Click on the file name
Click on download
Rename the file to datastore.db
Copy it to the previous path
Start the app
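If you prefer to script the copy/rename step, a tiny sketch in Python (the downloaded filename is hypothetical; the target path is the placeholder path from above):
import shutil

# Overwrite the local datastore file with the downloaded backup
# (make sure the dev server is not running first).
shutil.copyfile(r"C:\Users\UserName\Downloads\downloaded_backup.db",
                r"C:\Users\UserName\AppData\Local\Temp\AppName\datastore.db")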
Remote API (as koma mentions) is the main GAE-documented approach, and it's a good one. Alternatively, you can download the entities using the cloud download tool, write your own store reader/deserializer, and execute it within your local dev server instance: http://gbayer.com/big-data/app-engine-datastore-how-to-efficiently-export-your-data. Read the part about the New Approach...
While these options are not automatic and require engineering, I really wanted to point out the side effect of doing this: We have been facing performance issues in the local development server for months now, specifically when the datastore has more than 1,000 entities with over 50 indexes. Just search for "require_indexes slow" and you'll see what I'm talking about.
I'm sure you have a solid reason to import lots of data locally for testing and debugging, just wanted to let you know your application will perform extremely slow, and debug mode will be impossibly slow; we can't even use debug mode with our setup anymore.
If you want to get some test data into your local db, you could copy some using the Remote API.
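A rough sketch of pulling entities through the Remote API from a local Python script (this assumes the remote_api builtin is enabled in app.yaml; the models module and Article kind are hypothetical names standing in for your own code):
from google.appengine.ext.remote_api import remote_api_stub
import getpass

def auth_func():
    # Credentials of an app administrator.
    return (raw_input('Email: '), getpass.getpass('Password: '))

# Point the datastore API at the deployed app instead of the local stub.
remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func,
                                   'your-app-id.appspot.com')

from models import Article
articles = Article.query().fetch(100)  # entities you can then re-save locally or write to fixtures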

How to download an ephemeral file from Heroku Cedar

I have a rails project hosted on Heroku Cedar that does the following:
crawls daily newsfeeds and stores them in the database
manually judges the feeds and classifies them into categories
uses the judgments to build a classifier that automatically classifies new incoming feeds
iteratively improves the classification with additional judgments
The problem is that the classifier requires writing to a file. However, when I run the scripts on Heroku Cedar, it creates an ephemeral file that isn't permanent.
My questions are:
Is there a way to download the ephemeral file I created by running a script on Heroku?
What's a better way to handle situation like this?
In short, no. You want to store any generated data in some sort of persistent file or data store. You should look at pushing these files to S3 or similar.

local GAE datastore does not keep data after computer shuts down

On my local machine (i.e. http://localhost:8080/), I have entered data into my GAE datastore for some entity called Article. After turning off my computer and restarting it the next day, I find the datastore empty: no entities. Is there a way to prevent this in the future?
How do I make a copy of the data in my local datastore? Also, will I be able to upload said data later into both localhost and production?
My model is ndb.
I am using Mac OS X and Python 2.7, if these matter.
I have experienced the same problem. Declaring the datastore path when running dev_appserver.py should fix it. These are the arguments I use when starting the dev server:
python dev_appserver.py --high_replication --use_sqlite --datastore_path=myapp.datastore --blobstore_path=myapp_blobs
This will use sqlite and save the data in the file myapp.datastore. If you want to save it in a different directory, use --datastore_path=/path/to/myapp/myapp.datastore
I also use --blobstore_path to save my blobs in a specific directory. I have found that it is more reliable to declare which directory to save my blobs. Again, that is --blobstore_path=/path/to/myapp/blobs or whatever you would like.
Since declaring blob and datastore paths, I haven't lost any data locally. More info can be found in the App Engine documentation here:
https://developers.google.com/appengine/docs/python/tools/devserver#Using_the_Datastore
Data in the local datastore is preserved unless you start it with the -c flag to clear it, at least on the PC. You therefore probably have a different issue with temp files or permissions or something.
The local data is stored using a different method to the actual production servers, so I'm not sure you can make a direct backup as such. If you want to upload data to both the local and deployed servers you can use the Upload tool suite: uploading data
The bulk loader tool can upload and download data to and from your application's datastore. With just a little bit of setup, you can upload new datastore entities from CSV and XML files, and download entity data into CSV, XML, and text files. Most spreadsheet applications can export CSV files, making it easy for non-developers and other applications to produce data that can be imported into your app. You can customize the upload and download logic to use different kinds of files, or do other data processing.
So you can 'backup' by downloading the data to a file.
To load/pull data into the local development server just give it the local URL.
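For example, the backup/restore round trip with the bulk loader looks roughly like this (the app id, filename, and local port are placeholders, and the exact flags depend on your SDK version):
appcfg.py download_data --url=http://your-app-id.appspot.com/_ah/remote_api/ --filename=backup.sqlite3
appcfg.py upload_data --url=http://localhost:8080/_ah/remote_api --filename=backup.sqlite3
The first command pulls the production entities into a local file; the second pushes them into the development server via its local Remote API URL.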
The datastore typically saves to disk when you shut down. If you turned off your computer without shutting down the server, I could see this happening.

My GAE python development datastore is never persisted to a file

I have just started using GAE (Python 2.7, SDK 1.6.4). I have set up a simple test project using PyDev (latest version) in Eclipse (Indigo) on Windows XP (SP3).
It all works fine: my app can record data in the datastore and the blobstore and then retrieve it, but when I stop the development server and start it again the data in the datastore is lost. This is not the case for the blobstore, which retains blobs fine, and I can see the blobstore folder that gets created in C:\Temp.
I did the sensible thing and looked back through old posts, and found that most people who have this problem solve it by changing the location of the datastore file, so I used the following parameters:
--datastore_path="${workspace_loc}/myproject/datastore"
--blobstore_path="${workspace_loc}/myproject/blobstore"
"${workspace_loc}/myproject/src"
I moved the blobstore at the same time, as you can see.
The blobstore still works, and now the blobstore folder is created in the myproject folder as expected. The datastore file is still not created, however, and when I stop and restart the development server the data is still lost.
The dev server startup logs include the following entry:
WARNING 2012-04-20 10:49:04,513 datastore_file_stub.py:513] Could not read datastore data from C:\myworkspace\myproject\datastore
So I know it is trying to create the datastore in the correct place.
Finally, I lifted the whole Eclipse workspace folder and copied it to another computer with exactly the same setup, except it is running Windows 7 instead of Windows XP. Everything works fine there: both the datastore file and the blobstore folder are created where I expect them to be.
I have set up Eclipse, Python, GAE, my project, and my Eclipse launch file in exactly the same way on two computers; it works on one and not the other. Maybe XP has something to do with it, but to be honest I think that's unlikely.
The only other clue I have come up with is that a recent change to the GAE development server stopped writing to the datastore file after every change; it now only flushes on exit. This problem may be closely related to mine:
App Engine local datastore content does not persist
However, adding the following to my code did not help at all.
from google.appengine.tools import dev_appserver
import atexit
# Flush the datastore stub to disk when the dev server process exits
atexit.register(dev_appserver.TearDownStubs)
So it's not down to an incorrect termination sequence either, as far as I can tell, although it may be that I just added it in the wrong place (I'm new to Python).
Anyway, I am stumped and would be really grateful for any suggestions you can come up with.
It's probably http://code.google.com/p/googleappengine/issues/detail?id=7244, which is a bug. Hopefully a fix will be available soon.
Did you try:
--storage_path=...
Path at which all local files (such as the Datastore, Blobstore files, Google Cloud Storage Files, logs, etc) will be stored, unless overridden by --datastore_path, --blobstore_path, --logs_path, etc.
Found at https://developers.google.com/appengine/docs/python/tools/devserver?csw=1
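For example (the storage directory and project path are placeholders), something like:
python dev_appserver.py --storage_path=/path/to/myapp/storage /path/to/myapp
keeps the datastore, blobstore, and logs together under one directory that survives dev server restarts.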

Resources