No space left on device [Amazon SageMaker]

I am working on training my model on a p2.xlarge instance. When I download a dataset, I get the following error: "Exception during downloading or extracting: [Errno 28] No space left on device"
I checked that the p2.xlarge has 61 GiB of storage, which translates to roughly 65 GB. I have barely 5 GB worth of data on my instance. Could you please let me know how to proceed?

Thank you for using Amazon SageMaker.
The 61 GiB on the p2.xlarge is RAM, which is different from the persistent storage you get with any Notebook Instance. At this time, each Notebook Instance has 5 GB of storage by default, regardless of instance type. Please note that only data stored in this 5 GB volume is persisted across Notebook Instance restarts. You can also use ~10* GB of non-persistent storage in /tmp, but that data will not survive a Notebook Instance restart.
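If the dataset is only needed for the current training run, one workaround is to download and extract it under /tmp instead of the home directory. A minimal sketch (the dataset URL and file names below are placeholders) that checks the free space on both volumes with Python's shutil.disk_usage before downloading:

import shutil
import urllib.request

# Free space on the persistent home volume and the non-persistent /tmp volume
for path in ("/home/ec2-user/SageMaker", "/tmp"):
    usage = shutil.disk_usage(path)
    print(path, "free:", usage.free // 2**30, "GiB")

# Placeholder URL: download and extract the archive under /tmp, the scratch area
dataset_url = "https://example.com/dataset.tar.gz"
urllib.request.urlretrieve(dataset_url, "/tmp/dataset.tar.gz")
shutil.unpack_archive("/tmp/dataset.tar.gz", "/tmp/dataset")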
On a side note, you can also use an EFS file system with SageMaker to get more space on a Notebook Instance. There is a blog post on how to add an EFS mount with lifecycle configurations: Mount an EFS file system to an Amazon SageMaker notebook (with lifecycle configurations).
Let us know if there is any other way we can be of assistance.
Disclaimer: The available size for non-persistent storage in /tmp may change over time.
Note: This looks like a duplicate of a post on the AWS forums, here: https://forums.aws.amazon.com/thread.jspa?messageID=858201.
Thanks,
Neelam

By default, notebook instances have 5 GB of storage, no matter what type they are. You can also use 20 GB of non-persistent storage in /tmp.
https://docs.aws.amazon.com/sagemaker/latest/dg/howitworks-create-ws.html
Hope this helps.

Even when you add extra storage, Docker on SageMaker Notebook Instances uses /, which is mounted on the default block storage. You can configure SageMaker to use another directory (inside your extra storage device) by following the instructions here. Usually, your home directory (/home/ec2-user/SageMaker) is mounted on this extra storage:
Create ~/.sagemaker/config.yaml
Set a new container root where you have more space
local:
container_root: /home/ec2-user/SageMaker/temp
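If you prefer to do this from a notebook cell, here is a minimal sketch (assuming the default /home/ec2-user/SageMaker home directory) that creates the target directory and writes the config shown above:

import os

# Assumed paths: the extra-storage home directory and the SageMaker local config file
container_root = "/home/ec2-user/SageMaker/temp"
config_path = os.path.expanduser("~/.sagemaker/config.yaml")

os.makedirs(container_root, exist_ok=True)
os.makedirs(os.path.dirname(config_path), exist_ok=True)

with open(config_path, "w") as f:
    f.write("local:\n")
    f.write("  container_root: %s\n" % container_root)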

Related

Google Colab: about lifetime of files on VM

Does anyone know the lifetime of files on the Colab virtual machine?
For example, in a Colab notebook, I save the data to a CSV file as:
data.to_csv('data.csv')
How long will data.csv exist after that?
This is the scenario:
I want to maintain and update over 3,000 small datasets every day, but the interaction between Colab and Google Drive through pydrive is pretty slow (as I need to check every dataset every day). If the lifetime of files on the virtual machine is long enough, I could update the files on the VM every day (which would be much faster) and then synchronize them to Google Drive every few days instead of every day.
VMs are discarded after a period of inactivity, so your best bet is to save the files you'd like to keep to Drive as they are generated.
With pydrive, this is possible, but a bit cumbersome. An easier method is to use a FUSE interface to Drive so that you can automatically sync files as they are saved normally.
For an example, see:
https://colab.research.google.com/drive/1srw_HFWQ2SMgmWIawucXfusGzrj1_U0q
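For reference, the Drive helper that ships with Colab provides such a mount; once Drive is mounted, anything written under the mount point is synced to Drive and survives the VM being recycled:

from google.colab import drive

# Mount Google Drive at /content/drive (prompts for authorization on first run)
drive.mount('/content/drive')

# Write the CSV from the question directly into Drive instead of the VM's local disk
data.to_csv('/content/drive/My Drive/data.csv')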

How to restore WP site missing the database?

I have acquired a poorly backed-up site. The owner should have done an export, but instead just copied the three folders and a few files at the top level. Can the site be recreated without the database, or will I need to start over?
I am afraid you will need to start over.
Without the database or an export it is practically impossible to recreate the site.
Content is not physically stored inside the WordPress file structure. In rare cases the physical MySQL files can be found and used for an import, but again, that depends on what you received. If it is only wp-admin and the other standard folders, the database files are most likely not going to be there.

Change database location of docker couchdb instance?

I have a server with two disks: one is an SSD used by the operating system and the other is a normal 2.5TB HDD.
Now on this server I'm running Fedora Server 22 with Docker, and there is one image currently running: Fedora/couchdb.
The problem is that this container is saving the database to the much smaller SSD, when it should really be stored on the much bigger HDD.
How can I set up this image to store the database on the HDD?
Other Information:
You can specify the database and view index locations for CouchDB in the config file with:
[couchdb]
database_dir = /path/to/dir
view_index_dir = /path/to/dir
How to add a custom configuration at startup is explained here:
http://docs.couchdb.org/en/1.6.1/config/intro.html
Of course, in order to use the desired path for your databases in the container, you need to make sure it is accessible inside the Docker image.
This explains everything you need to do so:
https://docs.docker.com/engine/userguide/dockervolumes/
If that does not help, please explain in more detail what your setup is and what you want to achieve.
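As an illustration only (the host path and the in-container data directory below are assumptions; check your image's local.ini for the real database_dir), the same bind mount can be created from Python with the Docker SDK, which is equivalent to passing -v on the docker run command line:

import docker  # the Docker SDK for Python (pip install docker)

client = docker.from_env()

# Assumed HDD mount point on the host and assumed data path inside the image
client.containers.run(
    "fedora/couchdb",
    detach=True,
    ports={"5984/tcp": 5984},
    volumes={"/mnt/hdd/couchdb": {"bind": "/usr/local/var/lib/couchdb", "mode": "rw"}},
)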

Zipped packages and in-memory storage strategies

I have a multi-tenant app with a zipped package for each tenant/client, containing the templates and handlers for that tenant's public site. Right now I have under 50 tenants, and it's fine to keep the imported apps in memory after the first request to that specific client's domain.
This approach works well, but I have to redeploy the app with the new client's zipped package every time I make changes and/or a new client gets added.
Now I'm working on making it possible to upload those packages via web upload and store them in the blobstore.
My concerns now are:
Getting the packages from the blobstore is of course slower than importing a zipped package from the filesystem, but this is not the biggest issue.
How do I load/import a module that is not in the filesystem and has no path?
If every client's package is around 1 MB, it's not a problem as long as the client base is small, but what if it grows to 1,000 or even more? Obviously I then don't have enough memory to hold a few GB of data in memory.
What is the best way to deal with this?
If I use the instance memory to cache a previously loaded tenant package, how would I invalidate the data in memory when a new package is uploaded?
I would appreciate some thoughts on how to deal with this kind of situation.
I agree with Nick: there should be no Python code in the tenant-specific zip. To solve the memory issue, I would cache most of the pages in the datastore; to serve them you don't need to have all tenants loaded in your instances. You might also want to look into pre-generating HTML views on save rather than on request.
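Building on that: if each tenant's zip contains only templates (no Python code), nothing ever has to be imported as a module; the zip can be read straight from a blob of bytes and the extracted templates cached per instance. A rough sketch, with made-up names for the blobstore fetch:

import io
import zipfile

# Per-instance cache of already-extracted tenant templates
_template_cache = {}

def get_template(tenant_id, template_name, fetch_zip_bytes):
    # fetch_zip_bytes(tenant_id) is a placeholder for reading the zip from the blobstore
    key = (tenant_id, template_name)
    if key not in _template_cache:
        archive = zipfile.ZipFile(io.BytesIO(fetch_zip_bytes(tenant_id)))
        _template_cache[key] = archive.read(template_name).decode("utf-8")
    return _template_cache[key]

def invalidate_tenant(tenant_id):
    # Call this after a new package has been uploaded for a tenant
    for key in list(_template_cache):
        if key[0] == tenant_id:
            del _template_cache[key]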

Get Drives available on Azure VM from code

Through code, can I get the list of drives available on an Azure VM (Medium size instance) along with their sizes? I am looking for a way to store a file temporarily on the VM disk and delete it after reading from it.
Thanks for your time.
Regards,
Vivek
See Neil Mackenzie's post about local storage: https://convective.wordpress.com/2009/05/09/local-storage-on-windows-azure/.
It's as simple as doing RoleEnvironment.GetLocalResource("<name>").RootPath and then performing your file operations in that directory.
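If you just want a quick, generic way to enumerate the drives and their sizes (outside the RoleEnvironment API above, e.g. from a script running on the VM), a small Python sketch on a Windows VM could look like this; the temporary file is deleted automatically after it is read back:

import os
import shutil
import string
import tempfile

# List every mounted drive letter with its total and free space
for letter in string.ascii_uppercase:
    root = letter + ":\\"
    if os.path.exists(root):
        usage = shutil.disk_usage(root)
        print(root, usage.total // 2**30, "GiB total,", usage.free // 2**30, "GiB free")

# Write a scratch file, read it back, and let it be removed on close
with tempfile.NamedTemporaryFile(mode="w+", suffix=".tmp", delete=True) as tmp:
    tmp.write("scratch data")
    tmp.seek(0)
    print(tmp.read())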
Do take a look at this Azure Storage article on Azure Drives. They are essentially page blobs that can be mounted on your web/worker role's file system. One catch with Azure Drives is that only one instance can write to a drive at a time, while other instances have read-only access to it.
HTH.

Resources