How to determine the size of images available in AWS? - amazon-sagemaker

I'm using one of the images listed here https://github.com/aws/deep-learning-containers/blob/master/available_images.md to create a SageMaker endpoint, but I keep getting "failed reason: Image size 15136109518 is greater than supported size 1073741824".
Is there a way to find out the size of the images listed at https://github.com/aws/deep-learning-containers/blob/master/available_images.md, or of any AWS-managed images?

I suspect you are trying to deploy a serverless endpoint provisioned with 1 GB of memory. You can increase the memory size of your endpoint with the MemorySizeInMB parameter; more info in this documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints-create.html#serverless-endpoints-create-config
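If that is the case, the fix is on the endpoint configuration rather than the image. A minimal boto3 sketch of raising MemorySizeInMB when creating the endpoint config (the model and config names below are placeholders; serverless memory can currently be set between 1024 and 6144 MB):

import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

# Placeholder names -- substitute your own model and endpoint config.
sm.create_endpoint_config(
    EndpointConfigName="my-serverless-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-model",
            "ServerlessConfig": {
                "MemorySizeInMB": 6144,   # 1024, 2048, 3072, 4096, 5120 or 6144
                "MaxConcurrency": 5,
            },
        }
    ],
)

sm.create_endpoint(
    EndpointName="my-serverless-endpoint",
    EndpointConfigName="my-serverless-config",
)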
To view the uncompressed size of an image, you can use the following example commands:
$ docker pull 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:1.15.2-cpu-py27-ubuntu18.04
$ docker inspect -f "{{ .Size }}" 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:1.15.2-cpu-py27-ubuntu18.04
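If you would rather not pull the image first, ECR itself can report the compressed image size. A sketch using boto3, assuming your credentials are allowed to call ecr:DescribeImages against that registry (this always works for repositories in your own account; the AWS-managed registry may not grant it):

import boto3

# Query ECR for the compressed size of a DLC image.
# Requires ecr:DescribeImages permission on the target registry.
ecr = boto3.client("ecr", region_name="us-east-1")

response = ecr.describe_images(
    registryId="763104351884",            # AWS Deep Learning Containers account
    repositoryName="tensorflow-training",
    imageIds=[{"imageTag": "1.15.2-cpu-py27-ubuntu18.04"}],
)

size_bytes = response["imageDetails"][0]["imageSizeInBytes"]
print(f"Compressed image size: {size_bytes / 1024**3:.2f} GiB")

Note that imageSizeInBytes is the sum of the compressed layer sizes, so the uncompressed size reported by docker inspect will be larger.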
Kindly also note that you will need to provision enough memory to accommodate your model as well. Please see this link for more information.

Related

No space left on device [Amazon SageMaker]

I am training my model on a p2.xlarge instance. When I download a dataset, I get the following error: "Exception during downloading or extracting: [Errno 28] No space left on device"
I checked that the p2.xlarge has 61 GiB of storage, which translates to roughly 64 GB. I hardly have 5 GB worth of data in my instance. Could you please let me know how to proceed?
Thank you for using Amazon SageMaker.
The 61 GiB on the p2.xlarge is RAM, which is different from the persistent storage you get with any Notebook Instance. At this time, by default each Notebook Instance has 5 GB of storage regardless of instance type. Please note that only data stored in this 5 GB volume is persisted across Notebook Instance restarts; you can also use ~10 GB* of non-persistent storage in /tmp, but that data will not survive a Notebook Instance restart.
On a side note, you can also use an EFS file system with SageMaker to get more space with a Notebook Instance. There is a blog post on how to add an EFS mount with lifecycle actions: Mount an EFS file system to an Amazon SageMaker notebook (with lifecycle configurations).
Let us know if there is any other way we can be of assistance.
Disclaimer: The available size for non-persistent storage in /tmp may change over time.
Note: This looks like a duplicate of a post on the AWS forums, here: https://forums.aws.amazon.com/thread.jspa?messageID=858201.
Thanks,
Neelam
By default, notebook instances have 5 GB of storage, no matter what type they are. You can also use 20 GB of non-persistent storage in /tmp.
https://docs.aws.amazon.com/sagemaker/latest/dg/howitworks-create-ws.html
Hope this helps.
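To confirm how that space is split from inside a notebook, here is a small standard-library sketch (the home path is the persistent SageMaker volume mentioned above):

import shutil

# Check how much space each location actually has on a Notebook Instance.
for path in ("/home/ec2-user/SageMaker", "/tmp"):
    usage = shutil.disk_usage(path)
    print(f"{path}: {usage.free / 1024**3:.1f} GiB free "
          f"of {usage.total / 1024**3:.1f} GiB")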
Even when you add extra storage, Docker on SageMaker Notebook Instances uses /, which is mounted on the default block storage. You can configure SageMaker to use another directory (inside your extra storage device) by following the instructions here. Usually, your home directory (/home/ec2-user/SageMaker) is mounted on this extra storage.
Create ~/.sagemaker/config.yaml
Set a new container root where you have more space:
local:
  container_root: /home/ec2-user/SageMaker/temp
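If you prefer to script that change, here is a small standard-library sketch that creates the directory and writes the config shown above (the paths are the same ones used in the snippet):

from pathlib import Path

# Directory on the larger SageMaker volume that should serve as the
# container root (same path as in the config above).
container_root = Path("/home/ec2-user/SageMaker/temp")
container_root.mkdir(parents=True, exist_ok=True)

config_path = Path("~/.sagemaker/config.yaml").expanduser()
config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(
    "local:\n"
    f"  container_root: {container_root}\n"
)
print(f"Wrote {config_path}")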

Change database location of docker couchdb instance?

I have a server with two disks: one is an SSD used by the operating system and the other is a normal 2.5TB HDD.
Now on this server I'm running Fedora Server 22 with Docker, and there is one image currently running: Fedora/couchdb.
The problem is that this container is saving the database to the much smaller SSD when it should really be stored on the much bigger HDD.
How can I set up this image to store the database on the HDD?
Other Information:
You can specify the database and index locations for CouchDB in the config file with
[couchdb]
database_dir = /path/to/dir
view_index_dir = /path/to/dir
How to add a custom configuration at startup is explained here:
http://docs.couchdb.org/en/1.6.1/config/intro.html
Of course, in order to use the desired path for your DBs in the container, you need to make sure it is accessible inside the Docker container.
This explains everything you need to do:
https://docs.docker.com/engine/userguide/dockervolumes/
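As an illustration of the volume approach, here is a hedged sketch using the Python Docker SDK (docker-py). The host paths are assumptions (wherever your HDD is mounted), and the container paths are those of the official couchdb image, so adjust them for the Fedora/couchdb image:

import docker

client = docker.from_env()

# Run CouchDB with its data directory (and an extra config directory)
# bind-mounted onto the big HDD instead of the SSD root volume.
client.containers.run(
    "couchdb",
    detach=True,
    ports={"5984/tcp": 5984},
    volumes={
        "/mnt/hdd/couchdb/data": {"bind": "/opt/couchdb/data", "mode": "rw"},
        "/mnt/hdd/couchdb/local.d": {"bind": "/opt/couchdb/etc/local.d", "mode": "rw"},
    },
)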
If that does not help, please explain in more detail what your setup is and what you want to achieve.

GAE Soft private memory limit error on post requests

I am working on an application where I am using the paid services of Google App Engine. In the application I am parsing a large XML file and trying to extract the data into the datastore. But while performing this task, GAE throws the error below.
I also tried changing the performance settings by increasing the frontend instance class from F1 to F2.
ERROR:
Exceeded soft private memory limit of 128 MB with 133 MB after servicing 14 requests total.
After handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application.
Thank you in advance.
When you face the "Exceeded soft private memory limit" error, you have two alternatives:
To upgrade your instance to a more powerful one, which gives you more memory.
To reduce the chunks of data you process in each request. You could split the XML file into smaller pieces and keep the smaller instance doing the work.
I agree with Mario's answer. Your options are indeed to either upgrade to an Instance class with more memory such as F2 or F3 or process these XML files in smaller chunks.
To help you decide what would be the best path for this task, you would need to know if these XML files to be processed will grow in size. If the XML file(s) will remain approximately this size, you can likely just upgrade the instance class for a quick fix.
If the files can grow in size, then augmenting the instance memory may only buy you more time before encountering this limit again. In this case, your ideal option would be to use a stream to parse the XML file(s) in smaller units, consuming less memory. In Python, xml.sax can be used to accomplish just that as the parse method can accept streams. You would need to implement your own ContentHandler methods.
In your case, the file is coming from the POST request but if the file were coming from Cloud Storage, you should be able to use the client library to stream the content through to the parser.
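As an illustration of that streaming approach, here is a minimal sketch; the element name and the datastore-write callback are made up, so substitute the tags and persistence code from your own application:

import xml.sax

class ItemHandler(xml.sax.ContentHandler):
    """Collects one record at a time instead of loading the whole tree."""

    def __init__(self, callback):
        super().__init__()
        self.callback = callback   # e.g. a function that writes to the datastore
        self.current = None
        self.field = None
        self.buffer = []

    def startElement(self, name, attrs):
        if name == "item":          # made-up element name
            self.current = {}
        elif self.current is not None:
            self.field = name
            self.buffer = []

    def characters(self, content):
        if self.field is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "item" and self.current is not None:
            self.callback(self.current)   # hand off one record, then drop it
            self.current = None
        elif self.field == name:
            self.current[self.field] = "".join(self.buffer).strip()
            self.field = None

# The parser accepts any file-like object, e.g. the request body stream:
# xml.sax.parse(stream, ItemHandler(save_to_datastore))

Because only one record is held in memory at a time, peak memory stays roughly constant no matter how large the XML file grows.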
I had a similar problem; I'm almost sure my usage of the /tmp directory was causing it, since that directory is mounted in memory. So, if you are writing any files into /tmp, don't forget to remove them!
Another option is that you actually have a memory leak! It says "after servicing 14 requests" - this means that getting a more powerful instance will only delay the error. I would recommend cleaning up memory. I don't know what your code looks like, but I'm trying the following with mine:
import gc

# ...

@app.route('/fetch_data')
def fetch_data():
    data_object = fetch_data_from_db()
    uploader = AnotherHeavyObject()
    # ...
    response = extract_data(data_object)
    del data_object          # drop references to the large objects
    del uploader
    gc.collect()             # force a garbage collection pass
    return response
After trying the things above, it now seems that the issue was with FuturesSession - related to this: https://github.com/ross/requests-futures/issues/20. So perhaps it's another library you're using - just be warned that some of these libraries leak memory, and App Engine preserves state, so whatever is not cleaned out stays in memory and affects the following requests on that same instance.

Chrome (Webkit) WebSQL Database maximum file size?

I'm trying to find some information on the maximum size a WebSQL (SQLite) database can be in Google Chrome. I've read conflicting information, such as the max size being 5 MB, or the user being prompted when the DB reaches 10, 50, 100 MB, etc.
I've tried creating DBs of various sizes and they open fine at 500 MB and 5,000 MB; however, I've yet to try adding data up to these large sizes.
Does anyone have any first-hand experience with large WebSQL DBs, or can you point me at relevant information?
Here's a link to the most solid article on the subject I've found so far: http://html5doctor.com/introducing-web-sql-databases/. It helped me a lot.
I'm not sure about the max size for a database on a normal webpage, but if you need a lot of storage, you can make an extension that requests the "unlimitedStorage" permission. See http://code.google.com/chrome/extensions/manifest.html.
If you try to create a database larger than the default size of 5 MB, a popup is shown (pictured in the original answer) asking whether you want to grant the database permission to scale up to the next size tier - 5 MB and upwards. In short, the size can grow without a hard limit, but only if the user allows it.

Elegant way to determine total size of website?

Is there an elegant way to determine the size of data downloaded from a website - bearing in mind that not all requests will go to the same domain that you originally visited, and that other browsers may be polling in the background at the same time? Ideally I'd like to look at the size of each individual page - or, for a Flash site, the total downloaded over time.
I'm looking for some kind of browser plug-in or Fiddler script. I'm not sure Fiddler would work due to the issues pointed out above.
I want to compare sites similar to mine for total file size - and keep track of my own site also.
Firebug and HttpFox are two Firefox plugins that can be used to determine the size of data downloaded from a website for a single page. While Firebug is a great tool for any web developer, HttpFox is a more specialized plugin to analyze HTTP requests/responses (with their respective sizes).
You can install both and try them out; just be sure to disable one while using the other.
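If you want a rough, scriptable approximation of the per-page number those plugins report, here is a hedged sketch (using the requests library) that downloads a page plus its directly referenced assets and sums the transferred bytes; it ignores anything loaded dynamically by JavaScript or Flash:

from html.parser import HTMLParser
from urllib.parse import urljoin
import requests

class AssetParser(HTMLParser):
    """Collects the URLs of images, scripts and stylesheets on a page."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.assets.append(attrs["href"])

def page_size(url):
    page = requests.get(url)
    total = len(page.content)           # bytes of the HTML itself
    parser = AssetParser()
    parser.feed(page.text)
    for asset in parser.assets:         # add each referenced asset
        total += len(requests.get(urljoin(url, asset)).content)
    return total

print(f"{page_size('https://example.com/') / 1024:.1f} KiB downloaded")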
If you need a website wide measurement:
If the website is made of plain HTML and assets (like CSS, images, flash, ...) you can check how big the folder containing the website is on the server (this assumes you can login on the server)
You can mirror the website locally using wget, curl or some GUI based application like Site Sucker and check how big the folder containing the mirror is
If you know the website is huge but you don't know how much, you can estimate its size. E.g. www.mygallery.com has 1000 galleries; each gallery has an average of 20 images; every image is stored in 2 different sizes (thumbnail and full size) at an average of n KB per image; ...
Keep in mind that if you download or estimate a dynamic website, you are dealing with what the website produces, not with the real size of the website on the server. A small PHP script can produce tons of HTML.
Have you tried Firebug for Firefox?
The "Net" panel in Firebug will tell you the size and fetch time of each fetched file, along with the totals.
You can download the entire site and then you will know for sure!
https://www.httrack.com/
