google app engine deployment files limit per directory - google-app-engine

I got the following error message while deploying my app:
...
07:45 AM Scanned 5500 files.
...
Error 400: --- begin server output ---
Exceeded the limit of 1000 for allowable files per directory within gaelibs/romn/cscd/
--- end server output ---
but in the document it says:
maximum total number of files (app files and static files): 10,000 per directory, 10,000 total
Could somebody tell me what's wrong with this? Thanks.

Check issue 9256, which I opened. Google has since changed the limit from 10,000 to 1,000.

Check that you're running the latest version of the SDK. I wasn't aware of this change; I thought the limit was still 1,000. If the documentation says 10,000, the limit may have been raised, and that change could have been introduced in a later version of the SDK than the one you're using.

I discovered that you can get around the 1,000-file limit by creating subdirectories. For example, if you have 1,500 files in one directory, Google will complain. If you split those files into, say, three new subdirectories of 500 files each, Google won't complain. You still have 1,500 files under one directory, sort of.
This was a lot easier for me than migrating everything to Google Cloud Storage.
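A minimal sketch of that kind of split, in case it helps (the directory name and the 500-per-subdirectory figure are just placeholders; adjust to your own layout):

import os
import shutil

def split_into_subdirs(src_dir, files_per_subdir=500):
    """Move the files in src_dir into numbered subdirectories so that
    no single directory exceeds the per-directory file limit."""
    files = sorted(
        name for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    )
    for i, name in enumerate(files):
        subdir = os.path.join(src_dir, "part%03d" % (i // files_per_subdir))
        if not os.path.isdir(subdir):
            os.makedirs(subdir)
        shutil.move(os.path.join(src_dir, name), os.path.join(subdir, name))

# e.g. split_into_subdirs("gaelibs/romn/cscd", files_per_subdir=500)

Keep in mind that anything referencing the moved files (imports, static file paths, etc.) will need to point at the new locations.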

Related

Host server bloating with numerous fake session files created in hundreds every minute on php(7.3) website

I'm experiencing an unusual problem with my PHP (7.3) website: it creates a huge number of unwanted session files on the server every minute (around 50 to 100 files), all with a fixed size of 125K or 0K in cPanel's file manager. The inode count spirals out of control into the thousands within hours and past a hundred thousand in a day, even though my website has small traffic of under 3K visits a day plus the Google crawler on top of it. I'm denying all bad bots in .htaccess.
I'm able to keep the situation under control with a cron job that runs every six hours and cleans all session files older than 12 hours from /tmp. However, this isn't an ideal solution, as the fake session files are still created in great numbers, eating my server resources (RAM, processor and, most importantly, storage) and bloating the disk, which hurts overall site performance.
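(For reference, the cleanup the cron performs is roughly equivalent to this Python sketch; the /tmp location and the sess_ prefix are assumptions based on the usual PHP defaults.)

import glob
import os
import time

MAX_AGE = 12 * 3600  # remove session files older than 12 hours
now = time.time()

for path in glob.glob("/tmp/sess_*"):
    try:
        if now - os.path.getmtime(path) > MAX_AGE:
            os.remove(path)
    except OSError:
        pass  # the file may already be gone by the time we touch it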
I opened many of these files to examine them, but found that none are associated with any valid user (I add the user id, name and email to the session upon successful authentication). Even assuming a session is created for every visitor (without an account/login), there shouldn't be more than about 3K per day, yet the session count goes as high as 125,000+ in a single day. I couldn't figure out the glitch.
I've gone through the relevant posts and made checks such as adding the IP and User-Agent to sessions to track suspicious server monitoring, bot crawling or overwhelming proxy activity, but with no luck. I can also confirm from the timestamps that no human or crawler activity takes place when the files are created; they appear every single minute, without a break, throughout the day.
I haven't found any clue yet to the root cause of this weird behavior and would highly appreciate any help troubleshooting it. Unfortunately the server team couldn't help much beyond adding the clean-up cron. Below are the contents of some example session files:
0K Sized> favourites|a:0:{}LAST_ACTIVITY|i:1608871384
125K Sized> favourites|a:0:{}LAST_ACTIVITY|i:1608871395;empcontact|s:0:"";encryptedToken|s:40:"b881239480a324f621948029c0c02dc45ab4262a";
Valid Ex.File1> favourites|a:0:{}LAST_ACTIVITY|i:1608870991;applicant_email|s:26:"raju.mallxxxxx#gmail.com";applicant_phone|s:11:"09701300000";applicant|1;applicant_name|s:4:Raju;
Valid Ex.File2> favourites|a:0:{}LAST_ACTIVITY|i:1608919741;applicant_email|s:26:"raju.mallxxxxx#gmail.com";applicant_phone|s:11:"09701300000";IP|s:13:"13.126.144.95";UA|s:92:"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:82.0) Gecko/20100101 Firefox/82.0 X-Middleton/1";applicant|N;applicant_name|N;
We found that the issue was triggered by the hosting server's PHP version change from 5.6 to 7.3. However, the flood of unwanted session files is not created on PHP 7.0, even though it is the same code base tested against all three versions. Posting this as it may help others facing a similar issue after a PHP version change.

How to upload .gz files into Google Big Query?

I want to build a 90 GB .csv file on my local computer and then upload it into Google BigQuery for analysis. I create this file by combining thousands of smaller .csv files into 10 medium-sized files and then combining those medium-sized files into the 90 GB file, which I then want to move to GBQ. I am struggling with this project because my computer keeps crashing from memory issues. From this video I understood that I should first transform the medium-sized .csv files (about 9 GB each) into .gz files (about 500 MB each), then upload those .gz files into Google Cloud Storage, create an empty table (in Google BigQuery / Datasets), and finally append all of those files to the created table. The issue I am having is finding a tutorial or documentation on how to do this. I am new to the Google platform, so maybe this is a very easy job that can be done with one click somewhere, but all I was able to find was the video linked above. Where can I find help, documentation, tutorials or videos on how people do this? Do I have the correct idea of the workflow? Is there a better way (like using some downloadable GUI to upload things)?
See the instructions here:
https://cloud.google.com/bigquery/bq-command-line-tool#creatingtablefromfile
As Abdou mentions in a comment, you don't need to combine the files ahead of time. Just gzip all of your small CSV files, upload them to a GCS bucket, and use the "bq.py load" command to create a new table. Note that you can use wildcard syntax to avoid listing all of the individual file names to load.
The --autodetect flag may let you avoid specifying a schema manually; it works by sampling your input, so you may still need to supply or correct the schema in cases where detection gets something wrong.
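If you would rather drive the load from code than from bq.py, a rough sketch with the google-cloud-bigquery Python client looks like this (the bucket, dataset and table names are placeholders, and the header-row assumption may not match your files):

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,      # same idea as the --autodetect flag
    skip_leading_rows=1,  # assumes each CSV has a header row
)

# The wildcard matches every gzipped CSV in the bucket; BigQuery
# decompresses gzip input automatically during the load.
load_job = client.load_table_from_uri(
    "gs://my-bucket/medium_files_*.csv.gz",
    "my_dataset.my_table",
    job_config=job_config,
)
load_job.result()  # block until the load job finishes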

Why is appcfg.sh only downloading the last 100 lines of my logs

I am attempting to download my Google App Engine logs using the appcfg.sh client utility, but no matter what I do I only get (exactly) 100 log lines. I have tried --num_days with a few specific days and with 0, which per the docs should retrieve everything available, but it has no effect. My logs are not particularly large; the 100 lines cover a few hours and total about 40 kB. And of course, if I view the logs in the web console I can see many weeks (or months) of logs just fine.
I've been trying variations on the following command:
appcfg.sh --num_days=0 --include_all -A <<my app name>> request_logs <<path to my app>> api_2017_04_10.log
and the output I get is:
Reading application configuration data...
Apr 10, 2017 1:12:41 PM com.google.apphosting.utils.config.IndexesXmlReader readConfigXml
INFO: Successfully processed <<my app path>>/WEB-INF/datastore-indexes.xml
Beginning interaction for module <<my module name>>...
0% Beginning to retrieve log records...
25% Received 100 log records...
Success.
Cleaning up temporary files for module <<my module name>>...
Note that it always ends at "25%" and "100 log records"... and 100 lines is nowhere near 25% of the total I'd expect regardless.
After a week of intermittently messing with this and always getting the same result, this evening I ran the exact same command again and to my surprise got 400 log lines instead of 100. I ran it again immediately and it chugged along for several minutes, reporting "97%" finished while continuing to count thousands of additional log lines; however, it was not actually writing any data to the output file at that point (I think it wants to buffer all of the data). So I backed it down to --num_days=7, and that appears to have worked.
I think the client or the API is just very buggy.
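In other words, the same command as above with a bounded --num_days, e.g.:
appcfg.sh --num_days=7 --include_all -A <<my app name>> request_logs <<path to my app>> api_2017_04_10.log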

Does CKAN have a limit size of data to upload?

I have set up CKAN and it is running fine, but I have two questions.
Both problems below happen only when uploading a file. If I add a new resource by URL, everything runs fine.
1) I can upload small files (around 4 kB) to a given dataset, but when trying with bigger files (65 kB) I get "Error 500 An Internal Server Error Occurred". So is there a size limit for uploading files? What can I do to be able to upload bigger files?
2) I get another error for the small files that do upload: when clicking Go to Resource to download the data, it gives me "Connection to localhost refused", and I can't visualize the data either. What am I doing wrong?
I appreciate any help. If you need me to provide more info on anything, I'll happily do so.
Many thanks.
CKAN has an upload size limit of 10 MB for resources by default. You can raise that in your ini file with ckan.max_resource_size = XX, for example ckan.max_resource_size = 100 (which means 100 MB).
As for question 2): have you set ckan.site_url correctly in your ini?
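For example, in the [app:main] section of your CKAN ini (the site_url value below is only a placeholder for wherever your instance is actually reachable):

[app:main]
ckan.site_url = http://your-ckan-host:5000
ckan.max_resource_size = 100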
As far as I'm aware, CKAN can easily cope with terabytes of data (it is used for millions of medical records in hospitals, etc.), so there shouldn't be an issue with your file size. It could be an issue on their end while receiving your data.

How many blobs may be submitted to GAE blobstore in one call?

I am trying to upload 1744 small files to the blobstore (the total size of all the files is 4 MB) and get an HTTP/1.1 503 Service Unavailable error.
This is 100% reproducible.
Is this a bug, or do I violate some constraint? I don't see any constraint in the documentation on the number of blobs submitted in one call.
The answer above claiming that create_upload_url can only accept one file per upload is wrong. You can upload multiple files in a single upload, and this is the way you should approach your problem.
That said, there was a reliability problem with batch uploads that was worked on and fixed around a year or so ago. If possible, I would suggest keeping the batch sizes a little smaller (say 100 or so files per batch). Each file in the batch results in a datastore write to record the blob key, so 1744 files == 1744 writes, and if one of them fails then your entire upload will fail.
If you give me the app_id I can take a look at what might be going wrong with your uploads.
So, the answer: currently fewer than 500 files may be submitted in one request.
This is going to be fixed in the scope of ticket http://code.google.com/p/googleappengine/issues/detail?id=8032 so that an unlimited number of files may be submitted, but it may take a GAE release or two before the fix is deployed.
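For anyone hitting this, a minimal sketch of a multi-file Blobstore upload on the legacy Python runtime (webapp2) looks roughly like the following; the handler names and routes are illustrative only, and batches should stay under the limits discussed above:

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
import webapp2

class FormHandler(webapp2.RequestHandler):
    def get(self):
        # A single upload URL can receive several files in one POST.
        upload_url = blobstore.create_upload_url('/upload')
        self.response.write(
            '<form action="%s" method="POST" enctype="multipart/form-data">'
            '<input type="file" name="file" multiple>'
            '<input type="submit" value="Upload">'
            '</form>' % upload_url)

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        # get_uploads() returns one BlobInfo per file in the request.
        blob_infos = self.get_uploads('file')
        self.response.write('Stored %d blobs' % len(blob_infos))

app = webapp2.WSGIApplication([
    ('/', FormHandler),
    ('/upload', UploadHandler),
])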
