I am using the Apache POI library to read Excel files in Google App Engine (Java). The Excel file is about 4 MB in size, with around 70,000 records.
Workbook workbook = new XSSFWorkbook(inputStream);
I am getting an OOM (OutOfMemoryError) while reading this Excel file.
Is there a better way of reading an Excel file in Google App Engine?
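(For reference: Apache POI also ships a SAX-based event API, XSSFReader with XSSFSheetXMLHandler, that streams rows instead of building the whole workbook in memory. The sketch below only illustrates that route and is not tested on App Engine; the handler body is a placeholder.)

import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class StreamingExcelRead {
    public static void read(InputStream inputStream) throws Exception {
        OPCPackage pkg = OPCPackage.open(inputStream);
        XSSFReader reader = new XSSFReader(pkg);
        ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(pkg);
        StylesTable styles = reader.getStylesTable();

        // Rows and cells are delivered one at a time instead of loading the whole workbook.
        XSSFSheetXMLHandler.SheetContentsHandler rowHandler = new XSSFSheetXMLHandler.SheetContentsHandler() {
            public void startRow(int rowNum) { }
            public void endRow(int rowNum) { }
            public void cell(String cellReference, String formattedValue, XSSFComment comment) {
                // process each cell value here (e.g. collect the current row, then persist it)
            }
            public void headerFooter(String text, boolean isHeader, String tagName) { }
        };

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        parser.setContentHandler(new XSSFSheetXMLHandler(styles, strings, rowHandler, false));

        java.util.Iterator<InputStream> sheets = reader.getSheetsData();
        while (sheets.hasNext()) {
            InputStream sheet = sheets.next();
            parser.parse(new InputSource(sheet));
            sheet.close();
        }
        pkg.close();
    }
}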
Related
I want to import a CSV file into Memgraph Cloud. I'm using a 2GB free instance.
I tried to use LOAD CSV for a file located on AWS S3, but Memgraph Lab returns "Query failed: Load CSV not allowed on this instance because it was disabled by a config."
Is this a limitation of the free version? Which options are available for loading data into Memgraph Cloud?
This is not related to the fact that you are using a free version of Memgraph Cloud.
Memgraph Cloud doesn’t support importing data from CSV files directly. Both the LOAD CSV clause and the CSV Import Tool require access to the local file system, and this is not possible when it comes to Memgraph Cloud.
There are two options you could try out:
Import the data programmatically by connecting to the instance via a driver. This way, you can use each entry in your CSV file to execute a Cypher CREATE query (see the sketch after this list).
Use the .cypherl bulk import in Memgraph Lab. You can transform your CSV entries into Cypher CREATE queries and save them to a file with the .cypherl extension. Such a file can be imported by going to Memgraph Lab → Datasets tab → Select or drag and drop a file.
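For the first option, here is a minimal Java sketch using the Neo4j Java driver, which works with Memgraph over the Bolt protocol. The connection URI, credentials, CSV layout, and the label/property names are all placeholders you would need to adapt:

import java.io.BufferedReader;
import java.io.FileReader;
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class MemgraphCsvImport {
    public static void main(String[] args) throws Exception {
        // Placeholders: use the Bolt URI and credentials of your Memgraph Cloud instance.
        Driver driver = GraphDatabase.driver("bolt+s://YOUR_INSTANCE:7687",
                AuthTokens.basic("username", "password"));
        try (Session session = driver.session();
             BufferedReader csv = new BufferedReader(new FileReader("people.csv"))) {
            csv.readLine(); // skip the header row, assumed to be "id,name"
            String line;
            while ((line = csv.readLine()) != null) {
                String[] cols = line.split(",");   // naive split; quoted fields need a real CSV parser
                // One CREATE per row; the label and properties are made-up examples.
                session.run("CREATE (:Person {id: $id, name: $name})",
                        Values.parameters("id", cols[0], "name", cols[1]));
            }
        }
        driver.close();
    }
}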
You can look up documentation on the Memgraph website.
I am working on porting GAE Datastore files to a new database and need a fully readable format, ideally CSV.
I am not getting very readable output when I pull my Google Datastore files into the Atom editor. Is there a better way to read Google Datastore files, or is there an Atom plugin for this?
I was wondering if Google App Engine supports reading and writing large files (for example, text files larger than 2 GB) to and from Google Drive or Cloud Storage?
What problems can I expect?
I'm using the Python 2.7 Google App Engine SDK, by the way.
Update:
I intend to read up to a million rows of data from Google's Datastore (or maybe the new NDB) and save the data into a text file for further processing, either on Google Cloud Compute or on third-party services like PiCloud.
The data is basically a network relationship and it goes like this:
A -> B
B -> C
A -> D
The above means that A is linked to B, B is linked to C, A is linked to D, and so on...
As I have over a million edges, I think I might have to use the task queue or a cron job to do this?
After I have processed the relationships, I'll have another text file containing a score for each pair of nodes, which I will then write back into the database.
Best Regards.
Yes, it does; the question is how you intend to write the files.
You can either upload directly to Cloud Storage using gsutil or create_upload_url, or you can write from your app using the Files API.
If you're using the Files API, you need to read or write in chunks no larger than 32 MB.
An API is available for App Engine, or I suppose you could use the REST Cloud Storage API. I have just started with this, but here is the API page:
https://developers.google.com/appengine/docs/python/googlestorage/
I have a 10 MB CSV file of geolocation data that I tried to upload to my App Engine datastore yesterday. I followed the instructions in this blog post and used the bulkloader/appcfg tool. The datastore indicated that records were uploaded, but it took several hours and used up my entire CPU quota for the day. The process broke down with errors towards the end, before I had actually exceeded my quota. Needless to say, 10 MB of data shouldn't require this much time and power.
So, is there some other way to get this CSV data into my App Engine datastore (for a Java app)?
I saw a post by Ikai Lan about using a mapper tool he created for this purpose, but it looks rather complicated.
Instead, what about uploading the CSV to Google Docs? Is there a way to transfer it to the App Engine datastore from there?
I do daily uploads of 100,000 records (20 MB) through the bulkloader. Settings I played with:
- bulkloader.yaml config: set to auto-generate keys.
- Include the header row in the raw CSV file.
- Speed parameters set to the maximum (I'm not sure whether reducing them would reduce the CPU consumed).
These settings burn through my 6.5 hours of free quota in about 4 minutes, but they get the data loaded (maybe that's from the indexes being generated).
appcfg.py upload_data --config_file=bulkloader.yaml --url=http://yourapp.appspot.com/remote_api --filename=data.csv --kind=yourtablename --bandwidth_limit=999999 --rps_limit=100 --batch_size=50 --http_limit=15
(I auto-generate this line with a script and use AutoHotkey to send my credentials.)
I wrote this gdata connector to pull data out of a Google Docs spreadsheet and insert it into the datastore, but it uses the bulkloader, so it kind of takes you back to square one of your problem.
http://code.google.com/p/bulkloader-gdata-connector/source/browse/gdata_connector.py
What you could do, however, is take a look at the source to see how I pull data out of Google Docs, and create a task (or tasks) that does that instead of going through the bulkloader.
You could also upload your document into the blobstore and similarly create a task that reads the CSV data out of the blobstore and creates entities. (I think this would be easier and faster than working with gdata feeds.)
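A rough Java sketch of that blobstore route, just to illustrate the idea: the kind name, property names, and the naive comma split are assumptions, and in practice you would call something like this from a task queue handler:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import com.google.appengine.api.blobstore.BlobKey;
import com.google.appengine.api.blobstore.BlobstoreInputStream;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;

public class CsvBlobImporter {
    public void importCsv(BlobKey blobKey) throws IOException {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new BlobstoreInputStream(blobKey), "UTF-8"));
        reader.readLine(); // skip the header row
        List<Entity> batch = new ArrayList<Entity>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] cols = line.split(",");       // naive split; quoted fields need a real CSV parser
            Entity e = new Entity("GeoRecord");    // "GeoRecord" and the property names are made up
            e.setProperty("latitude", Double.parseDouble(cols[0]));
            e.setProperty("longitude", Double.parseDouble(cols[1]));
            batch.add(e);
            if (batch.size() == 500) {             // keep batch puts well under datastore limits
                datastore.put(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            datastore.put(batch);
        }
        reader.close();
    }
}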
Is there a consistent code-base that allows me to upload a zip file to both GAE and Tomcat-based servers, extract the contents (plain-text files), and process them?
Both support Java, so you can just use Java :)
In all seriousness, processing file uploads can be done with Apache Commons FileUpload, and extracting the zip contents can be done with the java.util.zip API.
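For example, a servlet along these lines should run unchanged on both GAE and Tomcat; it uses Commons FileUpload's streaming API (so nothing is written to a temp directory, which App Engine wouldn't allow) and java.util.zip. The class name and the per-line processing are placeholders:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.commons.fileupload.FileItemIterator;
import org.apache.commons.fileupload.FileItemStream;
import org.apache.commons.fileupload.FileUploadException;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

public class ZipUploadServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        try {
            // Streaming API: items are read straight from the request, no temp files on disk.
            ServletFileUpload upload = new ServletFileUpload();
            FileItemIterator items = upload.getItemIterator(req);
            while (items.hasNext()) {
                FileItemStream item = items.next();
                if (item.isFormField()) {
                    continue; // skip ordinary form fields
                }
                ZipInputStream zip = new ZipInputStream(item.openStream());
                ZipEntry entry;
                while ((entry = zip.getNextEntry()) != null) {
                    // Each entry is assumed to be a plain-text file; read it line by line.
                    BufferedReader reader = new BufferedReader(new InputStreamReader(zip, "UTF-8"));
                    String line;
                    while ((line = reader.readLine()) != null) {
                        // process(entry.getName(), line);  // hypothetical per-line handler
                    }
                }
            }
        } catch (FileUploadException e) {
            throw new IOException(e);
        }
    }
}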
See also the answers to these similar questions, which were asked in the last two days, probably by your classmates/friends:
JSP/Servlets: How do I Upload a zip file, unzip it and extract the CSV file…
Upload a zip file, unzip and read file