I have uploaded an Excel file into GCS. Using the Apache POI library I am able to read data from a local Excel file, but I cannot find the file readers and methods needed to read data from GCS. Please suggest methods for reading an Excel file from GCS.
Thanks in advance.
Since this is still unanswered, I'll expand upon the previous comment: the GCS Client Library [1] will give you an InputStream which you can use to read the data from GCS:
GcsService gcsService = GcsServiceFactory.createGcsService();
GcsFilename fileName = new GcsFilename("bucket", "test.xlsx");
GcsInputChannel readChannel = gcsService.openPrefetchingReadChannel(fileName, 0, BUFFER_SIZE);
InputStream inputStream = Channels.newInputStream(readChannel);
XSSFWorkbook workbook = new XSSFWorkbook(inputStream);
XSSFSheet sheet = workbook.getSheetAt(0);
Iterator<Row> rowIterator = sheet.iterator();
while (rowIterator.hasNext()) {
    Row row = rowIterator.next(); // advance the iterator, then do stuff with the row...
}
Note that if you want to use POI on the App Engine runtime itself, you will need to use a nightly build of POI or build it from source yourself; otherwise you will run into the issue 'com.sun.org.apache.xerces.internal.util.SecurityManager is a restricted class' [2].
[1] https://cloud.google.com/appengine/docs/java/googlecloudstorageclient/
[2] Google App Engine and Apache Poi loading templates
I was trying to upload a 10 GB CSV file into WSO2 ML, but I could not do it; it gave me errors. I followed this link to change the size limit of my dataset in WSO2 ML: https://docs.wso2.com/display/ML100/FAQ#FAQ-Isthereafilesizelimittomydataset?Isthereafilesizelimittomydataset?
I am running WSO2 ML on a PC with the following characteristics:
- 50 GB RAM
- 8 cores
Thanks
When it comes to uploading datasets into WSO2 Machine Learner, we provide three options.
Uploading files from your local file system. As you have mentioned, the maximum upload limit is kept at 100 MB, and you can increase the limit by setting the -Dorg.apache.cxf.io.CachedOutputStream.Threshold option in your wso2server.dat file (an example flag is shown below). We have tested this feature with a 1 GB file. However, for large files we don't recommend this option; the main use case of this functionality is to let users quickly try out a machine learning algorithm with small datasets.
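For illustration only (the value here is my assumption; the CXF property takes a threshold in bytes), the flag might look like this:
-Dorg.apache.cxf.io.CachedOutputStream.Threshold=1073741824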
Since you are working with a large dataset we would like to recommend following two approaches for uploading your dataset into WSO2 ML server.
Upload data using Hadoop file system (HDFS). We have given a detailed description on how to use HDFS files in WSO2 ML in our documentation [1].
If you have an up-and-running WSO2 DAS instance, then by integrating WSO2 ML with WSO2 DAS you can easily point to a DAS table as your source type in WSO2 ML's "Create Dataset" wizard. For more details on integrating WSO2 ML with WSO2 DAS, please refer to [2].
If you need more help regarding this issue please let me know.
[1]. https://docs.wso2.com/display/ML100/HDFS+Support
[2]. https://docs.wso2.com/display/ML110/Integration+with+WSO2+Data+Analytics+Server
For those who want to use HDP (Hortonworks) as part of an HDFS solution for loading a large dataset into WSO2 ML through NameNode port 8020 via IPC, i.e. hdfs://hostname:8020/samples/data/wdbcSample.csv, you may also need to ingest the data file onto HDFS in the first place, using a Java client like the following:
public static void main(String[] args) throws Exception {
    Configuration configuration = new Configuration();
    FileSystem hdfs = FileSystem.get(new URI("hdfs://hostname:8020"), configuration);
    Path dstPath = new Path("hdfs://hostname:8020/samples/data/wdbcSample.csv");
    // Remove any stale copy of the file before uploading a fresh one.
    if (hdfs.exists(dstPath)) {
        hdfs.delete(dstPath, true);
    } else {
        System.out.println("No such destination ...");
    }
    Path srcPath = new Path("wdbcSample.csv"); // a local file path on the client side
    try {
        hdfs.copyFromLocalFile(srcPath, dstPath);
        System.out.println("Done successfully ...");
    } catch (Exception ex) {
        ex.printStackTrace();
    } finally {
        hdfs.close();
    }
}
This problem has stumped me for the better part of my day.
BACKGROUND
I am attempting to read the Play Store reviews for my Apps via my own Google App Engine Java project.
Now I am able to get the list of all the files using the Google Cloud Storage client API (Java). I can also read the metadata for each of the CSV files in that bucket and print it to the logs.
PROBLEM
I simply can't find a way to read the actual object and get the csv data.
My Java code snippet:
String BUCKET_NAME = "pubsite_prod_rev_*******";
String objectFileName = "reviews/reviews_*****_***.csv";
Storage.Objects.Get obj = client.objects().get(BUCKET_NAME, objectFileName);
InputStream is = obj.executeMediaAsInputStream();
Now when I print this InputStream, it tells me it's a GZIPInputStream (java.util.zip.GZIPInputStream@f0be2c). Converting this InputStream to a byte[] or a String (desired) does not work.
And if I try to wrap it inside a GZIPInputStream object using:
zis = new GZIPInputStream(is);
it throws ZipException: Not in GZIP format.
Metadata of the file:
"contentType": "text/csv; charset=utf-16le",
"contentEncoding": "gzip",
What am I doing wrong?
Sub-question: in the past I have successfully read text data from Google Cloud Storage using GcsService, but it does not seem to work with the buckets that hold the Play Store review CSV files. Does anybody know if my Google App Engine project (connected to the same Google developer account) can read these buckets?
Solved it using executeMedia() and parseAsString():
HttpResponse response = obj.executeMedia();
String csvData = response.parseAsString(); // works!!
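If you prefer to stream the file instead of buffering it all in memory, here is a hedged sketch (assuming the same obj as above; the HTTP client has already removed the gzip content encoding, which is why wrapping the stream in GZIPInputStream failed, and the charset comes from the object's contentType metadata):
HttpResponse response = obj.executeMedia();
// The body is already gunzipped; decode it as UTF-16LE per the metadata.
BufferedReader reader = new BufferedReader(
        new InputStreamReader(response.getContent(), "UTF-16LE"));
String line;
while ((line = reader.readLine()) != null) {
    // process one CSV row...
}
reader.close();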
I have stored a user document (MS Word) in the datastore as a Blob object.
How do I open it in a browser? I can make it downloadable, but I want the user to view the document in the browser before downloading it. How do I do that?
I think that to be able to open the file in the browser by setting the response headers (inline and content-type), the file should physically exist at a file location.
I don't have space to generate the file and then write it to the browser. How do I handle this?
Can I store the documents in Google Drive? Will reading and uploading be charged? Any pointers to examples?
daoIntf = new DAOImpl();
document = daoIntf.getEntity("Document", id);
Blob docBlob = (Blob) document.getProperty("resume");
String fileName = (String) document.getProperty("fileName");
String contentType = (String) document.getProperty("contentType");
// Ask the browser to render the document inline rather than download it.
response.setContentType(contentType);
response.setHeader("Pragma", "no-cache");
response.setHeader("Cache-Control", "no-cache");
response.setDateHeader("Expires", 0);
response.setHeader("Content-Disposition", "inline;filename=\"" + fileName + "\"");
ServletOutputStream out = response.getOutputStream();
out.write(docBlob.getBytes());
-thanks, Ma
If you have a document in the Blobstore, you simply need to provide a link to it (i.e. getServingUrl() in Java). When a user clicks on that link, the browser will either offer to download the document, or open it in preview mode if the user has a plugin or extension that can do so.
There is an online Google Docs viewer that works on external documents: https://docs.google.com/a/iddiction.com/viewer?pli=1
Expose your documents via a servlet using blobstoreService.serve(key, response) and then open them in the viewer:
http://docs.google.com/viewer?url=<url_of_your_document>&embedded=<true|false>
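A minimal sketch of that servlet (the class name and the blob-key request parameter are my assumptions; BlobstoreService.serve() streams the blob into the response for you):
public class ServeDocumentServlet extends HttpServlet {
    private final BlobstoreService blobstoreService =
            BlobstoreServiceFactory.getBlobstoreService();

    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse res)
            throws IOException {
        // Look up the stored document by its blob key (assumed query parameter).
        BlobKey blobKey = new BlobKey(req.getParameter("blob-key"));
        blobstoreService.serve(blobKey, res);
    }
}
The URL of this servlet is then what you pass as url_of_your_document to the viewer.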
I want to upload data from Google Cloud Storage to BigQuery, but I can't find any Java sample code describing how to do this. Would someone please give me a hint as to how to do this?
What I actually want to do is transfer data from Google App Engine tables to BigQuery (and sync on a daily basis) so that I can do some analysis. I use the Google Cloud Storage service in Google App Engine to write (new) records to files in Google Cloud Storage, and the only missing part is appending the data to tables in BigQuery (or creating a new table on the first write). Admittedly I can manually upload/append the data using the BigQuery browser tool, but I would like it to be automatic; otherwise I need to do it manually every day.
I don't know of any Java samples for loading tables from Google Cloud Storage into BigQuery. That said, if you follow the codelab's instructions for running query jobs, you can run a load job instead with the following:
Job job = new Job();
JobConfiguration config = new JobConfiguration();
JobConfigurationLoad loadConfig = new JobConfigurationLoad();
config.setLoad(loadConfig);
job.setConfiguration(config);

// Set where you are importing from (i.e. the Google Cloud Storage paths).
List<String> sources = new ArrayList<String>();
sources.add("gs://bucket/csv_to_load.csv");
loadConfig.setSourceUris(sources);

// Describe the resulting table you are importing to:
TableReference tableRef = new TableReference();
tableRef.setDatasetId("myDataset");
tableRef.setTableId("myTable");
tableRef.setProjectId(projectId);
loadConfig.setDestinationTable(tableRef);

List<TableFieldSchema> fields = new ArrayList<TableFieldSchema>();
TableFieldSchema fieldFoo = new TableFieldSchema();
fieldFoo.setName("foo");
fieldFoo.setType("STRING");
TableFieldSchema fieldBar = new TableFieldSchema();
fieldBar.setName("bar");
fieldBar.setType("INTEGER");
fields.add(fieldFoo);
fields.add(fieldBar);
TableSchema schema = new TableSchema();
schema.setFields(fields);
loadConfig.setSchema(schema);

// Also set custom delimiter or header rows to skip here...
// [not shown].

Insert insert = bigquery.jobs().insert(projectId, job);
insert.setProjectId(projectId);
JobReference jobRef = insert.execute().getJobReference();
// ... see the rest of the codelab for waiting for the job to complete.
For more information on the load configuration object, see the JobConfigurationLoad javadoc.
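For completeness, a hedged sketch of that waiting step (assuming the same bigquery client, projectId, and jobRef as above; the two-second poll interval is arbitrary):
// Poll until the load job reaches the DONE state, then check for errors.
while (true) {
    Job polled = bigquery.jobs().get(projectId, jobRef.getJobId()).execute();
    if ("DONE".equals(polled.getStatus().getState())) {
        ErrorProto error = polled.getStatus().getErrorResult();
        System.out.println(error == null
                ? "Load succeeded."
                : "Load failed: " + error.getMessage());
        break;
    }
    Thread.sleep(2000); // InterruptedException must be handled or declared
}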
I have a static XML file called rules.xml in war/xml/. It is a rules file for the Apache Commons Digester. In order to be able to use the file I need to be able to open it with a Reader. How can I open the file?
Try using:
final String file = "xml/rules.xml";
FileReader fileReader;
try {
    fileReader = new FileReader(file);
    // ... feed the reader to the Digester here ...
} catch (FileNotFoundException e) {
    // handle the missing file
}
Edit: after some intensive usage, FileReader on GAE seems to have trouble with accented characters (only visible on GAE cloud instances; local tests run perfectly).
If someone encounters this kind of bug, use FileInputStream instead. It worked for me.
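A minimal sketch of the FileInputStream workaround (the UTF-8 charset is my assumption about the file's encoding; the point is that an explicit charset avoids the platform-default decoding FileReader relies on):
// Hedged sketch: decode the file with an explicit charset instead of the default.
Reader reader = new InputStreamReader(
        new FileInputStream("xml/rules.xml"), "UTF-8");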
Check this page for more information: http://code.google.com/intl/en/appengine/kb/java.html#readfile
Cheers.