Bulk insert with app engine endpoints - google-app-engine

I have generated endpoints methods: get, list, remove, update. But what if I have a collection of objects that I want to insert? Is the only way to insert them in a loop, or is there a bulk insert solution in App Engine?

You will have to look at alternate strategies to load data into your application. The reason is that there could be hundreds or thousands of records that you want to insert as part of your bulk insert.
Now having said that, you could look at the following approach with Cloud Endpoints:
Consider uploading a file (CSV, JSON, XML) to your endpoint API method. This file will contain the multiple records that you want to insert.
Process the file in your endpoint API method implementation: process each record and insert it accordingly.
While the above is achievable, you have to consider the fact that a client has made this API call and is waiting for the response. So if you are going to end up processing multiple records (inserts) and then throw back the response, things could time out quickly, and it is also not best practice to make the API client wait.
So I suggest that while there are ways to do it via the API, you should look at various alternatives to get the data into your App Engine app. In case you really have to do the file thing, consider accepting the file and throwing back an Accepted (HTTP 202) response. You could then use a Task Queue on App Engine to process the file.
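The question doesn't say which Endpoints flavour you generated, but as a rough illustration of the "accept the batch, return immediately, process it from a Task Queue" idea, here is a minimal sketch using the Python Cloud Endpoints framework (the message classes, paths, and task handler URL are made up for the example):

import endpoints
from protorpc import messages, message_types, protojson, remote
from google.appengine.api import taskqueue

# Hypothetical request message: a whole batch of records posted in one call.
class Item(messages.Message):
    name = messages.StringField(1)

class ItemBatch(messages.Message):
    items = messages.MessageField(Item, 1, repeated=True)

@endpoints.api(name='bulk', version='v1')
class BulkApi(remote.Service):

    @endpoints.method(ItemBatch, message_types.VoidMessage,
                      path='items/batch', http_method='POST',
                      name='items.batchInsert')
    def batch_insert(self, request):
        # Hand the batch off to a push task queue and return right away,
        # so the API client is not kept waiting while the inserts run.
        taskqueue.add(url='/tasks/insert_items',
                      payload=protojson.encode_message(request))
        return message_types.VoidMessage()

The /tasks/insert_items handler would then loop over the records and call put() on each (or do a single batch put), outside the client's request.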

Related

Uploading images to S3 with React and Elixir/Phoenix

I'm trying to plan how I'm going to accomplish this and so far I have the following:
I grab a file on the front end and, on submit, send the file name and type to the back end, where it generates a presigned URL. I send that to the FE, which then uploads the file from the front end.
The issue here is that when I generate the presign, I want to commit my UUID filename going to S3 in my database via the back end. I don't know if the front end will successfully complete this task. I can think of some janky ways to garbage collect this - but I'm wondering, is there a typically prescribed way to do this that doesn't introduce the possibility of failures the BE isn't aware of?
Yes, there's an alternative way. You can configure your bucket so that it sends an event whenever an object is created or updated. You can send this event either to an SNS topic or to AWS Lambda.
From there you can make a request to your Phoenix app's webhook, which can insert it into the database.
The advantage is that the event will come only when the file has been created.
For more info, you can read the following: https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
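If you go the Lambda route, the function only needs to forward the event to your Phoenix app. A minimal sketch (Python here, since Lambda supports it; the webhook URL and payload shape are assumptions):

import json
import urllib.parse
import urllib.request

# Hypothetical Phoenix webhook route; replace with your own endpoint.
WEBHOOK_URL = "https://your-app.example.com/api/s3_uploads"

def handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = json.dumps({"bucket": bucket, "key": key}).encode("utf-8")
        req = urllib.request.Request(
            WEBHOOK_URL, data=body,
            headers={"Content-Type": "application/json"})
        # The Phoenix controller behind this route can now insert the key
        # (your UUID filename) into the database, knowing the object exists.
        urllib.request.urlopen(req)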
The way I'm currently handling this is as follows:
Compress the image client side.
Send the image to the backend application server.
Create a UUID on the backend.
Send the image from the backend to S3, using the UUID as the key.
On success, put the UUID into the database.
Respond to the client with the UUID so it can display the image.
By following these steps, you don't introduce errors into your database.
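For reference, a rough sketch of steps 3-5 on the backend, assuming boto3 and a psycopg2-style Postgres connection; the bucket, table, and column names are placeholders:

import uuid
import boto3

s3 = boto3.client("s3")

def store_upload(image_bytes, db_conn, bucket="my-uploads-bucket"):
    key = str(uuid.uuid4())                                   # create a UUID key
    s3.put_object(Bucket=bucket, Key=key, Body=image_bytes)   # upload first
    # Only after the upload succeeds does the key go into the database,
    # so the database never references an object that isn't in S3.
    with db_conn.cursor() as cur:
        cur.execute("INSERT INTO images (s3_key) VALUES (%s)", (key,))
    db_conn.commit()
    return key                                                # hand back to the client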

API to Database?

Please presume that I do not know anything about any of the things I will be mentioning because I really do not.
Most OpenData sites offer the possibility of exporting the presented data in, for example, .csv or .json format (Example). They also always have an API tab (Example API).
I presume using the API would mean that if the data is updated you would receive the change whereas exporting it as .csv would mean the content will not be changed anymore.
My question is: how does one use this API code to display the same table one would get when exporting a .csv file?
Would you use a database to extract this information? What kind of database and how do you link the API to the database?
I presume using the API would mean that if the data is updated you would receive the change whereas exporting it as .csv would mean the content will not be changed anymore.
You are correct in the sense that, if you download the csv to your computer, that csv file won't be updated any more.
An API is something you would call - in this case, you can call the API, saying "Hey, do you have the latest data on xxx?", and you will be given back the latest information about what you have asked. This does not mean, though, that this site will notify you when there's a new update - you will have to keep calling the API (every hour, every day, etc.) to see if there are any changes.
My question is: how does one use this API code to display the same table one would get when exporting a .csv file?
You would:
Call the API from a server code, or a cloud service
Let the server code or cloud service decipher (or "Parse") the response
Use the deciphered response to create a table made out of HTML, or to place it into a database
Would you use a database to extract this information? What kind of database and how do you link the API to the database?
You wouldn't necessarily need a database to extract information, although a database would be nice to place the final data inside.
You would first need some sort of way to "call the REST API". There are many ways to do this - using Shell Script, using Python, using Excel VBA etc.
I understand this is hard to visualize, so here is an example of step 1, where you can retrieve information.
Try placing the URL below (taken from the site you showed us) in the address bar of your Chrome browser, and hit enter:
http://opendata.brussels.be/api/records/1.0/search/?dataset=associations-clubs-sportifs
See how it gives back a lot of text with many brackets and commas? You've basically asked the site to give you some data, and this is the response they gave back (different browsers work differently - IE asks you to download the response as a .json file). You've basically called an API.
To see this data more cleanly, open your developer tools of your Chrome browser, and enter the following JavaScript code
var url = 'http://opendata.brussels.be/api/records/1.0/search/?dataset=associations-clubs-sportifs';
var xhr = new XMLHttpRequest();
xhr.open('GET', url);
xhr.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
xhr.onload = function() {
  if (xhr.status === 200) {
    // success
    console.log(JSON.parse(xhr.responseText));
  } else {
    // error
    console.log(JSON.parse(xhr.responseText));
  }
};
xhr.send();
When you hit enter, a response will come back, stating "Object". If you click through the arrows, you can see this is a cleaner version of the data we just saw - more human readable.
In this case, I used JavaScript to retrieve the data, but you can use whatever code you want. You could proceed to use JavaScript to decipher the data, manipulate it, and push it into a database.
kintone is an online cloud database that you can customize to run JavaScript code and have it store the data in its database, so you'll have the data stored online. This is just one example of a database you can use.
There are other cloud services which allow you to connect API end points of different services with each other, like IFTTT and Zapier, but I'm not sure if they connect with open data.
The page you linked to shows that the API returns values as a JSON object. To access the data you can just send an appropriate HTTP request, and the response will be the requested data as JSON. You can send requests like that from your browser if you want to.
Most languages allow JSON objects to be manipulated programmatically if you need to do work on the data.
RESTful APIs follow a "request and publish" model. When you request data via an API endpoint, you receive response strings as JSON objects, CSV tables, or XML.
The publisher, in this case opendata.brussels.be, would update their database on a regular basis and publish the results via an API endpoint.
If you want to download the table as a relational data table in a CSV file, you'd need to parse the JSON objects into relational tables. This can be tricky since each JSON response string can vary in its paths.
There are several ways to do it. You can either write scripts to flatten the JSON objects or use a tool that parses and flattens the objects for you.
I use a tool called Acho to turn API endpoints into CSV files. It can parse almost any API endpoint through its parameters and can even be configured for multiple requests, such as iterative and recursive requests.
Acho API parser
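If you'd rather write the flattening script yourself, here is a minimal Python sketch against the same dataset used earlier. It assumes the response shape shown above (a top-level "records" list whose items carry a "fields" dict):

import csv
import json
import urllib.request

URL = ("http://opendata.brussels.be/api/records/1.0/search/"
       "?dataset=associations-clubs-sportifs")

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Flatten each record's "fields" dict into one row.
rows = [rec.get("fields", {}) for rec in data.get("records", [])]
fieldnames = sorted({key for row in rows for key in row})

with open("associations.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)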

Need to wait for heroku-connect SFID

I'm facing a challenge using Heroku Connect with Salesforce. I'm inserting a record into a parent object (order) in the Postgres db and I get the Postgres id when I do the insert. Then I have to insert the child (order lines), but Heroku Connect hasn't yet done the insert into Salesforce, so I don't have the SFID I need to insert it.
What would you guys recommend I do? Do I re-query the field that tells me if it's synched and refresh the $digest in NG? Or do I do it on the API backend with an interval? I'm a little lost on what route to take.
I'm using the streaming API but still can't get the SFID from the callback when I do the insert.
rows: [ { id: 85, sfid: null } ],
EDIT
Got this from Heroku support, works great.
https://devcenter.heroku.com/articles/herokuconnect#relationships-between-objects
We ran into this issue before, when we were evaluating Heroku Connect. We ended up writing our own sync due to several limitations of our data model.
In your case, before saving the record into the Postgres database, I would suggest sending a REST API call to SFDC to get the SFDC Id, and only after that persisting the record into Postgres. This keeps the data model on the SFDC/Postgres side consistent. At the same time, API call limit utilization will be relatively low, as you will be using the API only for record creation.
As you are using Heroku Connect (not a standalone sync app), I would not recommend putting a backend scheduler into your web app to control the refresh of populated Ids. The logic would be highly coupled and might make your system painful to support in the future.
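A rough sketch of that "create in Salesforce first, then persist locally" flow with the plain Salesforce REST API (the instance URL, API version, access token handling, and Postgres schema are all placeholders; OAuth is assumed to happen elsewhere):

import requests

INSTANCE_URL = "https://yourInstance.salesforce.com"   # placeholder
ACCESS_TOKEN = "..."                                    # obtained via OAuth elsewhere

def create_order(order_fields, db_conn):
    # Create the record in Salesforce first, so the SFID is known up front.
    resp = requests.post(
        INSTANCE_URL + "/services/data/v52.0/sobjects/Order/",
        json=order_fields,
        headers={"Authorization": "Bearer " + ACCESS_TOKEN})
    resp.raise_for_status()
    sfid = resp.json()["id"]
    # Now the Postgres row can carry the sfid, and child rows (order lines)
    # can reference it immediately.
    with db_conn.cursor() as cur:
        cur.execute("INSERT INTO orders (sfid) VALUES (%s)", (sfid,))
    db_conn.commit()
    return sfid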

What is a proper way to initialize data store for static data in Google App Engine?

I have a model called "Category" in my app in GAE.
This model simply contains a name and its parent category, and it won't be changed frequently after the website goes online.
I'd like to know: what is the proper way to put these model instances in at the beginning?
Right now the only way I know is to execute category.put() in a webapp.RequestHandler by issuing an HTTP request, but I suspect there is a more proper way to do this.
Thanks!
You can use the remote API to connect to your datastore in a shell and add data as required.
Or, if it's a huge amount, you could think about using the bulk loader - but I suspect that the remote API will be more suitable.
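For example, once you start the shell with remote_api_shell.py your-app-id, you could paste something along these lines; the import path and field names are assumptions about the Category model:

from models import Category  # hypothetical import path for your model

# Seed data: (name, parent name); None means a top-level category.
seed = [
    ("Sports", None),
    ("Football", "Sports"),
    ("Basketball", "Sports"),
]

created = {}
for name, parent_name in seed:
    category = Category(name=name,
                        parent_category=created.get(parent_name))
    category.put()
    created[name] = category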

Multiple data sources: data storage and retrieval approaches

I am building a website (probably in Wordpress) which takes data from a number of different sources for display on various pages.
The sources:
A Twitter feed
A Flickr feed
A database on a remote server
A local database
From each source I will mainly retrieve
A short string, e.g. for Twitter, the Tweet, and from the local database the title of a blog page.
An associated image, if one exists
A link identifying the content at its source
My question is:
What is the best way to a) store the data and b) retrieve the data
My thinking is:
i) Write a script that is run every 2 or so minutes on a cron job
ii) the script retrieves data from all sources and stores it in the local database
iii) application code can then retrieve all data from the one source, the local database
This should make application code easier to manage - we only ever draw data from one source in application code - and that's the main appeal. But is it overkill for a relatively small site?
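As a rough, language-agnostic illustration of that plan (shown here in Python; the feed URLs, item fields, and the SQLite cache are all made up for the example):

import json
import sqlite3
import urllib.request

# Hypothetical feed endpoints that each return a JSON list of items.
SOURCES = {
    "twitter": "https://example.com/twitter-feed.json",
    "flickr": "https://example.com/flickr-feed.json",
}

def fetch(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def refresh(db_path="feed_cache.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS items
                    (source TEXT, title TEXT, image_url TEXT, link TEXT)""")
    conn.execute("DELETE FROM items")  # replace the cached copy wholesale
    for source, url in SOURCES.items():
        for item in fetch(url):
            conn.execute("INSERT INTO items VALUES (?, ?, ?, ?)",
                         (source, item.get("title"),
                          item.get("image"), item.get("link")))
    conn.commit()
    conn.close()

# Run this from cron, e.g. every couple of minutes:
#   */2 * * * * python refresh_feeds.py
if __name__ == "__main__":
    refresh()

Application code then only ever queries the local items table.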
I would recommend putting the twitter feed and flickr feed in JavaScript. Both flickr and twitter have REST APIs. By putting it on the client you free up resources on your server, create less complexity, your users won't be waiting around for your server to fetch the data, and you can let twitter and flickr cache the data for you.
This assumes you know JavaScript. Once you get past JavaScript's quirks, it's not a bad language. Give jQuery a try; there is a jQuery Twitter plugin and the Flickery jQuery plugin, among others - those are just the first results from Google.
As for your data on the local server and remote server, that will depend more on the data being fetched. I would go with whatever you can develop the fastest that gives acceptable results. If that means making a REST call from server to server, then go for it. If the remote server is slow to respond, I would go with the AJAX REST API method.
And for the local database, you are going to have to write server side code for that, so I would do that inside the Wordpress "framework".
Hope that helps.
