Fetch thousand of files from S3/minio with a single page webapp (no server) - reactjs

I'm developing a single page app for image annotation. Each .jpg file is stored on S3/minIO services, coupled with a .xml file (Pascal VOC notation), which describes the coordinates and positions for each annotation associated to the image.
I'd like to fetch all the xml data, to be able filtering my image results within the webapp project (based upon ReactJS). But thousand of request to an S3 server directly from a web app seems a bit odd to me; nevertheless, I would prefer avoid using any "middleware" servers (like python/flask or nodejs), relying on the ReactJS app.
I've not been able to find any workaround to download all the xml files content with a single ajax call; do you have some idea to address this kind of issue?

The S3 API doesn't provide an API to fetch multiple files in a single operation. As you have suggested in your question, your application will need to handle this logic by first getting a list of the objects then iterating through that list.
Alternatively, if you can consider storing the xml files as a single archive.

Related

Loading TensorFlow.js model from File Server

I am trying to load Tensorflow.js model via HTTP protocol. Tensorflow.js requires me to store 'model.json' and 'weights.bin' files in the same folder. But I can only call 'model.json' as a parameter. It refers to the binary file by itself. That is how it works as far as I know.
For now, in the local environment, I am loading the model from the localhost(Http://127.0.0.1:8080) and it works fine.
However, the actual application accepts HTTPS protocol only. So I have tried to store them with models and weights in the same buckets in S3 and called via Lambda but it seems like only 'model.json' is retrieved. I am thinking of using EC2 instances where the Python Flask server is running but it seems like the same that only model.json is retrieved, not binary files.
Is there any way that I can retrieve 'model.json' with referring to the weight file? Is there anyway to host file server remotely with HTTPS protocol?
TFJS downloads model JSON, parses it and uses whatever paths are specified in the JSON - you can edit that file and set any URL you want for weights.
Alternatively, you can also use lower-level methods to load weights manually (in case you want to have a custom loader, etc.), but leave that for future until you're more comfortable with TFJS.

Use .json file or database for static data?

I am building a web app using nodeJS with an angular based frontend and a Firebase/AngularFire2 backend. I have a list of about 80 cities and couple of details about each of them that I need to display with checkboxes for the user.
Should I save them as a json object in a .json file on the server and call it, or just store it in my Real-time Database and query it? Are there any speed/memory benefits to either?
There are two scenarios :
1.Your task is search oriented. You have to query the data and manipulate it. Memory management is key issues for you. You want some complex searching methods on your data. Then go for the database.
2.Your task require whole data at a time. You don't need to worry about memory management. Then directly load the data from file. Obviously this method will save the connection making time with your database. It will work as simple as file streams. [suggested for your case]

API to Database?

Please presume that I do not know anything about any of the things I will be mentioning because I really do not.
Most OpenData sites have the possibility of exporting the presented file either in for example .csv or .json formats (Example). They also always have an API tab (Example API).
I presume using the API would mean that if the data is updated you would receive the change whereas exporting it as .csv would mean the content will not be changed anymore.
My questions is: how does one use this API code to display the same table one would get when exporting a .csv file.
Would you use a database to extract this information? What kind of database and how do you link the API to the database?
I presume using the API would mean that if the data is updated you
would receive the change whereas exporting it as .csv would mean the
content will not be changed anymore.
You are correct in the sense that, if you download the csv to your computer, that csv file won't be updated any more.
An API is something you would call - in this case, you can call the API, saying "Hey, do you have the latest data on xxx?", and you will be given back the latest information about what you have asked. This does not mean though, that this site will notify you when there's a new update - you will have to keep calling the API (every hour, every day etc) to see if there are any changes.
My questions is: how does one use this API code to display the same
table one would get when exporting a .csv file.
You would:
Call the API from a server code, or a cloud service
Let the server code or cloud service decipher (or "Parse") the response
Use the deciphered response to create a table made out of HTML, or to place it into a database
Would you use a database to extract this information? What kind of
database and how do you link the API to the database?
You wouldn't necessarily need a database to extract information, although a database would be nice to place the final data inside.
You would first need some sort of way to "call the REST API". There are many ways to do this - using Shell Script, using Python, using Excel VBA etc.
I understand this is hard to visualize, so here is an example of step 1, where you can retrieve information.
Try placing in the below URL (taken from the site you showed us) in your address bar of your Chrome browser, and hit enter
http://opendata.brussels.be/api/records/1.0/search/?dataset=associations-clubs-sportifs
See how it gives back a lot of text with many brackets and commas? You've basically asked the site to give you some data, and this is the response they gave back (different browsers work differently - IE asks you to download the response as a .json file). You've basically called an API.
To see this data more cleanly, open your developer tools of your Chrome browser, and enter the following JavaScript code
var url = 'http://opendata.brussels.be/api/records/1.0/search/?dataset=associations-clubs-sportifs';
var xhr = new XMLHttpRequest();
xhr.open('GET', url);
xhr.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
xhr.onload = function() {
if (xhr.status === 200) {
// success
console.log(JSON.parse(xhr.responseText));
} else {
// error
console.log(JSON.parse(xhr.responseText));
}
};
xhr.send();
When you hit enter, a response will come back, stating "Object". If you click through the arrows, you can see this is a cleaner version of the data we just saw - more human readable.
In this case, I used JavaScript to retrieve the data, but you can use whatever code you want. You could proceed to use JavaScript to decipher the data, manipulate it, and push it into a database.
kintone is an online cloud database where you can customize it to run JavaScript codes, and have it store the data in their database, so you'll have the data stored online like in the below image. This is just one example of a database you can use.
There are other cloud services which allow you to connect API end points of different services with each other, like IFTTT and Zapier, but I'm not sure if they connect with open data.
The page you linked to shows that the API returns values as a JSON object. To access the data you can just send an appropriate http request and the response will be the requested data as a JSON. You can send requests like that over your browser if you want to.
Most languages allow JSON objects to be manipulated pro grammatically if you need to do work on the data.
Restful APIs publish model is "request and publish". Wen you request data via an API endpoint, you would receive response strings in JSON objects, CSV tables or XML.
The publisher, in this case Opendata.brussel.be would update their database on regular basis and publish the results via an API endpoint.
If you want to download the table as a relational data table in a CSV file, you'd need to parse the JSON objects into relational tables. This can be tricky since each JSON response string can vary in their paths.
There're several ways to do it. You can either write scripts to flatten the JSON objects or use a tool to parse and flatten the objects for you.
I use a tool called Acho to turn API endpoints into CSV files. It would parse almost all API endpoints through the parameters and even configure for multiple requests, such as iterative and recursive requests.
Acho API parser

Download large file on Google App Engine Python

On my appspot website, I use a third party API to query a large amount of data. The user then downloads the data in CSV. I know how to generate a csv and download it. The problem is that because the file is huge, I get the DeadlineExceededError.
I have tried tried increasing the fetch deadline to 60 (urlfetch.set_default_fetch_deadline(60)). It doesn't seem reasonable to increase it any further.
What is the appropriate way to tackle this problem on Google App Engine? Is this something where I have to use Task Queue?
Thanks.
DeadlineExceededError means that your incoming request took longer than 60 secs, not your UrlFetch call.
Deploy the code to generate the CSV file into a different module that you setup with basic or manual scaling. The URL to download your CSV will become http://module.domain.com
Requests can run indefinitely on modules with basic or manual scaling.
Alternately, consider creating a file dynamically in Google Cloud Storage (GCS) with your CSV content. At that point, the file resides in GCS and you have the ability to generate a URL from which they can download the file directly. There are also other options for different auth methods.
You can see documentation on doing this at
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/
and
https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/functions
Important note: do not use the Files API (which was a common way of dynamically create files in blobstore/gcs) as it has been depracated. Use the above referenced Google Cloud Storage Client API instead.
Of course, you can delete the generated files after they've been successfully downloaded and/or you could run a cron job to expire links/files after a certain time period.
Depending on your specific use case, this might be a more effective path.

Multiple data sources: data storage and retrieval approaches

I am building a website (probably in Wordpress) which takes data from a number of different sources for display on various pages.
The sources:
A Twitter feed
A Flickr feed
A database on a remote server
A local database
From each source I will mainly retrieve
A short string, e.g. for Twitter, the Tweet, and from the local database the title of a blog page.
An associated image, if one exists
A link identifying the content at its source
My question is:
What is the best way to a) store the data and b) retrieve the data
My thinking is:
i) Write a script that is run every 2 or so minutes on a cron job
ii) the script retrieves data from all sources and stores it in the local database
iii) application code can then retrieve all data from the one source, the local database
This should make application code easier to manage - we only ever draw data from one source in application code - and that's the main appeal. But is it overkill for a relatively small site?
I would recommend putting the twitter feed and flickr feed in JavaScript. Both flickr and twitter have REST APIs. By putting it on the client you free up resources on your server, create less complexity, your users won't be waiting around for your server to fetch the data, and you can let twitter and flickr cache the data for you.
This assumes you know JavaScript. Once you get past JavaScript quirks, it's not a bad language. Give Jquery a try. JQuery Twitter plugin Flickery JQuery plugin. There are others, that's just the first results from Google.
As for your data on the local server and remote server, that will depend more on the data that is being fetched. I would go with whatever you can develop the fastest and gives acceptable results. If that means making a REST call from server to sever, then go for it. IF the remote server is slow to respond, I would go the AJAX REST API method.
And for the local database, you are going to have to write server side code for that, so I would do that inside the Wordpress "framework".
Hope that helps.

Resources