Please presume that I do not know anything about any of the things I will be mentioning because I really do not.
Most OpenData sites offer the option of exporting the presented data in, for example, .csv or .json format (Example). They also always have an API tab (Example API).
I presume using the API would mean that if the data is updated you would receive the change, whereas exporting it as .csv would mean the content will not be changed anymore.
My question is: how does one use this API to display the same table one would get when exporting a .csv file?
Would you use a database to extract this information? What kind of database and how do you link the API to the database?
I presume using the API would mean that if the data is updated you would receive the change, whereas exporting it as .csv would mean the content will not be changed anymore.
You are correct in the sense that, if you download the CSV to your computer, that CSV file won't be updated any more.
An API is something you would call - in this case, you can call the API, saying "Hey, do you have the latest data on xxx?", and you will be given back the latest information about what you asked for. This does not mean, though, that the site will notify you when there's a new update - you will have to keep calling the API (every hour, every day, etc.) to see if there are any changes.
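To give a concrete picture, here is a minimal polling sketch in browser JavaScript, assuming the same example dataset URL used later in this answer (the one-hour interval is an arbitrary choice):
// Re-fetch the dataset once an hour and log whatever comes back.
var url = 'http://opendata.brussels.be/api/records/1.0/search/?dataset=associations-clubs-sportifs';
setInterval(function() {
    fetch(url)
        .then(function(response) { return response.json(); })
        .then(function(data) { console.log('Latest data:', data); })
        .catch(function(err) { console.error('Request failed:', err); });
}, 60 * 60 * 1000); // one hour in milliseconds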
My question is: how does one use this API to display the same table one would get when exporting a .csv file?
You would:
Call the API from server-side code or a cloud service
Let the server code or cloud service decipher (or "parse") the response
Use the deciphered response to create a table made out of HTML, or to place it into a database
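To make steps 2 and 3 concrete, here is a rough browser JavaScript sketch that parses the API response and renders it as an HTML table. It assumes the response shape shown further below, where each element of the records array carries a fields object:
// Sketch: fetch the dataset, then build a plain HTML table from record.fields.
var url = 'http://opendata.brussels.be/api/records/1.0/search/?dataset=associations-clubs-sportifs';
fetch(url)
    .then(function(response) { return response.json(); })
    .then(function(json) {
        var table = document.createElement('table');
        (json.records || []).forEach(function(record) {
            var row = table.insertRow();
            Object.values(record.fields).forEach(function(value) {
                row.insertCell().textContent = String(value);
            });
        });
        document.body.appendChild(table);
    });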
Would you use a database to extract this information? What kind of database and how do you link the API to the database?
You wouldn't necessarily need a database to extract information, although a database would be nice to place the final data inside.
You would first need some sort of way to "call the REST API". There are many ways to do this - using a shell script, using Python, using Excel VBA, etc.
I understand this is hard to visualize, so here is an example of step 1, where you can retrieve information.
Try placing the below URL (taken from the site you showed us) in the address bar of your Chrome browser, and hit enter:
http://opendata.brussels.be/api/records/1.0/search/?dataset=associations-clubs-sportifs
See how it gives back a lot of text with many brackets and commas? You've basically asked the site to give you some data, and this is the response it gave back (different browsers work differently - IE asks you to download the response as a .json file). In other words, you've just called an API.
To see this data more cleanly, open the developer tools of your Chrome browser, and enter the following JavaScript code:
var url = 'http://opendata.brussels.be/api/records/1.0/search/?dataset=associations-clubs-sportifs';
var xhr = new XMLHttpRequest();
xhr.open('GET', url);
xhr.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
xhr.onload = function() {
    if (xhr.status === 200) {
        // success: log the parsed JSON response
        console.log(JSON.parse(xhr.responseText));
    } else {
        // error: the body may still contain a JSON error message
        console.log(JSON.parse(xhr.responseText));
    }
};
xhr.send();
When you hit enter, a response will come back, stating "Object". If you click through the arrows, you can see this is a cleaner version of the data we just saw - more human readable.
In this case, I used JavaScript to retrieve the data, but you can use whatever code you want. You could proceed to use JavaScript to decipher the data, manipulate it, and push it into a database.
kintone is an online cloud database that you can customize to run JavaScript code and have it store the data in its database, so you'll have the data stored online. This is just one example of a database you can use.
There are other cloud services which allow you to connect API end points of different services with each other, like IFTTT and Zapier, but I'm not sure if they connect with open data.
The page you linked to shows that the API returns values as a JSON object. To access the data you can just send an appropriate HTTP request, and the response will be the requested data as JSON. You can send requests like that through your browser if you want to.
Most languages allow JSON objects to be manipulated programmatically if you need to do work on the data.
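For instance, a minimal JavaScript sketch against the endpoint from the question (assuming the response carries the nhits and records properties this OpenDataSoft-style API returns):
// Send the HTTP request, parse the JSON response, and work with it directly.
fetch('http://opendata.brussels.be/api/records/1.0/search/?dataset=associations-clubs-sportifs')
    .then(function(response) { return response.json(); })
    .then(function(data) {
        console.log(data.nhits);          // total number of matching records
        console.log(data.records.length); // records returned in this page
    });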
RESTful APIs follow a "request and response" model. When you request data via an API endpoint, you receive a response as JSON objects, CSV tables, or XML.
The publisher, in this case opendata.brussels.be, would update their database on a regular basis and publish the results via an API endpoint.
If you want to download the table as a relational data table in a CSV file, you'd need to parse the JSON objects into relational tables. This can be tricky, since each JSON response string can vary in its paths.
There are several ways to do it. You can either write scripts to flatten the JSON objects or use a tool to parse and flatten the objects for you.
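As a rough sketch of the scripting route in JavaScript, assuming each record exposes a flat fields object (nested values would need extra flattening logic):
// Turn an array of { fields: {...} } records into CSV text.
function recordsToCsv(records) {
    if (records.length === 0) return '';
    var headers = Object.keys(records[0].fields);
    var escape = function(value) {
        return '"' + String(value == null ? '' : value).replace(/"/g, '""') + '"';
    };
    var rows = records.map(function(record) {
        return headers.map(function(h) { return escape(record.fields[h]); }).join(',');
    });
    return [headers.join(',')].concat(rows).join('\n');
}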
I use a tool called Acho to turn API endpoints into CSV files. It can parse almost any API endpoint through its parameters and can even be configured for multiple requests, such as iterative and recursive requests.
Acho API parser
I'm trying to request data from a JSON API from Vemcount - a footfall-sensor system. I want to retrieve all my company's data and store it in a SQL Server DB using SSIS. During my research I've realized that JSON is not natively integrated with SSIS, so solutions are either hardcoding the import or using a 3rd party - I've chosen to go with the latter, so currently I'm trying to do the request using Kingswaysoft and their JSON Source component in VS.
The URL looks something like this (example from their documentation, which can be found here):
https://login.vemcount.com/api/fetch_data/?data={"api_key":"xxxxxxx","type":"shop_id","id":[10,14,24],"date_from":"2014-01-01 00:00","interval":"15min"}
So, the keys api_key, type, and id are mandatory. The key/value pairs "type":"shop_id" and "id":[10,14,24] are the type of ID I want to search on and the IDs themselves. In the above example I've entered 3 different shop_ids. However, we have more than 250 IDs, and more are being added continuously.
How do I request ALL the different id's without adding all the id's in the array? I've pasted my HTTP CM and JSON Source Editor below.
As you've probably guessed by now, this is somewhat unexplored territory for me, so I'm aware I might have chosen to proceed in a completely wrong way. If you have any better solutions for my case, please let me know.
UPDATE: So I've stored all shop IDs in a variable, which I want to pass as a value.
I am answering on my phone so I don't have SSIS available, but here are the steps I'd do:
1. Exec SQL to get the list of shop IDs saved into an object.
2. For each loop on the ADO object from step 1, storing the shop ID into a variable.
3. Data flow.
4. Script component source, passing in the shop ID as a variable.
5. Build your URL.
6. Using a web client, download the string.
7. Deserialize the JSON and write to the data flow to finish.
Step 7 is huge and requires its own question. It involves setting up classes for your JSON.
This should get you started. Research step 7 before posting a question, though.
Good luck.
Also, here is a link to help I received regarding deserializing JSON:
C# Checking if Object is Null
In reading about the File API and wanting to write data directly from an indexedDB database to the client disk instead of first building and holding a large blob in RAM to download to disk, there are a few basic items I'm not understanding.
In the MDN documents these two statements are found:
In Gecko, privileged code can create File objects representing any local file without user interaction.
If you want to use the DOM File API in chrome code, you can do so without restriction. In fact, you get one bonus feature: you can create File objects specifying the path of the file on the user's computer. This only works from privileged code, so web content can't do it.
Where exactly does one write chrome code and/or Gecko privileged code? Is this beyond a web extension? I've read about and experimented with extensions; so, I'm not asking specifically about how to access them.
I'm not concerned about a 'normal' web page and server accessing the client disk. I know that it's not permitted, in order to protect the individual.
I'm interested in what can be done offline through the browser - with the aid of web extensions and/or a separate profile granting special permissions, but without node.js, electron, etc. - by an individual who knowingly wants to use the browser to do what perhaps should have been built in the OS rather than the browser.
Put another way, if I want to use the browser just to run my javascript code to perform tasks all offline on my own machine, where is the privileged code written that gives access to these types of APIs that aren't subject to the security issues of a normal web page?
Is it still javascript or C++ in these areas?
Thank you.
This old question provides a link to their extension which includes the File API that writes to disk in a way that appears to provide a means to bypass the creation of a large blob of data. It's six years old but appears to contain what is needed, at least to get started.
I'm not referring to their trying to get around using indexedDB, but just that using this type of extension could allow for writing each object from the database directly to the client disk without first having to generate a large blob to download.
Attempt at employing Andrew Swan's suggestion
I'm trying to put the pieces together but have reached a point where I am not sure how to continue. I wrote the code below in the background script of an extension. In attempting to employ Andrew Swan's suggestion, the plan is to initiate a GET request for a text/csv file, which is intercepted and replaced by data extracted from the database and written to the GET request by the stream filter.
First, make a GET request to a bogus URL and listen for the response, as follows:
let request = new XMLHttpRequest();
request.open("GET", url);
request.setRequestHeader("Content-Type", "text/csv");
// Assign the handler before sending, so no state change is missed.
request.onreadystatechange = () => {
    portFromCS.postMessage({ 'func': 'disp_result', 'args': { 'msg': "request.status :", 'value': request.status + ' : ' + request.statusText } });
};
request.send(null);
Second, intercept the request and write to the GET, as follows:
browser.webRequest.onBeforeRequest.addListener(
    listener,
    { urls: ["<all_urls>"] },
    ["blocking"]);

function listener(details) {
    let filter = browser.webRequest.filterResponseData(details.requestId);
    let encoder = new TextEncoder();
    filter.onstart = event => {
        // onstart fires before any response data arrives, so write the
        // replacement content directly rather than decoding event.data.
        let str = '' +
            'HTTP/1.1 200 OK \r\n' +
            'Content-Length: 17 \r\n' +
            'Content-Type: text/csv \r\n\r\n\r\n' +
            'This is a string.';
        filter.write(encoder.encode(str));
        filter.disconnect();
    };
}
The message sent from the background script in the request.onreadystatechange function is received in the content script, and the request.status is '0'.
The filter.onstart is used because the ondata event will never fire since the url is bogus. Also, that means there will be no converting of data from the url, but only the writing of new data through the filter.
The str data is written and received by the request, but only as responseText and not as a response header. The request.status remains '0' instead of '200'.
It seems that the response headers can't be changed except in onHeadersReceived, which, it appears, will never fire for a bogus URL. However, I tried this on a real URL and, even though the event fired, an error of "webRequest.HttpHeaders is not a function" was thrown. I had "responseHeaders" in the webRequest extraInfoSpec at that time.
My questions are:
Can a response header be written to set the request.status to '200', and then the database data be written through an async function in small blocks as they are retrieved?
Can the Content-Disposition section of the response headers be set such that it automatically starts the download of the response text, allows the user to select the file name and save location, and stays "open" while the file keeps being written to as the data is extracted from the database and passed to the GET request through filter.write()?
Thank you.
Conclusion
It was a good idea but I don't think it is possible for at least two reasons.
One is that webRequest doesn't appear to intercept a downloads.download() function at all, or any download event; so, you can't intercept a download, and an event with a Content-Disposition of 'attachment' is needed to even try to write to it with a stream. I could intercept a forced click on an anchor tag href, but no other events fired beyond onBeforeRequest.
The other is that a response header can't be modified until an onHeadersReceived event, which means the fake URL has to return something; you can't just cancel it in onBeforeRequest. So, this wouldn't work offline. But even if you let it process online against an existing URL that returns a response header, it won't accept a modification. I tried repeatedly to modify the response headers and it just won't work. I tried an XMLHttpRequest GET and could intercept the events that fired, but couldn't modify the response headers; so, I couldn't set Content-Disposition to 'attachment', with or without a file name, to start a download. I can write to the request, but that's no good unless it is going to download what is written. It would be fine if the written content were going to a web page.
Also, if you redirect the URL along the way to anything other than a webRequest-acceptable URL, the other events won't be interceptable. So, if you redirect to an object URL in onBeforeRequest, you won't intercept the response headers stage in webRequest, but you can view them in the onreadystatechange event of the XMLHttpRequest.
So, the upshot is that it appears the response headers cannot be modified even though the MDN Web Docs say it is possible. And this idea of using a webRequest stream filter to stream data generated on the client or extracted from an indexedDB database, as opposed to building one large blob for download, won't work, because you can't intercept a download or change the response headers to trigger a download into which to write via the stream filter.
It was an interesting idea, though. I still wonder whether or not the download would remain 'open', so to speak, while the data was being written on the client and passed in blocks or chunks. Perhaps it would work if the part of the response headers that states how data is to be passed and received were modified as well.
For now, I am no longer pursuing this approach. One of the Web Docs or a bug record stated that it is planned to allow a data URL to be intercepted. Perhaps, for an offline download to the client, that would be preferable to a fake URL.
If anyone gets this to work, please let us know. Thank you.
A couple of terms:
"Gecko" is the rendering engine on which Firefox (and a few other applications like Thunderbird) is built
"Chrome" in this context means the browser user interface and features, as opposed to the contents of a web page being displayed by the browser.
In Firefox, much of the browser chrome is implemented in Javascript. The code that implements the user interface needs to be able to do things that normal web pages cannot do (such as reading and writing the local filesystem). Therefore, this code runs with different privileges than Javascript that runs as part of a web page. The terms "privileged code", "chrome privileged code", "Gecko privileged code" are all different ways to describe the same thing: Javascript code that is built in to the browser and has access to capabilities that web pages do not have.
Prior to the Firefox Quantum (version 57) release, Firefox extensions were allowed to run privileged Javascript code. As you might imagine, this was fraught with problems for security, performance, and stability, among other things. With WebExtensions, extensions now run with the same level of privilege as regular web content (ie, they do not execute with elevated privileges). Some browser features are exposed to extensions through extension APIs.
So, if you're interested in what you can do from an extension, any documents on MDN that reference privileged code are effectively irrelevant. There are not currently any APIs available to WebExtensions that would allow you to directly access the filesystem, but there is an open bug to add this capability. (That bug has existed for quite some time, but I suspect there will be progress relatively soon...)
I have a React/Redux front-end with an Express back-end application, and I'm rather new, but I have a question regarding how to deal with the flow of data.
So, on my front-end side, I have a search bar. When a user enters a search term, I send a POST request from React, which is handled in my Express routes.js file. In this file, I take that search term and look for it in my Mongo database. After that, all I want to do is send an object back if the term was found in the database.
I have used axios in this application to make an HTTP request to a certain route to pull off some data, but that was within an app.get(...) on the express side, and I used an axios.get(...) on the React side to retrieve the information.
But, this situation is slightly different, since the data is flowing in two directions: initially from the front end to the back end, and then from the back end to the front end. And in this case, I'm using app.post(...).
Now my question is, how would I retrieve the data to the front end? Could I simply just do an axios.get(...) on an app.post(...) or is there some other way to do this?
If you GET from the browser to your back-end's route which is implemented to respond to POST only, you will probably get a 405 error. Implement a POST Axios request and a POST Express reply.
You can use either GET or POST, but you need to be consistent on the server and the client side. If you do an HTTP GET from the client, the server will only respond if you have an app.get(...) as a server route.
As far as the flow of data is concerned, both a GET and a POST can return data; it just needs to be specified in the Express route.
After the business logic of checking whether the key exists in Mongo, do something like res.send({'found': true}) or res.json({'found': false}). This will ensure that the data gets back to the client.
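Put together, a minimal sketch of both sides (the /api/search route, the Term model, and the term field are invented here for illustration):
// --- server side (Express) ---
const express = require('express');
const app = express();
app.use(express.json()); // parse JSON request bodies

app.post('/api/search', async (req, res) => {
    const doc = await Term.findOne({ name: req.body.term }); // hypothetical Mongoose model
    res.json({ found: doc !== null }); // this JSON travels back to the client
});

// --- client side (React) ---
// The resolved POST response carries that JSON back; no separate GET is needed.
axios.post('/api/search', { term: searchTerm }) // searchTerm: value from your search bar
    .then(response => console.log(response.data.found));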
If I were to do this, I would:
1.) Use an Axios GET request and pass in the identifying attributes as parameters, such as a related _id or key phrase.
2.) Use MongoDB's query filter search parameters to index and aggregate the schema data in the DB. I would probably use .findOne or .find.
3.) Use the router callback to pass in the filtered data, then dispatch a function to save it to a state.
This way you can set up specific terms or keywords to search with, and utilize the searched data throughout the app.
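A rough sketch of that flow (the /api/items route, the Item model, the title field, and the saveResults action are all invented for illustration):
// --- client side: GET with the search term as a query parameter ---
axios.get('/api/items', { params: { keyword: searchTerm } })
    .then(response => dispatch(saveResults(response.data))); // hypothetical Redux action

// --- server side: filter the collection and return the matches ---
app.get('/api/items', async (req, res) => {
    const items = await Item.find({ title: req.query.keyword }); // hypothetical model/field
    res.json(items);
});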
I am building a web app using nodeJS with an angular based frontend and a Firebase/AngularFire2 backend. I have a list of about 80 cities and couple of details about each of them that I need to display with checkboxes for the user.
Should I save them as a json object in a .json file on the server and call it, or just store it in my Real-time Database and query it? Are there any speed/memory benefits to either?
There are two scenarios:
1. Your task is search-oriented. You have to query the data and manipulate it. Memory management is a key issue for you. You want complex searching methods on your data. Then go for the database.
2. Your task requires the whole data at once. You don't need to worry about memory management. Then load the data directly from the file. Obviously, this method saves the time of making a connection to your database. It works as simply as file streams. [suggested for your case]
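If you go the file route, the client-side code stays very small. A sketch, assuming the list is saved as a static cities.json served alongside the app and each entry has a name field:
// Load the whole list once from a static file; no database round-trip needed.
fetch('/assets/cities.json')
    .then(function(response) { return response.json(); })
    .then(function(cities) {
        // e.g. render the ~80 checkboxes from the array
        cities.forEach(function(city) { console.log(city.name); });
    });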
I am building a website (probably in Wordpress) which takes data from a number of different sources for display on various pages.
The sources:
A Twitter feed
A Flickr feed
A database on a remote server
A local database
From each source I will mainly retrieve
A short string, e.g. for Twitter the tweet itself, and from the local database the title of a blog page.
An associated image, if one exists
A link identifying the content at its source
My question is:
What is the best way to a) store the data and b) retrieve the data
My thinking is:
i) Write a script that is run every 2 or so minutes on a cron job
ii) the script retrieves data from all sources and stores it in the local database
iii) application code can then retrieve all data from the one source, the local database
This should make application code easier to manage - we only ever draw data from one source in application code - and that's the main appeal. But is it overkill for a relatively small site?
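For what it's worth, here is a rough Node sketch of steps i) and ii); the feed URLs and the saveToDatabase() helper are placeholders, and it assumes a Node version with built-in fetch:
// cron-fetch.js - run every 2 minutes, e.g. with: */2 * * * * node /path/to/cron-fetch.js
const sources = [
    { name: 'twitter', url: 'https://example.com/twitter-feed' }, // placeholder URL
    { name: 'flickr',  url: 'https://example.com/flickr-feed'  }  // placeholder URL
];

async function run() {
    for (const source of sources) {
        const response = await fetch(source.url);
        const items = await response.json();
        await saveToDatabase(source.name, items); // hypothetical helper writing to the local DB
    }
}

run().catch(console.error);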
I would recommend putting the twitter feed and flickr feed in JavaScript. Both flickr and twitter have REST APIs. By putting it on the client you free up resources on your server, create less complexity, your users won't be waiting around for your server to fetch the data, and you can let twitter and flickr cache the data for you.
This assumes you know JavaScript. Once you get past its quirks, it's not a bad language. Give jQuery a try; there are, for example, a jQuery Twitter plugin and the Flickery jQuery plugin. There are others; those are just the first results from Google.
As for your data on the local server and remote server, that will depend more on the data being fetched. I would go with whatever you can develop the fastest that gives acceptable results. If that means making a REST call from server to server, then go for it. If the remote server is slow to respond, I would go with the AJAX REST API method.
And for the local database, you are going to have to write server side code for that, so I would do that inside the Wordpress "framework".
Hope that helps.