How to export data as CSV from Druid that already exists in Druid?

I loaded data (almost 1 billion rows) from HDFS (Hadoop) into Apache Druid. Now I am trying to export this data set as a CSV to my local machine. Is there any way to do this in Druid?
There is a download icon in the Druid SQL console. However, when you click it, it only downloads the data up to the page you are on. I have so many pages that I cannot go through them all to download everything.

You can POST a SQL query to the SQL query API and set resultFormat to csv in your POST body.
https://druid.apache.org/docs/latest/querying/sql.html#responses
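For example, a minimal sketch using Node.js 18+ (for its built-in fetch); the router address and the my_datasource name are placeholder assumptions for your own setup:

// Minimal sketch: POST a SQL query to Druid's SQL API and save the
// CSV response locally. 'localhost:8888' and 'my_datasource' are
// placeholder assumptions; adjust them to your cluster.
const fs = require('fs');

async function exportCsv() {
  const res = await fetch('http://localhost:8888/druid/v2/sql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      query: 'SELECT * FROM my_datasource',
      resultFormat: 'csv'
    })
  });
  if (!res.ok) throw new Error('Druid returned HTTP ' + res.status);
  fs.writeFileSync('export.csv', await res.text());
}

exportCsv().catch(console.error);

For a data set of almost a billion rows, buffering one response in memory won't work; you would stream the response to disk and split the export into several queries (for example, one per time range) so each response stays a manageable size.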

Related

Google Data Studio send API request

I have my own community connector built, which pulls data through an API. Everything works as it should, and I am getting data into the report.
Now I want to be able to query the API from the report, using a dedicated field/filter. What I mean is having the option to type a string and ask the API for results that include that string.
What I have done so far is use the request.configParams.field_name parameter to pass request data from Google Data Studio back to my data source, but this means reloading the data source into the report every time I change the value.
Is there another way to pass custom request data from Google Data Studio to my connector's API query?
For Community Connectors, it is not possible to push down arbitrary filters for report viewers at the moment (other than date filters).

Use .json file or database for static data?

I am building a web app using Node.js with an Angular-based frontend and a Firebase/AngularFire2 backend. I have a list of about 80 cities, and a couple of details about each of them, that I need to display with checkboxes for the user.
Should I save them as a JSON object in a .json file on the server and fetch it, or just store them in my Realtime Database and query it? Are there any speed/memory benefits to either?
There are two scenarios:
1. Your task is search oriented. You have to query the data and manipulate it, memory management is a key issue, and you want complex search methods over your data. Then go for the database.
2. Your task requires the whole data set at once, so you don't need to worry about memory management. Then load the data directly from the file (see the sketch below). This saves the time spent opening a connection to your database and works as simply as a file stream. [suggested for your case]
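If you take the file route, here is a minimal sketch in plain JavaScript; the assets/cities.json path, the data shape, and the city-list element are assumptions for illustration:

// Minimal sketch: fetch a static JSON file of cities once and render
// a checkbox per city. File path, field names, and element id are
// assumed, not taken from the original app.
fetch('assets/cities.json')
  .then(res => res.json())
  .then(cities => {
    const list = document.getElementById('city-list');
    cities.forEach(city => {
      const label = document.createElement('label');
      const box = document.createElement('input');
      box.type = 'checkbox';
      box.value = city.name;
      label.append(box, ' ' + city.name);
      list.append(label);
    });
  });

At roughly 80 entries the file is only a few kilobytes, so a single request at startup is cheap.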

How to download a file (image, PDF, etc.) using Talend

I want to migrate files (attachments) from an FTP server to another server (Salesforce), and to do that I am going to use Talend. I have no clue which components to use, or in which order, to download the files (multiple formats, but downloadable via an HTTP link) and insert them into the Salesforce database. I would be grateful if someone could explain how to proceed (which components to use and how to connect them).
Based on the info provided, first you will obtain the files from the remote server, then load them as BLOBs into a database.
See the diagram for a typical FTP flow. The first component is a connection to the server, which allows connection reuse. The second component is optional; it lets you get a count of files prior to your operations (you can use it later to make sure you retrieved all the files). The third component (tFTPGet) is technically all you need: it grabs the files based on the file mask you set. The final component, tFTPDelete, cleans up the remote directory.
Once you have the files locally, see this help link for information on how to insert files as BLOBs into a database. You will have to tweak it for your Salesforce db.

API to Database?

Please presume that I do not know anything about any of the things I will be mentioning because I really do not.
Most open data sites offer the option of exporting the presented data in, for example, .csv or .json format (Example). They also always have an API tab (Example API).
I presume using the API would mean that if the data is updated you would receive the change whereas exporting it as .csv would mean the content will not be changed anymore.
My question is: how does one use this API to display the same table one would get when exporting a .csv file?
Would you use a database to extract this information? What kind of database, and how do you link the API to the database?
I presume using the API would mean that if the data is updated you would receive the change whereas exporting it as .csv would mean the content will not be changed anymore.
You are correct in the sense that, if you download the csv to your computer, that csv file won't be updated any more.
An API is something you would call - in this case, you can call the API, saying "Hey, do you have the latest data on xxx?", and you will be given back the latest information about what you have asked. This does not mean though, that this site will notify you when there's a new update - you will have to keep calling the API (every hour, every day etc) to see if there are any changes.
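As a rough illustration of that polling idea in JavaScript (the dataset URL is the one used later in this answer, and the one-hour interval is an arbitrary assumption):

// Minimal polling sketch: re-request the API on a fixed schedule and
// react when the payload changes. The interval is a placeholder.
const url = 'http://opendata.brussels.be/api/records/1.0/search/?dataset=associations-clubs-sportifs';
let lastSeen = null;

setInterval(async () => {
  const res = await fetch(url);
  const text = await res.text();
  if (text !== lastSeen) {
    lastSeen = text;
    console.log('Data changed; re-process it here');
  }
}, 60 * 60 * 1000); // poll once an hour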
My question is: how does one use this API to display the same table one would get when exporting a .csv file?
You would:
Call the API from server code, or a cloud service
Let the server code or cloud service decipher (or "parse") the response
Use the parsed response to create a table made out of HTML, or to place it into a database (see the sketch below)
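To make that last step concrete, here is a rough sketch in JavaScript; the { records: [{ fields: {...} }] } shape is an assumption based on the OpenDataSoft-style response retrieved further down in this answer:

// Rough sketch: turn a parsed API response into an HTML table.
// The records/fields shape is assumed from the response below.
function responseToTable(response) {
  const rows = response.records.map(r => r.fields);
  // Take the union of field names, since records can omit fields.
  const headers = [...new Set(rows.flatMap(Object.keys))];
  const table = document.createElement('table');
  const headerRow = table.insertRow();
  headers.forEach(h => {
    const th = document.createElement('th');
    th.textContent = h;
    headerRow.appendChild(th);
  });
  rows.forEach(row => {
    const tr = table.insertRow();
    headers.forEach(h => { tr.insertCell().textContent = row[h] ?? ''; });
  });
  return table;
}

You could then call document.body.appendChild(responseToTable(data)) to see the result.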
Would you use a database to extract this information? What kind of database, and how do you link the API to the database?
You wouldn't necessarily need a database to extract information, although a database would be nice to place the final data inside.
You would first need some way to "call the REST API". There are many ways to do this - using a shell script, Python, Excel VBA, etc.
I understand this is hard to visualize, so here is an example of step 1, where you can retrieve information.
Try placing the below URL (taken from the site you showed us) in the address bar of your Chrome browser, and hit Enter:
http://opendata.brussels.be/api/records/1.0/search/?dataset=associations-clubs-sportifs
See how it gives back a lot of text with many brackets and commas? You've basically asked the site to give you some data, and this is the response it gave back (different browsers work differently - IE asks you to download the response as a .json file). You've basically called an API.
To see this data more cleanly, open the developer tools of your Chrome browser, and enter the following JavaScript code:
var url = 'http://opendata.brussels.be/api/records/1.0/search/?dataset=associations-clubs-sportifs';
var xhr = new XMLHttpRequest();
xhr.open('GET', url);
xhr.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
xhr.onload = function() {
  if (xhr.status === 200) {
    // success
    console.log(JSON.parse(xhr.responseText));
  } else {
    // error
    console.log(JSON.parse(xhr.responseText));
  }
};
xhr.send();
When you hit enter, a response will come back, stating "Object". If you click through the arrows, you can see this is a cleaner version of the data we just saw - more human readable.
In this case, I used JavaScript to retrieve the data, but you can use whatever code you want. You could proceed to use JavaScript to decipher the data, manipulate it, and push it into a database.
kintone is an online cloud database that you can customize with JavaScript code to store the data in its database, so you'll have the data stored online. This is just one example of a database you can use.
There are other cloud services which allow you to connect API end points of different services with each other, like IFTTT and Zapier, but I'm not sure if they connect with open data.
The page you linked to shows that the API returns values as a JSON object. To access the data you can just send an appropriate HTTP request, and the response will be the requested data as JSON. You can send requests like that from your browser if you want to.
Most languages allow JSON objects to be manipulated programmatically if you need to do work on the data.
RESTful APIs follow a "request and respond" publish model. When you request data via an API endpoint, you receive response strings as JSON objects, CSV tables, or XML.
The publisher, in this case opendata.brussels.be, would update their database on a regular basis and publish the results via an API endpoint.
If you want to download the table as a relational data table in a CSV file, you'd need to parse the JSON objects into relational tables. This can be tricky since each JSON response string can vary in its paths.
There are several ways to do it. You can either write scripts to flatten the JSON objects or use a tool to parse and flatten the objects for you.
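As a sketch of the script route in JavaScript (the records/fields shape is an assumption based on the OpenDataSoft-style response shown earlier on this page):

// Rough sketch: flatten API records into CSV text. Assumes the
// { records: [{ fields: {...} }] } shape from the earlier example.
function recordsToCsv(response) {
  const rows = response.records.map(r => r.fields);
  // Take the union of field names, since records can omit fields.
  const headers = [...new Set(rows.flatMap(Object.keys))];
  const escape = v => '"' + String(v ?? '').replace(/"/g, '""') + '"';
  const lines = rows.map(row => headers.map(h => escape(row[h])).join(','));
  return [headers.map(escape).join(','), ...lines].join('\n');
}

This handles a flat response; nested objects are exactly where the varying-paths problem kicks in, and you would have to decide how to flatten them (for example, into dotted column names).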
I use a tool called Acho to turn API endpoints into CSV files. It can parse almost any API endpoint through its parameters and even be configured for multiple requests, such as iterative and recursive requests.
Acho API parser

How do I make a cron job use the export feature of phpMyAdmin to export a DB?

I want to export some tables in my DB to an Excel spreadsheet every month.
In phpMyAdmin there is a direct option to export the result of a query to the desired file type. How do I make use of this export feature from a cron job on a monthly basis, without writing another script?
Basically, on cPanel (the DB is hosted on the web) we just have to give the path to the script to be executed via a cron job. But with phpMyAdmin there is no such option; it is a built-in feature of phpMyAdmin where we generally click and do it manually. So how do I do it in cPanel?
Do you have SSH access to the box? Personally I'd implement this outside of phpMyAdmin, as phpMyAdmin is just intended for manual operations via the interface. Why not write a simple script to export the DB?
Something like mysqldump -u user -p'password' mydb mytable > /backups/mytable.sql, scheduled from cron (e.g. 0 0 1 * * for midnight on the first of each month).
Being a web app, the export function is a POST request. In the demo application the URL is http://demo.phpmyadmin.net/STABLE/export.php, and the POST data contains all the required parameters, for example (you can use Fiddler/Chrome dev tools to view it):
token:3162d3b849cf652c2577a45f90022df7
export_type:server
export_method:quick
quick_or_custom:custom
output_format:sendit
filename_template:#SERVER#
remember_template:on
charset_of_file:utf-8
compression:none
what:excel
codegen_structure_or_data:data
codegen_format:0
csv_separator:,
csv_enclosed:"
.....
The one tricky bit is the authentication token, but I believe this is also possible to overcome using some configuration and/or extra parameters (like the 'direct login' in http://demo.phpmyadmin.net/).
See here: How to send data using curl from Linux command line?
If you want to avoid all this, there are many other web-automation tools that can record the scenario and play it back.
Just write a simple PHP script to connect to your database and use the answer here: How to output MySQL query results in CSV format?
