Is there any automatic or manual way to export data from a Firestore database to a BigQuery table?
I tried looking around, and it seems there is no way to export data from Firestore without writing code.
Is there any news on this?
Thanks.
The simplest way to import Firestore data into BigQuery without writing code is to use the command line. First, export the data with the command below. Note that BigQuery can only load exports of specific collections, not an export of all your documents in one batch:
gcloud beta firestore export gs://[BUCKET_NAME] --collection-ids='[COLLECTION_ID]'
Next, in the bucket you specified above, you will find a folder named after the timestamp of your export. Navigate the directories, locate the file whose name ends in export_metadata, and use its path as the import source. You can then load the data into BigQuery with the command below:
bq --location=[LOCATION] load --source_format=DATASTORE_BACKUP [DATASET].[TABLE] [PATH_TO_SOURCE]
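If you prefer to run the same load programmatically rather than through bq, here is a minimal sketch using the google-cloud-bigquery Python client; the bucket path, dataset, and table names are placeholders:
from google.cloud import bigquery

client = bigquery.Client()

# Firestore/Datastore exports are loaded with the DATASTORE_BACKUP source format.
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.DATASTORE_BACKUP

# Placeholder path: point this at the .export_metadata file produced by the export.
uri = "gs://my-bucket/2020-01-01T00:00:00_1234/all_namespaces/kind_users/all_namespaces_kind_users.export_metadata"

load_job = client.load_table_from_uri(uri, "my_dataset.users", job_config=job_config)
load_job.result()  # wait for the load to complete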
The best way to do this now is to use the official Firebase extension for exporting data in real time from Firestore to BigQuery: https://github.com/firebase/extensions/tree/master/firestore-bigquery-export
Configuration can be done through the console or the CLI without writing any code. Once configured, the extension syncs data from Firestore to BigQuery in near real time.
The extension sets up an event listener on the Firestore collection you configure and deploys a Cloud Function that syncs the data from Firestore to BigQuery.
Related
I loaded data (almost 1 billion rows) from HDFS (Hadoop) into Apache Druid. Now I am trying to export this dataset as a CSV to my local machine. Is there any way to do this in Druid?
There is a download icon in the Druid SQL console. However, clicking it only downloads the data up to the page you are currently on. I have so many pages that I cannot go through them all to download everything.
You can POST a SQL query to the Druid SQL query API and set resultFormat to csv in the request body.
https://druid.apache.org/docs/latest/querying/sql.html#responses
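For example, here is a rough Python sketch (the router URL and datasource name are made up) that posts a SQL query with resultFormat set to csv and streams the response into a local file:
import json
import urllib.request

# Placeholder router address and datasource name.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql/"

payload = json.dumps({
    "query": "SELECT * FROM my_datasource",
    "resultFormat": "csv",  # ask Druid to return the result as CSV
}).encode("utf-8")

req = urllib.request.Request(
    DRUID_SQL_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Stream the response straight into a local file.
with urllib.request.urlopen(req) as resp, open("export.csv", "wb") as out:
    for chunk in iter(lambda: resp.read(1 << 20), b""):
        out.write(chunk)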
I'm trying to use Google BigQuery to download a large dataset for the GitHub Data Challenge. I have designed my query and am able to run it in the Google BigQuery console, but I am not allowed to export the data as CSV because it is too large. The recommended help tells me to save it to a table, which requires me to enable billing on my account and make a payment as far as I can tell.
Is there a way to save datasets as CSV (or JSON) files for export without payment?
For clarification, I do not need this data on Google's cloud and I only need to be able to download it once. No persistent storage required.
If you can enable the BigQuery API without enabling billing on your application, you can try the getQueryResults API call. Your best bet, though, is probably to enable billing (for the limited usage you need you will probably stay within the free tier, and even if you are charged it should only be a few cents), save your query results to a table, and export that table to Google Cloud Storage. If the result is too large, I don't think you'll be able to use the web UI effectively.
See the documentation on this exact topic:
https://developers.google.com/bigquery/exporting-data-from-bigquery
Summary: Use the extract operation. You can export CSV, JSON, or Avro. Exporting is free, but you need to have Google Cloud Storage activated to put the resulting files there.
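As a rough sketch of the extract operation using the google-cloud-bigquery Python client (the project, dataset, table, and bucket names here are placeholders):
from google.cloud import bigquery

client = bigquery.Client()

# The wildcard lets BigQuery shard a large table across multiple CSV files.
destination_uri = "gs://my-export-bucket/github-results-*.csv"

extract_job = client.extract_table("my-project.my_dataset.my_results", destination_uri)
extract_job.result()  # wait for the export job to finish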
Use the bq command-line tool:
$ bq query
and use the --format flag (for example, --format=csv) to save the results as CSV.
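If you'd rather script it than use the CLI, a rough equivalent with the google-cloud-bigquery Python client (the query and file name here are just placeholders) is to run the query and write the rows out yourself:
import csv
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query("SELECT repo_name, language FROM `my_dataset.my_table` LIMIT 1000").result()

with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([field.name for field in rows.schema])  # header row
    for row in rows:
        writer.writerow(row.values())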
I was trying to load one of my Datastore tables into BigQuery. When I found the "App Engine Datastore Backup" option in the BigQuery web UI, I was very happy, because all my data is located in a single Datastore table. It should be the easiest approach (I thought) to just export the data via the "Datastore Admin" page of Google App Engine and then import it into BigQuery.
The export process went quite smoothly, and I happily watched all the mapper tasks finish successfully. After this step, what I got was 255 files in one of my Cloud Storage buckets. The problem arose when I tried to import them in the BigQuery web UI. I entered the URL of one of the 255 files as the load source, and all I got was the following error message:
Errors:
Not Found: URI gs://your_backup_hscript/datastore_backup_queue_status_backup_2013_05_23_QueueStats-1581059100105C09ECD88-output-54-retry-0
I'm sure the URL above is correct because I can download the file with gsutil, and I can import a CSV file located in the same bucket. What would you suggest as the next step?
Found the reason now: I should use the file with the ".backup_info" suffix instead of an arbitrary data file.
Cheers!
I have a CSV file and I need to get it into a list object in App Inventor.
I'm not sure if there is a better or simpler method, but I've looked at the following approaches and I'm not really sure which is the best route.
Also, I'm using Python, but I could switch to the Java App Engine runtime.
Google Fusion Tables (gft)
Google Docs & TinyGSdb
App Engine & Python
Down in the comments there is an example of how to update app.yaml and include some code to parse a CSV file:
import csv
reader = csv.reader(open('efile_newestSFO_8354d71d-e3fb-4864-b9bf-5312a89e24d7_2010.csv', 'rU'), delimiter=',')
for row in reader:
    print row[0], row[1]
I'd rather not go out to the web every time the app loads to retrieve the list.
Thoughts?
You can write a handler to let you upload the CSV to the Blobstore, then use the Blobstore APIs from your app to read the file.
That approach is well-described here (in Java, but the same idea applies to Python).
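Here is a rough Python sketch of that idea on the legacy App Engine runtime (webapp2); the handler paths and the 'file' form field name are just placeholders, and the upload form must POST to a URL created with blobstore.create_upload_url('/upload'):
import csv
import webapp2
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers


class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    """Receives the CSV posted to the Blobstore upload URL."""
    def post(self):
        blob_info = self.get_uploads('file')[0]  # 'file' is the form field name
        self.redirect('/rows?key=%s' % blob_info.key())


class RowsHandler(webapp2.RequestHandler):
    """Reads the stored CSV back out of the Blobstore and parses it into rows."""
    def get(self):
        blob_key = self.request.get('key')
        reader = csv.reader(blobstore.BlobReader(blob_key))
        rows = [row for row in reader]  # the list you hand to your app
        self.response.write(repr(rows))


app = webapp2.WSGIApplication([('/upload', UploadHandler), ('/rows', RowsHandler)])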
I have about 4,000 records that I need to upload to the Datastore. They are currently in CSV format. I'd appreciate it if someone would point me to or explain how to upload data in bulk to GAE.
You can use the bulkloader.py tool:
The bulkloader.py tool included with the Python SDK can upload data to your application's datastore. With just a little bit of set-up, you can create new datastore entities from CSV files.
I don't have the perfect solution, but I suggest you have a go with the App Engine Console. App Engine Console is a free plugin that lets you run an interactive Python interpreter in your production environment. It's helpful for one-off data manipulation (such as initial data imports) for several reasons:
It's the good old read-eval-print interpreter. You can do things one at a time instead of having to write the perfect import code all at once and running it in batch.
You have interactive access to your own data model, so you can read/update/delete objects from the data store.
You have interactive access to the URL Fetch API, so you can pull data down piece by piece.
I suggest something like the following:
Get your data model working in your development environment
Split your CSV records into chunks of under 1,000. Publish them somewhere like Amazon S3 or any other URL.
Install App Engine Console in your project and push it up to production
Log in to the console. (Only admins can use the console so you should be safe. You can even configure it to return HTTP 404 to "cloak" from unauthorized users.)
For each chunk of your CSV:
Use URLFetch to pull down a chunk of data
Use the built-in csv module to chop up your data until you have a list of useful data structures (most likely a list of lists or something like that)
Write a for loop, iterating through each data structure in the list:
Create a data object with all correct properties
put() it into the data store
You should find that after one pass through step 5 you can either copy and paste or write simple helper functions to speed up your import task. Also, because you fetch and process your data in steps 5.1 and 5.2, you can take your time until you are sure you have it right (a rough sketch of one pass is shown after the note below).
(Note, App Engine Console currently works best with Firefox.)
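For illustration, one pass through step 5 in the console might look roughly like this; the chunk URL, the Record model class, and its properties are all invented placeholders:
import csv
import StringIO
from google.appengine.api import urlfetch
from models import Record  # your own datastore model; name is a placeholder

# 5.1: pull down one chunk of data.
result = urlfetch.fetch('https://example.com/chunks/chunk-001.csv')

# 5.2: chop the chunk up with the csv module.
reader = csv.reader(StringIO.StringIO(result.content))

# 5.3: create an entity per row and put() it into the datastore.
for row in reader:
    entity = Record(name=row[0], value=int(row[1]))
    entity.put()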
By using the remote API and operations on multiple entities. I will show an example using NDB in Python, where our Test.csv contains the following values separated by semicolons:
1;2;3;4
5;6;7;8
First we need to import modules:
import csv
from TestData import TestData
from google.appengine.ext import ndb
from google.appengine.ext.remote_api import remote_api_stub
Then we need to create remote api stub:
remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func, 'your-app-id.appspot.com')
For more information on using the remote API, have a look at this answer.
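The auth_func passed above is expected to return the credentials for the remote API handler; a typical interactive definition (shown here only as a sketch) looks like this:
import getpass

def auth_func():
    # Prompt for the App Engine account credentials used by /_ah/remote_api.
    return (raw_input('Email: '), getpass.getpass('Password: '))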
Then comes the main code, which basically does the following things:
Opens the Test.csv file.
Sets the delimiter. We are using semicolon.
Then you have two different options to create a list of entities:
Using map and reduce functions.
Using list comprehension.
In the end you batch put the whole list of entities.
Main code:
# Open the CSV file for reading.
with open('Test.csv', 'rb') as csv_file:
    # Set the delimiter.
    reader = csv.reader(csv_file, delimiter=';')
    # Reduce the 2D list into a 1D list, then map every element onto an entity.
    test_data_list = map(lambda number: TestData(number=int(number)),
                         reduce(lambda flat, row: flat + row, reader))
    # Or you can use a list comprehension.
    test_data_list = [TestData(number=int(number)) for row in reader for number in row]
# Batch put the whole list into the HRD.
ndb.put_multi(test_data_list)
The put_multi operation also takes care of batching an appropriate number of entities into a single HTTP POST request.
Have a look at this documentation for more information:
CSV File Reading and Writing
Using the Remote API in a Local Client
Operations on Multiple Keys or Entities
NDB functions
With later versions of the App Engine SDK, you can upload data using appcfg.py.
See appcfg.py.