I am having a problem exporting a collection from Mongo Atlas to my local machine. I have tried several different formats including this one, which I found in the official Atlas documentation on importing and exporting data.
First I log into my Atlas like so:
mongosh "mongodb+srv://cluster0.oyvrw.mongodb.net/dbname" --username uname
Then I try the command from the official docs:
mongoexport --uri mongodb+srv://uname:password@cluster0.oyvrw.mongodb.net/dbname --collection colname --type json --out cats.json
I have looked around at other similar questions and tried everything I can find online without success. One suggestion was not to run the command from the mongo shell but from the regular command line, but this does not work either.
It seems like it should be easier to get a collection out of Atlas to JSON. Any help or suggestions are much appreciated. Thanks!
For anyone facing this error, the mongoexport command does not work with mongosh. It must be run from the system shell.
However, mongoexport is part of the MongoDB Database Tools, which, as of MongoDB 4.4, are released separately. As a result, running mongoexport in the system shell will throw a "command not found" error if the installed version of MongoDB is 4.4 or greater.
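A quick way to check whether the tools are already present is to ask for the version from the system shell (not mongosh):
mongoexport --version
If this prints a version, mongoexport is installed and on your PATH; if it fails with command not found, install the tools as below.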
To solve this you can install the database tools using homebrew:
brew install mongodb/brew/mongodb-database-tools
Of course, make sure you have homebrew already installed. If not a quick Google will help.
Then the following command should work to perform an export:
mongoexport --uri mongodb+srv://<username>:<password>@cluster0.oyvrw.mongodb.net/<dbName> --collection <collectionName> --type json --out /Users/macuser/desktop/exportBU.json
Hope that helps anyone having similar problems getting data in/out of MongoDB.
Is it possible to use any datasets available via the kaggle API in Google Colab? I see the Kaggle API is used in this Colab notebook, but it's a bit unclear to me what datasets it provides access to.
Step-by-step --
Create an API key in Kaggle.
To do this, go to kaggle.com/ and open your user settings page.
Next, scroll down to the API access section and click generate to download an API key.
This will download a file called kaggle.json to your computer. You'll use this file in Colab to access Kaggle datasets and competitions.
Navigate to https://colab.research.google.com/.
Upload your kaggle.json file using the following snippet in a code cell:
from google.colab import files
files.upload()
Install the kaggle API using !pip install -q kaggle
Move the kaggle.json file into ~/.kaggle, which is where the API client expects your token to be located:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
Now you can access datasets using the client, e.g., !kaggle datasets list.
Here's a complete example notebook of the Colab portion of this process:
https://colab.research.google.com/drive/1DofKEdQYaXmDWBzuResXWWvxhLgDeVyl
This example shows uploading the kaggle.json file, installing the Kaggle API client, and using the Kaggle client to download a dataset.
You should be able to access any dataset on Kaggle via the API. In this example, only the competition datasets are listed. You can see the datasets you can access with this command:
kaggle datasets list
You can also search for datasets by adding the -s flag followed by the search term you're interested in. So this would give you a list of datasets about dogs:
kaggle datasets list -s dogs
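Once you've found a dataset, downloading it is one more command; the owner/dataset-name slug below is a placeholder, copy the real one from the dataset's Kaggle page:
kaggle datasets download -d <owner>/<dataset-name> --unzip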
You can find more information on the API and how to use it in the documentation here.
Hope that helps! :)
Detailed approach:
1. Go to My Account in your profile.
2. Scroll down until you find the option Create New API Token; this will download a file called kaggle.json.
3. Go to Colab and upload the kaggle.json file.
4. !pip install kaggle
5. Create a new folder named kaggle, copy kaggle.json into it, and set read-write permissions for your user only.
6. Go to the Kaggle website. For the data you want to download, click on the three dots on the right-hand side of the screen, then click Copy API command.
7. Go to Colab and paste the API command.
8. When you run !ls, you will see that the download is a zip file.
9. To unzip the file, use the following command (the archive name below is a placeholder for whatever !ls showed):
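!unzip <downloaded-file>.zip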
10. Now, when you run !ls, you'll find the csv file extracted from the zip file.
11. To read the file, import pandas and use a simple pd.read_csv; a minimal sketch (the csv name is a placeholder):
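import pandas as pd
df = pd.read_csv('your_file.csv')  # placeholder: use the name of the extracted csv
df.head()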
12. As you can see, we have successfully read our file into Colab.
This downloads the Kaggle dataset into Google Colab, where you can perform analysis, build machine learning models, or train neural networks.
Happy Analysis!!!
I combined the top response into this GitHub gist as a Colab implementation. You can copy the code directly and use it.
How to Import a Dataset from Kaggle in Colab
Method:
First a few things you have to do:
Sign up for Kaggle
Sign up for a competition you want to access data from (for example, the LANL-Earthquake-Prediction competition).
Download your credentials to access Kaggle API as kaggle.json
# Install kaggle packages
!pip install -q kaggle
!pip install -q kaggle-cli
# Colab's file access feature
from google.colab import files
# Upload `kaggle.json` file
uploaded = files.upload()
# Retrieve uploaded file
# print results
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
# Then copy kaggle.json into the folder where the API expects to find it.
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!ls ~/.kaggle
Now check if it worked!
#list competitions
!kaggle competitions list -s LANL-Earthquake-Prediction
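If the competition shows up, downloading its data works the same way (this assumes you've accepted the competition rules on the Kaggle website):
!kaggle competitions download -c LANL-Earthquake-Prediction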
Have a look at this.
It uses the official Kaggle API behind the scenes, but automates the process so you don't have to re-download manually every time your VM is taken away. Another issue I faced when using the Kaggle API directly on Colab was the hassle of transferring the Kaggle API token via Google Drive. The above method automates that as well.
Disclaimer: I am one of the creators of Clouderizer.
First of all, run this command to find out where this Colab file lives and how it executes.
!ls -d $PWD/*
It will show /content/data /content/gdrive /content/models
In other words, your working directory (pwd) is /content/, so when you run !ls, it will show data gdrive models.
FYI, ! allows you to run Linux commands inside Colab.
Colab keeps cleaning up the /content folder, so every session you use Colab, downloaded datasets and the kaggle.json file will be gone. That's why it's important to automate the process, so you can focus on writing code, not setting up the environment every time.
Run the following in a Colab code block as an example, with your own API key. Open your kaggle.json file and you will find the username and key in it.
# Info on how to get your api key (kaggle.json) here: https://github.com/Kaggle/kaggle-api#api-credentials
!pip install kaggle
{"username":"seunghunsunmoonlee","key":""}
import json
import zipfile
import os
with open('/content/.kaggle/kaggle.json', 'w') as file:
json.dump(api_token, file)
!chmod 600 /content/.kaggle/kaggle.json
!kaggle config path -p /content
!kaggle competitions download -c dog-breed-identification
os.chdir('/content/competitions/dog-breed-identification')
for file in os.listdir():
zip_ref = zipfile.ZipFile(file, 'r')
zip_ref.extractall()
zip_ref.close()
Then run !ls again. You will see all the data you need.
Hope it helps!
To download competition data from Kaggle on Google Colab:
I'm working on Google Colab and I've been through the same problem, but I did two things.
First, you have to register your mobile number along with your country code.
Second, you have to click on Late Submission on the Kaggle competition page.
Then download the kaggle.json file from Kaggle and upload kaggle.json to Google Colab.
After that, run the code given below on Google Colab.
!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle competitions download -c web-traffic-time-series-forecasting
A quick guide to use Kaggle datasets inside Google Colab using Kaggle API
(1) Download the Kaggle API token.
Go to “Account”, go down the page, and find the “API” section.
Click the “Create New API Token” button.
The “kaggle.json” file will be downloaded.
(2) Mount the Google drive to the Colab notebook.
This means giving the Colab notebook access to the files in your Google Drive.
from google.colab import drive
drive.mount("/content/gdrive", force_remount=True)
(3) Upload the “kaggle.json” file into the folder in google drive where you want to download the Kaggle dataset.
(4) Install Kaggle API.
!pip install kaggle
(5) Change the current working directory to where you want to download the Kaggle dataset.
%cd /content/gdrive/MyDrive/DataSets/house_price_data/
(6) Run the following code to configure the path to “kaggle.json”.
import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content/gdrive/MyDrive/DataSets/house_price_data/"
(7) Download the dataset.
!kaggle competitions download -c house-prices-advanced-regression-techniques
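The data arrives as a zip archive in the current directory; a minimal follow-up, assuming the archive is named after the competition slug:
!unzip -q house-prices-advanced-regression-techniques.zip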
After following steps (1-6) above from Bob Smith's answer, to use the dataset from a particular competition in Colab,
you can use the command:
!kaggle competitions download -c elo-merchant-category-recommendation
Here, elo-merchant-category-recommendation is the name of the competition.
The most important part comes before downloading the files:
On the Kaggle webpage, in the Competition section, you must click on:
Late Submission or Join Competition
and
ACCEPT THE RULES AND CONDITIONS ON THE KAGGLE COMPETITION WEBPAGE
If not, after copying the API file and launching the dataset download, a 403 error will show up as the result.
A hacky way:
Go to the dataset page after login
Open Chrome Developer Tools, then go to the Network pane
Click the Download button on Kaggle
When clicked, you will see many requests in the Network pane; find the request starting with archive.zip
Right-click on that request, then Copy -> Copy as cURL (bash). Now you have copied the command
On Colab, paste the command, add a ! at the beginning of the command, then run it
This is definitely a less reliable way than the API, but it still remains an option.
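For reference, the pasted command ends up looking roughly like the following; the URL and cookie below are placeholders, since yours come from the copied request:
!curl -L -o archive.zip -H 'cookie: <your-session-cookie>' 'https://www.kaggle.com/api/v1/datasets/download/<owner>/<dataset>'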
I find the accepted answer to be very comprehensive, but would like to add that:
!kaggle competitions download -c dogs-vs-cats
or most other downloads still won't work. You will probably get the following error:
403 - Forbidden
which is not very verbose. It wants to say: "Please visit kaggle.com and accept the rules (e.g. for that competition). You cannot accept through the API!" It is explicitly stated in the docs (see Public API documentation | Kaggle):
Just like participating in a Competition normally through the user interface, you must read and accept the rules in order to download data or make submissions. You cannot accept Competition rules via the API. You must do this by visiting the Kaggle website and accepting the rules there.
Yes, this could have been a comment, but I don't have enough reputation to comment.
import os
import json
import shutil

# Write your API token to /content/.kaggle/kaggle.json
os.makedirs("/content/.kaggle/", exist_ok=True)
token = {"username":"your_username_here","key":"your_kaggle_key_here"}
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(token, file)

# Copy the token to the other locations the client may look in
os.makedirs("/.kaggle/", exist_ok=True)
src = "/content/.kaggle/kaggle.json"
des = "/.kaggle/kaggle.json"
shutil.copy(src, des)
os.makedirs("/root/.kaggle/", exist_ok=True)
!cp /content/.kaggle/kaggle.json ~/.kaggle/kaggle.json
!kaggle config set -n path -v /content
#https://towardsdatascience.com/setting-up-kaggle-in-google-colab-ebb281b61463
!kaggle datasets download -d xhlulu/siim-covid19-resized-to-512px-png
Works for me on Colab as of 29-05-21!
I am trying to use the mongoimport command.
My mongo shell doesn't autocomplete (when I press the Tab key) when I type mongoim, which puts me in doubt: is mongoimport not available?
snippet:
C:\data\db>mongo
MongoDB shell version v3.4.4
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.4
Server has startup warnings:
2017-09-16T18:55:26.051-0400 I CONTROL [initandlisten]
2017-09-16T18:55:26.051-0400 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2017-09-16T18:55:26.051-0400 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2017-09-16T18:55:26.051-0400 I CONTROL [initandlisten]
MongoDB Enterprise > mongo
Mongo( MongoBridge( MongoRunner(
MongoDB Enterprise > mongoimp
I found a relevant question here on Stack Overflow suggesting that quitting from 'MongoDB Enterprise' can solve this, but when I use the exit() command, it exits the shell completely.
On server start, it gives me the warning message 'Access control is not enabled for the database'. Could this be the problem?
Help is appreciated.
mongoimport works from the command prompt, not from Mongo Shell. So go back to your system command prompt and fire it there. For example:
C:\>mongoimport --db students --collection scores --file scores.json
Here MongoDB imports data from the scores.json file into the scores collection in the students database of your running MongoDB instance. So you have to ensure the following 3 things to make your mongoimport work:
You are inside System command prompt, not inside Mongo Shell.
Your Mongo instance is running.
Your Mongo server bin directory is in PATH environment variable.
If the json file is a json array, make sure you add the --jsonArray option at the end of your command.
Using RLD's example, you would write it like:
C:\>mongoimport --db students --collection scores --file scores.json --jsonArray
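For reference, mongoimport's default mode expects one JSON document per line; a file needing --jsonArray instead wraps the documents in a single array, e.g. (hypothetical records):
[ { "name": "Alice", "score": 91 }, { "name": "Bob", "score": 84 } ]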
There is nothing wrong with the default MongoDB installation.
We are supposed to install the MongoDB Database Tools separately.
Download the MongoDB Database Tools (.zip) from the official website by choosing the correct option.
Extract them to "C:\Program Files\MongoDB\Server\5.0\bin"
You are done.
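To confirm the tools are being picked up, open a new command prompt and check the version:
mongoimport --version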
These 2 websites will be helpful:
https://www.youtube.com/watch?v=v2hsB_e0mFA&ab_channel=SriwWorldofCoding
https://www.mongodb.com/try/download/database-tools?tck-docs_databasetools
I'm working on a Rails 3.2.9 app. On performing a certain action, the app doesn't go any further, and when I check the log file, this is the last line in the log:
Connecting to database specified by database.yml
I have no idea what's causing this problem. When I sign up or sign in, the app also needs to connect to the db, and it works fine then. Only when a function (called execute test case) is clicked does the app go no further and freeze there.
Please help me if you have come across this, or suggest what the cause may be!
Check this answer. This may help you.
Rails Connecting to database specified by database.yml
I found the cause of this error. The problem is that when the mysql2 gem is installed for the app, it might not be compatible with the version of MySQL server installed on the machine. A corresponding libmysql.dll file also needs to be copied to the Ruby folder.
So install the gem by specifying the local directory of MySQL:
1. In the cmd:
gem install mysql2 -- --with-mysql-dir="C:\Program Files\MySQL"
2. As directed in the cmd, follow the link to download the dll. Extract the zip from that location and copy the file as instructed in the cmd.
3. If the zip is empty or the link says the file does not exist (which does happen for some versions!):
Go to the link and follow the flow in the URL, i.e. the website: http://dev.mysql.com -> Downloads -> MySQL Connectors -> MySQL Connector/C -> the latest version zips are displayed. Choose the one with the exact file name as in the empty zip/broken link. If it's not there, click on previous GA versions and find the corresponding zip file. Download, extract, and copy libmysql.dll to Ruby's bin folder.
Using Rails 3.2.2, finishing up my migration from sqlite to postgres 9.2.
I used the answer in this tutorial as a guide to install postgres and got stuck on Step 11, where it asks to run heroku db:pull, and I get:
Failed to connect to database: Sequel::AdapterNotFound -> LoadError: cannot load such file -- pg
I dug deeper and found that db:pull (the taps gem) is deprecated, and came across a few recommendations for pg:transfer. I installed pg:transfer, but I get the impression it may be *nix only(?), since if I run heroku pg:transfer it returns:
Heroku client internal error. No such file or directory - .env (Errno:ENOENT)
If I do pg:transfer with -f and -t, it gives me:
'env' is not recognized as an internal or external command, operable program or batch file
which means env isn't on the PATH or doesn't exist as a command on Windows.
Any thoughts on above errors?
Resolved by using the pgbackups add-on, which was recommended as the replacement for taps in the Heroku docs. I used this guide and uploaded my dump to Dropbox for Heroku to pick it up.
Here's my exact list of steps and cmds:
Added pgbackups from heroku.com add-ons to my instance.
heroku pgbackups:capture DATABASE (this just backs up your heroku db)
pg_dump -h localhost -U <pg username> -Fc dbname > dbname.dump
Moved dbname.dump into a folder on my dropbox
In Dropbox, right-click on dbname.dump => "Share link"
Cancel the sharing dialogue pop-up, right-click on "Download button", Copy Link Address (Chrome)
heroku pgbackups:restore DATABASE <paste dropbox download link here>
Dropbox trickiness: don't use the file link provided by Dropbox, since it's an html redirect and will cause pgbackups:restore to fail, even though the extension ends in .dump
Instead, navigate to your dropbox page and "right-click copy link address" on the Download button. That's the address you use in your pgbackups:restore (should be something like db.dump?token=<long random string>)
A bit clunky, but got the job done. If you know a better way please let me know!
You need to make a .env file containing something like:
DATABASE_URL=postgres://localhost/myapp_development
References:
https://github.com/ddollar/heroku-pg-transfer
https://devcenter.heroku.com/articles/config-vars#local-setup