How to import a CSV file into the database using APIView

How can I import CSV file data into the database using an APIView in Django REST Framework?

You can use xl2dict (https://pypi.org/project/xl2dict/), which converts spreadsheets into Python dictionaries.
$ pip install xl2dict
Then you can easily update your database:
from xl2dict import XlToDict

obj = XlToDict()
# convert_sheet_to_dict returns a list of row dictionaries
# (the file_path and sheet values below are placeholders)
rows = obj.convert_sheet_to_dict(file_path="data.xlsx", sheet="Sheet1",
    filter_variables_dict={"User Type": "Admin", "Environment": "Dev"})

# Updating the database: one create-or-update per row
for row in rows:
    Model.objects.update_or_create(**row)
You can use the above method to update the database.
An alternative is to use Python pandas.
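Since the question is specifically about a CSV uploaded through an APIView, here is a minimal hedged sketch using Python's built-in csv module; MyModel, the app name, and the form field name are hypothetical, and serializers/validation are omitted:
import csv

from rest_framework.parsers import MultiPartParser
from rest_framework.response import Response
from rest_framework.views import APIView

from myapp.models import MyModel  # hypothetical app and model


class CsvImportView(APIView):
    parser_classes = [MultiPartParser]

    def post(self, request):
        # "file" is the multipart form field carrying the uploaded CSV
        uploaded = request.FILES["file"]
        rows = csv.DictReader(uploaded.read().decode("utf-8").splitlines())
        imported = 0
        for row in rows:
            # Assumes the CSV header names match MyModel's field names
            MyModel.objects.update_or_create(**row)
            imported += 1
        return Response({"imported": imported})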

Related

Using SQLAlchemy in AWS LAMBDA

I am trying to use SQLAlchemy in an AWS Lambda function, but it is throwing a "module not found" error.
I am also attaching the folder structure which I am deploying to the Lambda layer after zipping.
I am using the following command to create the folder for the Lambda layer.
pip3 install sqlalchemy --target Alchemy_layer/
The layer needs a different folder structure according to the documentation.
Try this:
pip3 install sqlalchemy --target python/
zip -r sqlalchemy-layer.zip python/
Upload that ZIP as a layer and try again.
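For Python runtimes, Lambda adds the layer's python/ directory to the import path, which is why the package has to sit under python/ inside the ZIP. A minimal hedged check once the layer is attached (the handler below is hypothetical):
# handler.py: a throwaway function just to confirm the layer resolves
import sqlalchemy


def lambda_handler(event, context):
    # If the layer was packaged under python/, this import succeeds
    return {"sqlalchemy_version": sqlalchemy.__version__}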

Snowflake Python Pandas Connector - Unknown error using fetch_pandas_all

I am trying to connect to snowflake using the python pandas connector.
I use the anaconda distribution on Windows, but uninstalled the existing connector and pyarrow and reinstalled using instructions on this page: https://docs.snowflake.com/en/user-guide/python-connector-pandas.html
I have the following versions
pandas 1.0.4 py37h47e9c7a_0
pip 20.1.1 py37_1
pyarrow 0.17.1 pypi_0 pypi
python 3.7.7 h81c818b_4
snowflake-connector-python 2.2.7 pypi_0 pypi
When running step 2 of this document: https://docs.snowflake.com/en/user-guide/python-connector-install.html, I get: 4.21.2
On attempting to use fetch_pandas_all() I get an error: NotSupportedError: Unknown error
The code I am using is as follows:
import snowflake.connector
import pandas as pd
SNOWFLAKE_DATA_SOURCE = '<DB>.<Schema>.<VIEW>'
query = '''
select *
from table(%s)
LIMIT 10;
'''
def create_snowflake_connection():
    conn = snowflake.connector.connect(
        user='MYUSERNAME',
        account='MYACCOUNT',
        authenticator='externalbrowser',
        warehouse='<WH>',
        database='<DB>',
        role='<ROLE>',
        schema='<SCHEMA>'
    )
    return conn

con = create_snowflake_connection()
cur = con.cursor()
temp = cur.execute(query, (SNOWFLAKE_DATA_SOURCE,)).fetch_pandas_all()
cur.close()
I am wondering what else I need to install/upgrade/check in order to get fetch_pandas_all() to work?
Edit: After posting an answer below, I have realised that the issue is with SSO (single sign-on) via authenticator='externalbrowser'. When using a stand-alone account, I can fetch.
I found a workaround that avoids the SSO error by relying on fetchall() instead of fetch_all_pandas():
try:
    cur.execute(sql)
    all_rows = cur.fetchall()
    num_fields = len(cur.description)
    field_names = [i[0] for i in cur.description]
finally:
    cur.close()
    con.close()

df = pd.DataFrame(all_rows)
df.columns = field_names
The reason is that snowflake-connector-python does not install pyarrow, which you need to work with pandas.
Either install and import pyarrow yourself, or do:
pip install "snowflake-connector-python[pandas]"
and try it again.
What happens when you run this code?
from snowflake import connector
import time
import logging

for logger_name in ['snowflake.connector', 'botocore', 'boto3']:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.FileHandler('test.log')
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)

from snowflake.connector.cursor import CAN_USE_ARROW_RESULT
import pyarrow
import pandas as pd
print('CAN_USE_ARROW_RESULT', CAN_USE_ARROW_RESULT)
This will output whether CAN_USE_ARROW_RESULT is true and if it's not true, then pandas won't work. When you did the pip install, which of these did you run?
pip install snowflake-connector-python
pip install snowflake-connector-python[pandas]
Also, what OS are you running on?
I have this working now, but am not sure which part helped; the following steps were taken:
Based on the comment by @Kirby, I tried pip3 install --upgrade snowflake-connector-python (this came from a historic screenshot; I should have had [pandas] in brackets, i.e. pip3 install --upgrade snowflake-connector-python[pandas]), but regardless, I got the following error message:
Error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads
I therefore downloaded (exact filename: vs_buildtools__121011638.1587963829.exe) and installed VS Build Tools.
This is the tricky part: I subsequently got admin access to my machine (so I am hoping it was the Visual Studio Build Tools that helped, and not the admin access).
I then followed the Snowflake Documentation Python Connector API instructions originally referred to:
a. Anaconda Prompt (opened as admin): pip install snowflake-connector-python[pandas]
b. Python:
import snowflake.connector
import pandas as pd

ctx = snowflake.connector.connect(
    user=user,
    account=account,
    password='password',
    warehouse=warehouse,
    database=database,
    role=role,
    schema=schema)

# Create a cursor object.
cur = ctx.cursor()

# Execute a statement that will generate a result set.
sql = "select * from t"
cur.execute(sql)

# Fetch the result set from the cursor and deliver it as the Pandas DataFrame.
df = cur.fetch_pandas_all()
Edit: I have since realised that I still get the error when executing df = cur.fetch_pandas_all() with my Okta (single sign-on) account, i.e. when I use my username and authenticator='externalbrowser'. When I use a different account (with a password), I no longer get the error.
NOTE: I am still able to connect with externalbrowser (and I can see that the query has executed successfully in Snowflake history); I am just not able to fetch.
Using
python -m pip install "snowflake-connector-python[pandas]"
as in the docs did not fetch the correct version of pyarrow for me (the docs say you need 3.0.x).
With my conda environment (using Python 3.8) I had to manually pin pyarrow to a specific version:
python -m pip install pyarrow==6.0
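If you are unsure which versions actually ended up installed, a quick hedged sanity check (importlib.metadata needs Python 3.8+) prints what the environment sees:
# Compare installed versions against what the Snowflake docs require
from importlib.metadata import version
import pyarrow

print("pyarrow:", pyarrow.__version__)
print("snowflake-connector-python:", version("snowflake-connector-python"))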

How to share a MongoDB Database

I am working on a website with someone and I am trying to figure out if there is a way to push the DB to GitHub and have it work on his computer. We do not have a server yet, and what is tripping me up is that I need to specify a full log path, which means it will not work on his computer.
Thanks
You can use the mongoexport command to export your collection in .json format and then upload the .json files to GitHub.
mongoexport --db test --collection collection_name --out file_name.json
After that, you can import the same .json file into MongoDB using the mongoimport command:
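A matching import might look like this (the database, collection, and file names are placeholders mirroring the export above):
mongoimport --db test --collection collection_name --file file_name.json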
https://docs.mongodb.com/manual/reference/program/mongoexport/
https://docs.mongodb.com/manual/reference/program/mongoimport/

Using Kaggle Datasets in Google Colab

Is it possible to use any datasets available via the kaggle API in Google Colab? I see the Kaggle API is used in this Colab notebook, but it's a bit unclear to me what datasets it provides access to.
Step-by-step --
Create an API key in Kaggle.
To do this, go to kaggle.com/ and open your user settings page.
Next, scroll down to the API access section and click generate to download an API key.
This will download a file called kaggle.json to your computer. You'll use this file in Colab to access Kaggle datasets and competitions.
Navigate to https://colab.research.google.com/.
Upload your kaggle.json file using the following snippet in a code cell:
from google.colab import files
files.upload()
Install the kaggle API using !pip install -q kaggle
Move the kaggle.json file into ~/.kaggle, which is where the API client expects your token to be located:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
Now you can access datasets using the client, e.g., !kaggle datasets list.
Here's a complete example notebook of the Colab portion of this process:
https://colab.research.google.com/drive/1DofKEdQYaXmDWBzuResXWWvxhLgDeVyl
This example shows uploading the kaggle.json file, installing the Kaggle API client, and using the Kaggle client to download a dataset.
You should be able to access any dataset on Kaggle via the API. In this example, only the datasets for competitions are being listed. You can see the datasets you can access with this command:
kaggle datasets list
You can also search for datasets by adding the -s tag and then the search term you're interested in. So this would give you a list of datasets about dogs:
kaggle datasets list -s dogs
You can find more information on the API and how to use it in the documentation here.
Hope that helps! :)
Detailed approach:
1. Go to My Account in your profile.
2. Scroll down until you find the option Create New API Token; this will download a file called kaggle.json.
3. Go to Colab and upload the file kaggle.json.
4. pip install kaggle
5. Create a new folder named kaggle, copy kaggle.json into the kaggle folder, and set read-write permissions only for you (the user).
6. Go to the Kaggle website. For example, when you want to download any data, click on the three dots on the right hand side of the screen, then click Copy API command.
7. Go to Colab and paste the API command.
8. When you do an !ls, you will see that our download is a zip file.
9. To unzip the file, use the unzip command (see the sketch after this list).
10. Now, when you do !ls, you'll find our csv file is extracted from the zip file.
11. To read the file, import pandas and perform a simple pd.read_csv.
12. As you can see, we have successfully read our file into Colab.
This downloads the Kaggle dataset into Google Colab, where you can perform analysis and build amazing machine learning models or train neural networks.
Happy Analysis!!!
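As a hedged sketch of steps 7 to 11 above (the dataset slug and file names are placeholders; use whatever the Copy API command on Kaggle gave you):
!kaggle datasets download -d <owner>/<dataset>    # step 7: paste the copied API command
!ls                                               # step 8: the download is a zip file
!unzip <dataset>.zip                              # step 9: extract it
!ls                                               # step 10: the csv is now extracted

import pandas as pd                               # step 11: read the csv
df = pd.read_csv("<extracted_file>.csv")
df.head()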
I combined the top response into this GitHub gist as a Colab implementation. You can directly copy the code and use it.
How to Import a Dataset from Kaggle in Colab
Method:
First a few things you have to do:
Sign up for Kaggle
Sign up for a competition you want to access data from (for example LANL-Earthquake-Prediction competition).
Download your credentials to access Kaggle API as kaggle.json
# Install kaggle packages
!pip install -q kaggle
!pip install -q kaggle-cli
# Colab's file access feature
from google.colab import files
# Upload `kaggle.json` file
uploaded = files.upload()
# Retrieve uploaded file
# print results
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
# Then copy kaggle.json into the folder where the API expects to find it.
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!ls ~/.kaggle
Now check if it worked!
#list competitions
!kaggle competitions list -s LANL-Earthquake-Prediction
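If the competition appears in that list (and you have accepted its rules on the Kaggle website), you can then download its files; a hedged follow-up:
# Download the competition files into the current working directory
!kaggle competitions download -c LANL-Earthquake-Prediction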
Have a look at this.
It uses the official Kaggle API behind the scenes, but automates the process so you don't have to re-download manually every time your VM is taken away. Another issue I faced when using the Kaggle API directly on Colab was the hassle of transferring the Kaggle API token via Google Drive; the above method automates that as well.
Disclaimer: I am one of the creators of Clouderizer.
First of all, run this command to find out where this Colab file lives and how it executes.
!ls -d $PWD/*
It will show /content/data /content/gdrive /content/models
In other words, your current working directory (pwd) is /content/, so when you do !ls it will show data gdrive models.
FYI, ! allows you to run Linux commands inside Colab.
Colab keeps cleaning up the /content folder, so every session you use Colab, downloaded datasets and the kaggle.json file will be gone. That's why it's important to automate the process, so you can focus on writing code, not setting up the environment every time.
Run this in a Colab code block as an example, with your own API key; open your kaggle.json file to find your username and key.
# Info on how to get your api key (kaggle.json) here: https://github.com/Kaggle/kaggle-api#api-credentials
!pip install kaggle

import json
import zipfile
import os

api_token = {"username": "seunghunsunmoonlee", "key": ""}  # replace with your own username and key

os.makedirs('/content/.kaggle', exist_ok=True)  # make sure the config directory exists
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(api_token, file)

!chmod 600 /content/.kaggle/kaggle.json
!kaggle config path -p /content
!kaggle competitions download -c dog-breed-identification

os.chdir('/content/competitions/dog-breed-identification')
for file in os.listdir():
    zip_ref = zipfile.ZipFile(file, 'r')
    zip_ref.extractall()
    zip_ref.close()
Then run !ls again. You will see all data you need.
Hope it helps!
To download the competition data on Google Colab from Kaggle:
I'm working on Google Colab and I've been through the same problem, but I did two things.
First, you have to register your mobile number along with your country code.
Second, you have to click on Late Submission on the Kaggle dataset page.
Then download the kaggle.json file from Kaggle and upload kaggle.json to Google Colab.
After that, run the code given below on Google Colab.
!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle competitions download -c web-traffic-time-series-forecasting
A quick guide to use Kaggle datasets inside Google Colab using Kaggle API
(1) Download the Kaggle API token.
Go to “Account”, go down the page, and find the “API” section.
Click the “Create New API Token” button.
The “kaggle.json” file will be downloaded.
(2) Mount the Google drive to the Colab notebook.
It means giving access to the files in your google drive to Colab notebook.
from google.colab import drive
drive.mount("/content/gdrive", force_remount=True)
(3) Upload the “kaggle.json” file into the folder in google drive where you want to download the Kaggle dataset.
(4) Install Kaggle API.
!pip install kaggle
(5) Change the current working directory to where you want to download the Kaggle dataset.
%cd /content/gdrive/MyDrive/DataSets/house_price_data/
(6) Run the following code to configure the path to “kaggle.json”.
import os
os.environ['KAGGLE_CONFIG_DIR'] = "/content/gdrive/MyDrive/DataSets/house_price_data/"
(7) Download the dataset.
!kaggle competitions download -c house-prices-advanced-regression-techniques
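The competition files arrive as a zip in that folder. A hedged follow-up to extract them (the archive is normally named after the competition):
!unzip house-prices-advanced-regression-techniques.zip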
After steps (1-6) above from Bob Smith's answer, to use the dataset from a particular competition in Colab,
you can use the command:
!kaggle competitions download -c elo-merchant-category-recommendation
Here, elo-merchant-category-recommendation is the name of the competition.
The most important part comes before you download the files:
On the Kaggle webpage, in the Competition section, you must click on Late Submission or Join Competition, and accept the rules and conditions on the Kaggle competition webpage.
If you don't, then after copying the API file and launching the dataset download, a 403 error is returned.
A hacky way:
Go to the dataset page after login
Open Chrome Developer Tools, then go to Network pane
Click Download button on Kaggle
When clicked, you will see many requests in the Network pane; find the request starting with archive.zip.
Right click on that request, then Copy -> Copy as cURL (bash). Now you have copied the command.
On Colab, paste the command, append a ! to the beginning of the command, and then run it.
This is definitely a less reliable way than the API, but still remains as an option.
I find the accepted answer to be very comprehensive, but would like to add that:
!kaggle competitions download -c dogs-vs-cats
or most other downloads still won't work. You will probably get the following error:
403 - Forbidden
which is not very verbose. It wants to say: "Please visit kaggle.com and accept the rules (e.g. for that competition)." You cannot accept through the API! It is explicitly stated in the docs (see Public API documentation | Kaggle):
Just like participating in a Competition normally through the user interface, you must read and accept the rules in order to download data or make submissions. You cannot accept Competition rules via the API. You must do this by visiting the Kaggle website and accepting the rules there.
Yes, this could have been a comment, but I don't have enough reputation to comment.
import os
import json
import shutil

os.makedirs("/content/.kaggle/")

token = {"username":"your_username_here","key":"your_kaggle_key_here"}
with open('/content/.kaggle/kaggle.json', 'a+') as file:
    json.dump(token, file)

os.makedirs("/.kaggle/")
src = "/content/.kaggle/kaggle.json"
des = "/.kaggle/kaggle.json"
shutil.copy(src, des)

os.makedirs("/root/.kaggle/")
!cp /content/.kaggle/kaggle.json ~/.kaggle/kaggle.json
!kaggle config set -n path -v /content

# https://towardsdatascience.com/setting-up-kaggle-in-google-colab-ebb281b61463
!kaggle datasets download -d xhlulu/siim-covid19-resized-to-512px-png
Works for me on Colab as of 29-05-21!

Titan DB 1.0.0 : Cannot import Json file into titan TinkerPop 3.x

How can I import from a JSON file into Titan DB when I use a geolocation property?
I'm working with Titan DB on TP3 (TinkerPop version 3.0.1-incubating):
gremlin> Gremlin.version()
==>3.0.1-incubating
gremlin>
and I am using a GeoShape index property (geolocation),
trying to export and then import into a new DB.
My steps are as follows:
//export :
tg = TitanFactory.open('../conf/titan-db.properties')
tg.io(IoCore.graphson()).writeGraph('/var/backups/PRODUCTION_DATA_27_10_16.json');
//import to new DB:
tg.io(IoCore.graphson()).readGraph('/var/backups/PRODUCTION_DATA_27_10_16.json');
but unfortunately I got an exception:
gremlin> tg.io(IoCore.graphson()).readGraph('/var/backups/PRODUCTION_DATA_27_10_16.json');
Property value [{type=Point, coordinates=[33.0, 32.0]}] is of type class java.util.LinkedHashMap is not supported
Display stack trace? [yN] y
java.lang.IllegalArgumentException: Property value [{type=Point, coordinates=[33.0, 32.0]}] is of type class java.util.LinkedHashMap is not supported
at org.apache.tinkerpop.gremlin.structure.Property$Exceptions.dataTypeOfPropertyValueNotSupported(Property.java:159)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.verifyAttribute(StandardTitanTx.java:564)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.addProperty(StandardTitanTx.java:716)
at com.thinkaurelius.titan.graphdb.vertices.AbstractVertex.property(AbstractVertex.java:142)
at com.thinkaurelius.titan.graphdb.vertices.AbstractVertex.property(AbstractVertex.java:23)
at org.apache.tinkerpop.gremlin.structure.util.Attachable$Method.lambda$createVertex$26(Attachable.java:296)
Are there any solutions?
It looks like you are running into this Issue 1183: Titan 1.0.0 GraphSONWriter.writeGraph JsonMappingException, which has already been fixed. Try building the titan11 branch from source code. If you need directions for building it, review the steps in this Titan mailing list post.
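A hedged sketch of what building the titan11 branch might look like, assuming the same clone and Maven invocation as the 1.0.0 patch commands below:
git clone https://github.com/thinkaurelius/titan.git
cd titan
git checkout titan11
mvn clean install -DskipTests=true -Dgpg.skip=true -Paurelius-release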
If you want to patch the Titan 1.0.0 build with the serialization fix and not move up to titan11, try this instead (discussed here):
git clone https://github.com/thinkaurelius/titan.git
cd titan
git checkout 1.0.0
git cherry-pick 6dfc816d821a7739398e5cebc1e999d75c866c19
mvn clean install -DskipTests=true -Dgpg.skip=true -Paurelius-release
unzip titan-dist/titan-dist-hadoop-1/target/titan-1.0.0-hadoop1.zip
