I created the method below to connect to SQL Server using SQLAlchemy and pyodbc.
import urllib          # on Python 3, use urllib.parse.quote_plus instead
import sqlalchemy as sa

def getDBEngine(server, database):
    # Build a pyodbc connection string that uses Windows (trusted) authentication
    Params = urllib.quote_plus("DRIVER={SQL Server};SERVER="+server+";DATABASE="+database+";TRUSTED_CONNECTION=Yes")
    Engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % Params)
    return Engine
I'm then able to use that engine to read and write data via methods like pandas' to_sql, as below.
def toSQL(Server, Database, Tablename, data):
    # 'data' is the DataFrame to write; Tablename is the target table
    writeEngine = getDBEngine(Server, Database)
    data.to_sql(Tablename, writeEngine, if_exists='append')
My question is whether there is a simple way to check the connection/status of the engine before actually using it to read or write data. What's the easiest way?
One pattern I've seen used at multiple engagements is essentially an "is alive" check that is effectively select 1 as is_alive;. There's no data access, so it just checks whether your connection is receptive to receiving commands from your application.
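As a minimal sketch (assuming a recent SQLAlchemy and the engine built by getDBEngine above), the check can be wrapped in a small helper:

import sqlalchemy as sa
from sqlalchemy.exc import OperationalError

def is_alive(engine):
    # Attempt a trivial round trip; an unreachable server raises OperationalError.
    try:
        with engine.connect() as conn:
            conn.execute(sa.text("SELECT 1 AS is_alive"))
        return True
    except OperationalError:
        return False

Alternatively, SQLAlchemy can do this transparently if you pass pool_pre_ping=True to create_engine, which pings each pooled connection before handing it out.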
I have a Lambda that connects to a SQL Server database like this.
module.exports = async (event) => {
    const sqlConnection = getConnection(); // connect to MS SQL
    doMyWork(sqlConnection);               // my core lambda logic
    closeConnection(sqlConnection);        // close the connection
};
Whenever the lambda is triggered, my SQL Server gets connected, the work is done and the connection is closed. Isn't there a way I can reuse the connection object across multiple invocations of the lambda so that I can (a) reduce the number of connection/disconnection attempts to the server and (b) reduce the overall execution time of the lambda?
I have mentioned SQL Server here because that is what I am currently using, but I also have to connect to MySQL and Redis. What is the recommended way of connecting to databases (especially those which support pools)?
Please suggest.
Generally, there are 2 ways to address this problem.
Store the connections/connection pool in a variable that persists across Lambda invocations. How to do it depends on the runtime/language you use. An example with node.js is here; a Python sketch of the same pattern follows this list.
Use an external application that is long running and is able to persist a connection pool. Another Lambda function cannot do this; you'd need an EC2 instance or another long-running compute instance. Thankfully AWS recently introduced RDS Proxy, which is a managed service that achieves this. It is, however, still in preview.
For MSSQL/MySQL on RDS, you'll be able to use option 1 or 2 but for Redis you'll have to use option 1.
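A minimal sketch of option 1 in Python (the driver, connection string and query are placeholders; the point is only that anything defined at module scope survives for as long as the Lambda container stays warm):

import pyodbc

# Placeholder connection details -- adapt to your own server
CONN_STRING = ("DRIVER={ODBC Driver 17 for SQL Server};"
               "SERVER=myserver;DATABASE=mydb;UID=user;PWD=secret")
connection = None  # created lazily, reused across warm invocations

def get_connection():
    global connection
    if connection is None:
        connection = pyodbc.connect(CONN_STRING)
    return connection

def handler(event, context):
    conn = get_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT 1")
    return {"alive": cursor.fetchone()[0]}

The same trick works for a Redis client or a MySQL pool: create it once outside the handler and only reconnect if the cached object has gone stale.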
I'm currently trying to write a pandas data frame into a new SQL Server table, and I'm having trouble figuring out how to connect WITHOUT USING USER/PASSWORD.
The pandas documentation states that an engine must be created via SQLAlchemy, and the company only gave me sample code (not using pandas, for other tasks) for connecting via pymssql:
server = "server name"
conn = pymssql.connect(server, database='TestDatabase')
cursor = conn.cursor()
cursor.execute(instruction)
conn.close()
Now I must pass a connection to sqlalchemy, which the sqlalchemy documentation states would be something like
engine = create_engine("mssql+pymssql://<username>:<password>#<freetds_name>/?charset=utf8",
encoding='latin1', echo=True)
but our SQL Server instance uses local authentication, and I found no example of this.
How can I create this connection string?
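One hedged sketch, assuming the pyodbc driver (rather than pymssql) and reusing the trusted-connection string from the first snippet above with placeholder server/database names:

from urllib.parse import quote_plus
from sqlalchemy import create_engine

# Placeholder names; Trusted_Connection=yes means no username/password is sent
params = quote_plus("DRIVER={SQL Server};SERVER=server name;"
                    "DATABASE=TestDatabase;Trusted_Connection=yes")
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)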
I've reached the writing-to-a-SQL-Server-database part of my data journey; I hope someone is able to help.
I've been able to successfully connect to a remote Microsoft SQL Server database using pyodbc; this allows me to pass SQL queries into dataframes and to create reports.
I now want to automate the manual "select import" method. I've read many blogs but I'm none the wiser about how it all works.
import pandas as pd
import pyodbc

server = r'Remote SQL Server'
database = 'mydB'
username = 'datanovice'
password = 'datanovice'

cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server +
                      ';DATABASE=' + database +
                      ';UID=' + username +
                      ';PWD=' + password)
cursor = cnxn.cursor()
I'm able to read queries easily using this and pass them into dataframes.
What's the best way to write into my MS SQL DB, noting that it's not local? I'm happy to pass this into SQLAlchemy, but I wasn't sure of the correct syntax.
Things to consider:
This is a mission critical database and some of the DataFrames must be written as delete queries
If this is an unsafe method and if I need to go back and study more to understand proper database methodology I'm very happy to do so
I'm not looking for someone to write or provide the code for me, but rather point me in the right direction
I envisage this to be something like the following, but I'm not sure how to specify the correct table:
df.to_sql('my_df', con, chunksize=1000)
As you've seen from the pandas documentation, you need to pass a SQLAlchemy engine object as the second argument to the to_sql method. Then you can use something like
df.to_sql("table_name", engine, if_exists="replace")
The SQLAlchemy documentation shows how to create the engine object. If you use an ODBC DSN then the statement will look something like this:
from sqlalchemy import create_engine
# ...
engine = create_engine("mssql+pyodbc://scott:tiger#some_dsn")
I have a database which is not JDBC enabled where I am able to fire a query and get the result using an input stream. I want to access this using a map reduce program.
For a JDBC enabled database there are "DBInputFormat.java" and "DBConfiguration.java" files in Hadoop which take care of accessing the database and getting the result in a user-defined class which extends DBWritable and Writable interfaces.
Is there a way in which I can access the above mentioned non-JDBC database in the same fashion ?
I am not sure if your DB supports ODBC. If so, you can try the JDBC-ODBC bridge driver with DBInputFormat. I am not sure if this works, as I have never tried it.
Another option, which should be your last resort, is to implement your own FileInputFormat.
Is it possible to take data scraped from websites using Scrapy and save it in a Microsoft SQL Server database?
If yes, are there any examples of this being done? Is it mainly a Python issue, i.e. if I find some Python code that saves to a SQL Server database, can Scrapy do the same?
Yes, but you'd have to write the code to do it yourself since scrapy does not provide an item pipeline that writes to a database.
Have a read of the Item Pipeline page from the scrapy documentation which describes the process in more detail (here's a JSONWriterPipeline as an example). Basically, find some code that writes to a SQL Server database (using something like PyODBC) and you should be able to adapt that to create a custom item pipeline that outputs items directly to a SQL Server database.
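A minimal sketch of such a pipeline (assuming pyodbc, a placeholder connection string, and a hypothetical scraped_items table with name and url columns):

import pyodbc

class SqlServerPipeline:
    def open_spider(self, spider):
        # Placeholder connection details -- adapt to your server
        self.cnxn = pyodbc.connect(
            "DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=myserver;DATABASE=scrapydb;UID=user;PWD=secret")
        self.cursor = self.cnxn.cursor()

    def process_item(self, item, spider):
        # Insert one row per scraped item using parameterized SQL
        self.cursor.execute(
            "INSERT INTO scraped_items (name, url) VALUES (?, ?)",
            item.get("name"), item.get("url"))
        self.cnxn.commit()
        return item

    def close_spider(self, spider):
        self.cnxn.close()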
Super late and complete self-promotion here, but I think this could help someone. I just wrote a little Scrapy extension, scrapy-sqlitem, to save scraped items to a database.
It is super easy to use.
pip install scrapy_sqlitem
Define Scrapy Items using SqlAlchemy Tables
from sqlalchemy import Table, Column, Integer, String, MetaData
from scrapy_sqlitem import SqlItem

metadata = MetaData()

class MyItem(SqlItem):
    sqlmodel = Table('mytable', metadata,
                     Column('id', Integer, primary_key=True),
                     Column('name', String, nullable=False))
Add the following pipeline
from sqlalchemy import create_engine

class CommitSqlPipeline(object):

    def __init__(self):
        self.engine = create_engine("sqlite:///")

    def process_item(self, item, spider):
        item.commit_item(engine=self.engine)
        return item  # hand the item on to any later pipelines
Don't forget to add the pipeline to the settings file and create the database tables if they do not exist.
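Activating the pipeline in settings.py might look like this (the module path is a placeholder for wherever your pipeline class lives):

# settings.py -- hypothetical project layout
ITEM_PIPELINES = {
    'myproject.pipelines.CommitSqlPipeline': 300,
}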
http://doc.scrapy.org/en/1.0/topics/item-pipeline.html#activating-an-item-pipeline-component
http://docs.sqlalchemy.org/en/rel_1_1/core/tutorial.html#define-and-create-tables