I have data saved in a view created in an Azure Synapse dedicated SQL pool, and I need to access this data in a Jupyter notebook for further processing. Is there any way to access/extract the data from dedicated pools in a Jupyter notebook written in Python?
The Azure Synapse Dedicated SQL Pool Connector for Apache Spark in Azure Synapse Analytics enables efficient transfer of large data sets between the Apache Spark runtime and the Dedicated SQL pool. The connector is shipped as a default library with Azure Synapse Workspace.
Sample code -
# Add required imports
import com.microsoft.spark.sqlanalytics
from com.microsoft.spark.sqlanalytics.Constants import Constants
from pyspark.sql.functions import col
# Read from existing internal table
dfToReadFromTable = (spark.read
# If `Constants.SERVER` is not provided, the `<database_name>` from the three-part table name argument
# to `synapsesql` method is used to infer the Synapse Dedicated SQL End Point.
.option(Constants.SERVER, "<sql-server-name>.sql.azuresynapse.net")
# Defaults to storage path defined in the runtime configurations
.option(Constants.TEMP_FOLDER, "abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<some_base_path_for_temporary_staging_folders>")
# Three-part table name from where data will be read.
.synapsesql("<database_name>.<schema_name>.<table_name>")
# Column-pruning i.e., query select column values.
.select("<some_column_1>", "<some_column_5>", "<some_column_n>")
# Push-down filter criteria that gets translated to SQL Push-down Predicates.
.filter(col("Title").contains("E"))
# Fetch a sample of 10 records
.limit(10))
# Show contents of the dataframe
dfToReadFromTable.show()
You can refer to this link for more information.
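If your Jupyter notebook runs outside the Synapse Spark runtime, an alternative is to query the dedicated SQL pool directly over ODBC. Below is a minimal sketch, assuming the "ODBC Driver 18 for SQL Server" is installed and that the angle-bracketed server, database, credential, and view names are placeholders you replace with your own:
# A minimal sketch: read a dedicated SQL pool view into pandas via ODBC.
# All angle-bracketed values are placeholders.
import pyodbc
import pandas as pd

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<sql-server-name>.sql.azuresynapse.net;"
    "DATABASE=<database_name>;"
    "UID=<user_name>;PWD=<password>;Encrypt=yes;"
)

# Read the view into a DataFrame for further processing
df = pd.read_sql("SELECT TOP 10 * FROM <schema_name>.<view_name>", conn)
print(df.head())
conn.close()
For large result sets, consider passing chunksize to pd.read_sql so the notebook does not load everything into memory at once.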
I'm creating a Spring Boot application and I'm using IntelliJ's embedded H2 database.
I have added the following lines in my application.properties file:
spring.datasource.url=jdbc:h2:~/testdb;MV_STORE=false;AUTO_SERVER=TRUE
This is my data source configuration
Although the connection is successful and I can query the database using Intellij's query console, the tables do not appear in the Database tab.
Succeeded
DBMS: H2 (ver. 2.1.210 (2022-01-17))
Case sensitivity: plain=upper, delimited=exact
Driver: H2 JDBC Driver (ver. 2.1.210 (2022-01-17), JDBC4.2)
Ping: 16 ms
When I refresh the connection or go to the schemas tab of the data source configuration, I get the following error:
[42S02][42102] org.h2.jdbc.JdbcSQLSyntaxErrorException: Table "INFORMATION_SCHEMA_CATALOG_NAME" not found; SQL statement: select CATALOG_NAME from INFORMATION_SCHEMA.INFORMATION_SCHEMA_CATALOG_NAME [42102-210].
Going to the Advanced tab of the data source and clicking Expert options presents a checkbox labeled "Introspect using JDBC metadata".
Checking that box makes the tables appear in the Database tab.
Regarding why this works, this is taken from the official documentation:
https://www.jetbrains.com/help/datagrip/data-sources-and-drivers-dialog.html
Introspect using JDBC metadata
Switch to the JDBC-based introspector. Available for all the databases.
To retrieve information about database objects (DB metadata), DataGrip uses the following introspectors:
A native introspector (might be unavailable for certain DBMS). The native introspector uses DBMS-specific tables and views as a source of metadata. It can retrieve DBMS-specific details and produce a more precise picture of database objects.
A JDBC-based introspector (available for all the DBMS). The JDBC-based introspector uses the metadata provided by the JDBC driver. It can retrieve only standard information about database objects and their properties.
Consider using the JDBC-based introspector when the native introspector fails or is not available.
The native introspector can fail, when your database server version is older than the minimum version supported by DataGrip.
You can try to switch to the JDBC-based introspector to fix problems with retrieving the database structure information from your database. For example, when the schemas that exist in your database or database objects below the schema level are not shown in the Database tool window.
I am trying to copy CSV files from my local directory into a SQL Server database running on my local machine using Apache NiFi.
I am new to the tool and I have spent a few days googling and building my flow. I managed to connect to the source and destination, but I am still not able to populate the database because I get the following error: "None of the fields in the record map to the columns defined by the tablename table."
I have been struggling with this for a while and have not been able to find a solution on the web. Any hint would be highly appreciated.
Here are further details.
I have built a simple flow using the GetFile and PutDatabaseRecord processors.
My input is a simple table with 8 columns.
My configuration for the GetFile processor: I have added the input directory and left the rest as default.
The configuration for the PutDatabaseRecord processor: I have referenced the CSVReader and DBCPConnectionPool controller services, used the MS SQL 2012+ database type (I have the 2019 version), configured the INSERT statement type, entered the schema and correct table name, and left everything else as default.
The CSVReader configuration: Schema Access Strategy = Use String Fields From Header; CSV Format = Microsoft Excel.
The DBCPConnectionPool configuration: I have added the correct URL, DB driver class name, driver location, DB user, and password.
Finally, I have created a table in the database to host the content.
Many thanks in advance!
The warning "None of the fields in the record map to the columns defined by the tablename table." is also obtained when the processor is not able to find the table and this can happen also when the table name is correctly configured in PutDatabaseRecord but there is some issue with user access rights (which ended up to be the actual cause of my error ...).
The SAP Analytics Cloud Snowflake Connector needs these details to set up a Snowflake connection:
How can I get these details from Snowflake?
I'm trying to follow this guide.
It appears that you're attempting to configure SAP Analytics Cloud's Snowflake Connector.
The host and port of your Snowflake account (also known as its deployment URL) can be taken from the URL you use to connect to Snowflake's web UI, for example:
https://mzf0194.us-west-2.snowflakecomputing.com/
For the above URL, the input in the Server field of the form will be mzf0194.us-west-2.snowflakecomputing.com:443 (443 is the default HTTPS port that Snowflake serves on).
Alternatively, if you have access to any other Snowflake-connected application (such as SnowSQL) that lets you run a SQL query, run the following to extract it:
select t.value:host || ':443' snowflake
from table(flatten(parse_json(system$whitelist()))) t
where t.value:type = 'SNOWFLAKE_DEPLOYMENT';
An example output that carries the host/port:
+---------------------------------------------+
| SNOWFLAKE |
|---------------------------------------------|
| p7b41m.eu-west-1.snowflakecomputing.com:443 |
+---------------------------------------------+
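If you'd rather run that query from Python, here's a hedged sketch using the snowflake-connector-python package (the account identifier and credentials are placeholders):
# Run the deployment query via snowflake-connector-python.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>",  # e.g. mzf0194.us-west-2
    user="<user_name>",
    password="<password>",
)
cur = conn.cursor()
cur.execute(
    "select t.value:host || ':443' snowflake "
    "from table(flatten(parse_json(system$whitelist()))) t "
    "where t.value:type = 'SNOWFLAKE_DEPLOYMENT'"
)
print(cur.fetchone()[0])  # e.g. p7b41m.eu-west-1.snowflakecomputing.com:443
conn.close()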
If you're uncertain about what these all mean, you'll need to speak to other current Snowflake users or administrators in your organization.
How can we configure Azure/Snowflake to make all Snowflake logs accessible in Azure Log Analytics, and then use Kusto queries and alert rules to create alerts?
It depends on what data you want to unload from Snowflake to log files, as there is a lot of information available in ACCOUNT_USAGE and the INFORMATION_SCHEMA. It's easy enough to write that data out to files on Azure storage for ingestion and use in Azure Log Analytics. Here's an example, pushing errors recorded in the LOGIN_HISTORY view to JSON files:
copy into @~/json_error_log.json from
(select object_construct(*) from (
    select event_timestamp, event_type, user_name, reported_client_type, error_code, error_message
    from table(information_schema.login_history(dateadd('days', -7, current_timestamp()), current_timestamp()))
    where error_code is not null
    order by event_timestamp))
file_format = (type = 'JSON');
And you can find more information here:
https://docs.snowflake.com/en/user-guide/data-unload-azure.html
Can't comment on operating the Azure Log Analytics tool itself, but hopefully this gives you some idea of what to do on the Snowflake side.
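On the Azure Log Analytics side (which the answer above leaves open), one possible approach is the Logs Ingestion API via the azure-monitor-ingestion package. This is only a sketch under the assumption that you have already created a data collection endpoint, a data collection rule, and a matching custom table; every identifier below is a placeholder:
# Sketch: push records into a Log Analytics custom table via the
# Logs Ingestion API. Endpoint, rule id, and stream name are placeholders
# for resources you must create first.
from azure.identity import DefaultAzureCredential
from azure.monitor.ingestion import LogsIngestionClient

client = LogsIngestionClient(
    endpoint="https://<dce-name>.<region>.ingest.monitor.azure.com",
    credential=DefaultAzureCredential(),
)

# Records shaped like the login_history columns unloaded above
logs = [{
    "TimeGenerated": "2024-01-01T00:00:00Z",
    "EventType": "LOGIN",
    "UserName": "SOME_USER",
    "ErrorCode": 390100,
    "ErrorMessage": "Incorrect username or password was specified.",
}]

client.upload(
    rule_id="<dcr-immutable-id>",
    stream_name="Custom-SnowflakeLoginErrors_CL",
    logs=logs,
)
Once the records land in the custom table, Kusto queries and alert rules can be built on top of it in the usual way.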
Is it possible to take data scraped from websites using Scrapy and save it to a Microsoft SQL Server database?
If yes, are there any examples of this being done? Is it mainly a Python issue, i.e., if I find some Python code that saves to a SQL Server database, can Scrapy do the same?
Yes, but you'd have to write the code to do it yourself, since Scrapy does not provide an item pipeline that writes to a database.
Have a read of the Item Pipeline page from the Scrapy documentation, which describes the process in more detail (it includes a JsonWriterPipeline as an example). Basically, find some code that writes to a SQL Server database (using something like pyodbc) and you should be able to adapt it into a custom item pipeline that outputs items directly to SQL Server.
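As a starting point, such a pipeline might look like the sketch below; the table, columns, and connection details are assumptions to adapt, not part of Scrapy itself:
# Sketch of a custom item pipeline that writes items to SQL Server via
# pyodbc. Assumes a table "items(title, url)" already exists; connection
# details and field names are placeholders.
import pyodbc

class SqlServerWriterPipeline:
    def open_spider(self, spider):
        self.conn = pyodbc.connect(
            "DRIVER={ODBC Driver 18 for SQL Server};"
            "SERVER=localhost;DATABASE=<database_name>;"
            "UID=<user_name>;PWD=<password>;Encrypt=no;"
        )
        self.cursor = self.conn.cursor()

    def close_spider(self, spider):
        self.conn.close()

    def process_item(self, item, spider):
        self.cursor.execute(
            "INSERT INTO items (title, url) VALUES (?, ?)",
            item.get("title"), item.get("url"),
        )
        self.conn.commit()
        return item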
Super late and complete self-promotion here, but I think this could help someone. I just wrote a little Scrapy extension to save scraped items to a database: scrapy-sqlitem.
It is super easy to use.
pip install scrapy_sqlitem
Define Scrapy items using SQLAlchemy tables:
from sqlalchemy import Table, Column, Integer, String, MetaData
from scrapy_sqlitem import SqlItem

metadata = MetaData()

class MyItem(SqlItem):
    sqlmodel = Table('mytable', metadata,
                     Column('id', Integer, primary_key=True),
                     Column('name', String, nullable=False))
Add the following pipeline:
from sqlalchemy import create_engine

class CommitSqlPipeline(object):
    def __init__(self):
        self.engine = create_engine("sqlite:///")

    def process_item(self, item, spider):
        item.commit_item(engine=self.engine)
        return item
Don't forget to add the pipeline to your settings file (see the sketch below) and to create the database tables if they do not exist.
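For example, activation in settings.py could look roughly like this (the module path is a placeholder for wherever CommitSqlPipeline is defined):
# settings.py: enable the pipeline (module path is a placeholder)
ITEM_PIPELINES = {
    "myproject.pipelines.CommitSqlPipeline": 300,
}
The tables themselves can be created once before the first crawl with metadata.create_all(engine), reusing the metadata object the item definitions were registered against.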
http://doc.scrapy.org/en/1.0/topics/item-pipeline.html#activating-an-item-pipeline-component
http://docs.sqlalchemy.org/en/rel_1_1/core/tutorial.html#define-and-create-tables