Correctly specify dbtable in SqlContext [duplicate] - sql-server

I think I am missing something but can't figure out what.
I want to load data using SQLContext and JDBC with a particular SQL statement,
like:
select top 1000 text from table1 with (nolock)
where threadid in (
select distinct id from table2 with (nolock)
where flag=2 and date >= '1/1/2015' and userid in (1, 2, 3)
)
Which method of SQLContext should I use? Examples I saw always specify table name and lower and upper margin.
Thanks in advance.

You should pass a valid subquery as a dbtable argument. For example in Scala:
val query = """(SELECT TOP 1000
-- and the rest of your query
-- ...
) AS tmp -- alias is mandatory*"""
val url: String = ???
val jdbcDF = sqlContext.read.format("jdbc")
.options(Map("url" -> url, "dbtable" -> query))
.load()
* Hive Language Manual SubQueries: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries

// Spark opens the JDBC connection itself, so the explicit Class.forName and
// DriverManager.getConnection calls are not needed for this read.
val url = "jdbc:postgresql://localhost/scala_db?user=scala_user"
val df2 = spark.read
.format("jdbc")
.option("url", url)
.option("dbtable", "(select id,last_name from emps) e")
.option("user", "scala_user")
.load()
The key is "(select id,last_name from emps) e": you can write an aliased subquery wherever a table name is expected.
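As a rough sketch of the wrapping step in Python (PySpark): the helper name as_dbtable and the default alias tmp are hypothetical, not part of Spark. It only builds the string passed as dbtable and does not contact a database.

```python
def as_dbtable(sql: str, alias: str = "tmp") -> str:
    """Wrap a SELECT statement so it can be passed as the JDBC `dbtable` option.

    Hypothetical helper, not part of Spark: it drops a trailing semicolon,
    parenthesizes the statement and appends the alias.
    """
    body = sql.strip().rstrip(";").strip()
    if not alias.isidentifier():
        raise ValueError(f"alias must be a plain identifier, got {alias!r}")
    return f"({body}) AS {alias}"


query = as_dbtable("""
    SELECT TOP 1000 text FROM table1 WITH (NOLOCK)
    WHERE threadid IN (
        SELECT DISTINCT id FROM table2 WITH (NOLOCK)
        WHERE flag = 2 AND date >= '1/1/2015' AND userid IN (1, 2, 3)
    )
""")

# With a live SparkSession named `spark` and a JDBC URL in `url` (both assumed):
# df = spark.read.format("jdbc").option("url", url).option("dbtable", query).load()
```

Keeping the wrapping in one place makes it harder to forget the alias, which is the part that is usually missed.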

Related

Using PySpark in a Microsoft SQL Server using JDBC for connection

I'm using PySpark to query Microsoft SQL Server over a JDBC connection.
query = """(
WITH table_1 AS (
SELECT
code_1,
a
FROM my_database_table_1
),
table_2 AS (
SELECT
code_2,
b
FROM my_database_table_2
)
SELECT
table_1.code_1 AS tb1_code_1,
table_2.code_2 AS tb2_code_2
FROM table_1
INNER JOIN table_2
ON table_1.code_1 = table_2.APRCH_CODIGO
) AS _
"""
df_python = spark.read.jdbc(url=jdbc_url, table=query, properties=properties)
I'm getting the following error:
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near the keyword 'WITH'.
Does anyone know why I'm getting such error?
Edit 1:
I replaced == by = in the INNER JOIN clause.
I didn't include a , after the closing parenthesis in table_2, as it's not necessary.
( and ) as _ is required by JDBC.
To simplify, this is another query that returns the same error as the query above:
query = """(
WITH table_1 (code_1)
AS
(
SELECT code_1
FROM my_database_table_1
)
SELECT code_1
FROM table_1
) as _
"""
And this is a query that works:
query = """(
SELECT code_1
FROM my_database_table_1
) as _
"""
I'm starting to think that the ( and ) as _ clauses, required by JDBC, may be causing problems with the WITH clause.
Edit 2:
Well, apparently CTEs simply don't work with this driver, so I'll have to find another way out without using WITH.
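For what it's worth, Spark places the dbtable value into a FROM clause (its schema probe has the shape SELECT * FROM <dbtable> WHERE 1=0), and SQL Server does not accept WITH inside a derived table, which would be consistent with the error above. One possible way out is to inline each CTE as a derived table. The sketch below assumes the join key on the second table is code_2, since table_2 as defined above does not select APRCH_CODIGO. The in-memory SQLite database is only a stand-in to check that the rewritten text is usable as a FROM-clause subquery; it does not reproduce SQL Server's behavior.

```python
import sqlite3

# Each former CTE becomes a derived table in the FROM clause, so the statement
# no longer contains WITH once it is wrapped as a subquery.
# Assumption: the join key on the second table is code_2.
query = """(
    SELECT
        table_1.code_1 AS tb1_code_1,
        table_2.code_2 AS tb2_code_2
    FROM (SELECT code_1, a FROM my_database_table_1) AS table_1
    INNER JOIN (SELECT code_2, b FROM my_database_table_2) AS table_2
        ON table_1.code_1 = table_2.code_2
) AS _"""

# Stand-in check with SQLite: use the text exactly the way a FROM clause would.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_database_table_1 (code_1 INTEGER, a TEXT);
    CREATE TABLE my_database_table_2 (code_2 INTEGER, b TEXT);
    INSERT INTO my_database_table_1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO my_database_table_2 VALUES (2, 'p'), (3, 'q');
""")
rows = conn.execute(f"SELECT * FROM {query}").fetchall()
print(rows)
```

The same string can then be passed as table=query to spark.read.jdbc, as in the question.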
In SQL the equality operator is =, not ==. You used == in the JOIN, which may be the cause of the error.
Try this code:
from pyspark.sql import SparkSession
import os
driver = "/home/romerito/Documents/apache-spark-3.1.2/spark-3.1.2-bin-hadoop3.2/jars/mssql-jdbc-9.2.1.jre11.jar"
spark = (
SparkSession
.builder
.appName("load-sample-jdbc")
.master("local[2]")
.config("spark.driver.extraClassPath", driver)
.getOrCreate()
)
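# readCredentials is the answerer's own helper (not shown here); the code below
# treats its return value as a dict of connection settings.
credentials = readCredentials(os.getcwd() + "/others/credentials-mssql.txt")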
server = credentials['server']
port = credentials['port']
database = credentials['database']
user = credentials['user']
password = credentials['password']
connection = f"jdbc:sqlserver://{server}:{port};databaseName={database}"
query = """(
WITH table_1 AS (
SELECT
code_1,
a
FROM my_database_table_1
),
table_2 AS (
SELECT
code_2,
b
FROM my_database_table_2
)
SELECT
table_1.code_1 AS tb1_code_1,
table_2.code_2 AS tb2_code_2
FROM table_1
INNER JOIN table_2
ON table_1.code_1 = table_2.APRCH_CODIGO
) AS _
"""
# load() is required to turn the configured reader into a DataFrame
df = spark.read \
.format('jdbc') \
.option('url', connection) \
.option('user', user) \
.option('password', password) \
.option('dbtable', query) \
.load()
df.show()

SQL: exclude certain criteria from select

Trying to exclude a set of values that meet the criteria from the query, but the query returns nothing.
select *
from rpt_StockInventorySummary a
where a.[DepartmentId] ='P'
and not exists (
select *
from rpt_StockInventorySummary b
where b.Manufacturer = 'warrington'
and b.LowestGroupDescription = 'Boots, Leather, 14 Inch, Pro'
and b.Instock = 0
and b.barcode = a.barcode
)
order by a.SortOrder
Edit
I think adding "and b.barcode = a.barcode" at the end of the query in the NOT EXISTS was what was missing.
What was missing from the query was this:
and b.barcode = a.barcode
Adding this to the query inside of the not exists did the trick.
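To illustrate why the correlation matters, here is a small sketch in Python using an in-memory SQLite database as a stand-in for SQL Server. The table name stock and its three rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (barcode TEXT, Manufacturer TEXT, Instock INTEGER);
    INSERT INTO stock VALUES
        ('A1', 'warrington', 0),
        ('A2', 'warrington', 5),
        ('B1', 'other', 0);
""")

# Uncorrelated: the subquery finds row A1 no matter which outer row is being
# tested, so NOT EXISTS is false for every row and nothing comes back.
uncorrelated = conn.execute("""
    SELECT a.barcode FROM stock a
    WHERE NOT EXISTS (
        SELECT * FROM stock b
        WHERE b.Manufacturer = 'warrington' AND b.Instock = 0
    )
    ORDER BY a.barcode
""").fetchall()

# Correlated: the subquery only looks at rows with the same barcode as the
# outer row, so only the matching row (A1) is excluded.
correlated = conn.execute("""
    SELECT a.barcode FROM stock a
    WHERE NOT EXISTS (
        SELECT * FROM stock b
        WHERE b.Manufacturer = 'warrington' AND b.Instock = 0
          AND b.barcode = a.barcode
    )
    ORDER BY a.barcode
""").fetchall()

print(uncorrelated)
print(correlated)
```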

export multiple table with ssis

I have 2 tables like this (Table1 and Table2):
Table1          Table2
ID  NAME        No   Addrress  Notes
------------    ----------------------------
1   John        111  USA       Done
2   Steve       222  Brazil    Done
Now I want to create an SSIS package which will create a CSV file like:
Table1;ID;NAME
Table2;No;Addrress;Notes
"Detail1";"1";"John";"2";"Steve"
"Detail2";"111";"USA";"Done";"222";"Brazil";"Done"
Can we achieve the same output? I have searched on google but haven't found any solution.
Please help ....
You can create a Script Task to generate the CSV file for you. For example:
SqlConnection sqlCon = new SqlConnection("Server=localhost;Initial Catalog=LegOgSpass;Integrated Security=SSPI;Application Name=SQLNCLI11.1");
sqlCon.Open();
SqlCommand sqlCmd = new SqlCommand(@"Select ID,Name from dbo.Table1", sqlCon);
SqlDataReader reader = sqlCmd.ExecuteReader();
string fullpath = @"C:\Users\thoje\Desktop\stack\New folder\table1.csv";
StreamWriter sw = new StreamWriter(fullpath);
object[] output = new object[reader.FieldCount];
for (int i = 0; i < reader.FieldCount; i++)
output[i] = reader.GetName(i);
sw.WriteLine(@"Table1;"+string.Join(";", output));
List<object> values = new List<object>();
while (reader.Read())
{
reader.GetValues(output);
values.Add($"\"{output[0]}\"");
values.Add($"\"{output[1]}\"");
}
sw.WriteLine(@"""Detail1"";"+ string.Join(";", values));
sw.Flush();
sw.Close();
reader.Close();
sqlCon.Close();
Dts.TaskResult = (int)ScriptResults.Success;
You really should include in your question what you have tried so far; it helps a lot and makes it more fun to help people.
The two ways I can think of in T-SQL to solve this still need you to specify the column names in your code. You can get around this by using dynamic SQL and creating a view that outputs the data in the same fashion for all the tables you need.
If SSIS is more your thing you could use the dynamic approach with BIML.
--Option 1 (SQL Server 2008 R2 and later)
with Table1 AS (
SELECT * FROM (values(1,'John'),(2,'Steve')) AS x(ID,NAME)
)
,Table2 AS (
SELECT * FROM (values(111,'USA','Done'),(222,'Brazil','Done'))AS y(No,Addrress,Notes)
)
SELECT '"Detail1"'+ CAST(foo as VARCHAR(4000))
FROM (
SELECT ';"' + CAST(ID AS VARCHAR(4))+'";"' + [NAME] +'"' FROM Table1 FOR XML PATH('')
) AS bar(foo)
UNION ALL
SELECT '"Detail2"'+ CAST(foo as VARCHAR(4000))
FROM (
SELECT ';"' + CAST([No] AS VARCHAR(4))+'";"' + [Addrress] +'";"' + [Notes] +'"' FROM Table2 FOR XML PATH('')
) AS bar(foo)
--Option 2 (SQL Server 2017 and later)
with Table1 AS (
SELECT * FROM (values(1,'John'),(2,'Steve')) AS x(ID,NAME)
)
,Table2 AS (
SELECT * FROM (values(111,'USA','Done'),(222,'Brazil','Done'))AS y(No,Addrress,Notes)
)
SELECT '"Detail1";' + STRING_AGG('"'+CAST(ID AS varchar(4))+'";"'+[NAME]+'"',';') FROM Table1
UNION ALL
SELECT '"Detail2";' + STRING_AGG('"'+CAST([No] AS varchar(4))+'";"'+[Addrress]+'";'+'"'+[Notes]+'"',';') FROM Table2
;
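The flattening that all of these answers perform can be sketched outside SSIS as well. The Python below is only an illustration of the target layout; the function names header_line and detail_line are made up, the rows are the sample data from the question, and values containing quotes or semicolons are not escaped.

```python
def header_line(table_name, columns):
    """Build a header such as Table1;ID;NAME (unquoted, as in the question)."""
    return ";".join([table_name] + list(columns))


def detail_line(label, rows):
    """Flatten every row of one table into a single quoted, semicolon-separated line."""
    values = [f'"{value}"' for row in rows for value in row]
    return ";".join([f'"{label}"'] + values)


# Sample data taken from the question
table1 = [(1, "John"), (2, "Steve")]
table2 = [(111, "USA", "Done"), (222, "Brazil", "Done")]

lines = [
    header_line("Table1", ["ID", "NAME"]),
    header_line("Table2", ["No", "Addrress", "Notes"]),
    detail_line("Detail1", table1),
    detail_line("Detail2", table2),
]
print("\n".join(lines))
```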

SQL not exists returning query values

I'm having some trouble with a query to check differences between 2 identical tables with different rows.
This is the query
SELECT *
FROM [PROD01].[myDefDB].[forward].[fv] as DB01
WHERE TargetDate = '20150429' and
NOT EXISTS (SELECT *
FROM [PROD02].[myDefDB].[forward].[fv] as DB02
WHERE DB02.TargetDate = '20150429' and
DB02.Id_Fw = DB01.Id_Fw and
DB02.Id_Bl = DB01.Id_Bl and
DB02.Id_Pt = DB01.Id_Pt and
DB02.TargetDate = DB01.TargetDate and
DB02.StartDate = DB01.EndDate and
DB02.EndDate = DB01.EndDate and
DB02.[Version] = DB01.[Version]
)
Consider that [PROD02].[myDefDB].[forward].[fv] is a subset of [PROD01].[myDefDB].[forward].[fv]. Running SELECT count(*) on the two tables for TargetDate = '20150429' returns 2367 and 4103 respectively, so I expect the query to return 1736 rows, but I get more than 2000.
I considered all PKs in the WHERE clause. What am I missing?
You can use EXCEPT like this.
SELECT Id_Fw,Id_Bl,Id_Pt,TargetDate,StartDate,EndDate,[Version]
FROM [PROD01].[myDefDB].[forward].[fv] as DB01
WHERE TargetDate = '20150429'
EXCEPT
SELECT Id_Fw,Id_Bl,Id_Pt,TargetDate,StartDate,EndDate,[Version]
FROM [PROD02].[myDefDB].[forward].[fv] as DB02
WHERE TargetDate = '20150429'
This will get you all the rows in PROD01 which are not in PROD02
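A minimal sketch of the EXCEPT approach, run against an in-memory SQLite database as a stand-in for the two linked servers. The table names prod01 and prod02 and the reduced column list are assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prod01 (Id_Fw INTEGER, TargetDate TEXT, Version INTEGER);
    CREATE TABLE prod02 (Id_Fw INTEGER, TargetDate TEXT, Version INTEGER);
    INSERT INTO prod01 VALUES (1, '20150429', 1), (2, '20150429', 1), (3, '20150429', 1);
    INSERT INTO prod02 VALUES (1, '20150429', 1), (3, '20150429', 1);
""")

# EXCEPT compares whole rows of the two result sets, so every listed column
# takes part in the comparison and none can be paired with the wrong column.
missing = conn.execute("""
    SELECT Id_Fw, TargetDate, Version FROM prod01 WHERE TargetDate = '20150429'
    EXCEPT
    SELECT Id_Fw, TargetDate, Version FROM prod02 WHERE TargetDate = '20150429'
""").fetchall()
print(missing)
```

Keep in mind that EXCEPT also removes duplicate rows from its result, so the count can be lower than a plain difference of the two row counts.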

SQL Server - IN clause with multiple fields

Is it possible to use multiple fields in an IN clause? Something like the following:
select * from user
where code, userType in ( select code, userType from userType )
I'm using MS SQL Server 2008.
I know this can be achieved with joins and exists, I just wanted to know if it could just be done with the IN clause.
Not the way you have posted. You can only return a single field or type for IN to work.
From MSDN (IN):
test_expression [ NOT ] IN
( subquery | expression [ ,...n ]
)
subquery - Is a subquery that has a result set of one column.
This column must have the same data type as test_expression.
expression[ ,... n ] - Is a list of expressions to test for a match.
All expressions must be of the same type as
test_expression.
Instead of IN, you could use a JOIN using the two fields:
SELECT U.*
FROM user U
INNER JOIN userType UT
ON U.code = UT.code
AND U.userType = UT.userType
You could use a form like this:
select * from user u
where exists (select 1 from userType ut
where u.code = ut.code
and u.userType = ut.userType)
Only with something horrific, like
select * from user
where (code + userType) in ( select code + userType from userType )
Then you have to manage nulls and concatenating numbers rather than adding them, and casting, and a code of 12 and a usertype of 3 vs a code of 1 and a usertype of 23, and...
..which means you start heading into perhaps something like:
--if your SQLS supports CONCAT
select * from user
where CONCAT(code, CHAR(9), userType) in ( select CONCAT(code, CHAR(9), userType) from ... )
--if no concat
select * from user
where COALESCE(code, 'no code') + CHAR(9) + userType in (
select COALESCE(code, 'no code') + CHAR(9) + userType from ...
)
CONCAT will do a string concatenation of most things, and won't turn the whole output into NULL if one element is NULL. If you don't have CONCAT then you'll concatenate strings using +, but anything that might be NULL will need a COALESCE/ISNULL around it. In either case you'll need something like CHAR(9) (a tab) between the fields to prevent them from mixing. The thing between the fields should be something that is not naturally present in the data.
Tis a shame SQLS doesn't support this, that Oracle does:
where (code, userType) in ( select code, userType from userType )
but it's probably not worth switching DB for; I'd use EXISTS or a JOIN to achieve a multi column filter
So there ya go: a solution that doesn't use joins or exists.. and a bunch of reasons why you shouldn't use it ;)
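The collision described above (a code of 12 with a usertype of 3 versus a code of 1 with a usertype of 23) is easy to reproduce. The sketch below uses an in-memory SQLite database as a stand-in, so concatenation is written with || and char(9) rather than + and CHAR(9); the table is called users here and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (code TEXT, userType TEXT);
    CREATE TABLE userType (code TEXT, userType TEXT);
    INSERT INTO users VALUES ('12', '3'), ('1', '23'), ('7', '7');
    INSERT INTO userType VALUES ('1', '23');
""")

# Naive concatenation: '12' + '3' and '1' + '23' both give '123', so both match.
naive = conn.execute("""
    SELECT code, userType FROM users
    WHERE (code || userType) IN (SELECT code || userType FROM userType)
    ORDER BY code
""").fetchall()

# A separator that cannot occur in the data keeps the two fields apart.
separated = conn.execute("""
    SELECT code, userType FROM users
    WHERE (code || char(9) || userType) IN (
        SELECT code || char(9) || userType FROM userType
    )
    ORDER BY code
""").fetchall()

# EXISTS compares the columns individually and needs no string tricks.
exists = conn.execute("""
    SELECT u.code, u.userType FROM users u
    WHERE EXISTS (
        SELECT 1 FROM userType ut
        WHERE ut.code = u.code AND ut.userType = u.userType
    )
    ORDER BY u.code
""").fetchall()

print(naive, separated, exists)
```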
How about this instead:
SELECT user.* FROM user JOIN userType on user.code = userType.code AND user.userType = userType.userType
You can use a join:
SELECT * FROM user U
INNER JOIN userType UT on U.code = UT.code
AND U.userType = UT.userType
I had to do something very similar but EXISTS didn't work in my situation. Here is what worked for me:
UPDATE tempFinalTbl
SET BillStatus = 'Non-Compliant'
WHERE ENTCustomerNo IN ( SELECT DISTINCT CustNmbr
FROM tempDetailTbl dtl
WHERE dtl.[Billing Status] = 'NEEDS FURTHER REVIEW'
AND dtl.CustNmbr = ENTCustomerNo
AND dtl.[Service] = [Service])
AND [Service] IN ( SELECT DISTINCT [Service]
FROM tempDetailTbl dtl
WHERE dtl.[Billing Status] = 'NEEDS FURTHER REVIEW'
AND dtl.CustNmbr = ENTCustomerNo
AND dtl.[Service] = [Service])
EDIT: Now that I look, this is very close to @v1v3kn's answer
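I don't think that query is quite portable; it would be safer to use something like
select * from user
where code in ( select code from userType ) and userType in (select userType from userType)
Note that this tests each column independently, so it can also return rows whose code and userType each appear in userType but never together in the same row.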
select * from user
where (code, userType) in ( select code, userType from userType );
