FSharp.Data.SqlProvider is slow

I have a simple DB with 4 tables. The Results table has 18 columns; 3 of them are foreign keys. I am trying to get the number of all results (about 800k) with this code:
#I @"..\packages\SQLProvider.1.1.3\lib"
#r "FSharp.Data.SqlProvider.dll"
open FSharp.Data.Sql
let [<Literal>] ConnectionStringmdf = @"Data Source=(localdb)\MSSQLLocalDB;AttachDbFilename=C:\Users\Me\Desktop\myDb.mdf;Integrated Security=True;Connect Timeout=10"
type Sqlmdf = SqlDataProvider<
                  ConnectionString = ConnectionStringmdf,
                  DatabaseVendor = Common.DatabaseProviderTypes.MSSQLSERVER,
                  IndividualsAmount = 1000,
                  UseOptionTypes = true,
                  CaseSensitivityChange = Common.CaseSensitivityChange.ORIGINAL
              >
let dbm = Sqlmdf.GetDataContext()
printfn "Results count:\t %i" (dbm.Dbo.Results |> Seq.length)
It takes about 40 seconds to get the count of records in one table.
Why is it so slow? What am I doing wrong?

The types returned by SqlDataProvider implement IQueryable, so the count can be computed on the server; Seq.length, by contrast, enumerates all 800k rows on the client. Write a query expression or use the Count() extension method:
open System.Linq
dbm.Dbo.Results.Count()
or
query {
    for it in dbm.Dbo.Results do
    count
}

You should just execute the query directly against the table and have the server return only the result to you. For example, I get an 8M-row count almost instantaneously:
type dbSchema = SqlDataConnection<connectionString1>
let dbx = dbSchema.GetDataContext()
dbx.DataContext.ObjectTrackingEnabled <- false
dbx.DataContext.CommandTimeout <- 60
let table1 = dbx.MyTable
table1.Count()
//val it : int = 7189765
You could also wrap it into a query. Here's a query version that should work on the other type provider as well (unless SqlProvider doesn't support count). Again, the speed is almost instantaneous.
query {
    for row in table1 do
    select row
    count
}
I tested the same with SqlDataProvider, with similar results. Open the System.Linq namespace to access the .Count() extension method if necessary.

Related

Stored procedure - get anticipated columns before fully executing statement?

I'm working through a stored procedure and wondering if there's a way to retrieve the anticipated result column list from a SQL statement before fully executing it.
Scenarios:
dynamic SQL
a UDF that might vary the columns outside of our control
EX:
//inbound parameter
SET QUERY_DEFINITION_ID = 12345;
//Initial statement pulls query text from bank of queries
var sqlText = getQueryFromQueryBank(QUERY_DEFINITION_ID);
//now we run our query
var cmd = {sqlText: sqlText };
stmt = snowflake.createStatement(cmd);
What I'd like to be able to do is say "right - before you run this, give me the anticipated column list" so I can compare it to what's expected.
EX:
Expected: [col1, col2, col3, col4]
Got: [col1]
Result: Oops. Don't run.
Rationale here is that I want to short-circuit the execution if something is missing - before it potentially runs for a while. I can validate all of this after the fact, but it would be really helpful to stop early.
Any ideas very much appreciated!
This sample SP code shows how to get the list of columns that a query will project into its result before you run the query. It should only be used for large, long-running queries, because getting the column list itself takes a few seconds.
There are a couple of caveats. 1) It will only return the names of the columns; it won't tell you how they were built, that is, whether they're aliased, direct from a table, calculated, etc. 2) The example query is straight from the Snowflake documentation at https://docs.snowflake.com/en/user-guide/sample-data-tpcds.html#functional-query-definition, minimized to a single line for convenience. The output includes object qualifiers in addition to the column names, so V1.I_CATEGORY, V1.D_YEAR, V1.D_MOY, etc. If you don't want them, to make the names easier to compare, you can strip off the qualifiers using the JavaScript split function on the dot and take index 1 of the resulting array.
create or replace procedure EXPLAIN_BEFORE_RUNNING()
returns string
language javascript
execute as caller
as
$$
    // Set the context for the session to the TPC-DS sample data:
    executeNonQuery("use schema snowflake_sample_data.tpcds_sf10tcl;");
    // Here's a complex query from the Snowflake docs (minimized to one line for convenience):
    var sql = `with v1 as( select i_category, i_brand, cc_name, d_year, d_moy, sum(cs_sales_price) sum_sales, avg(sum(cs_sales_price)) over(partition by i_category, i_brand, cc_name, d_year) avg_monthly_sales, rank() over (partition by i_category, i_brand, cc_name order by d_year, d_moy) rn from item, catalog_sales, date_dim, call_center where cs_item_sk = i_item_sk and cs_sold_date_sk = d_date_sk and cc_call_center_sk= cs_call_center_sk and ( d_year = 1999 or ( d_year = 1999-1 and d_moy =12) or ( d_year = 1999+1 and d_moy =1)) group by i_category, i_brand, cc_name , d_year, d_moy), v2 as( select v1.i_category ,v1.d_year, v1.d_moy ,v1.avg_monthly_sales ,v1.sum_sales, v1_lag.sum_sales psum, v1_lead.sum_sales nsum from v1, v1 v1_lag, v1 v1_lead where v1.i_category = v1_lag.i_category and v1.i_category = v1_lead.i_category and v1.i_brand = v1_lag.i_brand and v1.i_brand = v1_lead.i_brand and v1.cc_name = v1_lag.cc_name and v1.cc_name = v1_lead.cc_name and v1.rn = v1_lag.rn + 1 and v1.rn = v1_lead.rn - 1) select * from v2 where d_year = 1999 and avg_monthly_sales > 0 and case when avg_monthly_sales > 0 then abs(sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1 order by sum_sales - avg_monthly_sales, 3 limit 100;`;
    // Before actually running the query, generate an explain plan.
    executeNonQuery("explain " + sql);
    // Now read the column list from the explain plan out of the result set.
    var columnList = executeSingleValueQuery("COLUMN_LIST", `select "expressions" as COLUMN_LIST from table(result_scan(last_query_id())) where "operation" = 'Result';`);
    // For now, just exit with the column list as the output...
    return columnList;
    // Your code here...

    // Helper functions:
    function executeNonQuery(queryString) {
        var cmd = {sqlText: queryString};
        var stmt = snowflake.createStatement(cmd);
        stmt.execute();
    }
    function executeSingleValueQuery(columnName, queryString) {
        var cmd = {sqlText: queryString};
        var stmt = snowflake.createStatement(cmd);
        try {
            var rs = stmt.execute();
            rs.next();
            return rs.getColumnValue(columnName);
        }
        catch (err) {
            if (err.message.substring(0, 18) == "ResultSet is empty") {
                throw "ERROR: No rows returned in query.";
            } else {
                throw "ERROR: " + err.message.replace(/\n/g, " ");
            }
        }
    }
$$;
call Explain_Before_Running();
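If it helps, the Expected/Got comparison from the question can then be done client-side before running the real query. Below is a hypothetical Java sketch; it assumes the Snowflake JDBC driver, a placeholder account URL and credentials, and that the procedure above is deployed as-is and returns the comma-separated column list as a single string. The expected names come from the sample query's final select:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

public class ColumnListCheck {
    public static void main(String[] args) throws Exception {
        // Columns the sample query is expected to project (from "select * from v2").
        List<String> expected = Arrays.asList("I_CATEGORY", "D_YEAR", "D_MOY",
                "AVG_MONTHLY_SALES", "SUM_SALES", "PSUM", "NSUM");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:snowflake://<account>.snowflakecomputing.com/", "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("call EXPLAIN_BEFORE_RUNNING()")) {
            rs.next();
            // The procedure returns something like "V1.I_CATEGORY, V1.D_YEAR, ...",
            // so strip the qualifiers before comparing, as described above.
            for (String qualified : rs.getString(1).split(",")) {
                String name = qualified.trim();
                int dot = name.indexOf('.');
                String column = (dot >= 0) ? name.substring(dot + 1) : name;
                if (!expected.contains(column)) {
                    throw new IllegalStateException("Oops. Don't run. Unexpected column: " + name);
                }
            }
            System.out.println("Column list matches; safe to run the real query.");
        }
    }
}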

How to Improve the ADO Lookup Speed?

I am writing a C++ application with Visual Studio 2008 + ADO (not ADO.NET). It does the following tasks one by one:
Create a table in a SQL Server database, as follows:
CREATE TABLE MyTable
(
[S] bigint,
[L] bigint,
[T] tinyint,
[I1] int,
[I2] smallint,
[P] bigint,
[PP] bigint,
[NP] bigint,
[D] bit,
[U] bit
);
Insert 5,030,242 records via BULK INSERT
Create an index on the table:
CREATE Index [MyIndex] ON MyTable ([P]);
Start a function that performs 65,000,000 lookups. Each lookup uses the following query:
SELECT [S], [L]
FROM MyTable
WHERE [P] = ?
Each time, the query either returns nothing or returns one row. If I get a row with [S] and [L], I convert [S] to a file pointer and then read data from the offset specified by [L].
Step 4 takes a lot of time, so I profiled it and found that the lookup query takes most of the time: each lookup takes about 0.01458 seconds.
I have tried to improve the performance in the following ways:
Use a parameterized ADO query (see step 4).
Select only the required columns. Originally I used "SELECT *" in step 4; now I use SELECT [S], [L] instead. This improves performance by about 1.5%.
Tried both a clustered and a non-clustered index on [P]. A non-clustered index seems to be a little better.
Are there any other ways to improve the lookup performance?
Note: [P] is unique in the table.
Thank you very much.
You need to batch the work and perform one query that returns many rows, instead of many queries that each return only one row (and each incur a separate round-trip to the database).
The way to do that in SQL Server is to rewrite the query to use a table-valued parameter (TVP) and pass all the search criteria (denoted as ? in your question) together in one go.
First we need to declare the type that the TVP will use:
CREATE TYPE MyTableSearch AS TABLE (
P bigint NOT NULL
);
And then the new query will be pretty simple:
SELECT
    S,
    L
FROM
    @input I
    JOIN MyTable
        ON I.P = MyTable.P;
The main complication is on the client side: how to bind the TVP to the query. Unfortunately, I'm not familiar with ADO - for what it's worth, this is how it would be done under ADO.NET and C#:
static IEnumerable<(long S, long L)> Find(
    SqlConnection conn,
    SqlTransaction tran,
    IEnumerable<long> input
) {
    const string sql = @"
        SELECT
            S,
            L
        FROM
            @input I
            JOIN MyTable
                ON I.P = MyTable.P
    ";
    using (var cmd = new SqlCommand(sql, conn, tran)) {
        var record = new SqlDataRecord(new SqlMetaData("P", SqlDbType.BigInt));
        var param = new SqlParameter("input", SqlDbType.Structured) {
            Direction = ParameterDirection.Input,
            TypeName = "MyTableSearch",
            Value = input.Select(
                p => {
                    record.SetValue(0, p);
                    return record;
                }
            )
        };
        cmd.Parameters.Add(param);
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                yield return (reader.GetInt64(0), reader.GetInt64(1));
    }
}
Note that we reuse the same SqlDataRecord for all input rows, which minimizes allocations. This is documented behavior, and it works because ADO.NET streams TVPs.
Note: [P] is unique in the table.
Then you should make the index on [P] unique too (CREATE UNIQUE INDEX [MyIndex] ON MyTable ([P])) - for correctness, and to avoid wasting space on the uniquifier.

snowflake jdbc parameter returning VARCHAR for all datatypes

The Snowflake JDBC driver reports parameter metadata as VARCHAR for all data types. Is there any way to overcome this problem?
DDL:
CREATE TABLE INTTABLE(INTCOL INTEGER)
Below is the output from the Snowflake ODBC driver:
SQLPrepare:
In:StatementHandle = 0x00000000021B1B50, StatementText = "INSERT INTO INTTABLE(INTCOL) VALUES(?)", TextLength = 42
Return: SQL_SUCCESS=0
SQLDescribeParam:
In:StatementHandle = 0x00000000021B1B50, ParameterNumber = 1, DataTypePtr = 0x00000000001294D0, ParameterSizePtr = 0x0000000000126950,DecimalDigits =0x0000000000126980, NullablePtr = 0x00000000001269B0
Return: SQL_SUCCESS=0
Out:*DataTypePtr = SQL_VARCHAR=12, *ParameterSizePtr = 16777216, *DecimalDigits = 0, *NullablePtr = SQL_NULLABLE=1
Below is the output with the Snowflake JDBC driver:
PreparedStatement ps = c.prepareStatement("INSERT INTO INTTABLE(INTCOL) VALUES(?)");
ParameterMetaData psmd = ps.getParameterMetaData();
for (int i = 1; i <= psmd.getParameterCount(); i++) {
System.out.println(psmd.getParameterType(i)+ " " + psmd.getParameterTypeName(i));
}
Output:
12 text
Thank you for adding more information to your thread. I may still be doing a little guesswork, though.
If you are trying to change the column's type from VARCHAR and there are no values in the table, you can drop the table and then re-create it.
If you want to ALTER what is already in the table, try altering the table first: Manual Reference
There is also CREATE OR REPLACE TABLE (col1 <type>, col2 <type>, ...), which takes care of both.
Is this what you are looking for?
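If the practical concern is whether typed values can still be inserted despite the generic parameter metadata: as far as I can tell they can, because the reported metadata does not restrict the bind. A minimal sketch, reusing the connection c and the INTTABLE from the question:
PreparedStatement ps = c.prepareStatement("INSERT INTO INTTABLE(INTCOL) VALUES(?)");
System.out.println(ps.getParameterMetaData().getParameterTypeName(1)); // still reports "text"
ps.setInt(1, 42);  // a typed bind is accepted anyway and lands in the INTEGER column
ps.executeUpdate();
ps.close();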

dbHasCompleted always returns TRUE

I'm using R to do a statistical analysis on a SQL Server 2008 R2 database. My database client (i.e. driver) is JDBC, so I'm using the RJDBC package.
My query is pretty simple, and I'm sure it returns a lot of rows (about 2 million).
SELECT * FROM [maindb].[dbo].[users]
My R script is as follows.
library(RJDBC);
javaPackageName <- "com.microsoft.sqlserver.jdbc.SQLServerDriver";
clientJarFile <- "/home/abforce/mystuff/sqljdbc_3.0/enu/sqljdbc4.jar";
driver <- JDBC(javaPackageName, clientJarFile);
conn <- dbConnect(driver, "jdbc:sqlserver://192.168.56.101", "username", "password");
query <- "SELECT * FROM [maindb].[dbo].[users]";
result <- dbSendQuery(conn, query);
dbHasCompleted(result)
In the code above, the last line always returns TRUE. What could be wrong here?
dbHasCompleted always returning TRUE seems to be a known issue; I've found other places on the Internet where people were struggling with it.
So I came up with a workaround: instead of dbHasCompleted, check whether the fetched chunk is empty with nrow(chunk) == 0.
For example:
result <- dbSendQuery(conn, query);
repeat {
    chunk <- dbFetch(result, n = 10);
    if (nrow(chunk) == 0) {
        break;
    }
    # Do something with 'chunk';
}
dbClearResult(result);

Multiple result sets and return value handling while calling sql server stored procedure from Groovy

I have an MS SQL Server stored procedure (SQL Server 2012) which returns:
a return value describing the procedure execution result in general (successful or not), via a return @RetCode statement (RetCode is of type int)
one result set (several records with 5 fields each)
another result set (several records with 3 fields each)
I call this procedure from my Groovy (and Java) code using Java's CallableStatement object, and I cannot find the right way to handle all three outputs.
My last attempt is:
CallableStatement proc = connection.prepareCall("{ ? = call Procedure_Name($param_1, $param_2)}")
proc.registerOutParameter(1, Types.INTEGER)
boolean result = proc.execute()
int returnValue = proc.getInt(1)
println(returnValue)
while (result) {
    ResultSet rs = proc.getResultSet()
    println("rs")
    result = proc.getMoreResults()
}
And now I get this exception:
Output parameters have not yet been processed. Call getMoreResults()
I tried several approaches for some hours but didn't find the correct one; some of them produced other exceptions.
Could you please help me with the issue?
Thanks In Advance!
Update (for Tim):
I see rs when I launch this code:
Connection connection = dbGetter().connection
CallableStatement proc = connection.prepareCall("{ ? = call Procedure_Name($param_1, $param_2)}")
boolean result = proc.execute()
while (result) {
    ResultSet rs = proc.getResultSet()
    println(rs)
    result = proc.getMoreResults()
}
I see rs as an object: net.sourceforge.jtds.jdbc.JtdsResultSet@1937bc8
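For what it's worth, the exception message itself points at the fix: with SQL Server drivers such as jTDS, the return value travels after all result sets in the response, so every result set (and update count) needs to be consumed before getInt(1) is called. A minimal sketch of that ordering, written as plain Java that also runs as Groovy (binding the two inputs as int parameters is an assumption here):
CallableStatement proc = connection.prepareCall("{ ? = call Procedure_Name(?, ?)}");
proc.registerOutParameter(1, Types.INTEGER);
proc.setInt(2, param_1);   // assumed int inputs; adjust the setter types as needed
proc.setInt(3, param_2);
boolean isResultSet = proc.execute();
// Drain the whole response first: every result set and every update count.
while (true) {
    if (isResultSet) {
        ResultSet rs = proc.getResultSet();
        while (rs.next()) {
            // process each row here, e.g. rs.getObject(1)
        }
    } else if (proc.getUpdateCount() == -1) {
        break;   // neither a result set nor an update count: response is finished
    }
    isResultSet = proc.getMoreResults();
}
// Only now is the return value available.
int returnValue = proc.getInt(1);
System.out.println(returnValue);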
