sqlQuery failing with memory error when rows_at_time too large - sql-server

I am sending this query to a SQL Server database from R using RODBC::sqlQuery:
MERGE "mytable" AS Target
USING (VALUES ('myname', 'POLYGON ((148.0000000000000000 -20.0000000000000000, 148.0000000000000000 -20.0000000000000000, 148.0000000000000000 -20.0000000000000000, 148.0000000000000000 -20.0000000000000000, 148.0000000000000000 -20.0000000000000000))')) AS Source ("name", "polygon")
ON (Target."name" = Source."name")
WHEN MATCHED THEN
UPDATE SET Target."polygon" = Source."polygon"
WHEN NOT MATCHED BY TARGET THEN
INSERT ("name", "polygon")
VALUES (Source."name", Source."polygon")
OUTPUT $action, Inserted.*, Deleted.*;
It fails when the rows_at_time argument of sqlQuery is greater than 10:
Error in odbcQuery(channel, query, rows_at_time) :
'Calloc' could not allocate memory (107374182400 of 1 bytes)
but works if rows_at_time < 10. (Even then the query takes quite a few seconds, which is surprising as the table is indexed and very small: fewer than 100 rows.)
Any idea why?
Thank you
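One avenue worth checking (a hedged guess, not a verified fix): RODBC appears to size its fetch buffers as rows_at_time times the declared width of every returned column, and the OUTPUT clause here returns every column of both Inserted and Deleted, including the wide polygon value, so the allocation grows very quickly with rows_at_time. A variant that returns only the small columns should shrink the buffers (the polygon literal is elided here for brevity):

MERGE "mytable" AS Target
USING (VALUES ('myname', '...polygon WKT as above...')) AS Source ("name", "polygon")
ON (Target."name" = Source."name")
WHEN MATCHED THEN
UPDATE SET Target."polygon" = Source."polygon"
WHEN NOT MATCHED BY TARGET THEN
INSERT ("name", "polygon")
VALUES (Source."name", Source."polygon")
OUTPUT $action, Inserted."name";  -- return only the action and the key, not the wide columns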
EDIT: This is the structure of the table I am writing to:


Sqlite3 calculate the percentages from totals

Question(*):
The total number of cases and deaths as a percentage of the population, for each country (with country, % cases of population, % deaths of population as columns)
I have two tables:
countriesAffected(countriesAndTerritories, geoId, countryterritoryCode, popData2019, continentExp)
victimsCases(dateRep, cases, deaths, geoId)
where geoId is the primary key.
I tried to do (*) by this method:
SELECT countriesAndTerritories,
(100 * SUM(victimsCases.cases) / popData2019) AS "cases",
(100 * SUM(deaths) / popData2019) AS "deaths"
FROM countriesAffected
INNER JOIN victimsCases ON victimsCases.geoId = countriesAffected.geoId
GROUP BY countriesAndTerritories
ORDER BY countriesAndTerritories DESC;
Error: near line 2: near "SELECT countriesAndTerritories": syntax error
But for some reason I get all kinds of syntax errors. I tried to sort it out, but with no results, and I am not sure where I went wrong.
If you are getting the error Error: near line 2: near "SELECT countriesAndTerritories": syntax error then the issue is with LINE 1 (perhaps no ; at the end of line 1).
Otherwise your query works, albeit probably not as intended (as you may well want decimal places for the percentages).
Consider the following, which shows your SQL plus additional columns that work as intended (see casesV2 and deathsV2, which use CAST to convert the INTEGER sums to REAL so the divisions keep their decimal places).
DROP TABLE If EXISTS victimsCases;
DROP TABLE IF EXISTS countriesAffected;
CREATE TABLE IF NOT EXISTS countriesAffected (countriesAndTerritories TEXT,geoId INTEGER PRIMARY KEY,countryterritoryCode TEXT,popData2019 INTEGER,continentExp TEXT);
CREATE TABLE IF NOT EXISTS victimsCases (dateRep TEXT,cases INTEGER ,deaths INTEGER,geoId INTEGER);
INSERT INTO countriesAffected VALUES
('X',1,'XXX',10000,'?'),('Y',2,'YYY',20000,'?'),('Z',3,'ZZZ',30000,'?')
;
INSERT INTO victimsCases VALUES
('2019-01-01',100,20,1),('2019-01-02',100,25,1),('2019-01-03',100,15,1),
('2019-01-01',30,5,2),('2019-01-02',33,2,2),
('2019-01-01',45,17,3),('2019-01-02',61,4,3),('2019-01-03',75,7,3)
;
SELECT countriesAndTerritories,
(100 * SUM(victimsCases.cases) / popData2019) AS "cases", /* ORIGINAL */
(100 * SUM(deaths) / popData2019) AS "deaths", /* ORIGINAL */
CAST(SUM(victimsCases.cases) AS FLOAT) / popData2019 * 100 AS "casesV2",
CAST(SUM(victimscases.deaths) AS FLOAT) / popData2019 * 100 as "deathsV2"
FROM countriesAffected
INNER JOIN victimsCases ON victimsCases.geoId = countriesAffected.geoId
GROUP BY countriesAndTerritories
ORDER BY countriesAndTerritories DESC;
DROP TABLE If EXISTS victimsCases;
DROP TABLE IF EXISTS countriesAffected;
The result of the above is:
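Reconstructed by running the statements above (SQLite prints more digits; values are rounded here):

countriesAndTerritories | cases | deaths | casesV2  | deathsV2
Z                       | 0     | 0      | 0.603333 | 0.093333
Y                       | 0     | 0      | 0.315    | 0.035
X                       | 3     | 0      | 3.0      | 0.6

Note how the original integer-division columns truncate to whole numbers (mostly 0), while the CAST versions keep the fractions.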

MSSQL JSON auto to Go struct

I am attempting to use MSSQL JSON AUTO to easily go from a query to a Go struct. The data returned looks JSON-y, but I am having trouble converting it from a string to the known struct I want.
package main

import (
    "database/sql"
    "encoding/json"
    "fmt"

    _ "github.com/denisenkom/go-mssqldb" // registers the "sqlserver" driver
)

// ConnString and checkErr are helpers defined elsewhere (not shown in the question).

func main() {
    type LOBData struct {
        COB_ID    int     `json:"COB_ID"`
        GrossLoss float64 `json:"GrossLoss"`
    }
    type ResultData struct {
        YearID    int       `json:"YearID"`
        EventID   int       `json:"EventID"`
        Modelcode int       `json:"modelcode"`
        Industry  float64   `json:"Industry"`
        LOB       []LOBData `json:"y"`
    }

    db, err := sql.Open("sqlserver", ConnString())
    checkErr(err)
    defer db.Close()

    var result string
    err = db.QueryRow(`
        SELECT i.YearID, i.EventID, i.modelcode, totalloss AS Industry, y.COB_ID, y.GrossLoss
        FROM dbo.CS_IndustryLossv8_7938 AS i
        INNER JOIN dbo.Tb_YLT AS y
            ON i.YearID = y.YearID AND i.EventID = y.EventID AND i.modelcode = y.Modelcode
        WHERE YLT_DVID = 25
        FOR JSON AUTO`).Scan(&result)
    fmt.Println(result)

    YLT := ResultData{}
    //var YLT []ResultData
    err = json.Unmarshal([]byte(result), &YLT)
    checkErr(err)
    fmt.Println(YLT)
}
fmt.Println(result) prints:
[{"YearID":7687,"EventID":101900,"modelcode":41,"Industry":1.176648913256758e+010,"y":[{"COB_ID":5,"GrossLoss":6.729697615695682e+003}]},.....
but fmt.Println(YLT) returns:
{0 0 0 0 []}
I am getting an error of "unexpected end of JSON input".
While Go does not have a string length limit, MSSQL does: 8,000 characters. If I limit my query to the top 3 rows and use var YLT []ResultData, it works. Is there any way of doing this with MSSQL and Go, or should I be using different server tech?
I'm not sure why FOR JSON specifically does that, but it's not normally an issue with selecting nvarchar(max) columns.
Another way to get out of the issue is to assign it to a variable first:
DECLARE @j nvarchar(max) =
(
    SELECT ...
    FROM ...
    FOR JSON AUTO
);
SELECT @j;
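A variant of the same idea (an untested sketch using the tables from the question): wrapping the FOR JSON query in a scalar subquery likewise returns the whole document as a single nvarchar(max) value in one row:

SELECT (
    SELECT i.YearID, i.EventID, i.modelcode, totalloss AS Industry, y.COB_ID, y.GrossLoss
    FROM dbo.CS_IndustryLossv8_7938 AS i
    INNER JOIN dbo.Tb_YLT AS y
        ON i.YearID = y.YearID AND i.EventID = y.EventID AND i.modelcode = y.Modelcode
    WHERE YLT_DVID = 25
    FOR JSON AUTO
) AS j;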
Apologies... found the answer. Unlike the grid output, MSSQL splits the result up into multiple rows that then need to be concatenated:
https://learn.microsoft.com/en-us/sql/relational-databases/json/format-query-results-as-json-with-for-json-sql-server?view=sql-server-ver15
Output of the FOR JSON clause
The output of the FOR JSON clause has the following characteristics:
The result set contains a single column.
A small result set may contain a single row.
A large result set splits the long JSON string across multiple rows.
By default, SQL Server Management Studio (SSMS) concatenates the results into a single row when the output setting is Results to Grid. The SSMS status bar displays the actual row count.
Other client applications may require code to recombine lengthy results into a single, valid JSON string by concatenating the contents of multiple rows. For an example of this code in a C# application, see Use FOR JSON output in a C# client app.

How to Improve the ADO Lookup Speed?

I am writing a C++ application with Visual Studio 2008 + ADO (not ADO.NET). It will do the following tasks one by one:
Create a table in a SQL Server database, as follows:
CREATE TABLE MyTable
(
[S] bigint,
[L] bigint,
[T] tinyint,
[I1] int,
[I2] smallint,
[P] bigint,
[PP] bigint,
[NP] bigint,
[D] bit,
[U] bit
);
Insert 5,030,242 records via BULK INSERT
Create an index on the table:
CREATE Index [MyIndex] ON MyTable ([P]);
Start a function which will perform 65,000,000 lookups. Each lookup uses the following query:
SELECT [S], [L]
FROM MyTable
WHERE [P] = ?
Each time, the query will either return nothing or return one row. If I get one row with [S] and [L], I convert [S] to a file pointer and then read data from the offset specified by [L].
Step 4 takes a lot of time, so I profiled it and found that the lookup query takes most of the time. Each lookup takes about 0.01458 seconds.
I tried to improve the performance with the following:
Use a parameterized ADO query (see step 4).
Select only the required columns. Originally I used "SELECT *" in step 4; now I use SELECT [S], [L] instead, which improves performance by about 1.5%.
Tried both a clustered and a non-clustered index on [P]; a non-clustered index seems a little better.
Is there any other room to improve the lookup performance?
Note: [P] is unique in the table.
Thank you very much.
You need to batch the work and perform one query that returns many rows, instead of many queries each returning only one row (and incurring a separate round-trip to the database).
The way to do it in SQL Server is to rewrite the query to use a table-valued parameter (TVP), and pass all the search criteria (denoted as ? in your question) together in one go.
First we need to declare the type that the TVP will use:
CREATE TYPE MyTableSearch AS TABLE (
P bigint NOT NULL
);
And then the new query will be pretty simple:
SELECT
    S,
    L
FROM
    @input I
    JOIN MyTable
        ON I.P = MyTable.P;
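To sanity-check the type and the query from SSMS before wiring up the client, something like this should work (the values here are made up):

DECLARE @input MyTableSearch;
INSERT INTO @input (P) VALUES (12345), (23456), (34567);

SELECT
    S,
    L
FROM
    @input I
    JOIN MyTable
        ON I.P = MyTable.P;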
The main complication is on the client side, in how to bind the TVP to the query. Unfortunately, I'm not familiar with ADO - for what it's worth, this is how it would be done with ADO.NET and C#:
static IEnumerable<(long S, long L)> Find(
    SqlConnection conn,
    SqlTransaction tran,
    IEnumerable<long> input
) {
    const string sql = @"
        SELECT
            S,
            L
        FROM
            @input I
            JOIN MyTable
                ON I.P = MyTable.P
    ";
    using (var cmd = new SqlCommand(sql, conn, tran)) {
        var record = new SqlDataRecord(new SqlMetaData("P", SqlDbType.BigInt));
        var param = new SqlParameter("input", SqlDbType.Structured) {
            Direction = ParameterDirection.Input,
            TypeName = "MyTableSearch",
            Value = input.Select(
                p => {
                    record.SetValue(0, p);
                    return record;
                }
            )
        };
        cmd.Parameters.Add(param);
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                yield return (reader.GetInt64(0), reader.GetInt64(1));
    }
}
Note that we reuse the same SqlDataRecord for all input rows, which minimizes allocations. This is documented behavior, and it works because ADO.NET streams TVPs.
Note: [P] is unique in the table.
Then you should make the index on P unique too - for correctness and to avoid wasting space on the uniquifier.
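For example (a one-statement sketch; whether it should also be clustered is up to your earlier testing):

DROP INDEX [MyIndex] ON MyTable;
CREATE UNIQUE INDEX [MyIndex] ON MyTable ([P]);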

Select data based on datestamp with sqlite3 query

I'm trying to use sqlite3 to access data in a database based on a value for the datestamp. Please consider the following code:
live_db_conn = sqlite3.connect('/Users/user/Documents/database.db')
time_period = (dt.now() - timedelta(seconds=time)).strftime('%H:%M:%S')
time_period_data = pd.read_sql_query('SELECT * FROM table1 WHERE Datestamp > {}'.format(str(time_period)), live_db_conn)
When I run this code I get the following error:
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT * FROM table1 WHERE Datestamp > 12:33:33': near ":33": syntax error
I don't understand where this error comes from, because if I run the following code:
df = pd.read_sql_query('SELECT Datestamp FROM table1 LIMIT 10', live_db_conn)
print(df)
I get the following output:
Datestamp
0 10:46:54
1 10:46:59
2 10:47:04
3 10:47:09
4 10:47:14
5 10:47:19
6 10:47:24
7 10:47:29
8 10:47:34
9 10:47:39
So it seems (to me at least) that my SQL query is correct. I've tried .format(time_period) instead of .format(str(time_period)), but I can't figure out what I'm doing wrong.
Question: How do I select the portion of the data that corresponds to the selected time period?
Edit: It seems that something is going wrong with the minutes in the timestamp. When I ran the code again, I got the same error but with a different timestamp:
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT * FROM table1 WHERE Datestamp > 12:49:10': near ":49": syntax error
So I'd say the syntax error has something to do with the minutes in the timestamp.
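The actual cause is that the formatted time is pasted into the SQL unquoted, so SQLite reads 12 as a numeric literal and then chokes on the :33 that follows; the minutes are just where the parser happens to give up. Schematically:

-- what the .format() call produced (invalid: the time is not a string literal)
SELECT * FROM table1 WHERE Datestamp > 12:33:33;
-- what SQLite needs (single quotes are the standard SQL string syntax)
SELECT * FROM table1 WHERE Datestamp > '12:33:33';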
Instead of
time_period_data = pd.read_sql_query('SELECT * FROM table1 WHERE Datestamp > {}'.format(str(time_period)), live_db_conn)
I did:
time_period_data = pd.read_sql_query('SELECT * FROM table1 WHERE Datestamp > "{}"'.format(time_period), live_db_conn)
which solved the problem!
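A more robust pattern is to let the driver bind the value instead of formatting it into the query string; pandas' read_sql_query accepts a params argument for this, so the SQL carries only a placeholder:

SELECT * FROM table1 WHERE Datestamp > ?;

with the time passed as params=(time_period,). This avoids the quoting problem entirely.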

Microsoft SQL Server: wrong query execution plan taking too long

This is on a Windows SQL Server cluster.
The query comes from a 3rd-party application, so I cannot modify it permanently.
The query is:
DECLARE @FromBrCode INT = 1001
DECLARE @ToBrCode INT = 1637
DECLARE @Cdate DATE = '31-mar-2017'
SELECT
a.PrdCd, a.Name, SUM(b.Balance4) as Balance
FROM
D009021 a, D010014 b
WHERE
a.PrdCd = LTRIM(RTRIM(SUBSTRING(b.PrdAcctId, 1, 8)))
AND substring(b.PrdAcctId, 9, 24) = '000000000000000000000000'
AND a.LBrCode = b.LBrCode
AND a.LBrCode BETWEEN @FromBrCode AND @ToBrCode
AND b.CblDate = (SELECT MAX(c.CblDate)
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= @Cdate)
GROUP BY
a.PrdCd, a.Name
HAVING
SUM(b.Balance4) <> 0
ORDER BY
a.PrdCd
This particular query takes too long to execute. The same problem happens on a different SQL Server.
No table lock was found, and processor and memory usage are normal while the query runs.
A plain "select top 1000" works and shows output instantly on both tables (D009021, D010014).
Reindexing / rebuilding and updating statistics were done on both tables, but the problem was not resolved (D009021, D010014).
The same query works if we reduce the number of branches, but slowly:
(
DECLARE @FromBrCode INT = 1001
DECLARE @ToBrCode INT = 1001
)
The same query runs faster, giving output within 2 minutes, if we replace any one variable with its value directly:
AND a.LBrCode BETWEEN @FromBrCode AND @ToBrCode
changed to
AND a.LBrCode BETWEEN 1001 AND @ToBrCode
The same query runs faster, giving output within 2 minutes, if we add "OPTION (RECOMPILE)" at the end.
I tried clearing the cached query execution plan and having a new one optimized, but the problem still exists.
I found that the query's estimated plan and actual execution plan are different (see screenshots).
Table D010014 is aliased twice, once as b and once as c, and the two aliases are joined to the same table. Try to remove the subquery below and create a temp table to store the values you need (a sketch of that rewrite is at the end of this answer). I added * to the fields you self-join.
SELECT MAX(c.CblDate)
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= @Cdate
If you can't do that, then try:
SELECT TOP 1 c.CblDate
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= @Cdate
ORDER BY c.CblDate DESC
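For concreteness, here is one possible shape of the temp-table rewrite suggested above (a sketch only, reusing the variables from the question and assuming the latest CblDate per (PrdAcctId, LBrCode) is what you need):

SELECT PrdAcctId, LBrCode, MAX(CblDate) AS CblDate
INTO #LastCbl
FROM D010014
WHERE CblDate <= @Cdate
GROUP BY PrdAcctId, LBrCode;

SELECT a.PrdCd, a.Name, SUM(b.Balance4) AS Balance
FROM D009021 a
JOIN D010014 b
    ON a.LBrCode = b.LBrCode
    AND a.PrdCd = LTRIM(RTRIM(SUBSTRING(b.PrdAcctId, 1, 8)))
JOIN #LastCbl m
    ON m.PrdAcctId = b.PrdAcctId
    AND m.LBrCode = b.LBrCode
    AND m.CblDate = b.CblDate
WHERE SUBSTRING(b.PrdAcctId, 9, 24) = '000000000000000000000000'
    AND a.LBrCode BETWEEN @FromBrCode AND @ToBrCode
GROUP BY a.PrdCd, a.Name
HAVING SUM(b.Balance4) <> 0
ORDER BY a.PrdCd;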
