Concept related to transaction? - database

I was reading about transactions in DBMS and have just gotten started, but I am confused about one concept.
Suppose initially A=100.
T1 reads A=100 and computes A = A-50.
T2 computes temp=10 and A=90, so T2 writes A=90.
Now when T1 does write(A), it should write A=50, because it has computed A = A-50. But it is shown writing A=90, the same as T2.
Why is this happening?

If A is a variable, then its context is shared across the program. You should try to use a temporary value in the calculations.
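To make the difference concrete, here is a minimal sketch of the schedule from the question, assuming each transaction reads A into its own private local copy and that temp is 10% of A (the question only says temp is 10). The names database, t1_local and t2_local are made up for illustration and do not come from any DBMS.

```python
# Hypothetical simulation of the interleaved schedule in the question.
# "database" stands in for the stored item A; each transaction keeps
# its own local copy, so they do not share one variable.
database = {"A": 100}

# T1: read(A), then compute A - 50 on its local copy.
t1_local = database["A"]         # 100
t1_local = t1_local - 50         # 50

# T2: read(A), compute temp (assumed to be 10% of A), subtract, write.
t2_local = database["A"]         # 100, because T1 has not written yet
temp = t2_local * 10 // 100      # 10
t2_local = t2_local - temp       # 90
database["A"] = t2_local         # write(A) by T2
after_t2_write = database["A"]

# T1: write(A) stores T1's local copy, not the value T2 just wrote.
database["A"] = t1_local         # write(A) by T1
final_value = database["A"]

print(after_t2_write, final_value)
```

With private copies, T1 writes 50 and overwrites the 90 stored by T2, so T2's update is lost. If both transactions worked on one shared variable instead, T1 would see whatever T2 last put there, which would explain a 90 showing up for T1.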


Combine parts of different tables [duplicate]

I'm kind of new to writing sql and I have a question about joins. Here's an example select:
select bb.name from big_box bb, middle_box mb, little_box lb
where lb.color = 'green' and lb.parent_box = mb and mb.parent_box = bb;
So let's say that I'm looking for the names of all the big boxes that have nested somewhere inside them a little box that's green. If I understand correctly, the above syntax is another way of getting the same results that we could get by using the 'join' keyword.
Questions: is the above select statement efficient for the task it's doing? If not, what is a better way to do it? Is the statement syntactic sugar for a join or is it actually doing something else?
If you have links to any good material on the subject I'd gladly read it, but since I don't know exactly what this technique is called I'm having trouble googling it.
You are using implicit join syntax. This is equivalent to using the JOIN keyword but it is a good idea to avoid this syntax completely and instead use explicit joins:
SELECT bb.name
FROM big_box bb
JOIN middle_box mb ON mb.parent_box = bb.id
JOIN little_box lb ON lb.parent_box = mb.id
WHERE lb.color = 'green'
You were also missing the column name in the join condition. I have guessed that the column is called id.
This type of query should be efficient if the tables are indexed correctly. In particular there should be foreign key constraints on the join conditions and an index on little_box.color.
An issue with your query is that if there are multiple green boxes inside a single box you will get duplicate rows returned. These duplicates can be removed by adding DISTINCT after SELECT.
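If you want to check this yourself, the following is a small sketch using Python's built-in sqlite3 module. The id columns, the sample rows and the index name are assumptions for illustration (the question does not show the real schema), and SQLite may plan queries differently from your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE big_box (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE middle_box (
        id INTEGER PRIMARY KEY,
        parent_box INTEGER REFERENCES big_box(id));
    CREATE TABLE little_box (
        id INTEGER PRIMARY KEY,
        parent_box INTEGER REFERENCES middle_box(id),
        color TEXT);
    CREATE INDEX idx_little_box_color ON little_box(color);

    INSERT INTO big_box VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO middle_box VALUES (10, 1), (20, 2);
    -- Two green boxes end up inside big box 'alpha'.
    INSERT INTO little_box VALUES
        (100, 10, 'green'), (101, 10, 'green'), (102, 20, 'red');
""")

# Implicit join syntax: join conditions live in the WHERE clause.
implicit_sql = """
    SELECT bb.name
    FROM big_box bb, middle_box mb, little_box lb
    WHERE lb.color = 'green'
      AND lb.parent_box = mb.id
      AND mb.parent_box = bb.id
"""

# Explicit join syntax: same conditions, moved into ON clauses.
explicit_sql = """
    SELECT bb.name
    FROM big_box bb
    JOIN middle_box mb ON mb.parent_box = bb.id
    JOIN little_box lb ON lb.parent_box = mb.id
    WHERE lb.color = 'green'
"""
distinct_sql = explicit_sql.replace("SELECT bb.name", "SELECT DISTINCT bb.name")

implicit_rows = sorted(conn.execute(implicit_sql).fetchall())
explicit_rows = sorted(conn.execute(explicit_sql).fetchall())
distinct_rows = conn.execute(distinct_sql).fetchall()

print(implicit_rows, explicit_rows, distinct_rows)
```

Both syntaxes return 'alpha' twice here, once per green little box, and the DISTINCT version returns it once.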

Flink - Dataset - Can Flink respect the order of processing on multiple flows / input ?

In my Flink batch program (DataSet / Table), I am reading multiple files. This produces different flows, which do some processing and save the result with an output format.
As Flink uses the dataflow model and my flows are not really related, they are processed in parallel.
Yet I want Flink to respect at least the order of my output operations, because I want flow1 to be saved before flow2.
For example I have something like :
Table table1 = tableEnv.fromTableSource(new MyTableSource1());
DataSet<Obj1> dataSet1 = tableEnv.toDataSet(table1.select("toto",..), Obj1.class);
dataSet1.output(new WateverdatasinkSQL());
Table table2 = tableEnv.fromTableSource(new MyTableSource2());
DataSet<Obj2> dataSet2 = tableEnv.toDataSet(table2.select("foo","bar",..), Obj2.class);
dataSet2.output(new WateverdatasinkSQL());
I want Flink to wait for dataSet1 to be saved before continuing...
How can I run them as successive operations?
I have already looked at the execution modes, but they do not do this.
Regards,
Bastien
The easiest solution is to separate both flows into individual jobs and execute them one after the other.
Table table1 = tableEnv.fromTableSource(new MyTableSource1());
DataSet<Obj1> dataSet1 = tableEnv.toDataSet(table1.select("toto",..), Obj1.class);
dataSet1.output(new WateverdatasinkSQL());
env.execute();
Table table2 = tableEnv.fromTableSource(new MyTableSource2());
DataSet<Obj2> dataSet2 = tableEnv.toDataSet(table2.select("foo","bar",..), Obj2.class);
dataSet2.output(new WateverdatasinkSQL());
env.execute();

CTE performance on execution plan. Is it displayed two times, or two times processed?

This is the SQL with a common table expression. Note that USERS_PROJECTS_CTE is used twice.
WITH USERS_PROJECTS_CTE (PRO_ID, SHOW_IAS, USERNAME)
AS
(
SELECT up.PRO_ID, up.SHOW_IAS, ISNULL(u.FIRST_NAME, '') + ' ' + ISNULL(u.SECOND_NAME, '')
FROM SFMIS07_PRO.USERS_PROJECTS up
INNER JOIN SFMIS07_ADM.USERS AS u
ON up.USER_ID = u.ID
WHERE up.IS_RESP_PERSON = 1 AND up.valid_to is null
)
SELECT up.PRO_ID,
up1.USERNAME as RESP_USER1,
up2.USERNAME as RESP_USER2,
up.COUNT_
FROM SFMIS07_PRO.PRO_RESP_USERS_KERNEL_MV AS up
LEFT JOIN USERS_PROJECTS_CTE AS up1 ON up.PRO_ID = up1.PRO_ID AND up1.SHOW_IAS=1
LEFT JOIN USERS_PROJECTS_CTE AS up2 ON up.PRO_ID = up2.PRO_ID AND up2.SHOW_IAS=0
The execution plan. Note that the CTE is displayed twice:
Questions:
Am I right that the CTE is not only displayed twice but processed twice?
Is it possible to inform the QO to reuse the CTE?
Is it possible for the QO in principle to detect "the same SQL fragment" and reuse results (I imagine this being realized by copying already prepared data)?
How to optimize the query (without using temporal tables :) ?
Am I right that CTE is not only displayed twice but processed twice?
Yes
Is it possible to inform QO to reuse CTE ?
Not directly but there are some hacks to encourage this.
Is it possible for QO in principle to detect "the same SQL fragment" and reuse results (I imagine the realization of this - by copying already prepared data)?
In principle yes. See the Microsoft Research paper "Efficient Exploitation of Similar Subexpressions for Query Processing" for examples.
how to optimize the query (without using temporal tables :) ?
The most reliable way would be to use a temporary (not temporal) table. See "Provide a hint to force intermediate materialization of CTEs or derived tables" for a more hacky workaround.

Why use Case in Where Clause

I know there are variations of this question out there but cannot find one that answers what I am looking for.
I have inherited a database and reports from another programmer who is no longer in the picture.
One of the Queries uses this code:
Select
b.HospitalMasterID
,b.TxnSite
,b.PatientID
,b.TxnDate as KeptDate
From
Billing as b
Inner Join Patient as p
on b.HospitalMasterID = p.HospitalMasterID
and b.PatientID = p.PatientID
Where
b._IsServOrItem=1
and b.TxnDate >= '20131001'
and (Case
When b.ExtendedAmount > 0 Then 1
When (Not(p.PlanCode is null)) and (b.listAmount >0) then 1
End = 1)
When I run the query I get approximately 900,000 rows returned. If I remove the Case statement, I get over a million rows returned.
Can someone explain why this is so? What exactly is the Case statement doing? Is there a better way to accomplish the same thing? I really don't like this statement as it stands, and the entire report query is very difficult to read due to lack of structure.
The version is SQL Server 2012 (T-SQL).
Thanks,
Seems to me like it's doing this:
(b.ExtendedAmount > 0 OR (Not(p.PlanCode is null) and (b.listAmount >0)))
Maybe it was copy / pasted from somewhere else and modified? Regardless, it's bizarre.
I think that's someone trying to avoid using the OR operator in order to promote index seeks over scans. It would be worth looking at the plan, but I would be surprised if it differed significantly over the logic in Greg's answer.
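One way to convince yourself that the two predicates behave the same, including for NULLs, is a small experiment. This is only a sketch using Python's built-in sqlite3 module, not SQL Server. The single billing table, its column names and the sample rows are simplifications made up for illustration (in the real query PlanCode lives on the Patient table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE billing (
        id INTEGER PRIMARY KEY,
        extended_amount REAL,
        list_amount REAL,
        plan_code TEXT)
""")
conn.executemany("INSERT INTO billing VALUES (?, ?, ?, ?)", [
    (1, 25.0, 0.0, None),    # extended amount > 0
    (2, 0.0, 40.0, "PPO"),   # plan code present and list amount > 0
    (3, 0.0, 40.0, None),    # list amount > 0 but no plan code
    (4, 0.0, 0.0, "PPO"),    # plan code present but both amounts zero
    (5, None, 0.0, None),    # NULL extended amount
])

# The inherited style: a CASE with no ELSE yields NULL when no branch
# matches, and NULL = 1 is not true, so the row is filtered out.
case_sql = """
    SELECT id FROM billing
    WHERE (CASE
             WHEN extended_amount > 0 THEN 1
             WHEN plan_code IS NOT NULL AND list_amount > 0 THEN 1
           END = 1)
    ORDER BY id
"""

# The plain OR form suggested in the answer.
or_sql = """
    SELECT id FROM billing
    WHERE extended_amount > 0
       OR (plan_code IS NOT NULL AND list_amount > 0)
    ORDER BY id
"""

case_ids = [row[0] for row in conn.execute(case_sql)]
or_ids = [row[0] for row in conn.execute(or_sql)]
print(case_ids, or_ids)
```

Both queries keep the same rows in this sample, which is why removing the Case statement from the original query returns more rows: it is an ordinary filter written in an unusual way.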

Two radically different queries against 4 mil records execute in the same time - one uses brute force

I'm using SQL Server 2008. I have a table with over 3 million records, which is related to another table with a million records.
I have spent a few days experimenting with different ways of querying these tables. I have it down to two radically different queries, both of which take 6s to execute on my laptop.
The first query uses a brute-force method of evaluating likely matches, and removes incorrect matches via aggregate summation calculations.
The second gets all likely matches, then removes incorrect matches via an EXCEPT query that uses two dedicated indexes to find the low and high mismatches.
Logically, one would expect the brute-force query to be slow and the index-based one to be fast. Not so. And I have experimented heavily with indexes until I got the best speed.
Further, the brute-force query doesn't require as many indexes, which means that technically it would yield better overall system performance.
Below are the two execution plans. If you can't see them, please let me know and I'll re-post them in landscape orientation or mail them to you.
Brute-force query:
SELECT ProductID, [Rank]
FROM (
SELECT p.ProductID, ptr.[Rank], SUM(CASE
WHEN p.ParamLo < si.LowMin OR
p.ParamHi > si.HiMax THEN 1
ELSE 0
END) AS Fail
FROM dbo.SearchItemsGet(@SearchID, NULL) AS si
JOIN dbo.ProductDefs AS pd
ON pd.ParamTypeID = si.ParamTypeID
JOIN dbo.Params AS p
ON p.ProductDefID = pd.ProductDefID
JOIN dbo.ProductTypesResultsGet(@SearchID) AS ptr
ON ptr.ProductTypeID = pd.ProductTypeID
WHERE si.Mode IN (1, 2)
GROUP BY p.ProductID, ptr.[Rank]
) AS t
WHERE t.Fail = 0
Index-based exception query:
with si AS (
SELECT DISTINCT pd.ProductDefID, si.LowMin, si.HiMax
FROM dbo.SearchItemsGet(@SearchID, NULL) AS si
JOIN dbo.ProductDefs AS pd
ON pd.ParamTypeID = si.ParamTypeID
JOIN dbo.ProductTypesResultsGet(@SearchID) AS ptr
ON ptr.ProductTypeID = pd.ProductTypeID
WHERE si.Mode IN (1, 2)
)
SELECT p.ProductID
FROM dbo.Params AS p
JOIN si
ON si.ProductDefID = p.ProductDefID
EXCEPT
SELECT p.ProductID
FROM dbo.Params AS p
JOIN si
ON si.ProductDefID = p.ProductDefID
WHERE p.ParamLo < si.LowMin OR p.ParamHi > si.HiMax
My question is, based on the execution plans, which one looks more efficient? I realize that things may change as my data grows.
EDIT:
I have updated the indexes, and now have the following execution plan for the second query:
Trust the optimizer.
Write the query that most simply expresses what you're trying to achieve. If you're having performance problems with that query, then you should look at whether there are any missing indexes. But you still shouldn't have to explicitly work with these indexes.
Don't concern yourself with considerations of how you might implement such a search yourself.
In very rare circumstances, you may need to further force the query to use particular indexes (via hints), but this is probably < 0.1% of queries.
In your posted plans, your "optimized" version is causing scans against 2 indexes of your (I presume) Params table (PK_Params_1, IX_Params_1). Without seeing the queries, it's difficult to know why this is happening, but if you're comparing against having a single scan against a table ("Brute force") and two, it's easy to see why the second isn't more efficient.
I think I'd try:
SELECT p.ProductID, ptr.[Rank]
FROM dbo.SearchItemsGet(@SearchID, NULL) AS si
JOIN dbo.ProductDefs AS pd
ON pd.ParamTypeID = si.ParamTypeID
JOIN dbo.Params AS p
ON p.ProductDefID = pd.ProductDefID
JOIN dbo.ProductTypesResultsGet(@SearchID) AS ptr
ON ptr.ProductTypeID = pd.ProductTypeID
LEFT JOIN Params p_anti
on p_anti.ProductDefId = pd.ProductDefID and
(p_anti.ParamLo < si.LowMin or p_anti.ParamHi > si.HiMax)
WHERE si.Mode IN (1, 2)
AND p_anti.ProductID is null
GROUP BY p.ProductID, ptr.[Rank]
I.e. introduce an anti-join that eliminates the results you don't want.
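For readers who have not met the pattern, here is a stripped-down sketch of an anti-join using Python's built-in sqlite3 module. The one-table schema, the sample rows and the fixed low_min / hi_max bounds are assumptions for illustration only. Unlike the query above, this toy version matches on product_id, so it illustrates the technique rather than translating that query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE params (product_id INTEGER, param_lo REAL, param_hi REAL)"
)
conn.executemany("INSERT INTO params VALUES (?, ?, ?)", [
    (1, 5.0, 8.0),    # in range
    (1, 6.0, 9.0),    # in range
    (2, 5.0, 8.0),    # in range
    (2, 1.0, 9.0),    # param_lo below low_min
    (3, 4.0, 12.0),   # param_hi above hi_max
])
bounds = {"low_min": 4.0, "hi_max": 10.0}

# Anti-join: left join each product to its out-of-range rows, then keep
# only the products for which no such row was found (p_anti is NULL).
anti_join_sql = """
    SELECT DISTINCT p.product_id
    FROM params AS p
    LEFT JOIN params AS p_anti
           ON p_anti.product_id = p.product_id
          AND (p_anti.param_lo < :low_min OR p_anti.param_hi > :hi_max)
    WHERE p_anti.product_id IS NULL
    ORDER BY p.product_id
"""
passing = [row[0] for row in conn.execute(anti_join_sql, bounds)]
print(passing)
```

Only product 1 survives in this sample, because products 2 and 3 each have at least one row outside the bounds.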
In SQL Server Management Studio, put both queries in the same query window and get the query plan for both at once. It should determine the query plans for both and give you a 'percent of total batch' for each one. The query with the lower percent of the total batch will be the better performing one.
Does 6 seconds on a laptop = .006 seconds on production hardware? The parts of your queries that worry me are the clustered index scans shown in the query plan. In my experience, any time a query plan includes a CI scan, the query will only get slower as data is added to the table.
What do the two functions yield? They appear to be the cause of the table scans. Is it possible to persist the data in the db and update the LowMin and HiMax as rows are added?
Looking at the two execution plans, neither is very good. Look how far to the left the wide lines are. Wide lines mean there are many rows. We need to reduce the number of rows earlier in the process so we do not work with such large hash tables, large sorts, and nested loops.
BTW how many rows does your source have and how many rows are included in the result set?
Thank you all for your input and help.
From reading what you wrote, experimenting, and digging into the execution plan, I discovered the answer is the tipping point.
There were too many records being returned to warrant use of the index.
See here (Kimberly Tripp).
