Query optimisation in Sybase

I need to retrieve customer MSISDNs (phone numbers) from 22 customer databases. I have created a view covering two cases:
First, we need to check for which MSISDNs profile_id 16240 is inactive. This can be done by querying the database for rows whose inactive date is not null.
Since for GPRS we have two profiles, 25054 and 16240, it happens that for some MSISDNs 25054 (used for internet) is active while 16240 (used for GPRS) is not active, so we need a script for that purpose.
I have prepared a query:
CREATE VIEW SUBSCR_INFO_VIEW AS
SELECT subscr_no,account_no FROM CUSTOMER_PROFILE_DEF WHERE subscr_no NOT IN
(SELECT DISTINCT(subscr_no) FROM CUSTOMER_ID_EQUIP_MAP
WHERE inactive_date Is NOT NULL)
AND (profile_id IN (16240) AND cutoff_end_dt IS NOT NULL) OR (profile_id IN (25054) AND profile_id NOT IN (16240) AND cutoff_end_dt IS NULL)
SET ROWCOUNT 100
SELECT DISTINCT(subscr_no) FROM SUBSCR_INFO_VIEW
This query is run against all 22 customer servers, and pulling the data from a single customer takes about 2.5 minutes. I want to reduce that time. Please let me know your feedback.

This is a little difficult to answer without knowing more about the structure of the database. How many records do you have in the CUSTOMER_PROFILE_DEF and CUSTOMER_ID_EQUIP_MAP tables, and what keys do you have? Also, your SQL is very difficult to understand in the original post; I have reformatted it below and made some small changes:
CREATE VIEW SUBSCR_INFO_VIEW
AS
SELECT subscr_no,
       account_no
FROM   CUSTOMER_PROFILE_DEF
WHERE  subscr_no NOT IN (
           SELECT DISTINCT subscr_no
           FROM   CUSTOMER_ID_EQUIP_MAP
           WHERE  inactive_date IS NOT NULL
       )
  AND ((profile_id = 16240 AND cutoff_end_dt IS NOT NULL)
    OR (profile_id = 25054 AND cutoff_end_dt IS NULL))

SET ROWCOUNT 100 -- This is just for testing?
SELECT DISTINCT subscr_no FROM SUBSCR_INFO_VIEW
The SQL is largely the same, but I changed your profile_id IN (12345) statements to profile_id = 12345, as there was only one value in each list. Note the added parentheses around the profile_id conditions as well: as originally written, the OR would have let rows through without the inactive-date check, because AND binds more tightly than OR.
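If it is still slow after that, one further thing worth trying on Sybase is replacing the NOT IN subquery with NOT EXISTS, which the optimiser can often turn into a cheaper correlated lookup, and making sure CUSTOMER_ID_EQUIP_MAP has an index on (subscr_no, inactive_date). A sketch only, reusing the table and column names from your post; whether it helps depends on your indexes:
SELECT DISTINCT cpd.subscr_no
FROM   CUSTOMER_PROFILE_DEF cpd
WHERE  NOT EXISTS (
           -- correlated anti-join instead of NOT IN; the DISTINCT in the
           -- subquery is no longer needed
           SELECT 1
           FROM   CUSTOMER_ID_EQUIP_MAP map
           WHERE  map.subscr_no = cpd.subscr_no
             AND  map.inactive_date IS NOT NULL
       )
  AND ((cpd.profile_id = 16240 AND cpd.cutoff_end_dt IS NOT NULL)
    OR (cpd.profile_id = 25054 AND cpd.cutoff_end_dt IS NULL))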

Related

SQL Server - Update All Records, Per Group, With Result of SubQuery

If anyone could even just help me phrase this question better I'd appreciate it.
I have a SQL Server table, let's call it cars, which contains entries representing items and information about their owners including car_id, owner_accountNumber, owner_numCars.
We're using a system that sorts 'importantness of owner' based on number of cars owned, and relies on the owner_numCars column to do so. I'd rather not adjust this, if reasonably possible.
Is there a way I can update owner_numCars per owner_accountNumber using a stored procedure? Or is there some other, more efficient way to make sure every owner_numCars contains the count of entries for its owner_accountNumber?
Right now the only way I can think to do this is (from the C# application):
SELECT owner_accountNumber, COUNT(*)
FROM mytable
GROUP BY owner_accountNumber;
and then, for each row returned by that query:
UPDATE mytable
SET owner_numCars = <count result>
WHERE owner_accountNumber = <accountNumber result>
But this seems wildly inefficient compared to having the server handle the logic and updates.
Edit - Thanks for all the help. I know this isn't really a well set up database, but it's what I have to work with. I appreciate everyone's input and advice.
This solution takes into account that you want to keep the owner_numCars column in the CARS table and that the column should always be accurate in real time.
I'm defining table CARS as a table with attributes about cars, including each car's current owner. The number of cars owned by the current owner is de-normalized into this table. Say I, LAS, own three cars; then there are three entries in table CARS, as such:
car_id  owner_accountNumber  owner_numCars
1       LAS1                 3
2       LAS1                 3
3       LAS1                 3
For owner_numCars to be used as an importance factor in a live interface, you'd need to update owner_numCars for every car every time LAS1 sells or buys a car or is removed from or added to a row.
Note you need to update CARS for both the old and new owners. If Sam buys car1, both Sam's and LAS' totals need to be updated.
You can use this procedure to update the rows. This SP is very context sensitive. It needs to be called after rows have been deleted or inserted for the deleted or inserted owner. When an owner is updated, it needs to be called for both the old and new owners.
To update real time as accounts change owners:
create procedure update_car_count
    @p_acct nvarchar(50) -- use your actual datatype here
AS
    update CARS
    set owner_numCars = (select count(*) from CARS where owner_accountNumber = @p_acct)
    where owner_accountNumber = @p_acct;
GO
To update all account_owners:
create procedure update_car_count_all
AS
    update C
    set owner_numCars = (select count(*) from CARS where owner_accountNumber = C.owner_accountNumber)
    from CARS C
GO
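For example, if Sam (account 'SAM1' here, a made-up value) buys car1 from LAS1, you would call the single-account procedure once for each affected owner:
exec update_car_count @p_acct = 'LAS1' -- old owner's total drops to 2
exec update_car_count @p_acct = 'SAM1' -- new owner's total now includes car1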
I think what you need is a View. If you don't know, a View is a virtual table that displays/calculates data from a real table and is continuously updated as the table data updates. So if you want to see your table with owner_numCars added, you could do:
SELECT a.*, b.owner_numCars
from mytable as a
inner join
(SELECT owner_accountNumber, COUNT(*) as owner_numCars
FROM mytable
GROUP BY owner_accountNumber) as b
on a.owner_accountNumber = b.owner_accountNumber
You'd want to remove the owner_numCars column from the real table since you don't need to actually store that data on each row. If you can't remove it you can replace a.* with an explicit list of all the fields except owner_numCars.
You don't want to run SQL to update this value. What if it doesn't run for a long time? What if someone loads a lot of data, then runs the scoring, and finds that a guy who has 100 cars counts as zero because the update didn't run? Data should only live in one place; updating means it lives in two. You want a view that pulls this value from the tables as it is needed.
CREATE VIEW vOwnersInfo
AS
SELECT o.*,
ISNULL(c.Cnt,0) AS Cnt
FROM OWNERS o
LEFT JOIN
(SELECT OwnerId,
COUNT(1) AS Cnt
FROM Cars
GROUP BY OwnerId) AS c
ON o.OwnerId = c.OwnerId
There are a lot of ways of doing this. Here is one way, using the COUNT() OVER window function and an updatable Common Table Expression (CTE), so that you won't have to worry about relating the data back by ids etc.
;WITH cteCarCounts AS (
SELECT
owner_accountNumber
,owner_numCars
,NewNumberOfCars = COUNT(*) OVER (PARTITION BY owner_accountNumber)
FROM
MyTable
)
UPDATE cteCarCounts
SET owner_numCars = NewNumberOfCars
However, from a design perspective I would raise the question of whether this value (owner_numCars) should be on this table or on what I assume would be the owner table.
Rominus did make a good point about using a view if you want the data to always reflect the current value. You could also do it with a table-valued function, which could be more performant than a view. But if you are simply displaying it, then you could simply do something like this:
SELECT
owner_accountNumber
,owner_numCars = COUNT(*) OVER (PARTITION BY owner_accountNumber)
FROM
MyTable
By adding a where clause to either the CTE or the SELECT statement you will effectively limit your dataset and the solution should remain fast. E.g.
WHERE owner_accountNumber = @owner_accountNumber

TSQL Comparing 2 tables

I have 2 tables in 2 databases. The schema for the tables is identical. There are no timestamps or last-updated information. Table A is a live table; that is, it's updated in "the" program. Updates, inserts and deletes all happen in Table A. Table B is a backup made weekly. Is there a quick way to compare the 2 tables and give me results similar to:
I | 54
D | 55
U | 60
So record 54 in the live table is new, record 55 in the live table was deleted, record 60 in the live table was updated.
This needs to work in SQL Server 2008 and up.
Fields: id, first_name, last_name, phone, email, address_id, birth_date, last_visit, provider_id, comments
I have no control over the schema. I have read-only access to Table A, read-write to Table B.
Would it be easier to store a hash of each of Table A's rows rather than a full copy of the table? Generally speaking, I need to know which rows have been updated, inserted or deleted without a built-in timestamp. I have the weekly backup table to look at, but I could create a hash table if needed.
This uses two joins against the archive: the first checks just for id existence, identifying inserts and deletes; the second checks for row equality.
In the example I have used CHECKSUM for simplicity, but I recommend you read up on the cons of using it and consider alternatives like HASHBYTES, or checking each column for equality.
SELECT id, CHECKSUM(*) AS hash
INTO #live
FROM live.dbo.tbl

SELECT id, CHECKSUM(*) AS hash
INTO #archive
FROM archive.dbo.tbl

SELECT COALESCE(l1.id, a1.id) AS id,
       CASE WHEN l1.id IS NULL THEN 'D' -- in archive only: deleted
            WHEN a1.id IS NULL THEN 'I' -- in live only: inserted
            WHEN a2.id IS NULL THEN 'U' -- id matches but hash differs: updated
       END AS change_type
FROM #live l1
FULL JOIN #archive a1 ON a1.id = l1.id
LEFT JOIN #archive a2 ON a2.id = l1.id
                     AND a2.hash = l1.hash
-- rows with a NULL change_type are unchanged
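If CHECKSUM's collision risk is a concern, the staging step could use HASHBYTES instead. A sketch that works on SQL Server 2008 (SHA1 is available there); the column list comes from the question, while the CONVERT datatypes are assumptions about the schema:
SELECT id,
       HASHBYTES('SHA1',
           -- the '|' delimiter stops adjacent columns running together
           -- ('ab' + 'c' vs 'a' + 'bc'); ISNULL guards against NULLs
           ISNULL(first_name, '') + '|' + ISNULL(last_name, '') + '|' +
           ISNULL(phone, '') + '|' + ISNULL(email, '') + '|' +
           ISNULL(CONVERT(varchar(20), address_id), '') + '|' +
           ISNULL(CONVERT(varchar(30), birth_date, 126), '') + '|' +
           ISNULL(CONVERT(varchar(30), last_visit, 126), '') + '|' +
           ISNULL(CONVERT(varchar(20), provider_id), '') + '|' +
           ISNULL(comments, '')) AS hash
INTO #live
FROM live.dbo.tbl
-- note: before SQL Server 2016, HASHBYTES input is limited to 8000 bytes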
I'm going to recommend a tool, but it's not free, although it has a fully functioning 30-day trial period. If you're going to compare data in SQL Server tables, look at Red Gate's SQL Data Compare. It's not cheap, and it will pay for itself many times over. (If you need to compare schemas, their SQL Compare does that.)
Barring that, having a third table where you write a compare query and select those in one table and not the other (with a field indicating that), those in the other table and not the first, and then comparing field by field to find the ones that differ. That should work too. It will take longer to build, but if it's just one table, the time it takes to write that code should be less than what you'll pay for the Red Gate tools.
If there is a column or set of columns that can uniquely identify each row, then a series of SQL statements could be written to identify the inserts, updates and deletes. If there isn't a unique row identifier, or the unique identifier (for example, one of the columns that makes it unique) changes, then no.
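For completeness, here is a sketch of that series of statements, assuming id is the unique row identifier and using the field list from the question (the liveDb/backupDb database names are placeholders). The EXCEPT trick in the update check compares all columns at once and treats NULLs as equal, which works on SQL Server 2008 and up:
-- inserts: ids in live but not in the backup
SELECT 'I' AS change_type, a.id
FROM   liveDb.dbo.TableA a
WHERE  NOT EXISTS (SELECT 1 FROM backupDb.dbo.TableB b WHERE b.id = a.id)

-- deletes: ids in the backup but no longer live
SELECT 'D' AS change_type, b.id
FROM   backupDb.dbo.TableB b
WHERE  NOT EXISTS (SELECT 1 FROM liveDb.dbo.TableA a WHERE a.id = b.id)

-- updates: ids in both where any column differs
SELECT 'U' AS change_type, a.id
FROM   liveDb.dbo.TableA a
JOIN   backupDb.dbo.TableB b ON b.id = a.id
WHERE  EXISTS (SELECT a.first_name, a.last_name, a.phone, a.email, a.address_id,
                      a.birth_date, a.last_visit, a.provider_id, a.comments
               EXCEPT
               SELECT b.first_name, b.last_name, b.phone, b.email, b.address_id,
                      b.birth_date, b.last_visit, b.provider_id, b.comments)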

SQL views from one table

My colleague asked me a question regarding getting data from a SQL Server database.
I have a sample data set
[ID],[TOWN],[PERSON]
[1],[BELFAST],[JAMES]
[2],[NEWRY],[JOHN]
[3],[BELFAST],[SIMON]
[4],[LARNE],[ALAN]
Now from this I would like to return a SQL Dataset that returns me a different table based upon the view.
Essentially, in code I could get a distinct list of the towns and then loop, filtering the SQL on each town. But is there a way I can do this in SQL?
Where I would get (3) views back (2 Belfast, 1 Newry and 1 Larne)?
Basically it would return
[ID],[Town],[Person]
[1],[Belfast],[James]
[3],[Belfast],[Simon]
Then another view would return rows for 'Larne' and a final one for 'Newry'. Basically, SQL would create a view for each town it finds and then return the records for each town.
You don't get views back - you have to define them yourself.
E.g. if you need one view for Belfast, a second for Newry and a third for Larne, then you need to create three views that return only those rows that match the relevant town name:
CREATE VIEW BelfastView
AS
SELECT ID, Town, Person
FROM dbo.Towns
WHERE Town = 'Belfast'
GO

CREATE VIEW LarneView
AS
SELECT ID, Town, Person
FROM dbo.Towns
WHERE Town = 'Larne'
GO

CREATE VIEW NewryView
AS
SELECT ID, Town, Person
FROM dbo.Towns
WHERE Town = 'Newry'
GO
Now, certain users might only be allowed to select data from the BelfastView and thus would never see any other rows of data from your underlying table.
But views are database objects like tables or stored procedures; you need to create them, maintain them, toss them when no longer needed.
EDIT
Based on your updated question, you simply need to create a view for each town you want to filter:
CREATE VIEW BelfastView AS
SELECT ID,
Town,
Person
FROM YourTable
WHERE Town = 'BELFAST'
Although you've only given us a small sample of your data, what you're asking is almost never a good idea. What happens when you have 50 new towns in your DB? Are you going to create a view for each town? This does not scale well (or at all).
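A more scalable alternative (my suggestion, not part of the answers above) would be a single inline table-valued function parameterised by town, so no per-town object is ever needed; the datatype is an assumption:
CREATE FUNCTION dbo.TownRows (@town varchar(50))
RETURNS TABLE
AS
RETURN
    -- one definition serves every town, present and future
    SELECT ID, Town, Person
    FROM dbo.Towns
    WHERE Town = @town
GO

-- usage:
SELECT * FROM dbo.TownRows('BELFAST')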
Basically, I have decided to run it as a stored procedure that returns the rows for each town as a separate list. So, something along the lines of this:
Create Procedure ListTowns
As
    declare @town varchar(11)

    -- walk the distinct towns in order, returning one result set per town
    select @town = min(Town) from [Towns]
    while @town is not null
    begin
        select * from [Towns] where Town = @town
        select @town = min(Town) from [Towns] where Town > @town
    end

Updating redundant/denormalized data automatically in SQL Server

I use a high level of redundant, denormalized data in my DB designs to improve performance. I'll often store data that would normally need to be joined or calculated. For example, if I have a User table and a Task table, I would store the Username and UserDisplayName redundantly in every Task record. Another example of this is storing aggregates, such as storing the TaskCount in the User table.
User
UserID
Username
UserDisplayName
TaskCount
Task
TaskID
TaskName
UserID
UserName
UserDisplayName
This is great for performance, since the app has many more reads than insert, update or delete operations, and since some values like Username change rarely. However, the big drawback is that the integrity has to be enforced via application code or triggers. This can be very cumbersome with updates.
My question is whether this can be done automatically in SQL Server 2005/2008, maybe via a persisted/permanent view. Would anyone recommend another possible solution or technology? I've heard document-based DBs such as CouchDB and MongoDB can handle denormalized data more effectively.
You might want to first try an Indexed View before moving to a NoSQL solution:
http://msdn.microsoft.com/en-us/library/ms187864.aspx
and:
http://msdn.microsoft.com/en-us/library/ms191432.aspx
Using an Indexed View would allow you to keep your base data in properly normalized tables and maintain data-integrity while giving you the denormalized "view" of that data. I would not recommend this for highly transactional tables, but you said it was heavier on reads than writes so you might want to see if this works for you.
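A minimal sketch of such an indexed view for the example tables (the names come from the question; the SCHEMABINDING and COUNT_BIG requirements come from the linked documentation):
CREATE VIEW dbo.UserTaskCounts
WITH SCHEMABINDING -- required for an indexed view
AS
SELECT UserID,
       COUNT_BIG(*) AS TaskCount -- indexed views with GROUP BY must use COUNT_BIG
FROM dbo.Task
GROUP BY UserID
GO

-- the unique clustered index is what actually materializes ("persists") the view
CREATE UNIQUE CLUSTERED INDEX IX_UserTaskCounts ON dbo.UserTaskCounts (UserID)
GO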
Based on your two example tables, one option is:
1) Add a column to the User table defined as:
TaskCount INT NOT NULL DEFAULT (0)
2) Add a Trigger on the Task table defined as:
CREATE TRIGGER UpdateUserTaskCount
ON dbo.Task
AFTER INSERT, DELETE
AS
;WITH added AS
(
SELECT ins.UserID, COUNT(*) AS [NumTasks]
FROM INSERTED ins
GROUP BY ins.UserID
)
UPDATE usr
SET usr.TaskCount = (usr.TaskCount + added.NumTasks)
FROM dbo.[User] usr
INNER JOIN added
ON added.UserID = usr.UserID
;WITH removed AS
(
SELECT del.UserID, COUNT(*) AS [NumTasks]
FROM DELETED del
GROUP BY del.UserID
)
UPDATE usr
SET usr.TaskCount = (usr.TaskCount - removed.NumTasks)
FROM dbo.[User] usr
INNER JOIN removed
ON removed.UserID = usr.UserID
GO
3) Then do a View that has:
SELECT u.UserID,
       u.Username,
       u.UserDisplayName,
       u.TaskCount,
       t.TaskID,
       t.TaskName
FROM dbo.[User] u
INNER JOIN dbo.Task t
        ON t.UserID = u.UserID
And then follow the recommendations from the links above (WITH SCHEMABINDING, Unique Clustered Index, etc.) to make it "persisted". While it is inefficient to do an aggregation in a subquery in the SELECT as shown above, this specific case is intended to be denormalized in a situation that has higher reads than writes. So doing the Indexed View will keep the entire structure, including the aggregation, physically stored so each read will not recalculate it.
Now, if a LEFT JOIN is needed because some Users do not have any Tasks, then the Indexed View will not work, due to the many restrictions on creating them. In that case, you can create a real table (UserTask) that is your denormalized structure and have it populated via either a Trigger on just the User table (assuming you do the Trigger I show above, which updates the User table based on changes in the Task table), or you can skip the TaskCount field in the User table and just have Triggers on both tables to populate the UserTask table. In the end, this is basically what an Indexed View does, just without you having to write the synchronization Trigger(s).

joining latest of various usermetadata tags to user rows

I have a postgres database with a user table (userid, firstname, lastname) and a usermetadata table (userid, code, content, created datetime). I store various information about each user in the usermetadata table by code and keep a full history. So, for example, a user (userid 15) has the following metadata:
15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'
15, 'QHE', '8', '2008-08-24 12:07:08.660519-04'
15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'
15, 'QHE', '10', '2008-08-24 08:47:57.672058-04'
I need to fetch a list of all my users and the most recent value of each of various usermetadata codes. I did this programmatically and it was, of course, godawful slow. The best I could figure out in SQL was to join sub-selects, which were also slow, and I had to do one for each code.
This is actually not that hard to do in PostgreSQL because it has the "DISTINCT ON" clause in its SELECT syntax (DISTINCT ON isn't standard SQL).
SELECT DISTINCT ON (code) code, content, createtime
FROM metatable
WHERE userid = 15
ORDER BY code, createtime DESC;
That will limit the returned results to the first result per unique code, and if you sort the results by the create time descending, you'll get the newest of each.
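To cover all users in one pass rather than one userid at a time, DISTINCT ON can take both columns. A sketch using the column names from the question (created, rather than the createtime used above), joined back to the user table; "user" is quoted because it is a reserved word in PostgreSQL:
SELECT DISTINCT ON (m.userid, m.code)
       m.userid, u.firstname, u.lastname, m.code, m.content, m.created
FROM   usermetadata m
JOIN   "user" u ON u.userid = m.userid
-- keeps the first row per (userid, code); the ORDER BY makes that the newest
ORDER  BY m.userid, m.code, m.created DESC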
I suppose you're not willing to modify your schema, so I'm afraid my answer might not be of much help, but here goes...
One possible solution would be to leave the time field empty until the row is replaced by a newer value, at which point you set it to the 'deprecation date'. Another way is to expand the table with an 'active' column, but that would introduce some redundancy.
The classic solution would be to have both 'Valid-From' and 'Valid-To' fields where the 'Valid-To' fields are blank until some other entry becomes valid. This can be handled easily by using triggers or similar. Using constraints to make sure there is only one item of each type that is valid will ensure data integrity.
Common to these is that there is a single way of determining the set of current fields. You'd simply select all entries for the user in question with a NULL 'Valid-To' / 'deprecation date', or a true 'active' flag.
You might be interested in taking a look at the Wikipedia entry on temporal databases and the article A consensus glossary of temporal database concepts.
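As an illustration of the trigger-maintained 'Valid-To' approach described above, a sketch only: it assumes a nullable valid_to column has been added to usermetadata, which is not in the original schema:
-- close the previous version of a (userid, code) pair whenever a new row arrives
CREATE OR REPLACE FUNCTION close_previous_version() RETURNS trigger AS $$
BEGIN
    UPDATE usermetadata
    SET    valid_to = now()
    WHERE  userid = NEW.userid
      AND  code = NEW.code
      AND  valid_to IS NULL;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER usermetadata_close_previous
BEFORE INSERT ON usermetadata
FOR EACH ROW EXECUTE PROCEDURE close_previous_version();

-- current values are then simply:
-- SELECT * FROM usermetadata WHERE valid_to IS NULL;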
A subselect is the standard way of doing this sort of thing. You just need a Unique Constraint on UserId, Code, and Date - and then you can run the following:
SELECT *
FROM Table
JOIN (
    SELECT UserId, Code, MAX(Date) AS LastDate
    FROM Table
    GROUP BY UserId, Code
) AS Latest
    ON  Table.UserId = Latest.UserId
    AND Table.Code = Latest.Code
    AND Table.Date = Latest.LastDate
WHERE
    Table.UserId = @userId
