I am deleting login records in my database that don't have a corresponding logout record, but right now it's very slow. It does this:
First it runs a query to get the records to loop over and check for deletion.
Next it needs to find out whether the next record for that user is a login or a logout; if it's a login, I delete it.
To get the next record of that type, it runs this query of queries:
<cfquery dbtype="query" name="getnext" maxrows="1">
SELECT * FROM getlogs WHERE id > #id# AND logType = 'login'
</cfquery>
But it's slow: doing it thousands of times takes about 56 seconds.
What would be a faster way to do this? Would another cfloop inside my loop (basically a loop until I get to the row I want) be faster? Is there another way?
This sounds like something that can be done entirely in one query -- perhaps something like this:
delete from login_table t
where exists (
select id
from login_table
where id > t.id
and logtype = 'login'
)
This has nothing to do with ColdFusion per se; the same approach would apply in any environment. If this is a maintenance function that has no synchronous dependence on your application, you could even stick it into a stored procedure invoked automatically by a recurring "cleanup" task in the database itself.
Your best bet is to do it all in SQL, using cursors or a temp table. That saves the round trips between the CF server and the SQL server.
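To illustrate the "do it all in one statement" idea from the answers above, here is a minimal sketch in Python using the standard-library sqlite3 module instead of ColdFusion. The table name logs and the columns user_id and logtype are assumptions made up for the example; the rule it applies is the one described in the question: a login row is deleted when the next row for the same user is another login.

```python
import sqlite3


def delete_unmatched_logins(conn):
    """Delete logins whose next record for the same user is another login.

    Assumes a hypothetical table logs(id, user_id, logtype).
    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM logs
        WHERE logtype = 'login'
          AND (
              -- type of the next record (smallest larger id) for this user
              SELECT nxt.logtype
              FROM logs AS nxt
              WHERE nxt.user_id = logs.user_id
                AND nxt.id > logs.id
              ORDER BY nxt.id
              LIMIT 1
          ) = 'login'
        """
    )
    conn.commit()
    return cur.rowcount


def demo():
    """Build a tiny in-memory table (made-up data) and return the surviving ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs (id INTEGER PRIMARY KEY, user_id INTEGER, logtype TEXT)"
    )
    conn.executemany(
        "INSERT INTO logs (id, user_id, logtype) VALUES (?, ?, ?)",
        [
            (1, 1, "login"), (2, 1, "login"), (3, 1, "logout"),
            (4, 2, "login"), (5, 2, "logout"), (6, 2, "login"),
        ],
    )
    delete_unmatched_logins(conn)
    return [row[0] for row in conn.execute("SELECT id FROM logs ORDER BY id")]
```

In this sketch a trailing login with no later record for that user is kept, because the subquery returns no row; whether that is the desired behaviour depends on the application.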
We have created an external table and need to select all of its records, but the select runs very slowly. It doesn't complete even after 30 minutes; the table contains around 2 million records.
We also need to query this table from another DB, and that is just as slow: it doesn't return even after 30 minutes.
Select is of the form:
select col1, col2,...col3 from ext_table;
Need help in:
1. Any suggestions on reducing the time taken for execution?
Note: we need to select the entire content of the table, so a WHERE condition cannot be used.
Thanks in advance.
If you are not using the WHERE clause to push parameters to the remote database, then there is no way to optimize the performance of the query. You are returning the whole table.
My suggestion is to use SQL Data Sync to keep a local copy of the table on this SQL Database, synchronized with the remote Azure SQL Database at a set interval.
I have an application that almost continuously inserts or updates data. Since multiple requests are handled asynchronously, I wrote my queries like the one below. The example is based on SO, but that's not what I'm actually doing.
DECLARE @rows int;
INSERT INTO [user] ([username],[reputation])
SELECT [username],[reputation]
FROM (
SELECT [username]=:user,[reputation]=:rep
) A
WHERE A.[username] NOT IN (
SELECT [username]
FROM [user]
);
SET @rows = @@ROWCOUNT;
IF (@rows=0) BEGIN
UPDATE [user]
SET [reputation]=:rep, [updated]=GetDate()
WHERE [username]=:user
END;
This is passed in total to the database with PHP PDO. Because of the amount of data and other processing factors it's heavy on the (cheap) VPS it's running on. It's not really a problem if these processes run slow or get delayed, but on the other hand this data should be available via a website and then the queries on the data should be quick.
I was thinking about replicating parts of the processed data to a second server and running the website on that database. But I'm wondering how that would actually work with a query like above.
I'm guessing the UPDATE query will only be in the transaction log when @rows=0, so that won't be a problem.
But would the first part only send INSERT INTO [user] ([username],[reputation]) VALUES ('Hugo Delsing', '10k') or the entire query with the WHERE NOT IN () query?
Most of the time a user would already exist, so if it only replicates the new inserts that won't be a problem. But if it ran the entire query each time, the benefits would be small.
Obviously I could wrap the first part up in another check if exists(select 1 from [user] where [username] = :user) to make sure it only runs when there is no user, but I'm wondering if that is necessary.
Bonus question, but feel free to ignore because it's a bit broad: Would replicating be the way to go or does MS SQL offer other/better solutions for something like this?
I need to create a "ghost" table in SQL Server, which doesn't actually exist but is a result set of a SQL Query. Pseudo code is below:
SELECT genTbl_col1, genTbl_col2
FROM genTbl;
However, "genTbl" is actually:
SELECT table1.col AS genTbl_col1,
table2.col AS genTbl_col2
FROM table1 INNER JOIN table2 ON (...)
In other words, I need that every time a query is run on the server trying to select from "genTbl", it simply creates a result set from the query and treats it like a real table.
The situation is that I have software that runs queries on a database. I need to modify its behavior, but I cannot change the software itself, so I need to trick it into thinking it can actually query "genTbl", when that table doesn't exist and is simply a query over other tables.
To clarify, the query would have to be a sort of procedure, available by default in the database (i.e. every time there is a query for "genTbl").
Use #TMP
SELECT genTbl_col1, genTbl_col2
INTO #TMP FROM genTbl;
It exists only in the current session. You can also use ##TMP to make it visible to all sessions.
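Since the question asks for something that is rebuilt from the underlying tables every time "genTbl" is queried, a view may also be worth considering. Below is a minimal sketch in Python with the standard-library sqlite3 module, not SQL Server. The table and column names come from the question; the sample rows and the join condition on a hypothetical id column are assumptions, because the question leaves the ON clause elided.

```python
import sqlite3


def build_demo_db():
    """Create table1, table2 and a view named genTbl over their join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, col TEXT);
        CREATE TABLE table2 (id INTEGER PRIMARY KEY, col TEXT);
        INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
        INSERT INTO table2 VALUES (1, 'x'), (2, 'y');

        -- The join condition on id is a placeholder assumption.
        CREATE VIEW genTbl AS
            SELECT table1.col AS genTbl_col1,
                   table2.col AS genTbl_col2
            FROM table1
            INNER JOIN table2 ON table1.id = table2.id;
        """
    )
    return conn


def read_gentbl(conn):
    """Query the view exactly as if it were a table."""
    return conn.execute(
        "SELECT genTbl_col1, genTbl_col2 FROM genTbl ORDER BY genTbl_col1"
    ).fetchall()
```

The client only ever issues a plain SELECT against genTbl; rows added to table1 and table2 later show up in the next read without any refresh step.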
I use the query below to analyze index usage in SQL Server.
SELECT *
FROM sys.dm_db_index_usage_stats A
WHERE A.database_id = DB_ID()
How can I reset all the data in this system table?
What do you mean by reset? Do you want to reset the index usage statistics in the table?
Taken from Here
Usage statistics: These are found in sys.dm_db_index_usage_stats.
Index usage statistics keep track of things like seeks and scans from
SELECT queries. They are not persisted and get reset on restart of sql
server. These statistics also get reset if the underlying index is
rebuilt "ALTER INDEX ... REBUILD", but not with "ALTER INDEX ...
REORG"
As said, you can't reset it manually. Take a look at this post, which says the same:
http://social.msdn.microsoft.com/Forums/sqlserver/en-US/08eb7b79-64a3-4475-bfc3-69715aec8381/resetting-dmdbindexusagestats-without-restarting-or-detaching-a-database
As mentioned, you cannot truly reset it without restarting SQL Server.
BUT
Why do you want to reset it? Probably because you have made changes to your indexes and simply want to see how the usage has changed, am I right?
In this case you can hardcode the existing values into your query and subtract them to get new stats from this point on.
By "hardcoding" I mean joining with a VALUES pseudo-table, something like this
--your SELECT goes here
--your FROM goes here
--add this JOIN
JOIN ( VALUES('IX_index1', 2412727),
('IX_index2', 1630517),
('IX_index3', 514129)) o(name, seeks) ON o.name=indexes.name
-- rest of your query
Now you can add this to your SELECT to get the difference:
SELECT dm_db_index_usage_stats.user_seeks - o.seeks AS newseeks
So in a nutshell:
SELECT the existing usage stats from dm_db_index_usage_stats
do some copy-pasting magic to get the existing stats and hardcode into your query
see the changes
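The subtraction step in the recipe above can be sketched outside SQL as well. This is a minimal Python illustration, assuming the current counters have already been read from dm_db_index_usage_stats into a dictionary keyed by index name. The baseline numbers are the ones hardcoded in the VALUES list above; the function name and any "current" values passed to it are made up for the example.

```python
# Baseline seeks copied from the VALUES list in the answer above.
BASELINE = {
    "IX_index1": 2412727,
    "IX_index2": 1630517,
    "IX_index3": 514129,
}


def seeks_since_baseline(current, baseline=BASELINE):
    """Return seeks accumulated per index since the baseline snapshot.

    Indexes missing from the baseline (for example, created afterwards)
    are treated as starting from zero.
    """
    return {
        name: seeks - baseline.get(name, 0)
        for name, seeks in current.items()
    }
```

Note that after a restart the live counters start again from zero, so a baseline taken before the restart would produce negative differences and should be discarded.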
June 29, 2010 - I had an uncommitted action from a previous delete statement. I committed the action and I got another error about conflicting primary IDs. I can fix that. So the moral of the story: commit your actions.
Original Question -
I'm trying to run this query:
with spd_data as (
select *
from openquery(IRPROD,'select * from budget_user.spd_data where fiscal_year = 2010')
)
insert into [IRPROD]..[BUDGET_USER].[SPD_DATA_BUD]
(REC_ID, FISCAL_YEAR, ENTITY_CODE, DIVISION_CODE, DEPTID, POSITION_NBR, EMPLID,
spd_data.NAME, JOB_CODE, PAY_GROUP_CODE, FUND_CODE, FUND_SOURCE, CLASS_CODE,
PROGRAM_CODE, FUNCTION_CODE, PROJECT_ID, ACCOUNT_CODE, SPD_ENC_AMT, SPD_EXP_AMT,
SPD_FB_ENC_AMT, SPD_FB_EXP_AMT, SPD_TUIT_ENC_AMT, SPD_TUIT_EXP_AMT,
spd_data.RUNDATE, HOME_DEPTID, BUD_ORIG_AMT, BUD_APPR_AMT)
SELECT REC_ID, FISCAL_YEAR, ENTITY_CODE, DIVISION_CODE, DEPTID, POSITION_NBR, EMPLID,
spd_data.NAME, JOB_CODE, PAY_GROUP_CODE, FUND_CODE, FUND_SOURCE, CLASS_CODE,
PROGRAM_CODE, FUNCTION_CODE, PROJECT_ID, ACCOUNT_CODE, SPD_ENC_AMT, SPD_EXP_AMT,
SPD_FB_ENC_AMT, SPD_FB_EXP_AMT, SPD_TUIT_ENC_AMT, SPD_TUIT_EXP_AMT,
spd_data.RUNDATE, HOME_DEPTID, lngOrig_amt, lngAppr_amt
from spd_data
left join Budgets.dbo.tblAllPosDep on project_id = projid
and job_code = jcc and position_nbr = psno
and emplid = empid
where OrgProjTest = 'EQUAL';
Basically I'm selecting a table from IRPROD (an Oracle DB), joining it with a local table, and inserting the results back into IRPROD.
The problem I'm having is that while the query runs, it never stops. I've let it run for an hour and it keeps going until I cancel it. I can see on a bandwidth monitor on the SQL Server data going in and out. Also, if I just run the select part of the query it returns the results in 4 seconds.
Any ideas why it's not finishing? I've got other queries set up in a similar manner that do not have any problems (granted, those insert from local tables and not from a remote table).
You didn't include any volume metrics, but I would recommend using a temporary table to gather the results.
Then you should try to insert the first couple of rows. If this succeeds you'll have a strong indicator that everything is fine.
Try to break down each insert task by project_id or emplid to avoid large transaction logs.
You should also think about crafting a bulk batch process.
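The advice above (stage the results first, then insert in smaller pieces split by a key such as project_id) can be sketched as follows. This is a minimal Python example using the standard-library sqlite3 module and two hypothetical local tables, staging and target, with made-up columns; it does not reproduce the linked-server setup from the question.

```python
import sqlite3


def copy_in_batches(conn):
    """Copy staging rows into target one project_id at a time.

    Each batch is committed separately so no single transaction holds
    every row. Rows with a NULL project_id are not matched by '= ?' and
    would need separate handling. Returns the number of batches run.
    """
    keys = [
        row[0]
        for row in conn.execute(
            "SELECT DISTINCT project_id FROM staging "
            "WHERE project_id IS NOT NULL ORDER BY project_id"
        )
    ]
    for key in keys:
        conn.execute(
            "INSERT INTO target (project_id, emplid, amount) "
            "SELECT project_id, emplid, amount FROM staging WHERE project_id = ?",
            (key,),
        )
        conn.commit()  # one transaction per project_id
    return len(keys)


def build_demo_db():
    """Create the hypothetical staging and target tables with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (project_id TEXT, emplid TEXT, amount REAL)")
    conn.execute("CREATE TABLE target (project_id TEXT, emplid TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO staging VALUES (?, ?, ?)",
        [("P1", "E1", 10.0), ("P1", "E2", 20.0), ("P2", "E3", 5.0), ("P3", "E4", 7.5)],
    )
    conn.commit()
    return conn
```

Running the first batch on its own is also a cheap way to check that the insert works at all before committing to the full load.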
If you run just the select without the insert, how many records are returned? Does the data look right or are there multiple records due to the join?
Are there triggers on the table you are inserting into? If you are returning many records and the table has triggers that are designed to run row-by-row, this could be slowing things down. You are also sending to another server, so the network pipeline could be what is slowing you down. Maybe it would be better to send the budget data to the Oracle server and do the insert from there rather than from the SQL Server.