Limit size and/or frequency of user queries in SQL Server

Is it possible to put a cap on the "size" and frequency of user queries in SQL Server (or perhaps another engine)? For example:
Let's say there are a few tables with millions of rows. Maybe there's a handful of admins and analysts working on the tables, and they'd know their way around enough to not run any unnecessary heavy queries that may run for several minutes/hours.
However, sales, marketing, or admin staff who are less familiar with SQL are more likely to run a heavy query (e.g. one with loads of joins), whether accidentally or just for the fun of it. Multiply this by dozens of them, and the server can be hammered severely.
Ideally, I'd want restrictions like the following:
If the engine anticipates that more than a million rows will be scanned, cancel the query (and tell the user why it was cancelled).
Limit queries being run by a single user to 20 queries within a 10-minute window.
User/role-level "caps"

You have pretty radical requirements for code execution. There is nothing out of the box that will work for you. However, you can implement a few things yourself to get close to what you are after:
Do not give users direct access to tables; create stored procedures and grant access only to those procs.
Inside the procs you can limit the maximum number of rows a user can get back by adding a TOP clause.
Create an audit table, and inside each proc add a row to it every time a user calls the proc. The very first step in the proc can check how many rows the audit table already holds for that proc and caller (i.e. how many times the user has already executed it) and raise an error if the user has exceeded the limit. You get the idea.
I would suggest not limiting the cost of queries; that would come back to haunt you for many reasons. Instead, write the queries/procs yourself, or have someone you trust write efficient code.
Something like this:
CREATE PROC dbo.usp_Test
AS
BEGIN
    SET NOCOUNT ON;

    -- Local variables use @ (a # prefix denotes a temp table, not a variable)
    DECLARE @UserCalls INT;

    -- How many times has the caller run this proc in the last minute?
    SELECT @UserCalls = COUNT(*)
    FROM dbo.AuditTable
    WHERE UserName = SUSER_SNAME()
      AND ProcName = 'usp_Test'
      AND Logged >= DATEADD(MINUTE, -1, GETDATE());

    IF (@UserCalls >= 10)
    BEGIN
        RAISERROR ('Come back in 1 minute; you have exceeded the limit of 10 executions per minute', 16, 1);
        RETURN;
    END
    ELSE
    BEGIN
        -- Log this call in the same table that the check above reads from
        INSERT INTO dbo.AuditTable (ProcName, UserName, Logged)
        VALUES ('usp_Test', SUSER_SNAME(), GETDATE());
    END

    /* Rest of the code */
    SELECT TOP (1000) *
    FROM ...........;
END
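The audit-table check in the procedure above is a sliding-window rate limit. The same logic, modelled in memory in Python, may make it easier to reason about. This is a minimal sketch, not a replacement for the table (the table is what lets the check work across sessions); `CallLimiter` and `try_call` are hypothetical names, and the usage below applies the question's limit of 20 calls per 10 minutes.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class CallLimiter:
    """In-memory model of the audit-table check (hypothetical helper):
    at most `max_calls` accepted calls per (user, proc) within `window`."""

    def __init__(self, max_calls: int, window: timedelta):
        self.max_calls = max_calls
        self.window = window
        # (user, proc) -> timestamps of accepted calls, oldest first
        self._log = defaultdict(deque)

    def try_call(self, user: str, proc: str, now: datetime) -> bool:
        log = self._log[(user, proc)]
        # Drop calls older than the window; the proc's
        # "Logged >= DATEADD(...)" filter excludes the same rows.
        while log and log[0] < now - self.window:
            log.popleft()
        if len(log) >= self.max_calls:
            return False  # the proc would RAISERROR and RETURN here
        log.append(now)  # the proc would INSERT an audit row here
        return True


limiter = CallLimiter(max_calls=20, window=timedelta(minutes=10))
```

As in the proc, a rejected call is not recorded, so retrying does not extend the lockout. Note that in the proc the COUNT and the INSERT are separate statements, so two concurrent calls can both pass the check; treat the limit as approximate unless those steps are serialized.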

The feature you're looking for is called Resource Governor.
You can classify incoming connections and assign them to a Workload Group, which specifies per-request limits:
CREATE WORKLOAD GROUP group_name
[ WITH
( [ IMPORTANCE = { LOW | MEDIUM | HIGH } ]
[ [ , ] REQUEST_MAX_MEMORY_GRANT_PERCENT = value ]
[ [ , ] REQUEST_MAX_CPU_TIME_SEC = value ]
[ [ , ] REQUEST_MEMORY_GRANT_TIMEOUT_SEC = value ]
[ [ , ] MAX_DOP = value ]
[ [ , ] GROUP_MAX_REQUESTS = value ] )
]
[ USING {
[ pool_name | "default" ]
[ [ , ] EXTERNAL external_pool_name | "default" ] ]
} ]
[ ; ]
The workload group in turn maps to a Resource Pool, which has limited access to server resources:
In the SQL Server Resource Governor, a resource pool represents a
subset of the physical resources of an instance of the Database
Engine. Resource Governor enables you to specify limits on the amount
of CPU, physical IO, and memory that incoming application requests can
use within the resource pool. Each resource pool can contain one or
more workload groups.
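The routing described above (connection, then classifier, then workload group, then resource pool) can be modelled in a few lines of Python. In SQL Server the classification step is a user-defined scalar function in master, registered with ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = ...); if it returns NULL or the name of a group that does not exist, the session lands in the default group. The group names, pool names, and login rules below are hypothetical; this is a sketch of the routing, not of Resource Governor itself.

```python
from typing import Optional

# Hypothetical setup: each workload group maps to exactly one resource pool.
GROUP_TO_POOL = {
    "default": "default",
    "analysts": "default",
    "reporting": "reporting_pool",  # pool with capped CPU/memory/IO
}

# Hypothetical login rules; a real classifier inspects the session instead.
ANALYST_LOGINS = {"alice", "bob"}


def classifier(login: str) -> Optional[str]:
    """Plays the role of the classifier function: return a group name or None."""
    if login in ANALYST_LOGINS:
        return "analysts"
    if login.startswith(("sales_", "mkt_")):
        return "reporting"
    return None


def route_session(login: str) -> tuple[str, str]:
    """Return (workload group, resource pool) for a new session."""
    group = classifier(login)
    # Fall back to the default group for None or an unknown group name.
    if group is None or group not in GROUP_TO_POOL:
        group = "default"
    return group, GROUP_TO_POOL[group]
```

Keeping the classifier small matters in the real feature too, since it runs for every new connection.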
It's important to combine Resource Governor with snapshot-based reads for the reporting users, either using SNAPSHOT isolation, or by setting the database to READ_COMMITTED_SNAPSHOT. Otherwise a reporting user with limited access to resources can acquire locks that interfere with other workloads.

Related

Making a postgres query less expensive for the DB

In a SQL query I have to join many tables, and it's very expensive for the DB.
In the DB a hostgroup has many hosts; there are about 20 hostgroups, and there are 4 hostgroups that I don't use.
I was wondering: if I add a NOT IN condition to my query, excluding those 4 hostgroups, will the query be less expensive, or will it just make things worse by using more resources on the DB?
This is my query, just in case:
select history.clock, hstgrp.name as hostgroup, hstgrp.groupid as hgid , hosts.name as hostname ,
items.name as item, hosts.hostid, history.value as porcentaje, items.key_ as key ,items.itemid,
applications.name as appname, applications.applicationid as appid
FROM history
join items_applications on history.itemid = items_applications.itemid
join applications on items_applications.applicationid = applications.applicationid
join items on items.itemid = history.itemid
join hosts on items.hostid = hosts.hostid
join hosts_groups on hosts.hostid = hosts_groups.hostid
join hstgrp on hosts_groups.groupid = hstgrp.groupid
where lower(items.name) SIMILAR TO lower('Used disk space%|Used disk space on%')
and hstgrp.name not in ('Discovered', 'Discover VMs');  -- <== the added filter
The additional filter certainly cannot hurt, but unless it is very selective, it will probably not reduce the execution time significantly.
I am reduced to guessing, since you didn't add EXPLAIN (ANALYZE, BUFFERS) output to the question, but I'd assume that the query returns a lot of rows and is bound to be slow.
You could change the SIMILAR TO condition to
WHERE lower(items.name) LIKE lower('Used disk space%')
and support it with an index:
CREATE INDEX ON items (lower(name) text_pattern_ops);
Perhaps that will speed up the execution somewhat.
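Both steps of that suggestion can be sketched in Python. First, the rewrite is safe: SIMILAR TO must match the whole string, and every name matching 'Used disk space on%' also matches 'Used disk space%', so the alternation reduces to a single prefix test. Second, a B-tree over lower(name) with text_pattern_ops can answer a left-anchored LIKE as a range scan; binary search over a sorted list is used below as an analogy for that, not as PostgreSQL's implementation. Python's `str.lower()` and code-point ordering stand in for PostgreSQL's lower() and the character-by-character comparison of text_pattern_ops (an assumption).

```python
import bisect
import re


def similar_to_matches(name: str) -> bool:
    """Original filter: lower(name) SIMILAR TO lower('Used disk space%|Used disk space on%').
    SIMILAR TO is anchored at both ends and '%' means any run of characters."""
    pattern = r"(?:used disk space.*|used disk space on.*)"
    return re.fullmatch(pattern, name.lower(), flags=re.DOTALL) is not None


def like_prefix_matches(name: str, prefix: str = "used disk space") -> bool:
    """Suggested filter: lower(name) LIKE 'used disk space%'."""
    return name.lower().startswith(prefix)


def prefix_range(sorted_keys: list[str], prefix: str) -> list[str]:
    """All keys starting with `prefix`, found with two binary searches:
    the prefix range is [prefix, upper) where upper bumps the last character."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, upper)
    return sorted_keys[lo:hi]
```

The range scan touches only the matching keys, whereas SIMILAR TO (translated to a regular expression) generally cannot use such an index and has to test every row.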

MarkLogic - How to know size of database, size of Index, Total indexs

We are using MarkLogic 9.0.8.2
We have set up a MarkLogic cluster and ingested around 18M XML documents; a few indexes have been created, such as fields, path range indexes, and so on.
Now we are setting up another environment with the same configuration, indexes, and number of records, but I can't understand why the total size on the database status page differs from the previous environment.
So I started comparing the database status pages of both clusters, where I can see the size per forest/replica forest and so on.
In this case, I would like to know the size of each of:
Database
Index
I would also like to know the total number of indexes in a given database (instead of expanding each one through the Admin interface).
An option within the Admin interface or through XQuery will do.
MarkLogic does not break down the index sizes separately from the Database size. One reason for this is because the data is stored together with the Universal Index.
You could approximate the size of the other indexes by creating them one at a time and checking the size before and after the reindexer runs and the deleted fragments are merged out. We usually don't find a lot of benefit in trying to determine the exact index sizes, since the benefits they provide typically outweigh the cost of storage.
It's hard to say exactly why there is a size discrepancy. One common cause is the number of deleted fragments in each database. Deleted fragments are pieces of data that have been marked for deletion (usually due to an update, delete, or other change). They continue to consume database space until they are merged out. Merging happens automatically by default, or it can be started manually at the forest or database level.
The database size, and configured indexes can be determined through the Admin UI, Query Console (QConsole) or via the MarkLogic REST Management API (RMA) endpoints. QConsole supports a number of languages, but server side Javascript and XQuery are the most common. RMA can return results in XML or JSON.
Database Size:
REST: http://[host-name]:8002/manage/v2/databases/[database-name]?view=status
QConsole: Sum the disk size elements for the stands from xdmp.forestStatus(javascript) or xdmp:forest-status(XQuery) for all the forests in the database.
Configured Indexes:
REST: http://[host-name]:8002/manage/v2/databases/[database-name]?view=package
QConsole: Use admin.getConfiguration (JavaScript) or admin:get-configuration (XQuery) in conjunction with the corresponding admin.databaseGet[index type] or admin:database-get-[index type] functions.
xquery version "1.0-ml";
declare namespace forest = "http://marklogic.com/xdmp/status/forest";

(: Size of each database = sum of the disk sizes of all stands in all of its forests :)
for $db-id in xdmp:databases()
let $db-name := xdmp:database-name($db-id)
let $db-size :=
  fn:sum(
    for $f-id in xdmp:database-forests($db-id)
    let $f-status := xdmp:forest-status($f-id)
    let $f-size :=
      fn:sum(
        for $stand in $f-status/forest:stands/forest:stand
        let $stand-size := $stand/forest:disk-size/fn:data(.)
        return $stand-size
      )
    return $f-size
  )
order by $db-size descending
return $db-name || " = " || $db-size
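The XQuery above totals each database as the sum of the disk sizes of all stands in all of its forests, then orders the databases largest first. A minimal Python model of that aggregation, assuming the forest-status data has already been fetched into plain objects; the `Forest` class and its fields are hypothetical, not a MarkLogic API, and no units are implied.

```python
from dataclasses import dataclass


@dataclass
class Forest:
    name: str
    stand_disk_sizes: list[int]  # one entry per stand, as reported by forest status


def database_sizes(databases: dict[str, list[Forest]]) -> list[str]:
    """Sum stand sizes per forest, forests per database, largest database first."""
    totals = {
        db: sum(sum(f.stand_disk_sizes) for f in forests)
        for db, forests in databases.items()
    }
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{db} = {size}" for db, size in ordered]
```

Because deleted fragments still live in the stands until a merge, two databases with the same documents can report different totals here, which is consistent with the explanation above.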

Delphi - multiple users (sessions) after login (FireDAC)

I am working on a Windows desktop application in Delphi using the FireDAC driver and an MSSQL database. Currently, I am having trouble understanding how multiple sessions (users) should work. Right now, I have three test users, and when I log in with any of them, every session has the same data and functionality. I don't want that. I want each user (each session) to have different data and functionality.
Note that this is different from distributed systems, where tasks are distributed by hosts in a network. I am not interested in distributed system. I have a desktop application.
Could someone explain how to achieve this (different users (sessions) = different data and functionalities)?
You've indicated in a comment that what you want is for a number of users to be able to see different data rows in the same table or tables.
That's actually quite straightforward: you just need to define, for each user (or user type), the criteria that determine which data rows they are supposed to be able to see, then write a WHERE clause that selects only those rows. It's generally a bad idea to hard-code users' identities in a database, along with what data they are permitted to see and what operations they are permitted to carry out on the data.
It's hard to give a concrete example without getting into details of what you are wanting to do, but the following simple example might help.
Suppose you have a table of Customers, and one user is supposed to deal with the USA, the second user deals with France, and the third with the rest of the world.
In your app, you could have an enumerated type to represent this:
type
TRegion = (rtUSA, rtFR, rtRoW); // RoW = Rest of the World
Then you could write a function to generate the Where clause of a SQL Select statement like this:
function GetRegionWhereClause(const ARegion : TRegion) : string;
begin
  Result := ' Where ';
  case ARegion of
    rtUSA : Result := Result + ' Customer.Country = ''USA''';
    rtFR  : Result := Result + ' Customer.Country = ''FR''';
    rtRoW : Result := Result + ' Customer.Country not in (''USA'', ''FR'')';
  end; { case }
end;
You could then call GetRegionWhereClause when you generate the SQL to open the Customers table.
Similarly, define for each user type what operations they are permitted to carry out on the data (Update, Insert, Delete). Implementing that, though, is more a question of selectively enabling and disabling the GUI functionality in your app for the tasks in question.
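The same per-region filtering can be sketched outside Delphi. The Python version below uses sqlite3 with `?` placeholders, so the country codes are passed as query parameters rather than spliced into the SQL string as quoted literals; that is a swapped-in technique, not what the Delphi code does. The `Customer` table with `Name` and `Country` columns follows the example, and `region_filter` and `customers_for` are hypothetical names.

```python
import sqlite3
from enum import Enum


class Region(Enum):
    USA = "USA"
    FR = "FR"
    ROW = "RoW"  # rest of the world


def region_filter(region: Region) -> tuple[str, tuple]:
    """WHERE clause plus its parameters for the given region."""
    if region is Region.USA:
        return "WHERE Country = ?", ("USA",)
    if region is Region.FR:
        return "WHERE Country = ?", ("FR",)
    return "WHERE Country NOT IN (?, ?)", ("USA", "FR")


def customers_for(conn: sqlite3.Connection, region: Region) -> list[str]:
    """Names of the customers visible to a user assigned to `region`."""
    where, params = region_filter(region)
    rows = conn.execute(f"SELECT Name FROM Customer {where} ORDER BY Name", params)
    return [name for (name,) in rows]
```

Each user record would then store a Region, and the app builds its queries from that value after login.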
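Deciding which GUI actions to enable can be driven by a small lookup from user type to permitted operations rather than by checks scattered through the forms. A minimal Python sketch; the user types and the `PERMISSIONS` table are hypothetical, and in practice the table would be loaded from configuration rather than written in code.

```python
from enum import Enum, auto


class Op(Enum):
    INSERT = auto()
    UPDATE = auto()
    DELETE = auto()


# Hypothetical user types and the operations each may perform on the data.
PERMISSIONS = {
    "viewer": frozenset(),
    "clerk": frozenset({Op.INSERT, Op.UPDATE}),
    "manager": frozenset({Op.INSERT, Op.UPDATE, Op.DELETE}),
}


def enabled_actions(user_type: str) -> dict[Op, bool]:
    """Which actions (e.g. toolbar buttons) to enable; unknown types get none."""
    allowed = PERMISSIONS.get(user_type, frozenset())
    return {op: op in allowed for op in Op}
```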

Continuous queries in Influxdb ignoring where clause?

I'm having a bit of trouble with continuous queries in InfluxDB 0.8.8.
I'm trying to create a continuous query, but it seems that the WHERE clauses are ignored. I'm aware of the restrictions mentioned here: http://influxdb.com/docs/v0.8/api/continuous_queries.html but I don't think they apply in this case.
One row in the time series would contain data like this:
{"hex":"06a0b6", "squawk":"3421", "flight":"QTR028 ", "lat":99.867630, "lon":66.447365, "validposition":1, "altitude":39000, "vert_rate":-64,"track":125, "validtrack":1,"speed":482, "messages":201, "seen":219}
The query I'm running and works is the following:
select * from flight_series where time > now() - 30m and flight !~ /^$/ and validtrack = 1 and validposition = 1;
Through it, I'm trying to take the last 30 minutes from the current time, check that the flight field is not empty, and check that the track and position are valid.
The query returns successfully, but when I add the "into filtered_log" part, the WHERE clause is ignored.
How can I create a continuous query which takes the above-mentioned conditions into consideration? At least, how could I extract with one continuous query only the rows which have the valid track/heading set to 1 and the flight is not whitespace/empty string? The time constraint I could eliminate from the query and translate into shard retention/duration.
Also, could I specify in the continuous query that the data should be saved into a time series located in another database (which has a more relaxed retention/duration policy)?
Thank you!
Later edit:
I've managed to do something closer to my need by using the following cq:
"select time, sequence_number, altitude, vert_rate, messages, squawk, lon, lat, speed, hex, seen from current_flights where ((flight !~ /^$/) AND (validtrack = 1)) AND (validposition = 1) into flight.[flight]"
This creates a series for each 'flight' even for those which have a whitespace in the 'flight' field -- for which a flight. series is built.
How could I specify the retention/duration policies for the series generated by the cq above? Can I do something like:
"spaces": [
{
"name": "flight",
"retentionPolicy": "1h",
"shardDuration": "30m",
"regex": "/.*/",
"replicationFactor": 1,
"split": 1
},
...
which would give me a retention of 1h and shard duration of 30m?
I'm a bit confused about where those series are stored: in which shard space?
Thanks!
P.S.: My final goal would be the following:
Have a 'window' of 15-30min max with all the flights around, process some data from them and after that period is over discard the data but in the same time move/copy it to another long-term db/series which can be used for historical purposes.
You cannot put time restrictions into the WHERE clause of a continuous query. The server will generate the time restrictions as needed when the CQ runs and must ignore all others. I suspect if you leave out the time restriction the rest of the WHERE clause will be fine.
I don't believe CQs in 0.8 require an aggregation in the SELECT, but you do need a GROUP BY clause to tell the CQ how often to run. I'm not sure what you would GROUP BY; perhaps the flight?
You can specify a different retention policy when writing to the new series but not a new database. In 0.8 the retention policy for a series is determined by regex matching on the series name. As long as you select a series name correctly it will go into your desired retention policy.
EDIT: updates for new questions
How could I specify the retention/duration policies for the series generated by the cq above?
In 0.8.x, the shard space to which a series belongs controls the retention policy. The regex on the shard space determines which series belong to that shard space. The shard space regexes are evaluated newest to oldest, meaning the first created shard space will be the last regex evaluated. Unfortunately, I don't know whether it is possible to create new shard spaces once the database exists. See this discussion on the mailing list for more: https://groups.google.com/d/msgid/influxdb/ce3fc641-fbf2-4b39-9ce7-77e65c67ea24%40googlegroups.com
Can I do something like:
"spaces": [
{
"name": "flight",
"retentionPolicy": "1h",
"shardDuration": "30m",
"regex": "/.*/",
"replicationFactor": 1,
"split": 1
}, ... which would give me a retention of 1h and shard duration of 30m?
That shard space would have a shard duration of 30 minutes and retain data for 1 hour, meaning any series would only exist in three shards: the current hot shard, the current cold shard, and the shard waiting for deletion.
The regex is /.*/, meaning it would match any series, not just the 'flight.' series. Perhaps /flight\..*/ is a better regex if you only want the series generated by the CQ in that shard space.
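The matching rule described in this answer (newest shard space first, first matching regex wins) can be modelled in a few lines of Python to see why a catch-all /.*/ on a newly created space captures everything. This is a sketch of the behaviour as described, not InfluxDB code; the shard-space names are hypothetical, and unanchored matching (a match anywhere in the series name counts) is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class ShardSpace:
    name: str
    regex: str  # InfluxDB-style "/pattern/", slashes included


def shard_space_for(series: str, spaces_in_creation_order: list[ShardSpace]) -> str:
    """Pick the shard space for a series: newest space first, first match wins."""
    for space in reversed(spaces_in_creation_order):
        pattern = space.regex[1:-1]  # drop the surrounding slashes
        # Assumption: unanchored matching, as with re.search.
        if re.search(pattern, series):
            return space.name
    raise LookupError(f"no shard space matches {series!r}")


spaces = [
    ShardSpace("default", r"/.*/"),
    ShardSpace("flight", r"/flight\..*/"),  # created later, so checked first
]
```

With /flight\..*/ only the CQ's `flight.<name>` series land in the short-retention space, while `current_flights` falls through to the older catch-all space.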

Analysis Services stored procedure performance

I'm writing a stored procedure in .NET to do some complex calculations that can't be written easily in pure MDX. The first problem I'm having is how to retrieve a set of data in a tabular form to pass to my calculation.
My code so far is written below. I would have thought that after we retrieve our value at position ***1, we would have all the data in memory to interact with. However, it seems that at position ***2, a Query Subcube is issued to the storage engine for each and every day in our range. This is devastating to performance.
Is there something I'm doing wrong? Is there another method I can call to evaluate the set all at once?
// First get the date range that we'd like to calculate over.
// (These values are constant here for example only)
DateTime date = new DateTime(2012, 4, 1);
int dateFrom = KeyFromDate(date.AddDays(-360));
int dateTo = KeyFromDate(date);
string dateRange = string.Format(
"[Date].[Date].&[{0}]:[Date].[Date].&[{1}]",
dateFrom,
dateTo
);
Expression expression = new Expression(dateRange + "*[Measures].[My Measure]");
MDXValue value = expression.CalculateMdxObject(null); // ***1
foreach (var tuple in value.ToSet().Tuples)
{
MDXValue tupleValue = MDXValue.FromTuple(tuple).ToInt32(); // ***2
}
Run SQL Profiler, connect to Analysis Services, and on the "Events Selection" tab check "Show all events" and select "get data from aggregations", "get data from cache", "query subcube" and "query subcube verbose".
First read this document, http://www.microsoft.com/en-us/download/details.aspx?id=17303 (see page 18), to understand how "query subcube verbose" works.
Then, in Visual Studio (where you're debugging your procedure), step over line ***1 in debug mode and see in SQL Profiler what is queried in the verbose events: which measure group and which attributes.
Then step over ***2 and again check in SQL Profiler what is queried in the verbose events.
I believe the set of attributes is different. It may be that at ***1 an aggregation is used, while at ***2, where "value" is present in the tuple, there is no aggregation for that set of attributes, so instead of a single "read from aggregations" it performs "read from measure group cache" several times.
I can't be more precise because I don't have your cube. Try to find this out using the "query subcube verbose" events, and try using BIDS Helper to create the necessary aggregations manually (with the specific set of attributes); it may help.
