Creating an unlimited SQL Server trace - sql-server

In my app I am creating a real-time trace (not sure how yet, but I am!). With the sp_trace_create procedure in SQL Server, I know that @maxfilesize defaults to 5, but in my app the trace is going to run until the user wants to stop it... any ideas how this can be done?
Because I don't want to have to save the files... I'm not sure how the rollover works?
Right now I'm putting it on a timer loop that queries the database with all the specified events on it, with a maximum file size of 1 (usually doesn't take more than about 2 seconds), merges the result with the old lot of data in my DataGridView and deletes the original file. This goes round until the user tells it to stop, which stops the timer from querying the database. Not a solid method, but I guess it's a start! All I need now is to find out the data types of the columns, because when I'm setting my values in the filters they need to go in as the matching data type for the column... anyone have any clue where I can get a list of the data types? MSDN has the list of columns but no types...

To start a trace with file rollover, instead of stopping at a maximum size, start the trace like so:
exec @rc = sp_trace_create @TraceID output, 2, N'InsertFileNameHere', @maxfilesize, NULL
where @maxfilesize defines the size a file reaches before a new rollover file is created.
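For context, a slightly fuller sketch of the same call (the file path, event and column numbers below are just illustrative choices, not from the original question): option 2 is TRACE_FILE_ROLLOVER, event 12 is SQL:BatchCompleted, column 1 is TextData, and status 1 starts the trace.
declare @rc int, @TraceID int, @maxfilesize bigint = 50, @on bit = 1
-- create the trace with rollover enabled (option 2); the .trc extension is appended automatically
exec @rc = sp_trace_create @TraceID output, 2, N'C:\Traces\MyTrace', @maxfilesize, NULL
-- capture the statement text of completed batches (event 12 = SQL:BatchCompleted, column 1 = TextData)
exec sp_trace_setevent @TraceID, 12, 1, @on
-- status 1 = start the trace
exec sp_trace_setstatus @TraceID, 1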
WARNING: be very careful about performing unlimited tracing. If you fill up a production disc, it's your head not mine!
You can stop a running trace like so:
EXEC sp_trace_setstatus @ID, 0
EXEC sp_trace_setstatus @ID, 2
where @ID is the ID of the trace you want to stop; status 0 stops the trace and status 2 closes it and deletes its definition from the server.
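If you don't have the ID handy, the sys.traces catalog view lists every trace on the instance (a quick sketch; the default trace is usually id 1):
select id, path, max_size, is_rollover, status
from sys.traces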
See this post.

According to the documentation, what you want to do is not possible:
[ @maxfilesize = ] max_file_size
Specifies the maximum size in megabytes (MB) a trace file can grow. max_file_size is bigint, with a default value of 5.
If this parameter is specified without the TRACE_FILE_ROLLOVER option, the trace stops recording to the file when the disk space used exceeds the amount specified by max_file_size.
I don't see why you can't just cycle through the files and load them into a table or your app. Shouldn't be that hard.
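For the loading part, a minimal sketch (the file path and target table are placeholders): fn_trace_gettable pointed at the first file, with the default number-of-files argument, reads that file plus all of its rollover files in sequence, so you can pull everything into one table and then delete the files.
select *
into dbo.TraceStaging   -- hypothetical work table
from fn_trace_gettable(N'C:\Traces\MyTrace.trc', default)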

Related

Python & SQL - Python won't actually commit data to SQL DB

This is a 2-part question, but I'll get to my first conundrum.
I have some code I'm just trying to test where I want Python to run a stored procedure with the required variables, and it runs and even shows the new row. However, if I go into the DB and run a SELECT statement to show me the data, it's not there. It's as if the DB blocked it or something strange.
This code WORKS if I run a SELECT statement instead of this stored procedure. The stored procedure works, and even when I run this code, Python will run the stored procedure and spit back the new row (complete with the new row number that's auto-generated due to the IDENTITY I have on the column).
import pyodbc
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=REPORT\INSTANCE;'
                      'UID=UserID;'
                      'PWD=Password;'
                      'Database=ReportDB;'
                      'Trusted_Connection=no;')
cursor = conn.cursor()
cursor.execute('EXEC [Schema].[Comments] \'2019-04-18 00:00:00.000\',\'Team\',\'1900-01-01 13:12:16.000\',\'testing this string\',\'username\'')
for row in cursor:
    print(row)
Kind of lost here. Python will run the stored procedure (which I have set to run a SELECT at the end, to prove the data was committed) and Python shows that, but I don't see it in the actual DB. That row is gone.
(Screenshots omitted.) In the Python output you can see the 470 in the ID column (identity, not null), which is what it should be. But running a SELECT directly against the same instance and the same DB, the most recent entry is still 469.
EDIT
I just noticed something in the Python. Each time I run the Python code, it runs and the stored procedure does the SELECT at the end, and each time I run it the CommentsID increases by 1 (as it should), but it's NOT remembering the previous ones inserted by Python. The only ones the SELECT statement pulls back are the ones I committed via SQL itself.
Notice here that the CommentsID (the first number in each line that starts with a 4) goes from 469 to 471. Yet I just had that image above that shows 470 - where did 470 go?!
Second part:
I'm having a hard time putting variables into that EXEC section of the code. I think I need to wrap it in single quotes, which means I need to put a \ in front of the quotes that need to stay for the SQL code. But when I do that and then try to run it, I can't get it to pull in the variables.
Here is what the SQL needs to be:
EXEC [schema].[Comments] 'Username'
In Python, I know I need it to be in single quotes but because the SQL code has quotes, you typically just put \ in front like this:
'EXEC [schema].[Comments] \'Username\''
That works. However, I then want username to pull from a variable, not be a string.
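For what it's worth, here is a minimal sketch that covers both parts, assuming the same procedure and columns as above. It passes the values through pyodbc's ? parameter placeholders (so no escaped quotes are needed and any value can come from a Python variable) and calls conn.commit() at the end; pyodbc opens connections with autocommit off, so an uncommitted insert disappears when the connection closes, which would also explain the vanishing rows in the first part.
import pyodbc

conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=REPORT\INSTANCE;'
                      'UID=UserID;'
                      'PWD=Password;'
                      'Database=ReportDB;'
                      'Trusted_Connection=no;')
cursor = conn.cursor()

username = 'username'  # now a variable instead of a hard-coded string
params = ('2019-04-18 00:00:00.000', 'Team', '1900-01-01 13:12:16.000',
          'testing this string', username)

# the ? placeholders are filled from params, so no quote escaping is needed
cursor.execute('EXEC [Schema].[Comments] ?, ?, ?, ?, ?', params)
for row in cursor:
    print(row)

conn.commit()  # without this the new row is rolled back when the connection closes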

What are the reasons for the increasing value of 'number of open objects'?

The number of open objects in the Sybase database keeps increasing, and I am getting this error:
Increase the config parameter 'number of open objects' to avoid descriptor reuse.
At first the value of 'number of open objects' was 100000.
sp_monitorconfig "open objects"
go
Name Num_free Num_active Pct_act Max_Used Reuse_cnt Instance_Name
number of open objects 1223 90380 95.25 92380 9269
I changed the value from 100000 to 160000 and still the value is increasing.
Is there a way I can know which objects are increasing?
What causes the value to increase, and how can I stop it from increasing like this?
When I've seen this issue (ever increasing descriptor usage for open objects), I've tracked the issue back to an application that is generating a large volume of prepared statements (eg, instead of re-using a prepared statement for repetitive DML statements, the app creates a new prepared statement for each DML statement).
In Sybase (now SAP) ASE, prepared statements are converted into 'lightweight procedures' (aka LWPs; think 'temporary procedures') which in turn require their own descriptor.
To find out if this is a LWP issue:
grant sybase_ts_role to your login
run dbcc traceon(3604)
run dbcc des
NOTE: dbcc des will generate a LOT of output so make sure you capture it to a file!!
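In an isql session the three steps above look roughly like this (a sketch; sybase_ts_role must already have been granted to your login):
dbcc traceon(3604)   -- send dbcc output to the client session instead of the errorlog
go
dbcc des             -- dump the active object descriptors; capture this to a file
go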
In the 'dbcc des' output the LWP's show up with the following attributes:
exist in login's tempdb
have negative object id's
have names like *dddddddddddddd_hhhhhh (where 'd' == decimal digit, 'h' == hex digit) OR ...
may have names like *aadddddddddd_dddddddddaa* ('d' == decimal, 'a' == alphabetical character)
objsysstat = O_PROC
objsysstat2 = O2_LWP
To find the offending connection(s) ... you may be able to pull the spid from the LWP name (dbcc des output) or from the master..monCachedProcedures table (look for procs with names like *sq##########ss* and *ss#########ss* ... something that looks like system-/auto-generated names).
NOTE: Depending on ASE version (11? 12? 15? 16?) the LWP name format may vary so you may have to do some digging to find the associated spid.
For LWPs where the spid is part of the name, the spid is likely the first 5 digits of the (dbcc des) object name; so for the following we see the spid = 61
*00061000000606_9d5317
*00061000000626_a149eb
*00061000000606_9d5317
*00061000000589_63ea4e
This topic has come up many times over the years, and you can review some of my ramblings in the following links: here, here, here and here

Not all LSNs map to dates

I'm building an ETL that processes data from SQL Server's change data capture feature. Part of the ETL is recording logs about the data that is processed, including the data import window start and end. To do this I use the function sys.fn_cdc_map_lsn_to_time() to map the LSNs used to import the data to the corresponding datetime values.
The function cdc.fn_cdc_get_all_changes_<capture_instance>() takes two parameters that are the start and end of the data import window. These parameters are inclusive, so the next run needs to increment the previous LSN to avoid re-importing rows that fall on the boundary.
The obvious answer is to use the function sys.fn_cdc_increment_lsn() to get the next LSN before bringing in the data. However, what I found is that this LSN does not always map to a datetime using sys.fn_cdc_map_lsn_to_time(). The LSN is valid for use in cdc.fn_cdc_get_all_changes_<capture_instance>(), but I would like to be able to easily and accurately log the dates that are being used.
For example:
DECLARE @state_lsn_str CHAR(22) = '0x0000EEE100003E16008F'; -- try using `sys.fn_cdc_get_min_lsn(<capture_instance>)` instead since this value won't work for anyone else
DECLARE @state_lsn BINARY(10) = CONVERT(BINARY(10), @state_lsn_str, 1);
DECLARE @incr_lsn BINARY(10) = sys.fn_cdc_increment_lsn(@state_lsn);
SELECT CONVERT(CHAR(22), @incr_lsn, 1) AS incremented_lsn,
       sys.fn_cdc_map_lsn_to_time(@incr_lsn) AS incremented_lsn_date;
This code returns an LSN value of 0x0000EEE100003E160090 and NULL for incremented_lsn_date
Is there a way to force an LSN to be mapped to a time?
OR
Is there a way to get the next LSN that does map to a time without risking losing any data?
The reason the value returned from sys.fn_cdc_increment_lsn() doesn't map to a datetime is that no change was recorded for that specific LSN. The function increments the LSN by the smallest possible value, whether or not a change was recorded at that LSN.
To work around the issue I used the sys.fn_cdc_map_time_to_lsn() function. This function takes a relational-operator parameter. You can get the next datetime value by passing 'smallest greater than' for this parameter. The following code returns the next LSN that does map to a datetime:
DECLARE @state_lsn_str CHAR(22) = '0x0000EEE100003E16008F'; -- try using `sys.fn_cdc_get_min_lsn(<capture_instance>)` instead since this value won't work for anyone else
DECLARE @state_lsn BINARY(10) = CONVERT(BINARY(10), @state_lsn_str, 1);
DECLARE @state_lsn_date DATETIME = sys.fn_cdc_map_lsn_to_time(@state_lsn);
DECLARE @next_lsn BINARY(10) = sys.fn_cdc_map_time_to_lsn('smallest greater than', @state_lsn_date);
SELECT CONVERT(CHAR(22), @next_lsn, 1) AS next_lsn,
       sys.fn_cdc_map_lsn_to_time(@next_lsn) AS next_lsn_date;
This code returns what appears to be a logical datetime value for the next LSN. Though I'm unsure how to 100% check that there is no data in any other tables.
The code above has a @state_lsn_date value of 2018-02-15 23:59:57.447, the value found for the next LSN is 2018-02-16 00:00:01.363, and the integration runs at midnight.
The functions sys.fn_cdc_map_lsn_to_time() and sys.fn_cdc_map_time_to_lsn() use the cdc.lsn_time_mapping table to return their results. The documentation for this table states:
Returns one row for each transaction having rows in a change table.
This table is used to map between log sequence number (LSN) commit
values and the time the transaction committed. Entries may also be
logged for which there are no change tables entries. This allows the
table to record the completion of LSN processing in periods of low or
no change activity.
Microsoft Docs - cdc.lsn_time_mapping (Transact-SQL)
As I understand it that means every LSN value in any change table will be mapped here. There may be additional LSNs but there won't be missing LSNs. This allows the code to map to the next valid change date.
Since all changes will have a mapping in the cdc.lsn_time_mapping table using this method shouldn't lose any data.
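One way to gain a little more confidence is to check the mapping table directly (a sketch, reusing @state_lsn and @next_lsn from the example above): since every transaction with rows in a change table gets a row in cdc.lsn_time_mapping, a count of zero mapping rows strictly between the two boundaries means the jump to @next_lsn skipped no change transactions.
SELECT COUNT(*) AS skipped_mappings
FROM cdc.lsn_time_mapping
WHERE start_lsn > @state_lsn
  AND start_lsn < @next_lsn;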
Do I sound a little unsure? Well, I am.
I'm hoping someone with a deeper knowledge of the SQL Server Change Data Capture system can confirm whether this is safe or not.

How to increase the sample size used during schema discovery to 'unlimited'?

I have encountered some errors with the SDP where one of the potential fixes is to increase the sample size used during schema discovery to 'unlimited'.
For more information on these errors, see:
No matched schema for {"_id":"...","doc":{...}
The value type for json field XXXX was presented as YYYY but the discovered data type of the table's column was ZZZZ
XXXX does not exist in the discovered schema. Document has not been imported
Question:
How can I set the sample size? After I have set the sample size, do I need to trigger a rescan?
These are the steps you can follow to change the sample size. Beware that a larger sample size will increase the runtime for the algorithm and there is no indication in the dashboard other than the job remaining in 'triggered' state for a while.
Verify the specific load has been stopped and the dashboard status shows it as stopped (with or without error)
Find a document https://<account>.cloudant.com/_warehouser/<source> where <source> matches the name of the Cloudant database you have issues with
Note: Check https://<account>.cloudant.com/_warehouser/_all_docs if the document id is not obvious
Substitute "sample_size": null (which scans a sample of 10,000 random documents) with "sample_size": -1 (to scan all documents in your database) or "sample_size": X (to scan X documents in your database where X is a positive integer)
Save the document and trigger a rescan in the dashboard. A new schema discovery run will execute using the defined sample size.

How do you properly benchmark ColdFusion execution times?

1) What settings in the ColdFusion Administrator should be turned off/on?
2) What ColdFusion code should you use to properly benchmark execution time like getTickCount()?
3) What system information should you provide also like CF Engine, Version, Standard/Enterprise, DB, etc?
What we do is:
In Application.cfc's onRequestStart() -> set tick count value, add to REQUEST scope.
In Application.cfc's onRequestEnd() -> set tick count value, subtract first value from it to get total processing time in ms
We then have a set threshold (say 200ms) and if that threshold is reached we'll log a record in a database table
Typically we'll log the URL query string, the script name, the server name, etc.
This can give very useful information over time on how particular pages are performing. It can also easily be graphed, so you can see if a page suddenly started taking 5000ms where before it was taking 300ms, and then you can check SVN to see what change caused it :)
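Roughly, in Application.cfc it looks like this (a sketch; the 200ms threshold and the logSlowRequest() call are placeholders for whatever logging you use):
component {
    this.name = "myApp";  // existing application settings go here

    public boolean function onRequestStart(required string targetPage) {
        // remember when this request started
        request.startTicks = getTickCount();
        return true;
    }

    public void function onRequestEnd(required string targetPage) {
        var elapsedMs = getTickCount() - request.startTicks;
        if (elapsedMs > 200) {
            // hypothetical helper: record script name, query string, server name, elapsed ms
            logSlowRequest(arguments.targetPage, cgi.query_string, cgi.server_name, elapsedMs);
        }
    }
}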
Hope that helps!
1) In CF Administrator, in Debug Settings, you can turn on Enable Request Debugging Output, which outputs runtime and other debugging information at the bottom of every page. This can be useful if you want to see queries as well. If you want to use timers, you must also select Timer Information in the Debug Settings (I got hung up on that for a hot minute).
2) You can use timers (the cftimer tag) to create custom benchmarks of execution times. There are four types - inline, outline, comment, or debug - each corresponding to where the output will go. With inline, it will create a little box around your code (if it's a .cfm) and print the total runtime. The others will print in the bottom output that you turned on in CF Administrator.
3) I don't really know what you should provide. Wish I could help more. In my opinion the more information the better, so that's what I would say :P
With respect to @mbseid's answer, request debugging adds a significant amount of processing time to any request, especially if you use CFCs. I would recommend you turn request debugging off and use getTickCount() at the top and bottom of the page, then take the difference to get the time to render that page. This will give you a much closer reflection of how the code will perform in production.
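In other words, something along these lines at the top and bottom of the .cfm page (a sketch):
<!--- top of the page --->
<cfset request.pageStart = getTickCount()>

<!--- ... page code ... --->

<!--- bottom of the page --->
<cfoutput>Rendered in #getTickCount() - request.pageStart# ms</cfoutput>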
