InterBase 2020 crashes/loops

We use InterBase 2020 with UTF8 as our production database (approx. 250 simultaneous users). With this database we have two main problems that we have not been able to solve.
In the past we had a problem with an older UDF that crashed our database because it could not handle Unicode string operations. As a result, we switched to Unicode-compatible versions.
For the last few years we have occasionally had a "hiccup" (as we call it): every client loses its connection and the Guardian restarts the server. The clients can then reconnect without us doing anything.
The second problem is that sometimes InterBase does not crash, but everyone loses their connection and it is not possible to reconnect (neither from the client nor from, for example, IBExpert). In this case we have to restart the whole server.
These problems occur irregularly. Usually it starts with a hiccup. Some time later (two to ten hours), the second problem appears and we need to restart our database. If we are lucky we need to restart the server 2-3 times; on a bad day we have to restart it more often because the second problem keeps coming back (for example every 30 minutes).
We have not yet been able to locate the cause. It doesn't matter whether users are working with the database or it is just idling on a weekend; it often happens even when nobody is connected.
The server logs have not given us any helpful hints either. What we have tried so far:
-minimized UDF use as far as possible and switched to newer UDFs that support Unicode, etc.
-guarded the functions that (as far as we know) can crash the server so that they don't receive, for example, invalid datetimes
-update the database server regularly to the newest version
-also updated the client DLLs
-also updated the connection components (IBDAC) and Delphi (11.1)
-wrote an exception tracker in our client software (unfortunately it only ever reports the connection-lost error)
-regularly check active transactions for anything that hangs or loops, or for snapshot creation
Do you have any information we could use to solve our problems? Is there any way to get more information out of the log files (are other log levels possible?)? We don't want to log every procedure call unless necessary, but if there are no other options we will have to.
Thanks for your help!

Matze,
I suggest you open a case with our Support team at Embarcadero (https://www.embarcadero.com/support). They will work with you to understand the specifics of the crash, get relevant details (and Performance Monitoring information) from you, and help us work on a resolution (if it is not already addressed in our latest update).
Over the past couple of years we have addressed a number of corner cases (and other crash reports) in InterBase 2020 updates, and we are eager to get to the bottom of this issue as well. You can see some of the resolved crash reports at https://docwiki.embarcadero.com/InterBase/2020/en/Resolved_Defects
Supporting 250 simultaneous users is not a problem in itself, but it is important to understand whether your use cases are running into any system resource limits.
You mention that you have the latest updates to InterBase 2020, but I do not see a build number in your message. You can get the most recent update build (14.4.0.804) of the server (if on Windows) from https://my.embarcadero.com/#downloadDetail/1383

Related

How can I check if the system time of the DB server is correct?

I received a bug report from the service desk that turned out to be caused by different system times on the application server (JBoss) and the DB (Oracle) server. As a result, the timeouts were wrong.
It doesn't happen often, but in future it would be better if the app server could raise an alarm about a wrong time on the DB server before it leads to deeper problems.
Of course, I can simply run
SELECT CURRENT_TIMESTAMP FROM DUAL
and compare the result against the local time. But sending the query and getting its result back will probably take a noticeable amount of time, so I may judge a good time as bad or vice versa.
I could also measure the time from sending the query to receiving the result. But that only works correctly on a good network without lag, and if the time on the DB server is wrong, it is quite likely that the network around the DB server is not OK either. Queues on the DB server can also make the sending and receiving times noticeably unequal.
What is the best way you know to check the time on the DB server?
Constraints: precision of 5 sec; false alarms <10%.
To be optimized (minimized): missed alarms.
Maybe I am reinventing the wheel and JBoss and/or Oracle already have a tool for this? (I could not find one.)
Have a program running on the app server read the local time, then query the database time (CURRENT_TIMESTAMP), then read the local time again after the query returns.
Confirm that the DB time lies between the two app-server times (with whatever tolerance you need). You can add a separate check on how long the DB took to respond, but that round trip should be short.
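A minimal sketch of that bracketing check, written in Python for brevity (the same logic ports directly to Java on JBoss). It assumes all three timestamps are already converted to one time zone (e.g. UTC). The names `check_db_clock`, `tolerance` and `max_round_trip`, and the "inconclusive" result for slow round trips, are illustrative assumptions rather than anything from JBoss or Oracle.

```python
from datetime import datetime, timedelta, timezone


def check_db_clock(before: datetime, db_time: datetime, after: datetime,
                   tolerance: timedelta = timedelta(seconds=5),
                   max_round_trip: timedelta = timedelta(seconds=2)) -> str:
    """Classify one DB clock sample taken between two local clock readings.

    before/after: app-server time just before sending the query and just
    after the result arrives. db_time: the timestamp the DB returned.
    All three must be in the same time zone (assumption: UTC).
    """
    round_trip = after - before
    if round_trip < timedelta(0):
        raise ValueError("local clock went backwards during the check")
    # A slow round trip widens the window too much to judge reliably;
    # report it separately instead of raising a possibly false alarm.
    if round_trip > max_round_trip:
        return "inconclusive"
    if before - tolerance <= db_time <= after + tolerance:
        return "ok"
    return "skewed"


t0 = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(check_db_clock(t0, t0 + timedelta(milliseconds=300),
                     t0 + timedelta(milliseconds=600)))  # prints "ok"
```

Retrying "inconclusive" samples a few times before giving up keeps missed alarms low without counting slow network moments as false alarms. Note that the effective precision is the tolerance plus the round trip.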
If the environment is some form of VM, issues are most likely to arise when the VM is started or resumed from a pause. There may also be situations where a clock runs fast or slow, so recording the measured times lets you look for trends in either direction and take preemptive action.
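One way to turn the recorded times into a trend is sketched below. It assumes (as NTP-style estimates do) that the request and response legs take equal time, so the offset is measured against the midpoint of the two local readings; the least-squares slope and the function names are choices made for this illustration, not a JBoss or Oracle feature.

```python
from datetime import datetime
from typing import Sequence, Tuple


def estimate_offset(before: datetime, db_time: datetime, after: datetime) -> float:
    """DB clock minus app-server clock, in seconds.

    Assumes symmetric network delay, so the DB timestamp is compared
    against the midpoint of the two local readings.
    """
    midpoint = before + (after - before) / 2
    return (db_time - midpoint).total_seconds()


def drift_rate(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (elapsed_hours, offset_seconds) samples.

    Returns drift in seconds per hour: positive means the DB clock is
    gaining on the app server, negative means it is falling behind.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must be taken at different times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx
```

Feeding `drift_rate` the offsets from periodic checks (with hours since the first check as the x value) shows whether a clock is steadily drifting, which is the kind of trend that allows action before the 5-second limit is reached.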

Spectre/Meltdown slowing down delphi service

I have a problem with the Spectre/Meltdown patch for Windows (released somewhere around Q1 last year). When it is activated, my Delphi REST service slows down by a factor of about 15 (a request that takes 1 second takes about 15 seconds with the patches active). I have traced the slowdown to the database connection. Somehow the translation of the parameters into the SQL text, after they have all been set, takes very long, and then the execution on the database itself also takes much longer than usual. As a first workaround I cut the SQL statement down to a couple of rows, and it got faster, so more rows mean a lot more time: roughly, each additional row in an update/insert statement adds 0.2-0.3 seconds to the transaction. As far as I can see, select statements work fine.
When I ran into the same issue on other requests, I turned off the patches (the application is still in development), and everything got a lot faster. Now the administrator insists that the patches be turned on, and the problem is back.
Has anybody experienced something like this, or is there a way to exclude an application from the patches? The strange thing is that I also have a client/server application that uses the same business logic. It is also slowed down, but only by a factor of about 2. That is what I don't quite understand: the same functions take a lot longer within the service than in the client/server application.
Ah yes, I am using Devart for the database connection, and it's an MS SQL Server (2016). The service and the client/server application are written in Delphi XE7 (I am now trying to update to XE10.2, hoping that this will help).
Thanks

Diagnosing poor Sql Server Service Broker forwarding performance

I've been running Service Broker in my development environment for a few months now and have had perfectly adequate performance, approaching 1000 messages per second (plenty for my needs).
I've also been running on a cut-down replica of my real production environment, which involves a forwarding instance, and for the first time today I pushed some load through it, with terrible results! I'm trying to understand what I've been seeing but am struggling a bit, so I thought I'd put it out there to see if anyone can help.
Firstly, messages are being delivered end to end through the forwarder. However, when I pushed a few thousand messages, I saw batches of 20 to 100 being sent, each followed by a delay of a minute or two. The messages are ultimately processed successfully.
Looking at the queue on the Store (the initial sender), there are thousands of messages waiting to be forwarded, which are trickling out.
The security setup goes like:
Store database -> Certificate -> Forwarding instance -> Windows Security -> Central database
When I switch on Profiler, I see lots of errors:
Some examples on the forwarding instance:
7 - Send IO Error (10054(failed to retrieve text for this error. Reason: 15105))
Forwarded Message Dropped (The forwarded message has been dropped because a transport send error occurred when sending the message. Check previous events for the error.)
And on my 'central' target instance:
A corrupted message has been received. The binary message preamble is malformed.
Broker message undeliverable This message was dropped because it could not be dispatched on time. State: 2
Can anybody help by pointing me towards some checks I could make, or maybe something obvious that I've missed. I know I've got something wrong but just can't see what.
Edit - 14/1/2011 - more information:
Some more information on this - we took our message forwarding instance out of the equation and saw massive improvements immediately - 2000 messages were delivered in seconds.
The architecture uses transport security, so we're currently trying to switch over to dialog security, as we've read that transport security combined with forwarding can harm performance. We're hoping dialog security will somehow reduce what needs to be decrypted by the forwarding instance, thereby improving performance.
First thing Monday I want to switch off encryption on the transport layer (between initiator and forwarder) to see if that is where our bottleneck is occurring. Is it possible that this could cause a big overhead in our communications or should one forwarding instance not produce such a big bottleneck?
What SQL Server version?
Several forwarding performance issues have been fixed. I recommend you upgrade to the latest SQL Server 2008 R2 and deploy the latest cumulative updates. If upgrading is problematic in your environment, you can upgrade only the forwarder instance.
This might be a stupid suggestion, but have you changed the network topology lately? Maybe swapped out a network cable or overheated a switch? If this is occurring suddenly, it sounds more like a physical change than a logical one. I'd check the windows event log on both machines.
Yes, Dialog security is the best approach in conjunction with forwarders. Otherwise overhead will be enormous.

Classic ASP Bottlenecks

I have 2 websites connecting to the same instance of MSSQL via classic ASP. Both websites are similar in nature and run similar queries.
One website chokes up every once in a while, while the other website is fine. This leads me to believe MSSQL is not the problem, otherwise I would think the bottleneck would occur in both websites simultaneously.
I've been trying to use Performance Monitor in Windows Server 2008 to locate the problem, but since everything is in aggregate form, it's hard to find the offending asp page.
So I am looking for some troubleshooting tips...
Is there a simple way to check all recent ASP pages and see the amount of time they ran for?
Is there a simple way to see live page requests as they happen?
I basically need to track down the offending code, but I am having a hard time seeing what is happening in real time through IIS.
If you use "W3C Extended Logging" as the log mode for your IIS logfiles, then you can switch on a column "time-taken" which will give you the execution time of each ASP in milliseconds (by default, this column is disabled). See here for more details.
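As a rough illustration of using that column, the Python sketch below reads IIS W3C extended log lines, takes the column layout from the "#Fields:" directive, and lists the slowest requests by time-taken. It assumes the cs-uri-stem and time-taken fields are enabled and that values contain no spaces (IIS writes spaces in fields such as the user agent as "+"); the script and function names are hypothetical.

```python
import sys
from typing import Iterable, List, Tuple


def slowest_requests(lines: Iterable[str], top_n: int = 10) -> List[Tuple[int, str]]:
    """Return the top_n (time_taken_ms, uri_stem) pairs from W3C log lines.

    The column layout comes from the most recent "#Fields:" directive, so a
    new header written partway through a file is handled.
    """
    fields: List[str] = []
    results: List[Tuple[int, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives such as #Software or #Date
        if "time-taken" not in fields or "cs-uri-stem" not in fields:
            continue  # no usable header seen yet
        values = line.split()
        if len(values) != len(fields):
            continue  # malformed or truncated line
        row = dict(zip(fields, values))
        if not row["time-taken"].isdigit():
            continue  # e.g. "-" when the value was not recorded
        results.append((int(row["time-taken"]), row["cs-uri-stem"]))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_n]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slowest_asp.py <path to an IIS W3C log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for ms, uri in slowest_requests(log, top_n=20):
            print(f"{ms:>8} ms  {uri}")
```

Grouping the results by cs-uri-stem (or adding cs-uri-query) would be a small extension if a single page shows up repeatedly.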
You may find that something in one application is taking a lock in the database (e.g. through a transaction) and then not releasing it, which causes the other app to timeout.
Check your code to make sure transactions are always closed, and possibly consider setting up tracing on the SQL Server to log deadlocks.
Your best bet is to run SQL Server Profiler to see which procedure or SQL may be taking a long time to execute. You can also use Process Monitor to see any pages that may be taking a long time to finish executing, and finally, don't forget to check your IIS logs.
Hope that helps

Time Dependent, How?

I have a database that is part of a library information system. It keeps track of the books borrowed by customers, stores the due dates, and automatically notifies customers of their accountability if they return a book after its due date.
I am using MySQL as the DBMS. As far as I know, MySQL's time depends on the system time. When checking whether a borrowed book is past its due date, I compare the current system time with the due date stored for that book. The database server will actually be running on a PC with Windows XP.
My problem is that when the system time gets changed, the integrity of the data and the accountability checks are compromised. Is there a way to work around this? Is there some sort of 'independent time' that I could use? Thanks a lot!
NOTE: I'm afraid the application does not have a connection to the Internet.
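The due-date comparison described in the question could be sketched as follows. The `latest_recorded` guard is a hypothetical addition, not something from the question: it refuses to decide when the system clock reads earlier than the newest timestamp already stored (e.g. the most recent loan or return), which is one symptom of a clock set backwards. It cannot detect a clock set forwards, so it does not replace keeping the clock itself correct.

```python
from datetime import datetime
from typing import Optional


def is_overdue(due: datetime, now: datetime,
               latest_recorded: Optional[datetime] = None) -> bool:
    """Return True if a loan with the given due date is overdue at `now`.

    latest_recorded (hypothetical): newest timestamp already stored in the
    database. If `now` is earlier than that, the system clock has probably
    been set back, so raise instead of silently clearing an overdue loan.
    """
    if latest_recorded is not None and now < latest_recorded:
        raise RuntimeError("system clock is earlier than the newest stored record")
    return now > due
```

Passing `now` in explicitly, rather than reading the clock inside the function, also makes the check easy to test with fixed dates.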
I think you're trying to program around a problem your application shouldn't worry about. Your app gets its time from the computer, and you need to be able to rely on that for accuracy. If the time gets changed, then the time was wrong, so what does that mean for old data? How long was it wrong? It's really not something you can solve programmatically.
A better solution is to make sure the time isn't wrong. Use Windows Time to sync against a time server to ensure accuracy.
If your PC is part of a Windows domain, you could also have its clock continuously synchronize with your domain server using the Windows Time Service.
If your PC has Internet access, it can set its time against the US National Institute of Standards and Technology (NIST) time service. Instructions and an overview of how to use it can be found on the NIST Internet Time website.
I would configure an authoritative time server in Windows XP. Here is a step-by-step process.
