Site update: testing was fine; after deployment, again fine; once user load increases, FAIL? - sql-server

We are using ASP.NET MVC with LINQ to SQL. We added some features and tested them all to perfection on our QA box. We are using Windows Server 2003 and SQL Server 2005. When we pushed the changes out to the live web server, we used Red Gate SQL Compare to push the new database changes to the LIVE database. We tested again between the few of us: no problems. Time for bed.
The morning comes, users start hitting the app, and BOOM. We have no idea why this would happen, as we were not doing anything in the new code that we had not done before. However, we did notice during the SQL Compare sync that the names of all the foreign keys differed between the two databases (not the IDs in the tables): for example, FK_AssetAsset_A0EB67 versus FK_AssetAsset_B67EF8 (we don't remember the exact trailing characters from the SQL Compare). We are not sure why, but that is one more variable in this problem.
Strangely, once this was all pushed out we could then replicate the errors on QA, but not before everything was pushed to LIVE.
QA and LIVE databases are on the same SQL Server, but the apps are on different instances of Windows Server 2003.
Errors generated:
Index was outside the bounds of the array.
Invalid attempt to call FieldCount when reader is closed.
Server failed to resume the transaction.
There is already an open DataReader associated with this Command which must be closed first.
A transport-level error has occurred when sending the request to the server.
A transport-level error has occurred when receiving results from the server.
Invalid attempt to call Read when reader is closed.
Invalid attempt to call MetaData when reader is closed.
Count must be positive and count must refer to a location within the string/array/collection. Parameter name: count
ExecuteReader requires an open and available Connection. The connection's current state is connecting.
Anyone have any idea what the heck could have happened?
EDIT: Since we were suddenly able to replicate the errors on QA, it might not be a user-load issue... Needless to say, we all feel really screwed here.

Concurrency always brings bugs out of the woodwork. I'd recommend you check for objects that could be shared among requests (such as static members and singletons) and refactor your code so that as little as possible is shared.
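To make that concrete, here is a minimal C# sketch (the DataContext and entity names are hypothetical) of the kind of shared-state bug that only surfaces under concurrent load: a LINQ to SQL DataContext is not thread-safe, so a single shared instance survives single-user QA testing and then produces exactly the reader/connection errors listed above once two requests overlap.

```csharp
using System;
using System.Linq;

public class AssetRepository
{
    // BAD: one DataContext shared by every request. DataContext is not
    // thread-safe, so concurrent requests trample each other's readers.
    // AppDataContext/Asset are placeholder names for illustration.
    private static readonly AppDataContext Shared = new AppDataContext();

    public Asset GetAssetUnsafe(int id)
    {
        return Shared.Assets.Single(a => a.Id == id); // races under load
    }

    // BETTER: a fresh DataContext per unit of work, disposed when done.
    public Asset GetAsset(int id)
    {
        using (var db = new AppDataContext())
        {
            return db.Assets.Single(a => a.Id == id);
        }
    }
}
```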
As far as specifics go, for the error "There is already an open DataReader associated with this Command which must be closed first," you may want to try adding MultipleActiveResultSets=True to your connection strings.
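That setting goes directly into the connection string; a sketch with placeholder server and database names:

```csharp
// Connection string as it would appear in web.config (server/database
// names are placeholders). MARS requires SQL Server 2005 or later,
// which you already have.
var connectionString =
    "Data Source=LIVESQL01;Initial Catalog=AppDb;" +
    "Integrated Security=True;MultipleActiveResultSets=True";
```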

It sounds like you're crossing the streams a bit and trying to share DataContexts across requests. My suggestion would be to wire in a dependency injection framework that creates a new instance of the dependency for each request.
I use Castle's IoC and wire it into the controller factory so that when it sees a dependency on a repository it creates a new instance of that repository for each request. If you go this route, let me know and I can shoot you a few more resources.
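If it helps, here is a rough sketch of that wiring with Castle Windsor (assuming the 3.x registration API; the repository types are placeholders). The key piece is the per-web-request lifestyle, so each request gets its own repository and therefore its own DataContext:

```csharp
using Castle.MicroKernel.Registration;
using Castle.Windsor;

public static class ContainerBootstrap
{
    public static IWindsorContainer Build()
    {
        var container = new WindsorContainer();

        // One repository instance (and one DataContext inside it) per HTTP
        // request. IAssetRepository/AssetRepository are placeholder names.
        container.Register(
            Component.For<IAssetRepository>()
                     .ImplementedBy<AssetRepository>()
                     .LifestylePerWebRequest());

        return container;
    }
}
```

Note that Windsor's per-web-request lifestyle also needs its PerWebRequestLifestyleModule registered as an HTTP module in web.config, and a custom controller factory resolves controllers (and their repositories) from the container.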

Related

Performing the synchronization with ExecuteOfflineCommand more effectively

I'm wondering if there is a way to recognize that the OfflineCommand is being executed, or an internal flag or something to represent that this command has been passed, or to mark that it has executed successfully. I have an issue recognizing whether the command was passed when the internet is unstable. I keep retrieving the records from the database and comparing them every time to see whether it has been passed or not, but due to the flow of my application I'm finding it very difficult to avoid duplicates. Is there any automatic process to make sure commands execute, or something else?
Second question: I can use a UITimer to check isOffline() on the forms to make sure the internet is connected. Is there something equivalent on the server page, or where the queries are written, to see whether the internet is disconnected? When control moves to the queries and the internet is disconnected, the dialog opened from the form page freezes indefinitely and never ends; I have to close and re-open the app to continue the synchronization process. At the same time, I cannot set a timeout for the dialog because I'm not sure how long the synchronization process will take to complete. Please advise.
Extending on the same topic, but I have created a new issue just to give more clarity on my questions:
executeOfflineCommand skips a command while executing from storage on Android
There is no way to know whether a connection will stay stable, as that would require knowledge of the future. You can work the way transaction services do, where the server side processes an offline command as a transaction using the two-phase-commit approach.
In this approach you have an algorithm similar to this:
Client sends command to server
Server returns a special unique ID for the command
Client asks the server to execute the command with that unique ID
Server acknowledges that the command was performed
If the first two stages didn't complete, you just do them again; the worst that can happen is some orphaned commands on the server.
If the last two stages didn't complete, you just do them again; the server knows whether it already processed the command and will simply acknowledge it if it was. A minimal sketch of this hand-off follows.
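In C# it might look like this (the transport interface and method names are hypothetical; what matters is the retry loop around a server-issued ID that the server treats idempotently):

```csharp
using System;
using System.IO;

public interface ICommandServer
{
    string RegisterCommand(string command);  // stages 1-2: returns a unique command ID
    void   ExecuteCommand(string commandId); // stages 3-4: idempotent on the server side
}

public sealed class ReliableCommandClient
{
    private readonly ICommandServer server;

    public ReliableCommandClient(ICommandServer server) { this.server = server; }

    public void Send(string command)
    {
        // Retrying registration can only create orphaned IDs on the
        // server, never duplicate executions.
        string id = Retry(() => server.RegisterCommand(command));

        // The server remembers the ID, so re-sending an already-executed
        // command just returns the original acknowledgement.
        Retry(() => { server.ExecuteCommand(id); return true; });
    }

    private static T Retry<T>(Func<T> action)
    {
        while (true) // real code would back off and cap the attempts
        {
            try { return action(); }
            catch (IOException) { /* connection dropped: try again */ }
        }
    }
}
```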

Network access was interrupted

The Access database just needs to be open, and it will usually crash within the next 20-40 minutes, resulting in the following error message:
Your network access was interrupted. To continue, close the database, and then open it again.
More details:
The database is split, with the back end and front end on a server. The computers are then connected to the server via LAN (ethernet).
Although there are multiple computers connected to the server, the database only has one user at a time.
The database has been fine for almost a year, until this week where this error has started occurring.
We never have connectivity issues with the server.
I have seen several answers saying it is:
the database's fault, as it is starting to corrupt
the server's fault, as it is broken and briefly drops my connection
Microsoft's fault, and they should patch it
I am hoping this is a problem with the database itself, as I am not responsible for the server.
Does anyone have a definitive solution?
I have recently experienced the same problem, and it all started when I moved my DB to an external disk. The same DB was working just fine on the local disk, or on the previous external disk. So I am guessing it is just a bug that has to do with the drive letter changing or something like that.
The problem sounds like an unstable LAN connection OR a change to the LAN (e.g. new hardware or changes to admin settings) causing increased latency.
If you have forms in the FE bound to BE tables, the latency can cause the connection to be severed, resulting in the error you see.
I'm not a network admin but the main culprits I've seen are:
Users connecting to the network using a VPN using an unstable connection (cell phones, crappy wifi, or just bad ISP service).
Network admins capping persistent connections to a share causing disconnects.
Unstable network hardware or bad hardware configuration.
"Switching" between wired and wireless LAN connections.
I don't think the issue is the database itself, other than having forms bound to a BE database, which is more of a fundamental design problem than anything else.
Good luck!
I use Access 2010. I had the same issue but solved it in the following way.
On the External Data ribbon, go to the Import & Link group and click on Linked Table Manager.
Click on Select All.
Click on OK to refresh the links.
In cases where the path of the back-end database file has changed, browse to the new location and select the new path. This will also refresh the links and should solve the problem. It did for me.
You wrote, "The database has been fine for almost a year, until this week where this error has started occurring."
Clearly something has recently changed for this to be happening, and without narrowing the field of possibilities it's anyone's guess. However, in my experience Jet DB crashes when two or more users are accessing and editing the same record(s) at the same time. So if you've recently added new users, this is a possibility.
Note: Jet is a file-server DB, not a client-server one, which means the app was probably designed for a specific number of front-end users. Without knowing more, I would start there.
I resolved my issue when I figured out that I had an offline directory setup and the sync was having an issue. I turned off the sync, tested it, and the error went away.

OData HTTP400 Timeout Error

This is one of the most bizarre problems I've come across since I started using OData for my mobile apps. The OData server I've developed is backed by SQL Express 2008, and this combination has been installed on 50 different servers and/or PCs over the last 15 months. All 50 servers have been running stably, with consistent function, for large amounts of data.
A couple of days ago one of my clients contacted me indicating that my client app (running on iOS 7) was hitting an odd error when POSTing data to their server. The error had an HTTP code of 400 and the error text "The operation couldn't be completed. (Timeout error 400.)". My first question is: why is a timeout error coming back with a 400 code? Generally when I get timeouts (due to a firewall, etc.) they're in the 100x range. There is no indication in the event logs on the server of ANY problems occurring. My own logs (stored in the SQL database) show no error, which is odd because I'm using a generic exception-catching method in my OData service to log any problems. I haven't yet got to the step of logging all requests.
The error is only raised when posting one particular set of data; all other posts from the device function perfectly. I had the client re-install the app (deleting all data) and then download the data set that was causing the error. The download worked fine. We then began making incremental changes to the data to replicate what it looked like when the error occurred, posting each change to the server and observing the result. Most of the incremental changes work fine, but certain combinations cause the error. One of the increments involves a large volume of changes and posts fine, yet subsequent alteration of any of the objects (sometimes altering as little as 6 characters in a text field) causes the error. And in some circumstances altering objects that have already been posted to the server works without a problem.
I wiped the service components from the server and did a fresh install. I shifted TCP ports in case port 443 had another listener causing problems. I restarted the server. None of these changed the behaviour of the error.
My last-ditch solution is to completely re-install IIS and the .NET Framework, but I'd obviously like to avoid that as it's not my server... The server is overseas from my current location, so debugging isn't really an option. I'm hoping someone has an idea as to what I can do diagnostically to try and determine the source of this bizarre 'gremlin'.
Have you tried a more thorough traffic analysis using a tool like Fiddler? The "timeout" error does indeed seem odd, and what stood out from your post was that your server is "overseas". Could there be something with the "times" that are being used/generated, e.g. server time vs. local time?
Just to confirm: does the exact same set of data always fail? Can you replicate this via a remote debugger or via localhost? If so, can you turn on "verbose errors"?
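If the service happens to be built on WCF Data Services (an assumption; the question names only IIS and the .NET Framework), verbose errors can be switched on like this so the real server-side fault comes back in the 400 response body instead of a generic message:

```csharp
using System.Data.Services;
using System.Data.Services.Common;
using System.ServiceModel;

// Diagnostic settings only: turn these off again once the fault is found.
[ServiceBehavior(IncludeExceptionDetailInFaults = true)]
public class MobileODataService : DataService<AppEntities> // context name is a placeholder
{
    public static void InitializeService(DataServiceConfiguration config)
    {
        config.SetEntitySetAccessRule("*", EntitySetRights.All);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
        config.UseVerboseErrors = true; // surface the underlying exception to the client
    }
}
```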

App Engine backup never finishes only clue is failure in map reduce worker_callback

Over the last few weeks we have repeatedly failed to complete a backup of the datastore using the Datastore Admin tool. We thought the issues had to do with quota errors we were running into, so we switched our application from a free to a paid app, but we still have problems.
Each time we attempt to back up to the Blobstore, the process never finishes. We see the backup in our Pending Backups list, but it never actually completes. We only have a total of 43 MB of data right now, so we don't see it as a data-transfer problem. Looking at our default task queues, we have two pending tasks: one is a call to /_ah/mapreduce/controller_callback and the other is a call to /_ah/mapreduce/worker_callback.
The worker_callback racks up its retry count, and the only error clue we have is on the Previous Run tab, which shows the last HTTP response code to be 500. There is no error message and nothing shows up in our error logs; it just keeps trying over and over again.
We've been able to narrow the backup problems to a specific entity kind for a particular namespace, but we can't figure out why that entity kind is failing while the others are not. The major difference is that this entity kind has a large number of embedded entities, but if App Engine is able to read/put those entities, we can't understand why it seems to have problems backing them up. The particular namespace where the error occurs has the largest amount of data stored for that entity kind compared to the other namespaces we have set up.
We think that if we could see what error is occurring in the worker_callback, we might be able to figure out why the backup is failing, or what is wrong with our data that prevents the backup. Is there something we need to set up / enable through settings / configuration files to get more detailed information on the backup? Or is there some other avenue we should explore to investigate/fix this problem?
I should mention we are using the Java SDK as well as Objectify V3 to work with the data store. We are also backing up data to the Blobstore.
Thank you.
Well, with the App Engine team's help we figured out what the problem was and worked around the issue. I want to give details in case anyone else runs into this problem.
From issue 8363, the App Engine team indicated that from their logs they could see the MapReduce failed because of the large number of properties our entity kind had. The specific entity kind causing the failure had a large number of variable properties that generated errors when MapReduce tried to write out a schema. They indicated that the solution on their end was to skip entities like this during the backup so that the backup would succeed.
What we did to work around the issue and make the backup work was change how we told Objectify to store our data. The large number of properties was being created by our use of the @Embedded annotation on a HashMap class member field. Since @Embedded breaks classes down into individual components, it was generating a large number of properties. We switched the member field to @Serialized and then ran a conversion process to make it use the new serialized property. This made the backup/restore work again.
You can read more about the differences between embedded and serialized on Objectify's website.
snielson, would you mind opening an issue on our public issue tracker here? Remember to add your application ID so we can further debug this specific scenario.
Thanks!

Clearing Access Cache

I am developing a system in Access talking to a SQL Server backend. I can connect with two separate accounts, A and B, so that I can control permissions. In particular, I have a view which is accessed via a pass-through query, which is denied to A but allowed for B.
Normally the selection of A or B as the login is related to which Access security group the user belongs to, but I have set it up so that people in the Admins group (i.e. me) read the login from an internal Access table. I have also created a form (and associated code) that allows an Admin to change this value.
This all works great and does its job perfectly - provided I start up Access from scratch.
It detects that I am an admin, reads the last value I set in the internal table, connects to the server with the correct login string (I loop through, deleting and re-creating all the TableDefs using this new connection string), and then displays my first form. I navigate to a button that runs the pass-through query. When I click that button, it recreates the pass-through query by deleting the one with the same name and recreating it with the correct connection string (A or B login), and then runs it to output results. If I am A, it fails with a permission error (which I display to inform the user); if I am B, it works and I get the results.
I have added a system to attempt to change this on the fly for testing purposes. Having changed who Admin should log in as (by writing to an internal table), it re-runs the startup code, which loops through deleting and re-creating the TableDefs, and then puts me back at the initial form.
HOWEVER: if I now navigate to the button that runs my permission-controlled query, it still deletes and re-creates the QueryDef from scratch, but when I run it, it seems to run in the context of the SQL Server login set when I first started Access, not the new SQL Server login I have just re-created everything with. So the query will run when it shouldn't (or vice versa).
If I exit Access and try again, it starts working properly again.
The only conclusion I can draw from this is that somewhere inside of Access it is caching the ODBC connection string - and instead of using the new one is using the old.
So my question is: is my conclusion correct, and if so, how can I tell Access to clear its cache?
I am developing in Access 2010 - for a system that will ultimately be running in an Access 2000 environment - so the file format is an .mdb in the Access 2000 format.
I came to this topic because I had the same question: "How to clear the cache in Access 2010?"
In my case, the problem was that my application somehow "remembered" the entire path to my linked photos, even though I referenced only the file name. One of the links above led me to search under File > Current Database > Caching Web Service and SharePoint tables. The option to "Use the cache format that is compatible with MS Access 2010" was already checked, but I enabled the check box for "Clear Cache on Close" and closed the database.
Voila! All previously cached values, including the values for my linked photos, were cleared out. It doesn't appear that this setting affected my ODBC DSN-less connections, but I haven't confirmed this.
TO CLEAR CACHING: go to File --> Options --> Current Database, and scroll down to Caching Web Service and SharePoint tables.
Could this page about ODBC linked Access password reset be what you're looking for?
As far as I know, there is no way to clear this cache. If you execute a query and supply a different UID/password for that query, then the permissions you obtained from that act will remain in effect until you close down Access.
Thus, if you execute another query and supply a different UID/password, and then later on execute another query with "lower" permissions, the other cached UID/password will be used. So you can (and will) have multiple UID/passwords cached at that point in time, and you have no control over which one is used.
The only way around this would be to adopt a separate ADO query; to my knowledge, ADO does not cache the credentials the way DAO queries do.
