Project runtime does not start on SageMaker Studio Lab - amazon-sagemaker

This has been the case since last night. It does not work for either the CPU or the GPU "compute type".
Basically, after pressing the "Start runtime" button, it says "Preparing project runtime..." for about ten minutes and then stops. It shows the following error, "There was a problem when starting the project runtime. This should be resolved shortly. Please try again later."
I have now tried it about five times over last night and this morning.
There is no way to even access the work that is saved there. The "project" will not boot up.
Basically it is a dud at this point.
Is anyone else experiencing similar issues? What does one do?

The issues/questions and answers that I have learned since (because I was asked to clarify):
(1) The environment on SageMaker Studio Lab is supposed to be persistent. I.e., any time one starts it, the environment, uploaded files, etc. are as one left them. However, I was no longer able to start the environment, although it had started fine before it locked up. Consequently, I was not able to get to my saved work in any way, shape or form. I was wondering if anyone else has had this issue.
Answer: Thus far not too many people are approved for SageMaker Studio Lab, so I may have been the first or one of the first to encounter this issue. As of this writing, there is no way to access one's data if one cannot spin up, using the "Start runtime" button, a virtual machine that has access to the data.
(2) It is not clear where one is supposed to report issues with SageMaker Studio Lab. One's home page in Studio Lab has a link to StackOverflow under "Get answers and help others". That is how I ended up here. Though, I should have included the following tag (#amazon-sagemaker pointing to https://stackoverflow.com/questions/tagged/amazon-sagemaker).
Answer: I eventually found where people are submitting bug reports. And I reported the issue there (see issue 56). https://github.com/aws/studio-lab-examples/issues
(3) It was not clear to me whether, if I were to delete my account and request a new one, I would be put on the waitlist (which supposedly is long at present). I.e., this would be the manual factory-reset option, where one still loses all of one's work but at least has an opportunity to start again with the environment.
Answer: Once one is approved, one does not go back on the waitlist. Deleting my account, requesting a new one, and setting it up to the initial state took a couple of minutes for me. And yes, I lost all my work that was on there. So, back stuff up as if it were your computer in the '80s. I.e., back up externally to that environment.
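For what it's worth, here is a minimal sketch of the "back up externally" advice, assuming you can still run Python inside the environment while it works. It only packs a folder into a single archive that you can then download or copy elsewhere. The function name and the folder names are made up for illustration and are not anything Studio Lab provides.

```python
import tarfile
import tempfile
from pathlib import Path


def make_backup(project_dir: Path, archive_path: Path) -> Path:
    """Pack project_dir into a gzip-compressed tar archive and return its path."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the project folder
        tar.add(project_dir, arcname=project_dir.name)
    return archive_path


if __name__ == "__main__":
    # Demo on a throwaway directory; in practice project_dir would be
    # your own work folder (hypothetical name below).
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my-project"
        project.mkdir()
        (project / "notebook.ipynb").write_text("{}")
        archive = make_backup(project, Path(tmp) / "backup.tar.gz")
        with tarfile.open(archive) as tar:
            print(tar.getnames())
```

Write the archive somewhere outside the folder being packed, otherwise the archive would end up including itself. The important part is still moving the file off the environment afterwards.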

I signed up for ASMSL about 2 weeks ago. As of this evening (Feb 15) I'm able to log into my runtime without any trouble at all.

SSIS package execution stops before finishing

Today I came across a bug that I'd like to share with everyone.
When trying to execute an SSIS package in Visual Studio (2015 and 2017), the following can happen (note that this package was executed successfully before).
While at the bottom it clearly says the SSIS package is finished, the data flow task is still in progress (and will never finish). Also (I don't know if consistently), a CMD window pops open with "SQLDUMPER.EXE".
This is not due to the way the steps are configured, since executing them individually might still lead to the proper results.
Note as well that in my case this problem makes Visual Studio incredibly unstable. It is not uncommon that after or during every execution Visual Studio crashes completely and is automatically shut down by Windows.
There are no error messages and I had no idea why this happened until... (answer below)
After a lot of googling I've found a reason why a multitude of people seem to have this problem.
Right click the project and then click on properties
Click "Debugging"
Set "Run64BitRunTime" to "True"
Apparently for most people this fixes the problem.
For me however I had to come up with another solution.
Right click the project and click on properties
Click "General"
Set "TargetServerVersion" either to the version of the target SQL Server and run the package again, OR set it to any SQL Server version, run the package to see if it works, and if it doesn't, set it to another version.
These solutions are counterintuitive because this problem seems to arise at a random moment, and the target SQL Server version might have been the same throughout the entire development.
I have not tried to see what happens when I deploy the bugged project to a server and see if it runs there, so any and all extra information on this problem would be appreciated.
I spent a good bit of time googling this problem. It looks like it is being reported by many people around the world. It is usually down to either 32-bit/64-bit compatibility or some minuscule issue with sorting or data sizes or compiling a custom C# script (or a combination thereof).
None of these things worked for me so I rebuilt the entire package from scratch and the issue was gone.
It is far from perfect, especially if your package is large and/or complex but if nothing else works, this is your last resort.
"Run64BitRunTime" was already set to "True", but I was still getting this error. So I restarted Visual Studio and the problem was resolved.
None of the proposed solutions worked for me. I had to rebuild both the solution and the project a couple of times and this fixed it. Of course, I opened and closed VS a couple of times in between. I guess this is a bug and you have to tinker with it to get it to work.

WSO2 EMM - App Management Database Bug

Running WSO2 EMM 1.1.0, everything has been working just fine except for one big issue.
From the moment I first click on an app in the App Management tab, the WSO2EMM_DB.h2.db file starts to steadily grow as long as the server is running, even with absolutely no changes. Eventually, it gets so big that clicking an app on that tab takes a ridiculously long time to load the list of devices using the app. We're talking 5+ minutes, it becomes completely unusable. I have checked the error logs and found no errors at all, every time.
Restarting the server does nothing to correct the issue. Even if I click an app on the App Management tab once, and never again, the database file will continue to grow. Even restarting the server and not logging into the EMM page, it will continue to grow.
The only thing I've found so far that can possibly help is keeping backup copies of the database file and overwriting the current file when it gets too big. Obviously that's not a solution, as I'd need to create a new backup file every time there's a change on the server, and eventually the database file would grow too big from that too.
It's not an issue with the H2 database either. Not only have I tried starting over fresh several times and have had the same behavior, but here is the only info I could find regarding this issue, and they were having the issue regardless of whether or not it was on H2 or MySQL.
I've been trying to find a solution for this for over a month with no success. Any help would be appreciated!
EDIT: It looks like this might be the subject of EMM-826. Unfortunately there seems to be no response to that bug report so far.
EDIT 2: EMM-826 was closed with a message saying the following:
This issue is fixed in the EMM 1.1.0 GA latest pack. Please get all the patches for the product/build the product from the latest source [ https://github.com/wso2/product-emm ] and try again.
Unfortunately, that did not work for me. I'm not sure what exactly I'm doing wrong, so I'll list what I did to try to fix it:
Downloaded the EMM 1.1.0 zip from http://wso2.com/products/enterprise-mobility-manager/.
Downloaded the zip from https://github.com/wso2/product-emm and pasted the files from that into my EMM_HOME directory.
When that didn't work, I searched for patches and found I was only using patches 1-6. In the documentation I found I could download patches 7-12 here. Patches 9 and 10 didn't work right for some reason: with them applied I could not reach the EMM dashboard or publisher, only the Carbon manager. I was able to make patches 7, 8, 11, and 12 work though, with no change in behavior.
Here are the steps I take to reproduce the issue:
After setting a fresh copy of the EMM up, I log in to the EMM dashboard as Admin, set up a user account, and upload an app through the Publisher.
Register a device to the user account I set up. In this case, an Android device running Android 4.2.2.
From the dashboard, I go to App Management and click the app I uploaded. The list of devices loads, but from that point on, the database file starts growing and eventually, after several hours, becomes so large that the device list will never load.
Please help!
I found this happening too. From a quick look, it's the WSO2EMM_DB.notifications table. It seems to keep a history of all notifications over time, and the info for app installs is taken from non-optimized queries, which degrade as the table grows. You 'could' delete all rows from the table, and it will re-populate as devices 'check back' and report their info.
But you'd probably want to write a query to just keep the latest notification of each type of each user (I'll leave that to someone else...) and as was mentioned, it is apparently fixed in the latest version.
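A rough sketch of what such a "keep only the latest" query could look like, shown against an in-memory SQLite database so it can be tried safely. The column names (`id`, `user_id`, `type`, `payload`) are assumptions, not the real WSO2EMM_DB.notifications schema, and it assumes a larger `id` means a more recent notification. On MySQL the subquery would likely need to be wrapped in a derived table, since MySQL does not allow deleting from a table that the subquery selects from directly. Try it on a copy of the database first.

```python
import sqlite3

# Hypothetical schema: the real WSO2EMM_DB.notifications columns may differ.
SCHEMA = """
CREATE TABLE notifications (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id TEXT NOT NULL,
    type TEXT NOT NULL,
    payload TEXT
)
"""

# Keep only the row with the highest id per (user_id, type).
# Assumes a larger id means a more recent notification.
PRUNE_SQL = """
DELETE FROM notifications
WHERE id NOT IN (
    SELECT MAX(id) FROM notifications GROUP BY user_id, type
)
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up notification rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO notifications (user_id, type, payload) VALUES (?, ?, ?)",
        [
            ("alice", "app_list", "old"),
            ("alice", "app_list", "new"),
            ("alice", "device_info", "only"),
            ("bob", "app_list", "only"),
        ],
    )
    conn.commit()
    return conn


def prune_notifications(conn: sqlite3.Connection) -> int:
    """Delete all but the latest notification per user and type; return rows deleted."""
    cur = conn.execute(PRUNE_SQL)
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = build_demo_db()
    print("deleted:", prune_notifications(conn))
    print(conn.execute(
        "SELECT user_id, type, payload FROM notifications ORDER BY id"
    ).fetchall())
```

This only trims the table; it does nothing about the non-optimized queries themselves, so it would have to be run periodically.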
Issue appears to be resolved in EMM 2.0, which can be found here.

Linq-To-Sql and MARS woes - A severe error occurred on the current command. The results, if any, should be discarded

We have built a website based on the design of the Kigg project on CodePlex:
http://kigg.codeplex.com/releases/view/28200
Basically, the code uses the repository pattern, with a repository implementation based on Linq-To-Sql. Full source code can be found at the link above.
The site has been running for some time now and just about a year ago we started to get errors like:
There is already an open DataReader associated with this Command which must be closed first.
ExecuteNonQuery requires an open and available Connection. The connection's current state is closed.
These are the closest error examples I can find based on my memory. These errors started to occur when the site traffic started to pick up. After banging my head against the wall, I assumed that the problem is inherent in Linq-To-Sql and how we are using the same connection to call multiple commands in a single web request.
Eventually, I discovered MARS (Multiple Active Result Sets) and added that to the data context's connection string and, like magic, all of my errors went away.
Now, fast forward about 1 year and the site traffic has increased tremendously. Every week or so, I will get an error in SQL Server that reads:
A severe error occurred on the current command. The results, if any, should be discarded
Immediately after this error, I receive hundreds to thousands of InvalidCastException errors in the error logs. Basically, this error shows up for each and every call to the Linq-To-Sql data context. Only after I restart the web server do these errors clear up.
I read a post on the Microsoft Support site that described my problem (minus the InvalidCastException errors) and stated that if I'm going to use MARS, I should also use Asynchronous Processing=True. I tried this, but it did not solve my problem either.
Not really sure where to go from here. Hopefully someone here has seen and solved this problem before.
I have the same issue. Once the errors start, I have to restart the IIS Application Pool to fix.
I have not been able to reproduce the bug in dev despite trying many different scenarios involving multi-threading, leaving connections open, etc etc.
One possible lead I do have is that amongst the errors in the server Event Log is an OutOfMemoryException for the Application Pool. Perhaps this is the underlying cause of the spurious SQL Datareader errors (a memory leak elsewhere). Although again I haven't been able to reproduce this in dev.
Obviously if you are using a 64 bit OS then this is probably not the cause in your case.
So after much refactoring and re-architecting, we figured out that the problem all along was MARS (Multiple Active Result Sets) itself. Not sure why or what happens exactly, but MARS somehow gets result sets mixed up and doesn't recover until the web app is restarted.
We removed MARS and the errors stopped.
If I remember correctly, we added MARS to solve the problem where a connection/command was already closed using LinqToSql and we tried to access an object graph that hadn't been loaded. Without MARS, we'd get an error. But when we added MARS, it seemed to not care about it. This is really a great example of us not really understanding what the heck we were doing and we learned some valuable (and expensive) lessons from this.
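To make "removing MARS" concrete: in an ADO.NET connection string it is controlled by the `MultipleActiveResultSets` keyword, so removing it amounts to dropping that one setting. Below is a small illustrative sketch (in Python, only to show the string handling; the helper name and the sample connection string are made up). It assumes a simple semicolon-separated string with no quoted values that themselves contain semicolons.

```python
def without_mars(connection_string: str) -> str:
    """Return the connection string with any MultipleActiveResultSets setting removed."""
    kept = []
    for part in connection_string.split(";"):
        if not part.strip():
            continue  # skip empty segments such as the one after a trailing ';'
        key = part.split("=", 1)[0].strip().lower()
        if key != "multipleactiveresultsets":
            kept.append(part.strip())
    return ";".join(kept)


if __name__ == "__main__":
    # Made-up connection string for illustration only.
    sample = "Data Source=.;Initial Catalog=Demo;MultipleActiveResultSets=True;"
    print(without_mars(sample))
```

Keep in mind that without MARS the original "open DataReader" errors can come back, so the code that reads lazily loaded objects after the first reader is still open has to be fixed as well.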
Hope this helps others who have experienced this.
Thanks to all who have contributed their comments and answers.
I understand you figured out the solution.
Following is not a direct solution to the problem; but it is good for others to take a look at
What does "A severe error occurred on the current command. The results, if any, should be discarded." SQL Azure error mean?
http://social.msdn.microsoft.com/Forums/en-US/bbe589f8-e0eb-402e-b374-dbc74a089afc/severe-error-in-current-command-during-datareaderread

What could be reasons a WPF app would pin the CPU and lock the app on some computers but not others?

Stumped here. Posted a similar question before. We have a pretty large WPF app that on some machines runs great, but on others, all of a sudden, one of the CPU cores gets pinned at 100% (just one core) and the app freezes. It usually seems to happen when showing a context menu or a combobox drop-down (i.e. Popup controls) which is why we can't debug this since no user code is executing at that time. It's driving us crazy because again, on most machines it runs fine, but on a few, it freezes.
The odd thing is when we run it in a VM, it runs great there too! Crazy! Not sure what's causing this, or more importantly, where to even begin to look because as I said, no user code is running.
This happens on only about 10% of our machines, but it consistently happens on those machines. All are clean (i.e. relatively fresh OS installs, no crazy apps, etc.) and mostly identical machines spec-wise: similar CPUs, similar RAM, same video drivers and service packs.
So as I stated in the title, can anyone suggest possible reasons why a WPF app would pin the CPU and lock the app on some computers but not others? We're just stumped!
Found it!! Turns out there's a bug in .NET 4.0 regarding UI Automation and the changes MS introduced. Here's the info, and the fix! (Note: Even if you call MS, they will send you a link, but it's always a broken link. I managed to track this down manually.)
Note: Their article talks about a specific case that causes this behavior, but if you google around, you'll see tons of issues around hangs related to those DLLs. The latest is they're promising a fix in the .NET 4.5 runtime (from a MS post on this issue.)
Here's the KB article...
http://support.microsoft.com/kb/2484841/en-us
...and here is the actual hotfix.
http://archive.msdn.microsoft.com/KB2484841/Release/ProjectReleases.aspx?ReleaseId=5583
Crappy video driver? Pull two machines - one where it happens, one where not, and start analyzing differences. Could be hardware defects, bad video drivers, anything in that area. WPF uses the GPU to render if one is there.
Since you seem to be running out of options, I would advise making a new project with just the most basic ComboBox in the Window, doing almost nothing. This should work (check :-) ). Then you add features one by one to the ComboBox and test; for instance, when you add a command, start with an empty one. Do this until it 'breaks'. Then you know which feature is the culprit.
You didn't say if all was working with software rendering.

Visual Studio 2008 ContextSwitchDeadlock with log4net and NHibernate

I'm facing an extremely weird bug here and I'm not really sure if I'm following the right path to solving it or even how to solve it.
Here is the problem I'm facing: I start debugging a WPF application which uses log4net, NHibernate and LINQ to NHibernate, and when I try to get an Entity from the database my application and sometimes VS hang for a lot of time, and after a while an exception dialog opens showing a message containing the following information on a ContextSwitchDeadlock MDA:
The CLR has been unable to transition from COM context 0x34fc1a0 to COM context 0x34fc258 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages. This situation generally has a negative performance impact and may even lead to the application becoming non responsive or memory usage accumulating continually over time. To avoid this
I copied the code files to a new project and deleted the old project to see if I could make this message disappear, thinking it had something to do with my configuration. I started adding a few things at a time to see what was causing it, and when I included the log4net configuration code the bug appeared again. First I included it through AssemblyInfo and later through code configuration on application startup, and absolutely nothing changed at all :(
So, here are my findings:
It only happens when I'm using log4net.
It happens when NHibernate loads an Entity from database (lazy loading).
I don't know what might be the source of this bug. It only happens when debugging in Visual Studio. I've tried following the steps in the "Enabling and Disabling MDAs" section of the following page: http://msdn.microsoft.com/en-us/library/d21c150d.aspx, but that doesn't work either; VS still hangs and its memory usage increases.
When I run the program normally none of this happens, so I'm pretty sure this is not a deadlock situation, as this question suggests: contextswitchdeadlock (I've also tried the solutions posted there).
Because of that, I've decided to disable log4net and enable it again when deploying my app.
I'm posting this question to find out if somebody else has faced this bug or if somebody has some suggestions on how to solve it. Finally, it might help somebody else facing this very same problem.
When using the DebugAppender, all the entities in the database are loaded and all their data is written to the debug output. That was causing the ContextSwitchDeadlock MDA, since it took more than 60 seconds to run.
Disabling the DebugAppender solved my problem.
Thanks to Mauricio Scheffer for the tip.
