Getting multiple authentication errors from MS graph endpoints across different tenants - azure-active-directory

We have a multitenant application, which is getting authentication errors on 30-40 % of the requests today, and there are no warning on azure status page or office status page.
This is cripling our application a lot.
Anymore else experiencing this?
I assume that there are still no where else to report Graph issues than here?

Related

How can I diagnose authentication issues in a custom single tenant Teams app?

I am developing an ASP.NET MVC website. It is hosted in Azure and users are authenticated with AAD for our single tenant. I intend to make the website available in a Teams app, so that my coworkers can navigate to my website via the Teams Windows app and the mobile app.
I follow these steps to integrate security in a web browser: https://learn.microsoft.com/en-us/learn/modules/msgraph-build-aspnetmvc-apps/5-exercise-add-auth
I followed these directions to integrate security with Teams: https://learn.microsoft.com/en-us/microsoftteams/platform/tabs/how-to/authentication/auth-aad-sso. I'm not sure that I did this correctly.
At this point, I can:
Login with a web browser
Login via Teams mobile app
Load Teams in a web browser then load my app (not a use case that I need to support, but this worked and I was not prompted to login. I assume that I wasn't prompted to login because I was already logged in directly in another browser tab).
I cannot:
Login via Teams Windows app -- This is my primary use case unfortunately.
When I try to login with the Teams application on Windows (using the same pages and forms as on mobile), the page just disappears. I'm not prompted with the usual Microsoft login page.
How can I diagnose the cause of the problem? I don't see any obvious errors reported in Teams. Is there any way to get access to the root error?
EDIT:
login.microsoftonline.com is reporting "Your browser is currently set to block cookies. You need to allow cookies to use this service." I'm now aware of the SameSite changes (https://learn.microsoft.com/en-us/azure/active-directory/develop/howto-handle-samesite-cookie-changes-chrome-browser?tabs=dotnet) and I've implemented the recommended SameSiteCookieManager code to address the SameSite issue.
I'm still running into the same issue. No exceptions or errors reported except that Teams does not accept cookies.
May have to break this down further, here's how I would decipher it.
First of all, you will have to figure out if the issue is with Teams or on the Auth side.
Figure out which line of code is executing last? You can debug or write to terminal/logs.
Do you see any exceptions? Ideally debuggers can help or you can add some code to catch any exception.
If it's failing before executing any code, do you see any http requests going out, you can use fiddler for this. See if you are seeing any error codes.
If you are using Windows then check Event Logs for any errors or exceptions for Teams App. Look at the Application logs.
Look out for Audit logs and sign in logs and check if you see any activity in your tenant when you run this app.
https://learn.microsoft.com/en-us/azure/active-directory/reports-monitoring/concept-activity-logs-azure-monitor
Some other points would be to check if you can repro this with a sample app or a different user(elevated user). If there is any additional info do share.

How do you configure a Blazor Server application using Azure AD in a load balanced environment?

I have an intranet Blazor Server application created using the Visual Studio template with the Work or School Accounts authentication option. Everything was working beautifully when running on my local machine and when the app was published to our development environment. However, once I moved the app to our staging environment, the application would sometimes crash after authenticating the user in Azure.
After troubleshooting the issue, I believe the problem to be that our on-premises staging environment is load balanced (mimicking production). Our dev environment is not load balanced. I think what was occurring was that once authenticated in Azure and redirected back to the application, the user doesn't always land on the same server due to the load balancer. This breaks the Signal-R circuit and caused the application to crash. This also explains why the error was random; happening maybe 2 out of every 10 logon attempts. To test this, I removed Azure AD authentication from the application and allowed anonymous access to every page. The crashes stopped.
My question is if anyone knows of any workaround to get Blazor Server with Azure AD authentication working with an on-premises load balancer. I searched all over the web and the only workaround I found was to use sticky sessions with Azure Signal R service. We are not hosting apps on the cloud yet. Is switching to Blazor Webassembly the only option if I want to use Blazor with authentication in my environment? Someone at work suggested switching the application to use our on premises ADFS server. However, wouldn't that encounter the same issue?
For reference, here is the code in startup.cs ConfigureServices method that sets up the Azure authentication in the application:
services.AddAuthentication(AzureADDefaults.AuthenticationScheme)
.AddAzureAD(options => Configuration.Bind("AzureAd", options));
services.AddControllersWithViews(options =>
{
var policy = new AuthorizationPolicyBuilder().RequireAuthenticatedUser().Build();
options.Filters.Add(new AuthorizeFilter(policy));
});
I found the solution to this and am posting it here in case anyone else is facing a similar issue.
It turns out the problem wasn't SignalR or anything specific to Blazor Server. After enabling the developer exception page on the load balanced environment, I saw that the error was "Unable to unprotect the message.State". The application state is encrypted by middleware before the user is authenticated by Azure AD. When Azure AD posts back, it includes that encrypted state which is then in turn decrypted on the client side by the middleware.
The key needed to decrypt is by stored on the web server. When in a load balanced environment, if you land on a different server than where you started, the middleware will then be attempting to decrypt state with the wrong key. This of course results in an error.
To fix this you have to store the keys on a central location like a file share instead of on the server itself. Implementing the fix is actually simple. Include the following line in ConfigureServices in startup.cs:
services.AddDataProtection()
.PersistKeysToFileSystem(new DirectoryInfo(#"\\server\share\directory\"));
There are also options to store keys on Azure if that is preferred.
This post by Kevin Dockx is what finally gave me the answer:
Solving Correlation Failed: State Property Not Found Errors (OpenID Connect Middleware / ASP.NET Core)

Azure AD OpenID login not showing errors on fail

I have been updating a system that has been in place for sometime and finding some clients have issues with login on mobile devices.
I have a test system in place and setup Application in Azure AD and noticed during testing if I login with incorrect credentials, login.microsoftonline.com will show:
Sorry, but we’re having trouble signing you in.
AADSTS50020: User account...
When login to the clients live systems I don't see this error and just get returned to the home page of the application.
The only difference is the client apps are configured with credentials for there Azure AD instance and I cannot access them. These where also built on the legacy App Registrations but that shouldn't be issue (ha). The server side is the same implementation.
Why am I not seeing the AADSTS errors in productions sites?
If you are not seeing any error and are just getting returned to the homepage it seems more likely to be an issue with the Redirect URI or the app registration configuration.
Please confirm that the redirect URIs in your application and in your registration are what they are intended be.
Also, ask them to check the developer tool logs when signing in to see if anything shows up. It might be failing but not triggering the error message.

500 Server Error after User Signup Through Google App Engine using hotmail account

I have an application deployed to Google App Engine.
The application relies on App Engine User API to login and signup. However I noticed that if user signup using hotmail account, after verify the account through OpenID option. App Engine tries to direct the browser to the following URL: https://appengine.google.com/_ah/conflogin?continue=https://myappid.appspot.com/login.do, where /login.do is used in UserService.createLoginURL("/login.do") to create the login URL.
At this step I am getting 500 Server Error as the following. When I check my server log, I couldn't find any request to login.do. Please help.
Error: Server Error
The server encountered an error and could not complete your request.
If the problem persists, please report your problem and mention this error message and the query that caused it.
I got this error when logging in 4 accounts.
It works on logging in 3 accounts.
We can not control the number of accounts a user login at the same time.
I think it is a critical bug in the implementation of Google.
Conclusion, Users API is not usable. The only way is to use OAuth.

ADFS 2.0 - How can I Debug "401 - Unauthorized"

I setup a test Server 2008 box with Active Directory and ADFS 2.0. I have an ASP.NET app which uses WIF to federate identity. ADFS is configured to use Active Directory for identity info. I used WIF to configure the client app to use the ADFS endpoint.
When I attempt to load the ASP.NET app as a user from the browser I am redirected to the ADFS endpoint and am prompted for credentials. I have attempted to login with several users accounts, even resetting passwords but the credentials never seem to be correct and a 401 Unauthorized is returned. I can login to other systems successfully with the same credentials.
I have enabled debug trace in verbose mode and enabled auditing in verbose mode but I can't find any errors or info to help me figure out the issue.
How can I get more info to narrow down the problem?
UPDATE:
I found that this issue is caused by my testing environment. My dev machine is on our corporate domain (acme.com). I created two 2008R2 VMs for a test Domain Controller (notacme.com) and Web Server.
If I attempt to access the website from a computer on the acme.com domain the error described above occurs. If I attempt to access the website from a computer on the notacme.com domain it works.
What can I do to access the website from a computer on the acme.com domain?
Apparently this was caused by the Extended Protection feature built into ADFS. In trying to troubleshoot this issue I had Fiddler running to track the requests/responses but at one point I swear I turned it off to test as well but it still didn't work. Apparently I didn't fully remove the Fiddler proxy because after a IE reboot and with Fiddler not running it worked in IE but found it still didn't work in Firefox or Chrome. This led me to a TechNet article which described the behavior I've been seeing in conjuction with using Fiddler.
http://social.technet.microsoft.com/wiki/contents/articles/ad-fs-2-0-continuously-prompted-for-credentials-while-using-fiddler-web-debugger.aspx
In my experience, every sign-in failure in IIS (including AD FS) is logged in the 'Security' event log as an 'Audit Failure' event, which contains more details. So I would search in the event viewer on the AD FS system, and see what those events have to say. Also in the event viewer, check the 'Applications and Services Logs' -> 'AD FS 2.0' -> Admin event log.
It looks like you did try to look at the HTTP traffic, e.g., using Fiddler. That's good. I presume the problem also occurs when Fiddler is not used?
(Do you perhaps have the problem of a repeated sign-in form, after you entered correct user name and password? Then look at the following answer: ADFS authentication - IE8 works, Chrome fails.)
(I have also seen a case where the initial authentication was successful, resulting in 'Audit Success' events, and then a 401 resulted from a later redirect. Also in this case the event logs on the AD FS system helped.)

Resources