Jenkins + Active Directory Authentication - Slow Login

I have a Jenkins master running on Windows 2008 SP2, set up with Active Directory authentication. The authentication works fine, and normally there is no issue with login.
Occasionally, however, Jenkins takes 4 to 5 minutes to log a user in. This seems to correlate with how long the user has been inactive (e.g. a user who has not logged in for 2 or 3 weeks experiences an extremely slow response when trying to log in).
Has anyone else experienced this behavior? I'm really not sure whether I should start troubleshooting this in Active Directory or in Jenkins.

The plugin maintainers actively suggest enabling logging (setting the "hudson.plugins.active_directory" logger to ALL) and filing a bug if the problem happens.

Slow Jenkins logins with Active Directory seem to be almost always related to a DNS issue. Check the SRV, NS, DomainDnsZones, ForestDnsZones, _ldap._tcp.domain.com and _gc._tcp.domain.com responses from your AD/DNS server. If you can't reach every IP/port listed, you will see random slow logins (30 or 60 seconds, depending on the query) whenever the Jenkins AD plugin tries to query those servers/services.
You can easily capture the DNS queries with tcpdump or Wireshark on your Jenkins server or on the DNS server:
tcpdump -i interface port 53
Specifying sites can provide a workaround, since that limits slow logins to the cases where ForestDnsZones returns nonexistent or unreachable IPs/ports.
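A minimal sketch of the reachability check described above, assuming you have already listed the targets of the SRV records yourself (for example with nslookup -type=SRV _ldap._tcp.domain.com). It times a TCP connect to each host/port pair so that the unreachable ones, each of which would cost a timeout during a Jenkins login, stand out. The function name and the 3-second default timeout are illustrative choices, not anything the AD plugin uses.

```python
import socket
import time


def check_endpoints(endpoints, timeout=3.0):
    """Try a TCP connect to each (host, port) pair and time it.

    endpoints: iterable of (host, port) tuples, e.g. the DC targets returned by
    the _ldap._tcp and _gc._tcp SRV records (collected separately).
    Returns a list of (host, port, ok, seconds) tuples.
    """
    results = []
    for host, port in endpoints:
        start = time.monotonic()
        try:
            # create_connection resolves the name and performs the TCP handshake
            with socket.create_connection((host, port), timeout=timeout):
                ok = True
        except OSError:
            # timeouts, refused connections and name-resolution errors land here
            ok = False
        results.append((host, port, ok, time.monotonic() - start))
    return results
```

For example, check_endpoints([("dc1.domain.com", 389), ("dc1.domain.com", 3268)]) with your own (here hypothetical) DC names; any entry that comes back with ok set to False, or close to the timeout, is a candidate for the slow logins.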

Related

Setting up samba 4 AD with an LDAP backend

Case:
For a couple of months now I've been following various tutorials, documentation and examples but somehow my end result always ends up not working like in any of the tutorials.
What I need to do is set up an Active Directory domain using Samba 4.0 on Ubuntu Server 16.04 LTS. Samba should use an LDAP backend running on another Ubuntu Server 16.04 LTS machine. Windows clients on the LAN will log in to the domain with LDAP accounts.
A bonus would be a master-master connection from that LDAP server to another LDAP server, but since I have already succeeded in doing something similar, I will focus on the problem of setting up Samba with an LDAP backend.
I'm getting pretty frustrated: even though I follow tutorials and read a lot about the subject, I never end up with a setup in which I can actually log in to the domain, whether with a Samba account or an LDAP account. The closest I got was at one point being able to log in with a Unix account, but with no Active Directory services at that time.
Documentation that I followed:
https://help.ubuntu.com/lts/serverguide/samba-ldap.html
https://wiki.samba.org/index.php/Samba,_Active_Directory_%26_LDAP
https://help.ubuntu.com/lts/serverguide/samba-dc.html
https://www.techrepublic.com/article/how-to-configure-ubuntu-linux-server-as-a-domain-controller-with-samba-tool/
Steps performed:
Used servers:
- cloud.smoothalicious.info
- router.smoothalicious.info
- monfig.smoothalicious.info
In this order:
I installed LDAP on both cloud and router, after which I implemented replication successfully. Cloud is the master (provider) and router is the slave (consumer). After this I imported the Samba schema and added the Samba indices on the master LDAP server (cloud). Although replication was successful before, it failed with the Samba indices, without any error messages in syslog, auth.conf or any of the LDAP logs. I then added the indices manually, giving up on replication at that point.
On monfig I installed Samba 4.0 and used the Samba provisioning tool to configure it. Although a Windows 10 client could finally find the Active Directory domain, I could not log in to it with a Samba user account that I had added to the domain.
The above steps are from my previous setup; the new one follows.
Since this obviously was a big bust, I decided to start over with a new tutorial, this time just setting up a Samba AD with an LDAP backend (source: https://www.unixmen.com/setup-samba-domain-controller-with-openldap-backend-in-ubuntu-13-04/). This time I got as far as populating the LDAP tree with smbldap-populate, which was successful. Unfortunately, I was not able to see those groups with getent group. The error I get is:
nss_ldap: failed to bind to LDAP server ldapi:///cloud.smoothalicious.info: Can't contact LDAP server
Side note:
I'm not looking for answers, although they are welcome. I'm looking for a tutorial I can follow that doesn't leave me with results different from what the tutorial shows, even though I followed it in detail. This is frustrating, and it happens a lot.
An LDAP backend for Samba 4 AD is not supported:
https://wiki.samba.org/index.php/FAQ#Do_Samba_AD_DCs_Support_OpenLDAP_or_Other_LDAP_Servers_as_the_Back_End.3F
There's some work being done on it, but it's far from being ready for production.
A lot of people are asking for it, but it seems that the Samba devs have adopted a make-all-other-systems-accommodate-to-me approach.
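Separately from the backend question, the nss_ldap error quoted in the question may have a simpler cause. This is an assumption, not something confirmed in the thread: in LDAP URLs the ldapi scheme means LDAP over a local Unix-domain socket, and the text after the third slash is the base DN field, not a host name. Read that way, ldapi:///cloud.smoothalicious.info points at the default local socket rather than at the remote server cloud. The hypothetical helper below splits an LDAP URI with Python's urllib.parse to show where it would connect.

```python
from urllib.parse import unquote, urlsplit


def describe_ldap_uri(uri):
    """Explain where an LDAP URI points (sketch following the LDAP URL layout)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    # In an LDAP URL the path component carries the base DN, not a host name.
    dn = unquote(parts.path.lstrip("/"))
    if scheme == "ldapi":
        # ldapi = LDAP over a local Unix-domain socket; the authority part is the
        # URL-encoded socket path, and an empty authority means the default socket.
        target = unquote(parts.netloc) or "<default local socket>"
        return {"transport": "unix-socket", "target": target, "dn": dn}
    if scheme in ("ldap", "ldaps"):
        port = parts.port or (636 if scheme == "ldaps" else 389)
        host = parts.hostname or "<default host>"
        return {"transport": "tcp", "target": f"{host}:{port}", "dn": dn}
    raise ValueError(f"not an LDAP URI: {uri}")
```

Under that reading, describe_ldap_uri("ldapi:///cloud.smoothalicious.info") reports the default local socket with "cloud.smoothalicious.info" as the DN, whereas a URI of the form ldap://cloud.smoothalicious.info/ would target the remote host on port 389.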

Web app fails over to partner in mirror very slowly

I have 3 SQL Servers in a mirroring setup, one acting as the witness, with a database mirrored in high-safety mode. When I fail over from the principal to the mirror, the app goes down and sometimes comes back up after 2 refreshes, sometimes after 5 minutes. It keeps trying to connect to the old principal (I can see the denied attempts for the user in the log). These are not named instances.
I've tried with and without including "Failover Partner" in the connection string (I think it shouldn't be needed with a witness). I am using DNS names for the principal and mirror, matching the value in "mirroring_partner_instance". I've tried adding a timeout to the connection string and specifying TCP/IP. I've also turned on SQL Browser. These are just things I have tried based on other posts.
I remember mirror failovers being very fast, almost instant. Am I just being too impatient? Is there a way to speed up the retry?

SQL Server error log entry : Error: 17806, Severity: 20, State: 14

I have had this error in my log for a few weeks. I searched a lot but couldn't find a useful answer.
I closed the SQL Server port to public IPs, but the problem persists.
Error: 17806, Severity: 20, State: 14.
SSPI handshake failed with error code 0x8009030c, state 14 while establishing a connection with integrated security; the connection has been closed. Reason: AcceptSecurityContext failed. The Windows error code indicates the cause of failure. The logon attempt failed [CLIENT: 10.10.3.25]
Time raised: 27 Jan 2015 2:23 PM
The error was raised while this system was turned off.
The Scenario –
A couple of separate individual Windows IDs started generating these errors while attempting connections; all other Windows logins were working properly. The connections were initially happening through applications, but the errors also occurred through sqlcmd. When logged in to the server locally with the offending IDs, the connections to SQL Server would succeed.
The Troubleshooting process –
Check all the regular SSPI issues. I won't bore you with the details, as they are easily searchable.
A relatively easy way of checking the "easy" authentication issues, if possible/appropriate, is to log in to the SQL Server locally with the offending ID, fire up sqlcmd and connect to the server with sqlcmd -Sservername,port -E (specifying the port forces TCP/IP instead of LPC, thereby bringing the network into the equation).
Verify whether the login is trying to use NTLM or Kerberos (there are many ways to do this, but the simplest is to see whether there are any other Kerberos connections on the machine):
SELECT DISTINCT auth_scheme FROM sys.dm_exec_connections
If Kerberos is in use, there are a few additional things to verify related to SPNs; since only NTLM was in use on this server, I skipped that.
Determine whether the accounts were excluded from connecting to the machine over the network by a group policy or some other AD setting.
After all of these checked out OK, I began trying to figure out what the error code 0x8009030c meant. It turns out it's fairly obvious what the description is: SEC_E_LOGON_DENIED. This description was so helpful that I thought about turning this server into a boat anchor, but luckily for my employer the server room is located many miles away and has armed guards.
Since I knew we could logon locally to the SQL Server with the ID that SQL was rejecting with logon denied something else was trying to make my life miserable.
We didn't have logon failure security auditing turned on, so I had no way of getting a better error description. As luck would have it, though, this would prove instrumental in finding the root cause. To get a better error message, I found a handy KB article detailing the steps needed to put Netlogon into debug mode.
Say hello to my new best friend: nltest.exe!
After downloading nltest & using it to enable netlogon debugging on the SQL Server, I got this slightly better message in the netlogon.log file
06/15 14:15:39 [LOGON] SamLogon: Network logon of DOMAIN\USER from Laptop Entered
06/15 14:15:39 [CRITICAL] NlPrintRpcDebug: Couldn’t get EEInfo for I_NetLogonSamLogonEx: 1761 (may be legitimate for 0xc0000064)
06/15 14:15:39 [LOGON] SamLogon: Network logon of DOMAIN\USER from Laptop Returns 0xC0000064
The error code 0xC0000064 maps to "NO_SUCH_USER".
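A minimal sketch for scanning a netlogon.log for failed SamLogon calls. The line layout is assumed from the three sample lines above (netlogon.log on other Windows versions may include extra fields), and the status table is a hypothetical starter set covering only a few well-known NTSTATUS codes, including the 0xC0000064 seen here.

```python
import re

# A few well-known NTSTATUS codes; anything else is printed as hex.
STATUS_NAMES = {
    0xC0000064: "NO_SUCH_USER",
    0xC000006A: "WRONG_PASSWORD",
    0xC0000234: "ACCOUNT_LOCKED_OUT",
}

# Modeled on: "06/15 14:15:39 [LOGON] SamLogon: Network logon of DOMAIN\USER from Laptop Returns 0xC0000064"
RETURN_RE = re.compile(
    r"^(?P<when>\S+ \S+) .*?SamLogon: .*?logon of (?P<account>\S+) "
    r"from (?P<client>\S+).*? Returns (?P<status>0x[0-9A-Fa-f]+)"
)


def failed_logons(lines):
    """Yield (when, account, client, status_name) for SamLogon calls that did not succeed."""
    for line in lines:
        m = RETURN_RE.search(line)
        if not m:
            continue  # "Entered" lines and other debug output are skipped
        code = int(m.group("status"), 16)
        if code == 0:
            continue  # 0x0 means the logon succeeded
        name = STATUS_NAMES.get(code, f"0x{code:08X}")
        yield m.group("when"), m.group("account"), m.group("client"), name
```

Run against the log lines above, it reports one failure for DOMAIN\USER from Laptop with status NO_SUCH_USER.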
Since I was currently logged in to the server with the ID that was returning no such user, something else was obviously wrong, and luckily at this point I knew it wasn’t SQL.
Running "set log" on the server (which lists the environment variables starting with "log", such as LOGONSERVER) revealed that a local DC (call it DC1) was servicing the local logon request.
After asking our AD guys about DC1 and its synchronization status, as well as whether the user actually existed there, everything still looked OK.
After looking around a bit more I discovered this gem of a command for nltest to determine which DC will handle a logon request
C:\>nltest /whowill:Domain Account
[16:32:45] Mail message 0 sent successfully (\MAILSLOT\NET\GETDC579)
[16:32:45] Response 0: DC2 D:Domain A:Account (Act found)
The command completed successfully
Even though this command returned "Act found", the response came from DC2. (I don't exactly understand why the same account would authenticate against two different DCs depending on whether it is a local desktop logon or a SQL login, but apparently it can.)
After we asked the AD guys about DC2, the light bulbs apparently went off for them: that server actually sits behind a different set of firewalls, in a totally different location. While DC2 would return a ping, its console wouldn't allow logons for some reason. After a quick reboot of DC2 and some magic AD pixie dust (I am not an AD admin, if that wasn't totally obvious from my newfound friendship with nltest), the Windows IDs that were having trouble started authenticating against DC3, and our SSPI errors went away.
Interesting tidbit: during troubleshooting, I found that this particular SQL Server was authenticating accounts against at least 5 different DCs. Some of this might be expected since there are different domains at play, but I haven't heard a final answer from the AD guys about whether it should work that way.
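To check which DC answers without eyeballing the console, a small sketch that extracts the responding DC from nltest /whowill output. The regular expression is modeled only on the single sample output quoted above (an assumption; other Windows versions may format it differently), and the function name is hypothetical.

```python
import re

# Modeled on: "[16:32:45] Response 0: DC2 D:Domain A:Account (Act found)"
RESPONSE_RE = re.compile(
    r"Response \d+: (?P<dc>\S+) D:(?P<domain>\S+) A:(?P<account>\S+) \((?P<status>[^)]*)\)"
)


def whowill_responders(output):
    """Return (dc, domain, account, status) for each 'Response N:' line of nltest /whowill output."""
    return [
        (m.group("dc"), m.group("domain"), m.group("account"), m.group("status"))
        for m in RESPONSE_RE.finditer(output)
    ]
```

Comparing the DC it returns with the LOGONSERVER value from "set log" is one way to spot the kind of mismatch described here (DC2 versus DC1).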
The solution
Reboot the misbehaving DC. Of course, there may be other ways to fix this by redirecting requests to a different DC without a reboot, but since it was misbehaving anyway and the AD experts wanted to reboot, we went with that. A reboot of SQL Server would likely have solved this problem too, but I hate reboot fixes; the issues always seem to come back!
reference

Server crashed from traffic spike, now getting database connection error

So I published a new blog post on my site and promoted it on Facebook, where the traffic spike was far bigger than anticipated. The server went down from the volume of traffic, and after it was rebooted I am now getting a database connection error.
I contacted my server host and they told me this:
"I was able to get the relevant database details from the wp-config.php file in the home directory for your site and, using those creds I am able to connect to the relevant database without a problem.
To be sure that I was able to connect AND make a query to the database I have also created a simple test script that can be viewed at http://yoursite.com/mysqltest.php
This confirms that the server is responding correctly and that the database itself is able to accept connections and queries.
This leaves us with the likelihood that the issue lies with the scripting/configuration of the WordPress installation, which is not something I am going to be able to assist you with.
I suspect that the problem lies with the wp-config.php file but cannot be certain."
I can't see how wp-config.php would have changed; I haven't touched it in over a month, and it's been working fine otherwise. The website was also working fine after I posted that blog; it only stopped working after the server was rebooted. All the other sites on the server remain in perfect working condition. I don't see how a traffic spike could have done this. I'm lost as to what to do next. Please help! :(
Try this database connection test script https://gist.github.com/162913
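As a complement to such a test script, a minimal sketch that reads the DB_* constants out of wp-config.php (the file the host suspected) and checks whether DB_HOST accepts TCP connections. It assumes simple quoted define() values and the default MySQL port 3306; both helper names are hypothetical. Note that when DB_HOST is "localhost", PHP's MySQL client normally uses the local Unix socket rather than TCP, so a failed TCP check is not conclusive in that case.

```python
import re
import socket

# Matches lines such as: define( 'DB_HOST', 'localhost' );
DEFINE_RE = re.compile(
    r"""define\(\s*['"](DB_NAME|DB_USER|DB_PASSWORD|DB_HOST)['"]\s*,\s*['"]([^'"]*)['"]\s*\)"""
)


def read_db_settings(php_source):
    """Extract the DB_* constants from the text of a wp-config.php file."""
    return dict(DEFINE_RE.findall(php_source))


def db_host_reachable(db_host, timeout=3.0):
    """TCP-connect to DB_HOST, using port 3306 unless 'host:port' is given."""
    host, _, port = db_host.partition(":")
    if port and not port.isdigit():
        # WordPress also accepts "host:/path/to/mysql.sock"; not handled here.
        raise ValueError(f"socket-path DB_HOST not handled by this sketch: {db_host}")
    try:
        with socket.create_connection((host, int(port or 3306)), timeout=timeout):
            return True
    except OSError:
        return False
```

Comparing read_db_settings(open("wp-config.php").read()) with the credentials the host used in its own test script would show whether the two actually match.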

ActiveDirectory Provider fail over Best Practices

ActiveDirectory Server 2003
I am using the ActiveDirectoryMembershipProvider and ADroleProvider. They work great. Until my active directory server restarts in the middle of the day to get updates. (I'm not in charge of the server and can't change this). When this happens, for the five minutes the server is rebooting, my users can't use my website because I've tied my menu to the Role Provider. So, here are my questions:
Is it possible to tell my RoleProvider to use the "next" available ADS? If so, how? I want to avoid frustrating my users with ADS connection messages while the initial one reboots.
Should I be using some kind of connection pool that automatically reconnects to the available server? If so, how?
Let's imagine that all my active directory servers go down. Is there a way to keep my web application running? Obviously there are bigger problems if all servers are down, but what I'm after is a possible "disconnected" active directory authentication that will still move forward if the server somehow goes kaput. Is this wise AND possible?
You probably have the server connection string set to "server01.domain.local". If you change it to just "domain.local" you're no longer depending on "server01" being online. Instead you will use the Round Robin feature of Active Directory DNS to get a list of all domain controllers and use one that's online. (I don't think your admins reboot all of the domain controllers at the same time...)
Also try running nslookup domain.local a couple of times in succession in a command prompt to see the order changing.
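To illustrate why pointing at the bare domain name helps, a sketch of the client-side failover it enables: resolve the name to every address it has (with AD-integrated DNS, typically one A record per domain controller) and use the first one that accepts a connection on the LDAP port. This is a Python illustration under that assumption, not how the .NET ActiveDirectoryMembershipProvider is implemented internally; Python's own socket.create_connection follows the same try-each-address pattern.

```python
import socket


def connect_first_available(domain, port=389, timeout=3.0):
    """Return a connected socket to the first address of `domain` that answers.

    With AD-integrated DNS the bare domain name (e.g. domain.local) usually
    resolves to every domain controller, so one DC rebooting is skipped over.
    """
    errors = []
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
        domain, port, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock  # caller is responsible for closing it
        except OSError as exc:
            sock.close()
            errors.append((sockaddr, exc))
    raise ConnectionError(f"no reachable address for {domain}: {errors}")
```

With a hypothetical domain.local, connect_first_available("domain.local") would keep working while one DC reboots, as long as at least one other DC answers within the timeout.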
