Cross-platform Agent VIP Authentication Error - volttron

I am having trouble getting two agents to communicate across platforms.
I have two virtual machines running on an internal network, and one of the VMs has an agent that attempts to connect and publish to the platform on the other VM. The code for the connection and send is the same as in examples like the ForwarderAgent. I know the agents can see each other and attempt to connect, but the authentication fails.
On the platform I am trying to connect to, I can see the credentials that the publishing agent is presenting itself with. However, the presented credentials are a private key that is generated in
$VOLTTRONHOME/keystores/
every time I start the agent, so the credentials change on every restart.
I am unsure how I can add the agent as a known identity beforehand if I don't know the credentials it will try to use.
I have added the different addresses as known_hosts, and attempted to register the agents between the two platforms using the public keys associated with their agent installations with
volttron-ctl auth add
but the sending agent still presents itself with new credentials. Is there a configuration step I am missing so that the agent will publish with a consistent public key?

When creating an agent that connects to an external platform from within an installed agent, you can use the following as a guideline:
````
import gevent
from volttron.platform.vip.agent import Agent
destination_vip="tcp://127.0.0.5:22916?serverkey=dafn..&publickey=adf&secretkey=afafdf"
event = gevent.event.Event()
# Note by specifying the identity, the remote platform will use the same
# keystore to authenticate the agent. Otherwise a guid is used which
# changes keys each time.
agent = Agent(address=destination_vip, enable_store=False, identity="remote_identity")
# Pass the event so it is set once the agent has started.
gevent.spawn(agent.core.run, event)
if not event.wait(timeout=10):
    print("Unable to start agent!")
````
Note this was adapted from https://github.com/VOLTTRON/volttron/blob/master/services/core/ForwardHistorian/forwarder/agent.py#L317; however, there is a different mechanism in develop that doesn't require the public and secret keys to be included in the destination_vip address.
In addition, the publickey that you mention in the above code does need to be in the remote platform's auth.json file, and/or you need to allow all connections via /.*/ in auth.json.
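For reference, a minimal allow entry might look roughly like the excerpt below. The exact fields vary between VOLTTRON releases (older releases prefix the key with CURVE:, newer ones split out a separate mechanism field), so it is safest to let volttron-ctl auth add write the entry rather than editing auth.json by hand; the key and user_id below are placeholders only.
````
{
  "allow": [
    {
      "credentials": "<public key of the connecting agent>",
      "user_id": "remote_identity"
    }
  ]
}
````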
I hope this helps!
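As a follow-up sketch (not part of the original answer): once the agent is started with a fixed identity as above, its keypair persists under $VOLTTRON_HOME/keystores/<identity>/keystore.json, so the public key that has to go into the remote auth.json can be read out before the first connection. The default home path and the "public"/"secret" field names are assumptions about a standard install:
````
import json
import os


def agent_public_key(identity,
                     volttron_home=os.environ.get("VOLTTRON_HOME",
                                                  os.path.expanduser("~/.volttron"))):
    """Return the persistent public key stored for the given VIP identity.

    Assumes the keystore layout described above; adjust the path if your
    agent's keystore lives elsewhere (e.g. inside the agent's install dir).
    """
    keystore_path = os.path.join(volttron_home, "keystores", identity, "keystore.json")
    with open(keystore_path) as keystore_file:
        return json.load(keystore_file)["public"]


if __name__ == "__main__":
    # "remote_identity" matches the identity passed to Agent(...) above.
    print(agent_public_key("remote_identity"))
````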

Related

How to Delegate Credentials through double hop to SQL Server?

What I am trying to do:
We have a Task Scheduler that kicks off an EXE which, in the course of its runtime, will connect to SQL Server.
So that would be:
taskServer.myDomain triggers the Task Scheduler action
taskServer.myDomain exe runs locally
taskServer.myDomain initiates a connection to sqlServer.myDomain
The scheduled task is associated with a service account (svc_user) that is set to run with highest privilege, run whether the user is logged in or not, and store credentials for access to non-local resources.
The actual behavior
What we are seeing is the Task Scheduler is indeed running as svc_user. It triggers the EXE as expected, and the EXE is also running as svc_user. When the EXE initiates a connection to SQL Server, it errors on authentication.
Looking at the Event Viewer we can see the failure trying to initialize the connection to SQL
Exception Info: System.Data.SqlClient.SqlException
at System.Data.SqlClient.SqlInternalConnectionTds..ctor(System.Data.ProviderBase.DbConnectionPoolIdentity, System.Data.SqlClient.SqlConnectionString, System.Data.SqlClient.SqlCredential, System.Object, System.String, System.Security.SecureString, Boolean, System.Data.SqlClient.SqlConnectionString, System.Data.SqlClient.SessionData, System.Data.ProviderBase.DbConnectionPool, System.String, Boolean, System.Data.SqlClient.SqlAuthenticationProviderManager)
And then looking at the SQL Server logs we can see the root of the issue
Logon,Unknown,Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. Reason: Could not find a login matching the name provided.
The connection initialized by the EXE to SQL Server is trying to authenticate as ANONYMOUS LOGON.
What I have tried
Background
This issue popped up when our IT team started deploying a GPO lockdown in our environments. So in order to get to this point, we first had to add some GPO exceptions to allow the svc_user to:
log on locally
log on as batch job
Progress?
This is where we started being able to capture the ANONYMOUS LOGON error in SQL Server. From there we tried a handful of other GPO exceptions including
Allow Credential Save
Enable computer and user accounts to be trusted for delegation
The actual issue?
So it would appear that this is a double-hop delegation issue, which eventually led me here and then, via the answer, here and here.
So I tried adding GPO policies to allow delegating fresh credentials using the WSMAN/* protocol + wildcard.
Two issues with this:
the "Fresh credentials" setting refers to prompted credentials, whereas the EXE runs as a scheduled task during off-hours and inherits its credentials from the Task Scheduler
the WSMAN protocol appears to be used for remote PowerShell sessions (per the original question in the serverfault post), not SQL Server connections.
So, I added the protocol MSSQLSvc/* to the enabled delegation and tried all permutations of Fresh, Saved, and Default delegation. (This was all done in Local Computer Policy -> Computer Configuration -> Administrative Templates -> System -> Credentials Delegation.)
Where it gets weird
We have another server, otherServer.myDomain, which we set up with the same scheduled task. It is set up with the same GPO memberships, but it is able to connect to SQL Server successfully. AFAIK, the servers are identical in setup and configuration.
The Present
I have done a bit more digging anywhere I could think of that might offer clues as to how I can feed the credentials through, or where they might be falling through, including watching the traffic between taskServer and sqlServer as well as between otherServer and sqlServer.
I was able to see NTLM challenges coming from the sqlServer to the taskServer/otherServer.
In the case of taskServer, the NTLM response only has a workstationString=taskServer
On otherServer, the NTLM response has workstationString=otherServer, domainString=myDomain, and userString=svc_user.
Question
What is the disconnect between hop 1 (task scheduler to EXE) and hop 2 (EXE to SQL on sqlServer)? And why does this behavior not match between taskServer and otherServer?
So I finally have an update/solution for this post.
The crux of the issue was a missing SPN. The short answer:
Add an SPN for sqlServer associated with the service account SQL services are running as (not the svc_user)
example: SetSPN -S MSSQLSvc/sqlServer.myDomain myDomain\svc_sql_user
Add another SPN like the one above, but with the SQL service port
example: SetSPN -S MSSQLSvc/sqlServer.myDomain:1433 myDomain\svc_sql_user
Set the SQL service user account to allow delegation (on the Delegation tab of the account in Active Directory Users and Computers)
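As a sanity check before restarting the SQL Server service, the registrations can be confirmed with the same SetSPN tool (names are the placeholders from this post):
SetSPN -L myDomain\svc_sql_user
SetSPN -Q MSSQLSvc/sqlServer.myDomain:1433
The first lists every SPN registered to the account; the second looks up which account currently holds that specific SPN.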

Why my Windows service only establishes connection with database when SQL Server Service runs under Local System account?

My Windows service is using integrated authentication and running under the Local System account, and I got the exception below.
The target principal name is incorrect. Cannot generate SSPI context.
The SQL Server service is running under a domain admin user, e.g. "domain\administrator". If I change the SQL Server service to run under the Local System account, it fixes the above error.
Can anyone explain why it's happening like this? We have an InstallShield wizard which installs our application on the client side; I don't know how we can handle this behavior through the wizard. Changing the user for the SQL Server service is not realistic either, because the client may not allow it.
Note: once my Windows service is working and I revert SQL Server to run under the admin account, my service still runs fine. I guess some permissions get set for the Local System account.
Before that, I ran the Kerberos tool, which generated the following script to run, and that fixed the issue. After this it was no longer required to change the user for the SQL Server service.
SetSPN -d "MSSQLSvc/FQDN" "domain\machine$"
SetSPN -s "MSSQLSvc/FQDN" "domain\administrator"
Please explain why it's happening and what is the best way to handle the situation?
When running under the Local System account, SQL Server automatically registers an SPN in Active Directory for every service it controls, and attempts to unregister them when the service shuts down. The Local System account can communicate over the network as the computer account, and can therefore tell Active Directory when to make changes about itself and about the SPNs the SQL service wants to register. When you change the SQL Server service over to an AD domain user account, the Local System account immediately loses that ability; therefore you must manually delete the existing SPNs previously registered for that SQL service by Local System before registering new ones.
You should now notice why it's nice that the generated script helpfully calls for a deletion of the old SPN followed by the registration of a new one, in order to prevent issues. When this isn't done properly, you'll get an authentication error when Kerberos clients obtain a ticket for the old, invalid SPN, because it was never deleted, and any Kerberos-aware service will always reject a ticket for a wrong SPN. After you make SPN changes, always be sure to restart the SQL Server service, and right after that, if you're testing with a user, have that user log out and log back in. This answers your main question here.
Please see this Microsoft document for further reading on the subject: Register a Service Principal Name for Kerberos Connections. There's also a very good YouTube video on this exact problem; that's where I learned about it and how to resolve it. Ignore "SSRS" in the title; I've watched it in its entirety and the guidance applies to any and all SQL services which have SPNs.
You had a secondary question at the very end of your question regarding what is the best way to handle the situation. If you're talking about solving it programmatically that would be very difficult to answer as all environments are different in some way and you will come across SQL instances running in all sorts of different security contexts. In an online forum like this you would probably get different answers from different people. If this were your only question, I think it would get closed by the moderators for "being primarily opinion-based" and likely to attract spam answers. I would suggest you incorporate some kind of guidance about the problem in some form of a Readme file that you should package with the InstallShield wizard.
Side note: I think you should add the kerberos tag to this question - as SPNs are relevant to Kerberos only - and not to any other authentication protocol.

WiX issue with executing SqlScript at the remote DB

Executing SqlScript at the remote DB causes an error:
Failed to connect to SQL database. (-2147467259 myDB1)
The SqlScript is the following:
<sql:SqlString
Id='UpdateSomething1'
SqlDb='myDB1'
ExecuteOnInstall='yes'
User='SQLUser'
ContinueOnError='no'
ExecuteOnReinstall='no'
ExecuteOnUninstall='no'
Sequence='26'
SQL='[SqlString]'/>
where the Db is:
<sql:SqlDatabase
Id='myDB1'
Database='myDB1'
Server='[DATABASE_SERVER]'
CreateOnInstall='yes'
DropOnInstall='no'
DropOnUninstall='no'
ContinueOnError='no'/>
and the user is:
<util:User
Id="SQLUser"
Name="myUserName1"
Password="password1"/>
The problem does not occur with the local DB.
We extracted a more specific error message from the IP traffic (the actual error that the remote MSSQL server throws):
Cannot open database "myDb1" requested by the login. The login failed.
{remote machine name} Login failed for user {user name}
Thank you for any help and information.
Max
I would need more information to be sure but here are some general observations I've had over the years.
In MSI, you typically run deferred custom actions with no impersonation so that they run as Administrator to support managed/elevated installs where the invoking user doesn't have admin either because they really don't or because UAC hasn't elevated their process.
In InstallShield, and I'm sure WiX is similar, this typically causes a problem for remote database connections. If you have a dialog in the UI sequence to test the connection, it will succeed (when expected to) because the interactive user has permissions to that database/instance. And if installing locally it will succeed because SYSTEM (typically) has permissions to the database/instance. But when installing to a remote instance it will frequently fail because SYSTEM can't authenticate against SQL on the remote machine. Your mileage will improve if using SQL authentication (e.g. SA).
Personally I have some practices that I follow. If I'm creating a single tier system, I restrict the database to (local). If I'm creating a 2 tier system, I create two installers: one for my database layer which I restrict to (local) and one for my application layer which I then reuse the sqllogin dialog to verify connectivity and write the values out to a web.config or app.config. This allows me to loosely couple the layers and service them independently of each other.
I hope this helps you understand the types of issues that can be encountered. I don't know your exact problem without seeing your environment.
The WiX custom actions are just using standard OLEDB commands to connect to the remote server. If the credentials work locally but not remotely then I'd start by ensuring the credentials are correct. There isn't anything different in the WiX custom actions between local and remote servers.
Looking at your database element I would say that you have not added the User attribute to the sql:SqlDatabase so it is creating the database impersonating the current user.
Try:
<sql:SqlDatabase
Id='myDB1'
Database='myDB1'
Server='[DATABASE_SERVER]'
User='SQLUser'
CreateOnInstall='yes'
DropOnInstall='no'
DropOnUninstall='no'
ContinueOnError='no' />

VB.Net Secure Passwords to Database?

I recently made a small app for a friend and then made it a public app; in doing so, I forgot that it connects to my MS SQL DB and checks for values. Someone used Red Gate .NET Reflector to get my password and destroy it all. I've contacted their ISP and they are looking into it; apparently this person has a static IP with them.
So this is a lesson learned at a heavy price for me. How can I prevent this from happening again? How can I get away from the unsafe connection string they were able to use?
Never hard code connection strings. Use the configuration section provided for it (connectionStrings), and if really paranoid, encrypt it.
If you are using a shared database, you should not even have a connection string on the client, but create a service point (for example a webservice) that will connect to the database on their behalf. The client can connect to this and your connection string is safe behind your service, which is in your control and on your server.
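To illustrate the first point, here is a rough sketch of keeping the string in app.config instead of in compiled code (the name and values are placeholders, not from the original app), read at runtime via ConfigurationManager.ConnectionStrings("MainDb"):
````
<configuration>
  <connectionStrings>
    <!-- placeholder entry; prefer integrated security or per-user logins over a shared password -->
    <add name="MainDb"
         connectionString="Data Source=myServer;Initial Catalog=myDb;Integrated Security=True"
         providerName="System.Data.SqlClient" />
  </connectionStrings>
</configuration>
````
The section can additionally be encrypted with .NET protected configuration, but as the other answers note, anything the client can decrypt, a determined attacker can too.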
Don't expose a database connection, but have your app communicate through a webservice, or similar, that only has methods and privileges, to do what the app needs.
If you absolutely need the database connection, make sure the user only has read permissions on the database.
Encrypting the connection string is a start, but your program will have to know how to decrypt it for it to be useful. If your program can decrypt it, an attacker will also be able to - you can only affect the amount of work he needs to put in it.
Therefore, in my opinion, you should expose a read-only service.
If it's a public app, you need to provide individual logins for each user or have a proxy sitting between the database and the application which authenticates the users and talks to the database.
Encrypting the connection string wouldn't help much, I think it can be easily decrypted with built-in tools or with Crack.net.
If you're suuuper paranoid, prompt the sysadmin for the password each time the application starts (maybe an admin interface.) That way it's only memory resident.
I love this question. Like driis said, even with encrypted connection strings you need to store a password (or key, or whatever) to decrypt your encrypted connection string. Just more layers of the same problem.
Using connection strings and encrypted sections in your config won't stop this type of attack; it's only designed to make the config file unreadable on a machine other than the one on which it is installed.
The only safe way is to create a web service that connects to your database to retrieve the data, and then make sure that the web service logon only has the minimum permissions required, or force the user of the web service to log on and then impersonate that user for the database connection.
It appears you don't have firewall protection to stop external connections directly to your database, so I wonder what other, even more dangerous, ports you may have exposed to the internet.
Using a firewall to limit access to your server to the HTTP and HTTPS protocols would reduce the chances of a successful attack.

ActiveDirectory Provider fail over Best Practices

ActiveDirectory Server 2003
I am using the ActiveDirectoryMembershipProvider and ADroleProvider. They work great, until my Active Directory server restarts in the middle of the day to get updates (I'm not in charge of the server and can't change this). When this happens, for the five minutes the server is rebooting, my users can't use my website, because I've tied my menu to the role provider. So, here are my questions:
Is it possible to tell my RoleProvider to use the "next" available ADS? If so, how? That way, while the initial one reboots, I won't frustrate my users with ADS connection messages.
Should I be using some kind of connection pool that automatically reconnects to the available server? If so, how?
Let's imagine that all my active directory servers go down. Is there a way to keep my web application running? Obviously there are bigger problems if all servers are down, but what I'm after is a possible "disconnected" active directory authentication that will still move forward if the server somehow goes kaput. Is this wise AND possible?
You probably have the server connection string set to "server01.domain.local". If you change it to just "domain.local" you're no longer depending on "server01" being online. Instead you will use the Round Robin feature of Active Directory DNS to get a list of all domain controllers and use one that's online. (I don't think your admins reboot all of the domain controllers at the same time...)
Also try running nslookup domain.local a couple of times in succession in a command prompt to see the order changing.
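For example, if the providers reference a connection string in web.config, the fix is just to point it at the domain rather than a specific controller (the names here are illustrative):
````
<connectionStrings>
  <!-- The bare domain name resolves to whichever domain controllers are online -->
  <add name="ADService"
       connectionString="LDAP://domain.local/DC=domain,DC=local" />
</connectionStrings>
````
The membership provider then references this entry via its connectionStringName attribute.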
