Kinesis Data Analytics (Flink) - how do I configure environment variables? - apache-flink

We are currently running a Flink cluster in standalone mode on Kubernetes, and we want to explore migrating to managed Flink on AWS (Kinesis Data Analytics, KDA).
However, I can't find any documentation or indication that it is possible to inject environment variables. Do these need to be provided as runtime arguments?
Relatedly, is it possible in managed Flink to override the default Flink configuration that we currently specify in our flink-conf.yml?
Thanks in advance!

I'll answer my own question: it seems there is no way to provide environment variables the way you would with ConfigMaps in Kubernetes. Instead, we need to use the Runtime properties that can be defined in KDA. These can then be retrieved with KinesisAnalyticsRuntime.getApplicationProperties().
As an example:
import org.apache.flink.api.java.utils.ParameterTool
import com.amazonaws.services.kinesisanalytics.runtime.KinesisAnalyticsRuntime
import scala.collection.JavaConverters._

val params: ParameterTool = ParameterTool.fromArgs(args)

val config = params.get("env", "") match {
  case "local" => AppConfiguration.initialize(sys.env)
  case _ => // running on KDA
    val kdaProperties = KinesisAnalyticsRuntime.getApplicationProperties()
    logger.error(
      s"kdaProperties $kdaProperties",
      Some(Map("kda" -> kdaProperties))
    )
    // "DevProperties" is the property group ID configured for the application in KDA
    Option(kdaProperties.get("DevProperties")) match {
      case Some(devProperties) =>
        val kdaPropsToMap = devProperties.asScala.toMap
        AppConfiguration.initialize(kdaPropsToMap)
      case None =>
        logger.error("could not read KDA runtime properties", Some(Map("kda" -> kdaProperties)))
        throw new Error(
          "unable to read KDA runtime properties"
        ) // scalafix:ok
    }
}
Here, "DevProperties" is the property group ID (the grouping key) defined for the Runtime properties in KDA, and it is what is used to fetch them.
This also means that, as I understand it, settings we currently keep in flink-conf.yml would have to be expressed as Runtime properties and then applied by the application itself at runtime.
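As a concrete, if hypothetical, sketch of that idea (the property group name "FlinkConfOverrides" and the key names below are my own assumptions, not anything KDA defines), selected settings can be read from a property group at startup and applied to the execution environment:
import com.amazonaws.services.kinesisanalytics.runtime.KinesisAnalyticsRuntime
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import scala.collection.JavaConverters._

val env = StreamExecutionEnvironment.getExecutionEnvironment

// apply overrides only if the (hypothetical) property group exists
Option(KinesisAnalyticsRuntime.getApplicationProperties().get("FlinkConfOverrides"))
  .map(_.asScala)
  .foreach { overrides =>
    overrides.get("checkpoint.interval.ms").foreach(ms => env.enableCheckpointing(ms.toLong))
    overrides.get("parallelism").foreach(p => env.setParallelism(p.toInt))
  }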

Related

Is there a way to limit the members of a Camel cluster (using KubernetesClusterService) with a pod selector?

I am programmatically creating my CamelContext and successfully creating the ClusterServiceProvider using the KubernetesClusterService implementation. When running in my Kubernetes cluster, it elects a leader and responds appropriately to "master:" routes. All good.
However, I would like to limit the Pods/Deployments that are detected in the cluster member negotiation/inspection. It currently finds every Pod in the cluster namespace, which includes completely unrelated deployments/instances.
The overall question is: how do I select which Pods/Deployments should be included in a particular Camel cluster?
I see that KubernetesLockConfiguration has an attribute called clusterLabels, but it is unclear to me how it is used or what for. When I set clusterLabels to something syntactically common in Kubernetes (e.g. app -> my-app), the cluster finds no members.
I mention that I am doing this programmatically because there is no Spring Boot or other commonly documented Camel configuration involved; this runs in Play Framework (Scala).
First, I create a CamelContext:
val context = new DefaultCamelContext()
val rb = new RouteBuilder() {
  def configure() = {
    val policy = ClusteredRoutePolicy.forNamespace("default")
    from(s"master:ip:timer://master-timer?fixedRate=true&period=60000")
      .routePolicy(policy)
      .bean(classOf[MasterTimer], "execute()")
      .log("Current Leader ${routeId}")
  }
}
Second, I create a ClusterServiceProvider:
import org.apache.camel.component.kubernetes.cluster.KubernetesClusterService
import scala.collection.JavaConverters._

val crc = new ClusteredRouteController()
val service = new KubernetesClusterService
service.setCamelContext(context)
service.setKubernetesNamespace("default")
//
// If I set clusterLabels here, no Camel cluster is realized/created.
// My assumption is that my syntax for camel-kubernetes is wrong, but
// it is unclear from the documentation how to make this work.
//
// If I do not set clusterLabels, every pod in my Kubernetes cluster
// becomes a cluster member (CamelKubernetesLeaderNotifier logs that
// the list of cluster members has changed). So I get completely
// unrelated deployments in the context of something I want to scope
// specifically, namely all pods with an "app" label of "my-app".
//
service.setClusterLabels(Map("app" -> "my-app").asJava)
crc.setNamespace("default")
crc.setClusterService(service)
context.addService(service)
context.setRouteController(crc)
context.start()
context.getRouteController().startAllRoutes()

Using Dispatcher to invoke on another thread - Config/Settings is not thread-safe in .NET Core 3.1

Issue:
I am trying to update the GUI from a second thread.
Unfortunately the label stays the same. At first I thought there was an issue with the dispatcher, but it works fine. It appears that the configuration is not thread-safe!
Code to reproduce:
I have a settings file which is mainly used to keep variables persistent across application relaunches.
This is the update code:
// get the amount of tickets created for me last week
int amountOftickets = JiraInterface.DayStatisticsGenerator.GetLastWeeksTicketAmount();
config.Default.Lastweekstickets = amountOftickets; // int == 12;
// Update GUI on GUI thread
mainWindow.Dispatcher.Invoke(() =>
{
mainWindow.SetIconsAccordingtoConfig();
mainWindow.NumberTicketsCreated.Content = config.Default.Lastweekstickets.ToString(); // int == 0!!
});
Does anyone have an idea how to get the updated configuration from the thread that changed it over to the GUI thread?
After a quick look at the documentation, it seems you have two options:
1. Use Dispatcher.Invoke to set your config.Default on the GUI thread (sketched below).
2. Both .NET Core and .NET Framework seem to support a synchronized SettingsBase, which is what the configs are ( https://learn.microsoft.com/en-us/dotnet/api/system.configuration.settingsbase.synchronized?view=netcore-3.0 ).
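For the first option, here is a minimal sketch using the names from the question (config, mainWindow and SetIconsAccordingtoConfig are the asker's own members): read the value on the worker thread, then perform both the settings write and the UI update inside Dispatcher.Invoke so the GUI thread sees the same value.
// read on the worker thread
int amountOfTickets = JiraInterface.DayStatisticsGenerator.GetLastWeeksTicketAmount();

// write the setting and update the UI on the GUI thread
mainWindow.Dispatcher.Invoke(() =>
{
    config.Default.Lastweekstickets = amountOfTickets;
    mainWindow.SetIconsAccordingtoConfig();
    mainWindow.NumberTicketsCreated.Content = amountOfTickets.ToString();
});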
EDIT 1:
Looking into the C# settings further: if you look at your config file in Visual Studio under
Properties -> Config.settings -> Config.Designer.cs
you'll see that during class initialization .NET Framework already wraps the instance with Synchronized, which should make the config thread-safe:
private static Config defaultInstance = ((Config)(global::System.Configuration.ApplicationSettingsBase.Synchronized(new Config())));
It could be that when your file was created, Synchronized wasn't used, so your config isn't thread-safe, especially if it was created in code rather than via the designer/properties.
From Microsoft Docs on SettingsBase Synchronized:
The indexer will get and set property data in a thread-safe manner if the IsSynchronized property is set to true. A SettingsBase instance by default is not thread-safe. However, you can call Synchronized passing in a SettingsBase instance to make the SettingsBase indexer operate in a thread-safe manner.

Need to supply DB password to run evolutions at run time - Play + Slick

I need to avoid storing plain text passwords in config files, and so I'm storing the Postgres password externally (in AWS Secrets Manager).
Similar to the solution provided in Encrypted database password in Play + Slick + HikariCP application, I've been able to override dbConfig and supply the password to my DAO classes like this:
trait MyDaoSlick extends MyTableDefinitions with HasDatabaseConfig[MyPostgresDriver] {
  protected val dbConfigProvider: DatabaseConfigProvider

  override protected val dbConfig: DatabaseConfig[MyPostgresDriver] = secretDbConfig(dbConfigProvider)

  def secretDbConfig(dbConfigProvider: DatabaseConfigProvider): DatabaseConfig[MyPostgresDriver] = {
    DatabaseConfig.forConfig[MyPostgresDriver]("", dbConfigProvider.get[MyPostgresDriver].config
      .withValue("db.user", ConfigValueFactory.fromAnyRef(getUN))
      .withValue("db.password", ConfigValueFactory.fromAnyRef(getPWD)))
  }
}
This works great for regular DB queries; however, evolutions bypass it and still expect the username and password to be in application.conf, which kind of defeats the purpose of keeping the password secret.
Any advice on how evolutions could get the DB credentials from a function?
I ran into the same issue, and I managed to resolve it like this:
Create a custom application loader, as shown here: https://www.playframework.com/documentation/2.7.x/ScalaDependencyInjection#Advanced:-Extending-the-GuiceApplicationLoader
Inside the custom loader's builder, append the DB configuration parameters for Slick:
val extra = Seq(
  "slick.dbs.default.db.url" -> secrets.url,
  "slick.dbs.default.db.user" -> secrets.user,
  "slick.dbs.default.db.password" -> secrets.pass
)
Nothing else needs to be changed, as you've basically added the configuration needed for anything Slick, evolutions included.
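For reference, a minimal sketch of such a loader (the secrets lookup is a placeholder for whatever fetches the credentials from AWS Secrets Manager; the builder wiring follows the linked Play documentation):
import play.api.{ApplicationLoader, Configuration}
import play.api.inject.guice.{GuiceApplicationBuilder, GuiceApplicationLoader}

class CustomApplicationLoader extends GuiceApplicationLoader {
  override def builder(context: ApplicationLoader.Context): GuiceApplicationBuilder = {
    val secrets = fetchDbSecretsFromSecretsManager() // placeholder, not a real API
    val extra = Configuration(
      "slick.dbs.default.db.url"      -> secrets.url,
      "slick.dbs.default.db.user"     -> secrets.user,
      "slick.dbs.default.db.password" -> secrets.pass
    )
    initialBuilder
      .in(context.environment)
      .loadConfig(extra.withFallback(context.initialConfiguration))
      .overrides(overrides(context): _*)
  }
}
The loader is then registered in application.conf with play.application.loader = "CustomApplicationLoader".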
On older versions of Play, we used to do this inside GlobalSettings.onLoadConfig, but, at some point, that has been deprecated in favour of DI. More details here: https://www.playframework.com/documentation/2.7.x/GlobalSettings

Grails 2.4.4 Multiple datasources, separate drivers, IntegrationSpec

I am attempting to use multiple datasources in a Grails 2.4.4 project. According to the docs, this should be possible:
http://www.grails.org/doc/2.4.4/guide/conf.html#multipleDatasources
My primary dataSource (the one I want to use for all domain classes) is using H2 at the moment, as configured by the default DataSource.groovy configuration. My second, read-only datasource is SQL Server, and I tried to declare it as follows at the top level of my DataSource.groovy config (shared by all environments):
ds {
    pooled = true
    dialect = "org.hibernate.dialect.SQLServer2008Dialect"
    driverClassName = "net.sourceforge.jtds.jdbc.Driver"
    url = "jdbc:jtds:sqlserver://myserver:1433/mydb;domain=mydomain;useNTLMv2=true;user=myuser"
    dbCreate = "none"
}
(Don't let the URL throw you off - I'm just having to use Windows Auth with JTDS. I've tested this via third-party clients as well.)
I inject this into my service class and use it, and everything appears to hook up well:
def dataSource_ds

def serviceMethod() {
    Sql ds = new Sql(dataSource_ds)
    String query = "SELECT ... "
    def results = ds.rows(query)
    println "Results are ${results.size()}"
    return "Some value"
}
But when I tried to access this from an IntegrationSpec-backed integration test, I got "schema not found" errors for valid schemas referred to by my query string, such as "dbo". The stack trace of any error from this setup looks like this:
org.h2.jdbc.JdbcSQLException: Schema "DBO" not found; SQL statement:
...
at org.h2.message.DbException.getJdbcSQLException(DbException.java:329)
at org.h2.message.DbException.get(DbException.java:169)
at org.h2.message.DbException.get(DbException.java:146)
at org.h2.command.Parser.readTableOrView(Parser.java:4774)
at org.h2.command.Parser.readTableFilter(Parser.java:1083)
at org.h2.command.Parser.parseSelectSimpleFromPart(Parser.java:1689)
at org.h2.command.Parser.parseSelectSimple(Parser.java:1796)
at org.h2.command.Parser.parseSelectSub(Parser.java:1683)
at org.h2.command.Parser.parseSelectUnion(Parser.java:1526)
at org.h2.command.Parser.parseSelect(Parser.java:1514)
at org.h2.command.Parser.parsePrepared(Parser.java:404)
at org.h2.command.Parser.parse(Parser.java:278)
at org.h2.command.Parser.parse(Parser.java:250)
at org.h2.command.Parser.prepareCommand(Parser.java:217)
at org.h2.engine.Session.prepareLocal(Session.java:414)
at org.h2.engine.Session.prepareCommand(Session.java:363)
...
Now why would THIS datasource be trying to use the H2 driver?
In case it's relevant, my Integration test looks like this:
void "serviceMethod" () {
when: "service method is called"
String response = myService.serviceMethod()
then: "we should get the appropriate text back"
response.equals("Some value")
}
If, in the service class, I hard-code the connection using a constructor of the Groovy Sql object, the integration test works fine, and any stack traces go through the JTDS driver. But when I try to use the injected datasource, things are strange.
Any idea what I'm doing wrong here?
Just to close the loop on this and hopefully save someone pain on this oversight in the future:
Grails uses an in-memory database when running tests. Make sure to read up on the other differences between integration tests and production here:
http://www.grails.org/doc/latest/guide/testing.html#integrationTesting
This behaviour makes using external (read-only) datasources during tests tricky, but some of that is to be expected (a test that depends on an external datasource is not a very good test in the long run). I hope to refactor my app and its testing approach at some point (e.g., to use a simple DAO and mock that during the test, as sketched below), because I don't really care about asserting the contents of the external datasource from my app's tests.
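A rough sketch of that refactor (all of the names below are hypothetical): move the raw Sql call into a small collaborator, then replace it in the test so the query never reaches a database at all.
// Hypothetical DAO that owns the external, read-only query
class ExternalReportingDao {
    def dataSource_ds   // the second datasource gets injected here instead of into the service

    List fetchRows(String query) {
        new groovy.sql.Sql(dataSource_ds).rows(query)
    }
}

// The service delegates to the DAO instead of building Sql itself
class MyService {
    def externalReportingDao

    String serviceMethod() {
        def results = externalReportingDao.fetchRows("SELECT ... ")
        println "Results are ${results.size()}"
        return "Some value"
    }
}

// In the IntegrationSpec, swap in a stub so the test never touches SQL Server (or H2)
def setup() {
    myService.externalReportingDao = new ExternalReportingDao() {
        List fetchRows(String query) { [[someColumn: 'stubbed']] }
    }
}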

Is there any way to trace/log the SQL using Dapper?

Is there a way to dump the generated SQL to the Debug log or something? I'm using it in a WinForms solution, so the mini-profiler idea won't work for me.
I had the same issue and, after searching and finding nothing ready to use, implemented some code myself. There is a package on NuGet, MiniProfiler.Integrations, that I would like to share.
Update V2: it now works with other database servers; for MySQL it requires MiniProfiler.Integrations.MySql.
Below are the steps to work with SQL Server:
1. Instantiate the connection:
var factory = new SqlServerDbConnectionFactory(_connectionString);
using (var connection = ProfiledDbConnectionFactory.New(factory, CustomDbProfiler.Current))
{
    // your code
}
2. After all the work is done, write all commands to a file if you want:
File.WriteAllText("SqlScripts.txt", CustomDbProfiler.Current.ProfilerContext.BuildCommands());
Dapper does not currently have an instrumentation point here. This is perhaps due, as you note, to the fact that we (as the authors) use mini-profiler to handle this. However, if it helps, the core parts of mini-profiler are actually designed to be architecture neutral, and I know of other people using it with winforms, wpf, wcf, etc - which would give you access to the profiling / tracing connection wrapper.
In theory, it would be perfectly possible to add some blanket capture-point, but I'm concerned about two things:
(primarily) security: since dapper doesn't have a concept of a context, it would be really really easy for malign code to attach quietly to sniff all sql traffic that goes via dapper; I really don't like the sound of that (this isn't an issue with the "decorator" approach, as the caller owns the connection, hence the logging context)
(secondary) performance: but... in truth, it is hard to say that a simple delegate-check (which would presumably be null in most cases) would have much impact
Of course, the other thing you could do is: steal the connection wrapper code from mini-profiler, and replace the profiler-context stuff with just: Debug.WriteLine etc.
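If a full connection wrapper is more than you need, a bare-bones variant of the same Debug.WriteLine idea is to intercept the SQL at the call site with a small extension method; a hedged sketch (none of this ships with Dapper):
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;
using Dapper;

public static class TracedDapperExtensions
{
    // Writes the statement to the Debug output, then delegates to Dapper's Query<T>.
    public static IEnumerable<T> TracedQuery<T>(this IDbConnection connection, string sql, object param = null)
    {
        Debug.WriteLine($"[dapper] {sql}");
        return connection.Query<T>(sql, param);
    }
}
The trade-off is that call sites have to opt in by calling TracedQuery instead of Query, whereas the wrapper approach captures everything that goes through the connection.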
You should also consider SQL Server Profiler, found in SQL Server Management Studio under Extras → SQL Server Profiler (no Dapper extensions needed; this may also work with other RDBMSs if they have a SQL profiler tool).
Then, start a new session.
You'll get something like this for example (you see all parameters and the complete SQL string):
exec sp_executesql N'SELECT * FROM Updates WHERE CAST(Product_ID as VARCHAR(50)) = @appId AND (Blocked IS NULL OR Blocked = 0)
AND (Beta IS NULL OR Beta = 0 OR @includeBeta = 1) AND (LangCode IS NULL OR LangCode IN (SELECT * FROM STRING_SPLIT(@langCode, '','')))',N'@appId nvarchar(4000),@includeBeta bit,@langCode nvarchar(4000)',@appId=N'fea5b0a7-1da6-4394-b8c8-05e7cb979161',@includeBeta=0,@langCode=N'de'
Try Dapper.Logging.
You can get it from NuGet. The way it works is you pass your code that creates your actual database connection into a factory that creates wrapped connections. Whenever a wrapped connection is opened or closed or you run a query against it, it will be logged. You can configure the logging message templates and other settings like whether SQL parameters are saved. Elapsed time is also saved.
In my opinion, the only downside is that the documentation is sparse, but I think that's just because it's a new project (as of this writing). I had to dig through the repo for a bit to understand it and to get it configured to my liking, but now it's working great.
From the documentation:
The tool consists of simple decorators for the DbConnection and DbCommand which track the execution time and write messages to the ILogger<T>. The ILogger<T> can be handled by any logging framework (e.g. Serilog). The result is similar to the default EF Core logging behavior.
The lib declares a helper method for registering the IDbConnectionFactory in the IoC container. The connection factory is SQL Provider agnostic. That's why you have to specify the real factory method:
services.AddDbConnectionFactory(prv => new SqlConnection(conStr));
After registration, the IDbConnectionFactory can be injected into classes that need a SQL connection.
private readonly IDbConnectionFactory _connectionFactory;

public GetProductsHandler(IDbConnectionFactory connectionFactory)
{
    _connectionFactory = connectionFactory;
}
The IDbConnectionFactory.CreateConnection will return a decorated version that logs the activity.
using (DbConnection db = _connectionFactory.CreateConnection())
{
    //...
}
This is not exhaustive and is essentially a bit of a hack, but if you have your SQL and want to see how its parameters were initialized, it's useful for basic debugging. Set up this extension method, then call it anywhere as desired.
public static class DapperExtensions
{
    public static string ArgsAsSql(this DynamicParameters args)
    {
        if (args is null) throw new ArgumentNullException(nameof(args));
        var sb = new StringBuilder();
        foreach (var name in args.ParameterNames)
        {
            var pValue = args.Get<dynamic>(name);
            var type = pValue.GetType();
            if (type == typeof(DateTime))
                sb.AppendFormat("DECLARE @{0} DATETIME ='{1}'\n", name, pValue.ToString("yyyy-MM-dd HH:mm:ss.fff"));
            else if (type == typeof(bool))
                sb.AppendFormat("DECLARE @{0} BIT = {1}\n", name, (bool)pValue ? 1 : 0);
            else if (type == typeof(int))
                sb.AppendFormat("DECLARE @{0} INT = {1}\n", name, pValue);
            else if (type == typeof(List<int>))
                sb.AppendFormat("-- REPLACE @{0} IN SQL: ({1})\n", name, string.Join(",", (List<int>)pValue));
            else
                sb.AppendFormat("DECLARE @{0} NVARCHAR(MAX) = '{1}'\n", name, pValue.ToString());
        }
        return sb.ToString();
    }
}
You can then just use this in the immediate or watch windows to grab the SQL.
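For example, assuming the DynamicParameters instance you passed to Dapper is still in scope (the values below are made up), the output can be pasted in front of the original SQL in SSMS:
var args = new DynamicParameters();
args.Add("appId", "fea5b0a7-1da6-4394-b8c8-05e7cb979161");
args.Add("includeBeta", false);

// prints:
// DECLARE @appId NVARCHAR(MAX) = 'fea5b0a7-1da6-4394-b8c8-05e7cb979161'
// DECLARE @includeBeta BIT = 0
Debug.WriteLine(args.ArgsAsSql());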
Just to add an update here, since I see this question still gets quite a few hits: these days I use either Glimpse (it seems to be dead now) or Stackify Prefix, both of which have SQL command tracing capabilities.
It's not exactly what I was looking for when I asked the original question, but they solve the same problem.
