Apache Flink RollingFileAppender

I'm using Apache Flink v1.2. I wanted to switch to a rolling file appender to avoid huge log files containing data for several days. However, it doesn't seem to work. I adapted the log4j configuration (log4j.properties) as follows:
log4j.appender.file=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.file.RollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.file.DatePattern='.' yyyy-MM-dd-a'.log'
log4j.appender.file.MaxBackupIndex = 15
log4j.appender.file.append=false
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
First it complains that it cannot find org.apache.log4j.rolling.RollingFileAppender. So I switched it to org.apache.log4j.RollingFileAppender, and then it says RollingPolicy and DatePattern are not valid attributes for the RollingFileAppender.
Did anyone else encounter the same issues / can you suggest what's wrong with this configuration?

In order to use the RollingFileAppender you first have to add the apache-log4j-extras-1.2.17.jar to your classpath (e.g. adding it to Flink's lib folder).
Next you have to configure it and specify a FileNamePattern for the RollingPolicy. With the following log4j.properties file I can use the RollingFileAppender:
log4j.appender.file=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.file.RollingPolicy.FileNamePattern=logs/log.%d{yyyyMMdd-HHmm}.log
log4j.appender.file.RollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.file.append=false
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n

Related

Flink, odd behavior when using Hadoop Compatibility

I've added the Flink Hadoop Compatibility dependency to my project, which reads a sequence file from an HDFS path:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility_2.11</artifactId>
    <version>1.5.6</version>
</dependency>
Here's the Java code snippet:
DataSource<Tuple2<NullWritable, BytesWritable>> input = env.createInput(HadoopInputs.readHadoopFile(
        new org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<NullWritable, BytesWritable>(),
        NullWritable.class, BytesWritable.class, path));
This works fine when I run it inside Eclipse, but when I submit it via the command line ('flink run ...'), it complains:
The type returned by the input format could not be automatically determined. Please specify the TypeInformation of the produced type explicitly by using the 'createInput(InputFormat, TypeInformation)' method instead.
OK, so I updated my code to add the type information:
DataSource<Tuple2<NullWritable, BytesWritable>> input = env.createInput(HadoopInputs.readHadoopFile(
        new org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<NullWritable, BytesWritable>(),
        NullWritable.class, BytesWritable.class, path),
        TypeInformation.of(new TypeHint<Tuple2<NullWritable, BytesWritable>>() {}));
Now it complains:
Caused by: java.lang.RuntimeException: Could not load the TypeInformation for the class 'org.apache.hadoop.io.Writable'. You may be missing the 'flink-hadoop-compatibility' dependency.
Some people suggest copying flink-hadoop-compatibility_2.11-1.5.6.jar to FLINK_HOME/lib, but it doesn't help; I still get the same error.
Does anyone have any clue?
My Flink is a standalone installation, version 1.5.6.
UPDATE:
Sorry, I had copied flink-hadoop-compatibility_2.11-1.5.6.jar to the wrong place; after fixing that, it works.
Now my question is: is there any other way to do this? Copying that jar file to FLINK_HOME/lib is definitely not a good option for me, especially on a big Flink cluster.
Fixed in version 1.9.0; see https://issues.apache.org/jira/browse/FLINK-12163 for details.

Anybody know if OrcTableSource supports S3 file system?

I'm running into some trouble using OrcTableSource to fetch an ORC file from cloud object storage (IBM COS). The code fragment is provided below:
OrcTableSource soORCTableSource = OrcTableSource.builder()
        // path to ORC
        .path("s3://orders/so.orc") // s3://orders/so.csv
        // schema of ORC files
        .forOrcSchema(OrderHeaderORCSchema)
        .withConfiguration(orcconfig)
        .build();
It seems this path is incorrect, but can anyone help out? Appreciate it a lot!
Caused by: java.io.FileNotFoundException: File /so.orc does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:346)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768)
    at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:528)
    at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:370)
    at org.apache.orc.OrcFile.createReader(OrcFile.java:342)
    at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:225)
    at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:63)
    at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:170)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
    at java.lang.Thread.run(Thread.java:748)
By the way, I've already set up flink-s3-fs-presto-1.6.2 and have the following code running correctly. The question is limited to OrcTableSource only.
DataSet<Tuple5<String, String, String, String, String>> orderinfoSet =
        env.readCsvFile("s3://orders/so.csv")
           .types(String.class, String.class, String.class,
                  String.class, String.class);
The problem is that Flink's OrcRowInputFormat uses two different file systems: one for generating the input splits and one for reading the actual input splits. For the former, it uses Flink's FileSystem abstraction, and for the latter it uses Hadoop's FileSystem. Therefore, you need to configure Hadoop's core-site.xml to contain the following snippet:
<property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
See the Hadoop S3A documentation for more information about setting up S3 for Hadoop.
This is a limitation of Flink's OrcRowInputFormat and should be fixed. I've created the corresponding issue.
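As an aside (not part of the original answer): since the builder already accepts a Hadoop Configuration via withConfiguration(), as in the question's own snippet, the same fs.s3.impl mapping could plausibly be supplied in code instead of editing core-site.xml on every node. A rough, untested sketch with placeholder endpoint and credential values (the hadoop-aws/S3A classes still have to be on the classpath):
// Sketch only: build the Hadoop Configuration programmatically instead of via core-site.xml.
// Endpoint and credential values below are placeholders for an S3-compatible store such as IBM COS.
org.apache.hadoop.conf.Configuration orcconfig = new org.apache.hadoop.conf.Configuration();
orcconfig.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
orcconfig.set("fs.s3a.endpoint", "YOUR_COS_ENDPOINT");   // placeholder
orcconfig.set("fs.s3a.access.key", "YOUR_ACCESS_KEY");   // placeholder
orcconfig.set("fs.s3a.secret.key", "YOUR_SECRET_KEY");   // placeholder

OrcTableSource soORCTableSource = OrcTableSource.builder()
        .path("s3://orders/so.orc")
        .forOrcSchema(OrderHeaderORCSchema) // schema string from the question
        .withConfiguration(orcconfig)
        .build();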

Prepare multi-databases with play framework

I want to prepare my application to be compatible with many database types. To try it, I've used H2, MySQL and PostgreSQL. So I've added into build.sbt:
"mysql" % "mysql-connector-java" % "5.1.35",
"org.postgresql" % "postgresql" % "9.4-1201-jdbc41"
and I've added conf/prod.conf with all of the configuration except the database configuration, plus 3 files:
conf/h2.conf
include "prod.conf"
db.h2.driver=org.h2.Driver
db.h2.url="jdbc:h2:mem:dontforget"
db.h2.jndiName=DefaultDS
ebean.h2="fr.chklang.dontforget.business.*"
conf/mysql.conf
include "prod.conf"
db.mysql.driver=com.mysql.jdbc.Driver
db.mysql.jndiName=DefaultDS
ebean.mysql="fr.chklang.dontforget.business.*"
conf/postgresql.conf
include "prod.conf"
db.postgresql.driver=org.postgresql.Driver
db.postgresql.jndiName=DefaultDS
ebean.postgresql="fr.chklang.dontforget.business.*"
In addition, I have three folders under conf/evolutions:
evolutions/h2
evolutions/mysql
evolutions/postgresql
With these things in place, the user can start my application with this command:
-Dconfig.file=dontforget-conf.conf -DapplyEvolutions.default=true -Dhttp.port=10180 &
And this conf file is:
include "postgresql.conf"
db.postgresql.url="jdbc:postgresql:dontforget"
db.postgresql.user=myUserName
db.postgresql.password=myPassword
But with this configuration, when my application tries to connect to the DB:
The default EbeanServer has not been defined? This is normally set via the ebean.datasource.default property. Otherwise it should be registered programatically via registerServer()]]
So I've tried to add, into my configuration:
ebean.datasource.default=postgresql
but when I add it I get:
Configuration error: Configuration error[Configuration error[]]
at play.api.Configuration$.play$api$Configuration$$configError(Configuration.scala:94)
at play.api.Configuration.reportError(Configuration.scala:743)
at play.Configuration.reportError(Configuration.java:310)
at play.db.ebean.EbeanPlugin.onStart(EbeanPlugin.java:56)
at play.api.Play$$anonfun$start$1$$anonfun$apply$mcV$sp$1.apply(Play.scala:91)
at play.api.Play$$anonfun$start$1$$anonfun$apply$mcV$sp$1.apply(Play.scala:91)
at scala.collection.immutable.List.foreach(List.scala:383)
at play.api.Play$$anonfun$start$1.apply$mcV$sp(Play.scala:91)
at play.api.Play$$anonfun$start$1.apply(Play.scala:91)
at play.api.Play$$anonfun$start$1.apply(Play.scala:91)
at play.utils.Threads$.withContextClassLoader(Threads.scala:21)
at play.api.Play$.start(Play.scala:90)
at play.core.StaticApplication.<init>(ApplicationProvider.scala:55)
at play.core.server.NettyServer$.createServer(NettyServer.scala:253)
at play.core.server.NettyServer$$anonfun$main$3.apply(NettyServer.scala:289)
at play.core.server.NettyServer$$anonfun$main$3.apply(NettyServer.scala:284)
at scala.Option.map(Option.scala:145)
at play.core.server.NettyServer$.main(NettyServer.scala:284)
at play.core.server.NettyServer.main(NettyServer.scala)
Caused by: Configuration error: Configuration error[]
at play.api.Configuration$.play$api$Configuration$$configError(Configuration.scala:94)
at play.api.Configuration.reportError(Configuration.scala:743)
at play.api.db.BoneCPApi.play$api$db$BoneCPApi$$error(DB.scala:271)
at play.api.db.BoneCPApi$$anonfun$getDataSource$3.apply(DB.scala:438)
at play.api.db.BoneCPApi$$anonfun$getDataSource$3.apply(DB.scala:438)
at scala.Option.getOrElse(Option.scala:120)
at play.api.db.BoneCPApi.getDataSource(DB.scala:438)
at play.api.db.DB$$anonfun$getDataSource$1.apply(DB.scala:142)
at play.api.db.DB$$anonfun$getDataSource$1.apply(DB.scala:142)
at scala.Option.map(Option.scala:145)
at play.api.db.DB$.getDataSource(DB.scala:142)
at play.api.db.DB.getDataSource(DB.scala)
at play.db.DB.getDataSource(DB.java:25)
at play.db.ebean.EbeanPlugin.onStart(EbeanPlugin.java:54)
So I don't understand how I can do it.
YES!!! I've found it! After some debugging (etc...):
There were 2 problems.
First problem: I had to add a key into my application.conf:
ebeanconfig.datasource
For me (for example), postgresql.conf is modified to:
db.postgresql.driver=org.postgresql.Driver
db.postgresql.jndiName=DefaultDS
ebean.postgresql="fr.chklang.dontforget.business.*"
ebeanconfig.datasource.default=postgresql
Second problem: include in Play 2.3.x doesn't work because the conf folder isn't added to the classpath (ref: Load file from '/conf' directory on Cloudbees), so we must concatenate prod.conf, postgresql.conf and dontforget.conf into a single file, as sketched below.
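A rough sketch of that single merged file, built only from the snippets shown above (the prod.conf contents aren't shown in the question, so they are elided):
# ... everything that previously lived in prod.conf goes here (not shown in the question) ...
db.postgresql.driver=org.postgresql.Driver
db.postgresql.url="jdbc:postgresql:dontforget"
db.postgresql.user=myUserName
db.postgresql.password=myPassword
db.postgresql.jndiName=DefaultDS
ebean.postgresql="fr.chklang.dontforget.business.*"
ebeanconfig.datasource.default=postgresql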
I hope I have helped some other developers...

Disable scalatest logging statements when running tests from maven

What is the method to disable logging of the ScalaTest log4j messages?
The log4j.properties is as follows:
log4j.rootLogger=INFO,CA,FA
#Console Appender
log4j.appender.CA=org.apache.log4j.ConsoleAppender
log4j.appender.CA.layout=org.apache.log4j.PatternLayout
log4j.appender.CA.layout.ConversionPattern=%d{HH:mm:ss.SSS} %p %c: %m%n
log4j.appender.CA.Threshold = INFO
#File Appender
log4j.appender.FA=org.apache.log4j.FileAppender
log4j.appender.FA.append=false
log4j.appender.FA.file=target/unit-tests.log
log4j.appender.FA.layout=org.apache.log4j.PatternLayout
log4j.appender.FA.layout.ConversionPattern=%d{HH:mm:ss.SSS} %p %c{1}: %m%n
log4j.appender.FA.Threshold = INFO
..
log4j.logger.org.scalatest=WARN
However, we are seeing INFO-level ScalaTest log4j messages:
2014-11-30 14:25:57,263 INFO [ScalaTest-run-running-DiscoverySuite] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2014-11-30 14:25:57,493 INFO [ScalaTest-run-running-DiscoverySuite] hbase.HBaseCommonTestingUtility (HBaseTestingUtility.java:startMiniCluster(840)) - Starting up minicluster with 1 master(s) and 2 regionserver(s) and 2 datanode(s)
2014-11-30 14:25:57,499 INFO [ScalaTest-run-running-DiscoverySuite] hbase.HBaseCommonTestingUtility (HBaseTestingUtility.java:setupClusterTestDir(390)) - Created new mini-cluster data directory: /shared/hwspark/target/
Alternatively, you can throw this bit of code anywhere in one of your tests (note that it targets Logback rather than log4j):
org.slf4j.LoggerFactory.getLogger(org.slf4j.Logger.ROOT_LOGGER_NAME)
.asInstanceOf[ch.qos.logback.classic.Logger]
.setLevel(ch.qos.logback.classic.Level.WARN)
which will set all logging to the WARN level.
Those log messages are not actually being printed by ScalaTest, but by something you are using from your ScalaTest tests. The reason "ScalaTest" shows up in them is that ScalaTest does change the name of threads when suites and tests are executed, so that if someone has a suite that hangs forever and does a thread dump to investigate, it is more obvious what test and suite is causing the run to hang. Log4J seems to print out the thread name in square brackets, so that can give you a hint as to where these log messages are coming from.
In my case it was slick.relational. I looked up the class reported in the ScalaTest-run-running-... thread name, found the package it belongs to, and added that specific package to logback.xml as:
<logger name="slick.relational" level="INFO"/>
In your case, search for HBaseTestingUtility or the other classes reported there to find which jar contains them, and work out your logger name from the package prefix.
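For example, with the log4j.properties shown in the question, a hedged guess at the equivalent entries, with logger names inferred from the packages of the classes visible in the log output (org.apache.hadoop.conf.Configuration and org.apache.hadoop.hbase.HBaseCommonTestingUtility), would be:
# Assumed logger names, derived from the package prefixes of the noisy classes above
log4j.logger.org.apache.hadoop.conf=WARN
log4j.logger.org.apache.hadoop.hbase=WARN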

Tomcat 7 - Retrieve the version of a webapp (versioned WAR)

I've been unable to find any easy way of figuring out the version string for a WAR file deployed with Tomcat 7 versioned naming (i.e. app##version.war). You can read about it here and what it enables here.
It'd be nice if there were a somewhat more supported approach than the usual swiss-army-knife of reflection-powered ribcage cracking:
final ServletContextEvent event ...
final ServletContext applicationContextFacade = event.getServletContext();
final Field applicationContextField = applicationContextFacade.getClass().getDeclaredField("context");
applicationContextField.setAccessible(true);
final Object applicationContext = applicationContextField.get(applicationContextFacade);
final Field standardContextField = applicationContext.getClass().getDeclaredField("context");
standardContextField.setAccessible(true);
final Object standardContext = standardContextField.get(applicationContext);
final Method webappVersion = standardContext.getClass().getMethod("getWebappVersion");
System.err.println("WAR version: " + webappVersion.invoke(standardContext));
I think the simplest solution is to use the same version (the SVN revision plus padding, for example) in the .war name, web.xml and META-INF/MANIFEST.MF, so you can retrieve the version later in your app or with any standard tool that reads the version from a JAR/WAR.
See MANIFEST.MF version-number
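For illustration, a minimal sketch of reading such a version back at runtime, assuming the build writes an Implementation-Version entry into the WAR's META-INF/MANIFEST.MF (the helper class and method names here are hypothetical):
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.Manifest;
import javax.servlet.ServletContext;

// Hypothetical helper: reads META-INF/MANIFEST.MF from the deployed WAR and returns
// the Implementation-Version attribute the build is assumed to have written there.
public final class WarVersion {
    public static String read(ServletContext context) throws IOException {
        try (InputStream is = context.getResourceAsStream("/META-INF/MANIFEST.MF")) {
            if (is == null) {
                return null; // no manifest packaged in the WAR
            }
            return new Manifest(is).getMainAttributes().getValue("Implementation-Version");
        }
    }
}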
Another solution described here uses the path name of the deployed WAR on the server. You'd extract the version number from the string between the "##" and the "/":
runningVersion = StringUtils.substringBefore(
        StringUtils.substringAfter(
                servletConfig.getServletContext().getRealPath("/"),
                "##"),
        "/");
Starting from Tomcat versions 9.0.32, 8.5.52 and 7.0.101, the webapp version is exposed as a ServletContext attribute with the name org.apache.catalina.webappVersion.
Link to the closed enhancement request: https://bz.apache.org/bugzilla/show_bug.cgi?id=64189
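A minimal sketch of reading that attribute from a context listener (the listener class name is illustrative; the attribute name is the one exposed by the enhancement above):
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

// Sketch: logs the version exposed by Tomcat 9.0.32 / 8.5.52 / 7.0.101 and later.
// getAttribute returns null on Tomcat versions that predate the change.
public class VersionLoggingListener implements ServletContextListener {
    @Override
    public void contextInitialized(ServletContextEvent event) {
        Object version = event.getServletContext()
                .getAttribute("org.apache.catalina.webappVersion");
        System.err.println("WAR version: " + version);
    }

    @Override
    public void contextDestroyed(ServletContextEvent event) {
    }
}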
The easiest way would be for Tomcat to make the version available via a ServletContext attribute (org.apache.catalina.core.StandardContext.webappVersion) or similar. The patch to do that would be trivial. I'd suggest opening an enhancement request in Tomcat's Bugzilla. If you include a patch then it should get applied fairly quickly.
