sbt multi-project war packaging - package

I've been trying to find a way to package a web application composed of multiple projects into a single war. The only useful plugin I've found for sbt is xsbt-web-plugin. I'm trying to use the War plugin inside that GitHub-hosted project, but only the dependencies of the web project are included in WEB-INF/lib (not the other projects defined in the build, nor their dependencies). My build file is:
import sbt._
import Keys._
import com.github.siasia._
import WarPlugin.warSettings

object BuildSettings {
  import Dependencies._

  val buildOrganization = "ar.edu.itba.it"
  val buildVersion = "1.27"
  val hibernateVersion = "3.6.7.Final"
  val guiceVersion = "3.0"
  val wicketVersion = "1.5.4"
  val jerseyVersion = "1.12"

  val buildSettings = Defaults.defaultSettings ++ Seq(
    organization := buildOrganization,
    version := buildVersion,
    crossPaths := false,
    publishArtifact in (Compile, packageDoc) := false,
    publishArtifact in (Compile, packageSrc) := false,
    parallelExecution in ThisBuild := false,
    libraryDependencies ++= Seq(slf4j_api, slf4j_log4j12, junit, hamcrest_all, mockito_all, h2, joda_time, guava, junit_interface),
    javacOptions ++= Seq("-source", "1.6", "-target", "1.6"),
    testOptions += Tests.Argument(TestFrameworks.JUnit))
}

object SgaBuild extends Build {
  import BuildSettings._
  import Dependencies._
  import com.github.siasia.WebPlugin._

  val compileAndTest = "test->test;compile->compile"

  lazy val root = Project(id = "sga", base = file(".")) aggregate (itba_common, itba_common_wicket, itba_common_jpa, jasperreports_fonts_extensions, sga_backend, sga_rest, sga_wicket) // settings (rootSettings: _*)
  lazy val itba_common = Project(id = "itba-common", base = file("itba-common")) settings (buildSettings: _*)
  lazy val itba_common_jpa = Project(id = "itba-common-jpa", base = file("itba-common-jpa")) dependsOn (itba_common % compileAndTest) settings (buildSettings: _*)
  lazy val itba_common_wicket = Project(id = "itba-common-wicket", base = file("itba-common-wicket")) dependsOn (itba_common % compileAndTest, itba_common_jpa % compileAndTest) settings (buildSettings: _*)
  lazy val jasperreports_fonts_extensions = Project(id = "jasperreports-fonts-extensions", base = file("jasperreports-fonts-extensions")) settings (buildSettings: _*)
  lazy val sga_backend = Project(id = "sga-backend", base = file("sga-backend")) dependsOn (itba_common % compileAndTest, itba_common_jpa % compileAndTest) settings (buildSettings: _*)
  lazy val sga_rest = Project(id = "sga-rest", base = file("sga-rest")) dependsOn (sga_backend % compileAndTest) settings (buildSettings: _*)
  lazy val sga_wicket = Project(id = "sga-wicket", base = file("sga-wicket")) dependsOn (sga_backend % compileAndTest, sga_rest % compileAndTest, itba_common_wicket % compileAndTest) settings ((buildSettings ++ warSettings(Compile)): _*)
}

object Resolvers {
  val wicketStuff = "Wicket Stuff Repository" at "http://wicketstuff.org/maven/repository"
}

object Dependencies {
  import BuildSettings._
  ... // (plenty of ModuleIDs that are referred to from the 7 XXX.sbt files)
}
Looking at the plugin code, it seems it copies all the files on the full-classpath configuration into the lib directory, but when I inspect full-classpath it doesn't include all the required jars (the ones from the other 7 projects and their dependencies).
Is this the right plugin to use, or is there another one?
Thanks in advance!
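
Update: one direction that might be worth trying (untested on my side) is sbt's exportJars setting. With it, each project shows up on dependent projects' classpaths as its packaged jar rather than as a classes directory, so the plugin's copy of full-classpath should then also pick up the sibling projects as jars. A sketch of the tweak to the shared settings:

// Sketch only: exportJars is a standard sbt key; whether xsbt-web-plugin then
// copies the sibling jars into WEB-INF/lib is exactly what would need verifying.
val buildSettings = Defaults.defaultSettings ++ Seq(
  organization := buildOrganization,
  version := buildVersion,
  exportJars := true, // expose each project as a packaged jar on the classpath
  crossPaths := false,
  javacOptions ++= Seq("-source", "1.6", "-target", "1.6"))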

Related

I want to write ORC files using Flink's Streaming File Sink but it doesn't write files correctly

I am reading data from Kafka and trying to write it to HDFS in ORC format. I followed the reference below from the official website. But I can see that Flink writes the exact same content for all data, creates many files, and all the files are about 103 KB.
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/streamfile_sink.html#orc-format
Please find my code below.
object BeaconBatchIngest extends StreamingBase {

  val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

  def getTopicConfig(configs: List[Config]): Map[String, String] =
    (for (config: Config <- configs) yield (config.getString("sourceTopic"), config.getString("destinationTopic"))).toMap

  def setKafkaConfig(): Unit = {
    val kafkaParams = new Properties()
    kafkaParams.setProperty("bootstrap.servers", "")
    kafkaParams.setProperty("zookeeper.connect", "")
    kafkaParams.setProperty("group.id", DEFAULT_KAFKA_GROUP_ID)
    kafkaParams.setProperty("auto.offset.reset", "latest")

    val kafka_consumer: FlinkKafkaConsumer[String] = new FlinkKafkaConsumer[String]("sourceTopics", new SimpleStringSchema(), kafkaParams)
    kafka_consumer.setStartFromLatest()

    val stream: DataStream[DataParse] = env.addSource(kafka_consumer).map(new temp)

    val schema: String = "struct<_col0:string,_col1:bigint,_col2:string,_col3:string,_col4:string>"
    val writerProperties = new Properties()
    writerProperties.setProperty("orc.compress", "ZLIB")

    val writerFactory = new OrcBulkWriterFactory(new PersonVectorizer(schema), writerProperties, new org.apache.hadoop.conf.Configuration)

    val sink: StreamingFileSink[DataParse] = StreamingFileSink
      .forBulkFormat(new Path("hdfs://warehousestore/hive/warehouse/metrics_test.db/upp_raw_prod/hour=1/"), writerFactory)
      .build()

    stream.addSink(sink)
  }

  def main(args: Array[String]): Unit = {
    setKafkaConfig()
    env.enableCheckpointing(5000)
    env.execute("Kafka_Flink_HIVE")
  }
}

class temp extends MapFunction[String, DataParse] {
  override def map(record: String): DataParse = {
    new DataParse(record)
  }
}

class DataParse(data: String) {
  val parsedJason = parse(data)
  val timestamp = compact(render(parsedJason \ "timestamp")).replaceAll("\"", "").toLong
  val event = compact(render(parsedJason \ "event")).replaceAll("\"", "")
  val source_id = compact(render(parsedJason \ "source_id")).replaceAll("\"", "")
  val app = compact(render(parsedJason \ "app")).replaceAll("\"", "")
  val json = data
}

class PersonVectorizer(schema: String) extends Vectorizer[DataParse](schema) {
  override def vectorize(element: DataParse, batch: VectorizedRowBatch): Unit = {
    val eventColVector = batch.cols(0).asInstanceOf[BytesColumnVector]
    val timeColVector = batch.cols(1).asInstanceOf[LongColumnVector]
    val sourceIdColVector = batch.cols(2).asInstanceOf[BytesColumnVector]
    val appColVector = batch.cols(3).asInstanceOf[BytesColumnVector]
    val jsonColVector = batch.cols(4).asInstanceOf[BytesColumnVector]

    timeColVector.vector(batch.size + 1) = element.timestamp
    eventColVector.setVal(batch.size + 1, element.event.getBytes(StandardCharsets.UTF_8))
    sourceIdColVector.setVal(batch.size + 1, element.source_id.getBytes(StandardCharsets.UTF_8))
    appColVector.setVal(batch.size + 1, element.app.getBytes(StandardCharsets.UTF_8))
    jsonColVector.setVal(batch.size + 1, element.json.getBytes(StandardCharsets.UTF_8))
  }
}
With bulk formats (such as ORC), the StreamingFileSink rolls over to a new file with every checkpoint. If you increase the checkpointing interval (currently 5 seconds), it won't write so many files.
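For example, a minimal sketch of that suggestion (the 60-second interval and the stand-in pipeline are just illustration; plug in the Kafka source and ORC StreamingFileSink from the question):

import org.apache.flink.streaming.api.scala._

object CheckpointIntervalSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Bulk formats (ORC, Parquet) roll a new part file on every checkpoint,
    // so a longer interval directly means fewer, larger files.
    env.enableCheckpointing(60 * 1000) // one checkpoint, hence one new part file, per minute

    // Stand-in pipeline; replace with the Kafka source + ORC sink above.
    env.fromElements(1L, 2L, 3L).print()

    env.execute("checkpoint-interval-sketch")
  }
}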

No checkpoint files are created in my simple application

I have the following simple Flink application running within the IDE. I checkpoint every 5 seconds and would like to write the checkpoint data into the directory file:///d:/applog/out/mycheckpoint/, but after running it for a while and stopping it, I don't find anything under file:///d:/applog/out/mycheckpoint/.
The code is:
import java.util.Date

import io.github.streamingwithflink.util.DateUtil
import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.api.scala._
import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.runtime.state.{FunctionInitializationContext, FunctionSnapshotContext}
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}

object SourceFunctionExample {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(4)

    env.getCheckpointConfig.setCheckpointInterval(5 * 1000)
    env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
    env.setStateBackend(new FsStateBackend("file:///d:/applog/out/mycheckpoint/"))

    val numbers: DataStream[Long] = env.addSource(new ReplayableCountSource)
    numbers.print()

    env.execute()
  }
}

class ReplayableCountSource extends SourceFunction[Long] with CheckpointedFunction {
  var isRunning: Boolean = true
  var cnt: Long = _
  var offsetState: ListState[Long] = _

  override def run(ctx: SourceFunction.SourceContext[Long]): Unit = {
    while (isRunning && cnt < Long.MaxValue) {
      ctx.getCheckpointLock.synchronized {
        // increment cnt
        cnt += 1
        ctx.collect(cnt)
      }
      Thread.sleep(200)
    }
  }

  override def cancel(): Unit = isRunning = false

  override def snapshotState(snapshotCtx: FunctionSnapshotContext): Unit = {
    println("snapshotState is called at " + DateUtil.format(new Date) + s", cnt is ${cnt}")
    // remove previous cnt
    offsetState.clear()
    // add current cnt
    offsetState.add(cnt)
  }

  override def initializeState(initCtx: FunctionInitializationContext): Unit = {
    // obtain operator list state to store the current cnt
    val desc = new ListStateDescriptor[Long]("offset", classOf[Long])
    offsetState = initCtx.getOperatorStateStore.getListState(desc)

    // initialize cnt variable from the checkpoint
    val it = offsetState.get()
    cnt = if (null == it || !it.iterator().hasNext) {
      -1L
    } else {
      it.iterator().next()
    }
    println("initializeState is called at " + DateUtil.format(new Date) + s", cnt is ${cnt}")
  }
}
I tested the application on Windows and Linux and in both cases the checkpoint files were created as expected.
Note that the program keeps running if a checkpoint fails, for example due to some permission errors or invalid path.
Flink logs a WARN message with the exception that caused the checkpoint to fail.
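If you want a wrong path or missing permissions to surface as a job failure instead of only a WARN log line, something along these lines may help (a sketch with a stand-in pipeline; setTolerableCheckpointFailureNumber exists since Flink 1.9, older versions have the deprecated setFailOnCheckpointingErrors):

import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.scala._

object FailFastOnCheckpointErrors {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStateBackend(new FsStateBackend("file:///d:/applog/out/mycheckpoint/"))
    env.getCheckpointConfig.setCheckpointInterval(5 * 1000)

    // Fail the job on the first checkpoint failure instead of just logging it.
    env.getCheckpointConfig.setTolerableCheckpointFailureNumber(0)

    // Stand-in pipeline; replace with the ReplayableCountSource from the question.
    env.fromElements(1L, 2L, 3L).print()

    env.execute("fail-fast-on-checkpoint-errors")
  }
}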

akka-http: How to use MergeHub to throttle requests from the client side

I use Source.queue to queue up HttpRequests and throttle them on the client side in order to download files from a remote server. I understood that Source.queue is not thread-safe and that we need to use MergeHub to make it thread-safe. The following is the piece of code that uses Source.queue together with cachedHostConnectionPool.
import java.io.File

import akka.actor.Actor
import akka.event.Logging
import akka.http.scaladsl.Http
import akka.http.scaladsl.client.RequestBuilding
import akka.http.scaladsl.model.{HttpResponse, HttpRequest, Uri}
import akka.stream._
import akka.stream.scaladsl._
import akka.util.ByteString
import com.typesafe.config.ConfigFactory

import scala.concurrent.{Promise, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

class HttpClient extends Actor with RequestBuilding {
  implicit val system = context.system
  val logger = Logging(system, this)
  implicit lazy val materializer = ActorMaterializer()

  val config = ConfigFactory.load()
  val remoteHost = config.getString("pool.connection.host")
  val remoteHostPort = config.getInt("pool.connection.port")
  val queueSize = config.getInt("pool.queueSize")
  val throttleSize = config.getInt("pool.throttle.numberOfRequests")
  val throttleDuration = config.getInt("pool.throttle.duration")

  import scala.concurrent.ExecutionContext.Implicits.global

  val connectionPool = Http().cachedHostConnectionPool[Promise[HttpResponse]](host = remoteHost, port = remoteHostPort)

  // Construct a Queue
  val requestQueue =
    Source.queue[(HttpRequest, Promise[HttpResponse])](queueSize, OverflowStrategy.backpressure)
      .throttle(throttleSize, throttleDuration.seconds, 1, ThrottleMode.shaping)
      .via(connectionPool)
      .toMat(Sink.foreach({
        case ((Success(resp), p)) => p.success(resp)
        case ((Failure(error), p)) => p.failure(error)
      }))(Keep.left)
      .run()

  // Convert Promise[HttpResponse] to Future[HttpResponse]
  def queueRequest(request: HttpRequest): Future[HttpResponse] = {
    val responsePromise = Promise[HttpResponse]()
    requestQueue.offer(request -> responsePromise).flatMap {
      case QueueOfferResult.Enqueued => responsePromise.future
      case QueueOfferResult.Dropped => Future.failed(new RuntimeException("Queue overflowed. Try again later."))
      case QueueOfferResult.Failure(ex) => Future.failed(ex)
      case QueueOfferResult.QueueClosed => Future.failed(new RuntimeException("Queue was closed (pool shut down) while running the request. Try again later."))
    }
  }

  def receive = {
    case "download" =>
      val uri = Uri(s"http://localhost:8080/file_csv.csv")
      downloadFile(uri, new File("/tmp/compass_audience.csv"))
  }

  def downloadFile(uri: Uri, destinationFilePath: File) = {
    def fileSink: Sink[ByteString, Future[IOResult]] =
      Flow[ByteString].buffer(512, OverflowStrategy.backpressure)
        .toMat(FileIO.toPath(destinationFilePath.toPath))(Keep.right)

    // Submit to queue and execute HttpRequest and write HttpResponse to file
    Source.fromFuture(queueRequest(Get(uri)))
      .flatMapConcat(_.entity.dataBytes)
      .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 10000, allowTruncation = true))
      .map(_.utf8String)
      .map(d => s"$d\n")
      .map(ByteString(_))
      .runWith(fileSink)
  }
}
However, when I use MergeHub, it returns a Sink[(HttpRequest, Promise[HttpResponse]), NotUsed]. I need to extract response.entity.dataBytes and write the response to a file using a file sink. I am not able to figure out how to use MergeHub to achieve this. Any help would be appreciated.
val hub: Sink[(HttpRequest, Promise[HttpResponse]), NotUsed] =
  MergeHub.source[(HttpRequest, Promise[HttpResponse])](perProducerBufferSize = queueSize)
    .throttle(throttleSize, throttleDuration.seconds, 1, ThrottleMode.shaping)
    .via(connectionPool)
    .toMat(Sink.foreach({
      case ((Success(resp), p)) => p.success(resp)
      case ((Failure(error), p)) => p.failure(error)
    }))(Keep.left)
    .run()
Source.queue is actually thread-safe now. If you want to use MergeHub:
private lazy val poolFlow: Flow[(HttpRequest, Promise[HttpResponse]), (Try[HttpResponse], Promise[HttpResponse]), Http.HostConnectionPool] =
  Http().cachedHostConnectionPool[Promise[HttpResponse]](host, port, connectionPoolSettings)

val ServerSink =
  poolFlow.toMat(Sink.foreach({
    case ((Success(resp), p)) => p.success(resp)
    case ((Failure(e), p)) => p.failure(e)
  }))(Keep.left)

// Attach a MergeHub Source to the consumer. This will materialize to a
// corresponding Sink.
val runnableGraph: RunnableGraph[Sink[(HttpRequest, Promise[HttpResponse]), NotUsed]] =
  MergeHub.source[(HttpRequest, Promise[HttpResponse])](perProducerBufferSize = 16).to(ServerSink)

val toConsumer: Sink[(HttpRequest, Promise[HttpResponse]), NotUsed] = runnableGraph.run()

protected[akkahttp] def executeRequest[T](httpRequest: HttpRequest, unmarshal: HttpResponse => Future[T]): Future[T] = {
  val responsePromise = Promise[HttpResponse]()
  Source.single(httpRequest -> responsePromise).runWith(toConsumer)
  responsePromise.future.flatMap(handleHttpResponse(_, unmarshal))
}
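For the file-download part, here is a sketch of how it could sit on top of the MergeHub-backed sink (untested; it assumes it lives inside the HttpClient actor from the question, so toConsumer, Get, the materializer and the execution context are in scope):

// Offer a request to the MergeHub-backed sink and expose the response as a Future,
// analogous to queueRequest in the Source.queue version.
def queueRequestViaHub(request: HttpRequest): Future[HttpResponse] = {
  val responsePromise = Promise[HttpResponse]()
  Source.single(request -> responsePromise).runWith(toConsumer)
  responsePromise.future
}

// Run the request through the throttled pool and stream the body straight to disk.
def downloadFileViaHub(uri: Uri, destination: File): Future[IOResult] =
  Source.fromFuture(queueRequestViaHub(Get(uri)))
    .flatMapConcat(_.entity.dataBytes)          // extract the response bytes
    .runWith(FileIO.toPath(destination.toPath)) // write them to the file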

Generate MD5 checksum in Scala.js

I am trying to calculate a hex MD5 checksum incrementally in Scala.js. The checksum will be verified on the server side once the file is transferred.
I tried using the spark-md5 WebJar dependency for Scala.js:
libraryDependencies ++= Seq("org.webjars.npm" % "spark-md5" % "2.0.2")
jsDependencies += "org.webjars.npm" % "spark-md5" % "2.0.2" / "spark-md5.js"
Scala.js code:
val reader = new FileReader
reader.readAsArrayBuffer(data) // data is a JavaScript Blob object
val spark = scala.scalajs.js.Dynamic.global.SparkMD5.ArrayBuffer
reader.onload = (e: Event) => {
  spark.prototype.append(e.target)
  print("Checksum - > " + spark.end)
}
Error:
Uncaught TypeError: Cannot read property 'buffer' of undefined
at Object.SparkMD5.ArrayBuffer.append (sampleapp-jsdeps.js:596)
at FileReader.<anonymous> (SampleApp.scala:458)
I tried Google, but most of the help available is for JavaScript; I couldn't find anything on how to use this library in Scala.js.
Sorry if I missed something very obvious; I am new to both JavaScript and Scala.js.
From the spark-md5 readme, I read:
var spark = new SparkMD5.ArrayBuffer();
spark.append(e.target.result);
var hexHash = spark.end();
The way you translate that in Scala.js is as follows (assuming you want to do it the dynamically typed way):
import scala.scalajs.js
import scala.scalajs.js.typedarray._
import org.scalajs.dom.{FileReader, Event}
val SparkMD5 = js.Dynamic.global.SparkMD5
val spark = js.Dynamic.newInstance(SparkMD5.ArrayBuffer)()
val fileContent = e.target.asInstanceOf[FileReader].result.asInstanceOf[ArrayBuffer]
spark.append(fileContent)
val hexHashDyn = spark.end()
val hexHash = hexHashDyn.asInstanceOf[String]
Integrating that with your code snippet yields:
val reader = new FileReader
reader.readAsArrayBuffer(data) // data is a JavaScript Blob object
val SparkMD5 = js.Dynamic.global.SparkMD5
val spark = js.Dynamic.newInstance(SparkMD5.ArrayBuffer)()
reader.onload = (e: Event) => {
  val fileContent = e.target.asInstanceOf[FileReader].result.asInstanceOf[ArrayBuffer]
  spark.append(fileContent)
  print("Checksum - > " + spark.end().asInstanceOf[String])
}
If that's the only use of SparkMD5 in your codebase, you can stop there. If you plan to use it several times, you should probably define a facade type for the APIs you want to use:
import scala.scalajs.js.annotation._

@js.native
object SparkMD5 extends js.Object {
  @js.native
  class ArrayBuffer() extends js.Object {
    def append(chunk: js.typedarray.ArrayBuffer): Unit = js.native
    def end(raw: Boolean = false): String = js.native
  }
}
which you can then use much more naturally as:
val reader = new FileReader
reader.readAsArrayBuffer(data) // data is a JavaScript Blob object
val spark = new SparkMD5.ArrayBuffer()
reader.onload = (e: Event) => {
  val fileContent = e.target.asInstanceOf[FileReader].result.asInstanceOf[ArrayBuffer]
  spark.append(fileContent)
  print("Checksum - > " + spark.end())
}
Disclaimer: not tested. It might need small adaptations here and there.
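Since the original goal was to hash the file incrementally, here is a further sketch (also untested) that feeds the Blob to the facade above chunk by chunk instead of in one go; the 2 MB chunk size and the onDone callback are arbitrary illustration choices:

import org.scalajs.dom.{Blob, Event, FileReader}
import scala.scalajs.js.typedarray.ArrayBuffer

def md5Incremental(data: Blob, onDone: String => Unit): Unit = {
  val chunkSize = 2 * 1024 * 1024
  val spark = new SparkMD5.ArrayBuffer()
  val reader = new FileReader
  var offset = 0.0

  def readNextChunk(): Unit =
    reader.readAsArrayBuffer(data.slice(offset, math.min(offset + chunkSize, data.size)))

  reader.onload = (e: Event) => {
    // Append the chunk that was just read, then either read the next one or finish.
    spark.append(e.target.asInstanceOf[FileReader].result.asInstanceOf[ArrayBuffer])
    offset += chunkSize
    if (offset < data.size) readNextChunk()
    else onDone(spark.end()) // hex digest of everything appended so far
  }

  readNextChunk()
}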

Copy DNN HTML Pro module content to another module

The code below works fine for the HTML module but not for the HTML Pro module.
HtmlTextController htmlTextController = new HtmlTextController();
WorkflowStateController workflowStateController = new WorkflowStateController();
int workflowId = htmlTextController.GetWorkflow(ModuleId, TabId, PortalId).Value;

List<HtmlTextInfo> htmlContents = htmlTextController.GetAllHtmlText(ModuleModId);
htmlContents = htmlContents.OrderBy(c => c.Version).ToList();

foreach (var content in htmlContents)
{
    HtmlTextInfo htmlContent = new HtmlTextInfo();
    htmlContent.ItemID = -1;
    htmlContent.StateID = workflowStateController.GetFirstWorkflowStateID(workflowId);
    htmlContent.WorkflowID = workflowId;
    htmlContent.ModuleID = ModuleId;
    htmlContent.IsPublished = content.IsPublished;
    htmlContent.Approved = content.Approved;
    htmlContent.IsActive = content.IsActive;
    htmlContent.Content = content.Content;
    htmlContent.Summary = content.Summary;
    htmlContent.Version = content.Version;

    htmlTextController.UpdateHtmlText(htmlContent, htmlTextController.GetMaximumVersionHistory(PortalId));
}
This occurs because the HTML Pro module has different methods, which are partially different from those of the DNN HTML module. Below is the code.
HtmlTextController htmlTextController = new HtmlTextController();
WorkflowStateController workflowStateController = new WorkflowStateController();
WorkflowStateInfo wsinfo = new WorkflowStateInfo();
int workflowId = wsinfo.WorkflowID;

HtmlTextInfo htmlContents = htmlTextController.GetLatestHTMLContent(ModuleModId);

HtmlTextInfo htmlContent = new HtmlTextInfo();
htmlContent.ItemID = -1;
htmlContent.StateID = workflowStateController.GetFirstWorkflowStateID(workflowId);
htmlContent.WorkflowID = workflowId;
htmlContent.ModuleID = ModuleId;
htmlContent.IsPublished = htmlContents.IsPublished;
htmlContent.Approved = htmlContents.Approved;
htmlContent.IsActive = htmlContents.IsActive;
htmlContent.Content = htmlContents.Content;
htmlContent.Summary = htmlContents.Summary;
htmlContent.Version = htmlContents.Version;

if (Tags != null && Tags.Count > 0)
{
    foreach (KeyValuePair<string, string> tag in Tags)
    {
        if (htmlContent.Content.Contains(tag.Key))
        {
            htmlContent.Content = htmlContent.Content.Replace(tag.Key, tag.Value);
        }
    }
}

htmlTextController.SaveHtmlContent(htmlContent, newModule);
And please add the references below to the code in order to access these methods.
using DotNetNuke.Modules.HtmlPro;
using DotNetNuke.Professional.HtmlPro;
using DotNetNuke.Professional.HtmlPro.Components;
using DotNetNuke.Professional.HtmlPro.Services;
If you are looking to simply "copy" the content from one to the other, you might investigate the usage of the "Import" and "Export" functions that are part of these modules.
I recommend this route to help ensure better compatibility as time progresses: should they update fields or other data elements, you will not have to investigate and then update your code.
You can simply look at the .dnn manifest for each of these modules and find the BusinessControllerClass, which has two methods, "ImportModule" and "ExportModule", that you could use.
