I'm trying to upload a .csv file using ng-file-upload in the browser and Play for Scala 2.5.x at the backend.
If I use moveTo in Play the file is uploaded, but it is written with the multipart headers and trailers included, rather than just the plain file content:
request.body.moveTo(new File("c:\\tmp\\uploaded\\filetest.csv"))
Is there a way to save only the data part?
Alternatively, I tried the following:
def doUpload = Action(parse.multipartFormData) { request =>
  println(request.contentType.get)
  println(request.body)
  request.body.file("file").map { file =>
    val filename = file.filename
    val contentType = file.contentType
    file.ref.moveTo(new File("c:\\tmp\\uploaded\\filetest.csv"))
  }
  Ok("file uploaded at " + new java.util.Date())
}
But the map is empty. This is what I get in the first two println statements:
multipart/form-data
MultipartFormData(Map(),Vector(FilePart(file,hello2.csv,Some(application/vnd.ms-excel),TemporaryFile(C:\Users\BUSINE~1\AppData\Local\Temp\playtemp2236977636541678879\multipartBody2268668511935303172asTemporaryFile))),Vector())
Meaning that I am receiving something; however, I cannot extract it. Any ideas?
It's a bit verbose, maybe, but it does what you want.
Note that I am on a Mac, so you may have to change the /tmp/uploaded/ path in copyFile, since it looks like you are on Windows.
package controllers

import java.io.File
import java.nio.file.attribute.PosixFilePermission._
import java.nio.file.attribute.PosixFilePermissions
import java.nio.file.{Files, Path}
import java.util
import javax.inject._

import akka.stream.IOResult
import akka.stream.scaladsl._
import akka.util.ByteString
import play.api._
import play.api.data.Form
import play.api.data.Forms._
import play.api.i18n.MessagesApi
import play.api.libs.streams._
import play.api.mvc.MultipartFormData.FilePart
import play.api.mvc._
import play.core.parsers.Multipart.FileInfo

import scala.concurrent.Future
import java.nio.file.StandardCopyOption._
import java.nio.file.Paths

// Controller class added so the snippet compiles as a unit (the class name is assumed)
class UploadController @Inject() (val messagesApi: MessagesApi) extends Controller {

  case class FormData(filename: String)

  // Type for the multipart body parser
  type FilePartHandler[A] = FileInfo => Accumulator[ByteString, FilePart[A]]

  val form = Form(mapping("file" -> text)(FormData.apply)(FormData.unapply))

  private def deleteTempFile(file: File) = Files.deleteIfExists(file.toPath)

  // Copies the temp file to your location with the provided name
  private def copyFile(file: File, name: String) =
    Files.copy(file.toPath(), Paths.get("/tmp/uploaded/", "copy_" + name), REPLACE_EXISTING)

  // FilePartHandler which returns a File, rather than Play's TemporaryFile class
  private def handleFilePartAsFile: FilePartHandler[File] = {
    case FileInfo(partName, filename, contentType) =>
      val attr = PosixFilePermissions.asFileAttribute(util.EnumSet.of(OWNER_READ, OWNER_WRITE))
      val path: Path = Files.createTempFile("multipartBody", "tempFile", attr)
      val file = path.toFile
      val fileSink: Sink[ByteString, Future[IOResult]] = FileIO.toPath(file.toPath())
      val accumulator: Accumulator[ByteString, IOResult] = Accumulator(fileSink)
      accumulator.map {
        case IOResult(count, status) =>
          FilePart(partName, filename, contentType, file)
      }(play.api.libs.concurrent.Execution.defaultContext)
  }

  // The action
  def doUpload = Action(parse.multipartFormData(handleFilePartAsFile)) { implicit request =>
    val fileOption = request.body.file("file").map {
      case FilePart(key, filename, contentType, file) =>
        val copy = copyFile(file, filename)
        val deleted = deleteTempFile(file) // delete the original temp file after we have copied it
        copy
    }
    Ok(s"Uploaded: ${fileOption}")
  }
}
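As for the original question of saving only the data part: with the default parse.multipartFormData parser, the TemporaryFile behind file.ref already contains only the decoded part data, without the multipart headers and trailers. A minimal sketch (assuming the part really arrives under the name "file"):

def doUpload = Action(parse.multipartFormData) { request =>
  request.body.file("file").map { part =>
    // part.ref holds just the part's bytes; the multipart framing is already stripped
    part.ref.moveTo(new File("c:\\tmp\\uploaded\\" + part.filename), replace = true)
    Ok("file uploaded at " + new java.util.Date())
  }.getOrElse(BadRequest("Missing 'file' part"))
}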
Looking at the signature of throttle in Akka Streams, I see that it can take a cost function of type (Out) ⇒ Int to compute how each element passing through the stage counts against the cost/per speed limit of elements sent downstream.
Is it possible to implement throttling based on external input for the stream with a built-in stage? If not, would it be easier just to implement a custom stage?
I'm expecting something like:
def throttle(cost: Int, per: FiniteDuration, maximumBurst: Int, costCalculation: ⇒ Int, mode: ThrottleMode): Repr[Out]
So that we could do something like:
throttle(75 /** 75% CPU **/, 1.second, 80 /** 80% CPU **/, getSinkCPU, ThrottleMode.Shaping)
I simulated throttling using Akka HTTP + Akka Streams when uploading multiple files. As far as I understand, the cost function is based on messages, not on CPU usage; maybe you can adapt it to CPU usage. Here is the code where I use a cost function based on the number of messages:
import akka.Done
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{ContentTypes, HttpEntity, Multipart}
import akka.http.scaladsl.server.Directives._
import akka.stream.ThrottleMode
import akka.stream.scaladsl.{FileIO, Sink, Source}
import akka.util.ByteString

import java.io.File
import scala.concurrent.Future
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object UploadingFiles {
  def main(args: Array[String]): Unit = {
    implicit val system = ActorSystem("UploadingFiles")
    val NO_OF_MESSAGES = 1

    val filesRoutes = {
      (pathEndOrSingleSlash & get) {
        complete(
          HttpEntity(
            ContentTypes.`text/html(UTF-8)`,
            """
              |<html>
              |  <body>
              |    <form action="http://localhost:8080/upload" method="post" enctype="multipart/form-data">
              |      <input type="file" name="myFile" multiple>
              |      <button type="submit">Upload</button>
              |    </form>
              |  </body>
              |</html>
            """.stripMargin
          )
        )
      } ~ (path("upload") & post & extractLog) { log =>
        // handling uploaded files using multipart/form-data
        entity(as[Multipart.FormData]) { formData =>
          // handle the file payload
          val partsSource: Source[Multipart.FormData.BodyPart, Any] = formData.parts
          val filePartsSink: Sink[Multipart.FormData.BodyPart, Future[Done]] =
            Sink.foreach[Multipart.FormData.BodyPart] { bodyPart =>
              if (bodyPart.name == "myFile") {
                // create a file
                val filename = "download/" + bodyPart.filename.getOrElse("tempFile_" + System.currentTimeMillis())
                val file = new File(filename)
                log.info(s"writing to file: $filename")
                val fileContentsSource: Source[ByteString, _] = bodyPart.entity.dataBytes
                val fileContentsSink: Sink[ByteString, _] = FileIO.toPath(file.toPath)
                val publishRate = NO_OF_MESSAGES / 1
                // writing the data to the file using an akka-stream graph,
                // throttling with a message-count-based cost
                fileContentsSource
                  .throttle(publishRate, 2.seconds, publishRate, ThrottleMode.shaping)
                  .runWith(fileContentsSink)
              }
            }
          val writeOperationFuture = partsSource.runWith(filePartsSink)
          onComplete(writeOperationFuture) {
            case Success(value)     => complete("file uploaded =)")
            case Failure(exception) => complete(s"file failed to upload: $exception")
          }
        }
      }
    }

    println("access the browser at: localhost:8080")
    Http()
      .newServerAt("localhost", 8080)
      .bindFlow(filesRoutes)
  }
}
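For reference, the built-in cost-based overload quoted in the question computes the cost from the stream elements themselves, not from CPU load. A minimal sketch (the 100 KB/s budget is an assumed value) that would throttle the same fileContentsSource by bytes instead of message count:

fileContentsSource
  .throttle(
    cost = 100 * 1024,
    per = 1.second,
    maximumBurst = 100 * 1024,
    costCalculation = (chunk: ByteString) => chunk.size, // cost = bytes in this chunk
    mode = ThrottleMode.Shaping
  )
  .runWith(fileContentsSink)

Since the stage only ever sees the elements, throttling on an external signal such as CPU usage would need a custom stage, or some adaptive mechanism wrapped around the stream.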
I'm trying to write a simple Akka Streams REST endpoint and a client for consuming this stream. But when I run the server and the client, the client is able to consume only part of the stream. I can't see any exception during execution.
Here are my server and client:
import akka.NotUsed
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.common.{EntityStreamingSupport, JsonEntityStreamingSupport}
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport
import akka.stream.{ActorAttributes, ActorMaterializer, Attributes, Supervision}
import akka.stream.scaladsl.{Flow, Source}
import akka.util.ByteString
import spray.json.DefaultJsonProtocol

import scala.io.StdIn
import scala.util.Random

object WebServer {

  object Model {
    case class Person(id: Int = Random.nextInt(), fName: String = Random.nextString(10), sName: String = Random.nextString(10))
  }

  object JsonProtocol extends SprayJsonSupport with DefaultJsonProtocol {
    implicit val personFormat = jsonFormat(Model.Person.apply, "id", "firstName", "secondaryName")
  }

  def main(args: Array[String]): Unit = {
    implicit val system = ActorSystem("my-system")
    implicit val materializer = ActorMaterializer()
    implicit val executionContext = system.dispatcher

    val start = ByteString.empty
    val sep = ByteString("\n")
    val end = ByteString.empty

    import JsonProtocol._
    implicit val jsonStreamingSupport: JsonEntityStreamingSupport = EntityStreamingSupport.json()
      .withFramingRenderer(Flow[ByteString].intersperse(start, sep, end))
      .withParallelMarshalling(parallelism = 8, unordered = false)

    val decider: Supervision.Decider = {
      case ex: Throwable => {
        println("Exception occurs")
        ex.printStackTrace()
        Supervision.Resume
      }
    }

    val persons: Source[Model.Person, NotUsed] = Source.fromIterator(
      () => (0 to 1000000).map(id => Model.Person(id = id)).iterator
    )
      .withAttributes(ActorAttributes.supervisionStrategy(decider))
      .map(p => { println(p); p })

    val route =
      path("persons") {
        get {
          complete(persons)
        }
      }

    val bindingFuture = Http().bindAndHandle(route, "localhost", 8080)
    println(s"Server online at http://localhost:8080/\nPress RETURN to stop...")
    StdIn.readLine()
    bindingFuture
      .flatMap(_.unbind())
      .onComplete(_ => {
        println("Stopping http server ...")
        system.terminate()
      })
  }
}
and client:
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, Uri}
import akka.stream.{ActorAttributes, ActorMaterializer, Supervision}

import scala.util.{Failure, Success}

object WebClient {
  def main(args: Array[String]): Unit = {
    implicit val system = ActorSystem()
    implicit val materializer = ActorMaterializer()
    implicit val executionContext = system.dispatcher

    val request = HttpRequest(uri = Uri("http://localhost:8080/persons"))
    val response = Http().singleRequest(request)

    val attributes = ActorAttributes.supervisionStrategy {
      case ex: Throwable => {
        println("Exception occurs")
        ex.printStackTrace()
        Supervision.Resume
      }
    }

    response.map(r => {
      r.entity.dataBytes.withAttributes(attributes)
    }).onComplete {
      case Success(db) => db.map(bs => bs.utf8String).runForeach(println)
      case Failure(ex) => ex.printStackTrace()
    }
  }
}
It works for 100, 1,000, and 10,000 persons, but it does not work for more than 100,000.
It looks like there is some limit on the stream, but I can't find it.
The last record printed by the server on my local machine is (number 79101):
Person(79101,ⰷ瑐劲죗醂竜泲늎制䠸,䮳硝沢并⎗ᝨᫌꊭᐽ酡)
The last record on the client is (number 79048):
{"id":79048,"firstName":"췁頔䚐龫暀࡙頨捜昗㢵","secondaryName":"⏉ݾ袈庩컆◁ꄹ葪䑥Ϻ"}
Does anybody know why this happens?
I found a solution: I have to explicitly add r.entity.withoutSizeLimit() on the client, and after that everything works as expected.
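For context: Akka HTTP enforces a default entity size limit (8 MB out of the box, configured by akka.http.parsing.max-content-length), which is why the smaller streams got through. A minimal sketch of the client change:

response.map { r =>
  // lift the default entity size limit for this response only
  r.entity.withoutSizeLimit().dataBytes.withAttributes(attributes)
}.onComplete {
  case Success(db) => db.map(_.utf8String).runForeach(println)
  case Failure(ex) => ex.printStackTrace()
}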
I use Source.queue to queue up HttpRequests and throttle them on the client side to download files from a remote server. I understand that Source.queue is not thread-safe and that we need to use MergeHub to make it thread-safe. Below is the piece of code that uses Source.queue together with cachedHostConnectionPool.
import java.io.File

import akka.actor.Actor
import akka.event.Logging
import akka.http.scaladsl.Http
import akka.http.scaladsl.client.RequestBuilding
import akka.http.scaladsl.model.{HttpResponse, HttpRequest, Uri}
import akka.stream._
import akka.stream.scaladsl._
import akka.util.ByteString
import com.typesafe.config.ConfigFactory

import scala.concurrent.{Promise, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

class HttpClient extends Actor with RequestBuilding {
  implicit val system = context.system
  val logger = Logging(system, this)
  implicit lazy val materializer = ActorMaterializer()

  val config = ConfigFactory.load()
  val remoteHost = config.getString("pool.connection.host")
  val remoteHostPort = config.getInt("pool.connection.port")
  val queueSize = config.getInt("pool.queueSize")
  val throttleSize = config.getInt("pool.throttle.numberOfRequests")
  val throttleDuration = config.getInt("pool.throttle.duration")

  import scala.concurrent.ExecutionContext.Implicits.global

  val connectionPool = Http().cachedHostConnectionPool[Promise[HttpResponse]](host = remoteHost, port = remoteHostPort)

  // Construct a queue
  val requestQueue =
    Source.queue[(HttpRequest, Promise[HttpResponse])](queueSize, OverflowStrategy.backpressure)
      .throttle(throttleSize, throttleDuration.seconds, 1, ThrottleMode.shaping)
      .via(connectionPool)
      .toMat(Sink.foreach({
        case (Success(resp), p)  => p.success(resp)
        case (Failure(error), p) => p.failure(error)
      }))(Keep.left)
      .run()

  // Convert Promise[HttpResponse] to Future[HttpResponse]
  def queueRequest(request: HttpRequest): Future[HttpResponse] = {
    val responsePromise = Promise[HttpResponse]()
    requestQueue.offer(request -> responsePromise).flatMap {
      case QueueOfferResult.Enqueued    => responsePromise.future
      case QueueOfferResult.Dropped     => Future.failed(new RuntimeException("Queue overflowed. Try again later."))
      case QueueOfferResult.Failure(ex) => Future.failed(ex)
      case QueueOfferResult.QueueClosed => Future.failed(new RuntimeException("Queue was closed (pool shut down) while running the request. Try again later."))
    }
  }

  def receive = {
    case "download" =>
      val uri = Uri("http://localhost:8080/file_csv.csv")
      downloadFile(uri, new File("/tmp/compass_audience.csv"))
  }

  def downloadFile(uri: Uri, destinationFilePath: File) = {
    def fileSink: Sink[ByteString, Future[IOResult]] =
      Flow[ByteString].buffer(512, OverflowStrategy.backpressure)
        .toMat(FileIO.toPath(destinationFilePath.toPath))(Keep.right)

    // Submit to the queue, execute the HttpRequest, and write the HttpResponse to a file
    Source.fromFuture(queueRequest(Get(uri)))
      .flatMapConcat(_.entity.dataBytes)
      .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 10000, allowTruncation = true))
      .map(_.utf8String)
      .map(d => s"$d\n")
      .map(ByteString(_))
      .runWith(fileSink)
  }
}
However, when I use MergeHub, it returns Sink[(HttpRequest, Promise[HttpResponse]), NotUsed]. I need to extract response.entity.dataBytes and write the response to a file using a file sink. I am not able to figure out how to use MergeHub to achieve this. Any help will be appreciated.
val hub: Sink[(HttpRequest, Promise[HttpResponse]), NotUsed] =
  MergeHub.source[(HttpRequest, Promise[HttpResponse])](perProducerBufferSize = queueSize)
    .throttle(throttleSize, throttleDuration.seconds, 1, ThrottleMode.shaping)
    .via(connectionPool)
    .toMat(Sink.foreach({
      case (Success(resp), p)  => p.success(resp)
      case (Failure(error), p) => p.failure(error)
    }))(Keep.left)
    .run()
Source.queue is actually thread-safe now. If you want to use MergeHub:
private lazy val poolFlow: Flow[(HttpRequest, Promise[HttpResponse]), (Try[HttpResponse], Promise[HttpResponse]), Http.HostConnectionPool] =
  Http().cachedHostConnectionPool[Promise[HttpResponse]](host, port, connectionPoolSettings)

val ServerSink =
  poolFlow.toMat(Sink.foreach({
    case (Success(resp), p) => p.success(resp)
    case (Failure(e), p)    => p.failure(e)
  }))(Keep.left)

// Attach a MergeHub Source to the consumer. This will materialize to a
// corresponding Sink.
val runnableGraph: RunnableGraph[Sink[(HttpRequest, Promise[HttpResponse]), NotUsed]] =
  MergeHub.source[(HttpRequest, Promise[HttpResponse])](perProducerBufferSize = 16).to(ServerSink)

val toConsumer: Sink[(HttpRequest, Promise[HttpResponse]), NotUsed] = runnableGraph.run()

protected[akkahttp] def executeRequest[T](httpRequest: HttpRequest, unmarshal: HttpResponse => Future[T]): Future[T] = {
  val responsePromise = Promise[HttpResponse]()
  Source.single(httpRequest -> responsePromise).runWith(toConsumer)
  responsePromise.future.flatMap(handleHttpResponse(_, unmarshal))
}
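To get at response.entity.dataBytes with this setup, you can keep the question's queueRequest/downloadFile shape and only swap the queue for the hub. A hedged sketch along those lines (Get comes from RequestBuilding, as in the question's actor):

// queueRequest backed by the MergeHub sink instead of Source.queue
def queueRequest(request: HttpRequest): Future[HttpResponse] = {
  val responsePromise = Promise[HttpResponse]()
  // each call attaches a one-element producer stream to the hub,
  // which is safe to do from any thread
  Source.single(request -> responsePromise).runWith(toConsumer)
  responsePromise.future
}

def downloadFile(uri: Uri, destination: File): Future[IOResult] =
  Source.fromFuture(queueRequest(Get(uri)))
    .flatMapConcat(_.entity.dataBytes)
    .runWith(FileIO.toPath(destination.toPath))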
I am trying to extract tweets for a specific hashtag and save them to a CSV file. The code below works well, but I would like to split the data. How can I split it?
Any advice will be highly appreciated,
Niddal
# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function

from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

import time
import json
import codecs
import sys

ckey = ''
csecret = ''
atoken = ''
asecret = ''

non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)


class StdOutListener(StreamListener):
    def on_data(self, data):
        try:
            tweet = json.loads(data)['text']
            # tweet = data.split(',"text":"')[1].split('","source')[0]
            print(tweet.translate(non_bmp_map))
            saveThis = str(time.time()) + '::' + tweet
            SaveFile = codecs.open('d:\\StremHash.csv', 'a', "utf-8")
            SaveFile.write(saveThis)
            SaveFile.write('\n')
            SaveFile.close()
            return True
        except BaseException as e:
            print('failed on data,', str(e))
            time.sleep(5)

    def on_error(self, status):
        print(status)


if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(ckey, csecret)
    auth.set_access_token(atoken, asecret)
    twitterStream = Stream(auth, l)
    twitterStream.filter(track=[unicode("#عيدكم_مبارك", "utf-8")])
I wish to have a file upload form that, in addition to the file selection input, also has other input fields like a textarea, dropdown, etc. The problem is that I cannot access any POST parameters other than the file in my blobstore upload handler. I am using the following function call to get the parameter, but it always returns an empty string:
par = self.request.get("par")
I found another question with a similar problem: Uploading a video to google app engine blobstore. The answer to that question suggests a workaround of setting the filename to the parameter you wish to read, which is not sufficient for my needs. Is there a way to access other form parameters in the post method of a blobstore upload handler?
Did you find the solution?
In my experience, a multipart/form-data request doesn't expose the other parameters through the normal request methods, and they have to be dug out manually.
This is how I dig parameters out of a request that is used to send a file.
import java.util.Map;
import java.util.HashMap;
import java.util.Iterator;
import javax.servlet.http.HttpServletRequest;

// for reading form data when posted with multipart/form-data
import java.io.*;
import javax.servlet.ServletException;
import org.apache.commons.fileupload.FileItemStream;
import org.apache.commons.fileupload.FileItemIterator;
import org.apache.commons.fileupload.servlet.ServletFileUpload;
import com.google.appengine.api.datastore.Blob;

// Fetch the attributes for a given model using Rails conventions.
// We need to do this in Java because getParameterMap uses generics.
// We currently only support one level: foo[bar] but not foo[bar][baz].
// We currently only pull the first value, so no support for checkboxes.
public class ScopedParameterMap {

    public static Map params(HttpServletRequest req, String model)
            throws ServletException, IOException {
        Map<String, Object> scoped = new HashMap<String, Object>();
        if (req.getHeader("Content-Type").startsWith("multipart/form-data")) {
            try {
                ServletFileUpload upload = new ServletFileUpload();
                FileItemIterator iterator = upload.getItemIterator(req); // this is used to get those params
                while (iterator.hasNext()) {
                    FileItemStream item = iterator.next();
                    InputStream stream = item.openStream();
                    String attr = item.getFieldName();
                    if (attr.startsWith(model + "[") && attr.endsWith("]")) { // fetches everything like article[...]; modify this to return only one value if needed
                        int len = 0;
                        int offset = 0;
                        byte[] buffer = new byte[8192];
                        ByteArrayOutputStream file = new ByteArrayOutputStream();
                        while ((len = stream.read(buffer, 0, buffer.length)) != -1) {
                            offset += len;
                            file.write(buffer, 0, len);
                        }
                        String key = attr.split("\\[|\\]")[1];
                        if (item.isFormField()) {
                            scoped.put(key, file.toString());
                        } else {
                            if (file.size() > 0) {
                                scoped.put(key, file.toByteArray());
                            }
                        }
                    }
                }
            } catch (Exception ex) {
                throw new ServletException(ex);
            }
        } else {
            Map params = req.getParameterMap();
            Iterator i = params.keySet().iterator();
            while (i.hasNext()) {
                String attr = (String) i.next();
                if (attr.startsWith(model + "[") && attr.endsWith("]")) {
                    String key = attr.split("\\[|\\]")[1];
                    String val = ((String[]) params.get(attr))[0];
                    scoped.put(key, val);
                    // TODO: when there are multiple values, set a List instead
                }
            }
        }
        return scoped;
    }
}
I hope this speedy answer helps; let me know if you have questions.