I am trying to save an Array[(String, Int)] to a file. However, every time, it reports:
object not serializable
I also tried combining the two columns into a string and writing it line by line, but it still reports the same error. The code is:
import java.io.{File, PrintWriter}
val fw = new PrintWriter(new File("/path/data_stream.txt"))
myArray.foreach(x => fw.write(x._1 + " " + x._2 + "\n"))
fw.close()
import java.nio.file._
val data = Array(("one", 1), ("two", 2), ("three", 3))
data.foreach(d => Files.write(Paths.get("/path/data_stream.txt"), (d._1 + " " + d._2 + "\n").getBytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND))
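If the whole array fits in memory (as it does for the small Array[(String, Int)] above), a single call can also avoid reopening the file for every element; here is a minimal sketch using the same java.nio API:

import java.nio.file._

// Build the full contents once, then write them in one call
// (by default Files.write creates the file and truncates any existing contents).
val contents = data.map(d => d._1 + " " + d._2).mkString("", "\n", "\n")
Files.write(Paths.get("/path/data_stream.txt"), contents.getBytes)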
Hello everyone. I am writing a Discord bot and I get the values from Google Sheets, but they are not displayed nicely. How can I align them so that the names line up under the names and the numbers line up under the numbers?
Here's how it turns out https://i.stack.imgur.com/Gdquh.png
And it should be like this https://i.stack.imgur.com/UAPg2.png
spreadsheet_id = 'id'
result = service.spreadsheets().values().get(spreadsheetId=spreadsheet_id, range='A1:C15', majorDimension='ROWS').execute()
values = result.get('values', [])
embed = discord.Embed(description="\n".join([x[0] + " " + x[2] for x in values]))
result2 = service.spreadsheets().values().get(spreadsheetId=spreadsheet_id, range='A16:C28', majorDimension='ROWS').execute()
values2 = result2.get('values', [])
embed2 = discord.Embed(description="\n".join([x[0] + " " + x[2] for x in values2]))
I've been putting my data into pandas DataFrames, so I've been using to_markdown https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_markdown.html and printing it within code blocks.
import pandas as pd
array = pd.DataFrame(values)
embed = discord.Embed(description='```' + array.to_markdown(index=False) + '```')
In my test script I try to process 100 lines of data from a CSV data feed via the following statement:
private val scn = scenario("test_scn").feed(insertFeeder, 100).exec(httpReq)
But I always get an error:
[ERROR] HttpRequestAction - 'httpRequest-1' failed to execute: No attribute named 'name' is defined
Could you please help me find out the root cause? Thank you.
Here is the script:
private val insertFeeder = csv("test_data.csv").queue

private val csvHeader = GeneralUtil.readFirstLine("test_data.csv")

private val httpConf = http
  .baseURL("http://serviceURL")
  .disableFollowRedirect
  .disableWarmUp
  .shareConnections

private var httpReq = http("insert_request")
  .post("/insert")
for (i <- 0 to 99) {
  val paramsInArray = csvHeader.split(",")
  for (param <- paramsInArray) {
    if (param.equalsIgnoreCase("name")) {
      httpReq = httpReq.formParam(("name" + "[" + i + "]").el[String], "${name}")
    }
    if (param.equalsIgnoreCase("url")) {
      httpReq = httpReq.formParam(("url" + "[" + i + "]").el[String], "${url}")
    }
    if (!param.equalsIgnoreCase("name") && !param.equalsIgnoreCase("url")) {
      val firstArg = param + "[" + i + "]"
      val secondArg = "${" + param + "}"
      httpReq = httpReq.formParam(firstArg, secondArg)
    }
  }
}
private val scn = scenario("test_scn")
  .feed(insertFeeder, 100)
  .exec(httpReq)

setUp(
  scn.inject(
    constantUsersPerSec(1) during (1200 seconds)
  ).protocols(httpConf)
).assertions(global.failedRequests.count.lte(5))
And the data in test_data.csv is:
name,url,price,size,gender
image_1,http://image_1_url,100,xl,male
image_2,http://image_2_url,90,m,female
image_3,http://image_3_url,10,s,female
...
image_2000,http://image_2000_url,200,xl,male
By the way, if I process only 1 line, it works well.
I read the documentation again and fixed the issue: if you feed multiple records all at once, the attribute names are suffixed, starting from 1.
https://gatling.io/docs/current/session/feeder/#csv-feeders
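For example (a sketch of what that means for the script above; the loop mirrors the question's own code and the field names come from test_data.csv): when feed(insertFeeder, 100) pulls 100 records in one go, the session attributes become name1, url1, price1, ..., name100, url100, ..., so each form param has to reference the suffixed attribute rather than plain ${name}:

for (i <- 0 to 99) {
  for (param <- csvHeader.split(",")) {
    // record i (0-based) maps to attributes suffixed with i + 1, e.g. ${name1}, ${url1}
    httpReq = httpReq.formParam(param + "[" + i + "]", "${" + param + (i + 1) + "}")
  }
}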
def createlist(json, filename, arraygroup):
    filehdl = open(filename, "w")  # text mode, since we write str values
    for key, value in arraygroup.items():
        filehdl.write('#' + value + "\n")
        for list in json:
            if list['arraygroup'] == value:
                filehdl.write(list['name'] + " " + list['surname'] + "\n")
    filehdl.close()
#depart1
testname testsurname
testname1 testsurname2
testname3 testsurname3
#depart2
#depart3
The JSON has 5 names, and 2 of the names should be placed in #depart2 and #depart3 based on their correct department.
Hi, I have here some code that creates a file and separates the names in the JSON file based on their group, but the 2nd for loop (the one over json) was not resetting its index, so after the 1st pass of the outer loop the list was stuck.
Thanks for any answers. ^^
I already resolved the problem by putting the JSON objects into a list:
jsonlist = []
for obj in json:
    jsonlist.append(obj)
then changed the 2nd loop to iterate over jsonlist:
for list in jsonlist:
    if list['arraygroup'] == value:
        filehdl.write(list['name'] + " " + list['surname'] + "\n")
I'm trying to read an input file in Scala whose structure I know; however, I only need every 9th entry. So far I have managed to read the whole thing using:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
The issue is that this leaves me with a huge dataset (we're talking 20 GB of data). Not only have I been forced to write some very ugly code to convert between RDD[Array[String]] and Array[String], but it has essentially made my code useless.
I've tried different approaches and mixes of .map(), .flatMap() and .reduceByKey(), however nothing actually puts my collected "cells" into the format I need them to be in.
Here's what is supposed to happen: Reading a folder of text files from our server, the code should read each "line" of text in the format:
*---------*
| NASDAQ: |
*---------*
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
and only keep hold of the stock_symbol, as that is the identifier I'm counting. So far my attempts have been to turn the entire thing into an array and only collect every 9th index from it into a collected_cells var. The issue is that, based on my calculations and real-life results, that code would take 335 days to run (no joke).
Here's my current code for reference:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SparkNum {

  def main(args: Array[String]) {
    // Do some Scala voodoo
    val sc = new SparkContext(new SparkConf().setAppName("Spark Numerical"))

    // Set input file as per HDFS structure + input args
    val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
    val fields = lines.map(line => line.split(","))

    var collected_cells: Array[String] = new Array[String](0)
    //println("[MESSAGE] Length of CC: " + collected_cells.length)

    val divider: Long = 9
    val array_length = fields.count / divider
    val casted_length = array_length.toInt

    val indexedFields = fields.zipWithIndex
    val indexKey = indexedFields.map{ case (k, v) => (v, k) }

    println("[MESSAGE] Number of lines: " + array_length)
    println("[MESSAGE] Casted length of: " + casted_length)

    for (i <- 1 to casted_length) {
      println("[URGENT DEBUG] Processing line " + i + " of " + casted_length)

      var index = 9 * i - 8
      println("[URGENT DEBUG] Index defined to be " + index)

      collected_cells :+ indexKey.lookup(index)
    }

    println("[MESSAGE] collected_cells size: " + collected_cells.length)

    val single_cells = collected_cells.flatMap(collected_cells => collected_cells)

    val counted_cells = single_cells.map(cell => (cell, 1)).reduceByKey { case (x, y) => x + y }

    // val result = counted_cells.reduceByKey((a,b) => (a+b))
    // val inmem = counted_cells.persist()
    //
    // // Collect driver into file to be put into user archive
    // inmem.saveAsTextFile("path to server location")
    // ==> Not necessary to save the result as processing time is recorded, not output
  }
}
The bottom part is currently commented out as I tried to debug it, but it acts as pseudo-code for me to know what I need done. I'd like to point out that I am next to not at all familiar with Scala, and hence things like the _ notation confuse the life out of me.
Thanks for your time.
There are some concepts that need clarification in the question:
When we execute this code:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
That does not result in a huge array of the size of the data. That expression represents a transformation of the base data. It can be further transformed until we reduce the data to the information set we desire.
In this case, we want the stock_symbol field of a record encoded as CSV:
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
I'm also going to assume that the data file contains a banner like this:
*---------*
| NASDAQ: |
*---------*
The first thing we're going to do is to remove anything that looks like this banner. In fact, I'm going to assume that the first field is the name of a stock exchange that starts with a letter. We will do this before we do any splitting, resulting in:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields = validLines.map(line => line.split(","))
It helps to write out the types of the variables, for peace of mind that we have the data types we expect. As our Scala skills progress, that might become less important. Let's rewrite the expression above with types:
val lines: RDD[String] = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines: RDD[String] = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields: RDD[Array[String]] = validLines.map(line => line.split(","))
We are interested in the stock_symbol field, which positionally is the element #1 in a 0-based array:
val stockSymbols:RDD[String] = fields.map(record => record(1))
If we want to count the symbols, all that's left is to issue a count:
val totalSymbolCount = stockSymbols.count()
That's not very helpful because we have one entry for every record. Slightly more interesting questions would be:
How many different stock symbols do we have?
val uniqueStockSymbols = stockSymbols.distinct.count()
How many records for each symbol do we have?
val countBySymbol = stockSymbols.map(s => (s,1)).reduceByKey(_+_)
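To peek at a few of those per-symbol counts on the driver, a small sketch (the sample size of 10 is arbitrary):

// Bring a small sample of (symbol, count) pairs back to the driver and print them.
countBySymbol.take(10).foreach { case (symbol, n) =>
  println(symbol + ": " + n)
}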
In Spark 2.0, CSV support for DataFrames and Datasets is available out of the box.
Given that our data does not have a header row with the field names (as is usual in large datasets), we will need to provide the column names:
val stockDF = sparkSession.read.csv("/tmp/quotes_clean.csv").toDF("exchange", "symbol", "date", "open", "close", "volume", "price")
We can answer our questions very easily now:
val uniqueSymbols = stockDF.select("symbol").distinct().count
val recordsPerSymbol = stockDF.groupBy($"symbol").agg(count($"symbol"))
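For completeness, a sketch of the boilerplate those two lines assume; the SparkSession construction and the imports for count and the $ column syntax are my assumptions, not part of the original answer:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

val sparkSession = SparkSession.builder().appName("Spark Numerical").getOrCreate()
import sparkSession.implicits._  // provides the $"symbol" column syntax used above

// With stockDF built as shown, the per-symbol counts can be inspected with:
recordsPerSymbol.show()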
I have created a customized work order submission form in Forms & Sheets that auto-emails a confirmation for each submission (job request form), creating a data trail of vendor activity. It is fairly integrated and totally cobbled together from a lot of reading in these forums, coupled with a gazillion frustrating moments of trial & error. I'm a novice moving towards "capable", but I'm stuck on a piece of code for a triggered confirmation email, with a random work order generator, email confirmations, and toggle-based management built in. The code below, which I actually need help with, sends that triggered confirmation email with a confirmation of service, the work order #, and everything they originally submitted. The problem is that the code I have is providing the data exactly how I want it and the placement is great, but I need to create a visual distinction between the column titles and the variable submission data. Can someone please help me add bold formatting to the column titles in line 16, to help create that visual differentiation between columnar "category" and submission data?
// This constant is written in column C for rows for which an email
// has been sent successfully.
var EMAIL_SENT = "EMAIL_SENT";
function sendEmails2() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var startRow = 2;    // First row of data to process
  var numRows = 1000;  // Number of rows to process
  // Fetch the range of cells A2:AA1001
  var dataRange = sheet.getRange(startRow, 1, numRows, 27);
  // Fetch values for each row in the Range.
  var data = dataRange.getValues();
  for (var i = 0; i < data.length; ++i) {
    var row = data[i];
    var emailAddress = row[19];
    var message = row[16] + "\n\n" +
      "Submitted By: " + row[19] + "\n\n" +
      "Date Submitted: " + row[0] + "\n\n" +
      row[21] + "\n\n" +
      "IMPORTANT NOTES FROM CDS: " + row[20] + "\n\n" +
      "Full Show Services: " + row[3] + "\n\n" +
      "Event Start Date: " + row[4] + "\n\n" +
      "Event End Date: " + row[5] + "\n\n" +
      "Warehouse Locations: " + row[6] + "\n\n" +
      "Individual Services Requested: " + row[7] + "\n\n" +
      "Individual Services - Warehouse(s) & Date(s) Requested: " + row[8] + "\n\n" +
      "Partial Hourly Staffing Details Requested: " + row[9] + "\n\n" +
      "Requestors Instructions / Comments: " + row[10] + "\n\n" +
      "Files: " + row[11] + row[12] + "\n\n" +
      "Thank you for your request. We appreciate your business. CDS Special Events Team "; // Second column
    var emailSent = row[18];
    var subject = row[16]; // Third column
    var ss = SpreadsheetApp.getActiveSpreadsheet();
    if (emailSent != EMAIL_SENT) { // Prevents sending duplicates
      MailApp.sendEmail(emailAddress, subject, message);
      sheet.getRange(startRow + i, 19).setValue(EMAIL_SENT);
      // Make sure the cell is updated right away in case the script is interrupted
      SpreadsheetApp.flush();
    }
  }
}