Iterate map and reduce operations

Iterate map and reduce operations - loops

I'm writing a Hadoop application calculates map data at a certain resolution. My Input files are tiles of a map, named according to the QuadTile principle. I need to subsample those, and stitch those together until I have a certain higher-level tile which covers a larger area but at a lower resolution. Like zooming out in google maps.
Currently my Mapper subsamples tiles and my reducer combines tiles a a certain level and forms tiles of one level up. So for so good. But depending on which tile I need, I need to repeat those map and reduce steps a x times, which I have not been able to do so far.
What would be the best way to do so? Is it possible without explicitly saving the tiles in some temp directory and starting a new mapreduce Job on those temp dirs until I get what I want? What I think would be the perfect solution is something roughly like 'while(context.hasMoreThanOneKey()){iterate mapreduce}'.
Following an answer, I have now written a class TileJob which extends Job. However, the mapreduce is still not chained. Could you tell me what I'm doing wrong?
public boolean waitForCompletion(boolean verbose) throws IOException, InterruptedException, ClassNotFoundException{
if(desiredkeylength != currentinputkeylength-1){
System.out.println("In loop, setting input at " + tempout);
String tempin = tempout;
FileInputFormat.setInputPaths(this, tempin);
tempout = (output + currentinputkeylength + "/");
FileOutputFormat.setOutputPath(this, new Path(tempout));
System.out.println("Setting output at " + tempout);
currentinputkeylength--;
Configuration conf = new Configuration();
TileJob job = new TileJob(conf);
job.setJobName(getJobName());
job.setUpJob(tempin, tempout, tiletogenerate, currentinputkeylength);
return job.waitForCompletion(verbose);
}else{
//desiredkeylength == currentkeylength-1
System.out.println("In else, setting input at " + tempout);
String tempin = tempout;
FileInputFormat.setInputPaths(this, tempin);
tempout = output;
FileOutputFormat.setOutputPath(this, new Path(tempout));
System.out.println("Setting output at " + tempout);
currentinputkeylength--;
Configuration conf = new Configuration();
TileJob job = new TileJob(conf);
job.setJobName(getJobName());
job.setUpJob(tempin, tempout, tiletogenerate, currentinputkeylength);
currentinputkeylength--;
return super.waitForCompletion(verbose);
}
}

Usually you kick a mapreduce step off by having a driver class main method that configures the Job, Configuration and format type (input and output). Once everything's ready to go that main method calls Job::waitForCompletion() which submits the job and waits for the job to complete before continuing.
You can wrap some of that logic in a loop that repeatedly calls Job::waitForCompletion() until your criteria is met. You can implement your criteria using counters. Put logic into your reduce() method to set or increment a counter with the number of keys. Your loop in the driver class can get the value of that (distributed) counter from the Job instance and you code your while expression using that value.
What file locations you use is up to you. Inside this driver loop you can change the file location for the inputs and outputs, or keep them the same.
I should probably add that you ought to go ahead and create a new Job and Configuration instance inside the loop. I don't know that those objects are reusable in this situation.
public static void main(String[] args) {
int keys = 2;
boolean completed = true;
while (completed & (keys > 1)) {
Job job = new Job();
// Do all your job configuration here
completed = job.waitForCompletion();
if (completed) {
keys = job.getCounter().findCounter("Total","Keys").getValue();
}
}
}

Related

How to update UI with a viewModel for byte arrays and lists in Kotlin/Jetpack Compose?

So the problem I am facing is that when the viewModel data updates it doesn't seems to update the state of my bytearrays: ByteArray by (mutableStateOf) and mutableListOf()
When I change pages/come back to the page it does update them again. How can get the view to update for things like lists and bytearrays
Is mutableStateOf the wrong way to update bytearrays and lists? I couldn't really find anything useful.
Example of byte array that doesn't work (using this code with a Float by mutableStateOf works!).
How I retrieve the data in the #Composable:
val Data = BluetoothMonitoring.shared.Data /*from a viewModel class, doesn't update when Data / bytearray /list changes, only when switching pages in app.*/
Class:
class BluetoothMonitoring : ViewModel(){
companion object {
val shared = BluetoothMonitoring()
}
var Data : ByteArray by mutableStateOf( ByteArray(11) { 0x00 })
}
Any help is appreciated and thanks in advance!

You seem to come from IOS/Swift where using Shared objects is a common pattern
In Android, you're not supposed to create a viewmodel instance by hand, you are supposed to use ViewModelFactory, or compose viewModel() function. The ViewModel object basically preserves some data for you across activity recompositions, i.e. when activity is paused, resumed etc
The general good way of structuring your app is using UDF (Unidirectional Data Flow) where, data flows upwards towards the view/composable and events flow downwards towards the viewmodel, data layer etc. We use something called as Flow which is a reactive data stream, which updates the listener whenever some value changes
Keeping in mind these two points, I've created a very brief way of how you could restructure your code, so that it almost always works. Please adapt accordingly to your logic
class MyViewModel: ViewModel(){
// Declare a flow in the viewModel
var myData = MutableStateFlow(0)
fun modifyMyData(newData: Int){
myData = newData
}
}
And in your composable view layer
#Composable
fun YourComposable(){
val myViewModel = viewModel()
val myUiState by myViewModel.myData.collectAsState()
// Now use your value, and change it, it will be changed accordingly and updated everywhere
}
I also recommend reading this codelab

How I handle byte arrays over Bluetooth with kotlin and Android.
I'm talking sic (communication) to arduino over bluetooth in the app I'm making. Kotlin likes Byte and Arduino likes UBYTE so I do all of those translations in the app because Kotlin threads are inline code easy and the phone has more power for such things.
Add toByte() to everthing that is outbound. <- (actual solution)
outS[8] = Color.alpha(word.color).toByte()
outS[9] = Color.red(word.color).toByte()
outS[10] = Color.green(word.color).toByte()
outS[11] = Color.blue(word.color).toByte()
outS[12] = 0.toByte()
outS[13] = 232.toByte()
outS[14] = 34.toByte()
outS[15] = 182.toByte()
//outS[16] = newline working without a newLine!
// outS[16] = newline
val os = getMyOutputStream()
if (os != null) {
os.write(outS)
}
For inbound data...I have to change everything to UByte. <- (actual solution)
val w: Word = wordViewModel.getWord(i)
if (w._id != 0) {
w.rechecked = (byteArray[7].toInt() != 0)
w.recolor = Color.argb(
byteArray[8].toUByte().toInt(),
byteArray[9].toUByte().toInt(),
byteArray[10].toUByte().toInt(),
byteArray[11].toUByte().toInt()
)
//finally update the word
wordViewModel.update(w)
}
My Youtube channel is about model trains not coding there some android and electronics in there. And a video of bluetooth arduino android is coming soon like tonight or tomorrow soon. I've been working on the app for just over a month using it to learn Kotlin. The project is actually working but I have to do the bluetooth connection manually the app actually controls individual neopixels for my model train layout.

How does Flink treat timestamps within iterative loops?

How are timestamps treated within an iterative DataStream loop within Flink?
For example, here is an example of a simple iterative loop within Flink where the feedback loop is of a different type to the input stream:
DataStream<MyInput> inputStream = env.addSource(new MyInputSourceFunction());
IterativeStream.ConnectedIterativeStreams<MyInput, MyFeedback> iterativeStream = inputStream.iterate().withFeedbackType(MyFeedback.class);
// define an output tag so we can emit feedback objects via a side output
final OutputTag<MyFeedback> outputTag = new OutputTag<MyFeedback>("feedback-output"){};
// now do some processing
SingleOutputStreamOperator<MyOutput> combinedStreams = iterativeStream.process(new CoProcessFunction<MyInput, MyFeedback, MyOutput>() {
#Override
public void processElement1(MyInput value, Context ctx, Collector<MyOutput> out) throws Exception {
// do some processing of the stream of MyInput values
// emit MyOutput values downstream by calling out.collect()
out.collect(someInstanceOfMyOutput);
}
#Override
public void processElement2(MyFeedback value, Context ctx, Collector<MyOutput> out) throws Exception {
// do some more processing on the feedback classes
// emit feedback items
ctx.output(outputTag, someInstanceOfMyFeedback);
}
});
iterativeStream.closeWith(combinedStreams.getSideOutput(outputTag));
My questions revolve around how does Flink use timestamps within a feedback loop:
Within the ConnectedIterativeStreams, how does Flink treat ordering of the input objects across the streams of regular inputs and feedback objects? If I emit an object into the feedback loop, when will it be seen by the head of the loop with respect to the regular stream of input objects?
How does the behaviour change when using event time processing?

AFAICT, Flink doesn't provide any guarantees on the ordering of input objects. I've run into this when trying to use iterations for a clustering algorithm in Flink, where the centroid updates don't get processed in a timely manner. The only solution I found was to essentially create a single (unioned) stream of the incoming events and the centroid updates, versus using a co-stream.
FYI there's this proposal to address some of the short-comings of iterations.

Action to spawn multiple further actions in Gatling scenario

Background
I'm currently working on a capability analysis set of stress-testing tools for which I'm using gatling.
Part of this involves loading up an elasticsearch with scroll queries followed by update API calls.
What I want to achieve
Step 1: Run the scroll initiator and save the _scroll_id where it can be used by further scroll queries
Step 2: Run a scroll query on repeat, as part of each scroll query make a modification to each hit returned and index it back into elasticsearch, effectively spawning up to 1000 Actions from the one scroll query action, and having the results sampled.
Step 1 is easy. Step 2 not so much.
What I've tried
I'm currently trying to achieve this via a ResponseTransformer that parses JSON-formatted results, makes modifications to each one and fires off a thread for each one that attempts another exec(http(...).post(...) etc) to index the changes back into elasticsearch.
Basically, I think I'm going about it the wrong way about it. The indexing threads never get run, let alone sampled by gatling.
Here's the main body of my scroll query action:
...
val pool = Executors.newFixedThreadPool(parallelism)
val query = exec(http("Scroll Query")
.get(s"/_search/scroll")
.body(ElFileBody("queries/scrollquery.json")).asJSON // Do the scroll query
.check(jsonPath("$._scroll_id").saveAs("scroll_id")) // Get the scroll ID from the response
.transformResponse { case response if response.isReceived =>
new ResponseWrapper(response) {
val responseJson = JSON.parseFull(response.body.string)
// Get the hits and
val hits = responseJson.get.asInstanceOf[Map[String, Any]]("hits").asInstanceOf[Map[String,Any]]("hits").asInstanceOf[List[Map[String, Any]]]
for (hit <- hits) {
val id = hit.get("_id").get.asInstanceOf[String]
val immutableSource = hit.get("_source").get.asInstanceOf[Map[String, Any]]
val source = collection.mutable.Map(immutableSource.toSeq: _*) // Make the map mutable
source("newfield") = "testvalue" // Make a modification
Thread.sleep(pause) // Pause to simulate topology throughput
pool.execute(new DocumentIndexer(index, doctype, id, source)) // Create a new thread that executes the index request
}
}
}) // Make some mods and re-index into elasticsearch
...
DocumentIndexer looks like this:
class DocumentIndexer(index: String, doctype: String, id: String, source: scala.collection.mutable.Map[String, Any]) extends Runnable {
...
val httpConf = http
.baseURL(s"http://$host:$port/${index}/${doctype}/${id}")
.acceptHeader("application/json")
.doNotTrackHeader("1")
.disableWarmUp
override def run() {
val json = new ObjectMapper().writeValueAsString(source)
exec(http(s"Index ${id}")
.post("/_update")
.body(StringBody(json)).asJSON)
}
}
Questions
Is this even possible using gatling?
How can I achieve what I want to achieve?
Thanks for any help/suggestions!

It's possible to achieve this by using jsonPath to extract the JSON hit array and saving the elements into the session and then, using a foreach in the action chain and exec-ing the index task in the loop you can perform the indexing accordingly.
ie:
ScrollQuery
...
val query = exec(http("Scroll Query")
.get(s"/_search/scroll")
.body(ElFileBody("queries/scrollquery.json")).asJSON // Do the scroll query
.check(jsonPath("$._scroll_id").saveAs("scroll_id")) // Get the scroll ID from the response
.check(jsonPath("$.hits.hits[*]").ofType[Map[String,Any]].findAll.saveAs("hitsJson")) // Save a List of hit Maps into the session
)
...
Simulation
...
val scrollQueries = scenario("Enrichment Topologies").exec(ScrollQueryInitiator.query, repeat(numberOfPagesToScrollThrough, "scrollQueryCounter"){
exec(ScrollQuery.query, pause(10 seconds).foreach("${hitsJson}", "hit"){ exec(HitProcessor.query) })
})
...
HitProcessor
...
def getBody(session: Session): String = {
val hit = session("hit").as[Map[String,Any]]
val id = hit("_id").asInstanceOf[String]
val source = mapAsScalaMap(hit("_source").asInstanceOf[java.util.LinkedHashMap[String,Any]])
source.put("newfield", "testvalue")
val sourceJson = new ObjectMapper().writeValueAsString(mapAsJavaMap(source))
val json = s"""{"doc":${sourceJson}}"""
json
}
def getId(session: Session): String = {
val hit = session("hit").as[Map[String,Any]]
val id = URLEncoder.encode(hit("_id").asInstanceOf[String], "UTF-8")
val uri = s"/${index}/${doctype}/${id}/_update"
uri
}
val query = exec(http(s"Index Item")
.post(session => getId(session))
.body(StringBody(session => getBody(session))).asJSON)
...
Disclaimer: This code still needs optimising! And I haven't actually learnt much scala yet. Feel free to comment with better solutions
Having done this, what I really want to achieve now is to parallelise a given number of the indexing tasks. ie: I get 1000 hits back, I want to execute an index task for each individual hit, but rather than just iterating over them and doing them one after another, I want to do 10 at a time concurrently.
However, I think this is a separate question, really, so I'll present it as such.

start jenkins builds parallely?

I have 'n' no. of jobs, which I want to start simultaneously. Is it feasible in Jenkins? I tried using DSL plugin, work flow plugin. I have used 'parallel' method. I have my list of jobnames in an array/list and want to run them parallel. Please help.
Currently I'm iterating the jobnames in an array, so they start one by one, instead I want them to start parallel. How this can be achieved ?

This will fullfill your requirement
GParsPool.withPool(NO.OF.THREADS){
sampleList.eachParallel{
callYourMethod(it)
}
}

I've done this before. The solution I found that worked for me was through use of upstream/downstream jobs, visualizing using the Build Pipeline Plugin.
Essentially, create one blank 'bootstrap' job that you use to kick off all the jobs that you require to run in parallel, with each of the parallel jobs having the bootstrap job as an upstream trigger.

The option that we have been using to trigger multiple jobs in parallel is the https://wiki.jenkins-ci.org/display/JENKINS/Parameterized+Trigger+Plugin we then have a master job that just needs a comma separated list of all the jobs to call.

In my opinion, the best way (it will up your personal skill ;-) ) is to use Jenkins Pipeline !!
You can find a simple example of code just below
// While you can't use Groovy's .collect or similar methods currently, you can
// still transform a list into a set of actual build steps to be executed in
// parallel.
// Our initial list of strings we want to echo in parallel
def stringsToEcho = ["a", "b", "c", "d"]
// The map we'll store the parallel steps in before executing them.
def stepsForParallel = [:]
// The standard 'for (String s: stringsToEcho)' syntax also doesn't work, so we
// need to use old school 'for (int i = 0...)' style for loops.
for (int i = 0; i < stringsToEcho.size(); i++) {
// Get the actual string here.
def s = stringsToEcho.get(i)
// Transform that into a step and add the step to the map as the value, with
// a name for the parallel step as the key. Here, we'll just use something
// like "echoing (string)"
def stepName = "echoing ${s}"
stepsForParallel[stepName] = transformIntoStep(s)
}
// Actually run the steps in parallel - parallel takes a map as an argument,
// hence the above.
parallel stepsForParallel
// Take the string and echo it.
def transformIntoStep(inputString) {
// We need to wrap what we return in a Groovy closure, or else it's invoked
// when this method is called, not when we pass it to parallel.
// To do this, you need to wrap the code below in { }, and either return
// that explicitly, or use { -> } syntax.
return {
node {
echo inputString
}
}
}

System.TypeException: Cannot have more than 10 chunks in a single operation

I have this very weird error, "System.TypeException: Cannot have more than 10 chunks in a single operation", has anyone seen/encountered this before ? Please can you guide me if you know how to solve this.
I am trying to insert different types of sObjects together in a list of sObject. The list is never larger than 10 rows.

This post here:
https://developer.salesforce.com/forums/ForumsMain?id=906F000000090nUIAQ
suggests that it is not the number of different sObjects, but the order of the objects that causes this chunk limit to be exceeded. In other words, "1,1,1,2,2,2" has one chunk, the transition from "1" to "2". "1,2,3,4,5,6" has six chunks, even though the number of elements is the same. Putting the objects into the list sorted in object order is the suggested solution.
Is it possible for you to create a reasonable test case with only 2 or 3 rows?

There are two possible explanations for this issue:
As Jagular noted, you did not order the sobjects you tried to insert, so there are more than 10 'chunks' in the list.
You try to insert > 2000 records, and > 1 sobject type. This one seems like a Salesforce Bug, since the error message doesn't match the issue.

Scenario 1 and its solution
When you have a hybrid list, make sure that the objects are not scattered without any order. For example, A,B,A,B,A,B,A,B…. Salesforce has an inherent trouble in switching sObject types for more than 10 times. They call this switching limit as Chunking Limit. So, on this hybrid list, if you would have sorted it and then passed it for DML, Salesforce would have been much happier. For example. A,A,A,A,B,B,B,B… In this case, salesforce only has to switch one time (that is read all A objects –>switch –> read all B objects). The max chunk limit is 10. So, here we are safe.
listToUpdate.sort();
UPDATE listToUpdate;
Scenario 2 and its solution
Another point that we have to bear in our mind is that if the hybrid list contains more number of objects for one type, we can run into TypeException. As mentioned in the screenshot, if list contains 1001 objects of type A and 1001 objects of type B, then total objects is equal to 2002. The maximum chunks allowed is 10. So, if you do a simple math, the number of objects in each chunk would be 2002/10 = 200. Salesforce also enforces another governor limit that each chunk should not contain 200 or more than 200 objects. In this case, we will have to foresee how much objects are possible to enter this code and we have to write code to pass lists of safe size for DML every time.
Scenario 3 and its solution
Scenario 3 and its solution
Third scenario that can happen is if the hybrid list contains objects of more than 10 types, then even if the size of the list if very small, switching happens when salesforce reads different sObject. So, we have to make sure that in this case, we allot separate lists for each sObject type and then pass it on for DML. Doing this in an apex trigger or apex class would cause you some trouble as multiple DML’s are initiated in a context. Passing this kind of multiple sObject lists for DML operations in a different context would really ease up the load you pump into the platform. Consider doing this kind of logic in a Batch Apex Job rather than a apex trigger or apex class.
Hope this helps.

Below a code that should cover all 3 Scenario from Arpit Sethi.
It's a piece of code I took from this topic: https://developer.salesforce.com/forums/?id=906F000000090nUIAQ.
and modified to cover Scenario 2.
private static void saveSobjectSet(List <Sobject> listToUpdate) {
Integer SFDC_CHUNK_LIMIT = 10;
// Developed this part due to System.TypeException: Cannot have more than 10 chunks in a single operation
Map<String, List<Sobject>> sortedMapPerObjectType = new Map<String, List<Sobject>>();
Map<String, Integer> numberOf200ChunkPerObject = new Map<String, Integer>();
for (Sobject obj : listToUpdate) {
String objTypeREAL = String.valueOf(obj.getSObjectType());
if (! numberOf200ChunkPerObject.containsKey(objTypeREAL)){
numberOf200ChunkPerObject.put(objTypeREAL, 1);
}
// Number of 200 chunk for a given Object
Integer numnberOf200Record = numberOf200ChunkPerObject.get(objTypeREAL);
// Object type + number of 200 records chunk
String objTypeCURRENT = String.valueOf(obj.getSObjectType()) + String.valueOf(numnberOf200Record);
// CurrentList
List<sObject> currentList = sortedMapPerObjectType.get(objTypeCURRENT);
if (currentList == null || currentList.size() > 199) {
if(currentList != null && currentList.size() > 199){
numberOf200ChunkPerObject.put(objTypeREAL, numnberOf200Record + 1);
objTypeCURRENT = String.valueOf(obj.getSObjectType()) + String.valueOf(numnberOf200Record);
}
sortedMapPerObjectType.put(objTypeCURRENT, new List<Sobject>());
}
sortedMapPerObjectType.get(objTypeCURRENT).add(obj);
}
while(sortedMapPerObjectType.size() > 0) {
// Create a new list, which can contain a max of chunking limit, and sorted, so we don't get any errors
List<Sobject> safeListForChunking = new List<Sobject>();
List<String> keyListSobjectType = new List<String>(sortedMapPerObjectType.keySet());
for (Integer i = 0;i<SFDC_CHUNK_LIMIT && !sortedMapPerObjectType.isEmpty();i++) {
List<Sobject> listSobjectOfOneType = sortedMapPerObjectType.remove(keyListSobjectType.remove(0));
safeListForChunking.addAll(listSobjectOfOneType);
}
update safeListForChunking;
}
}
Hope it helps,
Bye

Hi i kind of deviced a simple way to sort a list of different sobject types
public List<Sobject> SortRecordsByType(List<Sobject> records){
List<Sobject> response = new List<Sobject>();
Map<string,List<Sobject>> sortDictionary = new Map<string,List<Sobject>>();
for(Sobject record : records){
string objectTypeName = record.getSobjectType().getDescribe().getName();
if(sortDictionary.containsKey(objectTypeName)){
sortDictionary.get(objectTypeName).add(record);
}else{
sortDictionary.put(objectTypeName , new List<Sobject>{record});
}
}
// arrange in order
for(string objectName : sortDictionary.keySet()){
response.addAll(sortDictionary.get(objectName));
}
return response;
}
hopefully this solves your problem .

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight