I wrote my model in the GUI and want to run it with repetitions in headless mode on a cluster.
The model has a go command that is repeated until we reach a specified step (at each go procedure, the year variable is incremented, and when we reach 2070, the model stops running). At the end of the go procedure, the world is exported (and analysed in R).
If I run multiple repetitions on parallel cores, how can I export the worlds so that they have different names?
So far, I export the world with the following lines (when running only one repetition):
let file-name (word scenario "_NETLOGO_" year ".csv")
export-world (file-name)
But if the model is run at the same time on several cores, the files will overlap and I would not know which file comes from which repetition (assuming the name would just get an extra (1) appended). I thought about creating folders to save the worlds; is that possible? If so, how can I adapt the folder name to the repetition number?
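A minimal sketch of one way around this, assuming the repetitions are driven by BehaviorSpace: the behaviorspace-run-number reporter is unique for each run and can be baked into the file name, which avoids collisions without needing separate folders:

let file-name (word scenario "_NETLOGO_" behaviorspace-run-number "_" year ".csv")
export-world (file-name)

Outside BehaviorSpace, behaviorspace-run-number reports 0, so a run identifier would then have to be supplied some other way (for example, set by the launching script).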
I'm trying to generate a random release name using Bamboo Server's deployment project/plan.
I was able to generate a dynamic version number by using variables (defining bamboo.release_number, bamboo.release_major, etc.),
resulting in a release name like "StaticName-1.2".
I would like the StaticName part to be generated randomly on each release, based on a predefined array or list.
The desired result: given the list ["NameA", "NameB", "NameC"],
Bamboo would generate a different name on every run, like:
"NameB-1.3"
"NameC-1.4"
"NameA-1.5"
Any ideas how I can perform such a thing?
Generally, you can add a build task that writes the release name to a file in the project in KEY=VALUE format. You would then add an Inject Variables task to read that file, adding the variables to the plan. You can then reference that variable when creating the release, just as you did for the build number.
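A minimal sketch of such a build-task script, here in Python (the file name, variable key, and name list are illustrative):

import random

# hypothetical list of candidate names - adjust to your own
names = ["NameA", "NameB", "NameC"]

# write KEY=VALUE so a subsequent Inject Variables task can pick it up
with open("release_vars.properties", "w") as f:
    f.write("release_name=%s\n" % random.choice(names))

The Inject Variables task would then read release_vars.properties, and the injected variable can be referenced in the release-name pattern alongside the existing version variables (the exact bamboo.* prefix depends on the namespace configured in the inject task).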
I am learning about big data in my class and right now we are learning about Hive. We learned about mappers and reducers today, but honestly it went way over my head. Could someone explain to me what the mapper and the reducer do at each step? Or at least point me to some good readings? Thanks in advance.
Let's try to understand the MapReduce flow.
We are going to discuss the word-count problem in Hadoop, which is also known as the 'hello world' of Hadoop.
Word count is a program where we find the number of occurrences of each word in a file.
Step 1):
Input file: we need some data on which to run the word-count program. To run this program on a cluster, the first step is to put the file on Hadoop (HDFS). This can be done in various ways; the easiest is to use Hadoop shell commands like put or copyFromLocal.
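For example (the local file name and HDFS target path are illustrative):

hadoop fs -put words.txt /user/hadoop/input/
# or, equivalently
hadoop fs -copyFromLocal words.txt /user/hadoop/input/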
Step 2)
MapReduce talks in terms of key-value pairs: the mapper gets its input as key-value pairs, does the required processing, and produces an intermediate result as key-value pairs, which is then the input for the reducer to work on further; finally, the reducer also writes its output as key-value pairs.
But the mapper executes just after the main driver program, so who provides the mapper its input in the form of key-value pairs? The InputFormat does this for you.
InputFormat is the class which does two major things:
1) Input splits: the number of mapper instances is driven by the number of input splits. With the default configuration, one split is equivalent to one block, but you may change the split size as needed.
For example, if you are working with, say, 512 MB of data and your block size is 64 MB, about 8 input splits will be used, so 8 mapper instances will run.
2) Breaking the data into key-value pairs (the RecordReader class does this behind the scenes).
What the key and value are for a mapper is determined by the input format you use. For instance, TextInputFormat, the most commonly used input format, sends a LongWritable (equivalent to a long; the byte offset of the line) as the key and a Text (the line itself, a string) as the value to the mapper.
Your mapper class works on one split. Inside the class you have a map function, which works on a single line at a time, so a single line goes to the map function.
For example, it sends "Apple orange Mango" to the map function.
3) Mapper
In the mapper we get a line as input, so now we need to write our logic.
We break the line into words based on a delimiter (whitespace), so we now have the individual words.
As we know, map works on key-value pairs.
We can take the word as the key and 1 as the value.
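A minimal sketch of such a mapper in Java (essentially the canonical Hadoop word-count mapper; class and variable names are illustrative):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key = byte offset of the line, value = the line itself
        for (String token : value.toString().split("\\s+")) {
            if (token.isEmpty()) continue; // skip artifacts of leading whitespace
            word.set(token);
            context.write(word, ONE); // emit (word, 1)
        }
    }
}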
Why have we taken the word as the key and not the other way round? Because of the next phase:
Shuffling and sorting phase: in this phase the framework makes groups based on similar keys, i.e. all identical keys come together during shuffling, and they are sorted on the basis of the keys.
Now let's revise:
Initially we had one file, which was sent to different mappers based on the input splits. Then, in the mapper class, the map function gets one line as input, so we build our logic with respect to one line; all lines are processed the same way within one instance, and all instances work in parallel in the same fashion.
Now let's say you have 10 mappers running. In MapReduce the number of reducers is typically smaller than the number of mappers,
so if 10 mappers were used, most likely 2-3 reducers would be used.
In the shuffling and sorting phase, as we have seen, all identical keys are grouped together.
First of all, on what basis is it decided which mapper's data will go to which reducer?
In our case, the data from 10 mappers has to be divided among 2 reducers; on what basis is that decided?
There is a component called the Partitioner, which decides which mapper output goes to which reducer, based on hash partitioning: the key is hashed and the modulo operator is applied with the number of reducers.
Since we are hashing the key, it is guaranteed that all identical keys go to the same reducer.
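Hadoop's default HashPartitioner is implemented essentially like this:

import org.apache.hadoop.mapreduce.Partitioner;

public class HashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // mask off the sign bit so the result is non-negative, then take modulo
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}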
We don't have to worry about any of this, as the framework has been designed to do it efficiently; but since it is written in Java, we do have the flexibility to play with the different components as needed: customizing the key, a custom partitioner, a custom comparator, and so on.
4) Reducer:
The reducer gets a key and the list of its values as input, something like this:
Apple, <1,1,1,1>
Now in the reducer we write the logic for what exactly we want to do; in our case we want the word count, so we simply have to sum the values.
That was also the reason we took 1 as the value in the map phase: we simply had to count.
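A matching sketch of the reducer (again, essentially the canonical word-count reducer):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get(); // add up the 1s emitted for this word
        }
        context.write(key, new IntWritable(sum)); // emit (word, total count)
    }
}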
Output: the final output is written to HDFS by the reducer, again as key-value pairs.
I am working on my first SSIS package. I have a view with data that looks something like:
Loc Data
1 asd
1 qwe
2 zxc
3 jkl
And I need all of the rows to go to different files based on the Loc value. So all of the data rows where Loc = 1 should end up in the file named Loc1.txt, and the same for each other Loc.
It seems like this can be accomplished with a conditional split to flat file, but that would require a destination for each Location. I have a lot of Locations, and they all will be handled the same way other than being split in to different files.
Is there a built-in way to do this without creating a bunch of destination components? Or can I at least use a script component to handle it?
You should be able to set an expression using a variable. Define your path up to the directory and then set the variable equal to that column.
You'll need an Execute SQL Task to return a result set, and to loop over it in a container, once for every row in your original result set.
I don't have access at the moment to post screenshots, but this link should help outline the steps.
So when your package runs, the expression will look like:
"C:\\Documents\\MyPath\\location" + @[User::LocationColumn] + ".txt"
(in SSIS expressions, string literals use double quotes and backslashes are escaped)
It should end up feeding your directory with files according to location.
Set User::LocationColumn equal to the location column in your result set, and write your query to group by location, so all records for a location are written to a single file.
I spent some time trying to complete this task using the method @Phoenix suggested, but stumbled upon this video along the way.
I ended up going with the method shown in the video. I was hoping I wouldn't have to separate it into multiple SELECT statements for each location, plus an extra one to grab the distinct locations, but I thought the SSIS implementation in the video was much cleaner than the alternative.
Change the connection manager's connection string so that it uses a variable.
By varying the variable, the destination file also changes.
The connection string expression is:
"C:\\Documents\\ABC\\Files\\" + @[User::data] + ".txt"
Vote for this if it helps you.
I'm working in SPSS with this Kaplan-Meier command:
KM data BY sample
/STATUS=status(0)
/PRINT TABLE MEAN
/PLOT SURVIVAL HAZARD
/TEST LOGRANK BRESLOW TARONE
/COMPARE OVERALL POOLED.
This is no problem, but there's a lot of data I have to process and I'm trying to put this together in a syntax file. Can I loop several Kaplan-Meier commands, with data going through a set of variables such as {time0 time1 time2} and sample going through a set such as {sample0 sample1 sample2}?
I tried with DO REPEAT - END REPEAT, but I couldn't get it to work.
DO REPEAT applies to transformation commands. Procedures cannot be placed inside loops. However, if you install the Python Essentials from the SPSS Community site (www.ibm.com/developerworks/spssdevcentral), this is easy to do. If you can provide more details on what you want to loop over, we can explain how to do this.
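For instance, a minimal sketch of such a loop with the Python Essentials installed (the time/sample pairs follow the question; the KM specification is copied from the original syntax):

BEGIN PROGRAM.
import spss
# run the KM procedure once per time/sample pair
for t, s in zip(["time0", "time1", "time2"], ["sample0", "sample1", "sample2"]):
    spss.Submit("""
KM %s BY %s
  /STATUS=status(0)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL HAZARD
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED.
""" % (t, s))
END PROGRAM.

spss.Submit hands each generated KM command back to SPSS, which is exactly the kind of procedure looping DO REPEAT cannot do.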
I'm trying to create an SSIS package to import some dataset files. However, given that I seem to be hitting a brick wall every time I achieve a small part of the task, I need to take a step back and perform a sanity check on what I'm trying to achieve. If you good people can advise whether SSIS is the way to go about this, then I would appreciate it.
These are my questions from this morning :-
debugging SSIS packages - debug.writeline
Changing an SSIS dts variables
What I'm trying to do is have a For..Each container enumerate the files in a share on the SQL Server. For each file it finds, a script task runs to check various attributes of the filename, such as looking for a three-letter code, a date in CCYYMM format, the name of the data contained therein, and optionally some comments. For example:-
ABC_201007_SalesData_[optional comment goes here].csv
I'm looking to parse the name using a regular expression and put the values of 'ABC', '201007', and 'SalesData' in variables.
I then want to move the file to an error folder if it doesn't meet certain criteria :-
Three character code
Six character date
Dataset name (SalesData, in this example)
CSV extension
I then want to look up the character code, the date (or part thereof), and the dataset name against a lookup table to mark off a 'checklist' of received files from each client.
Then, based on the entry in the checklist, I want to kick off another SSIS package.
So, for example I may have a table called 'Checklist' with these columns :-
Client code   Dataset     SSIS_Package
ABC           SalesData   NorthSalesData.dtsx
DEF           SalesData   SouthSalesData.dtsx
If anyone has a better way of achieving this I am interested in hearing about it.
Thanks in advance
That's an interesting scenario, and should be relatively easy to handle.
First, your choice of the Foreach Loop is a good one. You'll be using the Foreach File Enumerator. You can restrict the files you iterate over to be just CSVs so that you don't have to "filter" for those later.
The Foreach File Enumerator puts the filename (full path or just file name) into a variable - let's call that "FileName". There are (at least) two ways you can parse that - expressions or a Script Task. It depends which one you're more comfortable with. Either way, you'll need to create three variables to hold the "parts" of the filename - I'll call them "FileCode", "FileDate", and "FileDataset".
To do this with expressions, you need to set the EvaluateAsExpression property on FileCode, FileDate, and FileDataset to true. Then in the expressions, you need to use FINDSTRING and SUBSTRING to carve up FileName as you see fit. Expressions don't have Regex capability.
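For example (a sketch, assuming FileName holds just the file name rather than the full path), the fixed-width parts are straightforward:

FileCode    : SUBSTRING( @[User::FileName], 1, 3 )
FileDate    : SUBSTRING( @[User::FileName], 5, 6 )
FileDataset : SUBSTRING( @[User::FileName], 12, FINDSTRING( @[User::FileName], "_", 3 ) - 12 )

Note that the FileDataset expression only works when the optional comment (and hence a third underscore) is present; handling both cases needs a conditional, which is one reason the Script Task route below can be cleaner.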
To do this in a Script Task, pass the FileName variable in as a ReadOnly variable, and the other three as ReadWrite. You can use the Regex capabilities of .Net, or just manually use IndexOf and Substring to get what you need.
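A minimal sketch of the Script Task approach in C# (the variable names match the ones above; the regex pattern is an assumption based on your example filename):

// inside the Script Task's Main() method; FileName passed in as ReadOnly,
// FileCode/FileDate/FileDataset as ReadWrite
string fileName = System.IO.Path.GetFileName(
    Dts.Variables["User::FileName"].Value.ToString());

// e.g. ABC_201007_SalesData_[optional comment goes here].csv
System.Text.RegularExpressions.Match m =
    System.Text.RegularExpressions.Regex.Match(
        fileName,
        @"^(?<code>[A-Za-z]{3})_(?<date>\d{6})_(?<dataset>[^_.]+)(_(?<comment>[^.]*))?\.csv$",
        System.Text.RegularExpressions.RegexOptions.IgnoreCase);

if (m.Success)
{
    Dts.Variables["User::FileCode"].Value = m.Groups["code"].Value;
    Dts.Variables["User::FileDate"].Value = m.Groups["date"].Value;
    Dts.Variables["User::FileDataset"].Value = m.Groups["dataset"].Value;
    Dts.TaskResult = (int)ScriptResults.Success;
}
else
{
    // doesn't meet the criteria - the package can then route it to the error folder
    Dts.TaskResult = (int)ScriptResults.Failure;
}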
Unfortunately, you have just missed the SQLLunch livemeeting on the ForEach loop: http://www.bidn.com/blogs/BradSchacht/ssis/812/sql-lunch-tomorrow
They are recording the session, however.