Get dask.distributed profiling time information? - distributed

I am using dask.distributed to schedule many jobs across workers. The documentation shows how to get profiling information from the Bokeh interface here.
It also shows that one can obtain the raw profile information by calling client.profile().
However, when I call this method, the profiling information doesn't seem to include the average run time of a process, whereas it is present in the Bokeh interface. Is there a way to retrieve this in raw form?
Also, profile.py explains the structure of the profile information here:
We represent this tree as a nested dictionary with the following form:
{
 'identifier': 'root',
 'description': 'A long description of the line of code being run.',
 'count': 10  # the number of times we have seen this line
 'children': {  # callers of this line. Recursive dicts
     'ident-a': {'description': ...
                 'identifier': 'ident-a',
                 'count': ...
                 'children': {...}},
     'ident-b': {'description': ...
                 'identifier': 'ident-b',
                 'count': ...
                 'children': {...}}}
}
There is no mention of timing information here. Thanks!

You should compare the value of 'count' against the profile-interval value in your config.yaml file. The profile-interval value is in milliseconds and determines the frequency at which we sample the working thread. So if profile-interval was 10 and you saw 50 counts of a particular line then that line was likely active for around 500ms * threads.
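To get the approximate per-line time in raw form you can convert the counts yourself. Below is a minimal sketch, assuming the nested-dict layout quoted above and that interval_ms matches your profile-interval setting; it is not part of the dask API, and identifiers that repeat across branches simply overwrite each other here:

def approximate_times(node, interval_ms=10.0):
    # Return {identifier: seconds} for every node in the profile tree.
    # count * interval approximates how long that line was active.
    times = {node['identifier']: node['count'] * interval_ms / 1000.0}
    for child in node.get('children', {}).values():
        times.update(approximate_times(child, interval_ms))
    return times

# Usage: profile = client.profile()
# print(approximate_times(profile))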

Related

TradingView Pine Script strategy.comment, multiple comments

I have a specific TradingView Pine Script command structure I want to maintain, and it includes strategy-related arguments as well.
(Since I know MATLAB, I will start with that.) In MATLAB nomenclature, if you have a string array, you can do the following:
array = ['dog'; 'cat']
and you can call
array(1) to display 'dog', or array(2) to display 'cat', etc. And if you want to assign it to a new variable, you can do it as
new = array(1); %etc.....
In Pine Script, what I am trying to do is the following:
orderCondition = array.new_string(4)
array.insert(orderCondition, 1, 'open')
array.insert(orderCondition, 2, 'open_new')
array.insert(orderCondition, 3, 'close')
array.insert(orderCondition, 4, 'close_old')
So in this array I am hoping I have something like
[ 'open'; 'open_new'; 'close'; 'close_old' ]
The critical part is the assignment. What I want to achieve is the following: I want to use the first two elements of the array in one strategy comment, and the remaining two in the other, like
strategy.entry("LE", strategy.long, comment=orderCondition[1,2])
strategy.entry("LE", strategy.long, comment=orderCondition[3,4])
so that I group them. Not only that, I am also hoping to be able to read those in the strategy alert window, as
{{strategy.comment[1-2]}}
{{strategy.comment[3-4]}}
Is this possible? And if possible how can I achieve this? Thank you for your time.

SPSS produces 1 scatter plot with split file

I am working with data where I need to create multiple scatter plots for different populations. I also recently upgraded from SPSS v26 to v28 and the code that I used for this worked in v26 but is no longer working correctly in v28. Instead of producing multiple scatter plots like it's supposed to, it is now producing 1 plot in v28, presumably the first subpopulation in the split. I tested the split file function with descriptives and it worked as intended.
I have scoured everything in the GUI menus to find any kind of setting that was ticked by default and came up with nothing. I also tried to use a filter based on the criterion variables in my split file function and ran a scatter plot but that gave me a graph of the whole population instead of the subpopulation I filtered. Any guidance on what could be going on with scatter plots and the split file function in SPSS v28 would be greatly appreciated.
Here is my code for reference:
SORT CASES BY exit_yr service sigscore.
SPLIT FILE SEPARATE BY exit_yr service sigscore.
*Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=wait.time[name="wait_time"]
score.change[name="score_change"] service.ideal[name="service_ideal"] MISSING=VARIABLEWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE
/FITLINE TOTAL=NO SUBGROUP=NO.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: wait_time=col(source(s), name("wait_time"))
DATA: score_change=col(source(s), name("score_change"))
DATA: service_ideal=col(source(s), name("service_ideal"), unit.category())
GUIDE: axis(dim(1), label("wait.time: Difference in days between referral submission ",
"and referral acceptance"))
GUIDE: axis(dim(2), label("score.change: score change from pre to post"))
GUIDE: legend(aesthetic(aesthetic.color.interior), label("service.ideal"))
GUIDE: text.title(label("Grouped Scatter of score.change: score change from pre to ",
"post by wait.time: Difference in days between referral submission and referral ",
"acceptance by service.ideal"))
ELEMENT: point(position(wait_time*score_change), color.interior(service_ideal))
END GPL.
SPLIT FILE OFF.

TensorFlow learn.Estimator: is it naive to call fit() many times? Because I get a ResourceExhaustedError

I am learning machine learning using TensorFlow. I have been through a couple of tutorials, but I still have a hard time figuring out what the good ways of training a model are. Recently I implemented a CNN model I found in the literature. The model must take a crop of a certain size centered on a given pixel and predict the label of that pixel. It does this for each pixel of the image. I used:
classifier = tf.contrib.learn.Estimator(model_fn=cnn_model_fn, model_dir="./cnn")
with cnn_model_fn being a function I implemented.
For each training image, we take 3000 crops randomly, so I can't load all these images and their crops into memory. The approach I found is to load one image at a time, extract the 3000 crops, and call classifier.fit() to train on those 3000 crops, then loop over each image in my dataset.
for i in range(len(filenames)):
    ...
    image = misc.imread(filenames[i])
    labels = misc.imread(groundTruth[i])            # labels for each pixel
    input_classifier = preprocess(image, ...)       # crops 3000 patches from the image and does other things
    input_labels = preprocess_labels(labels, ...)   # takes the corresponding 3000 labels
    classifier.fit(x=input_classifier,
                   y=input_labels,
                   batch_size=30,
                   steps=100)
It worked fine for 100 images, but when I try it on the whole dataset (2000 images), it always stops and gives a ResourceExhaustedError.
...
[everything goes well]
...
iteration :227/2000
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating
TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus
id: 0000:01:00.0)
INFO:tensorflow:Create CheckpointSaverHook.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating
TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus
id: 0000:01:00.0)
Traceback (most recent call last):
File "train-cnn.py", line 78, in <module>
classifier.fit(x= input_classifier, y=input_labels,batch_size=30, steps=100)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
...
...
...
tensorflow.python.framework.errors_impl.ResourceExhaustedError: cnn/graph.pbtxt.tmp32bcc6311c164c29b91177d17d05d669
I don't see why it runs out of resources... I suspect it is because of the way I call fit() in a loop: after each fit(), a checkpoint is saved and must be restored right afterwards to train on the next image. So is this a bad way to train a model?
Running estimator.fit in a loop with a small number of steps each time is not a good idea. I would put all of the input logic into an input_fn and then run estimator.fit only once with more steps.
An example of reading data from different files can be found here: tf.contrib.learn.read_batch_examples
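Below is a hedged sketch of that approach, not a drop-in fix: it reuses filenames, groundTruth, preprocess, preprocess_labels and cnn_model_fn from the question, while the crop shape constants, the step count, and the use of tf.data.Dataset.from_generator (TensorFlow 1.4+) are assumptions.

import tensorflow as tf
from scipy import misc

CROP_SIZE = 32   # hypothetical crop height/width
CHANNELS = 3     # hypothetical number of channels

def crops_generator():
    # Load one image at a time so the whole dataset never sits in memory.
    for img_file, gt_file in zip(filenames, groundTruth):
        image = misc.imread(img_file)
        labels = misc.imread(gt_file)
        crops = preprocess(image)                # ~3000 crops per image (extra args elided)
        crop_labels = preprocess_labels(labels)  # the matching 3000 labels
        for x, y in zip(crops, crop_labels):
            yield x, y

def input_fn():
    # Build a streaming input pipeline from the generator and return one batch.
    ds = tf.data.Dataset.from_generator(
        crops_generator,
        output_types=(tf.float32, tf.int32),
        output_shapes=(tf.TensorShape([CROP_SIZE, CROP_SIZE, CHANNELS]),
                       tf.TensorShape([])))
    ds = ds.shuffle(3000).batch(30).repeat()
    return ds.make_one_shot_iterator().get_next()

classifier = tf.contrib.learn.Estimator(model_fn=cnn_model_fn, model_dir="./cnn")
classifier.fit(input_fn=input_fn, steps=200000)  # one fit() call with many steps, instead of a loop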

In the Watson Discovery News Feed API, limiting articles returned by date

Per the API documentation, manipulating the values of "start" and "end" should result in different data sets being returned. Strangely, changing the values of start and end resulted in the same results being returned. What am I missing? Thanks!
qopts = {'query': '/automotive and vehicles',
         'aggregation': '[term(yyymmdd).term(docSentiment.type,count:3)]',
         'return': 'docSentiment.type,yyyymmdd',
         'count': '50',
         'start': 'now-2w',
         'end': 'now-1w',
         'offset': my_offset}
my_query = discovery.query(my_disc_environment_id, my_disc_collection_id, qopts)
I am not sure if this is the right answer for you because I have limited information, but I hope it is helpful.
First of all, please check the number of result sets. If the result set contains more than 50 documents, the results could look the same. (The count parameter might be -1 = unlimited in the REST API of WCA (Watson Context Analytics).)
Second, if you can check the log on the server side, you can see the full query as manipulated by the Watson engine.
Last, I am not really sure that the Watson REST API can recognize the 'now-2w' style start/end form. Would you please link the tutorial? In my previous project, I wrote the start and end dates in Y-M-D form.
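For example, here is a hedged sketch of that Y-M-D approach, reusing the question's discovery client and IDs; the filter expression on the yyyymmdd field is an assumption worth verifying against the Discovery API reference:

from datetime import date, timedelta

end = date.today() - timedelta(weeks=1)
start = date.today() - timedelta(weeks=2)

qopts = {'query': '/automotive and vehicles',
         # Restrict results to an explicit date window instead of 'now-2w'/'now-1w'.
         'filter': 'yyyymmdd>={0},yyyymmdd<={1}'.format(start.strftime('%Y%m%d'),
                                                        end.strftime('%Y%m%d')),
         'count': 50}
my_query = discovery.query(my_disc_environment_id, my_disc_collection_id, qopts)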
Good Luck

Update AngularJS data from separate source

I'm working on my first Angular project and I've been thinking of the best way to word this question for a while now but I'm going to give this a shot.
I'm building an app that uses the Veralite (MiOS) to return device data in JSON format. The first request returns all of the devices on the system (example below).
"devices":[
{
"name":"Bedroom Light",
"altid":"4",
"id":6,
"category":2,
"subcategory":0,
"room":0,
"parent":1,
"status":"0",
"level":"0",
"state":-1,
"comment":""
},
{
"name":"Office Light",
"altid":"6",
"id":18,
"category":2,
"subcategory":0,
"room":0,
"parent":1,
"level":"0",
"status":"0",
"state":-1,
"comment":""
}
Once all of the devices are returned, my script begins long polling the Vera engine. Once a change to a device is made, the results of the long poll are returned, but they only include the devices that were changed (example below).
"devices":[
{
"altid":"6",
"id":"18",
"subcategory":"0",
"room":"0",
"parent":"1",
"level":"20",
"status":"1",
"state":"4",
"comment":"Office Light: Transmit was ok"
}
What I am trying to wrap my head around is the proper way to update the existing devices array with the newly updated data. Would I need to convert them to arrays, then loop through each array and try to match them by keys?
Hopefully I asked this as clearly as possible.
EDIT: Just to update this a bit for anyone who stumbles across this, specifically people interested in developing for the Veralite: the ID returned from the original request is an integer, but when long polling the engine, the ID is returned as a string. So even though the selected answer is correct, you'll need to either parse the updated device ID as an integer (parseInt) or use == instead of === when filtering the devices.
You can loop through the object fields just like you can loop through an array with Object.keys.
Conceptually something like:
Step 1) Find the updated device by its id field:
var previousVersionDevice = $scope.devices.filter(function(item) {
    return item.id === updatedDevice.id; // keep only the one existing device whose id matches
})[0];
Step 2) Loop through the keys in the updated device and overwrite the previous values with the ones received.
Object.keys(updatedDevice).forEach(function(key) {
    previousVersionDevice[key] = updatedDevice[key]; // overwrite/add everything from the updated version
});
Yes, you can loop through and find the matching device via some sort of unique key (assuming id is unique and won't change, you can use that).
