OptaPlanner benchmarking and fine tuning - benchmarking

I am currently tweaking and fine tuning my installer booking assignment optimizer, and I recently upgraded my library to OptaPlanner 6.2.0.Final. I am using the benchmarker to observe which optimization strategy works best (entity tabu, simulated annealing, with or without tailChainSwapMove). I have a few questions:
1) I made an event listener attached to my Solver for displaying any improvements in scoring (roughly like the sketch after the config below). Can I attach that event listener to my benchmark?
2) For the ChangeMove and SwapMove selectors, can I use a filterClass in conjunction with an entitySelector, so that I could utilize nearbyDistanceMeterClass?
<solverBenchmark>
  <name>Entity tabu w tailChainSwapMove</name>
  <solver>
    <localSearch>
      <unionMoveSelector>
        <changeMoveSelector>
          <filterClass>com.tmrnd.pejal.opta.solver.move.InstallerChangeMoveFilter</filterClass>
        </changeMoveSelector>
        <swapMoveSelector>
          <filterClass>com.tmrnd.pejal.opta.solver.move.SamePttSwapMoveFilter</filterClass>
        </swapMoveSelector>
        <tailChainSwapMoveSelector>
          <entitySelector id="entitySelector3"/>
          <valueSelector>
            <nearbySelection>
              <originEntitySelector mimicSelectorRef="entitySelector3"/>
              <nearbyDistanceMeterClass>com.tmrnd.pejal.opta.solver.move.BookingNearbyDistanceMeter</nearbyDistanceMeterClass>
              <parabolicDistributionSizeMaximum>20</parabolicDistributionSizeMaximum>
            </nearbySelection>
          </valueSelector>
        </tailChainSwapMoveSelector>
      </unionMoveSelector>
      <acceptor>
        <entityTabuRatio>0.05</entityTabuRatio>
      </acceptor>
      <forager>
        <acceptedCountLimit>1000</acceptedCountLimit>
      </forager>
    </localSearch>
  </solver>
</solverBenchmark>
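For reference, the event listener from question 1 is attached roughly like this (a minimal sketch, assuming the OptaPlanner 6.2 API; solverFactory is assumed to be built from my solver config, and the exact listener generics differ between versions):

import org.optaplanner.core.api.solver.Solver;
import org.optaplanner.core.api.solver.event.BestSolutionChangedEvent;
import org.optaplanner.core.api.solver.event.SolverEventListener;

Solver solver = solverFactory.buildSolver();
solver.addEventListener(new SolverEventListener() {
    @Override
    public void bestSolutionChanged(BestSolutionChangedEvent event) {
        // Fires each time the solver finds a new best solution; just log the improved score.
        System.out.println("New best score: " + event.getNewBestSolution().getScore());
    }
});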

1) Do you mean the optional statistics that the benchmarker supports, such as the BEST_SCORE statistic (see the docs)? All those statistics are shown nicely in the benchmark report; a sketch of how to enable and run that follows below the answers.
2) Try it out.
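Regarding 1), a minimal sketch of driving the benchmarker from Java (the benchmark config file name here is made up; the BEST_SCORE statistic is switched on inside <problemBenchmarks> in that config):

import org.optaplanner.benchmark.api.PlannerBenchmark;
import org.optaplanner.benchmark.api.PlannerBenchmarkFactory;

public class InstallerBookingBenchmarkApp {
    public static void main(String[] args) {
        // The benchmark config would contain, inside <problemBenchmarks>:
        //   <problemStatisticType>BEST_SCORE</problemStatisticType>
        // so the HTML report charts the best score over time for every solver config.
        PlannerBenchmarkFactory benchmarkFactory =
                PlannerBenchmarkFactory.createFromXmlResource("installerBookingBenchmarkConfig.xml");
        PlannerBenchmark benchmark = benchmarkFactory.buildPlannerBenchmark();
        benchmark.benchmark(); // runs every <solverBenchmark> and writes the report with the statistic charts
    }
}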

Related

Swin Transformer attention maps visualization

I am using a Swin Transformer for a hierarchical problem of multi-class, multi-label classification. I would like to visualize the self-attention maps on my input image by extracting them from the model, but unfortunately I am not succeeding at this task. Could you give me a hint on how to do it?
I am sharing the part of the code in which I am trying to do this:
import matplotlib.pyplot as plt

attention_maps = []
for module in model.modules():
    # print(module)
    if hasattr(module, 'attention_patches'):  # check whether the module has the attribute
        print(module.attention_patches.shape)
        if module.attention_patches.numel() == 224 * 224:
            attention_maps.append(module.attention_patches)

for attention_map in attention_maps:
    attention_map = attention_map.reshape(224, 224, 1)
    plt.imshow(sample['image'].permute(1, 2, 0), interpolation='nearest')
    plt.imshow(attention_map, alpha=0.7, cmap=plt.cm.Greys)
    plt.show()
In addition, if you know of any explainability techniques, such as Grad-CAM, that could be used with a hierarchical Swin Transformer, feel free to attach a link; it would be very helpful for me.
I am also researching the same topic. While I don't have anything specific to Swin, here are some resources related to Vision Transformers; I hope they help:
https://jacobgil.github.io/deeplearning/vision-transformer-explainability
https://github.com/jacobgil/vit-explain
https://github.com/hila-chefer/Transformer-Explainability
https://github.com/hila-chefer/Transformer-Explainability/blob/main/Transformer_explainability.ipynb

flink assign uid to window function

Is there a way to assign a uid to a window function (such as apply(ApplyCustomFunction)), as we do for map/flatMap (and other) functions in Flink? The Flink version is 1.13.1.
I would like to illustrate the case with an example:
DataStream<RECORD> outputDataStream = dataStream
        .coGroup(otherDataStream)
        .where(DATA::getKey)
        .equalTo(OTHERDATA::getKey)
        .window(TumblingProcessingTimeWindows.of(Time.seconds(2)))
        .apply(new CoGroupFunction());
Thanks
CoGroupedStreams.WithWindow#apply(CoGroupFunction<T1,T2,T>) doesn't have the return type that's needed for setting a UID or per-operator parallelism (among other things). This was done in order to keep binary backwards compatibility, and can't be fixed before Flink 2.0.
You can work around this by using the (deprecated) with method instead of apply, as in
DataStream<RECORD> outputDataStream = dataStream
        .coGroup(otherDataStream)
        .where(DATA::getKey)
        .equalTo(OTHERDATA::getKey)
        .window(TumblingProcessingTimeWindows.of(Time.seconds(2)))
        .with(new CoGroupFunction())
        .uid("window");
The with method will be removed once it is no longer needed.
Use with() instead of apply(). It will be fixed in version 2.0, as the documentation says.

How to speed up bulk operations in IBM Graph

I'm trying to populate my graph on the IBM Graph service using Gremlin queries. I'm using addVertex and I'm doing it in batches. The Gremlin I'm using looks like this, and it seems slow:
{"gremlin":
"def g = graph.traversal();
graph.addVertex(T.label, "foo")";
.
.
.
}
Is there a way to speed this up?
The problem with this script is that it will be compiled every time, and that takes time. If you have hundreds of these, the time to compile each one will definitely add up. A better way is to write the script once and then bind the variables in a bindings object:
{
  "gremlin": "def g = graph.traversal();graph.addVertex(T.label, name)",
  "bindings": { "name": "foo" }
}
This technique will work with pretty much any database that's built on top of TinkerPop and uses Gremlin as a DSL.
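If it helps, here is a rough sketch of that loop in Java, using only the JDK's HttpURLConnection. The endpoint URL and credentials are placeholders (check the IBM Graph docs for the exact path and auth); the point is simply that the script string never changes between requests, only the bindings do:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BulkAddVertices {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and credentials -- substitute your own instance's values.
        String endpoint = "https://example-ibm-graph-host/g/gremlin";
        String auth = Base64.getEncoder()
                .encodeToString("username:password".getBytes(StandardCharsets.UTF_8));
        String[] names = {"foo", "bar", "baz"};
        for (String name : names) {
            // The script is identical on every request, so the server can reuse its compiled form;
            // only the bound value of "name" changes.
            String payload = "{\"gremlin\": \"def g = graph.traversal(); graph.addVertex(T.label, name)\","
                    + " \"bindings\": {\"name\": \"" + name + "\"}}";
            HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setRequestProperty("Authorization", "Basic " + auth);
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println(name + " -> HTTP " + conn.getResponseCode());
        }
    }
}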

Is it possible to do Field Collapsing/Grouping with SolrNet 0.3.1 and Solr 3.3

As mentioned in the question, I am trying to use Solr's result grouping feature to (surprise) group my search results. From what I understand, SolrNet 0.3.1 supports field collapsing, but it is broken against current Solr because Solr replaced field collapsing with result grouping in version 3.3.
I have seen that SolrNet 0.4.0 alpha supports grouping; however, I don't think I can use it for my current project, as it is an alpha and I would have a tough time justifying it to the customer, unless someone can list some fairly compelling arguments against the assumption that an alpha would be 'unsafe'.
I have also tried adding the group parameters by setting the ExtraParams like so:
ExtraParams = new Dictionary<string, string>{{"group", "true"}, {"group.field", "fieldName"}}
This throws a NullReferenceException:
at SolrNet.Impl.ResponseParsers.ResultsResponseParser`1.Parse(XmlDocument xml, SolrQueryResults`1 results) in c:\prg\SolrNet\svn\SolrNet\Impl\ResponseParsers\ResultsResponseParser.cs:line 35
at SolrNet.Impl.SolrQueryResultParser`1.Parse(String r) in c:\prg\SolrNet\svn\SolrNet\Impl\SolrQueryResultParser.cs:line 46
at SolrNet.Impl.SolrQueryExecuter`1.Execute(ISolrQuery q, QueryOptions options) in c:\prg\SolrNet\svn\SolrNet\Impl\SolrQueryExecuter.cs:line 309
at SolrNet.Impl.SolrBasicServer`1.Query(ISolrQuery query, QueryOptions options) in c:\prg\SolrNet\svn\SolrNet\Impl\SolrBasicServer.cs:line 83
at SolrNet.Impl.SolrServer`1.Query(String q, QueryOptions options) in c:\prg\SolrNet\svn\SolrNet\Impl\SolrServer.cs:line 78
at RSearch.Core.SearchIndex.Search(String term, Int32 page, Int32 pageSize) in D:\Development\LESA-LARIAT\LariatMapper\Core\SearchIndex.cs:line 153
at RSearch.Controllers.SearchController.Index(SearchInfo searchInfo) in D:\Development\LESA-LARIAT\LariatWeb\Controllers\SearchController.cs:line 16
at lambda_method(Closure , ControllerBase , Object[] )
at System.Web.Mvc.ActionMethodDispatcher.Execute(ControllerBase controller, Object[] parameters)
at System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters)
at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters)
at System.Web.Mvc.ControllerActionInvoker.<>c__DisplayClass15.<InvokeActionMethodWithFilters>b__12()
at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodFilter(IActionFilter filter, ActionExecutingContext preContext, Func`1 continuation)
My guess as to why it does this is that SolrNet doesn't understand the structure of the results passed back to it, so it throws this exception.
I would really like to be able to do this, as it feels a little 'dirty' to use SolrNet to leverage all of Solr's features but have the grouping done via LINQ after the query is returned. If it is my only option, though, I don't mind using it.
Thank you for your help.
In short: no, there is no way with that combination of Solr/SolrNet versions. SolrNet 0.3.1 implemented field collapsing against a nightly build of Solr, when Solr had just started implementing field collapsing (it wasn't called 'grouping' then). After that, field collapsing was completely overhauled in Solr (it's now called 'result grouping'), and the SolrNet 0.3.1 implementation was left obsolete.
Support for result grouping was added shortly after that and released with 0.4.0a1.
I recommend going with 0.4.0a1. It's not 'unsafe' as in 'unstable' at all:
I'm using nightly builds in production without any issues.
The download count for 0.4.0a1 is pretty high, almost as high as 0.3.1's.
SolrNet is developed using continuous integration and is thoroughly tested.
As I explained in the release notes, 'alpha' to me means thoroughly unit-tested, but not API-frozen (the API may change slightly in the GA release).
Doing the grouping client-side (i.e. with LINQ) is not really an option, as you would have to fetch all the documents to do it properly. It's just like wanting to do pagination + sorting client-side with a relational database.
You may also be able to backport result grouping to 0.3.1, but IMHO that is really pointless and a waste of time.

Plotting a word-cloud by date for a twitter search result? (using R)

I wish to search Twitter for a word (let's say #google), and then be able to generate a tag cloud of the words used in the tweets, but according to dates (for example, having a moving window of an hour that moves by 10 minutes each time and shows me how different words became more frequently used throughout the day).
I would appreciate any help on how to go about doing this regarding: resources for the information, code for the programming (R is the only language I am adept at using) and ideas on visualization. Questions:
How do I get the information?
In R, I found that the twitteR package has the searchTwitter command. But I don't know how big an "n" I can get from it. Also, it doesn't return the dates on which the tweets originated.
I see here that I could get up to 1500 tweets, but this requires me to do the parsing manually (which leads me to step 2). Also, for my purposes, I would need tens of thousands of tweets. Is it even possible to get them in retrospect? (for example, requesting older posts each time through the API URL?) If not, there is the more general question of how to create a personal store of tweets on your home computer (a question which might be better left to another SO thread, although any insights from people here would be very interesting for me to read).
How do I parse the information (in R)? I know that R has functions that could help from the RCurl and twitteR packages, but I don't know which, or how to use them. Any suggestions would be of help.
How do I analyse it? How do I remove all the "not interesting" words? I found that the "tm" package in R has this example:
reuters <- tm_map(reuters, removeWords, stopwords("english"))
Would this do the trick? Should I do something else/more?
Also, I imagine I would like to do that after cutting my dataset according to time (which will require some POSIX-like functions, though I am not exactly sure which would be needed here or how to use them).
And lastly, there is the question of visualization. How do I create a tag cloud of the words? I found a solution for this here; any other suggestions/recommendations?
I believe I am asking a huge question here, but I tried to break it into as many straightforward questions as possible. Any help will be welcomed!
Best,
Tal
Word/Tag cloud in R using "snippets" package
www.wordle.net
Using the openNLP package, you could POS-tag the tweets (POS = part of speech) and then extract just the nouns, verbs or adjectives for visualization in a word cloud.
Maybe you can query Twitter and use the current system time as a timestamp, write to a local database, and query again in increments of x secs/mins, etc.
There is historical data available at http://www.readwriteweb.com/archives/twitter_data_dump_infochimp_puts_1b_connections_up.php and http://www.wired.com/epicenter/2010/04/loc-google-twitter/
As for the plotting piece: I did a word cloud here: http://trends.techcrunch.com/2009/09/25/describe-yourself-in-3-or-4-words/ using the snippets package; my code is in there. I manually pulled out certain words. Check it out and let me know if you have more specific questions.
I note that this is an old question, and there are several solutions available via web search, but here's one answer (via http://blog.ouseful.info/2012/02/15/generating-twitter-wordclouds-in-r-prompted-by-an-open-learning-blogpost/):
require(twitteR)
searchTerm='#dev8d'
#Grab the tweets
rdmTweets <- searchTwitter(searchTerm, n=500)
#Use a handy helper function to put the tweets into a dataframe
tw.df=twListToDF(rdmTweets)
##Note: there are some handy, basic Twitter related functions here:
##https://github.com/matteoredaelli/twitter-r-utils
#For example:
RemoveAtPeople <- function(tweet) {
  gsub("@\\w+", "", tweet)
}
#Then for example, remove @'d names
tweets <- as.vector(sapply(tw.df$text, RemoveAtPeople))
##Wordcloud - scripts available from various sources; I used:
#http://rdatamining.wordpress.com/2011/11/09/using-text-mining-to-find-out-what-rdatamining-tweets-are-about/
#Call with eg: tw.c=generateCorpus(tw.df$text)
generateCorpus = function(df, my.stopwords=c()){
  #Install the textmining library
  require(tm)
  #The following is cribbed and seems to do what it says on the can
  tw.corpus = Corpus(VectorSource(df))
  # remove punctuation
  tw.corpus = tm_map(tw.corpus, removePunctuation)
  #normalise case
  tw.corpus = tm_map(tw.corpus, tolower)
  # remove stopwords
  tw.corpus = tm_map(tw.corpus, removeWords, stopwords('english'))
  tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)
  tw.corpus
}
wordcloud.generate = function(corpus, min.freq=3){
  require(wordcloud)
  doc.m = TermDocumentMatrix(corpus, control = list(minWordLength = 1))
  dm = as.matrix(doc.m)
  # calculate the frequency of words
  v = sort(rowSums(dm), decreasing=TRUE)
  d = data.frame(word=names(v), freq=v)
  #Generate the wordcloud
  wc = wordcloud(d$word, d$freq, min.freq=min.freq)
  wc
}
print(wordcloud.generate(generateCorpus(tweets,'dev8d'),7))
##Generate an image file of the wordcloud
png('test.png', width=600,height=600)
wordcloud.generate(generateCorpus(tweets,'dev8d'),7)
dev.off()
#We could make it even easier if we hide away the tweet grabbing code. eg:
tweets.grabber = function(searchTerm, num=500){
  require(twitteR)
  rdmTweets = searchTwitter(searchTerm, n=num)
  tw.df = twListToDF(rdmTweets)
  as.vector(sapply(tw.df$text, RemoveAtPeople))
}
#Then we could do something like:
tweets=tweets.grabber('ukgc12')
wordcloud.generate(generateCorpus(tweets),3)
I would like to answer your question about making a big word cloud.
What I did:
Use s0.tweet <- searchTwitter(KEYWORD, n=1500) for 7 days or more, such as THIS.
Combine them with this command:
rdmTweets = c(s0.tweet,s1.tweet,s2.tweet,s3.tweet,s4.tweet,s5.tweet,s6.tweet,s7.tweet)
The result:
This Square Cloud consists of about 9000 tweets.
Source: People voice about Lynas Malaysia through Twitter Analysis with R CloudStat
Hope it helps!
