Solr optimize command status

I have run the Solr optimize command using update?optimize=true. Can anyone please tell me how to check the status of the Solr optimize command? I am using Solr 3.5. Thanks in advance.

While the optimize is running, you can run the top command, type M to sort by memory usage, and watch the RES and SHR columns increase for the Solr java process. Also keep an eye on Mem: free at the top of the screen. As long as RES and SHR are increasing, the optimize is working. In fact, the only thing that will stop it would be Mem: free going down to zero.
If that happens to you, rerun optimize with a LARGER number for maxSegments. For instance, if 5 segments runs out of RAM, try 9. Then run again with 8, then again with 7, then try 5 again, then 3 and 1.
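If you need to rerun the optimize with an explicit segment target, maxSegments can be passed on the same update URL. A sketch (the host, port, and path are illustrative; adjust them to your instance):

```shell
# Optimize down to at most 5 segments instead of a full single-segment merge.
curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=5'
```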

The easiest way to check the status of the index after an optimize is to browse to http://<your instance & core>/admin/stats.jsp. This is the same as clicking the [Statistics] link on the Solr Admin page.
If you look at the stats: section on that page after an optimize, the numDocs and maxDoc values will typically be the same, as all pending deletes will have been applied. The reader value should also contain segments=1 at the end. This is because the optimize command forces the index to be merged into one segment, as explained in this excerpt from the UpdateXmlMessages section of the Solr Wiki:
An optimize is like a hard commit except that it forces all of the
index segments to be merged into a single segment first.
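A quick command-line version of the same check (URL is illustrative; this fetches the same stats.jsp page as in the browser and looks for the single-segment marker):

```shell
# A count of 1 or more means the reader reports segments=1, i.e. the optimize completed.
curl -s 'http://localhost:8983/solr/admin/stats.jsp' | grep -c 'segments=1'
```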

Netlogo headless exporting world at each steps

I wrote my model in the GUI and want to run it with repetitions in headless mode on a cluster.
The model has a go procedure that repeats until we reach a specified step (each go increments the year variable, and when we reach 2070 the model stops running). At the end of the go procedure, the world is exported (and analysed in R).
If I run multiple repetitions on parallel cores, how can I export the worlds so they have different names?
So far, I export the world with the following lines (when running only once):
let file-name (word scenario "_NETLOGO_" year ".csv")
export-world (file-name)
But if the model is run at the same time on several cores, the files will overlap and I would not know which file came from which repetition (assuming the name would change with an extra (1)). I thought about creating folders to save the worlds; is that possible? If so, how can I set the folder name according to the repetition number?
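One sketch of a fix, assuming the parallel repetitions are launched through BehaviorSpace: the built-in behaviorspace-run-number reporter is unique per run (and reports 0 outside BehaviorSpace), so folding it into the filename avoids collisions without needing separate folders:

```netlogo
;; behaviorspace-run-number is unique per BehaviorSpace repetition,
;; so parallel runs write to distinct files.
let file-name (word scenario "_run" behaviorspace-run-number "_NETLOGO_" year ".csv")
export-world file-name
```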

Restarting ODI 12c load plan persistently goes to the else step of a case step

I have a load plan in ODI 12c which goes into error at an 'else' step.
The structure of this problematic part of the plan is:
SERIAL step
1.1. RUN SCENARIO for refreshing a variable
1.2. CASE step
when value = 1 then run scenario X;
else run dummy scenario to break the load plan.
I fix the issue, restart the plan, and expect scenario X to run once the variable is refreshed and its value is 1. But it still goes to the else clause. The SERIAL step has the 'Restart all children' option; the dummy scenario in the else has the 'Restart from new session' option. The variable step (1.1) is executed and the variable is refreshed. I also tried all other combinations of restart options for the SERIAL step and for the ELSE scenario, but it still 'remembers' the initial path, going directly to the 'else'.
Any suggestions on how to reach the scenario X step when restarting the plan? Any hidden setting I'm missing? I read the Oracle documentation all over again, but there's not much help there.
A late comment from my side:
I do not understand what you mean by "1.1. RUN SCENARIO for refreshing a variable". Within an ODI load plan you have two options for refreshing a (load plan) variable, and they have to be applied at the LP step level:
a) Enable "Overwrite" and enter a literal value OR
b) Enable Refresh
Hope it helps.
Ralf

How to create a table of 5 GB in HBase for YCSB benchmarking?

I want to benchmark HBase using YCSB. It's my first time using either.
I've gone through some online tutorials, and now I need to create a sample table of size 5 GB. But I don't know how to:
Batch-put a bunch of data into a table
Control the size to be around 5 GB
Could anyone give me some help on that?
I've previously used the HBase PerformanceEvaluation tool to load data into HBase. Maybe it can help you.
hbase org.apache.hadoop.hbase.PerformanceEvaluation
Various options are available for this tool. For your case, you can set the data size to 5 GB.
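A sketch of what that invocation might look like (the numbers are illustrative; in the versions I've used, --rows is per client and each row carries a value of roughly 1 KB, so total size is about rows x clients x 1 KB):

```shell
# 5 clients x 1,000,000 rows x ~1 KB values ~ 5 GB total
hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=1000000 sequentialWrite 5
```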
This is pretty easy: the default (core) workload uses records that are ~1 KB each (10 fields of 100 bytes). So to get 5 GB, just use 5,000,000 records.
You can do this by specifying the recordcount parameter on the command line, or by creating your own workload file with this parameter inside.
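As a sanity check, here is the arithmetic behind that record count, sketched as a small calculation (it assumes the core workload's default field layout of 10 fields x 100 bytes; a custom workload file may override those defaults):

```python
# Estimate the YCSB record count needed to reach a target table size.
# CoreWorkload defaults: fieldcount=10, fieldlength=100 -> ~1 KB per record.
def records_for_size(target_bytes, fieldcount=10, fieldlength=100):
    record_bytes = fieldcount * fieldlength
    return target_bytes // record_bytes

print(records_for_size(5 * 10**9))  # 5 GB at ~1 KB/record -> 5000000
```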
Here's how you would do it on the command line (using the included workload file workloada):
./bin/ycsb load hbase12 -P workloads/workloada -p recordcount=5000000
A custom file would look like this:
recordcount=5000000
operationcount=1000000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readproportion=0.8
updateproportion=0.2
scanproportion=0
insertproportion=0
And then you just run:
./bin/ycsb load hbase12 -P myWorkload
This will insert all the data into your database.
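After the load phase finishes, the benchmark itself is executed with the run command against the same workload file, using the same syntax as the load step above:

```shell
./bin/ycsb run hbase12 -P myWorkload
```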

SSIS Fuzzy Lookup Transform not matching values with spaces in one and not in the other

I'm working on an SSIS project to transfer data from a legacy system to a new system. This is the first time I've used SSIS but so far all has gone well, until now.
I have to match product names from the old and new systems. Some are clean matches, and a standard lookup catches those. Some are not, and I'm using a fuzzy lookup on the no-match output of the main lookup to try and catch those afterwards. This works on some things, but it completely misses what seem like the most obvious matches. For example:
Source data: FG 45J
Target data: FG45J
This is not matched by the fuzzy lookup. I've tried ticking and unticking the space delimiter box to no avail. The threshold is set to zero so everything gets through, but similarity and confidence are zero on the relevant output records. Some others do return non-zero similarity etc., but they don't have spaces. Matches to return is set to one, although I've tried setting this up to four to see if it made any difference, and it didn't. I expect I've missed something, but I can't work out what.
Any help would be greatly appreciated.

How do you properly benchmark ColdFusion execution times?

1) What settings in the ColdFusion Administrator should be turned off/on?
2) What ColdFusion code should you use to properly benchmark execution time like getTickCount()?
3) What system information should you provide also like CF Engine, Version, Standard/Enterprise, DB, etc?
What we do is:
In Application.cfc's onRequestStart() -> set tick count value, add to REQUEST scope.
In Application.cfc's onRequestEnd() -> set tick count value, subtract first value from it to get total processing time in ms
We then have a set threshold (say 200ms) and if that threshold is reached we'll log a record in a database table
Typically we'll log the URL query string, the script name, the server name, etc.
This can give very useful information over time on how particular pages are performing. This can also be easily graphed so you can see if a page suddenly started taking 5000ms where before it was taking 300ms, and then you can check SVN to see what change did it :)
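The steps above can be sketched in an Application.cfc along these lines (a minimal sketch; the 200ms threshold and the log destination are illustrative, and in practice we insert into a database table rather than a log file):

```cfml
// Application.cfc (sketch) - time every request and record slow ones.
component {
    public boolean function onRequestStart(required string targetPage) {
        // Step 1: capture the tick count in the REQUEST scope.
        request.startTicks = getTickCount();
        return true;
    }

    public void function onRequestEnd(required string targetPage) {
        // Step 2: subtract to get total processing time in ms.
        var elapsed = getTickCount() - request.startTicks;
        // Step 3: log a record only when the threshold is exceeded.
        if (elapsed > 200) {
            writeLog(
                file = "slowRequests",
                text = "#arguments.targetPage# took #elapsed#ms (query: #cgi.query_string#)"
            );
        }
    }
}
```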
Hope that helps!
1) In CF Administrator, under Debugging Settings, you can turn on Enable Request Debugging Output, which outputs runtime and other debugging information at the bottom of every page. This can be useful if you want to see queries as well. If you want to use timers, you must also select Timer Information in the Debug Settings (I got hung up on that for a hot minute).
2) You can use timers to get custom benchmarks of execution times. There are four types: inline, outside, comment, or debug, each corresponding to where the output will appear. With inline, it will create a little box around your code (if it's a .cfm) and print the total runtime. The others will print in the bottom debug output that you turned on in CF Administrator.
3) I don't really know what you should provide. Wish I could help more. In my opinion, the more information the better, so that's what I would say :P
With respect to @mbseid's answer: request debugging adds a significant amount of processing time to any request, especially if you use CFCs. I would recommend you turn request debugging off and use getTickCount() at the top and bottom of the page, then take the difference to get the time to render that page. This will give you a much closer reflection of how the code will perform in production.
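The getTickCount() approach, sketched inline on the page being measured:

```cfml
<cfset start = getTickCount()>
<!--- ... page code being benchmarked ... --->
<cfoutput>Rendered in #getTickCount() - start# ms</cfoutput>
```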
