I am trying to run a Solr search using a .txt file specified in the sourceLocation attribute of the suggest searchComponent. I used this example:
sample dict
hard disk hitachi
hard disk jjdd 3.0
hard disk wd 2.0
and ran this query:
host/solr/suggest?q=hard%20disk%20h&spellcheck=true&spellcheck.collate=true&spellcheck.build=true
but the response is a spellcheck XML body whose tags were lost in the paste; the recoverable content is, roughly, the suggestions hard disk jjdd, hard disk wd and hard disk hitachi repeated for the corrected terms, plus a collation of hard disk jjdd disk hard disk jjdd.
I want to get only one result: hard disk hitachi.
If I write a query with the parameter q=hard disk, I get the same result, and the collation tag contains hard disk jjdd disk.
It seems that the search doesn't work on multi-word terms.
Can someone help me?
I'm using TDengine 3.0. I've now found that a large amount of 0000000.log WAL data is generated under /var/lib/taos/vnode/vnode2/wal/, which takes up a lot of space.
How should these log files be configured, and how can they be cleaned up?
You could set the WAL_RETENTION_PERIOD value to 0; then each WAL file is deleted immediately after its contents are written to disk, which frees the space right away.
From https://docs.tdengine.com/taos-sql/database/
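For example, a minimal sketch assuming your database is named mydb (a placeholder) and the taos CLI is available; check the docs above for the exact option syntax in your version:
taos -s "ALTER DATABASE mydb WAL_RETENTION_PERIOD 0;"
After that, new WAL files should be removed as soon as their contents have been persisted.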
I have a basic Pentaho transformation in my job that reads 5,000 records from a stored procedure in SQL Server via a 'Table Input' step. This data has 5 columns, one of which is an XML column. After the 'Table Input', a 'Text File Output' step runs which takes the save path from one of the columns and the XML data as the only field provided in the Fields tab. This creates 5,000 XML files in the given location by streaming data from 'Table Input' to 'Text File Output'.
When this job is executed it runs at 99-100% CPU utilization for the duration of the job and then drops back down to ~5-10% afterwards. Is there any way to control the CPU utilization, either through Pentaho or the command prompt? This is running on a Windows Server 2012 R2 machine with 4 GB of RAM and an Intel Xeon CPU E5-2680 v2 @ 2.8 GHz. I have seen that memory usage can be controlled through Spoon.bat but haven't found anything online about controlling CPU usage.
In my experience, neither of those steps is CPU intensive under normal circumstances. Two causes I can think of are:
It's choking trying to format the XML. That would be fixed by checking the options Lazy conversion in the Table input step and Fast data dump (no formatting) in the text file output step. Then it should just stream the string data through.
The other is that you have huge XMLs and the CPU usage is actually garbage collection because Pentaho is running out of memory. Test this by increasing the maximum heap space (the -Xmx1024m option in the startup script).
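As a rough sketch on Windows, assuming a PDI version whose Spoon.bat honours the PENTAHO_DI_JAVA_OPTIONS variable (the value below is just an example sized for a 4 GB machine, so verify both against your install):
set PENTAHO_DI_JAVA_OPTIONS="-Xms1024m" "-Xmx2048m"
If the high CPU really was garbage collection, the run should become noticeably cheaper with the larger heap.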
We are talking about an HDD with a single NTFS partition of about 650 GB.
We've done the following:
deleted the partition scheme, i.e. the first 512 KB from the beginning;
flushed 50 GB with \xff from the beginning during a write test;
restored the partition scheme, i.e. loaded the MBR backup.
The question: How can we restore NTFS in that case?
What we tried to do:
ran testdisk with a deep search, which did not find any NTFS.
Additional info:
NTFS Boot Sector | Master File Table | File System Data | Master File Table Copy
To prevent the MFT from becoming fragmented, NTFS reserves 12.5 percent of volume by default for exclusive use of the MFT.
12.5% of 650 GB is roughly 81 GB, and the MFT sits right after the boot sector, so the 50 GB we overwrote from the start of the volume wiped the boot sector and a large part of the MFT; in other words, we destroyed the data that is vital for NTFS recovery.
Recently I downloaded a big (140 GB) tar file, and it comes with an MD5 code to verify the download.
I used md5sum filename to generate the MD5 code and compare it with the original one, but it seems I will have to wait a long time.
Is there a faster way to generate MD5 code for a big file in Fedora?
If you're not using an SSD, your hard drive will only be able to read at about 30 MB/s.
So for a 140,000 MB file, you already need something like an hour and a half just to read it.
Add to that the other processes running on your computer, and I guess your "long time" could be around 2 hours.
Short of switching to faster storage (SSD, USB 3), there's not much you can do.
Now, if md5sum takes 10 hours, I guess you can probably find something better.
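One rough way to check whether the disk really is the bottleneck (a sketch, assuming the archive is called bigfile.tar; the name is a placeholder) is to time a plain sequential read of part of the file:
time dd if=bigfile.tar of=/dev/null bs=1M count=4096
If reading those 4 GB already takes several minutes, md5sum is I/O-bound and a different hashing tool won't help much.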
I am looking for the most optimized way to transfer large log files from a local path to an NFS path.
Here the log files keep changing dynamically over time.
What I am currently using is a Java utility which reads the file from the local path and transfers it to the NFS path, but this seems to take a lot of time.
We can't use copy commands, as the log file keeps getting appended with new logs, so this will not work.
What I am looking for is: is there any way, other than a Java utility, to transfer the contents of the log file from the local path to the NFS path?
Thanks in advance!
If your network speed is higher than the rate at which the log grows, you can just cp src dst.
If the log grows too fast and you can't push that much data, but you only want to take a snapshot of the current state, I see three options:
Read the whole file into memory, as you do now, and then copy it to the destination. With large log files this may result in a very large memory footprint. Requires a special utility or tmpfs.
Make a local copy of the file, then move that copy to the destination. Quite obvious. Requires enough free space and increases storage device pressure. If the temporary file is in tmpfs, this is exactly the same as the first method, but doesn't require special tools (it still needs memory and a large enough tmpfs).
Take the current file size and copy only that amount of data, ignoring anything that is appended during the copy.
E.g.:
dd if=src.file of=/remote/dst.file bs=1 count=`stat -c '%s' src.file`
stat takes the current file size, and then dd is instructed to copy only that many bytes.
Because of the low bs, for better performance you may want to combine it with another dd:
dd if=src.file status=none bs=1 count=`stat -c '%s' src.file` | dd bs=1M of=/remote/dst.file
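As an alternative sketch of the same idea (using the same placeholder names src.file and /remote/dst.file), GNU head can do the size-limited copy in one step:
head -c "$(stat -c '%s' src.file)" src.file > /remote/dst.file
head -c takes a byte count, so this copies exactly the bytes that existed when stat ran and ignores anything appended afterwards.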