Solr File Descriptor Count

I have an Apache Solr 4.2.1 instance with 4 cores totalling roughly 21 GB (625 MB + 30 MB + 20 GB + 300 MB).
It runs on a 4 Core CPU, 16GB RAM, 120GB HD, CentOS dedicated machine.
1st core is fully imported once a day.
2nd core is fully imported every two hours.
3rd core is delta imported every two hours.
4th core is fully imported every two hours.
The server also handles a decent amount of queries (searching for, creating, updating and deleting documents).
Every core has maxDocs: 100 and maxTime: 15000 for autoCommit, and maxTime: 1000 for autoSoftCommit.
The System usage is:
Around 97% of 14.96 GB Physical Memory
0MB Swap Space
Around 94% of 4096 File Descriptor Count
From 60% to 90% of 1.21GB of JVM-Memory.
When I reboot the machine the File Descriptor Count falls to near 0, and then, steadily over the course of a week or so, it reaches the aforementioned value.
So, to conclude, my questions are:
Is 94% of 4096 File Descriptor Count normal?
How can I increase the maximum File Descriptor Count?
How can I calculate the theoretical optimal value for the maximum and used File Descriptor Count?
Will the File Descriptor Count reach 100%? If yes, will the server crash? Or will it keep itself below 100% and function as it should?
Thanks a lot in advance!

Taking your questions in order: Sure, that can be normal.
ulimit -n <number>. See Increasing ulimit on CentOS.
There really isn't one: it's "as many as needed", depending on a lot of factors, such as your mergeFactor, the number of searchers, other software running on the same server, etc. If the index consists of many files, the number of open files will be large as well; this is especially true for the indices that aren't full imports. Check the number of files in your data directories and issue an optimize if the same index has become very fragmented and has a large mergeFactor.
It could. Yes, or at least it won't function properly, as it won't be able to open any new files. And no, it won't keep the count below 100% by itself. In practice you'll start seeing errors about being unable to open files, with the message "Too many open files".
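To keep an eye on questions 3 and 4 in practice, here is a minimal Python sketch for monitoring the count on Linux (the Solr PID and data directory path are placeholders you would need to adjust; /proc exposes both the per-process limit and the open descriptors):

import os

solr_pid = 12345  # placeholder: the Solr JVM's PID, e.g. from `pgrep -f start.jar`

# Per-process limits, including "Max open files", live in /proc/<pid>/limits.
with open("/proc/%d/limits" % solr_pid) as f:
    print([line.strip() for line in f if "open files" in line])

# Each entry in /proc/<pid>/fd is one open file descriptor held by the process.
open_fds = len(os.listdir("/proc/%d/fd" % solr_pid))
print("Solr currently holds %d descriptors" % open_fds)

# The answer above suggests checking how fragmented an index is: count its files.
core_index_dir = "/var/solr/core1/data/index"  # placeholder path, adjust per core
print("%d files in %s" % (len(os.listdir(core_index_dir)), core_index_dir))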

So, the issue with the File Descriptor Count (FDC), or to be more precise with the ever-increasing FDC, was that I was committing after every update!
I noticed that Solr wasn't deleting old transaction logs. Thus, after a period of about one week, the FDC was maxing out and I was forced to reboot.
I stopped committing after every update and now my Solr stats are:
Around 55% of 14.96 GB Physical Memory
0MB Swap Space
Around 4% of 4096 File Descriptor Count
From 60% to 80% of 1.21GB of JVM-Memory.
Also, the old transaction logs are now deleted by auto commit (soft & hard) and Solr has no more performance warnings!
So, as pointed out very well in this article:
Understanding Transaction Logs, Soft Commit and Commit in SolrCloud
"Be very careful committing from the client! In fact, don’t do it."

Related

Snowflake query still running after bytes scanned reaches 100%

This might be more of a Snowflake knowledge question than an actual issue.
I am running a COPY command from S3 to Snowflake.
I see it took 30 minutes for the bytes scanned to reach 100%; however, even after the bytes scanned hit 100 percent, it took another 40 minutes for the query to complete.
Can somebody please explain what's going on here? As it stands, I feel it is tough to estimate how long any running COPY command might take while looking at the history screen.
Sounds like you're referring to the 100% in the Bytes Scanned column of the query profile. If you have transformations in your COPY INTO command this will take additional time to process. As others have mentioned, the size of the warehouse will have an impact as the warehouse size will determine the number of cores and threads, which directly affect the parallelism of the writes.
In short, Bytes Scanned is just a measure of the total data read by Snowflake for the job; the job itself still needs to be processed.
In the past we have found that each X-Small can load 40 MB/s from S3, and thus a Small can load 2x that. So that's our baseline expectation of load speeds.
What can legitimately slow down a copy is if you are copying from the root of the bucket s3://buck_name/ but there are millions of files in that directory, with only one new 100-byte file. But I suspect that is also not the case.
The next thing it might be is a failure to run the query part, which in the profile would show multiple profile stage tabs along the lines of 1 \ 1001 \ 2002, where the stage number incrementing in the thousands indicates that the query failed to execute and was re-run. This can sometimes be due to the warehouse getting corrupted, and sometimes due to the new runtime of the current release failing, with the retries running on older releases to see if those succeed. There are often clues to some of this; time spent "spilling to internal/external storage" is something we have seen when such bugs occur.
But in reality, if things seem "really" strange, I would open a support ticket and ask for an explanation of what is happening, with the usual "this is what I am seeing, and this is why I think it's strange".
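To turn the 40 MB/s-per-X-Small baseline above into a rough time estimate, here is a small back-of-the-envelope sketch (the linear scaling per warehouse size and the example numbers are assumptions; transformations in the COPY INTO add time on top of this):

# Assumes throughput scales roughly linearly with warehouse size.
SIZE_MULTIPLIER = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

def estimate_minutes(total_mb, warehouse_size):
    mb_per_sec = 40 * SIZE_MULTIPLIER[warehouse_size]  # 40 MB/s baseline per X-Small
    return total_mb / mb_per_sec / 60.0

# Example: 50 GB of staged files on a SMALL warehouse (hypothetical numbers).
print("%.1f minutes" % estimate_minutes(50 * 1024, "SMALL"))  # ~10.7 minutes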

Replication method in PostgreSQL 9.5

Hi guys, please help me.
What is the difference between the wal_level settings logical, hot_standby and minimal, and what is a WAL segment? Do I need to use a big segment setting or not?
Right now I use wal_segment: 50.
Why, after I try to insert 5 million rows, does the segment archive go over 50?
The parameter wal_level determines how much information is written to the transaction log (write-ahead log, short WAL).
The settings in decreasing order of amount of WAL emitted:
For logical replication or logical decoding, you need logical.
To run physical replication with hot standby, you need hot_standby.
To archive WAL files with archive_mode = on, you need archive.
The minimal level logs only the information required for crash recovery.
Note that from PostgreSQL 9.6 on, archive and hot_standby have been deprecated and replaced with the new setting replica.
A WAL segment is one 16 MB transaction log file as seen in pg_xlog or pg_wal.
I guess that with wal_segment you mean the parameter checkpoint_segments (max_wal_size since 9.5).
It does not impose an absolute limit on the number of WAL segments; it determines after how much WAL a checkpoint will be forced. If your archive_command is slow, WAL may pile up.
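To see why inserting 5 million rows can easily push the archive past 50 segments, here is a rough back-of-the-envelope calculation (the per-row WAL size is an invented figure; the real value depends on row width, indexes and full-page writes):

# checkpoint_segments does not cap the total number of segments ever written;
# it only controls roughly how much WAL accumulates before a forced checkpoint.
segment_mb = 16
checkpoint_segments = 50
wal_between_checkpoints_mb = checkpoint_segments * segment_mb      # ~800 MB

rows = 5000000
wal_bytes_per_row = 300          # hypothetical; measure with pg_current_xlog_location()
total_wal_mb = rows * wal_bytes_per_row / (1024.0 * 1024.0)        # ~1430 MB

segments_written = total_wal_mb / segment_mb                       # ~90 segments to archive
print(wal_between_checkpoints_mb, total_wal_mb, segments_written)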

Memory limit for a single instance exceeded during start of appengine-mapreduce job

I'm trying to use appengine-mapreduce to prepare data for loading into BigQuery and am running into a memory limit. I'm using CloudStorage, so perhaps the accepted response in this question does not apply.
What I'm seeing is that a single VM instance, which appears to be the coordinator for the overall mapper task, is exceeding the ~ 1GB allocated to it and then being killed before any workers start. In this screenshot, there are three instances, and only the top one is growing in memory:
In several earlier attempts, there were as many as twelve instances, and all but one were well under the memory limit, and only one was reaching the limit and being killed. This suggests to me that there's not a general memory leak (as might be addressed with Guido van Rossum's suggestion in the earlier question to periodically call gc.collect()), but instead a mismatch between the size and number of files being processed and working assumptions about file size and count in the appengine-mapreduce code.
In the above example, twelve .zip files are being handed to the job for processing. The smallest .zip file is about 12 MB compressed, and the largest is 45 MB compressed. Following is the configuration I'm passing to the MapperPipeline:
output = yield mapreduce_pipeline.MapperPipeline(
    info.job_name,
    info.mapper_function,
    # Input: the Cloud Storage .zip files listed in params['input_reader'].
    'mapreduce.input_readers.FileInputReader',
    output_writer_spec='mapreduce.output_writers.FileOutputWriter',
    params={
        'input_reader': {
            'files': gs_paths,        # Cloud Storage paths of the input .zip files
            'format': 'zip[lines]',   # read each member of each zip line by line
        },
        'output_writer': {
            'filesystem': 'gs',       # write results back to Cloud Storage
            'gs_bucket_name': info.results_dirname,
            'output_sharding': 'none',
        },
    },
    shards=info.shard_count)          # 16 shards in this case
The value of shard_count is 16 in this case.
Is this a familiar situation? Is there something straightforward I can do to avoid hitting the 1GB memory limit? (It is possible that the poster of this question, which has gone unanswered, was running into a similar issue.)
I was able to get over the first hurdle, where the coordinator was being killed because it was running out of memory, by using files on the order of ~ 1-6MB and upping the shard count to 96 shards. Here are some things I learned after that:
The shard count is not roughly equivalent to the number of instances, although it may be equivalent to the number of workers. At a later point, where I had up to 200+ shards, approx. 30 instances were spun up.
Memory management was not so relevant when the coordinator was being killed, but it was later on, when secondary instances were running out of memory.
If you call gc.collect() too often, there is a big degradation in throughput. If you call it too rarely, instances will be killed. (A throttled sketch follows these notes.)
As I had guessed, there appears to be a complex relationship between the number of files to be processed, the individual file size, the number of shards that are specified, how often garbage collection takes place, and the maximum available memory on an instance, which all must be compatible in order to avoid running into memory limits.
The AppEngine infrastructure appears to go through periods of high utilization, where what was working before starts to fail due to HTTP timeouts in parts of the stack that are handled by appengine-mapreduce.
The memory-related tuning seems to be specific to the amount of data that one is uploading. I'm pessimistic that it's going to be straightforward to work out a general approach for all of the different tables that I'm needing to process, each of which has a different aggregate size. The most I've been able to process so far is 140 MB compressed (perhaps 1-2 GB uncompressed).
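Regarding the gc.collect() trade-off above, the pattern I would try is to throttle collections to every N records; this is purely a sketch (the interval, the counter mechanics and the mapper body are placeholders to tune):

import gc

_GC_EVERY = 500      # tune this: too small hurts throughput, too large risks the instance being killed
_gc_counter = 0

def my_mapper(line):
    global _gc_counter
    _gc_counter += 1
    if _gc_counter % _GC_EVERY == 0:
        gc.collect()  # reclaim memory periodically instead of on every record
    # Placeholder for the real per-line transformation:
    yield line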
For anyone attempting this, here are some of the numbers I kept track of:
Chunk   | Shards | Total MB | MB/shard | Time  | Completed? | Total memory        | Instances | Comments
daily   | 96     | 83.15    | 0.87     | 9:45  | Yes        | ?                   | ?         |
daily   | 96     | 121      | 1.2      | 12.35 | Yes        | ?                   | ?         |
daily   | 96     | 140      | 1.5      | 8.36  | Yes        | 1200 MB (s. 400 MB) | 24        |
daily   | 180    | 236      | 1.3      | -     | No         | 1200 MB             | ?         |
monthly | 32     | 12       | 0.38     | 5:46  | Yes        | Worker killed       | 4         |
monthly | 110    | 140      | 1.3      | -     | No         | Workers killed      | 4         | Better memory management of workers needed.
monthly | 110    | 140      | 1.3      | -     | No         |                     | 8         | Memory was better, but throughput became unworkable.
monthly | 32     | 140      | 4.4      | -     | No         | Coordinator killed  | -         |
monthly | 64     | 140      | 2.18     | -     | No         | Workers killed      | -         |
daily   | 180    | 236      | 1.3      | -     | No         | -                   | -         | HTTP timeouts during startup
daily   | 96     | 140      | 1.5      | -     | No         | -                   | -         | HTTP timeouts during startup

How to read/write from a hard disk at maximum speed? The multi-threaded program I coded cannot go above 15 MB/sec

I have 5 GB of data spread across 256 CSV files, which I need to read at optimum speed and then write back in binary form.
I made the following arrangements to achieve it:
For each file, there is one corresponding thread.
I am using the C functions fscanf and fwrite.
But Resource Monitor shows no more than 12 MB/sec of hard disk throughput and 100% Highest Active Time.
Google says a hard disk can read/write at up to 100 MB/sec.
Machine configuration:
Intel Core i7, 3.4 GHz, with 8 cores.
Please give me your perspective.
My aim is to complete this process within 1 minute.
Using one thread it took me 12 minutes.
If all the files reside on the same disk, using multiple threads is likely to be counter-productive. If you read from many files in parallel, the HDD heads will keep moving back and forth between different areas of the disk, drastically reducing throughput.
I would measure how long it takes a built-in OS utility to read the files (on Unix, something like dd or cat into /dev/null) and then use that as a baseline, bearing in mind that you also need to write stuff back. Writing can be costly both in terms of throughput and seek times.
I would then come up with a single-threaded implementation that reads and writes data in large chunks, and see whether I can get it to perform similarly to the OS tools.
P.S. If you have 5 GB of data, your HDD's top raw throughput is 100 MB/s, and you also need to write the converted data back onto the same disk, your goal of 1 minute is not realistic.
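For illustration, this is the shape of the single-threaded, large-chunk approach described above; a sketch in Python for brevity (the asker's code is in C, where fread/fwrite with a big buffer plays the same role), with the chunk size and the byte-for-byte copy standing in for the real CSV-to-binary conversion:

import glob

CHUNK = 8 * 1024 * 1024   # read in 8 MB chunks to keep the disk doing sequential I/O

def convert(src_path, dst_path):
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            # A real converter would parse the CSV chunk and write binary here;
            # this sketch just copies bytes to show the I/O pattern.
            dst.write(chunk)

for name in sorted(glob.glob("*.csv")):   # process the 256 files one after another
    convert(name, name + ".bin")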

WAL sequence number infinite?

I am wondering whether database WAL sequence numbers are infinite. I guess most WAL records have a fixed size for the WAL number? Is this number so big that it just won't reach an end? That might be quite a waste of space. Or have the big DB players invented a better method?
Or do they implement some logic to let the WAL start at 0 again? That might have a heavy impact on many spots in the code...
EDIT:
Impact: e.g. recovery after a crash relies on the sequence number increasing along the timeline. If the sequence could start over, recovery could get confused.
Term "WAL sequence number": WAL (Write-Ahead Log, a.k.a. the transaction log that is guaranteed to be on disk before the application layer is told that a transaction was successful). This log has a growing sequence number that keeps the database consistent, e.g. during recovery, by checking the WAL sequence number on the pages against the sequence number in the WAL.
I would not assume that every database implements the same strategy.
Speaking only for Oracle, the SCN (system change number) is a 48-bit number so an Oracle database can handle nearly 300 trillion transactions before hitting the limit. Realistically, that will take eons. Even if you could do 1 thousand transactions per second, the SCN wouldn't hit the limit for 300 billion seconds or roughly 9500 years. Now, there are various things that can cause the SCN to increment in addition to just doing transactions (famously the recent issue with hot backups and database links that caused a few users to exceed the database's checks for the reasonability of the SCN) so it won't really take 9500 years to hit the limit. But, realistically, it gives Oracle plenty of time to move to a 64-bit SCN some years down the line, buying everyone a few more centuries of functionality.
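That headroom is easy to sanity-check; a quick sketch of the arithmetic, using the 1,000 transactions per second assumed above:

max_scn = 2 ** 48                        # 281,474,976,710,656, i.e. nearly 300 trillion
rate_per_sec = 1000                      # SCN increments per second, as assumed above
seconds = max_scn / rate_per_sec         # ~2.8e11 seconds
years = seconds / (60 * 60 * 24 * 365)
print(years)                             # on the order of 9,000 years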
Like SQL Server, DB2 calls that counter a Log Sequence Number (LSN). IBM recently expanded the size of their LSN from six bytes to eight bytes, unsigned. The LSN is an ever-growing pointer that shows where in the log files that specific log record can be found. An eight byte LSN means that a DB2 database can write nearly 16 exbibytes of log records before running out of address space, at which point the contents of the database must be unloaded and copied into a new database.
Postgres WAL numbers technically can overflow, but only after writing 32 Eb of data, as the WAL file pointer is 64-bit.
See What is an LSN: Log Sequence Number. The article describes the structure of the SQL Server LSN (the WAL number) and shows you how to decode one. Since LSNs are of fixed size and they don't roll over, it follows that you can run out of them. It will take a very very long time though.
In PostgreSQL, WAL logs are stored as a set of segment files. "Segment files are given ever-increasing numbers as names, starting at 000000010000000000000000." The number doesn't wrap around.
