Sorry for this basic question
I log ($this->log()) a lot of things from all controllers on my live site. My tmp/logs folder contains around 1 GB of log files. Will this reduce or slow down the site's speed? We remove the log files from the live server once a week.
Yes, it will affect performance.
If you have large log files, the process has to seek to the end of the file before each write. Although this is not normally noticeable, with large files it can degrade performance.
It's for this reason that tools like logrotate exist, and why it's a good idea to get into the habit of deleting or archiving your log files from time to time, or regularly if they build up quickly.
The impact of writing to the log is pretty negligible (although I don't know how much you are logging). That said, try to limit logging to debugging purposes, and remove it when you are done.
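For instance, here is a minimal sketch of gating verbose logging behind the debug level, assuming a CakePHP 2.x-style controller where Configure::read('debug') is 0 on the live site; the variable names are just placeholders:

// Only write verbose debug output when debug mode is on;
// Configure::read('debug') is 0 on a typical production deployment.
if (Configure::read('debug') > 0) {
    $this->log('Fetched ' . count($results) . ' rows from the remote API', 'debug');
}

// Errors are still worth logging unconditionally.
if ($response === false) {
    $this->log('Remote API call failed for user ' . $userId, 'error');
}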
All the attachments from issues are put into the @hashed directory, and each attachment gets its own hashed sub-directory. Over time this can create a lot of sub-directories under a single directory: a project with 100K issues averaging 10 images each can lead to 1 million of them. This can be a performance bottleneck. What does GitLab suggest self-hosting users do about this? Thanks
The answer is here: https://forum.gitlab.com/t/filesystem-performance-issue-with-lots-of-sub-directories-for-attachments/76004
Basically, quoting @gitlab-greg from the GitLab forum:
Usually bottlenecks are caused by excessive read/write operations where input/output is the limiting factor. Attachments usually take up minimal storage space, so it’s actually much easier and more common for someone to hit max I/O by downloading/pushing a lot of Git Repo, LFS, or (Package|Container) Registry data in parallel. Alternatively, if storage space is filled up (90%+ full), you might see some performance issues.
I’ve not heard of any situations where the existence of high number of image uploads or a large amount of sub-directories in @hashed directory has caused a performance bottleneck in itself. GitLab uses a database to keep track of where issue attachments are stored, so it’s not like it’s crawling or looking through every directory in @hashed every time an upload or image is requested.
I’ve worked with GitLab admins who have (tens of) thousands of users and I’ve never seen subdirectories for attachments result in a performance bottleneck. If you find it does cause a bottleneck, please create an issue.
After getting an answer from Andrew Regan, I have edited my question and its explanation.
I want to serve HTML data to millions of users, and I have learned that this is done by caching.
I knew HTML as files, and now I have also read that HTML pages can be stored in databases and served from there.
So my question is: which of the following caching approaches will be quicker?
- Caching of HTML files in different folders and sub-folders, or
- Caching of HTML data in a database.
Even if this experiment is done on only a single file / table record, which method would be faster? (No doubt the result for a single file or record will come back in nanoseconds; still, which cache would respond faster? For example, one approach might take 0.000000001 seconds and the other 0.000000002 seconds.)
You haven't told us anything about your application's architecture, or your expected traffic, or whether you've considered existing frameworks for solving your problem. So this will be a very high-level view.
The answer is caches.
With static content you shouldn't need to expose the end user to the performance of either the filesystem or the database. You shouldn't want to anyway.
If you're simply serving up fixed, unchanging, static content, on a single server, the most effective option is to simply read the whole lot into a cache (ideally held in RAM, not disk) on startup and proceed from there without any additional loads or fetches. (Even better, you'd use a proven cache outside of your network.)
That should be extremely fast for the end-user. As far as performance on the machine is concerned, it shouldn't make a big difference how you seed the cache, whether from the filesystem or the database, though disk is perhaps easier to work with.
You can load data lazily into the cache if you prefer, or if you have changing content. It still won't really matter if you load it from disk or from a DB provided you cache aggressively enough. Set as long a TTL (lifetime) as you can to avoid unnecessary reloads.
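As a rough illustration, lazy loading with a long TTL might look something like the sketch below, assuming the APCu extension is available; renderPage() and the cache key are placeholders for however you actually build the HTML:

<?php
// Return cached HTML if present; otherwise build it once and keep it in RAM with a long TTL.
function getPageHtml($slug)
{
    $key = 'page_html_' . $slug;

    $html = apcu_fetch($key, $found);
    if ($found) {
        return $html;               // served from memory, no disk or database hit
    }

    $html = renderPage($slug);      // placeholder: load from disk or DB and render
    apcu_store($key, $html, 86400); // cache for 24 hours, i.e. as long a TTL as you can afford

    return $html;
}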
I have a database on SQL Server 2012 which is 5 GB in size and getting larger daily.
I need to shrink that database.
My question is:
Is it a good practice to shrink the database weekly or monthly, as it is running out of space? Are there any side effects, like decreased performance?
Generally, shrinking is bad because it takes up a lot of resources and costs performance; you'd also likely need to look at statistics and index fragmentation at the same time.
Also, a database can rarely really be made "smaller" (more data doesn't take up less space); if it has grown to a specific size, it's because the data takes up that amount of space.
So basically what you need to look at is why the database grows. Is it unintended growth? Is it the data files that grow, or the log files?
If the data files: do you store more data than you need?
If the log files: what does your backup procedure look like, and how does it handle the log file?
Shrinking files is usually treating a symptom of something being wrong, rather than treating what is actually wrong.
So it is definitely not a good sign if it grows 'unexpectedly' or 'too much', and trying to find the cause would be the better route.
(Of course, real life scenarios exists where you sometimes do have to make the 'bad' choice, but well - this was just generally speaking :))
The only time a log file should be shrunk is when some abnormal activity has increased its size unexpectedly. It is bad to shrink a log file regularly; see here to know why.
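If the log has blown up because of a one-off event, the usual remedy is to take a log backup and then do a single, targeted shrink rather than a scheduled one. A hedged T-SQL sketch follows; the database name, logical log file name, target size and backup path are all placeholders:

-- Back up the transaction log so the inactive portion can be reused
BACKUP LOG MyDatabase TO DISK = N'D:\Backups\MyDatabase_log.trn';

-- One-off shrink of the log file back to a sensible size (here 1024 MB)
DBCC SHRINKFILE (MyDatabase_log, 1024);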
Currently I'm using Zend_Cache_Backend_File for caching in my project (especially responses from external web services). I was wondering if I could find some benefit in migrating to Zend_Cache_Backend_Sqlite.
Possible advantages are:
The file system stays tidy (only one file in the cache folder)
Removing expired entries should be quicker (my assumption, since Zend wouldn't need to scan each cache entry's internal metadata for its expiry date)
Possible disadvantages:
Finding a record to read may be slower (with files, Zend checks whether a file exists based on its name, which should be a bit quicker).
I've tried searching the internet a bit, but there doesn't seem to be much discussion about this.
What do you think about it?
Thanks in advance.
I'd say it depends on your application.
The switch shouldn't be hard. Just test both cases and see which is best for you; no benchmark is objective except your own.
Measuring just performance, Zend_Cache_Backend_Static is the fastest one.
One other disadvantage of Zend_Cache_Backend_File is that if you have a lot of cache files it could take your OS a long time to load a single one because it has to open and scan the entire cache directory each time. So say you have 10,000 cache files, try doing an ls shell command on the cache dir to see how long it takes to read in all the files and print the list. This same lag will translate to your app every time the cache needs to be accessed.
You can use the hashed_directory_level option to mitigate this issue a bit, but it only nests up to two directories deep, which may not be enough if you have a lot of cache files. I ran into this problem on a project, causing performance to actually degrade over time as the cache got bigger and bigger. We couldn't switch to Zend_Cache_Backend_Memcached because we needed tag functionality (not supported by Memcached). Switching to Zend_Cache_Backend_Sqlite is a good option to solve this performance degradation problem.
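For reference, here is a rough sketch of how the two backends are configured in ZF1; the paths, lifetime and directory depth are placeholders, and hashed_directory_level is the option mentioned above:

<?php
$frontendOptions = array(
    'lifetime'                => 3600, // cache entries live for one hour
    'automatic_serialization' => true,
);

// File backend: spread cache files over two levels of hashed sub-directories
$cache = Zend_Cache::factory('Core', 'File', $frontendOptions, array(
    'cache_dir'              => '/tmp/cache/',
    'hashed_directory_level' => 2,
));

// SQLite backend: everything lives in a single database file instead
// $cache = Zend_Cache::factory('Core', 'Sqlite', $frontendOptions, array(
//     'cache_db_complete_path' => '/tmp/cache/cache.sqlite',
// ));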
I want to log visits to my high-traffic website to a file. How many writes to the log file can I perform per second?
If you can't use Analytics, why wouldn't you use your web server's existing logging system? If you are using a real web server, it almost certainly has a logging mechanism that is already optimized for maximum throughput.
Your question is impossible to answer in all other respects. The number of possible writes is governed by hardware, operating system and contention from other running software.
Don't do that, use Google Analytics instead. You'd end up running into many problems trying to open files, write to them, close them, so on and so forth. Problems would arise when you overwrite data that hasn't yet been committed, etc.
If you need your own local solution (within a private network, etc) you can look into an option like AWStats which operates off of crawling through your log files.
Or just analyze the Apache access log files. For example with AWStats.
File writes are not expensive until you actually flush the data to disk. Usually your operating system will cache things aggressively so you can have very good write performance if you don't try to fsync() your data manually (but of course you might lose the latest log entries if there's a crash).
Another problem however is that file I/O is not necessarily thread-safe, and writing to the same file from multiple threads or processes (which will probably happen if we're talking about a Web app) might produce the wrong results: missing or duplicate or intermingled log lines, for example.
If your hard disk drive can write 40 MB/s and your log file lines are approx. 300 bytes in length, I'd assume that you can write 140000 HTTP requests per second to your logfile if you keep it open.
Anyway, you should not do this on your own, since most web servers already write logfiles; they know very well how to do that, how to rotate the files when a maximum size is reached, and how to format the log lines according to well-known patterns.
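That said, if you do roll your own, here is a hedged sketch of an append that at least avoids intermingled lines from concurrent requests; the file path and line format are placeholders:

<?php
// Append one log line per request. LOCK_EX serializes concurrent writers and
// FILE_APPEND prevents overwriting; the OS page cache absorbs most of the disk I/O.
$line = sprintf("%s %s %s\n", date('c'), $_SERVER['REMOTE_ADDR'], $_SERVER['REQUEST_URI']);
file_put_contents('/var/log/myapp/visits.log', $line, FILE_APPEND | LOCK_EX);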
File access is very expensive, especially when doing writes. I would recommend saving them to RAM (using whatever cache method suits you best) and periodically writing the results to disk.
You could also use a database for this. Something like:
UPDATE stats SET hits = hits + 1
Try out a couple different solutions, benchmark the performance, and implement whichever works fast enough with minimal resource usage.
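For instance, here is a rough sketch of the count-in-RAM-then-flush-periodically idea, assuming the APCu extension and a stats table like the one in the snippet above; the flush threshold and connection details are placeholders:

<?php
// Count hits in shared memory and push them to the database in batches.
apcu_add('hits', 0);      // create the counter if it does not exist yet
$hits = apcu_inc('hits'); // atomic increment, one per request

// Every 100th hit, add a batch of 100 to the stats table.
if ($hits % 100 === 0) {
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password');
    $pdo->exec('UPDATE stats SET hits = hits + 100');
}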
If using Apache, I'd recommend using the rotatelogs utility supplied as a part of the standard kit.
We use this to allow rotating the server logs out on a daily basis without having to stop and start the server. N.B. Use the new "||" syntax when declaring the log directive.
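For example, a directive along these lines (the rotatelogs path and log location are placeholders; the leading "||" spawns rotatelogs directly, without a shell, and 86400 rotates daily):

CustomLog "||/usr/local/apache2/bin/rotatelogs /var/log/apache2/access_log.%Y-%m-%d 86400" combined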
The site I'm involved with is one of the largest on the Internet with hit rates peaking in the millions per second for extended periods of time.
Edit: I forgot to say that the site uses standard Apache logging directives and we have not needed to customise the Apache logging code at all.
Edit: BTW Unless you really need it, don't log bytes served as this causes all sorts of issues around the midnight boundary.
Let Apache do it; do the analysis work on the back-end.