What should I do with my pg_log?

What should I do with my pg_log? - database

I have two database server, one for main usage and the second is for standby. The log file is being recorded every day. About last week, when I found that my log file size is getting bigger and it used like 12GB disk space. I am wondering do I need to create a crontab schedule task like delete the old log like every 1 or two months. Will it affect my system or should I just do backup for it?
Another question is if I am doing streaming replication, the log file size of pg_log in standby server will be the same with primary server?

You may want to look at https://www.postgresql.org/docs/current/static/logfile-maintenance.html for some ideas of what you can do.
Personally, I think it's easiest to rotate logs cyclically, like what's described in this answer: https://dba.stackexchange.com/a/133443. This method also can help you find what you're looking for more quickly, since your logs are separated by day. Of course, if you rotate logs based on the day name and have new logs overwrite the old, you're limiting the amount of time you can go back in the logs, which can be detrimental, but if you're consistently having to go back more than a week to check logs, you might need to get a more proactive monitoring system in place to alert you when problems are happening.
As for your second question, I don't have a definitive answer, but I'd expect that the log file sizes will be similar, but probably not exactly the same. Different servers might be setup with different logging verbosity, and there may be some log messages unique to each server.

Related

Logging requests into database

Should I log requests info (client ip, request status code, execution time etc.) in my web app into the database to analyse users behavoir and arised errors? And what info log for better experience?

Its often tempting to log lots of information, however I usually find that when I come to use it to answer a question it's often the case that the wrong piece of information has been recorded or only partially. Or it has been recorded but has not been stored in a usable way and takes further programming to turn the log into meaningful information.
So I would start with the question of what you want to see/find and log accordingly. Generally then logging capability can be expanded in the future as new issues/insights are required.
remember every time you log something you are slowing your application down. You are also using more disk space, no one is going to thank you for buying more disk / longer backups just because you have logged everything on every action.
I guess I would follow a train of thought a bit like:
1) What are you trying to find, if its an error you can predict then why not cater for it in your code to start with. If its usability what format does the data need to be in at at what points should it be recorded.
2) How long do you need it for, be sure to purge the logs after a period to conserve disk space.
3) Every element stored is a performance hit, might be small but for high number of transactions it adds up.
4) Be wary of privacy rules, an IP address may be considered as identifiable data, in which case you need to publish a data privacy policy (see point 2).
5) Consider using a flag to control logging on or off. Then you can use it at times of a known issue but not record everything always when not needed.

Access database table keeps getting chinese characters where they don't belong

EDIT This is not a duplicate, as I do not have any Memo fields. I am also not grouping anything. The corruption is always found within the Prime table.
Lately, I am often getting a single line of data in my access 2010 db coming up with a load of chinese characters. It has happened before, but lately it is becoming a regular occurrence and I would very much like it to stop.
Here is what I have going on and what limitations I have.
Access split database. Multiple users. Users only have an *.accdr front end to work in, stored locally on their desktops (only about 6 users in total). They all use the access 2010 runtime, very very few have full MS access on their machines.
The Backend is stored on a large shared sitewide drive (or series of drives) that on all users machines is simply "G:" This drive, it should be noted, occasionally has issues like being too full. I have no means to put the backend on a dedicated machine, and other software is out of the question. IT support is offsite, and frankly they are about as clued in as AOL tech support was in the 90's.
Normal daily procedure is to load the output from another program into a merge table. This merge table is kept so we can spot changes and duplication. The merge table is then appended into the Prime table. Primary key in the Prime table prevents over writing existing information. The primary key is on 5 different columns in the prime table. Each column may have legitimate repeating values, but the combination of those values is unique. I have no pre-defined relationships. all relationships are shown on the query level. A backup of the data in the Prime table is done by creating an excel file once per day. I run Compact and Repair on the database every couple of weeks.
Every once in a long while, some hiccup in the universe, or data collision, or strange hard drive problem would cause a line in the Prime table to turn into chinese characters. When that happens, I check the backup excel file to make sure the corruption is not there. I then have everyone get out of the database. I run a Compact and repair, remove the offending line, C&R again, and get on with my day. This used to happen maybe once every 2 months.
Now I am getting this corruption on what seems to be an accelerating cycle. Once a week, 3 times a week, now it seems to be daily.
The changes in the front end made recently have all been form level stuff. Not anything in the queries themselves.
My boss won't accept the "Unusual sunspot and solar flare activity" excuse anymore.
What should I do to prevent this (within my limitations)?
Thanks in advance folks.
EDIT 2
The last few days we have been trying to systematically test various things to reproduce and isolate the corruption. I have an additional person who normally runs the daily update per my instructions. We reviewed the process and no problems or deviation. I have access to 4 different machines I can run the updates on so day one we used my daily use computer (Access 2013). Step by step checking for corruption. No corruption. Day 2 was on a machine that only has Access 2010, with same step by step process checks. No Corruption. Day 3 will be on my co-workers machine with the same step by step checks. I'll update as I go. I wonder if the problem could be machine specific.

After some careful testing, incorporating advice in all of the comments, we f determined that the problem most likely rests with the fact that the drive the DB resides in was getting full. The problem started when the drive was approaching about 90% of capacity. Since then, the Drive has been somewhat cleaned of old files, the drive is now at about 60% of capacity, and the corruption problems have gone away. We'll keep monitoring.
Thanks again for all the advice, and I hope this helps others in the future!

On the app engine development server why does the "Applying all pending transactions and saving the datastore" step take several seconds?

It seems like something that should take place in a flash, at least locally. Is there some time emulation of the production server that causes it?

It depends how much data is saved in your local datastore and how fast your disk is. I've noticed if you have a lot of deleted data in your datastore, the file still ends up being big.
If you want to clear the database, it's better to delete the whole file than delete all the individual entities.

What is the recommended way to build functionality similar to Stackoverflow's "Inbox"?

I have an asp.net-mvc website and people manage a list of projects. Based on some algorithm, I can tell if a project is out of date. When a user logs in, i want it to show the number of stale projects (similar to when i see a number of updates in the inbox).
The algorithm to calculate stale projects is kind of slow so if everytime a user logs in, i have to:
Run a query for all project where they are the owner
Run the IsStale() algorithm
Display the count where IsStale = true
My guess is that will be real slow. Also, on everything project write, i would have to recalculate the above to see if changed.
Another idea i had was to create a table and run a job everything minutes to calculate stale projects and store the latest count in this metrics table. Then just query that when users log in. The issue there is I still have to keep that table in sync and if it only recalcs once every minute, if people update projects, it won't change the value until after a minute.
Any idea for a fast, scalable way to support this inbox concept to alert users of number of items to review ??

The first step is always proper requirement analysis. Let's assume I'm a Project Manager. I log in to the system and it displays my only project as on time. A developer comes to my office an tells me there is a delay in his activity. I select the developer's activity and change its duration. The system still displays my project as on time, so I happily leave work.
How do you think I would feel if I receive a phone call at 3:00 AM from the client asking me for an explanation of why the project is no longer on time? Obviously, quite surprised, because the system didn't warn me in any way. Why did that happen? Because I had to wait 30 seconds (why not only 1 second?) for the next run of a scheduled job to update the project status.
That just can't be a solution. A warning must be sent immediately to the user, even if it takes 30 seconds to run the IsStale() process. Show the user a loading... image or anything else, but make sure the user has accurate data.
Now, regarding the implementation, nothing can be done to run away from the previous issue: you will have to run that process when something that affects some due date changes. However, what you can do is not unnecessarily run that process. For example, you mentioned that you could run it whenever the user logs in. What if 2 or more users log in and see the same project and don't change anything? It would be unnecessary to run the process twice.
Whatsmore, if you make sure the process is run when the user updates the project, you won't need to run the process at any other time. In conclusion, this schema has the following advantages and disadvantages compared to the "polling" solution:
Advantages
No scheduled job
No unneeded process runs (this is arguable because you could set a dirty flag on the project and only run it if it is true)
No unneeded queries of the dirty value
The user will always be informed of the current and real state of the project (which is by far, the most important item to address in any solution provided)
Disadvantages
If a user updates a project and then upates it again in a matter of seconds the process would be run twice (in the polling schema the process might not even be run once in that period, depending on the frequency it has been scheduled)
The user who updates the project will have to wait for the process to finish
Changing to how you implement the notification system in a similar way to StackOverflow, that's quite a different question. I guess you have a many-to-many relationship with users and projects. The simplest solution would be adding a single attribute to the relationship between those entities (the middle table):
Cardinalities: A user has many projects. A project has many users
That way when you run the process you should update each user's Has_pending_notifications with the new result. For example, if a user updates a project and it is no longer on time then you should set to true all users Has_pending_notifications field so that they're aware of the situation. Similarly, set it to false when the project is on time (I understand you just want to make sure the notifications are displayed when the project is no longer on time).
Taking StackOverflow's example, when a user reads a notification you should set the flag to false. Make sure you don't use timestamps to guess if a user has read a notification: logging in doesn't mean reading notifications.
Finally, if the notification itself is complex enough, you can move it away from the relationship between users and projects and go for something like this:
Cardinalities: A user has many projects. A project has many users. A user has many notifications. A notifications has one user. A project has many notifications. A notification has one project.
I hope something I've said has made sense, or give you some other better idea :)

You can do as follows:
To each user record add a datetime field sayng the last time the slow computation was done. Call it LastDate.
To each project add a boolean to say if it has to be listed. Call it: Selected
When you run the Slow procedure set you update the Selected fileds
Now when the user logs if LastDate is enough close to now you use the results of the last slow computation and just take all project with Selected true. Otherwise yourun again the slow computation.
The above procedure is optimal, becuase it re-compute the slow procedure ONLY IF ACTUALLY NEEDED, while running a procedure at fixed intervals of time...has the risk of wasting time because maybe the user will neber use the result of a computation.

Make a field "stale".
Run a SQL statement that updates stale=1 with all records where stale=0 AND (that algorithm returns true).
Then run a SQL statement that selects all records where stale=1.
The reason this will work fast is because SQL parsers, like PHP, shouldn't do the second half of the AND statement if the first half returns true, making it a very fast run through the whole list, checking all the records, trying to make them stale IF NOT already stale. If it's already stale, the algorithm won't be executed, saving you time. If it's not, the algorithm will be run to see if it's become stale, and then stale will be set to 1.
The second query then just returns all the stale records where stale=1.

You can do this:
In the database change the timestamp every time a project is accessed by the user.
When the user logs in, pull all their projects. Check the timestamp and compare it with with today's date, if it's older than n-days, add it to the stale list. I don't believe that comparing dates will result in any slow logic.

I think the fundamental questions need to be resolved before you think about databases and code. The primary of these is: "Why is IsStale() slow?"
From comments elsewhere it is clear that the concept that this is slow is non-negotiable. Is this computation out of your hands? Are the results resistant to caching? What level of change triggers the re-computation.
Having written scheduling systems in the past, there are two types of changes: those that can happen within the slack and those that cause cascading schedule changes. Likewise, there are two types of rebuilds: total and local. Total rebuilds are obvious; local rebuilds try to minimize "damage" to other scheduled resources.
Here is the crux of the matter: if you have total rebuild on every update, you could be looking at 30 minute lags from the time of the change to the time that the schedule is stable. (I'm basing this on my experience with an ERP system's rebuild time with a very complex workload).
If the reality of your system is that such tasks take 30 minutes, having a design goal of instant gratification for your users is contrary to the ground truth of the matter. However, you may be able to detect schedule inconsistency far faster than the rebuild. In that case you could show the user "schedule has been overrun, recomputing new end times" or something similar... but I suspect that if you have a lot of schedule changes being entered by different users at the same time the system would degrade into one continuous display of that notice. However, you at least gain the advantage that you could batch changes happening over a period of time for the next rebuild.
It is for this reason that most of the scheduling problems I have seen don't actually do real time re-computations. In the context of the ERP situation there is a schedule master who is responsible for the scheduling of the shop floor and any changes get funneled through them. The "master" schedule was regenerated prior to each shift (shifts were 12 hours, so twice a day) and during the shift delays were worked in via "local" modifications that did not shuffle the master schedule until the next 12 hour block.
In a much simpler situation (software design) the schedule was updated once a day in response to the day's progress reporting. Bad news was delivered during the next morning's scrum, along with the updated schedule.
Making a long story short, I'm thinking that perhaps this is an "unask the question" moment, where the assumption needs to be challenged. If the re-computation is large enough that continuous updates are impractical, then aligning expectations with reality is in order. Either the algorithm needs work (optimizing for local changes), the hardware farm needs expansion or the timing of expectations of "truth" needs to be recalibrated.
A more refined answer would frankly require more details than "just assume an expensive process" because the proper points of attack on that process are impossible to know.

What is the worst database accident that happened to you in production? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
For example: Updating all rows of the customer table because you forgot to add the where clause.
What was it like, realizing it and reporting it to your coworkers or customers?
What were the lessons learned?

I think my worst mistake was
truncate table Customers
truncate table Transactions
I didnt see what MSSQL server I was logged into, I wanted to clear my local copy out...The familiar "OH s**t" when it was taking significantly longer than about half a second to delete, my boss noticed I went visibily white, and asked what I just did. About half a mintue later, our site monitor went nuts and started emailing us saying the site was down.
Lesson learned? Never keep a connection open to live DB longer than absolutly needed.
Was only up till 4am restoring the data from the backups too! My boss felt sorry for me, and bought me dinner...

I work for a small e-commerce company, there's 2 developers and a DBA, me being one of the developers. I'm normally not in the habit of updating production data on the fly, if we have stored procedures we've changed we put them through source control and have an officially deployment routine setup.
Well anyways a user came to me needing an update done to our contact database, batch updating a bunch of facilities. So I wrote out the query in our test environment, something like
update facilities set address1 = '123 Fake Street'
where facilityid in (1, 2, 3)
Something like that. Ran it in test, 3 rows updated. Copied it to clipboard, pasted it in terminal services on our production sql box, ran it, watched in horror as it took 5 seconds to execute and updated 100000 rows. Somehow I copied the first line and not the second, and wasn't paying attention as I CTRL + V, CTRL + E'd.
My DBA, an older Greek gentleman, probably the grumpiest person I've met was not thrilled. Luckily we had a backup, and it didn't break any pages, luckily that field is only really for display purposes (and billing/shipping).
Lesson learned was pay attention to what you're copying and pasting, probably some others too.

A junior DBA meant to do:
delete from [table] where [condition]
Instead they typed:
delete [table] where [condition]
Which is valid T-Sql but basically ignores the where [condition] bit completely (at least it did back then on MSSQL 2000/97 - I forget which) and wipes the entire table.
That was fun :-/

About 7 years ago, I was generating a change script for a client's DB after working late. I had only changed stored procedures but when I generated the SQL I had "script dependent objects" checked. I ran it on my local machine and all appeared to work well. I ran it on the client's server and the script succeeded.
Then I loaded the web site and the site was empty. To my horror, the "script dependent objects" setting did a DROP TABLE for every table that my stored procedures touched.
I immediately called the lead dev and boss letting them know what happened and asking where the latest backup of the DB could be located. 2 other devs were conferenced in and the conclusion we came to was that no backup system was even in place and no data could be restored. The client lost their entire website's content and I was the root cause. The result was a $5000 credit given to our client.
For me it was a great lesson, and now I am super-cautious about running any change scripts, and backing up DBs first. I'm still with the same company today, and whenever the jokes come up about backups or database scripts someone always brings up the famous "DROP TABLE" incident.

Something to the effect of:
update email set processedTime=null,sentTime=null
on a production newsletter database, resending every email in the database.

I once managed to write an updating cursor that never exited. On a 2M+ row table. The locks just escalated and escalated until this 16-core, 8GB RAM (in 2002!) box actually ground to a halt (of the blue screen variety).

update Customers set ModifyUser = 'Terrapin'
I forgot the where clause - pretty innocent, but on a table with 5000+ customers, my name will be on every record for a while...
Lesson learned: use transaction commit and rollback!

We were trying to fix a busted node on an Oracle cluster.
The storage management module was having problems, so we clicked the un-install button with the intention of re-installing and copying the configuration over from another node.
Hmm, it turns out the un-install button applied to the entire cluster, so it cheerfully removed the storage management module from all the nodes in the system.
Causing every node in the production cluster to crash. And since none of the nodes had a storage manager, they wouldn't come up!
Here's an interesting fact about backups... the oldest backups get rotated off-site, and you know what your oldest files on a database are? The configuration files that got set up when the system was installed.
So we had to have the offsite people send a courier with that tape, and a couple of hours later we had everything reinstalled and running. Now we keep local copies of the installation and configuration files!

I thought I was working in the testing DB (which wasn't the case apparently), so when I finished 'testing' I run a script to reset all data back to the standard test data we use... ouch!
Luckily this happened on a database that had backups in place, so after figuring out I did something wrong we could easily bring back the original database.
However this incident did teach the company I worked for to realy seperate the production and the test environment.

I don't remember all the sql statements that ran out of control but I have one lesson learned - do it in a transaction if you can (beware of the big logfiles!).
In production, if you can, proceed the old fashioned way:
Use a maintenance window
Backup
Perform your change
verify
restore if something went wrong
Pretty uncool, but generally working and even possible to give this procedure to somebody else to run it during their night shift while you're getting your well deserved sleep :-)

I did exactly what you suggested. I updated all the rows in a table that held customer documents because I forgot to add the "where ID = 5" at the end. That was a mistake.
But I was smart and paranoid. I knew I would screw up one day. I had issued a "start transaction". I issued a rollback and then checked the table was OK.
It wasn't.
Lesson learned in production: despite the fact we like to use InnoDB tables in MySQL for many MANY reasons... be SURE you haven't managed to find one of the few MyISAM tables that doesn't respect transactions and you can't roll back on. Don't trust MySQL under any circumstances, and habitually issuing a "start transaction" is a good thing. Even in the worst case scenario (what happened here) it didn't hurt anything and it would have protected me on the InnoDB tables.
I had to restore the table from a backup. Luckily we have nightly backups, the data almost never changes, and the table is a few dozen rows so it was near instantaneous. For reference, no one knew that we still had non-InnoDB tables around, we thought we converted them all long ago. No one told me to look out for this gotcha, no one knew it was there. My boss would have done the same exact thing (if he had hit enter too early before typing the where clause too).

I discovered I didn't understand Oracle redo log files (terminology? it was a long time ago) and lost a weeks' trade data, which had to be manually re-keyed from paper tickets.
There was a silver lining - during the weekend I spent inputting, I learned a lot about the useability of my trade input screen, which improved dramatically thereafter.

Worst case scenario for most people is production data loss, but if they're not running nightly backups or replicating data to a DR site, then they deserve everything they get!
#Keith in T-SQL, isn't the FROM keyword optional for a DELETE? Both of those statements do exactly the same thing...

The worst thing that happened to me was that a Production server consume all the space in the HD. I was using SQL Server so I see the database files and see that the log was about 10 Gb so I decide to do what I always do when I want to trunc a Log file. I did a Detach the delete the log file and then attach again. Well I realize that if the log file is not close properly this procedure does not work. so I end up with a mdf file and no log file. Thankfully I went to the Microsoft site I get a way to restore the database as recovery and move to another database.

Updating all rows of the customer table because you forgot to add the where clause.
That was exactly i did :| . I had updated the password column for all users to a sample string i had typed onto the console. The worst part of it was i was accessing the production server and i was checking out some queries when i did this. My seniors then had to revert an old backup and had to field some calls from some really disgruntled customers. Ofcourse there is another time when i did use the delete statement, which i don't even want to talk about ;-)

I dropped the live database and deleted it.
Lesson learned: ensure you know your SQL - and make sure that you back up before you touch stuff.

This didn't happen to me, just a customer of ours whos mess I had to clean up.
They had a SQL server running on a RAID5 disk array - nice hotswap drives complete with lighted disk status indicators. Green = Good, Red = Bad.
One of their drives turned from green to red and the genius who was told to pull and replace the (Red) bad drive takes a (Green) good one out instead. Well this didn't quite manage to bring down the raid set completely - opting for the somewhat readable (Red) vs unavaliable (Green) for several minutes.. after realizing the mistake and swapping the drives back any data blocks that were written during this time became jyberish as disk synchronization was lost) ... 24-straight hours later writing meta programs to recover readable data and reconstruct a medium sized schema they were back up and running.
Morals of this story include...Never use RAID5, always maintain backups, careful who you hire.
I made a major mistake on a customers production system once -- luckily while wondering why the command was taking so long to execute realized what I had done and canceled it before the world came to an end.
Moral of this story include ... always start a new transaction before changing ANYTHING, test the results are what you expect and then and only then commit the transaction.
As a general observation many classes of rm -rf / type errors can be prevented by properly defining foreign key constraints on your schema and staying far away from any command labled 'CASCADE'

Truncate table T_DAT_STORE
T_DAT_STORE was the fact table of the department I work in. I think I was connected to the development database. Fortunately, we have a daily backup, which hasn't been used until that day, and the data was restored in six hours.
Since then I revise everything before a truncate, and periodically I ask for a backup restoration of minor tables only to check the backup is doing well (Backup isn't done by my department)

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight