Writing log files using Java EE - file

I need to create application logs to capture users signing in/out and their requests, for that.
We're using Java EE, and thought that creating new log files (new txt file for each day) would be a good approach, but I see that people discourage doing that, the question is: why not do it that way, and what is the correct approach?
also - is there some way to get the application directory?

log4j is one of the popularly used logger for Java EE applications and the others are slf4j,logback
log4j has many features, one them being able to create daily log files.
and to answer your question,
creating daily log files does not cause any harm to your application.

Logging to text files and rolling them daily is quite a normal approach and discouraging it per se is not justified.
For some specific uses it may be improper, for example if you log sensitive data (passwords, card numbers, etc.). There may be also issues with some cluster configurations, but then you have to ask a more detailed question.

Log4J works fine, but once you have many different Applications logging to many different Log files in different locations, you encounter the problem of having to search many log files to find the trace of certain Transaction.
One colleague recommended GrayLog2 once, which makes the viewing of the Log Files a lot easier.
You might want to take a look at that as well, depending on how many Log files your planning to keep.
http://graylog2.org/about

Related

How to deal with issues when storing uploaded files in the file system for a web app?

I am building a web application where the users can create reports and then upload some images for the created reports. Those images will be rendered in the browser when the user clicks a button on the report page. The images are confidential and only authorized users will be able to access them.
I am aware of the pros and cons of storing images in database, in filesystem or a service like amazon S3. For my application, I am inclined to keep the images in the filesystem and paths of the images in the database. That means I have to deal with the problems arising around distributed transaction management. I need some advice on how to deal with these problems.
1- I believe one of the proper solutions is to use technologies like JTA and XADisk. I am not very knowledgeable about these technologies but I believe 2 phase commit is how automicity is achieved. I am using MySQL as the database, and it seems like 2 phase commit is supported by MySQL. Problem with this approach is XADisk does not seem to be an active project and there is not much documentation about it and there is the fact that I am not very knowlegable about the ins and outs of this approach. I am not sure if I should invest in this approach.
2- I believe I can get away with some of the problems arising from the violation of ACID properties for my application. While uploading images, I can first write the files to disk, if this operation succeeds I can update the paths in the database. If database transaction fails, I can delete the files from the disk. I know that is still not bulletproof; an electricity shortage might occur just after the db transaction or the disk might not be responsive for a while etc...I know there are also concurrency issues, for instance if one user tries to modify the uploaded image and another tries to delete it at the same time, there will be some problems. Still the chances for concurrent updates in my application will be relatively low.
I believe I can live with orphan files on the disk or orphan image paths on the db if such exceptional cases occur. If a file path exists in db and not in the file system, I can show a notification to the user on report page and he might try to reupload the image. Orphan files in the file system would not be too much problem, I might run a process to detect such files time to time. Still, I am not very comfortable with this approach.
3- The last option might be to not store file paths in the db at all. I can structure the filesystem such that I can infer the file path in code and load all images at once. For instance, I can create a folder with the name of report id for each report. When a request has been made to load images of the report, I can load the images at once since I know the report id. That might end up with huge number of folders in the filesystem and I am not sure if such a design is acceptable. Concurrency issues will still exist in this scheme.
I would appreciate some advice on which approach I should follow.
I believe you are trying to be ultra-correct, and maybe not that much is needed, but I also faced some similar situation some time ago and explored also different possibilities. I disliked options aligned to your option 1, but about the 2 and 3, I had different successful approaches.
Let's sum up first the list of concerns:
You want the file to be saved
You want the file path to be linked to the corresponding entity (i.e the report)
You don't want a file path to be linked to a file that doesn't exist
You don't want files in the filesystem not linked to any report
And the different approaches:
1. Using DB
You can assure transactions in the DB pretty much with any relational database, and with S3 you can ensure read-after-write consistency for both new objects and upload of new objects. If you PUT an object and you get a 200 OK, it will be readable. Now, how to put all this together? You need to keep track of the process. I can figure 2 ways:
1.1 With a progress table
The upload request is saved to a table with anything need to identify this file, report id, temp uploaded file path, destination path, and a status column
You save the file
If the file safe fails you can update the record in the table, or delete it
If saving the file is successful, in a transaction:
update the progress table with successful status
update the table where you actually save the relationship report-image
Have a cron, but not checking the filesystem, but checking the process table. If there is any file in the filesystem that is orphan, definitely it had been added to the table (it was point 1). Here you can decide if you will delete the file, or if you have enough info, you can continue with the aborted process triggering the point 4.
The same report-image relationship table with some extra status columns.
1.2 With a queue system
Like RabbitMQ, SQS, AMQ, etc
A very similar approach could be done with any queue system instead of a db table. I wont give much details because it depends more on your real infrastructure, but just the general idea.
The upload request goes to a queue, you send a message with anything you may need to identify this file, report id, and if you want a tentative final path.
You upload the file
A worker reads pending messages in the queue and does the work. The message is marked as consumed only when everything goes well.
If something fails, naturally the message will come back to the queue
In the next time a message is read, the worker can have enough info to see if there is work to resume, or even a file to delete if resuming is not possible
In both cases, concurrency problems wont be straightforward to manage, but can be managed (relying on DB locks in fist case, and FIFO queues in second cases) but always with some application logic
2. Without DB
To some extent a system without a database would be perfectly acceptable, if we can defend it as a proper convention over configuration design.
You have to deal with 3 things:
Save files
Read files
Make sure that the structure of the filesystem is manageable
Lets start with 3:
Folder structure
In general, something like one folder for report id will be too simple, and maybe hard to maintain, and also ultimately too plain. This will cause issues, because if we have a folder images with one folder per report, and tomorrow you have less say 200k reports, the images folder will have 200k elements, and even an ls will take too much time, same for any programing language trying to access. That will kill you
You can think about something more sophisticated. Personally like a way that I learnt from Magento 1 more than 10 years ago and I used a lot since then: Using a folder structure following first outside rules, but extended with rules derived extended with the file name itself.
We want to save a product image. The image name is: myproduct.jpg
first rule is: for product images i use /media/catalog/product
then, to avoid many images in the same one, i create one folder per every letter of the image name, up to some number of letters. Lets say 3. So my final folder will be something like /media/catalog/product/m/y/p/myproduct.jpg
like this, it is clear where to save any new image. You can do something similar using your reports id, categories, or anything that makes sense for you. The final objective is to avoid too flat structure, and to create a tree that makes sense to you, and also that can be automatized easily.
And that takes us to the next part:
Read and write.
I implemented a similar system before quite successfully. It allowed me to save files easy, and to retrieve them easily, with locations that were purely dynamic. The parts here were:
S3 (but you can do with any filesystem)
A small microservice acting as a proxy for both read and write.
Some namespace system and attached logic.
The logic is quite simple. The namespace lets me know where the file will be saved. For example, the namespace can be companyname/reports/images.
Lets say a develop a microservice for read and write:
For saving a file, it receives:
namespace
entity id (ie you report)
file to upload
And it will do:
based on the rules I have for that namespace, and the id and file name will save the file in this folder
it doesn't return the physical location. That remains unknown to the client.
Then, for reading, clients will use a URL that uses also convention. For example you can have something like
https://myservice.com/{NAMESPACE}/{entity_id}
And based on the logic, the microservice will know where to find that in the storage and return the image.
If you have more than one image per report, you can do different things, such as:
- you may want to have a third slug in the path such as https://myservice.com/{NAMESPACE}/{entity_id}/1 https://myservice.com/{NAMESPACE}/{entity_id}/2 etc...
- if it is for your internal application usage, you can have one endpoint that returns the list of all eligible images, lets say https://myservice.com/{NAMESPACE}/{entity_id} returns an array with all image urls
How I implemented this was with quite simple yml config to define the logic, and very simple code reading that config. That allowed me to have a lot of flexibility. For example save reports in total different paths or servers or s3 buckets if they belong to different companies or are different report types

Using Tessy with Subversion

I have an embedded C project which uses subversion for source control. I want to use Tessy for unit testing and have these tests archived in subversion too. However, it generates many small files which will make analysing diffs for the actual source code changes a real pain. Trying to actually look at the source changes when there are hundreds of Tessy related files changed will make it impossible.
Does anyone know if there is a setting to have these stored in a less problematic format or any suggestions for a viable solution? What would be ideal is if it could store everything as, for example, an xml file - this would make browsing directory diffs easier and would allow the actual content to be human readable as well.
Any ideas?
I know this is an old question ...
Does anyone know if there is a setting to have these stored in a less problematic format or any suggestions for a viable solution?
The TESSY recommended way is to do utilize the database save feature found under in File menu (and in a variety of right-click menu's). This creates a binary .tmb file which contains everything related to your tests. By default the .tmb file is stored in the backup directory in your Tessy Project folder. The config folder, backup folder and the PDBX file would then all be stored in SVN. See the Tessy Users Manual (Backup, restore, version control chapter) for more specifics.
What would be ideal is if it could store everything as, for example, an xml file - this would make browsing directory diffs easier and would allow the actual content to be human readable as well.
That would be ideal, but unfortunately is not really an option. Having everything stored as a binary file makes it impossible to do a useful diff. The other problem with this method is that it disconnects a change to the test from the file that is checked into SVN - unless the tester specifically performs a database save.
Yes I realize xUnit testing frameworks don't have those limitations, but Tessy has some features (like MCDC and DO178B support) that the xUnit frameworks do not have baked in.
So how do you work in this environment. Key word - Discipline.
We set up internal procedures for who and how tests gets updated. When the proecedures are followed we are able to deal with the limitations presented above. It is not optimal, but with some internal discipline it can work.

What can I do with generated error logs?

I'm currently working on a web application which generates daily error (and non error) logs.
The current system outputs a log per task to a text file, and outputs critical errors as well as "start" and "finish" type messages to an email account.
The current workflow is as follows: scour the email box for errors, then go and find the .txt file to look at the associated errors and find the cause.
There are around 30 txt files split across about 5 servers.
This system was set up before me, but I'm looking for any advice on how to deal with the situation.
I have control of the script forming the error logs so can do pretty much anything - but I'm lost where to start: I'd considered some kind of web facing dashboard tool, maybe output the files to RSS or something?
Are there any external or internal tools I should be using?
Of course you may use the SQL Server Reporting Services or review this comparison table, there are some packages which may support SQL Server but they may be overwhelming for your task.
It's not really clear what your problem is or what you want to do, but if I understand correctly, your biggest problem is that some messages are logged to a log file but others are sent by email. Therefore, there is no single location that has all error messages in it and that makes analysis and troubleshooting difficult.
The best solution would be to use a logging framework that supports multiple logging destinations (file, DB, email) and severities. That would allow you to specify a configuration like "all errors are logged to a text file and critical ones are also sent by email", so you can ensure that you have everything in one place for general analysis but critical errors are also handled with priority.
You didn't mention what programming language you use, but assuming it's .NET-based then log4net and Enterprise Library are two common frameworks and there are many questions about them here on SO. Googling should give you a good idea of the pros and cons for your situation. If you're using a different language then you can look for the equivalent package: log4j (Java), logging (Python) etc.

Where to create/keep secret files for license information/trials on Windows/Mac OS X/Linux? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I'm writing a commercial product which uses a simple registration mechanism and allows the user to use the application for a demo period before purchasing.
My application must somewhere store the registration information (if entered) and/or the date of the first launch to calculate if the user is still within the demo/trial period. While I'm pretty much finished with the registration mechanism itself, I now have to find a good way to store the registration information on the user's disk.
The most obvious idea would be to store the trial period in the preferences file, but since user tend to delete/tinker with those from time to time, it might be a good idea to keep the registration information in a separate, more hidden file.
So here's my question: What is the best place/strategy to keep and create such hidden files on Windows, Mac OS X and Linux? Here is what came to my mind so far:
Linux/Mac OS X
Most Unix-like systems are rather locked down when it comes to places a user can write files to. In most cases this is only the /tmp directory and the user's home directory.
I guess the easiest here is probably to create a file with a dot-prefix to make it less visible, then give it a name that won't make it obvious that it's associated with my application.
Windows
Probably much like Linux/Mac OS X - more recent Windows versions become more restrictive when it comes to file system permissions.
Anyway, I'd like to hear your ideas and thoughts. Even better if you have already implemented something similar in the past.
Thanks!
Update
For me the places for such files is more relevant than the discussion of the question if this way for copy protection is good or bad.
Who cares where you put the file. Its the contents you want to protect.
On the server side, encrypt/sign the user info with a private key and distribute it the user. Email a license file, have the application connect and download it, whatever.
In your application, include the public key. If you cant authenticate/decrypt the file, fail. If you can, continue to function. You only ever need to reconnect to the server if you can't authenticate the license file. You only need the most primitive "license server" to support this. If you email the file, the "license server" is just a script that encrypts a string and emails the user.
Nothing will protect you from sophisticated attempts to hack your application. But this solution will deny casual users the ability to break your license.
And if you want to prevent the user from re-registering multiple times or sharing the license file with their friends, record their MAC address server side and in the license file. Personally, I wouldn't do this. And it won't stop sophisticated hackers, but its up to you to decide how much time to spend in the cat and mouse game.
For Windows, you might try using Isolated Storage, which will store a file in a product unique location, which is usually sufficiently obscure (and has quite a deep path), and has the advantage of being completely transparent to the developer.
POSIX systems should put app data in a hidden file in the user's home directory. Windows systems should put something under CSIDL_APPDATA.
To be honest, no matter what you do, you will be found out. If your system is self-contained, that is it does not require to be connected to the Internet or some other device at run time, then both your lock and key must be in your code or the data you write to disk. So while you can obfuscate the key (and may be even the lock), the owner of the system can invoke system trace tools or whatever to find you out. But I guess you knew that. Every major software vendor has tried various methods to make this work, but are broken every time.
I think your only real hope is to have your software phone home regularly to see if it still has a valid license.
To illustrate the problems with this approach, there was a Linux-based media server that stored its free trial timestamp in /usr/bin/.tv. It only takes an strace for someone to realise that file is being accessed - in this case, simply deleting the file restarted the trial.
If you are a single developer you will have to spend a lot of money and/or time implementing a protection scheme that only needs to be cracked by one person to be available for everyone. Of course, your aim may only be to deter casual software pirates, in which case even the most basic protection (such as the method described above) will do the job.
Specifically on the Mac, this sort of file should live in ~/Library/Application Support/YourAppName if a user licence, or /Library/Application Support/YourAppName for a machine licence.
When a user licences my app I write the file to ~/Library/Application Support/MyAppName, as that requires no special permissions, but try to read it from both locations to allow for a machine licence if I ever create one.
Use the registry for the windows version.
It is build for keeping data in a central place, and as an added bonus, if the user deletes your entire folder, the settings ar still sitting in the register(*)
here on stackoverflow is an article describing how to access the registsry using the Java programming language.
I don't think Mac has something like this, and I know Linux certainly doesn't have it, but it is a start.
(*) the register is of course also not safe for tinkering users who can easily delete the keys belonging to your app.

Creating the Front End MDE

I created a database for tracking metrics, with some automation tricks (email, .doc,.ppt presentations, etc) with a very large Main-table, and lots of forms/GUI. This is the first time I have ever I worried about an MDE/front-end for the thing. So if you would be so kind to answer a few questions, or offer any advice, it would be greatly appreciated (I would hate for all this work to not be utilized).
What is the first thing I need to do? It the 2000 version that must be converted to 03 to create the MDE, but does that get done before I use the database splitter?
Will the amount of objects in the database effect the ability to do this? I have something like 80 forms, 70 queries, 20+ macros, 12 tables, etc...but does the amount of objects prevent some of this from working well once the front end is there?
when i split the database, can I continue to work/make changes and such on the "back end", and have those changes directly effect the front end?
These may be some basic questions, but I don't know the answer so.....Thanks!
Here is my 2 ยข.
Question 1 - I have never used the database splitter as I feel I have more control doing it manually. If you do it manually you can do it to a version that does not have a database splitter. But if you do use the splitter then--yes--you will have to upgrade to a version that has a splitter before doing it.
To do it manually here are the steps.
Backup everything.
Create a copy of your file into the same directory. So if you have an MyApp.MDB create a copy into the same directory with a new name, such as MyAppDATA.mdb.
Open the new DATA file (MyAppDATA.mdb) and delete all of the objects EXCEPT the TABLES.
Open the App file (MyApp.mdb) and delete all of the tables.
Also in MyApp.mdb...go to the File/Get External Data/Link Tables menu to link the tables in MyAppDATA.mdb to MyApp.mdb. Select All and create the links.
That should do it. And if you screw up you made a backup...right?
A couple of tips and gotchas...be sure that you go to Tools/Options and that you are NOT showing System and Hidden tables. You just don't want to delete system tables from MyApp. Another way to do it is do NOT delete tables that start with MSys or USys.
Question 2 - Does not matter how many object you have. In fact you don't have that many objects anyway.
Question 3 - Yes...you will make backend changes in MyAppData.mdb and when you open MyApp.mdb those changes will auto-magically be there to see and query against etc. (In the query designer you may need to save/close/reopen to see new fields if you made the mod while in the query). The EXCEPTION to that is New Tables You will have to use the File/Get External Data/Link Tables option to create links to new tables.
One thing to remember (and that I hope you already realize) is that the one downside of splitting the database is that when you deploy the front end file that usually the relative path to the data will vary from machine to machine and there is no automatic re-linking of tables in access. If your target clients have full access you can always use Tools/Database Utilities/Linked Table Manager to refresh the links to the right location. If you can't do that then you will have to do one of the following:
1. Write code that does the automatic re-linking for you. Basically it will check the links...if invalid it will prompt the user for the data location (or look it up in an INI file) and re-link the tables.
2. Always deploy your app to the same location on all machines. If you have commercial visions for your application this won't work...I mention it for academic reasons. It might be doable for a limited deployment where you have a lot of control over file placement on each machine.
3. Put the Data file (MyAppDATA.mdb) onto a network share and link the table across the network using a drive mapping or UNC (\myserver\mydata\ApplicationData\MyAppData.mdb). The latter is preferred but both of them run the same risks as number two.
Seth
PS This answer assumes Access 2003.
PPS If you have commercial visions for your application then the table linking has got to be REALLY robust.
PPPS I agree with the commenter that you may want to take the plunge and do SQL if it is in your skill set.
One thing that hasn't been discussed, and that's the issue of whether the compile to MDE could fail. Basically, if your code compiles in your front-end MDB, it will convert to an MDE. But I've noticed that lots of people never compile.
Some hints for keeping your VBA code in good shape:
in VBE options, turn off COMPILE ON DEMAND.
add the COMPILE button to your standard VBE toolbar and USE IT OFTEN.
periodically, backup your MDB and decompile/recompile it.
Also, remember that you must keep the MDB source, as the VBA code is not editable in an MDE and not recoverable by any good method.
EDIT:
Steps for a decompile:
backup your MDB.
start an instance of Access with the /decompile commandline argument. For, instance, I have a shortcut on my deskstop that has this as the target:
"C:\Program Files\Microsoft Office\OFFICE11\MSACCESS.EXE" /decompile
having opened that instance of Access, open the MDB you want to decompile. You will see nothing happen. DO NOTHING FURTHER IN THIS INSTANCE OF ACCESS -- close this instance of Access (the reason for this is that Michael Kaplan, who knows a thing or two about this, recommended that you never do any work in an Access instance opened with the decompile switch because he said there was no guarantee that the Access application code executed under those circumstances in a way that was fully safe for all kinds of Access work).
open the just-decompiled MDB holding down the shift key (you want to be sure that startup routines don't run because that would likely recompile the product before you've finished your cleanup) and compact the MDB (holding down the shift key again).
open the code editor and compile the project (DEBUG -> COMPILE [db name] for those who haven't step #2 in my original compiling instructions at the top of the post before the edit).
compact the MDB (doesn't matter if you bypass startup, since it's already fully compiled).
Why so many steps?
Because the purpose of the decompile is to get rid of the compiled p-code in order to start afresh from the canonical VBA code. Following the steps above insures that you have completely cleared the data pages storing the compiled code before you recompile. The reason for this is that without the compact step after the decompile, under some very rare circumstances, the code can behave strangely. I can't imagine that the old discarded p-code is being used again, but there's something about the pointers between the canonical code and the compiled code that apparently doesn't get completely flushed by a decompile without a compact.
This would be a comment to Seth's answer, but my rep isn't high enough to comment yet.
Seth did a great job answering your questions, I just wanted to add a bit more to part #1 about using the Database Splitter. The Database Splitter in the Tools menu works fine. Doing it manually is alright too, but it's a whole lot faster and easier to use the Database Splitter. I've used it a dozen times and never encountered any issues after using it.
http://www.databasedev.co.uk/split_a_database.html has a decent page about some of the pros, cons of splitting your database.
http://www.accessmvp.com/TWickerath/articles/multiuser.htm also has some good info when dealing with a split database in a multi-user environment.
Seth gave you a very good answer. But I'll add a few comments.
The number of objects only becomes relevant when you get close to about 1000 forms, reports and modules which have code. There's a limit about there. If you do get that message when trying to make an MDE then you almost certainly have a code error and need to compile to find the error
Another resource is "Splitting your app into a front end and back end Tips"
See the Auto FE Updater downloads page to make the process of distributing new FEs relatively painless.. The utility also supports Terminal Server/Citrix quite nicely.

Resources