Dealing with server stranded file uploads - angularjs

I have an Angular SPA app running on Azure and I'd like to implement a rich text editor similar to Medium.com. I'll be using some existing editor for this, but I have a problem with image files.
The problem
I would like my editor to be able to add images inside the content. The problem I'm having is deciding when I should upload the images to the server.
Possible solution 1: Immediate upload after they're selected
The good
saving content is quicker because all images are likely already uploaded
files get displayed from the server URL right after they're uploaded
The bad
files may have to be deleted on the server if the user cancels editing
files may get stranded on the server if the user simply closes their browser window
Possible solution 2: Upload after save
The good
files get displayed immediately using FileAPI capabilities
no stranded server-side files, no matter how editing is discarded
The bad
saving content may take longer because all images need to be uploaded at the moment of saving
additional client-side code is needed to display images from local files
Question
I would like to implement Solution 1 because it provides a more transparent user interface flow and reacts quicker when saving edits => better UX. But how should I manage stranded files? I could use a worker process that deletes stranded files from time to time, but I wonder whether that's the best approach for this scenario.
What and how would you suggest I implement this?

This is highly subjective (opinion based), but I will give it a shot.
You actually have a bigger problem than you think. Your approaches only describe the situation where the user starts something new, whereas I see much harder issues when the user is editing an existing item. What happens if he/she deletes images, adds new images, and at the end hits CANCEL? And what if the Internet connection drops while creating or editing?
I would also go for Solution 1 and, of course, minimize the "bad" points, as they aren't really that numerous or hard to handle. Here is how I would address each of the "bad"s in Approach 1:
All my articles (or whatever the user is editing with the editor) would have a boolean flag such as "IsDraft". All the front-end business logic would then only look for items where IsDraft == False.
Whenever a user starts a new article (the easiest problem to solve), I immediately create a new item in the DB with IsDraft = True.
Have a link table that records which image files (blobs) are used by which item ID. The point is that if you do not keep track of which blobs are used and which are not, you will have a lot of headaches deciding which blobs to delete and which to leave in storage.
Have a worker process (either a worker process in a Web Role if using Cloud Services, or a WebJob if using Web Sites) that checks for articles that are drafts and older than XXX days. If found, delete the files and the article itself.
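If it helps, here is a rough sketch of what that cleanup worker could look like, assuming a hypothetical Articles table with an IsDraft flag and a CreatedUtc column, an ArticleImages link table, and the image blobs in Azure Storage. All names, the schema and the 7-day cutoff (standing in for "XXX days") are illustrative, not taken from the question:

    // Sketch of a WebJob-style cleanup task over hypothetical tables and a blob container.
    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using Microsoft.WindowsAzure.Storage;

    static class DraftCleanup
    {
        public static void Run(string sqlConnStr, string storageConnStr)
        {
            var container = CloudStorageAccount.Parse(storageConnStr)
                .CreateCloudBlobClient()
                .GetContainerReference("article-images");

            using (var conn = new SqlConnection(sqlConnStr))
            {
                conn.Open();

                // Find drafts older than the cutoff.
                var staleIds = new List<int>();
                using (var cmd = new SqlCommand(
                    "SELECT Id FROM Articles WHERE IsDraft = 1 AND CreatedUtc < @cutoff", conn))
                {
                    cmd.Parameters.AddWithValue("@cutoff", DateTime.UtcNow.AddDays(-7));
                    using (var reader = cmd.ExecuteReader())
                        while (reader.Read()) staleIds.Add(reader.GetInt32(0));
                }

                foreach (var id in staleIds)
                {
                    // Delete the blobs linked to this draft first...
                    using (var cmd = new SqlCommand(
                        "SELECT BlobName FROM ArticleImages WHERE ArticleId = @id", conn))
                    {
                        cmd.Parameters.AddWithValue("@id", id);
                        using (var reader = cmd.ExecuteReader())
                            while (reader.Read())
                                container.GetBlockBlobReference(reader.GetString(0)).DeleteIfExists();
                    }

                    // ...then the link rows and the draft article itself.
                    using (var cmd = new SqlCommand(
                        "DELETE FROM ArticleImages WHERE ArticleId = @id; " +
                        "DELETE FROM Articles WHERE Id = @id", conn))
                    {
                        cmd.Parameters.AddWithValue("@id", id);
                        cmd.ExecuteNonQuery();
                    }
                }
            }
        }
    }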
Handling editing of an existing item is more challenging. For this, I might take the following approach:
Create a new copy of the entire article when the user hits Edit and mark it as a draft.
If the user hits Save, swap the content of the new article (new version) with the existing one, leaving the copy marked as IsDraft; the worker process will clean it up.
If the user doesn't hit Save for whatever reason (hits Cancel, the Internet connection drops, the computer restarts, the browser crashes, and so on), the copy will be cleaned up later by the worker process.
And if you want to go deeper, you can have a section in your admin panel where you show the drafts to your users, so they can either continue working on them or leave them to be auto-cleaned.

Related

Best way to transfer database to new website without access to previous one

I have an old website for which I've lost the dashboard login info, so I don't have access to its database.
Now I'm implementing a new one. What is the best way to extract the previous data (mp3 files, articles, mp4 files) so I can use it in the new site?
I hope there is a way to extract it other than manually.
Thanks
I'd suggest that in future all extraneous material (images, mp3s, etc.) be held in a separate folder on your server and hyperlinked to where it appears on a page.
That folder can be downloaded for safe-keeping.
In this case, I can only suggest that if the website is still up (since you've simply lost the keys), you could, as an ordinary user, download the whole website [which wouldn't include the database] to your computer with something like wget or HTTrack, and from that extract what items you can [filtering by extensions if need be].
At the very least you should get an idea of which items were present, even if they don't download.
As for articles, either the online website or the downloaded copy should display them in a browser, and you can copy the text.
None of this applies if the website is no longer online; but I wish you well whatever the outcome.

How to deal with issues when storing uploaded files in the file system for a web app?

I am building a web application where the users can create reports and then upload some images for the created reports. Those images will be rendered in the browser when the user clicks a button on the report page. The images are confidential and only authorized users will be able to access them.
I am aware of the pros and cons of storing images in database, in filesystem or a service like amazon S3. For my application, I am inclined to keep the images in the filesystem and paths of the images in the database. That means I have to deal with the problems arising around distributed transaction management. I need some advice on how to deal with these problems.
1- I believe one of the proper solutions is to use technologies like JTA and XADisk. I am not very knowledgeable about these technologies, but I believe two-phase commit is how atomicity is achieved. I am using MySQL as the database, and it seems two-phase commit is supported by MySQL. The problem with this approach is that XADisk does not seem to be an active project, there is not much documentation about it, and I am not very knowledgeable about the ins and outs of this approach. I am not sure if I should invest in it.
2- I believe I can get away with some of the problems arising from the violation of ACID properties for my application. While uploading images, I can first write the files to disk; if this operation succeeds, I can update the paths in the database. If the database transaction fails, I can delete the files from the disk. I know that is still not bulletproof: a power outage might occur just after the db transaction, or the disk might not be responsive for a while, etc. I know there are also concurrency issues; for instance, if one user tries to modify the uploaded image while another tries to delete it, there will be some problems. Still, the chances of concurrent updates in my application are relatively low.
I believe I can live with orphan files on the disk or orphan image paths in the db if such exceptional cases occur. If a file path exists in the db but not in the file system, I can show a notification to the user on the report page and they can try to re-upload the image. Orphan files in the file system would not be too much of a problem; I could run a process to detect such files from time to time. Still, I am not very comfortable with this approach.
3- The last option might be to not store file paths in the db at all. I can structure the filesystem such that I can infer the file path in code and load all images at once. For instance, I can create a folder named after the report id for each report. When a request is made to load the images of a report, I can load them all at once since I know the report id. That might end up with a huge number of folders in the filesystem, and I am not sure if such a design is acceptable. Concurrency issues would still exist in this scheme.
I would appreciate some advice on which approach I should follow.
I believe you are trying to be ultra-correct, and maybe not that much is needed, but I faced a similar situation some time ago and also explored different possibilities. I disliked options along the lines of your option 1, but for options 2 and 3 I have had different successful approaches.
Let's sum up first the list of concerns:
You want the file to be saved
You want the file path to be linked to the corresponding entity (i.e. the report)
You don't want a file path to be linked to a file that doesn't exist
You don't want files in the filesystem not linked to any report
And the different approaches:
1. Using DB
You can ensure transactions in the DB with pretty much any relational database, and with S3 you can rely on read-after-write consistency for newly uploaded objects: if you PUT an object and get a 200 OK, it will be readable. Now, how do you put all this together? You need to keep track of the process. I can think of two ways:
1.1 With a progress table
The upload request is saved to a table with anything needed to identify the file: report id, temporary uploaded file path, destination path, and a status column
You save the file
If saving the file fails, you can update the record in the table or delete it
If saving the file is successful, in a transaction:
update the progress table with successful status
update the table where you actually store the report-image relationship
Have a cron job that checks not the filesystem but the progress table. If there is an orphan file in the filesystem, it was definitely added to the table (step 1). Here you can decide whether to delete the file or, if you have enough information, resume the aborted process by triggering step 4.
Alternatively, the progress table could simply be the report-image relationship table itself, with some extra status columns.
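To make the flow concrete, here is a condensed sketch in C# with ADO.NET (MySQL, as in the question); the upload_progress and report_images tables and all column names are hypothetical, invented just for illustration:

    // Record the upload attempt, save the file, then flip the status and insert
    // the report-image relationship in one transaction.
    using System;
    using System.IO;
    using MySql.Data.MySqlClient;

    static class ImageUpload
    {
        public static void Save(string connStr, long reportId, string tempPath, string destPath)
        {
            using (var conn = new MySqlConnection(connStr))
            {
                conn.Open();

                // 1. Record the pending upload so a cron job can find it later.
                long progressId;
                using (var cmd = new MySqlCommand(
                    "INSERT INTO upload_progress (report_id, temp_path, dest_path, status) " +
                    "VALUES (@r, @t, @d, 'pending')", conn))
                {
                    cmd.Parameters.AddWithValue("@r", reportId);
                    cmd.Parameters.AddWithValue("@t", tempPath);
                    cmd.Parameters.AddWithValue("@d", destPath);
                    cmd.ExecuteNonQuery();
                    progressId = cmd.LastInsertedId;
                }

                try
                {
                    // 2. Save the file to its final location.
                    File.Copy(tempPath, destPath, overwrite: true);
                }
                catch
                {
                    // 3. Saving failed: mark the progress row (or delete it) and bail out.
                    using (var cmd = new MySqlCommand(
                        "UPDATE upload_progress SET status = 'failed' WHERE id = @id", conn))
                    {
                        cmd.Parameters.AddWithValue("@id", progressId);
                        cmd.ExecuteNonQuery();
                    }
                    throw;
                }

                // 4. Saving succeeded: update the status and the relationship atomically.
                using (var tx = conn.BeginTransaction())
                {
                    using (var cmd = new MySqlCommand(
                        "UPDATE upload_progress SET status = 'done' WHERE id = @id", conn, tx))
                    {
                        cmd.Parameters.AddWithValue("@id", progressId);
                        cmd.ExecuteNonQuery();
                    }
                    using (var cmd = new MySqlCommand(
                        "INSERT INTO report_images (report_id, path) VALUES (@r, @p)", conn, tx))
                    {
                        cmd.Parameters.AddWithValue("@r", reportId);
                        cmd.Parameters.AddWithValue("@p", destPath);
                        cmd.ExecuteNonQuery();
                    }
                    tx.Commit();
                }
            }
        }
    }

The cron job would then scan upload_progress for rows stuck in 'pending' or 'failed' and either delete the corresponding file or resume the work, as described above.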
1.2 With a queue system
Like RabbitMQ, SQS, AMQ, etc
A very similar approach could be done with any queue system instead of a DB table. I won't give many details because it depends more on your actual infrastructure, but here is the general idea:
The upload request goes to a queue: you send a message with anything you may need to identify the file, the report id and, if you want, a tentative final path.
You upload the file
A worker reads pending messages in the queue and does the work. The message is marked as consumed only when everything goes well.
If something fails, naturally the message will come back to the queue
The next time the message is read, the worker should have enough info to see whether there is work to resume, or even a file to delete if resuming is not possible
In both cases concurrency problems won't be straightforward to manage, but they can be managed (relying on DB locks in the first case and FIFO queues in the second), always with some application logic
2. Without DB
To some extent a system without a database would be perfectly acceptable, if we can defend it as a proper convention over configuration design.
You have to deal with 3 things:
Save files
Read files
Make sure that the structure of the filesystem is manageable
Let's start with 3:
Folder structure
In general, something like one folder per report id is too simple, ultimately too flat, and may be hard to maintain. This will cause issues: if you have an images folder with one folder per report and tomorrow you have, let's say, 200k reports, the images folder will have 200k entries, and even an ls will take too long; the same goes for any programming language trying to access it. That will kill you
You can think of something more sophisticated. Personally, I like an approach I learnt from Magento 1 more than 10 years ago and have used a lot since: a folder structure that follows some fixed outer rules, extended with rules derived from the file name itself.
Say we want to save a product image named myproduct.jpg.
The first rule is: for product images I use /media/catalog/product.
Then, to avoid too many images in the same folder, I create one folder per letter of the image name, up to some number of letters, let's say 3. So the final path will be something like /media/catalog/product/m/y/p/myproduct.jpg.
This way it is clear where to save any new image. You can do something similar using your report ids, categories, or anything that makes sense for you. The final objective is to avoid a structure that is too flat, and to create a tree that makes sense to you and can be automated easily.
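As a small illustration, the path derivation itself is only a few lines; this sketch assumes the /media/catalog/product prefix and the 3-letter depth from the example above:

    // Derive a nested folder path from the file name itself, as in the example above:
    // myproduct.jpg -> /media/catalog/product/m/y/p/myproduct.jpg
    using System.IO;
    using System.Linq;

    static class MediaPath
    {
        public static string For(string baseDir, string fileName, int depth = 3)
        {
            var name = Path.GetFileNameWithoutExtension(fileName).ToLowerInvariant();

            // One folder per leading character of the name, up to `depth` characters.
            var parts = name.Take(depth).Select(c => c.ToString());

            return Path.Combine(new[] { baseDir }.Concat(parts).Concat(new[] { fileName }).ToArray());
        }
    }

    // MediaPath.For("/media/catalog/product", "myproduct.jpg")
    //   -> "/media/catalog/product/m/y/p/myproduct.jpg" (with the platform's separator)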
And that takes us to the next part:
Read and write.
I implemented a similar system before quite successfully. It allowed me to save and retrieve files easily, with locations that were purely dynamic. The parts were:
S3 (but you can do with any filesystem)
A small microservice acting as a proxy for both read and write.
Some namespace system and attached logic.
The logic is quite simple. The namespace lets me know where the file will be saved. For example, the namespace can be companyname/reports/images.
Let's say I develop a microservice for read and write:
For saving a file, it receives:
namespace
entity id (i.e. your report)
file to upload
And it will do:
based on the rules I have for that namespace, plus the id and the file name, it will save the file in the corresponding folder
it doesn't return the physical location. That remains unknown to the client.
Then, for reading, clients will use a URL that uses also convention. For example you can have something like
https://myservice.com/{NAMESPACE}/{entity_id}
And based on the logic, the microservice will know where to find that in the storage and return the image.
If you have more than one image per report, you can do different things, such as:
- you may want to have a third slug in the path such as https://myservice.com/{NAMESPACE}/{entity_id}/1 https://myservice.com/{NAMESPACE}/{entity_id}/2 etc...
- if it is for your internal application usage, you can have one endpoint that returns the list of all eligible images, let's say https://myservice.com/{NAMESPACE}/{entity_id} returns an array with all image URLs
I implemented this with a quite simple YAML config to define the logic and very simple code that reads that config. That gave me a lot of flexibility, for example saving reports in totally different paths, servers, or S3 buckets if they belong to different companies or are different report types.
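Purely as an illustration of that last part, here is a tiny sketch of the namespace-to-location logic, with the rules held in code rather than YAML for brevity (the companyname/reports/images namespace comes from the example above; the storage root is made up):

    // Map (namespace, entity id, file name) to a physical location, so clients only
    // ever see URLs like https://myservice.com/{NAMESPACE}/{entity_id}.
    // The rules dictionary stands in for the YAML config; paths are invented.
    using System;
    using System.Collections.Generic;

    static class NamespaceResolver
    {
        static readonly Dictionary<string, string> Rules = new Dictionary<string, string>
        {
            ["companyname/reports/images"] = "/storage/companyname/reports/images"
        };

        public static string Resolve(string ns, string entityId, string fileName)
        {
            if (!Rules.TryGetValue(ns, out var root))
                throw new ArgumentException("Unknown namespace: " + ns);

            // e.g. /storage/companyname/reports/images/1234/photo.jpg
            return root + "/" + entityId + "/" + fileName;
        }
    }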

MS Access database trouble

I have an access database that has been used for many years, converted from Access 2000 to 2007 and was fine. In the last couple weeks it has been doing strange things!
There is a form for 'editing' a record. When the user clicked on the button to open this form, a small white box appeared and said 'Record Deleted'.
After that, the database was corrupted. I support this database and I cannot even get into it in design view. When I try to open it (holding the Shift key down while opening it), it takes a while, then it displays the Access start page that has the 'blank database' icon, and to the right it lists the frequently opened databases.
So I can't even get to the objects. The only option I had was to restore from the previous night's backup. This meant the users lost all their work for the day. Today, one week later, it has happened again, and all the users' work was lost because I had to restore from backup.
I don't know where to begin to troubleshoot this, since I cannot get into it in design view once it has become corrupted. I'm looking for any suggestions to debug this. I can use a copy of the database I restored.
Thanks
First and most important: you should split your database. You can do this from the Database Tools tab at the top. By doing this you will have a separate back end, independent of the front end, and your client will not lose any data; if they hit the error / corrupted database again, it will not affect the data secured in the back end.
Second, I haven't had the exact same error, but in the past I have faced instances where forms just don't work. A recommendation I read somewhere was to create a new blank form, copy the elements from the problem form onto it, and delete the old form. I doubt there is any problem with the VBA, but it would be worth compiling the code to check.
Apologies if this does not help much, but I hope the first suggestion helps protect your client's data if your database crashes again.
First, check whether any automated VBA code or macro runs on the OnOpen, OnLoad, OnCurrent, AfterUpdate, OnDirty, etc. events of the problematic forms. Simply open the VBA window and look at the code in the specific form's module. Or, in the case of macros, open the form in design view and check the Event tab of the Property Sheet (and do the same for specific buttons, textboxes, etc.). There may be DoCmd.RunCommand calls occurring when users interact with form controls.
Also, if you find yourself unable to open forms or deal with a corrupted database, consider starting with a blank Access .accdb file and importing all objects from the previous Access 2000 .mdb file. And if specific controls don't function properly, recreate them as needed.
As mentioned above, split your database into a back end (tables only) and a front end (forms, queries, macros, modules). This helps prevent corruption, runs more efficiently since only data is sent across the network rather than whole application objects, and overall fosters a better multi-user environment. Each user can have a copy of the FE on their local machine, but all connect to one BE on a shared network. To help, Access 2007-2013 has a button for this on the Ribbon under Database Tools.

Best practice for updating silverlight deployment that is actively being used

I am currently running a SL3 project where we are in a highly iterative development mode with about 25 active test customers. I am making small changes at a clip of about 4 new builds per day. It is important to know that this application is a mission-critical line-of-business tool for these 25 people: it is what they use all day to do their work, so they are using it constantly and often launch their browser and the app in the morning and never close them until the end of the day.
The challenge is that when I make an update to the application I have no clean way to notify the users, in most cases this is ok as it is rare that I introduce a data contract change or something that would be a classic 'breaking' change to the app/service. Users keep plugging along and will get the change next time they refresh.
Right now we have resorted to emailing everyone and telling them to force refresh or close the browser and log back in.
Surely there is a better way...
Right now my train of thought is to have a method on the server that compares client XAP versions and determines whether the client being used is the most up to date; if it isn't, I will notify the user and make them update.
What have you done to solve this problem?
One way of doing it is to use a push mechanism (I used Kaazing WebSocket Gateway, but any would do). When a new version of the XAP is released, a message (either manually entered into the system by an admin or automatically triggered by a XAP file change event) is sent to all the clients. In the simplest scenario, a notification is shown to the user (telling them that a new version has been released and the application needs to refresh) and then the app refreshes (by simply reloading the page), saving the user's state if necessary.
If I were doing this, I would keep it simple: a configuration value in web.config and a corresponding service method that simply returns that value (the value itself could be anything, but a counter is probably wise). Then you could have your Silverlight app poll that service method at regular intervals. Whenever the value changes (which you would do manually when you deploy a new version), just pop up a dialog telling the user to refresh the browser or log in/out. This way you don't have to force them to refresh every time. If you go with the idea of comparing XAP file versions, they will always be required to refresh, even for non-breaking changes.
If you want to take it further you could come up with some sort of mechanism to distinguish between different severity levels. For instance, if the new config value would contain the string "update_forced", you could force the users to reload the app by logging them out automatically (a little harsh, perhaps). If it contains the string "update_recommended", just show a little icon at the top right corner saying that there is a new version and that they should upgrade in their own time.
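A rough sketch of the client side of this idea, polling a plain text endpoint instead of a WCF service method to keep it short; the /version.txt URL, the 5-minute interval and the baked-in LocalVersion constant are all illustrative assumptions, not part of the original answer:

    // Poll the server for a deployment counter and tell the user to refresh when it changes.
    using System;
    using System.Net;
    using System.Windows;
    using System.Windows.Threading;

    public class UpdateChecker
    {
        const string LocalVersion = "42";   // counter value compiled into this build
        readonly DispatcherTimer _timer = new DispatcherTimer();

        public void Start()
        {
            _timer.Interval = TimeSpan.FromMinutes(5);
            _timer.Tick += (s, e) => CheckForUpdate();
            _timer.Start();
        }

        void CheckForUpdate()
        {
            var client = new WebClient();
            client.DownloadStringCompleted += (s, e) =>
            {
                if (e.Error != null) return;                 // ignore transient failures
                if (e.Result.Trim() != LocalVersion)
                {
                    // Non-breaking change: just tell the user, don't force anything.
                    MessageBox.Show("A new version is available. Please refresh your browser.");
                }
            };
            // Cache-busting query string so the browser doesn't serve a stale value.
            client.DownloadStringAsync(new Uri("https://myserver/version.txt?ts=" + DateTime.Now.Ticks));
        }
    }

The severity-level idea above would then just be a matter of parsing what the endpoint returns (e.g. "43;update_forced") instead of comparing a single counter.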
Granted, this was targeted at Silverlight 3, but with the PollingDuplex client and such in the newer versions of Silverlight, you could publish an "Update Now" bit to the clients, and build a mechanism in the client to alert the user that there is an update that is now out... that they should update it shortly, etc. You may even be able, through serialization and such, to save the state that they are in when they close the app to reload it.
We've done stuff similar with a LOB app that we built, so that as users are changing things, the rest of the userbase sees those changes immediately. Next up will be putting the flags in to change authorization and upgrades "on the fly" if you will.

How can I open an Office Document from sql server via my C# windows app and automatically save back to database when editing is done?

Our application is written in C# using .net 2.0. The application tracks our business process and users can attach office documents for reference as attachments. They frequently edit those documents. Currently, they have to save the file to their hard drive, edit and save the file, then re-attach to our application to save into database (SQL 2005).
Our users would like to be able to edit the document and save the changes without needing to detach, edit, and re-attach.
We can programmatically launch the office (word, excel or powerpoint) document, but how can we tell when the document has closed and re-attach the updated version to the database automatically?
Thanks for any help.
Joe
You could have a designated directory (e.g. Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments)) where you keep the files currently being edited, and put a FileSystemWatcher on it to watch for changes to the file.
The System.Diagnostics.Process class also has an "Exited" event that will notify you when the process has, well, exited.
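A minimal sketch combining both suggestions, assuming the attachment has already been written out to a file; the saveToDatabase callback is a placeholder for your own re-attach logic, and note that Office may hand the document to an already-running instance, in which case the returned process can be null or exit immediately:

    // Launch the document, watch it for changes, and fire a callback when the editing
    // process exits so the caller can push the edited bytes back to the database.
    using System;
    using System.Diagnostics;
    using System.IO;

    static class DocumentEditor
    {
        public static void Edit(string filePath, Action<byte[]> saveToDatabase)
        {
            bool changed = false;

            var watcher = new FileSystemWatcher(Path.GetDirectoryName(filePath), Path.GetFileName(filePath));
            watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size;
            watcher.Changed += (s, e) => changed = true;
            watcher.EnableRaisingEvents = true;

            var process = Process.Start(filePath);    // opens the associated Office app
            if (process == null) return;               // an existing instance took over the file
            process.EnableRaisingEvents = true;
            process.Exited += (s, e) =>
            {
                watcher.EnableRaisingEvents = false;
                if (changed)
                    saveToDatabase(File.ReadAllBytes(filePath));   // push the edited copy back
            };
        }
    }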
Look into SmartDocument technology at MSDN. Of the three productivity apps you mention (Word, Excel, Powerpoint) the first two for sure, and possibly Powerpoint, can have the right side panel -- where you usually see styles, navigation, or tasks -- programmed through VS using SmartDocument and Tools for Office plugins.
So you can program the panel such that the user, working in Word or Excel (and possibly PowerPoint; I haven't looked into that), can select the document to be edited from your application, do the edits, and save/attach the document, all from the Word interface. The programming behind the right side panel will automagically take care of detaching and reattaching the document, and can also save the document to SQL Server, if you like.
Hope this helps.
If you are launching the file using the Process.Start method and passing the file path as a parameter, you can use the WaitForExit() method to be notified when the user has closed the file.
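For example, a blocking sketch (the temp file name and the saveBack callback are placeholders, and the same caveat about Office reusing an already-running instance applies):

    // Write the attachment to a temp file, launch it, block until the user closes the
    // application, then read back the (possibly edited) bytes for re-attaching.
    using System;
    using System.Diagnostics;
    using System.IO;

    static class AttachmentRoundTrip
    {
        public static void EditAndSave(byte[] original, Action<byte[]> saveBack)
        {
            string tempPath = Path.Combine(Path.GetTempPath(), "report.docx");
            File.WriteAllBytes(tempPath, original);

            Process p = Process.Start(tempPath);   // opens the associated Office app
            if (p != null)                         // can be null if an existing instance handles it
                p.WaitForExit();                   // blocks until the user closes the document

            saveBack(File.ReadAllBytes(tempPath)); // push the edited copy back to the database
        }
    }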
