I'm building an application that presents sensitive patient information.
One of my routes presents an HTML fragment received from the server that contains an image of a patient document.
I need to ensure that the document is not accessible on disk after the page is closed.
It would be a really big issue if it were left there.
I noticed that the route was being cached and I had to remove it from $templateCache to pick up changes. Is that cache held only in memory, or does it end up on the local hard disk?
A broader question might be: does Angular cache anything on persistent storage beyond what the browser already does according to HTTP cache control headers?
This really depends on the abstraction level you look at:
A JavaScript app cannot write arbitrary files to the disk - there is the well-known browser sandbox, and this also applies to angular.js apps. So, if you are not specifically using the browser offline APIs such as LocalStorage, cookies, ... in your own code, there will be "just" the usual browser cache. So you should be fine.
Caveat 1: Sometimes it seems to be quite hard to control the browser caches; there are multiple ways browsers cache things, as you already mentioned. They can usually be controlled by HTTP headers, so if you configure your HTTP headers very carefully you should be fine.
Caveat 2: There are multiple ways operating systems cache things, and they may or may not save some of these caches to disk (as a simple example, consider the Windows hibernation file: there is probably a copy of your image in there if you had it open in your browser at the moment of hibernation). This cannot be controlled by a browser app - but for most applications it does not really matter.
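For Caveat 1, a minimal sketch of what such headers could look like - assuming a Node/Express backend purely for illustration; the route and the loadDocumentFragment helper are placeholders:

const express = require("express");
const app = express();

app.get("/patient-document/:id", (req, res) => {
  // Tell the browser and any proxies not to store this response anywhere,
  // including the HTTP disk cache.
  res.set({
    "Cache-Control": "no-store, no-cache, must-revalidate, private",
    "Pragma": "no-cache",
    "Expires": "0",
  });
  res.send(loadDocumentFragment(req.params.id)); // hypothetical helper
});

app.listen(3000);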
Related
1. The setup
I'm currently initiating a GET request to an S3 bucket (not important) to download a very large file using the browser's fetch(). This file is, in its stored form, raw and unusable binary data, not structured.
2. The task and problem
There are a few things I want to do on the client-side with this data:
I need to process this data as it streams into the client to perform transformations on it (decryption, for example).
Once the data is processed and downloaded, it might still not be of any immediate use to the user outside the context of the web UI. Maybe the data should stay stored within the web app's sandbox disk space unless a user explicitly exports it?
3. The question
Where can I store this blob of unstructured data in both or either of the use cases listed above? There appear to be many options but none that fit this use case precisely. Any thoughts?
EDIT:
I feel like an idiot. I totally forgot about the FileSystem API. I'll take a look and answer my own question with a pseudo-implementation of how this works.
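Something along these lines is what I have in mind - a rough, untested sketch assuming the origin private file system via navigator.storage.getDirectory(); decryptChunk() is a placeholder for whatever processing is needed:

async function downloadAndStore(url) {
  const response = await fetch(url);

  // Process the body as it streams in, chunk by chunk.
  const processed = response.body.pipeThrough(
    new TransformStream({
      transform(chunk, controller) {
        controller.enqueue(decryptChunk(chunk)); // placeholder transform
      },
    })
  );

  // Write the processed stream into the app's sandboxed storage (OPFS).
  const root = await navigator.storage.getDirectory();
  const fileHandle = await root.getFileHandle("download.bin", { create: true });
  const writable = await fileHandle.createWritable();
  await processed.pipeTo(writable); // also closes the file when done
}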
EDIT 2:
I feel the need to reiterate what I stated in 2.2 above:
within the web app's sandbox disk space
I don't care about accessing the user's whole file system. I just want a space I can work with large files in on disk, similar to the app space directories provided to mobile applications by Android and iOS.
If you want to save and process a file at the client level, and Blob is not an option, you may consider the File System Access API (https://developer.mozilla.org/en-US/docs/Web/API/File_System_Access_API#writing_to_files), even though this will introduce an interaction with the user.
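A minimal sketch of the write path for the explicit-export case, assuming the data has already been processed into a stream; showSaveFilePicker() must be called from a user gesture such as a button click:

async function exportProcessedData(readableStream) {
  const handle = await window.showSaveFilePicker({ suggestedName: "export.bin" });
  const writable = await handle.createWritable();
  await readableStream.pipeTo(writable); // writes the data and closes the file
}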
Another option would be to take advantage of PWA client-side storage (https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Client-side_web_APIs/Client-side_storage); this is also a question of your application architecture.
Before checking whether processing the file at the client level can be done the way you need with existing technologies, check whether you really need to do it there because it is the only option, or whether you could instead move that logic to the server, depending on your use cases.
I'm considering hiring a developer from upwork.com to build my chrome extension. He was saying he would need to build it using local storage, but I questioned him and asked why he couldn't use the chrome.storage API. His response was that if the user clears the browser cache, it would clear everything saved in the extension. That didn't seem right to me but I wanted to ask you all.
No - even after clearing ALL browser data and cache, any data stored via the chrome.storage API will still be present.
Unless you explicitly clear the extension's storage with a command, it will not be cleared.
Extensions in Google Chrome will not be removed, nor will their preferences be changed or cleared.
You are right that this will also depend on where your extension saves its preferences; for example, some extensions may allow you to sign in and save your data in the cloud, whereas others may not. There are multiple ways to store this data, so ask why the developer cannot build a robust extension with one of them.
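For reference, a minimal sketch of what using chrome.storage.local looks like - the keys and values here are made up:

// Save settings; this data survives clearing the browser cache and history.
chrome.storage.local.set({ theme: "dark", refreshMinutes: 15 }, () => {
  console.log("settings saved");
});

// Read them back later.
chrome.storage.local.get(["theme", "refreshMinutes"], (items) => {
  console.log("loaded settings:", items);
});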
I am struggling with some details about finding a solution to make an Angular / C# app available offline.
My idea would be to use upup.js to get the business logic for the SPA in Angular available offline. upup.js uses service workers to do so. I would store the data required for the offline app using angular-localForage, which uses IndexedDB and falls back to WebSQL and localStorage if necessary.
The problem is that I have to make files and images available offline too, without requiring the user to visit the page they are used on, and I am worried about the maximum quotas. I could store them either with upup.js, adding those files as assets, or with angular-localForage as blobs. IndexedDB is supposed to be unlimited by now, if I am informed correctly? I couldn't find any maximum quota for a service worker though, which the upup.js solution would use. AppCache is deprecated, so I wouldn't use that... Or maybe I have understood something completely wrong, or there is even another, better solution? Anyhow, the question is:
TL;DR: What is the better way to store files for an AngularJS offline application: angular-localForage (IndexedDB etc.) or a Service Worker (upup.js) and what are the maximum quotas for each solution? Or is there an even better solution?
In my opinion a Service Worker (SW) is better than traditional local storage. Plus, a SW can also use IndexedDB.
For the implementation, it really depends: how is your app structured, which front-end technologies are used with your Angular app, and what are your main goals in using a SW?
1. Traditional JS loading, where you likely merge all the JS files into one... like an app.js that contains everything.
And you also don't care about push notifications or any of the other cool features that a SW offers.
=> In this case it seems like upup.js suits you best.
NOTE: beware that upup.js attempts to register the SW on its own, so it will likely block or complicate your work if you later expand the SW's features.
2. Advanced AMD user, where almost all of your JS is chopped into small pieces... like fooCtrl.js, barCtrl.js, etc.
You certainly don't want to configure 100+ JS files by hand, and furthermore you will have a lot of HTML templates to load.
=> In this case I suggest you use sw-toolbox, a very powerful and lightweight tool made by Google. Initially, if you are not yet familiar with the SW concept, you will have a bit of trouble setting it up for your site (but it won't take longer than a day if you are an advanced JS developer).
Once everything is configured, it all becomes very simple. For example, this is how I cache all of the static content on my site:
self.toolbox.router.get(/\.(js|css|png|jpg|json|html)$/, self.toolbox.fastest);
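And since you also need files and images available offline without the user visiting the pages that use them, sw-toolbox can precache a fixed list of URLs up front at install time (the paths below are placeholders):

self.toolbox.precache([
  "/index.html",
  "/app.js",
  "/styles/main.css",
  "/images/logo.png",
]);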
3. You don't care what kind of technology is on the front-end side. You are just interested in the SW.
=> Simply go for sw-toolbox; it's a real time-saver for basic configuration. And if you want to expand your usage of the SW, you can extend it however you like.
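As for the quota part of your question, you can at least inspect at runtime how much storage the origin is allowed to use - a small sketch, assuming a browser that implements the StorageManager API:

navigator.storage.estimate().then(({ usage, quota }) => {
  console.log(`using ${usage} of roughly ${quota} bytes available to this origin`);
});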
I want to ask: if I cache too many pages - say 10,000 pages are cached - that creates 10,000 cache files.
Is that OK? Can it make things slow?
I don't think this will slow down the application. Modern file systems support a large number of files in a directory. The problem is only if you want to manually list all those files.
A cache file is stored on the server as static HTML rather than the dynamically generated HTML code that is created with PHP.
Loading these cache files is significantly quicker than running PHP code through the PHP compiler at runtime.
The only issue is perhaps disk space as the cache files are physical files on the server. Most cache filesizes should be relatively small if used correctly so this really shouldn't be an issue on a proper web server with sufficient resources.
Cache files are generally faster than running the PHP script, as they do not have to be processed - the overhead is just hitting the file and retrieving it.
The compromise you make with cache is whether or not your data changes often enough to warrant using file cache, and whether or not users need access to an always up to date file.
I wouldn't worry about it, and hey you can always turn the cache off - right?
Yes, but probably not significant
Full-page cache files are all stored in the same folder, so caching 10k pages means having 10k files in one folder. The slowdown will likely not be significant, but application performance will degrade somewhat as the cache fills up.
Also note that there's a limit to how many files you can store in a folder, depending on the drive format, though generally speaking by the time that limit is reached, performance is already significantly affected.
Don't use view caching if it's not necessary
Even full page caching has a cost. A normal PHP request follows this path:
user -> internet -> webserver -> php -> (application logic)
Using full page view caching this doesn't change much:
user -> internet -> webserver -> php -> (read and render cache file)
If there is no dynamic content in the cache file it's a better idea to store the contents as a static file and move the response closer to the user:
user -> internet -> webserver -> static html file
Plugins like html cache permit this by storing cached views as html files and allowing the webserver to handle requests before invoking php.
That also means, depending on the cache headers sent for html files, that subsequent requests come straight out of the user's browser cache - and you can't get faster than that:
user -> user's browser cache
I'm planning to launch a comic site which serves comic strips (images).
I have little prior experience with serving/caching images.
So these are the 2 methods I'm considering:
1. Using LinkProperty
class Comic(db.Model):
    image_link = db.LinkProperty()
    timestamp = db.DateTimeProperty(auto_now=True)
Advantages:
The images are fetched from the disk space itself (and disk space is cheap, I take it?)
I can easily set up app.yaml with an expiration date to cache the content in user's browser
I can set up memcache to retrieve the entities faster (for high traffic)
2. Using BlobProperty
I used this tutorial, and it worked pretty neatly: http://code.google.com/appengine/articles/images.html
Side question: Can I say that using BlobProperty sort of "protects" my images from outside linkage? That is, people can't just link directly to the comic strips.
I have a few worries for method 2.
I can obviously memcache these entities for faster reads.
But then:
Is memcaching images a good thing? My images are large (100-200 KB per image). I think memcache allows only up to 4 GB of cached data? Or is it 1 MB per memcached entity, with unlimited entities...
What if appengine's memcache fails? -> Solution: I'd have to go back to the datastore.
How do I cache these images in the user's browser? If I was doing method no. 1, I could just easily add to my app.yaml the expiration date for the content, and pictures get cached user side.
I would like to hear your thoughts.
Should I use method 1 or 2? Method 1 sounds dead simple and straightforward - should I be wary of it?
[EDITED]
How do I solve this dilemma?
Dilemma: The last thing I want is for people to grab the direct link to an image and put it up on bit.ly, because users would then be directed straight to just the image on my server
(and not the advertising/content around it that they would have seen if they had accessed it from the main page itself).
You're going to be using a lot of bandwidth to transfer all these images from the server to the clients (browsers). Remember appengine has a maximum number of files you can upload, I think it is 1000 but it may have increased recently. And if you want to control access to the files I do not think you can use option #1.
Option #2 is good, but your bandwidth and storage costs are going to be high if you have a lot of content. To solve this problem people usually turn to Content Delivery Networks (CDNs). Amazon S3 and edgecast.com are two such services that support token-based access URLs. Meaning, you can generate a token in your App Engine app that is good for a particular IP address, time window, geography and some other criteria, and then hand the requestor your CDN URL with this token. The CDN serves your images and does the access checks based on the token. This will help you control access, but remember: if there is a will, there is a way, and you can't secure anything 100% - but you can probably get reasonably close.
So instead of storing the content in appengine, you would store it on the cdn, and use appengine to create urls with tokens pointing to the content on the cdn.
Here are some links about signed URLs. I've used both of these:
http://jets3t.s3.amazonaws.com/toolkit/code-samples.html#signed-urls
http://www.edgecast.com/edgecast_difference.htm - look at 'Content Security'
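For a more current reference, here is a hedged sketch of generating a time-limited signed URL with the AWS SDK for JavaScript (v3); the bucket, key, region and expiry are placeholders:

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: "us-east-1" });

async function signedComicUrl(key) {
  const command = new GetObjectCommand({ Bucket: "my-comics-bucket", Key: key });
  // The resulting URL stops working after 10 minutes.
  return getSignedUrl(s3, command, { expiresIn: 600 });
}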
In terms of solving your dilemma, I think that there are a couple of alternatives:
You could cause the images to be rendered in a Flash object that would download the images from your server in some kind of encrypted format that it would know how to decode. This would involve quite a bit of up-front work.
You could have a valid-one-time link for the image. Each time that you generated the surrounding web page, the link to the image would be generated randomly, and the image-serving code would invalidate that link after allowing it one time. If you have a high-traffic web-site, this would be a very resource-intensive scheme (a rough sketch of the idea follows below).
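A sketch of that one-time-link scheme, written here with Node/Express purely for illustration (on App Engine the equivalent would live in your Python request handlers); the in-memory token map and storage path are placeholders:

const express = require("express");
const crypto = require("crypto");

const app = express();
const liveTokens = new Map(); // token -> comic slug; use memcache/Redis in practice

// Render the surrounding page with a freshly generated one-time image link.
app.get("/comic/:slug", (req, res) => {
  const token = crypto.randomBytes(16).toString("hex");
  liveTokens.set(token, req.params.slug);
  res.send(`<html><body><img src="/image/${token}"></body></html>`);
});

// Serve the image once, then invalidate the link.
app.get("/image/:token", (req, res) => {
  const slug = liveTokens.get(req.params.token);
  if (!slug) return res.sendStatus(404); // unknown or already-used link
  liveTokens.delete(req.params.token);
  res.sendFile(`/var/comics/${slug}.png`); // placeholder storage location
});

app.listen(3000);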
Really, though, you want to consider just how much work it is worth to force people to see ads, especially when a goodly number of them will be coming to your site via Firefox, and there's almost nothing that you can do to circumvent AdBlock.
In terms of choosing between your two methods, there are a couple of things to think about. With option one, where you are storing the images as static files, you will only be able to add new images by doing an appcfg.py update. Since App Engine applications do not allow you to write to the filesystem, you will need to add new images to your development code and do a code deployment. This might be difficult from a site-management perspective. Also, serving the images from memcache would likely not offer you an improvement in performance over having them served as static files.
Your second option, putting the images in the datastore does protect your images from linking only to the extent that you have some power to control through logic if they are served or not. The problem that you will encounter is that making that decision is difficult. Remember that HTTP is stateless, so finding a way to distinguish a request from a link that is external to your application and one that is internal to your application is going to require trickery.
My personal feeling is that jumping through hoops to make sure that people can't see your comics without seeing ads is solving the problem the wrong way. If the content that you are publishing is worth protecting, people will flock to your website to enjoy it anyway. Through high volumes of traffic, you will more than make up for anyone who directly links to your image, thus circumventing a few ad serves. Don't try to outsmart your consumers. Deliver outstanding content, and you will make plenty of money.
Your method #1 isn't practical: You'd need to upload a new version of your app for each new comic strip.
Your method #2 should work fine. It doesn't automatically "protect" your images from being hotlinked - they're still served up on a URL like any other image - but you can write whatever code you want in the image serving handler to try and prevent abuse.
A third option, and a variant of #2, is to use the new Blobstore API. Instead of storing the image itself in the datastore, you can store the blob key, and your image handler just instructs the blobstore infrastructure which image to serve.