is there an elegant way to determine the size of data downloaded from a website -- bearing in mind that not all requests will go to the same domain that you originally visited and that other browsers may in the background be polling at the same time. Ideally i'd like to look at the size of each individual page -- or for a Flash site the total downloaded over time.
I'm looking for some kind of browser plug-in or Fiddler script. I'm not sure Fiddler would work due to the issues pointed out above.
I want to compare sites similar to mine for total filesize - and keep track of my own site also.
Firebug and HttpFox are two Firefox plugin that can be used to determine the size of data downloaded from a website for one single page. While Firebug is a great tool for any web developer, HttpFox is a more specialized plugin to analyze HTTP requests / responses (with relative size).
You can install both and try them out, just be sure to disable the one while enabling the other.
If you need a website wide measurement:
If the website is made of plain HTML and assets (like CSS, images, flash, ...) you can check how big the folder containing the website is on the server (this assumes you can login on the server)
You can mirror the website locally using wget, curl or some GUI based application like Site Sucker and check how big the folder containing the mirror is
If you know the website is huge but you don't know how much, you can estimate its size. i.e. www.mygallery.com has 1000 galleries; each gallery has an average of 20 images loaded; every image is stored in 2 different sizes (thumbnail and full size) an average of for _n_kb / image; ...
Keep in mind that if you download / estimating a dynamic websites, you are dealing with what the website produces, not with the real size of the website on the server. A small PHP script can produce tons of HTML.
Have you tried Firebug for Firefox?
The "Net" panel in Firebug will tell you the size and fetch time of each fetched file, along with the totals.
You can download the entire site and then you will know for sure!
https://www.httrack.com/
Related
I'm using Selenium and as part of loading a web page, a number of additional HTTP requests are made, including downloading a file of JSON encoded data which is used as data for the web page.
From within the developer tools in Chrome, I can see the name of this file, but how can I get hold of this information via Selenium?
Note that I don't know how this file is downloaded (Javascript? Something else?) and I don't really care - the data is all I care about and the file seems to have a sufficiently obvious name (but probably not fixed!) that given a list of sdownloaded files, I can figure out which one is the one I want.
Seems my question is a duplicate and the answer can be found here:
how to access Network panel on google chrome developer tools with selenium?
The key command is:
timings = driver.execute_script("return window.performance.getEntries();")
...but you want to place this in a loop in case your code makes the request too soon and the file you want has not yet been read. I used a 1s delay and a max of 20 attempts - normally takes 2 cycles for the file I want to appear in this list.
I found the first web page loading time for CN1 Javascript Built taking too long, need about 2 minutes.
I attached the Chrome's network loading screen shot, found the classes.js is the most heavy page, possible to zip it?
Second, there is 2 theme files that downloaded sequentially, is it possible for them to load at the same time?
Kindly advice.
Normally I would answer that you can look at the performance section of the developer guide but the relevant sections there relate to reducing the theme.res size which seems pretty small in your case.
The largest portion in your code is the class files so I'm guessing that the best way to reduce them is to further reduce dependencies so the obfucator can remove more dead code. Keep in mind that the classes.js file is cached and can be deployed via CDN's such as cloudflair to improve download speeds. It can be served in a gzipped form as well which is a part of the CDN repertoire.
I have an application for a huge business, which needs many pages, controls etc. The .xap file easily goes up to 50MB. I notice that every time when I load the page, the .xap file got downloaded to my local. However, my users may use 3G network to connect, so it must be very slow if we downlaod the app everytime they open the page. So I was wondering if there is some way I can do the deployment similar to WPF, which only download to local when the version is changed....
Any other suggestion to improve the loading speed is welcomed.
Thanks a lot
First and for most get your web server caching headers sorted. Typically you open the ClientBin folder in IIS Manager and enter the HTTP Response Header section. Set expiry to something like 1 Day (or if you update during normal working hours set to 15 Minutes). Note just because the content expires doesn't mean it will be re-downloaded but it does mean it'll get cached before being used. The browser will inform the server of the version it currently has if it has expired allow the server to simply respond with "go ahead and use that it hasn't changed since the last time you checked".
For such a large system you should seriously consider dividing the app up into multiple dll projects. Then use the Application Library Caching feature found in the main apps project properties. You need to create the appropriate .extmap.xml files for each of your dlls. Many of the SDK and Toolkit dlls have them already. This results in separate .zip files for these dlls being placed in the ClientBin folder and not incorporated into one large Xap. This allows you separate slow moving / never changing code into a set of zips and more frequently changing business code into another set. When you update the app the you only update the changed zips thus reducing the download burden of a new version. (Note this only works with inbrowser based apps).
In the serverlight project option, check the Reduce XAP size by using application library caching.
planning to launch a comic site which serves comic strips (images).
I have little prior experience to serving/caching images.
so these are my 2 methods i'm considering:
1. Using LinkProperty
class Comic(db.Model)
image_link = db.LinkProperty()
timestamp = db.DateTimeProperty(auto_now=True)
Advantages:
The images are get-ed from the disk space itself ( and disk space is cheap i take it?)
I can easily set up app.yaml with an expiration date to cache the content in user's browser
I can set up memcache to retrieve the entities faster (for high traffic)
2. Using BlobProperty
I used this tutorial , it worked pretty neat. http://code.google.com/appengine/articles/images.html
Side question: Can I say that using BlobProperty sort of "protects" my images from outside linkage? That means people can't just link directly to the comic strips
I have a few worries for method 2.
I can obviously memcache these entities for faster reads.
But then:
Is memcaching images a good thing? My images are large (100-200kb per image). I think memcache allows only up to 4 GB of cached data? Or is it 1 Mb per memcached entity, with unlimited entities...
What if appengine's memcache fails? -> Solution: I'd have to go back to the datastore.
How do I cache these images in the user's browser? If I was doing method no. 1, I could just easily add to my app.yaml the expiration date for the content, and pictures get cached user side.
would like to hear your thoughts.
Should I use method 1 or 2? method 1 sounds dead simple and straightforward, should I be wary of it?
[EDITED]
How do solve this dilemma?
Dilemma: The last thing I want to do is to prevent people from getting the direct link to the image and putting it up on bit.ly because the user will automatically get directed to only the image on my server
( and not the advertising/content around it if the user had accessed it from the main page itself )
You're going to be using a lot of bandwidth to transfer all these images from the server to the clients (browsers). Remember appengine has a maximum number of files you can upload, I think it is 1000 but it may have increased recently. And if you want to control access to the files I do not think you can use option #1.
Option #2 is good, but your bandwidth and storage costs are going to be high if you have a lot of content. To solve this problem people usually turn to Content Delivery Networks (CDNs). Amazon S3 and edgecast.com are two such CDNs that support token based access urls. Meaning, you can generate a token in your appengine app that that is good for either the IP address, time, geography and some other criteria and then give your cdn url with this token to the requestor. The CDN serves your images and does the access checks based on the token. This will help you control access, but remember if there is a will, there is a way and you can't 100% secure anything - but you probably get reasonably close.
So instead of storing the content in appengine, you would store it on the cdn, and use appengine to create urls with tokens pointing to the content on the cdn.
Here are some links about the signed urls. I've used both of these :
http://jets3t.s3.amazonaws.com/toolkit/code-samples.html#signed-urls
http://www.edgecast.com/edgecast_difference.htm - look at 'Content Security'
In terms of solving your dilemma, I think that there are a couple of alternatives:
you could cause the images to be
rendered in a Flash object that would
download the images from your server
in some kind of encrypted format that
it would know how to decode. This would
involve quite a bit of up-front work.
you could have a valid-one-time link
for the image. Each time that you
generated the surrounding web page,
the link to the image would be
generated randomly, and the
image-serving code would invalidate
that link after allowing it one time. If you
have a high-traffic web-site, this would be a very
resource-intensive scheme.
Really, though, you want to consider just how much work it is worth to force people to see ads, especially when a goodly number of them will be coming to your site via Firefox, and there's almost nothing that you can do to circumvent AdBlock.
In terms of choosing between your two methods, there are a couple of things to think about. With option one, where are are storing the images as static files, you will only be able to add new images by doing an appcfg.py update. Since AppEngine application do not allow you to write to the filesystem, you will need to add new images to your development code and do a code deployment. This might be difficult from a site management perspective. Also, serving the images form memcache would likely not offer you an improvement performance over having them served as static files.
Your second option, putting the images in the datastore does protect your images from linking only to the extent that you have some power to control through logic if they are served or not. The problem that you will encounter is that making that decision is difficult. Remember that HTTP is stateless, so finding a way to distinguish a request from a link that is external to your application and one that is internal to your application is going to require trickery.
My personal feeling is that jumping through hoops to make sure that people can't see your comics with seeing ads is solving the prolbem the wrong way. If the content that you are publishing is worth protecting, people will flock to your website to enjoy it anyway. Through high volumes of traffic, you will more than make up for anyone who directly links to your image, thus circumventing a few ad serves. Don't try to outsmart your consumers. Deliver outstanding content, and you will make plenty of money.
Your method #1 isn't practical: You'd need to upload a new version of your app for each new comic strip.
Your method #2 should work fine. It doesn't automatically "protect" your images from being hotlinked - they're still served up on a URL like any other image - but you can write whatever code you want in the image serving handler to try and prevent abuse.
A third option, and a variant of #2, is to use the new Blob API. Instead of storing the image itself in the datastore, you can store the blob key, and your image handler just instructs the blobstore infrastructure what image to serve.
Using Silverlight 3, I noticed that System.Xml.Linq.dll was added to my XAP file, increasing the size from 12 to 58 k, so I checked the box 'Reduce XAP Size by using application library caching'.
Publishing the app to IIS, then loading it with Web Dev Helper enabled, I see that when I open the app, the XAP file at 12k is loaded, then the System.Xml.Linq.zip is loaded at 46k, for a total of 58k. Whenever I refresh the main page of the app, the same files are loaded into the browser. If I uncheck the 'Reduce..." box, then re-publish the app to IIS, one XAP file at 58k is loaded whenever I load the application.
How is one method different from or better than the other? I could see the advantage if the dll were somehow saved on the client computer removing the need to download it each time the app were opened.
Thanks
Mike Thomas
A browser caches by URL, so by splitting your application into a part which changes frequently and a part which will probably stay the same for a long time (the Linq part) and which might be shared between applications even, you save some download.
But it depends on the exact situation (frequency of change, location of 'generic' DLLs, etc.) whether it really helps.
The whole reason for keeping XAP size small is so that your application loads as quickly as possible. This is important: even on a faster connection, a bloated XAP can take extra seconds to load, which can be long enough for your users to leave your site.
While Linq is only accounting for 46KB, there are other cases where this can make a bigger deal. For instance, the SyndicationFeed class makes it really easy to handle RSS and ATOM feeds, but it weighs in at 114KB.
Application library caching helps in two ways:
It allows for sharing common DLL's between applications, so if another application has already pulled down a system DLL, your app can just reference it.
It allows your application updates to be smaller, since the framework DLL's won't change betwen XAP versions.
The difference is that when dll's are outside of the XAP file even though browser asks for those files webserver responds with 304 Not Modified HTTP response.
By default browser will not request for those files to be downloaded again. This obviously saves time especially when project references "heavy" libraries (ie. Telerik ones can be quite large in size)
Hope this helps someone.