Should "http://" be stored with a database record of a URL? - database

There are a number of fields a user can fill in where they'd enter a URL (their personal website, business site, favorite sites, etc etc).
It's the only thing they'd be entering in that particular field.
So should I always strip out "http://" to keep it consistent and to also reduce the possibility of broken links (i.e. "http//")?
Just not sure what the best way to store URLs is.

If there's a reason to sanitize your users' input (security, size, speed, accuracy...) then do it.
But otherwise, don't.
There's often a real benefit in taking your customer-input data as-is: that way they own their own typos, misspellings, broken links, etc. As long as it doesn't cause a problem for you (i.e. you don't have a reason to sanitize it).
BTW -- consistency is a moot point, as it won't change the data type, and you can easily check for the "http://" and add or remove it as necessary in your presentation layer with a reusable function.
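For example, a minimal sketch of such a reusable helper (TypeScript here; the function names are illustrative, not from any particular library):

    // Strip a leading scheme for display/storage consistency.
    function stripScheme(url: string): string {
      return url.replace(/^https?:\/\//i, "");
    }

    // Prepend http:// when no scheme is present, e.g. when building an href.
    function ensureScheme(url: string): string {
      return /^[a-z][a-z0-9+.-]*:\/\//i.test(url) ? url : `http://${url}`;
    }

    console.log(stripScheme("http://example.com")); // "example.com"
    console.log(ensureScheme("example.com"));       // "http://example.com"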

As far as I know, you actually cannot call it a "URL" without the protocol part:
http://www.w3.org/Addressing/URL/url-spec.txt
I wouldn't remove it.
However, if you really need to keep the data consistent, it depends on how the URL is actually entered in your application. If it's a browser-like application, I'd bet it can be assumed to be http:// when no protocol is given, for valid links.

Related

How to make JSON loads faster with large data (on HTTP or WebPage)

Requesting the page (on HTTP or WebPage) is very slow or even crashes unless I load my JSON with less data. I really need to solve this since sooner or later I will be using large amounts of data frequently. Here is my JSON data. --->>>
Notes:
1. The JSON contains only Strings and Integers.
2. I view my JSON as a tree view using the JSONView plugin
for Google Chrome.
I am using Angular and Node.js. Thanks.
A quick summary of all the things that come to mind:
I had a similar issue once. My solutions may require changes to the UI.
Pagination
I doubt you can display that much data at one time, so the strategy should be dividing your data into small amounts and only loading more when the client asks for it.
This way, the whole data set is no longer held in RAM as it is currently. This is how forums work (only 20 topics at a time).
Just imagine if Stack Overflow made you load the whole history of questions on the main page; how many GB would your browser need just for that?
You can use pagination in a classic way (buttons with page numbers, like Google) or as an infinite scroll, as you prefer.
For that you need to adapt your API and keep track, in your front end, of which pages you have already loaded at any moment. There are plenty of examples for AngularJS.
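A minimal sketch of such a paginated endpoint, assuming Express on the Node side (the route, query parameters, and data source are made up for illustration):

    import express from "express";

    const app = express();
    // Stand-in for however the large JSON data is produced today.
    const items = Array.from({ length: 10_000 }, (_, i) => ({ id: i, name: `item ${i}` }));

    app.get("/api/items", (req, res) => {
      const page = Math.max(1, Number(req.query.page) || 1);
      const pageSize = Math.min(100, Number(req.query.pageSize) || 20);
      const start = (page - 1) * pageSize;
      res.json({
        total: items.length,
        page,
        items: items.slice(start, start + pageSize), // only one page crosses the wire
      });
    });

    app.listen(3000);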
Only show the beginning of the data
When you look at Facebook comments, you may see a "show more" button. In their case it may be there to keep the UI compact, but it can also be used to avoid loading data up front.
You can display only the main lines of your data (titles or the like) and add a button so the user can load more details if they want.
In your data model, the cost seems to be in the second level of "C". Just load data down to this second level, and download the remaining part (for that object) only if the user asks for it.
Once again, no need to overload: your client's RAM will be thankful, and your client's mobile 3G too.
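A rough sketch of the client side of that idea, with a hypothetical details endpoint (the URL and response shape are assumptions):

    // Fetch the heavy second-level data only when the user clicks "show more".
    async function loadDetails(id: number): Promise<unknown> {
      const res = await fetch(`/api/items/${id}/details`);
      if (!res.ok) throw new Error(`failed to load details for item ${id}`);
      return res.json();
    }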
Optimize your data structure
If this is still not enough :
As StefanArya said in a comment, remove the "I" attribute, which is redundant with the JSON key; you can use Object.keys() to get the key names instead.
You also may not need that much precision on your floats.
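A sketch of both trims, assuming records keyed by id with a redundant "I" field and over-precise floats (the real shape of the data is a guess):

    type Item = { I?: string; value: number };

    function slim(data: Record<string, Item>): Record<string, { value: number }> {
      const out: Record<string, { value: number }> = {};
      for (const key of Object.keys(data)) {
        const { I, ...rest } = data[key]; // drop "I": the key already carries it
        rest.value = Math.round(rest.value * 100) / 100; // 2 decimals is often enough
        out[key] = rest;
      }
      return out;
    }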
If I come across any other ideas, I'll edit this post later.

Portlet event send array of objects

We have multiple projects with multiple portlets and need to send an array of objects between them.
Our situation:
One of the portlets is like a "Master-portlet"; it will be responsible for all the REST calls and will consume JSON data and parse it into Java objects.
All the other portlets will receive an array of objects and show them to the user.
Our thoughts and solution:
We wanted to implement this by sending arrays of objects through events. One of the "smaller" portlets will send an event to the "Master-portlet", and the "Master-portlet" will then answer with a new event, sending the right array of objects back.
Our problem:
We don't know how to send arrays of objects through events. Is this even possible?
Also, we are not sure if this is the right way to solve this. Are events meant to send larger amounts of data?
Is there a better solution for our case? Maybe it would be better to implement a database and all the portlets get the information from there?
Consider portlet events (and portlets) the UI layer of your application. Based on this, judge if the amount of data that you send back and forth makes sense or not. Also, if you closely couple the portlets, you're just hiding the fact that they can only function together - at least a questionable idea. You rather want them to react to common circumstances (events), but not rely on a specific source of events (master portlet) being available.
That being said: the more complex the data you send as the payload of a JSR-286 event, the more easily you run into classloading problems when your portlets live in different web applications. If you restrict yourself to native Java types (e.g. String, Map, etc.) you will avoid problems with the classloader.
Typically you want to communicate changes to the current context (e.g. new "current customer" selected - and an identifier) but not all of the particular data (e.g. the new customer's name and order history). The rest of the data typically comes through the business layer anyway.
That's not to say that you absolutely must not couple your portlets - just that my preference is to rather have them very loosely coupled, so that I can add individual small portlets that replace those that I thought of yesterday.
If you have some time, I covered a bit of this in a webinar last year; I hope it adds some clarification where I was too vague in this quick answer.

How to implement a lossless URL shortening

First, a bit of context:
I'm trying to implement URL shortening on my own server (in C, if that matters). The aim is to avoid long URLs while being able to restore a context from a shortened URL.
Currently I have an implementation that creates a session on the server, identified by a certain ID. This works, but consumes memory on the server (which is not desirable, since it's an embedded server with limited resources whose main purpose isn't serving web pages but doing other cool stuff).
Another option would be to use cookies or HTML5 webstorage to store the session information in the client.
But what I'm looking for is the possibility of storing the URL parameters in a single parameter that I attach to the URL, and being able to reconstruct the original parameters from that one.
My first thought was to use Base64 encoding to put all the parameters into one, but that produces an even larger URL.
Currently, I'm thinking of compressing the URL parameters (using some compression algorithm like zip, bz2, ...), Base64-encoding the compressed binary blob, and using that as the context. When I get the parameter, I can Base64-decode it, decompress the result, and have the original URL.
The question is: is there any other possibility I'm overlooking that I could use to losslessly compress a large list of URL parameters into a single smaller one?
Update:
After the comments from home, I realized I had overlooked that compression itself adds overhead to the compressed data, which can make it even larger than the original because of, for example, the headers that zipping adds to the content.
So (as home states in his comments), I'm starting to think that compressing the whole list of URL parameters is only really useful once the parameters exceed a certain length; otherwise I could end up with an even larger URL than before.
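For what it's worth, here is the round trip the question describes, sketched in Node.js/TypeScript rather than C just to show the pipeline (Node 15.7+ for base64url). Raw deflate avoids the zip/gzip container overhead mentioned above, and base64url avoids characters that would need percent-encoding:

    import { deflateRawSync, inflateRawSync } from "zlib";

    function shorten(params: string): string {
      // Raw deflate: no gzip/zip header or trailer around the data.
      return deflateRawSync(Buffer.from(params, "utf8")).toString("base64url");
    }

    function restore(token: string): string {
      return inflateRawSync(Buffer.from(token, "base64url")).toString("utf8");
    }

    const original = "a=1&b=2&c=longrepeatedvalue&d=longrepeatedvalue";
    const token = shorten(original);
    console.log(original.length, token.length); // short inputs may not shrink at all
    console.log(restore(token) === original);   // true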
You can always roll your own compression. If you apply simple Huffman coding to typical URL text, the result will usually be smaller (but Base64-encoding it afterwards will grow it a bit again, so the net effect may not be optimal).
I'm using a custom compression strategy on an embedded project I work on: first lzjb (a Lempel-Ziv derivative; follow the link for source code, a really tight implementation from OpenSolaris), followed by Huffman coding of the compressed result.
The lzjb algorithm doesn't perform too well on very short inputs, though (~16 bytes, in which case I leave the data uncompressed).

look up input field webbrowser C language

This is in C.
I want to know how I can write a program to look up all the input fields of a website -- any website -- and then fill them in. I can write a simple web browser in VBS, but how can I analyse the input fields? Even better would be if I could click the lookup field and it put its name in a box... that would be ideal.
Can anyone help? Thanks :)
Are you sure you want to do this in C?
I ask because it is not easy. First of all, you need to be able to run the HTTP GET request against the web page you wish to view. For this you probably want libcurl; you definitely don't want to be writing HTTP from scratch at any rate.
Next, you need to process the HTML you get, finding all input fields. You do NOT want to do this using regular expressions, if only for the sake of bobince's blood pressure. The bit you need to take away is that HTML is not a regular language: you need a proper parser. Enter libxml. I'm sure there are other XML libraries out there, and even libraries specifically for parsing HTML.
Finally, having done that (got the fields, etc.), you need to be able to populate them and submit the correct request as per the ACTION and METHOD parameters of the FORM.
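To make those three steps concrete, here is the same pipeline sketched in TypeScript on Node instead of C (Node 18+ for the global fetch, ESM top-level await, and the well-known cheerio parser standing in for libxml; the target URL is a placeholder). The structure is the same whatever the language:

    import * as cheerio from "cheerio";

    const res = await fetch("https://example.com/some-form-page"); // step 1: GET
    const $ = cheerio.load(await res.text());                      // step 2: parse

    $("form").each((_, form) => {
      const action = $(form).attr("action") ?? "";
      const method = ($(form).attr("method") ?? "GET").toUpperCase();
      const fields: Record<string, string> = {};
      $(form)
        .find("input[name], select[name], textarea[name]")
        .each((_, el) => {
          fields[$(el).attr("name")!] = $(el).attr("value") ?? "";
        });
      console.log(method, action, fields); // step 3: populate and submit these
    });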
This is of course assuming you know what the fields should be formatted with. It also assumes nothing else is going on. If you have a JavaScript-validated web form (I sincerely hope they're validating on the server too, but they might provide feedback via JS) you won't benefit from that (unless you're going to integrate JS, in which case you might as well write a browser).
This is not a trivial task and it is the reason there are accessibility standards for HTML, because otherwise it becomes tricky to interpret the form without human interaction.
Of course, this all assumes said html is well formed, which isn't always the case...
I might suggest another approach. BeautifulSoup is a well-known Python web scraping library that works very well. Python as a language also allows easier string manipulation, which will dramatically cut down your development time. I'd suggest giving the need to use C some serious thought, given the size and complexity of the task you want to undertake versus your need to get a result quickly. If you have a lot of time, by all means go for C.

Obfuscating email in html

I'm currently developing a website, into which I've included a filter that attempts to obfuscate any e-mail addresses present in the webpages it serves.
As it is now, it converts the addresses into images.
I've also seen a few other methods in use; some split the address into characters and use generated JavaScript to include it in the final document. That requires JavaScript, though, so it's not that useful in my opinion. The upside is that it can be used to create a working mailto link.
Another method, quite similar to the above, uses hex notation to mark up the e-mail address. I'm not really convinced it will thwart any serious harvesters though.
Others utilize the human brain's ability to understand language, and will either replace characters like the @-symbol with words, or separate the host and the username, etc.
My question now is, how reliable is my method of using generated images (whose filenames do not give the address away) against scrapers, when I'm not using any distortion on the text in the images? Should I prefer a different method?
And as a continuation: if I want a fallback method, just in case the image creation should fail for some reason, which would be the smartest way to go?
Here you'll find many ways of obfuscating emails, and their effectiveness.
Hope it helps!
My question now is, how reliable is my method of using generated images (whose filenames do not give the address away) against scrapers, when I'm not using any distortion on the text in the images?
I don't have any data to back that up, but I would say: Quite reliable. Harvesters can get millions of addresses using "conventional" means; I don't think it's economically feasible for them to do image processing just to get a handful more.
And as a continuation: if I want a fallback method, just in case the image creation should fail for some reason, which would be the smartest way to go?
Use a good spam filter. :-) No, seriously, it's really hard keeping a mail address hidden from harvesters.
One possibility is to continue using the image, but replace it with text and a mailto link if JavaScript is enabled.
As long as you don't name the image something obvious, like emailaddress.png, you should be pretty safe -- I think.
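A minimal sketch of that progressive enhancement (the element id and address parts here are invented; the point is that the address is assembled at runtime and never appears verbatim in the HTML):

    // Runs only when JavaScript is enabled; otherwise the image stays.
    const user = "info";
    const host = "example.com";

    const img = document.getElementById("contact-img"); // the fallback <img>
    if (img) {
      const a = document.createElement("a");
      a.href = `mailto:${user}@${host}`;
      a.textContent = `${user}@${host}`;
      img.replaceWith(a);
    }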
I think it's all about providing some kind of "are you human" test before you display the email, or displaying the email in a way that is itself such a test.
Thinking along the same lines, providing a link in place of the email address and running the test before displaying the email might be a solution too.
As a user, I find an image-obfuscated email address almost as useless as no email address. Whatever method you choose, I should ideally be given a mailto link; second best is some sort of your.name.69 AT longwebsitewhosnameicanteasilytranscribe.net style address.
