I'm currently developing a website that includes a filter which attempts to obfuscate any e-mail addresses present in the pages it serves.
As it is now, it converts the addresses into images.
I've also seen a few other methods in use; some split the address into characters and use generated JavaScript to include it in the final document, but that requires JavaScript, so it's not that useful in my opinion. The upside is that it can be used to create a working mailto link.
Another method, quite similar to the above, uses hex notation to mark up the e-mail address. I'm not really convinced it will thwart any serious harvesters, though.
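For reference, this is what the hex-notation approach produces; a minimal Python sketch (illustrative only) that generates the markup:

    # Encode each character as an HTML hex entity (&#x..;). Browsers render
    # the entities as normal text, but naive scrapers see only the entities.
    def entity_encode(address):
        return "".join("&#x{:x};".format(ord(ch)) for ch in address)

    print(entity_encode("user@example.com"))
    # &#x75;&#x73;&#x65;&#x72;&#x40;... (renders as user@example.com)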
Others rely on the human brain's ability to understand language, and will either replace characters like the @-symbol with words, or separate the host from the username, etc.
My question now is: how reliable is my method of using generated images (whose filenames do not give the address away) against scrapers, given that I'm not applying any distortion to the text in the images? Should I prefer a different method?
And as a continuation: if I want a fallback method, in case the image creation should fail for some reason, what would be the smartest way to go?
Here you'll find many ways of obfuscating emails, and their effectiveness.
Hope it helps!
My question now is: how reliable is my method of using generated images (whose filenames do not give the address away) against scrapers, given that I'm not applying any distortion to the text in the images?
I don't have any data to back that up, but I would say: Quite reliable. Harvesters can get millions of addresses using "conventional" means; I don't think it's economically feasible for them to do image processing just to get a handful more.
And as a continuation: if I want a fallback method, in case the image creation should fail for some reason, what would be the smartest way to go?
Use a good spam filter. :-) No, seriously: it's really hard to keep a mail address hidden from harvesters.
One possibility is to continue using the image, but replace it with text and a mailto link if JavaScript is enabled.
As long as you don't name the image something obvious, like emailaddress.png, you should be pretty safe, I think.
I think it's all about providing some kind of "are you human" test before you display the email, or displaying the email in a way that is itself a test.
Thinking along the same lines, maybe providing a link in place of the email address and running the tests before displaying the email might be a solution too.
As a user, an image-obfuscated email address is almost as useless as no email address. Whatever method you choose, I should ideally be given a mailto link; second best is some sort of your.name.69 AT longwebsitewhosnameicanteasilytranscribe.net style address.
Could someone point me in the right direction please?
I am trying to extract specific text/numbers from a json payload. I can access/isolate the full string of text by using triggerFormDataValue('text').
The text in question may contain 'sendSMS 1122334455 actual message' as its actual value
Is there any way, in a Logic App flow, to break the text into its component parts?
(sendSMS, 1122334455, and actual message)
N.B. I have already tried interacting with the cognitive analysis app for keyword searches, but that doesn't return the number, nor the full string, just the keywords.
Thanks.
For more complex logic like the one you have, I would recommend creating an Azure Function. This is a lightweight solution that offers you the flexibility of a microservice.
To extract the numbers, you may use a regular expression.
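As a sketch of the kind of parsing such a function could do (shown in Python; the payload format is taken from the question):

    import re

    text = "sendSMS 1122334455 actual message"
    # Command word, then the digits, then everything remaining as the message
    match = re.match(r"^(\w+)\s+(\d+)\s+(.*)$", text)
    if match:
        command, number, message = match.groups()
        print(command, number, message)  # sendSMS 1122334455 actual message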
Edit:
I've found a similar question here on SO, and the conclusion is much the same.
I've done some small research now, and it seems Microsoft deliberately doesn't put too much text-parsing functionality in Logic Apps, in order to avoid them becoming too complex. You might have a chance if you put the data in JSON notation, but I think the better option is to go with Azure Functions, since that provides reusability as well.
If that's all you need to do, you can use the split() function. Details: String Functions...split
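For example, something along these lines in a workflow expression (a sketch; note that splitting on spaces spreads a free-text message over the trailing elements):

    split(triggerFormDataValue('text'), ' ')        -> ["sendSMS","1122334455","actual","message"]
    first(split(triggerFormDataValue('text'), ' ')) -> "sendSMS"
    split(triggerFormDataValue('text'), ' ')[1]     -> "1122334455"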
Firstly, I know about the duplicates. We're not talking about iOS/Android/one-kind-of-device-only here, as those answers and cookies are not the way I want to go.
So I want to bypass the need for a password or the like by "binding" my service (which is only an idea for now) to the device used.
An e-mail address and such would of course still be needed, to keep your devices bundled.
What would your approaches be?
My thoughts so far
My first idea was using the MAC address, because I heard they're unique, but a quick Google search told me that's not really true.
On phones I could use the phone number or the IMEI, but I don't want it to be restricted to phones; it should be usable on the web, too.
I guess when we talk about a web solution, things get even trickier, because browsers won't let the service go really deep into the system?
Of course I guess there needs to be a combination of two or more things. So two not-so-unique things combine into one 99%-unique thing?
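To illustrate that combination idea, a minimal Python sketch; the signals used here (user agent, screen size, time zone) are hypothetical examples, and real-world fingerprinting needs far more care:

    import hashlib

    def device_fingerprint(signals):
        """Hash several weak identifiers into one combined ID."""
        raw = "|".join(signals)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    # Each signal alone is far from unique; together they narrow things down.
    print(device_fingerprint(["Mozilla/5.0 ...", "1920x1080", "Europe/Berlin"]))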
I just need some help on how to go about this problem, a direction, because if you google terms like "unique device identification" you only get results about the medical-device labeling system.
In my project I use
    // identifierForVendor can be nil in rare cases, so avoid force-unwrapping
    let secureUDID = UIDevice.current.identifierForVendor?.uuidString ?? ""
which returns a string created from the UUID, such as “E621E1F8-C36C-495A-93FC-0C247A3E6E5F”.
UUID: an alphanumeric string that uniquely identifies a device to the app’s vendor.
There are a number of fields a user can fill in where they'd enter a URL (their personal website, business site, favorite sites, etc.).
It's the only thing they'd be entering in that particular field.
So should I always strip out "http://" to keep things consistent and also to reduce the possibility of broken links (i.e. "http//")?
I'm just not sure what the best way to store URLs is.
If there's a reason to sanitize your users' input (security, size, speed, accuracy...) then do it.
But otherwise, don't.
There's actually a benefit, a lot of the time, in taking your customer-input data as-is: they own their own typos, misspellings, broken links, etc. that way. As long as it doesn't cause a problem for you (i.e. you don't have a reason to sanitize it).
BTW, consistency is a moot point, as it won't change the data type, and you can easily check for the "http://" and add or remove it as necessary in your presentation layer with a reusable function.
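Such a reusable function could be as small as this (a Python sketch; the name and default are illustrative):

    def ensure_scheme(url, default_scheme="http://"):
        """Prepend a scheme for display/linking when the stored value has none."""
        if url.startswith(("http://", "https://")):
            return url
        return default_scheme + url

    print(ensure_scheme("example.com"))    # http://example.com
    print(ensure_scheme("https://a.com"))  # unchanged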
As far as I know, you actually cannot call it a "URL" without the protocol part:
http://www.w3.org/Addressing/URL/url-spec.txt
I wouldn't remove it.
However, if you really need to keep the data consistent, it depends on how the URL is actually typed into your application. If it's a browser-like application, I'd bet it can be assumed to be http:// in front when there is none, for valid links.
I'm using Akismet for spam protection on my web page. It won't even let users post something like "Hey guys check this out!". I was hoping I could just get rid of links and check the poster's IP to see if it had been logged, rather than blocking something so simple.
Is there a way to decrease the harshness of Akismet? I'm using the .NET 2.0 library here: http://www.codeplex.com/wikipage?ProjectName=AkismetApi
For questions about Akismet you are always welcome to drop us a line - http://akismet.com/contact/
For cases like this the first thing I suggest is making sure that you are sending the correct data for the Akismet API call - http://akismet.com/development/api/#comment-check - since sending wrong or insufficient data can reduce the accuracy.
Second, if Akismet makes a mistake you should be sending the data back via the Submit Ham and Submit Spam API calls. This allows the Akismet system to learn more about what you consider spam/not spam on your site.
I'd suggest not using Akismet at all and just managing it yourself. You could write a regex to remove the links from postings: http://www.jhartig.com/2010/02/perfect-regex-for-removing-links-when.html
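A link-stripping regex along those lines might look like this (a Python sketch; the pattern is deliberately simple and will miss exotic URLs):

    import re

    def strip_links(text):
        """Remove http(s):// and www-style links from a posting."""
        return re.sub(r"(https?://\S+|www\.\S+)", "", text)

    print(strip_links("Hey guys check this out! http://spam.example.com"))
    # -> "Hey guys check this out! "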
Instead of using one of these anti-spam engines, have you thought about using Facebook the way TechCrunch does? It is very effective at preventing spam and flamewars because it's not anonymous.
The other thing to use is reCAPTCHA, to keep out the bots, which are probably the cause of your spam problems in the first place: http://www.google.com/recaptcha
Hey, here are my two cents on the topic: try something that web crawlers hate, something they can't understand at all!
You guessed it: Flash!
If I were you, I would use something Flash-based, like flexi comment or similar.
This is in the C language.
I want to know how I can write a program to look up all the input fields of a website (any website) and then fill them in. I can write a simple web browser in VBS, but how can I analyse the input fields? Even better would be if I could click the field in question and have its name put in a box; that would be ideal.
Can anyone help? Thanks :)
Are you sure you want to do this in C?
I ask because it is not easy. First of all, you need to be able to run an HTTP GET request against the webpage you wish to view. For this, you probably want libcurl; you definitely don't want to be writing it from scratch, at any rate.
Next, you need to process the HTML you get, finding all input fields. You do NOT want to do this using regular expressions, if only for the sake of bobince's blood pressure. The bit you need to take away is that HTML is not a regular language, so you need an XML parser. Enter libxml. I'm sure there are other XML libraries out there, and even libraries specifically for parsing HTML.
Finally, having done that (got the fields, etc.), you need to be able to populate them and submit the correct request as per the ACTION and METHOD attributes of the FORM.
This all assumes you know what the fields should be filled in with, and it also assumes nothing else is going on. If you have a JavaScript-validated web form (I sincerely hope they're validating on the request too, but they might provide feedback via JS), you won't benefit from that (unless you're going to integrate JS, in which case you might as well write a browser).
This is not a trivial task and it is the reason there are accessibility standards for HTML, because otherwise it becomes tricky to interpret the form without human interaction.
Of course, this all assumes said html is well formed, which isn't always the case...
I might suggest another approach. BeautifulSoup is a well-known Python web-scraping library that works very well. Python as a language also allows easier string manipulation, which will dramatically cut down your development time. I'd suggest giving the need to use C some serious thought, given the size and complexity of the task you want to undertake versus your need to get a result quickly. If you have a lot of time, by all means go for C.
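For a sense of scale, here is a minimal BeautifulSoup sketch that lists a page's form fields (assuming the page is fetched with the requests library; the URL is a placeholder):

    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://example.com/form").text
    soup = BeautifulSoup(html, "html.parser")

    # Walk every form and print each field's name and type
    for form in soup.find_all("form"):
        print("form:", form.get("action"), form.get("method"))
        for field in form.find_all(["input", "textarea", "select"]):
            print("  field:", field.get("name"), field.get("type"))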