Spam from multiple user agents, same IP - spam-prevention

I have a lot of spam posts in a forum that I moderate, and I can't quite figure out the pattern.
(1) The spammer seems to be getting through the CAPTCHA.
(2) I have logged the same IP (a Charter/Spectrum address -- so I can't block the ASN) for the following user agents:
[
{
"userAgent": "Nokia7250/1.0 (3.14) Profile/MIDP-1.0 Configuration/CLDC-1.0"
},
{
"userAgent": "Mozilla/5.0 (PlayBook; U; RIM Tablet OS 2.1.0; en-US) AppleWebKit/536.2+ (KHTML like Gecko) Version/7.2.1.0 Safari/536.2+"
},
{
"userAgent": "P3P Validator"
},
{
"userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:40.0) Gecko/20100101 Firefox/40.0"
},
{
"userAgent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0"
},
{
"userAgent": "Bloglines/3.1 (http://www.bloglines.com)"
},
{
"userAgent": "SonyEricssonK810i/R1KG Browser/NetFront/3.3 Profile/MIDP-2.0 Configuration/CLDC-1.1"
},
{
"userAgent": "SonyEricssonT610/R201 Profile/MIDP-1.0 Configuration/CLDC-1.0"
},
{
"userAgent": "Mozilla/5.0 (Linux; U; Android 1.5; de-de; Galaxy Build/CUPCAKE) AppleWebKit/528.5 (KHTML, like Gecko) Version/3.1.2 Mobile Safari/525.20.1"
},
{
"userAgent": "Baiduspider ( http://www.baidu.com/search/spider.htm)"
},
{
"userAgent": "Mozilla/5.0 (Windows NT 6.2; ARM; Trident/7.0; Touch; rv:11.0; WPDesktop; NOKIA; Lumia 920) like Geckoo"
},
{
"userAgent": "Mozilla/5.0 (Linux; U; Android 1.5; de-de; Galaxy Build/CUPCAKE) AppleWebKit/528.5 (KHTML, like Gecko) Version/3.1.2 Mobile Safari/525.20.1"
},
{
"userAgent": "SEC-SGHX210/1.0 UP.Link/6.3.1.13.0"
},
{
"userAgent": "SEC-SGHX210/1.0 UP.Link/6.3.1.13.0"
},
{
"userAgent": "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/534.14 (KHTML, like Gecko) Chrome/9.0.601.0 Safari/534.14"
},
{
"userAgent": "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"
},
{
"userAgent": "Gaisbot/3.0 (robot#gais.cs.ccu.edu.tw; http://gais.cs.ccu.edu.tw/robot.php)"
},
{
"userAgent": "Mozilla/5.0 (Maemo; Linux armv7l; rv:10.0.1) Gecko/20100101 Firefox/10.0.1 Fennec/10.0.1"
},
{
"userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:47.0) Gecko/20100101 Firefox/47.0"
},
{
"userAgent": "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3 like Mac OS X; de-de) AppleWebKit/533.17.9 (KHTML, like Gecko) Mobile/8F190"
}
]
This is just an example, but the pattern is common: multiple UAs from the same IP over a period of time, with the IP almost always tied to a common consumer ISP. Any thoughts on this?
Let me know if you can think of anything else I could log that would also be useful. Thanks!

But then why wouldn't the spammer just use something super common like the current Safari iPhone UA?
Spammers usually use dedicated tools such as XRumer, which can automatically rotate user agents, register email accounts, solve CAPTCHAs, and so on.
Anti-spam efficiency comes down to resource consumption: a moderator only spends a few seconds removing spam, so your countermeasures should force the spammer to spend several minutes doing his dirty work.
Therefore, you need to deprive the spammer of the opportunity to automate his process.
Use a serious CAPTCHA - reCAPTCHA, hCaptcha, etc.
Disable posting without registration.
Prohibit registration through disposable mail services such as mailforspam.com.
If you are dealing with a bot rather than a person, add invisible honeypot fields to the registration form; a person will not fill them in, but a bot will see them in the HTML code and fill them.
Replace the Submit button with an image: <input type='submit' value='Post'> remains in the HTML code, but the form is not actually submitted by it. Submitting the form is done by clicking the picture, which the robot does not see but the person does. (A minimal sketch of both tricks follows below.)
There are many tricks, but it all depends on the capabilities of the forum engine.
To begin with, it would be good to determine whether the spamming is done by a person or a robot.
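For illustration, here is a minimal, hedged sketch of the honeypot field and the image-submit trick; the field names, the /post.php action, and the image path are made up for the example:
<form action="/post.php" method="post">
  <!-- Honeypot: hidden from humans via CSS, but a bot parsing the raw HTML will fill it in.
       Reject the submission server-side if this field arrives non-empty. -->
  <input type="text" name="website" style="display:none" autocomplete="off">
  <textarea name="message"></textarea>
  <!-- Decoy submit button: present in the markup, but it never actually submits the form. -->
  <input type="submit" value="Post" onclick="return false;">
  <!-- Real submit control: an image wired up with JavaScript, which a simple bot will not click. -->
  <img src="/img/post-button.png" alt="Post" onclick="this.closest('form').submit();">
</form>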

Related

symfony 3.4 - vuejs, logged user data under vuejs unavailable

Like in the title. I have a problem with receiving logged-in user data under Vue.js.
I use
- FOSUserBundle - for login
- FOSRestBundle - for the API
- JMS Serializer
This is my function to fetch the data from the database:
public function getUser()
{
$userId = $this->container->get('security.token_storage')->getToken()->getUser('id');
return $this->repository->FindOneBy(['id' => $userId]);
}
Now, with the code in the form above, console.log returns an empty object in Vue.js. However, when I change $userId to 5, for example -
$this->repository->FindOneBy(['id' => 5]);
the object is available with data.
I checked the API address in both cases - it works. I also returned a dump in both cases; everything in both cases is identical.
this is my log
when $userId
127.0.0.1 - - [07/Apr/2018:03:55:24 +0200] "GET /ekopanel2/web/app_dev.php/api/v1/greenker/user HTTP/1.1" 204 380 "http://localhost:8080/greenker" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"
and this is when 5
127.0.0.1 - - [07/Apr/2018:03:55:34 +0200] "GET /ekopanel2/web/app_dev.php/api/v1/greenker/user HTTP/1.1" 200 895 "http://localhost:8080/greenker" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0"
I noticed that the status code is different: with 5 it is 200 and working, while with $userId it is 204, so it looks like the query returns empty data.
Can you help please?
Let's suppose you are logged in.
If not, $this->container->get('security.token_storage')->getToken()->getUser() would return the string "anon."
Have you tried this:
$userId = $this->container->get('security.token_storage')->getToken()->getUser()->getId();
Be careful to check that the user is logged in. Otherwise, it will throw an exception like "Call to a member function getId() on string".
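Putting both points together, here is a hedged sketch of the corrected controller method (the $this->repository property and the service wiring are assumed from the question):
public function getUser()
{
    $user = $this->container->get('security.token_storage')->getToken()->getUser();

    // When nobody is authenticated, Symfony stores the string "anon." here, so guard
    // before calling getId() to avoid "Call to a member function getId() on string".
    // Returning null here is also what would produce the empty 204 response seen in the log.
    if (!is_object($user)) {
        return null; // or throw an AccessDeniedException
    }

    return $this->repository->findOneBy(['id' => $user->getId()]);
}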

Pig - chararray and int data types are implicitly converted to bytearray by Pig when using REGEX_EXTRACT_ALL

I have sample web log data which has an IP, a date in dd/mmm/yyyy format, a URL, and other details generated in a web log. I am trying to separate the web log into fields - IP, date, and URL. Below are the scripts I have created in Pig:
A = Load 'weblogs_rebuild sample.txt' using TextLoader() as Log:chararray;
B = foreach A generate flatten (REGEX_EXTRACT_ALL(Log, '([\\S]+)[\\s+-]+[\\[]+([\\d]+)[/]+([\\w]+)[/]+([\\d]+)(.*)[\\]]+[\\s+]+[\\"]+([\\w\\s+/\\d.]+)[\\"]+[\\s+]+(.*)')) as (field1:chararray,date:int,month:chararray,year:int,timefield:chararray,useraction:chararray,userfiled:chararray);
When I press enter after creating relation B, it gives me the warning
org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning
NO_LOAD_FUNCTION_FOR_CASTING_BYTEARRAY 14 time(s)
If I run the relation B script again and press enter, the warning message shows
org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning
NO_LOAD_FUNCTION_FOR_CASTING_BYTEARRAY 21 time(s)
When I do describe B, it shows all fields as bytearray.
B: {field1: bytearray,date: bytearray,month: bytearray,year: bytearray,timefield: bytearray,useraction: bytearray,userfiled: bytearray}
I don't understand why the warning count goes up by 7 each time, or why the chararray and int data types have been converted to bytearray.
This is what I see when I do dump B;
(364.635.03.677,26,Oct,2011,:22:39:30 -0500,GET /feeds/press HTTP/1.1,200 0 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 ( .NET CLR 3.5.30729)")
Sample web log:
323.81.303.680 - - [25/Oct/2011:01:41:00 -0500] "GET /download/download6.zip HTTP/1.1" 200 0 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.19) Gecko/2010031422 Firefox/3.0.19"
668.667.44.3 - - [25/Oct/2011:07:38:30 -0500] "GET /download/download3.zip HTTP/1.1" 200 0 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070719 CentOS/1.5.0.12-3.el5.centos Firefox/1.5.0.12"
13.386.648.380 - - [25/Oct/2011:17:06:00 -0500] "GET /download/download6.zip HTTP/1.1" 200 0 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.3; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.2)"
06.670.03.40 - - [26/Oct/2011:13:24:00 -0500] "GET /product/demos/product2 HTTP/1.1" 200 0 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
18.656.618.46 - - [26/Oct/2011:17:15:30 -0500] "GET /download/download4.zip HTTP/1.1" 200 0 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; en-us) AppleWebKit/531.22.7 (KHTML, like Gecko) Version/4.0.5 Safari/531.22.7"
14.688.663.667 - - [26/Oct/2011:21:02:30 -0500] "GET /news HTTP/1.1" 200 0 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
13.07.338.684 - - [26/Oct/2011:21:02:30 -0500] "GET /download HTTP/1.1" 200 0 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; OfficeLiveConnector.1.4; OfficeLivePatch.1.3)"
14.688.663.667 - - [26/Oct/2011:21:02:30 -0500] "GET /news HTTP/1.1" 200 0 "/news" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"

camel netty4http and camel rest dsl: Get remote address

I'm looking for a way to get the client IP address with the Camel REST DSL and the Netty4 HTTP component.
I checked the documentation, put a breakpoint on my REST route, and looked at the headers, the properties... everywhere, and couldn't find a proper way to get this information.
Headers log:
GET: http://localhost:8080/category,
{Accept=text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,/;q=0.8, Accept-Encoding=gzip, deflate, sdch, Accept-Language=fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4, breadcrumbId=ID-nateriver-54582-1445489005229-0-1, CamelCATEGORY_ACTION=listAction, CamelHttpMethod=GET, CamelHttpPath=, CamelHttpUri=/category, CamelHttpUrl=http://localhost:8080/category, CamelJmsDeliveryMode=2, Connection=keep-alive, Content-Length=0, Cookie=JSESSIONID=fowfzar8n09e16ej9jui6nmsv, Host=localhost:8080, JMSCorrelationID=null, JMSDeliveryMode=2, JMSDestination=topic://Statistics, JMSExpiration=0, JMSMessageID=ID:nateriver-54592-1445489009836-3:1:7:1:1, JMSPriority=4, JMSRedelivered=false, JMSReplyTo=null, JMSTimestamp=1445489017233, JMSType=null, JMSXGroupID=null, JMSXUserID=null, Upgrade-Insecure-Requests=1, User-Agent=Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36}
You should get two headers populated:
CamelNettyLocalAddress and CamelNettyRemoteAddress.
See here where the debug log of netty-http shows this clearly.
http://camel.465427.n5.nabble.com/How-to-create-case-insensitive-URI-route-with-netty4-http-td5766517.html#a5766558
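For illustration, a minimal sketch of reading that header in a REST DSL route; the /category path matches the question's log, while direct:listCategories and the port are made up for the example:
import org.apache.camel.builder.RouteBuilder;

public class CategoryRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        restConfiguration().component("netty4-http").port(8080);

        rest("/category")
            .get()
            .route()
                // Netty4 HTTP puts the client socket address (an InetSocketAddress)
                // into this header, so it can be logged or copied into the message.
                .log("client address: ${header.CamelNettyRemoteAddress}")
                .to("direct:listCategories");
    }
}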

Prevent connection close with JSoup

I am not very knowledgeable when it comes to networking (i.e. HTTP) or JSoup. I am using JSoup to get meta tag contents from a URL. I am getting the error
Connection closed unexpectedly by server at URL: http://blahblah
Here is my code
Document doc = Jsoup.connect(url).get();
Elements metas = doc.getElementsByTag("meta");
...
How do I "configure" JSoup to just grab the content of the web page, close the connection, and then proceed to parse the content obtained? I ask because I imagine the connection is being closed because the request takes too long. Or is it something else, like the server detecting that the caller isn't human? Say the site is CNN or whatever and I am trying to parse a news article for meta tag contents. And no, I am not crawling: I am given a URL and I am sifting through that one page.
Maybe you have to send some header data, as below.
Please try it.
Document doc = Jsoup
        .connect(url.trim())
        .timeout(3000)
        .header("Host", "someip")
        .header("Connection", "keep-alive")
        .header("Content-Length", "111")
        .header("Cache-Control", "max-age=0")
        .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
        .header("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36")
        .header("Content-Type", "application/x-www-form-urlencoded")
        .header("Referer", url.trim())
        .header("Accept-Encoding", "gzip,deflate,sdch")
        .header("Accept-Language", "en-US,en;q=0.8,ru;q=0.6")
        .userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36")
        .get();
I have absolutely no idea why, but the problem stops when I do
Connection connection = Jsoup.connect(url);
Document doc = connection.get();
Elements metas = doc.getElementsByTag("meta");
...
Instead of
Document doc = Jsoup.connect(url).get();
Elements metas = doc.getElementsByTag("meta");
...
It makes no sense to me, but it is what it is. I have heard of "constructors escaping", which is what led me to do the separation. While this is probably not the same thing, some similar type of voodoo may be happening under the hood that I just don't understand.
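In either variant, once the Document has been retrieved, the meta tag contents the question is after can be read as below. This is a hedged, self-contained sketch; the realistic user agent and longer timeout are assumptions that often help when a server closes connections on default library clients:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MetaGrabber {
    public static void main(String[] args) throws Exception {
        String url = args[0];

        // Identify as a regular browser and allow the server more time to respond.
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
                .timeout(10000)
                .get();

        Elements metas = doc.getElementsByTag("meta");
        for (Element meta : metas) {
            // Meta tags of interest usually carry name/property plus content attributes.
            System.out.println(meta.attr("name") + " | " + meta.attr("property")
                    + " = " + meta.attr("content"));
        }
    }
}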

Protractor - Setting DPI with chromeOptions

I'm writing functional tests with Protractor and am trying to emulate a mobile device using Chrome on a desktop. I've successfully set the user agent with:
config.capabilities = {
  browserName: 'chrome',
  chromeOptions: {
    args: ['--user-agent="Mozilla/5.0 (iPad; CPU OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53"']
  }
};
Now, I'd like to change the DPI to reflect mobile devices (usually a device pixel ratio of 2). I've explored other args and came across --ash-host-window-bounds (http://peter.sh/experiments/chromium-command-line-switches/#ash-host-window-bounds). I've tried:
chromeOptions: { args: ['--ash-host-window-bounds="1024x768*2"'] }
The argument doesn't seem to work with Protractor: it does not affect the window dimensions even when the DPI part is removed.
How could I set the DPI? Or, how could I enable Chrome mobile emulation via chromeOptions?
capabilities: {
  browserName: 'chrome',
  chromeOptions: {
    args: ['--user-agent="Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/600.1.3 (KHTML, like Gecko) Version/8.0 Mobile/12A4345d Safari/600.1.4"'],
    mobileEmulation: {
      // ChromeDriver expects either a preset device, e.g.
      //   deviceName: 'Apple iPhone 6 Plus',
      // or explicit metrics as below; pixelRatio is what controls the emulated DPI.
      deviceMetrics: {
        width: 414,
        height: 736,
        pixelRatio: 3.0
      }
    }
  }
},
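To check that the emulation took effect, a spec can read the ratio the page reports - a small sketch, assuming the mobileEmulation config above is active:
// Inside a Protractor spec:
browser.executeScript('return window.devicePixelRatio;').then(function (ratio) {
  expect(ratio).toBe(3);
});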
