Monitor hosting environment for spam keywords - spam-prevention

What environment-independent tools are available to detect new spam blogs or comments appearing on a hosting infrastructure?
As an occasional hosting provider, we want to watch for new blogs or comments that look spammy, but without relying on plugins or modules inside the CMS environment (because these are easy to circumvent, or expose spam only to Google).
A (pseudo) example would be to set up Google Alert for "viagra ip:10.0.0.1", where 10.0.0.1 is the front-facing IP of the servers. (Google doesn't offer such an advanced operator term though ...)
It seems I'm looking for a combination of Nagios + Google Alerts + ( ??? ) ... what fills this space?

I would set up an hourly cronjob that wgets the entire website, then greps the resulting files for whatever spam strings you're looking for, and alerts on a hit. Let me know if you'd like me to hack up a quick example, or if that's not the direction you were thinking.
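A minimal Python sketch of that idea, assuming a hand-picked keyword list and a list of URLs gathered beforehand (for instance from a `wget -r` mirror or a sitemap); the alert line is just a `print`, so hook it up to mail or Nagios as needed:

```python
import urllib.request

# assumed example terms; extend with whatever strings you care about
SPAM_KEYWORDS = ["viagra", "casino", "payday loan"]

def find_spam(html, keywords=SPAM_KEYWORDS):
    """Return the spam keywords present in a page's HTML (case-insensitive)."""
    text = html.lower()
    return [kw for kw in keywords if kw in text]

def check_site(urls):
    """Fetch each URL and print an alert for any page containing spam keywords."""
    for url in urls:
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError as exc:
            print(f"fetch failed for {url}: {exc}")
            continue
        hits = find_spam(html)
        if hits:
            print(f"ALERT {url}: found {', '.join(hits)}")
```

Run it hourly from cron, e.g. `0 * * * * /usr/bin/python3 /opt/spamwatch.py`.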

Related

How can I tell if I'm including google analytics twice?

I have a web app and I include Google Analytics. My active users seem to have spiked, and I'm incredibly paranoid that I'm somehow double counting my analytics.
Is there any way to see if I'm doing this?
As Nebojsa mentioned, you can inspect source and search for ga.js or analytics.js to see if it's in your application twice.
Look through your source code to see if you have the partial rendering in multiple places (ex. header and footer)
Set up another Google Analytics account and test locally whether it's double counting your visits. See this post for setting up GA on localhost
Use the Google Analytics Tag Assistant to verify that everything is set up correctly. It will tell you if there are any implementation problems, including multiple tracking codes. It also helps with AdWords, remarketing and other Google product scripts.
Use the Google Analytics Debugger. This would probably be the most helpful to determine if a single hit is being double counted, as it walks you through every single function call the analytics urchin makes.
Just open the page source in the browser and look for the analytics code, for example
_gaq.push(['_setAccount', ...
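The manual source inspection above can also be scripted. A rough Python sketch; the pattern list is an assumption covering the classic (`ga.js`) and universal (`analytics.js`) snippets, so adjust it to whichever tracker you actually use:

```python
import re
import urllib.request

# assumed patterns for the classic and universal GA snippets
TRACKER_PATTERNS = [r"ga\.js", r"analytics\.js", r"_setAccount"]

def count_trackers(html):
    """Count how often each known tracker snippet appears in the HTML."""
    return {pat: len(re.findall(pat, html)) for pat in TRACKER_PATTERNS}

def looks_double_counted(html):
    """True if any tracker snippet appears more than once."""
    return any(n > 1 for n in count_trackers(html).values())

# usage sketch (hypothetical URL):
# html = urllib.request.urlopen("http://example.com/").read().decode("utf-8", "replace")
# print(count_trackers(html), looks_double_counted(html))
```

This only detects the snippet being included twice in the markup, not double hits caused by two `_trackPageview` calls at runtime; use the GA Debugger for those.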

Why can't I see any text at "http://crawlservice.appspot.com/?key=123456&url=http://mydomain.com#!article"?

OK, I found this link https://code.google.com/p/gwt-platform/wiki/CrawlerSupport#Using_gwtp-crawler-service that explains how you can make your GWTP app crawlable.
I have some GWTP experience, but I know nothing about AppEngine.
Google said its "crawlservice.appspot.com" can parse any Ajax page. Now I have a page "http://mydomain.com#!article" with an article that was pulled from the database. Say that page has the text "this is my article". Now I open this link:
crawlservice.appspot.com/?key=123456&url=http://mydomain.com#!article, and I can see all the JavaScript, but I couldn't find the text "this is my article".
Why?
Now let's check with a real-life example.
Open this link https://groups.google.com/forum/#!topic/google-web-toolkit/Syi04ArKl4k and you will see the text "If i open that url in IE".
Now open http://crawlservice.appspot.com/?key=123456&url=https://groups.google.com/forum/#!topic/google-web-toolkit/Syi04ArKl4k — you can see all the JavaScript, but there is no text "If i open that url in IE".
Why is that?
So if I use http://crawlservice.appspot.com/?key=123456&url=mydomain#!article, will the Google crawler be able to see the text in mydomain#!article?
Also, why key=123456? Does that mean everyone can use this service? Do we have our own key? Does Google limit the number of calls to their service?
Could you explain all these things?
Extra Info:
Christopher suggested me to use this example
https://github.com/ArcBees/GWTP-Samples/tree/master/gwtp-samples/gwtp-sample-crawler-service
However, I ran into another problem. My app is pure GWTP; it doesn't have appengine-web.xml in WEB-INF. I have no idea what AppEngine or GAE means, or what Maven is.
Do I need to register for App Engine?
My app may have a lot of traffic. Also, I am using a GoDaddy VPS. I don't want to register for App Engine, since I would have to pay Google for the extra traffic.
Everything in my GWTP app is OK right now except the crawler function.
So if I don't use Google App Engine, how can I build the crawler function for GWTP?
I tried to use HtmlUnit for my app, but HtmlUnit doesn't work for GWTP (see details here: Why HTMLUnit always shows the HostPage no matter what url I type in (Crawlable GWT APP)?)
I believe you are not allowed to crawl Google Groups. Probably they are actively trying to prevent this, so you do not see the expected content.
There are a couple of points I wish to elaborate on:
The Google Code documentation is no longer maintained. You should look on Github instead: https://github.com/ArcBees/GWTP/wiki/Crawler-Support
You shouldn't use http://crawlservice.appspot.com. This isn't a Google service, it's out of date and we may decide to delete it down the road. This only serves as a public example. You should create your own application on App Engine (https://appengine.google.com/)
There is a sample here (https://github.com/ArcBees/GWTP-Samples/tree/master/gwtp-samples/gwtp-sample-crawler-service) using GWTP's Crawler Service. You can basically copy-paste it. Just make sure you update the <application> tag in appengine-web.xml to the name of your application and use your own service key in CrawlerModule.
Finally, if your client uses GWTP and you followed the documentation, it will work. If you want to try it manually, you must encode the Query Parameters.
For example http://crawlservice.appspot.com/?key=123456&url=http://www.arcbees.com#!service will not work because the hash (everything including and after #) is not sent to the server.
On the other hand http://crawlservice.appspot.com/?key=123456&url=http%3A%2F%2Fwww.arcbees.com%2F%23!service will work.
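In Python, that percent-encoding can be produced with `urllib.parse.quote`; a small sketch using the example service URL and key from above:

```python
from urllib.parse import quote

def crawler_url(service, key, target):
    """Build a crawl-service URL with the target fully percent-encoded, so the
    #! fragment is sent to the server instead of being dropped by the client."""
    return f"{service}?key={key}&url={quote(target, safe='')}"

url = crawler_url("http://crawlservice.appspot.com/", "123456",
                  "http://www.arcbees.com/#!service")
# the '#' is now %23, so everything after it survives the HTTP request
```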

How can I get a connection ticket to QuickBooks Online for dev/testing a one-off web app?

I'm trying to build a one-off web app that will communicate with our business QuickBooks Online account. Most of the research I've found says that for development and testing you should use the "desktop" way of connecting (to avoid cert headaches). So I've registered an app and got the ID from here:
appreg.intuit.com/
The problem is, everything I've seen then tells me to go here to get a connection ticket:
login.ptc.quickbooks.com/j/qbn/sdkapp/confirm?serviceid=2004&appid=YOUR-APPLICATION-ID-HERE
When I do that, however, it redirects me here:
qbo.intuit.com/c1/v60.147/0/login?redirect=true
I mucked around and finally found some way to access the "old" QuickBooks stuff; when I went to set up a test company, it sent me here:
ptc.qbo.intuit.com/c1/v0/offline.shtml
which tells me
"The QBO PTC environment is no longer available, and was replaced by the new E2E environment. Please go to https://e2e.qbo.intuit.com/ going forward. We are very sorry for any inconvenience and confusion. If you have questions, please reach out to Eric Bullen"
How can I get a connection to a test account to do some development?
How can I get a connection to a test account to do some development?
Unless you're an Intuit employee, you can't get a test account.
With that said, you don't need to.
Just use your live QuickBooks Online account. You can sign up on their website to get a 30-day free trial if you need to.
When you register at https://appreg.intuit.com (new link: http://developer.intuit.com/Application/Create/QBOE), make sure you register in DESKTOP mode, for the PRODUCTION environment.
Then use the production URL to get your connection ticket:
https://login.quickbooks.com/j/qbn/sdkapp/confirm?appid=YOUR-APPLICATION-ID-HERE&serviceid=2004&appdata=1
(substitute in your application ID that you get back from appreg)
More info and examples on our QuickBooks integration wiki.

Malware on D7 website - Blacklisted by Google

I want to tell you about the malware attack on my Drupal website, not just to get your suggestions but also to create something helpful for anyone who might suffer from the same problems. Well...
INITIAL SETUP
Drupal 7.9
Activated modules:
CORE: Block, Contextual links, Database logging, Field, Field SQL storage, Field UI, File, Filter, Image, List, Locale, Menu, Node, Number, Options, Overlay, Path, PHP Filter, RDF, System, Taxonomy, Text, Toolbar, User
CCK: Multiselect
CHAOS TOOL SUITE: Chaos tools
DATA/ORA: Calendar, Date, Date API, Date Popup, Date views
FIELDS: Email, Field permission, Link
OTHER: Google Plus One +1, Pathauto, Token, Weight
SHARING: Share this, Share this block
TAXONOMY MENU: Taxonomy menu
VIEWS: Views, Views PDF Display, Views PHP, Views UI
OTHER MODULES THAT I REMOVED: CKEDITOR, VIEWS_SLIDESHOW, IMCE, DOMPDF, PRINT, WYSIWYG
MY SETUP ERRORS
In order to satisfy the customer, I modified some of the modules and never updated them (OUCH!)
The customer had the login data, and maybe his computer wasn't safe (HMM...)
I didn't have a copy of the website, because I relied on the provider's weekly backup (DOH!)
ATTACK EXTERNAL SYMPTOMS
All the links on the homepage redirected to a malware website
Google blacklisted the website
Critical alert on the Google Webmaster Tools panel
FTP SYMPTOMS
Lots of "strange" files: mainma3.php (I found this one in every folder!), functoins.php, sum75.html, wlc.html, aol.zip, chase.zip, chaseverification.zip, 501830549263.php, wp-conf.php, and a dozen wtmXXXXn.php (where X = a number) in the root folder. All these files were full of malicious functions (unescape, base64_decode, eval, etc.)
Install.php was modified with a long line of malicious code
This line of code was appended to EVERY JavaScript file:
;document.write('');
The weekly backup was also infected
Dozens of repeated "strange" requests, found in the Drupal log panel (my domain is obscured with the string "-----"):
index.php?q=ckeditor/xss > Notice: Undefined offset: 5 in eval() (linea 29 di /web/htdocs/-----/home/modules/php/php.module(74) : eval()'d code(1) : eval()'d code).
-----/user?destination=node/add > Failed login by shadowke
calendar/week/2012-W19?year=2011&mini=2012-12 > page not found
misc/]};P.optgroup=P.option;P.tbody=P.tfoot=P.colgroup=P.caption=P.thead;P.th=P.td;if(!c.support.htmlSerialize)P._default=[1, > page not found
misc/)h.html(f?c( > page not found
mail.htm > page not found
RECOVERY [thanks to this article: http://25yearsofprogramming.com/blog/20070705.htm]
I've put the website into maintenance mode (error503.php + .htaccess), with traffic open only to my IP address
[see this useful guide: http://25yearsofprogramming.com/blog/20070704.htm]
I've downloaded the whole website in local
I've searched and removed the strange files > I found forty of them
I've searched the files for these words [with the freeware AGENT RANSACK]: eval(base64_decode($POST["php"])), eval(, eval (, base64, document.write, iframe, unescape, var div_colors, var _0x, CoreLibrariesHandler, pingnow, serchbot, km0ae9gr6m, c3284d, upd.php, timthumb. > Then I've acted in one of the following ways: a) I've replaced eval with php_eval() (Drupal's safe version of eval); b) I've written down the suspected modules; c) I've compared the code with a freshly downloaded copy of the module; d) I've removed all the malicious code (see the JavaScript mentioned above)
I've searched for changes in the file system [with the freeware WINMERGE]
I've identified some suspected modules, thanks to the list written at point 4 above, and thanks to some research on Google (name_of_the_module security issue, name_of_the_module hacked, etc...) and on Secunia [http://secunia.com/community/advisories/search]
I've scanned my computer (Avast, Search&Destroy, Malwarebytes Anti-Malware) > I didn't find any virus or spyware
I've changed all the logins (ftp, cpanel, drupal admin panel)
I've reloaded the whole website
I've removed all the suspected modules: CKEDITOR, VIEWS_SLIDESHOW, PRINT, DOMPDF, IMCE, CAPTCHA, WYSIWYG, WEBFORM.
I've told the whole story to the provider's support
I've requested a review from Google (they did it in 12 hours)
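The string search in step 4 above can be automated. A hedged Python sketch; the indicator list echoes the strings from that step and is not exhaustive, and matches are only leads to inspect by hand, since eval and document.write also appear in legitimate code:

```python
import os

# indicator strings taken from the search list above; not exhaustive
SUSPICIOUS = ["base64_decode", "eval(", "document.write", "unescape",
              "var _0x", "km0ae9gr6m", "c3284d", "timthumb"]

def find_indicators(text):
    """Return which indicator strings occur in a file's contents."""
    return [s for s in SUSPICIOUS if s in text]

def scan_tree(root, extensions=(".php", ".js", ".html")):
    """Walk a local copy of the site and list files containing indicators."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                try:
                    text = open(path, encoding="utf-8", errors="replace").read()
                except OSError:
                    continue
                found = find_indicators(text)
                if found:
                    hits.append((path, found))
    return hits
```

Usage: `scan_tree("/path/to/local/site/copy")` returns a list of `(path, indicators)` pairs to review.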
DRUPAL LOG NOW
Dozens of these messages:
- wtm4698n.php?showimg=1&cookies=1 > page not found
- fhd42i3d.html > page not found
- wp-conf.php?t2471n=1 > page not found
- -----/user?destination=node/add > Failed login by Elovogue
LESSONS LEARNED
Never modify modules, so that you can always update them
Keep all logins on a safe computer / use a safe computer to work on the FTP
Check for known security issues before installing a module
Keep a clean copy of the website somewhere
MY QUESTIONS:
What kind of attack did I receive?
Are there other insecure modules in my installation?
What else can I do?
Thanks to everybody for your patience!
If you are using MS Windows, I think it was a trojan/virus that stole your FTP passwords and automatically edited the files. I know of many such stories.
Switch to WinSCP.net.

Options for Filtering Data in real time - Will a rule engine based approach work?

I'm looking for options/alternative to achieve the following.
I want to connect to several data sources (e.g., Google Places, Flickr, Twitter ...) using their APIs. Once I get some data back I want to apply my "user-defined dynamic filters" (defined at runtime) on the fetched data.
Example Filters
Show me only restaurants that have a rating higher than 4 AND more than 100 ratings.
Show all tweets that are X miles from location A and Y miles from location B
Is it possible to use a rule engine (esp. Drools) to do such filtering? Does it make sense?
My proposed architecture is mobile devices connecting to my own server and this server then dispatching requests to the external world and doing all the heavy work (mainly filtering) of data based on user preferences.
Any suggestions/pointers/alternatives would be appreciated.
Thanks.
Yes, Drools Fusion allows you to easily deal with this kind of scenario. Here is a very simple example application that plays around with Twitter messages using the twitter4j API:
https://github.com/droolsjbpm/droolsjbpm-contributed-experiments/tree/master/twittercbr
Please note that there is an online and an offline version in that example. To run the online version you need to get access tokens on the twitter home page and configure them in the configuration file:
https://github.com/droolsjbpm/droolsjbpm-contributed-experiments/blob/master/twittercbr/src/main/resources/twitter4j.properties
Check the twitter4j documentation for details.
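For comparison, the runtime-defined AND filters from the question can also be expressed without a rule engine, as plain composable predicates. A hypothetical Python sketch (the `Place` type and the sample data are invented for illustration):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Place:
    name: str
    rating: float
    rating_count: int

Rule = Callable[[Place], bool]

def min_rating(threshold: float) -> Rule:
    return lambda p: p.rating > threshold

def min_reviews(count: int) -> Rule:
    return lambda p: p.rating_count > count

def apply_rules(items: List[Place], rules: List[Rule]) -> List[Place]:
    """Keep only items that satisfy every rule (logical AND)."""
    return [i for i in items if all(r(i) for r in rules)]

places = [Place("A", 4.5, 250), Place("B", 3.9, 500), Place("C", 4.8, 40)]
rules = [min_rating(4), min_reviews(100)]  # built at runtime from user input
print([p.name for p in apply_rules(places, rules)])  # → ['A']
```

A rule engine like Drools earns its keep when rules are numerous, authored declaratively by non-developers, or need temporal reasoning over event streams (Fusion's case); for a handful of AND/OR predicates, plain code like this may be enough.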
