Is there another way to do screen scraping apart from regular expressions?

I'm doing a personal, just for fun, project that is using screen scraping to give me a System Tray notification in case another line on an HTML table is added, modified or deleted.
Having done this before, my first thought was to go with regular expressions and be done with it. Being curious, though, I wondered whether there is something else out there that follows a different paradigm but is just as simple to use.
I know about DOM, XPath and all the XML-ish approaches. I'm looking for something outside the box, something that could even be defined as a set of rules, so that you could build a plugin system to aggregate various sites.

See Options for HTML Scraping

Here's an idea: assuming your main use case is getting a notification whenever an HTML file changes, why not use a standard diff tool and then loop through the changed lines, applying your rules?
Also, if this is a situation where you have access to the server and the files you're watching, you might be able to put everything under source control with CVS (or similar) and just watch for commits. If you want to use this approach for random sites on the web, just write a script that periodically downloads the HTML for the appropriate URLs and commits it to source control, then watch the diffs.
Not very practical, but outside the box.
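A minimal sketch of the diff idea, assuming two snapshots of the page are already available as strings. It uses Python's standard difflib in place of an external diff tool. The function names and the "line contains <tr" rule are illustrative assumptions, not anything from the question. A modified row shows up as one removed line plus one added line.

```python
import difflib


def changed_lines(old_html, new_html):
    """Return (added, removed) lines between two snapshots of a page."""
    added, removed = [], []
    for line in difflib.ndiff(old_html.splitlines(), new_html.splitlines()):
        # ndiff prefixes every line with a two-character tag:
        # "+ " added, "- " removed, "  " unchanged, "? " hint line.
        tag, text = line[:2], line[2:]
        if tag == "+ ":
            added.append(text)
        elif tag == "- ":
            removed.append(text)
    return added, removed


def is_table_row(line):
    """Hypothetical rule: only lines that open a table row are interesting."""
    return "<tr" in line.lower()


def table_row_changes(old_html, new_html):
    """Apply the rule to the changed lines only."""
    added, removed = changed_lines(old_html, new_html)
    return (
        [line.strip() for line in added if is_table_row(line)],
        [line.strip() for line in removed if is_table_row(line)],
    )
```

Per-site rules could then be plain functions like is_table_row, which is one way to get the plugin-style setup the question asks about.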

If you can convert the source into valid XHTML/XML using something like SgmlReader or HtmlTidy, then you could use XSLT. Simply create an XSL template for each site you wish to scrape.

Related

How to programmatically create a PDF from a predesigned template made in InDesign

The goal is to design beautiful templates in InDesign, which are then used to programmatically generate printable PDFs within a special application connected to a database, so that I can fill data from the database into the templates.
I have no idea how to approach this. I found a lot of HTML to PDF conversion related info, but that approach has its limitations.
Did anybody face the same question and might point me in the right direction?
Yes, the scenario you described can be fully handled by InDesign scripting using ExtendScript. I have done this several times in the past and it works quite well. The key, in my opinion, is to have a designer prepare the file so that it is as finished as possible and makes good use of the built-in InDesign automations. That means they will do the layout, but also set up all the paragraph styles, character styles, object styles and possibly GREP styles, as well as master spreads for each different page.
Then the job of the script that you run will mostly be to fill in the contents and to assign the mentioned styles and master spreads as needed. If everything is set up properly, most of the layout should fall into place automatically.
Also, contrary to the comments on your question, I don't think you need InDesign Server for that, especially if you run everything locally anyway.

Hiding the word "joomla" from a script in contact form

Whenever I create a contact form in my Joomla! 3.3.6 site, a script appears in the page's HTML code that contains the word "Joomla" many times. I'd like to replace those occurrences with another word (e.g. Foo) for security reasons. I'd like to know whether I can do so, and how.
That script is:
<script>(function(){var strings={"JLIB_FORM_FIELD_INVALID":"\u0641\u06cc\u0644\u062f \u0646\u0627\u0645\u0639\u062a\u0628\u0631:&#160"};if(typeof Joomla=='undefined'){Joomla={};Joomla.JText=strings;}
else{Joomla.JText.load(strings);}})();</script>
I have no idea whether a plugin or an extension creates it or not.
Thank you
Regards
This script seems to be translating some text that the form needs in its JavaScript, e.g. validation messages. It does this using a JavaScript version of JText, which is part of core Joomla. There is some info on how that works here. Weirdly, there seems to be little information in the official Joomla documentation about it.
The main JText function it is calling appears here: media/system/js/core.js
I'm sure it would be possible to write a plug-in to remove this script before the page is rendered and then to translate any untranslated text with your own scripts. However, I'm not sure I see any security benefit in doing this so it seems a waste of time.
Ultimately, someone sniffing a site for what it is built in is far more likely to see if core files exist by going direct to places like media/system/js/core.js, rather than to scan the code for the word "Joomla" - which would trigger a lot of false-positives (any site which just mentions Joomla) and negatives (any page which doesn't have a form on it). It also does not reveal the version of Joomla, which is the info a hacker would more likely be after.
I think you have to search for the script (e.g. with Notepad++) in the whole directory. It must be a plugin for the contact form that has some inline script in it.
Also, do you use any special third-party plugin? That might be the source of it.
PS: I had a similar experience. I don't remember exactly how I got rid of those words, but like you, I wanted to do it to hide the fact that I'm using Joomla, for security.
It's actually Joomla itself that adds this, from the file: Joomlainstall/libraries/joomla/document/html/renderer/head.php
It is loaded globally from:
Joomlainstall/libraries/cms/html/formbehavior.php
A developer adds that code by using the JText function, for example:
JText::_( 'COM_CONTACT_EMAIL_FORM' )
In my case it was the ContactUs Form plugin that added the JavaScript. If JText is not used, it is not loaded. When I disabled the plugin, the JavaScript was no longer loaded. If you have that plugin enabled, maybe try another contact form?
From a security standpoint, this is bad programming on the part of the Joomla developers, for sure.

Should we create custom pages for all objects?

I noticed that Salesforce doesn't allow you to override the controller function for all objects.
Say you want to do something whenever objects get saved: there is no way to attach the action
unless you create a custom page and include either a standard controller or an extension. I run into
the same limitation if I want to add the same meta-tag on all pages. Is there a better way to do this?
Generally - no. Roughly speaking, if Salesforce doesn't allow you to do something, it's usually a pretty good hint that you're doing it wrong. I realize it sounds like I'm a fanboy, but can you expand your question with a concrete example of why you would want to do something like that? For example, governor limits are evil, annoying etc. - but they force you to write efficient code that doesn't strain the database too much.
if you want to do something whenever objects get saved
That's what triggers are for. Ask yourself whether the "action" you need should happen only from the web UI or also when the save is performed through the API (mass data load, a smartphone application etc).
if you want to add the same meta-tag on all pages
You could maybe pull off a similar result by adding a component to the sidebar. It won't cover all cases (like accessing Reports/Dashboards) but it's hard to say more without knowing what you're really after. Then again - custom VF page overrides won't help you when it comes to Reports either.
I wanted to add this as a comment, but was unable to.
Anyway, for the example that you mentioned in the comment, you can add that jQuery plugin to the Home page sidebar component and activate it only on those custom objects where you want it to run. You might already know that we can deduce which object a record belongs to by looking at the first 3 characters of the record Id. Using this logic, check whether the record belongs to the custom object you want your plugin to act on, and run the plugin if it does.
But as eyescream has pointed out, adding a script to the sidebar has its own limitations: you cannot use the global variables, sidebar components are not loaded on the Reports and Dashboards tabs, etc.
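The record Id check described above can be sketched as follows. This is only an illustration of the logic in Python; in a sidebar component the same check would be written in JavaScript. The prefixes "a0B" and "a0C", the host name in the usage note, and the assumption that the record Id is the first path segment of the URL are all hypothetical, since custom object prefixes differ from org to org.

```python
import re
from urllib.parse import urlparse

# Hypothetical key prefixes of the custom objects the plugin should act on.
# Real prefixes are org-specific and must be looked up in your own org.
TARGET_PREFIXES = {"a0B", "a0C"}

# A Salesforce record Id is 15 or 18 alphanumeric characters long.
_ID_PATTERN = re.compile(r"^[A-Za-z0-9]{15}(?:[A-Za-z0-9]{3})?$")


def record_id_from_url(url):
    """Return the record Id if it is the first path segment, else None."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if segments and _ID_PATTERN.match(segments[0]):
        return segments[0]
    return None


def should_run_plugin(url, prefixes=TARGET_PREFIXES):
    """True when the page shows a record whose Id starts with a target prefix."""
    record_id = record_id_from_url(url)
    return record_id is not None and record_id[:3] in prefixes
```

With these assumptions, should_run_plugin("https://na1.salesforce.com/a0B300000012345") is true, while a standard Account record (prefix 001) or a non-record page such as /home/home.jsp is skipped.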
-ಸಮಿರ್

Write blog in drupal - the best way to insert text to content type

I want to know what the best and most common way is to write an SDK blog in Drupal:
1. Write the blog posts directly into a new content type in Drupal.
2. Write and design the posts in a Word document (or something similar), and later load all the docs into the content type with a process (a module that I will write).
3. Other.
Your best bet is to enter your blog entries directly into Drupal -- like you said into whatever content type you're using for your blogs! If that's an option you (or whoever the blog author(s) may be) are willing to do, better get used to it at the beginning.
There are several problems you may run into if you're choosing to create your content outside of Drupal and pasting it in after the fact. At the minimum I can now think of the following two:
Folks can run into formatting issues when Word styles are copied and pasted into the Drupal editor of your choice (even though some editors have a 'Paste from Word' option). Especially if there are multiple authors and some adhere to proper copying and pasting while others don't, you can end up with posts that have different font sizes, font styles, etc., and that can make the site look less professional. I'd suggest always creating the content within the Drupal editor, regardless of the scenario, but it matters most when several authors are involved, because training and adherence to best practices become more of an issue. To see the formatting problems, give it a test yourself: create some fancy, styled content in Word (not just plain paragraphs, or it won't really be that interesting) and paste it into CKEditor or whatever else you're using. Then inspect the resulting HTML using the 'Source' button on the editor. For this test, don't use the 'Paste from Word' option, which usually helps.
If you have images in your posts, you won't be able to just copy and paste them into the Drupal editor from Word... you'd still have to upload each image and insert it into the post the Drupal way.
So, rather than creating the content once, copying and pasting it and potentially having to do a ton of clean-up after the fact, just get used to creating it in Drupal!
Then -- as your blog grows and you potentially add to it (additional fields, additional features), you'll be able to more fully take advantage of your Drupal environment if your workflow doesn't include an outside program like Word in it!!

dynamic content on pages

I'm trying to migrate all "content" pages on a website to DNN5 Pro.
So I just created all the pages in DNN, added HTML module to the ContentPane and copied and pasted the HTML content from old pages.
The problem is that most of the pages have bits of classic ASP code which do some minor server-side tasks - for example, populate tables with prices fetched from the DB, pre-select the user's country based on their IP address, do some basic date calculations, etc.
Obviously, this code won't work in DNN.
If I had to migrate to PHP, I'd just rewrite these bits of code from classic ASP to PHP, then assign values from PHP to smarty and then would use them in templates.
But as DNN has a completely different architecture, I can't see how similar approach can be used.
The token replacement feature in the HTML module looks like what I need, but it only allows "mapping" tokens provided by DNN.
So, has anyone had a similar issue with DNN, or does anyone know how this should be done?
It seems like you are attempting to subvert the entire point of DotNetNuke. While there are certainly a variety of hacky ways you could try to make this work just like the old site, it's a terrible idea to do so.
Instead, you need to evaluate each of the dynamic sections of the old site and find or create a DotNetNuke module that will replicate that functionality.
To make the initial conversion quicker, you can build your own modules using simple ASP-style inline scripting, but you should definitely use existing modules for things like displaying data in a grid.
You could write code directly in your skin file. Do some logic like:
<% If PortalSettings.ActiveTab.TabID = 33 Then
    ' code here of what you want
End If %>
Where 33 is the page id of the page you want to run custom code on. There are other ways to do this, like creating skin objects or custom modules, but this is probably the easiest thing to do: just write code directly in your skin. Or make a copy of your skin for each page that needs custom code. Again, there are more elegant ways, but this will get it done.
