Saving code in a database: what pitfalls should I be careful about?

I am designing a system which takes user-submitted code and saves it in a database. The code can be in any language: Ruby, Python, Elixir, JavaScript, etc. There's no restriction on language. The code saved in the database is never meant to be run. It will be displayed in a blog article or converted into a file for download. Similar examples are GitHub Gist and Cacher, both of which take user-submitted code and display it on a website.
How do I make sure user-submitted code is sanitised and safe to display on a web page with a code highlighter?
What processing do I need to do on the code so that I can safely display it? I don't want to impose strict restrictions on users.
Are there any gotchas I need to be aware of?
Any idea how those websites implement this feature?
I am using Elixir and the Phoenix framework. Are there any pitfalls I should be careful about? I am thinking of using the Phoenix.HTML module to escape the code. I just want to be sure that my approach doesn't have known loopholes.

I think you are looking for this https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet
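The relevant rule there is contextual output encoding: store the submitted code verbatim, and HTML-escape it at render time before the highlighter touches it. A minimal sketch of the idea, shown in Python only because the principle is language-agnostic (the Phoenix.HTML escaping you mention serves the same purpose in Elixir):

import html

def render_snippet(user_code):
    # Escape <, >, & and quotes so the snippet is inert markup,
    # e.g. "<script>alert(1)</script>" becomes harmless text.
    escaped = html.escape(user_code)
    # Wrap in <pre><code> for a client-side highlighter to style.
    return "<pre><code>{}</code></pre>".format(escaped)

print(render_snippet("<script>alert(1)</script>"))

Keeping the raw code untouched in the database means downloads get the original bytes, while the web page only ever receives escaped markup; the syntax highlighter can then run client-side on that escaped text.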

Related

Hiding the word "joomla" from a script in a contact form

Whenever I create a contact form in my Joomla! 3.3.6 site, a script appears in the page's HTML code that contains the word Joomla many times. I'd like to replace those occurrences of Joomla with another word (e.g. Foo) for security reasons. I'd like to know whether I'm able to do so, and how.
That script is:
<script>(function(){var strings={"JLIB_FORM_FIELD_INVALID":"\u0641\u06cc\u0644\u062f \u0646\u0627\u0645\u0639\u062a\u0628\u0631:&#160"};if(typeof Joomla=='undefined'){Joomla={};Joomla.JText=strings;}
else{Joomla.JText.load(strings);}})();</script>
I have no idea whether a plugin or an extension creates it or not.
Thank you
Regards
This script seems to be translating some text required for the form to use in its JavaScript, e.g. validation messages. It does this using a JavaScript version of JText, which is part of core Joomla. There is some info on how that works here. Weirdly, there seems to be little information about it in the official Joomla documentation.
The main JText function it is calling appears here: media/system/js/core.js
I'm sure it would be possible to write a plug-in to remove this script before the page is rendered and then to translate any untranslated text with your own scripts. However, I'm not sure I see any security benefit in doing this so it seems a waste of time.
Ultimately, someone sniffing a site for what it is built in is far more likely to see if core files exist by going direct to places like media/system/js/core.js, rather than to scan the code for the word "Joomla" - which would trigger a lot of false-positives (any site which just mentions Joomla) and negatives (any page which doesn't have a form on it). It also does not reveal the version of Joomla, which is the info a hacker would more likely be after.
I think you have to search for the script (e.g. via Notepad++) in the whole directory. It must be a plugin for the contact form that has some inline script in it.
Also, do you use any special third-party plugin? That might be the source of it.
PS: I had a similar experience. I don't know exactly how I got rid of those words, but like you, I wanted to hide the fact that I'm using Joomla, for security.
It's actually Joomla itself that adds this, from the file: Joomlainstall/libraries/joomla/document/html/renderer/head.php
And it is loaded globally from:
Joomlainstall/libraries/cms/html/formbehavior.php
The developer adds that code by using the JText function, for example:
JText::_( 'COM_CONTACT_EMAIL_FORM' )
In my case it was the ContactUs Form plugin that added the JavaScript. If JText is not used, the script is not loaded; when I disabled the plugin, the JavaScript was no longer loaded. If you have that plugin enabled, maybe try another contact form?
From a security standpoint, it is bad programming by the developers of Joomla, for sure.

CakePHP Beginner: Advice needed, Everything on a single view or multi part forms

Thanks in advance for any help offered and patience for my current web-coding experience.
Background:
I'm currently attempting to develop a web-based application for my family's business. There is a current version of this system that I developed in C#; however, I want to make the system web-based and, in the process, learn CakePHP and the MVC pattern.
Current problem:
I'm currently stuck on a controller that's supposed to take care of a PurchaseTicket. This ticket will have an associated customer, line items, totals, etc. I've been trying to develop a basic 'add()' function for the controller; however, I'm having trouble with the following:
I'm creating a view with everything on it: a button for searching for a customer, a button to add line items, and a save button. Since I'm used to developing desktop applications, I suspect I might be trying to transfer the same logic to the web. Is this something that would be recommended or doable?
I'm running into basic problems like 'searching for a customer'. From the New Ticket page I'm redirecting to the customer controller, searching, and then putting the result in a session variable or posting it back, but as I continue with the rest of the required information, I'm ending up with a bit of "spaghetti" code. Should I use a multi-part form? If I do, I break the visual design of the application.
Right now I've ended up instantiating my PurchaseTicket model and putting it in a session variable. I did this to save intermediate data; however, I'm not sure whether instantiating a model conforms to CakePHP standards or the MVC pattern.
I apologize for the length, this is my first post as a member.
Thanks!
Welcome to Stack Overflow!
So it sounds like there are a few questions, all with pretty open-ended answers. I don't know whether this will end up being an answer as such, but it's more information than I could put in a comment, so here I go:
First and foremost, if you haven't already, I'd recommend doing the CakePHP Blog Tutorial to get familiar with Cake, before diving straight into a conversion of your existing desktop app.
Second, get familiar with CakePHP's bake console. It will save you a LOT of time if you use it to get started on the web version of your app.
I can't stress how important it is to get a decent grasp of MVC and CakePHP on a small project before trying to tackle something substantial.
Third, the UI for web apps is definitely different to desktop apps. In the case of CakePHP, nothing is 'running' permanently on the server. The entire CakePHP framework gets instantiated, and dies, with every single page request to the server. That can be a tricky concept when transitioning from desktop apps, where everything is stored in memory, and instances of objects can exist for as long as you want them to. With desktop apps, it's easier to have a user go and do another task (like searching for a customer), and then send the result back to the calling object, the instance of which will still exist. As you've found out, if you try and mimic this functionality in a web app by storing too much information in sessions, you'll quickly end up with spaghetti code.
You can use AJAX (google it if you don't already know about it) to update only parts of a page and get a more streamlined UI, which sounds like something you'll need to do. To get a general idea of the possibilities, you might want to take a look at Bamboo Invoice. It's not built with CakePHP, but with CodeIgniter, which is another open-source PHP MVC framework. It sounds like Bamboo Invoice has quite a few functionalities similar to what you're describing (an invoice has line items, totals, a customer, etc.), so it might help you get an idea of how you should structure your interface - and if you want to dig into the source code, how you can achieve some of the things you want to do.
Bamboo Invoice uses Ajax to give the app a feel of 'one view with everything on it', which it sounds like you want.
Fourth, regarding the specific case of your customer search, storing stuff in a session variable probably isn't the way to go. You may well want to use an autocomplete field, which sends an Ajax request to the server each time a character is entered in the field and displays the list of suggestions / matching customers that the server sends back. See an example here: http://jqueryui.com/autocomplete/. Implementing an autocomplete isn't totally straightforward, but there should be plenty of examples and tutorials all over the web.
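The server side of such an autocomplete is just an action that takes the typed text and returns matching customers as JSON; the jQuery UI widget handles the rest. A rough sketch of that request/response shape in Python with Flask, purely as an illustration (the route, the customer list and the field names are invented; in CakePHP this would be a controller action returning JSON):

from flask import Flask, request, jsonify

app = Flask(__name__)

# Stand-in for a database query against your customers table.
CUSTOMERS = ["Alice Allen", "Bob Brown", "Carla Cruz"]

@app.route("/customers/autocomplete")
def autocomplete():
    term = request.args.get("term", "").lower()   # jQuery UI sends ?term=<typed text>
    matches = [name for name in CUSTOMERS if term and term in name.lower()]
    return jsonify(matches[:10])                  # cap the suggestion list

The widget populates its dropdown from that JSON, and the chosen customer's id can go into a hidden form field instead of a session variable.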
Lastly, I obviously don't know what your business does, but have you looked into existing software that might work for you, before building your own? There's a lot of great, flexible web-based solutions, at very reasonable prices, for a LOT of the common tasks that businesses have. There might be something that gives you great results for much less time and money than it costs to build your own solution.
Either way, good luck, and enjoy CakePHP!

How to collect data from a website

Preface: I have broad, college-level knowledge of a handful of languages (C++, VB, C#, Java, many web languages), so go with whichever you like.
I want to make an Android app that compares numbers, but in order to do that I need a database. I'm a one-man team, and the numbers get updated biweekly, so I want to grab those numbers off a wiki that gets updated as well.
So my question is: how can I access information from a website using one of the languages above?
What I understand the problem to be: some entity generates a data set (i.e. numbers) every other week, and you need to download that data set for treatment (e.g. sorting).
Ideally, the web site maintaining the wiki would provide a service, such as a RESTful interface, to easily gather the data. If that were the case, I'd go with any language that makes HTTP requests and responses easy to manipulate and makes your data manipulation easy. As a previous poster said, Java would work well.
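If such an API exists, the whole job is a few lines in any of the languages you listed. For example, in Python (the endpoint and the response shape here are invented; check the wiki's documentation for the real ones):

import requests

# Hypothetical JSON endpoint; substitute whatever API the wiki actually exposes.
resp = requests.get("https://example.org/api/numbers", params={"format": "json"})
resp.raise_for_status()        # fail loudly on HTTP errors
numbers = resp.json()          # parsed straight into Python lists/dicts
print(sorted(numbers))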
If you are stuck with the wiki page, you have a couple of options. You can parse the HTML your browser receives (Perl comes to mind as a decent language for that). Or you can use tools built for that purpose such as the aforementioned Jsoup.
Your question also mentions some implementation details such as needing a database. Evidently, there isn't enough contextual information for me to know whether that's optimal, so I won't address this aspect of the problem.
http://jsoup.org/ is a great Java tool for accessing content on HTML pages.
Consider https://scraperwiki.com/ - it's a site where users can contribute scrapers. It's free as long as you let your scraper be public. The results of your scraper are exposed as CSV and JSON.
If you don't know what a "scraper" is, google "screen scraping" - it's a long and frustrating tradition for coders, who have dealt with the same problem you have since the beginning of networked computing.
You could check out: http://web-harvest.sourceforge.net/
For Python, BeautifulSoup is one of the most tolerant HTML parsers out there. The documentation also lists similar libraries in Ruby and Java, so you'll probably find something relevant there.
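To make that concrete, here is a minimal BeautifulSoup sketch (the URL and the assumption that the numbers live in the first table on the page are made up; point it at the real wiki page and adjust the selectors):

import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.org/wiki/BiweeklyNumbers")  # hypothetical wiki page
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")

# Grab the text of every cell in the first table on the page.
table = soup.find("table")
rows = [[cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        for tr in table.find_all("tr")]
print(rows)

From there, the rows can be written into your app's database on each biweekly update.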

Apache module FORM handling in C

I'm implementing an Apache 2.0.x module in C, to interface with an existing product we have. I need to handle FORM data, most likely using POST but I want to handle the GET case as well.
Nick Kew's Apache Modules book has a section on handling form data. It provides code examples for POST and GET, which return an apr_hash_t of the key+value pairs in the form. parse_form_from_POST marshals the bucket brigade and flattens it into a buffer, while parse_form_from_GET can simply reference the URL. Both routines rely on a parse_form_from_string routine to walk through each delimited field and extract the information into the hash table.
That would be fine, but it seems like there should be an easier way to do this than adding a couple hundred lines of code to my module. Is there an existing module or routines within apache, apr, or apr-util to extract the field names and associated data from a GET or POST FORM into a structure which C code can more easily access? I cannot find anything relevant, but this seems like a common need for which there should be a solution.
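For what it's worth, the transformation those routines perform is just application/x-www-form-urlencoded parsing: split the string on '&', split each piece on '=', percent-decode both halves, and store them in a map keyed by field name. A sketch of that data flow in Python, only to show what the C code has to end up doing, not the Apache API itself:

from urllib.parse import parse_qs

raw = "name=Alice+Smith&note=a%26b&note=c"     # example query string / POST body
# parse_qs does the splitting and percent-decoding; repeated keys collect into
# lists, much like accumulating values in an apr_hash_t keyed by field name.
form = parse_qs(raw)
print(form)    # {'name': ['Alice Smith'], 'note': ['a&b', 'c']}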
I switched to G-WAN, which offers a transparent ANSI C scripting interface for GET and POST forms (and many other goodies like charts, GIF I/O, etc.).
A couple of AJAX examples are available on the G-WAN developer page.
Hope it helps!
While, on its surface, this may seem common, CGI-style content handlers written in C on Apache are pretty rare. Most people just use CGI, FastCGI, or one of the myriad frameworks such as mod_perl.
Most of the C apache modules that I've written are targeted at modifying the particular behavior of the web server in specific, targeted ways that are applicable to every request.
If it's at all possible to write your handler outside of an apache module, I would encourage you to pursue that strategy.
I have not yet tried any solution, since I found this SO question as a result of my own frustration with the example in the "Apache Modules" book as well. But here's what I've found, so far. I will update this answer when I have researched more.
Luckily, it looks like this is now a solved problem in Apache 2.4 thanks to the ap_parse_form_data function.
No idea how well this works compared to your example, but here is a much more concise read_post function.
It is also possible that mod_form could be of value.

How do screen scrapers work? [closed]

I hear about people writing these programs all the time and I know what they do, but how do they actually do it? I'm looking for general concepts.
Technically, screen scraping is any program that grabs the display data of another program and ingests it for its own use.
Quite often, screen scraping refers to a web client that parses the HTML pages of a targeted website to extract formatted data. This is done when a website does not offer an RSS feed or a REST API for accessing the data in a programmatic way.
One example of a library used for this purpose is Hpricot for Ruby, which is one of the better-architected HTML parsers used for screen scraping.
Lots of accurate answers here.
What nobody's said is don't do it!
Screen scraping is what you do when nobody's provided you with a reasonable machine-readable interface. It's hard to write, and brittle.
As an example, consider an RSS aggregator, then consider code that gets the same information by working through a normal human-oriented blog interface. Which one breaks when the blogger decides to change their layout?
Of course, sometimes you have no choice :(
In general, a screen scraper is a program that captures output from a server program by mimicking the actions of a person sitting in front of a workstation using a browser or terminal-access program. At certain key points the program interprets the output and then takes an action or extracts certain amounts of information from it.
Originally this was done with character/terminal output from mainframes, for extracting data from or updating systems that were archaic or not directly accessible to the end user. In modern terms it usually means parsing the output of an HTTP request to extract data or to take some other action. With the advent of web services this sort of thing should have died away, but not all apps provide a nice API to interact with.
A screen scraper downloads the HTML page and pulls out the data of interest, either by searching for known tokens or by parsing it as XML or some such.
In the early days of PCs, screen scrapers would emulate a terminal (e.g. IBM 3270) and pretend to be a user in order to interactively extract or update information on the mainframe. In more recent times, the concept is applied to any application that provides an interface via web pages.
With the emergence of SOA, screen scraping is a convenient way to service-enable applications that aren't. In those cases, scraping the web page is the more common approach taken.
Here's a tiny bit of screen scraping implemented in Javascript, using jQuery (not a common choice, mind you, since scraping is usually a client-server activity):
// Show my SO reputation score
var repval = $('span.reputation-score:first');                // first reputation element on the page
var username = repval.prev().attr('href').split('/').pop();   // the preceding user link's href ends in the username
alert('StackOverflow User "' + username + '" has (' + repval.html() + ') Reputation Points.');
If you run Firebug, copy the above code and paste it into the Console and see it in action right here on this Question page.
If SO changes the DOM structure / element class names / URI path conventions, all bets are off and it may not work any longer - that's the usual risk in screen scraping endeavors where there is no contract/understanding between parties (the scraper and the scrapee [yes I just invented a word]).
Typically you have an HTML page that contains some data you want. What you do is write a program that will fetch that web page and attempt to extract the data. This can be done with XML parsers, but for simple applications I prefer to use regular expressions to match a specific spot in the HTML and extract the necessary data. Sometimes it can be tricky to create a good regular expression, though, because the surrounding HTML appears multiple times in the document. You always want to match a unique item as close as you can to the data you need.
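A rough illustration of that approach in Python (the URL and the pattern are invented; in practice you anchor the regex on whatever unique markup surrounds the value you want):

import re
import requests

html = requests.get("https://example.org/report").text   # hypothetical page

# Anchor on a unique nearby landmark ("Total posts:") rather than on generic tags.
match = re.search(r"Total posts:\s*<b>(\d+)</b>", html)
if match:
    print(int(match.group(1)))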
An earlier answer said: "Screen scraping is what you do when nobody's provided you with a reasonable machine-readable interface. It's hard to write, and brittle."
Not quite true. I don't think I'm exaggerating when I say that most developers do not have enough experience to write decent APIs. I've worked with screen-scraping companies, and often the APIs are so problematic (ranging from cryptic errors to bad results), and so often lack the full functionality that the website provides, that it can be better to screen scrape (web scrape, if you will). The extranet/website portals are used by more customers/brokers than API clients and are thus better supported. In big companies, changes to extranet portals etc. are infrequent, usually because the work was originally outsourced and is now just maintained. I'm referring more to screen scraping where the output is tailored, e.g. a flight on a particular route and time, an insurance quote, a shipping quote, etc.
In terms of doing it, it can be as simple as using a web client to pull the page contents into a string and then using a series of regular expressions to extract the information you want.
// WebClient's constructor takes no URL; the address goes to DownloadString
string pageContents = new WebClient().DownloadString("http://www.stackoverflow.com");
int numberOfPosts = // regex match
Obviously in a large scale environment you'd be writing more robust code than the above.
An earlier answer said: "A screen scraper downloads the HTML page and pulls out the data of interest, either by searching for known tokens or by parsing it as XML or some such."
That is a cleaner approach than regexes... in theory. In practice, though, it's not quite as easy, given that most documents need to be normalized to XHTML before you can XPath through them; in the end we found that fine-tuned regular expressions were more practical.

Resources