Which programming language for single-page web scraping? [closed] - screen-scraping

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I want to build (or hire someone to build) a program for Windows. On a command (a right-click menu entry or a keyboard shortcut), the program has to save some data from a single web page, such as the name of the website, the product name, and the product price, to a local database. Which programming language would be the best choice? The number of (affordable) programmers and the possibility of adding extra functionality in the future are also important.
I found, for example, that Python, Java, Ruby, and XPath are used for this job.
Thank You.

Java, Python, and Ruby are all good choices. XPath is not a programming language; it's a query language that lets you extract the data you want from XML or HTML. No matter which language you choose, you will also need to use XPath (all three have XPath libraries available).
Python seems to be the most popular, but the future of its libraries
is also the most uncertain (nobody has bothered to port mechanize to
Python 3 yet; Beautiful Soup has died and then come back).
Java's biggest strength may be that it's already installed on most
Windows machines, but it's also the only one of the three that is not
a scripting language, and therefore development time will likely be
longer.
Ruby is a good choice with excellent scraping libs and plenty of
programmers using it.
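To illustrate the XPath point above, here is a minimal sketch in Python using only the standard library. The page snippet, the class names, the table layout, and the site name are all hypothetical. Note that ElementTree supports only a subset of XPath and needs well-formed markup; real-world HTML usually calls for a more tolerant parser such as lxml.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page snippet standing in for a downloaded page.
PAGE = """<html><body>
  <h1 class="product-name">Blue Widget</h1>
  <span class="price">19.99</span>
</body></html>"""


def extract_product(markup):
    """Pull the product name and price out of the markup with XPath queries."""
    root = ET.fromstring(markup)
    name = root.find(".//h1[@class='product-name']").text
    price = float(root.find(".//span[@class='price']").text)
    return name, price


def save_product(conn, site, name, price):
    """Store one scraped record in a local SQLite table (assumed layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (site TEXT, name TEXT, price REAL)"
    )
    conn.execute("INSERT INTO products VALUES (?, ?, ?)", (site, name, price))
    conn.commit()


conn = sqlite3.connect(":memory:")  # a file path would make the data persistent
name, price = extract_product(PAGE)
save_product(conn, "example.com", name, price)
rows = conn.execute("SELECT site, name, price FROM products").fetchall()
```

The same two steps (an XPath query, then an insert into a local database) look much the same in Java or Ruby; only the library names change.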

Related

Is there any Dictionary Library for C? [closed]

Closed 10 years ago.
I would like to know if there is any dictionary library for C. A dictionary library, literally (it has nothing to do with Python dictionaries, i.e. hash maps), with all the words of the English language and with tools for queries like...
"I want to print all words that begin with C and end with Y".
I won't google it, because I really want to know your opinion on whether there is one that is particularly good.
Thank you!
You might want to start by looking at Aspell. While it mostly functions as a spell-checker, Aspell also has support for using multiple dictionaries at once and intelligently handling personal dictionaries when more than one Aspell process is open at once. I don't believe you have to be connected to the Internet to use it, either.
Wiktionary might also be of help. There are a lot of localized variations to support different languages, and there will probably be a way to ask them to support your language of interest if it is not already there.
There's the amazing Wordnik API, if you don't mind using the Internet for this task. The API is fairly easy to use and supports regex search. The method you are looking for is /words.{format}/search/{query}
It also has methods to retrieve meanings (/word.{format}/{word}/definitions), synonyms (/word.{format}/{word}/relatedWords), and many other things.
There are currently no C wrappers, although it's very easy to use the API directly with libcurl and any JSON or XML parser.
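Whichever word source is used, the query from the question comes down to a pattern match over a word list. A minimal sketch in Python follows; the function name and the sample list are hypothetical, and in practice the words would come from a dictionary file or one of the sources mentioned above.

```python
import re


def words_matching(words, first, last):
    """Return the words that begin with `first` and end with `last`."""
    pattern = re.compile(
        re.escape(first) + ".*" + re.escape(last), re.IGNORECASE
    )
    # fullmatch requires the whole word to fit the pattern
    return [w for w in words if pattern.fullmatch(w)]


# Hypothetical sample; a real run would read a full word list instead.
SAMPLE = ["candy", "cat", "City", "y", "cy", "crazy", "dairy"]
matches = words_matching(SAMPLE, "c", "y")
```

The same filter is straightforward to port to C with POSIX regex functions or a plain character comparison on the first and last letters.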

How to start building a programming language in C? [closed]

Closed 10 years ago.
I would really love to go through the experience of building a compiler, lexer, and so on using C; however, I haven't found a single resource on creating one. I've read the book about creating your own language using Ruby, but it just talks about how C is the best option and doesn't tell you where to go from there.
Are there any good resources for building a language using C? I don't care how long they are, I just want to know how to build one.
One of the nice things about compilers/interpreters is that it doesn't really matter what language they are written in. In the final stage they will just be an executable on someone's machine.
That being said, while writing my compiler (something I am currently doing), I have used several books that have been extremely helpful:
Compiler Construction by Niklaus Wirth
Compilers: Principles, Techniques, and Tools by Alfred Aho, Ravi Sethi, and Jeffrey Ullman
The Wirth book will walk you through all the stages of creating a compiler for a language called Oberon-0. It also has the entire source code for his finished compiler, so you can play around with it on your own machine. The compiler itself was written in Pascal (something else that Wirth created).
The Dragon Book (the Aho, Sethi, and Ullman book) has really good information and examples in C! This may be what you are looking for, but as I said above, the language you write the compiler in isn't all that important.
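As a taste of the first compiler stage, here is a minimal lexer sketch. It is written in Python rather than C, in the spirit of the point above that the implementation language matters little. The token names and the tiny token set are hypothetical and not taken from either book.

```python
import re

# Hypothetical token set for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
# One master pattern with a named group per token kind.
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(source):
    """Split source text into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError("unexpected character %r at %d" % (source[pos], pos))
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("x = 3 * (y + 42)")
```

A parser would then consume this token list; in C the same loop is typically written by hand over a character buffer or generated with a tool such as lex.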

Classroom management software; storing data? [closed]

Closed 10 years ago.
So I am working on a mini-project for the summer to keep my coding skills sharp. I will be using Qt4 and C++ to make a classroom management system for college professors. I just came up with the idea like 10 minutes ago, so I don't have much.
One question I have is: what is the best way to store student/class/assignment information so that the software is still portable and can be used by different schools?
My first guess would be a MySQL database. I need a guru's opinion on this one, though.
Since different sites have different database preferences you might wish to use a layer such as ActiveRecord or PDO or ODBC to abstract out the specific database that your end users want to use. This would allow people to deploy onto PostgreSQL or MySQL or whatever they prefer.
A good choice for single-process server systems could be SQLite3. It's not suitable for all systems, but if your system is designed to scale to a few dozen users at most, it'll probably work fine. (The amount of work you'd need to put into a server to make SQLite3 scale into the hundreds or thousands might argue for planning for a database server environment instead.)
http://www.sqlite.org/
might be a good option. It is embeddable, so you don't need a separate database instance running wherever you deploy it.
Also, http://www.microsoft.com/sqlserver/2005/en/us/compact.aspx is an option.
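To make the embedded-database suggestion concrete, here is a minimal sketch of a student/class/assignment store in SQLite. It uses Python's standard sqlite3 module only to keep the example short; the table and column names are hypothetical. The same SQL should carry over to a Qt/C++ application through Qt's SQL module and its SQLite driver.

```python
import sqlite3

# Hypothetical schema; adjust tables and columns to the real requirements.
SCHEMA = """
CREATE TABLE IF NOT EXISTS students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS classes (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS assignments (
    id INTEGER PRIMARY KEY,
    class_id INTEGER NOT NULL REFERENCES classes(id),
    student_id INTEGER NOT NULL REFERENCES students(id),
    title TEXT NOT NULL,
    grade REAL
);
"""


def open_store(path):
    """Open (or create) the single-file database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


conn = open_store(":memory:")  # a file path gives one portable file per school
conn.execute("INSERT INTO students (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO classes (id, title) VALUES (1, 'Compilers')")
conn.execute(
    "INSERT INTO assignments (class_id, student_id, title, grade) VALUES (?, ?, ?, ?)",
    (1, 1, "Lexer homework", 92.5),
)
report = conn.execute(
    "SELECT s.name, c.title, a.title, a.grade "
    "FROM assignments a "
    "JOIN students s ON s.id = a.student_id "
    "JOIN classes c ON c.id = a.class_id"
).fetchall()
```

Because the whole database lives in one file, moving the data between machines is a matter of copying that file.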

Is there a good tutorial for figuring out what a website is doing so your program can do the same thing? [closed]

Closed 10 years ago.
Is there a good guide or tutorial for people who need to programmatically interact with dynamic websites? There's been a rash of Perl questions about that lately, and I haven't found a good resource to point people toward. I'm asking not because I need one but because I don't want to waste my time writing it if it already exists. Although I'm most interested in Perl, the extra tools and techniques are mostly the same.
Typically, I see these problems in people's questions:
Handling, setting, and saving cookies
Finding and interacting with forms
Handling JavaScript inside your user-agent
especially things like onLoad, onSubmit, and Ajax
Using HTTP sniffer tools
Using Web developer plugins in interactive browsers
Interacting with DOM, screen scraping, etc.
If there's no good tutorial, I'll add it to my list of things to do (unless someone else wants to do it). Along the way, if you don't have a suggestion for an existing tutorial, please suggest the things that you think should be in a new one, including links, your favorite tools, and your own user-agent development experiences. I don't care about the particular language you use.
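As one illustration of the "finding and interacting with forms" item, here is a minimal sketch that lists the forms on a page together with their named input fields. It uses Python's standard html.parser; the class name and the sample page are hypothetical, and a fuller tool would also handle select and textarea elements.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect each form's action, method, and named input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action", ""),
                "method": attrs.get("method", "get").lower(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None and attrs.get("name"):
            # Unnamed inputs (such as a bare submit button) are not submitted.
            self._current["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


# Hypothetical login page.
PAGE = """
<form action="/login" method="POST">
  <input type="text" name="user">
  <input type="hidden" name="token" value="abc123">
  <input type="submit" value="Log in">
</form>
"""

finder = FormFinder()
finder.feed(PAGE)
forms = finder.forms
```

Hidden fields such as the token here are a common reason hand-built requests fail, since the server expects them to be sent back along with the visible fields.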
The best I've seen is a Defcon presentation video.
Look at CPAN, Perl's library of libraries. Some of its HTML parsing libraries are made for talking to dynamic websites.
For example:
http://metacpan.org/pod/HTML::DOM
But do you want a web browser enhanced by Perl, or a standalone Perl app?

Obfuscation and reverse engineering deterrents for C++ Win/OSX app [closed]

Closed 11 years ago.
I've got a C++ app that ships on Windows and OSX. It communicates with our backend using TCP (encrypted with OpenSSL, natch). I'd like to throw up some speed bumps for folks who are trying to reverse engineer the protocol and/or disassemble the executable.
Skype does an excellent job of this, which is why you won't find a lot of apps that speak Skype. Here is a really good read about what it does: http://www.secdev.org/conf/skype_BHEU06.handout.pdf
I'd like some ideas about how to accomplish similar things in our app. Are there commercial products that make code harder to statically analyze? What is the best way to invest my time to accomplish the goals I've listed?
Thanks,
Some simple suggestions for OSX:
Prevent gdb from attaching to your program
http://www.steike.com/code/debugging-itunes-with-gdb/
(this can be worked around, but will keep some casual explorers away)
Have at least some of the code in your product stored outside the text segment of the executable, for example in data, or in an external (encrypted) shared library.
Minimally protect any sensitive string data by not storing it in plain text. Run "strings" against your executable, and if you see anything that might be helpful to someone trying to figure out the protocol, encrypt it.
GCC's -fomit-frame-pointer option can make debugging more painful (but can interact badly with C++ exceptions).
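For the suggestion above about not storing sensitive strings in plain text, one low-effort approach is a build-time helper that encodes each string and emits it as a C array for the application to decode at run time. The sketch below is in Python and uses a simple repeating XOR key; the key, the string, and the array name are hypothetical. XOR with a fixed key is only a speed bump that hides text from "strings", not real encryption.

```python
def xor_bytes(data, key):
    """XOR data with a repeating key; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def as_c_array(name, data):
    """Render bytes as a C array declaration to paste or generate into a header."""
    body = ", ".join("0x%02x" % b for b in data)
    return "static const unsigned char %s[%d] = { %s };" % (name, len(data), body)


# Hypothetical key and protocol string.
KEY = b"\x5a\xc3\x17"
secret = b"LOGIN %s %s"

encoded = xor_bytes(secret, KEY)
declaration = as_c_array("kLoginFormat", encoded)

# The C++ side would run the same XOR loop at run time; shown here as a check.
decoded = xor_bytes(encoded, KEY)
```

Decoding only at the point of use, and clearing the buffer afterwards, keeps the plain text out of the binary and reduces the time it spends in memory.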
If I remember correctly, Skype is using something similar (maybe they pay them to implement it in Skype, who knows) to the "Code Guards" described in:
https://www.cerias.purdue.edu/tools_and_resources/bibtex_archive/archive/2001-49.pdf

Resources