which one is better for screen scraping? simple html dom or snoopy ??
i use simple html dom and find it comfortable..
does snoopy has any advantage over simple html dom?
my requirements : if i wanna scrape contents from a page(after login)..
simple html dom is easy but it takes a lotta time to print the results..
Is Snoopy that well known / mature of a package?
If it's not, then all other things being equal, I'd probably go with generic HTML DOM code - especially if the scraping is somewhat simple.
But only you know when your code is starting to get too big, unmanageable, etc., at which point it might be better to look at another tool out there like Snoopy.
(Which, admittedly, I don't have experience with; it's apparently at http://sourceforge.net/projects/snoopy/ for those not familiar with it - "Snoopy is a PHP class that simulates a web browser. It automates the task of retrieving web page content and posting forms, for example.")
The real reason I'm posting, even though I don't know Snoopy per se and thus can't definitively answer your question, is to ask if you've considered using Selenium (http://www.seleniumhq.org/) instead of Snoopy.
Selenium is a fairly well-known testing tool, and it occurred to me that one of the nice things about using that for what you're doing (if you can) is that it has built in tests.
The reason that's good is that screen scraping is kind of an inherently brittle task - if the target site changes something, blam, your scraping fails. So it's kind of a nice design to have an automated scrape/test-that-scraping-worked system.
Something to think about, anyway.
I've stumbled into BeautifulSoup, which is Python-based. I suppose there are a bunch of others too.
Looks like Snoopy is PHP-based, and hence can be run server-side only. Is this what you are really looking for? What are your requirements? Please elaborate on that.
Related
I have been using both in my projects and sometimes I find the need to use a Material UI component within a bootstrap component and the UI displays as I would expect. I have been advised though not to use this approach. Is there any reason why since both are using the grid and can be flexed?
I tend to be verbose so I'll put the concise answer up top here:
Conclusion:
Whoever said it was bad to use both might just be expressing their
opinion, in reality saying it's bad to use both really lacks context
in what you're designing. #user3770494 made a very good point- but the
point, while valid and truthy about the build size, it does depend on
the scope of the application. If it's an intraoffice application with
everyone on a fiber network local it'd all be cached in memory
anyways... but (not that you know me) I would not judge you negatively
if you mixed them together-- unless it was for a MAJOR million user
application to run on mobile (an very low devices), desktop, and other
devices around the world requiring real(ish) time updates, and
streaming dynamic content with 10000s of active users at any givin
time 24/7.
In all truth- if it's not life or death-- I'd say use both- and also
do your own. The experience of understanding more then one thing is
better then just "committing to a single solution" for personal
growth.
The rest of this reading is optional - you're welcome :)
I personally have used both in production applications (both together, and independently)... I've also done it all from scratch... (CSS is my least favorite part of job things I do - luckily I have a coworker who is great at it) Here are my thoughts:
Warning: I tend to be verbose.
Disclaimer:
As someone who likes function over form, form is an afterthought that is nit picky for clients to ask about tenuous little changes. I am going to try to leave my opinion on "how each option looks for feels" out of this as much as I can.
Also I'm looking at your question in a current choice I'm making- which is using ReactJS / create-react-app to make "demo" projects for touch screen embeaded systems- so I am going to roll out half a dozen mock programs for demos nothing that really does anytihng exactly (CCscanner, barcode scanner, gps, webcam integration, fun stuff like that). So I'm researching what will be easist for me to just commit to for this "beacuse I'm bored and got a pi3b+board for fun).
Answer:
If you have the time, dedication and resources, there is really
nothing wrong with mixing them together. But you just need to think
about the time/cost/benifit of it. DIY to make the end user happy-
even if you mix them. Totally yourself is remaking the wheel- but you
can always pull in boostrap styles etc.
The inherent risks is that if you use both- make sure you don't "Mix them" to much- because then you will always have issues with trying
to ever do version changes on either one.
I like a lot of MaterialUIs things, but I honestly dislike how a few things look (style wise by default)- functionally I like it better
then bootstrap, but at the same time, I do not like MaterialUIs React
programming style (as a purist who hates CSS But knows how important
it is- having to use !important ever ever ever ... is big big big
nonononononononon) compared to whatever way my coworkers and I use
for conventions. Not saying it's better or worse then my own
preference- but a few things about it really irk me (even if they are
done for good reasons).
Bootstrap has a lot of choices to use for- I like how it looks better, I like how it plays with ReactJS better- but there is
reactstrap vs. react-bootstrap (which is why I found your post trying
to figure which one to use for this demo thing I'm doing).
Most recently (for production projects) I do try to stick just with one but normally I'm making systems that are function over form. So
they don't really care that much about the UI elements, it's about
how to use it not make it pretty. So I stick with using a single one
just to make my job easier- and usally still override styles myself
... if the original styles piss me off. But I do not stick with one
because "It's bad form to use both" I just stick with one for that
reason mentioned above. I'd actually say if bandwidth is not an
issue- it's quality to use both- but only use the parts of them that
you actually use.
(I noticed someone once importing full jQuery when the only thing that they used from it was $.ajax (all be it a lot but still) ... I was like... is that not overkill?!) -- So if you use both and want to keep things slim- just make sure on compilation you're only importing what you're using. Pythonically I'm saying- never use import * from module (however you express that in Javascript as a concept - webpack/gulp/whomever should take care of most of that for you). Assuming you're using ES6/7 style Javascript.
I encountered this scenario recently and opted to use both but for specific tasks. With it being a responsive web app Bootstrap made the most sense to use for the layout and Material UI for the widgets (just my personal preference).
You can definitely use both but you should be aware of the following:
There will be a lot of overlap in offered features unless you take time and effort to manage it by carefully picking elements from each of those libraries. And even then you will face situations where you can't avoid overlap. This basically results in a bigger bundle.
You will have to maintain theming variables for both systems to have a consistent presentation across your app. Even then, there will be situations like where your table checkboxes look different from your form checkboxes because they are from different libraries.
You have to learn and understand both of the systems. It means sometimes it'll be harder to find what's causing a certain bug. You'll also be spending more time deciding which library to use for which component.
Overall, it's more work for you to work with two different systems and a higher chance of things looking inconsistent. That said, mixing in things like a grid system with limited theming might not be too bad.
If possible, I highly encourage you to choose one system and stick with it.
Using both will increase your production js size.
Material Ui and bootstrap both provide components with basic styles like buttons so choose one.
You can use bootsrap grid for structure only or even go with flex.
Here are the numbers
min+gzip 26k
gzip 90k
original 450+k
And React doesn't have many features in it's documentation. Why is it so big?
I have a feeling that it's the implementation of lightweight DOM. But I want to be sure.
React does quite a bit! The biggest nonobvious part of React is probably the events system -- not only does React implement its own event dispatching and bubbling, it normalizes common events across browsers so that you don't need to worry as much about it. For example, SelectEventPlugin is a built-in event "plugin" which provides an onSelect event which behaves the same way in all browsers.
The virtual DOM implementation does take a decent amount of code as well; a lot of effort is spent on performance optimization, which is why the unminified version includes ReactPerf, which is a tool for measuring rendering performance. In updating the DOM, React also does some convenient things for you like maintaining any input selection and keeping the current scroll position the same.
React also has a few other tricks up its sleeve. One of the coolest is that it fully supports rendering a component to an HTML string even if you don't have a browser environment, so you can send down a page that works even before JS is loaded.
What are you comparing React against? react-15.0.2.min.js is 43k (gzipped), but jQuery is 33k while ember-2.6.0.prod.js is 363k (also gzipped). Obviously these frameworks don't do exactly the same things but they do have a lot of overlap, so I think the comparison is reasonable.
If you're worried about download size, I wouldn't worry too much about JS code contributing to that. Here's an ad which I see on the right side of my Stack Overflow page right now:
Its download size is 95k -- I wouldn't think twice about including an image like that in a page because (even if I was worried about performance) there are usually so many other performance-related things to fix that are more lucrative.
In short, I don't think React is that big, and what size it does have comes from the many small things it does to help you out. You cite React's small API as a reason that React's code should be small, but a better question might be, "How is React's API able to be so simple given all the things it does for you?"
…but that's a totally separate question. :) Hope I answered your question -- happy to expand if I didn't.
A couple of thoughts.. I had some of the same concerns with it's size, but after using it, no joke, I would use it if it was 5MB in size. It's just that good. That said, I did decide to reduce as many dependencies on other libraries as possible. I was using jquery for two things.. ajax and the automatic ajax response and request handling (beforeSend, etc) that would handle when a token was in a response after login, and then make sure every ajax request added it to the Authorization header before sending. I replaced this with a super small simple bit of native javascript. Works great. Also, I was trying to use _underscore. I've replaced it with lodash, which is smaller and faster, although presently I am not using it so may remove it altogether.
Here is one article, of many on google, that I found that has some alternatives using native JS over jquery.
http://www.sitepoint.com/jquery-vs-raw-javascript-1-dom-forms/
I'm using CakePHP 2.3 and would like to know how to properly go about building a CakePHP website using test-driven development (TDD). I've read the official documentation on testing, read Mark Story's Testing CakePHP Controllers the hard way, and watched Mark Story's Win at life with Unit testing (PDF of slides) but am still confused. I should note that I've never been very good about writing tests in any language and don't have a lot of experience with it, and that is likely contributing to my confusion.
I'd like to see a step by step walkthrough with code examples on how to build a CakePHP website using TDD. There are articles on TDD, there are articles on testing with CakePHP, but I have yet to find an in-depth article that is about both. I want something that holds my hand through the whole process. I realize this is somewhat of a tall order because, unless my Google-fu is failing me, I'm pretty sure such an article hasn't yet been published, so I'm basically asking you to write an article (or a long Stack Overflow answer), which takes time. Because this is a tall order, I plan to start a bounty on this question worth a lot of points once I'm able to in order to better reward somebody for their efforts, should anybody be willing to do this. I thank you for your time.
TDD is a bit of falacy in that it's essentially just writing tests before you code to ensure that you are writing tests.
All you need to do is create your tests for a thing before you go create it. This requires thought and analysis of your use cases in order to write tests.
So if you want someone to view data, you'll want to write a test for a controller. It'll probably be something like testViewSingleItem(), you'll probably want to assertContains() some data that you want.
Once this is written, it should fail, then you go write your controller method in order to make the test pass.
That's it. Just rinse and repeat for each use case. This is Unit Testing.
Other tests such as Functional tests and Integration tests are just testing different aspects of your application. It's up to you to think and decide which of these tests are usefull to your project.
Most of the time Unit Testing is the way to go as you can test individual parts of the application. Usually parts which will impact on the functionality the most, the "Critical path".
This is an incredibly useful TDD tutorial. http://net.tutsplus.com/sessions/test-driven-php/
I've been tasked with developing an application for internal use in our company, the end-users being the operators of the call center and their superviser.
There would be therefore be two types of users: operator & supervisor.
The operator view would be purely passive: ability to see their monthly goals (calls taken, calls answered etc) as well as those of their "cell"(group of operators+supervisor) and other "cells" and that's it.
The supervisor one however would be able active: they need to be able to set monthly goals for their subordinates as well as view them.
The application needs to live in the browser, and that browser is...sigh, either IE6 or IE7. So my question is, should I use something client-side like backbone.js, or something server-side like, say, Code Igniter?
I need to be able to develop it in a short time frame and add features as requested.
Any advice is greatly appreciated.
Thanks.
Firstly, IE6 is very old. It is still supported but support will end fairly soon (about a year, I think), so your company needs to have an upgrade plan in place. IE7 has a bit longer to run, but will also fall out of support at some point. Your company must make plans for an upgrade process. And you need to make sure that anything you write today will continue working on upgraded browser versions when they come.
Okay, that aside, today you need to support these browsers.
The first advice I would give is to use jQuery for all your Javascript needs. It is specifically targeted at being compatible with IE6 and above, and it hides a lot of the complexities with cross-browser and old-browser support from the developer, but also works well with newer browsers too.
IE6/7 do have a number of serious bugs and omissions in their Javascript support, but these can generally be worked around. Using jQuery will mean you can forget about most of them.
In the main, I would recommend against using a client-side framework like Backbone. Stick with server-driven simple HTML pages. Maybe a bit of ajax using jQuery, but nothing too much more than that. IE6 and IE7 are very very slow browsers, so the less work you make them do, the happier they'll be. Put too much Javascript on the front end and you could wind up with a system that is too slow to be usable. Also, a lot of modern JS libraries don't support IE6 at all. I'm not sure about Backbone, but even if it works now, you can't be sure that later versions will continue supporting it. (even with jQuery, some developers are starting to push for IE6 to be dropped. I don't think it'll happen just yet though)
Make sure you specify a valid <!DOCTYPE> for all your pages. Without it, IE will drop into quirks mode. This will make it very difficult to upgrade your site to a newer browser later on. There are many valid doctypes, but it doesn't really matter which one you use, as long as it's valid. Therefore, I suggest using the HTML5 doctype, simply because it's valid and it's short and simple: <!DOCTYPE html> -- that's all there is to it.
CSS is where you'll really have some pain if you're used to working with modern browsers. IE6 in particular has terrible CSS support. For IE6/7 CSS compatibility, I recommend using the Quirksmode.org compatibility charts to find out what does and does not work in those browsers.
Finally, make sure you read up on the well known IE6 bugs. There are a lot of them, and they often cause weird and wonderful rendering errors on perfectly valid code. Knowing about them in advance will help you avoid them, and help you recognise them when you (inevitably) hit them.
Hope that helps.
Oh, and good luck -- it sounds like you'll need it! ;-)
Developing websites or web applications for Internet Explorer 6 is possible as long as you do not use things such as HTML5 Javascript functions.
As for the live part, maybe you could use AJAX long polling or something similar. Example: http://techoctave.com/c7/posts/60-simple-long-polling-example-with-javascript-and-jquery
Hope this helps!
I know it has great out-of-the-box features but is it easy to customize?
Like when I query stuff from the database or change css layouts.
Is it faster to create my own modules for it or just go on and write everything from scratch using frameworks like Cake
I'm currently working on an Elgg-based site and I absolutely hate it. The project was near completion when I stepped in, but the people who created were no longer available, so I took it over as a freelancer.
As a personal impression, you are much better off writing the app from scratch in a framework. I don't know if the people before me butchered it, but the code looks awful, the entity-based relationship model is wierd to say the least and debugging is horrendous. Also, from my point of view, it doesn't scale very well. If you were to have a consistent user base, I'd be really really worried.
It keeps two global objects ($vars and $CONFIG) that have more than 5000(!) members loaded in memory on each page. This is a crap indicator.
I've worked extensively with cake. With Elgg, for about a month in a project that is on QA stage right now.
My advise is: if you need something quick with a lot of features and you only need to customize a little, go with Elgg.
If you're going to customize a lot and you can afford the development of all the forums, friends, invites, etc. features, go with Cake or any other MVC framework.
I have been working on a Elgg site for the past month or so, its code is horrible, however it's not the worst I've seen :D. it's not built for programmers like Drupal is :D. But it's not too bad. Once I got a handle on the metadata functions and read most of the code I was able to navigate it well and create custom modules and such.
What would help immensely would be some real documentation and explanation of the Elgg system. I don't think that's going to happen though :).
Out of the box there are a few problems, there are some bugs that haven't been fixed for a while and I've had to go in and fix them myself. Overall, you can make it pretty and it has some cool functions, but i wouldn't dive in until i had read the main core code to get a handle on what's happening on the backend.
Oh and massive use of storing values in globals. and a crap ton of DB calls (same with Drupal though).
i wonder if the use of storing everything, and i mean everything for your site in the globals will really hinder the server if you have a massive user load.
If you want to build a product based on a social networking platform/framework then Elgg is definately a good way to go. The code is not that bad if you actually look before leaping and doing what elgg expects. You go against its processes and structures and it will leave you beaten by the side of the road.
Developing modules/plugins or editing CSS is easy and Elgg does give you great flexability to basically build your own product ontop of it. Dolphin, as comparrison, does not allow you to do anything outside of what it expects you to do.
If you however just need a framework (not primarily for social networking etc) with some user based functionality then i suggest Cake, or if your project is HUGE then maybe Symfony or Zend. They all have plugins you can download and use/hack which would be easirer to adjust for personalised needs.
To show what you can do with elgg here is a site Mobilitate we built with Elgg 1.7. This is a very complicated website and was built ontop of Elgg.
We are starting a new project with Elgg 1.8. The new version is a major improvement they have made a lot of elements easier, incorporated better JS and CSS implementation/structure and have better commented their own code.
Elgg's database schema is horrific. They've essentially implemented a NoSQL database in SQL. It completely defeats the purpose of using a relational table structure.
If you can ignore this, and aren't doing much customization, you might be OK with Elgg. If not, STAY AWAY.
I've been working with Elgg for over a year. It is easier to customize than it would be to build something from scratch using a framework like CakePHP. I tried CakePHP and found it even more complicated than Elgg.
It is difficult to query the database due to the entity-based relationship model. You should use the build-in methods for accessing data. However, I have written many queries to double check on what is actually stored in the database.
You cannot change layouts using CSS alone. You have to deal with the various Elgg views. But CakePHP uses the same Model/View/Controller MVC concept so that would be just as difficult.