I have received some job offers to develop simple static pages (only with a contact form), and I have been tempted to suggest AppEngine for the hosting, but is this appropriate? I don't want to see AppEngine become the new GeoCities.
I think so. It's free after all, so worth a shot. You can even use something like DryDrop (http://drydrop.binaryage.com/) to make it super easy to manage.
Google Sites could also be a possibility for hosting static pages. Uploading HTML files directly is not supported, but you could copy and paste the source of the pages you have created, as described here.
One limitation that you should take into consideration before suggesting this solution is that AppEngine will not work with naked domain names. In other words, if your client wants to host static webpages at myawesomedomain.com, you would have to make sure that users were making requests to www.myawesomedomain.com.
Well, AppEngine gives you access to Google's CDN, which is useful. But you might look at SimpleCDN, S3 (and/or CloudFront), AOL's CDN, and such before making a recommendation.
Hosting static pages is fine, as it gives you scope to grow, and since low-traffic pages won't cost anything to host, it's a win-win. Also, putting them on your own domain can be useful for branding.
If you are only serving static pages, it will be easy to move the website somewhere else if AppEngine ever does disappear like GeoCities did.
I am currently building a large NextJS application. I just need to integrate Google Analytics 4 for the client. Should I rather include Google Analytics directly in my application or include the Tag Manager and configure Google Analytics through it?
I have never really worked with Google Analytics before and I am confused. In theory it seems more practical to integrate it directly, because I can set dimensions etc. and don't have to configure so much in Tag Manager. Is that right?
Thanks
You should use a tag manager. It abstracts trivial implementations from your codebase and makes it much easier to maintain and scale them without requiring dev resources in the vast majority of cases.
On the other hand, if all you'll ever need from analytics are pageviews, then sure, you don't need GTM; just implement it directly, and once you see that you need more, you can move the logic to a tag manager.
GTM is pretty good and free. However, it's proprietary. Matomo is free and open source, though not as good as GTM. Adobe Launch is probably the best, but it's not free (Adobe doesn't consider it a standalone product) and it requires a bit more skill than GTM. In between, there are also things like Ensighten or Tealium, but they're not very advanced.
If you only need Analytics on the website, there's no need to use GTM: you just add Analytics and that's it.
GTM makes tags easier to manage and gives you trigger options, so it can be useful:
if non-technical people have to manage tags (like the client)
if you have several tags and want to manage them all from the same interface
if you need triggers and rules that are easy to set up
I have a Drupal website that has a ton of data on it. However, people can quite easily scrape the site, due to the fact that Drupal class names and IDs are pretty consistent.
Is there any way to "scramble" the code to make it harder to use something like PHP Simple HTML Dom Parser to scrape the site?
Are there other techniques that could make scraping the site a little harder?
Am I fighting a lost cause?
I am not sure if "scraping" is the official term, but I am referring to the process by which people write a script that "crawls" a website and parses sections of it in order to extract data and store it in their own database.
First, I'd recommend you Google for web scraping anti-scrape tools; you'll find some tools for fighting web scraping.
As for Drupal, there should be some anti-scrape plugins available (again, worth a search).
You might be interested in my categorized layout of anti-scrape techniques answer. It's aimed at technical as well as non-technical users.
I am not sure, but I think it is quite easy to crawl a website where all content is public, no matter whether the IDs are sequential or not. You should take into account that if a human can read your Drupal site, a script can too.
Depending on your site's nature, if you don't want your content to be indexed by others, you should consider requiring registered-user access. Otherwise, I think you are fighting a lost cause.
Our app is a sort-of self-service website builder for a particular industry. We need to be able to store the HTML and image files for each customer's site so that users can easily access and edit them. I'd really like to be able to store the files on S3, but potentially other places like Box.net, Google Docs, Dropbox, and Rackspace Cloud Files.
It would be easiest if there were some common file system API that I could use over these repositories, but unfortunately everything is proprietary. So I've got to implement something. FTP or SFTP is the obvious choice, but it's a lot of work. WebDAV will also be a pain.
Our server-side code is Java.
Please someone give me a magic solution which is fast, easy, standards-based, and will solve all my problems perfectly without any effort on my part. Please?
Not sure if this is exactly what you're looking for, but we built http://mover.io to address this kind of thing. We currently support 13 different endpoints, and we have a GUI and an API for interfacing with all these cloud storage providers.
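If you do end up rolling your own layer, the "common file system API" the question describes usually boils down to a small interface with one implementation per provider. A minimal sketch in Java follows; all the names here are hypothetical, and each backend (S3, Dropbox, Rackspace Cloud Files, ...) would be an implementation wrapping that provider's own SDK or REST API:

    import java.io.InputStream;
    import java.util.List;

    // Sketch of a provider-agnostic store for per-customer site files.
    // Every name here is illustrative; nothing below maps to a specific SDK.
    public interface SiteFileStore {
        // Upload or overwrite a file for one customer's site.
        void put(String customerId, String path, InputStream content, long length);

        // Stream a stored file back for editing or serving.
        InputStream get(String customerId, String path);

        // Remove a file.
        void delete(String customerId, String path);

        // List stored paths under a prefix, e.g. "images/".
        List<String> list(String customerId, String prefix);
    }

The site-builder code then depends only on this interface, so supporting another provider later means adding one more implementation rather than touching the editing features.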
I have created a GWT application and now want to deploy it outside GAE. The reason I wish to deploy outside GAE is the sandbox security feature of GAE, which disallows me from writing files to my system. I store my data in the form of an ontology (.owl file) under '/war/WEB-INF', and I want the end user to be able to modify (write to / save) this file through the server.
I understand that GAE does not let me do this, but is there a paid Google Service (e.g. google apps) that would allow hosting a GWT application which would allow writing files to the system? For instance, like an add-on to GAE?
If not, what solution would you recommend to host a GWT application (that would let me write a file to the WEB-INF folder) on the web?
EDIT: I solved this by deploying the GWT project as a .war file and hosting it in Tomcat.
I'm very new to GAE, but in case you haven't looked at their experimental write/read Blobstore services, you can check that out here. They have a similar API for Python, I believe. It's of course stored in the GAE Blobstore and not under the /war/WEB-INF/ directory, but it does allow a possible solution to what you're looking for.
Also, if you're looking to run your own server (possibly on EC2, for example), then you might want to look into AppScale. But I, personally, would stay away from that as a solution because I highly doubt that AppScale performs as well as Google's GAE web servers, and furthermore it lacks the same degree of support/development.
Have you ruled out something like creating an Owl entity to hold your ontologies, and arranging for *.owl requests to be handled by using the requested name as a key name to find and serve the corresponding Owl? That's really simple code.
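To make that concrete, a rough sketch of such a servlet against the GAE low-level datastore API might look like the following. The entity kind, property name, and URL handling are assumptions for illustration; a real version needs authentication, and each entity is limited to roughly 1 MB, which is where splitting large ontologies (as suggested below) comes in:

    import java.io.IOException;
    import javax.servlet.http.*;
    import com.google.appengine.api.datastore.*;

    // Sketch only: each ontology is stored as an "Owl" entity keyed by file name
    // and served/updated through a servlet mapped to *.owl in web.xml.
    public class OwlServlet extends HttpServlet {
        private final DatastoreService datastore =
                DatastoreServiceFactory.getDatastoreService();

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            String name = req.getRequestURI().substring(1); // e.g. "myontology.owl" (simplified)
            try {
                Entity owl = datastore.get(KeyFactory.createKey("Owl", name));
                resp.setContentType("application/rdf+xml");
                resp.getWriter().write(((Text) owl.getProperty("content")).getValue());
            } catch (EntityNotFoundException e) {
                resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            }
        }

        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // Replace the stored ontology with the request body (no auth shown here).
            String name = req.getRequestURI().substring(1);
            Entity owl = new Entity("Owl", name);
            owl.setProperty("content", new Text(readBody(req)));
            datastore.put(owl);
        }

        private static String readBody(HttpServletRequest req) throws IOException {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = req.getReader().readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }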
GWT is primarily a client-side technology. GAE is a server-side technology. You seem to be getting GWT and GAE mixed up with each other. GAE can work with almost any client-side technology, and GWT can connect to many different back-end platforms.
Are you trying to move your back end code directly to a new platform? Are you planning on rewriting the back end for a new platform, but keep the GWT code? What is your goal for this application? To be used by you and a few friends, or by thousands of people? For free or paying customers?
If you want to move off of AppEngine, you can switch to pretty much any Java hosting service that you want: anything from a tiny shared VPS up to an Amazon EC2 mini cloud of your own. I don't think Google offers generic Java hosting. I don't know how you have built your application's back end, but you probably used servlets, which you should be able to get working pretty much anywhere.
If you want to stay on AppEngine, you should think about whether or not you can break your owl file into smaller sections that can be stored as entities in the database.
Whichever platform you choose, if you are planning on serving more than a few people, you will need some way to prevent one giant owl file from becoming a huge bottleneck.
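For completeness: if you do move to a plain servlet container such as Tomcat, which is what the edit to the question says the asker eventually did, the sandbox restriction disappears and a servlet can simply write the uploaded ontology to disk. A rough sketch, where the target directory and file name are assumptions; writing inside the deployed WEB-INF works, but anything written there is lost on redeploy, so an external data directory is usually safer:

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.*;
    import javax.servlet.http.*;

    // Sketch: outside GAE's sandbox the posted ontology can be written straight
    // to disk. DATA_DIR and the file name are placeholder choices.
    public class SaveOwlServlet extends HttpServlet {
        private static final Path DATA_DIR = Paths.get("/var/data/ontologies");

        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            Files.createDirectories(DATA_DIR);
            Path target = DATA_DIR.resolve("myontology.owl");
            try (InputStream in = req.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            resp.setStatus(HttpServletResponse.SC_NO_CONTENT);
        }
    }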
Search engine bots crawl the web and download each page they go to for analysis, right?
How exactly do they download a page? How do they store the pages?
I am asking because I want to run an analysis on a few webpages. I could scrape the page by going to the address but wouldn't it make more sense to download the pages to my computer and work on them from there?
wget --mirror
Try HTTrack
About the way they do it:
The indexing starts from a designated starting point (an entrance, if you prefer). From there, the spider recursively follows all hyperlinks up to a given depth.
Search engine spiders work like this as well, but there are many of them crawling simultaneously, and other factors count too. For example, a newly created post here on SO will be picked up by Google very fast, but an update to a low-traffic web site may only be picked up days later.
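To make that crawl loop concrete, a minimal recursive crawler can be sketched in a few lines; this version uses the jsoup library, and the depth limit and URL filtering are arbitrary example choices, since a real spider also needs robots.txt handling, rate limiting, and persistent storage of what it fetches:

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    // Toy crawler sketch: fetch a page, remember it, follow its links recursively
    // until the depth budget runs out.
    public class TinyCrawler {
        private final Set<String> visited = new HashSet<>();

        public void crawl(String url, int depth) {
            if (depth == 0 || !visited.add(url)) {
                return; // depth exhausted or page already seen
            }
            try {
                Document doc = Jsoup.connect(url).get();
                System.out.println("Fetched " + url + " (" + doc.title() + ")");
                // ... store doc.html() somewhere for later analysis ...
                for (Element link : doc.select("a[href]")) {
                    String next = link.absUrl("href");
                    if (next.startsWith("http")) {
                        crawl(next, depth - 1);
                    }
                }
            } catch (IOException e) {
                System.err.println("Skipping " + url + ": " + e.getMessage());
            }
        }

        public static void main(String[] args) {
            new TinyCrawler().crawl("https://example.com/", 2);
        }
    }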
You can use the debugging tools built into Firefox (or Firebug) and Chrome to examine how the page works. As for downloading them directly, I am not sure. You could maybe try viewing the page source in your browser, and then copying and pasting the code.