I am using the Google Appengine remote shell, with Python. I am walking through an entire database table updating all my entities, and I am doing this in 500 entity chunks. This is all working fine. The task involves
fire up the remote shell
kick off the job
wait 10 minutes
rinse, repeat
I'd like to keep this up while I'm at work, and just do it in the background, without of course impacting my productivity :-). What's getting in my way is the firewall, which prevents this sort of transfer of data, when logged in over VPN.
So is there a way to do this, like in a separate Emacs shell? If I had two computers, I'd just run this thing on my spare, but I don't. (I do have an iPad, but I doubt that helps).
I may be misunderstanding the core issues, and hence, my question.
Rather than using the remote shell, it'll probably be easier - and certainly quicker - to run the job entirely on the server via the mapreduce API.
Related
I start off with my question: how can I setup redundancy with several DBs without having to install anything special on each of the clients?
I have seen that the usual way to do it is to install something like pgpool on each client which can handle whether to write to the master or to pick a slave for the read. I would like to find a way to move the responsibility to the server, either by having a very small machine with just that dedicated function (effectively acting as Load Balancer) or something that can be setup on the DBs.
So the clients will only try to connect to http://IP_BALANCER and that machine will be able to revert all the operations accordingly, exactly as pgpool would do. Is that possible? If so, is that a good way to do it?
I am asking this question because the 'clients' would be a set of very light pods on a Kubernetes cluster. I managed to make the pods very minimal (they run with only compiled machine code) and I wouldn't want to have to add pgpool on top of them, since I would have to start adding an actual OS to be able to achieve that.
Sorry If my title didn't made any sense, but my problem is that, I have one application which is hosted on a server, the application uses a database which is hosted on the same server, also the same server is using sidekiq to process a lot of queues.
One problem, is that a lot of memory is used, and everything works very slow, and even if I have a 8 core processor, I can't take advantage of it when processing queues because the application was developed on MRI and is using Unicorn.
I was thinking at moving all the part which is used to process the queues on a different server, there install Puma, and jRuby and process the queues in there(this process should be a lot faster by taking advantage of multiple cores.
All the data processed by sidekiq, is coming from a Database and is stored in a Database(currently is the same database from where it takes the info and where is storing the data). Most of the sidekiq workers are receiving some information, and are using that information to get other informations so they need to connect to the same db as the app.
What will be a good solution, to serve the same database to 2 different applications?
And is it a good idea to have another server with Puma and jRuby installed for sidekiq only(maybe other things in the future)?
Thank you
Even with MRI and Unicorn you can take advantage of multiple cores: Just start unicorn multiple times or use the clustered mode provided by Puma. Same for Sidekiq. No need to switch to JRuby right away.
Accessing the database from multiple application is no problem. But do yourself a favor and use a dedicated database server. Makes added more application servers way easier.
I am new to the community and looking forward to being a contributing member. I wanted to throw this out there and see if anyone had an advice:
I am currently in the middle of developing a MVC 3 app that controls various SQL Jobs. It basically allows user to schedule jobs to be completed in the future, but also also allows them to run jobs on demand.
I was thinking of having a thread run in the web app that pulls entity information into an XML file, and writing a window service to monitor this file to perform the requested jobs. Does this sound like a good method? Has anyone done something like this before and used a different approach? Any advice would be great. I will keep the forum posted on progress and practices.
Thanks
I can see you running into some issues using a file for complex communication between processes - files can generally only be written by one process at a time, so what happens if the worker process tries to remove a task at the same time as the web process tries to add a task?
A better approach would be to store the tasks in a database that is accessible to both processes - a database can be written to by multiple processes, and it is easy to select all tasks that have a scheduled date in the past.
Using a database you don't get to use FileSystemWatcher, which I suspect is one of the main reasons you want to use a file. If you really need the job to run instantly there are various sorts of messaging you could use, but for most purposes you can just check the queue table on a timer.
I'm currently developing a small hobby project (open sourced at https://github.com/grav/mailbum) which quite simply takes images from a Gmail account and puts them in albums on Picasa Web.
Since it's (currently) only dealing with Google-hosted data, I was thinking about hosting it on Google App Engine, but I'm not sure if it's well-suited for GAE:
Will the maximum execution time be a problem? It's currently 10 minutes according to http://googleappengine.blogspot.com/2010/12/happy-holidays-from-app-engine-team-140.html, but I'd think the tasks (i.e. processing a single mail) would be easy to run in parallel. I'm also guessing that dealing with Google-hosted data would be quite efficient on GAE?
Will the fact that it's written in Clojure be an obstacle? I've researched a bit in getting Clojure to run on GAE, but I've never tried it. Any pin-pointers?
Thanks for any advice and thoughts on the project!
It seems like your application is doable on GAE. My points of concern would be:
Does your code ever store the images that it is processing to temporary files? If so it will need to be changed to do everything in memory, because GAE applications are sandboxed and not allowed to write to the filesystem (if you need temporary persistent storage, you might be able to work something out where you write your file data to a BLOB field in the GAE datastore).
How do you get the images into Picasa Web? If they provide a simple REST/HTTP API then all is well. If you need something more involved than that (like a raw TCP socket) then it won't work.
The 10-minute execution time limit only applies to background tasks. When actually servicing web requests the time limit is 30 seconds. So if you provide a web-based interface to your app, you need to structure things so that the interface is just scheduling jobs that run in the background (i.e. you can't fire off a job directly as part of servicing a web request).
If none of those sound like show-stoppers to you, then I think your app should work just fine on GAE.
Can't really say if Clojure will work though. I have, however, spent time in the past getting some third-party libraries to work on App-Engine. Generally all I had to do was remove/modify/disable any parts of the library that accessed features that are forbidden by the sandbox (for instance, I had to disable the automatic caching to disk to get commons-fileupload to work on GAE). Not sure if the same would apply to Clojure, or even what the scope would be on a task like that.
I have been dabbling with Clojure and App Engine for a while now and I have to recommend appengine-magic. It abstracts most of the Java stuff away and is very easy to use. As a plus the project seems to be very active.
I have a very limited experience of database programming and my applications that access databases are simple ones :). Until now :(. I need to create a medium-size desktop application (it's called rich client?) that will use a database on the network to share data between multiple users. Most probably i will use C# and MSSQL/MySQL/SQLite.
I have performed a few drive tests and discovered that on low quality networks database access is not so smooth. In one company's LAN it's a lot of data transferred over network and servers are at constant load, so it's a common situation that a simple INSERT or SELECT SQL query will take 1-2 minutes or even fail with timeout / network error.
Is it any best practices to handle such situations? Of course i can split my app into GUI thread and DB thread so network problems will not lead to frozen GUI. But what to do with lots of network errors? Displaying them to user too often will be not very good :(. I'm thinking about automatic creating local copy of a database on each computer my app is running: first updating local database and synchronize it in background, simple retrying on network errors. This will allow an app to function event if network has great lags / problems.
Any hints and buzzwords what can i look into? Maybe it's some best practices already available that i don't know :)
Sorry this is prob not the answer you are looking for but you mention that a simple insert / update could take 1-2 minutes or even fail with timeout / network error.
This to me sounds like there may be another problem rather than the network itself. If your working on a corporate network there would have to be insane levels of traffic for this sort of behavior. I would do everything in your power to look at improving the network before proceeding. Can you post the result of a ping to the db box?
If your going to architect your application around this type of network it will significantly alter the end product and even possibly result in a poor quality product for other clients.
Depending upon the nature of the application maybe look at implementing an async persistence queue and caching data on startup or even embedding a copy of the db into your application.
Even though async behaviour/queues/caching/copying the database to each local instance etc will help solve the symptoms, the problem will still remain. If the network really is that bad then I'd address it with their I.T. department, or the project manager and build some performance requirement from their side of things into the contract.