Is it possible to get an example of how to create an ensemble learner with Encog for time series forecasting?
I was thinking of an iRPROP+ ensemble learner.
Thank you
Ensemble learning is still in the process of being added to Encog. More examples will be added when it is finalized. You can add a feature request on the issues page if you would like to suggest a specific example. https://github.com/encog/encog-java-core/issues
One technique you could try would be to simply aggregate the outputs of several networks after each epoch. In order to get different networks, you will have to initialize each one with different starting weights.
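Since Encog's ensemble API isn't finalized yet, here is a minimal sketch of the averaging idea only, written with scikit-learn in Python as an assumption (it is not Encog's API): several identically shaped networks are trained from different random starting weights and their forecasts averaged. For brevity it aggregates once at the end rather than after each epoch.

```python
# Illustrative sketch (not Encog): average the forecasts of several networks
# that differ only in their random starting weights.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy time series: predict x[t] from the previous 5 values.
series = np.sin(np.linspace(0, 20, 500)) + rng.normal(0, 0.1, 500)
window = 5
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

# Same architecture, different initial weights via random_state.
members = [
    MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=seed)
    for seed in range(5)
]
for net in members:
    net.fit(X, y)

# Ensemble forecast = mean of the members' forecasts.
ensemble_pred = np.mean([net.predict(X) for net in members], axis=0)
print("ensemble MSE:", np.mean((ensemble_pred - y) ** 2))
```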
I'm trying to implement a collaborative canvas on which many people can draw freehand or with specific shape tools.
The server has been developed in Node.js and the client in AngularJS (Angular 1), and I am pretty new to both.
I must use a consensus algorithm so that it always shows the same state to all users.
I'm seriously struggling with this since I cannot find a proper tutorial on its use. I have been looking at and studying Paxos implementations, but it seems like Raft is much more used in practice.
Any suggestions? I would really appreciate it.
Writing a distributed system is not an easy task[1], so I'd recommend using an existing strongly consistent data store instead of implementing one from scratch. The usual suspects are ZooKeeper, Consul, etcd, and Atomix/Copycat. Some of them offer Node.js clients:
https://github.com/alexguan/node-zookeeper-client
https://www.npmjs.com/package/consul
https://github.com/stianeikeland/node-etcd
I've personally never used any of them with Node.js though, so I won't comment on the maturity of the clients.
If you insist on implementing consensus on your own, then Raft should be easier to understand; the paper is surprisingly accessible: https://raft.github.io/raft.pdf. There are also some Node.js implementations, but again, I haven't used them, so it is hard to recommend any particular one. The Gaggle readme contains an example, and skiff has an integration test which documents its usage.
Taking a step back, I'm not sure that distributed consensus is what you need here. It seems like you have multiple clients and a single server, so you can probably use a centralized data store. The problem domain is not really that distributed either: shapes can be overlaid one on top of the other in the order they are received by the server, i.e. FIFO (imagine multiple people writing on the same whiteboard; the last one wins). The challenge is with concurrent modifications of existing shapes, but maybe you can fall back to last/first change wins or something like that.
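To make the centralized approach concrete, here is a minimal sketch (the store, update format, and field names are all hypothetical) of a server-side model that applies updates strictly in arrival order, so the last write to a shape wins:

```python
# Hypothetical sketch of a centralized "last write wins" canvas store.
# The server applies updates strictly in the order it receives them (FIFO),
# so every client that replays the same stream sees the same canvas.
class CanvasStore:
    def __init__(self):
        self.shapes = {}   # shape_id -> latest shape data
        self.log = []      # ordered history, broadcast to clients as-is

    def apply(self, update):
        # update: {"id": ..., "op": "upsert" | "delete", "data": {...}}
        self.log.append(update)
        if update["op"] == "delete":
            self.shapes.pop(update["id"], None)
        else:
            self.shapes[update["id"]] = update["data"]  # last write wins

store = CanvasStore()
store.apply({"id": "s1", "op": "upsert", "data": {"type": "rect", "x": 0}})
store.apply({"id": "s1", "op": "upsert", "data": {"type": "rect", "x": 5}})
print(store.shapes["s1"])  # {'type': 'rect', 'x': 5} -- the later update won
```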
Another interesting avenue to explore here would be Conflict-free Replicated Data Types (CRDTs). Folks at GitHub used them to implement collaborative "pair" programming in Atom. See the Atom Teletype blog post; their implementation may also be useful, as collaborative editing seems to be exactly the problem you are trying to solve.
Hope this helps.
[1] Take a look at the Jepsen series https://jepsen.io/analyses where Kyle Kingsbury tests various failure conditions of distributed data stores.
Try reading Understanding Paxos. It's geared towards software developers rather than an academic audience. For this particular application you may also be interested in the Multi-Paxos Example Application referenced by the article. It's intended to help illustrate the concepts behind the consensus algorithm, and it sounds like it's almost exactly what you need for this application. Raft and most Multi-Paxos designs tend to get bogged down with an overabundance of accumulated history, which generates a new set of problems to deal with beyond simple consistency. An initial prototype could easily handle sending the full state of the drawing on each update and ignore the history issue entirely, which is what the example application does. Later optimizations could be made to reduce network overhead.
Disclaimer: I started working with spreadsheets in depth this week; prior to that it was basic usage. I've read the rules, and this does relate to programming; it's just that my ignorance of programming keeps me from asking a specific question. I'm new to this, I want to learn, and I have to start somewhere.
I want to create two separate spreadsheet documents, one as a database for another. I want one to be able to query the other in a way similar to the VLOOKUP() function or something along those lines.
These are very large files hence the need for separate documents.
I am learning about scripting and think there might be a way there. If that's the case, please appreciate that I literally started reading about scripts this morning and know nothing (yet) about them.
All I need to know is whether it's possible and which functions to use; I'll figure out how to use them myself. I just don't have a working knowledge of all the script functions, and only a limited knowledge of spreadsheet functions.
The IMPORTRANGE() function is limited to 50 calls per spreadsheet, and given how I want to use it, that is not enough, unless you know a workaround. Also, I only want one cell of information at a time, and it doesn't need to be displayed, just usable.
Also, efficiency is king since I'm working with such large amounts of data. I used to have almost 1500 VLOOKUP functions as I was building what I already have and that sucker was starting to bog down. Then I realized I didn't need a dynamic database for that aspect of the sheet. I killed about two thirds of them and it runs much better. I'd like to keep it that way, or at least try.
Finally, I may have bitten off more than I can chew, but this has been a fun challenge for me, and I've met with success so far. Please don't dismiss me out of hand because I don't know the right questions to ask, or because I'm trying to fit a square peg in a round hole; everyone has to start somewhere.
Thanks!
This is totally possible, though you will quickly find that spreadsheet functions are too cumbersome for this sort of operation.
With Google Apps Script you can query and write to and from multiple workbooks with ease. You would be working in JavaScript, using JavaScript objects and arrays.
Start by reading the Google documentation and checking out their examples.
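Apps Script itself is JavaScript, but the same cross-spreadsheet, one-cell-at-a-time lookup can also be driven from outside through the Sheets API. A rough sketch using the Python gspread client (the spreadsheet keys, sheet names, and credentials file below are placeholders you'd substitute):

```python
# Hypothetical sketch: a VLOOKUP-style query from one spreadsheet into another
# via the Sheets API, using the gspread client (pip install gspread).
# The key strings and sheet names below are placeholders.
import gspread

gc = gspread.service_account(filename="credentials.json")  # service-account auth

database = gc.open_by_key("DATABASE_SPREADSHEET_KEY").worksheet("Data")
frontend = gc.open_by_key("FRONTEND_SPREADSHEET_KEY").worksheet("Sheet1")

def lookup(key, value_col=2):
    """Find `key` in the database sheet and return the cell `value_col`
    columns into the matched row, like VLOOKUP, one cell at a time."""
    hit = database.find(key)  # first cell whose contents match `key`
    if hit is None:
        return None
    return database.cell(hit.row, hit.col + value_col - 1).value

# Write a single looked-up value into the front-end sheet; nothing is
# displayed beyond the one cell you actually need.
frontend.update_acell("B2", lookup("some-key"))
```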
So the solid consensus I got from the answers to this question, Editing a single line in a large text file,
was that instead of using a text file I should create a database and store my data there. While I think this is a great idea, I don't know the first thing about databases, the languages used to work with them, or how to use a database once I have set one up. Could you guys give me a shove in the right direction and point me to an absolute-noob tutorial that might help me with this?
UPDATE: Hey guys, so I was looking at MySQL, and there are a whole bunch of versions! The Cluster CGE looks like the best one; it says it is a "real-time open source transactional database designed for fast, always-on access to data under high throughput conditions," which just about hits the nail on the head of what I need. It says commercial next to it, though, so I don't know if I would have to pay some god-awful fee for it. I tried it anyway, and it said I should have gotten a license already, and that until I did I could only use it for 30 days. I'm confused...
Can I get this version for free? If so, where do I get the license?
Is this version way overpowered for what I need? I need:
1. A storage medium in which I can store large amounts of data
2. The ability to read and write in real time, with simultaneous access
3. Two different "keys" (I think I'm using that term right; I need to be able to search for entries based on either of two criteria)
MySQL is a great choice, given your Python flair.
http://dev.mysql.com/tech-resources/articles/mysql_intro.html
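To sketch the "two keys" requirement: a MySQL table can carry an index on each column you search by, and either one can drive a lookup. A minimal example (the database, table, and column names are made up) using the mysql-connector-python package:

```python
# Minimal sketch (table/column names are made up) of a MySQL table that can
# be searched efficiently by either of two criteria, via two indexes.
# Requires: pip install mysql-connector-python
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="youruser", password="yourpass", database="yourdb"
)
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS entries (
        id       INT AUTO_INCREMENT PRIMARY KEY,
        name     VARCHAR(255) NOT NULL,
        category VARCHAR(255) NOT NULL,
        payload  TEXT,
        INDEX idx_name (name),         -- search criterion #1
        INDEX idx_category (category)  -- search criterion #2
    )
""")

cur.execute("INSERT INTO entries (name, category, payload) VALUES (%s, %s, %s)",
            ("widget", "tools", "some data"))
conn.commit()

# Look up by either key; both queries hit an index.
cur.execute("SELECT payload FROM entries WHERE name = %s", ("widget",))
print(cur.fetchone())
cur.execute("SELECT payload FROM entries WHERE category = %s", ("tools",))
print(cur.fetchone())
conn.close()
```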
Good luck!
I'm creating a simulator for a large-scale P2P system. In order to make the simulations as good as possible, I would like to use data from the real world. I'd like to use this data to simulate each node's behavior (primarily its availability). Is there any availability data recorded from large P2P systems (such as BitTorrent) available?
I'm not too sure about other P2P protocols, but here's a stab at answering the question for BitTorrent:
You should be able to glean some stats from a BitTorrent tracker log, in the case where the tracker was centralised (as opposed to a decentralised tracker, or where a distributed hash table is used).
To wrap your head around the logs, have a look at one of the many log analyzers, like BitTorrent Tracker Log Analyzer.
As for actual data, you can find them all over the web. There's a giant RedHat9 tracker log here ☆, for instance. I'd search Google for "bittorrent tracker log".
☆ The article Dissecting BitTorrent: Five Months in a Torrent's Lifetime on that page also looks interesting.
Another way of approaching this is to simulate availability mathematically. Availability will follow some power-law distribution: e.g., the vast majority of nodes are available only rarely and for short periods of time, while a very few nodes are available nearly always, over long periods.
Real-world networks will of course have many other kinds of patterns in the data, so this is not a perfect simulation, but I figure it's pretty good.
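A hedged sketch of that idea: draw each node's availability probability from a Pareto (power-law) distribution squashed into [0, 1], then sample per-timestep up/down states. The shape parameter here is a made-up number you would tune against real traces:

```python
# Sketch: synthesize node availability from a power law, so most nodes are
# rarely online and a few are almost always online. The shape parameter is
# an assumption to be fitted against real availability data.
import numpy as np

rng = np.random.default_rng(42)
num_nodes, num_steps = 1000, 500

# Pareto-distributed raw scores, squashed into (0, 1] availability probs.
raw = rng.pareto(a=1.5, size=num_nodes) + 1e-3
availability = np.clip(raw / raw.max(), 0.0, 1.0)

# At each timestep, each node is up with its own probability.
online = rng.random((num_steps, num_nodes)) < availability  # bool matrix

print("median node uptime: %.3f" % np.median(online.mean(axis=0)))
print("max node uptime:    %.3f" % online.mean(axis=0).max())
```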
I've found two websites that have what I was looking for: http://p2pta.ewi.tudelft.nl/pmwiki/?n=Main.Home and http://www.cs.uiuc.edu/homes/pbg/availability/
I started working on a project in which I must tag documents with keywords, and doing it manually is really hard and time consuming (especially if you have thousands of documents). So I am planning to automate the process, knowing that the result would not be perfect, but at least it would give some suggested tags.
The latest Firefox version implements a system like this (when you bookmark a page, it suggests some tags).
Yahoo's Term Extraction service is also a great example.
So if anybody can help me get around this problem, I would really appreciate the help. Or if someone knows about the Firefox tagging system, a little bit of help would be great.
Would a statistical algorithm work? Something Bayesian perhaps? I know they're used in spam filtering, maybe you can adapt a Bayes filter to suit your needs.
At the very least, you could suggest words that are used frequently but are not common words in English (he, she, I, and, it, then, or, etc.).
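As a starting point for the frequency idea, here is a small sketch that counts words, drops common English words, and suggests the most frequent remainder; the stopword list is a tiny stand-in for a real one:

```python
# Sketch: suggest tags by picking frequent words that aren't common English
# words. The stopword list is a tiny stand-in; use a real list in practice.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "it", "he", "she", "i", "then",
             "of", "to", "in", "is", "that", "this", "for", "on", "with"}

def suggest_tags(text, k=5):
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(k)]

doc = ("Consensus algorithms such as Paxos and Raft keep replicated "
       "state machines consistent; Raft elects a leader, and the leader "
       "replicates log entries to followers.")
print(suggest_tags(doc))  # e.g. ['raft', 'leader', 'consensus', ...]
```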