Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
My employer runs a Hadoop cluster, and as our data is rarely larger than 1GB, I have found that Hadoop is rarely needed to meet the needs of our office (this isn't big data), but my employer seems to want to be able to say we're using our Hadoop cluster, so we're actively seeking out data that needs analysis using our big fancy tool.
I've seen some reports saying that anything less than 5 TB shouldn't use Hadoop. What's the magic size at which Hadoop becomes a practical solution for data analysis?
There is no magic size. Hadoop is not only about the amount of data; it is also about the resources and processing "cost" involved. Processing an image, which can require a lot of memory and CPU, is not the same as parsing a text file, and Hadoop is used for both.
To justify the use of Hadoop, you need to answer the following questions:
Is your process able to run on one machine and complete the work on time?
How fast is your data growing?
Reading 5 TB once a day to generate a report is not the same as reading 1 GB ten times per second from a customer-facing API. But if you haven't faced these kinds of problems before, you very probably don't need Hadoop to process your 1 GB :)
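The two questions above can be turned into a rough back-of-envelope check. This is only a sketch: the function names are made up for illustration, and the 100 MB/s sustained throughput used in the usage note is an assumption, not a measurement of any real machine.

```python
def single_machine_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Estimated hours for one machine to scan data_gb once."""
    seconds = (data_gb * 1024) / throughput_mb_per_s
    return seconds / 3600


def months_until_too_slow(data_gb, monthly_growth, throughput_mb_per_s,
                          deadline_hours, max_months=120):
    """Months of compound growth before one pass misses the deadline.

    Returns None if the deadline is still met after max_months.
    """
    size = data_gb
    for month in range(max_months + 1):
        if single_machine_hours(size, throughput_mb_per_s) > deadline_hours:
            return month
        size *= 1 + monthly_growth
    return None
```

With the assumed 100 MB/s, `single_machine_hours(1, 100)` comes out at roughly ten seconds, so 1 GB is nowhere near the point where a single machine runs out of time.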
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a few questions about database storage techniques:
How to store CPU usage activity to consider it for later use?
How to store RAM usage variation for a certain amount of time?
Similarly, how to store Disk usage?
All this data will later be used for an ANOVA test.
I am trying to collect these values from a C# application that will monitor a system's activity for a certain amount of time.
A much better idea is to use the Performance Monitor built into Windows (perfmon.exe). You can set it to record many performance counters, including the three you mention (CPU and RAM per program as well as in total). There is also a free analyser called PAL on CodePlex which can help you set up the recording and then analyse it for you.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
For a small start-up mobile app/website, what options are there for storing its data? For example, a physical server or a cloud-hosted database such as Azure.
Any other options or insight would be helpful, thank you!
Edit:
For some background I'm looking at something that users could regularly upload data to and consumers could query to find results through an app or website.
I guess it depends on your workload and on your choice of data store. Generally, SQL-based storage is costlier on cloud-based solutions because it can typically only be scaled vertically, whereas NoSQL stores are cheaper.
So, in my view, you should first decide on your data store, which depends on the following factors:
The type of data: is your data structured, or does it fall into the unstructured category?
The operations you will perform on the data: do you have any transactional use cases?
The read/write pattern: is it a read-heavy use case or a write-heavy one?
These factors should help you decide on an appropriate data-store. Each database has its own set of advantages and disadvantages. The trick is to choose one based on your use cases and above mentioned factors.
Hope it helps.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
Basically, my application needs to dump data into a database daily. Once the data is written, there is no need to update it.
Hence, is appending to a CSV or JSON file sufficient for the purpose? Or would it be more computationally efficient to write to a standard SQL database?
Edit
Use-Case Update
I am expecting to store one entry per day for each activity count. There are about 6-8 activities.
It is exactly like a log in some sense. I would like to perform some analysis on the trend of the activities, for example. There are no relations between different activities, though.
If, say, there might be a need for updates in some cases, would that imply that a proper database is more suitable than a text file?
It depends on the nature of the data, but there may be another style of database other than an SQL one which could be suitable, like MongoDB which essentially stores JSON objects.
SQL is great when you need entities to have relationships to each other, or if you can take advantage of the type of select queries it can provide you with.
Database systems do have some overhead and could have some gotchas you might not expect, like loading up a heap of crap into memory so it's ready to be searched.
But storing text files can have drawbacks, like it might become difficult to manage your data in the future.
It basically sounds like your use-case is similar to logging, in which case dumping it into a file is fine.
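As a minimal sketch of the file-based approach, assuming one CSV row per activity per day: the file layout, the column names (`day`, `activity`, `count`) and both function names are invented here for illustration and are not taken from the question.

```python
import csv
from collections import defaultdict
from datetime import date
from pathlib import Path

FIELDS = ["day", "activity", "count"]  # hypothetical column layout


def append_counts(path, day, counts):
    """Append one row per activity for the given day.

    A header row is written first if the file is new or empty.
    """
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        for activity, count in counts.items():
            writer.writerow(
                {"day": day.isoformat(), "activity": activity, "count": count}
            )


def load_trends(path):
    """Return {activity: [(day, count), ...]} in file order."""
    trends = defaultdict(list)
    with Path(path).open(newline="") as f:
        for row in csv.DictReader(f):
            trends[row["activity"]].append((row["day"], int(row["count"])))
    return dict(trends)
```

A daily job would call something like `append_counts("activity.csv", date.today(), {"login": 3})`. Appending never rewrites earlier rows, which is why this fits write-once data; if updates do become necessary, the whole file has to be rewritten, and that is the point at which a database starts to look more attractive.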
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I am using Redis to store highly requested info, but I want to store less-requested stuff in a list/set on disk. I have been looking around with no luck. Memcached, Riak etc. don't seem to have list/set datatypes. Is there a database that has those features?
Thanks.
Run another Redis instance and configure it with AOF (append-only file) turned on.
You can read more about it here:
http://redis.io/topics/persistence
Scroll halfway down; there is a lot of good information on it.
Append-only file
Snapshotting is not very durable. If your computer running Redis stops, your power line fails, or you accidentally kill -9 your instance, the latest data written on Redis will get lost. While this may not be a big deal for some applications, there are use cases for full durability, and in these cases Redis was not a viable option. The append-only file is an alternative, fully-durable strategy for Redis. It became available in version 1.1. You can turn on the AOF in your configuration file:
    appendonly yes
From now on, every time Redis receives a command that changes the dataset (e.g. SET) it will append it to the AOF. When you restart Redis it will re-play the AOF to rebuild the state.
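For the second instance, a configuration fragment along these lines might be enough. Treat it as a sketch: the port number is an arbitrary assumption chosen only to avoid clashing with the first instance, and `appendfsync everysec` is shown explicitly although it is the usual default fsync policy.

```conf
# redis.conf for the second, AOF-backed instance (values are illustrative)
port 6380              # assumed port; any free port other than the first instance's works
appendonly yes         # enable the append-only file
appendfsync everysec   # fsync the AOF about once per second
```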
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I have the following scenario. I need a db to store XML messages that have been created by a reader. I then want to use a transport (WCF) to read the db from outside the populating app and send the messages to a central db. The db needs to run on both Mono and Windows.
I did look at sqlite3, and it seemed to fit all my requirements, but I'm reading that it's not so good at multi-process access, and over these last couple of days it has been moving away from my sweet spot.
Thanks.
Have you considered just using XML to store the data? It doesn't get any more portable than that, and it will work fine as long as your client-side storage needs are simple, i.e. you don't have a large number of domain objects to store.
Additionally using an XML data store solves a lot of setup and installation headaches. You simply reference a file (or files) relative to your executable. You don't need to worry about installing db engines for a variety of platforms and then worry about upgrading.
Would it be feasible to give each process its own sqlite3 database? They all ultimately use the central database anyway, right?
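A rough sketch of that idea, written in Python with the standard `sqlite3` module purely for illustration (the real application would be C# on Mono/Windows). The `outbox` table, its columns and the function names are all hypothetical: each populating process owns one such file, and the transport reads unsent rows and flags them after forwarding.

```python
import sqlite3


def open_outbox(path):
    """Open (or create) one process's outbox database."""
    # timeout: wait up to 5 s if another connection holds a lock on the file
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " xml TEXT NOT NULL,"
        " sent INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def enqueue(conn, xml):
    """Store one XML message produced by the reader."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO outbox (xml) VALUES (?)", (xml,))


def pending(conn, limit=100):
    """Return up to `limit` unsent (id, xml) rows, oldest first."""
    return conn.execute(
        "SELECT id, xml FROM outbox WHERE sent = 0 ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()


def mark_sent(conn, ids):
    """Flag messages as forwarded to the central db."""
    with conn:
        conn.executemany(
            "UPDATE outbox SET sent = 1 WHERE id = ?", [(i,) for i in ids]
        )
```

The transport would loop over `pending(conn)`, send each message, and call `mark_sent` with the ids that went through, so a crash between the two steps leads to a resend rather than a lost message.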
Have a look at Firebird.
You can use it as an embedded engine just like SQLite, but it can scale to a full blown server as well.
The only drawback is that the documentation is a mess.