First of all, thanks for stopping by. Here is my question.
I'm working on a project where I use an array that contains a lot of information (about 300 variables, each about 25 characters). So my question is: what is the best way to store it?
I have two possible ways; please tell me which is better.
First way: make a normal local array where I can store all the needed information; of course it will be stored in RAM (as far as I know).
Second way: store the data in a file, and whenever I need the array, simply read the data from the file to rebuild it.
Note that the array is used only occasionally, not every time.
My second question is: can any damage occur to the hard drive if the program writes and reads many times in a short period? And if so, what is the minimum interval at which I can write and read safely?
Thanks in advance.
Reading and writing files are very slow operations compared to RAM access. 300 strings of 25 characters each consume very little space with modern RAM. If you need access to this data only rarely (once every 10 minutes, or once an hour), you could probably keep it on the HDD, but in my opinion it will be simpler for you to keep it in RAM the whole time.
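Just to put numbers on how small this data set is, here is a rough sketch (in Python for illustration; the same reasoning applies in any language) comparing the in-memory array with the file-based alternative. The file path and contents are made up for the demonstration.

```python
import os
import tempfile

# 300 strings of ~25 characters: a tiny amount of data by modern standards.
data = ["x" * 25 for _ in range(300)]

# Raw character data is only 300 * 25 = 7500 bytes; even with per-object
# overhead the whole array stays well under 100 KB in memory.
approx_bytes = sum(len(s) for s in data)
print(approx_bytes)  # 7500

# The file-based alternative: write once, then re-read on every use.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("\n".join(data))

with open(path) as f:
    reloaded = f.read().split("\n")

# Same content either way, but every reload costs a disk round trip.
assert reloaded == data
```

At this size there is no memory pressure to justify the disk round trips, which is why keeping the array resident is the simpler choice.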
You can find an answer to your second question here.
The data you have mentioned can easily be stored in memory without any risk of a memory crash. If you need to store a much larger amount of data, NSKeyedArchiver (used to archive objects to disk in NSData form) can be used, or the Core Data framework. Core Data also supports caching and is faster; see http://nshipster.com/nscoding/. Hope this helps.
I'm a bit confused here... I've been offered a project involving an array of sensors that would give off a reading every millisecond (yes, 1000 readings per second). A reading would be a 3- or 4-digit number, for example 818 or 1529. These readings need to be stored in a database on a server and accessed remotely.
I've never worked with such large amounts of data. What do you think: how many MB would the readings from one sensor amount to per day? 4 (digits) x 1000 x 60 x 60 x 24 = 345,600,000 bits... right? About 42 MB per day... doesn't seem too bad, right?
Therefore a DB of, say, 1 GB would hold 23 days of data from one sensor, correct?
I understand that MySQL and PHP probably would not be able to handle it... what would you suggest? Maybe some apps? Azure? Oracle?
A 3- or 4-digit number is:
4 bytes if you store it as a string;
2 bytes if you store it as a 16-bit (0-65535) integer.
1000/sec -> 60,000/minute -> 3,600,000/hour -> 86,400,000/day.
As strings: 86,400,000 * 4 bytes = approx. 330 megabytes/day.
As integers: 86,400,000 * 2 bytes = approx. 165 megabytes/day.
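The arithmetic behind those figures can be checked in a couple of lines (Python used here purely as a calculator):

```python
# Daily volume for one sensor at 1000 readings/second.
readings_per_day = 1000 * 60 * 60 * 24   # 86,400,000 readings

as_string_bytes = readings_per_day * 4   # 4-character string per reading
as_int_bytes = readings_per_day * 2      # 16-bit integer per reading

# Convert to MiB (2**20 bytes).
print(as_string_bytes / 2**20)  # ~329.6 MiB/day
print(as_int_bytes / 2**20)     # ~164.8 MiB/day
```

Note this is roughly eight times the asker's 42 MB estimate, because each digit occupies a byte, not a bit, when stored as text.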
Your DB may not perform too well under that kind of insert load, especially if you're running frequent SELECTs on the same data. Optimizing a DB for large-scale retrieval slows down fast/frequent inserts. On the other hand, inserting a simple integer is not exactly a "stressful" operation.
You'd probably be better off inserting into a temporary table and doing an hourly mass copy into the main 'archive' database. You do your analysis/mining on that main archive table, with the understanding that its data will be up to one hour stale.
But in the end, you'll have to benchmark variations of all this and see what works best for your particular usage case. There's no "you must do X to achieve Y" type advice in databaseland.
Most likely you will not need to keep the data at such a high sampling rate for very long. You have several options to minimize the volume. First, after some period of time you can collapse hourly data into min/max/avg values; you can keep detailed data only for detected unstable situations, or for situations that by definition require detailed data. Also, many things can be turned into event logging. These approaches were implemented and used successfully a couple of decades ago in some industrial automation systems provided by the company I worked for at the time, when available storage devices were many times smaller than what you can find today.
So, first, analyse the data you will be storing, and then decide how to optimize its storage.
Following @MarcB's numbers, 2 bytes at 1 kHz is just 2 KB/s, or 16 kbit/s. This is not really much of a problem.
I think a sensible and flexible approach is to construct a queue of sensor readings that the database can simply pop until it is empty. At these data rates, the problem is not the throughput (which a dial-up modem could handle) but the gap between the timings. Any system caching values needs to get out of the way fast enough for the next value to be stored; 1 ms is not long, particularly if you have GC interference.
The advantage of a queue is that it is cheap to add something to the queue at one end, and the values can be processed in bulk at the other end. So the sensor end gets the responsiveness it needs and the database gets to process in bulk.
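A minimal sketch of that queue-then-bulk-insert pattern (Python and SQLite used for illustration; the table name and reading values are invented for the example):

```python
import queue
import sqlite3

# The sensor side pushes readings into a queue: cheap, O(1) per reading.
readings = queue.Queue()
for i in range(5000):
    readings.put((i, 800 + i % 700))  # (timestamp_ms, value), synthetic data

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sensor (ts INTEGER, value INTEGER)")

# The database side drains whatever has accumulated and inserts it
# in one batch, amortizing per-statement overhead.
batch = []
while not readings.empty():
    batch.append(readings.get())
db.executemany("INSERT INTO sensor VALUES (?, ?)", batch)
db.commit()

count = db.execute("SELECT COUNT(*) FROM sensor").fetchone()[0]
print(count)  # 5000
```

The sensor loop never waits on the database, and the database sees a few large inserts instead of a thousand tiny ones per second.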
If you do not need a relational database, you can use a NoSQL database like MongoDB, or even a much simpler solution like JDBM2 if you are using Java.
I need to read a big list of objects (each object contains 2 strings and 1 Int32) that I extract from an XML web page; it should contain about 10,000 objects.
I need to take about 20 records from this list each minute.
I would like to know, in terms of performance and memory safety, whether it's better to keep this list in memory and take those 20 records every minute, or to download the XML from the web page, read it from local disk each minute, and find those 20 records.
Any other solutions would be accepted too :)
Update: I forgot to say I'm talking about a WinForms C# application.
The first rule of optimisation is to measure before optimising.
When you load all the objects into memory, how much memory does it consume? How much more memory do you need for the rest of your app? How much memory does the machine have? Are you running in a 32- or 64-bit address space? How much memory do any other required apps need at the same time?
Once you've answered these questions you can start to break down your approach to optimisation. In this case you need 20 records each minute: any 20 records? Do you need to iterate through all 10,000 to find the 20? How often does the XML file change?
P.S. Look at XmlReader vs XmlDocument for parsing the XML file.
For performance, you'll be much better off to avoid disk I/O.
For memory, both solutions will be the same unless you process the file a line at a time (which is going to give horrible performance, since you won't be able to hash/index the results).
If there is a "key" I would keep them in memory in a dictionary.
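To illustrate the dictionary idea (sketched in Python rather than the asker's C#, where a `Dictionary<string, T>` plays the same role; the record shape and key names are hypothetical):

```python
# Hypothetical records matching the question's shape: two strings and an int.
records = [("id-%d" % i, "name-%d" % i, i) for i in range(10_000)]

# Index the list once, keyed on the first string. Each of the 20 lookups
# per minute is then an O(1) hash lookup instead of an O(n) scan.
by_key = {key: (name, value) for key, name, value in records}

name, value = by_key["id-42"]
print(name, value)  # name-42 42
```

The one-time cost of building the index is trivial next to re-downloading and re-parsing the XML every minute.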
If in doubt, profile.
I'm generating data at a rate of 4096 bytes every 16.66ms. This data needs to be stored constantly and will be read randomly. It would be nice to have it in a relational database, but I think doing so many inserts would create too much overhead for the processor I'm working with (ARM11). And I don't need all the features that something like SQLite offers.
In fact, just writing this stuff to a file seems tempting because while most of the time I'll just be writing lots of data, when I actually do need to read data, I can just seek to the block I need. However, I just know I'm going to run into some problem along the way. Especially when I leave this thing running for a day and end up with gigabytes of data.
This just seems like a very naive solution to my problem and I need someone else to tell me so I can start thinking about a better solution. Thanks.
You should add some more details to get better answers. What are your use cases? Do you need ACID? What storage are you writing to? And so on.
What is your OS? Do you only write fixed-size records? Just saying "I will do random access and this is my write rate" is much too unspecific.
You are writing at about 240 KB/s, which is about 20 GB/day.
If you have just fixed-size records, only append data, and use Linux, then a plain file is great. Perhaps think about using some fsync calls, if your storage is fast enough.
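A sketch of that append-only, fixed-size-record approach (Python for illustration; the record layout, a 4-byte counter plus payload padding to 4096 bytes, is invented to match the question's block size):

```python
import os
import struct
import tempfile

# Hypothetical record layout: 4-byte little-endian counter + payload = 4096 bytes.
RECORD = struct.Struct("<I4092s")
assert RECORD.size == 4096

path = os.path.join(tempfile.mkdtemp(), "log.bin")

# Append-only writes; the fsync bounds how much data a crash can lose.
with open(path, "ab") as f:
    for i in range(100):
        f.write(RECORD.pack(i, b"\xab" * 4092))
    f.flush()
    os.fsync(f.fileno())

# Random read: record n lives at byte offset n * RECORD.size,
# so "seek to the block I need" is a single arithmetic step.
with open(path, "rb") as f:
    f.seek(57 * RECORD.size)
    counter, payload = RECORD.unpack(f.read(RECORD.size))

print(counter)  # 57
```

Because every record is the same size, random access never requires scanning, which is the property that makes the plain-file approach viable even once the file grows to gigabytes.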
Why is it that every RDBMS insists that you tell it what the max length of a text field is going to be... why can't it just infer this information from the data that's put into the database?
I've mostly worked with MS SQL Server, but every other database I know also demands that you set these arbitrary limits on your data schema. The reality is that this is not particularly helpful or friendly to work with, because business requirements change all the time and almost every day some end user is trying to put a lot of text into that column.
Does anyone with some inner-workings knowledge of an RDBMS know why we don't just infer the limits from the data that's put into storage? I'm not talking about guessing the type information, but guessing the limits of a particular text column.
I mean, there's a reason why I don't use nvarchar(max) on every text column in the database.
Because computers (and databases) are stupid. Computers don't guess very well and, unless you tell them, they can't tell whether a column is going to be used for a phone number or a copy of War and Peace. Obviously, the DB could be designed so that every column could contain an infinite amount of data, or at least as much as disk space allows, but that would be a very inefficient design. To get efficiency, then, we make a trade-off and have the designer tell the database how much we expect to put in the column. Presumably there could be a default, so that if you don't specify one it is simply used. Unfortunately, any default would probably be inappropriate for the vast majority of people from an efficiency perspective.
This post not only answers your question about whether to use nvarchar(max) everywhere, but it also gives some insight into why databases historically didn't allow this.
It has to do with speed. If the max size of a string is specified, the way information is stored can be optimized for faster I/O. When speed is key, the last thing you want is a sudden shuffling of all your data just because you changed a state abbreviation to the full name.
With the max size set the database can allocate the max space to every entity in that column and regardless of the changes to the value no address space needs to change.
This is like saying: why can't we just tell the database we want a table and let it infer the types and the number of columns from the data we give it?
Simply, we know better than the database does. Suppose you have a one-in-a-million chance of putting a 2,000-character string into the database, and most of the time it's 100 characters. The database would probably blow up on, or refuse, the 2,000-character string. It simply cannot know that you're going to need a length of 2,000 if for the first three years you've only entered 100-character strings.
Also, the column lengths are used to optimize row placement so that rows can be read/skipped faster.
I think it is because RDBMSs use random data access. To do random access, they must know which address on the hard disk to jump to in order to read the data quickly. If every row of a single column has a different data length, they cannot compute the starting address to jump to directly; the only way is to load all the data and scan it.
If the RDBMS changed a column's data length to a fixed number (for example, the max length of all rows) every time you added, updated, or deleted, it would be extremely time-consuming.
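The offset arithmetic that fixed-length columns make possible can be shown in a few lines (Python for illustration; real engines work on pages of bytes, but the principle, row i starts at i times the row size, is the same):

```python
# With a declared fixed width, row i starts at i * width: no scanning needed.
WIDTH = 25
rows = ["alice", "bob", "carol-the-third"]

# Pad every value to the declared width, as a fixed-length column would.
page = "".join(v.ljust(WIDTH) for v in rows)

def read_row(page, i, width=WIDTH):
    # Direct jump to the row: this O(1) seek is what the fixed length buys.
    return page[i * width:(i + 1) * width].rstrip()

print(read_row(page, 2))  # carol-the-third
```

With variable-length values there is no such formula; the engine would have to walk the page (or maintain extra per-row offset bookkeeping, which is what variable-length types actually cost).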
What would the DB base its guess on? If the business requirements change regularly, it's going to be just as surprised as you. If there's a reason you don't use nvarchar(max), there's probably a reason it doesn't default to that as well...
Check this thread: http://www.sqlservercentral.com/Forums/Topic295948-146-1.aspx
For the sake of an example, I'm going to step into some quicksand and suggest you compare it with applications allocating memory (RAM). Why don't programmers ask for/allocate all the memory they need when the program starts up? Because often they don't know how much they'll need. This can lead to apps grabbing more and more memory as they run, and perhaps also releasing memory. And you have multiple apps running at the same time, and new apps starting, and old apps closing. And apps always want contiguous blocks of memory, they work poorly (if at all) if their memory is scattered all over the address space. Over time, this leads to fragmented memory, and all those garbage collection issues that people have been tearing their hair out over for decades.
Jump back to databases. Do you want that to happen to your hard drives? (Remember, hard drive performance is very, very slow when compared with memory operations...)
Sounds like your business rule is: Enter as much information as you want in any text box so you don't get mad at the DBA.
You don't allow users to enter 5000 character addresses since they won't fit on the envelope.
That's why Twitter has a text limit and saves everyone the trouble of reading through a bunch of mindless drivel that just goes on and on and never gets to the point, but only manages to infuriate the reader, making them wonder why you have such disregard for their time by choosing a self-centered and inhumane lifestyle focused on promoting the act of copying and pasting as much data as the memory buffer gods will allow...
When developing software that records input signals (numbers) in real time, how can this data best be stored and compressed? Would an SQL engine be good for this, permitting fast data mining in the future, or are there other data formats that would be suitable or compressed enough for up to 1000 data samples per second?
I don't mind building in VC++ but ideas applicable to C# would be ideal.
It is hard to say without more info: what is the source, will you need to query the stored data, and so on.
But for 1000 samples/sec, you should probably look at holding a few seconds of data in memory and then writing them out in bulk to persistent storage on another thread. (A multi-processor machine is recommended.)
If you decide to do it via a managed language, keep the same data structure around for keeping the samples - so that the GC does not need to collect memory too often. You can get marginally better performance by using pointers and the unsafe keyword (provides direct access to the memory structure and eliminates bounds checking code for arrays).
I don't know how much CPU time is needed for you to collect each sample; and how time-critical it is to read each sample at a specified time (will they be buffered in the device you are reading from ?). If the sampling is time-critical, you have 1 ms per sample; and then you probably cannot afford the risk of the garbage collector kicking in, as it will block your thread for some time. In this case, I would go for an unmanaged approach.
SQL Server would easily be able to hold your data, or you could write it to a file. It mostly depends on what you need to do with the data later. I don't know how much data each sample is, but let's assume it is 8 bytes. Then you have 8000 bytes per second of raw data to write; perhaps with some overhead it could be 10 KB/s. Most storage mechanisms I can think of will be able to write data at this speed. Just make sure to write on another thread than the one that is doing the sampling.
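The buffer-in-memory, flush-on-another-thread idea sketched above might look like this (Python shown for illustration; the asker's VC++/C# version would use the same producer/consumer shape, and the batch size of 1000 stands in for "about one second of data at 1 kHz"):

```python
import queue
import threading

# The sampling thread stays cheap: it only enqueues.
samples = queue.Queue()
written = []  # stand-in for persistent storage

def writer():
    batch = []
    while True:
        item = samples.get()
        if item is None:           # sentinel: flush the tail and stop
            written.extend(batch)
            return
        batch.append(item)
        if len(batch) >= 1000:     # ~1 second of data at 1 kHz
            written.extend(batch)  # stand-in for one bulk disk/DB write
            batch = []

t = threading.Thread(target=writer)
t.start()
for i in range(2500):              # simulated sample stream
    samples.put(i)
samples.put(None)
t.join()

print(len(written))  # 2500
```

The sampling loop never blocks on storage, and storage sees a few bulk writes per second instead of a thousand tiny ones; in a managed language this also concentrates allocation in the writer, away from the time-critical path.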
You may want to look at time-series databases, rather than relational. These will be optimised to deal with the sort of data and usage you're considering.
Kx is a popular choice, as is Fame.