What is the difference between having one data file or multiple data files for a tablespace? - database

Oracle allows creating a tablespace with multiple data files. What is the difference between one data file of 1 TB and two data files of 500 GB each? Is there any performance gain?

Performance? Could be. One large datafile (or two smaller ones) on the same hard disk will probably run somewhat slower than two smaller datafiles on different hard disks. Think of you and me accessing data at the same time: the HDD head has to "jump" from one place to another to serve us both. With two disks, there's a chance each disk can deliver its data separately, and that would be faster.

Related

Reed Solomon Erasure Encoding and Replication Factor

I'm researching distributed file system architectures and designs. Quite a few DFS(s) I've come across usually have the following architecture:
A namenode or metadata server used to manage the location of data blocks / chunks as well as the hierarchy of the filesystem.
A data node or data server used to store chunks or blocks of data belonging to one or more logical files
A client that talks to a namenode to find appropriate data nodes to read/write from/to.
Many of these systems have two primary tunables: a block size and a replication factor.
My question is:
Are replication factor and forward error correction techniques like Reed-Solomon erasure coding compatible here? Does it make sense to use both techniques to ensure high availability of data, or is it enough to use one or the other (and what are the trade-offs)?
Whether you can mix and match plain old replication and erasure codes depends on what the distributed file system in question offers in its feature set, but they are usually mutually exclusive.
Replication is simple in the sense that the file/object is replicated as a whole to 'n' (the replication factor) data nodes. Writes go to all nodes. Reads can be served from any one of the nodes individually, since each hosts the whole file, so you can distribute different reads among multiple nodes. There is no intermediate math involved, and the work is mostly I/O bound. For a given file size, though, the disk usage is higher (since there are 'n' copies).
Erasure codes are complex in the sense that parts of the file/object are encoded and spread among the 'n' data nodes during writes. Reads need to fetch data from more than one node, decode it, and reconstruct the original. So math is involved, and the work can become CPU bound. Compared to replication, the disk usage is lower, but so is the ability to tolerate faults.

Read a file after write and closing it in C

My code does the following
1. Do 100 times: open a new file; write 10M of data; close it.
2. Open the 100 files together, read them, and merge their data into a larger file.
3. Repeat steps 1 and 2 many times in a loop.
I was wondering if I can keep the 100 files open without opening and closing them so many times. What I can do is fopen them with w+. After writing, I seek to the beginning to read; after reading, I seek to the beginning to write; and so on.
The questions are:
If I read after writing, without closing, do I always read all the written data?
Would this save some overhead? File open and close must have some overhead, but is that overhead large enough to be worth saving?
Based on the comments and discussion, I will explain why I need to do this in my work. It is also related to my other post:
how to convert large row-based tables into column-based tables efficently
I have a calculation that generates a stream of results. So far the results are saved in a row-storage table. This table has 1M columns, and each column can be 10M long; each column is one attribute the calculation produces. As the calculation runs, I dump and append the intermediate results to the table. The intermediate results could be 2 or 3 double values per column. I want to dump them soon, because they already consume >16M of memory and the calculation needs more. This ends up producing a table like the following:
aabbcc...zzaabbcc..zz.........aabb...zz
A row of data is stored together. The problem happens when I want to analyze the data column by column: I have to read 16 bytes, then seek to the next row to read another 16 bytes, and keep going. There are too many seeks; it is much slower than if all columns were stored together so I could read them sequentially.
I can make the calculation dump less frequently. But to make the later reads more efficient, I may want 4K of data stored together, since I assume each fread fetches 4K by default even if I read only 16 bytes. But this means I would need to buffer 1M*4K = 4G in memory...
So I was thinking about merging the fragmented data into larger chunks, as that post describes:
how to convert large row-based tables into column-based tables efficently
So I wanted to use files as offline buffers. I may need 256 files to get 4K of contiguous data after the merge, if each file contributes 2 doubles (16 bytes) per column. This work can be done asynchronously with respect to the main calculation. But I wanted to ensure the merge overhead is small, so that when it runs in parallel it can finish before the main calculation is done. So I came up with this question.
I guess this is closely related to how column-based databases are constructed. When people build them, do they run into similar issues? Is there any description of how this works at creation time?
You can use w+ as long as the maximum number of open files on your system allows it; this is usually 255 or 1024, and can be set (e.g. on Unix by ulimit).
But I'm not too sure this will be worth the effort.
On the other hand, 100 files of 10M each is one gigabyte; you might want to experiment with a RAM disk. Or with a large file system cache.
I suspect that bigger savings might be reaped by analyzing your specific problem structure. Why 100 files? Why 10M? What kind of "merge" are you doing? Are those 100 files always accessed in the same order and with the same frequency? Could some data be kept in RAM and never be written out at all?
Update
So, you have several large buffers like,
ABCDEFG...
ABCDEFG...
ABCDEFG...
and you want to pivot them so they read
AAA...
BBB...
CCC...
If you already know the total size (i.e., you know that you are going to write 10 GB of data), you can do this with two files, pre-allocating the output file and using fseek() to write into it. With memory-mapped files, this should be quite efficient. Concretely: row Y, column X (of 1,000,000 columns) has been dumped at offset 16*X in file Y.dat; to make each column contiguous, you write it at offset 16*(X*num_rows + Y) in largeoutput.dat, where num_rows is the total number of rows.
Actually, you could write the data this way even during the first calculation. Or you could have two processes communicating via a pipe, one calculating and one writing to both the row-major and column-major files, so that you can monitor the performance of each.
Frankly, I think that adding more RAM and/or a fast I/O layer (SSD maybe?) could get you more bang for the same buck. Your time costs too, and the memory will remain available after this one work has been completed.
Yes. You can keep the 100 files open without doing the opening-closing-opening cycle. Most systems do have a limit on the number of open files though.
If I read after writing, without closing, do I always read all the written data?
That depends on your code. You can fseek to wherever you want in the file and read data from there; it's entirely up to you and your logic. (One caveat: on a stream opened for update, you must call fflush or a positioning function such as fseek between a write and a subsequent read.)
Would this save some overhead? File open and close must have some overhead, but is that overhead large enough to be worth saving?
This would definitely save some overhead, such as additional unnecessary I/O operations. Also, on some systems the content you write to a file is not immediately flushed to the physical file; it may be buffered and flushed periodically, or at fclose time.
So such overheads are saved, but the real question is: what do you achieve by saving them? How does it fit into the overall picture of your application? That is the call you must make before deciding on the logic.

Data distribution in btrfs single profile array: using file instead of block level?

I have an array of 3 different drives which I use in the single profile (no RAID). I don't use RAID because the data isn't important enough to spend extra money on additional drives.
But what I could not figure out exactly is at what granularity the data is distributed across the 3 drives.
I could find this on the wiki page:
When you have drives with differing sizes and want to use the full capacity of each drive, you have to use the single profile for the data blocks, rather than raid0
As far as I understand, this means that it is not whole files that are allocated to one of the 3 drives, but each of a file's data blocks.
This is unfortunate, because losing just 1 drive will destroy the whole array. Is there a way to balance a single-profile array at the file level?
I would be fine with the risk of losing all files on 1 drive in the array but not losing the whole array if 1 drive fails.

what is a sequential write and what is random write

I want to know what exactly sequential writes and random writes are, by definition. An example would be even more helpful. I tried to google it, but didn't find much of an explanation.
Thanks
When you write two blocks that are next to each other on disk, you have sequential writes.
When you write two blocks that are located far away from each other on disk, you have random writes.
With a spinning hard disk, the second pattern is much slower (it can be orders of magnitude slower), because the head has to be moved to the new position.
Database technology is (or has been; it may matter less with SSDs) to a large part about optimizing disk access patterns. So what you often see, for example, is trading direct updates of data in their on-disk location (random access) for writes to a transaction log (sequential access). That makes it more complicated and time-consuming to reconstruct the actual value, but makes for much faster commits (and you have checkpoints to eventually consolidate the logs that build up).

Will writing million times to a file, spoil my harddisk?

I have an IO-intensive simulation program that logs the simulation trace/data to a file at every iteration. As the simulation runs for a million iterations or more and logs the data to a file on disk (overwriting the file each time), I am curious whether that would wear out the hard disk, since most storage devices have an upper limit on write/erase cycles (e.g., flash disks allow up to 100,000 write/erase cycles). Would splitting the file into multiple files be a better option?
You need to recognize that a million write calls to a single file may only write to each block of the disk once, which doesn't cause any harm to magnetic disks or SSD devices. If you overwrite the first block of the file one million times, you run a greater risk of wearing things out, but there are lots of mitigating factors. First, if it is a single run of a program, the o/s is likely to keep the disk image in memory without writing to disk at all in the interim — unless, perhaps, you're using a journalled file system. If it is a journalled file system, then the actual writing will be spread over lots of different blocks.
If you manage to write to the same block on a magnetic spinning hard disk a million times, you are still not at serious risk of wearing the disk out.
A Google search on 'hard disk write cycles' shows a lot of informative articles (more particularly, perhaps, about SSD), and the related searches may also help you out.
On an SSD, there is a limited number of writes (or, more accurately, erase cycles) for any particular block. It's probably 100K to 1 million per block, and SSDs use "wear leveling" to avoid unnecessary writes to the same block every time. SSDs can only write zeros, so when you "reset" a bit to one, you have to erase the whole block. [One could put an inverter on the cell to make it the other way around, but you get one or the other, so it doesn't help much.]
Real hard disks are more of a mechanical device, so there isn't so much of a problem with how many times you write to the same place; it's more about the head movements.
I wouldn't worry too much about it. Writing one file should be fine, it has little consequence whether you have one file or many.
