Read a large .sql file in Java and write the data to a database

I have to read a large .sql file (50 GB) in Java and write its contents to a database. I tried it with small files (200 MB): it works for the first file, but by the second file it becomes very slow and does not terminate correctly (OOM: Java heap space). I changed -Xmx to 6144m, but it is still slow even on the first file. Can I free the memory after every iteration?
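
One common way to avoid this is to stream the file and write in batches, so that only a small window of the file is ever held in memory. Below is a minimal sketch of that approach, assuming the .sql file contains one INSERT statement per line and that a JDBC connection to the target database is available; the connection URL, file name, and batch size are made-up placeholders.

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SqlFileLoader {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details -- replace with your own driver/URL.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/mydb", "user", "password");
             BufferedReader reader = Files.newBufferedReader(
                     Paths.get("dump.sql"), StandardCharsets.UTF_8);
             Statement stmt = conn.createStatement()) {

            conn.setAutoCommit(false);
            final int batchSize = 10_000;
            int count = 0;
            String line;

            // Stream the file line by line instead of loading it into memory.
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                stmt.addBatch(line);
                // Flush the batch periodically so processed statements can be
                // garbage-collected instead of piling up on the heap.
                if (++count % batchSize == 0) {
                    stmt.executeBatch();
                    stmt.clearBatch();
                    conn.commit();
                }
            }
            stmt.executeBatch();
            conn.commit();
        }
    }
}

With this pattern only one batch of statements is alive at any time, so heap usage stays flat regardless of the file size; there is no need to "refresh" the memory manually, because the JVM reclaims each batch once it has been executed and cleared.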

Related

How to configure and clean up the files in /var/lib/taos/vnode/vnode2/wal/

I'm using TDengine 3.0. I have found that a large number of 0000000.log files are generated under /var/lib/taos/vnode/vnode2/wal/, which take up a lot of space.
How should these log files be configured, and how can they be cleaned up?
You could set the WAL_RETENTION_PERIOD value to 0; then each WAL file is deleted immediately after its contents are written to disk, which reduces the space used right away.
See https://docs.tdengine.com/taos-sql/database/

Multithreaded compression, random access and on-the-fly reading

I have a program running on Linux which generates thousands of text files. I want these files to be packed into a single (compressed) file.
The compressed file will later be opened by a C program, which needs to access specific files inside that container, in a random fashion.
The whole thing is working as follows:
Linux program generates thousands of small files
zip -9 out.zip *
C program with libzip accessing specific files inside the .zip, depending on what the user requests. These reads are done in memory (no decompressed files are written to disk).
Works great. However, it takes about 20 minutes for the compression to finish. Because the compression runs on a 40-core server, I have been experimenting with lbzip2, with excellent results in terms of both compression ratio and speed. I have also used zip -0 to pack all the .bz2 files into a single .zip container, which I assume is a better option than tar because of random access.
So my question is: how can I read .bz2 files compressed inside a .zip file? As far as I can tell, gzopen takes a file path as its first argument.
You could just stick with your current zip format for random access. Run separate zip commands individually on each text file to turn them into many single entry zip files. Launch all those at once, and your 40 cores will be kept busy until done. Once done, use zipmerge to combine them all into a single zip file.
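
A rough sketch of that workflow as a small Java driver (assuming the zip and zipmerge binaries are on the PATH, the text files are in the current directory, and the thread count and file names are illustrative):

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelZip {
    // Runs an external command in the current directory and waits for it.
    static void run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IOException("command failed: " + command);
        }
    }

    public static void main(String[] args) throws Exception {
        File[] textFiles = new File(".").listFiles((dir, name) -> name.endsWith(".txt"));
        ExecutorService pool = Executors.newFixedThreadPool(40);

        // One single-entry zip per text file, with 40 compressions in flight at once.
        for (File f : textFiles) {
            pool.submit(() -> {
                run(List.of("zip", "-9", f.getName() + ".zip", f.getName()));
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);

        // Merge all the single-entry archives into one container for random access.
        List<String> merge = new ArrayList<>(List.of("zipmerge", "out.zip"));
        for (File f : textFiles) {
            merge.add(f.getName() + ".zip");
        }
        run(merge);
    }
}

In practice a short shell script would do the same job; the point is simply that each file gets its own zip process so all cores stay busy, and zipmerge can then combine the archives without recompressing the data.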

How does the operating system store a file that was edited by the user?

I know that file systems use clusters (n × 512 B sectors, usually 4 KB in size) for storing files. If I have a file of size 5 KB, it uses two clusters, and the remaining space is called slack space. My question concerns the situation where a user reads a file from disk, modifies it (adds a few characters), and saves it again. What happens: will the OS overwrite the file at the location where it started reading it, or will the file be written to completely new clusters, with the address of the file's starting cluster erased and replaced with the new cluster address?
New part:
I just read in the book "Information Technology: An Introduction for Today’s Digital World" that if a file uses 2 blocks (clusters) and a second file uses the 4 consecutive blocks right after it, then when the first file is edited and its size grows to a total of 3 blocks, the file is written after the second file and the 2 blocks it previously occupied become free. But I still don't know what happens if I, for example, grow the file by one character while it is still smaller than 2 blocks in total. Will this data be added to the existing file, in the existing first two blocks, or will it be stored at a new physical location on disk (2 new blocks)?
When a user stores a file, it occupies some space on disk (a cluster combines several sectors and is typically 4 KB, since a sector is usually 512 bytes). If the file takes 3 KB, then 1 KB stays unused in that cluster. Now, what happens if I enlarge the file a little by adding some data to it? The answer depends on the procedure used to modify the file.
1. If I append data to the file manually (e.g. echo "some text" >> filename), the data is added within the existing cluster, since 1 KB of space is still available there. If the file keeps growing, it takes additional free clusters (the file uses "extents" to address all of them).
2. If I use a text editor, it typically copies the file to another location on disk (because of multi-user situations where two users access the same file at the same time). The previous location becomes "free" (the sector contents remain, but the file system no longer references them) and is replaced by the new location on disk.
Since the majority of users edit files with an editor, scenario 2 is the most common.
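
To make the two scenarios concrete, here is a small sketch (Java used only for illustration; the file names are made up): the first method appends in place, the way echo >> does, while the second mimics what many editors do by writing a temporary file and renaming it over the original, so the new content ends up in freshly allocated clusters.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class WriteStrategies {

    // Scenario 1: append in place. The new bytes go to the end of the existing
    // file, filling the slack space in its last cluster before any new
    // clusters are allocated.
    static void appendInPlace(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Scenario 2: the editor pattern. The whole new content is written to a
    // temporary file, which is then renamed over the original, so the data
    // lands in newly allocated clusters and the old ones are released.
    static void replaceViaTempFile(Path file, String newContent) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "edit", ".tmp");
        Files.write(tmp, newContent.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("example.txt");
        appendInPlace(file, "one more line\n");
        replaceViaTempFile(file, "the entire file, rewritten\n");
    }
}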

Size limit with output redirection or files created with fopen?

Redirecting the output of a program to a file:
program > file.log 2>1&
does not include all the rows I see when running on the console without redirection. There are no errors. This is on Windows 10. I get roughly 50k rows in a file of 1,800 KB.
I get more rows in the file if I reduce the size of each row (rounding off numbers).
I did try writing the file directly with fopen, but I still do not get all the output.
Expected result: see in the log file the same output I see displayed on the console.
Actual result: a log file that is truncated, both when redirecting the console output and when creating the file directly with fopen. No issues seen on stderr or when running the program in debug mode.
Having done redirection a lot, I can say with reasonable confidence you have one of exactly four issues.
1) You ran out of disk space.
2) You ran out of disk space quota.
3) You reached the maximum file size for that volume. Note that FAT32 (which includes almost all USB sticks) has a maximum file size of 4 GB.
4) You are saving to NTFS and need to defragment your hard disk.

Python reading of files

I am new to Python and I am facing my first troubles.
I have to read some .dat files (100 of them), and each file contains a set of 5000 power traces. The total size of the files is almost 10 GB, so I cannot read them all together because that fills up the RAM. So the np.fromfile method with a for loop over all the files is not useful.
I would like to use memory mapping, reading just a few files at a time, but I need to work with the data at the same time.
Do you have any suggestions?
Cheers
