I've been assigned to take over 5k CSV files and merge them into separate files containing transposed data: each source filename becomes a column in a new file (with source column 1 extracted from each file as the data), and the rows are dates.
I was after some input/suggestions on how to accomplish this.
Example details as follows:
File1.csv -> File5000.csv
Each file contains the following
Date, Quota, Price, % Value, BaseCost,...etc..,Units
'date1','value1-1','value1-2',....,'value1-8'
'date2','value2-1','value2-2',....,'value2-8'
....etc....
'date20000','value20000-1','value20000-2',....,'value20000-8'
The resulting/merged csv file(s) would look like this:
Filename: Quota.csv
Date,'File1','File2','File3',etc.,'File5000'
'date1','file1-value1-1','file2-value1-1','file3-value1-1',
etc.,'File5000-value1-1'
'date20000','file1-value20000-1','file2-value20000-1','file3-value20000-1',
etc.,'File5000-value20000-1'
Filename: Price.csv
Date,'File1','File2','File3',etc.,'File5000'
'date1','file1-value1-2','file2-value1-2','file3-value1-2',
etc.,'File5000-value1-2'
'date20000','file1-value20000-2','file2-value20000-2','file3-value20000-2',
etc.,'File5000-value20000-2'
....up to Filename: Units.csv
Date,'File1','File2','File3',etc.,'File5000'
'date1','file1-value1-8','file2-value1-8','file3-value1-8',
etc.,'File5000-value1-8'
'date20000','file1-value20000-8','file2-value20000-8','file3-value20000-8',
etc.,'File5000-value20000-8'
I've been able to use an array construct to reformat the data, but due to the sheer number of files and entries it uses way too much RAM - the array gets too big, and this approach is not scalable.
I was thinking of simply loading each of the 5,000 files one at a time, extracting each line one at a time per file, and then outputting the results row by row to each of the new files 1-8. However, with over 80 million lines of data in 5k+ files, this may take an extremely long time to convert, even on an SSD.
The idea was that it would load File1.csv, extract the first line, and store the Date and the first data column in a simple array. Then it would load File2.csv, extract the first line, check whether the Date matches, and if so store the first data column in the same array. This repeats for all 5k files, and once completed the array is written as one row to the new file for that column. Then it goes through every file again for each subsequent date, extracting only the first data column to add to that same output file. Then the whole process repeats for the column 2 data, and so on up to column 8... taking forever :(
Any ideas/suggestions on an approach using a scripting language?
Note: The machine it will likely run on only has 8GB RAM, using *nix.
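One possible approach, sketched below in Python, is to read all input files in lockstep: open every file at once, pull one line from each, and write one row to each of the output files. Only one line per input file is in memory at a time, and every input is read exactly once. This is a minimal sketch under several assumptions that are not stated in the question: every file has the same header and lists the same dates in the same order, the column names are usable as filenames, and the per-process open-file limit has been raised above the number of inputs (for example with `ulimit -n`). The function name `transpose_csvs` is made up for illustration.

```python
import csv
from contextlib import ExitStack
from pathlib import Path


def transpose_csvs(input_paths, out_dir):
    """Read all inputs in lockstep; write one output CSV per value column.

    Assumes identical headers and identical date order in every input.
    Returns the list of value-column names taken from the first header.
    """
    input_paths = [Path(p) for p in input_paths]
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with ExitStack() as stack:
        # One open reader per input file (needs a high enough ulimit -n).
        readers = [
            csv.reader(stack.enter_context(open(p, newline="")))
            for p in input_paths
        ]
        headers = [next(r) for r in readers]
        columns = [name.strip() for name in headers[0][1:]]

        # One writer per value column; header row is Date + source file names.
        writers = []
        for name in columns:
            fh = stack.enter_context(
                open(out_dir / f"{name}.csv", "w", newline="")
            )
            writer = csv.writer(fh)
            writer.writerow(["Date"] + [p.stem for p in input_paths])
            writers.append(writer)

        # zip() yields the N-th row of every file together and stops at
        # the shortest file.
        for rows in zip(*readers):
            date = rows[0][0]
            if any(r[0] != date for r in rows):
                raise ValueError(f"date mismatch at {date!r}")
            for i, writer in enumerate(writers, start=1):
                writer.writerow([date] + [r[i] for r in rows])

    return columns
```

If the dates are not aligned across files, this lockstep read does not apply as written and the rows would need to be keyed by date instead (for instance by sorting or by loading into a database first).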
We have survey data for each survey on a server. The data is split: with one method I can download the column names (via polling), and with another I can download the data (via SSE). I have managed to write a procedure that creates a DataTable dynamically for each survey I choose to download. Now I have to get the data into that table. For this I have streamed the data via SSE into a List(Of String), where each element is a comma-separated string.
For example, this loop:
For Each x As String In stringList
    Console.WriteLine(x)
Next
would give me
(1)data:[1,2,1,5,2,6,John,Winchester,234]
(2)data:[5,3,2,4,1,6,Mike,Lenchester,555]
...
Each element of the list is one record that I have to put into my DataTable. So I guess I have to loop through the list, split each element at the commas, and write the pieces into the columns.
So my problem now is getting the data into the database.
Usually I provide an attempt, but this time I have no clue how to start.
I tried to experiment with this
.Parameters.Add("#id", SqlDbType.varchar(max)).Value = x
but ended up in frustration.
Could anyone give me something to start with? The data size is up to 500MB if I store it in a .txt file.
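As a starting point, the parse-and-insert step could look like the Python sketch below. It is only an illustration of the idea, not VB.NET code: it assumes each line has the form `data:[v1,v2,...]` (optionally with a prefix such as `(1)`), that no value contains a comma, and it uses an SQLite connection as a stand-in for the real database. The names `parse_sse_line` and `insert_in_batches` are hypothetical. Inserting in batches with parameter placeholders keeps memory bounded for inputs of several hundred MB.

```python
import re
import sqlite3
from itertools import islice

# Matches the payload between "data:[" and the closing "]".
_PAYLOAD = re.compile(r"data:\[(.*)\]\s*$")


def parse_sse_line(line):
    """Split one 'data:[a,b,c]' line into a list of string fields."""
    match = _PAYLOAD.search(line)
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    return [field.strip() for field in match.group(1).split(",")]


def insert_in_batches(conn, table, columns, lines, batch_size=1000):
    """Parse lines lazily and insert them batch by batch.

    Table and column names are interpolated into the SQL text, so they
    must come from a trusted source; the values use placeholders.
    Returns the number of rows inserted.
    """
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})'

    rows = (parse_sse_line(line) for line in lines)
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total
```

The same structure (split each string, then add one row per element, or send rows to the server in batches) carries over to a DataTable in .NET.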
I am working on a case where I have to convert a CSV file into another CSV with a custom format. I am using the Bindy component, which works well with annotations.
MyRoute:
final DataFormat inputCSV = new BindyCsvDataFormat(InputCSV.class);
final DataFormat outputCSV = new BindyCsvDataFormat(OutputCSV.class);
from("file:inbox/inputFile?fileName=inputProducts.csv&noop=true")
    .split().tokenize("\n", 500)
    .unmarshal(inputCSV)
    .bean(Processor.class, "processCSV")
    .marshal(outputCSV)
    .to("file:inbox/outputFile?fileExist=append&fileName=outProduct.csv");
}
Here I'm getting different output every time.
The input CSV file contains 3000 rows.
1) It keeps repeating the data again and again and produces 7000+ rows.
2) With a chunk size of 800 it writes 21 rows, 100 rows, or some other random number of rows, with repeated rows.
3) With a chunk size of 1000 it does not work; nothing happens.
4) I don't understand how Thread.sleep(5000) affects it: the longer the sleep, the more repeated rows.
5) After every chunk it generates a header, which I want to generate only once.
I also tried threads() and streaming().
My question is: why is splitting into chunks not working as it should, and why is it reading the data again and again?
I am downloading MODIS LST data in Matlab. The outputted data I have appears like this:
I have no problems downloading the data; my question is how to save it into a new array. I want the first set of data to go into the first column, the next set into the second column, and so on.
First, you have to convert your matrix from class uint16/uint8 to double.
lst = double(your_matrix_name);
Then you can arrange the data into an array with the desired number of rows and columns.
lst1 = reshape(lst, [rows columns]);
e.g. lst1 = reshape(lst, [365 1]); for 365 rows and 1 column
Below I have linked the CSV for my store (WooCommerce). Columns C and D pull data from other columns, so when I export as CSV and upload to WooCommerce, the data from those cells isn't represented correctly. What I need is for those cells to actually contain the text that is displayed by the CONCATENATE function.
Is there an easy way to go about this, or am I bound to copy and paste from two columns and add the repeated text by hand?
https://docs.google.com/spreadsheets/d/189DoiV2LwV5JrXqPVAZTL9ww5hAq2Ws7_IdWqsd7mqo/edit?usp=sharing
I have figured it out. I selected the whole column that used the formula to generate the value (text) and copied it, pasted it somewhere else in the same document, deleted the original column, and then pasted the value column in its place.