Iterating through a column in a CSV file in C - c

I'm not sure how to loop through a single column only.
Given the function header
double getMin(char csvfile[], char column[]);
return the minimum score of the specified column.
When implementing the above functions, you can assume csvfile is a valid CSV file and it can always be opened for reading, and the column parameter is a short form of the column heading and it starts with a letter (P for Participation, C for Challenge, and L for Lab) followed by a chapter number, a dot, and a section number. For example, P5.6, C5.4 and L5.25 represent the column headings 5.6 – Participation, 5.4 – Challenge, and 5.25 – Lab respectively. You shall return -2.0 (double) or 2 (integer) if the corresponding column name or student name does not exist, and -1.0 or 1 if the corresponding column is optional.
Example file:
Last name,First name,5.1 - Participation (11),5.2 - Participation (20), 5.1 - Challenge (0),5.2 - Challenge (9), 5.25 - Lab (10)
Alvarez,Abel,100,100,100,100,100
Mendez,Charlene,0,0,0,0,0
Zimmerman,Drew,100,100,100,100,60

1. Set accumulator to 0.
2. Set numberoflines to 0.
3. Read the next line of the file (ignoring the first header line); goto 6 if there are no more lines.
4. In that line, skip n-1 commas (beware commas embedded in quotes), increment numberoflines, and add the value to the accumulator.
5. Goto 3.
6. Divide accumulator by numberoflines and return.
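The same column walk can be adapted to return a minimum instead of an average. The following is a minimal sketch in Python rather than C, under several assumptions: get_min and find_column are hypothetical names, the function takes the CSV text instead of a file name, headings follow the pattern shown in the example file, and there is at least one data row. The -1.0 "optional column" case is left out because the question does not say how an optional column is marked.

```python
import csv
import io
import re

# Short-form letter -> word used in the heading (taken from the question).
KINDS = {"P": "Participation", "C": "Challenge", "L": "Lab"}

def find_column(header, column):
    """Return the index of the heading matching e.g. 'P5.6', or -1."""
    kind = KINDS.get(column[:1])
    if kind is None:
        return -1
    for index, heading in enumerate(header):
        # Headings look like '5.6 - Participation (11)'; allow '-' or an en dash.
        m = re.match(r"\s*(\d+\.\d+)\s*[-\u2013]\s*(\w+)", heading)
        if m and m.group(1) == column[1:] and m.group(2) == kind:
            return index
    return -1

def get_min(csv_text, column):
    """Return the smallest value in the column, or -2.0 if it does not exist."""
    rows = csv.reader(io.StringIO(csv_text))  # csv handles quoted commas
    header = next(rows)                       # the first line is the header
    index = find_column(header, column)
    if index < 0:
        return -2.0
    # Assumption: every data row has a numeric value in this column.
    return min(float(row[index]) for row in rows if row)

SAMPLE = """Last name,First name,5.1 - Participation (11),5.2 - Participation (20), 5.1 - Challenge (0),5.2 - Challenge (9), 5.25 - Lab (10)
Alvarez,Abel,100,100,100,100,100
Mendez,Charlene,0,0,0,0,0
Zimmerman,Drew,100,100,100,100,60
"""

print(get_min(SAMPLE, "L5.25"))
print(get_min(SAMPLE, "P9.9"))
```

In C the same structure would apply: read the header line once to find the column index, then pick the field at that index from every following line.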

Related

Invalid formula - Operator "+" doesn't support TEXT + NUMBER. Operator "+" supports NUMBER + NUMBER

This was working last week but now it has an error like this:
Invalid formula - Operator "+" doesn't support TEXT + NUMBER. Operator "+" supports NUMBER + NUMBER
Current formula:
COUNT_DISTINCT(CASE
WHEN First Duration+Second Duration<=24 THEN New ID
ELSE NULL
END ) / COUNT_DISTINCT(CASE
WHEN First Duration IS NULL THEN NULL
WHEN Second Duration IS NULL THEN NULL
ELSE New ID
END )
I can only recommend checking that the First Duration and Second Duration fields are of type Number. If they are not, you can change the type manually or create a new field that converts the text dimension to a number.
I have had cases where, after a database update, Google Data Studio interpreted a number or date field as text. Your formula does not appear to have any syntax errors; the dimensions it uses simply don't seem to have the expected type.

formatting arrays with numbers and characters

I need help turning a Decay.txt file into an array. Columns 1-3 and 5 are numbers, and the third column is "time elapsed" in integers, but the 4th column is a unit of time (milliseconds, months, days) spelled out in characters. I can't get this mixed array (numbers and characters) to transfer over to MATLAB.
Ideally I'd like to take the unit of time (4th column), change it to a value in seconds (i.e. hour becomes 3600 seconds), multiply it by the number in the third column, and end up with a 4-column array where the 3rd column is simply the time elapsed in seconds.
Does anyone know how to do either of these things?
I've tried
Decay = fopen('Decay.txt','r');
B = fscanf(Decay,'%f',[5 inf]);
which stops and has an error as soon as it hits the 4th column
and
Decay = fopen('Decay.txt','r');
B = fscanf(Decay,'%s',[5 inf]);
but this just creates a 5x10000 array where every single digit, decimal point, and letter sits in its own element.
Your first example
Decay = fopen('Decay.txt','r');
B = fscanf(Decay,'%f',[5 inf]);
breaks because it can't scan the fourth column (a string) as a number (%f). Your second example doesn't produce numbers because you're scanning everything as a string (%s).
The correct specifier for your format should be
'%f %f %f %s %f'
However, if you call fscanf with it, as per the documentation:
If formatSpec contains a combination of numeric and character specifiers, then A is numeric, of class double, and fscanf converts each text characters to its numeric equivalent. This occurs even when formatSpec explicitly skips all numeric fields (for example, formatSpec is '%*d %s').
So this input file:
50 1.2 99 s 0
6.42 1.2 3.11 min 1
22 37 0.01 h 2
Has this (undesired) output:
>> fscanf(Decay, "%f %f %f %s %f", [5, inf])
ans =
50.0000 6.4200 110.0000 104.0000
1.2000 1.2000 1.0000 2.0000
99.0000 3.1100 22.0000 0
115.0000 109.0000 37.0000 0
0 105.0000 0.0100 0
That happens because a matrix in MATLAB can't hold data of different types. So, your best bet is scanning into a cell array, which can have any type inside.
B = textscan(Decay, "%f %f %f %s %f")
Returns a cell array with the appropriate types. You can use this output to convert the time data into the same unit and build your vectors/matrix. Columns 1, 2, 3 and 5 are trivial to do, just by accessing the cell B{n} for each n.
Column 4 is a cell array of cells. In each internal cell, there's the string you have. You need to apply a conversion from string to the number you need. For my example, such function would look like:
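function scale = DecayScale(unit)
    % Return the number of seconds in one of the given time unit.
    switch unit
        case 's'
            scale = 1;
        case 'min'
            scale = 60;
        case 'h'
            scale = 3600;
        otherwise
            % error() raises the exception; throw() needs an MException object
            error('Time unit not recognized: %s', unit);
    end
end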
Which you could then apply to the 4th column like:
timeScale = cellfun(@DecayScale, B{4})
And get the final time as:
timeColumn = B{3} .* timeScale
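For comparison, the same idea (read each row as mixed text, look up a scale factor for the unit, and multiply) can be sketched outside MATLAB. This is a minimal illustration in Python, not a replacement for the textscan approach; to_seconds_rows and UNIT_SECONDS are hypothetical names, and the unit abbreviations are the ones from the example input above, not necessarily those in the real Decay.txt.

```python
# Seconds per unit; assumption: the file uses these abbreviations.
UNIT_SECONDS = {"s": 1.0, "min": 60.0, "h": 3600.0}

def to_seconds_rows(text):
    """Turn 5-column rows (num num elapsed unit num) into 4 numeric columns."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        a, b, elapsed, unit, e = line.split()
        if unit not in UNIT_SECONDS:
            raise ValueError(f"Time unit not recognized: {unit}")
        # The third output column is the elapsed time expressed in seconds.
        rows.append([float(a), float(b), float(elapsed) * UNIT_SECONDS[unit], float(e)])
    return rows

EXAMPLE = """50 1.2 99 s 0
6.42 1.2 3.11 min 1
22 37 0.01 h 2
"""

for row in to_seconds_rows(EXAMPLE):
    print(row)
```

An unknown unit raises an error instead of being silently skipped, which mirrors the otherwise branch of DecayScale.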

Replace values only if they are different

I have a vcf file like this:
http://www.1000genomes.org/node/101
Here's the example from that site:
##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
20 1234567 microsat1 GTCT G,GTACT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
After the header lines, each line has fields that contain genotypes starting with the 10th field. The 10th field is below the NA0001 heading; the 11th field is genotype NA0002, etc. I have a file with 123 different genotypes, so going from position 10 to 133 (NA0001 until NA0123). What is shown in these fields can be 0/0, 0/1, 0/2 .... till 8/9 for instance. Now I want to replace all the non-equal ones. So I would like to keep 0/0, 1/1, 2/2, etc. And replace 0/1, 0/2, 1/2, 4/5, 4/6 etc by ./.
I would like to write this in a C script. I thought about using sed s/regexp/replacement/ but have no idea how to write all those unequal values as a regular expression. These values could also appear at other positions in the file, so really only positions 10 to 133 should be replaced. The replacement has to be made in place; I will need the rest of the file together with the new values.
I hope this is clear. Does anyone have an idea how to do this?
This regex should do what you want:
\s(\d)[|\/](?!\1)\d:
Replace each match with ./.: preceded by a space.
Breakdown:
\s(\d) matches a space followed by a single digit, capturing the digit in capture group #1
[|\/] matches a pipe or slash (since it seems that the VCF format allows either)
(?!\1)\d uses a negative lookahead to ensure that the next character is not the same as capture group #1, and matches the digit
Caveats:
I matched a leading space and trailing : to try to ensure it matches only the intended values. I couldn't work out a good way to limit it to fields 10 and after.
Example using perl:
perl -pe 's#\s(\d)[|/](?!\1)\d:# ./.:#g' testfile.vcf > testfile_afterchange.vcf
Note: I used # as the delimiter to avoid having to escape the / characters in the regex.
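One way to restrict the replacement to fields 10 and later is to split each line into fields instead of running a regex over the whole line. The following is a minimal sketch in Python, assuming the real file is tab-delimited (as the VCF format specifies; the listing above shows spaces) and that the genotype is the first colon-separated entry of each sample field. mask_heterozygous is a hypothetical name.

```python
import re

# Genotype at the start of a sample field: allele, '|' or '/', allele.
GENOTYPE = re.compile(r"^(\d+)([|/])(\d+)(?=:|$)")

def mask_heterozygous(line):
    """Replace unequal genotypes such as 0/1 or 1|0 with ./. in fields 10+."""
    if line.startswith("#"):
        return line  # header and meta lines are left untouched
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    for i in range(9, len(fields)):  # index 9 is the 10th field
        m = GENOTYPE.match(fields[i])
        if m and m.group(1) != m.group(3):
            fields[i] = "./." + fields[i][m.end():]
    return "\t".join(fields) + newline

record = "\t".join([
    "20", "14370", "rs6054257", "G", "A", "29", "PASS",
    "NS=3;DP=14;AF=0.5;DB;H2", "GT:GQ:DP:HQ",
    "0|0:48:1:51,51", "1|0:48:8:51,51", "1/1:43:5:.,.",
])
print(mask_heterozygous(record))
```

Because the first nine fields are never inspected, a value that happens to look like a genotype elsewhere on the line is left alone, and multi-digit allele numbers are compared as whole numbers.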

Lex/Flex - Split the phone number Up?

I am writing a program that has to split a phone number into its parts. The parts are separated by a hyphen, a space, '( )', or nothing at all.
Example: Input: 0xx-xxxx-xxxx or 0xxxxxxxxxx or (0xx)xxxx-xxxx
Output: code 1: 0xx
code 2: xxxx
code 3: xxxx
But my problem is: sometimes "Code 1" is just 0x, in which case "Code 2" must be xxxxx (the first part always has a hyphen or parentheses when it is 2 digits long).
Can anyone give me a hand? I would be grateful.
According to your comments, the following regex will extract the information you need
^\(?(0\d{1,2})\)?[- ]?(\d{4,5})[- ]?(\d{4})$
Break down:
^\(?(0\d{1,2})\)? matches 0x, 0xx, (0xx) and (0x) at the beginning of the string.
[- ]? Since parentheses can only be used for the first group, the only valid separators left are the space and the hyphen. ? means 0 or 1 time.
(\d{4,5}) matches the second group. Because the length of the third group is fixed (4 digits), the regex engine backtracks until groups 1 and 2 have lengths that let the whole number match.
(\d{4})$ matches the 4 digits at the end of the number.
You can then extract the data from capture groups 1, 2 and 3.
Note: As mentioned in the comments on the question, this is only meant to extract data from correctly formed numbers. It will also match some ill-formed numbers, such as one with an unbalanced parenthesis.
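To see how the groups come out, the pattern can be tried in any regex engine. Here is a minimal sketch using Python's re module (not Lex/Flex, whose pattern syntax differs); split_phone is a hypothetical helper name and the sample numbers are made up.

```python
import re

# The pattern from the answer, unchanged.
PHONE = re.compile(r"^\(?(0\d{1,2})\)?[- ]?(\d{4,5})[- ]?(\d{4})$")

def split_phone(number):
    """Return (code1, code2, code3), or None if the number does not match."""
    m = PHONE.match(number)
    return m.groups() if m else None

for number in ["012-3456-7890", "01234567890", "(012)3456-7890", "01-23456-7890"]:
    print(number, "->", split_phone(number))
```

With no separators at all, the engine first tries the longest first group and then shortens the second group until the final four digits fit, which is why 01234567890 splits as 012, 3456, 7890.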

How to add leading zeros to decimal value in tsql

I have a weight column in a table where the weight must be inserted in the format '09.230'. The weight column is of type varchar, so a value that comes from the front end as '9.23' should be converted to the format above, i.e. '09.230'. I am able to add the trailing zero, but adding the leading zero is a problem.
This is what I have done to add the trailing zero:
CAST(ROUND(@Weight,3,0) AS DECIMAL (9,3))
Suppose @Weight = 6.56: the output of the above comes out as '6.560', but the output I want is '06.560'.
RIGHT('0'+ CONVERT(VARCHAR, CAST(ROUND(@Weight,3,0) AS DECIMAL (9,3))), 6)
This
takes your expression,
converts it to a varchar (retaining the trailing zeros, since the source data type was decimal),
adds a 0 in front of it, and
trims it to 6 characters by removing characters from the front, if needed (e.g. 012.560 -> 12.560, but 06.560 -> 06.560).
Do note, though, that this only works for numbers with at most two digits before the decimal point: 100.123 would be truncated to 00.123!
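The pad-then-trim behaviour, including the truncation caveat, can be checked with a small model. This is a sketch in Python, not T-SQL; right_trick imitates the RIGHT('0' + ..., 6) expression on strings, and pad_weight shows a width-based format that pads without ever cutting digits. Both names are hypothetical.

```python
def right_trick(value, width=6):
    """Imitate RIGHT('0' + <value with 3 decimals>, 6) using string slicing."""
    text = "0" + f"{value:.3f}"   # prepend one zero, keep trailing zeros
    return text[-width:]          # keep only the rightmost `width` characters

def pad_weight(value):
    """Zero-pad to at least 6 characters; longer values are not truncated."""
    return f"{value:06.3f}"

for weight in (6.56, 12.56, 100.123):
    print(weight, right_trick(weight), pad_weight(weight))
```

For 100.123 the slicing version yields 00.123, matching the warning above, while the width-based format returns 100.123 unchanged.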
