Reading specified lines below a found string - c

I'm going to create a simple bank account database in C, but I haven't quite figured out how I'm going to fetch the data for a specific account that has already been created and saved to a file. I was thinking of searching from the beginning of the file using fseek for a specified account number, since all account numbers will be unique. Is there a way to read a specified number of lines below that account number once it is found? For example, in my file accounts.txt there will be the accounts
Account # : 13398
First Name : Eric
Last Name : Walters
Parish : St.tofu
Year of Birth : 1980
Age : 34
Savings Period : 5 year(s)
Password : Eric1
Account # : 13398
Account balance: $0.00
====================================
I want to search through the file for the account number and fetch it, along with everything else in the 10 lines below it, and display it on the screen. If this is possible, say 'aye' and point me to a certain area I should study to achieve this; when I'm successful I'll post my code here to show what I have done.

fseek() allows you to jump to a given byte offset in a file. If your lines are not always the same length, you will have to read the entire file, not just to search for the account numbers, but also to find the ten newlines that delimit each account. To do this, you are better off using fgets().
The steps would be something like this:

foreach line in file
    if line starts with "Account Number"
        if the number is the one you want
            print the next 10 lines
        else
            skip the next 10 lines
    else
        keep looking

Firstly, fseek is used to move the file pointer, not to search. For searching text (the account id, in your case), there are some examples in Trying to find and replace a string from file in C. To write your own code, learning the basic file-handling functions is enough. Furthermore, since your data is structured (every 11 lines represent one account), your code can be accelerated. Lastly, what you are trying to do is what database software offers, and it is hard to implement your own database as fast as commercial software.

You could search in the file, but that would be a bit tedious. Even more tedious if you wanted to modify the account details.
Why don't you use SQLite? It is designed to replace fopen().

Related

CSV file not recognized as csv, reason nominal value not declared in header

I am trying to load a dataset in Weka. I have tried many solutions, such as ARFF format, commas, etc., but it was all a failure. Could any of you give me a working solution, or load this dataset according to the format?
Here is a link to dataset
Instead of using Weka's functionality for reading CSV files, you could use ADAMS (developed at the same university; I'm the lead developer) instead.
Download the adams-ml-app snapshot and then use the Weka Investigator to load/save the file:
Load it as ADAMS Spreadsheets (.csv, .csv.gz)
Save it as Arff data files (.arff, .arff.gz) or Simple ARFF data files (.arff, .arff.gz)
The Reviews column contains an erroneous 3.0M, which prevents it from becoming numeric.
If you want to have an introduction to the Weka Investigator, then take a look at my talk from the Weka User Conference 2021: Taking Weka to the next level with ADAMS.
There are too many issues with lines in this file.
In line 23, I eliminated the odd looking brackets.
I removed all single quotes (')
I eliminated all repeated double quotes ("")
In line 10474 the first two fields (before the number) didn't seem to be separated, so I added a comma.
This allowed the file to go through initial screening, but...
The file contains a lot of odd emojis. I started to eliminate them one by one, but there are clearly more of these than I wish to deal with.
Each time I got rid of one, it would read farther into the file, then stop at the next one.
If I just try to read the top of the file, the first 20 lines before we get to any of these problems, it reads fine.
My partial editing can be found here: https://www.dropbox.com/s/ij707mb23dt1jvz/googleplaystore3.csv?dl=0
I think if you clear up the remaining emojis the file should be usable.

UberEats System in C - File Handling

So, I have a school project to build an UberEats management system in C.
I have a CSV file that contains the cities where the system is available and their respective codes, e.g. 1, New York.
And I need to read that file, so that when the user enters a code, it is associated with the specific city. Can anyone help me with how to do that?
You could read the file into a matrix of 2 x no_cities, and when the user enters a number, you just do
array[input-given-by-user][1]
since the first [] is for the row, and the second one for the column, which is where you save the cities.

ISPF/Mainframe Send File to Host with variable length

I need help with something I'm trying to do and cannot find help anywhere.
I'm trying to upload a file to Host via ISPF (ISPF -> Command -> "Send File to Host"). The problem I'm having is that the file has variable length (it was exported from a DB2 database via an SH script) and it's not working well.
What I mean is:
In windows, the file looks like this:
This is line one
This is the second line
And this is the third
But in Host it always ends being like this:
This is line one This is
the second line and this
is the third
Or similar, depending on the "Record length" I set when allocating the data set.
I don't know if the problem is how I'm creating the file on Host, if it is with the send parameters, or maybe with the TXT file.
I tried creating the dataset with different Record Formats (F, FB, V, VB), and with all of them it was the same.
I also tried modifying the Send parameters here:
Send parameters
And checked the txt file, but it seems to be ok.
Well, thanks in advance for the help! And sorry for my poor English.
UPDATE 03/18
Hi! I'm still trying to solve this, but now I have more info!
It seems that the problem is within the exported file, not the configuration of the terminal.
I'm using a Linux script to export the file from a DB2 database, and I'm trying to upload it from a Windows PC (which has the E3270 terminal).
I read a lot and noticed that the file exported from DB2 to Linux only uses the "New Line" code to mark an end of line (0A in hex), while Windows uses "Carriage Return + New Line" ("0D 0A" in hex).
Could the problem be there?
I tried creating a new TXT file in Windows (which ends each line with 0D 0A) and it worked great! Then I tried to modify the exported file, adding a space at the end of each line and changing that space's hex value (20) to 0D, so I had 0D 0A (the editor didn't let me "add" a new hex byte), but it didn't work. That throws off the whole theory, haha, but maybe I'm doing something wrong.
well, thanks!
From the Host output, the file (dataset) is being treated as having a fixed length of 24. It needs to be specified as Variable (VB) in the send.
From here Personal Communications 6.0.0>Product Documentation>Books>Emulator User's Reference>Transferring Files it appears that you can specify this as per :-
Record Format
Valid only for VM/CMS and MVS/TSO when APPEND is not specified for
file transmission. You can select any of the following:
Default
Fixed (fixed length)
Variable (variable length)
Undefined (undefined mode for MVS/TSO only)
If you select the Default value, the record format is selected
automatically by the host system.
Specifying Variable for VM file transfer enables host disk space to be
used efficiently.
Logical Record Length (LRECL)
Valid only for VM/CMS and MVS/TSO when APPEND is not specified for
file transmission.
Enter the logical record length to be used (host record byte count) in
the LRECL text box. If Variable and Undefined Mode are specified as
the record format, the logical record length is the maximum record
length within a file. The maximum value is 32767.
The record length of a file sent from a workstation to the host system
might exceed the logical record length specified here. If so, the host
file transfer program divides the file by the logical record length.
When sending a text file from a workstation to a host, if the text
file contains 2-byte workstation codes (such as kanji codes), the
record length of the file is changed because SO and SI have been
inserted.
To send a file containing long records to the host system, specify a
sufficiently long logical record length.
Because the record length of a workstation file exceeds the logical
record length, a message does not appear normally if each record is
divided. To display a message, add the following specification to the
[Transfer] item of the workstation profile:
DisplayTruncateMessage = Y
As I don't have access I can't actually look into this further but I do recall that it can be a little confusing to use the file transfer.
I'd suggest using 32767 as the LRECL, along with Variable, and perhaps having a look at the whole page that has been linked. Something on the PC side will have to know how to convert the file (i.e. at each LF, determine the length of the record and prefix the record with that record length; if I recall correctly, 2 bytes/a word), so you might have to use Variable in conjunction with another selectable parameter.
If you follow the link, you will see that Record Format is part of the Defining Transfer Types, you may have to define a transfer type as per :-
Click Edit -> Preferences -> Transfer from the session window.
Click the tab for your host type or modem protocol.
The property page for the selected host or modem protocol opens. The items that appear depend on the selected host system.
Enter transfer-type names in the Transfer Type box, or select them from the drop-down list.
Select or enter the required items (see Items to Be Specified).
To add or replace a transfer type, click Save. To delete a transfer type, click Delete.
A dialog box displays, asking for confirmation. Click OK.

Fix CSV file with new lines

I ran a query on an MS SQL database using SQL Server Management Studio, and some of the fields contained new lines. I selected to save the result as a CSV, and apparently MS SQL isn't smart enough to give me a correctly formatted CSV file.
Some of these fields with new lines are wrapped in quotes, but some aren't, I'm not sure why (it seems to quote fields if they contain more than one new line, but not if they only contain one new line, thanks Microsoft, that's useful).
When I try to open this CSV in Excel, some of the rows are wrong because of the new lines, it thinks that one row is two rows.
How can I fix this?
I was thinking I could use a regex. Maybe something like:
/,[^,]*\n[^,]*,/
Problem with this is it matches the last element of one line and the 1st of the next line.
Here is an example csv that demonstrates the issue:
field a,field b,field c,field d,field e
1,2,3,4,5
test,computer,I like
pie,4,8
123,456,"7
8
9",10,11
a,b,c,d,e
A simple regex replacement won't work, but here's a solution based on preg_replace_callback:
function add_quotes($matches) {
    return preg_replace('~(?<=^|,)(?>[^,"\r\n]+\r?\n[^,]*)(?=,|$)~',
                        '"$0"',
                        $matches[0]);
}

$row_regex = '~^(?:(?:(?:"[^"]*")+|[^,]*)(?:,|$)){5}$~m';
$result = preg_replace_callback($row_regex, 'add_quotes', $source);
The secret to $row_regex is knowing ahead of time how many columns there are. It starts at the beginning of a line (^ in multiline mode) and consumes the next five things that look like fields. It's not as efficient as I'd like, because it always overshoots on the last column, consuming the "real" line separator and the first field of the next row before backtracking to the end of the line. If your documents are very large, that might be a problem.
If you don't know in advance how many columns there are, you can discover that by matching just the first row and counting the matches. Of course, that assumes the row doesn't contain any of the funky fields that caused the problem. If the first row contains column headers you shouldn't have to worry about that, or about legitimate quoted fields either. Here's how I did it:
preg_match_all('~\G,?[^,\r\n]++~', $source, $cols);
$row_regex = '~^(?:(?:(?:"[^"]*")+|[^,]*)(?:,|$)){' . count($cols[0]) . '}$~m';
Your sample data contains only linefeeds (\n), but I've allowed for DOS-style \r\n as well. (Since the file is generated by a Microsoft product, I won't worry about the older-Mac style CR-only separator.)
See an online demo
If you want a programmatic Java solution, open the file using the OpenCSV library. If it is a manual operation, then open the file in a text editor such as Vim and run a replace command. If it is a batch operation, you can use a Perl command to clean up the CRLFs.

Twitter name length in DB

I'm adding a field to a member table for twitter names for members on a site. From what I can work out the maximum twitter name length is 20 so it seems obvious that I should set the field size to varchar(20) (SQL Server).
Is this a good idea?
What if Twitter starts allowing multi-byte characters in the user names? Should I make this field nvarchar?
What if Twitter decides to increase the size of a username? Should I make it 50 instead and then warn a user if they enter a name longer than 20?
I'm trying to code defensively so that I can reduce the chances of modifying the code around this input field and the DB schema changes that might be needed.
While looking for the same info, I found the following in a sort of weird place in the Twitter help section (why not in the API docs? Who knows?):
"Your user name can contain up to 15 characters. Why no more? Because we append your user name to your 140 characters on outgoing SMS updates and IM messages. If your name is longer than 15 characters, your message would be too long to send in a single text message."
http://help.twitter.com/entries/14609-how-to-change-your-username
so perhaps one could even get away with varchar(16)
While new accounts have a limit of 15 characters for the username and 20 characters for the name, for old accounts this limit seems to be undefined. The documentation here states:
Earlybirds: Early users of Twitter may have a username or real name longer than user names we currently allow. This is ok until you need to save changes to your account settings. No changes will save unless your user/real name is the appropriate length; this means you have to change your real name/username to meet our most modern regulations.
So you are probably better off having a long field and saving yourself some time when you hit the border cases.
Nowadays, space is usually not a concern, so I'd use a mostly generic approach: use nvarchar(200).
When designing DB schemas you must think 2 steps ahead, even more than when programming. Or get yourself a good schema update strategy, then you'll be fine also with varchar(20).
Personally I wouldn't worry. Use something like 200 (or a nice round number like 256) and you won't have this problem. The limit then is on their API, so you might be best to do some verification that it is a real username anyway. That verification implicitly includes the length checking.
Twitter allows for 140 characters to be typed in as the message payload for transmission, and includes "[username]:" at the beginning of the SMS message. With an upper limit of 140 characters for the message combined with the messaging system being based on SMS, I think they would have to decrease the allowable message size to increase the username. I think it is a pretty safe bet that 20 characters would be the max username length. I'd use nvarchar just in case someone uses 16-bit characters, and maybe pad it a little. nvarchar(24) should work; I wouldn't go any higher than nvarchar(32).
If you're going to develop an app for their service, you should probably watch the messages on Twitter's API Announcements mailing list.
[opinion only]
Twitter works on SMS, and the limit there is 160 characters, so the name has to be small to avoid eating into the message.
nvarchar would be a good idea for all twitter text
If the real ID of a Twitterer is a cell-phone then the longest phone number is your max - 20 should easily cover it!
Defensive programming is always good :) !
[/opinion only]
There's only so much you can code defensively, I'd suggest looking at the twitter API documentation and following anything specified there. That said, from a cursory look through nowhere seems to specify the length of the username, annoyingly :/
One thing to keep in mind here is that a field using nvarchar needs twice as much space, since it needs 2 bytes to store each potential Unicode character. So, a Twitter status would need a size of 280 using nvarchar, PLUS some more for possible retweets, as those aren't included in the 140-char limit. I discovered this just today, in fact!
For example:
RT @chatrbyte: here's some great tweet
that I'm retweeting.
The RT @chatrbyte: is not included in the 140-character limit.
So, assuming that a Twitter username has a 20-character limit, and wanting to also capture a retweet, a field to hold a full tweet would need to be an nvarchar of size 280 + 40 (for the username) + 8 (for the initial "RT @" before a retweet) + 4 (for the ": " after a retweeted username) = 332.
I would say go for nvarchar(350) to give yourself a little room. That's what I am trying right now. If I'm wrong I'll update here.
I'm guessing you are managing the data entry on the Twitter name field in your application somewhere other than just in the database. If you open the field to 200 characters, you only have to change the code in one place or if you allow users to enter Twitters names with more than 20 characters, you don't have to worry about a change at all.
