How do you store text from a .txt file into a cell array in MATLAB?

For a project, I have to analyze tweets from a company's Twitter page. I took the company's last thirty tweets and put them into a .txt document, where each line is a different tweet. I am supposed to store all of the hashtags in a cell array and then print them to the command window. (Each hashtag should include the word or phrase that follows the # sign, for example #matlab #programming #stackoverflow.) I am confused about how to store them in a cell array. This is the code I have so far; all it does is count the number of hashtags in the entire file.
%% Collecting the hashtags
fid = fopen('twitter.txt');
hashtag = 0;                 % running count of '#' characters
nextLine = fgetl(fid);
while ischar(nextLine)       % fgetl returns -1 (not char) at end of file
    hashtag = hashtag + numel(strfind(nextLine, '#'));
    nextLine = fgetl(fid);
end
fclose(fid);
Is there a command that reads the file contents straight into a cell array, or would I have to manually copy and paste the entire content of the file into a variable like the one below and then loop over the cell array, using fprintf to print each hashtag?
hashtagArray={'#...','#..',..}

If your file contains only the text from tweets, load the whole thing into a cell array with textscan (tested with a random selection of made up tweets):
fid = fopen('twitter.txt');
C = textscan(fid, '%s');   % read whitespace-separated tokens
fclose(fid);
C = C{1};                  % unwrap the single column of tokens
C should now be a cell array of words/hashtags (split by whitespace). We only want the hashtags:
k = strncmp(C, '#', 1); % true for tokens that start with '#'
C2 = C(k);
Note: Officially Twitter considers either whitespace or punctuation to be the end of a hashtag (see this question/answer). So C2 may contain something like #noican't whereas Twitter would recognise the actual hashtag as #noican.
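To illustrate the note about punctuation, here is a minimal Python sketch (not MATLAB, and not part of the original answer) that ends each hashtag at the first character that is not a letter, digit, or underscore. The function name `extract_hashtags` and the sample tweets are made up for illustration, and the pattern only approximates Twitter's actual rules.

```python
import re


def extract_hashtags(lines):
    """Return every hashtag found in an iterable of tweet strings."""
    # '#' followed by word characters; punctuation or whitespace ends the tag.
    pattern = re.compile(r"#\w+")
    tags = []
    for line in lines:
        tags.extend(pattern.findall(line))
    return tags


# Made-up tweets, one per line, as they would appear in the .txt file.
tweets = [
    "Learning #matlab and #programming!",
    "#noican't stop asking on #stackoverflow.",
]

for tag in extract_hashtags(tweets):
    print(tag)
```

With this pattern, `#noican't` is reported as `#noican`, which matches the behaviour described in the note.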

Related

Logic App take first 5 characters from the items in array

I would like to extract the first 5 characters from each element of an array in a Logic App. I am able to generate the list (image below), but I need only the first 5 characters, not the whole name.
and here is the result
Thank you :)
Using the substring function, you can extract the first five characters of the file name. I reproduced the scenario on my side; these are the steps I followed:
Created a logic app with an HTTP trigger and initialized a variable of type array.
Listed the Azure blobs in the storage account.
In a For each loop, iterated over the list of blobs and appended each blob's display name to the array variable, using the substring function to take the first 5 characters of the file name with the expression:
substring(items('For_each')?['DisplayName'],0,5)
Created a blob using the array variable's output as its content.
The logic app ran successfully and the blob was created in the storage account.
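For readers who want to see the same truncation outside the Logic Apps designer, here is a small Python sketch of the idea. The blob names and the helper `first_chars` are hypothetical. Python slicing silently returns shorter strings unchanged; it is worth checking how the Logic Apps substring expression behaves for names shorter than five characters before relying on it.

```python
def first_chars(names, count=5):
    """Return the first `count` characters of every name in the list."""
    return [name[:count] for name in names]


# Hypothetical blob display names.
blob_names = ["invoice_2023.pdf", "report.docx", "abc"]
print(first_chars(blob_names))
```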

sorting a JList populated with Strings from a .txt file

I have the following problem. I have an array of String within my program. It is populated with Strings from a .txt file. Every position of the array corresponds to a single String containing description, title, price and the path to an image of my products. Is it possible somehow to extract from a position only a certain part of the string, let's say only the path for the image?
Consider using a Product class for that.
For example, use the substring method to pull out the individual parts and build an array of Product objects, where each Product holds the description, title, price and image path.
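String s = "example product entry"; // stands in for one element of your array
String part = s.substring(0, 8);    // characters at indices 0 through 7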
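The Product idea can be sketched as follows, here in Python rather than Java. The question does not say how the four parts are separated inside each String, so the `;` separator, the sample line, and the names `Product` and `parse_product` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Product:
    description: str
    title: str
    price: str
    image_path: str


def parse_product(line, sep=";"):
    """Split one line into its four parts (assumes exactly four fields)."""
    description, title, price, image_path = line.split(sep)
    return Product(description, title, price, image_path)


# Hypothetical line in the assumed 'description;title;price;path' layout.
lines = ["Warm wool scarf;Scarf;19.99;images/scarf.png"]
products = [parse_product(line) for line in lines]
print(products[0].image_path)
```

Once every line is a Product, getting only the image path is a field access instead of index arithmetic on the raw string.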

Format text file so I can import it into excel

I have a huge list of addresses and details I need to convert into an Excel spreadsheet and I think the best way would be to read the data and then write a second document that separates the lines so that they are tab-delimited whilst recognizing blank lines (between data entries) to preserve each separate address.
It is in the format:
AddressA1
AddressB1
Postcode1
Name1
PhoneNumber1

AddressA2
AddressB2
Postcode2
Name2
Name2
PhoneNumber2

AddressA3
AddressB3
Postcode3
Name3
PhoneNumber3
So the difficulty also comes when there are multiple names for a company, but I can hand format those if necessary (ideally they want to take on the same address as each other).
The resulting text document should then be tab-delimited, with the columns:
Name|AddressA|AddressB|Postcode|Phone Number
I am thinking this would be easiest to do with a simple .bat script, or should I open the list in Excel and run a script there?
I'm thinking I could add each entry to an array ($address, $name, etc.) and then build a new text file by writing $name[$i] tab $address[$i] and so on.
There are hundreds of entries and entering them by hand is proving difficult.
I have some experience in MEL (basically C++), so I understand programming in general, but I am somewhat at a loss as to how .bat and Excel (VB?) handle and define empty lines and tabs.
The first step is to bring the data into an Excel file. Once the data has been imported, we can re-package it to meet your specs:
Sub BringFileIn()
    ' Copy each line of the text file into column A, one line per row
    Dim TextLine As String
    Dim I As Long
    Close #1
    Open "C:\TestFolder\question.txt" For Input As #1
    I = 1
    Do While Not EOF(1)
        Line Input #1, TextLine
        Cells(I, 1) = TextLine
        I = I + 1
    Loop
    Close #1
End Sub
Any text editor that can do regex search and replace across multiple lines can do the job nicely.
I have written a hybrid JScript/batch utility called REPL.BAT that performs a regex search and replace on stdin and writes the result to stdout. It is pure script that works on any modern Windows machine from XP forward - No 3rd party executable required. Full documentation is embedded within the script.
Assuming REPL.BAT is in your current directory, or better yet, somewhere within your PATH, then:
type file.txt|repl "\r?\n" "\t" mx|repl "\t\t" "\n" x|repl "^(([^\t]*\t){4})([^\t]*)$" "$1\t$3" x >newFile.txt
The above modifies the file in 3 steps and writes the result to a new file, leaving the original intact:
1. Convert all newlines into tabs.
2. Convert consecutive tabs into newlines.
3. Insert an empty column (tab) before the last column on any line that contains only 5 columns.
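The same blank-line-separated layout can also be handled with a short script. The following Python sketch is not the REPL.BAT approach; it assumes every record has at least five lines, that the first three are AddressA, AddressB and Postcode, that the last is the phone number, and that everything in between is a name. Each extra name gets its own row with the same address, as the question suggests. The function names and the sample labels `Name2a`/`Name2b` are made up.

```python
def parse_records(text):
    """Turn blank-line-separated blocks into [name, addrA, addrB, postcode, phone] rows."""
    rows = []
    block = []
    # The trailing "" guarantees that the final block is flushed.
    for line in text.splitlines() + [""]:
        line = line.strip()
        if line:
            block.append(line)
        elif block:
            if len(block) < 5:
                raise ValueError("record too short: %r" % block)
            address_a, address_b, postcode = block[:3]
            phone = block[-1]
            for name in block[3:-1]:  # one output row per name
                rows.append([name, address_a, address_b, postcode, phone])
            block = []
    return rows


def to_tsv(rows):
    """Join the rows into tab-delimited text with a header line."""
    header = ["Name", "AddressA", "AddressB", "Postcode", "Phone Number"]
    return "\n".join("\t".join(row) for row in [header] + rows)


sample = """AddressA1
AddressB1
Postcode1
Name1
PhoneNumber1

AddressA2
AddressB2
Postcode2
Name2a
Name2b
PhoneNumber2
"""

print(to_tsv(parse_records(sample)))
```

The resulting text can be saved with a .txt or .tsv extension and opened directly in Excel as tab-delimited data.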
Here's a method using only Word and Excel. I used the data that you posted. I am assuming that Name2 is the only optional field.
Paste your text into Word.
Replace all paragraph marks with a special character. (Ctrl-h, search for ^p, replace with |)
Replace all line breaks with a different special character. (Ctrl-h, Special character, search for Manual line break, replace with ;)
This is what it looks like in Word:
AddressA1;AddressB1;Postcode1;Name1;PhoneNumber1|AddressA2;AddressB2;Postcode2;Name2;Name2;PhoneNumber2|AddressA3;AddressB3;Postcode3;Name3;PhoneNumber3||
Then convert the text to a table (Insert -> Table -> Convert text to table), delimiting by |. This gives 3 rows (plus 2 blank rows) of 1 column.
Then copy the table.
Now in Excel:
Paste the table. (Each record will be in its own row, with all of its fields in column A.)
Convert the text to columns (Data tab, Text to columns, Delimited, check semicolon).
Sort by column E. The phone numbers should be grouped together.
Cut the phone numbers in column E and paste them into column F.

open text file, modify text, place into sql database with groovy

I have a text file that has a large grouping of numbers (137mb text file) and am looking to use groovy to open the text file, read it line-by-line, modify the numbers, and then place them into a database (as strings). There are going to be 2 items per line that need to be written to separate database columns, which are related.
My text file looks as such:
A.12345
A.14553
A.26343
B.23524
C.43633
C.23525
So the flow would be:
Step 1.The file is opened
Step 2.Line 1 is read
Step 3.Line 1 is split into letter/number pair [:]
Step 4.The number is divided by 10
Step 5.Letter is written to letter data base (as string)
Step 6.Number is written to number database (as string)
Step 7.Letter:number pair is also written to a separate comma separated text file.
Step 8.Proceed to next line (line 2)
Output text file should look like this:
A,1234.5
A,1455.3
A,2634.3
B,2352.4
C,4363.3
C,2352.5
Database for numbers should look like this:
1:1234.5
2:1455.3
3:2634.3
4:2352.4
5:4363.3
6:2352.5
*lead numbers are database index locations, for relational purpose
Database for letters should look like this:
1:A
2:A
3:A
4:B
5:C
6:C
*lead numbers are database index locations, for relational purpose
I have been able to do most of this; the issue I am running into is not being able to use the .eachLine( line -> ) function correctly, and I have no clue how to output the values to the databases.
There is one more thing I am unsure about: what happens when the script encounters an error. The text file has a huge number of entries (around 9000000), so I am wondering whether there is a way to restart the script from the last modified line if it fails.
For example, if the script stops at line 125122 of the text file (having completed that line) because my computer shut down, how do I make it resume at line 125123 the next time I run it?
Here is my sample code so far:
//openfile
myFile = new File("C:\\file.txt")
//set fileline to target
printFileLine = { it }
//set target to argument
numArg = myFile.eachLine( printFileLine )
//set argument to array split at "."
numArray = numArg.split(".")
//set int array for numbers after the first char, which is a letter
def intArray = numArray[2] { it as int } as int
//set string array for numbers after the first char, which is a letter
def letArray = numArray[1] { it as string }
//No clue how to write to a database or file... or do the persistence thing.
Any help would be appreciated.
I would use a loop to cycle over every line in the text file, and Java methods for manipulating the strings.
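def file = new File('C:\\file.txt')
StringBuilder sb = new StringBuilder()
file.eachLine { line ->
    // reset the StringBuilder to the current line
    sb.setLength(0)
    sb.append(line)
    // replace the '.' separator with ',' (assumes a one-letter prefix)
    sb.setCharAt(1, ',' as char)
    // dividing by 10 puts the decimal point before the last digit
    sb.insert(sb.length() - 1, '.')
}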
You could then write each line to a new text file, example here. To resume after a failure, you could keep a simple counter (e.g. counter = 0; and then counter++;) that stores the last line read or written, and use it to skip ahead on restart. If you are getting regular crashes, you could also catch possible errors with a try/catch statement.
This guide should give you a good start with working with a database (presuming SQL).
Warning: all of this code is untested, but it should give you some direction. There are probably many other ways to solve this, so keep an open mind.
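To make the whole flow concrete, including the database writes and the restart question, here is a hedged end-to-end sketch in Python using the standard-library sqlite3 module instead of Groovy. The table names (`letters`, `numbers`, `progress`), the function name `process`, and the choice of SQLite are assumptions for illustration; the question does not say which database is used. The source line number serves as the shared id that relates the two tables, and the last completed line is stored in the database so that a second run skips everything already done. As a side note on the question's own code: Java's `split` takes a regular expression, so `split(".")` matches every character, and array indices start at 0 rather than 1.

```python
import csv
import sqlite3
from decimal import Decimal


def process(src_path, csv_path, db_path, commit_every=1000):
    """Split 'A.12345' lines, divide the number by 10, store and export them."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS letters (id INTEGER PRIMARY KEY, letter TEXT)")
    con.execute("CREATE TABLE IF NOT EXISTS numbers (id INTEGER PRIMARY KEY, number TEXT)")
    con.execute("CREATE TABLE IF NOT EXISTS progress (last_line INTEGER)")
    row = con.execute("SELECT last_line FROM progress").fetchone()
    if row is None:
        con.execute("INSERT INTO progress VALUES (0)")
        last_line = 0
    else:
        last_line = row[0]

    with open(src_path) as src, open(csv_path, "a", newline="") as out:
        writer = csv.writer(out)
        for line_no, line in enumerate(src, start=1):
            line = line.strip()
            if line_no <= last_line or not line:
                continue  # handled in an earlier run, or a blank line
            letter, digits = line.split(".", 1)
            number = str(Decimal(digits) / 10)  # exact decimal arithmetic
            # The shared id (the source line number) relates the two tables.
            con.execute("INSERT INTO letters VALUES (?, ?)", (line_no, letter))
            con.execute("INSERT INTO numbers VALUES (?, ?)", (line_no, number))
            con.execute("UPDATE progress SET last_line = ?", (line_no,))
            writer.writerow([letter, number])
            if line_no % commit_every == 0:
                out.flush()
                con.commit()  # rows and progress marker are saved together
    con.commit()
    con.close()


# Example call (paths are placeholders):
# process("C:\\file.txt", "C:\\out.csv", "C:\\numbers.db")
```

Because the progress marker is committed in the same transaction as the rows, the database stays consistent after a crash. The CSV file is not transactional, so after a crash it may contain a few rows from the last uncommitted batch, which would then be written again on restart; a lower `commit_every` narrows that window at the cost of speed.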

Using dlmwrite to write cell objects MATLAB

I have a cell array with 7 columns. All these columns contain strings. I want to write this cell array into a text file. To start, I was doing this on only 1 element of the cell and this is my code:
dlmwrite('735.txt',cell{1},'delimiter','%s\t');
cell{1} looks like this:
Columns 1 through 2
[1x30 char] [1x20 char]
Column 3
'Acaryochloris'
Column 4
'Cyanobacteria001'
Columns 5 through 6
'Cyanobacteria00' 'Cyanobacteria'
Column 7
'Bacteria'
It gives me the output without separating the columns. Sample output is:
Acaryochloris_marina_MBIC11017AcaryochlorismarinaAcaryochlorisCyanobacteria001Cyanobacteria00CyanobacteriaBacteria
The correct output should have spaces between all the columns :
Acaryochloris_marina_MBIC11017 Acaryochloris_marina Acaryochloris Cyanobacteria001 Cyanobacteria00 Cyanobacteria Bacteria
Note that for the second column, we need to add the underscore between Acaryochloris and marina. There is originally a space between those two words.
I hope I explained the problem correctly, Would appreciate the help. Thanks!
DLMWRITE is for numerical data. In your case it processes the char data as numbers, one character at a time. You are probably viewing the resulting file in a way that doesn't show the tab delimiters.
You can use XLSWRITE to write a cell array of strings to a file. If you don't want the output to be in Excel format, run DLMWRITE first to write some number to the file.
dlmwrite(filename,1)
xlswrite(filename, Acell{1})
Don't name your variable cell; it shadows the built-in MATLAB function cell.
As an alternative, you can write to the file with a lower-level function such as FPRINTF.
UPDATE:
If you want to use XLSWRITE in a for-loop without overwriting the data, you can specify the row to start from:
dlmwrite(filename,1)
for k = 1:10
xlswrite( filename, Acell{k}, 1, sprintf('A%d',k) )
end
UPDATE 2:
Unfortunately, this no longer works in the latest MATLAB releases (I believe starting from R2012b): XLSWRITE gives an error about the wrong file type.
Something along the lines of the following should do what you want:
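fid = fopen('735.txt', 'w');
row = strrep(cell{1}, ' ', '_');      % replace spaces inside fields with underscores
fprintf(fid, '%s\t', row{1:end-1});   % tab after every field except the last
fprintf(fid, '%s\n', row{end});       % the last field ends the line
fclose(fid);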
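For comparison, the same output format can be sketched in Python (not MATLAB): join the fields of each row with tabs and replace spaces inside a field with underscores, as the question asks for the second column. The helper names `format_row` and `write_rows` are made up; the sample row uses the values from the question.

```python
def format_row(fields):
    """Join fields with tabs, replacing spaces inside a field with underscores."""
    return "\t".join(field.replace(" ", "_") for field in fields)


def write_rows(path, rows):
    """Write one tab-delimited line per row."""
    with open(path, "w") as f:
        for fields in rows:
            f.write(format_row(fields) + "\n")


row = [
    "Acaryochloris_marina_MBIC11017",
    "Acaryochloris marina",
    "Acaryochloris",
    "Cyanobacteria001",
    "Cyanobacteria00",
    "Cyanobacteria",
    "Bacteria",
]
print(format_row(row))
```

Joining the fields rather than printing each one followed by a tab avoids a trailing delimiter at the end of every line.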
