I need a regex that captures three groups from strings of this form:
<any text, possibly including newlines, optional: group 1>/command <text up to the first \n or </p>: group 2><any text, possibly including newlines, optional: group 3>
What I tried:
Pattern pattern1 = Pattern.compile('(.*?)[/]command (.*?)\n?(.*?)');
For the following string it should give this output:
some\nthing/command cmdtext\nasdfjaklsdjf\nfgskdlkfg\ndgsdfgsdfgsdfg
group 1 - some\nthing
group 2 - cmdtext
group 3 - asdfjaklsdjf\nfgskdlkfg\ndgsdfgsdfgsdfg
What I can't work out is how to stop at an occurrence of </p>; also, .* does not match across the newlines inside a group. This, however, does work for me:
String a = '\na\na\n\n\n\n\n\naaa';
Pattern pattern2 = Pattern.compile('\n(?s:.)*');
Matcher mchr = pattern2.matcher(a);
System.assert(mchr.matches());
This regular expression should match what you need:
'([\\s\\S]*)/command (.*?)(?:\n|</p>)([\\s\\S]*)'
You cannot match \n with .* (by default . does not match line terminators), so I am using [\\s\\S] instead: [\s\S] matches any character, and is written here with Apex-escaped backslashes.
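A quick way to sanity-check the expression outside Salesforce is Python's re module, which handles [\s\S], lazy .*? and (?:...) the same way. This is a sketch, not Apex: the expression is written as a Python raw string (single backslashes), and split_command is a hypothetical helper name.

```python
import re

# Same expression as above; [\s\S] matches any character, including newlines.
PATTERN = re.compile(r"([\s\S]*)/command (.*?)(?:\n|</p>)([\s\S]*)")

def split_command(text):
    """Return (before, command_text, after), or None if the text does not match."""
    m = PATTERN.match(text)
    return m.groups() if m else None

print(split_command("some\nthing/command cmdtext\nasdfjaklsdjf\nfgskdlkfg\ndgsdfgsdfgsdfg"))
# ('some\nthing', 'cmdtext', 'asdfjaklsdjf\nfgskdlkfg\ndgsdfgsdfgsdfg')
print(split_command("intro/command cmdtext</p>rest"))
# ('intro', 'cmdtext', 'rest')
```

Two things to keep in mind: group 1 is greedy, so if "/command " occurs more than once it runs up to the last occurrence (use [\s\S]*? to split at the first one), and the match fails when the command text is not followed by \n or </p>; adding |$ to the alternation covers a command at the very end of the string.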
I have a problem with \n when I try to write a date string and number values to a txt file:
pattern = [ ...
'Date %s - First %d \n', ...
'Date %s - Second %d \n' ...
'%d, \n', ...
'*ENDDO\n\n'];
t = datetime('now');
[fid, msg] = fopen('date_and_values.txt', 'wt');
assert(fid ~= -1, 'Cannot open file %s: %s', 'date_and_values.txt', msg);
formatOut='dd.mm.yy';
dateString = datestr(t);
disp(dateString);
formatNumb = '\t%d';
res = [dateString num2str(1,formatNumb) num2str(2,formatNumb)];
for k = 1:17
fprintf(fid, pattern, res);
% % Perhaps this is faster:
% % fwrite(fid, strrep(pattern, '%d', sprintf('%d', k)), 'char');
end
fclose(fid);
I want the data to look like this:
But instead the data in the file looks like this:
What am I doing wrong?
Change pattern to
pattern = ['Date %1$s - First %2$d \n', ...
'Date %1$s - Second %3$d \n\n'];
and use
fprintf(fid, pattern, dateString, num2str(1,formatNumb), num2str(2,formatNumb));
instead, and you will get the desired output.
Note the use of identifiers (the n$ in %1$s, %2$d and %3$d) above; search for "identifiers" in the fprintf documentation. Without identifiers, each new formatting operator consumes the next input argument of fprintf(). With them, each uniquely identified operator in your pattern corresponds to exactly one input of fprintf(), and the same input can be reused (here %1$s is printed twice).
(The pattern in OP also contains some superfluous trailing bits that are not found in the example output.)
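Python's str.format has the same idea of numbered fields: {0}, {1}, ... refer to positional arguments, so one argument can appear in several places. A rough analogue of the corrected MATLAB pattern, as a sketch only (format_block is a hypothetical name and the date string is a made-up sample value):

```python
def format_block(date_string, first, second):
    # {0} is used twice, like %1$s above; {1} and {2} each map to one argument.
    pattern = "Date {0} - First {1} \nDate {0} - Second {2} \n\n"
    return pattern.format(date_string, first, second)

print(format_block("01-Jan-2020 12:00:00", 1, 2), end="")
```

One difference: MATLAB's fprintf recycles the whole format for leftover arguments, while str.format simply ignores extra positional arguments.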
I'm not sure I understand what you are looking for, but have you tried this?
res = [dateString num2str(1,formatNumb) num2str(2,formatNumb) '\n'];
I have a problem processing Holter ECG (medical) files based on the headers they contain. They are binary data files of approximately 20 MB, starting with a structured header followed by the data. What I would like to achieve, preferably with a VBS script, is:
1) Check all files in the current folder and move the processed ones to the archive folder, based on a specific string in the header:
After the constant string "User Field #20" comes a 250-400 character text string that may contain a substring such as "Wn:", "WN:" or "wn:" (with the colon). If it is there, the file has been processed and goes to the archive.
Two example conclusion strings:
i)
The analysis was performed in a hospital setting. The dominant rhythm is sinus, with mean HR 70/min (range 45-133/min).
1 SVPB recorded, no episodes of tachycardia. No ventricular arrhythmias.
PQ and QTc within normal limits.
WN: normal recording without cardiac arrhythmias
ii)
Recording made in a hospital setting. Sinus rhythm, HR ranging from 38/min to 126/min, mean 66/min; typically 58-95/min during the day and 52-65/min at night. No SVPBs, VPBs or pauses > 2.5 s recorded. PQ within the normal range for age. QTc normal. 24-hour rhythm profile normal.
Wn: Holter recording without features of significant pathology
Newlines, special characters and regional (Polish) characters are possible within the string. I can't tell for sure, but the conclusion string seems to end with hex 80 (the euro sign).
2) If possible, add a log to the script: plain text, semicolon-separated (so it can be imported into Excel if necessary).
archive_log.txt: Timestamp; Lastname; Firstname; DateRecorded; DateProcessed; ConclusionsLongText (about 250-400 chars).
DateRecorded and DateProcessed are based on the file's creation date and last-modified date, respectively.
This is an extension of a problem that was solved some time ago, "Use the contents of a file to rename it". The problem is different; only the files to handle are the same.
You could use the same strategy Ansgar used in the question you linked: read the file contents, then use InStr() to search for your string:
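' Create the FileSystemObject used to read the file...
Set objFS = CreateObject("Scripting.FileSystemObject")
' Read the entire file into a string...
strContents = objFS.OpenTextFile("file.dat").ReadAll()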
' Search for the string "WN:" (case-insensitive)...
intPos = InStr(1, strContents, "WN:", vbTextCompare)
If intPos > 0 Then
' Found
End If
This should find the first "WN:" occurrence within the file. Note that this could be some other occurrence, outside the header, so you could also determine the position of "User Field" and compare that to the position of "WN:". For example:
' Find the "User Field #20" marker, then look for "WN:" only after it...
intPosUser = InStr(1, strContents, "User Field #20", vbTextCompare)
If intPosUser > 0 Then
    intPosWN = InStr(intPosUser, strContents, "WN:", vbTextCompare)
    ' "WN:" should be within 400 chars of the User Field #20 record...
    If intPosWN > 0 And intPosWN - intPosUser < 400 Then
        ' Found
    End If
End If
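For comparison, here is a rough Python version of the whole task (a swap from VBScript, not the answer's code). It is a sketch under assumptions the question leaves open: the conclusion starts right after "User Field #20", fits in the next 450 bytes, ends at the first 0x80 byte and is Windows-1250 encoded. WINDOW, find_conclusion and archive_processed are hypothetical names. The Lastname and Firstname log columns are left out because the question does not say where the names sit in the header.

```python
import os
import re
import shutil
from datetime import datetime
from typing import Optional

MARKER = b"User Field #20"               # constant header string from the question
WINDOW = 450                             # assumption: conclusion fits in this many bytes
WN = re.compile(rb"wn:", re.IGNORECASE)  # matches "WN:", "Wn:", "wn:"
FMT = "%Y-%m-%d %H:%M:%S"


def find_conclusion(data: bytes) -> Optional[bytes]:
    """Return the conclusion bytes after MARKER if they contain "WN:", else None."""
    pos = data.find(MARKER)
    if pos == -1:
        return None
    start = pos + len(MARKER)
    text = data[start:start + WINDOW]
    end = text.find(b"\x80")  # assumption: the conclusion ends at a 0x80 byte
    if end != -1:
        text = text[:end]
    return text if WN.search(text) else None


def archive_processed(folder: str, archive: str, log_path: str) -> None:
    """Move files whose conclusion contains "WN:" to archive; log one line per file."""
    os.makedirs(archive, exist_ok=True)
    with open(log_path, "a", encoding="utf-8") as log:
        for name in os.listdir(folder):
            path = os.path.join(folder, name)
            if not os.path.isfile(path) or os.path.abspath(path) == os.path.abspath(log_path):
                continue
            with open(path, "rb") as f:
                conclusion = find_conclusion(f.read())
            if conclusion is None:
                continue  # not processed yet: leave the file where it is
            st = os.stat(path)
            # Assumption: Windows-1250 text; undecodable bytes become U+FFFD.
            text = conclusion.decode("cp1250", errors="replace")
            # Collapse newlines and replace ';' with ',' so each file stays one record.
            text = " ".join(text.split()).replace(";", ",")
            shutil.move(path, os.path.join(archive, name))
            # st_ctime is the creation time on Windows (metadata-change time on Unix).
            fields = [datetime.now().strftime(FMT), name,
                      datetime.fromtimestamp(st.st_ctime).strftime(FMT),
                      datetime.fromtimestamp(st.st_mtime).strftime(FMT), text]
            log.write("; ".join(fields) + "\n")
```

Calling archive_processed(".", "archive", "archive_log.txt") would scan the current folder. Restricting the search to a window after the marker plays the same role as the 400-character check in the VBScript above: a "WN:" elsewhere in the binary data is not counted.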
I am writing in C and I am trying to read this line:
phillip.allen#enron.com -> tim.belden#enron.com at 989883540
I want 4 separate strings:
sender_username: phillip.allen
sender_hostname: enron.com
receiver_username: tim.belden
receiver_hostname: enron.com
I want to get rid of the "at 989883540" part of the text.
I am using this format string:
"%49[^# ]#%49s -> %49[^# ]#%49s"
I get the sender username and hostname (the part of the line before the -> symbol), but I cannot read the receiver part, tim.belden#enron.com.
Replacing %49s with %49[^ ] should do the trick:
"%49[^#]#%49[^ ] -> %49[^#]#%49[^ ]"
Here is a demo on ideone.
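For comparison only (this is not C and not a replacement for sscanf), the same split can be written with Python's re module. Like the format string above, the sketch ignores whatever follows the receiver address, such as " at 989883540"; parse_line is a hypothetical name.

```python
import re

# [^#\s]+ stops at '#' or whitespace; \S+ stops at the next whitespace.
LINE = re.compile(r"([^#\s]+)#(\S+) -> ([^#\s]+)#(\S+)")

def parse_line(line):
    """Return (sender_user, sender_host, receiver_user, receiver_host) or None."""
    m = LINE.match(line)
    return m.groups() if m else None

print(parse_line("phillip.allen#enron.com -> tim.belden#enron.com at 989883540"))
# ('phillip.allen', 'enron.com', 'tim.belden', 'enron.com')
```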
My Python 3 program lets people choose from a list of employee names.
The data held in each text file looks like this: ('larry', 3, 100)
(the person's name, weeks worked and weekly payment).
I need a way to assign each part of the text file to its own variable, so that the user can enter a new number of weeks and the program calculates the new payment.
Below is my code and attempt at figuring it out.
import os
choices = [f for f in os.listdir(os.curdir) if f.endswith(".txt")]
print (choices)
emp_choice = input("choose an employee:")
file = open(emp_choice + ".txt")
data = file.readlines()
name = data[0]
weeks_worked = data[1]
weekly_payment= data[2]
new_weeks = int(input ("Enter new number of weeks"))
new_payment = new_weeks * weekly_payment
print (name + "will now be paid" + str(new_payment))
Currently you are assigning the first three lines from the file to name, weeks_worked and weekly_payment. But what you want (I think) is to split a single line formatted as ('larry', 3, 100). (Does each file have only one line?)
So you probably want code like:
import re
# your code to choose the file (sets emp_choice)
line_format = re.compile(r"\s*\(\s*'([^']*)'\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")
with open(emp_choice + ".txt") as file:
    line = file.readline()  # read the first line only
match = line_format.match(line)
if match:
    name, weeks_worked, weekly_payment = match.groups()
    # .groups() returns strings; convert the numbers before doing arithmetic
    weeks_worked, weekly_payment = int(weeks_worked), int(weekly_payment)
else:
    raise ValueError('Could not match %s' % line)
# your code to update the information
The regular expression looks complicated, but it is really quite simple:
\(...\) matches the parentheses in the line
\s* matches optional whitespace (it's not clear to me whether you have spaces in various places between words, so this matches them just in case)
\d+ matches a number (one or more digits)
[^']* matches anything except a quote (so it matches the name)
(...) (without the \ backslashes) marks a group that you can read afterwards by calling .groups()
These are built from simpler parts (like *, + and \d), which are described at http://docs.python.org/2/library/re.html
If you want to repeat this for many lines, you probably want something like:
names, weeks_worked, weekly_payment = [], [], []
with open(emp_choice + ".txt") as file:
    for line in file:
        match = line_format.match(line)
        if match:
            names.append(match.group(1))
            weeks_worked.append(int(match.group(2)))
            weekly_payment.append(int(match.group(3)))
        else:
            raise ValueError('Could not match %s' % line)
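Since each line is written exactly like a Python tuple literal, a simpler alternative to the regular expression (swapped in here, not what the answer above uses) is ast.literal_eval from the standard library, which evaluates only literals such as strings, numbers and tuples, not arbitrary code. A sketch, assuming one such tuple per line; parse_employee is a hypothetical name and the value 5 stands in for the user's input:

```python
import ast

def parse_employee(line):
    """Parse a line like "('larry', 3, 100)" into (name, weeks, weekly_payment)."""
    name, weeks, payment = ast.literal_eval(line.strip())
    return name, int(weeks), int(payment)

name, weeks_worked, weekly_payment = parse_employee("('larry', 3, 100)\n")
new_weeks = 5  # stands in for int(input("Enter new number of weeks: "))
print(name + " will now be paid " + str(new_weeks * weekly_payment))
# larry will now be paid 500
```

Malformed lines raise ValueError or SyntaxError from literal_eval, so wrap the call in try/except if the files are not trusted to be well-formed.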
email = self.request.get('email')
name = self.request.get('name')
mail.send_mail(sender="myemail", email=email, body=name, subject="sss " + name + "sdafsaã")
(Edit, after adding ã: the problem was that "sdafsaã" should be u"sdafsaã", with a u before the string literal. Now it works.)
Then I get this:
main.py", line 85, in post
subject="sss " + name + "sdafsa",
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 36: ordinal not in range(128)
The name might contain characters like õ, ó and similar.
For more details:
This is the code that starts the worker (it runs before the code above).
The name is the one read from the datastore and contains characters like õ and ó:
taskqueue.add(url='/emailworker', params={'email': e.email, 'name': e.name})
Try reading a little about how Unicode works in Python:
Dive Into Python - Unicode
Unicode In Python, Completely Demystified
Also, make sure you're running Python 2.5 if you are seeing this error on the development server.
You should use:
email = self.request.get('email')
name = self.request.get('name')
mail.send_mail(sender="myemail",
               to=email,
               body=name,
               subject="hello " + name.encode('utf-8') + " user!")
The variable name is a unicode string and should be encoded in UTF-8 (or whatever encoding your web application uses) before it is concatenated with other byte strings.
Without name.encode(), mixing the unicode name with a byte string makes Python implicitly decode that byte string with the default 7-bit ASCII codec, which fails on any non-ASCII byte such as 0xc3.
The problem is joining the two kinds of string: body = name + "ã" raises the error, while body = name + u"ã" works.
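For reference, the same rule in Python 3, where the distinction is explicit: str holds text, bytes holds encoded data, and mixing them raises TypeError instead of silently trying ASCII. The sketch shows why the traceback mentions byte 0xc3 and that encoding once, where bytes are needed, round-trips cleanly; the name value is a made-up example.

```python
name = "João"                          # text, e.g. read from the datastore
subject = "hello " + name + " user!"   # str + str: no implicit conversion
encoded = subject.encode("utf-8")      # encode once, where bytes are needed

print("ã".encode("utf-8"))             # b'\xc3\xa3': 0xc3 is the byte in the traceback

try:
    b"sdafsa\xc3\xa3".decode("ascii")  # roughly what Python 2 did implicitly
except UnicodeDecodeError as exc:
    print(exc.reason)                  # ordinal not in range(128)
```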
Try using encode():
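# -*- coding: utf-8 -*-  (needed in Python 2 for non-ASCII literals in the source)
t = u'việt ứng '  # a unicode literal, so .encode() does not first decode it as ASCII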
m = MyModel()
m.data = t.encode('utf-8')
m.put() #success!