Locating a dynamic string in a text file

Problem:
Hello, I have been struggling recently in my programming endeavours. I have managed to receive the output below from Google Speech to Text, but I cannot figure out how to extract data from this block.
Excerpt 1:
[VoiceMain]: Successfully initialized
{"result":[]}
{"result":[{"alternative":[{"transcript":"hello","confidence":0.46152416},{"transcript":"how low"},{"transcript":"how lo"},{"transcript":"how long"},{"transcript":"Polo"}],"final":true}],"result_index":0}
[VoiceMain]: Successfully initialized
{"result":[]}
{"result":[{"alternative":[{"transcript":"hello"},{"transcript":"how long"},{"transcript":"how low"},{"transcript":"howlong"}],"final":true}],"result_index":0}
Objective:
My goal is to extract the string "hello" (without the quotation marks) from the first transcript of each block and assign it to a variable. The problem is that I do not know in advance what the phrase will be: instead of "hello", it may be a string of any length. Whatever the string is, I would still like to assign it to the same variable that "hello" would have been assigned to.
Furthermore, I would like to extract the number after the word "confidence". In this case, it is 0.46152416. The data type does not matter for the confidence variable. The confidence value appears to be more difficult to extract from the blocks because it may or may not be present. If it is not present, it must be ignored. If it is present, however, it must be detected and stored in a variable.
Also please note that this text block is stored within a file named "CurlOutput.txt".
All help or advice related to solving this problem is greatly appreciated.

You could do this with regex, but I am assuming you will want to use the data as a dict later in your code. So here is a Python approach that builds the result as a dictionary.
import json
with open('CurlOutput.txt') as f:
    lines = f.read().splitlines()
# Marker that precedes the data when both objects share one line (note the trailing space)
flag = '{"result":[]} '
for line in lines:  # Loop through each line in the file
    if flag in line:  # Check if this is a line with data on it
        results = json.loads(line.replace(flag, ''))['result']  # Parse the JSON and take its 'result' list
        # If you just want to change the first index of alternative:
        # results[0]['alternative'][0]['transcript'] = 'myNewString'
        # If you want to check all alternatives for confidence and transcript:
        for result in results[0]['alternative']:  # Loop over each alternative
            transcript = result['transcript']
            confidence = None
            if 'confidence' in result:
                confidence = result['confidence']
            # now do whatever you want with confidence and transcript.
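If the file looks exactly like Excerpt 1, with every JSON object on its own line, the flag with its trailing space may never match. A minimal sketch for that layout, assuming one JSON object per line and that log lines never start with `{`; the helper name `first_transcripts` is made up for illustration:

```python
import json

def first_transcripts(lines):
    """Return (transcript, confidence) for the first alternative of each block.

    confidence is None when the key is absent.
    """
    found = []
    for line in lines:
        line = line.strip()
        if not line.startswith('{'):   # skip log lines like "[VoiceMain]: ..."
            continue
        results = json.loads(line).get('result', [])
        if not results:                # skip the empty {"result":[]} lines
            continue
        best = results[0]['alternative'][0]
        found.append((best['transcript'], best.get('confidence')))
    return found

sample = [
    '[VoiceMain]: Successfully initialized',
    '{"result":[]}',
    '{"result":[{"alternative":[{"transcript":"hello","confidence":0.46152416},{"transcript":"how low"}],"final":true}],"result_index":0}',
    '{"result":[{"alternative":[{"transcript":"hello"},{"transcript":"how long"}],"final":true}],"result_index":0}',
]
print(first_transcripts(sample))
```

To run it on the real file, pass the open file object instead of the sample list, for example `with open('CurlOutput.txt') as f: pairs = first_transcripts(f)`.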

VBSCRIPT REPLACE not removing spaces from Decrypted fields

Got quite a head-scratcher....
I'm using the VBScript function REPLACE to replace spaces in a decrypted field from a MSSQL DB with "/".
But the REPLACE function isn't "seeing" the spaces.
For example, if I run any one of the following, where the decrypted value of the field "ITF_U_ClientName_Denc" is "Johnny Carson":
REPLACE(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc")," ","/")
REPLACE(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc")," ","/")
REPLACE(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc"),"Chr(160)","/")
REPLACE(CSTR(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc"))," ","/")
REPLACE(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc")," ","/",1,-1,1)
REPLACE(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc")," ","/",1,-1,0)
The returned value is "Johnny Carson" (space not replaced with /)
The issue seems to be exclusively with spaces, because when I run this:
REPLACE(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc"),"a","/")
I get "Johnny C/rson".
Also, the issue seems to be exclusively with spaces in the decrypted value, because when I run this:
REPLACE("Johnny Carson"," ","/")
Of course, the returned value is "Johnny/Carson".
I have checked what is being written to the source of the page and it is simply "Johnny Carson" with no encoding or special characters.
I have also tried the SPLIT function to see if it would "see" the space, but it doesn't.
Finally, thanks to a helpful comment, I tried VBS REGEX searching for \s.
Set regExp = New RegExp
regExp.IgnoreCase = True
regExp.Global = True
regExp.Pattern = "\s" 'Add here every character you don't consider as special character
strProcessed = regExp.Replace(ITF_U_Ledger.Fields("ITF_U_ClientName_Denc"), "?")
Unfortunately, strProcessed returns "Johnny Carson" (i.e. spaces not detected/removed).
If I replace regExp.Pattern = "a", strProcessed returns "Johnny C?rson".
Many thanks for your help!!
As we found, the character in the data is not a normal space (character code 32) but a no-break space, character code 160, and replacing that did the trick:
replace(..., ChrW(160), "...")
This seems to be specific to the data. As an alternative, you can try saving the source script in the same encoding as the data (i.e. Save As with Encoding), or converting the value received from the database into a different target encoding.
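The same effect is easy to reproduce outside VBScript. The sketch below uses Python rather than VBScript, purely as an illustration, and assumes the value contains a no-break space (code 160) where a normal space (code 32) was expected; the sample string is made up. Printing the character codes is a quick way to find out which character is really in the data:

```python
value = "Johnny\u00a0Carson"   # hypothetical field value containing U+00A0

# Inspect the character codes: 160 shows up where 32 was expected.
codes = [ord(ch) for ch in value]
print(codes)

# Replacing a normal space changes nothing ...
print(value.replace(" ", "/") == value)   # True
# ... while replacing character 160 does the job.
print(value.replace(chr(160), "/"))       # Johnny/Carson
```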

Generating a reverse-engineerable code to save game, python 3

So I'm making a little text-based game in Python, and for the save system I decided to use the old "insert code" trick. The code needs to keep track of the player's inventory (as well as other things, but the inventory is what I'm having trouble with).
So my thought process on this would be to tie each item and event in the game to a code. For example, the sword in your inventory would be stored as "123" or something unique like that.
So, for the code that would be generated to save the game, imagine you have a sword and a shield in your inventory, and you are in the armory.
location(armory) = abc
sword = 123
shield = 456
When the player inputs the command to generate the code, I would expect an output something like:
abc.123.456
I think putting periods between items in the code would make it easier to distinguish one item from another when it comes to decoding the code.
Then, when the player starts the game back up and they input their code, I want that abc.123.456 to be translated back into your location being the armory and having a sword and shield in your inventory.
So there are a couple questions here:
How do I associate each inventory item with its respective code?
How do I generate the full code?
How do I decode it when the player loads back in?
I'm pretty damn new to Python and I'm really not sure how to even start going about this... Any help would be greatly appreciated, thanks!
So, if I understand you correctly, you want to serialize the game state into a string that the program does not save anywhere but that the player can type back in.
Using dots is not necessary; you can program your app to read the code without them, which will save you a few characters in length.
The more information your game needs to "save", the longer your code will be, so I would suggest using strings that are as short as possible.
Depending on the number of locations, items, etc. you want to store in your save code, you may prefer longer or shorter options:
digits (0-9): will allow you to store 10 names in 1 character each.
hexadecimal (0-9 + a-f, or 0-9 + a-f + A-F): will allow you to store from 16 to 22 names (22 if you make your code case sensitive)
alphanum (0-9 + a-z, or 0-9 + a-z + A-Z): will allow you to store from 36 to 62 names (62 if case sensitive)
more options are possible if you decide to use punctuation and punctuated characters; this example will not go there, so you will need to cover that part yourself if you need it.
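The sizes quoted above can be checked, and the code tables built automatically, with the constants in Python's string module. A minimal sketch; the helper name `make_table` and the constant names are invented for illustration:

```python
import string

DIGITS = string.digits                                          # 10 symbols
HEX_CASE_SENSITIVE = string.digits + 'abcdefABCDEF'             # 22 symbols
ALPHANUM = string.digits + string.ascii_lowercase               # 36 symbols
ALPHANUM_CASE_SENSITIVE = string.digits + string.ascii_letters  # 62 symbols

def make_table(names, alphabet):
    """Assign one character of alphabet to each name, in order."""
    if len(names) > len(alphabet):
        raise ValueError('not enough symbols for %d names' % len(names))
    return {name: alphabet[i] for i, name in enumerate(names)}

places = make_table(['armory', 'home', 'dungeon'], DIGITS)
print(places)   # {'armory': '0', 'home': '1', 'dungeon': '2'}
```

Once the number of names outgrows the alphabet, the helper raises an error instead of silently reusing a symbol, which is the point at which a larger alphabet or two characters per name would be needed.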
For this example I'm gonna stick with digits, as I'm not listing more than 10 items or locations.
You define the inventory items and the places as dictionaries in your source code.
You can use a single line, like I have done for places:
places = {'armory':'0', 'home':'1', 'dungeon':'2'}
# below is the same dictionary inverted (codes as keys, names as values) for decoding.
rev_places = dict(map(reversed, places.items()))
Or, for improved readability, use multiple lines:
items = {
    'dagger':'0',
    'sword':'1',
    'shield':'2',
    'helmet':'3',
    'magic wand':'4'
}
# Below is the same but inverted, for decoding.
rev_items = dict(map(reversed, items.items()))
Store the codes as strings: it is easier to follow, and it is required anyway if you use the hex or alphanum options.
Then also use dictionaries to manage the in-game information. Below is just a sample of how you could represent the game information that the code will produce or parse; this portion should not be hard-coded in your source. I have intentionally mixed up the item order to test it:
game_infos = {
    'location':'armory',
    'items':{
        'slot1':'sword',
        'slot2':'shield',
        'slot3':'dagger',
        'slot4':'helmet'
    }
}
Then you could generate your save code with the following function, which reads your inventory and whereabouts:
def generate_code(game_infos):
    ''' This serializes the game information dictionary into a save
    code. '''
    location = places[game_infos['location']]
    inventory = ''
    # for every slot in the inventory, add one character to your save code.
    for slot in game_infos['items']:
        inventory += items[game_infos['items'][slot]]
    return location + inventory  # The string!
And the reading function, which uses the inverted dictionaries to decode your save code:
def read_code(user_input):
    ''' This takes the user input and transforms it back to game data. '''
    result = dict()  # Let's start with an empty dictionary
    # now let's make the user input more friendly to our eyes:
    location = user_input[0]
    item_codes = user_input[1:]  # not named "items", so the items dictionary is not shadowed
    # just reading out from the table created earlier, we assign a new value to the dictionary location key.
    result['location'] = rev_places[location]
    result['items'] = dict()  # now make another empty dictionary for the inventory.
    # for each character in the string of item codes, decode and assign to an inventory slot.
    for pos in range(len(item_codes)):
        slot = 'slot' + str(pos + 1)  # slots are numbered from 1, as in game_infos
        item = rev_items[item_codes[pos]]
        result['items'][slot] = item
    return result  # Returns the decoded string as a new game infos dictionary :-)
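A round trip can be checked in a few lines. The sketch below repeats compact versions of the tables and functions so that it runs on its own; numbering the slots from 1 when decoding is an assumption made here so that the decoded dictionary matches the original one:

```python
places = {'armory': '0', 'home': '1', 'dungeon': '2'}
items = {'dagger': '0', 'sword': '1', 'shield': '2', 'helmet': '3', 'magic wand': '4'}
# Inverted tables (code -> name) for decoding.
rev_places = {code: name for name, code in places.items()}
rev_items = {code: name for name, code in items.items()}

def generate_code(game_infos):
    """Serialize location and inventory into one short string."""
    inventory = ''.join(items[name] for name in game_infos['items'].values())
    return places[game_infos['location']] + inventory

def read_code(code):
    """Decode a save code back into a game_infos-style dictionary."""
    slots = {'slot%d' % (pos + 1): rev_items[ch] for pos, ch in enumerate(code[1:])}
    return {'location': rev_places[code[0]], 'items': slots}

game_infos = {
    'location': 'armory',
    'items': {'slot1': 'sword', 'slot2': 'shield', 'slot3': 'dagger', 'slot4': 'helmet'},
}
code = generate_code(game_infos)
print(code)                           # 01203
print(read_code(code) == game_infos)  # True
```

A mistyped code raises KeyError (or IndexError for an empty string), so real input handling would need to catch that and ask the player again.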
I recommend you play around with this working sample program, create a game_infos dictionary of your own with more items in inventory, add some places, etc.
You could even add some more lines/loops to your functions to manage hp or other fields your game will require.
Hope this helps and that you have not given up on this project!

Good way to avoid java.lang.IndexOutOfBoundsException, when joining a list

So I am trying to read a rather large XML file into a String. Currently joining a list of .readLines() like this:
def is = zipFile.getInputStream(entry)
def content = is.getText('UTF-8')
def xmlBodyList = content.readLines()
return xmlBodyList[1..xmlBodyList.size].join("")
However I am getting this output in console:
java.lang.IndexOutOfBoundsException: toIndex = 21859
I don't need any explanation on IndexOutOfBoundsExceptions, but I am having a hard time figuring out how to program around this issue.
How can I implement this differently, so it allows for a large enough file size?
About a good way to avoid java.lang.IndexOutOfBoundsException
The error is here:
return xmlBodyList[1..xmlBodyList.size].join("")
Groovy ranges are inclusive, so 1..xmlBodyList.size also asks for the index equal to the list size, which is one past the last element. A good way is to check variables before accessing them, and you can use a relative range accessor:
assert xmlBodyList.size>1 //check value
return xmlBodyList[1..-1].join("") //use relative indexes -1 = the last one
About processing large files
If you need to iterate through all the lines and execute some operation, here is an example:
def stream = zipFile.getInputStream(entry)
stream.eachLine("UTF-8"){line, index->
    if(index>1){ //skip first line
        //do something here with each line from file
        println "$line $index"
    }
}
There are a lot of additional Groovy methods on java.io.InputStream that could help you process a large file without loading it into memory:
http://docs.groovy-lang.org/latest/html/groovy-jdk/java/io/InputStream.html

removing portion of filename

I have done some searching but cannot see how to actually code this. I am new to Python and not really sure what method I should use to try to do this.
I have some files that I would like to rename. Unfortunately, the portion just before the file extension is never the same, and I would like to just remove it.
The file name looks like AC_DC - Shot Down In Flames (Official Video)-UKwVvSleM6w.mp3
Any help would be appreciated.
Since this looks like the result from youtube-dl, the "random" substring is most likely the unique video id, which in my experience is always 11 characters long. It can, however, include dashes (-), so the regex-approach suggested by smitrp would not always work.
I use this "dirty" workaround:
>>> original_name="AC_DC - Shot Down In Flames (Official Video)-UKwVvSleM6w.mp3"
>>> new_name=original_name[:-16]+".mp3"
>>> new_name
'AC_DC - Shot Down In Flames (Official Video).mp3'
Edit:
If you really, REALLY want to find the "-XXXX"-portion, have a look at str.rfind(). This will help you to find the index of the last dash (-), which you can directly use for the slice notation of the string.
Disclaimer:
This will provide wrong results, if the video id contains a dash, e.g. here: https://www.youtube.com/watch?v=7WVBEB8-wa0
Then you will find the last dash, remove -wa0 and be left with -7WVBEB8 at the end of the filename.
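One way around the dash problem, given the observation that the ID is always 11 characters long, is to anchor on that length rather than on the last dash. A minimal sketch, assuming the ID consists only of letters, digits, `-` and `_` (an assumption, not something stated above); the function name `strip_video_id` is made up for illustration:

```python
import re
from pathlib import Path

# A dash followed by exactly 11 ID characters at the very end of the stem.
ID_SUFFIX = re.compile(r'-[A-Za-z0-9_-]{11}$')

def strip_video_id(filename):
    """Return filename without a trailing '-<11-char id>' before the extension."""
    path = Path(filename)
    return ID_SUFFIX.sub('', path.stem) + path.suffix

print(strip_video_id('AC_DC - Shot Down In Flames (Official Video)-UKwVvSleM6w.mp3'))
print(strip_video_id('Some title-7WVBEB8-wa0.mp4'))
print(strip_video_id('no id here.mp3'))
```

This still cannot tell an ID from an ordinary title that happens to end in a dash plus 11 such characters, so it is worth printing the old and new names before actually renaming anything.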
Using the idea of the above answer, one can also take into account that a normal word does not contain more than one capital character.
def youtube_name_fix(folder):
    import os
    from pathlib import Path
    import re
    REGEX = re.compile(r'[A-Z]')
    for name in os.listdir(folder):
        basename = Path(name)
        last_12 = basename.stem[-12:]
        # check if the end string is not all uppercase (then it could be part of a valid name)
        if not last_12.isupper():
            # check if the last string has more than one uppercase letter
            if len(REGEX.findall(last_12)) > 1:
                # remove the end youtube string and create new full path
                new_name = os.path.join(folder, basename.stem[:-12] + basename.suffix)
                try:
                    os.rename(os.path.join(folder,name), new_name)
                except Exception as e:
                    print(e)
> youtube_name_fix(p)
old name -> "4-Discrete and Continuous Probability Models-esHwigpYggU.mp4"
new name -> "4-Discrete and Continuous Probability Models.mp4"

Import timeseries via loop (pot. generic)

Solved, now working example!
I have a set of time series that is populated in a folder structure as follows:
TimeSeriesData\Config_000000\seed_001\Aggregate_Quantity.txt
…
TimeSeriesData\Config_000058\seed_010\Aggregate_Quantity.txt
In each file, the first line contains a character string, followed by lines containing the data, e.g.:
share
-21.75
-20.75
…
Now, I would like to import all those files (at best without having to specify the number of configs and seeds, at worst supplying it) into individual time series named like “AggQuant_config_seed”, where config is the config ID and seed is the seed ID.
I tried the following (using the non-preferred way), but “parsing” the “path” does not work / I do not know how to do it.
string base = "TimeSeriesData/Config_"
string middle = "/seed_"
string endd = "/Aggregate_Quantity.txt"
string path = ""
loop for (i=0;i<=58;i+=1)
    loop for (j=1;j<=10;j+=1)
        path = ""
        sprintf path "%s%06d%s%03d%s",base,i,middle,j,endd
        append @path #will be named share
        rename share Agg_Q_$i_$j #rename
    endloop
endloop
To sum up the problem and what fixed it:
string path="somwhere/a_file.txt" #string holding path
#wrong: append $path #use append on string
append @path #works!
If possible, a way to search recursively through a set of folders, using the folder information together with the file information for the variable name, would also be nice. Is that possible within gretl?
More generally, I often wish there were a dedicated “help” section for operators (like “$”, which is obviously the wrong one here), similar to those for functions and commands.
