How to collect a text value that sits between two other elements using 'cheerio'? - css-selectors

The markup has the score text sitting between two spans inside a heading. When I try to retrieve the text as a whole:
var Result = $('h3.thick.scoretime').text().trim();
The result is this:
FT
1 - 0
(HT 1 - 0)
I tried selecting by child:
var Result = $('h3.thick.scoretime:nth-child(2)').text().trim();
But the result is blank.
Expected result:
1 - 0
How should I proceed in these cases?

You could just remove the spans:
$('.scoretime span').remove()
and the remaining text is then available from:
$('.scoretime').text()
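A minimal runnable sketch of this approach (the HTML here is an assumption, reconstructed from the outputs shown in the question):
const cheerio = require('cheerio');

// assumed markup, inferred from the question's output -- not the actual page
const html = `
  <h3 class="thick scoretime">
    <span class="match-state">FT</span>
    1 - 0
    <span class="match-state">(HT 1 - 0)</span>
  </h3>`;

const $ = cheerio.load(html);
$('.scoretime span').remove();              // drop the "FT" and "(HT 1 - 0)" spans
console.log($('.scoretime').text().trim()); // "1 - 0"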

You can get the full text with
var TotalText = $('h3.thick.scoretime').text()
Then get the text of the child nodes with
var ChildTexts = $('h3.thick.scoretime span.match-state').text()
Then replace ChildTexts in TotalText with '' so that only the parent node's own text remains.
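A sketch of that approach on a freshly loaded document (html is the same assumed markup as in the previous sketch), stripping each child span's text out one at a time:
const $ = cheerio.load(html);
let TotalText = $('h3.thick.scoretime').text();
$('h3.thick.scoretime span.match-state').each((i, el) => {
  TotalText = TotalText.replace($(el).text(), ''); // remove this child's text
});
console.log(TotalText.trim()); // "1 - 0"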
BTW 'h3.thick.scoretime' and 'h3.thick.scoretime:nth-child(2)' are CSS Selectors, not XPath expressions.

Related

AttributeError: 'Group' object has no attribute 'dtype'

I want to combine the datasets within a single HDF5 file to form one dataset in a separate file, but am struggling to set the dtype of the new dataset. I am getting the error AttributeError: 'Group' object has no attribute 'dtype' on the line with ds_0_dtype = h5f1[ds].dtype. The code below is based on some example code posted on Stack Overflow:
import sys
import h5py

with h5py.File('xxx_xxx_signals.hdf5', 'r') as h5f1, \
     h5py.File('file2.h5', 'w') as h5f2:
    for i, ds in enumerate(h5f1.keys()):
        if i == 0:
            ds_0 = ds
            ds_0_dtype = h5f1[ds].dtype
            n_rows = h5f1[ds].shape[0]
            n_cols = h5f1[ds].shape[1]
        else:
            if h5f1[ds].dtype != ds_0_dtype:
                print(f'Dset 0:{ds_0}: dtype:{ds_0_dtype}')
                print(f'Dset {i}:{ds}: dtype:{h5f1[ds].dtype}')
                sys.exit('Error: incompatible dataset dtypes')
            if h5f1[ds].shape[0] != n_rows:
                print(f'Dset 0:{ds_0}: shape[0]:{n_rows}')
                print(f'Dset {i}:{ds}: shape[0]:{h5f1[ds].shape[0]}')
                sys.exit('Error: incompatible dataset shape')
            n_cols += h5f1[ds].shape[1]
        prev_ds = ds
    h5f2.create_dataset('ds_xxxx', dtype=ds_0_dtype, shape=(n_rows, n_cols),
                        maxshape=(n_rows, None))
    first = 0
    for ds in h5f1.keys():
        xfer_arr = h5f1[ds][:]
        last = first + xfer_arr.shape[1]
        h5f2['ds_xxxx'][:, first:last] = xfer_arr[:]
        first = last
Likely you have one or more Groups in addition to Datasets at the root level. h5f1.keys() accesses all nodes, which can be Datasets or Groups. You need to add a test to skip over Groups. You can do this with an isinstance() check. Something like this:
else:
    if not isinstance(h5f1[ds], h5py.Dataset):
        print(f'Node {i}:{ds}: is not a dataset')
        sys.exit('Error: unexpected Group; only Datasets expected')
    if h5f1[ds].dtype != ds_0_dtype:
Once you know how to identify Groups, you can also modify the code to avoid copying them to the second file. However, that may not be your desired result. I have an extended SO post on using isinstance(); see this link:
Is there a way to get datasets in all groups at once in h5py?
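For example, a minimal sketch (assuming the same input file as above) that collects only the root-level Datasets up front, so the rest of the loop never touches a Group:
import h5py

with h5py.File('xxx_xxx_signals.hdf5', 'r') as h5f1:
    # keep only the keys that point at Datasets, skipping Groups
    ds_names = [k for k in h5f1.keys() if isinstance(h5f1[k], h5py.Dataset)]
    for i, ds in enumerate(ds_names):
        print(ds, h5f1[ds].dtype, h5f1[ds].shape)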

Saving JSON array output to a txt file and getting errors when attempting to parse

I am unable to parse a JSON array from a text file due to errors and my limited knowledge of JSON.
The file looks something like this:
[{"random":"fdjsf","random56":128,"name":"dsfjsd", "rid":1243,"rand":674,"name":"dsfjsd","random43":722, "rid":126},{"random":"fdfgfgjsf","random506":120,"name":"dsfjcvcsd", "rid":12403,"rando":670,"name":"dsfooojsd","random4003":720, "rid":120}]
It has more than one object ({}) in the entire array; however, I did not want to include all 600. The layout shown above is basically how all of them look.
r = s.get(getAPI, headers=header, verify=False)
f = open('text.txt', 'w+')
f.write(r.text)
f.close
output_file = open('text.txt', 'r')
json_array = json.load(output_file)
json_list = []
for item in json_array:
    name = "name"
    rid = "rid"
    json_items = {name: None, rid: None}
    json_items = [name] = item[name]
    json_items = [rid] = item[rid]
    json_list.append(json_items)
print(json_list)
I would like to loop through the array, find every time it says "name": ... eventually followed by "rid": ..., and store those in a dictionary as key-value pairs.
Errors:
ValueError: too many values to unpack (expected 1)
The error comes from the way you assign values to json_items; change those lines to:
json_items[name] = item[name]
json_items[rid] = item[rid]
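Putting it together, a minimal sketch of the corrected loop (file name as in the question). One caveat: the sample objects contain duplicate "name" and "rid" keys, and Python's json module keeps only the last occurrence of a duplicated key:
import json

with open('text.txt', 'r') as output_file:
    json_array = json.load(output_file)

json_list = []
for item in json_array:
    # build one dict per object with just the fields we need
    json_list.append({'name': item['name'], 'rid': item['rid']})
print(json_list)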

Scala: read only certain parts of a file

I'm trying to read an input file in Scala that I know the structure of, however I only need every 9th entry. So far I have managed to read the whole thing using:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
The issue is that this leaves me with a huge array (we're talking 20 GB of data). Not only have I been forced to write some very ugly code to convert between RDD[Array[String]] and Array[String], it has essentially made my code useless.
I've tried different approaches, mixing .map(), .flatMap() and .reduceByKey(), but nothing actually put my collected "cells" into the format I need them in.
Here's what is supposed to happen: Reading a folder of text files from our server, the code should read each "line" of text in the format:
*---------*
| NASDAQ: |
*---------*
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
and only keep hold of the stock_symbol, as that is the identifier I'm counting. So far my attempt has been to turn the entire thing into an array and collect every 9th index, starting from the first, into a collected_cells var. The issue is that, based on my calculations and real-life results, that code would take 335 days to run (no joke).
Here's my current code for reference:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SparkNum {

  def main(args: Array[String]) {

    // Do some Scala voodoo
    val sc = new SparkContext(new SparkConf().setAppName("Spark Numerical"))

    // Set input file as per HDFS structure + input args
    val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
    val fields = lines.map(line => line.split(","))

    var collected_cells: Array[String] = new Array[String](0)
    //println("[MESSAGE] Length of CC: " + collected_cells.length)

    val divider: Long = 9
    val array_length = fields.count / divider
    val casted_length = array_length.toInt

    val indexedFields = fields.zipWithIndex
    val indexKey = indexedFields.map { case (k, v) => (v, k) }

    println("[MESSAGE] Number of lines: " + array_length)
    println("[MESSAGE] Casted length of: " + casted_length)

    for (i <- 1 to casted_length) {
      println("[URGENT DEBUG] Processing line " + i + " of " + casted_length)
      var index = 9 * i - 8
      println("[URGENT DEBUG] Index defined to be " + index)
      collected_cells :+ indexKey.lookup(index)
    }

    println("[MESSAGE] collected_cells size: " + collected_cells.length)
    val single_cells = collected_cells.flatMap(collected_cells => collected_cells)
    val counted_cells = single_cells.map(cell => (cell, 1).reduceByKey { case (x, y) => x + y })

    // val result = counted_cells.reduceByKey((a,b) => (a+b))
    // val inmem = counted_cells.persist()
    //
    // // Collect driver into file to be put into user archive
    // inmem.saveAsTextFile("path to server location")
    // ==> Not necessary to save the result as processing time is recorded, not output
  }
}
The bottom part is currently commented out from when I tried to debug it, but it acts as pseudo-code for me to know what I need done. I may want to point out that I am barely familiar with Scala at all, and hence things like the _ notation confuse the life out of me.
Thanks for your time.
There are some concepts that need clarification in the question:
When we execute this code:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))
That does not result in a huge array the size of the data. That expression represents a transformation of the base data; it can be further transformed until we have reduced the data to the information set we desire.
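As a quick illustration of that laziness (same file path as in the question):
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val fields = lines.map(line => line.split(","))  // builds an execution plan, moves no data
val n = fields.count()                           // an action: only now is the file actually read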
In this case, we want the stock_symbol field of a record encoded as CSV:
exchange, stock_symbol, date, stock_price_open, stock_price_high, stock_price_low, stock_price_close, stock_volume, stock_price_adj_close
I'm also going to assume that the data file contains a banner like this:
*---------*
| NASDAQ: |
*---------*
The first thing we're going to do is remove anything that looks like this banner. In fact, I'm going to assume that the first field is the name of a stock exchange, which starts with a letter. We will do this before we do any splitting, resulting in:
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields = validLines.map(line => line.split(","))
It helps to write out the types of the variables, to have peace of mind that we have the data types we expect. As we progress in our Scala skills, that might become less important. Let's rewrite the expression above with types:
val lines: RDD[String] = sc.textFile("hdfs://moonshot-ha-nameservice/" + args(0))
val validLines: RDD[String] = lines.filter(line => !line.isEmpty && line.head.isLetter)
val fields: RDD[Array[String]] = validLines.map(line => line.split(","))
We are interested in the stock_symbol field, which positionally is the element #1 in a 0-based array:
val stockSymbols:RDD[String] = fields.map(record => record(1))
If we want to count the symbols, all that's left is to issue a count:
val totalSymbolCount = stockSymbols.count()
That's not very helpful because we have one entry for every record. Slightly more interesting questions would be:
How many different stock symbols do we have?
val uniqueStockSymbols = stockSymbols.distinct.count()
How many records for each symbol do we have?
val countBySymbol = stockSymbols.map(s => (s,1)).reduceByKey(_+_)
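To peek at a few results on the driver, something like countBySymbol.take(10).foreach(println) will print the first handful of (symbol, count) pairs.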
In Spark 2.0, CSV support for DataFrames and Datasets is available out of the box.
Given that our data does not have a header row with the field names (as is usual in large datasets), we will need to provide the column names:
val stockDF = sparkSession.read.csv("/tmp/quotes_clean.csv").toDF("exchange", "symbol", "date", "open", "close", "volume", "price")
We can answer our questions very easily now:
val uniqueSymbols = stockDF.select("symbol").distinct().count
val recordsPerSymbol = stockDF.groupBy($"symbol").agg(count($"symbol"))
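For those two lines to compile, the $ interpolator and the count aggregate need to be in scope. A minimal sketch of the setup, assuming the sparkSession value used above:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

val sparkSession = SparkSession.builder().appName("Spark Numerical").getOrCreate()
import sparkSession.implicits._   // brings the $"column" syntax into scope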

Lua/LOVE indexing problems

I'm getting a very irritating error whenever I do anything like this with arrays. I have code that sets up the array in the love.load() function:
function iceToolsInit()
    objectArray = {} --for object handling
    objectArrayLocation = 0
end
and then code that allows for the creation of an object. It basically grabs all of the info about said object and plugs it into an array.
function createObject(x, y, renderimage) --used in the load function
    --objectArray is set up in the init function
    objectArrayLocation = objectArrayLocation + 1
    objectArray[objectArrayLocation] = {}
    objectArray[objectArrayLocation]["X"] = x
    objectArray[objectArrayLocation]["Y"] = y
    objectArray[objectArrayLocation]["renderimage"] = love.graphics.newImage(renderimage)
end
After this, an update function reads through the objectArray and renders the images accordingly:
function refreshObjects() --made for the update function
    arrayLength = #objectArray
    arraySearch = 0
    while arraySearch <= arrayLength do
        arraySearch = arraySearch + 1
        renderX = objectArray[arraySearch]["X"]
        renderY = objectArray[arraySearch]["Y"]
        renderimage = objectArray[arraySearch]["renderimage"]
        if movingLeft == true then --rotation for rightfacing images
            renderRotation = 120
        else
            renderRotation = 0
        end
        love.graphics.draw(renderimage, renderX, renderY, renderRotation)
    end
end
I of course clipped some unneeded code (just extra parameters in the array such as width and height) but you get the gist. When I set up this code to make one object and render it, I get this error:
attempt to index '?' (a nil value)
the line it points to is this line:
renderX = objectArray[arraySearch]["X"]
Does anyone know what's wrong here, and how I could prevent it in the future? I really need help with this.
It's an off-by-one error:
arraySearch = 0
while arraySearch <= arrayLength do
    arraySearch = arraySearch + 1
You run through the loop arrayLength + 1 times, using indexes 1..arrayLength+1. You want to go through the loop only arrayLength times, with indexes 1..arrayLength. The solution is to change the condition to arraySearch < arrayLength.
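A minimal sketch of the fixed loop, with the body trimmed to the bounds-relevant part:
arraySearch = 0
while arraySearch < arrayLength do   -- strict comparison: highest index used is arrayLength
    arraySearch = arraySearch + 1
    renderX = objectArray[arraySearch]["X"]
    -- ...rest of the body unchanged...
end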
Another (more Lua-ly) way is to write this as:
for arraySearch = 1, #objectArray do
An even more Lua-ly way is to use ipairs and the table.field reference style instead of table["field"]:
function refreshObjects()
    for _, el in ipairs(objectArray) do
        love.graphics.draw(el.renderimage, el.X, el.Y, movingLeft and 120 or 0)
    end
end
objectArray and movingLeft should probably be passed as parameters...

compare and delete on an array in matlab

I am trying to write a short piece of code to read a .m file (testin1.m) into an array and search for a particular word ('auto'). If a match is found, it should be deleted. I have the following code; please help me figure out my mistake.
fid = fopen('testin1.m');
txt = textscan(fid, '%s');
fclose(fid);
m_file_idx = 1;
data = ['auto'];
B = cellstr(data);
for idx = i : length(txt)
    A = txt{i};
    is_auto = isequal(A, B);
    if is_auto == 0
        txt{i} = [];
    end
end
If txt{i} equals 'auto', it should delete that row.
AK4749's answer is absolutely correct in showing where you went wrong. I'll just add an alternative solution to yours, which is shorter:
C = textread('testin1.m', '%s', 'delimiter', '\n');
C = C(cellfun(@isempty, regexp(C, 'auto')));
That's it!
EDIT #1: answer modified to remove the lines that contain the word 'auto', not just the word itself.
EDIT #2: answer modified to accept regular expressions.
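If you then want to write the cleaned lines back to a file, a minimal sketch (the output file name here is made up):
fid = fopen('testin1_clean.m', 'w');
fprintf(fid, '%s\n', C{:});   % one cell per line
fclose(fid);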
This is an error I have hit many, many, many times: when you set txt(i) = [], you change the length of the array, and your for-loop condition is no longer valid.
A better option would be to use the powerful indexing features:
A(find(condition)) = [];
or account for the change in length:
A(i) = [];
i = i - 1; % <-- or i = i + 1, it is too early to think, but you get the idea
EDIT: I just noticed you were also using A in your program. Mine was just some random variable name, not the same A you might be using.
When you set txt(i) = [], you changed the length of the array but the loop indexing does not account for the change. You can use logical indexing to avoid the loop and the problem.
Example:
wordToDelete = 'auto';
txt = {'foo', 'bar', 'auto', 'auto', 'baz', 'auto'}
match = strcmp(wordToDelete, txt)
txt = txt(~match)
Output:
txt =
    'foo'    'bar'    'auto'    'auto'    'baz'    'auto'
match =
     0     0     1     1     0     1
txt =
    'foo'    'bar'    'baz'
