Check if the field contains certain values - Solr

I have documents that contain many values, for example:
"mydata":["699 50000 50002 50031",
"699 878",
"200000 200150 200170 200171",
"200000 200150 200170 200172",
"200000 200150 200170 200175",
"200000 200150 200170 200176",
"200000 200150 200170 200174",
"200000 200150 200170 200173",
"200000 200150 200170 200177",
"200000 200150 200190 200191",
"200000 200150 200190 200181"],
I want to select documents according to two cases:
1/ The mydata field contains values between 200000 and 299999.
I retrieve those documents with the
mydata:[200000 TO 299999]
clause, no problem.
2/ The mydata field contains values between 1 and 199999.
But if I do:
mydata:[1 TO 199999]
I get back 0 documents, even though mydata contains the values 699 50000 50002 50031 and 699 878, which fall in that range, so this document should be returned.
How can I do that? Thanks for the help.
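For reference, a minimal sketch of how these two range queries might be sent to Solr's select handler (the host, port, and core name mycore are placeholders, not taken from the question):

curl 'http://localhost:8983/solr/mycore/select' --data-urlencode 'q=mydata:[200000 TO 299999]'
curl 'http://localhost:8983/solr/mycore/select' --data-urlencode 'q=mydata:[1 TO 199999]'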

MATLAB - concatenating a datetime array to an empty array

I have a project which outputs data into many files. These files look something like this as interpreted by readtable:
ans =
8×8 table
Date_Time EZOO2Con___ EZOCO2Con_ppm_ SGPCO2Con_ppm_ SGPTVOC_ppb_ BMEHumidity___ BMEPressure_Pa_ BMETemp_DegC_
___________________ ___________ ______________ ______________ ____________ ______________ _______________ _____________
09/06/2022 11:55:17 19.16 419 400 0 48.5 95948 22.57
09/06/2022 11:55:18 19.16 419 400 0 48.89 99577 22.58
09/06/2022 11:55:19 19.16 419 400 0 48.89 99578 22.58
09/06/2022 11:55:20 19.15 420 400 0 48.84 99584 22.57
09/06/2022 11:55:21 19.15 420 400 0 48.95 99574 22.58
09/06/2022 11:55:22 19.15 421 400 0 49.15 99578 22.57
09/06/2022 11:55:23 19.15 421 400 0 48.9 99577 22.56
09/06/2022 11:55:24 19.15 422 400 0 48.9 99573 22.57
For my previous test, I have approx. 289 separate files of this format, which I'd like to combine into 8 arrays for plotting.
The Date_Time column is a string with MM/DD/YYYY and HH:MM:SS separated by a space. Using the table2array command, I am able to convert the date & time column of each file's data into a datetime array. However, I am unable to use the cat or vertcat functions to append the date & time column to my "combined" array. Below is the code that's giving me trouble:
for k = 1:length(fileList)
    baseFileName = fileList(k).name;
    fullFileName = fullfile(fileList(k).folder, baseFileName);
    fprintf(1, 'Now reading %s\n', fullFileName);
    data = readtable(fullFileName);
    % Split the table into its individual columns
    timecol = data(:,1);
    EZOCO2col = data(:,2);
    EZOO2col = data(:,3);
    SGP30CO2col = data(:,4);
    SGP30TVOCcol = data(:,5);
    BME280Humcol = data(:,6);
    BME280Presscol = data(:,7);
    BME280Tempcol = data(:,8);
    % Convert each one-column table to an array (datetime for column 1, double for the rest)
    timecol_array = table2array(timecol);
    EZOCO2col_array = table2array(EZOCO2col);
    EZOO2col_array = table2array(EZOO2col);
    SGP30CO2col_array = table2array(SGP30CO2col);
    SGP30TVOCcol_array = table2array(SGP30TVOCcol);
    BME280Humcol_array = table2array(BME280Humcol);
    BME280Presscol_array = table2array(BME280Presscol);
    BME280Tempcol_array = table2array(BME280Tempcol);
    % Append this file's columns to the running totals
    timecol_tot = cat(1, timecol_tot, timecol_array);
    EZOCO2col_tot = cat(1, EZOCO2col_tot, EZOCO2col_array);
    EZOO2col_tot = cat(1, EZOO2col_tot, EZOO2col_array);
    SGP30CO2col_tot = cat(1, SGP30CO2col_tot, SGP30CO2col_array);
    SGP30TVOCcol_tot = cat(1, SGP30TVOCcol_tot, SGP30TVOCcol_array);
    BME280Humcol_tot = cat(1, BME280Humcol_tot, BME280Humcol_array);
    BME280Presscol_tot = cat(1, BME280Presscol_tot, BME280Presscol_array);
    BME280Tempcol_tot = cat(1, BME280Tempcol_tot, BME280Tempcol_array);
end
I receive this error each time:
Error using datetime/cat (line 1376)
All inputs must be datetimes or date/time character vectors or date/time strings.
Error in Plot_attempt_9_6_22_1 (line 66)
timecol_tot = cat(1, timecol_tot, timecol_array);
As per How to preallocate a datetime array in matlab, I have tried:
timecol_tot = [];,
timecol_tot = datetime([],[],[],[],[],[]);, and
timecol_tot = NaT(1,1); to no avail.
Because the length of each of these files may vary, I didn't try pre-allocating the datetime array in the size of the incoming data, since that may not work across different datasets. However, it does work if I have only one file.
Is there a way to do this that would allow me to just initialize an empty array and concatenate datetime arrays to it without defining the size of the first datetime set to add?
I ended up fixing this problem by adding an if statement, and now, it looks something like this:
if k == 1
    timecol_tot = timecol_array;
    EZOCO2col_tot = cat(1, EZOCO2col_tot, EZOCO2col_double);
    EZOO2col_tot = cat(1, EZOO2col_tot, EZOO2col_double);
    SGP30CO2col_tot = cat(1, SGP30CO2col_tot, SGP30CO2col_array);
    SGP30TVOCcol_tot = cat(1, SGP30TVOCcol_tot, SGP30TVOCcol_array);
    BME280Humcol_tot = cat(1, BME280Humcol_tot, BME280Humcol_array);
    BME280Presscol_tot = cat(1, BME280Presscol_tot, BME280Presscol_array);
    BME280Tempcol_tot = cat(1, BME280Tempcol_tot, BME280Tempcol_array);
else
    timecol_tot = cat(1, timecol_tot, timecol_array);
    EZOCO2col_tot = cat(1, EZOCO2col_tot, EZOCO2col_double);
    EZOO2col_tot = cat(1, EZOO2col_tot, EZOO2col_double);
    SGP30CO2col_tot = cat(1, SGP30CO2col_tot, SGP30CO2col_array);
    SGP30TVOCcol_tot = cat(1, SGP30TVOCcol_tot, SGP30TVOCcol_array);
    BME280Humcol_tot = cat(1, BME280Humcol_tot, BME280Humcol_array);
    BME280Presscol_tot = cat(1, BME280Presscol_tot, BME280Presscol_array);
    BME280Tempcol_tot = cat(1, BME280Tempcol_tot, BME280Tempcol_array);
end
Another reason I was getting a type error was that some of my files were empty or corrupted. If you're having similar issues, check to make sure that you're not working with empty or corrupted CSVs!
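A minimal sketch of one way to do that check, assuming the same fileList/readtable loop as above (the try/catch and isempty guard are illustrative, not the original code):

for k = 1:length(fileList)
    fullFileName = fullfile(fileList(k).folder, fileList(k).name);
    try
        data = readtable(fullFileName);   % may throw on a corrupted file
    catch
        fprintf(1, 'Skipping unreadable file %s\n', fullFileName);
        continue;
    end
    if isempty(data)                      % skip files with no rows
        fprintf(1, 'Skipping empty file %s\n', fullFileName);
        continue;
    end
    % ... convert and concatenate the columns as in the loop above ...
end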
You are asking two questions; I am new here, but some moderators may tell you to break it down into two separate questions.
I am answering both anyway:
1.- The error doesn't seem to be coming from cat
With the data you have provided I have reproduced the cat output, and cat works fine; it concatenates datetimes when required and doubles when told to.
The problem has to be somewhere else; as long as you feed cat inputs of the same type, there shouldn't be any error.
The error comes from a script called Plot_attempt_9_6_22_1, right?
Either
you are sending plot rows instead of columns, so the plot input vectors contain mixed formats (a transpose, or more than one, is missing somewhere),
or
somewhere you are mixing date/time values with something that is not a date/time type.
2.- Do you really need to preallocate memory?
In any case, MATLAB suggests preallocating tables with the following expression:
T = table('Size',sz,'VariableTypes',varTypes)
rather than with the C/C++ style mentioned in the question.
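As an illustration only (the variable names and types below are assumed from the table shown in the question, not prescribed by MATLAB), that preallocation expression could be used like this:

% Assumed names/types matching the 8 columns shown in the question
varNames = {'Date_Time','EZOO2Con','EZOCO2Con_ppm','SGPCO2Con_ppm', ...
            'SGPTVOC_ppb','BMEHumidity','BMEPressure_Pa','BMETemp_DegC'};
varTypes = {'datetime','double','double','double','double','double','double','double'};
nRows = 0;   % or the total number of rows expected, if known in advance
T = table('Size',[nRows 8],'VariableTypes',varTypes,'VariableNames',varNames);

With a nonzero nRows, the datetime column is initialized with NaT placeholders that can later be overwritten row by row.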

character(0): I get the first table but cannot get the second table on the page

I am trying to web-scrape an HTML page, using Inspector Gadget to find the selectors.
There are no problems with the 1st table on the page, but with the 2nd table I get character(0) or Nodeset(0).
library(rvest)
library(plyr)

date1 <- 20161011
gdf1 <- data.frame(matrix(0, ncol = 11, nrow = 1))
newdate <- date1

# HOCKEY
year <- 2015
# date1 <- newdate[d]
date1 <- 2010
# for (yr in 1:10) {
date1 <- date1 + 1
c <- paste("https://www.hockey-reference.com/leagues/NHL_", date1, ".html", sep = "")
nbc <- read_html(c)
nbc

# tables <- html_nodes(nbc, ".center , #games , .right:nth-child(5), .right:nth-child(3), #games a")
tables <- html_nodes(nbc, "#stats .right , #stats a")
g <- html_text(tables)

Unique combinations of different values in json using jq

I have a JSON file (input.json) which looks like this:
{"header1":"a","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"b","header2":2a, "header3":2a, "header4":"orange"}
{"header1":"c","header2":1a, "header3":2a, "header4":"banana"},
{"header1":"d","header2":2a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"},
{"header1":"b","header2":1a, "header3":2a, "header4":"orange"},
{"header1":"b","header2":1a, "header3":1a, "header4":"orange"},
{"header1":"d","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"} (repeat of line 5)
I want to filter out only the unique combinations of the values, using jq.
Results should look like:
{"header1":"a","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"b","header2":2a, "header3":2a, "header4":"orange"}
{"header1":"c","header2":1a, "header3":2a, "header4":"banana"},
{"header1":"d","header2":2a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"},
{"header1":"b","header2":1a, "header3":2a, "header4":"orange"},
{"header1":"b","header2":1a, "header3":1a, "header4":"orange"},
{"header1":"d","header2":1a, "header3":1a, "header4":"apple"}
I tried doing a group_by of header1 with the other headers, but it didn't generate unique results.
I've used unique, but that didn't generate the proper results either.
How can I do this? I'm new to jq and haven't found many tutorials on it.
Thanks
The sample lines you give are not valid JSON. Since your preamble introduces them as JSON, the following will assume that you intended to present an array of JSON objects.
The question is unclear in several respects, but from the example, it looks as though unique might be what you're looking for, so consider:
Invocation: jq -c 'unique[]' input.json
Output:
{"header1":"a","header2":"1a","header3":"1a","header4":"apple"}
{"header1":"a","header2":"2a","header3":"1a","header4":"banana"}
{"header1":"b","header2":"1a","header3":"1a","header4":"orange"}
{"header1":"b","header2":"1a","header3":"2a","header4":"orange"}
{"header1":"b","header2":"2a","header3":"2a","header4":"orange"}
{"header1":"c","header2":"1a","header3":"2a","header4":"banana"}
{"header1":"d","header2":"1a","header3":"1a","header4":"apple"}
{"header1":"d","header2":"2a","header3":"1a","header4":"apple"}
If you need the output in some other format, you could do that using jq as well, but the requirements are not so clear, so let's leave that as an exercise :-)
Since, as peak indicated, your input isn't legal JSON, I've taken the liberty of correcting it and converting it to a list of individual objects:
{"header1":"a","header2":"1a", "header3":"1a", "header4":"apple"}
{"header1":"b","header2":"2a", "header3":"2a", "header4":"orange"}
{"header1":"c","header2":"1a", "header3":"2a", "header4":"banana"}
{"header1":"d","header2":"2a", "header3":"1a", "header4":"apple"}
{"header1":"a","header2":"2a", "header3":"1a", "header4":"banana"}
{"header1":"b","header2":"1a", "header3":"2a", "header4":"orange"}
{"header1":"b","header2":"1a", "header3":"1a", "header4":"orange"}
{"header1":"d","header2":"1a", "header3":"1a", "header4":"apple"}
{"header1":"a","header2":"2a", "header3":"1a", "header4":"banana"}
If this data is in data.json and you run
jq -M -s -f filter.jq data.json
with the following filter.jq
foreach .[] as $r (
{}
; ($r | map(.)) as $p | if getpath($p) then empty else setpath($p;1) end
; $r
)
it will generate the following output in the original order with no duplicates.
{"header1":"a","header2":"1a","header3":"1a","header4":"apple"}
{"header1":"b","header2":"2a","header3":"2a","header4":"orange"}
{"header1":"c","header2":"1a","header3":"2a","header4":"banana"}
{"header1":"d","header2":"2a","header3":"1a","header4":"apple"}
{"header1":"a","header2":"2a","header3":"1a","header4":"banana"}
{"header1":"b","header2":"1a","header3":"2a","header4":"orange"}
{"header1":"b","header2":"1a","header3":"1a","header4":"orange"}
{"header1":"d","header2":"1a","header3":"1a","header4":"apple"}
Note that ($r | map(.)) is used to generate an array containing just the values from each row, which is assumed to always produce a unique key path. This is true for the sample data but may not be true for more complex values.
A slower but more robust filter.jq is
foreach .[] as $r (
{}
; [$r | tojson] as $p | if getpath($p) then empty else setpath($p;1) end
; $r
)
which uses the JSON representation of the entire row as a unique key to determine whether a row has been seen previously.
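The invocation is the same as before, assuming the corrected objects are in data.json and this robust version is saved as filter.jq:

jq -M -s -f filter.jq data.json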

Storing multi-dimensional data in an array in R

In R, I am running a rolling regression of a dependent variable on several independent variables, using this code:
library(leaps)   # provides regsubsets()

P <- nrow(Returns) - 20
coefi <- vector('list', P)
for (i in 0:(P - 1)) {
  stepfit <- regsubsets(Dependent1 ~ ., data = Returns[(1 + i):(20 + i), ], method = "backward")
  coefi[[i + 1]] <- coef(stepfit, id = which.max(summary(stepfit)$adjr2))
}
Now I would like to apply the same loop to several dependent variables. So, I want this calculation to run down the rows, and then across j columns (where j is the number of dependent variables, in my case 2), and store the results in a 3-dimensional array (see the sketch after the sample data below).
I tried to include my data matrix, but it is large and, from what I read, I would have to format it manually here. I can do that if it helps with a response.
Dependent 1 Dependent 2 Independent 1 Independent 2 Independent 3 Independent 4 Independent 5
1/31/2008 3.28% -2.13% -0.27% 0.09% -0.03% -0.28% 3.86%
2/29/2008 0.83% 1.81% 0.52% -0.40% 1.53% 0.48% -0.54%
3/31/2008 1.12% -0.55% -0.75% -0.46% 1.48% 0.25% 4.86%
4/30/2008 -2.30% 2.21% 0.36% -0.92% 1.43% 2.67% 0.31%
5/31/2008 -0.56% -0.21% 0.85% 1.21% -0.32% 0.63% 2.75%
6/30/2008 1.98% -1.99% -0.33% 1.27% 0.07% -3.02% 9.04%
7/31/2008 -0.79% -4.69% 0.17% -0.76% 0.88% -1.61% 5.86%
8/31/2008 0.01% 2.27% -0.37% 3.89% -0.43% -1.01% -5.60%
9/30/2008 3.85% 1.83% 3.28% -2.28% -2.69% -0.42% 1.88%
10/31/2008 7.49% 3.79% 3.86% -3.17% 0.45% 3.25% 6.14%
11/30/2008 0.76% 1.11% -0.54% -0.36% 4.74% 4.14% 25.68%
12/31/2008 -0.27% -1.53% 4.86% 0.07% 1.26% 0.50% 10.00%
1/31/2009 1.20% 1.38% 0.31% -0.31% -1.72% 1.91% 3.81%
2/28/2009 1.65% -2.07% 2.75% 0.28% 1.92% -0.78% 1.85%
3/31/2009 -2.10% 0.86% 9.04% 0.71% 0.09% 2.09% 1.91%
4/30/2009 -3.07% 5.22% 5.86% 3.65% -0.40% 1.65% 2.07%
5/31/2009 -0.41% 13.76% -5.60% 1.65% -0.46% 1.21% -1.34%
6/30/2009 0.25% 3.11% 1.88% 2.28% -0.92% -1.84% 3.59%
7/31/2009 2.67% -1.87% 6.14% 3.24% 1.21% 0.02% 0.60%
8/31/2009 0.63% 4.81% -0.27% 2.61% 1.27% -2.07% 4.29%
9/30/2009 -3.02% 4.78% 3.66% 0.88% -0.76% -0.32% 1.86%
10/31/2009 -1.61% 1.65% 0.55% 1.55% 3.89% -1.04% -0.95%
11/30/2009 -1.01% 1.60% 0.60% 1.19% -2.28% 0.41% 2.13%
12/31/2009 -0.42% 2.89% 0.28% -1.62% 1.49% 0.84% 2.72%
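A minimal sketch of the nested loop described above, assuming the two dependent columns of Returns are named Dependent1 and Dependent2 (as in the original code); the results are kept in a nested list rather than a 3-dimensional array because each fit can select a different number of coefficients:

library(leaps)

dep_names <- c("Dependent1", "Dependent2")   # assumed names of the dependent columns
P <- nrow(Returns) - 20
coefi <- vector("list", length(dep_names))   # one list of rolling results per dependent variable

for (j in seq_along(dep_names)) {
  coefi[[j]] <- vector("list", P)
  for (i in 0:(P - 1)) {
    f <- as.formula(paste(dep_names[j], "~ ."))
    stepfit <- regsubsets(f, data = Returns[(1 + i):(20 + i), ], method = "backward")
    coefi[[j]][[i + 1]] <- coef(stepfit, id = which.max(summary(stepfit)$adjr2))
  }
}

Note that, as in the original code, the other dependent column is still among the predictors on the right-hand side of ~ . ; drop it from the data window first if that is not intended.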

StAX parser: duplicate node names and specific comments

I'm trying to parse an XML file with the StAX parser, but I face two problems:
First: two nodes have the same name.
Second: reading exactly the comment that appears before the values.
<database>
<!-- 2015-03-10 01:29:00 EET / 130 --> <row><v> 2.74 </v><v> 1.63 </v></row>
<!-- 2015-03-10 01:30:00 EET / 170 --> <row><v> 5.33 </v><v> 1.68 </v></row>
<!-- 2015-03-10 01:31:00 EET / 180 --> <row><v> 7.62 </v><v> 1.83 </v></row>
</database>
I want to collect the data like this:
Date: 2015-03-10 01:29:00
V1: 2.74
V2: 1.63
I was using a DOM parser before, and it was easy to deal with duplicate node names and comments; unfortunately I have to use StAX now, and I don't know how to solve these problems :(
The first issue: two nodes have the same name
<v> 2.74 </v><v> 1.63 </v>
There is no issue with StAX; if you follow the events, you will get, in order:
startElement ( v )
characters ( 2.74 )
endElement ( v )
startElement ( v )
characters ( 1.63 )
endElement ( v )
So it is up to you to keep a minimal amount of context information in your code to know whether it is the first or the second time you are starting a <v> element.
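For example, a minimal sketch of such a counter (the class name, file name, and printing are placeholders, not taken from the question):

import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class RowReader {
    public static void main(String[] args) throws Exception {
        InputStream in = new FileInputStream("database.xml"); // placeholder file name
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int vIndex = 0; // position of the current <v> inside its <row>
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if ("row".equals(reader.getLocalName())) {
                    vIndex = 0;               // reset the counter at each new <row>
                } else if ("v".equals(reader.getLocalName())) {
                    vIndex++;
                    // getElementText() returns the element's text and advances to </v>
                    System.out.println("V" + vIndex + ": " + reader.getElementText().trim());
                }
            }
        }
        reader.close();
    }
}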
The second issue: reading the comments
There is no issue here either; StAX parsing triggers events for comments as well. You can simply get the comment as a String with the API and extract the expected value yourself, for instance:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
// inputStream is an InputStream over your XML document
XMLStreamReader streamReader = inputFactory.createXMLStreamReader(inputStream);
while (streamReader.hasNext()) {
    int event = streamReader.next();
    if (event == XMLStreamConstants.COMMENT) {
        String aDateStringVal = streamReader.getText();
        // + extract your date value from the comment string
    }
}
