I have the following formula:
// First part to collect records from a SQL database
ForAll(
    WSL_INVENT_X_LOCATION;
    If(
        Descr = drop_materiales.SelectedText.Value;
        Collect(
            colMateriales;
            ShowColumns(
                WSL_INVENT_X_LOCATION;
                "InvtID";
                "Descr";
                "WhseLoc";
                "SiteID";
                "QtyOnHand"
            )
        )
    )
);;
//Second part to remove duplicates
ClearCollect(
    colMateriales;
    ForAll(
        Distinct(
            colMateriales;
            ThisRecord
        );
        Result
    )
);;
The problems that I am encountering are:
An infinite loop occurs while this formula runs
The collection only adds 0's in the QtyOnHand column, when in fact there are records with values greater than 0
I appreciate your help.
You may be hitting some delegation limits in your query (doc: Understanding delegation), which may explain why some rows are not being retrieved. One way to avoid those limits would be to rewrite the expression so that some of the filtering is done on the server side:
ClearCollect(
    colMateriales;
    ShowColumns(
        Filter(
            WSL_INVENT_X_LOCATION;
            Descr = drop_materiales.SelectedText.Value
        );
        "InvtID";
        "Descr";
        "WhseLoc";
        "SiteID";
        "QtyOnHand"
    )
);;
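If duplicate rows can still come back after the server-side filter, the Distinct step from the question can still be applied afterwards; at that point colMateriales is a small local collection, so delegation no longer matters. A sketch reusing the question's own formula:
ClearCollect(
    colMateriales;
    ForAll(
        Distinct(colMateriales; ThisRecord);
        Result
    )
);;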
I have the following code in my while loop and it is significantly slow. Any suggestions on how to improve this?
open IN, "<$FileDir/$file" || Err( "Failed to open $file at location: $FileDir" );
my $linenum = 0;
my $linenum = 0;
while ( $line = <IN> ) {
    if ( $linenum == 0 ) {
        Log(" This is header line : $line");
        $linenum++;
    } else {
        $linenum++;
        my $csv = Text::CSV_XS->new();
        my $status = $csv->parse($line);
        my @val = $csv->fields();
        $index = 0;
        Log("number of parameters for this file is: $sth->{NUM_OF_PARAMS}");
        for ( $index = 0; $index <= $#val; $index++ ) {
            if ( $index < $sth->{NUM_OF_PARAMS} ) {
                $sth->bind_param( $index + 1, $val[$index] );
            }
        }
        if ( $sth->execute() ) {
            $ifa_dbh->commit();
        } else {
            Log("line $linenum insert failed");
            $ifa_dbh->rollback();
            exit(1);
        }
    }
}
By far the most expensive operation there is accessing the database server; it's a network trip, hundreds of milliseconds or some such, each time.
Are those DB operations inserts, as they appear? If so, instead of inserting row by row, construct a string for an insert statement with multiple rows, in principle as many as there are, in that loop. Then run that one transaction.
Test and scale down as needed, if that adds up to too many rows. You can keep adding rows to the string for the insert statement up to a decided maximum number, insert that, then keep going.†
A few more readily seen inefficiencies
Don't construct an object every time through the loop. Build it once before the loop, and then reuse/repopulate it as needed inside the loop. Then there is no need for parse+fields here, and getline is also a bit faster.
You don't need that if statement on every read. First read one line of data outside the loop; that's your header. Then enter the loop, without the if.
Altogether, without placeholders (which may not be needed now), something like:
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# There's a $table earlier, with its @fields to populate
my $qry = "INSERT into $table (" . join(',', @fields) . ") VALUES ";

open my $IN, '<', "$FileDir/$file"
    or Err( "Failed to open $file at location: $FileDir" );

my $header_arrayref = $csv->getline($IN);
Log( "This is header line : @$header_arrayref" );

my @sql_values;
while ( my $row = $csv->getline($IN) ) {
    # Use as many elements in the row (@$row) as there are @fields
    push @sql_values, '(' .
        join(',', map { $dbh->quote($_) } @$row[0..$#fields]) . ')';
    # May want to do more to sanitize input further
}

$qry .= join ', ', @sql_values;

# Now $qry is ready. It is
# INSERT into table_name (f1,f2,...) VALUES (v11,v12...), (v21,v22...),...

$dbh->do($qry) or die $DBI::errstr;
I've also corrected the error handling when opening the file, since the || in the question binds too tightly in this case: it's effectively open IN, ( "<$FileDir/$file" || Err(...) ). We need the low-precedence or instead of || there. Then, the three-argument open is better. See perlopentut.
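To see the precedence difference at a glance (a minimal sketch; Err stands in for any error handler):
# || binds tighter than the comma, so the original parses as
#   open IN, ( "<$FileDir/$file" || Err(...) );
# The filename string is always true, so Err can never run.
open IN, "<$FileDir/$file" || Err("never runs");

# 'or' binds looser than the whole open call, so this does what was meant;
# three-argument open with a lexical filehandle is better still:
open my $IN, '<', "$FileDir/$file"
    or Err("Failed to open $file at location: $FileDir");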
If you do need the placeholders, perhaps because you can't have a single insert but it must be broken into many or for security reasons, then you need to generate the exact ?-tuples for each row to be inserted, and later supply the right number of values for them.
You can assemble the data first and then build the ?-tuples based on it:
my $qry = "INSERT into $table (" . join(',', @fields) . ") VALUES ";
...
my @data;
while ( my $row = $csv->getline($IN) ) {
    push @data, [ @$row[0..$#fields] ];
}

# Append the right number of (?,?...),... with the right number of ? in each
$qry .= join ', ', map { '(' . join(',', ('?') x @$_) . ')' } @data;

# Now $qry is ready to bind and execute
# INSERT into table_name (f1,f2,...) VALUES (?,?,...), (?,?,...), ...

$dbh->do($qry, undef, map { @$_ } @data) or die $DBI::errstr;
This may generate a very large string, which may push the limits of your RDBMS or some other resource. In that case break @data into smaller batches. Then prepare the statement with the right number of (?,?,...) row-values for a batch, and execute it in a loop over the batches.‡
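For example, a minimal batching sketch, reusing @data and @fields from above (the batch size of 1000 is an arbitrary assumption to tune for your server):
my $batch_size = 1_000;   # assumed limit; adjust for your RDBMS
my $cols = join ',', @fields;
while ( my @batch = splice @data, 0, $batch_size ) {
    # The last batch may be smaller, so build the ?-tuples per batch
    my $tuples = join ', ', map { '(' . join(',', ('?') x @$_) . ')' } @batch;
    my $sth = $dbh->prepare("INSERT into $table ($cols) VALUES $tuples");
    $sth->execute( map { @$_ } @batch );
}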
Finally, another way altogether is to directly load data from a file using the database's tool for that particular purpose. This will be far faster than going through DBI, probably even counting the cost of processing your input CSV into another file holding only the needed data.
Since you don't need all data from your input CSV file, first read and process the file as above and write out a file with only the needed data (@data above). Then there are two possible ways:
Either use an SQL command for this – COPY in PostgreSQL, LOAD DATA [LOCAL] INFILE in MySQL and Oracle (etc); or,
Use a dedicated tool for importing/loading files from your RDBMS – mysqlimport (MySQL), SQL*Loader/sqlldr (Oracle), etc. I'd expect this to be the fastest way
The second of these options can also be done out of a program, by running the appropriate tool as an external command via system (or better yet via the suitable libraries).
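For instance, a hypothetical sketch of that for mysqlimport ($db_name is an assumption, and mysqlimport derives the table name from the file's basename):
# Load the pre-processed CSV; the file must be named after the table
my $rc = system('mysqlimport', '--local', '--fields-terminated-by=,',
    $db_name, "$FileDir/$table.csv");
die "mysqlimport failed: $?" if $rc != 0;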
† In one application I've put together as many as millions of rows in the initial insert -- the string itself for that statement was in the high tens of MB -- and that keeps running with ~100k rows inserted in a single statement daily, for a few years by now. This is PostgreSQL on good servers, and of course ymmv.
‡ Some RDBMSs do not support a multi-row (batch) insert query like the one used here; in particular, Oracle seems not to. (We were informed in the end that that's the database used here.) But there are other ways to do it in Oracle; please see the links in comments, and search for more. Then the script will need to construct a different query, but the principle of operation is the same.
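For reference, one often-mentioned Oracle alternative is the INSERT ALL form (a sketch only, with placeholder names):
INSERT ALL
    INTO table_name (f1, f2) VALUES ('v11', 'v12')
    INTO table_name (f1, f2) VALUES ('v21', 'v22')
SELECT 1 FROM dual;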
I am using Neo4j as a database to store voting information related to another database object.
I have a Vote object which has fields:
type:String with values of UP or DOWN.
argId:String which is a string ID value linking to a unique argument object
I am trying to query the number of votes assigned to a given argId using the following queries:
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='DOWN'
RETURN {downvotes: COUNT(v)} AS votes
UNION
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='UP'
RETURN {upvotes: COUNT(v)} AS votes
Note that the above Cypher works and returns the expected result, like so:
[
{
"downvotes": 1
},
{
"upvotes": 10
}
]
But I feel like the query could be a bit neater and want to write something like this:
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='UP'
MATCH (b:Vote) WHERE b.argId = '214' AND b.type='DOWN'
RETURN {upvotes: COUNT(v), downvotes: COUNT(b)}
Just reading it through, I think it makes sense: b and v are declared as separate variables, so all should be good (so I thought).
But running it gives me this:
{
"upvotes": 10,
"downvotes": 10
}
But it should be what I have above.
Why is this?
I'm kinda new to neo4j and cypher so I've probably not understood how cypher works fully.
Can anyone shine any light?
Thank you!
p.s. I'm using Neo4j 3.5.6 and running the queries via the Desktop web browser app.
I think if you run this query you will get a clearer picture of what is happening. Your query produces a cartesian product of the upvotes (10) and the downvotes (1). The product is a result set of 10 rows. When they are subsequently counted, there are ten of each.
MATCH (v:Vote) WHERE v.argId = '214' AND v.type='UP'
MATCH (b:Vote) WHERE b.argId = '214' AND b.type='DOWN'
RETURN v.type, b.type
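Against the sample data above (10 UP votes, 1 DOWN vote) this returns ten rows, each UP vote paired with the single DOWN vote:
v.type  b.type
"UP"    "DOWN"
"UP"    "DOWN"
...     (10 rows in total)
So COUNT(v) and COUNT(b) both see ten rows.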
In order to get the result you want, you need to filter the values and count them individually.
Rather than having two match statements, have a single match statement that retrieves all of the values of interest, and then use a conditional expression to filter them into upvotes and downvotes buckets.
Something like this may suit you.
MATCH (v:Vote {argId: '214'})
WHERE v.type IN ['UP', 'DOWN']
RETURN {
upvotes: count(CASE WHEN v.type = 'UP' THEN 1 END),
downvotes: count(CASE WHEN v.type = 'DOWN' THEN 1 END)
} AS vote_result
Using APOC, you could do something like this, whereby you use the type values themselves to aggregate the counts and then use APOC to convert the collected pairs to a map with the types as the keys.
MATCH (v:Vote {argId: '214'})
WHERE v.type IN ['UP', 'DOWN']
WITH [v.type, count(*)] AS vote_pair
RETURN apoc.map.fromPairs(collect(vote_pair)) AS votes
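With the sample data above, this returns a single map keyed by the type values, along the lines of:
{
  "UP": 10,
  "DOWN": 1
}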
I was presented with the following (this has been simplified for the question):
int programId = 3;
int refugeeId = 5;
var q = ( from st in Students
join cr in Class_Rosters on st.StudentId equals cr.StudentId
join pp in Student_Program_Part on cr.StudentId equals pp.StudentId
from refg in (Student_Program_Participation_Values
.Where(rudf => rudf.ProgramParticipationId == pp.ProgramParticipationId
&& rudf.UDFId == refugeeId)).DefaultIfEmpty()
where cr.ClassId == 22898
&& pp.ProgramId == programId
select new
{
StudentId = st.StudentId,
Refugees = refg.Value ?? "IT WAS NULL",
Refugee = Student_Program_Participation_Values
.Where(rudf => rudf.ProgramParticipationId == pp.ProgramParticipationId
&& rudf.RefugeeId == refugeeId)
.Select(rudf => (rudf.Value == null ? "IT WAS NULL" : "NOT NULL!"))
.First() ?? "First Returned NULL!",
});
q.Dump();
In the above query the Student_Program_Participation_Values table does not have records for all students. The Refugees value properly returns the Value or "IT WAS NULL" when there is a missing record in Student_Program_Participation_Values. However, the Refugee column returns either "NOT NULL!" or "First Returned NULL!".
My question is: why is "First Returned NULL!" being seen? In my experience with LINQ, calling First() on an empty set should throw an exception, but in this query it appears to do something completely different. Note that refg.Value is never null in the database (it is either a valid value, or there is no record).
Note also that this is Linq to SQL and we are running this query in Linqpad.
To clarify, here is some sample output:
StudentId Refugees Refugee
22122 True NOT NULL!
2332 IT WAS NULL First Returned NULL!
In the above, when Refugees returns "IT WAS NULL" there was no record in the Student_Program_Participation_Values table, so I expected First() to throw an exception; instead it was null, so Refugee shows "First Returned NULL!".
Any ideas?
Update: Enigmativity pushed me in the right direction by pointing out that I was stuck on the First() call, when, this being an IQueryable, the First() wasn't really a function call at all but was simply translated into "TOP 1" in the query. It was obvious when I looked at the generated SQL in LINQPad. Below is the important part of the generated SQL that makes it clear what is happening and why. I won't paste the entire thing, since it's enormous and not germane to the discussion.
...
COALESCE((
SELECT TOP (1) [t12].[value]
FROM (
SELECT
(CASE
WHEN 0 = 1 THEN 'IT WAS NULL'
ELSE CONVERT(NVarChar(11), 'NOT NULL!')
END) AS [value], [t11].[ProgramParticipationId], [t11].[UDFId]
FROM [p_Student_Program_Participation_UDF_Values] AS [t11]
) AS [t12]
WHERE ([t12].[ProgramParticipationId] = [t3].[ProgramParticipationId]) AND ([t12].[UDFId] = @p8)
), 'First Returned NULL!') AS [value3]
...
So, here you can clearly see that LINQ converted the First() into TOP (1) and also determined that "IT WAS NULL" could never happen (thus the 0 = 1), since the whole thing is based on an outer join, and the entire query simply coalesces into 'First Returned NULL!'.
So, it was all a perception mistake on my part, not separating in my mind that LINQ to SQL (and LINQ to Entities, for that matter) is very different from calling the same-named methods on Lists and the like.
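A minimal in-memory sketch of that distinction (LINQ to Objects only, hypothetical data):
using System;
using System.Collections.Generic;
using System.Linq;

class FirstDemo
{
    static void Main()
    {
        var empty = new List<string>();

        // LINQ to Objects: First() executes in memory and throws on an empty set.
        try { empty.First(); }
        catch (InvalidOperationException) { Console.WriteLine("Threw, as expected in memory"); }

        // FirstOrDefault() is the in-memory analogue of what the generated SQL
        // (TOP (1) wrapped in COALESCE) actually does with an empty result:
        Console.WriteLine(empty.FirstOrDefault() ?? "First Returned NULL!");
    }
}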
I hope my mistake is useful to someone else.
Without having your database I couldn't test this code, but try it anyway and see if it works.
var q =
(
from st in Students
join cr in Class_Rosters on st.StudentId equals cr.StudentId
where cr.ClassId == 22898
join pp in Student_Program_Part on cr.StudentId equals pp.StudentId
where pp.ProgramId == programId
select new
{
StudentId = st.StudentId,
refg =
Student_Program_Participation_Values
.Where(rudf =>
rudf.ProgramParticipationId == pp.ProgramParticipationId
&& rudf.UDFId == refugeeId)
.ToArray()
}
).ToArray();
var q2 =
    from x in q
    from refg in x.refg.DefaultIfEmpty()   // refg is null when there was no record
    select new
    {
        StudentId = x.StudentId,
        Refugees = refg == null ? "IT WAS NULL" : refg.Value,
        Refugee = refg == null
            ? "First Returned NULL!"
            : (refg.Value == null ? "IT WAS NULL" : "NOT NULL!"),
    };
q2.Dump();
Basically the idea is to capture the records cleanly from the database, bring them into memory, and then do all the null stuff. If this works, it is because the original LINQ wasn't translated into the SQL you expected. The translated SQL can sometimes be a little off, so you don't get the results you expect. It's like translating English into French - you might not get the correct translation.
Noob here.
I have a super column family sorted by TimeUUIDType which has a number of entries. I'm trying to perform a simple get with phpcassa that won't work: I'm trying to return a specific value from a UTF8-sorted column within a TimeUUID-sorted super column. The exact same code works with a similar SC family sorted by BytesType.
Here is the info on the SCF I'm trying to get from, which I previously entered via the -cli:
ColumnFamily: testSCF (Super)
Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType/org.apache.cassandra.db.marshal.UTF8Type
RowKey: TestKey
=> (super_column=48dd0330-5bd6-11e0-adc5-343960c1b6b8,
(column=test, value=74657374, timestamp=1301603831288000))
=> (super_column=141a69b0-5c6e-11e0-bcce-343960c1b6b8,
(column=new test, value=6e657774657374, timestamp=1301669004440000))
And here is the phpcassa script I'm using to retrieve the data.
<?php
require_once('.../connection.php');
require_once('.../columnfamily.php');
$conn = new Connection('siteRoot');
$scf = 'testSCF';
$key = 'testKey';
$super = '141a69b0-5c6e-11e0-bcce-343960c1b6b8';
$col = 'new test';
$entry = new ColumnFamily($conn, $scf);
$q = ($entry->get($key, $columns=array($super)));
echo $q[$super][$col];
?>
Also, if I don't specify the SC, like so:
$q = ($entry->get($key));
print_r($q);
It returns:
Array ( [HÝ0[ÖàÅ49`Á¶¸] => Array ( [test] => test ) [i°\nà¼Î49`Á¶¸] => Array ( [new test] => newtest ) )
I know part of the issue might have been brought up in How do I insert a row with a TimeUUIDType column in Cassandra?
But it didn't really help me, as I presumably have accepted TimeUUIDTypes.
Thanks for any help guys.
Suppose I didn't try hard enough to begin with. The answer in fact had everything to do with the link.
It appears that the -cli accepted what jbellis in the link describes as the 32-byte representation of the TimeUUID (141a69b0-5c6e-11e0-bcce-343960c1b6b8) when I inserted it. This confused me.
It works fine when you 'get()' with the "raw" 16 byte form (HÝ0[ÖàÅ49`Á¶¸).
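For anyone hitting the same thing, core PHP can do the conversion (a sketch using the same variables as above; pack turns the 32 hex chars into the raw 16 bytes):
<?php
$super = '141a69b0-5c6e-11e0-bcce-343960c1b6b8';
// Strip the dashes, then pack the hex string into 16 raw bytes
$super_raw = pack('H*', str_replace('-', '', $super));

$q = ($entry->get($key, $columns=array($super_raw)));
echo $q[$super_raw][$col];
?>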
Cheers.