So far I have this:
Order = struct('Name',{},'Item',{},'Quantity',{},'DueDate',{});
Order(1).Name = 'Order 1'; Order(1).Item = 'Rolo'; Order(1).Quantity = '1'; Order(1).DueDate = '735879';
Order(1).Name = 'Order 1'; Order(1).Item = 'Trident'; Order(1).Quantity = '2'; Order(1).DueDate = '735887';
Order(2).Name = 'Order 2'; Order(2).Item = 'Hershey';Order(2).Quantity = '3'; Order(2).DueDate = '735875';
Order(3).Name = 'Order 3'; Order(3).Item = 'Kitkat'; Order(3).Quantity = '6'; Order(3).DueDate = '735890';
Within each order, there are multiple items and quantities of items, so I would like each struct array for each order to be able to hold multiple items, quantities, and due dates of orders.
Thank you!
The best option is to use table() (or dataset() if your MATLAB version predates R2013b, when table() was introduced, and you have the Statistics Toolbox):
Order = table({'Order 1';'Order 2';'Order 3'},...
{'Trident';'Hershey';'Kitkat'},...
[2; 3; 6],...
[735887; 735875; 735890],...
'VariableNames',{'Name','Item','Quantity','DueDate'})
Order =
Name Item Quantity DueDate
_________ _________ ________ _______
'Order 1' 'Trident' 2 735887
'Order 2' 'Hershey' 3 735875
'Order 3' 'Kitkat' 6 735890
You can access it as you would a structure, but with some advantages: accessing and inspecting the data is easier, the memory footprint is smaller, and so on.
What you are trying to build 'manually' is a structure array (and let me stress the array here):
% A structure array
s = struct('Name', {'Order 1';'Order 2';'Order 3'},...
'Item', {'Trident';'Hershey';'Kitkat'},...
'Quantity', {2; 3; 6},...
'DueDate', {735887; 735875; 735890});
s =
3x1 struct array with fields:
Name
Item
Quantity
DueDate
Each scalar structure (unit, record, object, member; call it what you like) of the array will have a set of properties:
s(1)
ans =
Name: 'Order 1'
Item: 'Trident'
Quantity: 2
DueDate: 735887
The organization of the data looks intuitive. However, if you want to apply operations across the whole array, e.g. select those which have Quantity > 2, you need to first concatenate the whole field into a temporary array and only then apply your operation, and in the worst case scenario (if you nest the fields) you will have to loop.
I personally prefer a database/dataset/table approach, where each record is a row and the columns are the properties. You can do this by flattening the structure array into a scalar structure (pay attention to the braces):
% A flat structure
s = struct('Name', {{'Order 1';'Order 2';'Order 3'}},...
'Item', {{'Trident';'Hershey';'Kitkat'}},...
'Quantity', [2; 3; 6],...
'DueDate', [735887; 735875; 735890]);
s =
Name: {3x1 cell}
Item: {3x1 cell}
Quantity: [3x1 double]
DueDate: [3x1 double]
Even though the data organization doesn't appear as intuitive as before, you will be able to index directly into the structure (and will have a lower memory footprint).
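To make the difference between the two layouts concrete, here is a rough sketch in Python rather than MATLAB. It reuses the question's data, including both items of 'Order 1', with one row per order line; the variable names (`records`, `columns`, `mask`) are illustrative only. The point is that several rows can share the same Name, and that a filter such as Quantity > 2 becomes a mask over one column.

```python
# Layout 1: one record per order line (comparable to a structure array).
records = [
    {"Name": "Order 1", "Item": "Rolo",    "Quantity": 1, "DueDate": 735879},
    {"Name": "Order 1", "Item": "Trident", "Quantity": 2, "DueDate": 735887},
    {"Name": "Order 2", "Item": "Hershey", "Quantity": 3, "DueDate": 735875},
    {"Name": "Order 3", "Item": "Kitkat",  "Quantity": 6, "DueDate": 735890},
]

# Layout 2: one column per field (comparable to the flat structure or a table).
columns = {field: [r[field] for r in records] for field in records[0]}

# Select rows with Quantity > 2 by building a mask over a single column...
mask = [q > 2 for q in columns["Quantity"]]
# ...and applying it to whichever columns are needed.
big_items = [item for item, keep in zip(columns["Item"], mask) if keep]

# All items of one order: several rows simply share the same Name.
order1_items = [
    item for name, item in zip(columns["Name"], columns["Item"])
    if name == "Order 1"
]

print(big_items)
print(order1_items)
```

In MATLAB the same idea would be logical indexing on the Quantity column of the table or flat structure.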
Let's say I have a dimension and a fact table:
SELECT
LAST_DAY(TO_Date((f.Date),'MON-YYYY')) "MonthYear",
f.Region,
d.Formula as "Molecule",
d.product_name,
d.supplier,
SUM(f.Sales) as "Sales",
SUM(f.ext_units) as "ext_Units",
SUM(f.units) "Units",
SUM(f.ext_units) / SUM(f.units) as "Product_units",
CASE
WHEN d.Formula = 'ABC:CBA' THEN 'ABC'
WHEN d.Formula = 'DEF-FED' THEN 'DEF'
WHEN d.Formula = 'xyz;zyx' THEN 'xyz'
ELSE d.Formula
END AS "Molecule"
FROM Fact.Sales f
INNER JOIN DM_Product d on f.P-Code = d.P-Code
WHERE
f.Region = 'Mars'
AND d.Supplier = 'Simpsons'
AND d.Formula in ('ABC','DEF','GHI','JKL','BDHJK',
'FGL','MNP','RSTU', 'KCL', 'xyz', 'UWX',
'xyz;zyx', 'DEF-FED', 'ABC:CBA')
GROUP BY f.Date,
f.Region,
d.Formula,
d.product_name,
d.supplier
So, basically, what I intend to do is get all these molecules from the product dimension table and find their relevant sales. The puzzle is that I have two different conditions:
Look at all the two-part formulas (such as 'ABC:CBA') and aggregate them with the equivalent one-part formula (such as 'ABC') to find the aggregate sales.
Find "Product_units" using SUM(f.ext_units) / SUM(f.units), as in my code.
Note that f.units does contain 0 values.
For the first condition I used CASE WHEN, and it works perfectly to find the two-part formulas and map them to the relevant one-part formula. For the second condition, I need to know how to guard against division by zero, that is: if SUM(f.units) <> 0 THEN SUM(f.ext_units) / SUM(f.units) ELSE 0?
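One way to express the zero guard is to wrap the division itself in a CASE expression. The sketch below is only an illustration under assumptions: it runs the SQL through Python's built-in sqlite3 module so it can be tried anywhere, and the table `sales` and its sample rows are invented, not the real fact and dimension tables. It also groups by the mapped molecule rather than the raw formula, since grouping by the raw formula would keep 'ABC' and 'ABC:CBA' in separate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the joined fact/dimension data.
conn.execute("CREATE TABLE sales (formula TEXT, ext_units REAL, units REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("ABC", 10, 4),
        ("ABC:CBA", 6, 4),   # folded into 'ABC' by the inner CASE
        ("DEF", 5, 0),
        ("DEF-FED", 7, 0),   # folded into 'DEF'; its total units stay 0
    ],
)

query = """
SELECT molecule,
       CASE WHEN SUM(units) <> 0
            THEN SUM(ext_units) / SUM(units)
            ELSE 0
       END AS product_units
FROM (
    SELECT CASE formula
               WHEN 'ABC:CBA' THEN 'ABC'
               WHEN 'DEF-FED' THEN 'DEF'
               ELSE formula
           END AS molecule,
           ext_units,
           units
    FROM sales
)
GROUP BY molecule
ORDER BY molecule
"""
rows = conn.execute(query).fetchall()
print(rows)  # 'ABC' gets 16 / 8; 'DEF' falls through to the ELSE branch
```

The same CASE WHEN ... THEN ... ELSE ... END shape should carry over to the original query, with SUM(f.units) and SUM(f.ext_units) in place of the simplified columns.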
I am trying to import a CSV into Neo4j that contains relationships between people, organizations, banks, assets, etc., with only one relationship per row. The column names are FROM, A.type, TO, B.type, followed by various properties. The FROM and TO columns hold the names, and A.type and B.type say whether each belongs to a person, organization, etc., respectively.
I managed to create the nodes (around 3500) depending on type with FOREACH, like so:
FOREACH (_ IN CASE WHEN line.`A.type` = 'ASSET' THEN [1] ELSE [] END | MERGE (asset:Asset {Name:line.FROM}))
FOREACH (_ IN CASE WHEN line.`A.type` = 'BANK' THEN [1] ELSE [] END | MERGE (bank:Bank {Name: line.FROM}))
.
.
.
FOREACH (_ IN CASE WHEN line.`B.type` = 'ACTIVO' THEN [1] ELSE [] END | MERGE (asset:Asset {Name:line.TO}))
FOREACH (_ IN CASE WHEN line.`B.type` = 'BANCO' THEN [1] ELSE [] END | MERGE (bank:Bank {Name: line.TO}))
.
.
.
My problem now is creating the relationships per row. I've tried many different ways and nothing seems to work.
For Example:
In this case, I changed the FOREACH to two different nodes depending on if they are on the FROM or TO column:
WITH 'link' as line
LOAD CSV WITH HEADERS FROM url AS line
WITH line WHERE line.FROM = 'ASSET' AND line.TO = 'ORGANIZATION'
MERGE (a1:Asset {Name:line.FROM})
MERGE (o2:Organization {Name:line.TO})
CREATE (a1)-[con:PROPERTY_OF]->(o2)
I also tried a variation of the code for creating nodes:
FOREACH(n IN (CASE WHEN line.`A.type` = 'ASSET' THEN [1] ELSE [] END) | FOREACH(t IN CASE WHEN line.`B.type` = 'ORGANIZATION' THEN [1] ELSE [] END | MERGE (asset)-[ao:CONNECTED_WITH]->(organization)))
This time I used the APOC library to try generating dynamic relationships depending on the relationship type:
WITH asset, organization, line
CALL apoc.create.relationship(asset, line.RelationshipType, NULL, organization) YIELD rel
RETURN asset, rel, organization
I also tried different variations of each, creating the nodes from scratch or matching them. Every time, the query seems to work and runs, but it either creates no relationships or creates a single relationship between new nodes that don't exist in the CSV, with no name or label.
I am completely new to Cypher/Neo4j and am at my wits' end. If someone could point out my mistakes and how to correct them, it would be HIGHLY appreciated.
Thank you in advance!
You should be using A.type and B.type when appropriate.
Since the nodes already exist, you should replace the 2 existing MERGE clauses with MATCH clauses.
If you want to ensure you don't create duplicate relationships, you should use MERGE instead of CREATE for the relationship.
You can use the APOC procedure apoc.do.case to perform conditional write operations.
For example:
LOAD CSV WITH HEADERS FROM 'file:///' AS line
CALL apoc.do.case([
line.`A.type` = 'ASSET' AND line.`B.type` = 'ORGANIZATION',
'MATCH (a:Asset {Name: line.FROM}), (b:Organization {Name: line.TO}) MERGE (a)-[:FOO]->(b) RETURN a, b',
line.`A.type` = 'ASSET' AND line.`B.type` = 'BANCO',
'MATCH (a:Asset {Name: line.FROM}), (b:Bank {Name: line.TO}) MERGE (a)-[:FOO]->(b) RETURN a, b',
line.`A.type` = 'BANK' AND line.`B.type` = 'ACTIVO',
'MATCH (a:Bank {Name: line.FROM}), (b:Asset {Name: line.TO}) MERGE (a)-[:FOO]->(b) RETURN a, b'
],
'', // empty ELSE case
{line: line}
) YIELD value
RETURN value
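If the number of type combinations grows, the per-combination cases could also be generated instead of written by hand. The following is only a sketch under assumptions: the `LABELS` mapping is read off the FOREACH clauses in the question, the helper `relationship_query` and the sample row are hypothetical, and the relationship type FOO is the placeholder used above. It builds a query string and a parameter map for one CSV row; it does not talk to Neo4j.

```python
# Assumed mapping from CSV type values to node labels, taken from the
# FOREACH clauses in the question.
LABELS = {
    "ASSET": "Asset", "ACTIVO": "Asset",
    "BANK": "Bank", "BANCO": "Bank",
    "ORGANIZATION": "Organization",
}

def relationship_query(row, rel_type="FOO"):
    """Return (cypher, params) for one CSV row, or None if a type is unknown."""
    a_label = LABELS.get(row["A.type"])
    b_label = LABELS.get(row["B.type"])
    if a_label is None or b_label is None:
        return None
    # Labels are interpolated only from the fixed whitelist above;
    # the node names are passed as query parameters.
    cypher = (
        f"MATCH (a:{a_label} {{Name: $from_name}}), "
        f"(b:{b_label} {{Name: $to_name}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return cypher, {"from_name": row["FROM"], "to_name": row["TO"]}

# Made-up sample row, for illustration only.
row = {"FROM": "Warehouse", "A.type": "ASSET",
       "TO": "Acme", "B.type": "ORGANIZATION"}
result = relationship_query(row)
print(result)
```

Rows whose types are not in the mapping return None, which makes unexpected values in the CSV easy to spot before anything is written.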
Please pardon the level of detail. I'm not completely sure how to phrase this question.
I am new to Scala and still learning the intricacies of the language. I have a project where all the data I need is contained in a table with a layout like this:
CREATE TABLE demo_data ( table_key varchar(10), description varchar(40), data_key varchar(10), data_value varchar(10) );
Where the table_key column contains the main key I'm searching on, and the description repeats for every row with that table_key. In addition there are descriptive keys and values contained in the data_key and data_value pairs.
I need to consolidate a set of these data_keys into my resulting class so that the class will end up like this:
case class Tab ( tableKey: String, description: String, valA: String, valB: String, valC: String )
object Tab {
val simple = {
get[String]("table_key") ~
get[String]("description") ~
get[String]("val_a") ~
get[String]("val_b") ~
get[String]("val_c") map {
case tableKey ~ description ~ valA ~ valB ~ valC => Tab(tableKey, description, valA, valB, valC)
}
}
def list(tabKey: String) : List[Tab] = {
DB.withConnection { implicit connection =>
// The parsed list is the value of the withConnection block, and so of the method
SQL(
"""
SELECT DISTINCT p.table_key, p.description,
a.data_value val_a,
b.data_value val_b,
c.data_value val_c
FROM demo_data p
JOIN demo_data a on p.table_key = a.table_key and a.data_key = 'A'
JOIN demo_data b on p.table_key = b.table_key and b.data_key = 'B'
JOIN demo_data c on p.table_key = c.table_key and c.data_key = 'C'
WHERE p.table_key = {tabKey}
"""
).on('tabKey -> tabKey).as(Tab.simple *)
}
}
}
This returns what I want. However, I have more than 30 data keys that I wish to retrieve in this manner, and the self-joins rapidly become unmanageable: the query ran for 1.5 hours and used up 20GB of temporary tablespace before running out of disk space.
So instead I am doing a separate class that retrieves a list of data keys and data values for a given table key using the "where data_key in ('A','B','C',...)", and now I'd like to "flatten" the returned list into a resulting object that will have the valA, valB, valC, ... in it. I still want to return a list of the flattened objects to the calling routine.
Let me try to idealize what I'd like to accomplish..
Take a header result set and a detail result set, pull the keys out of the detail result set to populate additional properties in the header result set, and produce a list of objects containing all the elements of the header result set plus the selected properties from the detail result set. So I get a list of TabHeader(tabKey, Desc), and for each one I retrieve a list of interesting TabDetail(DataKey, DataValue). I then pick out the element where DataKey == 'A' and put its DataValue into Tab(valA), and do the same for DataKey == 'B', 'C', ... When I'm done, I want to produce a Tab(tabKey, Desc, valA, valB, valC, ...) in place of the corresponding TabHeader. I could quite possibly muddle through this in Java, but I'm treating this as a learning opportunity and would like to know a good way to do it in Scala.
I have a feeling that something in Scala's collection mapping should do what I need, but I haven't been able to track down exactly what.
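The flattening described above is essentially a pivot: group the key/value rows by table_key, then copy each interesting data_value into a named field. Here is a minimal sketch of that idea, written in Python rather than Scala purely to show the shape of the logic. The `Tab` fields mirror the case class in the question, while `FIELD_FOR_KEY`, `flatten`, and the sample rows are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tab:
    table_key: str
    description: str
    val_a: Optional[str] = None
    val_b: Optional[str] = None
    val_c: Optional[str] = None

# Which data_key feeds which field (to be extended for the other keys).
FIELD_FOR_KEY = {"A": "val_a", "B": "val_b", "C": "val_c"}

def flatten(rows):
    """Pivot (table_key, description, data_key, data_value) rows into Tabs."""
    tabs = {}
    for table_key, description, data_key, data_value in rows:
        # Create the Tab the first time a table_key is seen, reuse it afterwards.
        tab = tabs.setdefault(table_key, Tab(table_key, description))
        field = FIELD_FOR_KEY.get(data_key)
        if field is not None:          # ignore keys we are not interested in
            setattr(tab, field, data_value)
    return list(tabs.values())

# Made-up rows, as a single query with "data_key in ('A','B','C')" might return.
rows = [
    ("K1", "first", "A", "1"),
    ("K1", "first", "B", "2"),
    ("K1", "first", "C", "3"),
    ("K2", "second", "A", "9"),
]
tabs = flatten(rows)
print(tabs)
```

In Scala, a comparable approach would likely start from groupBy on the table key and build a Map from data_key to data_value for each group before constructing the Tab.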
I'm calling the database based on the criteria passed from the view, so it has to be dynamic.
Let's say I have 2 arrays:
columns = ['col1', 'col2', 'col3']
vals = ['val1', 'val2', 'val3']
The query is easy to create, I can do concatenation, like
query = columns[0] + " = (?) AND" + ...
But what about the parameters?
@finalValues = MyTable.find(:all, :conditions => [query, vals[0], vals[1]... ])
But I don't know how many parameters I will receive. So while the query problem is solved with a for loop and concatenation, can I do something like:
@finalValues = MyTable.find(:all, :conditions => [query, vals])
And will Rails understand that I'm not passing an array for an IN (?) clause, but want the values split across the individual (?) placeholders?
Or is my only option to do a full raw string and just go with it?
You can create the conditions array with the query as the first element and append every element of vals to it:
query = columns.map {|col| "#{col} = ?"}.join(" AND ")
@finalValues = MyTable.find(:all, :conditions => [query, *vals])
A point of caution: columns and vals should have the same number of elements.
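The same pattern, one placeholder per column with the values passed separately, can be tried outside Rails. The sketch below uses Python's built-in sqlite3 module as a stand-in; the table `my_table`, the `ALLOWED_COLUMNS` whitelist, and the helper `build_conditions` are hypothetical names for illustration. Since placeholders only cover values, the column names are checked against a fixed list before being put into the query string.

```python
import sqlite3

ALLOWED_COLUMNS = {"col1", "col2", "col3"}  # hypothetical whitelist

def build_conditions(columns, vals):
    """Return ('col1 = ? AND col2 = ?', [values]) for equal-length inputs."""
    if len(columns) != len(vals):
        raise ValueError("columns and vals must have the same length")
    unknown = set(columns) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unexpected column names: {sorted(unknown)}")
    where = " AND ".join(f"{col} = ?" for col in columns)
    return where, list(vals)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [("val1", "val2", "val3"), ("val1", "x", "val3"), ("other", "val2", "val3")],
)

where, params = build_conditions(["col1", "col3"], ["val1", "val3"])
matches = conn.execute(
    f"SELECT * FROM my_table WHERE {where} ORDER BY col2", params
).fetchall()
print(where)
print(matches)
```

Checking the lengths up front turns the caution above into an explicit error instead of a confusing database failure.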
I can't find any good documentation about dataset(), so that's why I want to ask you guys, I'll keep the question short:
Can I set headers (column titles) in a dataset, without entering data into the dataset yet? I guess not, so the 2nd part of the question would be:
Can I make a one-row dataset, in which I name the headers, with empty data, and overwrite it later?
Let me show you what I was trying, but did not work:
dmsdb = dataset({ 'John','Name'},{'Amsterdam','City'},{10,'number' });
produces:
Name City number
John Amsterdam 10 --> Headers are good!
Problem is, that when I am going to add more data to the dataset, it expects all strings to be of the same length. So I use cellstr():
dmsdb(1,1:3) = dataset({ cellstr('John'),'Name'},{cellstr('Amsterdam'),'City'},{10,'number' });
Produces:
Var1 Var2 Var3
'John' 'Amsterdam' 10
Where did my headers go? How do I solve this issue, and what is causing this?
You can set up an empty dataset with either
data = dataset({[], 'Name'}, {[], 'City'}, {[], 'number'});
or
data = dataset([], [], [], 'VarNames', {'Name', 'City', 'number'});
Both will give you:
>> data
data =
[empty 0-by-3 dataset]
But we can see that the column names are set by checking
>> get(data, 'VarNames')
ans =
'Name' 'City' 'number'
Now we can add rows to the dataset:
>> data = [data; dataset({'John'}, {'Amsterdam'}, 10, 'VarNames', get(data, 'VarNames'))]
data =
Name City number
'John' 'Amsterdam' 10
You had the basic idea, but just needed to put your string data in cells. This replacement for your first line works:
>> dmsdb = dataset({ {'John'},'Name'},{{'Amsterdam'},'City'},{10,'number' });
dmsdb =
Name City number
'John' 'Amsterdam' 10
The built-in help for dataset() is actually really good at laying out the details of these and other ways of constructing datasets. Also check out the online documentation with examples at:
http://www.mathworks.com/help/toolbox/stats/dataset.html
One of the Mathworks blogs has a nice post too:
http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/
Good luck!
Here is an example:
%# create dataset with no rows
ds = dataset(cell(0,1),cell(0,1),zeros(0,1));
ds.Properties.VarNames = {'Name', 'City', 'number'};
%# adding one row at a time
for i=1:3
row = {{'John'}, {'Amsterdam'}, 10}; %# construct new row each iteration
ds(i,:) = dataset(row{:});
end
%# adding a batch of rows all at once
rows = {{'Bob';'Alice'}, {'Paris';'Boston'}, [20;30]};
ds(4:5,:) = dataset(rows{:});
The dataset at the end looks like:
>> ds
ds =
Name City number
'John' 'Amsterdam' 10
'John' 'Amsterdam' 10
'John' 'Amsterdam' 10
'Bob' 'Paris' 20
'Alice' 'Boston' 30
Note: if you want to use concatenation instead of indexing, you have to specify the variable names:
vars = {'Name', 'City', 'number'};
ds = [ds ; dataset(rows{:}, 'VarNames',vars)]
I agree, the help for dataset is hard to understand, mainly because there are so many ways to create a dataset and most methods involve a lot of cell arrays. Here are my two favorite ways to do it:
% 1) Create the 3 variables of interest, then make the dataset.
% Make sure they are column vectors!
>> Name = {'John' 'Joe'}'; City = {'Amsterdam' 'NYC'}'; number = [10 1]';
>> dataset(Name, City, number)
ans =
Name City number
'John' 'Amsterdam' 10
'Joe' 'NYC' 1
% 2) More compact than doing 3 separate cell arrays
>> dataset({{'John' 'Amsterdam' 10} 'Name' 'City' 'number'})
ans =
Name City number
'John' 'Amsterdam' [10]