I can't find any good documentation for dataset(), so I'm asking here. I'll keep the question short:
Can I set headers (column titles) in a dataset, without entering data into the dataset yet? I guess not, so the 2nd part of the question would be:
Can I make a one-row dataset, in which I name the headers, with empty data, and overwrite it later?
Let me show you what I was trying, but did not work:
dmsdb = dataset({ 'John','Name'},{'Amsterdam','City'},{10,'number' });
produces:
Name City number
John Amsterdam 10 --> Headers are good!
The problem is that when I add more data to the dataset, it expects all strings to be the same length. So I use cellstr():
dmsdb(1,1:3) = dataset({ cellstr('John'),'Name'},{cellstr('Amsterdam'),'City'},{10,'number' });
Produces:
Var1 Var2 Var3
'John' 'Amsterdam' 10
Where did my headers go? How do I solve this issue, and what is causing this?
You can set up an empty dataset with either
data = dataset({[], 'Name'}, {[], 'City'}, {[], 'number'});
or
data = dataset([], [], [], 'VarNames', {'Name', 'City', 'number'});
Both will give you:
>> data
data =
[empty 0-by-3 dataset]
But we can see that the column names are set by checking
>> get(data, 'VarNames')
ans =
'Name' 'City' 'number'
Now we can add rows to the dataset:
>> data = [data; dataset({'John'}, {'Amsterdam'}, 10, 'VarNames', get(data, 'VarNames'))]
data =
Name City number
'John' 'Amsterdam' 10
You had the basic idea, but just needed to put your string data in cells. This replacement for your first line works:
>> dmsdb = dataset({ {'John'},'Name'},{{'Amsterdam'},'City'},{10,'number' });
dmsdb =
Name City number
'John' 'Amsterdam' 10
The built-in help for dataset() is actually really good at laying out the details of these and other ways of constructing datasets. Also check out the online documentation with examples at:
http://www.mathworks.com/help/toolbox/stats/dataset.html
One of the Mathworks blogs has a nice post too:
http://blogs.mathworks.com/loren/2009/05/20/from-struct-to-dataset/
Good luck!
Here is an example:
%# create dataset with no rows
ds = dataset(cell(0,1),cell(0,1),zeros(0,1));
ds.Properties.VarNames = {'Name', 'City', 'number'};
%# adding one row at a time
for i=1:3
row = {{'John'}, {'Amsterdam'}, 10}; %# construct new row each iteration
ds(i,:) = dataset(row{:});
end
%# adding a batch of rows all at once
rows = {{'Bob';'Alice'}, {'Paris';'Boston'}, [20;30]};
ds(4:5,:) = dataset(rows{:});
The dataset at the end looks like:
>> ds
ds =
Name City number
'John' 'Amsterdam' 10
'John' 'Amsterdam' 10
'John' 'Amsterdam' 10
'Bob' 'Paris' 20
'Alice' 'Boston' 30
Note: if you want to use concatenation instead of indexing, you have to specify the variable names:
vars = {'Name', 'City', 'number'};
ds = [ds ; dataset(rows{:}, 'VarNames',vars)]
I agree, the help for dataset is hard to understand, mainly because there are so many ways to create a dataset and most methods involve a lot of cell arrays. Here are my two favorite ways to do it:
% 1) Create the 3 variables of interest, then make the dataset.
% Make sure they are column vectors!
>> Name = {'John' 'Joe'}'; City = {'Amsterdam' 'NYC'}'; number = [10 1]';
>> dataset(Name, City, number)
ans =
Name City number
'John' 'Amsterdam' 10
'Joe' 'NYC' 1
% 2) More compact than doing 3 separate cell arrays
>> dataset({{'John' 'Amsterdam' 10} 'Name' 'City' 'number'})
ans =
Name City number
'John' 'Amsterdam' [10]
I have a column (text) in my Postgres DB (v.10) that holds JSON.
As far as I know, it has an array format.
Here is a fiddle example: Fiddle
If table1 = Persons and change_type = Create, then I only want to return the name and firstname concatenated as one field and drop the rest of the text.
Output should be like this:
id table1 did execution_date change_type attr context_data
1 Persons 1 2021-01-01 Create Name [["+","name","Leon Bill"]]
1 Persons 2 2021-01-01 Update Firt_name [["+","cur_nr","12345"],["+","art_cd","1"],["+","name","Leon"],["+","versand_art",null],["+","email",null],["+","firstname","Bill"],["+","code_cd",null]]
1 Users 3 2021-01-01 Create Street [["+","cur_nr","12345"],["+","art_cd","1"],["+","name","Leon"],["+","versand_art",null],["+","email",null],["+","firstname","Bill"],["+","code_cd",null]]
Disassemble the JSON array into a SETOF using the json_array_elements function, then assemble it back into the structure you want.
select m.*
, case
when m.table1 = 'Persons' and m.change_type = 'Create'
then (
select '[["+","name",' || to_json(string_agg(a.value->>2,' ' order by a.value->>1 desc))::text || ']]'
from json_array_elements(m.context_data::json) a
where a.value->>1 in ('name','firstname')
)
else m.context_data
end as context_data
from mutations m
modified fiddle
(Notes:
Relying on the alphabetical ordering of the required field names is a little bit dirty; an explicit order by case could improve readability.
The resulting JSON is assembled from string literals as much as possible, since you didn't specify whether "+" should be taken from any of the original array elements.
The to_json()::text is just for safety against injection.
)
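For readers who want to check the intended transformation outside the database, here is a minimal Python sketch of the same logic. It is not the SQL above, and it assumes every element of context_data is a three-item [op, key, value] array as in the question; the function name collapse_name is made up for illustration.

```python
import json

def collapse_name(context_data: str) -> str:
    """Reduce a context_data JSON array to one combined "name" entry."""
    entries = json.loads(context_data)
    # Index the [op, key, value] triples by key (assumption: keys are unique).
    values = {key: value for _op, key, value in entries}
    # Put name first, then firstname, skipping missing or null values.
    parts = [values.get("name"), values.get("firstname")]
    full_name = " ".join(p for p in parts if p)
    # Compact separators keep the output free of extra spaces.
    return json.dumps([["+", "name", full_name]], separators=(",", ":"))

sample = (
    '[["+","cur_nr","12345"],["+","art_cd","1"],["+","name","Leon"],'
    '["+","versand_art",null],["+","email",null],'
    '["+","firstname","Bill"],["+","code_cd",null]]'
)
print(collapse_name(sample))  # [["+","name","Leon Bill"]]
```

Unlike the SQL version, the order here is explicit instead of relying on the alphabetical order of the key names.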
Currently my code has simple tables containing the data needed for each object, like this:
infantry = {class = "army", type = "human", power = 2}
cavalry = {class = "panzer", type = "motorized", power = 12}
battleship = {class = "navy", type = "motorized", power = 256}
I use the table names as identifiers in various functions, which then access and process each table's values one by one.
Now I want to have this data stored in a spreadsheet (csv file) instead that looks something like this:
Name class type power
Infantry army human 2
Cavalry panzer motorized 12
Battleship navy motorized 256
The spreadsheet will not have more than 50 lines and I want to be able to increase columns in the future.
I tried a couple of approaches from similar questions I found here, but I failed to access any values from the nested table. I think this is because I don't fully understand how the table is structured after each line of the csv file has been read into it, so I fail to print any values at all.
If there is a way to get the name, class, type and power from the table and use that line just like my old simple tables, I would appreciate an educational example. Another approach could be to declare new tables from the csv, line by line, that behave exactly like my old simple tables. I don't know if this is doable.
Using Lua 5.1
You can read the csv file in as a string. I will use a multi-line string here to represent the csv.
gmatch with the pattern [^\n]+ will return each row of the csv.
gmatch with the pattern [^,]+ will return the value of each column from a given row.
If more rows or columns are added, or if the columns are moved around, we will still reliably convert the information as long as the first row holds the header information.
The only column that cannot move is the first one, the Name column. If it is moved, the key used to store the row in the table will change.
The code below uses gmatch with these two patterns to separate the string into rows and columns; see the comments for details:
local csv = [[
Name,class,type,power
Infantry,army,human,2
Cavalry,panzer,motorized,12
Battleship,navy,motorized,256
]]
local items = {} -- rows are stored here, keyed by the value in the first column (Name)
local headers = {} -- column names captured from the first line
local first = true
for line in csv:gmatch("[^\n]+") do
    if first then -- handle the first line and capture our headers
        local count = 1
        for header in line:gmatch("[^,]+") do
            headers[count] = header
            count = count + 1
        end
        first = false -- set first to false to switch off the header block
    else
        local name
        local i = 2 -- start at 2: the first column (Name) is the key, so i is not incremented for it
        for field in line:gmatch("[^,]+") do
            name = name or field -- the first field of the row is its name
            if items[name] then -- the row's table already exists, so this is a regular field
                items[name][headers[i]] = field -- store the value under its column header
                i = i + 1
            else -- first field of the row: create a new table for this name
                items[name] = {}
            end
        end
    end
end
Here is how you can load a csv using the I/O library:
-- Example of how to load the csv.
local path = "some\\path\\to\\file.csv"
local f = assert(io.open(path))
local csv = f:read("*all")
f:close()
Alternatively, you can use io.lines(path), which would take the place of csv:gmatch("[^\n]+") in the for loop.
Here is an example of using the resulting table:
-- print table out
print("items = {")
for name, item in pairs(items) do
    print(" " .. name .. " = { ")
    for field, value in pairs(item) do
        print(" " .. field .. " = ".. value .. ",")
    end
    print(" },")
end
print("}")
The output:
items = {
Infantry = {
type = human,
class = army,
power = 2,
},
Battleship = {
type = motorized,
class = navy,
power = 256,
},
Cavalry = {
type = motorized,
class = panzer,
power = 12,
},
}
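As a cross-check of the resulting layout, here is a rough Python sketch of the same header-driven parse. It is only an illustration, not part of the Lua solution: the name parse_units is invented, and converting all-digit fields to integers is an added assumption (the Lua code above keeps every value as a string).

```python
CSV_TEXT = """Name,class,type,power
Infantry,army,human,2
Cavalry,panzer,motorized,12
Battleship,navy,motorized,256
"""

def parse_units(text):
    """Return {name: {header: value}} using the first line as headers."""
    lines = [line for line in text.splitlines() if line.strip()]
    headers = lines[0].split(",")
    items = {}
    for line in lines[1:]:
        fields = line.split(",")
        name = fields[0]  # the first column is the key, as in the Lua version
        record = dict(zip(headers[1:], fields[1:]))
        # Assumption: all-digit fields are meant to be numbers.
        items[name] = {k: int(v) if v.isdigit() else v for k, v in record.items()}
    return items

items = parse_units(CSV_TEXT)
print(items["Infantry"]["power"])  # 2
print(items["Cavalry"]["class"])   # panzer
```

One difference worth knowing: split(",") keeps empty fields, while the Lua pattern [^,]+ skips them, so a row with a blank cell would shift columns in the Lua version but not here.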
I would like to build a table in "ascii" format, using the "struct2table" function. When printed, my structure has the following layout:
report.COUNTRY.SOURCE.SCENARIO.CATEGORY.ENTITY = YEAR YEAR
Therefore, in order to use the "struct2table" function, I wrote a script that loops over the field names as follows:
function comboloop = loopover(myStruct)
country=fieldnames(myStruct);
for countryidx=1:length(country)
    countryname=country{countryidx};
    source=fieldnames(myStruct.(countryname))
    for sourceidx=1:length(source)
        sourcename=source{sourceidx};
        scenario=fieldnames(myStruct.(countryname).(sourcename))
        for scenarioidx=1:length(scenario)
            scenarioname=scenario{scenarioidx};
            category=fieldnames(myStruct.(countryname).(sourcename).(scenarioname))
            for categoryidx=1:length(category)
                categoryname=category{categoryidx};
                struct2table(myStruct.(countryname).(sourcename).(scenarioname).(categoryname))
            end
        end
    end
end
end
Then, I have two problems with the output:
The output is printed directly on the command window, therefore how can I get an output as a table (ascii) in the workspace or outside matlab?
When I use table=struct2table(myStruct.(countryname).(sourcename).(scenarioname).(categoryname)) I get the error message "too many output arguments".
How can I append the results of the line category=fieldnames(myStruct.(countryname).(sourcename).(scenarioname))as a row header of the created output?
Any kind of help or hints is greatly appreciated, since I am struggling quite a lot with this task!
As an example:
country = HUN
countryname = HUN
source = CRF2014
sourcename = CRF2014
scenario = BASEYEAR
scenarioname = BASEYEAR
category = CAT0 CAT1 CAT2 CAT3
categoryname = CAT0 CAT1 CAT2 CAT3
when running the struct2table, for each categoryname I have several gas entities as a column header (CO2, CH4, N2,...) and below I have the attached years (1990, 1991, 1992,...)
Please let me know if there is a query that removes the repeated entries within a row.
For example, I have a table which has a name with 9 telephone numbers:
Name Tel0 Tel1 Tel2 Tel3 Tel4 Tel5 Tel6 Tel7 Tel8
John 1 2 2 2 3 3 4 5 1
The final result should be as shown below:
Name Tel0 Tel1 Tel2 Tel3 Tel4 Tel5 Tel6 Tel7 Tel8
John 1 2 3 4 5
regards
Maddy
I fear that it will be more complicated to keep this format than to split the table in two as I suggested. If you insist on keeping the current schema then I would suggest that you query the row, organise the fields in application code and then perform an update on the database.
You could also try the SQL UNION operator to get a list of the numbers. By default, UNION removes all duplicate rows (the derived table is given an alias, t, because several databases require one):
SELECT Name, Tel FROM
(SELECT Name, Tel0 AS Tel FROM Person UNION
SELECT Name, Tel1 FROM Person UNION
SELECT Name, Tel2 FROM Person) AS t ORDER BY Name;
Which should give you a result set like this:
John|1
John|2
You will then have to step through the result set and save each number into a separate variable (skipping those variables that do not exist) until the "Name" field changes.
Tel1 := Null; Tel2 := Null;
Name := ResultSet['Name'];
Tel0 := ResultSet['Tel'];
ResultSet.Next();
if (Name == ResultSet['Name']) {
Tel1 := ResultSet['Tel'];
} else {
UPDATE here.
StartAgain;
}
ResultSet.Next();
if (Name == ResultSet['Name']) {
Tel2 := ResultSet['Tel'];
} else {
UPDATE here.
StartAgain;
}
I am not recommending you do this. It is a very bad use of a relational database, but once implemented in a real language and debugged, it should work.
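The "organise the fields in application code" step could look roughly like the Python sketch below. It assumes the nine Tel values of one row have already been fetched into a list; the helper name dedupe_row is hypothetical, and writing the result back with an UPDATE is left out.

```python
def dedupe_row(values):
    """Keep the first occurrence of each value, then pad with None
    so the row still has one slot per original column."""
    seen = set()
    unique = []
    for value in values:
        if value is not None and value not in seen:
            seen.add(value)
            unique.append(value)
    return unique + [None] * (len(values) - len(unique))

# Tel0..Tel8 for John, taken from the example in the question
tels = [1, 2, 2, 2, 3, 3, 4, 5, 1]
print(dedupe_row(tels))  # [1, 2, 3, 4, 5, None, None, None, None]
```

The padded list maps back onto Tel0 through Tel8 in order, with None standing in for NULL.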
So far I have this,
Order = struct('Name',{},'Item',{},'Quantity',{},'DueDate',{});
Order(1).Name = 'Order 1'; Order(1).Item = 'Rolo'; Order(1).Quantity = '1'; Order(1).DueDate = '735879';
Order(1).Name = 'Order 1'; Order(1).Item = 'Trident'; Order(1).Quantity = '2'; Order(1).DueDate = '735887';
Order(2).Name = 'Order 2'; Order(2).Item = 'Hershey';Order(2).Quantity = '3'; Order(2).DueDate = '735875';
Order(3).Name = 'Order 3'; Order(3).Item = 'Kitkat'; Order(3).Quantity = '6'; Order(3).DueDate = '735890';
Within each order, there are multiple items and quantities of items, so I would like each struct array for each order to be able to hold multiple items, quantities, and due dates of orders.
Thank you!
The best option is to use table() (or dataset() if your MATLAB version predates R2013b, the release that introduced table, but you have the Statistics Toolbox):
Order = table({'Order 1';'Order 2';'Order 3'},...
{'Trident';'Hershey';'Kitkat'},...
[2; 3; 6],...
[735887; 735875; 735890],...
'VariableNames',{'Name','Item','Quantity','DueDate'})
Order =
Name Item Quantity DueDate
_________ _________ ________ _______
'Order 1' 'Trident' 2 735887
'Order 2' 'Hershey' 3 735875
'Order 3' 'Kitkat' 6 735890
You can access it as you would a structure, but with more advantages, e.g. accessing and inspecting data is easier, the memory footprint is smaller, etc.
What you are trying to build 'manually' is a structure array (and let me stress the array here):
% A structure array
s = struct('Name', {'Order 1';'Order 2';'Order 3'},...
'Item', {'Trident';'Hershey';'Kitkat'},...
'Quantity', {2; 3; 6},...
'DueDate', {735887; 735875; 735890});
s =
3x1 struct array with fields:
Name
Item
Quantity
DueDate
Each scalar structure (/unit/record/object/member call it how you like) of the array will have a set of properties:
s(1)
ans =
Name: 'Order 1'
Item: 'Trident'
Quantity: 2
DueDate: 735887
The organization of the data looks intuitive. However, if you want to apply operations across the whole array, e.g. select those which have Quantity > 2, you need to first concatenate the whole field into a temporary array and only then apply your operation, and in the worst case scenario (if you nest the fields) you will have to loop.
I personally prefer a database/dataset/table approach where each record is a row and the columns are the properties. You can do this by flattening the structure array into a scalar structure (pay attention to the braces):
% A flat structure
s = struct('Name', {{'Order 1';'Order 2';'Order 3'}},...
'Item', {{'Trident';'Hershey';'Kitkat'}},...
'Quantity', [2; 3; 6],...
'DueDate', [735887; 735875; 735890]);
s =
Name: {3x1 cell}
Item: {3x1 cell}
Quantity: [3x1 double]
DueDate: [3x1 double]
Even though the data organization doesn't appear as intuitive as before, you will be able to index directly into the structure (and it will have a lower memory footprint).
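The difference between the two layouts is not specific to MATLAB. As a loose analogy only, the Python sketch below contrasts a list of records (like the structure array) with a dictionary of columns (like the flat structure) when selecting orders with Quantity > 2. The variable names are illustrative and nothing here is MATLAB API.

```python
# Record-oriented layout: one dict per order (analogous to the structure array)
records = [
    {"Name": "Order 1", "Item": "Trident", "Quantity": 2, "DueDate": 735887},
    {"Name": "Order 2", "Item": "Hershey", "Quantity": 3, "DueDate": 735875},
    {"Name": "Order 3", "Item": "Kitkat", "Quantity": 6, "DueDate": 735890},
]
# The field has to be gathered from every record before it can be filtered.
quantities = [r["Quantity"] for r in records]
selected_from_records = [r["Name"] for r, q in zip(records, quantities) if q > 2]

# Column-oriented layout: one list per field (analogous to the flat structure)
columns = {
    "Name": ["Order 1", "Order 2", "Order 3"],
    "Item": ["Trident", "Hershey", "Kitkat"],
    "Quantity": [2, 3, 6],
    "DueDate": [735887, 735875, 735890],
}
# The whole column is already available, so the mask is built from it directly.
mask = [q > 2 for q in columns["Quantity"]]
selected_from_columns = [n for n, keep in zip(columns["Name"], mask) if keep]

print(selected_from_records)  # ['Order 2', 'Order 3']
print(selected_from_columns)  # ['Order 2', 'Order 3']
```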