I am searching for a simple data storage solution that can be written to and read from local bash scripts.
Background: I'm collecting sensor data and saving the values with timestamps (currently in a text file; a new file is created every week).
I'd like to visualize the data on request with the help of PHP.
Is there a database (like SQLite) that can easily be written to from bash?
You could try rrdtool. It's a round-robin time-series database, well suited to visualizations.
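A minimal sketch of the rrdtool workflow from bash; the data-source name, step, and retention here are assumptions, not taken from your setup:
# Create a round-robin database: one GAUGE value sampled every 300 s,
# keeping 2016 averaged samples (one week at 5-minute resolution).
rrdtool create sensors.rrd --step 300 \
    DS:temp:GAUGE:600:U:U \
    RRA:AVERAGE:0.5:1:2016
# Append a reading ("N" means "now").
rrdtool update sensors.rrd N:21.5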
sqlite3 can take a query as an argument (see "Using sqlite3 in a shell script"):
DB='example'
VAR='sensordata'
QUERY="INSERT INTO table(column) VALUES ('${VAR}')"
sqlite3 "$DB" "$QUERY"
What you won't get is string escaping, so you need to be sure that $VAR is safe from SQL injection.
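One defensive sketch is to double any single quotes yourself before interpolating, since doubling is SQLite's own escape rule for string literals (the variable names here are illustrative):
RAW="sensor's data"
ESCAPED=${RAW//\'/\'\'}   # double each single quote, per SQL rules
sqlite3 "$DB" "INSERT INTO mytable(mycolumn) VALUES ('${ESCAPED}')"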
I like the idea of SQLite, but I'm more comfortable with PostgreSQL, MySQL, even MS Access or Oracle.
I've got something written by someone else which generates SQLite databases that include a date/time field, and I want to get those into a format that Gnuplot can understand. Both Sqliteman and SQLite Browser show the field as an integer, and it looks like a Unix time_t when I query it, except that it's 3 digits longer, like 1444136564028.
It doesn't have to be done by piping sqlite3 into Gnuplot, and it doesn't have to use the unixepoch/%s time format. I just can't find any examples of converting SQLite time fields in a query. The example "SELECT strftime('%s','now')" works, but when I replace 'now' with a field in a real query it doesn't. All the examples I can find use immediate/literal values, not fields from queries.
And can SQLite use a tablename.fieldname format, or does it have to be "SELECT fieldname FROM tablename"?
Unix timestamps are in seconds; your values are in milliseconds (a common Java quirk), so divide by 1000:
> WITH MyLittleTable(d) AS (VALUES(1444136564028))
SELECT datetime(d / 1000, 'unixepoch') FROM MyLittleTable;
2015-10-06 13:02:44
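The same conversion works on a real column, and SQLite does accept the tablename.fieldname form; a sketch with hypothetical table and column names:
SELECT datetime(readings.ts / 1000, 'unixepoch'), readings.value
FROM readings;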
I cannot install SQLite on a remote machine, so I have to find a way to store a large amount of data in some kind of database structure.
Example data
key,values...
key,values....
..
There are currently about a million rows in a 20 MB flat file, and every hour I have to read through each record and value in the file and update or add records. Since it is a flat file, I have to rewrite the whole file each time.
I am looking at the Storable module, but I think it also writes data sequentially. I want to edit only those records which need to be changed.
Reading and updating random records is a requirement; additions can go anywhere (order is not important).
Can anyone suggest something? And how will I know whether I can set up a native Berkeley DB file on these systems, which are a mixture of Solaris and Linux?
________________finally__________________
Finally I understood things better (thank you all), and based on your suggestions I used AnyDBM_File. It found NDBM_File (a C library) installed on all the systems. So far so good.
Just to check how it would play out in the real world, I ran a sample script to add 1 million records (the maximum I think I may ever get in a day; normally it's between 500k and 700k). OMG, it created a 110 GB data file on my disk, and all the records looked like:
a628234 = 0.178532683639599
My real-world records are longer than that. Compare this to a flat file holding 700k+ real-life records in only 15 MB on disk.
I am disappointed with the slowness and bloat of this, so for now I think I will pay the price of rewriting the whole file each time an edit is required.
Thanks again for all your help.
As mentioned in the comments, you can use the SDBM_File module. For example:
#!/usr/bin/perl
use strict;
use warnings;
use v5.14;
use Fcntl;
use SDBM_File;
my $filename = "dbdb";
my %h;
tie %h, 'SDBM_File', $filename, O_RDWR|O_CREAT, 0666
    or die "Error: $!\n";
# Run the next line only once, to fill the dbdb file.
# On later runs you can delete it and the output
# will still be the same: "16,40".
$h{$_} = $_ * 2 . "," . $_ * 5 for 1..100;
say $h{8};
untie %h;
Output: 16,40
It depends on what your program logic needs, but one solution is to partition the database based on keys, so you deal with many smaller files instead of one big file, as in the sketch below.
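A minimal sketch of that idea, reusing SDBM_File with a made-up bucket function (none of these names come from the original post):
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
# Map a key to one of 16 bucket files so each file stays small.
sub bucket_for {
    my ($key) = @_;
    return "bucket_" . (unpack("%32C*", $key) % 16);   # cheap checksum
}
my ($key, $value) = ("a628234", "0.178532683639599");
my %h;
tie %h, 'SDBM_File', bucket_for($key), O_RDWR|O_CREAT, 0666
    or die "Error: $!\n";
$h{$key} = $value;   # only this one bucket file is touched
untie %h;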
I'm trying to find an effective way of saving the result of my Spark job as a CSV file. I'm using Spark with Hadoop, and so far all my files are saved as part-00000.
Any ideas how to make Spark save to a file with a specified file name?
Since Spark uses the Hadoop FileSystem API to write data to files, this is sort of inevitable. If you do
rdd.saveAsTextFile("foo")
it will be saved as "foo/part-XXXXX", with one part-* file for every partition in the RDD you are trying to save. Each partition is written to a separate file for fault tolerance: if the task writing the 3rd partition (i.e. to part-00002) fails, Spark simply re-runs the task and overwrites the partially written/corrupted part-00002, with no effect on the other parts. If they all wrote to the same file, it would be much harder to recover from a single task's failure.
The part-XXXXX files are usually not a problem if you are going to consume the output again in Spark or Hadoop-based frameworks: since they all use the HDFS API, if you ask them to read "foo", they will read all the part-XXXXX files inside foo as well.
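For example, a one-line sketch (assuming a SparkContext named sc):
val all = sc.textFile("foo")   // transparently reads every part-XXXXX file under foo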
I suggest doing it this way (Java example):
// Collapse to a single partition, then write; this still produces a directory.
theRddToPrint.coalesce(1, true).saveAsTextFile(textFileName);
// Merge the part file(s) into one file with the name you actually want.
FileSystem fs = anyUtilClass.getHadoopFileSystem(rootFolder); // your own helper
FileUtil.copyMerge(
    fs, new Path(textFileName),        // source directory
    fs, new Path(textFileNameDestiny), // single destination file
    true,                              // delete the source afterwards
    fs.getConf(), null);
Extending Tathagata Das's answer to Spark 2.x and Scala 2.11:
Using Spark SQL we can do this in a one-liner.
// implicits for magic functions like .toDF
import spark.implicits._

val df = Seq(
  ("first", 2.0),
  ("choose", 7.0),
  ("test", 1.5)
).toDF("name", "vals")

// write the DataFrame/Dataset to external storage
df.write
  .format("csv")
  .save("csv/file/location")
Then you can go ahead and proceed with adoalonso's answer.
I have an idea, but no ready code snippet. Internally (as the name suggests) Spark uses the Hadoop OutputFormat (as well as InputFormat when reading from HDFS).
Hadoop's FileOutputFormat has a protected member, setOutputName, which you can call from a subclass to set a different base name.
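A hypothetical sketch of that idea against the new (mapreduce) API; the class name and base name are made up, and you would still have to wire this OutputFormat into your Spark save call:
import java.io.IOException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class RenamedTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
    @Override
    public void checkOutputSpecs(JobContext job) throws IOException {
        // setOutputName is the protected FileOutputFormat member mentioned above;
        // files then come out as sensordata-r-00000 instead of part-r-00000.
        setOutputName(job, "sensordata");
        super.checkOutputSpecs(job);
    }
}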
It's not really a clean solution, but inside foreachRDD() you can basically do whatever you like, including creating a new file.
In my solution this is what I do: I save the output on HDFS (for fault tolerance reasons), and inside a foreachRDD I also create a TSV file with statistics in a local folder.
I think you could probably do the same if that's what you need.
http://spark.apache.org/docs/0.9.1/streaming-programming-guide.html#output-operations
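A minimal sketch of that pattern in Scala; the stream, the paths, and the "stats" logic are all assumptions:
import java.io.PrintWriter

stream.foreachRDD { rdd =>
  // Durable copy on HDFS, with the usual part-XXXXX layout.
  rdd.saveAsTextFile(s"hdfs:///data/sensordata/${System.currentTimeMillis}")
  // Side output: one local file with exactly the name we want.
  // collect() pulls everything to the driver, so keep this for small RDDs.
  val writer = new PrintWriter("/tmp/stats.tsv")
  try rdd.collect().foreach(line => writer.println(line))
  finally writer.close()
}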
I'm trying to create an SSIS package to import some dataset files. However, given that I seem to hit a brick wall every time I achieve a small part of the task, I need to take a step back and perform a sanity check on what I'm trying to achieve. If you good people can advise whether SSIS is the way to go about this, I would appreciate it.
These are my questions from this morning :-
debugging SSIS packages - debug.writeline
Changing an SSIS dts variables
What I'm trying to do is have a For..Each container enumerate the files in a share on the SQL Server. For each file it finds, a script task runs to check various attributes of the filename, such as a three-letter code, a date in CCYYMM, the name of the data contained therein, and optionally some comments. For example:-
ABC_201007_SalesData_[optional comment goes here].csv
I'm looking to parse the name using a regular expression and put the values of 'ABC', '201007', and 'SalesData' in variables.
I then want to move the file to an error folder if it doesn't meet certain criteria :-
Three character code
Six character date
Dataset name (e.g. SalesData, in this example)
CSV extension
I then want to look up the character code, the date (or part thereof), and the dataset name against a lookup table to mark off a 'checklist' of received files from each client.
Then, based on the entry in the checklist, I want to kick off another SSIS package.
So, for example I may have a table called 'Checklist' with these columns :-
Client code   Dataset     SSIS_Package
ABC           SalesData   NorthSalesData.dtsx
DEF           SalesData   SouthSalesData.dtsx
If anyone has a better way of achieving this I am interested in hearing about it.
Thanks in advance
That's an interesting scenario, and should be relatively easy to handle.
First, your choice of the Foreach Loop is a good one. You'll be using the Foreach File Enumerator. You can restrict the files you iterate over to be just CSVs so that you don't have to "filter" for those later.
The Foreach File Enumerator puts the filename (full path or just the file name) into a variable - let's call it "FileName". There are (at least) two ways you can parse that - expressions or a Script Task - depending which one you're more comfortable with. Either way, you'll need to create three variables to hold the "parts" of the filename - I'll call them "FileCode", "FileDate", and "FileDataset".
To do this with expressions, you need to set the EvaluateAsExpression property on FileCode, FileDate, and FileDataset to true. Then, in the expressions, use FINDSTRING and SUBSTRING to carve up FileName as you see fit (expressions don't have regex capability); see the sketch below.
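For example, assuming FileName holds just the file name (no path) in the ABC_201007_SalesData layout, the first two expressions could be (a sketch, not tested against your files):
FileCode: SUBSTRING(@[User::FileName], 1, 3)
FileDate: SUBSTRING(@[User::FileName], 5, 6)
FileDataset is messier with plain SUBSTRING because the trailing comment is optional, so you would have to chain FINDSTRING calls to locate the underscores; that is where the Script Task starts to look more attractive.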
To do this in a Script Task, pass the FileName variable in as ReadOnly and the other three as ReadWrite. You can use the regex capabilities of .NET, or just IndexOf and Substring, to get what you need; a sketch of the pattern follows.
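SSIS Script Tasks are written in C# or VB.NET rather than Java, but the regular expression itself carries over unchanged to .NET's Regex class; here is the pattern shown runnably in Java (the file name layout is the one from the question, everything else is illustrative):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameParser {
    public static void main(String[] args) {
        // three-letter code, six-digit date, dataset name, optional comment, .csv
        Pattern p = Pattern.compile(
            "^([A-Z]{3})_(\\d{6})_([A-Za-z]+)(?:_(.*))?\\.csv$");
        Matcher m = p.matcher("ABC_201007_SalesData_optional comment.csv");
        if (m.matches()) {
            String fileCode    = m.group(1); // "ABC"
            String fileDate    = m.group(2); // "201007"
            String fileDataset = m.group(3); // "SalesData"
            System.out.println(fileCode + " " + fileDate + " " + fileDataset);
        } // else: move the file to the error folder
    }
}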
Unfortunately, you have just missed the SQLLunch livemeeting on the ForEach loop: http://www.bidn.com/blogs/BradSchacht/ssis/812/sql-lunch-tomorrow
They are recording the session, however.
I just want to know whether there is any way to read a value from an .xls file using a .bat file.
For example: suppose I have an .xls named test.xls with two columns, 'EID' and 'MailID'. Given an EID as input, the script should extract the mail ID that corresponds to that EID and echo the result:
EID      MailID
E22222   MynameisA#company.com
E33333   MynameisB#company.com
...
So, per the above table, when I give E22222 as input to my .bat file, it should read the corresponding mail ID, MynameisA#company.com, and echo that value.
I hope I have presented my question clearly. Please get back to me if you need more clarification.
Thanks and regards
Maddy
There is no facility to do this directly with traditional .bat files. However, you might investigate PowerShell, which is designed to be able to do this sort of thing. PowerShell integrates well with existing Windows applications (such as Excel) and may provide the tools you need to do this easily.
A quick search turned up this example of reading Excel files from PowerShell.
You can't do this directly from a batch file. Furthermore, to manipulate Excel files in scripting you need Excel to be installed.
What you can do is wrap the Excel-specific stuff in a VBScript and call that from your batch file.
You can do it with Alacon, a command-line utility for the Alasql database.
It works with Node.js, so you need to install Node.js and then the Alasql package:
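Presumably via the standard npm workflow (this install command is my assumption):
npm install alasql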
To take data from an Excel file you can use the following command:
> node alacon "SELECT VALUE [mail ID] FROM XLS('mydata.xls', {headers:true})
  WHERE EID = ?" "E22222"
The first parameter is a SQL expression, which reads data from the XLS file (with headers) and searches for the second parameter's value, "E22222". The command returns the mail ID value.
This will be hard (very close to impossible) in a .bat file, especially when using the original XLS file. Even after an export to CSV, it would be much easier to use a scripting/programming language (Perl, C, whatever) to do this; a sketch follows.
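For instance, a minimal Perl sketch of the CSV route (the file name and column order are assumptions based on the question's table):
#!/usr/bin/perl
use strict;
use warnings;

my $wanted = shift @ARGV;   # e.g. E22222
open my $fh, '<', 'test.csv' or die "Cannot open test.csv: $!\n";
while (my $line = <$fh>) {
    chomp $line;
    my ($eid, $mail) = split /,/, $line;
    print "$mail\n" if defined $eid && $eid eq $wanted;
}
close $fh;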