read duplicate metrics in extract schema in U-SQL - analytics

` #input =EXTRACT firstname string,name string,name string FROM "/table1.csv"
USING Extractors.Csv(quoting : false, silent : true);
#output =SELECT * FROM #input;
OUTPUT #output TO "/data_output.csv" USING Outputters.Csv(quoting : false); `
extract schema contains duplicate metrics (NAME)
How can we read duplicate metrics ?

U-SQL reads columns by position in EXTRACT statement and not by name, so you can call your columns, for example, Name1 and Name2 (or something more logical to your business domain).
#input =EXTRACT firstname string,name1 string,name2 string FROM "/table1.csv"
USING Extractors.Csv(quoting : false);

Related

String delimiter present in string not permitted in Polybase?

I'm creating an external table using a CSV stored in an Azure Data Lake Storage and populating the table using Polybase in SQL Server.
However, I ran into this problem and figured it may be due to the fact that in one particular column there are double quotes present within the string, and the string delimiter has been specified as " in Polybase (STRING_DELIMITER = '"').
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: Could not find a delimiter after string delimiter
Example:
I have done quite an extensive research in this and found that this issue has been around for years but yet to see any solutions given.
Any help will be appreciated.
I think the easiest way to fix this up because you are in charge of the .csv creation is to use a delimiter which is not a comma and leave off the string delimiter. Use a separator which you know will not appear in the file. I've used a pipe in my example, and I clean up the string once it is imported in to the database.
A simple example:
IF EXISTS ( SELECT * FROM sys.external_tables WHERE name = 'delimiterWorking' )
DROP EXTERNAL TABLE delimiterWorking
GO
IF EXISTS ( SELECT * FROM sys.tables WHERE name = 'cleanedData' )
DROP TABLE cleanedData
GO
IF EXISTS ( SELECT * FROM sys.external_file_formats WHERE name = 'ff_delimiterWorking' )
DROP EXTERNAL FILE FORMAT ff_delimiterWorking
GO
CREATE EXTERNAL FILE FORMAT ff_delimiterWorking
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = '|',
--STRING_DELIMITER = '"',
FIRST_ROW = 2,
ENCODING = 'UTF8'
)
);
GO
CREATE EXTERNAL TABLE delimiterWorking (
id INT NOT NULL,
body VARCHAR(8000) NULL
)
WITH (
LOCATION = 'yourLake/someFolder/delimiterTest6.txt',
DATA_SOURCE = ds_azureDataLakeStore,
FILE_FORMAT = ff_delimiterWorking,
REJECT_TYPE = VALUE,
REJECT_VALUE = 0
);
GO
SELECT *
FROM delimiterWorking
GO
-- Fix up the data
CREATE TABLE cleanedData
WITH (
CLUSTERED COLUMNSTORE INDEX,
DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT
id,
body AS originalCol,
SUBSTRING ( body, 2, LEN(body) - 2 ) cleanBody
FROM delimiterWorking
GO
SELECT *
FROM cleanedData
My results:
String Delimiter issue can be avoided if you have the Data lake flat file converted to Parquet format.
Input:
"ID"
"NAME"
"COMMENTS"
"1"
"DAVE"
"Hi "I am Dave" from"
"2"
"AARO"
"AARO"
Steps:
1 Convert Flat file to Parquet format [Using Azure Data factory]
2 Create External File format in Data Lake [Assuming Master key, Scope credentials available]
CREATE EXTERNAL FILE FORMAT PARQUET_CONV
WITH (FORMAT_TYPE = PARQUET,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
3 Create External Table with FILE_FORMAT = PARQUET_CONV
Output:
I believe this is the best option as Microsoft don't have an solution currently to handle this string delimiter occurring with in the data for External table

SQL Server: How to remove a key from a Json object

I have a query like (simplified):
SELECT
JSON_QUERY(r.SerializedData, '$.Values') AS [Values]
FROM
<TABLE> r
WHERE ...
The result is like this:
{ "2019":120, "20191":120, "201902":121, "201903":134, "201904":513 }
How can I remove the entries with a key length less then 6.
Result:
{ "201902":121, "201903":134, "201904":513 }
One possible solution is to parse the JSON and generate it again using string manipulations for keys with desired length:
Table:
CREATE TABLE Data (SerializedData nvarchar(max))
INSERT INTO Data (SerializedData)
VALUES (N'{"Values": { "2019":120, "20191":120, "201902":121, "201903":134, "201904":513 }}')
Statement (for SQL Server 2017+):
UPDATE Data
SET SerializedData = JSON_MODIFY(
SerializedData,
'$.Values',
JSON_QUERY(
(
SELECT CONCAT('{', STRING_AGG(CONCAT('"', [key] ,'":', [value]), ','), '}')
FROM OPENJSON(SerializedData, '$.Values') j
WHERE LEN([key]) >= 6
)
)
)
SELECT JSON_QUERY(d.SerializedData, '$.Values') AS [Values]
FROM Data d
Result:
Values
{"201902":121,"201903":134,"201904":513}
Notes:
It's important to note, that JSON_MODIFY() in lax mode deletes the specified key if the new value is NULL and the path points to a JSON object. But, in this specific case (JSON object with variable key names), I prefer the above solution.

find number in json array value with regex

i want match string in json string that like:
"ids":[44,53,1,3,12,45]
i want run query in sqlite send only one digit as id and match one of the above id in sql statement
i write this regex "ids":[\[] for matching start of key
but i don't have any idea to match middle id and escape starting id
example:
i have calc_method table like this:
CREATE TABLE "calc_method" (
"calc_method_id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
"calc_method_name" TEXT NOT NULL,
"calc_method_value" TEXT NOT NULL
);
in calc_method_value column i store calcMethod class which convert to json using Gson
class calcMethod{
var memberCafeIds:ArrayList<Long>,
var memberBarIds:ArrayList<Long>
}
after i convert calcMethod to json i have output like below and this value store in calc_method_value column:
{"memberCafeIds":[1,2,14,5,44],"memberBarIds":[23,1,5,78]}
now i want select row that match to my regex pattern like if calc_method_value column have memberBarIds with id 1
SELECT * FROM calc_method WHERE calc_method_value REGEXP '"memberCafeIds":\[[:paramId]'
:paramId is method parameter
Regards, a programmer struggle with regex
In Sqlite, use JSON1 functions to work with JSON, not regular expressions. In particular, json_each() to turn the JSON array into a table you can query:
sqlite> CREATE TABLE ex(json);
sqlite> INSERT INTO ex VALUES ('{"ids":[44,53,1,3,12,45]}');
sqlite> SELECT * FROM ex WHERE 1 IN (SELECT value FROM json_each(ex.json, '$.ids'));
json
-------------------------
{"ids":[44,53,1,3,12,45]}
sqlite> SELECT * FROM ex WHERE 50 IN (SELECT value FROM json_each(ex.json, '$.ids'));
sqlite>

DbString, IsFixedLength and IsAnsi for varchar

I am new to Dapper, want to know why below is suggested, when my code runs without it?
Ansi Strings and varchar
Dapper supports varchar params, if you are executing a where clause on
a varchar column using a param be sure to pass it in this way:
Query<Thing>("select * from Thing where Name = #Name", new {Name = new
DbString { Value = "abcde", IsFixedLength = true, Length = 10, IsAnsi
= true });
On SQL Server it is crucial to use the unicode when querying unicode
and ansi when querying non unicode.
Below is my code, which runs against SQL server 2012 without using DbString etc.
create table Author (
Id int identity(1,1),
FirstName varchar(50),
LastName varchar(50)
);
go
insert into Author (FirstName, LastName) values ('Tom', 'John');
public Author FindByVarchar(string firstName)
{
using (IDbConnection db = DBHelper.NewSqlConnection())
{
return db.Query<Author>("Select * From Author WHERE FirstName = #firstName", new { firstName }).SingleOrDefault();
}
}
Questions:
1 Why is DbString type used in this case?
2 Why the length is set to 10 (e.g.Length = 10) when "abcde" is 5?
3 Do I still need to use DbString when my current code works?
4 Is it correct to set IsAnsi = false for unicode column?
5 For varchar column, is it correct to set IsFixedLength = false, and ignore setting Length?
The purpose of the example is that it is describing a scenario where the data type is char(10). If we just used "abcde", Dapper might think that nvarchar(5) was appropriate. This would be very inefficient in some cases - especially in a where clause, since the RDBMS can decide that it can't use the index, and instead needs to table scan doing a string conversion for every row in the table from char(10) to the nvarchar version. It is for this reason that DbString exists - to help you control exactly how Dapper configures the parameter for text data.
I think this answers your 1 and 2.
3: are you using ANSI (non-unicode text) or fixed-width text? Note that the ANSI default can also be set globally if you always avoid unicode
4: yes
5: yes
4+5 combined: if you're using nvarchar: just use string

SQLITE check if table exist in C [duplicate]

How do I, reliably, check in SQLite, whether a particular user table exists?
I am not asking for unreliable ways like checking if a "select *" on the table returned an error or not (is this even a good idea?).
The reason is like this:
In my program, I need to create and then populate some tables if they do not exist already.
If they do already exist, I need to update some tables.
Should I take some other path instead to signal that the tables in question have already been created - say for example, by creating/putting/setting a certain flag in my program initialization/settings file on disk or something?
Or does my approach make sense?
I missed that FAQ entry.
Anyway, for future reference, the complete query is:
SELECT name FROM sqlite_master WHERE type='table' AND name='{table_name}';
Where {table_name} is the name of the table to check.
Documentation section for reference: Database File Format. 2.6. Storage Of The SQL Database Schema
This will return a list of tables with the name specified; that is, the cursor will have a count of 0 (does not exist) or a count of 1 (does exist)
If you're using SQLite version 3.3+ you can easily create a table with:
create table if not exists TableName (col1 typ1, ..., colN typN)
In the same way, you can remove a table only if it exists by using:
drop table if exists TableName
A variation would be to use SELECT COUNT(*) instead of SELECT NAME, i.e.
SELECT count(*) FROM sqlite_master WHERE type='table' AND name='table_name';
This will return 0, if the table doesn't exist, 1 if it does. This is probably useful in your programming since a numerical result is quicker / easier to process. The following illustrates how you would do this in Android using SQLiteDatabase, Cursor, rawQuery with parameters.
boolean tableExists(SQLiteDatabase db, String tableName)
{
if (tableName == null || db == null || !db.isOpen())
{
return false;
}
Cursor cursor = db.rawQuery(
"SELECT COUNT(*) FROM sqlite_master WHERE type = ? AND name = ?",
new String[] {"table", tableName}
);
if (!cursor.moveToFirst())
{
cursor.close();
return false;
}
int count = cursor.getInt(0);
cursor.close();
return count > 0;
}
You could try:
SELECT name FROM sqlite_master WHERE name='table_name'
See (7) How do I list all tables/indices contained in an SQLite database in the SQLite FAQ:
SELECT name FROM sqlite_master
WHERE type='table'
ORDER BY name;
Use:
PRAGMA table_info(your_table_name)
If the resulting table is empty then your_table_name doesn't exist.
Documentation:
PRAGMA schema.table_info(table-name);
This pragma returns one row for each column in the named table. Columns in the result set include the column name, data type, whether or not the column can be NULL, and the default value for the column. The "pk" column in the result set is zero for columns that are not part of the primary key, and is the index of the column in the primary key for columns that are part of the primary key.
The table named in the table_info pragma can also be a view.
Example output:
cid|name|type|notnull|dflt_value|pk
0|id|INTEGER|0||1
1|json|JSON|0||0
2|name|TEXT|0||0
SQLite table names are case insensitive, but comparison is case sensitive by default. To make this work properly in all cases you need to add COLLATE NOCASE.
SELECT name FROM sqlite_master WHERE type='table' AND name='table_name' COLLATE NOCASE
If you are getting a "table already exists" error, make changes in the SQL string as below:
CREATE table IF NOT EXISTS table_name (para1,para2);
This way you can avoid the exceptions.
If you're using fmdb, I think you can just import FMDatabaseAdditions and use the bool function:
[yourfmdbDatabase tableExists:tableName].
The following code returns 1 if the table exists or 0 if the table does not exist.
SELECT CASE WHEN tbl_name = "name" THEN 1 ELSE 0 END FROM sqlite_master WHERE tbl_name = "name" AND type = "table"
Note that to check whether a table exists in the TEMP database, you must use sqlite_temp_master instead of sqlite_master:
SELECT name FROM sqlite_temp_master WHERE type='table' AND name='table_name';
Here's the function that I used:
Given an SQLDatabase Object = db
public boolean exists(String table) {
try {
db.query("SELECT * FROM " + table);
return true;
} catch (SQLException e) {
return false;
}
}
Use this code:
SELECT name FROM sqlite_master WHERE type='table' AND name='yourTableName';
If the returned array count is equal to 1 it means the table exists. Otherwise it does not exist.
class CPhoenixDatabase():
def __init__(self, dbname):
self.dbname = dbname
self.conn = sqlite3.connect(dbname)
def is_table(self, table_name):
""" This method seems to be working now"""
query = "SELECT name from sqlite_master WHERE type='table' AND name='{" + table_name + "}';"
cursor = self.conn.execute(query)
result = cursor.fetchone()
if result == None:
return False
else:
return True
Note: This is working now on my Mac with Python 3.7.1
You can write the following query to check the table existance.
SELECT name FROM sqlite_master WHERE name='table_name'
Here 'table_name' is your table name what you created. For example
CREATE TABLE IF NOT EXISTS country(country_id INTEGER PRIMARY KEY AUTOINCREMENT, country_code TEXT, country_name TEXT)"
and check
SELECT name FROM sqlite_master WHERE name='country'
Use
SELECT 1 FROM table LIMIT 1;
to prevent all records from being read.
Using a simple SELECT query is - in my opinion - quite reliable. Most of all it can check table existence in many different database types (SQLite / MySQL).
SELECT 1 FROM table;
It makes sense when you can use other reliable mechanism for determining if the query succeeded (for example, you query a database via QSqlQuery in Qt).
The most reliable way I have found in C# right now, using the latest sqlite-net-pcl nuget package (1.5.231) which is using SQLite 3, is as follows:
var result = database.GetTableInfo(tableName);
if ((result == null) || (result.Count == 0))
{
database.CreateTable<T>(CreateFlags.AllImplicit);
}
The function dbExistsTable() from R DBI package simplifies this problem for R programmers. See the example below:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# let us check if table iris exists in the database
dbExistsTable(con, "iris")
### returns FALSE
# now let us create the table iris below,
dbCreateTable(con, "iris", iris)
# Again let us check if the table iris exists in the database,
dbExistsTable(con, "iris")
### returns TRUE
I thought I'd put my 2 cents to this discussion, even if it's rather old one..
This query returns scalar 1 if the table exists and 0 otherwise.
select
case when exists
(select 1 from sqlite_master WHERE type='table' and name = 'your_table')
then 1
else 0
end as TableExists
My preferred approach:
SELECT "name" FROM pragma_table_info("table_name") LIMIT 1;
If you get a row result, the table exists. This is better (for me) then checking with sqlite_master, as it will also check attached and temp databases.
This is my code for SQLite Cordova:
get_columnNames('LastUpdate', function (data) {
if (data.length > 0) { // In data you also have columnNames
console.log("Table full");
}
else {
console.log("Table empty");
}
});
And the other one:
function get_columnNames(tableName, callback) {
myDb.transaction(function (transaction) {
var query_exec = "SELECT name, sql FROM sqlite_master WHERE type='table' AND name ='" + tableName + "'";
transaction.executeSql(query_exec, [], function (tx, results) {
var columnNames = [];
var len = results.rows.length;
if (len>0){
var columnParts = results.rows.item(0).sql.replace(/^[^\(]+\(([^\)]+)\)/g, '$1').split(','); ///// RegEx
for (i in columnParts) {
if (typeof columnParts[i] === 'string')
columnNames.push(columnParts[i].split(" ")[0]);
};
callback(columnNames);
}
else callback(columnNames);
});
});
}
Table exists or not in database in swift
func tableExists(_ tableName:String) -> Bool {
sqlStatement = "SELECT name FROM sqlite_master WHERE type='table' AND name='\(tableName)'"
if sqlite3_prepare_v2(database, sqlStatement,-1, &compiledStatement, nil) == SQLITE_OK {
if sqlite3_step(compiledStatement) == SQLITE_ROW {
return true
}
else {
return false
}
}
else {
return false
}
sqlite3_finalize(compiledStatement)
}
c++ function checks db and all attached databases for existance of table and (optionally) column.
bool exists(sqlite3 *db, string tbl, string col="1")
{
sqlite3_stmt *stmt;
bool b = sqlite3_prepare_v2(db, ("select "+col+" from "+tbl).c_str(),
-1, &stmt, 0) == SQLITE_OK;
sqlite3_finalize(stmt);
return b;
}
Edit: Recently discovered the sqlite3_table_column_metadata function. Hence
bool exists(sqlite3* db,const char *tbl,const char *col=0)
{return sqlite3_table_column_metadata(db,0,tbl,col,0,0,0,0,0)==SQLITE_OK;}
You can also use db metadata to check if the table exists.
DatabaseMetaData md = connection.getMetaData();
ResultSet resultSet = md.getTables(null, null, tableName, null);
if (resultSet.next()) {
return true;
}
If you are running it with the python file and using sqlite3 obviously. Open command prompt or bash whatever you are using use
python3 file_name.py first in which your sql code is written.
Then Run sqlite3 file_name.db.
.table this command will give tables if they exist.
I wanted to add on Diego VĂ©lez answer regarding the PRAGMA statement.
From https://sqlite.org/pragma.html we get some useful functions that can can return information about our database.
Here I quote the following:
For example, information about the columns in an index can be read using the index_info pragma as follows:
PRAGMA index_info('idx52');
Or, the same content can be read using:
SELECT * FROM pragma_index_info('idx52');
The advantage of the table-valued function format is that the query can return just a subset of the PRAGMA columns, can include a WHERE clause, can use aggregate functions, and the table-valued function can be just one of several data sources in a join...
Diego's answer gave PRAGMA table_info(table_name) like an option, but this won't be of much use in your other queries.
So, to answer the OPs question and to improve Diegos answer, you can do
SELECT * FROM pragma_table_info('table_name');
or even better,
SELECT name FROM pragma_table_list('table_name');
if you want to mimic PoorLuzers top-voted answer.
If you deal with Big Table, I made a simple hack with Python and Sqlite and you can make the similar idea with any other language
Step 1: Don't use (if not exists) in your create table command
you may know that this if you run this command that will have an exception if you already created the table before, and want to create it again, but this will lead us to the 2nd step.
Step 2: use try and except (or try and catch for other languages) to handle the last exception
here if you didn't create the table before, the try case will continue, but if you already did, you can put do your process at except case and you will know that you already created the table.
Here is the code:
def create_table():
con = sqlite3.connect("lists.db")
cur = con.cursor()
try:
cur.execute('''CREATE TABLE UNSELECTED(
ID INTEGER PRIMARY KEY)''')
print('the table is created Now')
except sqlite3.OperationalError:
print('you already created the table before')
con.commit()
cur.close()
You can use a simple way, i use this method in C# and Xamarin,
public class LoginService : ILoginService
{
private SQLiteConnection dbconn;
}
in login service class, i have many methods for acces to the data in sqlite, i stored the data into a table, and the login page
it only shows when the user is not logged in.
for this purpose I only need to know if the table exists, in this case if it exists it is because it has data
public int ExisteSesion()
{
var rs = dbconn.GetTableInfo("Sesion");
return rs.Count;
}
if the table does not exist, it only returns a 0, if the table exists it is because it has data and it returns the total number of rows it has.
In the model I have specified the name that the table must receive to ensure its correct operation.
[Table("Sesion")]
public class Sesion
{
[PrimaryKey]
public int Id { get; set; }
public string Token { get; set; }
public string Usuario { get; set; }
}
Look into the "try - throw - catch" construct in C++. Most other programming languages have a similar construct for handling errors.

Resources