For my Cassandra database, I need to set a value in a column for all rows in my table.
I see that in SQL we can do:
UPDATE table SET column1= XXX;
but in CQL (in cqlsh), it doesn't work!
I don't want to update the 9500 rows one by one.
Do you have any suggestions?
Thank you :)
You can use an UPDATE query with an IN clause instead of executing 9500 separate queries.
First select the primary_key values from your table, then copy them into this query:
UPDATE table SET column1 = XXX WHERE primary_key IN (p1, p2, p3, ...);
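For example, assuming a table users with partition key id (the names here are illustrative):
-- collect the keys first
SELECT id FROM users;
-- then paste them into a single statement
UPDATE users SET column1 = 'XXX' WHERE id IN (1, 2, 3);
Since the IN list targets the partition key, one statement covers all the listed rows.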
I just added a new column to a table (over 60000 rows) and looked for a way to initialize all of its values with something (not null), and I found nothing. It's not quite what was asked here, but if you drop and re-add the column, my solution will solve it. This is what I did:
cqlsh> COPY tablename (primary_key, newcolumn) TO 'FILE.txt'
Open FILE.txt in Notepad++, press Ctrl+H (the Replace option), and replace every \r\n with something\r\n.
And finally,
cqlsh> COPY tablename (primary_key, newcolumn) FROM 'FILE.txt'
Note 1: Be careful if your primary_key values contain \r\n.
Note 2: Depending on your OS, the lines may not end with \r\n.
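To illustrate, assuming an integer primary key and an initially empty newcolumn, FILE.txt would go from something like:
1,
2,
3,
to:
1,something
2,something
3,something
so the COPY ... FROM reimport fills newcolumn with something for every row.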
As you are finding out, CQL != SQL. There is no way to do what you're asking in CQL, short of iterating through each row in your table.
Robert's suggestion about redefining column1 to be a static column may help. But static columns are tied to their partition key, so you would still need to specify that:
aploetz#cqlsh:stackoverflow2> UPDATE t SET s='XXX' WHERE k='k';
Also, it sounds like you want to set one column value for all rows. A static column won't work for you if you need that value to differ between CQL rows within a partition. The example below is from the DataStax docs.
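For context, the table in that example defines s as a static column, roughly like this (a sketch; see the DataStax docs for the exact definition):
CREATE TABLE t (
    k text,
    i int,
    s text STATIC,
    PRIMARY KEY (k, i)
);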
aploetz#cqlsh:stackoverflow2> INSERT INTO t (k, s, i) VALUES ('k', 'I''m shared', 0);
aploetz#cqlsh:stackoverflow2> INSERT INTO t (k, s, i) VALUES ('k', 'I''m still shared', 1);
aploetz#cqlsh:stackoverflow2> SELECT * FROM t;
k | i | s
---+---+------------------
k | 0 | I'm still shared
k | 1 | I'm still shared
(2 rows)
Note that the value of column s is the same across all CQL rows under partition key k. Just so you understand how that works.
I ingest AVRO data into a table source_table. There is a column in this table, say avro_data, which is populated with variant data.
I plan to copy the data into a structured table target_table whose columns have the same names and datatypes as the avro_data fields in the source table.
Example:
select avro_data from source_table
{"C1":"V1", "C2", "V2"}
This will result in
select * from target_table
------------
| C1 | C2 |
------------
| V1 | V2 |
------------
My question is: when the schema of avro_data evolves and new fields get added, how can I keep the schema of target_table in sync by adding equivalent columns to the target table?
Is there anything out of the box in Snowflake to achieve this, or has someone created code to do something similar?
Here's something to get you started. It shows how to take a variant column and parse out the internal columns. It uses a table in the Snowflake sample data database, which is not identical in every account, so you may need to adjust the table name and column name.
-- path_name: paths with each level enclosed in double quotes
-- (e.g. "path"."to"."element"), with bracket-enclosed array
-- references (like [0]) stripped out
SELECT DISTINCT REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\[(.+)\\]'), '(\\w+)', '"\\1"') AS path_name,
-- attribute_type: column datatypes of ARRAY, BOOLEAN, FLOAT, and STRING only
       DECODE(SUBSTR(TYPEOF(f.value), 1, 1), 'A', 'ARRAY', 'B', 'BOOLEAN', 'I', 'FLOAT', 'D', 'FLOAT', 'STRING') AS attribute_type,
-- alias_name: a column alias derived from the path
       REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\[(.+)\\]'), '[^a-zA-Z0-9]', '_') AS alias_name
FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."JCUSTOMER",
     LATERAL FLATTEN("CUSTOMER", RECURSIVE=>true) f
WHERE TYPEOF(f.value) != 'OBJECT'
  AND NOT CONTAINS(f.path, '[');
This is a snippet of code modified from here: https://community.snowflake.com/s/article/Automating-Snowflake-Semi-Structured-JSON-Data-Handling. The blog author attributes credit to a colleague for this section of code.
While the current incarnation of the stored procedure will create a view from the internal columns in a variant, an alternate version could create and/or alter a table to keep it in sync with changes.
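As a rough sketch of that alternate version (source_table, avro_data, and target_table come from the question; the INFORMATION_SCHEMA check is my assumption about how you'd detect missing columns), the discovery query can be turned into ALTER TABLE statements for any attributes the target doesn't have yet:
WITH discovered AS (
    -- same discovery logic as above, pointed at the question's source table
    SELECT DISTINCT
           REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\[(.+)\\]'), '[^a-zA-Z0-9]', '_') AS alias_name,
           DECODE(SUBSTR(TYPEOF(f.value), 1, 1), 'A', 'ARRAY', 'B', 'BOOLEAN', 'I', 'FLOAT', 'D', 'FLOAT', 'STRING') AS attribute_type
    FROM source_table,
         LATERAL FLATTEN(avro_data, RECURSIVE=>true) f
    WHERE TYPEOF(f.value) != 'OBJECT'
      AND NOT CONTAINS(f.path, '[')
)
SELECT 'ALTER TABLE target_table ADD COLUMN ' || alias_name || ' ' || attribute_type || ';' AS ddl
FROM discovered d
WHERE NOT EXISTS (
    SELECT 1
    FROM information_schema.columns c
    WHERE c.table_name = 'TARGET_TABLE'
      AND c.column_name = UPPER(d.alias_name)
);
A stored procedure could loop over the generated ddl strings and EXECUTE IMMEDIATE each one to keep target_table in sync.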
I have a table like this:
As you can see, rows 2 and 3 are similar, and row 3 is a useless duplicate. My question is: how can we delete row 3 only, while keeping row 2 and row 4?
Like this:
Thanks for your help!
You don't have duplicates. If you had a heap table with truly identical records, every value in two or more records would be the same. One way of dealing with that is to add an identity column; the identity column can then be used to remove some, but not all, of the duplicates.
In your case, you want to delete a record when another record exists that is similar and perhaps has "better" data. You can use an EXISTS clause to do this. The logic below is not exactly what you want, but it should give you the idea of how to handle this.
DELETE t
FROM MyTable t
WHERE t.BCT IS NULL -- delete only records with no values?
AND t.BCS IS NULL
AND EXISTS( -- another record with a value exists, so this one might not be needed?
SELECT *
FROM MyTable x
WHERE (x.BCT IS NOT NULL OR x.BCS IS NOT NULL)
AND x.portCode = t.portCode
AND x.effDate = t.effDate
AND LEFT(x.issueName, 26) = LEFT(t.issueName, 26)
)
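Before running the DELETE, it can help to preview the rows it would remove by swapping the DELETE for a SELECT:
SELECT t.*
FROM MyTable t
WHERE t.BCT IS NULL
  AND t.BCS IS NULL
  AND EXISTS(
        SELECT *
        FROM MyTable x
        WHERE (x.BCT IS NOT NULL OR x.BCS IS NOT NULL)
          AND x.portCode = t.portCode
          AND x.effDate = t.effDate
          AND LEFT(x.issueName, 26) = LEFT(t.issueName, 26)
  )
Once the result set matches your expectations, run the DELETE version.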
I have a table where one of the columns is a path to an image and I need to create a directory for the record being inserted.
Example:
Id | PicPath
---+---------------
1  | /Pics/1/0.jpg
2  | /Pics/2/0.jpg
This way I can be sure that the folder name is always valid and unique (no clash between two records).
Question is: how can I safely refer to the current id of the record being inserted? Keep in mind that this is a highly concurrent environment, and I would like to avoid multiple trips to the DB if possible.
I have tried the following:
insert into Dummy values(CONCAT('a', (select IDENT_CURRENT('Dummy'))))
and
insert into Dummy values(CONCAT('a', (select SCOPE_IDENTITY() + 1)))
The first query is not safe: when running 1000 concurrent inserts, I got 58 'duplicate key' exceptions.
The second query didn't work because SCOPE_IDENTITY() returned the same value for all queries, as I suspected.
What are my alternatives here?
Try a temp table to track your inserted ids using the OUTPUT clause:
INSERT INTO Dummy (someval)
OUTPUT inserted.Id INTO #temp_ids (Id)
VALUES ('a');
This captures every id generated by your insert. The inserted pseudo-table is scope-safe, so concurrent inserts won't see each other's rows.
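A fuller sketch of the pattern, assuming Dummy has an identity column Id and a nullable PicPath column (names are illustrative):
-- capture the generated ids into a table variable
DECLARE @new_ids TABLE (Id int);

INSERT INTO Dummy (PicPath)
OUTPUT inserted.Id INTO @new_ids
VALUES (NULL);

-- build the path from the generated id in a second statement
UPDATE d
SET d.PicPath = CONCAT('/Pics/', d.Id, '/0.jpg')
FROM Dummy d
JOIN @new_ids n ON n.Id = d.Id;
Because each session only sees its own inserted rows, this stays safe under concurrency, and it is still a single round trip if you send both statements in one batch.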
Let's consider a simple Excel table linked to a SQL Server table:
ID some_data
0 a
1 b
2 c
I'd like to extend it with a manually added column (not present in the SQL Server table):
ID some_data my_column
0 a some_data_for_0
1 b some_data_for_1
2 c some_data_for_2
However, when the source data change (rows inserted / deleted / updated), the relation between my_column and the ID column is not preserved. For example, when a new row (3, d) is added:
ID some_data my_column
0 a some_data_for_0
1 b some_data_for_1
2 c
3 d some_data_for_2
Is there any built-in Excel solution that would allow me to specify how my_column rows should be kept in step with the ID column, or do I need to implement it myself using VBA?
You could use an ORDER BY clause in your SQL statement, but even that's not very reliable. The only reliable way to do this is store your additional data in its own table and use a formula to relate it to the SQL data.
On a separate worksheet, put
ID my_column
0 some_data_for_0
1 some_data_for_1
2 some_data_for_2
Now in a column adjacent to the SQL data, put
=IFERROR(VLOOKUP([#ID],tblAddtlInfo,2,FALSE),"")
However the SQL data is sorted, the additional info will be in the right row. This assumes you made your additional info list into a table and named it tblAddtlInfo.
If you want to get fancy, you can write some code in the Change event that looks for non-formulas in the extra column. If the formula gets overwritten, grab the new data, add it to (or update) your additional info table, and restore the formula. Then you can type data directly in the row, but maintain integrity by moving it to the separate table.
I have a table like this:
CREATE TABLE [Mytable] (
    [Name] [varchar](10),
    [number] [nvarchar](100)
);
I want to find the [number] values that include alphabetic characters.
The data should be formatted like this:
Name | number
---------------
Jack | 2131546
Ali | 2132132154
but sometimes a number is inserted malformed, containing alphabetic and other non-numeric characters, like this:
Name | number
---------------
Jack | 2[[[131546ddfd
Ali | 2132*&^1ASEF32154
I want to find these malformed rows.
I can't use LIKE, because LIKE makes my query very slow.
Updated to find all non-numeric characters:
select * from Mytable where number like '%[^0-9]%'
Regarding the comments on performance: using CLR and regex might speed things up slightly, but the bulk of the cost for this query is the number of logical reads.
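If you're on SQL Server 2012 or later, a TRY_CAST-based check is another option worth comparing (a sketch; it assumes legitimate values fit in a bigint, and note that empty strings cast to 0 rather than failing):
SELECT *
FROM Mytable
WHERE number IS NOT NULL
  AND TRY_CAST(number AS bigint) IS NULL; -- the cast fails for non-numeric values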
A bit outside the box, but you could do something like:
bulk copy the data out of your table into a flat file
create a table that has the same structure as your original table but with a proper numeric type (e.g. int) for the [number] column.
bulk copy your data into this new table, making sure to specify a batch size of 1 and an error file (where rows that don't fit the schema will go); a sketch follows this list
rows that end up in the error file are the ones with non-numeric data in the [number] column
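A sketch of that flow with the bcp utility (database, table, and file names are illustrative; -c exports character data, -T uses a trusted connection, -b 1 sets the batch size, and -e names the error file):
bcp MyDb.dbo.Mytable out numbers.dat -c -T
then, after creating MyDb.dbo.Mytable_clean with an int [number] column:
bcp MyDb.dbo.Mytable_clean in numbers.dat -c -T -b 1 -e errors.dat
Afterwards, errors.dat holds exactly the rows you're looking for.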
Of course, you could do the same thing with a cursor and a temp table or two...