I have an application with a Room database, including DAO and Repository. The sequence of events is as follows:
Insert a new row.
Send the row to the server
Update the row to indicate the row has been sent.
The problem is that the ID of the new row is not always available before I send to the server, or when I do the update. Therefore the DAO update method doesn't work since it doesn't have a proper ID.
Is there a trick to making the Insert return the ID before I go on? I know that is circumventing the benefit of an async process, but may be necessary in this case.
I was hoping someone else may have solved this problem.
The problem is that the ID of the new row is not always available before I send to the server
For tables in Room there is always a value available that uniquely identifies the inserted row, provided the row is inserted via the convenience @Insert (a different matter if using @Query("INSERT INTO .....")).
note if the row was not inserted due to a trapped/handled conflict then the value returned will be -1.
Is there a trick to making the Insert return the ID before I go on?
Not really a trick, at least if IDs are an integer type; it's more what is expected/anticipated usage. Just have the insert DAO method return Long (or long in Java).
e.g. (Java)
@Insert
long insert(TheEntity theEntity);
or (Kotlin)
@Insert
fun insert(theEntity: TheEntity): Long
The Long/long returned will be the value of the rowid column or an alias thereof.
rowid is a column that ALL tables have (except those defined with WITHOUT ROWID, which Room doesn't allow), although it is a hidden column. If you code an integer type annotated with @PrimaryKey, then the column will be an alias of the rowid.
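For instance, this is roughly the table SQLite ends up with for an entity whose @PrimaryKey is a Long with autoGenerate = true (table and column names here are made up for illustration); because the column is declared INTEGER PRIMARY KEY it is an alias of the hidden rowid:
-- hypothetical entity mapped by Room
CREATE TABLE todo_table (
    id   INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    task TEXT
);
-- id and rowid return the same value for every row:
SELECT rowid, id FROM todo_table;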
Note that by integer type you could have:
Long, long,
Int, int, Integer,
Short, short,
Byte, byte,
even, albeit pretty useless, Boolean, boolean.
However, Long/long is the most suitable type for an id column, as SQLite can store a 64-bit signed integer, which equates to Long/long. Hence inserts can only return Long/long.
There is little if any advantage to not using Long/long: SQLite will store the value in as little space as possible, from 1 byte to 8 bytes. So a value of 1, be it byte/short/int/long, will take up a single byte when stored in the database.
e.g. for the Kotlin code fun insert(toDoData: ToDoData): Int you get the error: Not sure how to handle insert method's return type. public abstract int insert(@org.jetbrains.annotations.NotNull()
Non-Integer ID types/Composite Primary Keys
Other column types (String, Byte[], Double, Float etc.), or composite primary keys, will not be aliases of the rowid column. To get the primary key for these other types you can use the returned rowid, as it is unique to the row, e.g. (for @PrimaryKey val id: String):
#Query("SELECT id FROM todo_table WHERE rowid=:rowid" )
fun getPrimaryKeyByRowid(rowid: Long): String
Of course you could bypass getting the PrimaryKey value(s) altogether and instead use the rowid value to determine the row to update, not via the @Update convenience method/function but via an @Query update with UPDATE .... WHERE rowid=:passRowid, as sketched below.
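A minimal sketch of the SQL such an @Query would wrap, assuming a hypothetical todo_table with a sent flag column:
-- mark the row identified by the rowid returned from the insert as sent
UPDATE todo_table
SET    sent = 1
WHERE  rowid = :passRowid;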
We are using a software that has limited Oracle capabilities. I need to filter through a CLOB field by making sure it has a specific value. Normally, outside of this software I would do something like:
DBMS_LOB.SUBSTR(t.new_value) = 'Y'
However, this isn't supported, so I'm attempting to use CAST instead. I've made several attempts, but this is what I've found so far:
The software has a built-in query checker/validator and these are the ones it shows as invalid:
DBMS_LOB.SUBSTR(t.new_value)
CAST(t.new_value AS VARCHAR2(10))
CAST(t.new_value AS NVARCHAR2(10))
However, the validator does accept these:
CAST(t.new_value AS VARCHAR(10))
CAST(t.new_value AS NVARCHAR(10))
CAST(t.new_value AS CHAR(10))
Unfortunately, even though the validator lets these ones go through, when running the query to fetch data, I get ORA-22835: Buffer too small when using VARCHAR or NVARCHAR. And I get ORA-25137: Data value out of range when using CHAR.
Are there other ways I could try to check that my CLOB field has a specific value when filtering the data? If not, how do I fix my current issues?
The error you're getting indicates that Oracle is trying to apply the CAST(t.new_value AS VARCHAR(10)) to a row where new_value has more than 10 characters. That makes sense given your description that new_value is a generic audit field that holds values from a large number of different tables with a variety of data lengths. Given that, you'd need to structure the query in a way that forces the optimizer to reduce the set of rows you're applying the cast to down to only those where new_value has a single character before applying the cast.
Not knowing what sort of scope the software you're using provides for structuring your code, I'm not sure what options you have there. Be aware that depending on how robust you need this, the optimizer has quite a bit of flexibility to choose to apply predicates and functions on the projection in an arbitrary order. So even if you find an approach that works once, it may stop working in the future when statistics change or the database is upgraded and Oracle decides to choose a different plan.
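One hedged possibility, assuming the validator accepts LENGTH and a searched CASE (the table name below is hypothetical): a searched CASE evaluates its WHEN condition before its THEN result, so the CAST is only applied to rows whose CLOB holds a single character.
SELECT t.*
FROM   audit_history t   -- hypothetical table name
WHERE  CASE
         WHEN LENGTH(t.new_value) = 1
         THEN CAST(t.new_value AS VARCHAR(10))
       END = 'Y';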
Using this as sample data
create table tab1(col clob);
insert into tab1(col) values (rpad('x',3000,'y'));
You need to use dbms_lob.substr(col,1) to get the first character (from the default offset= 1)
select dbms_lob.substr(col,1) from tab1;
DBMS_LOB.SUBSTR(COL,1)
----------------------
x
Note that the default amount (= length) of the substring is 32767, so using only DBMS_LOB.SUBSTR(COL) will return more than you expect.
CAST for a CLOB does not cut the string to the cast length, but (as you observed) raises the exception ORA-25137: Data value out of range if the original string is longer than the cast length.
As documented for the CAST statement
CAST does not directly support any of the LOB data types. When you use CAST to convert a CLOB value into a character data type or a BLOB value into the RAW data type, the database implicitly converts the LOB value to character or raw data and then explicitly casts the resulting value into the target data type. If the resulting value is larger than the target type, then the database returns an error.
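Putting those two notes together, the filter from the original question could be written against the sample table as something like this (the second and third arguments of dbms_lob.substr are the amount and the offset):
SELECT *
FROM   tab1
WHERE  dbms_lob.substr(col, 1, 1) = 'x';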
CREATE TABLE [sql_table1] ([c0] varbinary(25) NOT NULL primary key)
go
insert into sql_table1 values (0x3200),(0x32);
go
I get
Cannot insert duplicate key in object 'dbo.sql_table'. The duplicate key value is (0x32).
Why? 0x32 does not equal 0x3200
It gets right padded. BINARY data gets tricky when you try to specify what should normally be numerically equivalent hex values. If you try this it will work:
insert into #sql_table1 values (0x32),(CAST(50 as VARBINARY(25)));
-- inserts 0x00000032
-- and 0x32
But these are numerically equivalent. Generally speaking, it's a bad idea to have a BINARY column of any sort be a primary key or to put a unique index on it (more so than a CHAR/VARCHAR/NVARCHAR column) - any application that inserts into it is almost certainly going to be CASTing from some native format/representation to binary, but there's no guarantee that that CAST actually works in a unique manner in either direction - did the application insert the value 50 (= 0x32), or did it try to insert the literal 0x32, or did it try to insert the ASCII value of '2' (= 0x32), or did it insert the first byte(s) of something else...? If one app submits 0x32 and another 0x0032, are they the same or different (SQL Server says different - see below)?
The bottom line is that SQL Server is going to behave unusually if you try to compare binary columns flat out without some context. What will work is comparing binary data using BINARY_CHECKSUM
SELECT BINARY_CHECKSUM(0x32) -- 50
SELECT BINARY_CHECKSUM(0x320) -- 16! -- it's treating this as having a different number or ordering of bytes
SELECT BINARY_CHECKSUM(0x3200) -- 50
SELECT BINARY_CHECKSUM(0x32000) -- 16
SELECT BINARY_CHECKSUM(0x0032) -- 50
SELECT BINARY_CHECKSUM(0x00032) -- 50
SELECT BINARY_CHECKSUM(0x000000000032) -- 50
but again, this only helps you see that the hexadecimal representation of the binary data isn't going to work exactly the way it would seem. The point is, your primary key is going to be based on the BINARY_CHECKSUMs of the data instead of any particular format/representation of the data. Normally that's a good thing, but with binary data (and padding) it becomes a lot trickier. Even then, in my example above the BINARY_CHECKSUM of both rows will be exactly the same (SELECT BINARY_CHECKSUM(c0) FROM sql_table1 will output 50 for both rows). Weird - a little further testing shows that any different number of leading 0s that fits into the column length will bypass the unique check, even though the checksum is the same (e.g. VALUES (0x32), (0x032), (0x0032) etc.).
This only gets worse if you start throwing different versions of SQL Server into the mix (per MSDN documentation).
What you should do for PK/Unique design on a table is figure out what context will make sense of this data - an order number, a file reference, a timestamp, a device ID, some other business or logical identifier, etc.... If nothing else, pseudokey it with an IDENTITY column.
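A minimal sketch of that last suggestion (the column names are hypothetical), keeping the binary value as an ordinary column instead of the key:
CREATE TABLE sql_table1 (
    id int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate/pseudokey
    c0 varbinary(25) NOT NULL                   -- binary payload, no uniqueness enforced on it
);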
tl;dr:
How should I check the bounds on my SqlParameter values before attempting to put them in the database?
Longer version:
So, I have these dynamically generated SQL statements where I pass in a bunch of SqlParameters.
The way we declare SqlParameters is just
new SqlParameter("fieldName", value)
and we let the runtime figure out what the dbtype is.
That said, occasionally the update / insert statement fails, and we'd like to determine which field is too big (since our server only tells us that a field update failed, not WHICH one) by doing bounds checking. That is, we can't put a two-digit number in a column which only allows one digit (say a decimal(1,0), for example).
We have the schema of the columns in memory (information_schema.columns ftw), so we could just try to do bounds checking on the SqlParameter value, but since the value is an object and not even necessarily a numeric type, how should I check that a value is in the range?
Or am I making the problem too hard and instead should have supplied the precision and scale when constructing the SqlParameters to begin with? Or even better, should we be using types that reflect what the columns in the database are?
Update:
Setting the precision / scale doesn't seem to have any consequence as seen in this code:
decimal d = 10.0M * (decimal)Math.Pow(10, 6);
SqlParameter p = new SqlParameter("someparam", SqlDbType.Decimal);
p.Precision = (byte)1;
p.Scale = (byte)0;
p.Value = d;
Console.WriteLine(p.SqlValue); // Doesn't throw an error; I would think the SQL value would be at most 10, not 10 million.
It seems that SqlParameter does not validate upon the Value property being set. And DataColumn does not allow for specifying either Precision or Scale, so it is not terribly useful here. However, there is a way:
Using the collection of schema info that you already have, dynamically create an array of SqlMetaData based on the size of the schema collection and populate it with the column name and size data:
SqlMetaData[] _TempColumns = new SqlMetaData[_SchemaCollection.Count];
for (int _Index = 0; _Index < _SchemaCollection.Count; _Index++)
{
    switch (_SchemaCollection[_Index].DataType)
    {
        case "decimal":
            _TempColumns[_Index] = new SqlMetaData(
                _SchemaCollection[_Index].Name,
                SqlDbType.Decimal,
                (byte)_SchemaCollection[_Index].Precision,
                (byte)_SchemaCollection[_Index].Scale
            );
            break;
        // case "varchar": ... and so on for the other data types
    }
}
Create a new SqlDataRecord using the SqlMetaData[] from step 1:
SqlDataRecord _TempRow = new SqlDataRecord(_TempColumns);
loop through _TempRow calling the appropriate Set method for each position, in a try / catch:
string _DataAintRight;
try
{
    _TempRow.SetDecimal(_Index, (decimal)_SchemaCollection[_Index].Value);
}
catch
{
    _DataAintRight = _SchemaCollection[_Index].Name;
    break;
}
NOTES:
This will only do the same validation that passing params to a proc would do. Meaning, it will silently truncate values that are too long, such as too many digits to the right of a decimal point, or a string that exceeds the max size.
Fixed-length numeric types should already be in their equivalent .Net types (i.e. SMALLINT value in an Int16 variable or property) and hence are already pre-verified. If this is indeed the case then there is no additional benefit from testing them. But if they currently reside in a more generic container (a larger Int type or even a string), then testing here is appropriate.
If you need to know that a string will be truncated, that has to be tested separately (not via SqlMetaData); in the loop and switch, just test the length of the string in that case.
Regardless of any of this testing stuff, it is best to not create parameters by having .Net guess the type via: new SqlParameter("fieldName", value) or even _Command.Parameters.AddWithValue(). So regarding the question of if you "should have supplied the precision and scale when constructing the SqlParameters to begin with", absolutely yes.
Another option (which I can elaborate on tomorrow when I will have time to update this) is to validate everything as if there were no built-in containers that are supposed to be reflections of the real database/provider datatypes. So, there are two main considerations that will drive the implementation:
Is the source data currently strongly typed or is it all serialized as strings?
and
Is there a need to know if the value will be truncated (specifically in cases where the value would otherwise be silently truncated, not causing an error, which could lead to unexpected behavior)? The issue here is that inserting data into a field in a table that exceeds the specified max length will cause a string or binary data will be truncated error. But when passing data to a parameter (i.e. not direct to a table), that data will be truncated without causing an error. Sometimes this is ok, but sometimes this can lead to a situation where an input parameter has been specified incorrectly (or was correct but then the field was expanded and the parameter was never updated to match the new length) and might be chopping off the ends of some values, which goes undetected until a customer reports that "something doesn't look quite right on a report, and oh, by the way, this has been happening off and on for maybe four or five months now, but I have been really busy and kept forgetting to mention it, maybe it's been nine months, I can't remember, but yeah, something's wrong". I mean, how often do we test our code by passing in the max value for each parameter to make sure that the system can handle it? A short T-SQL illustration of this parameter-vs-table difference follows.
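A small sketch of that difference, using a throwaway temp table and a variable standing in for a parameter (all names here are hypothetical):
CREATE TABLE #demo (val varchar(10));

DECLARE @p varchar(5) = 'abcdefghij';  -- silently truncated to 'abcde' on assignment
INSERT INTO #demo (val) VALUES (@p);   -- succeeds; the table now holds 'abcde'

INSERT INTO #demo (val) VALUES ('abcdefghijklmnop');
-- fails: String or binary data would be truncated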
If the source data is in the appropriate .Net types:
There are several that do not need to be checked since fixed-length numeric types are the same between .Net and SQL Server. The ones that are pre-validated simply by existing in their respective .Net types are:
bool -> BIT
byte -> TINYINT
Int16 -> SMALLINT
Int32 -> INT
Int64 -> BIGINT
Double -> FLOAT
Single -> REAL
Guid -> UNIQUEIDENTIFIER
There are some that need to be checked only for truncation (if that is a concern) as their values should always be in the same range as their SQL Server counterparts. Keep in mind that here we are talking about strict truncation when the values are passed to parameters of a smaller scale, but will actually round up (well, at 5) when inserted directly into a column having a smaller scale. For example, sending in a DateTime value that is accurate to 5 decimal places will truncate the 3 right-most numbers when passed to a parameter defined as DATETIME2(2).
DateTimeOffset -> DATETIMEOFFSET(0 - 7)
DateTime -> DATETIME2(0 - 7)
DateTime -> DATE : 0001-01-01 through 9999-12-31 (no time)
TimeSpan -> TIME(0 - 7) : 00:00:00.0000000 through 23:59:59.9999999
There are some that need to be checked to make sure that they are not out of the valid range for the SQL Server datatype as out of range would cause an exception. They also possibly need to be checked for truncation (if that is a concern). Keep in mind that here we are talking about strict truncation when the values are passed to parameters of a smaller scale, but will actually round up (well, at 5) when inserted directly into a column having a smaller scale. For example, sending in a DateTime value will lose all seconds and fractional seconds when passed to a parameter defined as SMALLDATETIME.
DateTime -> DATETIME : 1753-01-01 through 9999-12-31, 00:00:00.000 through 23:59:59.997
DateTime -> SMALLDATETIME : 1900-01-01 through 2079-06-06, 00:00 through 23:59 (no seconds)
Decimal -> MONEY : -922,337,203,685,477.5808 to 922,337,203,685,477.5807
Decimal -> SMALLMONEY : -214,748.3648 to 214,748.3647
Decimal -> DECIMAL : range = -9[digits = (Precision - Scale)] to 9[digits = (Precision - Scale)], truncation depends on defined Scale
The following string types will silently truncate when passed to a parameter with a max length that is less than the length of their value, but will error with String or binary data would be truncated if directly inserted into a column with a max length that is less than the length of their value:
byte[] -> BINARY
byte[] -> VARBINARY
string -> CHAR
string -> VARCHAR
string -> NCHAR
string -> NVARCHAR
The following is tricky as the true validation requires knowing more about the options it was created with in the database.
string -> XML -- By default an XML field is untyped and is hence very lenient regarding "proper" XML syntax. However, that behavior can be altered by associating an XML Schema Collection (1 or more XSDs) with the field for validation (see also: Compare Typed XML to Untyped XML). So true validation of an XML field would include getting that info, if it exists, and if so, checking against those XSDs. At the very least it should be well-formed XML (i.e. '<b>' will fail, but '<b />' will succeed).
For the above types, the pre-validated types can be ignored. The rest can be tested in a switch(DestinationDataType) structure:
Types that need to be validated for ranges can be done as follows
case "smalldatetime":
if ((_Value < range_min) || (_Value > range_max))
{
_ThisValueSucks = true;
}
break;
Numeric/DateTime truncation, if being tested for, might be best done by calling ToString() and using IndexOf(".") for most types, or IndexOf(":00.0") for DATE and SMALLDATETIME, to find the number of digits to the right of the decimal point (or starting at the "seconds" for SMALLDATETIME).
String truncation, if being tested for, is a simple matter of testing the length.
Decimal range can be tested either numerically:
if ((Math.Floor(_Value) < -999) || (Math.Floor(_Value) > 999))
or:
if (Math.Abs(Math.Floor(_Value)).ToString().Length <= DataTypeMaxSize)
Xml
as XmlDocument is pre-validated outside of potential XSD validation associated with the XML field
as String could first be used to create an XmlDocument, which only leaves any potential XSD validation associated with the XML field
If the source data is all string:
Then they all need to be validated. For these you would first use TryParse methods associated to each type. Then you can apply the rules as noted above for each type.
I have a bunch of records in several tables in a database that have a "process number" field, that's basically a number, but I have to store it as a string both because of some legacy data that has stuff like "89a" as a number and some numbering system that requires that process numbers be represented as number/year.
The problem arises when I try to order the processes by number. I get stuff like:
1
10
11
12
And the other problem is when I need to add a new process. The new process' number should be the biggest existing number incremented by one, and for that I would need a way to order the existing records by number.
Any suggestions?
Maybe this will help.
Essentially:
SELECT process_order FROM your_table ORDER BY process_order + 0 ASC
Can you store the numbers as zero padded values? That is, 01, 10, 11, 12?
I would suggest creating a new numeric field used only for ordering and updating it from a trigger.
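A rough sketch of that idea, using MySQL-style trigger syntax and hypothetical table/column names (casting a string like '89a' to UNSIGNED keeps the leading digits):
ALTER TABLE processes ADD COLUMN process_sort INT;

CREATE TRIGGER processes_sort_bi
BEFORE INSERT ON processes
FOR EACH ROW
SET NEW.process_sort = CAST(NEW.process_number AS UNSIGNED);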
Can you split the data into two fields?
Store the 'process number' as an int and the 'process subtype' as a string.
That way:
you can easily get the MAX processNumber and increment it when you need to generate a new number
you can ORDER BY processNumber ASC, processSubtype ASC to get the correct order, even if multiple records have the same base number with different years/letters appended
when you need the 'full' number you can just concatenate the two fields
Would that do what you need?
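For example, a rough sketch of that split (names are hypothetical; the concatenation syntax varies by DBMS):
CREATE TABLE processes (
    process_number  INT         NOT NULL,
    process_subtype VARCHAR(10) NOT NULL DEFAULT ''
);

-- next number:
SELECT MAX(process_number) + 1 FROM processes;

-- ordered listing, reassembling the 'full' number for display:
SELECT CONCAT(process_number, process_subtype) AS full_number
FROM   processes
ORDER  BY process_number ASC, process_subtype ASC;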
Given that your process numbers don't seem to follow any fixed patterns (from your question and comments), can you construct/maintain a process number table that has two fields:
create table process_ordering ( processNumber varchar(N), processOrder int )
Then select all the process numbers from your tables and insert into the process number table. Set the ordering however you want based on the (varying) process number formats. Join on this table, order by processOrder and select all fields from the other table. Index this table on processNumber to make the join fast.
select my_processes.*
from my_processes
inner join process_ordering on my_processes.processNumber = process_ordering.processNumber
order by process_ordering.processOrder
It seems to me that you have two tasks here.
• Convert the strings to numbers by legacy format / strip off the junk
• Order the numbers
If you have a practical way of introducing string-parsing regular expressions into your process (and your issue has enough volume to be worth the effort), then I'd
• Create a reference table such as
CREATE TABLE tblLegacyFormatRegularExpressionMaster(
LegacyFormatId int,
LegacyFormatName varchar(50),
RegularExpression varchar(max)
)
• Then, with a way of invoking the regular expressions, such as the CLR integration in SQL Server 2005 and above (the .NET Common Language Runtime integration that allows calls to compiled .NET methods from within SQL Server as ordinary (Microsoft-extended) T-SQL), you should be able to solve your problem.
• See
http://www.codeproject.com/KB/string/SqlRegEx.aspx
I apologize if this is way too much overhead for your problem at hand.
Suggestion:
• Make your column a fixed width text (i.e. CHAR rather than VARCHAR).
• Pad the existing values with enough leading zeros to fill each column and a trailing space(s) where the values do not end in 'a' (or whatever).
• Add a CHECK constraint (or equivalent) to ensure new values conform to the pattern e.g. something like
CHECK (process_number LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][ab ]')
• In your insert/update stored procedures (or equivalent), pad any incoming values to fit the pattern.
• Remove the leading/trailing zeros/spaces as appropriate when displaying the values to humans.
Another advantage of this approach is that the incoming values '1', '01', '001', etc would all be considered to be the same value and could be covered by a simple unique constraint in the DBMS.
BTW I like the idea of splitting the trailing 'a' (or whatever) into a separate column, however I got the impression the data element in question is an identifier, and therefore it would not be appropriate to split it.
You need to cast your field as you're selecting. I'm basing this syntax on MySQL - but the idea's the same:
select * from table order by cast(field AS UNSIGNED);
Of course UNSIGNED could be SIGNED if required.
Following up this question: "Database enums - pros and cons", I'd like to know which database systems support enumeration data types, and a bit of detail on how they do it (e.g. what is stored internally, what are the limits, query syntax implications, indexing implications, ...).
Discussion of use cases or the pros and cons should take place in the other questions.
I know that MySQL does support ENUM (see the sketch after this list):
the data type is implemented as an integer value with associated strings
you can have a maximum of 65,535 elements for a single enumeration
each string has a numerical equivalent, counting from 1, in the order of definition
the numerical value of the field is accessible via "SELECT enum_col+0"
in non-strict SQL mode, assigning not-in-list values does not necessarily result in an error, but rather a special error value is assigned instead, having the numerical value 0
sorting occurs in numerical order (i.e. order of definition), not in alphabetical order of the string equivalents
assignment works either via the value string or via the index number
this: ENUM('0','1','2') should be avoided, because '0' would have integer value 1
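A short MySQL sketch of the points above (the table and values are made up):
CREATE TABLE shirts (
    name varchar(40),
    size ENUM('x-small', 'small', 'medium', 'large', 'x-large')
);

INSERT INTO shirts (name, size) VALUES ('dress shirt', 'large');

-- numeric equivalent (1-based, in definition order):
SELECT size + 0 FROM shirts;

-- sorts by definition order, not alphabetically:
SELECT name, size FROM shirts ORDER BY size;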
PostgreSQL supports ENUM from 8.3 onwards. For older versions, you can simulate an ENUM by doing something like this:
CREATE TABLE persons (
person_id int not null primary key,
favourite_colour varchar(255) NOT NULL,
CHECK (favourite_colour IN ('red', 'blue', 'yellow', 'purple'))
);
You could also have:
CREATE TABLE colours (
colour_id int not null primary key,
colour varchar(255) not null
);
CREATE TABLE persons (
person_id int not null primary key,
favourite_colour_id integer NOT NULL references colours(colour_id)
);
which requires a join when you want to know the favourite colour, but has the advantage that you can add colours simply by adding an entry to the colours table, without needing to change the schema each time. You could also add attributes to a colour, like the HTML code or the RGB values.
You could also create your own type which acts as an enum, but I don't think it would be any faster than the varchar and the CHECK.
Oracle doesn't support ENUM at all.
AFAIK, neither IBM DB2 nor IBM Informix Dynamic Server support ENUM types.
Unlike what mat said, PostgreSQL does support ENUM (since version 8.3, the latest one at the time of writing):
essais=> CREATE TYPE rcount AS ENUM (
essais(> 'one',
essais(> 'two',
essais(> 'three'
essais(> );
CREATE TYPE
essais=>
essais=> CREATE TABLE dummy (id SERIAL, num rcount);
NOTICE: CREATE TABLE will create implicit sequence "dummy_id_seq" for serial column "dummy.id"
CREATE TABLE
essais=> INSERT INTO dummy (num) VALUES ('one');
INSERT 0 1
essais=> INSERT INTO dummy (num) VALUES ('three');
INSERT 0 1
essais=> INSERT INTO dummy (num) VALUES ('four');
ERROR: invalid input value for enum rcount: "four"
essais=>
essais=> SELECT * FROM dummy WHERE num='three';
id | num
----+-------
2 | three
4 | three
There are functions which work specifically on enums.
Indexing works fine on enum types.
According to the manual, implementation is as follows:
An enum value occupies four bytes on disk. The length of an enum value's textual label is limited by the NAMEDATALEN setting compiled into PostgreSQL; in standard builds this means at most 63 bytes.
Enum labels are case sensitive, so 'happy' is not the same as 'HAPPY'. Spaces in the labels are significant, too.
MSSQL doesn't support ENUM.
When you use Entity Framework 5, you can use enums (look at: Enumeration Support in Entity Framework and EF5 Enum Types Walkthrough), but even then the values are stored as int in the database.
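So on the SQL Server side an enum usually ends up as a plain integer column, optionally guarded by a CHECK constraint in the same spirit as the pre-8.3 PostgreSQL workaround above. A hedged sketch with made-up names:
CREATE TABLE orders (
    id     INT IDENTITY(1,1) PRIMARY KEY,
    status TINYINT NOT NULL
        CONSTRAINT chk_orders_status CHECK (status IN (0, 1, 2))  -- 0/1/2 map to the .NET enum members
);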