Concurrency control for table generated Transaction Id - sql-server

I have a system in which I generate transaction ids based on a table. The numbers must follow a specific format, so database sequences are not an option. Moreover, the number of transaction types is variable, which would mean creating a variable number of sequences. The table has the following structure:
public class TransactionSequence : BaseEntity<int>, IAggregateRoot
{
    public int Year { get; set; }
    public string Prefix { get; set; }
    public long Sequence { get; set; }
    public string Service { get; set; }
    public int Length { get; set; }
    public bool IsCurrent { get; set; }
}
The code for the service which generates the number is as shown below:
public class NumberingService : INumberingService
{
    public static int yearLength = 2;
    public static int monthLength = 2;
    public static int dayLength = 2;

    private readonly IRepository<TransactionSequence> _repository;
    private readonly NumberingConfiguration _numbering;

    public NumberingService(IRepository<TransactionSequence> repository,
        NumberingConfiguration numbering)
    {
        _repository = repository;
        _numbering = numbering;
    }

    public async Task<Result<string>> GetNextNumberAsync(string service, int maxlength, UserEntity sysUser, string prefix = "")
    {
        string transactionId = string.Empty;
        try
        {
            var spec = new CurrentYearNumberingSpec(service);
            var sequence = await _repository.GetBySpecAsync(spec);
            if (sequence == null)
            {
                await AddServiceNumberingAsync(service, maxlength, sysUser, prefix);
                sequence = await _repository.GetBySpecAsync(spec);
            }
            sequence.Sequence = sequence.Sequence + 1;
            await _repository.UpdateAsync(sequence);

            int month = DateTime.Now.Month;
            int day = DateTime.Now.Day;
            var length = GetLength(sequence);
            transactionId = sequence.Prefix
                + (sequence.Year % 100).ToString("D2")
                + month.ToString("D2")
                + day.ToString("D2")
                + sequence.Sequence.ToString("D" + length);
        }
        catch (Exception ex)
        {
            return Result<string>.Error(ex.Message);
        }
        return Result<string>.Success(transactionId, "Retrieved the next number in the sequence successfully!");
    }

    private static int GetLength(TransactionSequence sequence)
    {
        return sequence.Length - sequence.Prefix.Length - dayLength - monthLength - yearLength;
    }
}
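For example, with Prefix = "INV" and Length = 16, a number generated on 2024-06-15 would come out as INV2406150000042: the prefix, then yyMMdd, then the sequence zero-padded to the remaining 7 digits.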
Note: I am only showing an excerpt of the code that contains the relevant info!
The problem:
Since the system is highly concurrent, each request tries to obtain a transaction id when it is submitted. This puts high contention on the currently active TransactionSequence row, because every id is generated by updating that row, which means there will inevitably be locking.
Solutions I tried:
1- Optimistic concurrency via ROWVERSION with retries. This had the worst performance: optimistic concurrency only makes sense when collisions are rare, and here a collision is almost guaranteed. Either that or I did not implement it correctly! (A minimal sketch of what I tried follows this list.)
2- Locking via SemaphoreSlim. This had acceptable performance, but it would not scale in load-balanced scenarios.
3- Distributed locking via Redis. This performed about the same as SemaphoreSlim, but still not at the level I am looking for!
4- Queuing via RabbitMQ with a prefetch size of 1. This performed better than the aforementioned solutions, but I still wonder if there is an optimal one!
5- Using the HiLo algorithm. I have not implemented this, but I have read about it in the link below:
CQS with Database-Generated Ids
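For reference, the optimistic-concurrency variant (solution 1) looked roughly like this. This is a minimal sketch only, not the exact production code: the AppDbContext and DbSet names, the RowVersion concurrency token, and the retry count of 5 are illustrative assumptions (my real code goes through the repository abstraction shown above).

public async Task<long> ReserveNextSequenceAsync(AppDbContext db, string service)
{
    // Assumes TransactionSequence carries a [Timestamp] RowVersion property, so
    // SaveChangesAsync throws DbUpdateConcurrencyException on a lost update.
    for (var attempt = 0; attempt < 5; attempt++)
    {
        var seq = await db.TransactionSequences
            .FirstAsync(s => s.Service == service && s.IsCurrent);
        seq.Sequence++;
        try
        {
            await db.SaveChangesAsync();
            return seq.Sequence; // reserved without collision
        }
        catch (DbUpdateConcurrencyException)
        {
            db.Entry(seq).State = EntityState.Detached; // drop the stale row and retry
        }
    }
    throw new InvalidOperationException("Could not reserve a sequence number after 5 retries.");
}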
I want to know if there is a better or a well-known solution to this problem.
My Environment:
ASP .NET CORE 6
EF CORE 6
SQL SERVER 2019
I hope this was clear enough, and thanks in advance!

Related

WPF ItemsControl doesn't update when I add a new item in ItemsSource [duplicate]

I've got a WCF service that passes around status updates via a struct like so:
[DataContract]
public struct StatusInfo
{
    [DataMember] public int Total;
    [DataMember] public string Authority;
}
...
public StatusInfo GetStatus() { ... }
I expose a property in a ViewModel like this:
public class ServiceViewModel : ViewModel
{
    private StatusInfo _currentStatus;

    public StatusInfo CurrentStatus
    {
        get { return _currentStatus; }
        set
        {
            _currentStatus = value;
            OnPropertyChanged( () => CurrentStatus );
        }
    }
}
And XAML like so:
<TextBox Text="{Binding CurrentStatus.Total}" />
When I run the app I see errors in the output window indicating that the Total property cannot be found. I checked and double-checked that I typed it correctly. Then it occurred to me that the errors specifically say the 'property' cannot be found. Adding a property to the struct made it work just fine. But it seems odd to me that WPF can't handle one-way binding to fields. Syntactically you access them the same way in code, and it seems silly to have to create a custom view model just for the StatusInfo struct. Have I missed something about WPF binding? Can you bind to a field, or is property binding the only way?
Binding generally doesn't work to fields. Most binding is based, in part, on the ComponentModel PropertyDescriptor model, which (by default) works on properties. This enables notifications, validation, etc (none of which works with fields).
For more reasons than I can go into, public fields are a bad idea. They should be properties, fact. Likewise, mutable structs are a very bad idea. Not least, making this a class protects against unexpected data loss (commonly associated with mutable structs). This should be a class:
[DataContract]
public class StatusInfo
{
    [DataMember] public int Total { get; set; }
    [DataMember] public string Authority { get; set; }
}
It will now behave as you think it should. If you want it to be an immutable struct, that would be OK (but data-binding would be one-way only, of course):
[DataContract]
public struct StatusInfo
{
    [DataMember] public int Total { get; private set; }
    [DataMember] public string Authority { get; private set; }

    public StatusInfo(int total, string authority) : this()
    {
        Total = total;
        Authority = authority;
    }
}
However, I would first question why this is a struct in the first place. It is very rare to write a struct in .NET languages. Keep in mind that the WCF "mex" proxy layer will create it as a class at the consumer anyway (unless you use assembly sharing).
In answer to the "why use structs" reply ("unknown (google)"):
If that is a reply to my question, it is wrong in many ways. First, value types as variables are commonly allocated (first) on the stack. If they are pushed onto the heap (for example in an array/list) there isn't much difference in overhead from a class - a small bit of object header plus a reference. Structs should always be small. Something with multiple fields will be over-sized, and will either murder your stack or just cause slowness due to the blitting. Additionally, structs should be immutable - unless you really know what you are doing.
Pretty much anything that represents an object should be immutable.
If you are hitting a database, the speed of struct vs class is a non-issue compared to going out-of-process and probably over the network. Even if it is a bit slower, that means nothing compared to the point of getting it right - i.e. treating objects as objects.
As some metrics over 1M objects:
struct/field: 50ms
class/property: 229ms
based on the following (the speed difference is in object allocation, not field vs property). So about 5x slower, but still very, very quick. Since this is not going to be your bottleneck, don't prematurely optimise this!
using System;
using System.Collections.Generic;
using System.Diagnostics;

struct MyStruct
{
    public int Id;
    public string Name;
    public DateTime DateOfBirth;
    public string Comment;
}

class MyClass
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime DateOfBirth { get; set; }
    public string Comment { get; set; }
}

static class Program
{
    static void Main()
    {
        DateTime dob = DateTime.Today;
        const int SIZE = 1000000;

        Stopwatch watch = Stopwatch.StartNew();
        List<MyStruct> s = new List<MyStruct>(SIZE);
        for (int i = 0; i < SIZE; i++)
        {
            s.Add(new MyStruct { Comment = "abc", DateOfBirth = dob,
                Id = 123, Name = "def" });
        }
        watch.Stop();
        Console.WriteLine("struct/field: "
            + watch.ElapsedMilliseconds + "ms");

        watch = Stopwatch.StartNew();
        List<MyClass> c = new List<MyClass>(SIZE);
        for (int i = 0; i < SIZE; i++)
        {
            c.Add(new MyClass { Comment = "abc", DateOfBirth = dob,
                Id = 123, Name = "def" });
        }
        watch.Stop();
        Console.WriteLine("class/property: "
            + watch.ElapsedMilliseconds + "ms");
        Console.ReadLine();
    }
}
I can only guess why they only support properties: perhaps because it seems to be a universal convention in the .NET framework never to expose mutable fields (probably to safeguard binary compatibility), and they somehow expected all programmers to follow the same convention.
Also, although fields and properties are accessed with the same syntax, data binding uses reflection, and (so I've heard) reflection must be used differently to access fields than to access properties.
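To illustrate that last point, a small sketch of my own (not from the original answer): fields and properties go through different reflection APIs, and the TypeDescriptor infrastructure that binding builds on only enumerates properties.

using System;
using System.ComponentModel;

var type = typeof(StatusInfo);
var field = type.GetField("Total");                // FieldInfo - non-null for the field-based struct
var prop = type.GetProperty("Total");              // PropertyInfo - null for the field-based struct
var bindable = TypeDescriptor.GetProperties(type); // what binding enumerates: properties only
Console.WriteLine($"field: {field != null}, property: {prop != null}, bindable: {bindable.Count}");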

Flink window function getResult not fired

I am trying to use event time in my Flink job, and I am using BoundedOutOfOrdernessTimestampExtractor to extract timestamps and generate watermarks.
But some of my input Kafka topics have sparse streams: a topic can have no data for a long time, which means getResult in my AggregateFunction is never called at all. I can see data going into the add function.
I have set getEnv().getConfig().setAutoWatermarkInterval(1000L);
I tried
eventsWithKey
    .keyBy(entry -> (String) entry.get(key))
    .window(TumblingEventTimeWindows.of(Time.minutes(windowInMinutes)))
    .allowedLateness(WINDOW_LATENESS)
    .aggregate(new CountTask(basicMetricTags, windowInMinutes))
I also tried a session window:
eventsWithKey
    .keyBy(entry -> (String) entry.get(key))
    .window(EventTimeSessionWindows.withGap(Time.seconds(30)))
    .aggregate(new CountTask(basicMetricTags, windowInMinutes))
All the watermark metrics show No Watermark.
How can I get Flink to ignore the fact that there is no watermark?
FYI, this is commonly referred to as the "idle source" problem. This occurs because whenever a Flink operator has two or more inputs, its watermark is the minimum of the watermarks from its inputs. If one of those inputs stalls, its watermark no longer advances.
Note that Flink does not have per-key watermarking -- a given operator is typically multiplexed across events for many keys. So long as some events are flowing through a given task's input streams, its watermark will advance, and event time timers for idle keys will still fire. For this "idle source" problem to occur, a task has to have an input stream that has become completely idle.
If you can arrange for it, the best solution is to have your data sources include keepalive events. This will allow you to advance your watermarks with confidence, knowing that the source is simply idle, rather than, for example, offline.
If that's not possible, and if you have some sources that aren't idle, then you could put a rebalance() in front of the BoundedOutOfOrdernessTimestampExtractor (and before the keyBy), so that every instance continues to receive some events and can advance its watermark. This comes at the expense of an extra network shuffle.
Perhaps the most commonly used solution is to use a watermark generator that detects idleness and artificially advances the watermark based on a processing time timer. ProcessingTimeTrailingBoundedOutOfOrdernessTimestampExtractor is an example of that.
A new watermark with idleness capability has been introduced. Flink will ignore idle inputs while calculating the minimum watermark, so the single partition that does have data will still be considered.
https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarksWithIdleness.html
I have the same issue - a source that may be inactive for a long time.
The solution below is based on WatermarksWithIdleness.
It is a standalone Flink job that demonstrates the concept.
package com.demo.playground.flink.sleepysrc;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.eventtime.WatermarksWithIdleness;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.time.Duration;

public class SleepyJob {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        final EventGenerator eventGenerator = new EventGenerator();
        WatermarkStrategy<Event> strategy = WatermarkStrategy.
                <Event>forBoundedOutOfOrderness(Duration.ofSeconds(5)).
                withIdleness(Duration.ofSeconds(Constants.IDLE_TIME_SEC)).
                withTimestampAssigner((event, timestamp) -> event.timestamp);
        final DataStream<Event> events = env.addSource(eventGenerator).assignTimestampsAndWatermarks(strategy);
        KeyedStream<Event, String> eventStringKeyedStream = events.keyBy((Event event) -> event.id);
        WindowedStream<Event, String, TimeWindow> windowedStream = eventStringKeyedStream.window(EventTimeSessionWindows.withGap(Time.milliseconds(Constants.SESSION_WINDOW_GAP)));
        windowedStream.allowedLateness(Time.milliseconds(1000));
        SingleOutputStreamOperator<Object> result = windowedStream.process(new ProcessWindowFunction<Event, Object, String, TimeWindow>() {
            @Override
            public void process(String s, Context context, Iterable<Event> events, Collector<Object> collector) {
                int counter = 0;
                for (Event e : events) {
                    Utils.print(++counter + ") inside process: " + e);
                }
                Utils.print("--- Process Done ----");
            }
        });
        result.print();
        env.execute("Sleepy flink src demo");
    }

    private static class Event {
        public Event(String id) {
            this.timestamp = System.currentTimeMillis();
            this.eventData = "not_important_" + this.timestamp;
            this.id = id;
        }

        @Override
        public String toString() {
            return "Event{" +
                    "id=" + id +
                    ", timestamp=" + timestamp +
                    ", eventData='" + eventData + '\'' +
                    '}';
        }

        public String id;
        public long timestamp;
        public String eventData;
    }

    private static class EventGenerator implements SourceFunction<Event> {
        @Override
        public void run(SourceContext<Event> ctx) throws Exception {
            /**
             * Here is the sleepy source - after NUM_OF_EVENTS events are collected, the code goes to a SHORT_SLEEP_TIME sleep.
             * We would like to detect this inactivity and FIRE the window.
             */
            int counter = 0;
            while (running) {
                String id = Long.toString(System.currentTimeMillis());
                Utils.print(String.format("Generating %d events with id %s", 2 * Constants.NUM_OF_EVENTS, id));
                while (counter < Constants.NUM_OF_EVENTS) {
                    Event event = new Event(id);
                    ctx.collect(event);
                    counter++;
                    Thread.sleep(Constants.VERY_SHORT_SLEEP_TIME);
                }
                // here we create a delay:
                // a time of inactivity where
                // we would like to FIRE the window
                Thread.sleep(Constants.SHORT_SLEEP_TIME);
                counter = 0;
                while (counter < Constants.NUM_OF_EVENTS) {
                    Event event = new Event(id);
                    ctx.collect(event);
                    counter++;
                    Thread.sleep(Constants.VERY_SHORT_SLEEP_TIME);
                }
                Thread.sleep(Constants.LONG_SLEEP_TIME);
            }
        }

        @Override
        public void cancel() {
            this.running = false;
        }

        private volatile boolean running = true;
    }

    private static final class Constants {
        public static final int VERY_SHORT_SLEEP_TIME = 300;
        public static final int SHORT_SLEEP_TIME = 8000;
        public static final int IDLE_TIME_SEC = 5;
        public static final int LONG_SLEEP_TIME = SHORT_SLEEP_TIME * 5;
        public static final long SESSION_WINDOW_GAP = 60 * 1000;
        public static final int NUM_OF_EVENTS = 4;
    }

    private static final class Utils {
        public static void print(Object obj) {
            System.out.println(new java.util.Date() + " > " + obj);
        }
    }
}
For others: if you're using Kafka, make sure there's data coming out of all of your topics' partitions.
I know it sounds dumb, but in my case I had a single source and the problem was still happening, because I was testing with very little data in a single Kafka topic (single source) that had 10 partitions. The dataset was so small that some of the topic's partitions had nothing to emit and, although I had only one source (the one topic), Flink did not advance the watermark.
The moment I switched my source to a topic with a single partition, the watermark started to advance.

Dapper - two columns to be switched over time (controlled by config value)

I have the following problem:
Several tables with "data"/"token_data" columns whose contents switch over time.
Phases:
In the current phase 0, there is only a "data" column (clear data).
In phase 1 there will be "data" and "token_data" columns.
In phase 2, there will be "token_data" and "clear_data" columns.
In the last phase 3, there should be only a "data" column (by that time it should be tokenized).
We currently have all Dapper/db models with phase 0 in mind.
Is there a way to prepare the Dapper models for all 4 phases? I was looking for an OptionalColumn attribute, but couldn't find one.
Ideally there would be a global config switch that would control which Dapper model property represents the tokenized "data" column.
Like:
// Not good
private string _name;

[Column("Name")]
public string Name
{
    get { return AppSettings.TokenizationEnabled ? this.TokenName : _name; }
    set { _name = value; }
}
It's not 100% clear what you need to do - for example, why you can't just create a class with all the properties and, depending on the phase, return the correct data for that phase. Something like:
class MyData
{
    public int Phase;
    public string Data { private get; set; }
    public string Token_Data { private get; set; }
    public string Clear_Data { private get; set; }

    public string GetData()
    {
        switch (Phase)
        {
            case 1: return Token_Data;
            case 2: return Clear_Data;
            default: return Data;
        }
    }
}
Aside from that, I think the Dapper feature called "Type Switching per Row" may help you: https://github.com/StackExchange/Dapper#type-switching-per-row
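For what it's worth, here is a hedged sketch of how the class above could be used across phases; the connection string, table name, and currentPhase variable are illustrative. It relies on Dapper's usual behavior of mapping result columns to members by name and leaving unmatched members at their defaults, so you select only the columns that exist in the current phase.

using Dapper;
using Microsoft.Data.SqlClient;

using var conn = new SqlConnection(connectionString);
// Phase 1: only "data" and "token_data" exist; Clear_Data stays null.
var rows = conn.Query<MyData>(
    "SELECT data AS Data, token_data AS Token_Data FROM MyTable");
foreach (var row in rows)
{
    row.Phase = currentPhase;        // e.g. read from the global config switch
    Console.WriteLine(row.GetData());
}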

CEP issue while checkpointing: "Could not find id for entry"

When checkpointing is turned on, a simple CEP loop pattern
private Pattern<Tuple2<Integer, SimpleBinaryEvent>, ?> alertPattern =
    Pattern.<Tuple2<Integer, SimpleBinaryEvent>>begin("start").where(checkStatusOn)
        .followedBy("middle").where(checkStatusOn).times(2)
        .next("end").where(checkStatusOn).within(Time.minutes(5));
I see failures.
SimpleBinaryEvent is
public class SimpleBinaryEvent implements Serializable {
    private int id;
    private int sequence;
    private boolean status;
    private long time;

    public SimpleBinaryEvent(int id, int sequence, boolean status, long time) {
        this.id = id;
        this.sequence = sequence;
        this.status = status;
        this.time = time;
    }

    public int getId() {
        return id;
    }

    public int getSequence() {
        return sequence;
    }

    public boolean isStatus() {
        return status;
    }

    public long getTime() {
        return time;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        SimpleBinaryEvent that = (SimpleBinaryEvent) o;
        if (getId() != that.getId()) return false;
        if (isStatus() != that.isStatus()) return false;
        if (getSequence() != that.getSequence()) return false;
        return getTime() == that.getTime();
    }

    @Override
    public int hashCode() {
        //return Objects.hash(getId(), isStatus(), getSequence(), getTime());
        int result = getId();
        result = 31 * result + (isStatus() ? 1 : 0);
        result = 31 * result + getSequence();
        result = 31 * result + (int) (getTime() ^ (getTime() >>> 32));
        return result;
    }

    @Override
    public String toString() {
        return "SimpleBinaryEvent{" +
                "id='" + id + '\'' +
                ", status=" + status +
                ", sequence=" + sequence +
                ", time=" + time +
                '}';
    }
}
failure cause:
Caused by: java.lang.Exception: Could not materialize checkpoint 2 for operator KeyedCEPPatternOperator -> Map (1/1).
... 6 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Could not find id for entry: SharedBufferEntry(ValueTimeWrapper((1,SimpleBinaryEvent{id='1', status=true, sequence=95, time=1505503380000}), 1505503380000, 0),....
I am sure I have equals() and hashCode() implemented the way they should be. I have tried Objects.hash too. In other instances I have had a CircularReference (and thus a StackOverflow) in SharedBuffer.toString(), which again points to issues with references (equality and what not). Without checkpointing turned on it works as expected. I am running on a local cluster. Is CEP production-ready?
I am using Flink 1.3.2.
Thanks a lot for trying out the library and reporting this!
The library is under active development as more and more features are added to it. 1.3 was the first release of the library with such rich semantics, so we expect to see 1) how people use it and 2) if there are any bugs. So I would say that it is not 100% production-ready, but it is not far.
Now for the problem at hand, I suppose you are using RocksDB for checkpointing, right? The reason I am assuming that is that with RocksDB, at each watermark (in event time) you deserialize the necessary state (e.g. the NFA), process some events and then serialize it again before putting it back in RocksDB.
This is not the case for the filesystem state backend, where you only serialize the state upon checkpointing, and you read and deserialize it only upon recovery. So in that case, given that you said your job works fine without checkpointing, you would only see this problem after recovering from a failure.
The root of the problem can be either that equals()/hashCode() is buggy (which does not seem to be the case), or that there is a problem in the way we serialize/deserialize the CEP state.
Could you also provide a minimal input sequence of events that causes this to happen? This would be really helpful in order to reproduce the problem.
Thanks a lot,
Kostas

'Invalid attempt to read when no data is present' - exception happens "sometimes" in Entity Framework

I get the above error sometimes during a read. The exception originates from ADO.NET's SqlDataReader whenever you try to read data before calling the Read() method. Since EF does all of this internally, I am wondering what else can cause this error. Could it be network or db connectivity?
thanks
Additional Bounty Info (GenericTypeTea):
I've got the same error after upgrading to EF Code First RC (4.1):
"Invalid attempt to read when no data
is present"
This is the code in question:
using (var context = GetContext())
{
    var query = from item in context.Preferences
                where item.UserName == userName
                where item.PrefName == "TreeState"
                select item;

    // Error on this line
    Preference entity = query.FirstOrDefault();
    return entity == null ? null : entity.Value;
}
The table structure is as follows:
Preference
{
    Username [varchar(50)]
    PrefName [varchar(50)]
    Value    [varchar(max)] Nullable
}
The table is standalone and has no relationships. This is the DbModelBuilder code:
private void ConfigurePreference(DbModelBuilder builder)
{
    builder.Entity<Preference>().HasKey(x => new { x.UserName, x.PrefName });
    builder.Entity<Preference>().ToTable("RP_Preference");
}
Exactly the same code works perfectly in CTP5. I'm guessing this is an RC bug, but any ideas of how to fix it would be appreciated.
In the RC release, this error occurs when a column contains a large amount of data. The difference between the RC and CTP5 is that in the RC you need to add the [MaxLength] attribute to properties that hold a large amount of data.
Are you re-using contexts? I would guess this is happening as a result of something you are doing within GetContext.
If GetContext() provides a stale context, in which the DataReader is closed/corrupted, I could see the above happening.
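As a purely hypothetical illustration (not from the question's code) of the kind of GetContext() that could produce this: a single context instance shared across requests shares one connection, and DbContext is not thread-safe, so concurrent queries can leave its DataReader closed or mid-read.

// Hypothetical anti-pattern: one static context reused everywhere.
private static readonly PreferenceContext _shared = new PreferenceContext();

public static PreferenceContext GetContext()
{
    return _shared; // should instead be: return new PreferenceContext();
}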
I cannot reproduce your problem on EF4.1 RC1.
POCO:
public class Preference
{
    public string UserName { get; set; }
    public string PrefName { get; set; }
    public string Value { get; set; }
}
Context:
public class PreferenceContext : DbContext
{
    public DbSet<Preference> Preferences { get; set; }

    public PreferenceContext()
        : base("Data Source=localhost;Initial Catalog=_so_question_ef41_rc;Integrated Security=SSPI;")
    {
    }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        ConfigurePreference(modelBuilder);
        base.OnModelCreating(modelBuilder);
    }

    private void ConfigurePreference(DbModelBuilder builder)
    {
        builder.Entity<Preference>().HasKey(x => new { x.UserName, x.PrefName });
        builder.Entity<Preference>().ToTable("RP_Preference");
    }
}
My little Console App:
class Program
{
    static void Main(string[] args)
    {
        string userName = "Anon";
        for (int i = 0; i < 10000; i++)
        {
            var p = GetPreference(userName);
        }
    }

    private static string GetPreference(string userName)
    {
        using (var context = new PreferenceContext())
        {
            var query = from item in context.Preferences
                        where item.UserName == userName
                        where item.PrefName == "TreeState"
                        select item;

            // Error on this line
            Preference entity = query.FirstOrDefault();
            return entity == null ? null : entity.Value;
        }
    }
}
I do 10,000 reads, and no error. You will need to post more complete code to continue.
Increase the CommandTimeout on the context.
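In EF 4.1 that can be done through the underlying ObjectContext; a sketch (the 120-second value is just an example, and context is your DbContext instance):

using System.Data.Entity.Infrastructure;

// CommandTimeout is in seconds; null means the provider default
((IObjectContextAdapter)context).ObjectContext.CommandTimeout = 120;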
I had the same issue with EF4 - in my case I was (trying to) return the list of entities within the using {} block. This is the same as you are doing in your question:
    return entity == null ? null : entity.Value;
} // end using
I moved the return to after the } and it worked.
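Roughly, the reordered version looked like this (a sketch; the query itself is elided):

string value;
using (var context = GetContext())
{
    var entity = context.Preferences.FirstOrDefault(/* ... */);
    value = entity == null ? null : entity.Value;
} // context disposed (and db resources released) before we return
return value;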
I think I had the problem because the code was in a function which had already queried the database in another using block. I suspect the table was locking but not reporting the error; ending the using block before the return released the database lock.
Steve
