TDengine stream computing does not support writing data to a table that already has data - tdengine

While using TDengine's stream computing I ran into a problem: when I want to change my rule, it is not supported, and when I create a new stream that writes into this table, that is not supported either.
How should I handle the above situation?
TDengine version: 3.0.2.2
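
For illustration, the failing case looks roughly like this (the supertable, target table, stream names, and the window rule below are made up):

-- original stream, created while the target table was still empty
CREATE STREAM avg_vol_stream INTO avg_vol AS
  SELECT _wstart, AVG(voltage) AS avg_voltage
  FROM meters
  INTERVAL(1m);

-- changing the rule means dropping and re-creating the stream,
-- but avg_vol now already holds data, so 3.0.2.2 refuses the new stream
DROP STREAM avg_vol_stream;
CREATE STREAM avg_vol_stream INTO avg_vol AS
  SELECT _wstart, AVG(voltage) AS avg_voltage
  FROM meters
  INTERVAL(5m);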

TDengine's stream computing will support writing to a table that already has data.
This is expected to be released in February 2023.

Related

Can I use Flink's filesystem connector as lookup tables?

Flink 1.13.2 (Flink SQL) on Yarn.
A bit confused - I found two (as I understand) different specifications of Filesystem connector (Ververica.com vs ci.apache.org):
https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/table/overview/#supported-connectors — Filesystem is "Bounded and Unbounded Scan, Lookup"
https://docs.ververica.com/user_guide/sql_development/connectors.html#packaged-connectors — Only JDBC marked as usable for Lookup.
Can I use the Filesystem connector (CSV) to create lookup (dimension) tables to enrich a Kafka events table? If yes, how is that possible using Flink SQL?
(I've tried simple left joins with FOR SYSTEM_TIME AS OF a.event_datetime - it works in a test environment with a small number of Kafka events, but in production I get a GC overhead limit exceeded error. I guess that's because the small CSV tables are not broadcast to the worker nodes. In Spark I used to solve this type of problem with the relevant hints.)
The filesystem connector shouldn't be used as a lookup source, because lookups need indexed access. We should update the documentation to make this clear.
The lookup (dimension) table needs to implement the LookupTableSource interface; currently only HBase, JDBC, and Hive implement it in Flink 1.13.
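
A rough sketch of the supported route in Flink SQL, using the JDBC connector as the lookup side instead of the filesystem connector (the table names, columns, and JDBC URL below are made up, and kafka_events is assumed to expose a processing-time attribute proc_time):

CREATE TABLE dim_products (
  product_id STRING,
  product_name STRING,
  PRIMARY KEY (product_id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://dim-db:3306/dims',
  'table-name' = 'products'
);

-- lookup (temporal) join: each Kafka event probes the dimension table at processing time
SELECT e.*, d.product_name
FROM kafka_events AS e
JOIN dim_products FOR SYSTEM_TIME AS OF e.proc_time AS d
  ON e.product_id = d.product_id;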

Compression and distribution of structured data in postgresql

I am creating a very large table (in the TB range) in a PostgreSQL database and, as in Greenplum, I would like to specify the compression settings and distribute the data randomly.
But in the PostgreSQL documentation I can't find any clause for compression.
Any idea how I can achieve compression and random distribution of the data in PostgreSQL?
Thanks in advance
For compression, there is only TOAST. That compresses data automatically, but only for large rows (exceeding 2000 bytes). There is no way to compress the whole table as such.
I am not sure what "random data distribution" in a table is, but if you want to distribute that table data across several devices, you have to define tablespaces for them and use hash partitioning with a partition on each tablespace.
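A minimal sketch of that layout, assuming PostgreSQL 11 or later and made-up paths and table names:

CREATE TABLESPACE disk1 LOCATION '/mnt/disk1/pgdata';
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

CREATE TABLE measurements (
    id      bigint NOT NULL,
    payload text
) PARTITION BY HASH (id);

-- one partition per tablespace, so rows are spread across the devices by hash
CREATE TABLE measurements_p0 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 2, REMAINDER 0) TABLESPACE disk1;
CREATE TABLE measurements_p1 PARTITION OF measurements
    FOR VALUES WITH (MODULUS 2, REMAINDER 1) TABLESPACE disk2;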
For compression, PostgreSQL will do this automatically for you when they go above a certain size. Compression is applied at each individual data value though - not at the full table level. Meaning that if you have a billion rows that are very narrow, they won't get compressed. Or if you have very many columns each with only a small value in it, they won't get compressed. Details about this scheme in the manual.
If you need it on the full table level, a solution is to create a TABLESPACE for those tables that you want to be compressed and point it to a compressed filesystem. As long as the filesystem still obeys fsync() and standard POSIX semantics, this should be perfectly safe. Details about this in the manual.
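For example, assuming a made-up mount point that is backed by a compressed filesystem (such as ZFS with compression enabled), the tables can simply be placed in a tablespace that lives there:

-- /mnt/compressed is assumed to be on the compressed filesystem
CREATE TABLESPACE compressed_ts LOCATION '/mnt/compressed/pgdata';

CREATE TABLE big_events (
    event_time timestamptz NOT NULL,
    payload    jsonb
) TABLESPACE compressed_ts;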
PostgreSQL is not natively distributed. If you want a distributed version of PostgreSQL where data can be spread across several nodes, and have those nodes use replication for high availability, there are some 3rd party options like:
Postgres-XL - a forked version of Postgres designed to be distributed and has some other features like MPP.
Compression does not exist in PostgreSQL. There is no way to do that. The only exception is that LOBs (Large OBjects) are systematically compressed via TOAST, which is clearly inappropriate for many LOBs (pictures like .jpg, .png...).
Read my papers about PostgreSQL limitations compared to MS SQL Server.
http://mssqlserver.fr/postgresql-vs-sql-server-mssql-part-3-very-extremely-detailed-comparison/
particularly § "17 – Data and index compression"

Copy database between two PostgreSQL servers

Is there some tool to copy a database from one PostgreSQL server to another on the fly, NOT INVOLVING BACKUPS/RESTORES? A tool which automatically keeps the database structure on the slave server in sync with the master server, preferably with a differential mode that looks at records' primary keys.
I could use replication, but the problem is that it ties the two servers together permanently, and I do not need continuous replication. I need to start it manually, and it should terminate when it finishes.
I had started to write my own .NET tool using reflection etc., but thought that maybe somebody has already written such a tool.
Replication is the term you are looking for.
There are many variations on how to do this. Start by reading the manual and then google a little.
If the whole-system replication built into recent versions of PostgreSQL isn't to your taste, then try searching for "slony" or "pg-pool" or "bucardo" (among others).

Realtime system database use

Given a .NET environment with Windows CE, can you persist thousands of records per second in a local database (SQL Server 2008 - Standard or CE)?
What are the performance issues with persisting realtime instrument data in a database versus a log file?
SQL Server 2008 Standard is more than capable of those insertion rates PROVIDED you have hardware capable of supporting it.
The question you really need to be asking is: do I require the ability to search the captured data quickly?
This SO answer might be of interest: What does database query and insert speed depend on?
The number (and width) of indexes on a table will obviously have an impact on insertion rate.
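As a sketch (table and index names are made up): keep the capture table narrow and index-free while loading, and add an index afterwards only if fast searching is required:

CREATE TABLE instrument_readings (
    reading_time datetime2 NOT NULL,
    sensor_id    int       NOT NULL,
    reading      float     NOT NULL
);

-- insert into the bare heap at full speed, then build the index once capture slows down or stops
CREATE CLUSTERED INDEX ix_readings_time
    ON instrument_readings (reading_time);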
If you are considering open-source, then MySQL is often cited as being able to handle high volumes.

What is Multiversion Concurrency Control (MVCC) and who supports it? [closed]

Recently Jeff posted about his trouble with database deadlocks related to reading. Multiversion Concurrency Control (MVCC) claims to solve this problem. What is it, and which databases support it?
Updated: these support it (which others?)
Oracle
PostgreSQL
Oracle has had an excellent multi-version concurrency control system in place for a very long time (at least since Oracle 8.0).
The following should help:
At time T1, user A starts a transaction and begins updating 1000 rows with some value.
User B reads the same 1000 rows at time T2.
User A updates row 543 with value Y (original value X).
User B reaches row 543 and finds that a transaction has been in operation on it since time T1.
The database returns the unmodified record from the undo segments. The returned value is the value that was committed at a time less than or equal to T2.
If the record cannot be retrieved from the undo segments, it means the database is not set up appropriately; more space needs to be allocated to undo.
This way read consistency is achieved: within a transaction, the returned results are always the same with respect to its start time.
I have tried to explain it in the simplest terms possible... there is a lot to multiversioning in databases.
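The same scenario in SQL form, as a sketch (table, column, and values are made up; this is Oracle's default READ COMMITTED behaviour):

-- Session A, transaction open since T1, not yet committed:
UPDATE accounts SET balance = 200 WHERE id = 543;   -- previously committed value was 100

-- Session B, query started at T2 (after T1, before A commits):
SELECT balance FROM accounts WHERE id = 543;
-- returns 100: Oracle rebuilds the pre-update image from undo,
-- so the result is consistent as of the query's start time.
-- If the needed undo has already been overwritten, the query fails with
-- ORA-01555 "snapshot too old" and more undo space/retention is required.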
See PostgreSQL's Multi-Version Concurrency Control documentation, as well as this article, which features diagrams of how MVCC works when issuing INSERT, UPDATE, and DELETE statements.
The following have an implementation of MVCC:
SQL Server 2005 (Non-default, SET READ_COMMITTED_SNAPSHOT ON)
http://msdn.microsoft.com/en-us/library/ms345124.aspx
Oracle (since version 8)
MySQL 5 (only with InnoDB tables)
PostgreSQL
Firebird
Informix
I'm pretty sure Sybase and IBM DB2 Mainframe/LUW do not have an implementation of MVCC
Firebird does it, they call it MGA (Multi Generational Architecture).
They keep the original version intact and add a new version that only the session using it can see; when committed, the older version is disabled and the newer version is enabled for everybody (the file piles up with data and needs regular cleanup).
Oracle overwrites the data itself and uses rollback segments/undo tablespaces to serve other sessions and to roll back.
XtremeData dbX supports MVCC.
In addition, dbX can make use of SQL primitives implemented in FPGA hardware.
SAP HANA also uses MVCC.
SAP HANA is a fully in-memory computing system, so the MVCC cost for selects is very low... :)
Here is a link to the PostgreSQL doc page on MVCC. The choice quote (emphasis mine):
The main advantage to using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.
This is why Jeff was so confounded by his deadlocks. A read should never be able to cause them.
SQL Server 2005 and up offer MVCC as an option; it isn't the default, however. MS calls it snapshot isolation, if memory serves.
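Enabling it is a per-database setting; a sketch with a made-up database name:

ALTER DATABASE MyAppDb SET ALLOW_SNAPSHOT_ISOLATION ON;   -- allows the SNAPSHOT isolation level
ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON;    -- makes READ COMMITTED use row versioning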
MVCC can also be implemented manually, by adding a version number column to your tables, and always doing inserts instead of updates.
The cost of this is a much larger database, and slower selects since each one needs a subquery to find the latest record.
It's an excellent solution for systems that require 100% auditing for all changes.
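A minimal sketch of that pattern, with made-up table and column names:

CREATE TABLE account_versions (
    account_id int     NOT NULL,
    version    int     NOT NULL,
    balance    numeric NOT NULL,
    PRIMARY KEY (account_id, version)
);

-- an "update" is really an insert of the next version
INSERT INTO account_versions (account_id, version, balance)
VALUES (42, 3, 125.00);

-- reads need the extra subquery to pick the latest version of each row
SELECT av.*
FROM account_versions av
WHERE av.version = (
    SELECT MAX(version)
    FROM account_versions
    WHERE account_id = av.account_id
);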
MySQL also uses MVCC by default if you use InnoDB tables:
http://dev.mysql.com/doc/refman/5.0/en/innodb-multi-versioning.html
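So the only requirement is that the table actually uses the InnoDB engine (table name is made up):

CREATE TABLE orders (
    id     int PRIMARY KEY,
    status varchar(20)
) ENGINE=InnoDB;

-- check which engine an existing table uses
SHOW TABLE STATUS LIKE 'orders';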
McObject announced in 11/09 that it has added an optional MVCC transaction manager to its eXtremeDB embedded database:
http://www.mcobject.com/november9/2009
eXtremeDB, originally developed as an in-memory database system (IMDS), is now available in editions with hybrid (in-memory/on-disk) storage, High Availability, 64-bit support and more.
There's a good explanation of MVCC -- with diagrams -- and some performance numbers for eXtremeDB in this article, written by McObject's co-founder and CEO, in RTC Magazine:
http://www.rtcmagazine.com/articles/view/101612
Clearly MVCC is increasingly beneficial as an application scales to include many tasks executing on multiple CPU cores.
DB2 version 9.7 has a licensed version of Postgres Plus in it, which means that DB2 (in the right mode) supports this feature.
Berkeley DB also supports MVCC.
And when the BDB storage engine is used in MySQL, MySQL also supports MVCC.
Berkeley DB is a very powerful, customizable, fully ACID-compliant DBMS. It supports several different methods of indexing and master-slave replication, and it can be used as a pure key-value store with its own dynamic API or queried with SQL if wanted. Worth taking a look at.
Another document-oriented DBMS embracing MVCC is CouchDB. Here MVCC is also a big plus for the built-in peer-to-peer replication.
From http://vschart.com/list/multiversion-concurrency-control/
Couchbase,
OrientDB,
CouchDB,
PostgreSQL,
Project Voldemort,
BigTable,
Percona Server,
HyperGraphDB,
Drizzle,
Cloudant,
IBM DB2,
InterSystems Caché,
InterBase
