INNER JOIN differences with H2 database and PostgreSQL database

So I've changed my code's database from H2 to PostgreSQL, and I've noticed that the INNER JOIN query I use in H2 does not give the same results when I run it in PostgreSQL. After some research and testing, I found that LEFT JOIN and the other joins work perfectly; it's only INNER JOIN that gives me a different result. So, to get both output CSV files to match, would I have to change the whole structure of the table, or is there something similar in PostgreSQL that I'm overlooking?
public void doAllWork(int type, Connection conn, Statement st) {
    try {
        if (type == 1) {
            st.execute("DROP TABLE IF EXISTS COMBINEDDATA;"); // USING DISTINCT TO EXCLUDE DUPLICATE RECORDS
            st.execute("ANALYZE");
            st.execute("CREATE TABLE COMBINEDDATA AS \n"
                    + "SELECT DISTINCT E.DATA1, E.DATA2, E.DATA3, E.DATA4, E.DATA5, E.DATA6, \n"
                    + "E.DATA7, E.DATA8, E.DATA9, E.DATA10, E.DATA11, E.DATA12, E.DATA13, E.DATA14, E.DATA15, E.DATA16, E.DATA17, \n"
                    + "E.DATA18, E.DATA19, E.DATA21, E.DATA26, E.DATA27, E.DATA28, E.DATA29, \n"
                    + "E.DATA30, E.DATA31, E.DATA32, E.DATA34, E.DATA35, E.DATA36, E.DATA37, E.DATA38, \n"
                    + "C.CHAIN20, C.CHAIN33, C.CHAIN22, \n"
                    + "D.DAT2, D.DAT3, D.DAT4, D.DAT7, D.DAT11, D.DAT9, D.DAT5, \n"
                    + "E.DATA39, E.DATA40, E.DATA41 FROM rawData AS E \n"
                    + "RIGHT JOIN CHAINDATA AS C \n"
                    + "ON E.DATA7 = c.CHAIN2\n"
                    + "AND E.DATA11 = c.CHAIN4\n"
                    + "AND E.DATA21 = c.CHAIN10\n"
                    + "AND E.DATA22 = c.CHAIN11\n"
                    + "RIGHT JOIN DATDATA AS D\n"
                    + "ON E.DATA7 = D.DAT18\n"
                    + "AND E.DATA11 = D.DAT21\n"
                    + "AND UCASE(E.DATA6) = UCASE(D.DAT17)\n"
                    + "AND UCASE(E.DATA10) = UCASE(D.DAT20)\n"
                    + "AND UCASE(E.DATA5) = UCASE(D.DAT16)\n"
                    + "AND UCASE(E.DATA9) = UCASE(D.DAT19)\n"
                    + "AND E.DATA20 = D.DAT22");
        } else if (type == 2) {
            st.execute("DROP TABLE IF EXISTS COMBINEDDATA2;");
            st.execute("ANALYZE");
            st.execute("CREATE TABLE COMBINEDDATA2 AS \n"
                    + "SELECT DISTINCT E.DATA1, E.DATA2, E.DATA3, E.DATA4, E.DATA5, E.DATA6, \n"
                    + "E.DATA7, E.DATA8, E.DATA9, E.DATA10, E.DATA11, E.DATA12, E.DATA13, E.DATA14, E.DATA15, E.DATA16, E.DATA17, \n"
                    + "E.DATA18, E.DATA19, E.DATA21, E.DATA26, E.DATA27, E.DATA28, E.DATA29, \n"
                    + "E.DATA30, E.DATA31, E.DATA32, E.DATA34, E.DATA35, E.DATA36, E.DATA37, E.DATA38, \n"
                    + "C.CHAIN20, C.CHAIN33, C.CHAIN22, \n"
                    + "D.DAT2, D.DAT3, D.DAT4, D.DAT7, D.DAT11, D.DAT9, D.DAT5, \n"
                    + "E.DATA39, E.DATA40, E.DATA41 FROM rawData AS E \n"
                    + "LEFT JOIN CHAINDATA AS C \n"
                    + "ON E.DATA7 = c.CHAIN2\n"
                    + "AND E.DATA11 = c.CHAIN4\n"
                    + "AND E.DATA21 = c.CHAIN10\n"
                    + "AND E.DATA22 = c.CHAIN11\n"
                    + "LEFT JOIN DATDATA AS D\n"
                    + "ON E.DATA7 = D.DAT18\n"
                    + "AND E.DATA11 = D.DAT21\n"
                    + "AND UCASE(E.DATA6) = UCASE(D.DAT17)\n"
                    + "AND UCASE(E.DATA10) = UCASE(D.DAT20)\n"
                    + "AND UCASE(E.DATA5) = UCASE(D.DAT16)\n"
                    + "AND UCASE(E.DATA9) = UCASE(D.DAT19)\n"
                    + "AND E.DATA20 = D.DAT22");
        }
        System.out.println("here");
        if (type == 1) {
            String dir = System.getProperty("user.dir");
            st.executeUpdate("CALL CSVWRITE('" + dir + "\\OnlyMatching.csv', 'SELECT * FROM COMBINEDDATA', 'charset=UTF-8');");
        } else if (type == 2) {
            String dir = System.getProperty("user.dir");
            st.executeUpdate("CALL CSVWRITE('" + dir + "\\AllNonMatching.csv', 'SELECT * FROM COMBINEDDATA2', 'charset=UTF-8');");
        }
    } catch (Exception ex) {
        Logger.getLogger(RyderCombinerGUI.class.getName()).log(Level.SEVERE, null, ex);
    }
}
In the above snippet, the second branch with the LEFT JOIN works the same on H2 and PostgreSQL, but the inner-join branch returns something different.
Example: this is the output CSV file using the H2 database, and this is the output using the PostgreSQL database (screenshots not included here).
Thanks in advance.

Assuming you run the same ANSI-compliant query, over the same underlying data, in both H2 and Postgres, you should get the same result. There is nothing whatsoever different about the behavior of INNER JOIN between the two databases.
But a quick search for ORDER BY in your code dump shows that you are not doing any ordering in your queries. Postgres coincidentally appears to be sorting on the data1 column, while H2 does not appear to be sorting at all. I suggest that the result sets are identical when viewed as unsorted sets.
In general, if you expect a certain ordering in your result set, you need to use ORDER BY in the query which generates that data. So if you add ORDER BY data1 to both queries, I expect the results will appear the same for both H2 and Postgres.
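For instance, a minimal sketch of that change applied to the export step; DATA1 is simply taken from the question's column list, so substitute whichever column(s) define the order you expect. CSVWRITE is H2's CSV function, so on PostgreSQL the same ORDER BY would go into whatever export query you use (e.g. COPY ... TO):

// Hedged sketch: make the export order explicit so both databases emit rows
// in the same sequence. DATA1 is assumed from the question's schema.
String dir = System.getProperty("user.dir");
st.executeUpdate("CALL CSVWRITE('" + dir + "\\OnlyMatching.csv', "
        + "'SELECT * FROM COMBINEDDATA ORDER BY DATA1', 'charset=UTF-8');");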

Related

H2 - Field must be in group by list

I get the following exception: Caused by: org.h2.jdbc.JdbcSQLSyntaxErrorException: Field "RECHNUNGEN0_.GRUPPEN_RECHNUNGSJAHR" must be in GROUP BY List
Can someone explain what's wrong with the query?
#Query("Select new de.company.jpa.model.RechnungsjahrUndNummerVO(" +
"r.gruppenRechnungsjahr, " +
"r.gruppenRechnungsnummer) " +
"From RechnungEntity r " +
"Where (:vermittlungsnummern is null or r.vermittlungsnummer in :vermittlungsnummern) " +
"and (:statusklassen is null or r.statusklasse in :statusklassen) " +
"and (:gruppenOm is null or r.gruppenOm = :gruppenOm)" +
"group by r.gruppenRechnungsnummer " +
"order by r.gruppenRechnungsjahr")
List<RechnungsjahrUndNummerVO> findRechnungsjahrUndNummernByVermittlungsnummernStatusklasseGruppenOm(List<String> vermittlungsnummern,
List<String> statusklassen,
String gruppenOm);
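A hedged note based on the error message itself: H2 is complaining because r.gruppenRechnungsjahr appears in the SELECT list but not in the GROUP BY clause, and standard SQL requires every non-aggregated selected column to be grouped. One possible fix (a sketch, not a verified answer) is to group by both selected columns, leaving the rest of the query unchanged:

        // Sketch: include both selected (non-aggregated) columns in the GROUP BY
        "group by r.gruppenRechnungsnummer, r.gruppenRechnungsjahr " +
        "order by r.gruppenRechnungsjahr")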

Apache Flink: What type of record does JDBCInputFormat return?

I am getting an error related to setRowTypeInfo for a JDBCInputFormat. The error is below. Clearly the Tuple2 type of the DataSet doesn't match the RowTypeInfo of the JDBCInputFormat, but I can't find anything that clarifies how to define the format.
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project flink: Compilation failure
[ERROR] /Users/rocadmin/Desktop/flink/flink/src/main/java/svalarms/BatchJob.java:[125,48] incompatible types: inferred type does not conform to equality constraint(s)
[ERROR]     inferred: org.apache.flink.api.java.tuple.Tuple2
[ERROR]     equality constraint(s): org.apache.flink.api.java.tuple.Tuple2, org.apache.flink.types.Row
[ERROR] -> [Help 1]
DataSet<Tuple2<Integer, Integer>> dbData =
    env.createInput(
        JDBCInputFormat.buildJDBCInputFormat()
            .setDrivername("oracle.jdbc.driver.OracleDriver")
            .setDBUrl("jdbc:oracle:thin:#//[ip]:1521/sdmprd")
            .setQuery("" +
                "SELECT T2.work_order_nbr, T2.work_order_nbr " +
                "FROM sdm.work_order_master T2 " +
                "WHERE " +
                "TO_DATE(T2.date_entered + 19000000,'yyymmdd') >= CURRENT_DATE - 14 " +
                "AND T2.W_O_TYPE = 'TC' " +
                "AND T2.OFFICE_ONLY_FLG = 'N' " +
                "")
            .setRowTypeInfo(new RowTypeInfo(BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO))
            .finish()
    );
A JDBCInputFormat returns records of type Row. Hence, the resulting DataSet should be typed to Row, i.e.,
DataSet<Row> dbData =
    env.createInput(
        JDBCInputFormat.buildJDBCInputFormat()
            .setDrivername("oracle.jdbc.driver.OracleDriver")
            .setDBUrl("jdbc:oracle:thin:#//[ip]:1521/sdmprd")
            .setQuery(
                "SELECT T2.work_order_nbr, T2.work_order_nbr " +
                "FROM sdm.work_order_master T2 " +
                "WHERE " +
                "TO_DATE(T2.date_entered + 19000000,'yyymmdd') >= CURRENT_DATE - 14 " +
                "AND T2.W_O_TYPE = 'TC' " +
                "AND T2.OFFICE_ONLY_FLG = 'N' "
            )
            .setRowTypeInfo(Types.ROW(Types.INT, Types.INT))
            .finish()
    );
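If the rest of the job still needs Tuple2<Integer, Integer>, one option (a sketch, assuming the two selected columns land in fields 0 and 1 of each Row, and using the same Types helper as above) is to map the Row records after reading:

// Hypothetical follow-up: convert each Row back into a Tuple2 once it has been
// read; field indices 0 and 1 correspond to the two selected columns.
DataSet<Tuple2<Integer, Integer>> tuples = dbData
    .map(row -> Tuple2.of((Integer) row.getField(0), (Integer) row.getField(1)))
    .returns(Types.TUPLE(Types.INT, Types.INT));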
Got it going with the following:
TypeInformation[] fieldTypes = new TypeInformation[] {
    BasicTypeInfo.BIG_DEC_TYPE_INFO,
    BasicTypeInfo.BIG_DEC_TYPE_INFO
};
RowTypeInfo rowTypeInfo = new RowTypeInfo(fieldTypes);
JDBCInputFormatBuilder inputBuilder = JDBCInputFormat.buildJDBCInputFormat()
    .setDrivername("oracle.jdbc.driver.OracleDriver")
    .setDBUrl("jdbc:oracle:thin:#//ipaddress:1521/sdmprd")
    .setQuery("" +
        "SELECT T2.work_order_nbr , T2.work_order_nbr " +
        "FROM sdm.work_order_master T2 " +
        "WHERE " +
        "TO_DATE(T2.date_entered + 19000000,'yyyymmdd') >= CURRENT_DATE - 14 " +
        "AND T2.W_O_TYPE = 'TC' " +
        "AND T2.OFFICE_ONLY_FLG = 'N' " +
        "")
    .setRowTypeInfo(rowTypeInfo)
    .setUsername("user")
    .setPassword("pass");
DataSet<Row> source = env.createInput(inputBuilder.finish());

How to create a PyMol rename loop

I would like to create a loop for changing interaction names in PyMol, but after one iteration of the loop it crashes and stops working.
def get_dists(interactions):  # interactions = ([1, 2], [3, 4])
    for i in interactions:
        a = "////" + str(i[0]) + "/C2'"
        b = "////" + str(i[1]) + "/C2'"
        cmd.distance("(" + a + ")", "(" + b + ")")
        for j in range(1, 599):
            x = "dist" + "0" + str(j)
            y = str(i[0]) + " " + str(i[1])
            cmd.set_name(str(x), str(y))
In PyMol the default names of the distance objects are dist01, dist02, dist03, and I want to change these to 1_3, 5_59, 4_8 (the interacting residues).
Your code is basically fine except for one thing: if PyMol doesn't succeed with set_name, the whole script aborts. If you wrap the call in a try/except, it should work:
try:
    cmd.set_name(str(x), str(y))
except:
    print('failed to rename')
Some additional comments:
y = str(i[0]) + " " + str(i[1]) should probably be y = str(i[0]) + "_" + str(i[1]), since you want names like 1_3.
The line x = "dist" + "0" + str(j) is presumably meant to pad a leading zero, but that only matches when j is a single digit; for larger j the distance objects are named dist20 or dist123, without the extra zero.
cmd.set_name(str(x), str(y)) can be simplified to cmd.set_name(x, y), since x and y are already strings.

I have custom code whose output I need to bold. What Google Sheets script can I use?

I have created a customized work order submission form in Forms & Sheets that auto-emails a confirmation for each submission (job request form) to create a data trail of vendor activity. It is fairly integrated and totally cobbled together from a lot of reading in these forums, coupled with a gazillion frustrating moments of trial and error. I'm a novice moving towards "capable", but I'm stuck on a piece of code for a triggered confirmation email, with a random work order generator, email confirmations, and toggle-based management built in. The code below, which I actually need help with, sends that triggered confirmation email with the confirmation of service, the work order #, and everything the requestor originally submitted. The problem is that the code provides the data exactly how I want it and the placement is great, but I need to create a visual distinction between the column titles and the submitted data. Can someone please help me add bolding to the column titles in the message built on line 16, to create that visual differentiation between the "category" and the submission data?
// This constant is written in column C for rows for which an email
// has been sent successfully.
var EMAIL_SENT = "EMAIL_SENT";

function sendEmails2() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var startRow = 2;    // First row of data to process
  var numRows = 1000;  // Number of rows to process
  // Fetch the range of cells starting at A2, 27 columns wide
  var dataRange = sheet.getRange(startRow, 1, numRows, 27);
  // Fetch values for each row in the range.
  var data = dataRange.getValues();
  for (var i = 0; i < data.length; ++i) {
    var row = data[i];
    var emailAddress = row[19];
    var message = row[16] + "\n\n" + "Submitted By: " + row[19] + "\n\n" + "Date Submitted: " + row[0] + "\n\n" + row[21] + "\n\n" + "IMPORTANT NOTES FROM CDS: " + row[20] + "\n\n" + "Full Show Services: " + row[3] + "\n\n" + "Event Start Date: " + row[4] + "\n\n" + "Event End Date: " + row[5] + "\n\n" + "Warehouse Locations: " + row[6] + "\n\n" + "Individual Services Requested: " + row[7] + "\n\n" + "Individual Services - Warehouse(s) & Date(s) Requested: " + row[8] + "\n\n" + "Partial Hourly Staffing Details Requested: " + row[9] + "\n\n" + "Requestors Instructions / Comments: " + row[10] + "\n\n" + "Files: " + row[11] + row[12] + "\n\n" + "Thank you for your request. We appreciate your business. CDS Special Events Team ";
    var emailSent = row[18];
    var subject = row[16];
    var ss = SpreadsheetApp.getActiveSpreadsheet();
    if (emailSent != EMAIL_SENT) { // Prevents sending duplicates
      MailApp.sendEmail(emailAddress, subject, message);
      sheet.getRange(startRow + i, 19).setValue(EMAIL_SENT);
      // Make sure the cell is updated right away in case the script is interrupted
      SpreadsheetApp.flush();
    }
  }
}

Titan graph database too slow with 100,000+ vertices with indices; how can I optimize it?

Here is the indices code:
g = TitanFactory.build().set("storage.backend", "cassandra")
        .set("storage.hostname", "127.0.0.1").open();
TitanManagement mgmt = g.getManagementSystem();

PropertyKey db_local_name = mgmt.makePropertyKey("db_local_name")
        .dataType(String.class).make();
mgmt.buildIndex("byDb_local_name", Vertex.class).addKey(db_local_name)
        .buildCompositeIndex();

PropertyKey db_schema = mgmt.makePropertyKey("db_schema")
        .dataType(String.class).make();
mgmt.buildIndex("byDb_schema", Vertex.class).addKey(db_schema)
        .buildCompositeIndex();

PropertyKey db_column = mgmt.makePropertyKey("db_column")
        .dataType(String.class).make();
mgmt.buildIndex("byDb_column", Vertex.class).addKey(db_column)
        .buildCompositeIndex();

PropertyKey type = mgmt.makePropertyKey("type").dataType(String.class)
        .make();
mgmt.buildIndex("byType", Vertex.class).addKey(type)
        .buildCompositeIndex();

PropertyKey value = mgmt.makePropertyKey("value")
        .dataType(Object.class).make();
mgmt.buildIndex("byValue", Vertex.class).addKey(value)
        .buildCompositeIndex();

PropertyKey index = mgmt.makePropertyKey("index")
        .dataType(Integer.class).make();
mgmt.buildIndex("byIndex", Vertex.class).addKey(index)
        .buildCompositeIndex();

mgmt.commit();
Here is the code that searches for the vertices and then adds a vertex with 3 edges, running on a 3 GHz / 2 GB RAM PC. It processes about 830 vertices in 3 hours, and I have 100,000 rows of data, so it is far too slow. The code is below:
for (Object[] rowObj : list) {
    // TXN_ID
    Iterator<Vertex> iter = g.query()
            .has("db_local_name", "Report Name 1")
            .has("db_schema", "MPS").has("db_column", "txn_id")
            .has("value", rowObj[0]).vertices().iterator();
    if (iter.hasNext()) {
        vertex1 = iter.next();
        logger.debug("vertex1=" + vertex1.getId() + ","
                + vertex1.getProperty("db_local_name") + ","
                + vertex1.getProperty("db_schema") + ","
                + vertex1.getProperty("db_column") + ","
                + vertex1.getProperty("type") + ","
                + vertex1.getProperty("index") + ","
                + vertex1.getProperty("value"));
    }
    // TXN_TYPE
    iter = g.query().has("db_local_name", "Report Name 1")
            .has("db_schema", "MPS").has("db_column", "txn_type")
            .has("value", rowObj[1]).vertices().iterator();
    if (iter.hasNext()) {
        vertex2 = iter.next();
        logger.debug("vertex2=" + vertex2.getId() + ","
                + vertex2.getProperty("db_local_name") + ","
                + vertex2.getProperty("db_schema") + ","
                + vertex2.getProperty("db_column") + ","
                + vertex2.getProperty("type") + ","
                + vertex2.getProperty("index") + ","
                + vertex2.getProperty("value"));
    }
    // WALLET_ID
    iter = g.query().has("db_local_name", "Report Name 1")
            .has("db_schema", "MPS").has("db_column", "wallet_id")
            .has("value", rowObj[2]).vertices().iterator();
    if (iter.hasNext()) {
        vertex3 = iter.next();
        logger.debug("vertex3=" + vertex3.getId() + ","
                + vertex3.getProperty("db_local_name") + ","
                + vertex3.getProperty("db_schema") + ","
                + vertex3.getProperty("db_column") + ","
                + vertex3.getProperty("type") + ","
                + vertex3.getProperty("index") + ","
                + vertex3.getProperty("value"));
    }
    vertex4 = g.addVertex(null);
    vertex4.setProperty("db_local_name", "Report Name 1");
    vertex4.setProperty("db_schema", "MPS");
    vertex4.setProperty("db_column", "amount");
    vertex4.setProperty("type", "indivisual_0");
    vertex4.setProperty("value", rowObj[3].toString());
    vertex4.setProperty("index", i);
    vertex1.addEdge("data", vertex4);
    logger.debug("vertex1 added");
    vertex2.addEdge("data", vertex4);
    logger.debug("vertex2 added");
    vertex3.addEdge("data", vertex4);
    logger.debug("vertex3 added");
    i++;
    g.commit();
}
Is there any way to optimize this code?
For completeness, this question was answered in the Aurelius Graphs mailing list:
https://groups.google.com/forum/#!topic/aureliusgraphs/XKT6aokRfFI
Basically:
1. Build/use a real composite index covering the keys you actually query together:
mgmt.buildIndex("by_local_name_schema_value", Vertex.class).addKey(db_local_name).addKey(db_schema).addKey(value).buildCompositeIndex();
2. Don't call g.commit() after each loop cycle; instead increment a counter and commit in batches, e.g. if (++count % 10000 == 0) g.commit() (see the sketch below).
3. Turn on storage.batch-loading if you are not already doing so.
4. If all you can throw at Cassandra is 2 GB of RAM, consider using BerkeleyDB. Cassandra prefers 4 GB of RAM minimum and would probably like "more".
5. I don't know the nature of your data, but can you pre-sort it and use BatchGraph as described in the Powers of Ten - Part I blog post and in the wiki? Using BatchGraph would prevent you from having to maintain the transaction described in number 2 above.
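Putting points 1 and 2 together, a rough sketch (the counter name count is made up here, the property keys are the ones defined in the question, and the index must be created and committed before the data is loaded):

// Sketch only: one composite index covering the three keys used in every lookup,
// built once up front, following the mailing-list suggestion.
mgmt.buildIndex("by_local_name_schema_value", Vertex.class)
        .addKey(db_local_name)
        .addKey(db_schema)
        .addKey(value)
        .buildCompositeIndex();
mgmt.commit();

// Sketch only: commit in batches instead of once per row.
long count = 0;
for (Object[] rowObj : list) {
    // ... look up vertex1/vertex2/vertex3 and add vertex4 with its edges, as above ...
    if (++count % 10000 == 0) {
        g.commit();   // flush a batch of 10,000 rows in one transaction
    }
}
g.commit();           // commit whatever remains from the final partial batch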
