Good morning,
I have a problem with this query:
SELECT
P.txt_nome AS Pergunta,
IP.nome AS Resposta,
COUNT(*) AS Qtd
FROM
tb_resposta_formulario RF
INNER JOIN formularios F ON
F.id_formulario = RF.id_formulario
INNER JOIN tb_pergunta P ON
P.id_pergunta = RF.id_pergunta
INNER JOIN tb_resposta_formulario_combo RFC ON
RFC.id_resposta_formulario = RF.id_resposta_formulario
INNER JOIN itens_perguntas IP ON
IP.id_item_pergunta = RFC.id_item_pergunta
WHERE
RF.id_formulario = 2
GROUP BY
P.txt_nome,
IP.nome
This is the actual result of this query:
|Pergunta| Resposta |Qtd|
|Produto |Combo 1MB | 3 |
|Produto |Combo 2MB | 5 |
|Produto |Combo 4MB | 1 |
|Produto |Combo 6MB | 1 |
|Produto |Combo 8MB | 4 |
|Região |MG | 3 |
|Região |PR | 2 |
|Região |RJ | 3 |
|Região |SC | 1 |
|Região |SP | 5 |
These are the results I was expecting:
|Produto | Região |Qtd|
|Combo 1MB | MG | 3 |
|Combo 2MB | SP | 5 |
|Combo 4MB | SC | 1 |
|Combo 6MB | RJ | 1 |
|Combo 8MB | PR | 2 |
I am using the PIVOT and UNPIVOT operators but the result is not satisfactory.
Has anyone already faced this situation before? Do you have any insight you can offer?
I already analyzed these links:
SQL Server 2005 Pivot on Unknown Number of Columns
Transpose a set of rows as columns in SQL Server 2000
SQL Server 2005, turn columns into rows
Pivot Table and Concatenate Columns
PIVOT in sql 2005
Regards,
Pelegrini
The "obvious" answer is: because the query is incorrect. We really know nothing about the table structure and what you're trying to achieve.
Concerning at least one very basic problem in your query: you're expecting the columns |Produto | Região |Qtd| in your result, yet the query unambiguously selects the columns Pergunta, Resposta and Qtd, which matches the result you're getting.
How well are you acquainted with SQL? It may be worth reading an introductory text; I'd suggest this as a good introduction. (It uses Oracle, but the principles are the same.)
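If the intent behind the expected table is to put each form submission's Produto and Região answers side by side and then count the combinations, conditional aggregation (a common alternative to PIVOT) can do that. The sketch below replays the idea in SQLite via Python; the respostas table, its columns, and the per-form grouping are assumptions for illustration, not the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical flattened answers: one row per (form submission, question, answer)
cur.execute("CREATE TABLE respostas (form_id INTEGER, pergunta TEXT, resposta TEXT)")
cur.executemany("INSERT INTO respostas VALUES (?, ?, ?)", [
    (1, "Produto", "Combo 1MB"), (1, "Região", "MG"),
    (2, "Produto", "Combo 2MB"), (2, "Região", "SP"),
    (3, "Produto", "Combo 1MB"), (3, "Região", "MG"),
])

# Step 1: pivot each submission's answers into columns with CASE expressions.
# Step 2: count how often each (Produto, Região) combination occurs.
cur.execute("""
    SELECT Produto, Regiao, COUNT(*) AS Qtd
    FROM (
        SELECT form_id,
               MAX(CASE WHEN pergunta = 'Produto' THEN resposta END) AS Produto,
               MAX(CASE WHEN pergunta = 'Região'  THEN resposta END) AS Regiao
        FROM respostas
        GROUP BY form_id
    )
    GROUP BY Produto, Regiao
""")
out = cur.fetchall()
print(out)  # [('Combo 1MB', 'MG', 2), ('Combo 2MB', 'SP', 1)]
```

In T-SQL the inner SELECT could also be written with the PIVOT operator, but the CASE form works on every engine and makes the row-to-column step explicit.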
Using MS SQL Server, I want to:
Get data from my application
Perform preprocessing
Update a table
Steps 1 and 3 are giving me problems.
Simplifying the problem to its essence, my existing data looks like:
+--------+-------+
| Item | Usage |
+--------+-------+
| Part A | 10 |
| Part B | 15 |
| Part C | 8 |
+--------+-------+
and an example of the source data is:
+--------+
| Item |
+--------+
| Part A |
| Part B |
| Part B |
| Part B |
| Part A |
| Part A |
+--------+
My over all plan is to import the data into a CTE, do the preprocessing, then do an update.
Regarding getting the data: since INTO is not allowed in a CTE, how can I get the source data into the CTE? Or is some other approach that doesn't use a CTE better?
Preprocessing is straightforward. Here is my SQL:
WITH MyData (Item, NewUsage)
AS
(
<Somehow get the data>
SELECT Item, Count(*) as NewUsage
FROM Items
GROUP BY Item
)
UPDATE Items
SET Usage = Usage + b.NewUsage
FROM Items as a JOIN MyData as b ON a.Item = b.Item;
The update is incrementing all the rows in Items instead of using each row's NewUsage value.
How do I get my data into the CTE, and how do I write the SQL so it works properly?
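Two things seem likely here: first, the source rows need to land in a staging table (or table variable) before the CTE can aggregate them; second, on SQL Server, `UPDATE Items ... FROM Items AS a` targets a second, un-joined instance of Items — the target should be the alias (`UPDATE a SET a.Usage = ...`). The sketch below demonstrates the same logic in SQLite via Python, using a portable correlated-subquery update; the Staging table name is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Items (Item TEXT PRIMARY KEY, Usage INTEGER)")
cur.executemany("INSERT INTO Items VALUES (?, ?)",
                [("Part A", 10), ("Part B", 15), ("Part C", 8)])

# Step 1 of the plan: load the raw application rows into a staging table.
cur.execute("CREATE TABLE Staging (Item TEXT)")
cur.executemany("INSERT INTO Staging VALUES (?)",
                [("Part A",), ("Part B",), ("Part B",),
                 ("Part B",), ("Part A",), ("Part A",)])

# Steps 2-3: count occurrences per item and add them to the existing usage.
# The WHERE clause keeps rows without new data untouched.
cur.execute("""
    UPDATE Items
    SET Usage = Usage + (SELECT COUNT(*) FROM Staging
                         WHERE Staging.Item = Items.Item)
    WHERE Item IN (SELECT Item FROM Staging)
""")
result = cur.execute("SELECT Item, Usage FROM Items ORDER BY Item").fetchall()
print(result)  # [('Part A', 13), ('Part B', 18), ('Part C', 8)]
```

On SQL Server the CTE form should work the same way once the UPDATE targets the alias: `WITH MyData AS (SELECT Item, COUNT(*) AS NewUsage FROM Staging GROUP BY Item) UPDATE a SET a.Usage = a.Usage + b.NewUsage FROM Items a JOIN MyData b ON a.Item = b.Item;`.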
I have a database with 10 tables, whose data is stored in several different locations. Some of the tables are managed tables and some are external tables.
some tables location is /apps/hive/warehouse/
some tables location is /warehouse/hive/managed/
some tables location is /warehouse/hive/external/
Is there any way to find the total size of the database without going into each location and measuring it? Any alternative?
The query below, when run in the Hive Metastore DB, gives you the total size occupied by all the tables in Hive. Note: the result is 100% correct only if every table has up-to-date statistics. (You can check this in the TABLE_PARAMS table in the Metastore DB, covered below under "How it works", step b.)
Steps:
1. Log in to the Hive Metastore DB and switch to the database Hive uses (hive1 by default).
2. Then execute the query below to get the total size of all Hive tables, in bytes. It sums the totalSize statistic recorded for each table.
MariaDB [hive1]> SELECT SUM(PARAM_VALUE) FROM TABLE_PARAMS WHERE PARAM_KEY="totalSize";
+------------------+
| SUM(PARAM_VALUE) |
+------------------+
| 30376289388684 |
+------------------+
1 row in set (0.00 sec)
3. Remember, the result above covers only one replica. With the default replication factor of 3, 30376289388684 x 3 is the actual size occupied in HDFS.
How it works:
a. Select a random table in Hive from the TBLS table in the Hive Metastore DB; here, one with ID 5783 and name test12345.
MariaDB [hive1]> SELECT * FROM TBLS WHERE TBL_ID=5783;
+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+----------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | LINK_TARGET_ID |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+----------------+
| 5783 | 1555060992 | 1 | 0 | hive | 0 | 17249 | test12345 | MANAGED_TABLE | NULL | NULL | NULL |
+--------+-------------+-------+------------------+-------+-----------+-------+-----------+---------------+--------------------+--------------------+----------------+
1 row in set (0.00 sec)
b. Check the table's parameters in the Hive Metastore table TABLE_PARAMS for the same table ID, 5783. The totalSize record indicates the total size occupied by this table in HDFS for one of its replicas; the hdfs du output in the next point (c) can be compared against it.
The COLUMN_STATS_ACCURATE parameter with the value true says the table's statistics are up to date. You can look for tables where this value is false to find tables in Hive that might be missing statistics.
MariaDB [hive1]> SELECT * FROM TABLE_PARAMS
-> WHERE TBL_ID=5783;
+--------+-----------------------+-------------+
| TBL_ID | PARAM_KEY | PARAM_VALUE |
+--------+-----------------------+-------------+
| 5783 | COLUMN_STATS_ACCURATE | true |
| 5783 | numFiles | 1 |
| 5783 | numRows | 1 |
| 5783 | rawDataSize | 2 |
| 5783 | totalSize | 324 |
| 5783 | transient_lastDdlTime | 1555061027 |
+--------+-----------------------+-------------+
6 rows in set (0.00 sec)
c. The hdfs dfs -du -s output for the same table in HDFS: 324 is the size of one replica, and 972 the total size across three replicas of the table's data.
324 972 /user/hive/warehouse/test12345
Hope this helps!
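An alternative that doesn't depend on table statistics is to run hdfs dfs -du over the warehouse roots (e.g. /apps/hive/warehouse, /warehouse/hive/managed, /warehouse/hive/external) and sum the first column. The helper below is a sketch that assumes the usual two-column du output (one-replica size, then size including replication, then path):

```python
def total_hive_size(du_output: str) -> int:
    """Sum the first column (one-replica size in bytes) of `hdfs dfs -du` output."""
    total = 0
    for line in du_output.strip().splitlines():
        # Each line looks like: "<size> <size_with_replication> <path>"
        size = line.split()[0]
        total += int(size)
    return total

# Sample output captured from `hdfs dfs -du <warehouse paths>` (paths hypothetical)
sample = """324 972 /user/hive/warehouse/test12345
1048576 3145728 /warehouse/hive/managed/orders"""
print(total_hive_size(sample))  # 1048900
```

In practice you would feed it the output of subprocess.run(["hdfs", "dfs", "-du", *paths], capture_output=True, text=True).stdout; summing the second column instead gives the size including replication.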
I have a person table which keeps some personal info, like the table below.
+----+------+----------+----------+--------+
| ID | name | motherID | fatherID | sex |
+----+------+----------+----------+--------+
| 1 | A | NULL | NULL | male |
| 2 | B | NULL | NULL | female |
| 3 | C | 1 | 2 | male |
| 4 | X | NULL | NULL | male |
| 5 | Y | NULL | NULL | female |
| 6 | Z | 5 | 4 | female |
| 7 | T | NULL | NULL | female |
+----+------+----------+----------+--------+
Also I keep marriage relationships between people. Like:
+-----------+--------+
| HusbandID | WifeID |
+-----------+--------+
| 1 | 2 |
| 4 | 5 |
| 1 | 5 |
| 3 | 6 |
+-----------+--------+
With this information we can imagine the relationship graph.
The question is: how can I get all connected people given any one person's ID?
For example:
When I give ID=1, it should return 1, 2, 3, 4, 5, 6 (order is not important).
Likewise, when I give ID=6, it should return 1, 2, 3, 4, 5, 6 (order is not important).
Likewise, when I give ID=7, it should return 7.
Please note: the relationship edges may form loops anywhere in the graph. The example above shows only a small part of my data; the person and marriage tables may contain thousands of rows, and we do not know where loops may occur.
Similar questions were asked in:
PostgreSQL SQL query for traversing an entire undirected graph and returning all edges found
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=118319
But I can't get working SQL out of them. Thanks in advance. I am using SQL Server.
From SQL Server 2017 and Azure SQL DB you can use the new graph database capabilities and the new MATCH clause to answer queries like this, e.g.:
SELECT FORMATMESSAGE ( 'Person %s (%i) has mother %s (%i) and father %s (%i).', person.userName, person.personId, mother.userName, mother.personId, father.userName, father.personId ) msg
FROM dbo.persons person, dbo.relationship hasMother, dbo.persons mother, dbo.relationship hasFather, dbo.persons father
WHERE hasMother.relationshipType = 'mother'
AND hasFather.relationshipType = 'father'
AND MATCH ( father-(hasFather)->person<-(hasMother)-mother );
My results:
Full script available here.
For your specific question, the current release does not include transitive closure (the ability to traverse the graph an arbitrary number of times) or polymorphism (finding any node in the graph), so answering these queries may involve loops, recursive CTEs or temp tables. I have attempted this in my sample script and it works for your sample data, but it's just an example - I'm not 100% sure it will work with other sample data.
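The connected-component logic itself is just a breadth-first search over an undirected graph built from the parent links and marriages, with a visited set as the cycle guard - the same role the anchor/recursive members and a "seen" filter play in a recursive CTE. A sketch in Python using the sample data:

```python
from collections import defaultdict, deque

def connected_people(start, parent_rows, marriage_rows):
    """Return every person reachable from `start`, treating parent links
    and marriages as undirected edges; the `seen` set guards against loops."""
    adj = defaultdict(set)
    for child, mother, father in parent_rows:
        for parent in (mother, father):
            if parent is not None:
                adj[child].add(parent)
                adj[parent].add(child)
    for husband, wife in marriage_rows:
        adj[husband].add(wife)
        adj[wife].add(husband)

    seen = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbour in adj[person]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

# Sample data from the question: (ID, motherID, fatherID) and (HusbandID, WifeID)
parents = [(1, None, None), (2, None, None), (3, 1, 2), (4, None, None),
           (5, None, None), (6, 5, 4), (7, None, None)]
marriages = [(1, 2), (4, 5), (1, 5), (3, 6)]
print(connected_people(1, parents, marriages))  # {1, 2, 3, 4, 5, 6}
print(connected_people(7, parents, marriages))  # {7}
```

A recursive CTE mirrors this shape, but since T-SQL recursive CTEs cannot reference the growing result set to filter out already-visited rows, a WHILE loop inserting new IDs into a temp table until no rows are added is usually the more practical SQL Server translation.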
I've got two tables (threads and user_threads). Essentially, a thread is an object with a name, and then a user_thread links a user to a thread. This was to illustrate a many-to-many relationship.
Given this setup, I'm trying to figure out how to get threads shared exclusively between two users.
Threads looks like this
|------------------------|
| id | name |
| 1 | group1 |
| 2 | test group |
|------------------------|
user_threads looks like this
|---------------------------------|
| id | user | thread |
|---------------------------------|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 1 | 2 |
| 4 | 2 | 2 |
| 5 | 3 | 2 |
|---------------------------------|
So the issue I'm running into is this: given user 1 and user 2, I would like to return the mutual thread that is exclusive to them.
Querying with 1 and 2 should return thread 1. I've tried using a self join and mixing in exclusions, but SQL is not my primary skill set. Is there any way to do this, or do I need to restructure my tables?
One way is to select the threads that have both users using a JOIN, and then exclude all those that also have other users in them.
SELECT ut1.thread FROM user_threads ut1
JOIN user_threads ut2 ON ut1.thread=ut2.thread
WHERE ut1."user" = 1 AND ut2."user" = 2
AND NOT EXISTS
(SELECT 1 FROM user_threads WHERE thread=ut1.thread AND "user" NOT IN (ut1."user", ut2."user"))
SQL Fiddle
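To show the shape of the result, here is the same query replayed against the sample data in SQLite via Python (the double-quoted "user" identifier works there too; column types are assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute('CREATE TABLE user_threads (id INTEGER, "user" INTEGER, thread INTEGER)')
# Sample data: users 1 and 2 share thread 1 alone; thread 2 also has user 3.
cur.executemany("INSERT INTO user_threads VALUES (?, ?, ?)",
                [(1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 2, 2), (5, 3, 2)])

cur.execute("""
    SELECT ut1.thread FROM user_threads ut1
    JOIN user_threads ut2 ON ut1.thread = ut2.thread
    WHERE ut1."user" = 1 AND ut2."user" = 2
      AND NOT EXISTS
          (SELECT 1 FROM user_threads
           WHERE thread = ut1.thread
             AND "user" NOT IN (ut1."user", ut2."user"))
""")
exclusive = cur.fetchall()
print(exclusive)  # [(1,)] - thread 2 is excluded because user 3 is in it
```

The NOT EXISTS is what enforces exclusivity: thread 2 contains both users but is rejected as soon as any other member is found.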
I am using version 3.0.3, and running my queries in the shell.
I have ~58 million record nodes with 4 properties each, specifically an ID string, a epoch time integer, and lat/lon floats.
When I run a query like profile MATCH (r:record) RETURN count(r); I get a very quick response:
+----------+
| count(r) |
+----------+
| 58430739 |
+----------+
1 row
29 ms
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+--------------------------+----------------+------+---------+-----------+--------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+--------------------------+----------------+------+---------+-----------+--------------------------------+
| +ProduceResults | 7644 | 1 | 0 | count(r) | count(r) |
| | +----------------+------+---------+-----------+--------------------------------+
| +NodeCountFromCountStore | 7644 | 1 | 0 | count(r) | count( (:record) ) AS count(r) |
+--------------------------+----------------+------+---------+-----------+--------------------------------+
Total database accesses: 0
The Total database accesses: 0 and NodeCountFromCountStore tells me that neo4j uses a counting mechanism here that avoids iterating over all the nodes.
However, when I run profile MATCH (r:record) WHERE r.time < 10000000000 RETURN count(r);, I get a very slow response:
+----------+
| count(r) |
+----------+
| 58430739 |
+----------+
1 row
151278 ms
Compiler CYPHER 3.0
Planner COST
Runtime INTERPRETED
+-----------------------+----------------+----------+----------+-----------+------------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Variables | Other |
+-----------------------+----------------+----------+----------+-----------+------------------------------+
| +ProduceResults | 1324 | 1 | 0 | count(r) | count(r) |
| | +----------------+----------+----------+-----------+------------------------------+
| +EagerAggregation | 1324 | 1 | 0 | count(r) | |
| | +----------------+----------+----------+-----------+------------------------------+
| +NodeIndexSeekByRange | 1752922 | 58430739 | 58430740 | r | :record(time) < { AUTOINT0} |
+-----------------------+----------------+----------+----------+-----------+------------------------------+
Total database accesses: 58430740
The count is correct, as I chose a time value larger than all of my records. What surprises me here is that Neo4j is accessing EVERY single record. The profiler states that Neo4j is using the NodeIndexSeekByRange as an alternative method here.
My question is, why does Neo4j access EVERY record when all it is returning is a count? Are there no intelligent mechanisms inside the system to count a range of values after seeking the boundary/threshold value within the index?
I use Apache Solr for the same data, and returning a count after searching an index is extremely fast (about 5 seconds). If I recall correctly, both platforms are built on top of Apache Lucene. While I don't know much about that software internally, I would assume that the index support is fairly similar for both Neo4j and Solr.
I am working on a proxy service that will deliver results in a paginated form (using the SKIP n LIMIT m technique) by first getting a count, and then iterating over results in chunks. This works really well for Solr, but I am afraid that Neo4j may not perform well in this scenario.
Any thoughts?
The latter query uses a NodeIndexSeekByRange operation: it seeks into the :record(time) index and then walks every entry whose time value is less than 10000000000, producing one row per matching node.
Unlike the plain label count, which the count store answers without touching the data, a count with a property predicate has to visit every matching index entry, and that is why it is so much slower.