I have 1,000,000 records in BigQuery. What is the best way to fetch the data and process it using Go? I get a timeout if I fetch all the data without a limit. I already increased the deadline to 5 minutes, but the query takes more than 5 minutes.
I want to do some kind of streaming call or pagination implementation, but I don't know how to do that in Go.
var FetchCustomerRecords = func(req *http.Request) *bigquery.RowIterator {
    ctx := appengine.NewContext(req)
    ctxWithDeadline, _ := context.WithTimeout(ctx, 5*time.Minute)
    log.Infof(ctx, "Fetch Customer records from BigQuery")
    client, err := bigquery.NewClient(ctxWithDeadline, "ddddd-crm")
    q := client.Query(
        "SELECT * FROM Something")
    q.Location = "US"
    job, err := q.Run(ctx)
    if err != nil {
        log.Infof(ctx, "%v", err)
    }
    status, err := job.Wait(ctx)
    if err != nil {
        log.Infof(ctx, "%v", err)
    }
    if err := status.Err(); err != nil {
        log.Infof(ctx, "%v", err)
    }
    it, err := job.Read(ctx)
    if err != nil {
        log.Infof(ctx, "%v", err)
    }
    return it
}
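For reference, here is a minimal sketch of one way to consume the returned iterator in fixed-size pages instead of pulling everything at once. It assumes the Pager helper from google.golang.org/api/iterator (bigquery.RowIterator implements Pageable); the processInPages name and the 1000-row page size are just placeholders for illustration:

package customers

import (
    "log"

    "cloud.google.com/go/bigquery"
    "google.golang.org/api/iterator"
)

// processInPages walks a RowIterator page by page, so the full result set
// never has to be buffered in memory at once.
func processInPages(it *bigquery.RowIterator) error {
    pager := iterator.NewPager(it, 1000, "")
    for {
        var rows [][]bigquery.Value
        nextToken, err := pager.NextPage(&rows)
        if err != nil {
            return err
        }
        for _, row := range rows {
            log.Println(row) // replace with real per-row processing
        }
        if nextToken == "" {
            return nil
        }
    }
}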
You can read the table contents directly without issuing a query. This doesn't incur query charges, and provides the same row iterator as you would get from a query.
For small results, this is fine. For large tables, I would suggest checking out the new Storage API and the code sample on the samples page.
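As a rough illustration (not the official sample), recent versions of the cloud.google.com/go/bigquery client can be opted into the Storage Read API, after which the same iterator pattern streams large results over it. This assumes your client library version exposes EnableStorageReadClient; the project and table names are placeholders:

package main

import (
    "context"
    "fmt"
    "log"

    "cloud.google.com/go/bigquery"
    "google.golang.org/api/iterator"
)

func main() {
    ctx := context.Background()
    client, err := bigquery.NewClient(ctx, "my-project-id") // placeholder project
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    // Opt this client into the BigQuery Storage Read API so large table
    // reads are streamed instead of paged through tabledata.list.
    if err := client.EnableStorageReadClient(ctx); err != nil {
        log.Fatal(err)
    }

    table := client.DatasetInProject("bigquery-public-data", "stackoverflow").Table("badges")
    it := table.Read(ctx)
    for {
        var row []bigquery.Value
        err := it.Next(&row)
        if err == iterator.Done {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(row) // replace with real processing
    }
}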
For a small table or simply reading a small subset of rows, you can do something like this (reads up to 10k rows from one of the public dataset tables):
func TestTableRead(t *testing.T) {
    ctx := context.Background()
    client, err := bigquery.NewClient(ctx, "my-project-id")
    if err != nil {
        t.Fatal(err)
    }
    table := client.DatasetInProject("bigquery-public-data", "stackoverflow").Table("badges")
    it := table.Read(ctx)
    rowLimit := 10000
    var rowsRead int
    for {
        var row []bigquery.Value
        err := it.Next(&row)
        if err == iterator.Done || rowsRead >= rowLimit {
            break
        }
        if err != nil {
            t.Fatalf("error reading row offset %d: %v", rowsRead, err)
        }
        rowsRead++
        fmt.Println(row)
    }
}
You can split your query into 10 chunks of 100,000 records each and run them in multiple goroutines.
Use SQL queries like:
select * from somewhere order by id DESC limit 100000 offset 0
and in the next goroutine:
select * from somewhere order by id DESC limit 100000 offset 100000
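A minimal sketch of that idea, assuming the cloud.google.com/go/bigquery client (which is safe for concurrent use) and a placeholder project/table; the page size, page count and ORDER BY column are illustrative only:

package main

import (
    "context"
    "fmt"
    "log"
    "sync"

    "cloud.google.com/go/bigquery"
    "google.golang.org/api/iterator"
)

func main() {
    ctx := context.Background()
    client, err := bigquery.NewClient(ctx, "my-project") // placeholder project
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    const pageSize = 100000
    const pages = 10
    var wg sync.WaitGroup
    for p := 0; p < pages; p++ {
        wg.Add(1)
        go func(offset int) {
            defer wg.Done()
            // Each goroutine fetches its own LIMIT/OFFSET window.
            q := client.Query(fmt.Sprintf(
                "SELECT * FROM `my-project.my_dataset.Something` ORDER BY id DESC LIMIT %d OFFSET %d",
                pageSize, offset))
            it, err := q.Read(ctx)
            if err != nil {
                log.Printf("query at offset %d: %v", offset, err)
                return
            }
            for {
                var row []bigquery.Value
                err := it.Next(&row)
                if err == iterator.Done {
                    break
                }
                if err != nil {
                    log.Printf("row read at offset %d: %v", offset, err)
                    return
                }
                // process row here
                _ = row
            }
        }(p * pageSize)
    }
    wg.Wait()
}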
I've got a function which does a simple sqlx.Select(...):
func filteredEntities(ex sqlx.Ext, baseStmt string, res interface{}, f entities.UserProjectTimeFilter) error {
    finalStmt, args, err := filteredStatement(baseStmt, f)
    if err != nil {
        return err
    }
    err = sqlx.Select(ex, res, finalStmt, args...)
    return errors.Wrap(err, "db query failed")
}
Most of the queries run fine (10-20 ms), but one of them, such as:
"\n\tSELECT * FROM Productivity\n\t WHERE user_id IN (?) AND (tracker_id, project_id) IN ((?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?,?),(?...
The query was taken from the logs and is not formatted. It executes in more than 10 seconds (sometimes even more than 20). However, against a dump of the database it executes in less than 1 second. The result set has fewer than 100 rows, and the query has no joins or anything like that, just a simple SELECT with conditions.
What might make it slower? All timings were measured with the default time.Now() and time.Since().
The table has these columns:
+---------+-------------+---------+-------------+----------+------------+------------+------------+--------+
| id | planning_id | user_id | activity_id | duration | project_id | tracker_id | created_at | useful |
+---------+-------------+---------+-------------+----------+------------+------------+------------+--------+
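For what it's worth, here is a self-contained sketch of the kind of timing measurement described above, assuming a MySQL database behind sqlx; the DSN, column list and filter values are placeholders, not the real generated statement:

package main

import (
    "log"
    "time"

    _ "github.com/go-sql-driver/mysql" // assumed driver; swap for the one actually in use
    "github.com/jmoiron/sqlx"
)

type Productivity struct {
    ID        int64 `db:"id"`
    UserID    int64 `db:"user_id"`
    ProjectID int64 `db:"project_id"`
    TrackerID int64 `db:"tracker_id"`
    Duration  int64 `db:"duration"`
}

func main() {
    // Placeholder DSN; point it at the same database the slow query hits.
    db, err := sqlx.Connect("mysql", "user:pass@tcp(localhost:3306)/app")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    query := `SELECT id, user_id, project_id, tracker_id, duration
              FROM Productivity
              WHERE user_id IN (?) AND (tracker_id, project_id) IN ((?, ?), (?, ?))`

    var rows []Productivity
    start := time.Now()
    err = db.Select(&rows, query, 42, 1, 10, 2, 20) // shortened placeholder arguments
    log.Printf("rows=%d err=%v elapsed=%s", len(rows), err, time.Since(start))
}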
I have the following query:
SELECT ... ,
grade as [grade],
grade as [grade2]
FROM dbo.[qc_runs] r
JOIN ...
WHERE ...
I send it to MS SQL Server 2014 from my Go code and want to get the data back (I am using the github.com/denisenkom/go-mssqldb driver). However, I can read the first grade value (type nvarchar(max)), but the second one arrives empty! They are the same table field, just duplicated. If I delete the first grade value from the query and leave just one, it still arrives empty! The column is defined as follows:
[grade] [nvarchar](max) NULL,
SQL Server Management Studio executes this query just fine and both grade values are non-empty, but the Go code doesn't get them!
UPDATE #1
Go code:
evaluations, err := db.Query("EXEC qa.dbo.sp_get_evaluation_list ?", uid)
if err != nil {
    if err == sql.ErrNoRows {
        return list, nil
    }
    return list, err
}
// Get column names
colNames, err := evaluations.Columns()
if err != nil {
    log.Logf("Failed to get columns: %v", err)
    return list, err
}
// Result is your slice string.
readCols := make([]interface{}, len(colNames))
// Read data
for evaluations.Next() {
    writeCols := make([]string, len(colNames))
    for i := range writeCols {
        readCols[i] = &writeCols[i]
    }
    evaluations.Scan(readCols...)
    for i := range writeCols {
        fmt.Println(colNames[i], ": ", writeCols[i])
    }
    ...
}
Output:
...
grade : <some text from DB>
grade2 :
I'm not a Go programmer, but you need to name those fields differently. How could your code possibly discern between the first and second [grade]?
SELECT ... ,
grade as [grade],
grade as [grade2]
FROM dbo.[qc_runs] r
JOIN ...
WHERE ...
Given that you're getting an empty value on the second read, I suspect that the go driver (and the data reader you're using from it) is one-way only.
Problem solved: evaluations.Scan(readCols...) returned the error sql: Scan error on column index 3: unsupported Scan, storing driver.Value type <nil> into type *string, and left the rest of the array blank.
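Since the root cause was a NULL nvarchar being scanned into a *string, here is a small sketch of the usual fix: scan into sql.NullString and always check the error from Scan. The connection string and uid are placeholders, and the stored-procedure call simply mirrors the question:

package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/denisenkom/go-mssqldb"
)

func main() {
    // Placeholder connection string.
    db, err := sql.Open("sqlserver", "sqlserver://user:pass@localhost?database=qa")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    rows, err := db.Query("EXEC qa.dbo.sp_get_evaluation_list ?", 42) // placeholder uid
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    colNames, err := rows.Columns()
    if err != nil {
        log.Fatal(err)
    }

    for rows.Next() {
        // sql.NullString tolerates NULL nvarchar columns that would abort a scan into *string.
        writeCols := make([]sql.NullString, len(colNames))
        readCols := make([]interface{}, len(colNames))
        for i := range writeCols {
            readCols[i] = &writeCols[i]
        }
        if err := rows.Scan(readCols...); err != nil { // the unchecked error was hiding the problem
            log.Fatalf("scan failed: %v", err)
        }
        for i, v := range writeCols {
            fmt.Println(colNames[i], ":", v.String)
        }
    }
    if err := rows.Err(); err != nil {
        log.Fatal(err)
    }
}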
select count(tblVV.VNme) as total,
tblvV.VNme
from tblVV
inner join tblRV
on tblVV.MID=tblRV.ID
inner join tblRe
on tblRV.RID=tblRe.RID
where tblRe.StartDate>= '2016-07-01 00:00:00' and
tblRe.EndDate<= '2016-07-31 23:59:59' and
tblRe.Reg= 'uk' and
tblRV.RegNo='BR72' and
tblVV.VNme <>''
group by tblVV.VNme
For the above query I get:
total Vame
1 DDSB
11 MV
The SQL query above shows me the correct data, so now I am trying to convert it to a LINQ query:
[WebMethod]
public static string GetVo(string RegNo)
{
    string data = "[";
    try
    {
        Ts1 DB = new Ts1();
        var re = (from vehvoila in DB.tblVV
                  join regveh in DB.tblRV on vehvoila.MID equals regveh.ID
                  join reg in DB.tblReg on regveh.RID equals reg.RID
                  where regveh.RegNo == RegNo &&
                        vehvoila.Vame != ""
                  group vehvoila by vehvoila.Vame into g
                  select new
                  {
                      VNme = g.Key,
                      cnt = g.Select(t => t.Vame).Count()
                  }).ToList();
        if (re.Any())
        {
            data += re.ToList().Select(x => "['" + x.VNme + "'," + x.cnt + "]")
                      .Aggregate((a, b) => a + "," + b);
        }
        data += "]";
    }
The LINQ query returns data like this:
[['DDSB',1],['DPSB',1],['DSB',109],['MV',39],['PSB',1]]
Whereas I want this data:
[['DDSB',1],['MV',11]]
The data returned by the SQL query is correct, so how do I fix the LINQ query?
Note: ignore the fromdate, todate and region parameters in the SQL query. I have a page with a dropdown, a from-date and to-date picker, and a button; when I select values (e.g. UK and the dates), the data is displayed in a table, and when I click on any row in that table I want to get this data into the data += "]" string built above.
The LINQ query above actually runs behind that row click:
total Vame
1 DDSB
11 MV
You can write it all like this:
Ts1 db = new Ts1();
var result = (from vehvoila in db.tblVV
              join regveh in db.tblRV on vehvoila.MID equals regveh.ID
              join reg in db.tblReg on regveh.RID equals reg.RID
              where reg.StartDate >= new DateTime(2016, 7, 1) &&
                    reg.EndDate < new DateTime(2016, 8, 1) &&
                    reg.Reg == "uk" &&
                    regveh.RegNo == "BR72" &&
                    vehvoila.Vame != ""
              group vehvoila by vehvoila.Vame into g
              select $"['{g.Key}',{g.Count()}]");
var data = $"[{string.Join(",", result)}]";
Because you only use the result to build the string, the select returns the already-formatted string for a single item, and string.Join is then used instead of .Aggregate, which I think is a bit cleaner.
The $"{...}" syntax is C# 6.0 string interpolation.
For the EndDate condition I used < with the date moved to the next day instead of <=. At least in Oracle, when a table is partitioned by date this is better for performance; it may help in SQL Server as well.
Without string interpolation:
Ts1 db = new Ts1();
var result = (from vehvoila in db.tblVV
              join regveh in db.tblRV on vehvoila.MID equals regveh.ID
              join reg in db.tblReg on regveh.RID equals reg.RID
              where reg.StartDate >= new DateTime(2016, 7, 1) &&
                    reg.EndDate < new DateTime(2016, 8, 1) &&
                    reg.Reg == "uk" &&
                    regveh.RegNo == "BR72" &&
                    vehvoila.Vame != ""
              group vehvoila by vehvoila.Vame into g
              select new { Key = g.Key, Count = g.Count() })
             .AsEnumerable()
             .Select(g => string.Format("['{0}',{1}]", g.Key, g.Count));
var data = string.Format("[{0}]",string.Join(",", result));