Generating TPCH-SF300 and SF1000 data - database

I am trying to generate TPC-H data at SF300 and SF1000 on Databricks. However, my scripts have been running for over 24 hours now, and I am guessing I did something wrong.
I followed the instructions at https://github.com/databricks/spark-sql-perf. Then I used the notebook (tpcds_datagen.scala) in that repository to generate the data, modifying the parameters to switch from TPC-DS to TPC-H. But it's extremely slow.
Could someone suggest a quicker way and help me out? Thanks in advance.
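One direction that may be worth trying is to split the generation into independent chunks and run them in parallel. The sketch below only builds command lines for the reference TPC-H dbgen tool, using its -s (scale factor), -C (number of chunks) and -S (which chunk to build) options. The ./dbgen path, the chunk count and the helper name are assumptions for illustration, and this is not the method the spark-sql-perf notebook uses.

```python
from typing import List


def dbgen_chunk_commands(scale_factor: int, chunks: int,
                         dbgen_path: str = "./dbgen") -> List[List[str]]:
    """Build one dbgen command line per chunk so chunks can run in parallel.

    dbgen_path is a hypothetical location of a compiled dbgen binary.
    """
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    return [
        # -f overwrites existing output, -s sets the scale factor,
        # -C is the total number of chunks, -S selects the chunk to generate.
        [dbgen_path, "-f", "-s", str(scale_factor),
         "-C", str(chunks), "-S", str(step)]
        for step in range(1, chunks + 1)
    ]


if __name__ == "__main__":
    for cmd in dbgen_chunk_commands(300, 4):
        print(" ".join(cmd))
```

Each command could then be handed to a separate worker (for example via subprocess on several machines), and the resulting files uploaded to storage and loaded as tables.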

Related

Excel not pulling all records from view

One of my users is trying to pull all the data from one of my views in SQL Server... She's about 200 records short. What gives? I don't even know where to start troubleshooting this. She's not using any filtering on the data, she's trying to pull it all using Excel's 'External Data' features.
I don't want to be spoon-fed, but I don't even know where to start looking to troubleshoot this. If someone could point me in the right direction to begin my investigation, I'd greatly appreciate it.

Need help stringing together database processes

I need some help from those with more knowledge than I possess. I am currently trying to figure out how to get real-time data from a database.
I need to be able to find the company info for the most recent licensees. The search parameter I'm using is 2016-05-10T00:00:00.000
The full request, combining the API endpoint and the search parameter, can be found at this link:
https://www.hurl.it/?method=GET&url=https%3A%2F%2Fdata.wa.gov%2Fresource%2Fv8vv-gqqs.json&headers=%7B%22X-App-Token%22%3A[%22bjp8KrRvAPtuf809u1UXnI0Z8%22]%7D&args=%7B%22licenseeffectivedate%22%3A[%222004-07-14T00%3A00%3A00.000%22]%7D
So I'm looking to retrieve the most recently added accounts in order to verify that (1) the license is active and (2) the license number the contractor gives matches what the website says. I would like to automate this so that I know when the newest licenses are added and they are extracted/downloaded into Excel.
If anyone can help with this I would appreciate it very much. I also have more questions about using databases if any of you are experts in the field.
Once again, thank you!
Clay
Since your goal is to get this data into Excel, have you considered using something like our OData support instead? You could structure your query in Excel PowerBI and it'd automatically refresh the data.
Another option would be to use our CSV output type with an Excel web query. I use the IMPORTDATA(...) function in Google Sheets, which is very similar.
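As a rough illustration of the CSV route, the sketch below builds a query URL for the same dataset that asks for licenses on or after a given date, newest first. It assumes the standard SODA query parameters ($where, $order, $limit) and that licenseeffectivedate is a timestamp column. The function name and defaults are made up for the example, and no app token is included.

```python
from urllib.parse import urlencode

# Dataset endpoint taken from the question; the extension selects the output type.
BASE = "https://data.wa.gov/resource/v8vv-gqqs"


def newest_licenses_url(since: str, limit: int = 100, fmt: str = "csv") -> str:
    """Return a URL for licenses effective on or after `since`, newest first."""
    params = {
        "$where": f"licenseeffectivedate >= '{since}'",
        "$order": "licenseeffectivedate DESC",
        "$limit": str(limit),
    }
    return f"{BASE}.{fmt}?{urlencode(params)}"


if __name__ == "__main__":
    print(newest_licenses_url("2016-05-10T00:00:00.000"))
```

The printed URL could be pasted into an Excel web query or into IMPORTDATA(...) in Google Sheets, so the sheet picks up new licenses on each refresh.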

How do you set-up a flat file database?

Let's say there's this flat-file database, which stores all its information in JSON format:
https://github.com/Codetana/IkarusDB
I downloaded it, but I don't know how to set it up. I'm still finding my way around working with different types of databases.
Can someone please show me how to set this up and demonstrate its usage?
Any help would be much appreciated.
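For orientation, here is a generic sketch of what a flat-file JSON store does in principle: the whole "database" is a single JSON file that is read, modified and written back. This is not IkarusDB's API; the class and method names are invented for illustration only.

```python
import json
from pathlib import Path


class JsonFlatFile:
    """Toy flat-file store (hypothetical, not IkarusDB): one JSON file on disk."""

    def __init__(self, path):
        self.path = Path(path)
        # "Setting up" the database is just creating an empty JSON object.
        if not self.path.exists():
            self.path.write_text("{}", encoding="utf-8")

    def _load(self):
        return json.loads(self.path.read_text(encoding="utf-8"))

    def put(self, key, record):
        """Insert or overwrite a record, rewriting the whole file."""
        data = self._load()
        data[key] = record
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def get(self, key, default=None):
        """Return the record stored under `key`, or `default` if absent."""
        return self._load().get(key, default)
```

Usage would look like `db = JsonFlatFile("users.json"); db.put("alice", {"age": 30}); db.get("alice")`. Real flat-file databases usually add locking, indexing and one file per table on top of this idea.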

What is the best solution to manage data from csv file and do some logic and action

Intro
I learned basic programming at school (VB), so I understand the logic behind an application and the way it "thinks".
I started learning Python this week because I would like to be able to build what I need without banging my head on the keyboard.
In the meantime, I would like to solve my problem, so I'm asking all of you in case someone has seen something suitable somewhere.
I have tried searching, but I wasn't sure how to phrase the question, so the results were not relevant.
My Question
I know I can do this in Excel, but I'm looking for another way if possible.
Does anyone know of an application, online or on a Mac, that gives me the ability to:
1- Import data from a CSV file and add it to a database. Every day I will log about 100 lines of data.
2- After that, apply conditions based on data stored in files 2 and 3.
3- Generate a file 4 where I can see only the important cases I have to take care of that day.
I'm picturing something a bit like Microsoft Access, but I don't remember whether it let me add parameters or conditions to the data I want to view.
What i'm trying to achieve
I have to process a lot of data manually right now, and I'm trying to find a way to automatically pull the recurring problems out of a list that I receive every day. I see no other way than building my own validation process.
Thanks for your help. I'm new, but I will try my best to contribute to this community (Python studies: level 1; VB: 15 years ago). For now I'm building my startup, and my expertise is more in technical skills on security products and technical support.
In case I'm not the only one who would like to do basic things like this without having to program a database application from scratch, I have found these two solutions so far:
https://www.zoho.com/creator/
and
http://www.ifreetools.com/
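Since Python is already being learned here, the three steps in the question can also be sketched with only the standard library (csv and sqlite3). The column names (day, item, status) and the idea that the conditions are a list of statuses to watch are assumptions made up for the example; the real files will differ.

```python
import csv
import io
import sqlite3


def load_daily_csv(conn, csv_text):
    """Step 1: append the rows of one daily CSV export to the 'log' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log (day TEXT, item TEXT, status TEXT)")
    rows = [(r["day"], r["item"], r["status"])
            for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def important_cases(conn, day, watched_statuses):
    """Step 2: keep only the rows of `day` whose status is on the watch list."""
    placeholders = ",".join("?" for _ in watched_statuses)
    query = (f"SELECT item, status FROM log "
             f"WHERE day = ? AND status IN ({placeholders}) ORDER BY item")
    return conn.execute(query, [day, *watched_statuses]).fetchall()


def to_csv(rows):
    """Step 3: render the selected rows as the text of the output file."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["item", "status"])
    writer.writerows(rows)
    return out.getvalue()
```

In practice the CSV text would come from reading the daily file, the watch list from files 2 and 3, and `sqlite3.connect("log.db")` would keep the accumulated data on disk between days.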

How to interpret JMeter result while doing Database testing

I have recently started working with JMeter and am using it for database stress testing. I have added the required drivers to the lib folder, JMeter is connected to the database, and it works fine.
The problem I am facing now is how to interpret the results. I tested only one SQL statement, which does a SELECT on one table. Below are screenshots of the various tabs in JMeter.
The screenshot below shows how many threads (10) I am running and the ramp-up time.
The screenshot below shows my JDBC Connection Configuration settings, which I do not understand either. It would be great if anyone could explain what these settings mean in relation to the number of threads I am running in the picture above.
The screenshot below shows the result in a Summary Report, which again I am not able to interpret. What's the best way to interpret these results? Any thoughts on this will be of great help.
The screenshot below shows another result view that I am not able to interpret. I was looking for how much time it takes to execute that single SELECT statement. This tab shows a lot of information, but I'm not sure how to read it. Any thoughts on this will be of great help.
Can anyone help me understand these results? Thanks for the help.
You should use one of these:
Response Time Graph
Aggregate Graph
Also look at the jmeter-plugins project:
http://code.google.com/p/jmeter-plugins/
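To see what such reports boil down to, the per-sample results file can also be summarized by hand. The sketch below assumes the results were saved in JMeter's CSV format with a header row containing at least the elapsed (milliseconds), label and success columns; the function name is made up for the example.

```python
import csv
import io
import statistics


def summarize_jtl(csv_text):
    """Return per-label sample count, avg/min/max elapsed (ms) and error rate."""
    by_label = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = by_label.setdefault(row["label"], {"elapsed": [], "errors": 0})
        entry["elapsed"].append(int(row["elapsed"]))
        if row["success"].lower() != "true":
            entry["errors"] += 1

    summary = {}
    for label, entry in by_label.items():
        times = entry["elapsed"]
        summary[label] = {
            "samples": len(times),
            "avg_ms": statistics.mean(times),
            "min_ms": min(times),
            "max_ms": max(times),
            "error_pct": 100.0 * entry["errors"] / len(times),
        }
    return summary
```

For a single SELECT, the average and maximum elapsed times for the JDBC sampler's label are the numbers that answer "how long does this query take" under the chosen thread count.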
