I have multiple datasets and sheets of data of these(excel).
It's train time tables
depart a a a a
arrive t t t t
Train z1 z2 z3 z4
station
a
as 6:30:00 7:47:00 8:18:00 9:45:00
b
bs 6:33:00 7:50:00 8:21:00 9:48:00
c 6:35:00 7:52:00 8:23:00 9:50:00
cs 6:35:30 7:52:30 8:23:30 9:50:30
I try to put the data into collection or dictionary
To pull out data (mainly time) by train and station, or by time and station etc.
For dictionary, Seems like I need nested dictionary
For collection, may I loop through all item by criteria?
Can anyone give me a hint what method to use for getting data(time or station or train)?
Any advice would be appreciated.
Thank you
Whether to put them in a Collection or a Dictionary depends on how you intend to retrieve them afterwards.
Start by describing the data of a single record using public fields in a class module:
Option Explicit
Public TrainID As String
Public DepartureStationID As String
Public DepartureUTC As Date
Public ArrivalStationID As String
Public ArrivalUTC As Date
Now you can write a function that can parse a worksheet Range into a single instance of that class, then a procedure that runs this function for each interesting Range to parse (I'm not sure I'm reading the data correctly but that would be the general idea).
If you plan to iterate them all and run some method in a loop, use a Collection (and a For Each loop).
If you plan to retrieve them by ID, then you could use the TrainID or DepartureStationID field as a dictionary key, and have each "item" be a Collection of instances of that class (lest you'll run into duplicate key issues).
If you plan to parse a bunch of data sources and aggregate them into a queryable dataset, you only need a Collection to store the objects you're parsing; you'll be iterating that collection when you dump these objects onto a worksheet/table for pivoting and PowerQuery-ing =)
Related
I would like to avoid costly repeated data base queries in Anylogic. I have seen the following thread in Stack Overflow What is the fastest way to look up continuous data on Anylogic (Java, SQL) where a simple three step answer is provided but I'm not sure what the second point of the three actually means:
Save all rows as instances of that class at model start-up into a map - you can use Origin/Destination as the key (use Anylogic's Pair object) and the class instance as the value
I have created a class that takes as inputs the information from each column of my database. I would now like to save each row as an instance of that class - is there an easy way to do this? I may be missing something simple as I'm new to Anylogic.
I'm also unsure of how to create a mapping, if anyone could add more detail to point 2 above I'd be very grateful!
this is effectively the best advice, you created the class, which is a great step, but now, one element of that class, will be used as the key... for example the name... for instance if your class has firstName as one variable and lastName as another variable, you will use a string that is the concatenation of firstName and lastName as your key. Of course any key is fine, assuming that it is unique for all your table. Also an integer as an id is ok too.
create a collection of type LinkedHashMap
Create a class (you did that)
Your collection will take as the key a String (first + last name) and as the value of the elment the class...
now, when you read your database you will have something like this:
for(Tuple t : yourQueryResults){
YourClass yc=new YourClass(t.get(db.var1),t.get(db.var2));
String totalName=t.get(db.first_name)+"_"+t.get(db.last_name);
yourCollection.put(totalName,yc);
}
Now every time you want to find someone with the a name, for example "John Doe", instead of making a query, you will do
yourCollection.get("John_Doe").theVarYouWant;
if you use an id instead of the name, you can set an int as the key, and then you will just do yourCollection.get(theId).theVarYouWant
I could need some help with a Anylogic Model.
Model (short): Manufacturing scenario with orders move in a individual route. The workplaces (WP) are dynamical created by simulation start. Their names, quantity and other parameters are stored in a database (excel Import). Also the orders are created according to an import. The Agent population "order" has a collection routing which contains the Workplaces it has to stop in the specific order.
Target: I want a moveTo block in main which finds the next destination of the agent order.
Problem and solution paths:
I set the destination Type to agent and in the Agent field I typed a function agent.getDestination(). This function is in order which returns the next entry of the collection WP destinationName = routing.get(i). With this I get a Datatype error (while run not compiling). I quess it's because the database does not save the entrys as WP Type but only String.
Is there a possiblity to create a collection with agents from an Excel?
After this I tried to use the same getDestination as String an so find via findFirst the WP matching the returned name and return it as WP. WP targetWP = findFirst(wps, w->w.name == destinationName);
Of corse wps (the population of Workplaces) couldn't be found.
How can I search the population?
Maybe with an Agentlink?
I think it is not that difficult but can't find an answer or a solution. As you can tell I'm a beginner... Hope the description is good an someone can help me or give me a hint :)
Thanks
Is there a possiblity to create a collection with agents from an Excel?
Not directly using the collection's properties and, as you've seen, you can't have database (DB) column types which are agent types.1
But this is relatively simple to do directly via Java code (and you can use the Insert Database Query wizard to construct the skeleton code for you).
After this I tried to use the same getDestination as String an so find via findFirst the WP matching the returned name and return it as WP
Yes, this is one approach. If your order details are in Excel/the database, they are presumably referring to workplaces via some String ID (which will be a parameter of the workplace agents you've created from a separate Excel worksheet/database table). You need to use the Java equals method to compare strings though, not == (which is for comparing numbers or whether two objects are the same object).
I want a moveTo block in main which finds the next destination of the agent order
So the general overall solution is
Create a population of Workplace agents (let's say called workplaces in Main) from the DB, each with a String parameter id or similar mapped from a DB column.
Create a population of Order agents (let's say called orders in Main) from the DB and then, in their on-startup action, set up their collection of workplace IDs (type ArrayList, element class String; let's say called workplaceIDsList) using data from another DB table.
Order probably also needs a working variable storing the next index in the list that it needs to go to (so let's say an int variable nextWorkplaceIndex which starts at 0).
Write a function in Main called getWorkplaceByID that has a single String argument id and returns a Workplace. This gets the workplace from the population that matches the ID; a one-line way similar to yours is findFirst(workplaces, w -> w.id.equals(id)).
The MoveTo block (which I presume is in Main) needs to move the Order to an agent defined by getWorkplaceByID(agent.workplaceIDsList.get(nextWorkplaceIndex++)). (The ++ bit increments the index after evaluating the expression so it is ready for the next workplace to go to.)
For populating the collection, you'd have two tables, something like the below (assuming using strings as IDs for workplaces and orders):
orders table: columns for parameters of your orders (including some String id column) other than the workplace-list. (Create one Order agent per row.)
order_workplaces table: columns order_id, sequence_num and workplace_id (so with multiple rows specifying the sequence of workplace IDs for an order ID).
In the On startup action of Order, set up the skeleton query code via the Insert Database Query wizard as below (where we want to loop through all rows for this order's ID and do something --- we'll change the skeleton code to add entries to the collection instead of just printing stuff via traceln like the skeleton code does).
Then we edit the skeleton code to look like the below. (Note we add an orderBy clause to the initial query so we ensure we get the rows in ascending sequence number order.)
List<Tuple> rows = selectFrom(order_workplaces)
.where(order_workplaces.order_id.eq(id))
.orderBy(order_workplaces.sequence_num.asc())
.list();
for (Tuple row : rows) {
workplaceIDsList.add(row.get(order_workplaces.workplace_id));
}
1 The AnyLogic database is a normal relational database --- HSQLDB in fact --- and databases only understand their own specific data types like VARCHAR, with AnyLogic and the libraries it uses translating these to Java types like String. In the user interface, AnyLogic makes it look like you set the column types as int, String, etc. but these are really the Java types that the columns' contents will ultimately be translated into.
AnyLogic does support columns which have option list types (and the special Code type column for columns containing executable Java code) but these are special cases using special logic under the covers to translate the column data (which is ultimately still a string of characters) into the appropriate option list instance or (for Code columns) into compiled-on-the-fly-and-then-executed Java).
Welcome to Stack Overflow :) To create a Population via Excel Import you have to create a method and call Code like this. You also need an empty Population.
int n = excelFile.getLastRowNum(YOUR_SHEET_NAME);
for(int i = FIRST_ROW; i <= n; i++){
String name = excelFile.getCellStringValue(YOUR_SHEET_NAME, i, 1);
double SEC_PARAMETER_TO_READ= excelFile.getCellNumericValue(YOUR_SHEET_NAME, i, 2);
WP workplace = add_wps(name, SEC_PARAMETER_TO_READ);
}
Now if you want to get a workplace by name, you have to create a method similar to your try.
Functionbody:
WP workplaceToFind = wps.findFirst(w -> w.name.equals(destinationName));
if(workplaceToFind != null){
//do what ever you want
}
I am working currently on telecom analytics project and newbie in query optimisation. To show result in browser it takes a full minute while just 45,000 records are to be accessed. Could you please suggest on ways to reduce time for showing results.
I wrote following query to find call-duration of a person of age-group:
sigma=0
popn=len(Demo.objects.filter(age_group=age))
card_list=[Demo.objects.filter(age_group=age)[i].card_no
for i in range(popn)]
for card in card_list:
dic=Fact_table.objects.filter(card_no=card.aggregate(Sum('duration'))
sigma+=dic['duration__sum']
avgDur=sigma/popn
Above code is within for loop to iterate over age-groups.
Model is as follows:
class Demo(models.Model):
card_no=models.CharField(max_length=20,primary_key=True)
gender=models.IntegerField()
age=models.IntegerField()
age_group=models.IntegerField()
class Fact_table(models.Model):
pri_key=models.BigIntegerField(primary_key=True)
card_no=models.CharField(max_length=20)
duration=models.IntegerField()
time_8bit=models.CharField(max_length=8)
time_of_day=models.IntegerField()
isBusinessHr=models.IntegerField()
Day_of_week=models.IntegerField()
Day=models.IntegerField()
Thanks
Try that:
sigma=0
demo_by_age = Demo.objects.filter(age_group=age);
popn=demo_by_age.count() #One
card_list = demo_by_age.values_list('card_no', flat=True) # Two
dic = Fact_table.objects.filter(card_no__in=card_list).aggregate(Sum('duration') #Three
sigma = dic['duration__sum']
avgDur=sigma/popn
A statement like card_list=[Demo.objects.filter(age_group=age)[i].card_no for i in range(popn)] will generate popn seperate queries and database hits. The query in the for-loop will also hit the database popn times. As a general rule, you should try to minimize the amount of queries you use, and you should only select the records you need.
With a few adjustments to your code this can be done in just one query.
There's generally no need to manually specify a primary_key, and in all but some very specific cases it's even better not to define any. Django automatically adds an indexed, auto-incremental primary key field. If you need the card_no field as a unique field, and you need to find rows based on this field, use this:
class Demo(models.Model):
card_no = models.SlugField(max_length=20, unique=True)
...
SlugField automatically adds a database index to the column, essentially making selections by this field as fast as when it is a primary key. This still allows other ways to access the table, e.g. foreign keys (as I'll explain in my next point), to use the (slightly) faster integer field specified by Django, and will ease the use of the model in Django.
If you need to relate an object to an object in another table, use models.ForeignKey. Django gives you a whole set of new functionality that not only makes it easier to use the models, it also makes a lot of queries faster by using JOIN clauses in the SQL query. So for you example:
class Fact_table(models.Model):
card = models.ForeignKey(Demo, related_name='facts')
...
The related_name fields allows you to access all Fact_table objects related to a Demo instance by using instance.facts in Django. (See https://docs.djangoproject.com/en/dev/ref/models/fields/#module-django.db.models.fields.related)
With these two changes, your query (including the loop over the different age_groups) can be changed into a blazing-fast one-hit query giving you the average duration of calls made by each age_group:
age_groups = Demo.objects.values('age_group').annotate(duration_avg=Avg('facts__duration'))
for group in age_groups:
print "Age group: %s - Average duration: %s" % group['age_group'], group['duration_avg']
.values('age_group') selects just the age_group field from the Demo's database table. .annotate(duration_avg=Avg('facts__duration')) takes every unique result from values (thus each unique age_group), and for each unique result will fetch all Fact_table objects related to any Demo object within that age_group, and calculate the average of all the duration fields - all in a single query.
in an app i have an entity that contains a list of other entities (let's say an event holding a list of assigned employees)
using objectify - i need to find all the events a particular employee is assigned to.
is there a basic way to filter a query if it contains the parameter - kind of the opposite of the query in
... quick pseudocode
findAll(Employee employee) {
...
return ofy.query(Event.class).filter("employees.contains", employee).list();
}
any help would be greatly appreciated
i tried just doing filter("employees", employee) after seeing this http://groups.google.com/group/objectify-appengine/browse_thread/thread/77ba676192c08e20 - but unfortunately this returns me an empty list
currently i'm doing something really inefficient - going through each event, iterating through the employees and adding them to a new list if it contains the given employee just to have something that works - i know this is not right though
let me add one thing,
the above query is not actually what it is, i was just using that because i did not think this would make a difference.
The Employee and Events are in the same entity group with Business as a parent
the actual query i am using is the following
ofy.query(Event.class).ancestor(businessKey).filter("employees", employee).list();
unfortunately this is still returning an empty list - does having the ancestor(key) in there mess up the filter?
solution, the employees field was not indexed correctly.
I added the datastore-indexes file to create a composite index, but was testing originally on a value that I added before the employees field was indexed, this was something stupid i was doing - simply having an index on the "business" field and the "employees" field fixed everything. the datastore-indexes file did not appear to be necessary, after deleting it and trying again everything worked fine.
Generally, you do this one of two ways:
Put a property of Set<Key<Employee>> on the Event
or
Put a property of Set<Key<Event>> on the Employee
You could also create a relationship entity, but if you're just doing filtering on values with relatively low counts, usually it's easier to just put the set property on one entity or the other.
Then filter as you describe:
ofy.query(Event.class).filter("employees", employee).list()
or
ofy.query(Employee.class).filter("events", event).list()
The list property should hold a Keys to the target entity. If you pass in an entity to the filter() method, Objectify will understand that you want to filter by the key instead.
Example :
/***************************************************/
#Entity
#Cache
public class News {
#Id Long id;
String news ;
#Index List<Long> friend_list = new ArrayList<Long>();
// My friends who can see my news , exemele : friend_list.add(id_f1); friend_list.add(id_f2); friend_list.add(id_f3);
//To make an operation on "friend_list", it is obligatory to index it
}
/*************************************************/
public News(Long id_f){
List<Long> friend_id = new ArrayList<Long>();
friend_id.add(id_f);
Query<Nesw> query = ofy().load().type(News.class).filter("friend_list in",friend_id).limit(limit);
//To filter a list, just after the name of the field you want to filter, add "IN".
//here ==> .filter("friend_list in",friend_id);
// if friend_list contains "id_friend" ==> the query return value
.........
}
I've been using SQL Server to store historical time series data for a couple hundred thousand objects, observed about 100 times per day. I'm finding that queries (give me all values for object XYZ between time t1 and time t2) are too slow (for my needs, slow is more then a second). I'm indexing by timestamp and object ID.
I've entertained the thought of using somethings a key-value store like MongoDB instead, but I'm not sure if this is an "appropriate" use of this sort of thing, and I couldn't find any mentions of using such a database for time series data. ideally, I'd be able to do the following queries:
retrieve all the data for object XYZ between time t1 and time t2
do the above, but return one date point per day (first, last, closed to time t...)
retrieve all data for all objects for a particular timestamp
the data should be ordered, and ideally it should be fast to write new data as well as update existing data.
it seems like my desire to query by object ID as well as by timestamp might necessitate having two copies of the database indexed in different ways to get optimal performance...anyone have any experience building a system like this, with a key-value store, or HDF5, or something else? or is this totally doable in SQL Server and I'm just not doing it right?
It sounds like MongoDB would be a very good fit. Updates and inserts are super fast, so you might want to create a document for every event, such as:
{
object: XYZ,
ts : new Date()
}
Then you can index the ts field and queries will also be fast. (By the way, you can create multiple indexes on a single database.)
How to do your three queries:
retrieve all the data for object XYZ
between time t1 and time t2
db.data.find({object : XYZ, ts : {$gt : t1, $lt : t2}})
do the above, but return one date
point per day (first, last, closed to
time t...)
// first
db.data.find({object : XYZ, ts : {$gt : new Date(/* start of day */)}}).sort({ts : 1}).limit(1)
// last
db.data.find({object : XYZ, ts : {$lt : new Date(/* end of day */)}}).sort({ts : -1}).limit(1)
For closest to some time, you'd probably need a custom JavaScript function, but it's doable.
retrieve all data for all objects for
a particular timestamp
db.data.find({ts : timestamp})
Feel free to ask on the user list if you have any questions, someone else might be able to think of an easier way of getting closest-to-a-time events.
This is why databases specific to time series data exist - relational databases simply aren't fast enough for large time series.
I've used Fame quite a lot at investment banks. It's very fast but I imagine very expensive. However if your application requires the speed it might be worth looking it.
There is an open source timeseries database under active development (.NET only for now) that I wrote. It can store massive amounts (terrabytes) of uniform data in a "binary flat file" fashion. All usage is stream-oriented (forward or reverse). We actively use it for the stock ticks storage and analysis at our company.
I am not sure this will be exactly what you need, but it will allow you to get the first two points - get values from t1 to t2 for any series (one series per file) or just take one data point.
https://code.google.com/p/timeseriesdb/
// Create a new file for MyStruct data.
// Use BinCompressedFile<,> for compressed storage of deltas
using (var file = new BinSeriesFile<UtcDateTime, MyStruct>("data.bts"))
{
file.UniqueIndexes = true; // enforces index uniqueness
file.InitializeNewFile(); // create file and write header
file.AppendData(data); // append data (stream of ArraySegment<>)
}
// Read needed data.
using (var file = (IEnumerableFeed<UtcDateTime, MyStrut>) BinaryFile.Open("data.bts", false))
{
// Enumerate one item at a time maxitum 10 items starting at 2011-1-1
// (can also get one segment at a time with StreamSegments)
foreach (var val in file.Stream(new UtcDateTime(2011,1,1), maxItemCount = 10)
Console.WriteLine(val);
}
I recently tried something similar in F#. I started with the 1 minute bar format for the symbol in question in a Space delimited file which has roughly 80,000 1 minute bar readings. The code to load and parse from disk was under 1ms. The code to calculate a 100 minute SMA for every period in the file was 530ms. I can pull any slice I want from the SMA sequence once calculated in under 1ms. I am just learning F# so there are probably ways to optimize. Note this was after multiple test runs so it was already in the windows Cache but even when loaded from disk it never adds more than 15ms to the load.
date,time,open,high,low,close,volume
01/03/2011,08:00:00,94.38,94.38,93.66,93.66,3800
To reduce the recalculation time I save the entire calculated indicator sequence to disk in a single file with \n delimiter and it generally takes less than 0.5ms to load and parse when in the windows file cache. Simple iteration across the full time series data to return the set of records inside a date range in a sub 3ms operation with a full year of 1 minute bars. I also keep the daily bars in a separate file which loads even faster because of the lower data volumes.
I use the .net4 System.Runtime.Caching layer to cache the serialized representation of the pre-calculated series and with a couple gig's of RAM dedicated to cache I get nearly a 100% cache hit rate so my access to any pre-computed indicator set for any symbol generally runs under 1ms.
Pulling any slice of data I want from the indicator is typically less than 1ms so advanced queries simply do not make sense. Using this strategy I could easily load 10 years of 1 minute bar in less than 20ms.
// Parse a \n delimited file into RAM then
// then split each line on space to into a
// array of tokens. Return the entire array
// as string[][]
let readSpaceDelimFile fname =
System.IO.File.ReadAllLines(fname)
|> Array.map (fun line -> line.Split [|' '|])
// Based on a two dimensional array
// pull out a single column for bar
// close and convert every value
// for every row to a float
// and return the array of floats.
let GetArrClose(tarr : string[][]) =
[| for aLine in tarr do
//printfn "aLine=%A" aLine
let closep = float(aLine.[5])
yield closep
|]
I use HDF5 as my time series repository. It has a number of effective and fast compression styles which can be mixed and matched. It can be used with a number of different programming languages.
I use boost::date_time for the timestamp field.
In the financial realm, I then create specific data structures for each of bars, ticks, trades, quotes, ...
I created a number of custom iterators and used standard template library features to be able to efficiently search for specific values or ranges of time-based records.