In an application I'm designing, the user has to be able to specify rather complex event scheduling (a continuous time-block vs. daily time-blocks, exception dates/times, recurrence patterns, etc.).
Does anyone know of a good design page for such a thing online? For example, I was highly impressed with this page's description of how to do database audit trails, and would love something similar.
Current thinking
My database would contain the following tables: Events and ScheduleItems
The relationship would be Events {1 -- 0..*} ScheduleItems
Events would have the following columns: eventId, schedulePattern
ScheduleItems would have the following columns: eventId, startDateTime, endDateTime
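A minimal sketch of those two tables (SQLite syntax; the column types are my assumptions, since only the column names are fixed above):

import sqlite3

conn = sqlite3.connect("schedule.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS Events (
    eventId         INTEGER PRIMARY KEY,
    schedulePattern TEXT NOT NULL         -- e.g. 'sd:2010-04-28;st:09:20:00;...' or 'MANUAL'
);
CREATE TABLE IF NOT EXISTS ScheduleItems (
    eventId       INTEGER NOT NULL REFERENCES Events(eventId),
    startDateTime TEXT NOT NULL,          -- ISO-8601 timestamps stored as text
    endDateTime   TEXT NOT NULL
);
""")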
The front end controls would allow the user to specify general rules (daily/continuous, includes/excludes weekends, start/end date/time etc.). If they are not satisfied with the existing controls, they could then opt to display and manually tweak the generated "time-blocks"
On saving the event schedule...
If only the provided controls were used (no tweaking) I would save their selections as a pattern in the Events table (i.e. "sd:2010-04-28;st:09:20:00;ed:2010-05-12;et:17:20:00;r:2w[M-Th];z:EST" etc.)
If the user manually tweaked the generated time-blocks, I would save each individual time-block in the ScheduleItems table and give Events.schedulePattern a special code ("MANUAL" or something).
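A rough sketch of that save path (everything beyond the two tables and the MANUAL flag described above is an assumption):

def save_event_schedule(conn, event_id, pattern, tweaked_blocks=None):
    """Store the pattern string directly, or fall back to explicit time-blocks."""
    if tweaked_blocks is None:
        # Controls only, no tweaking: the whole schedule fits in one pattern string.
        conn.execute("UPDATE Events SET schedulePattern = ? WHERE eventId = ?",
                     (pattern, event_id))
    else:
        # Manually tweaked: flag the event and persist every individual time-block.
        conn.execute("UPDATE Events SET schedulePattern = 'MANUAL' WHERE eventId = ?",
                     (event_id,))
        conn.executemany(
            "INSERT INTO ScheduleItems (eventId, startDateTime, endDateTime) VALUES (?, ?, ?)",
            [(event_id, start, end) for start, end in tweaked_blocks])
    conn.commit()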
Pros
I should be able to save > 90% of the events through the pattern field directly, and handle any remaining corner cases through the "brute-force" ScheduleItems table. Since some of the handled cases include events that can go on for months (which would otherwise result in a very large number of time-blocks), having the whole schedule in a single row is rather attractive.
Cons
This is a fairly complex solution; any other system requiring this data would need to be able to handle parsing the schedulePattern as well as knowing when to fetch ScheduleItems.
The "schedulePattern", in a mature way, uses Cron format to store the users' schedule is a good idea for running task.
This format is simple but sophisticated. In relational database, there would be some performance benefit if you seperate every "entry" of Cron to a column of table with suitable indexes.
The effort, however, is the translation between this format and user interface. And the raw data to which user input should be recorded originally.
I would design two kinds of table, one for raw data inputted by user, another for running task for schedule.
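For example, splitting a standard five-field cron entry into separate, indexable columns might look like this (the table and column names are made up):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cron_schedules (
    task_id  INTEGER PRIMARY KEY,
    minute   TEXT, hour TEXT, day_of_month TEXT, month TEXT, day_of_week TEXT
);
CREATE INDEX idx_cron_dow ON cron_schedules(day_of_week);
""")

def store_cron(task_id, expression):
    # "20 9 * * 1-4" -> minute, hour, day of month, month, day of week
    minute, hour, dom, month, dow = expression.split()
    conn.execute("INSERT INTO cron_schedules VALUES (?, ?, ?, ?, ?, ?)",
                 (task_id, minute, hour, dom, month, dow))

store_cron(1, "20 9 * * 1-4")   # 09:20, Monday-Thursday; close to the example pattern in the question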
I'm designing a database for tracking stock transactions and real-time portfolio holdings. The system should be able to pull historical positions on a custom time-period basis (e.g. end-of-day holdings). My current design includes a transaction table and a real-time position table; when each transaction is booked into the database, it automatically triggers an update of the real-time position table. I'm planning to use PostgreSQL with the TimescaleDB extension for the transaction table.
I'm not sure how to implement the historical holdings, since the holdings at a given timestamp t can be derived by aggregating all transactions with timestamp <= t. Should I use a separate table to record the historical holdings, or simply do the aggregation on demand? I am also considering using binary files to store snapshots of the real-time positions at the end of each day to support historical position look-ups.
I have little experience with database design, so any advice is appreciated.
This question is lacking detail, so my answer is general.
One thing you could do is have two tables: one for the detailed data and one for the aggregation. Then you calculate one record for the latter from the former every day. Use partitioning for the detail table to be able to get rid of old data easily.
You can also use the same (partitioned) table for both, if the data structure allows it. Then you calculate a new aggregated record every day, drop the partition for that day and extend the partition boundary for the “aggregated” partition.
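Either way, the daily roll-up itself is just an aggregation over the detail rows. A minimal sketch (SQLite syntax for brevity; table and column names are assumptions): end-of-day holdings are the signed sum of transaction quantities up to that day.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    txn_time TEXT NOT NULL,           -- ISO-8601 timestamp
    symbol   TEXT NOT NULL,
    quantity REAL NOT NULL            -- positive = buy, negative = sell
);
CREATE TABLE eod_holdings (
    as_of_date TEXT NOT NULL,
    symbol     TEXT NOT NULL,
    position   REAL NOT NULL,
    PRIMARY KEY (as_of_date, symbol)
);
""")

def snapshot_end_of_day(as_of_date):
    # Roll every transaction up to and including that date into one row per symbol.
    conn.execute("""
        INSERT OR REPLACE INTO eod_holdings (as_of_date, symbol, position)
        SELECT ?, symbol, SUM(quantity)
        FROM transactions
        WHERE txn_time < date(?, '+1 day')
        GROUP BY symbol
    """, (as_of_date, as_of_date))
    conn.commit()

snapshot_end_of_day("2024-01-31")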
Consider carefully if you need the TimescaleDB extension. If it offers benefits you know you need, go for it. If not, do without it. It is always nice to have few dependencies. Just because you are storing time series data doesn't mean you need TimescaleDB.
We are moving a logging table from DB2 to Oracle; in it we log exceptions and warnings from many applications. With the move I want to accomplish two main things, among others: less space consumption (so we can add more rows, because the tablespaces are kinda small) without increasing server processing usage too much (CPU usage increases our bill).
In DB2 we basically have a table that holds text strings.
In Oracle I am taking the approach of normalizing the tables for columns with duplicated data (event_type, machine, assemblies, versno). I have a procedure that receives multiple parameters and queries the reference tables to get the IDs.
This is the Oracle table description.
One piece of feedback I have so far from a co-worker is that I will not necessarily reduce table space, since the indexes take space too and my solution might end up using more than storing all the strings does. We don't know if this is true; does anyone have more information on this?
Am I taking the right approach?
Will this approach help me accomplish my two main goals?
Additional feedback is welcome and appreciated.
The approach of using surrogate keys (numerical IDs) and dimension tables (containing the ID key and the description) is popular in both OLTP and data warehouse systems. IMO, using it for logging is a bit unusual.
The problem is that the logging component should not make many assumptions about the data to be logged; it is vital to be able to log the exceptional cases.
So if you can't map a host name to an ID (it was misspelled or simply not configured), it is not very helpful to log unknownhost, or to suppress the log entry altogether.
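One way around that is to make the ID lookup a get-or-create, so an unseen or misspelled value still gets logged instead of being rejected or mangled (a sketch with made-up names, shown in SQLite rather than Oracle for brevity):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE machines (machine_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")

def machine_id(name):
    # Unknown host names are inserted on the fly instead of failing the log write.
    conn.execute("INSERT OR IGNORE INTO machines (name) VALUES (?)", (name,))
    return conn.execute("SELECT machine_id FROM machines WHERE name = ?", (name,)).fetchone()[0]

# The logging procedure then stores machine_id(host) instead of the raw string.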
As for storage, you can indeed save a lot by storing IDs instead of long strings, but (depending on the data) you may get a similar effect using table compression.
I think the most important thing about logging is that it works all the time. It must be bullet-proof, because if the logger fails you lose the information you need to diagnose problems in your system. Even worse would be to fail business transactions because the logger abended.
There is a strong inverse relationship between the complexity of a logging implementation and its reliability. The more moving parts it has, the more things there are to go wrong. So, while normalization is a good thing for business data it introduces unwelcome risk. Also, look-ups increase the overhead of writing a log message and that is also undesirable.
" tablespaces are kinda small"
Increasing the chance of failure is not a good trade-off. Ask for more space.
How can I make such a scanner using an Access database table?
A lot of stock market websites let you scan for stocks that fulfill particular criteria/conditions based on different parameters, as shown in these snapshots:
http://finviz.com/screener.ashx?v=351&f=cap_smallover,ta_pattern_channelup2,ta_rsi_nob50,ta_sma20_pb&ft=4
http://i58.tinypic.com/517x5i.png
http://i62.tinypic.com/27ymp6u.png
I want to make something similar for my personal offline use. I already have the required data for all the fields in my Access database, in one single table.
What are the various options for creating such a scanner, either inside Access itself or on any other third-party platform? If it is not possible to design something similar inside Access, I am also open to the idea of using the Access table as a back end and a third-party app as a front end, if that can do the job efficiently.
I hope it will let me apply 10+ filters in one go, using drop-downs, without having to switch back into query design mode again and again to change the filter parameters.
My current method, a simple query in which I apply all these filters, takes a lot of time and effort because I have to change the filter values again and again, so it is extremely inefficient. I am therefore looking for an easier way to get the filtered results, one in which I can change the filter conditions quickly.
Any ideas in this regard are welcome.
Thanks a lot
You need to create a dynamic form and query. The best and easiest way is to create a form with all of your dropdowns on it. Then, in your query, in the Criteria row for each field, you set it equal to
Forms!MyForm!MyDropdown1
Obviously you have to change the reference name to the actual name of your form and combo, but you get the idea. This makes your query completely dynamic and you'll never have to go in it and edit the conditions (aka filter values).
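If you instead go the third-party front end route mentioned in the question, the same principle — pass the filter values as parameters rather than editing the query — might look roughly like this in Python with pyodbc (the driver string, file path, table and column names are all assumptions):

import pyodbc

conn = pyodbc.connect(
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\data\stocks.accdb;")

def scan(min_rsi=None, pattern=None, min_market_cap=None):
    # Build the WHERE clause only from the filters that were actually chosen.
    clauses, params = [], []
    for clause, value in (("RSI >= ?", min_rsi),
                          ("Pattern = ?", pattern),
                          ("MarketCap >= ?", min_market_cap)):
        if value is not None:
            clauses.append(clause)
            params.append(value)
    sql = "SELECT * FROM Stocks"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.cursor().execute(sql, params).fetchall()

rows = scan(min_rsi=50, min_market_cap=300_000_000)   # only two of the filters applied this time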
We are looking to create software that receives log files from a large number of devices. We expect around 20 million log rows a day (roughly 2 KB per log line).
I have developed a lot of software, but never with this large a quantity of input data. The data needs to be searchable, sortable, and groupable by source IP, destination IP, alert level, etc.
It should also combine similar log entries ("occurred 6 times", etc.).
Any ideas and suggestions on what type of design, database and general thinking around this would be much appreciated.
UPDATE:
I found this presentation, which seems to describe a similar scenario; any thoughts on it?
http://skillsmatter.com/podcast/cloud-grid/mongodb-humongous-data-at-server-density
I see a couple of things you may want to consider.
1) A message queue, to drop each log line onto and let another part of the system (a worker) take care of it when time permits (see the sketch below).
2) NoSQL - Redis, MongoDB, Cassandra.
I think your real problem will be querying the data, not storing it.
You will also probably need a scalable solution.
Some NoSQL databases are distributed, which you may well need.
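A bare-bones sketch of the queue idea with Redis, using a list as the queue (the key name is made up, and the final write is just a stand-in):

import json
import redis

r = redis.Redis()

def enqueue_log_line(entry):
    # The receiving endpoint only pushes; it never touches the database directly.
    r.lpush("log_queue", json.dumps(entry))

def worker():
    # A separate process drains the queue and writes to storage when it has time.
    while True:
        _, payload = r.brpop("log_queue")
        entry = json.loads(payload)
        print(entry)   # stand-in for the real write (MongoDB, Cassandra, batched SQL, ...)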
Check this out, it might be helpful
https://github.com/facebook/scribe
A web search on "Stackoverflow logging device data" yielded dozens of hits.
Here is one of them. The question asked may not be exactly the same as yours, but you should get dozens of interesting ideas from the responses.
I'd base many decisions on how users most often will be selecting subsets of data -- by device? by date? by sourceIP? You want to keep indexes to a minimum and use only those you need to get the job done.
For low-cardinality columns where indexing overhead is high yet the value of an index is low (e.g. alert level), I'd recommend a trigger that creates rows in another table identifying the rows corresponding to emergency situations (e.g. where alert level > x). That way alert level itself would not have to be indexed, yet you could still rapidly find all high-alert-level rows.
Since users are updating the logs, you could move handled/managed rows older than 'x' days out of the active log and into an archive log, which would improve performance for ad-hoc queries.
For identifying recurrent problems (same problem on same device, or same problem on same ip address, same problem on all devices made by the same manufacturer, or from the same manufacturing run, for example) you could identify the subset of columns that define the particular kind of problem and then create (in a trigger) a hash of the values in those columns. Thus, all problems of the same kind would have the same hash value. You could have multiple columns like this -- it would depend on your definition of "similar problem" and how many different problem-kinds you wanted to track, and on the subset of columns you'd need to enlist to define each kind of problem. If you index the hash-value column, your users would be able to very quickly answer the question, "Are we seeing this kind of problem frequently?" They'd look at the current row, grab its hash-value, and then search the database for other rows with that hash value.
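For instance, the signature could be computed like this, either in application code or inside the trigger (the choice of columns is whatever defines "the same kind of problem" for you; the values here are made up):

import hashlib

def problem_signature(device_id, source_ip, error_code):
    # Rows describing the same kind of problem hash to the same value.
    key = "|".join([device_id, source_ip, error_code])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

# Stored in an indexed column, "are we seeing this kind of problem frequently?"
# becomes a simple equality search on the current row's hash value.
sig = problem_signature("dev-42", "10.0.0.5", "E1001")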
I know all details about how entity groups work in GAE's storage, but yesterday (at the App Engine meetup in Palo Alto), as a presenter was explaining his use of entity groups, it struck me that I've never really made use of them in my own GAE apps, and I don't recall seeing them used in open-source GAE apps I've used.
So, I suspect I've just been overlooking (not noticing or remembering) such examples because I'm simply not used to them enough to immediately connect "use of entity group" to "kind of application problems being solved" -- and I think I should remedy that by studying such sources with this goal in mind, focusing on what problem the EG use is solving (i.e., why the app works with it, but wouldn't work or wouldn't work well without it).
Can anybody suggest good URLs to such code? (Essays would also be welcome, if they focus on application-level problem solving, but not if, like most I've seen, they just focus on the details of how EGs work!-).
The main use of entity groups is to provide the means to update more than one entity in a transaction.
If you haven't had to use them, count your blessings. Either you have been designing your data models such that no two entities ever need to be updated at the same time in order to remain consistent, or else you do need them but you've gotten lucky :)
Imagine that I have an Invoice entity type, and a LineItem entity type. One Invoice can have multiple LineItems associated with it. My Invoice entity has a field called LastUpdated. Any time a LineItem gets added to my Invoice, I want to store the current date in the LastUpdated field.
My update function might look like this (pseudocode)
invoice.lastUpdated = datetime.now()
lineitem = LineItem()   # a new line item for this invoice
invoice.put()
lineitem.put()
What happens if the invoice put() succeeds and the lineitem put() fails? My invoice date will show that something was updated, but the actual update (the new LineItem) wouldn't be there. The solution is to put both puts() inside a transaction.
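A hedged sketch of that with the old db API this answer refers to (the model definitions are made up); note that the LineItem is created with the Invoice as its parent, so both entities end up in one entity group:

from datetime import datetime
from google.appengine.ext import db

class Invoice(db.Model):
    lastUpdated = db.DateTimeProperty()

class LineItem(db.Model):
    description = db.StringProperty()

def add_line_item(invoice_key, description):
    invoice = db.get(invoice_key)
    invoice.lastUpdated = datetime.now()
    # parent=invoice places the LineItem in the Invoice's entity group,
    # which is what allows both writes to share one transaction.
    lineitem = LineItem(parent=invoice, description=description)
    invoice.put()
    lineitem.put()

def add_line_item_transactionally(invoice_key, description):
    # Either both puts happen or neither does.
    return db.run_in_transaction(add_line_item, invoice_key, description)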
An alternative solution would be to use a query to find the date of the last inserted LineItem, instead of storing this data in the lastUpdated field. But that would involve fetching both the Invoice and all the LineItems every time you wanted to know the last time a lineitem was added, costing you precious datastore quota.
EDIT TO RESPOND TO POSTER's COMMENTS
Ah. I think I understand your confusion. The above paragraphs establish why transactions are important. But you say you still don't care about entity groups, because you don't see how they relate to transactions. Well, if you are using db.run_in_transaction, then you are using entity groups, perhaps without realizing it! Every transaction involves one and only one entity group, and any given transaction can only affect entities belonging to the same group. See here:
"All datastore operations in a
transaction must operate on entities
in the same entity group".
What kind of stuff are you doing in your transactions? There are plenty of good reasons to use transactions with just one entity, which by default is in its own entity group. But sometimes you need to keep two or more entities in sync, as in my example above. If the Invoice and the LineItem entities are not in the same entity group, then you cannot wrap the modifications to them in a db.run_in_transaction call. So any time you want to operate on two or more entities transactionally, you first need to make sure they are in the same group. I hope that makes it clearer why they are useful.
I've used them here. I'm setting my customer object as the parent of the map markers. This creates an entity group for each customer and gives me two advantages:
Getting the markers of a customer is much faster, because they're stored physically with the customer object (on the same server, probably on the same disk).
I can change the markers for a customer in a transaction. I suspect the reason transactions require all objects that they operate on to be in the same group is because they're stored in the same physical location, which makes it easier to implement a lock on the data.
I've used them here in this simple wiki system. The latest version of a page is always a root entity, and past versions have the latest version as their ancestor. The copy operation is done in a transaction to keep the versions consistent and avoid losing a version in the case of concurrent edits.