Developers using C#, VBA, and Python need to send data into a SQL Server database

I have a question and I'm quite confused. I am a recent graduate who started working at a company with lots of old data in text files. I organised all of this data using Python and generated an Excel file, so that part of the situation is fine.
The problem is with new incoming sensor data from different developers.
Most of them use VB scripting; they test the data and save it in text files using delimiters (`,` and `|`).
Others develop projects in C#; their text data uses different delimiters (`:` and `,`).
My task is to set up the database. How can I take all these developers' data and load it into the database automatically? I have already created column names and a table structure for this text data.
Do I need to develop an individual interface for each developer, or is there a common interface I should focus on? I'm still not sure where to start.
I hope my situation is clear.
Best regards
Vinod

Since you are trying to take data from different sources, and you are using SQL Server, you can look into SQL Server Integration Services (SSIS). You'd basically create SSIS packages, one for each incoming data source, and transfer the data into one homogeneous database.

These types of implementations and design patterns really depend on your company and the development environment you are working in.
As for choosing the right database, this also depends on the type of data, velocity, volume and common questions your company would like to answer using that data.
Some very general guidelines:
Schema-based (relational) databases differ from "NoSQL" databases.
Searching within text is different from searching simpler primitive types, so you have to know what you will be doing.
If you are able to, converting both the VB scripting and C# text files to a popular data-structure format (e.g. JSON or CSV; a comma-delimited text file is already CSV) might help you store the data in almost any database.
There is no right or wrong in implementing a specific interface for each data supplier; it depends on the alternatives at stake and your coding style.
Consult with others and read case studies!
Hope that's helpful.
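If you do standardize on CSV, the normalization step can be quite small. Here's a minimal Python sketch of the idea; the source names and delimiters are assumptions for illustration, and real sensor files would need error handling on top of this:

```python
import csv
import io

# Map each data source to the delimiter its files use.
# These source names and delimiters are made up for illustration.
SOURCE_DELIMITERS = {
    "vb_team": "|",
    "csharp_team": ":",
}

def normalize_to_csv(raw_text, delimiter):
    """Rewrite delimited text as standard comma-separated CSV."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out)
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

sample = "sensor1|23.5|2024-01-01\nsensor2|19.2|2024-01-01"
print(normalize_to_csv(sample, SOURCE_DELIMITERS["vb_team"]))
```

Once every feed arrives as CSV with your agreed column structure, a single bulk-load path (SSIS, BULK INSERT, or a small script) can serve all the developers.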

You should consider using Excel's Power Query tool. It can ingest almost any file type (including text files) from multiple locations, and lets you cleanse and transform the data easily (you can split on whatever delimiters you please). Storage is essentially unlimited, as opposed to a regular Excel worksheet, which maxes out at 1,048,576 rows.

I don't know your company's situation, but PDI (the Pentaho Data Integration tool) is good for data transformations. The Community Edition is free!
However, since you have already generated an Excel file, Power Query might be a better tool for this project if your data is relatively small.

Related

Loading different types of files into a database

At my work, we receive thousands of emails per day with attached files (xlsx, csv, xml, html, pdf, etc). Those emails get processed by a program and the files get downloaded and filtered into different folders depending on the sender.
The data in those files is then loaded into our MS SQL Server db by a proprietary software for which we are charged a hefty sum of money each year.
Now, I'm sure this is a super common process for enterprises, so there are probably some open source tools that we could use to replace this software.
How do most people do this? With individual scripts? What would be the correct way to do it?
Thank you very much!
EDIT: all the email attachments get converted to xlsx before being moved to their respective folders.
The 'correct' way would be to not use Excel as the interchange format - its fuzzy notion of data types (e.g. numbers vs. text) creates many issues.
Still, Excel feeds are a fact of life, so it's crucial to have a comprehensive and reliable way of dealing with multiple feeds that sometimes fail. I like this approach; the example uses my company's commercial cross-platform .NET ETL library, but I've also used the same approach with several other ETL tools previously.
Cheers,
Kristian

Local File-type Datastore for Winforms

I hope my title is not misleading, but what I'm looking for is a file-type datastore for a Winforms app (.NET 3.5) that will allow me to:
store/retrieve strings and decimal numbers (only);
open and save it to a file with a custom extension (like: *.abc); and
have some relational data mapping.
Basically, the application should allow me to create a new file with my custom extension (I am au fait with handling file-associations), and then save into this file data based on functionality determined by the application itself. Similar to when you would create a Word document, except that this file should also be able to store relational data.
An elementary example:
the "file" represents a person's car
every car will have standard data that will apply to it - Make, Model, Year, Color, etc.
However, each car may have self-determined categories associated with it such as: Mechanic History, In-Car Entertainment History, Modifications History, etc.
From the data stored in this "file", the person can then generate a comprehensive report on the car.
I realise that the above example could easily warrant using an embedded DB like SQLCE (SQL server compact edition), but this time around I want to be able to create datastores that can move with people, assuming that this application resides on more than one computer (eg. Work and Home).
I'm unsure if XML is the option to go with here, as the relational data-mapping may pose a problem. In theory I envision having a pre-defined data model and creating a new SQLCE-type database (the "file") to which the data can be persisted and from which it can be retrieved. File size is not an issue, as long as it's portable, i.e. I can copy it to a flash disk from the office and continue working with it at home.
If my question is still unclear, please let me know and I'll try my best to clarify! I would really appreciate any help in this regard.
Thanks a million!
EDIT: my apologies for the long-winded essay!
Here are a couple of ways you could do this.
XML, DataSets and Linq
You can use an XML file, load it into a DataSet, and then use LINQ to query it.
SQLite
SQLite is a file-based database. You can ship the SQLite DLL with your application; it needs no install. You can name the files whatever you like, and then access the data using ADO.NET. Here is an ADO.NET provider and SQLite implementation rolled into one DLL: http://sqlite.phxsoftware.com/
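To illustrate the file-as-database idea in a language-agnostic way (shown here with Python's built-in sqlite3 module rather than the .NET provider; the schema, file name, and data are made up to match the car example):

```python
import os
import sqlite3
import tempfile

# SQLite doesn't care what the database file is called, so a custom
# ".abc" extension works fine. The schema below is illustrative.
path = os.path.join(tempfile.gettempdir(), "my_car.abc")
conn = sqlite3.connect(path)
conn.executescript("""
    DROP TABLE IF EXISTS history;
    DROP TABLE IF EXISTS car;
    CREATE TABLE car (
        id INTEGER PRIMARY KEY, make TEXT, model TEXT, year INTEGER, color TEXT);
    CREATE TABLE history (
        id INTEGER PRIMARY KEY,
        car_id INTEGER REFERENCES car(id),   -- simple relational mapping
        category TEXT,                       -- e.g. 'Mechanic', 'Modifications'
        entry TEXT);
""")
conn.execute("INSERT INTO car (make, model, year, color) VALUES (?, ?, ?, ?)",
             ("Toyota", "Corolla", 2008, "Blue"))
conn.execute("INSERT INTO history (car_id, category, entry) "
             "VALUES (1, 'Mechanic', 'Oil change')")
conn.commit()
print(conn.execute("SELECT make, model FROM car").fetchone())
```

The same single-file-plus-relations pattern carries over directly to the ADO.NET provider: the whole datastore is one portable file you can copy to a flash disk.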

Is there a way to open a Zope DB in .Net?

I need to upgrade an old system based on Zope, I need to be able to export the data to something like SQL Server...does anyone know of a way I can open the Zope DB in .NET or directly export it to SQL Server?
Thanks,
Kieron
I am a Plone web developer, and Jason Coombs is correct. The ZODB is an object database and contains Python objects. These objects can be Python code, data, metadata, etc., and they are stored in a hierarchy. This is very different from the world of SQL tables and stored procedures. (The growing NoSQL movement shows that Zope is not the only one doing this.) Additionally, since these are complex Python objects, you really want to be working on the ZODB with the version of Python it was created with, or make sure that you can do a proper migration. I don't think that you will be able to do this with IronPython.
Without knowing what you are trying to get out of the ZODB, it is hard to give specific advice. As Jason suggested, trying WebDAV/FTP access to the ZODB might be all that you need. This allows you to pull out the basic content of pages or image files, but you may lose much of the more complex data (for example, an event page may not have all of its datetime data included) and you will lose much of the metadata.
Here is how someone migrated from Plone to WordPress:
http://www.len.ro/2008/10/plone-to-wordpress-migration/
There are a number of articles on migrating from one Plone version to another. Some of this information may be useful to you. Stack Overflow is not allowing me to post more links, but search for:
"when the plone migration fails doing content migration only"
"Plone Product ContentMigration"
The first important thing to note is that the Zope Object Database (ZODB) stores Python objects in its hierarchy. Therefore, getting "data" out of the ZODB generally does not make sense outside of the Python language. So to some extent, it really depends on the type of data you want to get out.
If the data you're seeking is files (such as HTML, documents, etc), you might be able to stand up a Zope server and turn on something like WebDAV or FTP and extract the files that way.
The way you've described it, however, I suspect the data you seek is more fine-grained data elements (like numbers or accounts or some such thing). In that case, you'll almost certainly need Python of some kind to extract the data and transform it into some format suitable to import into SQL Server. You might be able to stay inside the .NET world by using IronPython, but to be honest, I would avoid that unless you can find evidence that IronPython works with the ZODB library.
Instead, I suggest making a copy of your Zope installation and zope instance (so you don't break the running system), and then use the version of Python used by Zope (often installed together) to mount the database, and manipulate it into a suitable format. You might even use something like PyODBC to connect to the SQL Server database to inject the data -- or you may punt, export to some file format, and use tools you're more familiar with to import the data.
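The final injection step is straightforward once the data has been flattened into plain values. A rough sketch of that last leg (the extracted rows are hypothetical, and sqlite3 stands in here for a PyODBC connection to SQL Server; with PyODBC, only the connect call and parameter placeholders would differ):

```python
import sqlite3

# Hypothetical rows pulled out of the ZODB and flattened to plain tuples.
extracted = [
    ("account-1", "Alice", 120.50),
    ("account-2", "Bob", 75.00),
]

# Stand-in for a pyodbc connection to SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (zope_id TEXT, name TEXT, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", extracted)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0])
```

Alternatively, as suggested above, you can dump the flattened rows to CSV and use the import tooling you already know on the SQL Server side.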
It's been a while since I've interacted with a ZODB, but I remember this article was instrumental in helping me interact with the ZODB and understand its structure.
Good luck!

Data Correlation in large Databases

We're trying to identify the locations of certain information stored across our enterprise in order to bring it into compliance with our data policies. On the file end, we're using Nessus to search through differing files, but I'm wondering about on the database end.
Using Nessus would seem largely pointless because it would output the raw data and wouldn't tell us what table or row it was in, or give us much useful information, especially considering these databases are quite large (hundreds of gigabytes).
Also worth noting, this system needs to be able to do pattern-based matching (such as using regular expressions). Not just a "dumb search" engine.
I've investigated the use of Data Mining and Data Warehousing in order to find this data but it seems like they're more for analysis of data than actually just finding data.
Is there a better method of searching through large amounts of data in a database to try and find this information? We're using both Oracle 11g and SQL Server 2008 and need to perform the searches on both, so I'd like to stay away from server-specific paradigms (although if I have to rewrite some code to translate from T-SQL to PL/SQL, and vice versa, I don't mind).
On SQL Server, for searching through large amounts of text you can look into Full-Text Search.
Read more here: http://msdn.microsoft.com/en-us/library/ms142559.aspx
But if I am reading right, you want to spider your database in a similar fashion to how a web search engine spiders web sites and web pages.
You could use a set of full text queries that bring back the results spanning multiple tables.
Oracle supports regular expressions with the REGEXP_LIKE() function, and it ought to be fairly straightforward to automate the generation of the code you need based on system metadata (to find all text columns over a certain length, for example, and include them in a predicate against that table to find the rows and values that match your regexp). It doesn't sound too challenging, really. In theory you could add check constraints to columns to prevent the insertion of values that match a regexp, but that might be overkill.
Oracle Text is suited to searching for words/phrases in larg(ish) bits of text (e.g. PDF, HTML, TXT or DOC files) held in the database. There is some limited fuzzy searching, but not regular expressions per se.
You don't really go into what sort of data you are looking for or what you have in your databases. Nessus indicates you are looking for security issues, but the title of "Data Correlation" suggests something completely different.
Really the data structures should provide the information about what to look for and where. That's what databases are about - structuring data for accessibility. A database backing a CMS, forum software or similar would be a different kettle of fish.

Database recommendation

I'm writing a CAD (Computer-Aided Design) application. I'll need to ship a library of 3d objects with this product. These are simple objects made up of nothing more than 3d coordinates and there are going to be no more than about 300 of them.
I'm considering using a relational database for this purpose. But given my simple needs, I don't want anything complicated. So far I'm leaning towards SQLite: it's small, runs within the client process, and is claimed to be fast. Besides, I'm a poor guy and it's free.
But before I commit myself to SQLite, I just wish to ask your opinion whether it is a good choice given my requirements. Also is there any equivalent alternative that I should try as well before making a decision?
Edit:
I failed to mention earlier that the above-said CAD objects that I'll ship are not going to be immutable. I expect the user to edit them (change dimensions, colors etc.) and save back to the library. I also expect users to add their own newly-created objects. Kindly consider this in your answers.
(Thanks for the answers so far.)
The real thing to consider is what your program does with the data. Relational databases are designed to handle complex relationships between sets of data. However, they're not designed to perform complex calculations.
Also, the amount of data and relative simplicity of it suggests to me that you could simply use a flat file to store the coordinates and read them into memory when needed. This way you can design your data structures to more closely reflect how you're going to be using this data, rather than how you're going to store it.
Many languages provide a mechanism to write data structures to a file and read them back in again called serialization. Python's pickle is one such library, and I'm sure you can find one for whatever language you use. Basically, just design your classes or data structures as dictated by how they're used by your program and use one of these serialization libraries to populate the instances of that class or data structure.
Edit: the requirement that the structures be mutable doesn't really change my answer - I still think that serialization and deserialization are the best solution to this problem. The fact that users need to be able to modify and save the structures necessitates a bit of planning to ensure that the files are updated completely and correctly, but ultimately I think you'll end up spending less time and effort with this approach than trying to marshal SQLite or another embedded database into doing this job for you.
The only case in which a database would be better is if you have a system where multiple users are interacting with and updating a central data repository, and for a case like that you'd be looking at a database server like MySQL, PostgreSQL, or SQL Server for both speed and concurrency.
You also commented that you're going to be using C# as your language. .NET has support for serialization built in so you should be good to go.
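As an illustration of the serialize-to-a-flat-file idea (shown here with Python's pickle since the answer mentions it; the class, attributes, and file name are made up, and in C# you'd use the built-in .NET serialization instead):

```python
import os
import pickle
import tempfile

class Model3D:
    """Toy stand-in for a CAD object: a name plus 3D coordinates."""
    def __init__(self, name, vertices):
        self.name = name
        self.vertices = vertices   # list of (x, y, z) tuples
        self.color = "grey"        # a user-editable attribute

triangle = Model3D("triangle", [(0, 0, 0), (1, 0, 0), (0, 1, 0)])
triangle.color = "red"  # user edits survive the save/load round trip

# Save the whole library to one flat file and read it back.
path = os.path.join(tempfile.gettempdir(), "library.dat")
with open(path, "wb") as f:
    pickle.dump([triangle], f)
with open(path, "rb") as f:
    library = pickle.load(f)
print(library[0].name, library[0].color)
```

With only ~300 objects, loading the whole library into memory like this is cheap, and the single file stays as portable as the questioner wants.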
I suggest you consider using H2; it's really lightweight and fast.
When you say you'll have a library of 300 3D objects, I'll assume you mean objects for your code, not models that users will create.
I've read that object databases are well suited to help with CAD problems, because they're perfect for chasing down long reference chains that are characteristic of complex models. Perhaps something like db4o would be useful in your context.
How many objects are you shipping? Can you define each of these objects and their coordinates in an XML file, basically using a distinct XML file per object? You could place these XML files in a directory. This can be a simple structure.
I would not use a SQL database. You can easily describe every 3D object with an XML file. Put these files in a directory and pack (zip) them all together. If you need easy access to the objects' metadata, you can generate an index file (containing only names or descriptions) so that not every object has to be parsed and loaded into memory (handy if you have something like a library manager).
There are quick and easy SAX parsers available, and you can easily write an XML writer (or find free code you can use for this).
Many similar applications use XML today. It's easy to parse and write, human-readable, and needs little space when zipped.
I have used SQLite; it's easy to use and easy to integrate with your own objects. But I would prefer a SQL database like SQLite for applications where you need good searching tools over a huge amount of data records.
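The XML-per-object-plus-index layout can be sketched quickly with Python's standard library (element names, file names, and the index format here are all made up for illustration; a real design would pick its own schema):

```python
import io
import xml.etree.ElementTree as ET
import zipfile

def object_to_xml(name, vertices):
    """Serialize one 3D object as a small XML document."""
    root = ET.Element("object", name=name)
    for x, y, z in vertices:
        ET.SubElement(root, "vertex", x=str(x), y=str(y), z=str(z))
    return ET.tostring(root, encoding="unicode")

# One XML document per object, plus a name-only index, zipped together.
objects = {"cube": [(0, 0, 0), (1, 1, 1)], "plane": [(0, 0, 0)]}
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, verts in objects.items():
        zf.writestr(f"{name}.xml", object_to_xml(name, verts))
    zf.writestr("index.txt", "\n".join(objects))

# A library manager can read just the index without parsing any object.
with zipfile.ZipFile(buf) as zf:
    print(zf.read("index.txt").decode())
```

The zip archive gives you the single portable file, while the index lets the application list the library contents cheaply.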
For the specific requirement, i.e. providing a library of objects shipped with the application, a database system is probably not the right answer.
The first thing that springs to mind is that you probably want the file to be updatable, i.e. you need to be able to drop an updated file into the application without changing the rest of the application.
Second thing is that the data you're shipping is immutable - for this purpose therefore you don't need the capabilities of a relational db, just to be able to access a particular model with adequate efficiency.
For simplicity (sort of) an XML file would do nicely as you've got good structure. Using that as a basis you can then choose to compress it, encrypt it, embed it as a resource in an assembly (if one were playing in .NET) etc, etc.
Obviously SQLite stores its data in a single file per database, so if you have other reasons to need the capabilities of a db in your storage system, then yes; but I'd want to think about the utility of the db to the app as a whole first.
SQL Server CE is free, has a small footprint (no service running), and is SQL Server compatible.
