I have a contact manager program and I would like to offer the feature to import csv files. The problem is that different data sources order the fields in different ways. I thought of programming an interface for the user to tell it the field order and how to handle exceptions.
Here is an example line in one of many possible field orders:
"ID#","Name","Rank","Address1","Address2","City","State","Country","Zip","Phone#","Email","Join Date","Sponsor ID","Sponsor Name"
"Z1234","Call, Anson","STU","1234 E. 6578 S.","","Somecity","TX","United States","012345","000-000-0000","someemail#gmail.com","5/24/2010","z12343","Quantum Independence"
Notice that in one data field "Name" there is a comma to separate last name and first name and in another there is not.
My plan is to have a line for each field (ie ID, Name, City etc.) and a statement "import to" and list box with options like: Don't Import, Business>Join Date, First Name, Zip
and the program recognizes those as properties of an object...
I'd also like the user to be able to record preset field orders so they can re-use them for csv files from the same download source.
Then I also need it to check if a record all ready exists (is there a record for Anson Call all ready?) and allow the user to tell it what to do if there is a record (ie mailing address may have changes, so if that field is filled overwrite it, or this mailing address is invalid, leave the current data untouched for this person, overwrite the rest).
While I'm capable of coding this...i'm not very excited about it and I'm wondering if there's a tool or set of tools out there to all ready perform most of this functionality...
I hope this makes sense...
Is there a header row?
usually in CSV files, the first line is the header.
If so you could use the header line to determine the order, just have a list of column names, and only prompt the user if a column name does not match.(this could then be auto added into the predefined list).
EDIT:
even if a header does not exist, its simple enough to add one. The file can be manually edited. Alternatively in your program let the user define it (from your predefined list)
I can't find any tools all ready out there and no one has replied otherwise, so for the sake of leaving a question answered until otherwise notified the answer is: no.
Related
So, i have a school project to build the uberEats management system in C.
I have a csv file that contains the cities where the system is available and their respective code. Eg: 1, New York.
And i need to read that file, so when the user inserts one code, it associates to the specific city. Can anyone help me how to do that?
you could read the file and make a matrix of 2xno_cities, and when the user puts a number, you just do
array[input-given-by-user][1]
since the first [] would be for the row, and the second one for the column, which is where you save the cities,
I have a requirement in which one source is a table and one source is a file. I need to join these both on a column. The problem is that I can do this for one table with one transformation but I need to do it for multiple set of files and tables to load into another set of specific files as target using the same transformation.
Breaking down my requirement more specifically :
Source Table Source File Target File
VOICE_INCR_REVENUE_PROFILE_0 VoiceRevenue0 ProfileVoice0
VOICE_INCR_REVENUE_PROFILE_1 VoiceRevenue1 ProfileVoice1
VOICE_INCR_REVENUE_PROFILE_2 VoiceRevenue2 ProfileVoice2
VOICE_INCR_REVENUE_PROFILE_3 VoiceRevenue3 ProfileVoice3
VOICE_INCR_REVENUE_PROFILE_4 VoiceRevenue4 ProfileVoice4
VOICE_INCR_REVENUE_PROFILE_5 VoiceRevenue5 ProfileVoice5
VOICE_INCR_REVENUE_PROFILE_6 VoiceRevenue6 ProfileVoice6
VOICE_INCR_REVENUE_PROFILE_7 VoiceRevenue7 ProfileVoice7
VOICE_INCR_REVENUE_PROFILE_8 VoiceRevenue8 ProfileVoice8
VOICE_INCR_REVENUE_PROFILE_9 VoiceRevenue9 ProfileVoice9
The table and file names are always corresponding i.e. VOICE_INCR_REVENUE_PROFILE_0 should always join with VoiceRevenue0 and the result should be stored in ProfileVoice0. There should be no mismatches in this case. I tried setting the variables with table names and file names, but it only takes on value at a time.
All table names and file names are constant. Is there any other way to get around this. Any help would be appreciated.
Try using "Copy rows to result" step. It will store all the incoming rows (in your case the table and file names) into a memory. And for every row, it will try to execute your transformation. In this way, you can read multiple filenames at one go.
Try reading this link. Its not the exact answer, but similar.
I have created a sample here. Please check if this is what is required.
In the first transformation, i read the tablenames and filenames and loaded it in the memory. After that i have used the get variable step to read all the files and table names to generate the output. [Note: I have not used table input as source anywhere, instead used TablesNames. You can replace the same with the table input data.]
Hope it helps :)
I'm doing an Excel loop through fifty or more Excel files. The loop goes through each Excel file, grabs all the data and inputs it into the database without error. This is the typical process of setting delay validation to true, and making sure that the expression for the Excel Connection is a string variable called EFile that is set to nothing (in the loop).
What is not working: trying to input the name of the Excel file into the database.
What's been tried (edit; SO changed my 2 to 1 - don't know why):
Add a derived column between the Excel file and database input, and add a column using the EFile expression (so under Expression in the Derived Column it would be #[User::EFile]). and add the empty. However, this inputs nothing a blank (nothing).
One suggestion was to add ANOTHER string variable and set its properties EvaluateAsExpression to True and set the Expression to the EFile variable (#[User::EFile]). The funny thing is that this does the same thing - inputs a blank into the database.
Numerous people on blogs claim they can do this, yet I haven't seen one actually address this (I have a blog and I will definitely be showing people how to do this when I get an answer because, so far, these others have fallen short). How do I grab an Excel file's name and input it in a database during a loop?
Added: Forgot to add, no scripts; the claim is that it can be done without them, so I want to see the solution without them.
Note: I already have the ability to import the data from the Excel files - that's easy (see my GitHub account, as I have two different projects for importing all sorts of txt, csv, xls, xlsx data). I am trying to also get the actual name of the file being imported also into the database. So, if there are fifty Excel files, along with the data in each file, the database will have the fifty file names alongside that data (so if each file has 1000 rows of data, each 1000 rows would also have the name of the file they came from next to them as an additional column). This point seems to cause a lot of confusion, as people assume I'm having trouble importing data in files - NOPE, see my GitHub; again that's easy. It's the FILENAME that needs to also be imported.
Test package: https://github.com/tmmtsmith/SSISLoopWithFileName
Solution: #jaimet pointed out that the Derived Column needed to be the #[User::CurrentFile] (see the test package). When I first ran the package, I still got a blank value in my database. But when we originally set up the connection, we do point it to an actual file (I call this "fooling the package"), then change the expression on the connecting later to the #[User::CurrentFile], which is blank. The Derived Column, using the variable #[User::CurrentFile], showed a string of 0. So, I removed the Derived Column, put the full file path and name in the variable, then added the variable to the Derived Column (which made it think the string was 91 characters long), then went back and set the variable to nothing (English teacher would hate the THENs about right now). When I ran the package, it inputted the full file path. Maybe, like the connection, it needs to initially think that a file exists in order for it to input the full amount of characters?
Appreciate all the help.
The issue is because of blank value in the variable #[User::FileNameInput] and this caused the SSIS package to assume that the value of this variable will always be of zero length in the Derived Column transformation.
Change the expression on the Derived column transformation from #[User::FileNameInput] to (DT_STR, 2000, 1252)#[User::FileNameInput].
Type casting the derived column to 2000 sets the column length to that maximum value. The value 1252 represents the code page. I assumed that you are using ANSI code page. I took the value 2000 from your table definition because the FilePath column had variable VARCHAR(2000). If the column data type had been NVARCHAR(2000), then the expression would be (DT_WSTR, 2000)#[User::FileNameInput]
Tim,
You're using the wrong variable in your Derived Column component. You are storing the filename in #[User::CurrentFile] but the variable that you're using in your Derived Column component is #[User::FileNameInput]
Change your Derived Column component to use #[User::CurrentFile] and you'll be good.
Hope that helps.
JT
If you are using a ForEach loop to process the files in a folder then I have have used the technique described in SSIS Junkie's blog to get the filename in to an SSIS variable: SSIS: Enumerating files in a Foreach loop
You can use the variable later in your flow to write it to the database.
TO all intents and purposes your method #1 should work. That's exactly how I would attempt to do it. I am baffled as to why it is not working. Could you perhaps share your package?
Tony, thanks very much for the link. Much appreciated.
Regards
Jamie
I'm tring to create an SSIS package to import some dataset files, however given that I seem to be hitting a brick
wall everytime I achieve a small part of the task I need to take a step back and perform a sanity check on what I'm
trying to achieve, and if you good people can advise whether SSIS is the way to go about this then I would
appreciate it.
These are my questions from this morning :-
debugging SSIS packages - debug.writeline
Changing an SSIS dts variables
What I'm trying to do is have a For..Each container enumerate over the files in a share on the SQL Server. For each
file it finds a script task runs to check various attributes of the filename, such as looking for a three letter
code, a date in CCYYMM, the name of the data contained therein, and optionally some comments. For example:-
ABC_201007_SalesData_[optional comment goes here].csv
I'm looking to parse the name using a regular expression and put the values of 'ABC', '201007', and
'SalesData' in variables.
I then want to move the file to an error folder if it doesn't meet certain criteria :-
Three character code
Six character date
Dataset name (e.g. SalesData, in this example)
CSV extension
I then want to lookup the Character code, the date (or part thereof), and the Dataset name against a lookup table
to mark off a 'checklist' of received files from each client.
Then, based on the entry in the checklist, I want to kick off another SSIS package.
So, for example I may have a table called 'Checklist' with these columns :-
Client code Dataset SSIS_Package
ABC SalesData NorthSalesData.dtsx
DEF SalesData SouthSalesData.dtsx
If anyone has a better way of achieving this I am interested in hearing about it.
Thanks in advance
That's an interesting scenario, and should be relatively easy to handle.
First, your choice of the Foreach Loop is a good one. You'll be using the Foreach File Enumerator. You can restrict the files you iterate over to be just CSVs so that you don't have to "filter" for those later.
The Foreach File Enumerator puts the filename (full path or just file name) into a variable - let's call that "FileName". There's (at least) two ways you can parse that - expressions or a Script Task. Depends which one you're more comfortable with. Either way, you'll need to create three variables to hold the "parts" of the filename - I'll call them "FileCode", "FileDate", and "FileDataset".
To do this with expressions, you need to set the EvaluateAsExpression property on FileCode, FileDate, and FileDataset to true. Then in the expressions, you need to use FINDSTRING and SUBSTRING to carve up FileName as you see fit. Expressions don't have Regex capability.
To do this in a Script Task, pass the FileName variable in as a ReadOnly variable, and the other three as ReadWrite. You can use the Regex capabilities of .Net, or just manually use IndexOf and Substring to get what you need.
Unfortunately, you have just missed the SQLLunch livemeeting on the ForEach loop: http://www.bidn.com/blogs/BradSchacht/ssis/812/sql-lunch-tomorrow
They are recording the session, however.
I need a list of common first names for people, like "Bill", "Gordon", "Jane", etc. Is there some free list of lots of known names, instead of me having to type them out? Something that I can easily parse with the programme to fill in an array for example?
I'm not worried about:
Knowing if a name is masculine or feminine (or both)
If the dataset has a whole pile of false positives
If there are names that aren't on it, obviously no dataset like this will be complete.
If there are 'duplicates', i.e. I don't care if the dataset lists "Bill" and "William" and "Billy" as different names. I'd rather have more data than less
I don't care about knowing the popularity the name
I know Wikipedia has a list of most popular given names, but that's all in a HTML page and manged up with horrible wiki syntax. Is there a better way to get some sample data like this without having to screen scrape wikipedia?
A CSV from the General Register Office of Scotland with all the forenames registered there in 2007.
Another large set of first names in CSV format and SQL format too (but they didn't say which DB dumped the SQL).
GitHub page with the top 1000 baby names from 1880 to 2009, already parsed into a CSV for you from the Social Security Administration.
CSV of baby names and meanings from a Princeton CS page.
That ought to be enough to get you started, I'd think.
You can easily consume the Wikipedia API (http://en.wikipedia.org/w/api.php) to retrieve the list of pages in specific category, looks like Category:Given names is something you want to start from.
http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmnamespace=0&cmlimit=500&cmtitle=Category:Given_names
The part of result from this URL looks like this:
<cm pageid="5797824" ns="0" title="Abdou" />
<cm pageid="5797863" ns="0" title="Abdu" />
<cm pageid="859035" ns="0" title="Abdul Aziz" />
<cm pageid="6504818" ns="0" title="Abdul Qadir" />
Look at the API and select appropriate format and query parameters, and check categories.
P.S.
BTW, The wiki-text from page you linked to contain names in a form that easy to extract using regexp... As well as titles of links in the rendered HTML page have “(name)” attached to the name itself.
Social Security Administration - Beyond the Top 1000 Names Data Files
The above is a comprehensive list of first names in use in the US. The zip files contain national and state-level data by year of birth in CSV format. It includes the number of occurrences (minimum 5) and gender. For example, the national file for 2010 includes 33,838 baby names.