Twitter name length in DB - sql-server

I'm adding a field to a member table for twitter names for members on a site. From what I can work out the maximum twitter name length is 20 so it seems obvious that I should set the field size to varchar(20) (SQL Server).
Is this a good idea?
What if Twitter starts allowing multi-byte characters in the user names? Should I make this field nvarchar?
What if Twitter decides to increase the size of a username? Should I make it 50 instead and then warn a user if they enter a name longer than 20?
I'm trying to code defensively so that I can reduce the chances of modifying the code around this input field and the DB schema changes that might be needed.

while looking for the same info i found the following in a sort of weird place in the twitter help section (why not in the API docs? who knows?):
"Your user name can contain up to 15 characters. Why no more? Because we append your user name to your 140 characters on outgoing SMS updates and IM messages. If your name is longer than 15 characters, your message would be too long to send in a single text message."
http://help.twitter.com/entries/14609-how-to-change-your-username
so perhaps one could even get away with varchar(16)

While new accounts has a limit of 15 characters in the username and 20 characters in the name, for old accounts this limit seems to be undefined. The documentation here states:
Earlybirds: Early users of Twitter may have a username or real name longer than user names we currently allow. This is ok until you need to save changes to your account settings. No changes will save unless your user/real name is the appropriate length; this means you have to change your real name/username to meet our most modern regulations.
So you are probably better of having a long field and save yourself some time when you hit the border cases.

Nowadays, space is usually not a concern, so I'd use a mostly generic approach: use nvarchar(200).
When designing DB schemas you must think 2 steps ahead, even more than when programming. Or get yourself a good schema update strategy, then you'll be fine also with varchar(20).

Personally I wouldn't worry. Use something like 200 (or a nice round number like 256) and you won't have this problem. The limit then is on their API, so you might be best to do some verification that it is a real username anyway. That verification implicitly includes the length checking.

Twitter allows for 140 characters to be typed in as the message payload for transmission, and includes "[username]:" at the beginning of the SMS message. With an upper limit of 140 characters for the message combined with the messaging system being based on SMS, I think they would have to decrease the allowable message size to increase the username. I think it is a pretty safe bet that 20 characters would be the max username length. I'd use nvarchar just in case someone uses 16-bit characters, and maybe pad it a little. nvarchar(24) should work; I wouldn't go any higher than nvarchar(32).
If you're going to develop an app for their service, you should probably watch the messages on Twitter's API Announcements mailing list.

[opinion only]
Twitter works on SMS and the limit there is something like 256 characters, so the name has to be small to avoid hitting into the message.
nvarchar would be a good idea for all twitter text
If the real ID of a Twitterer is a cell-phone then the longest phone number is your max - 20 should easily cover it!
Defensive programming is always good :) !
[/opinion only]

There's only so much you can code defensively, I'd suggest looking at the twitter API documentation and following anything specified there. That said, from a cursory look through nowhere seems to specify the length of the username, annoyingly :/

One thing to keep in mind here is that a field using nvarchar needs twice as much space, since it needs 2 bytes to store each potential unicode character. So, a twitter status would need a size of 280 using nvarchar, PLUS some more for possible retweets, as those aren't inlcuded in the 140 char limit. I discovered this just today in fact!
For example:
RT #chatrbyte: here's some great tweet
that I'm retweeting.
The RT #chatrbyte: is not included in the 140 character limit.
So, assuming that a Twitter username has a 20 character limit, and wanting to also capture a ReTweet, a field to hold a full tweet would need to be a nvarchar of size 280 + 40 (for the username) + 8 (for the initial RT # before a retweet) +4 (for the :+space after a Retweet username) = 330.
I would say go for nvarchar(350) to give yourself a little room. That's what I am trying right now. If I'm wrong I'll update here.

I'm guessing you are managing the data entry on the Twitter name field in your application somewhere other than just in the database. If you open the field to 200 characters, you only have to change the code in one place or if you allow users to enter Twitters names with more than 20 characters, you don't have to worry about a change at all.

Related

SQL Server Problem with carriage splitting a string

DECLARE #Description2 VARCHAR(MAX);
SELECT #Description2 = '1. Each week there will be a philosophical question to address. For example, you will address questions such as do we have freewill or are we determined?, what does quantum physics tell us about the nature of reality?, and what are the philosophical implications of Darwinian evolution? Utilizing the readings for the week prepare a written essay response. There will be a minimum of ten mini-essays for the term and each must be a minimum of 200 words.
2. Develop an argument on the topic of ontology, focusing specifically on the question "Are we just the brain?" Argue either the materialist position (we are just the brain) or the non-materialist position (we are not just the brain), drawing from the primary writings of the philosophers. Be sure to explain both positions in your essay and then make the case for the position you are supporting. This argumentative essay needs to be at least 750 words in length. We will then conduct an in class debate and you will need to argue your point in a debate setting.
3. Develop an argument in the area of ethics, arguing for or against animal rights. Make sure to utilize primary writings in the construction of your argument. This argumentative essay needs to be at least 750 words in length. Students will present their position to the class in a ten minute oral presentation.
4. Having read the writings of Epictetus and Sartre compare and contrast Stoicism and Existentialism. Write a 750 word essay highlighting the key differences and similarities.
5. Analyze the primary readings of Nietzsche in journal form. Choose 10 separate passages to analyze and include the following: a) a summary of the passage; b) an interpretation or analysis of the argument; c) a comparison and/or contrast to the ideas of another philosopher or philosophy; d) personal insight into the writing by applying the ideas to you or to the world at large (its meaning on a deeper, more personal level). This journal will be at least 1000 words in length. '
set #Description2 = replace(replace(replace(replace(replace(replace(replace(#Description2,'<p>',''),'</p>',''),'<br />',''),' ',''),'<br />.<br />',''),'<div>',''),'</div>','')
set #Description2 = concat('???',REPLACE(#Description2,CHAR(13)+char(10),'???'))
select ltrim(s.Item)
from dbo.DelimitedSplit8K_LEAD(#Description2, '???') s
where ltrim(s.Item) <> ''
So what do you actually want to happen? And what are you seeing that is different from what you want?
In SSMS, if you go to tools > options > query results > SQL Server > results to grid, there is a checkbox labelled "retain CR/LF on copy or save" which determines how varchar data will be treated when you click on the output grid results and copy the data.
If you check the checkbox, the carriage return/linefeed will be retained. Ie, if you do:
select 'a
b'
Run that query, copy the result from the grid, and paste it into, say, notepad, you will get 2 lines of text in notepad.
On the other hand, if you don't check the checkbox, you will only get a single line in notepad.
Be aware that if you change the setting of this checkbox, I believe you will need to open a new query window to see the new behaviour.
Note that this only controls the behaviour of SSMS. It has nothing to do with how your data is "really" stored, for example, if you insert it into a table. If you have an application reading data from a table, the way it formats the output is up to the application.

SQL Server validating postcodes

I have a table containing postcodes but there is no validation built in to the entry form so there is no consistency in the way they are stored in the database, sample below:
ID Postcode
001742 B5
001745
001746
001748 DY3
001750
001751
001768 B276LL
001774 B339HY
001776 B339QY
001780 WR51DD
I want to use these postcode to map the distance from a central point but before I can do that I need to put them into a valid format and filter out any blanks or incomplete postcodes.
I had considered using
left(postcode,3) + ' ' + right(postcode,3)
To correct the formatting but this wouldn't work for postcodes like 'M6 8HD'
My aim is to get the list of postcodes in a valid format but I don't know how to account for different lengths of postcode. Is this there a way to do this in SQL Server?
As discussed in the comments, sometimes looking at a problem the other way around presents a far simpler solution.
You have a list of arbitrary input provided by users, which frequently doesn't contain the correct spacing. You also have a list of valid postcodes which are correctly spaced.
You're trying to solve the problem of finding the correct place to insert spaces into your arbitrary inputs to make them match the list of valid codes, and this is extremely difficult to do in practice.
However, performing the opposite task - removing the spaces from the valid postcodes - is remarkably easy to do. So that is what I'd suggest doing.
In our most recent round of data modelling, we have modelled addresses with two postcode columns - PostCode containing the postcode as provided from whatever sources, and PostCodeNoSpace, a computed column which strips whitespace characters from PostCode. We use the latter column for e.g. searches based on user input. You may want to do something similar with your list of Valid postcodes, if you're keeping it around permanently - so that you can perform easy matches/lookups and then translate those matches back into a version that has spaces - which is actually a solution to the original question posed!

Using text in a column of Date/Time type in access

I have a column in MS Access in which the data could be any of the following:
A date
Text string: "n/a"
Text string: "n/e"
The vast majority of entries will be dates but a very few will need to be these specified text strings. I would like to still be able to perform date calculations on the column. Whats the best datatype to use?
In my opinion the best approach would be to leave the date field as Date/Time and then add another field to indicate the status if the Date/Time field is Null. Something like:
DateField DateStatus
--------- ----------
2014-09-21
n/a
2014-09-23
2014-09-25
n/e
You could use a single Text field, but then any time you wanted to use the field value as a proper Date/Time value you'd have to convert it using CDate(). You would also have the possibility of other junk getting in there, or dates getting entered in different formats (e.g. d/m/yyyy vs. m/d/yyyy). And finally, you would lose the ability to easily determine whether a Date/Time value is in a particular row (which in my approach would simply be ... WHERE DateField IS [NOT] NULL).
I agree with Gord Thompson's answer - mainly because it's so non-intuitive to have, essentially, two completely different types of data in a single column, and because it's going to make validation/data integrity stuff so much harder with little upside - and, as he indicates with the CDate() reference, dates basically only work reliably like dates if they're in a "date/time" field. Microsoft has a page on choosing a data type that explains some of the Access-specific differences in more detail.
I also suggest that you don't actually have a text field for those "comments," since you say there's only a handful of potential options - use a Long Integer and connect back to a separate table with the list of allowable entries. This will allow you to run reports more easily, change the "display text" in one step instead of potentially dozens of times, etc. It also saves a relatively small amount of space per record (long integer = 4 bytes; text = up to 255 bytes.)
You can also do fun data/reporting stuff with that Comment (long integer) field and dates - even combined into ranges, by the way - queries let you use the two different columns to create a single answer. I have a report that's grouped so that you can see stats for everything that's active (by quarter in which they start) plus everything that's pending (with the code indicating who's responsible for watching this record,) plus everything that's not pending but still doesn't have a start date (with the reason code displayed,) plus everything that's expired (by quarter in which they ended.) It looks like each of those things is in a single column in the report, but it's actually like five columns that have been concatenated with the IIf function.
(Almost every argument I can come up with boils down to "this is what relational databases are all about and why they're so awesome.)

Reading specified lines below a found string

I'm going to create a simple database bank account in C-language but I haven't quite figured out how I'm gonna fetch data for a specific account already created and sent to a file. I was thinking of doing a search from the beginning of the file using fseek for an account number specified since all account numbers will be unique. Is there a way to read the the amount of lines specified below that account number once it is found? For e.g in my file accounts.txt there will be the accounts
Account # : 13398
First Name : Eric
Last Name : Walters
Parish : St.tofu
Year of Birth : 1980
Age : 34
Savings Period : 5 year(s)
Password : Eric1
Account # : 13398
Account balance: $0.00
====================================
I want to search through the file for the account number and fetch it along with everything else 10 lines below it and display it on the screen if this is possible then say 'aye' and point me to a certain area I should study to achieve this and when I'm successful i'll post my coding here to show what I have done.
fseek() allows you to skip a certain number of bytes in each file. If your lines are not always the same length, you will have to read the entire file, not just to search for the account numbers, but also to find the ten newlines that delimit each account. To do this, you are better off using fgets().
The steps would be something like this
foreach line in file
if line starts with "Account Number"
if the number is the one you want
print the next 10 lines
else
skip the next 10 lines
else
keep looking
Firstly, fseek is used to move the file pointer not for searching. For search text, i.e. account id in your case, there is some examples Trying to find and replace a string from file in C. To write your own code, learning the basic use of file handling functions is enough. Furthermore, since your data is structured (every 11 lines represent one account), you code can be accelarated. At last, what you are trying to do is what database software offers and it is hard too implement your own database as fast as commercial software.
You could search in the file, but that would be a bit tedious. Even more tedious if you wanted to modify the account details.
Why don't you use SQLite:
It is designed to replace fopen().
?

Why won't " (800) 555-1212" fit in a varchar(20) field?

I have a SQL Server phone number field that is defined as varchar(20). A phone number like '800-555-1212' fits with no issue but a phone number like '(800) 555-1212' will cause a "data will be truncated" error. Why would that be?
That example inserts fine into a field defined that way (I ran a test).
You may have some invisible characters adding to the length like tabs and carriage returns.
Or you may have a calculated field (or code in a trigger) that is based on that field (we have a phone_number_stripped field to store just the numerics) that has accounted for stripping out - but not ( or ).
A value could be "like" that value, but be padded with spaces on either end.
Make sure your software handles that case. I recommend allowing the space, but automatically trimming it off.

Resources