Ahhh, LINQ – of course you’re case sensitive!

One of those ‘ahhh, bollocks’ moments this morning, so thought I’d write about it – a) so I’m not bitten by it again (writing about these things helps them sink in) and b) in case anyone else gets stuck and need a quick google of it.

Linq to SQL

We use linq commonly in our data access (SQL Server 2005/2008) and all is well on a join like:

var results = from cd in context.Distribution
               join uc in context.UCodes on cd.Batch equals uc.Batch
               where uc.Stamp >= betweenStart && uc.Stamp <= betweenEnd

...

Fairly standard stuff, an inner join between two tables based upon a criteria.  We use Latin1_General_CI_AS as our collation so no worries at all on those joins.

Linq to Objects

Now take those two collections out of the DB and into code (as we’ve had to do recently for a long running query), and that join above (on cd.Batch equals uc.Batch) gets buggered up.

Batch in the case above is a string, and someone forgot to sanitise it before entry to the DB (I use the royal someone, as it may have been me!), so a batch can be either ‘vfc’ or ‘VFC’ or ‘Vfc’ etc.

Move away from our cosy Latin1_General_CI_AS world and the above started to return a lot less data because of casing.

The fix is (as you would expect) easy:

var results = from cd in context.Distribution 
  
               join uc in context.UCodes on cd.Batch.ToUpperInvariant() equals uc.Batch.ToUpperInvariant()
               where uc.Stamp >= betweenStart && uc.Stamp <= betweenEnd
...

I thought I’d have a quick look around in terms of case sensitivity and which conversion mechanism to use (ToUpper, ToLower, etc.) and the following post interested me:

http://msdn.microsoft.com/en-us/library/bb386042.aspx

With the following information:

Strings should be normalized to uppercase. A small group of characters, when they are converted to lowercase, cannot make a round trip. To make a round trip means to convert the characters from one locale to another locale that represents character data differently, and then to accurately retrieve the original characters from the converted characters.

I’ve always tended to .ToUpperInvariant() when I’ve done string comparisons anyway, but it’s interesting to see some reasoning behind it.

 

Anyway, it goes down as one of those gotchas that I thought I’d write up.

Overriding ToString() on your objects using reflection

Just a very quick one, more as a reminder to myself on something I’d setup, though as per, would love any feedback.

On a current project we’re using exception driven development, and upon an exception, we’re throwing our own custom business exceptions, and building up the Exception.Data collection with the properties in the objects as they were at the point of exception.

We have a lot of DTO objects down at the dal layer, some with a lot of properties, and we didn’t want to keep having to do:

UserNotFoundException ex = new UserNotFoundException();
ex.Data.Add(“Username”, userDto.Username);
ex.Data.Add(“UserId”, userDto.UserId);
ex.Data.Add(“AccountStatus”, userDto.AccountStatus);
...

With that in mind, I started working on what reflection could bring to the table and perhaps giving our Dto objects a base type to derive from.

Here’s the initial stab at what I’ve arrived at:

public class BaseDto
{ public override string ToString()
{
PropertyInfo[] propertyInfos = this.GetType().GetProperties();

Array.Sort(propertyInfos, (propertyInfo1, propertyInfo2) => propertyInfo1.Name.CompareTo(propertyInfo2.Name));

StringBuilder output = new StringBuilder();
foreach (PropertyInfo propertyInfo in propertyInfos)
{
output.AppendFormat("{0}: {1}\n", propertyInfo.Name, propertyInfo.GetValue(this, null));
}

return output.ToString();
}
}

Jobs a good ‘un.  I can now just use:

UserNotFoundException ex = new UserNotFoundException();
ex.Data.Add(“userDto”, userDto.ToString());

and all properties will be enumerated and documented in that one property of the .Data dictionary.

<meerkat>simples!</meerkat>

I hasten to add, this seems to work a treat in testing – I’ve got more rigorous investigation to do to make sure it presents a way forward for us, but thought I’d post it anyway in case anyone found it useful.

A better way to check for validity in emails?

I’ve had a method that I’ve used from time to time to validate email addresses, trying to cater for the common problems that have been seen with addresses.  This weekend I had cause to look at it and thought there must be a better way of representing it all.

Couple of thoughts crossed my mind:

  1. I’m not throwing exceptions anywhere, and although I know the method, so use it as I’d expect, perhaps I should be throwing a FormatException? or some others?
  2. It’d be easy to make this an extension method, but I guess it’d be an extension to System.String, and doesn’t really feel right as it serves such a focussed purpose.
  3. Should I be doing any other checks in the code that I’m not already?

I’ll have a read around and look at refactoring, but thought I’d post it here so that I have a record of the ‘before’ and ‘after’ views.

/// 
/// 
/// 
/// 
/// 
/// 
public static string ValidateEmail(string email, out string error)
{
	try
	{
		error = "";

		// Pre-formatting steps
		email 	= email.Trim().Replace(" ", "");
		email 	= email.Replace(",", ".");												// mostly, commas are full stops gone wrong
		email 	= (email.EndsWith(".")) ? email.Substring(0, email.Length-1) : email;	// kill any full stop at the end of an address
		email	= email.Replace(@"""", "");												// remove " in the email address
		email	= (email.StartsWith("'")) ? email.Substring(1) : email;					// remove ' at the start of the address
		email	= (email.EndsWith("'")) ? email.Substring(0, email.Length-1) : email;	// remove ' at the end of the address

		// STEP 1	- No '@' symbol in Email
		if (!email.Contains("@"))
		{
			error = "Email contains no '@' symbol.";
			return "";
		}

		// STEP 2	- More than 1 '@'symbol in Email
		if (email.Split('@').Length > 2)
		{
			error = "Email contains too many '@' symbols.";
			return "";
		}

		// STEP 3	- No .com, .co.uk at end of addresses
		//			- Invalid characters ()<>,?/\|^!"£$%^&* ??? in address
		Regex _regex = new Regex(@"^[-\w._%+']+@[-\w.]+\.[\w]{2,4}$", RegexOptions.IgnoreCase);
		if (!_regex.IsMatch(email))
		{
			error = "Email address appears invalid.";
			return "";
		}
		
		return email;
	}
	catch
	{
		error = "Unknown error with email address.";
		return "";
	}
}