Primer on Selecting Data Using Entity Framework

This question about selecting data using Entity Framework on StackOverflow got me thinking that a lot of LINQ to Entities code samples show very little consistency in how data is selected. Find(), Single(), SingleOrDefault(), First(), and FirstOrDefault() are often used interchangeably.

The first issue to address is using Where() in the data selection statement. Where() should only be used when several entities are required. There is no advantage in using Where() as below :

var db = new MyDBContext();
var employee = db.Employees.Where(x => x.ID == 1).Single();

This can be written as :

var employee = db.Employees.Single(x => x.ID == 1);

Note that there is a fundamental difference between using Where() in isolation and appending a standard LINQ operator such as Single() or ToList(). Where() and OrderBy() will return IQueryable and IOrderedQueryable objects respectively which contain core SQL Statements to be executed against the database but have not yet requested or received data from the database.
Appending Single(), First(), ToList() or similar will subsequently execute the query:

var emps = db.Where(x => x.Salary > 100000); // emps is an IQueryable() object, no database interaction yet
//.....
var highSalaryEmps = emps.ToList() //database query executed and an IEnumerable() returned as highSalaryEmps

This can have both performance and data integrity implications. In the above example, data is only read from the database when ToList() is called, intervening update/add operations will be reflected in the highSalaryEmps object which may not be the intended outcome.
In terms of performance, the primary issue to be aware of is using filtering operations after executing a large database query. For example :

var emps = db.Employees.ToList();
var highestSalaryEmp = emps.OrderBy(x => x.Salary).Single();

This will first return all employees from the database and then performance an order and selection. Far better to just query the database using these criteria :

var highestSalaryEmp = db.Employees.OrderBy(x => x.Salary).Single();

Find()

Find() is set apart from other element operations in LINQ-To-Entities in that it will first query the in-memory entities that EF is tracking and only hit the database in the event that none is found. Thus it can have superior performance and should always be used where possible – especially in lengthy operations when the entity is likely to already be in memory. Find() is unfortunately not very flexible and can only be used to look up an entity by its Primary Key :

var emp = db.Employees.Find(empId);

Note that you can use the Local property on a DbSet to access the in-memory data if you require more advanced logic :

var emp = db.Employees.Local.SingleOrDefault(x => x.Name = "Jude")
        ?? db.Employees.SingleOrDefault(x => x.Name = "Jude");

Note the use of the null coalescor (??) above which tests if the first statement evaluates to null and if so then proceeds to execute the second statement. Thus EF will first test if there is a match in-memory and only if no match is found will then hit the database.

Single or First

A lot of EF code samples contain First() when it is clearly not optimal. First() is often used interchangeably with Single() although they are quite different. Single mandates that one and only one entity can be returned, if two matching records are found in the database an error will be thrown. First() is only concerned with returning the first record and performs no check to determine if there are multiple matching records.
Thus Single performs a data integrity test. In many scenarios there should only but one matching record in the database, and so Single will ensure this – although this will necessitate addition error trapping logic since Single will throw errors is more than one match is found. Under the covers, Single does this by using ‘SELECT TOP (2)’ for its queries and then EF inspects the returned records and throws an error if two records are returned. First by contrast simply queries using ’SELECT TOP (1)’.

As a side note, using .Take(1) instead of .First() has no difference in the SQL used to query the database, but Take() returns an IQueryable object and not an entity as with First() so the above discussed issues will be relevant.

OrDefault()

SingleOrDefault() or FirstOrDefault() should only be used when it is not known if the entity has already been persisted to the database. Thus SingleOrDefault() will query the database for the entity and return null if no entities are found (note that an error will still be thrown if more than one match is found). A typical pattern is querying and returning an entity if it exists or creating and adding it if there is no matching entity. In such a scenario the ?? null coalescor introduced above can be very useful.
Take a scenario where users can create Tags for Articles, if the Tag exists it shouldnt be returned as an entity but it needs to be created if it does not yet exist:

var tag = db.Tags.SingleOrDefault(x => x.TagName ="EntityFramework") ??
new Tag { TagName ="EntityFramework", CreateDate = DateTime.UtcNow };
article.Tag = tag;
db.SaveChanges(); //Note the new tag will not be persisted to the database until SaveChanges is called.

]]>

Find()

Single or First

OrDefault()

Leave a comment Cancel reply