I ran three tests of an automated process and learned an interesting thing. Here is the setup for Test 1:
- Commented out explicit transactions from NHibernate's Session
- Used a repository that retrieved entities by a natural key
The process ran in under 1 minute. SQL Profiler logged 14,631 rows. This ran very fast, but using implicit transactions in NHibernate is not a best practice. Also, any time you see a process generating 14,000 rows of profiler output, you're probably doing something incorrectly. Here is a sample of the logged rows from the profiler:
Notice that many of the extra rows are Audit Login lines. These must be related to the implicit transactions, because, as you'll see later, those lines disappear once we switch to explicit transactions.
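The implicit-versus-explicit distinction can be sketched with Python's sqlite3 module, whose connections support both an autocommit mode (every statement is its own implicit transaction) and explicit transactions. This is an analogy for illustration, not NHibernate code:

```python
import sqlite3

# Autocommit mode: every statement runs in its own implicit transaction,
# roughly analogous to using an NHibernate session without an explicit
# transaction.
implicit = sqlite3.connect(":memory:", isolation_level=None)
implicit.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, ssn TEXT)")
implicit.execute("INSERT INTO person (ssn) VALUES ('111-22-3333')")  # committed immediately

# Explicit transaction: the writes are grouped and committed together.
explicit = sqlite3.connect(":memory:")
explicit.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, ssn TEXT)")
with explicit:  # BEGIN ... COMMIT wrapped around everything in the block
    explicit.execute("INSERT INTO person (ssn) VALUES ('444-55-6666')")
    explicit.execute("INSERT INTO person (ssn) VALUES ('777-88-9999')")
```

In autocommit mode the server does per-statement transaction housekeeping, which is the kind of per-statement overhead that shows up as extra profiler noise.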
Here is the setup for Test 2:
- Used an explicit transaction
- Used a repository that retrieved entities by a natural key (same as Test 1)
The process ran in 9 minutes. Interestingly enough, SQL Profiler logged only 3,676 rows, compared to the 14,631 rows observed in Test 1. This was completely unexpected to me, because typically when something takes longer to run it also generates more queries. But take a look at the following screen capture of the profiler:
The reason there are so many fewer rows is that we no longer see all of those Audit Login lines from Test 1. So why did this test take so much longer? Let's go on to the next test, and we'll come back to this answer later.
Here is the setup for Test 3:
- Used an explicit transaction (same as Test 2)
- Added caching to the repository so that a query is issued to the database only if the natural key being searched for hadn't already been searched
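That repository-level cache can be sketched as a simple memoizing wrapper. The names here (`CachedRepository`, `get_by_natural_key`) are illustrative, not the actual code from these tests:

```python
class CachedRepository:
    """Looks entities up by natural key, hitting the database only the
    first time a given key is requested (hypothetical sketch)."""

    def __init__(self, query_db):
        self._query_db = query_db  # callable: natural_key -> entity
        self._cache = {}
        self.db_queries = 0        # counts how many real queries we issued

    def get_by_natural_key(self, key):
        if key not in self._cache:
            self.db_queries += 1
            self._cache[key] = self._query_db(key)
        return self._cache[key]

# The second lookup of the same key never reaches the "database".
repo = CachedRepository(lambda key: {"ssn": key})
repo.get_by_natural_key("111-22-3333")
repo.get_by_natural_key("111-22-3333")
```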
The process ran in under 1 minute. SQL Profiler logged 1,192 rows that look much like Test 2's output, but without all of the queries now cached in the repository (explaining why there are fewer rows than in Test 2). Note that I understand 1,192 logged rows is still quite a lot; I could probably improve this number with a better fetching strategy.
So now the log-entry counts all make sense, but it still isn't completely clear why Test 1, with implicit transactions, ran so quickly even without any caching in the repository. I tracked it down to one thing by comparing the profiler logs: in Test 1 and Test 3, all inserts/updates are performed at the end of the unit of work and all select statements at the beginning, while in Test 2 the insert and update statements are interleaved throughout the entire unit of work.
I can explain how this happened, although I can't fully explain why it matters (I do have an idea). For Test 1, none of the inserts/updates were performed until a flush command was issued, and the flush command was issued at the end of the unit of work. In fact, if this flush command is never issued, no data is persisted at all (no insert or update queries are run).
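That flush behavior can be modeled with a toy session that queues every write in memory and only sends the batch to the database when flushed. This is a conceptual sketch, not NHibernate's actual implementation:

```python
class ToySession:
    """Queues writes in memory (like a first-level cache) and only
    persists them when flush() is called."""

    def __init__(self):
        self.database = []   # a plain list standing in for a table
        self._pending = []

    def save(self, entity):
        self._pending.append(entity)  # nothing hits the database yet

    def flush(self):
        # All queued inserts/updates fire together, at flush time.
        self.database.extend(self._pending)
        self._pending.clear()

session = ToySession()
session.save({"ssn": "111-22-3333"})
before_flush = len(session.database)  # still 0: no flush, no persistence
session.flush()
after_flush = len(session.database)   # 1: the write happened at the end
```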
In Test 3, I believe the insert and update statements ended up at the end because, as Test 2 suggests, querying the database by anything other than the primary key interrupts things and forces the first-level cache to be cleared (somewhat, anyway). The caching included in the repository for Test 3 meant those queries didn't have to be run at all. Using an id generator other than the Identity one we are using (such as guid or hi/lo) may have solved this problem as well.
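One plausible mechanism, consistent with NHibernate's default auto-flush behavior (a query can force pending changes out first so its results stay consistent), can be sketched with a self-contained toy session. Again, this is illustrative, not the real implementation:

```python
class AutoFlushSession:
    """Sketch of auto-flush: pending writes are queued in memory, but any
    query forces them out first so the query sees consistent data.
    This is why mid-unit-of-work queries scatter inserts inline."""

    def __init__(self):
        self.database = []
        self._pending = []
        self.flushes_caused_by_queries = 0

    def save(self, entity):
        self._pending.append(entity)

    def query(self, predicate):
        if self._pending:                     # a query forces a flush...
            self.flushes_caused_by_queries += 1
            self.flush()                      # ...so inserts appear inline
        return [e for e in self.database if predicate(e)]

    def flush(self):
        self.database.extend(self._pending)
        self._pending.clear()

session = AutoFlushSession()
session.save({"ssn": "111-22-3333"})
# The query pushes the pending insert out early, mid-unit-of-work.
found = session.query(lambda e: e["ssn"] == "111-22-3333")
```

With the Test 3 repository cache in place, most queries never reach the database, so nothing forces these early flushes and the writes stay batched at the end.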
Ultimately, I believe it all comes down to the fact that the first-level cache can keep everything in memory and then open just one connection (or something like it) and fire off all of the inserts/updates at once. This is the part that I can't fully explain.