SQL Server DB documentation from schema

I am in the (fun!) process of documenting an application I created nearly 5 years ago, in which the majority of the functionality is controlled by the database (it's a business workflow automation tool).

Reverse engineering substantially reduces the size of the task - Visio's Database > Reverse Engineer... command is a real time saver when you have foreign key constraints defined, makes generation of an ERD no more complicated than arranging all the boxes so that none (or few!) of the relationship lines cross!

A little utility contributed to the Code Project website makes the process of creating table and field documentation much easier: http://www.codeproject.com/KB/database/sqldoc.aspx

This utility can be pointed at a SQL server database, and will generate an HTML document listing all the tables and fields in the database, together with data types, nullable, default values and description fields from the db schema.

It uses an XSLT transformation to convert the xml generated to an HTML document. This can be edited so that it's closer to final requirements - I have updated the XSLT to place anchors on each table name, and links at the top of the file to each table.

This makes it a much less arduous task to document each table and field - I can edit the file in dreamweaver and just add descriptions for each field and table.

I Love (SQL) views

I've always been a fan of Views within SQL Server - I've found them especially useful when creating applications which need to be able to access data from external sources, without necessarily being aware of where that data came from - or even of its precise format.

One feature which I have found especially useful, is the ability to select and join to data from within a different database, from within a view.

This can be done as follows:

[More]

Love the View!

I've always been a fan of Views within SQL Server - I've found them especially useful when creating applications which need to be able to access data from external sources, without necessarily being aware of where that data came from - or even of its precise format.

One feature which I have found especially useful, is the ability to select and join to data from within a different database, from within a view.

This can be done as follows:

Create View MyView1
AS
Select L.ID as Local_ID, L.Name as Local_Name, L.Related_ID as Remote_ID,
R.Name as Remote_Name, R.Description as Remote_Description
From dbo.tbl_local L
Inner Join RemoteDB.dbo.tbl_remote R on L.Related_ID = R.ID
Please note: table / field names are for illustration only - and not indicative of Best Practice or requirement

So specifying an explicit database name in front of the schema name causes the data to be retrieved from the named DB, on the same server, rather than from the db in which the view is placed. I believe it is also possible to specify a database which is on a different server, using Linked Servers - but I've never had a requirement to check this out so far.

What's great about this, is that if I reference this view rather than the tables directly within my application / stored procedures, I can change where the data is coming from without impact on the application. This has been particularly useful in the past when a system is reliant upon external data, but needs to be developed in isolation. I can create a table in my local DB which mirrors the structure of the remote data source, and point my view at that - and then when the application is ready for integration, I can update the view to point to the remote DB.

[More]

Never do this... PLEASE

I had been asked to make a couple of small amends to the functionality of an old legacy CF application, that had been created by someone who has long since left the company...

One of the pages in this application was responsible for displaying a list of all clients on the system, together with a count of the number of associated tasks. This page displays over 1700 rows (I know - paging wouldn't be a bad idea...)

At the bottom of the page, I was astounded to find this gem:

<cfoutput query="get_clients" >
   <!----- get count of tasks ----->
   <cfquery name="no_of_tasks" datasource="#request.dsn#">
          select count(*) as nooftasks from tasks where taskclientID = #clientID# and taskdeleted = 0
   </cfquery>
   <!----- get count of users ----->
   <cfquery name="no_of_users" datasource="#request.dsn#">
          select count(*) as noofusers from users where userclientID = #clientID#
   </cfquery>
</cfoutput>

This is terrible in several ways -

  • The queries are run once for each record within the outer query (get_clients) - 1708 times for each of the count queries
  • The output from these queries is never used!!!

I removed the above block of code - nothing broke as it wasn't being referenced, and the load time for the page went down from ~15 seconds to ~8.

I looked further up the page, and found that the queries I had deleted, had been copied from a location further up the page - where at least the calculated value was being referenced... or at least one of the values was being referenced - the HTML for displaying the other value had been commented out, without removing / commenting the 2nd query!

I removed the second query, which reduced the load time down to ~5 seconds.

So already, just by removing unused queries, we had a 300% improvement in performance.

Real world proof that parametrised SQL really does provide a measurable improvement in performance:

I wrapped the #clientID# value in the subquery in CFQUERYPARAM, and the page load time dropped from ~5 seconds to ~3 seconds...

Of course even with the 500% improvement in performance - I still didn't see the point of leaving this sub query in - when at most it would take me 15 min to build the aggregate in to the outer query.

So I did that, and page load dropped to < 700ms - over 20 times faster than the original page.

I calculated how many queries were being processed in the original page - it came to a mind blowing 6,833!!! - hardly surprising the page took a while to load!

I hope it is completely obvious to everyone why its a terrible idea to Nest CFQUERY calls within loops - performance of the application decreases logarithmically as the size of the outer data set increases - and its completely unnecessary, as the DB engine will be able to calculate the aggregates in the initial select statement, with virtually no measurable impact on performance - surprisingly enough, databases are actually pretty good at manipulating and doing calculations on data!!!

BlogCFC was created by Raymond Camden. This blog is running version 5.5.1, hosted by TalkWebSolutions.