Satisfying Complicated Data Requests by Reducing the Request to Common Core Elements


Yesterday I was asked to provide the underlying data for a complicated, drill-down report for a client’s marketing department. The report needed all sorts of visual embellishments; in the end, however, providing the data centered on a common core of operations: collect data from multiple sources, partition it, and pivot!

This example is very simplistic compared to the actual combination of dimension and fact tables required by my client, and by most users of wide-scale enterprise data warehouses. By breaking the data processing down to its barest elements, however, the surrounding, more descriptive attributes found in real-world implementations become easier to manage. In this example, I’ll deconstruct the process of separating customer product account balances by year and by month when the purchasing accounts associated with these customers potentially change over time. I’ll use a single dimension table to represent customers and a single fact table to represent purchasing transactions.

Here is the T-SQL script to build a temporary Type 2 dimension table for customers and the known transactions for these customers.

drop table #type2;
create table #type2(ckey int,skey int,effdate date,enddate date);

insert into #type2(ckey,skey,effdate,enddate)values(1,100,'1/1/2017','3/1/2017');
insert into #type2(ckey,skey,effdate,enddate)values(1,150,'3/1/2017','5/31/2017');
insert into #type2(ckey,skey,effdate,enddate)values(1,175,'5/31/2017',NULL);
insert into #type2(ckey,skey,effdate,enddate)values(2,200,'5/31/2017',NULL);
-- view the results
select ckey, skey, effdate,enddate 
 from #type2 
order by ckey,skey,effdate

When completed, you will see two customers. The first customer (ckey = 1) has three purchasing keys (skey = 100, 150, and 175) associated over time. This is depicted by the effective begin and end date attributes associated with each row. An end date of NULL means that the purchasing key associated with the customer is the most current. In the case of the second customer (ckey = 2), only a single purchasing key has ever been, and continues to be, associated with this customer.
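As a quick aside, the NULL end date convention makes the “current” purchasing key trivial to find. A minimal sketch against the temp table above:

```sql
-- current (open-ended) purchasing key for each customer
select ckey, skey
  from #type2
 where enddate is null
order by ckey;
```

With the sample rows, this returns skey 175 for customer 1 and skey 200 for customer 2.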



Next in the script, I’ll build a temporary fact table…

drop table #fact;
create table #fact(ckey int, skey int, xactDate date, acct int, amt decimal(10,2));

-- step#2 populate the transaction fact table.
insert into #fact(ckey,skey,xactDate,acct,amt)values(1,100,'2/1/2017',1,10.00);
insert into #fact(ckey,skey,xactDate,acct,amt)values(1,100,'2/1/2017',10,20.00);
insert into #fact(ckey,skey,xactDate,acct,amt)values(1,100,'2/2/2017',1,10.00);
insert into #fact(ckey,skey,xactDate,acct,amt)values(1,100,'2/3/2017',10,15.00);
insert into #fact(ckey,skey,xactDate,acct,amt)values(1,100,'3/1/2017',1,70.00);
insert into #fact(ckey,skey,xactDate,acct,amt)values(1,100,'4/20/2017',8,5.00);

insert into #fact(ckey,skey,xactDate,acct,amt)values(1,150,'3/15/2017',1,20.00);
insert into #fact(ckey,skey,xactDate,acct,amt)values(1,150,'5/6/2017',10,40.00);

insert into #fact(ckey,skey,xactDate,acct,amt)values(1,175,'6/1/2017',1,50.00);
insert into #fact(ckey,skey,xactDate,acct,amt)values(1,175,'6/12/2017',1,10.00);
insert into #fact(ckey,skey,xactDate,acct,amt)values(1,175,'6/15/2017',10,40.00);

insert into #fact(ckey,skey,xactDate,acct,amt)values(2,200,'5/31/2017',1,10.00);
insert into #fact(ckey,skey,xactDate,acct,amt)values(2,200,'5/31/2017',10,10.00);
insert into #fact(ckey,skey,xactDate,acct,amt)values(2,200,'6/18/2017',4,15.00);
-- view the results
select ckey,skey,xactDate,acct,amt 
 from #fact
order by ckey,skey,xactDate,acct

This table contains the bare bones of the key elements needed to later pivot the results commonly loaded into accounting reports for end users. For now, think of this table as a record of all customer purchases, the purchasing vehicle used, and the product accounts which represent the goods purchased on the date of the transaction.


Finally, I’ll issue the PIVOT query which will break down the customer’s product purchases over time, reflecting which purchases were made under which purchasing key.

SELECT PosYear, PosMonth, ckey, skey, isnull([1],0) as A1, isnull([4],0) as A4, isnull([8],0) as A8, isnull([10],0) as A10 
 FROM (SELECT datepart(year,f.xactDate) as PosYear, datepart(month,f.xactDate) as PosMonth, f.ckey, f.skey, f.acct, isnull(sum(f.amt),0) as TotAmt      
 FROM #fact f        
 GROUP BY f.ckey,f.skey,f.acct,datepart(year,xactDate),datepart(month,xactdate)  
 ) AS s  
 PIVOT  ( sum(TotAmt)      
 FOR acct IN   ([1],[4],[8],[10])  
 ) as pvt  
ORDER BY ckey, skey, PosYear, PosMonth
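Because the purchasing keys change over time, it is also worth sanity-checking the fact rows against the dimension before trusting the pivot. This sketch assumes effdate is inclusive and enddate is exclusive (an assumption about the dating convention, not something the tables enforce):

```sql
-- flag fact rows whose purchasing key falls outside its
-- effective date range in the Type 2 dimension
select f.ckey, f.skey, f.xactDate
  from #fact f
  join #type2 d
    on d.ckey = f.ckey
   and d.skey = f.skey
 where f.xactDate < d.effdate
    or (d.enddate is not null and f.xactDate >= d.enddate)
order by f.ckey, f.skey, f.xactDate;
```

With the sample data, the two skey-100 transactions dated 3/1/2017 and 4/20/2017 are flagged, since that key’s effective range ends 3/1/2017.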


Hopefully this post will help address a common, real world request.




Codenvy and Enterprise Application Integration : DevOps Approach


Codenvy’s most recent offering uses the power of Docker to expand and customize the development, runtime and deployment environments. How AWESOME is that? Less time maintaining local environments means more coding time—who isn’t onboard with that? This version of the cloud IDE is not just “better”; it is truly a high-quality workbench for development across multiple types of projects, platforms and databases. This contribution to the development community is noteworthy in and of itself. I LOVE Eclipse, but I hate the time it takes to download the myriad of plugins for whatever project I happen to be working on. I won’t even go into the compatibility and dependency issues! As a software integrator/data architect, each new project I work on requires yet another set of plugins. I am equally impressed that Codenvy has donated this cloud extension platform to Eclipse, because I know this will move all IDEs light-years forward (Microsoft, are you listening?)!

I mentioned Docker in the first sentence, then abandoned the topic. For those of you unfamiliar with Docker, it can be described as a lightweight abstraction in the form of containers which can be shared across operating systems. Think of Docker as an open platform to build, ship and run highly distributed applications. For an oversimplified example, let’s say there is an environmental issue with your JBoss web app deployed to a customer’s Debian server but not to a SUSE server at the same customer site. Docker offers the ability to pin the build to the exact environment specifications. Codenvy offers the advantage of debugging the code at runtime between environments.

Today I will be working with Docker to distribute a common R environment to a Windows and a Debian platform to deploy with an R app. I’ll keep you all posted on how well it goes!

Importing an XML file into SQL Server with XML Bulk Load


As with any set of tools and base technology, there are multiple ways to achieve an import task. For those familiar with Integration Services, there’s the XML SSIS component; for those more comfortable with .NET/C# development, XML can quickly be serialized/deserialized between files and databases—but for those interested in a straightforward bulk load of XML data into a defined table, or set of tables, this approach is for you.

Today I had a support call which ultimately boiled down to a foreign key violation when attempting to write rows from an XML file that already had identifier assignments in the target table. Since the file was over a gig, it was impossible for me to tell exactly what was going on with the records themselves without dumping the file into a comparable table, sans the constraints. I decided to use XML Bulk Load to do this task, which loaded the XML file in less than 5 seconds! That was AFTER I spent about 15 minutes setting up the mapping file and writing the *.vbs script… while chatting with the end user and the first-tier support fellow.

To do this, you’ll need to make sure that you have SQLXML installed. Since I am working with the SQL Server 2012 engine, SqlXml 4.0 SP1 is required. Before downloading and installing, either check the Add/Remove programs section, or look in the registry–the CurVer key in the SqlXml folder will reveal the version number.

Next, create a look-alike of the target table: WITHOUT the constraints if you’re analyzing a problem similar to mine, or WITH the constraints if you’re expecting valid data.

Here’s what the target table looks like in this case:

CREATE TABLE [protected].[xml_Master30_Assignment](
          [MasterAssignmentId] [int] NOT NULL,
          [mrID] [int] NOT NULL,
          [AssignmentTypeId] [int] NOT NULL,
          [IsTeam] [char](1) NOT NULL,
         [Assignee] [varchar](500) NULL,
         [TeamAssignee] [varchar](256) NULL,
         [AssignmentBeginDate] [datetime] NOT NULL,
        [AssignmentEndDate] [datetime] NULL)
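If the real target table already exists in the database, you can also generate a constraint-free look-alike without retyping the DDL, since SELECT…INTO copies the column definitions but not the keys or constraints. A minimal sketch (the live source table name here is hypothetical):

```sql
-- copy the shape of the live table, but no rows and no constraints
select *
  into [protected].[xml_Master30_Assignment]
  from [protected].[Master30_Assignment]  -- hypothetical live table
 where 1 = 0;
```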

Here is a sampling of the data found in the XML file:

<database xmlns:dbi=''>
<AssignmentBeginDate>2013-08-25 22:25:08.000</AssignmentBeginDate>
<AssignmentBeginDate>2013-08-25 22:21:45.000</AssignmentBeginDate>


Now comes the real work. You will have to map the data in the XML file with an annotated schema:

<?xml version="1.0" ?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes"
xmlns:sql="urn:schemas-microsoft-com:xml-sql" >

<ElementType name="MasterAssignmentId" dt:type="int" />
<ElementType name="mrID" dt:type="int" />
<ElementType name="AssignmentTypeId" dt:type="int" />
<ElementType name="IsTeam" dt:type="string" />
<ElementType name="Assignee" dt:type="string" />
<ElementType name="TeamAssignee" dt:type="string" />
<ElementType name="AssignmentBeginDate" dt:type="datetime" />
<ElementType name="AssignmentEndDate" dt:type="datetime" />

<ElementType name="database" sql:is-constant="1">
<element type="Select" />
</ElementType>

<ElementType name="Select" sql:is-constant="1">
<element type="row" />
</ElementType>

<ElementType name="row" sql:relation="protected.xml_Master30_Assignment">
<element type="MasterAssignmentId" sql:field="MasterAssignmentId" />
<element type="mrID" sql:field="mrID" />
<element type="AssignmentTypeId" sql:field="AssignmentTypeId" />
<element type="IsTeam" sql:field="IsTeam" />
<element type="TeamAssignee" sql:field="TeamAssignee" />
<element type="Assignee" sql:field="Assignee" />
<element type="AssignmentBeginDate" sql:field="AssignmentBeginDate" />
<element type="AssignmentEndDate" sql:field="AssignmentEndDate" />
</ElementType>

</Schema>


Write the vbscript to execute the xml load.

Set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBL.ConnectionString = "Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=YourDatabase;Data Source=YourServer;"
objBL.ErrorLogFile = "c:\error.log"
objBL.Execute "c:\xml_master30_assignmentMapping.xml", "c:\MASTER30_ASSIGNMENT.xml"
Set objBL = Nothing

That’s it! The data from the XML file should be there.

select *
  from [Scratch].[protected].[xml_Master30_Assignment]
order by MasterAssignmentId asc
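Back to the original support problem: with the file staged, the rows that tripped the foreign key violation can be isolated with a simple join against the live table. A sketch, assuming the live table lives in a hypothetical [LiveDb] database and shares the MasterAssignmentId key:

```sql
-- staged rows whose identifiers already exist in the live table
select s.MasterAssignmentId, s.mrID, s.AssignmentTypeId
  from [Scratch].[protected].[xml_Master30_Assignment] s
  join [LiveDb].[protected].[Master30_Assignment] t  -- hypothetical
    on t.MasterAssignmentId = s.MasterAssignmentId
order by s.MasterAssignmentId;
```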

T-SQL Stored Procedure to Grant Permissions on Stored Proc Objects


In today’s universe of multi-tiered distributed processing, VMs, etc., you may need a quick and easy way to generate and migrate T-SQL scripts that set up the permissions an enterprise service account needs in order to execute a database’s stored procedures.

In this example, I am looking for all NON-Microsoft-shipped (IsMSShipped) stored procedure (sproc) objects within a database instance. Once I find them, I wrap the result set with the text necessary to grant execute permissions (e.g. GRANT EXECUTE ON), producing a separate T-SQL script.

I generally run this sproc interactively in SQL Manager, then right-click on the output box to cut and paste the results into another query window for the db instance I want to execute the newly generated t-sql code in.

This sproc has a PRINT option and an EXEC option. The PRINT option will generate the t-sql statements to the output window WITHOUT executing them. The EXEC option will print each statement first, then execute it. Run the sproc with the PRINT option first to be sure that the subset of sprocs selected is the subset you are interested in granting execute permissions on.

exec [evaluate].[GrantSprocsExecForWinSvcAcct] 'Metro\svc$WinAcct-dev', 'PRINT'

————— Beginning of Stored Proc ——————-

USE [Metro]
GO

-- =============================================
-- Author:
-- Create date: 6/11/2009
-- Description: looks for stored proc objects, outputs grant
-- statements for future t-sql execution.
-- =============================================
ALTER PROCEDURE [evaluate].[GrantSprocsExecForWinSvcAcct]
@UserName varchar(100),
@Which varchar(5)
AS
BEGIN

-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON

-- collect the non-Microsoft-shipped sprocs, numbering each row
SELECT ROW_NUMBER() OVER (ORDER BY o.name) AS ID,
       QUOTENAME(SCHEMA_NAME(o.schema_id)) + '.' + QUOTENAME(o.name) AS ProcName
INTO #Procedurelist
FROM sys.objects o
WHERE o.type = 'P'
  AND OBJECTPROPERTY(o.object_id, 'IsMSShipped') = 0

DECLARE @Loopid INT,
        @MaxId INT
-- @UserName VARCHAR(50)

-- This is the user that will get the execute permissions
-- // used as a test
-- SELECT @UserName = 'Metro\svc$WinAcct-dev'

-- Grab start and end values for the loop
SELECT @Loopid = 1,
       @MaxId = MAX(ID)
FROM #Procedurelist

DECLARE @ProcName VARCHAR(400),
        @SQL VARCHAR(600)

PRINT 'Beginning process...'

-- This is where the loop starts
WHILE @Loopid <= @MaxId BEGIN

    -- get the procedure name
    SELECT @ProcName = ProcName
    FROM #Procedurelist
    WHERE ID = @Loopid

    -- construct the statement and always print it
    SELECT @SQL = 'GRANT EXECUTE ON ' + @ProcName + ' TO ' + QUOTENAME(@UserName)
    PRINT @SQL

    -- with the EXEC option, run the statement as well
    IF(upper(@Which) = 'EXEC')
        EXEC (@SQL)

    -- increment counter
    SET @Loopid = @Loopid + 1
END

-- clean up
DROP TABLE #Procedurelist

PRINT 'End of processing. If you chose EXEC, please spot check stored procedures to assure execute permission has been granted to user.'

END


Passing a Data Array to a Microsoft SQL Server Stored Procedure


At some point during the development process you will likely encounter the need to pass an array of data from a front-end .NET/Flex application or a WCF/SOA web service to the database. For example, perhaps you are developing an online order entry system which will require that multiple items from your application get written to the order header and order details tables.

There are a few ways to do this through T-SQL/stored procedures, but the two primary approaches are to: (1) create user functions that parse a delimiter to separate the elements in a string; (2) use the XML features available in Microsoft SQL Server.

If you are using SQL Server 6.5/7.0, the XML features won’t be available, so you’ll need to go the route of user-defined functions. This post does not address these methods, but here is my favorite link on the topic.

If you are using SQL Server 2000, the tools available for use are the legacy OPENXML rowset provider and the system stored procedures sp_xml_preparedocument and sp_xml_removedocument. In SQL Server 2005 and 2008, these tools still exist; however, the preferred XML data type methods for these versions of SQL Server are nodes(), value(), and query(). Here are two excellent posts for anyone working with OPENXML (or data arrays) in SQL Server 2000:
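For a concrete feel of the legacy approach, here is a minimal OPENXML sketch: parse the document into an in-memory tree, shred the attribute-centric nodes into a rowset, then release the tree. The document shape matches the Order/Item/Prod format used later in this post:

```sql
declare @doc nvarchar(1000), @hdoc int
set @doc = N'<Order><Item><Prod ID="60000" Qty="200"/><Prod ID="60180" Qty="1000"/></Item></Order>'

-- parse the document and get a handle to the in-memory tree
exec sp_xml_preparedocument @hdoc output, @doc

select ID, Qty
  from openxml(@hdoc, '/Order/Item/Prod', 1)  -- flag 1 = attribute-centric
  with (ID int '@ID', Qty int '@Qty')

-- always release the tree, or the parsed document stays in memory
exec sp_xml_removedocument @hdoc
```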

In this practical Order Header/Order Detail example, I’ll be using the xml data type methods: value() and nodes().  To view the relational design pattern used for this example, Fernando Loranzo’s Introduction to Relational Database Design article illustrates the OrderHeader and OrderDetails tables with  Order and Order_Items tables.

T-SQL/Stored Procedure

This procedure accepts two basic parameters: the account id (@AccountID) of the customer, and the product items/quantities passed as an XML document (@OrderList). For the OrderHeader table, the identity feature of SQL Server is invoked on insert, creating a unique OrderID. The OrderID is captured in the declared variable (@OrderID) for use when inserting the product/quantity data into the OrderDetails table. To do the insert, the sub-select uses the OrderList XML document (as ‘D’), drilling through the nodes (/Order/Item/Prod) to parse the document and obtain the values (e.g. D.element.value).

create proc [dbo].[InsertOrder]
@OrderList xml,
@AccountID int
as
begin

set nocount on

declare @OrderID int
declare @Today datetime

set @Today = getDate()

-- column names below are illustrative; substitute your own
insert into [dbo].[OrderHeader] (AccountID, OrderDate)
values (@AccountID, @Today)

-- capture the identity value generated for the new order
set @OrderID = scope_identity()

insert into [dbo].[OrderDetail] (OrderID, ProductID, Qty)
select @OrderID,
       D.element.value('@ID', 'nvarchar(255)') as ID,
       D.element.value('@Qty', 'nvarchar(255)') as Qty
from @OrderList.nodes('/Order/Item/Prod') as D(element)

end

XML for the OrderList

In order for the stored procedure InsertOrder to work, the xml which is passed (@OrderList)  is in the following format:

<Order>
  <Item>
    <Prod ID="60000" Qty="200"></Prod>
    <Prod ID="60180" Qty="1000"></Prod>
  </Item>
</Order>

To test this stored procedure:

Open a query window and type the execute statement, wrapping the OrderList xml in single quotes and the customer number/account id in single quotes (optional as AccountID is defined as an integer).

exec InsertOrder
'<Order>
  <Item>
    <Prod ID="60000" Qty="200"></Prod>
    <Prod ID="60180" Qty="1000"></Prod>
  </Item>
</Order>', 1001  -- 1001 is a sample account id

Copyright © Paula DiTallo 2009 All Rights Reserved

FAS 142: Goodwill Write-Downs and Trash Amnesty Day (Part 2)


Like any business intelligence/decision support (BIDS) project, the first thing to do is gather some comparative numbers from peer data. In this case, the peer data will be a set of pre-calculated ratios I extracted from Google Finance.

I’ll be looking at Kforce (symbol: KFRC) in Tampa, FL and Robert Half (symbol: RHI) in Menlo Park, CA , Sun Health Care (symbol: SUNH) in Irvine, CA, CDI Corporation (symbol:CDI) in Philadelphia, PA , Spherion (symbol: SFN) in Fort Lauderdale, FL, and a now defunct organization called World Health Alternatives, Inc. (symbol: WHAIQ).

To help analyze this, I created a throw away Google Spreadsheet. This spreadsheet is really a scratch workbook with three worksheets.

  • The first worksheet isolates the net profit margin, operating margin, earnings before interest, taxes and depreciation (EBITD), return on average assets, and return on average equity for Kforce, Robert Half, CDI, Spherion and Sun Health Care for 2008. World Health Alternatives, Inc. is absent from this worksheet because by 2006 this entity was in bankruptcy.


  • The second worksheet isolates the pattern prior to World Health Alternatives failure and the similar pattern discovered when reviewing Kforce’s and Spherion’s 4th Quarter 2008 financial information.


  • The third worksheet takes a look at the quantity, type and trend of the purchased assets for five of the staffing companies.

In worksheet 1, I’ll let the numbers speak for themselves. For example, the net profit margin (net profit divided by net revenues from the income statement) is an indication of how effective a company is at cost control. The higher the net profit margin, the more effective the company is at converting revenue into actual profit. This ratio is helpful when comparing companies in the same industry subject to similar business conditions. For this indicator, Sun Health Care is the winner @ 6.25%, while Kforce lags behind @ -8.94%. Looking at the operating margin (operating income divided by net sales) to determine the proportion of revenue available after paying for production costs like wages, Robert Half is in the best position @ 9.11%. Kforce’s troubling -8.53% operating margin tells me that fixed costs such as mortgages, leases and debt servicing may become hard-to-meet obligations in the future if this ratio is not on the road to improvement.

During a “normal” year, when stock prices are relatively stable, I would spend more discussion time on the return on average equity, since this ratio is computed against the average of the equity value at the beginning and at the end of the year—and as such—measures the fluctuation in equity for shareholders. Overall, this ratio is generally a good indicator of corporate profitability for investors—especially those interested in evaluating a stock’s growth potential. While it’s clear from worksheet 1 that Kforce is struggling, Spherion and CDI also show signs of similar difficulties.

On worksheet 2, I’ve introduced World Health Alternatives primarily because it is one of the few publicly held staffing agencies with available financial statement information before it went into bankruptcy in 2006. In this case, the last SEC filing for the company was in 2004. Take a look at the ratios of Spherion and Kforce. Two years before bankruptcy, World Health Alternatives exhibited better margins across the board! Even adjusting for today’s current economic climate, any investor holding stock today in these two companies would be wise to re-evaluate his or her position.

I created worksheet 3 while reviewing the 10-K filings for Spherion, Kforce and World Health Alternatives, to ascertain whether my suspicion was correct: that these non-cash goodwill write-downs may actually reflect moderate to heavy indebtedness—and with it, outpaced sales revenue, poor returns on assets, and diluted earnings. B-i-n-g-o!

In all three cases, some form(s) (or multiple forms) of credit were used in the purchasing of the assets. In the case of World Health Alternatives however, it is important to note that there existed no goodwill impairment at all prior to its bankruptcy, but instead over half the total assets of the company were attributed to goodwill.

Hopefully, the executives managing the trash at Spherion and Kforce will identify their truly valuable assets, control their costs and move on with renewed vigor.

Copyright © Paula DiTallo 2009 All Rights Reserved

FAS 142: Goodwill Write-Downs and Trash Amnesty Day (Part 1)


Eric Fox and S.L Mintz both agree that for corporations considering write-offs in the midst of this year’s economic chaos, targeting goodwill on the balance sheet is a convenient option for addressing the steadily declining stock prices that have hurt their company’s investment strategies.

One of the first truly noteworthy elements about goodwill is that it often represents large-scale purchases made in the past by a given enterprise of an existing company or several existing companies. In essence, goodwill is the difference between what the enterprise pays for the existing asset(s) and what the book value actually is. At the time the purchase is made, the book value is customarily recorded at cost—however—the purchase price is typically more (way more) than the recorded book value. For example, an enterprise that pays $10 million for a company whose book value is $6 million records the remaining $4 million as goodwill. The difference between the book value and the purchase price is placed on the asset side of the ledger.

If this year’s news from Wall Street has told the average investor anything repeatedly in a variety of ways, it has emphasized that boardroom executives are less than scrupulous. Fortunately, FASB/GAAP recognized this long before the rest of us got such incessant broadcasts! Provision FAS 142 requires that this esoteric goodwill asset account is tested annually for impairment. If the asset(s) are deemed impaired, a non-cash write-down of the asset(s) is required. Impairment, no matter how well this word is dressed up and driven to town, is not a good word to read anywhere on an annual report or financial statement. In its finery, impairment means a reduction in an enterprise’s stated capital, but beneath that velvet robe and golden threads, it really means that the enterprise’s top-level management has poorly estimated the expected gains and losses realized by the purchases of the asset(s). Furthermore, although the write-down can be considered a non-cash one, the reality may be that the initial purchase of the asset(s) was (and may still be) financed—in which case, the goodwill write-down does indeed have a real money value tied to it elsewhere on the balance sheet.

Expect to see hefty use of goodwill write-offs through 2010—many corporations will simply be taking advantage of the investment community’s overall expectations of poor earnings to haul off impaired assets of all sorts by the millions (in some cases billions) to the curb during this trash amnesty period without any formidable challenges from shareholders. Unfortunately, many others will be recording the impairment because the cash has already left the table without the likelihood of returning. If you’re an investor, it’s important to read from the balance sheets which impairment write-downs represent reality adjustments and which impairment write-downs represent a possible non-recoverable decline.

So, why is That Data Lady so interested in all of this? After all, this discussion is quite a stretch from the usual menu of practical topics like SSIS packages, PL/SQL stored procedures or even the more ethereal topic of managing petabytes of data in the cloud.

I can only say that aside from investing a little money in the markets, the opportunity for examining the strength of predictive analytics as it relates to the type of every day data many of us will encounter when we open our portfolios is just too glaring to ignore!:-) After all, what is the point of large scale professional data management and the accompanying transformations applied against that data if similar, smaller scale tasks don’t offer the information we can use to enrich the decision support processes in our lives?

In Part 3 of this Trash Amnesty Day series, I promise to get back to my data roots and discuss the details of EDGAR online, xbrl and the power of EXCEL. In that article, I’ll walk through an xbrl taxonomy, develop an xsd schema and load a sample xbrl/xml financial statement into an EXCEL spreadsheet—for now, I’ll rave on about these massive corporate write-downs!

Since S.L. Mintz addressed the 1.4 billion dollar write-down at Mohawk Industries (Carpet Manufacturer) and Eric Fox examined the hundreds of millions of write-downs in the retail goods sector, I am going to explore the goodwill write-downs in the temporary/professional staffing services industry for the sake of variety in Part 2.


When Goodwill Becomes Badwill (DKS, FLEX, ANN, JNY)

by Eric Fox, March 11, 2009

Goodwill Hunting

by S.L. Mintz, January 1, 2009

Copyright © Paula DiTallo 2009 All Rights Reserved