Category Archives: Data Modeling

Applying a complex Excel model to multiple input values

You may remember the day you were introduced to Excel’s pivot tables. It might have happened through a co-worker, a book or an online tutorial but the effect was probably along the lines of: “Wow! How did I ever analyze data without them?”

Well, I sort of went through the same experience again several months ago reading about Dermot Balson’s Merge pattern (sample file here) – thanks to a post by Jim Johnson. I would love to elaborate some other time on Dermot’s idea of applying the concept of design pattern to spreadsheets but I will delve here specifically into his merge pattern, which makes use of Excel data tables.

According to Microsoft, data tables allow you to “test different input values for a formula without having to retype or copy the formula for each value” (Q282852). This is typically useful for sensitivity analysis, but it also works great for running complex calculations against multiple records in a data set.

I’ve argued before that most (structured) data benefits from being stored in a a flat file format. One of the downsides of doing so, however, is the amount of work involved in running a complex Excel model against all rows in a flat file. If you don’t intend to migrate your model to a relational or multidimensional database, then data tables as a wonderful way of achieving just that – free-form spreadsheet computations applied to structured data.

Additional links of interest

Happy Excel modeling!

Data-related tips & tricks from around the web

You’ve probably noticed that you shouldn’t trust this blog for real-time news tracking. The following are essentially timeless, however, so here we go…

Rob van Gelder (of DailyDoseOfExcel fame) shared a tip back in May on how to build a simple Gantt chart in Excel. I’m posting a link here because it’s the easiest I’ve seen so far.

Jeff Smith exposes his Golden Rule of data Manipulation over at sqlteam.com. While he elaborates on his statement from a programmer’s standpoint, it’s all applicable to knowledge workers and spreadsheets:

“It is always easier and more flexible to combine data elements rather than to break them apart”

From a data analysis standpoint, Jeff’s examples are essentially related to what I would call attributes (such as phone numbers). His rule still holds true with values, though. As you work towards summarizing a data set (say, daily financial transactions that you want to analyze by month), you’ll want to aggregate values as late as possible instead of running the risk of losing valuable information by aggregating too early. Spreadsheet programs hit a limit between 65k and 1M records, but there are tools to take it from there – which brings us to Paul Steynberg’s advice for considering OLAP tools as part of a financial system manager’s toolbox.

OLAP technologies are particularly well suited to handling large amounts of data. I personally share Paul’s opinion of Microsoft’s SQL Server Analysis Services, which I would describe to the non-initiated as Excel on steroids. On lots of steroids, that is. SSAS gives you access to summaries and advanced computations based on millions of underlying records, usually responding in just a few seconds.

EDIT (Sep. 10, 2008): Using a definition as crude as “Excel on steroids” for SSAS left me feeling a little guilty. I’m over it now, having just read Andrew Fryer’s post on business intelligence for small business;-)

7 habits of effective desktop data management

Following up on a somewhat theoretical introductory post related to desktop data management, Here are 7 habits which should prove useful when approaching any desktop data-related task. These will be refined based on the discussion you and I will have here, so let me know what you think.

I was actually shooting for 10 commandments, but 7 habits sounded good, too. Seven should be enough to start the discussion anyway, and leave room for expansion without making the number ridiculously large should we end up adding a few.

Now remember. This is supposed to apply to desktop work so let’s not wander off too far into foreign keys, surrogate keys or normal forms territory;-)

Here we go:

  1. Any data processing task takes you through the steps of organizing data, processing it and presenting it. Acknowledge these steps in your workflow and organize around them.
  2. Do the hard work first. Looking up and presenting information is both easier to perform and more valuable when it is base on a well-organized data layer.
  3. Organize your data as flat files instead of free-form tables. The flexibility of spreadsheets makes them both a blessing and a curse, often leading you to design the processing and presentation layers first and the data organization layer last. Organizing your data as flat files may require some more work upstream but will pay off handsomely downstream. See #2.
  4. Strive for one source of truth. An important piece of information should have to be changed in one place only and cascade from there.
  5. As much as reasonably possible, always work with a full dataset that includes domains contiguous to the one you’re currently working on. You’ll be much better off keeping too much information and identifying records appropriately than discarding what you still think you won’t need – but will.
  6. Don’t design for exceptions. You want a few standard tools that abide by the principles above while allowing you to handle exceptions, not multiple unmanageable tools that were each designed to handle a specific exception.
  7. Use identifiers. Referred to as Unique IDs or keys in database jargon, the important idea to remember is that you want a way to lookup a particular item of data at any point in time and distinguish it unequivocally from its neighbors.

Each of these items deserves several posts and I will be following up on each – hopefully with a mix of opinions, examples, tool-agnostic tips, and tool-specific tricks.

A new site dedicated to information workers

Arno Nel is up to something with his Information Worker website:

Our vision is to develop the site into a place where technical people will turn when they need help or want to see what else can be done, and where business people will interact with peers to understand the difference that technology can make in their own environment through the sharing of problems, solutions, approaches and methodologies.

If you took the time to check out this page, then IW is most probably worth a visit, too.

Desktop Data Management

The complexity of modern organizations requires their employees to manage an ever-increasing flow of information. Many trends combine to that effect, among which fast growth, under-staffing, inappropriate tools or poor procedures. The purpose of this blog is to create a discussion around all the ways that managing information can be improved from a desktop worker’s standpoint – if we only take the time to sit back and think rather than jumping right in armed with our usual tools and mindset.

I’ll use the term desktop data management to refer to these practices.

Few online resources seem devoted to data management from a desktop worker’s standpoint, with most searches on the subject quickly ending up in data modeling or “information technology” territory. Nonetheless, even when data management isn’t your primary job, it is usually in your own self-interest to approach it with care. You may, after all, be subject to external forces that have made you the (however unwilling) owner of some piece of data critical to your company. If you think you’re too busy to consider something as esoteric as data management, a lot of your unproductive time in the office (no offense) will probably involve looking up information, matching lists from different sources or doing some other kind of repetitive, error-prone work. Chances are that better data management could spare you time and frustration.

Robust data management relies on down-to-earth principles, not on a particular piece of technology. I am convinced that the broad application of a few simple ideas can have a significant impact on an organization’s productivity, facilitating the exchange of critical information among coworkers.

Desktop data management problems are real-world problems, not theoretical entertainment for “computer people”. They address data as a valuable resource. Relegating data management to a low priority has a hugely detrimental impact on quality and efficiency. It results in anything from lower service levels to wasted energy, human conflicts or a mix of all three – and more. On the other hand, treating data management with the care it deserves produces a snowball effect, revealing previously unthinkable opportunities at a fraction of the cost you would have imagined. This is as true for a large online retailer processing thousands of orders daily, as it is true for you as an individual, sending a mailing to a few dozen customers or producing monthly reports for your company’s headquarters.