Friday, May 13, 2011

Querying the Training and Testing Sets

This article (as you could rather clearly surmise from the title) is about querying the training and testing sets of a Predixion Insight model.  The technique is handy in a variety of situations, but that is not the point of this article.  The point will be revealed in a future posting, so for now you can sit back and learn how to retrieve training and testing sets.

Whenever you upload a rectangular dataset using Predixion Insight, you have the option to automatically separate your data into training and testing sets.  The page of the wizard where you specify this information looks like this:


The parameters let you control how much data is set aside for testing, up to a maximum.  Later, this held-out data is used to automatically create accuracy charts, profit charts and other fun stuff.
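To make the idea of a capped holdout concrete, here is a minimal Python sketch of such a split.  It is not Predixion Insight's implementation: the function name `split_holdout`, the parameters `test_percent` and `max_test_rows`, and their defaults are all assumptions made up for illustration, as is the choice to express the maximum as a row count.

```python
import random


def split_holdout(rows, test_percent=30.0, max_test_rows=10000, seed=0):
    """Split rows into (training, testing) lists.

    Hypothetical illustration only: the parameter names and defaults
    are assumptions, not Predixion Insight settings.
    """
    if not 0 <= test_percent <= 100:
        raise ValueError("test_percent must be between 0 and 100")

    # Take the requested percentage, but never more than the cap.
    n_test = min(int(len(rows) * test_percent / 100), max_test_rows)

    # Pick the test rows at random; a fixed seed keeps the split repeatable.
    rng = random.Random(seed)
    test_indexes = set(rng.sample(range(len(rows)), n_test))

    training = [row for i, row in enumerate(rows) if i not in test_indexes]
    testing = [row for i, row in enumerate(rows) if i in test_indexes]
    return training, testing
```

With 1,000 rows, a 30 percent holdout and a cap of 200, the cap wins and 200 rows are held out; raise the cap and the percentage takes over.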

Once your data is uploaded to Predixion Insight, you can create as many models as you wish on the dataset with ease using any of the modeling tools or even visual macros for automation.  If, for any reason, you want to fetch the data from the Predixion Insight server back into your Excel workbook or PowerPivot table, you can use the Query tool on the Insight Analytics ribbon.


In the Query Wizard you select a dataset to query rather than a model.  Querying a model generally means fetching predictions or model patterns.  Querying a dataset simply means fetching the data that is cached in that dataset.


On the next page of the wizard, you select which columns you want to fetch from the dataset.  You can click the checkbox at the top to retrieve all of them.  More importantly, you can select a filter to limit which rows are returned.  This is where the magic comes in – there is a “special” filter clause that you can select called “Is Test Cases”.  Setting this filter clause to “true” returns only the test cases, whereas setting it to “false” returns only the training cases.
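Conceptually, the filter behaves as if every cached row carried a true/false flag marking it as a test case.  The Python sketch below mimics that behavior on plain dictionaries.  It is only an analogy: the `query_dataset` function and the `is_test_case` key are hypothetical names, and nothing here calls Predixion Insight itself.

```python
def query_dataset(rows, columns=None, is_test_case=None):
    """Return selected columns from rows, optionally filtered by the test flag.

    Hypothetical illustration: `rows` is a list of dicts, each with an
    assumed boolean key "is_test_case".
    """
    results = []
    for row in rows:
        # None means no filter; True keeps test rows, False keeps training rows.
        if is_test_case is not None and row["is_test_case"] != is_test_case:
            continue
        # With no column list, return every column except the flag itself.
        keep = columns if columns is not None else [
            name for name in row if name != "is_test_case"
        ]
        results.append({name: row[name] for name in keep})
    return results


dataset = [
    {"age": 34, "income": 52000, "is_test_case": False},
    {"age": 51, "income": 78000, "is_test_case": True},
    {"age": 27, "income": 41000, "is_test_case": False},
]

testing_rows = query_dataset(dataset, is_test_case=True)
training_rows = query_dataset(dataset, columns=["age"], is_test_case=False)
```

Here `testing_rows` holds the single flagged row with both of its columns, and `training_rows` holds only the `age` column of the two unflagged rows.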


And that’s it.  Although you may now have a new trick for handling your Predixion Insight datasets, you will have to wait until a future installment to learn why this detail is so important.  Until then – happy mining!