tag:blogger.com,1999:blog-24931379195004075372024-03-18T02:46:48.071-07:00Jamie's JunkContinuing whatever comes to mind on data mining and predictive analyticsJamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.comBlogger30125tag:blogger.com,1999:blog-2493137919500407537.post-13408842353942778222012-08-02T14:11:00.001-07:002012-08-02T14:11:59.330-07:00Yes Virginia, you can do Predictive Analytics!<p>In 1897, an eight-year old Virginia Hanlon posed the question to the New York Sun – is there a Santa Claus? While the editors of the Sun could have easily disregarded and dismissed the child’s simple question, instead they took the opportunity to address the simple question metaphorically and inspirationally in a way that has impacted the American view of Christmas for over a century. <p>In the 13 years that I have been involved with Predictive Analytics, Machine Learning and Data Mining, I have been told countless times that there is no Santa Claus in this field... < continued on the <a href="http://blog.predixionsoftware.com/2012/07/yes-virginia-you-can-do-predictive-analytics/">Predixion Software blog</a> ></p> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com0tag:blogger.com,1999:blog-2493137919500407537.post-73909992168098487642012-04-10T16:24:00.001-07:002012-04-10T16:24:33.630-07:00Sharing and Collaborating with Predixion Insight<p> <h3></h3> <h3><em><font size="2"><font style="font-weight: normal">This post is a draft topic I wrote for the Getting Started section of the Predixion Insight documentation. These topics are higher level and “bloggy” so I’m posting here as well!</font></font></em></h3> <p>The most significant insights aren’t discovered in a vacuum. Predixion Insight provides the tools and methodologies allowing you to share predictive results with your colleagues from various aspects of your business in order to extract the most value from your data. This article will describe the variety of ways that you can share using Predixion Insight for collaboration and productionalization. <p> <h4><a name="MiniTOCBookMark2"></a>Shared Collections</h4> <p>The area where you perform the majority of work in Predixion Insight is in a private, automatically provisioned collection of objects called simply “My Workspace”. A “shared collection” is similar to your workspace except that the operations you can perform are limited, and many users can access the collection at the same time. A shared collection has an owner, who can assign permissions to other users, as well as authors who can modify the collection and readers who can only view or use objects in the collection. <p>Shared collections can either be standard collections created by a Predixion Insight administrator, or, more commonly, can be created by users to share their work on an ad-hoc basis. <p>To create a shared collection from Predixion Insight for Excel, open the “My Stuff” dialog from the Insight Analytics ribbon, and click the “New Collection” button. <p><img border="0" alt="AddCollectionDialog.png" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/AddCollectionDialog.png" width="516" height="460"> <p>The Add Collection dialog allows you to name and describe your collection as well as adding permissions. For broad collaboration and sharing it is more convenient to add Active Directory groups than it is to add individual users to ensure that more colleagues will be able to interact with your results. <p>Once created, permissions and collection details can be edited in the Manage My Stuff dialog or the Predixion Insight User Portal by selecting the collection name and choosing an option from the action list. <p><img border="0" alt="CollectionActionList.png" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/CollectionActionList.png" width="226" height="92"> <p>When you create a new collection make sure that you notify your colleagues that it is available! <h4><a name="MiniTOCBookMark3"></a>Publishing Datasets and Models</h4> <p><img border="0" alt="PublishButton.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/PublishButton.PNG" width="45" height="51"> <p>With shared collections you can collaborate by publishing your datasets and models via the Publish tool in Predixion Insight for Excel or via the Predixion Insight User Portal. Using this dialog you simply move datasets and models from your workspace or other collections you can view to the destination collection. You can even move items from a shared collection back to your personal workspace if you have permission. This dialog also provides the option to create a new collection if you haven’t done so already. <p><img border="0" alt="PublishDialog.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/PublishDialog.PNG" width="628" height="484"> <p>After you select which items are to be published you will have the ability to determine how other people can use the items. For example, you can restrict other users from being able to move your items out of the shared collection via sharing or exporting. <h4><a name="MiniTOCBookMark4"></a>Collaboration</h4> <p>Providing access alone is not sufficient for providing a collaborative environment for analytics. The Predixion Insight Viewers are an interactive environment for exploring and understanding the patterns discovered in your data. Whenever you discover interesting information within the viewer, the Comments and History control allows you to record your findings by posting a comment. This provides a collaborative, social environment for you and your colleagues to discuss data findings in real time. <p>Since every comment is relevant only in the context in which it was made, you can click on the view button next to any comment to see exactly what the commenter was looking at as they entered the comment. <p><img border="0" alt="CommentsPaneView.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/CommentsPaneView.PNG" width="313" height="342"> <p>Comments and collaboration are available in all Predixion Insight viewers, and also wherever you select a model or dataset. Click on the Comments and History tab next to the selector to see all current comments or add your own. <p><img border="0" alt="CommentsPaneSelector.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/CommentsPaneSelector.PNG" width="596" height="314"> <p>New comments will automatically appear in the Predixion Insight for Excel and the Predixion User Portal interfaces. If configured by your Predixion Insight Administrator, comments on datasets and models can be broadcast to a Twitter account of your choosing. This allows you to learn about new comments and collaboration through your Twitter client of choice. If this feature is enabled, it is recommended that the Twitter account be set to protect your tweets so that only approved users will see any comments that are made. <h4><a name="MiniTOCBookMark5"></a>Sharing and Embedding Insights</h4> <p>Once you have published models and datasets to a collection, Predixion Insight provides a variety of methods to share your insights with your colleagues. The easiest way is to simply send a permalink to a model view via email. You can retrieve a permalink to any Predixion Insight view by clicking on the Information button on the viewer ribbon and then clicking on “Create Permalink”. <p><img border="0" alt="CreatePermalink.png" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/CreatePermalink.png" width="217" height="384"> <p>This will create a permanent link to precisely what you are viewing that you can share with others. Simply copy the direct link from the Links dialog and paste into an email. Any recipient that has permission to view models in the shared collection will be able to see exactly what you were looking at when you created the link. Note that if you create a permalink to a model in your workspace, only you will be able to view it, so make sure that you send colleagues permalinks to models that have been published to shared collections. <p><img border="0" alt="Permalink.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/Permalink.PNG" width="540" height="238"> <p>The Links dialog also provides the ability to save a thumbnail of the current view if you want to provide a preview in your messaging or to show an image to link from a web page. Additionally, you can embed the viewer itself into a web page by using the provided HTML snippet. In all cases, only users with appropriate permissions will be able to view models, so you don’t have to worry about results being forwarded to unauthorized users. <p>You can also retrieve a permalink from any comment that has been added to the model. These permalinks are automatically created at the time the comment is posted. Click on the information button associated with the specific comment to display the Links dialog. <h5><img border="0" alt="CommentPermalink.png" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/CommentPermalink.png" width="296" height="64"></h5> <h4><a name="MiniTOCBookMark6"></a>Embedding Visualizations in SharePoint</h4> <p>A convenient way to share your insights and let your colleagues interact with models you have created and published is by directly embedded the Predixion Insight visualizations in a SharePoint web page. In order to embed a Predixion Insight visualization into a SharePoint page, first edit the page, and then insert a new web part. <p><img border="0" alt="SPInsertWebPart.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/SPInsertWebPart.PNG" width="228" height="141"> <p>The web part you need to add is the Page Viewer web part under the “Media and Content” category. <p><img border="0" alt="SPPageViewerWebPart.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/SPPageViewerWebPart.PNG" width="764" height="245"> <p>The empty web part, once added, will contain a hyperlink to open the tool pane. Click this link and then paste in the permalink to your visualization. Note that you do not need to paste the HTML snippet from the Predixion Insight Links dialog as SharePoint will automatically create an IFrame to surround the Predixion Insight visualization. It is recommended to set a specific height for your visualization or SharePoint may make the viewer too small to be useful. <p><img border="0" alt="SPConfigurePageViewer.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/SPConfigurePageViewer.PNG" width="270" height="517"> <p>Finally, once you have configured and saved your SharePoint page, all authorized users will be able to collaborate and share using Predixion Insight directly from their SharePoint portal. <h4><img border="0" alt="SPEmbeddingComplete.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/SPEmbeddingComplete.PNG" width="557" height="427"></h4> <h4></h4> <h4><a name="MiniTOCBookMark7"></a>Sharing Results via Excel</h4> <p>Often it is convenient to embed your insights into Excel workbooks along with data and other work that you have created. This allows you to create a story or provide other evidence that supports your insights. When viewing a Predixion Insight model from Excel, you can copy your current view to the currently active Excel workbook by clicking the Copy to ExceI button. <p><img border="0" alt="CopyToExcel.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/CopyToExcel.PNG" width="62" height="49"> <p>This copies a version of the current visualization into Excel, along with a permalink to the view. You can then share your findings by sharing the Excel workbook by publishing it to SharePoint via Excel Services, or by emailing or otherwise providing the Excel file to colleagues. <h4><img border="0" alt="ExcelCopy.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/ExcelCopy.PNG" width="518" height="498"></h4> <h4></h4> <h4><a name="MiniTOCBookMark8"></a>Sharing Named Results</h4> <p>In addition to sharing and collaborating on Predixion Insight models using the Predixion Insight visualizations, you can also directly share any specific predictive or descriptive results you generate with Predixion Insight. These results can come from predictive queries, accuracy tests, or any of the Insight Now insights. <p>Predixion Insight results appear as chiclets in the Predixion Pane in Excel or in the Predixion Insight User Portal. You can provide a custom name for the result by clicking on the pre-assigned name and typing over it. In order to share the results, click on the collaboration button. <p><img border="0" alt="NR-chiclet.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/NR-chiclet.PNG" width="290" height="110"> <p>In the Share Results dialog that appears, select the collection or individuals you wish to share the result with and click OK. <p><img border="0" alt="NR-Dialog.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/NR-Dialog.PNG" width="514" height="370"> <p>After the results are published to a collection, they will be visible to anyone who has collection permissions. The chiclet for the result will indicate that the result is shared and not in an individuals Predixion Insight workspace. <p><img border="0" alt="NR-published_chiclet.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/NR-published_chiclet.PNG" width="289" height="110"> <p>Predixion Insight results can be accessed via the Predixion Insight for Excel client, the Predixion Job Source for SQL Server Integration Services, the Predixion Insight ODBC provider, or the Predixion Insight API. This allows for a wide variety of scenarios where you can embed predictive results directly into your line of business applications. <h4><a name="MiniTOCBookMark9"></a>Using Visual Macros</h4> <p>Sharing named results is typically used to put your predictive results into a production scenario. Whether the results will be integrated into a business application, a data warehouse, or viewed using a reporting tool, any results you want to share will have to be generated on a regular basis. Visual Macros is a technology in Predixion Insight that allows you to describe a predictive action or workflow in an Excel worksheet and then combine and execute that action any number of times. Using this method, combined with Predixion Insight’s sharing ability, API’s and integration with SQL Server Integration Services allows for a powerful predictive application platform. <p>Most Predixion Insight server actions are scriptable via Visual Macros. In order to create a Visual Macro, click on the options arrow on the Finish or OK button on a Predixion Insight for Excel dialog or wizard. This will provide options where you can create a Visual Macro as well as execute the action of the dialog or wizard. <p><img border="0" alt="NR-VMui.png" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/NR-VMui.png" width="292" height="98"> <p>A Visual Macro itself is a worksheet description of the task you described using a dialog or wizard that you can edit in plain language without having to understand the details of any particular scripting syntax. Multiple Visual Macros can be pasted into a single Excel worksheet to create a multiple step operation. The following Visual Macro publishes a result to a shared collection and provides a new name. <p><img border="0" alt="NR-VM.PNG" src="http://pxtu/PredixionWeb/Help/SharingGettingStartedGraphics/NR-VM.PNG" width="401" height="255"> <p>Visual Macros can be executed by using the My Macros button in the Predixion Insight for Excel client, the Predixion Insight API – including the Predixion Insight for Excel VBA API – and the Predixion Insight Execute Visual Macro Integration Services task. <p><img border="0" src="http://tracking.hubspot.com/track?&g=ebab3703fe314f3ea16a6e8dea383493&n=0&u=http%3A%2F%2Fpxtu%2FPredixionWeb%2FHelp%2FSharing_and_Collaboration_with_Predixion_Insight.htm&r=http%3A%2F%2Fpxtu%2FPredixionWeb%2FHelp%2FGetting_Started%2FGetting_Started_With_Predixion_Insight.htm&t=Sharing%20and%20Collaboration%20with%20Predixion%20Insight&p=109386&lvt=2012-04-10 19:19:25&cvt=2012-04-10 19:19:27" width="0" height="0"> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com0tag:blogger.com,1999:blog-2493137919500407537.post-22226269924125624972011-11-12T00:01:00.001-08:002011-11-12T00:07:01.465-08:00Our own 11-11-11–the launch of Predixion Insight 2.0!<a href="http://lh3.ggpht.com/---pDbdlErkM/Tr4nuxNZEGI/AAAAAAAAAOQ/gaYlAFA8aZw/s1600-h/skyrim%25255B4%25255D.png"><img align="left" alt="skyrim" border="0" height="135" src="http://lh6.ggpht.com/-jRbTIwkbWmI/Tr4nvKHPeWI/AAAAAAAAAOY/AT-8fnTl6ng/skyrim_thumb%25255B2%25255D.png?imgmax=800" style="background-image: none; border: 0px currentColor; display: inline; float: left; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="skyrim" width="93" /></a>Although the weekend is likely to be more filled with <a href="http://www.amazon.com/gp/product/B004HYK956/ref=as_li_ss_tl?ie=UTF8&tag=sqlserverda09-20&linkCode=as2&camp=217145&creative=399373&creativeASIN=B004HYK956" target="_blank">Skyrim</a> than Predixion Insight, today (being minutes before I posted this) we launched our groundbreaking Predixion Insight 2.0 cloud predictive analytics service. Accompanied by the musical stylings of our own <a href="http://www.reverbnation.com/matthewmeadows" target="_blank">Matthew Meadows</a> the dev team works out the last details in order to make sure we have a smooth deployment of the cloud platform. As the final build draws to a close, the team chats about the distribution of <a href="http://www.farleysandsathers.com/About/WhoWeAre.asp?BrandID=6" target="_blank">Now & Laters</a> , laughs about political gaffes, and the third thing, oops, I can’t remember the third thing.<br />
The last few weeks as we’ve been able to step back and actually use the product we’ve built, it’s truly a delight. Bogdan and I have both just stood in the hallway and related how <em>fun </em>it is. Who would think that a predictive analytics business application could be “fun”? The collaborative visualizations we’ve added to the product transform how we think about problems and how we talk about them. The features even have changed how we conduct bug triages as we can link to the product directly from Visual Studio bug reports and have the issues presented right there. No need to capture a bitmap or rely on the descriptive range of test prose – we can just link to the freakin’ issue! Boom – there it is!<br />
And it’s pretty too – overlaying tiles and animated graphics really transform the pattern discovery experience. Being able to share your explorations with colleagues through commenting and actually see exactly what they were seeing when they made the comment is remarkable once you start using it. I can’t imagine going back to a single analyst view of the world after using Insight 2.0 – it would just seem….stifling.<br />
<a href="http://lh4.ggpht.com/-0y7ts582zH4/Tr4nvNRR1jI/AAAAAAAAAOg/-WMsUkY67G0/s1600-h/DepNetBrowser%25255B6%25255D.png"><img alt="DepNetBrowser" border="0" height="392" src="http://lh5.ggpht.com/-lJOF9pens0Q/Tr4nvSWpA_I/AAAAAAAAAOo/B6VROMl9dNo/DepNetBrowser_thumb%25255B4%25255D.png?imgmax=800" style="background-image: none; border: 0px currentColor; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="DepNetBrowser" width="544" /></a><br />
This release means a lot to me personally – my goal has always been to change the world through software, and with Predixion I believe I’m in the right place with the right team to make it happen. We have an incredible development org right now that deserves all the credit in the world for building such an amazing product that will only get better. I really feel honored to be working with this great team that made this possible:<br />
<ul><li>Bogdan Crivat</li>
<li>Shuvro Mitra</li>
<li>Abdul Shameer</li>
<li>Duong Nguyen</li>
<li>Jeff Willis</li>
<li>Tatyana Yakushev</li>
<li>Matt David</li>
<li>Yimin Wu</li>
<li>Matthew Meadows</li>
<li>Raghu Ramachandran</li>
</ul>Also – it goes without mentioning that the development team couldn’t make this happen in a vacuum and it’s the SoCal and roaming Predixion team members and of course our investors that create the environment that allows us to deliver our vision and technology to the world. So without listing our company directory, I’ll just say that I appreciate all the pulls and tugs and support from all the other Predixionites that make this possible.<br />
Anyway - 2.0 is really just the beginning from which we will continue to make dramatic changes in not just “predictive analytics” but how people value and appreciate their data and how data can make a difference in people’s lives. And just to wrap it up, and since I made it today – here’s a tutorial video introducing the frame controls for Predixion Insight 2.0.<br />
<div class="wlWriterEditableSmartContent" id="scid:5737277B-5D6D-4f48-ABFC-DD9C333F4C5D:cb82a122-2146-482e-ae35-c320fa9e0680" style="display: inline; float: none; margin: 0px; padding: 0px;"><div id="077157eb-d9db-446e-9268-f39b648194e5" style="display: inline; margin: 0px; padding: 0px;"><div><a href="http://www.youtube.com/watch?v=VeSnl2lzf9U" target="_new"><img alt="" galleryimg="no" onload="var downlevelDiv = document.getElementById('077157eb-d9db-446e-9268-f39b648194e5'); downlevelDiv.innerHTML = "<div><object width=\"448\" height=\"277\"><param name=\"movie\" value=\"http://www.youtube.com/v/VeSnl2lzf9U?hl=en&hd=1\"><\/param><embed src=\"http://www.youtube.com/v/VeSnl2lzf9U?hl=en&hd=1\" type=\"application/x-shockwave-flash\" width=\"448\" height=\"277\"><\/embed><\/object><\/div>";" src="http://lh6.ggpht.com/-cKrxdC7cMbQ/Tr4nvsh9M3I/AAAAAAAAAOw/gMj2Hsji03g/videodf862d8f8fa7%25255B4%25255D.jpg?imgmax=800" /></a></div></div><div style="clear: both; font-size: 0.8em; width: 448px;">Predixion Insight 2.0 Viewer Controls Tutorial</div></div>Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com0tag:blogger.com,1999:blog-2493137919500407537.post-15517295582950034212011-11-08T16:57:00.001-08:002011-11-08T17:07:32.790-08:00Introducing Predixion Insight 2.0!<p> </p> <p>We are on the verge of releasing Predixion Insight 2.0 and I wanted to give everyone a heads up on some of the new capabilities. It’s been a very exciting release and the product (if I may say so myself) is looking beautiful! Every time I step back and actually <em>use</em> the product (rather than develop it) I’m stunned by what we’ve accomplished!</p> <p>We have in short measure changed what it means to “do” predictive analytics. Predixion Insight 2.0 provides real advances in discovery and collaboration that significantly improve how users will even be able to talk about their predictive work. </p> <p>In my mind there’s so much more road for us to travel, so much more we can – and will – be doing to transform predictive analytics to predictive intelligence – but this milestone puts quite a bit of that highway behind us.</p> <p>I put together a short vblog showing some of the changes we’re releasing this week. Later I’m sure we’ll have a nice polished version that’s messaged appropriately, but this one is just a fairly raw footage of me introducing 2.0 to you. Enjoy and come back for more!.</p> <p> </p> <div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:5737277B-5D6D-4f48-ABFC-DD9C333F4C5D:ee47cc60-2fbb-406a-baa7-ae344116a331" class="wlWriterEditableSmartContent"><div id="6c6a2e71-b1c3-4e98-806d-757d719865ee" style="margin: 0px; padding: 0px; display: inline;"><div><a href="http://www.youtube.com/watch?v=rgz3LZRS_AY" target="_new"><img src="http://lh6.ggpht.com/-fAp4K4HS1cw/TrnQCKvE4BI/AAAAAAAAAOM/Zw4OB2YuWvc/video4a1780f8c87e%25255B20%25255D.jpg?imgmax=800" style="border-style: none" galleryimg="no" onload="var downlevelDiv = document.getElementById('6c6a2e71-b1c3-4e98-806d-757d719865ee'); downlevelDiv.innerHTML = "<div><object width=\"448\" height=\"277\"><param name=\"movie\" value=\"http://www.youtube.com/v/rgz3LZRS_AY?hl=en&hd=1\"><\/param><embed src=\"http://www.youtube.com/v/rgz3LZRS_AY?hl=en&hd=1\" type=\"application/x-shockwave-flash\" width=\"448\" height=\"277\"><\/embed><\/object><\/div>";" alt=""></a></div></div><div style="width:448px;clear:both;font-size:.8em">A taste of Predixion Insight 2.0</div></div> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com1tag:blogger.com,1999:blog-2493137919500407537.post-58252612021086356182011-08-16T14:12:00.001-07:002011-08-16T14:19:55.750-07:00Scoring R models against PowerPivot Data–Part 3–Evaluating and ScoringThis is part 2 of a three part series about integrating R and PowerPivot. <br />
<ul><li><span style="font-size: x-small;">Part 1 – Intro (includes links to necessary components)</span> </li>
<li><span style="font-size: x-small;">Part 2 – Creating a classification model in R</span> </li>
<li><span style="font-size: x-small;"><strong>Part 3 – Evaluating and Scoring R models in Predixion</strong></span> </li>
</ul><div align="center"></div>In my previous post, I walked through the creation of a classification model in R and imported the model into <a href="http://predixionsoftware.com/" target="_blank" title="Predixion Software">Predixion</a> Insight. The necessary files for this part are on my <a href="http://cid-632f41683e4abc1a.office.live.com/richupload.aspx/.Public/Scoring%20R%20PowerPivot%20Part%203" target="_blank">SkyDrive</a> – in order to follow through with the steps in this phase you will need to import the PMML file into Predixion Insight if you don’t already have it. (Go to Insight Analytics/My Stuff – click “Import”). Also, open the Excel file “R Demo.xslx”.<br />
In this installment I will show you how to create a profit chart using the R model against data in Excel, and then score the model against data in PowerPivot.<br />
A profit chart allows you to determine how much profit (or loss) you will achieve by using a predictive model to select which cases – e.g. leads, customers, etc – to target. The profit chart gives you information on how to apply the model to data in order to maximize your profit based on your cost metrics.<br />
To launch the Profit Chart wizard, click on the Profit Chart button in the “Test” section of the Insight Analytics ribbon.<br />
<div align="center"><img alt="image" border="0" height="91" src="http://lh3.ggpht.com/-qHxYMuUvBbk/TkrdFFkssfI/AAAAAAAAAME/JwPaQED_pPE/image%25255B19%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="244" /></div>Once the Profit Chart wizard appears (you will be prompted to log in to Predixion Insight if you haven’t already) skip past the intro screen and select the model you imported from R – it will be under the PMML Models node which appears after all of the Predixion datasets<br />
<div align="center"><a href="http://lh4.ggpht.com/-rKd9NzqL5pI/TkrdFBTxrEI/AAAAAAAAAMI/X0lGBR-HOts/s1600-h/image5.png"><img alt="image" border="0" height="94" src="http://lh5.ggpht.com/-npfVdM61pC8/TkrdFfepuTI/AAAAAAAAAMM/UZerixdpvF0/image_thumb1.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="244" /></a></div><div align="left">Note that when you select your PMML model you will see information about the model in the description pane. Not all PMML models support retrieving the probability of output states like the models created using Predixion Insight. If a PMML model supports this functionality it will be listed in the Output section in the Description pane. Retrieving the probability of the output states is required to use Profit Charts and Accuracy charts on PMML models.</div><div align="left"><img alt="image" border="0" height="244" src="http://lh3.ggpht.com/-T0DDIO0n_jQ/TkrdFm8qLII/AAAAAAAAAMQ/W3puwIbCriQ/image%25255B18%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="230" /></div>The next step is to specify the business parameters of the profit chart – that is, the real factors that are important to how the model is expected to perform. In this case, you need to choose how much the campaign is going to cost and how much you expect to receive from successful contacts. Since there is a variable cost associated with contacting leads, you will need to click on the “…” button next to the individual cost field. Specify your costs according to these screen shots and click next.<br />
<a href="http://lh3.ggpht.com/-jwGoBsxfmeQ/TkrdGBig73I/AAAAAAAAAMU/JAh9MYi5RZs/s1600-h/image%25255B14%25255D.png"><img alt="image" border="0" height="338" src="http://lh5.ggpht.com/-ja5pScEx1C8/TkrdGTj5CAI/AAAAAAAAAMY/Z-nV2VwgCco/image_thumb%25255B10%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="433" /></a><br />
<img alt="image" border="0" height="238" src="http://lh6.ggpht.com/-P4RWUFoAxr0/TkrdGuwll2I/AAAAAAAAAMc/Wk3g6BLYMY8/image%25255B16%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="325" /><br />
Clicking Next on the dialog will bring you to the Select Input Data page. The R Demo Excel file has some prepared test data that you can use. You can also use data from PowerPivot, or test data that is automatically separated in Predixion datasets. If you were already looking at the test data when you launched the wizard, the table would already be selected – otherwise select the test data and click next to advance.<br />
<a href="http://lh5.ggpht.com/-siCgUAQRk_I/TkrdGzpHeLI/AAAAAAAAAMg/EM_DVHlB88E/s1600-h/image%25255B29%25255D.png"><img alt="image" border="0" height="259" src="http://lh5.ggpht.com/-VamlW9rr2rk/TkrdHK2CXvI/AAAAAAAAAMk/qithD46QEOY/image_thumb%25255B19%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="326" /></a><br />
On the final page of the wizard, you need to specify the relationships between the model columns and the columns in your Excel spreadsheet. Predixion Insight will automatically bind the columns based on the column names if possible. At this point you can click the Finish button and be done with it, however, if you want to be able to easily rerun the profit chart with different parameters or even automate the process, you can click on the little arrow on the Finish button and select “Execute and Create a Visual Macro.” This will not only start the Profit Chart, but also create a worksheet with a Profit Chart Visual Macro – that is, a worksheet with the executable description of the job you just performed. This Visual Macro can be altered and re-run however you choose, and even executed from VBA or automated via SQL Server Integration Services<br />
<a href="http://lh4.ggpht.com/-nNKuePuihgI/TkrdHXZoffI/AAAAAAAAAMo/66avN9GtHvY/s1600-h/image%25255B26%25255D.png"><img alt="image" border="0" height="359" src="http://lh3.ggpht.com/-aQvwq-l0N0Q/TkrdHmJS24I/AAAAAAAAAMs/NCPyasi96Ug/image_thumb%25255B18%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="475" /></a><br />
<a href="http://lh6.ggpht.com/-MePEH4bhRsc/TkrdH1g8S-I/AAAAAAAAAMw/4AowZ9FiATs/s1600-h/image%25255B30%25255D.png"><img alt="image" border="0" height="324" src="http://lh5.ggpht.com/-HEClB2bmqkU/TkrdIBTG99I/AAAAAAAAAM0/RDO8Cacm038/image_thumb%25255B20%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="344" /></a><br />
When the progress dialog comes up – it’s a good idea to dismiss it by clicking on the “Minimize to Task Pane” button which allows the operation to execute in the background. When complete you can click on the “Results” button in the job chicklet which renders the Profit Chart into your workbook. You can also click on the collaboration icon to send the job results directly to your friends and colleagues or publish to shared collections.<br />
<div align="center"><a href="http://lh3.ggpht.com/-QAia1paDaw0/TkrdIDNA4xI/AAAAAAAAAM4/6nk1AyTUcF4/s1600-h/image26.png"><img alt="image" border="0" height="178" src="http://lh3.ggpht.com/-HpnR1oI9n7E/TkrdIf3jUII/AAAAAAAAAM8/h4nO16hX85s/image_thumb8.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="244" /></a></div><div align="center"><a href="http://lh6.ggpht.com/-RIfMhLthDO8/TkrdI62ghsI/AAAAAAAAANA/2D7VYCr6pHE/s1600-h/image29.png"><img alt="image" border="0" height="101" src="http://lh6.ggpht.com/-shf74nu2Da0/TkrdJMv_UaI/AAAAAAAAANE/XBUeBFOko2o/image_thumb9.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="244" /></a></div><div align="center"><a href="http://lh3.ggpht.com/-PYhlOd-h8Rk/TkrdJMyQ1SI/AAAAAAAAANI/9dHK5eXCCbY/s1600-h/image32.png"><img alt="image" border="0" height="99" src="http://lh3.ggpht.com/-I3Lb73zTeg0/TkrdJaEbqlI/AAAAAAAAANM/upI5XRZjQUo/image_thumb10.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="244" /></a></div><div align="left">The actual Profit Chart itself shows us the maximum profit we can expect by using the R model to optimize our campaign. Additionally it tells us the probability threshold we need to consider when selecting which leads to contact. Specifically, in this case, we should only contact leads that have a probability of conversion, as dictated by the model, of 34.4% or greater.</div><div align="left"><a href="http://lh3.ggpht.com/-7f_atCl1yTE/TkrdJ_sZ4nI/AAAAAAAAANQ/VyMIKar2u2g/s1600-h/image%25255B48%25255D.png"><img alt="image" border="0" height="340" src="http://lh5.ggpht.com/--ZJ7rct7OxM/TkrdKL3quSI/AAAAAAAAANU/Pz7hieu0MHE/image_thumb%25255B28%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="343" /></a></div>Now that you know how to evaluate your R model – or more appropriately – how to evaluate the <em>predictions</em> from your R model, it’s time for the climax of this series – applying that R model to PowerPivot data. Actually, it’s almost an anticlimax because it’s so simple, but let’s walk through the process and I’ll point out some interesting bits along the way. To get started, click on the Query button on the Insight Analytics ribbon.<br />
<div align="center"><img alt="image" border="0" height="95" src="http://lh6.ggpht.com/-dAqFuV9gjKk/TkrdKdOG-dI/AAAAAAAAANY/W-gpYE-iLM0/image%25255B57%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="273" /></div><div align="left">As before you select the model you want to query (the R model!) and then select the input data. This time, you need to select that the Input Type is “Power Pivot Data.” If you haven’t already, Predixion Insight will launch PowerPivot and then connect to PowerPivot to collect the list of tables. One option you have with PowerPivot data that you don’t have with Excel data is that you can specify an arbitrary data filter. This allows you to, for instance, only score the leads that were received within the last month or so rather than the entire dataset – very handy for continual deployment of your models!</div><a href="http://lh5.ggpht.com/-psYh0TJS9N8/TkrdKvWInxI/AAAAAAAAANc/zgodbDxdbSs/s1600-h/image%25255B58%25255D.png"><img alt="image" border="0" height="306" src="http://lh6.ggpht.com/-J7CWJZ00138/TkrdK1y8R1I/AAAAAAAAANg/ladjjLs6xvA/image_thumb%25255B32%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="385" /></a><br />
After selecting your data, you have to map the model columns to the data columns as you did with the Profit Chart and then you get to the results page. In the results page you choose what information you need from the model. Since we are looking for leads with a probability to convert of at least 34.4%, all we need to do is to check the Probability of Yes checkbox. Since we want to integrate the results directly into PowerPivot, we don’t need to check any of the boxes to copy source data. You generally use that option when you need to move the results to a different destination.<br />
<a href="http://lh5.ggpht.com/-Eb33L1VVOKo/TkrdLAqS-KI/AAAAAAAAANk/KWQLR6p1Bi0/s1600-h/image%25255B59%25255D.png"><img alt="image" border="0" height="262" src="http://lh6.ggpht.com/-u5nREeyaRwI/TkrdLelKpvI/AAAAAAAAANo/LcJS7p6tKNg/image_thumb%25255B33%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="331" /></a><br />
Finally, you tell Predixion Insight what to do with the result. In this case we are going to add the results from our R model <em>back into</em> PowerPivot. Specify a table name and check the box with appends the result to the original table. Predixion Insight automatically detects the best identifier column, so you don’t have to actually specify anything there.<br />
<a href="http://lh5.ggpht.com/-GlqdDloMJcc/TkrdLppsweI/AAAAAAAAANs/FrxkGQHrPLI/s1600-h/image%25255B64%25255D.png"><img alt="image" border="0" height="297" src="http://lh5.ggpht.com/-MX4NU3nhasQ/TkrdL_nfekI/AAAAAAAAANw/q-J-m82G8WI/image_thumb%25255B36%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="367" /></a><br />
Again, you can finish (even choose to create a Visual Macro) and either wait for the job complete or dismiss it and keep working while the job executes in the background. Once the job finishes it will automatically update PowerPivot, or if you dismissed it, you click on the results to populate the PowerPivot table. Like the ProfitChart result, you can share the results with colleagues by clicking on the collaboration button – in which case, it would probably be a good idea to add some of the additional source columns for context! Note that you can also change the name of the job chicklet by selecting and typing over the default name. This can be useful when sharing or if you want to revisit past job submissions.<br />
<a href="http://lh5.ggpht.com/-VC6k6qjwUcI/TkrdMCHSOZI/AAAAAAAAAN0/yiNC__LCnrU/s1600-h/image%25255B63%25255D.png"><img alt="image" border="0" height="93" src="http://lh5.ggpht.com/-RSbIMIUWUME/TkrdMevRyPI/AAAAAAAAAN4/5WzybsfvrP0/image_thumb%25255B35%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="244" /></a><br />
Finally after you’ve fetched results you can see the results in your PowerPivot table. In this screenshot you see that Predixion Insight created a new PowerPivot table called “R Query Results” and added a new calculated column to the source table that references the predicted result for each row.<br />
<a href="http://lh5.ggpht.com/-ItUQMbhuGYA/TkrdMmlscxI/AAAAAAAAAN8/mF3mgljqQ7E/s1600-h/image%25255B55%25255D.png"><img alt="image" border="0" height="180" src="http://lh5.ggpht.com/-i1ydZeD7NvY/TkrdNBTO9eI/AAAAAAAAAOA/Ts2ZxLBSvBE/image_thumb%25255B31%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="371" /></a><br />
Now you can analyze your PowerPivot data the typical way (PivotTables published to SharePoint!) integrating predictive results from R. My first step would be to add a calculated column indicating which leads crossed the threshold indicated by the ProfitChart, but the possibilities are limitless!Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com2tag:blogger.com,1999:blog-2493137919500407537.post-58420679152074685762011-05-20T09:46:00.001-07:002011-05-20T09:46:13.773-07:00Scoring R models against PowerPivot Data–Part 2–Creating an R classification model<p>This is part 2 of a three part series about integrating R and PowerPivot. </p> <ul> <li><font size="2">Part 1 – Intro (includes links to necessary components)</font></li> <li><font size="2"><strong>Part 2 – Creating a classification model in R</strong></font></li> <li><font size="2">Part 3 – Evaluating and Scoring R models in Predixion</font></li> </ul> <p><font size="2">In this installment, I’m going to walk through the process of creating a predictive model in R, exporting the model as <a href="http://en.wikipedia.org/wiki/PMML" target="_blank">PMML</a> and importing the model into <a title="Predixion Software" href="http://predixionsoftware.com" target="_blank">Predixion</a> Insight.  The demo requires packages from </font><a href="http://rattle.togaware.com/rattle-download.html" target="_blank"><font size="2">Togaware</font></a><font size="2"> to create models and export them– you can go to the Togaware website and follow their instructions, or you can just launch the R console and enter:</font></p> <pre class="csharpcode"><font color="#ff0000">> install.packages(<span class="str">"RGtk2"</span>) <br />> install.packages(<span class="str">"rattle"</span>, dependencies=TRUE) <br /></font></pre><br /><style type="text/css"><br /><br /><br /><br />.csharpcode, .csharpcode pre<br />{<br /> font-size: small;<br /> color: black;<br /> font-family: consolas, "Courier New", courier, monospace;<br /> background-color: #ffffff;<br /> /*white-space: pre;*/<br />}<br />.csharpcode pre { margin: 0em; }<br />.csharpcode .rem { color: #008000; }<br />.csharpcode .kwrd { color: #0000ff; }<br />.csharpcode .str { color: #006080; }<br />.csharpcode .op { color: #0000c0; }<br />.csharpcode .preproc { color: #cc6633; }<br />.csharpcode .asp { background-color: #ffff00; }<br />.csharpcode .html { color: #800000; }<br />.csharpcode .attr { color: #ff0000; }<br />.csharpcode .alt <br />{<br /> background-color: #f4f4f4;<br /> width: 100%;<br /> margin: 0em;<br />}<br />.csharpcode .lnum { color: #606060; }</style><br /><br /><p><font size="2">Now, you may be prompted to install additional packages and you may have to restart R after the library installation is complete – it’s probably a safe bet in any case.</font></p><br /><br /><p><font size="2">After you’ve installed R and the correct libraries and are in the correct state of R preparedness, you can load the necessary libraries into R using these commands:</font></p><br /><br /><pre class="csharpcode"><font color="#ff0000">> library(rpart)<br />> library(pmml)</font></pre><br /><br /><p><font color="#ff0000"><font color="#222222"><font size="2">The next step is to load the training dataset into R.  As preparation for this article, I created a model (the same model as in </font><a href="http://www.predixionsoftware.com/predixion/Portals/0/Flash/PredixionInsightIntroandDemo.htm" target="_blank"><font size="2">our demo video</font></a><font size="2">) and used the technique in </font><a href="http://jamiemaclennan.blogspot.com/2011/05/querying-training-and-testing-sets.html" target="_blank"><font size="2">my previous post</font></a><font size="2"> (for those who already read that, this was the point) to export the training set and saved the result as a .csv file.  I’ve put the training data file up on my <a href="http://cid-632f41683e4abc1a.office.live.com/browse.aspx/.Public/Scoring%20R%20PowerPivot%20Part%202" target="_blank">SkyDrive</a> if you just want to grab it.</font></font></font></p><br /><br /><p><font color="#ff0000"><font color="#222222" size="2">This line reads the.csv file and puts the result into an object called “insurance”.  Note that R, due to it’s UNIX origins, uses forward slashes instead of backslashes as directory separators – also it’s case sensitive! </font></font></p><br /><br /><p><font color="#ff0000" size="2" face="Consolas">> insurance <- read.csv(<span class="str">"Data/Insurance Demo/insurance.train.csv"</span>)</font></p><br /><br /><p><font color="#ff0000"><font color="#000000"></font><font color="#000000" size="2">Next, I need to create a predictive model.  The <font face="Consolas">rpart</font> command creates a tree model on a dataset given a target column.  There are many additional parameters controlling how the tree is created, but for the purposes of this blog post, I’m just providing the shortest possible command.  This line creates a tree model and places it into an object called “TreeModel” that we can use later</font></font></p><br /><br /><p><font color="#ff0000" size="2" face="Consolas">> TreeModel <- rpart(Converted~.,insurance)</font></p><br /><style type="text/css"><br /><br /><br /><br />.csharpcode, .csharpcode pre<br />{<br /> font-size: small;<br /> color: black;<br /> font-family: consolas, "Courier New", courier, monospace;<br /> background-color: #ffffff;<br /> /*white-space: pre;*/<br />}<br />.csharpcode pre { margin: 0em; }<br />.csharpcode .rem { color: #008000; }<br />.csharpcode .kwrd { color: #0000ff; }<br />.csharpcode .str { color: #006080; }<br />.csharpcode .op { color: #0000c0; }<br />.csharpcode .preproc { color: #cc6633; }<br />.csharpcode .asp { background-color: #ffff00; }<br />.csharpcode .html { color: #800000; }<br />.csharpcode .attr { color: #ff0000; }<br />.csharpcode .alt <br />{<br /> background-color: #f4f4f4;<br /> width: 100%;<br /> margin: 0em;<br />}<br />.csharpcode .lnum { color: #606060; }</style><br /><br /><p><font color="#ff0000"><font color="#222222" size="2">The <font face="Consolas">pmml</font> command converts the tree model to PMML, assigning the result to “TreePmml”.</font></font></p><br /><br /><p><font color="#ff0000" size="2" face="Consolas">> TreePmml<-pmml(TreeModel)</font></p><br /><style type="text/css"><br /><br /><br /><br />.csharpcode, .csharpcode pre<br />{<br /> font-size: small;<br /> color: black;<br /> font-family: consolas, "Courier New", courier, monospace;<br /> background-color: #ffffff;<br /> /*white-space: pre;*/<br />}<br />.csharpcode pre { margin: 0em; }<br />.csharpcode .rem { color: #008000; }<br />.csharpcode .kwrd { color: #0000ff; }<br />.csharpcode .str { color: #006080; }<br />.csharpcode .op { color: #0000c0; }<br />.csharpcode .preproc { color: #cc6633; }<br />.csharpcode .asp { background-color: #ffff00; }<br />.csharpcode .html { color: #800000; }<br />.csharpcode .attr { color: #ff0000; }<br />.csharpcode .alt <br />{<br /> background-color: #f4f4f4;<br /> width: 100%;<br /> margin: 0em;<br />}<br />.csharpcode .lnum { color: #606060; }</style><br /><br /><p><font color="#ff0000"><font color="#222222" size="2">Finally, use <font face="Consolas">write</font> to output the pmml into a file – note again the forward slashes in the file path!</font></font></p><br /><br /><p><font color="#ff0000"><font size="2" face="Consolas">> write(toString(TreePmml),<span class="str">"Data/Insurance Demo/RModel.pmml"</span>) <br /><br /> <br /></font></font></p><br /><br /><p><br /> <p><font color="#ff0000"><font size="2" face="Consolas"><font color="#000000" face="Arial">This creates the file “RModel.pmml” that, if you’re curious, you can open up and examine in notepad or any other text editor.  Inspecting the text you’ll see that PMML is simply an XML representation of the patterns that were learned by the R algorithm – if you search for the word “node” you will find the structure of the tree itself.</font></font></font></p><br /><br /> <p><a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TdaaykAla3I/AAAAAAAAALk/FyZ9B7ILHwY/s1600-h/image%5B13%5D.png"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TdaazC_QO8I/AAAAAAAAALo/7jBX7e_QAIk/image_thumb%5B9%5D.png?imgmax=800" width="294" height="217" /></a></p><br /><br /> <p><font color="#ff0000"><font size="2" face="Consolas"><font color="#000000" face="Arial">Now that I have a model in PMML, I can launch Excel with Predixion Insight for Excel installed, switch to the Insight Analytics ribbon and click the My Stuff button.  In the My Stuff dialog, I need to click “Import PMML” to load the R model.  Note that PMML support is available in the Predixion Insight Free Trial and the Predixion Insight with PMML products.</font></font></font></p><br /><br /> <p align="center"><a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TdaaznMREmI/AAAAAAAAALs/GjpKxnFuXuc/s1600-h/image_thumb_thumb%5B2%5D.png"><img style="background-image: none; border-bottom: 0px; border-left: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb_thumb" border="0" alt="image_thumb_thumb" src="http://lh3.ggpht.com/_ESUcEp-EcL8/Tdaazx6hVjI/AAAAAAAAALw/7thcEMdKkvU/image_thumb_thumb_thumb.png?imgmax=800" width="208" height="99" /></a></p><br /><br /> <pre class="csharpcode"><font face="Arial"></font></pre><br /> <style type="text/css"><br /><br /><br /><br />.csharpcode, .csharpcode pre<br />{<br /> font-size: small;<br /> color: black;<br /> font-family: consolas, "Courier New", courier, monospace;<br /> background-color: #ffffff;<br /> /*white-space: pre;*/<br />}<br />.csharpcode pre { margin: 0em; }<br />.csharpcode .rem { color: #008000; }<br />.csharpcode .kwrd { color: #0000ff; }<br />.csharpcode .str { color: #006080; }<br />.csharpcode .op { color: #0000c0; }<br />.csharpcode .preproc { color: #cc6633; }<br />.csharpcode .asp { background-color: #ffff00; }<br />.csharpcode .html { color: #800000; }<br />.csharpcode .attr { color: #ff0000; }<br />.csharpcode .alt <br />{<br /> background-color: #f4f4f4;<br /> width: 100%;<br /> margin: 0em;<br />}<br />.csharpcode .lnum { color: #606060; }</style><br /><br /> <p><a href="http://lh6.ggpht.com/_ESUcEp-EcL8/Tdaa0FpXkTI/AAAAAAAAAL0/m6e2gRF7NjA/s1600-h/image_thumb1_thumb%5B2%5D.png"><img style="border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image_thumb1_thumb" border="0" alt="image_thumb1_thumb" src="http://lh3.ggpht.com/_ESUcEp-EcL8/Tdaa0rruQxI/AAAAAAAAAL4/on1GoEVIQ8Y/image_thumb1_thumb_thumb.png?imgmax=800" width="241" height="249" /></a></p><br /><br /> <p>After I click Import PMML and select and import my R model, it appears in the list of datasets and models under a special PMML node in all dialogs where you select a model, such as the My Stuff dialog or the accuracy chart and query wizards.  For most purposes PMML models are treated the same as native Predixion models in that you can import and export them, share them with other users, and publish them to shared collections.</p><br /><br /> <p align="center"><a href="http://lh5.ggpht.com/_ESUcEp-EcL8/Tdaa03RveoI/AAAAAAAAAL8/kn5SP1XibTg/s1600-h/image_thumb2_thumb%5B2%5D.png"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image_thumb2_thumb" border="0" alt="image_thumb2_thumb" src="http://lh6.ggpht.com/_ESUcEp-EcL8/Tdaa1NUrRqI/AAAAAAAAAMA/WNpQVkYWnes/image_thumb2_thumb_thumb.png?imgmax=800" width="244" height="111" /></a></p><br /></p><br /><br /><br /><br /><br /><br /><p>In the next installment, I will show you how to use the R model in Predixion to generate accuracy charts, and, as promised, score data in PowerPivot!</p> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com14tag:blogger.com,1999:blog-2493137919500407537.post-45568497636729802062011-05-18T17:54:00.001-07:002011-05-18T17:54:07.671-07:00Scoring R models against PowerPivot Data - Part 1 of 3–Introduction<p><font size="2">This series is all about combining two completely different technologies that target two completely different audiences.   However, it turns out that those two audiences often have to serve the same master in looking for information and insight within their data.  And typically those two audiences don’t or <em>can’t</em> talk to each other, which makes driving common goals a bit difficult.</font></p> <p><font size="2">PowerPivot is a tool for preparing and creating data applications from a business analyst’s perspective.  You can bring data in from multiple sources, match them together, derive new information, create some calculations, and present the results in an attractive and meaningful business context.  You can download and find more information about PowerPivot at </font><a href="http://www.powerpivot.com"><font size="2">www.powerpivot.com</font></a><font size="2">.  Go ahead – it’s free!</font></p> <p><font size="2">R is a statistical language for preparing and creating data applications from a data scientist’s perspective.  You can import data, apply statistical tests and analysis, derive new information, and visualize the results in a scientific and statistical context.  You can download and find more information about R from </font><a href="http://www.r-project.org"><font size="2">www.r-project.org</font></a><font size="2">.  Go ahead – it’s free!</font></p> <p><font size="2">So the question remains, how can we make these two worlds collide in a meaningful way that takes the science performed in R and applies it to the business context of PowerPivot.  You knew I was going to say this, but </font><a title="Predixion Software" href="http://predixionsoftware.com" target="_blank"><font size="2">Predixion</font></a><font size="2"> Insight is the way!  Predixion Insight and Predixion Insight for Excel allows you to take predictive models created in R and apply them to data in PowerPivot (and Excel actually), thereby taking the scientific abstracts and providing them concrete business context.</font></p> <p><font size="2">In this three part series, I’m going to walk through the complete process of creating a model in R and then applying it to data in PowerPivot by means of Predixion Insight.  The four parts will cover the following topics:</font></p> <ul> <li><font size="2">Part 1 – Intro (you’re reading this now)</font></li> <li><font size="2">Part 2 – Creating a classification model in R</font></li> <li><font size="2">Part 3 – Evaluating and Scoring R models in Predixion</font></li> </ul> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com0tag:blogger.com,1999:blog-2493137919500407537.post-3195999822265262122011-05-13T14:25:00.001-07:002011-05-13T14:25:05.317-07:00Querying the Training and Testing Sets<p>This article (as you could rather clearly surmise from the title) is about querying the training and testing sets of a <a title="Predixion Software" href="http://predixionsoftware.com" target="_blank">Predixion</a> Insight model.  This is a very useful technique and is handy in a variety of situations, but that is not the <em>point</em> of this article.  The point of this article will be revealed in a future posting, so for now you can sit back and learn how to retrieve training and testing sets.</p> <p>Whenever you upload a rectangular dataset using Predixion Insight you have the option to automatically separate your data into training and testing sets.  The page of the wizard where you specify this information looks like this:</p> <p><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TczZUam0khI/AAAAAAAAALI/iyzkWi7G5Co/image%5B6%5D.png?imgmax=800" width="318" height="246" /></p> <p>The parameters allow you to control how much data is set aside for testing up to a maximum.  Later this held out data is used to automatically create accuracy charts, profit charts and other fun stuff.</p> <p>Once your data is uploaded to Predixion Insight, you can create as many models as you wish on the dataset with ease using any of the modeling tools or even visual macros for automation.  If, for any reason, you want to fetch the data from the Predixion Insight server back into your Excel workbook or PowerPivot table, you can use the Query tool on the Insight Analytics ribbon.</p> <p><a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TczZUjj2zkI/AAAAAAAAALM/_9KIx0bRvcQ/s1600-h/image%5B22%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TczZUzZC2fI/AAAAAAAAALQ/1nra6ca2tDI/image_thumb%5B13%5D.png?imgmax=800" width="155" height="91" /></a></p> <p>In the Query Wizard you select a <em>dataset</em> to query rather than a <em>model.</em>  Querying a model generally means fetching predictions or model patterns.  Querying a dataset simply means fetching the data that is cached in that dataset.</p> <p><a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TczZVUh1F8I/AAAAAAAAALU/G7ajRLjU06s/s1600-h/image%5B15%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TczZVuoNQ-I/AAAAAAAAALY/3_NsQEn9G6g/image_thumb%5B8%5D.png?imgmax=800" width="317" height="247" /></a></p> <p>On the next page of the wizard you select which columns you want to fetch from the dataset.  You can click the checkbox at the top to retrieve all of them.  More importantly you can select a filter to limit which rows are returned.  This is where the magic comes in – there is a “special” filter clause that you can select called “Is Test Cases”.  Setting this filter clause to “true” causes only the test cases to be returned, whereas setting the clause to “false” returns only training cases.</p> <p><a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TczZV2VxnVI/AAAAAAAAALc/RzArWnrDYAw/s1600-h/image%5B21%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TczZWQAy-yI/AAAAAAAAALg/ZCvIEhsPoP8/image_thumb%5B12%5D.png?imgmax=800" width="314" height="306" /></a></p> <p>And that’s it.  Although you now may have a new trick in handling your Predixion Insight datasets, you will have to wait until a future installment to learn why this detail is so important.  Until then – happy mining!</p> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com1tag:blogger.com,1999:blog-2493137919500407537.post-55213884707324422232011-04-20T14:15:00.000-07:002011-04-20T14:15:00.100-07:00Profiling all PowerPivot tables at once using Predixion Insight VBAIn a <a href="http://jamiemaclennan.blogspot.com/2011/02/full-data-profile-azure-marketplace.html" target="_blank">previous post</a>, I showed how to use <a href="http://predixionsoftware.com/" target="_blank" title="Predixion Software">Predixion</a> Insight For Excel’s Data Profile feature to get descriptive statistics on PowerPivot tables – in that case using the great <a href="http://www.practicefusion.com/" target="_blank">Practice Fusion</a> data from the Windows Azure <a href="https://datamarket.azure.com/" target="_blank">Data Marketplace</a>. In practice, however, this feature is so useful that I always find myself profiling <em>every</em> table <em>every</em> time I load up a dataset in PowerPivot, so it’s a bit cumbersome to go through the user interface (regardless of how attractive it really is) each time.<br />
One thing that you may not be aware of is that Predixion Insight for Excel has a VBA API that allows you to not only interact with Predixion Insight objects, but with PowerPivot objects as well. Using this API, I can easily create a simple VBA macro that executes a Data Profile for each attached PowerPivot table.<br />
The first thing you need to do is to enable the developer ribbon in Excel since it’s hidden by default. You can do this by going to File/Options and selecting “Customize the Ribbon” and checking the “Developer” option in the “Main Tabs” tree control. Then switch to the developer ribbon and click the “Visual Basic” icon, which, in a throwback to turn of the century user interfaces, launches the VBA editor.<br />
<div align="center"><a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TZ4plgJS39I/AAAAAAAAAKw/d49WB4Hop4w/s1600-h/image2.png"><em><img alt="image" border="0" height="184" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TZ4pmiyb7GI/AAAAAAAAAK0/0Z5Wi0mKhqs/image_thumb.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="244" /></em></a> <br />
<em><span style="font-size: xx-small;">The VBA editor did <strong>not</strong> get the facelift the rest of Office 2007/10 did…</span></em></div>Once you’re in the VBA editor – you need to add a reference to the Predixion Insight VBA library. Select Tools/References from the VBA editor and choose the Predixion Insight VBA API for Microsoft Excel. (Hint – when the dialog appears, press “Q” and then scroll up – you will find it much quicker!)<br />
<a href="http://lh4.ggpht.com/_ESUcEp-EcL8/TZ4pm3Ri-gI/AAAAAAAAAK4/NqR9VHjLO0M/s1600-h/image6.png"><img alt="image" border="0" height="364" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TZ4pnHl2NbI/AAAAAAAAAK8/Cv1hl7wxaRo/image_thumb2.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="453" /></a><br />
To have a place to write some code, simply select the Insert menu and choose “Module”, which will create an editor that you can write code. The code is very straightforward, you create your PowerPivot and Data Exploration objects from the Predixion VBA library, connect to PowerPivot to get the list of tables, and then get a data profile for each table that is returned. You can copy the code below into the VBA editor as is.<br />
<pre class="csharpcode"><span class="kwrd">Sub</span> ProfileAllPowerPivotTables(AdvancedProfile <span class="kwrd">As</span> <span class="kwrd">Boolean</span>)
<span class="kwrd">Dim</span> PowerPivot <span class="kwrd">As</span> <span class="kwrd">New</span> PredixionVBA.PowerPivot
<span class="kwrd">Dim</span> DataExploration <span class="kwrd">As</span> <span class="kwrd">New</span> PredixionVBA.DataExploration
PowerPivot.ConnectToPowerPivot
TableList = PowerPivot.GetPowerPivotTables()
<span class="kwrd">For</span> <span class="kwrd">Each</span> Table <span class="kwrd">In</span> TableList
Application.StatusBar = <span class="str">"Profiling PowerPivot Table: "</span> + _
Table
DataExploration.GenerateDataProfileForPowerPivot Table, <span class="str">""</span>, _
AdvancedProfile, <span class="str">""</span>
<span class="kwrd">Next</span>
Application.StatusBar = <span class="str">""</span>
<span class="kwrd">End</span> <span class="kwrd">Sub</span>
<span class="kwrd">Sub</span> ProfileAllPowerPivotTablesAdvanced()
ProfileAllPowerPivotTables (<span class="kwrd">True</span>)
<span class="kwrd">End</span> <span class="kwrd">Sub</span></pre><br />
<br />
<pre class="csharpcode"><span class="kwrd"></span><span class="kwrd">Sub</span> ProfileAllPowerPivotTablesBasic()
ProfileAllPowerPivotTables (<span class="kwrd">False</span>)
<span class="kwrd">End</span> <span class="kwrd">Sub</span> </pre><br />
<style type="text/css">
.csharpcode, .csharpcode pre
{
font-size: small;
color: black;
font-family: consolas, "Courier New", courier, monospace;
background-color: #ffffff;
/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt
{
background-color: #f4f4f4;
width: 100%;
margin: 0em;
}
.csharpcode .lnum { color: #606060; }
</style><br />
<br />
<br />
<br />
<br />
As I stated, the code is fairly straightforward. The other parameters for <strong>GenerateDataProfileForPowerPivot</strong> are an optional table filter and an optional list of columns, which you probably don’t need to get a quick profile of all the tables in a first shot. Running either of the top level functions on the Predixion Sample Data workbook by clicking on the green “play” button generates all the profiles in about a minute with an index to boot.<br />
<br />
<br />
<br />
<br />
<br />
<a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TZ4pns3QcJI/AAAAAAAAALA/MaKDeU_d3rE/s1600-h/image11.png"><img alt="image" border="0" height="343" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TZ4pn_vav3I/AAAAAAAAALE/nE6Epa2Z2rQ/image_thumb5.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="644" /></a>Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com1tag:blogger.com,1999:blog-2493137919500407537.post-85176415765609712372011-04-18T14:12:00.000-07:002011-04-18T14:12:00.327-07:00Transactional Sampling with PowerPivot and Predixion Insight<a href="http://predixionsoftware.com/" target="_blank" title="Predixion Software">Predixion</a> Insight For Excel provides some excellent sampling tools that can be applied to data in Excel worksheets or in an embedded PowerPivot database. If you believe that there are any analysis opportunities outside of pivot tables, sampling PowerPivot becomes very interesting. Since very few tools outside of Predixion Insight can read PowerPivot data directly and you simply cannot move over 1 million rows from PowerPivot into an Excel worksheet, you are pretty much stuck if you want to perform varied analyses on that PowerPivot data. By sampling PowerPivot data you can leverage any techniques that you know and love in native Excel for analysis as well as using any other analytical addins such as the <a href="http://archive.msdn.microsoft.com/solverfoundation" target="_blank">Microsoft Solver Foundation</a>.<br />
Predixion Insight offers three kinds of sampling – percentage, row count, and balanced. Percentage sampling allows you to specify a specific percentage and will produce a random sample with approximately that percentage of data rows from the source table. Row count will produce a sample containing exactly the number of rows you request (assuming there are that many in the source table to begin with). Balanced sampling is a special case since the purpose isn’t to reduce data volume, rather it is to accentuate a typically rare feature of the data for better pattern discovery – e.g. increase the balance of rows representing fraudulent vs non fraudulent cases. Independent of the sampling type, the resulting sampled set can be created as a table in an Excel worksheet or in PowerPivot (ProTip: in a pinch, using Predixion Insight sampling is a great trick to move data back and forth between the two).<br />
However, when you have transactional data, you really need to think about sampling a little differently. In transactional data, information about a single transaction is contained across multiple rows. Sampling individual rows would actually break up a transaction and change the nature of the data itself. For example, a 20% sample of shopping cart data essential removes 8 items out of a 10 item basket – you would be very unlikely to find the mythical beer and diapers correlation that way! Instead you need a way to sample that would take an entire transaction rather than individual rows.<br />
In order to accomplish this, you need Excel and PowerPivot (obviously) as well as Predixion Insight for Excel and the source database from which you pull your transactional data. <br />
I’ll go through the entire sequence explicitly, but here are the high level steps:<br />
<ul><li>Load transactional data into PowerPivot </li>
<li>Load distinct transactional id’s into PowerPivot </li>
<li>Create a sample of the transactional keys </li>
<li>Add a relationship to PowerPivot between the full data table and the sample </li>
<li>Add a calculated column indicating rows that are selected </li>
<li>Use that column as a table filter </li>
</ul>In this example, I’m using some data from Last.FM regarding music artists and user submitted tags that I pulled from <a href="http://musicmachinery.com/2010/11/10/lastfm-artisttags2007/" target="_blank">here</a>. <br />
<a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TZ4pKyYR0qI/AAAAAAAAAI4/izX5pG1c-v0/s1600-h/image45.png"><img alt="image" border="0" height="345" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TZ4pLR-YamI/AAAAAAAAAI8/ulm696zvXGw/image_thumb15.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="448" /></a><br />
You can see that the data is “transactional” with the “Artist ID” being the transaction ID with the “Tag” being the items in the transactions. <br />
First get the data into PowerPivot by opening the PowerPivot window from Excel and selecting “From Database/From SQL Server” from the ribbon. (If your data is in a different database, find it under “from other sources”)<br />
<a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TZ4pLuwNPII/AAAAAAAAAJA/Koycwt28T3k/s1600-h/image48.png"><img alt="image" border="0" height="102" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TZ4pL7U8HtI/AAAAAAAAAJE/88-EqbhvVOQ/image_thumb16.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="244" /></a><br />
This launches the table import wizard where you indicate the location and database containing the data to load, and then select the transactional table.<br />
<a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TZ4pMCSuUEI/AAAAAAAAAJI/QI7puzJR3og/s1600-h/image49.png"><img alt="image" border="0" height="484" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TZ4pMe6gVEI/AAAAAAAAAJM/2xHjVKn54kE/image_thumb17.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="473" /></a><br />
Next go back to the Table Import Wizard, but this time, select the “Write a query” option<br />
<a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TZ4pMhq1DvI/AAAAAAAAAJQ/PvkNv5_5Qm8/s1600-h/image50.png"><img alt="image" border="0" height="484" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TZ4pM-lSwwI/AAAAAAAAAJU/YPa1bZK-j1w/image_thumb18.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="473" /></a><br />
The query is a simple “SELECT DISTINCT” of the transaction ID’s. Note that the “Friendly Query Name” will end up as the table name in PowerPivot so it’s a good idea to set it here.<br />
<a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TZ4pNDXzDfI/AAAAAAAAAJY/jQUOviHXgng/s1600-h/image51.png"><img alt="image" border="0" height="484" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TZ4pNm_jAQI/AAAAAAAAAJc/FrZSag8j6LY/image_thumb19.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="473" /></a><br />
Then you go back to Excel, switch to the Insight Analytics ribbon and choose the “Sample Data” tool<br />
<a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TZ4pNncaYZI/AAAAAAAAAJg/PiwsvAc0RU4/s1600-h/image20.png"><img alt="image" border="0" height="90" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TZ4pN1z9jSI/AAAAAAAAAJk/2JWxb0rlU4s/image_thumb6.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="244" /></a><br />
Select “PowerPivot Data” as the input type and choose the result of the select distinct query as the table.<br />
<a href="http://lh4.ggpht.com/_ESUcEp-EcL8/TZ4pOOhxcmI/AAAAAAAAAJo/bYJ2qBRshrs/s1600-h/image61.png"><img alt="image" border="0" height="488" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TZ4pOpkij-I/AAAAAAAAAJs/_5NYZ25E3i8/image_thumb29.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="632" /></a><br />
Choose the sampling method – in this case you will want to choose Random Sampling to reduce data volumes.<br />
<a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TZ4pO9Hz3xI/AAAAAAAAAJw/og70lewhS_0/s1600-h/image60.png"><img alt="image" border="0" height="488" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TZ4pPeidO0I/AAAAAAAAAJ0/iahciImKrPg/image_thumb28.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="632" /></a><br />
Then choose how many transactions you want in the output. This is entirely up to you at this point. For this example, I chose 10,000 transactions, but really it’s whatever you choose to be right for your scenario.<br />
<a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TZ4pPo4450I/AAAAAAAAAJ4/6AxAusWqML0/s1600-h/image59.png"><img alt="image" border="0" height="488" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TZ4pQJ9HCzI/AAAAAAAAAJ8/nol_z9YmYuA/image_thumb27.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="632" /></a><br />
Finally, specify where you want the results – you need to add the results as a PowerPivot table, but you can change the name. There’s no point in creating an unselected list as you will see later in the example.<br />
<a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TZ4pQSV_mYI/AAAAAAAAAKA/nrHijTrmEu4/s1600-h/image58.png"><img alt="image" border="0" height="488" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TZ4pQuLWrQI/AAAAAAAAAKE/036VPFQnYXY/image_thumb26.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="632" /></a><br />
When the sampling is complete you can open PowerPivot, switch to the sampled data page and see how many transactions you selected. Since I chose 10,000 for this example, there are exactly 10,000 transaction ID’s in the result!<br />
<a href="http://lh4.ggpht.com/_ESUcEp-EcL8/TZ4pQx5KudI/AAAAAAAAAKI/ie1mxBxpso8/s1600-h/image38.png"><img alt="image" border="0" height="199" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TZ4pRVxhAqI/AAAAAAAAAKM/o7qv7JRl1Sw/image_thumb12.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="235" /></a><br />
Now you need to create a relationship between the Selected Data table and the source table – in my example, that’s the LastFM table. First, in the PowerPivot window, switch to the Design ribbon and click “Create Relationship.”<br />
<a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TZ4pRoG-9EI/AAAAAAAAAKQ/SgJF9piQgvw/s1600-h/image14.png"><img alt="image" border="0" height="85" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TZ4pSAJIpJI/AAAAAAAAAKU/7LQPEHcXMlA/image_thumb4.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="244" /></a><br />
In the Create Relationship dialog, relate the transaction ID from the Selected Data table to the transaction ID of the full transaction table.<br />
<a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TZ4pSHgbbKI/AAAAAAAAAKY/H59_AOTUrzs/s1600-h/image57.png"><img alt="image" border="0" height="307" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TZ4pSr2JwuI/AAAAAAAAAKc/RWRxs0y_3Ls/image_thumb25.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="612" /></a><br />
Next you need to add a calculated column to the full transaction table indicating for each row if that row has been selected or not. There are many DAX expressions that will work to do this, I chose the straightforward “not blank related” approach. <br />
<a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TZ4pS7vQoXI/AAAAAAAAAKg/GujVjRteDWY/s1600-h/image62.png"><img alt="image" border="0" height="177" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TZ4pTGFdMnI/AAAAAAAAAKk/KvIAIHXD7Ww/image_thumb30.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="574" /></a><br />
And that’s it! You now have a sampled a transactional table – or at least labeled rows in a transactional table in such a way that you can easily select the sampled transactions. In Predixion Insight for Excel, you can always filter a PowerPivot data source for any operation using the filter control on the data selection page. Also, this is why you don’t need to keep the “unselected” samples since you can easily choose either the “selected” items or the “unselected” items in any filter.<br />
<a href="http://lh4.ggpht.com/_ESUcEp-EcL8/TZ4pTVLE8hI/AAAAAAAAAKo/qoU86FQOUCA/s1600-h/image63.png"><img alt="image" border="0" height="558" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TZ4pTqXv9FI/AAAAAAAAAKs/JjFZX8KaY2k/image_thumb31.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="632" /></a><br />
Let me finish this off with one more ProTip. It’s unlikely that you will want these extra tables – i.e. the distinct list and sampled list of transaction ids – to show up in your pivot table analyses. While you can’t explicitly hide <em>tables</em> in PowerPivot, if you hide <strong>all</strong> the columns in a PowerPivot table it has the same effect. So to effectively hide a table from pivot charts, go to the Design ribbon in PowerPivot and use the Hide and Unhide tool to hide all of the columns in the extra tables.Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com0tag:blogger.com,1999:blog-2493137919500407537.post-58101754014536859822011-03-03T15:59:00.001-08:002011-04-07T14:18:03.874-07:00Predixion Software on BITV<a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TXArbicrT6I/AAAAAAAAAIc/n0R0JGnfOsY/s1600-h/image_thumb36%5B4%5D.png"><img alt="image_thumb36" border="0" height="108" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TXArcKyxRQI/AAAAAAAAAIg/pBTxPeh_Ws8/image_thumb36_thumb%5B1%5D.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image_thumb36" width="240" /></a><img height="55" src="http://blogs.technet.com/resized-image.ashx/__size/550x0/__key/CommunityServer-Blogs-Components-WeblogFiles/00-00-00-59-23/8865.BI.TV.PNG" style="display: block; float: none; margin-left: auto; margin-right: auto;" width="240" /><br />
Bruno Aziza’s Business Intelligence TV <a href="http://blogs.msdn.com/b/bi/archive/2011/03/03/predicting-the-future-the-next-step-in-bi.aspx" target="_blank" title="Predixion Software">spot on Predixion Software</a> just aired on his blog, and despite the video thumbnail of me looking like I just sucked on a lemon, it came out pretty well. During the video I discuss the <a href="http://jamiemaclennan.blogspot.com/2011/02/prescription-finder-demo.html" target="_blank">application I built</a> using the Practice Fusion Data from the Azure Marketplace.<br />
<div class="wlWriterEditableSmartContent" id="scid:5737277B-5D6D-4f48-ABFC-DD9C333F4C5D:4e5bf65e-9836-43e5-8377-9649fbb874d2" style="display: inline; float: none; margin: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><div id="ae550dc4-b4a7-4206-8bfc-3a7a8f54a22e" style="display: inline; margin: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><div><a href="http://www.youtube.com/watch?v=XBGVdwMscvI" target="_new"><img alt="" galleryimg="no" onload="var downlevelDiv = document.getElementById('ae550dc4-b4a7-4206-8bfc-3a7a8f54a22e'); downlevelDiv.innerHTML = "<div><object width=\"425\" height=\"355\"><param name=\"movie\" value=\"http://www.youtube.com/v/XBGVdwMscvI&hl=en\"><\/param><embed src=\"http://www.youtube.com/v/XBGVdwMscvI&hl=en\" type=\"application/x-shockwave-flash\" width=\"425\" height=\"355\"><\/embed><\/object><\/div>";" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TYpkoCCHNNI/AAAAAAAAAI0/f0F3OR6mp0w/videodaedd4ce4d44%5B5%5D.jpg?imgmax=800" style="border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-style: none;" /></a></div></div></div><br />
The video below demonstrates the application and shows how I create the application starting with an almost empty workbook (there’s some graphics in there and a <em>wee bit</em> of VBA code). <br />
<iframe allowfullscreen="allowfullscreen" frameborder="0" height="390" src="http://www.youtube.com/embed/NETpHovS9PI" title="YouTube video player" width="480"></iframe><br />
In case you didn’t know already, we have more videos on <a href="http://predixionsoftware.com/" target="_blank" title="Predixion Software">Predixion</a> Insight at our <a href="http://www.youtube.com/predixion" target="_blank">YouTube channel</a> that provide more information about Predixion and tutorials on how to use the product.<br />
I’ll continue this series with some more discussion of the dataset as well as how to create the model and VBA code used in the video.Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com1tag:blogger.com,1999:blog-2493137919500407537.post-180878308907456952011-02-16T18:17:00.001-08:002011-04-07T14:18:44.424-07:00Prescription Finder DemoThis really isn’t a blog post – it’s just a post so you can find the related materials for the Prescription Finder demo that will appear in video links any time now.<br />
Without a lot of additional explanations, just in case you’ve stumbled upon this accidentally, this is what the demo looks like.<br />
<a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TVyFLoAD9sI/AAAAAAAAAIU/P5j-ynl39Kk/s1600-h/image%5B10%5D.png"><img alt="image" border="0" height="273" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TVyFMLWiTNI/AAAAAAAAAIY/xbX5KxCSvlc/image_thumb%5B8%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="455" /></a> <br />
You can access the demo files from the SkyDrive folder below. To run the entire demo, you will need to import the included model into your <a href="http://predixionsoftware.com/" target="_blank" title="Predixion Software">Predixion</a> Insight account (get a free trial if you don’t have one!)<br />
<iframe frameborder="0" marginheight="0" marginwidth="0" scrolling="no" src="http://cid-632f41683e4abc1a.office.live.com/embedicon.aspx/.Public/PrescriptionFinder" style="background-color: #fcfcfc; height: 115px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; width: 98px;" title="Preview"></iframe><br />
Enjoy!Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com1tag:blogger.com,1999:blog-2493137919500407537.post-24467174095735634112011-02-08T18:09:00.001-08:002011-04-07T14:19:18.537-07:00Data Preparation Webinar Slide DeckHere’s the deck that I presented at today’s webinar complete with the DAX formulas I added to the PracticeFusion PowerPivot workbook. You can get the data from the Azure Marketplace or download one of my <a href="http://cid-632f41683e4abc1a.office.live.com/self.aspx/.Public/Practice%20Fusion%20-%20Blog1.xlsx" target="_blank">preloaded copies</a>.<br />
Included in the deck is the coupon code for 50% off the <a href="http://predixionsoftware.com/" target="_blank" title="Predixion Software">Predixion</a> Insight Cloud service for 3 months – expires 3/15/2011!<br />
<iframe frameborder="0" height="327" scrolling="no" src="http://r.office.microsoft.com/r/rlidPowerPointEmbed?p1=1&p2=1&p3=SD632F41683E4ABC1A!213&p4=" width="402"></iframe><br />
Drop me a line or post a comment if you have any questions! (I’ll update this post when the video is posted)<br />
EDIT: Webinar replay is available <a href="https://predixionsoftware-ec.webex.com/predixionsoftware-ec/lsr.php?AT=pb&SP=EC&rID=3191567&rKey=2a732cd242d9e6fe" target="_blank">here</a>Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com2tag:blogger.com,1999:blog-2493137919500407537.post-10355358828276603072011-02-06T09:32:00.001-08:002011-04-07T14:19:57.688-07:00The Full Data Profile - Azure Marketplace Data with PowerPivot and Predixion Insight Part 3<img alt="image" border="0" height="152" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TUwxQNPKsaI/AAAAAAAAAHQ/Qix7jSpNd_E/image11.png?imgmax=800" style="display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="336" /><br />
In the <a href="http://jamiemaclennan.blogspot.com/2011/01/cleaning-azure-marketplace-data-with.html" target="_blank">first post</a> of the series I gave a screenshot of a basic Profile Data report from <a href="http://predixionsoftware.com/" target="_blank" title="Predixion Software">Predixion</a> Insight, leaving creating the remaining reports as “an exercise for the reader.” Of course, if you don’t currently have Excel 2010 installed on your system, you may worried that you wouldn’t be able to complete the assignment and might get a bad grade on the pop quiz that’s coming up after the next few posts. (Just kidding).<br />
Since I got permission to go ahead and post the data and results for my <a href="http://jamiemaclennan.blogspot.com/2011/02/azure-marketplace-data-with-powerpivot.html" target="_blank">last post</a>, I put the work-in-progress files up on <a href="http://www.skydrive.com/" target="_blank">SkyDrive</a>. To make it easier to see what Profile Data in Predixion Insight really does I embedded the results here for you to explore interactively. If you want to feel like Gulliver in Lilliput poking through the castle windows you can click around in this interface, otherwise you can click on the “View full size workbook”<a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TU7bM-51YCI/AAAAAAAAAH0/iFZIu6V5Jh8/s1600-h/image%5B3%5D.png"><img alt="image" border="0" height="24" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TU7bNfPFKHI/AAAAAAAAAH4/mJHbPban_lo/image_thumb%5B1%5D.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline;" title="image" width="23" /></a> button for better access.<br />
<iframe frameborder="0" height="470" scrolling="no" src="http://r.office.microsoft.com/r/rlidExcelEmbed?su=7147003049637755930&Fi=SD632F41683E4ABC1A!179&AllowInteractivity=True" width="500"></iframe><br />
This workbook was created by downloading the <a href="https://datamarket.azure.com/dataset/8c30d4d3-0846-4b08-9991-104958f24ca7" target="_blank">Practice Fusion</a> data as described <a href="http://jamiemaclennan.blogspot.com/2011/01/cleaning-azure-marketplace-data-with.html" target="_blank">here</a> and running the Profile Data tool from Predixion Insight on each table in PowerPivot. I ran Profile Data twice on each table so you can see the difference <a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TU7bNo_5OLI/AAAAAAAAAH8/99WJZhkXNYw/s1600-h/image%5B8%5D.png"><img align="right" alt="image" border="0" height="59" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TU7bN4OtC3I/AAAAAAAAAIA/XZv1yGj4t78/image_thumb%5B4%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline; margin-left: 0px; margin-right: 0px;" title="image" width="126" /></a>between the “Basic” and “Advanced” profile reports. Then I just edited the names on the worksheet tabs and the report titles. The Insight Index is automatically created and updated to create a nice guide to the workbook – it also updates automatically as you change worksheet names and report titles, so it’s convenient as well. If you delete rows in your index, or delete the whole page, or somehow do something irreversible to the worksheet (it is in an Excel worksheet, after all), you can regenerate the Insight Index from the Options dialog available under the Predixion menu button.<br />
One more thing I had to do to host the workbook interactively in SkyDrive is to drop all the PowerPivot tables after creating the reports. The Excel file was between 9 and 10 MB which is too large for viewing in the browser.<br />
<h3>Advanced Data Profile</h3>I explained a bit about the basic data profile at the start of this series, so I’ll just explain briefly the difference. Essentially the difference is that you just get more stats with “Advanced” than you do with “Basic”, plus the report takes slightly longer to execute as it has more to compute.<br />
For numerical columns, Advanced adds measures such as Kurtosis and Skewness which measure the “normality” of the data – that is – how well the data conforms to a bell curve, an approximate median, the mode, and the confidence interval. For those who aren’t familiar with these terms, they are defined with links to appropriate Wikipedia articles in the column headers.<br />
The Advanced Profile Data report also contains correlation and covariance matrices for numerical columns. Looking at the correlation matrix for the SyncChart table, for example, shows a clear correlation between Weight and BMI, and a negative correlation between Height and BMI.<br />
<img alt="image" border="0" height="102" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TU7bOfUS09I/AAAAAAAAAIE/CH3xqtL-154/image%5B14%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="454" /> <br />
For categorical columns, the Advanced report adds the top and bottom 3 states present in the column, as well as a number representing the top 80% of the states in the data. This number tells you how many values of the column cover 80% of the data. Using the SyncChart example again, you can see that although there are 39 different physician specialties represented in the data, 80% of the visits are to only 3 types of physicians.<br />
<img alt="image" border="0" height="69" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TU7bO99fnyI/AAAAAAAAAII/3VidAl8dSvs/image_thumb%5B10%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="454" /><a href="http://lh4.ggpht.com/_ESUcEp-EcL8/TU7bPNHtLKI/AAAAAAAAAIM/uajpmzDaQJY/s1600-h/image%5B24%5D.png"><img alt="image" border="0" height="69" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TU7bPl02pHI/AAAAAAAAAIQ/EDNAQYM0308/image_thumb%5B12%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="524" /></a> <br />
A good use of the top and bottom states is to let you know if the data is particularly useful. For instance the PhysicianSpecialty row above shows that the bottom three states appear only once in the data. This indicates that, for Predictive Analytics at least, you will not find any patterns regarding Naturopathy-Acupuncture, Dermatology, or Physical Medicine & Rehabilitation using this data. A good practice to deal with data that shows up with way is to use the Clean Data/Outliers tool to rename such states to “Other”. (Personally, with this data, I would also use Clean Data/Relabel to change blank states in this columns to “Unknown”).<br />
<h3>How do get this stuff</h3>Just to recap for those who haven’t read the other posts, the you can get the tools used to create these reports from the following locations:<br />
<ul><li><a href="http://office.microsoft.com/en-us/try" target="_blank">Excel 2010 Trial</a></li>
<li><a href="http://powerpivot.com/download.aspx" target="_blank">Microsoft PowerPivot</a></li>
<li><a href="http://www.predixionsoftware.com/predixion/products.aspx" target="_blank">Predixion Insight 30-day trial</a></li>
<li><a href="https://datamarket.azure.com/dataset/8c30d4d3-0846-4b08-9991-104958f24ca7" target="_blank">Practice Fusion data set</a></li>
</ul>Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com2tag:blogger.com,1999:blog-2493137919500407537.post-70815228069724972252011-02-04T09:03:00.001-08:002011-02-09T14:41:59.294-08:00Azure Marketplace Data with PowerPivot and Predixion Insight Part 2 - How tall am I anyway?<p></p> <p><em>This post follows from my previous post on <a href="http://jamiemaclennan.blogspot.com/2011/01/cleaning-azure-marketplace-data-with.html" target="_blank">Cleaning Azure Marketplace Data</a>.  I recommend skimming that post prior to reading this one and possibly following through the steps.  However, I did meet Matt Douglass of Practice Fusion at the Strata conference in Santa Clara, CA this week, and he graciously gave me permission to post my workbooks containing the practice fusion data as I progress so you can shortcut the steps and jump in at any time in this series.</em></p> <p><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TUwxQNPKsaI/AAAAAAAAAHQ/Qix7jSpNd_E/image11.png?imgmax=800" width="336" height="152" /> </p> <p><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" border="0" src="http://www.lifetlc.com/resources/PracticeFusion_-_Marketing_Logo.jpg" width="259" height="105" /></p> <p>Looking more into the <a href="https://datamarket.azure.com/dataset/8c30d4d3-0846-4b08-9991-104958f24ca7" target="_blank">Practice Fusion data</a>, I took some more interest in considering the Height column that I cleaned up there. It got me thinking about what I could learn more about “Height” of these patients.  </p> <p>First, here’s what you need to follow along with this post:</p> <ul> <li>Microsoft Excel 2010 </li> <li>Microsoft <a href="http://www.powerpivot.com/download.aspx" target="_blank">PowerPivot</a> </li> <li><a href="http://www.predixionsoftware.com/predixion/products.aspx" target="_blank">Predixion Insight</a> for Excel </li> <li>Workbook resulting from the last post – <a href="http://cid-632f41683e4abc1a.office.live.com/self.aspx/.Public/Practice%20Fusion%20-%20Blog1.xlsx" target="_blank">download from SkyDrive</a> if needed </li> </ul> <h1>Part 1 – What is height?</h1> <p>I want to see trends on height over time, so I:</p> <ul> <li>Create a Pivot Table from the PowerPivot ribbon </li> <li>Add my “Height with Outliers Removed” column from the SyncChart table (that I created last post) to the Values well </li> <li>Set the “Summarize By” property on the value to “Average” </li> <li>Add the “YearOfBirth” column from the SyncPatients table to the Axis Fields well </li> </ul> <p>At this point PowerPivot prompts me that a relationship may be needed, so I allow PowerPivot to create the relationship, which it does correctly, and with some formatting I have a nice chart that looks like this:</p> <p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TUwxQs0500I/AAAAAAAAAHU/k_ddVYnYA90/image711.png?imgmax=800" width="510" height="282" /> </p> <p>Wow – how simple was that!  We took the cleansed data and easily created a chart allowing PowerPivot to automatically create relationships among multiple tables that were downloaded from a public cloud source.  Pretty cool – except that it’s wrong.</p> <p>Well, “wrong” may be a bit harsh - “misleading” may be a better term.  You see, we didn’t create a chart showing the Heights of <em>Patients</em> by the year they were born, we created a chart showing the height measurements taken by doctors averaged by the birth year of the patient.  I’m not exactly sure why that would be useful.  The impact of that is, for example, if there were two patients born in 1920, and one was 60 inches tall and the other was 70 inches, but the shorter patient went to the doctor 4 times and the taller only once, we would see the average for 1920 to be 62 inches ( (60*4 + 70*1)/5 ) rather than what we might expect at 65 inches ( (60 + 70)/2 ).</p> <h2></h2> <h3>Understanding Height</h3> <p>So what we need do to is to define what “Height” means for a <em>Patient</em> rather than a <em>Visit</em> (each row in the SyncChart table represents a doctor’s visit).  This type of attribute creation is very common when performing analyses – particularly for predictive analytics.    To create a “Height” attribute we have to decide what “Hieght” means in terms of our data – this generally requires business understanding of the data, making assumptions of the data, or both.  With the PracticeFusion data cleaned from the previous we have 0 or more measurements of height for each patient with extreme values removed.  In this case, I’m going to use the average of all the height measurements.  Realize that this assumes that any particular individual’s height didn’t change significantly – which is generally true if the patients are adults. </p> <p>Given that the patients being adults is a hypothesis rather than a known fact, means that I should verify through experimentation.  I went to the PowerPivot window and added a calculated column to the SyncChart table with this expression:</p> <p><font face="Courier New">=SyncChart[VisitYear]-RELATED(SyncPatient[YearOfBirth])</font> </p> <p>I named that column “Age at Visit Time” and created the table below showing that out of almost 60,000 visits, only a handful had a potential to have an unexpected growth spurt.  (Note that this was only possible because of the relationship between SyncChart and SyncPatient that was created at the beginning of this blog post.)  To me, this validates that using the average height measurement across visits is a reasonable value for an individual patient’s height.</p> <table border="0" cellspacing="0" cellpadding="0"><tbody> <tr> <td width="47"><font size="1">Age</font></td> <td width="128"><font size="1">Count of ChartGuid</font></td> </tr> <tr> <td><font size="1">14</font></td> <td><font size="1">2</font></td> </tr> <tr> <td><font size="1">16</font></td> <td><font size="1">9</font></td> </tr> <tr> <td><font size="1">17</font></td> <td><font size="1">140</font></td> </tr> <tr> <td><font size="1">18</font></td> <td><font size="1">331</font></td> </tr> <tr> <td><font size="1">19</font></td> <td><font size="1">482</font></td> </tr> <tr> <td><font size="1">20</font></td> <td><font size="1">479</font></td> </tr> <tr> <td><font size="1">21</font></td> <td><font size="1">459</font></td> </tr> <tr> <td><font size="1">…</font></td> <td><font size="1">…</font></td> </tr> </tbody></table> <h2> </h2> <h3>Defining Height</h3> <p>Now that I’m convinced I know what “Height” is, I added another calculated column in the PowerPivot window to the SyncPatient table with the expression:</p> <p><font face="Courier New">=AVERAGEX(RELATEDTABLE(SyncChart), <br />          SyncChart[Height with Outliers Removed])</font></p> <p>and named it simply “Height”.  Creating a PivotTable and measuring the difference shows that in just the first few rows you have a difference of almost 3 inches – clearly showing the problems with the initial “naive” approach.</p> <table border="0" cellspacing="0" cellpadding="0"><tbody> <tr> <td width="92"><font size="1">YearOfBirth</font></td> <td width="100"><font size="1">Patient Height</font></td> <td width="84"><font size="1">Chart Height</font></td> <td width="64"><font size="1">Diff</font></td> </tr> <tr> <td><font size="1">1920</font></td> <td><font size="1">62.0</font></td> <td><font size="1">61.1</font></td> <td><font size="1">0.87</font></td> </tr> <tr> <td><font size="1">1921</font></td> <td><font size="1">62.9</font></td> <td><font size="1">60.2</font></td> <td><font size="1">2.77</font></td> </tr> <tr> <td><font size="1">1922</font></td> <td><font size="1">64.1</font></td> <td><font size="1">64.1</font></td> <td><font size="1">0.02</font></td> </tr> <tr> <td><font size="1">1923</font></td> <td><font size="1">63.7</font></td> <td><font size="1">63.7</font></td> <td><font size="1">0.06</font></td> </tr> <tr> <td><font size="1">1924</font></td> <td><font size="1">63.8</font></td> <td><font size="1">64.9</font></td> <td><font size="1">1.16</font></td> </tr> <tr> <td><font size="1">…</font></td> <td><font size="1">…</font></td> <td><font size="1">…</font></td> <td><font size="1">…</font></td> </tr> </tbody></table> <p></p> <p></p> <p></p> <p></p> <p>FYI – due to the way PivotTables in Excel work, in order to make the “Diff” column I had to create a separate column <em>outside</em> of the PivotTable.  Also, I couldn’t simply click in the cells like “=” <em>click “-“ click </em>and drag down since Excel generates a call to GETPIVOTDATA that has embedded strings that don’t change when you drag down.  So I typed direct cell references (e.g. <font face="Courier New">=ABS(C28-D28)</font> ) and dragged the expression down along the PivotTable.</p> <h1>Part 2 – Pedantic Disclaimer</h1> <h1></h1> <p>Now as a side note, I do want to apologize for maybe being overly pedantic about this process.  I am trying to make sure I don’t have any of those moments where it’s very clear how you get from A to B and from C to D, but the path from B to C is a complete mystery, so I’m being liberal with throwing in all the side streets that come to mind, so pardon the stream of consciousness approach.</p> <h1>Part 3 – Back To the Topic – How tall am I anyway?</h1> <p>So now that I have determined the meaning and semantic of “Height” – I can inspect its value for each patient – so exciting!</p> <table border="0" cellspacing="0" cellpadding="0"><tbody> <tr> <td width="295"><font size="1">PatientGuid</font></td> <td width="84"><font size="1">Height</font></td> </tr> <tr> <td><font size="1">{0003CBB3-40EF-493F-9906-B8EF9F7E7B11}</font></td> <td><font size="1">63.3</font></td> </tr> <tr> <td><font size="1">{000D7634-7A8C-4CAD-895F-C241F19669DA}</font></td> <td><font size="1">60.0</font></td> </tr> <tr> <td><font size="1">{0016F7D0-EABF-4D55-BEA7-B08F7F8567FD}</font></td> <td><font size="1">70.7</font></td> </tr> <tr> <td><font size="1">{002FC25B-A761-4039-8507-E6FE59B19833}</font></td> <td><font size="1">66.0</font></td> </tr> <tr> <td><font size="1">…</font></td> <td><font size="1">…</font></td> </tr> </tbody></table> <p>However, the problem with this “Height” metric is that although I have a measure, I still don’t really understand how “tall” they are.  This is because our idea of “tallness” is purely based on how their height relates to <em>other people’s</em> heights.  To use some five-syllable words, the values shows how tall they are <em>quantitatively</em>, but not <em>qualitatively</em>.  In order to understand an individual’s “tallness” we need another metric.</p> <p>The way we create this metric is to use <a title="Predixion Software" href="http://predixionsoftware.com" target="_blank">Predixion</a> Insight’s data preparation tools to <em>normalize</em> the Height data.  Normalization is a process in which numeric data is transformed so that values can be more easily compared.  Predixion Insight supports three types of normalization – Z-Score, Min-Max, and Log.  While Min-Max and Log normalization are fascinating topics in their own right, in this case we will be using Z-Score normalization.</p> <p>In short, Z-Score normalization transforms a value by first subtracting the average of all values and then dividing the result by the standard deviation of the values.  Assuming a normal bell-curve style distribution, the resulting value has some interesting properties.</p> <p>The chart below from wikipedia shows that if, after normalization, a value has a Z-Score of 0, the value is perfectly average, whereas if the value is 1, it is larger than 84.1% of all values.  Therefore, if a person has a Height that has a Z-Score of 1, we can confidently say that person is “tall”, and if the Z-Score of their Height is –1, they most definitely are “short.”</p> <p><a href="http://upload.wikimedia.org/wikipedia/commons/b/bb/Normal_distribution_and_scales.gif"><img alt="File:Normal distribution and scales.gif" src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bb/Normal_distribution_and_scales.gif/800px-Normal_distribution_and_scales.gif" width="425" height="246" /></a></p> <p>So walking through the normalization wizard in Predixion Insight is straightforward.  First you click on the Clean Data menu button in the Prepare Data chunk of the Insight Analytics ribbon.</p> <p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TUwxQ0IcpYI/AAAAAAAAAHY/uOGklCaUwK0/image15.png?imgmax=800" width="185" height="211" /> </p> <p>Next you read the <em>very</em> helpful and instructive text on the introductory page and advance to the data selection page where you choose that you want PowerPivot data and the SyncPatient table.</p> <p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TUwxRF6HykI/AAAAAAAAAHc/jZjxLVH2Tso/image16.png?imgmax=800" width="628" height="484" /> </p> <p>Then you choose which columns to normalize.  Good thing to remember for future use is that you can normalize <em>all</em>  the numeric columns in a table at once, which is a great time saver.  In this case, there’s only a single column that is useful to normalize, so I ensure that only the Height column in selected.</p> <p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TUwxRWcwFvI/AAAAAAAAAHg/k4wqM0X-yag/image22.png?imgmax=800" width="628" height="484" /> </p> <p>Finally, we choose “Normalization Options” which is the normalization type and some grouping options.  Grouping Options we will get back to, but you can select the different types of normalization to read the fun and informative descriptive text.  Alternatively, you can leave the defaults as they are and just click “Finish”.</p> <p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TUwxRivaowI/AAAAAAAAAHk/BfGo5pA7ezg/image28.png?imgmax=800" width="628" height="484" /> </p> <p>The effect of the wizard is to add a new “Height Normalized” column to your workbook with the following expression:</p> <p><font face="Courier New">=IF(ISBLANK('SyncPatient'[Height]), BLANK(), <br />   IFERROR(('SyncPatient'[Height]- <br />    AVERAGE('SyncPatient'[Height]))/SQRT( <br />     SUMX('SyncPatient', IF( <br />        ISBLANK('SyncPatient'[Height]),0, <br />          ('SyncPatient'[Height]- <br />           AVERAGE('SyncPatient'[Height]))^2)) <br />   /(COUNT('SyncPatient'[Height])-1)), 0))</font></p> <p>You might want to copy and paste that somewhere for safe-keeping so you can do a search and replace whenever you want to use it in PowerPivot, or you can just use Predixion Insight and let the wizard do it for you.</p> <p>Taking the table from above of patients and heights and adding the new “Height Normalized” column, you now have some <em>qualitative </em>information </p> <table border="0" cellspacing="0" cellpadding="0"><tbody> <tr> <td width="295"><font size="1">PatientGuid</font></td> <td width="48"><font size="1">Height</font></td> <td width="125"><font size="1">Height Normalized</font></td> </tr> <tr> <td><font size="1">{0003CBB3-40EF-493F-9906-B8EF9F7E7B11}</font></td> <td><font size="1">63.3</font></td> <td><font size="1">-0.7</font></td> </tr> <tr> <td><font size="1">{000D7634-7A8C-4CAD-895F-C241F19669DA}</font></td> <td><font size="1">60.0</font></td> <td><font size="1">-1.5</font></td> </tr> <tr> <td><font size="1">{0016F7D0-EABF-4D55-BEA7-B08F7F8567FD}</font></td> <td><font size="1">70.7</font></td> <td><font size="1">1.1</font></td> </tr> <tr> <td><font size="1">{002FC25B-A761-4039-8507-E6FE59B19833}</font></td> <td><font size="1">66.0</font></td> <td><font size="1">0.0</font></td> </tr> <tr> <td><font size="1">{0032D035-1A8C-40FC-B4A7-6A5B8DA93D06}</font></td> <td><font size="1">70.0</font></td> <td><font size="1">1.0</font></td> </tr> <tr> <td><font size="1">{00453E13-85AD-45B9-9D93-C5BCF0ABEA54}</font></td> <td><font size="1">56.6</font></td> <td><font size="1">-2.3</font></td> </tr> <tr> <td><font size="1">{005B6A30-C5CE-4588-B77B-2550174D6CE1}</font></td> <td><font size="1">66.0</font></td> <td><font size="1">0.0</font></td> </tr> <tr> <td><font size="1">…</font></td> <td><font size="1">…</font></td> <td><font size="1">…</font></td> </tr> </tbody></table> <p>You can now tell that the first two patients are “short” and the third is quite “tall”.  However, and you knew this was coming, and not just because of my effective foreshadowing above, the data can still play tricks on you.</p> <p>If I go and add an additional feature to the report – Gender – we see that there is more interpretation of the data to be done.  Basically we see that the normalized values are telling us that women are “short” (they have mostly negative Z-Scores) and men are “tall” (they have mostly positive Z-Scores).  Since our “business knowledge” of the world around us tells us “duh” we need to do a little more work.  Wouldn’t it be nice if we could normalize women separately from men?</p> <table border="0" cellspacing="0" cellpadding="0"><tbody> <tr> <td width="295"><font size="1">PatientGuid</font></td> <td width="70"><font size="1">Gender</font></td> <td width="48"><font size="1">Height</font></td> <td width="125"><font size="1">Height Normalized</font></td> </tr> <tr> <td><font size="1">{0003CBB3-40EF-493F-9906-B8EF9F7E7B11}</font></td> <td><font size="1">F</font></td> <td><font size="1">63.3</font></td> <td><font size="1">-0.7</font></td> </tr> <tr> <td><font size="1">{000D7634-7A8C-4CAD-895F-C241F19669DA}</font></td> <td><font size="1">F</font></td> <td><font size="1">60.0</font></td> <td><font size="1">-1.5</font></td> </tr> <tr> <td><font size="1">{0016F7D0-EABF-4D55-BEA7-B08F7F8567FD}</font></td> <td><font size="1">M</font></td> <td><font size="1">70.7</font></td> <td><font size="1">1.1</font></td> </tr> <tr> <td><font size="1">{002FC25B-A761-4039-8507-E6FE59B19833}</font></td> <td><font size="1">M</font></td> <td><font size="1">66.0</font></td> <td><font size="1">0.0</font></td> </tr> <tr> <td><font size="1">{0032D035-1A8C-40FC-B4A7-6A5B8DA93D06}</font></td> <td><font size="1">M</font></td> <td><font size="1">70.0</font></td> <td><font size="1">1.0</font></td> </tr> <tr> <td><font size="1">{00453E13-85AD-45B9-9D93-C5BCF0ABEA54}</font></td> <td><font size="1">F</font></td> <td><font size="1">56.6</font></td> <td><font size="1">-2.3</font></td> </tr> <tr> <td><font size="1">{005B6A30-C5CE-4588-B77B-2550174D6CE1}</font></td> <td><font size="1">M</font></td> <td><font size="1">66.0</font></td> <td><font size="1">0.0</font></td> </tr> </tbody></table> <p></p> <p>Well as we have it, we can.  Run through the Normalization Wizard just as above but stop on the Normalization Options page.  This time choose the Grouping Option “Group By” and select “Gender.”  This does just what we want.  Women’s heights will be normalized among women, and men’s against men.</p> <p><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TUwxSOIjn0I/AAAAAAAAAHo/k36AWSGfVaQ/image37.png?imgmax=800" width="628" height="484" /> </p> <p>Again, Predixion Insight updates the PowerPivot table by adding a calculated column – this time entitled “Height Normalized By Gender”.  I’m not going to put the expression here – it’s basically the same as the one above but with several CALCULATE and ALLEXCEPT expressions thrown in for good measure – besides, the curious can see the expression by downloading the resulting workbook and taking a look for yourself.</p> <p>This time the results look a little different</p> <table border="0" cellspacing="0" cellpadding="0"><tbody> <tr> <td width="125"><font size="1">PatientGuid</font></td> <td width="70"><font size="1">Gender</font></td> <td width="48"><font size="1">Height</font></td> <td width="125"><font size="1">Height Normalized</font></td> <td width="189"><font size="1">Height NormalizedByGender</font></td> </tr> <tr> <td><font size="1">{0003CBB3-…</font></td> <td><font size="1">F</font></td> <td><font size="1">63.3</font></td> <td><font size="1">-0.7</font></td> <td><font size="1">-0.1</font></td> </tr> <tr> <td><font size="1">{000D7634-…</font></td> <td><font size="1">F</font></td> <td><font size="1">60.0</font></td> <td><font size="1">-1.5</font></td> <td><font size="1">-1.3</font></td> </tr> <tr> <td><font size="1">{0016F7D0-…</font></td> <td><font size="1">M</font></td> <td><font size="1">70.7</font></td> <td><font size="1">1.1</font></td> <td><font size="1">0.5</font></td> </tr> <tr> <td><font size="1">{002FC25B- …</font></td> <td><font size="1">M</font></td> <td><font size="1">66.0</font></td> <td><font size="1">0.0</font></td> <td><font size="1">-1.0</font></td> </tr> <tr> <td><font size="1">{0032D035-…</font></td> <td><font size="1">M</font></td> <td><font size="1">70.0</font></td> <td><font size="1">1.0</font></td> <td><font size="1">0.3</font></td> </tr> <tr> <td><font size="1">{00453E13-…</font></td> <td><font size="1">F</font></td> <td><font size="1">56.6</font></td> <td><font size="1">-2.3</font></td> <td><font size="1">-2.5</font></td> </tr> <tr> <td><font size="1">{005B6A30-…</font></td> <td><font size="1">M</font></td> <td><font size="1">66.0</font></td> <td><font size="1">0.0</font></td> <td><font size="1">-1.0</font></td> </tr> <tr> <td>…</td> <td>…</td> <td>…</td> <td>…</td> <td>…</td> </tr> </tbody></table> <p></p> <p>This time we see that that first woman isn’t really all that short – just a hair under average, and the first man isn’t quite as tall as we thought.  With this information we now have an accurate understanding of the “tallness” of each individual that makes sense in terms of how we comprehend the real world.</p> <p>Hey – to make it even nicer, we can add some quick conditional formatting to the normalized by gender column to get a quick heatmap of gender relative tallness – something you simply could not do without this technique.</p> <p><a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TUwxSaAZR1I/AAAAAAAAAHs/pae3Y29rzTM/s1600-h/image%5B4%5D.png"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TUwxStI7EeI/AAAAAAAAAHw/A4H_9ldtNKA/image_thumb%5B1%5D.png?imgmax=800" width="558" height="283" /></a> </p> <p>Although this example is on a sample dataset, you can easily apply this technique to a variety of applications.  For example, if you were looking at mean time to failure of a common mechanical part, you could normalize by the device in which it was installed.  Or if you were interested in housing prices, you may normalize by zip code.  </p> <p>For your convenience, I put the end result of this experiment on <a href="http://cid-632f41683e4abc1a.office.live.com/self.aspx/.Public/Practice%20Fusion%20-%20Blog2.xlsx" target="_blank">SkyDrive</a> as well.,</p> <p>In case anyone is interested, my current “tallness” is between 0.1 and 0.2, so I guess I’m “tall-ish”, but not particularly so :).</p> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com0tag:blogger.com,1999:blog-2493137919500407537.post-65913126783356566782011-01-24T11:39:00.001-08:002011-01-25T09:43:56.283-08:00Cleaning Azure Marketplace Data with PowerPivot and Predixion Insight<a href="http://lh4.ggpht.com/_ESUcEp-EcL8/TT3VPLw2j4I/AAAAAAAAAE4/kLMR1EHx7pw/s1600-h/image%5B70%5D.png"><img alt="image" border="0" height="150" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TT3VPt0YCnI/AAAAAAAAAE8/yukvdz4U79Q/image_thumb%5B36%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="334" /></a><br />
Thanks to some connections at Microsoft I was made aware of some very interesting data that was published on the Windows Azure Data Marketplace by Practice Fusion – deidentified medical records of 5000 patients. Of course, I had to take a look. What I found was a very compelling and interesting set of data, that was more interesting when looked through the lens of <a href="http://predixionsoftware.com/" target="_blank" title="Predixion Software">Predixion</a> Insight. Of course, as with any data that you may get from anywhere, there are first some issues in the data itself that need to be “corrected” before used for any analysis.<br />
In this post I’ll show you how to get that data, explore a little bit of it, and clean up one particular part. There’s lots more to do, but this is just getting your feet wet.<br />
In order to follow along at home, you will need Excel 2010, PowerPivot (get <a href="http://www.powerpivot.com/download.aspx" target="_blank">here</a>), Predixion Insight for Excel (get <a href="http://www.predixionsoftware.com/predixion/products.aspx" target="_blank">here</a>).<br />
<h1>Step 1 – Loading the data</h1>After you have all that software installed and running on your machine, you will want to go to the <a href="https://datamarket.azure.com/dataset/8c30d4d3-0846-4b08-9991-104958f24ca7" target="_blank">Practice Fusion Data page</a> on Windows Azure and subscribe to their feed. If you haven’t subscribed to any the data market before, you will first have to agree to the MS terms of service to subscribe to the data market. After that you will have to agree to Practice Fusion’s terms of service and that will give you access to the data. Ignore the button there that allows you to load data directly into PowerPivot.<br />
The most efficient way to get all of the data is to go directly into PowerPivot, so launch Excel (or just create a new workbook since you probably already have Excel running anyway) and click on the PowerPivot button to get to the PowerPivot window.<br />
<a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TT3VP93mYKI/AAAAAAAAAFA/yhnduZBtRUI/s1600-h/image7.png"><img alt="Get External Data" border="0" height="93" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TT3VQb-hSkI/AAAAAAAAAFE/Va6D2kvUP7A/image_thumb3.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="Get External Data" width="297" /></a><br />
<div align="center"><strong><em>PowerPivot Ribbon Section</em></strong></div>Inside the PowerPivot window you will need to click the “From Azure Marketplace” button in the Get External Data chunk of the ribbon shown below. If you’re saying “Hey! I don’t have that button!” don’t be afraid. Microsoft quietly updated PowerPivot at least twice since it was released – go and download the latest release and try again – trust me – it will be there.<br />
The From Azure Marketplace Button launches the import wizard where you want to paste in the root URL for the data feed which is: <a href="https://api.datamarket.azure.com/Data.ashx/PracticeFusion/MedicalResearchData/">https://api.datamarket.azure.com/Data.ashx/PracticeFusion/MedicalResearchData/</a><br />
Also you will need to enter your account key, which, up to this point, you probably didn’t know you have. However clicking on the Find button will launch a web page displaying your unreadable key that you can copy and paste here.<br />
<a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TT3VQiGNNMI/AAAAAAAAAFI/21eVMh77INw/s1600-h/image15.png"><img alt="Specifying Data Feed" border="0" height="484" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TT3VRNiJUrI/AAAAAAAAAFM/Tsu89uvK7ss/image_thumb7.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="Specifying Data Feed" width="473" /></a><br />
<div align="center"><strong><em>Table Import Wizard – Specifying the Data Feed</em></strong></div><div align="left">After that, select all the tables and click finish. Loading the data from the marketplace takes a little longer than you would expect – not “come back tomorrow” kind of long, but definitely the “I can’t be late for my daughter’s junior high school recital and if I take the laptop, I’ll lose my wireless connection” kind of long. So click finish and go to lunch, or get a coffee, or catch up on reading the Predixion Insight documentation or something.</div><a href="http://lh4.ggpht.com/_ESUcEp-EcL8/TT3VRdT94YI/AAAAAAAAAFQ/tIeadOAGAzM/s1600-h/image.png"><img alt="Selecting Tables" border="0" height="484" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TT3VR5j0BuI/AAAAAAAAAFU/BQj4sPJLBiQ/image_thumb.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="Selecting Tables" width="473" /></a><br />
<div align="center"><strong><em>Table Import Wizard – Selecting all the Practice Fusion Tables</em></strong></div><h1>Step 2 – Initial Data Exploration</h1>One of the somewhat interesting things about loading data from an unknown source is that you have absolutely no idea what you’re looking at. This is where you can leverage the awesomeness of Predixion Insight’s Profile Data tool. This tool provides a quick summary of the data in an Excel worksheet or PowerPivot table so you can at a very minimum determine if the data makes <em>any sense at all</em>.<br />
<div align="center"><a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TT3VSIqtirI/AAAAAAAAAFY/rIWcNr6f0oM/s1600-h/image%5B3%5D.png"><img alt="image" border="0" height="155" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TT3VSWDwcOI/AAAAAAAAAFc/8us-5ZqYU5Y/image_thumb%5B1%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="244" /></a><em><strong>Profile Data Command in Predixion Insight</strong></em> </div>To run Profile Data, switch back to the Excel interface, click on the <strong>Insight Analytics</strong> ribbon and choose <strong>Profile Data</strong> under the <strong>Explore Data</strong> menu button. In the Profile Data Wizard, you just need to select that you are choosing PowerPivot and the source of the data and which table you want – in this case, let’s use the SyncChart table which documents the basic patient measurements over a series of doctor’s visits.<br />
<div align="center"><a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TT3VSsXjc3I/AAAAAAAAAFg/WeCkzZC74rQ/s1600-h/image%5B16%5D.png"><img alt="Profile Data Select Data" border="0" height="488" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TT3VTlK-8rI/AAAAAAAAAFk/sCLcZe5EaQY/image_thumb%5B8%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="Profile Data Select Data" width="632" /></a><em><strong>Profile Data Select Data Page</strong></em></div><div align="left">Next you choose the type of statistics – I really like using “advanced” statistics because it makes me feel smarter ;), but for initial exploration the default “basic” stats will suffice, so stick with the default and just click “Finish”</div><a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TT3VUGX_BgI/AAAAAAAAAFo/9svKwiQ_4KQ/s1600-h/image%5B18%5D.png"><img alt="Profile Data Options" border="0" height="488" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TT3VUbH2TvI/AAAAAAAAAFs/4THMzsmiVfc/image_thumb%5B10%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="Profile Data Options" width="632" /></a> <br />
<div align="center"><strong><em>Profile Data Options</em></strong></div><div align="left">As a result, Predixion Insight will scan the table and produce a report on all of the columns in the PowerPivot table, as well as a handy index sheet to easily find the report in your workbook (which at this point should contain the report, the index, and three blank sheets, so it’s no biggie, but <em>you know</em> it will get more populated as you go on…). I can’t recommend strongly enough that, for your own sanity, you go ahead and run <strong>Profile Data</strong> on all of the tables in PowerPivot (see, that index will come in handy very quickly….)</div><div align="left">In any case, let’s have a look at what Profile Data discovered.</div><a href="http://lh4.ggpht.com/_ESUcEp-EcL8/TT3VU8BHFwI/AAAAAAAAAFw/EIaAP4qBkyw/s1600-h/image%5B19%5D.png"><img alt="image" border="0" height="319" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TT3VVDEgkkI/AAAAAAAAAF0/Z1LR5Tfoj7A/image_thumb%5B11%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="644" /></a> <br />
<div align="center"><strong><em>Results of Profile Data on the SyncChart table</em></strong></div><div align="left">First of all, Profile data did us a great service by simply telling us how many rows and columns are in the table – since we just randomly (well, not quite) pulled some data from the internet we had no idea what was in there, Profile Data let us know that we have 59,193 rows and 12 columns, and that the column <em>ChartGuid</em> is likely a key since it uniquely identifies each row (i.e, there are 59,193 unique values of <em>ChartGuid</em>).</div><div align="left">Next the Profile Data report breaks the results down into a <em>Continuous</em> section and a <em>Discrete</em> section. The Continuous section contains information on columns that contain numerical data. The Discrete section contains information on columns that contain non-numerical data, or numerical data that “look like” they could be considered categorical, such as a column that contains just 0’s and 1’s.</div><div align="left">One thing that stands out fairly immediately is that there is a lot of missing data in this table. All the continuous columns, save <em>VisitYear, </em>have over 23,000 blanks, and the <em>HeartRate</em> column is completely empty – i.e. it has 59,193 blanks in 59,193 rows.</div><div align="left">Feel free to examine the resulting report for interesting pieces of information but for expediency’s sake, I want to point out some data that <strong>just looks wrong</strong>. Take a look at the ranges of values for Height, Weight, and BMI</div><div align="center"><a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TT3VVk0TLNI/AAAAAAAAAF4/-t1B1F-8l8Q/s1600-h/image%5B22%5D.png"><img alt="image" border="0" height="66" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TT3VV0b6XFI/AAAAAAAAAF8/v1Vub9o6eso/image_thumb%5B12%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="244" /></a> </div><div align="center"><strong><em>Focus on Height, Weight, BMI</em></strong></div><div align="left">Now, not having any background in the data, we don’t know what units height and weight are measured in, but regardless of the units, the ranges are simply too extreme – given inches, feet, or centimeters, you simply don’t have the range of heights from 5 to 120 in human existence! Similarly with weight! BMI is unitless, but still with the range from 0.07 to 5511 you have quite the unnatural <a href="http://en.wikipedia.org/wiki/Jack_Sprat" target="_blank">Jack Sprat</a> situation going on here.</div><div align="left">Looking at the results for the height column a little further indicates that there is a mean of 65 and standard deviation of about 5. This would indicate (to me, at least) that the intended units are inches, since the average height of humans is around 65 inches and not 65 centimeters or 65 feet or miles or whatever. So I’ll go ahead and decide that it’s inches. It would make sense that the weight would be in pounds since the height measurement is not using the metric system either, but looking at the average weight of 180 indicates that the value, while still high (this is America, anyway) isn’t likely to be kilograms or stone or other values.</div><div align="left">Still, we need to figure out what is going on with those other values, so let’s move on to Step 3.</div><h1>Step 3 – Cleaning up the data</h1>Cleaning up data like this is never automatic – you need to use tools and apply your judgment. Just like we were able to use the Profile Data tool to help us understand the nature and deduce the units of the data, we need to apply the same common sense treatment to cleaning the data.<br />
The first question to ask is what do you really want to do with the erroneous data? Since these are medical records recording actual values of patients, the safest thing to do may be simply to null out the data for values that are out of range. However, since there are several rows for each patient, and people (once fully grown) generally don’t change in height too much, it may be OK to simply take a height measurement from another chart reading for the same patient. It really depends on how you’re going to use the data, so I’ll perform both methods.<br />
Let’s take the “whack the bad data” approach first. To do this, I’m going to use the <strong>Outliers </strong>tool from Predixion Insight which is under the Clean Data menu button on the Insight Analytics ribbon.<br />
<div align="center"><a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TT3VWJwAHQI/AAAAAAAAAGA/rGqAYl1CM84/s1600-h/image%5B33%5D.png"><img alt="image" border="0" height="209" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TT3VWQ_82JI/AAAAAAAAAGE/l5hnvr3zzKE/image_thumb%5B17%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="133" /></a> </div><div align="center"><strong><em>Clean Data/Outliers</em></strong></div>Selecting this tool brings up the Remove Outliers ribbon which allows you to truncate the extreme ranges of numerical values or remove categorical values that occur infrequently (or even too frequently) from your data. First you need to select that you are interested in the PowerPivot table called SyncChart (as above) and then you select the column from which you want to remove the outliers. You can select via the drop down or by clicking on the column preview.<br />
<a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TT3VXBhp5VI/AAAAAAAAAGI/cx2mw400h-c/s1600-h/image%5B37%5D.png"><img alt="image" border="0" height="488" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TT3VXQr-q4I/AAAAAAAAAGM/38zLZHYZ9cM/image_thumb%5B19%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="632" /></a> <br />
<div align="center"><em><strong>Outliers Column Selection</strong></em></div>Next this brings up a chart of the values in the Height column. The higher the curve the more values are in that range. Personally, I like to increase the resolution of the chart to the maximum (100) so I can get the best view of the data – this sometimes doesn’t work well if there are a small number of data points, but this data set is sufficient. As you can see most of the data lies in the middle of the range with extreme values tapering off quickly. You can mouse over any point on the cart to see the range that point represents and how many rows have values in that range.<br />
<a href="http://lh4.ggpht.com/_ESUcEp-EcL8/TT3VXhHQk_I/AAAAAAAAAGQ/DZ7NecLE0T0/s1600-h/image%5B41%5D.png"><img alt="image" border="0" height="488" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TT3VYMZCsAI/AAAAAAAAAGU/RjEU8sMLJQ8/image_thumb%5B21%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="632" /></a> <br />
<div align="center"><em><strong>Remove Outliers Chart</strong></em></div>To remove the outliers you simply drag the thumbs on the slider above the chart to whatever cutoff point you wish, or type in the limits into the text boxes directly. When you click next you get to the “what do I do with this?” screen. Again, the answer really depends on your application. For some problem spaces, it may be OK to set the extreme values to the limits or to the mean. In this case, we’re simply going to null out the value – the default option – and click next.<br />
<a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TT3VYctup7I/AAAAAAAAAGc/Jv-cIkEVYTc/s1600-h/clip_image002%5B4%5D.jpg"><img alt="Remove Outliers Options" border="0" height="488" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TT3VY4XRZ2I/AAAAAAAAAGg/tZouJwBkqBg/clip_image002_thumb%5B1%5D.jpg?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="Remove Outliers Options" width="632" /></a><br />
<div align="center"><strong><em>What do we do with a drunken data entry operator?</em></strong></div>Which leads us to the final screen - name the result column – it’s good to pick a descriptive name. Finishing the wizard creates a calculated column in PowerPivot that nulls out any data that’s out of range. This has an added benefit in that if you refresh the table with new data, any new out of range data will automatically be set to null.<br />
<a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TT3VZadyXhI/AAAAAAAAAGk/YpxmdDtigf4/s1600-h/image%5B49%5D.png"><img alt="Select Destination" border="0" height="488" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TT3VZoaX3MI/AAAAAAAAAGo/xUA-BvlWXdY/image_thumb%5B25%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="Select Destination" width="632" /></a> <br />
<div align="center"><strong><em>Finishing the Remove Outlier Wizard</em></strong></div>Inspecting PowerPivot, we see that Predixion Insight injected the following DAX expression:<br />
<span style="font-family: Courier New; font-size: x-small;">=IF(AND('SyncChart'[Height]>= 48.99, 'SyncChart'[Height] <br />
<85.28),'SyncChart'[Height],BLANK())</span><br />
This expression simply copies the in-range values and replaces the out of range values with blanks.<br />
Before looking at the results, I want to try the other method of removing outliers I proposed – using the average of a patient’s other visits. To do this I used the following custom DAX expression:<br />
<span style="font-family: Courier New; font-size: x-small;"> CALCULATE(AVERAGEX(SyncChart, SyncChart[Height]), <br />
ALLEXCEPT(SyncChart, SyncChart[PatientGuid]), <br />
SyncChart[Height]>48.99, <br />
SyncChart[Height]<85.28)</span><br />
This expression basically says, calculate the average of the Height column by first removing all filters except for a filter on the current rows PatientGuid Column, and also exclude any rows where the Height is out of range. I copied the IF expression created by Predixion Insight and replaced the part that says “BLANK()” with this custom expression. I renamed the new column “Height with Outliers Replaced”<br />
The next thing I want to do before looking further is calculating the BMI. The BMI was included with the data, but is really a derived column. The formula for BMI using inches and pounds is BMI=Weight * 703/Height^2. In order to protect against divide by zero errors, I put an IF in the DAX expression to check for blanks and created two columns like this:<br />
<span style="font-family: courier ne; font-size: x-small;"> =IF(ISBLANK(SyncChart[Height with Outliers Removed]), <br />
BLANK(), <br />
SyncChart[Weight]*703/POWER(SyncChart[Height with Outliers Removed],2))</span><br />
(Obviously one had the Outliers Replaced version of the column).<br />
To see the impact of the results, I re-ran Profile Data against the SyncChart table with the new columns. Here are the results:<br />
<a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TT3VZ-kVBhI/AAAAAAAAAGs/Tr1dDHysl1Y/s1600-h/image%5B53%5D.png"><img alt="Profile Data Results" border="0" height="105" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TT3VaIG5sbI/AAAAAAAAAGw/Wm0DIJvgp_0/image_thumb%5B27%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="Profile Data Results" width="644" /></a> <br />
The first thing I noticed is that the Height with Outliers Replaced column has no blanks – this means two things – one, that every patient has at least one doctor’s visit where they have a height measurement, and two, I should have checked for ISBLANK in my expression to calculate that Height value!<br />
The other things to notice is that the fixed height values are much more in line with reality - we’ve eliminated the pixies and the hill giants (frost giants are much taller). Also, the recomputed BMI values are starting to be more realistic as well. In fact we reduced the maximum BMI from over 5500 to 250 by only removing 317 values from 59193 rows – about 1/2%! <br />
To complete the exercise and ensure that the BMI values were reasonable we would have to clean the Weight column as well, which we do by running through the Remove Outliers wizard again. However, there’s actually one more trick. When looking at the outliers for the Weight column, we get a chart that looks like this:<br />
<a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TT3VaXGWV2I/AAAAAAAAAG0/VzEE7GE_vBc/s1600-h/image%5B57%5D.png"><img alt="Specify Thresholds" border="0" height="488" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TT3VarccVPI/AAAAAAAAAG4/mR6ENkitFyQ/image_thumb%5B29%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="Specify Thresholds" width="632" /></a> <br />
This occurs when you have a range of outliers that includes extremely large outliers that obscures the resolution of the data. To get a better view of the data, you can click the logarithm button – circled below – and get a much more usable visual – I also increased the resolution:<br />
<a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TT3VgSSPXRI/AAAAAAAAAG8/U3jKqJxzPEE/s1600-h/image%5B61%5D.png"><img alt="Specify Thresholds Log" border="0" height="488" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TT3VgjfiLaI/AAAAAAAAAHA/qfGdl0tkN74/image_thumb%5B31%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="Specify Thresholds Log" width="632" /></a> <br />
I removed the extreme outliers (rather arbitrarily choosing a range of about 33 – 510 lbs, which is still probably pretty extreme) replacing them with nulls. I didn’t try to do the “replace with other visits” trick, since people’s weight can fluctuate quote a bit from visit to visit.<br />
Creating a new BMI column and re-running the Profile Data gave me results like this (other rows hidden):<br />
<a href="http://lh4.ggpht.com/_ESUcEp-EcL8/TT3VhW_d7qI/AAAAAAAAAHE/FnhfPurxxJE/s1600-h/image%5B65%5D.png"><img alt="Profile Data" border="0" height="76" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TT3VhygKGGI/AAAAAAAAAHI/Ts-MoFng7V4/image_thumb%5B33%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="Profile Data" width="644" /></a> <br />
You can see the range of BMI values went from an original 5511 to a more reasonable 78. It’s very likely (pretty much guaranteed) that I could have more tightly truncated both the height and weight variables to make them more inline with reality, but this is a good start. <br />
Some deeper inspection of the data may be necessary to find the correct boundaries, and of course it always depends on how you are going to use the data. For predictive purposes, it generally is better to eliminate extreme values even if they <em>are</em> valid since they can skew the results. For traditional BI reporting – e.g. “how many visits were by people over 9 feet tall” – you generally want to keep any valid value, no matter how extreme. In either case, however, you do want to eliminate <em>invalid</em> values. Luckily with this data set, by combining common sense with the tools at hand we’re able to carve away at the bad data to make the remaining data useful for analysis.<br />
That just may happen in a future post… ;)Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com22tag:blogger.com,1999:blog-2493137919500407537.post-59353950299753959122010-12-09T17:01:00.001-08:002010-12-09T17:01:14.567-08:00Predixion Insight Update<p><img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="image" border="0" alt="image" align="left" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TQF7yqkKLtI/AAAAAAAAAD4/CKtcbT9VR4Q/image%5B7%5D.png?imgmax=800" width="68" height="300" /></p> <p> </p> <p>We just launched a new, incremental version of <a title="Predixion Software" href="http://predixionsoftware.com" target="_blank">Predixion</a> Insight with a whole bucket load of new features – some subtle, some more obvious and in your face.  Actually – many you shouldn’t even notice – just have a feeling of the product being “better” all around.</p> <p>In this posting I’ll just give a quick overview of some of the new additions and later I’ll get around to providing details on all the goodies.  It’s better if you just try them our for yourself anyway – if you are a current user, just log in and you’ll be prompted to update, and if not <a href="http://www.predixionsoftware.com/predixion/products.aspx" target="_blank">what are you waiting for</a>?  (One great new “feature” is that we’ve extended the free trial to 30-days, to give you ample time to check everything out!)</p> <p>So I’m going to tell you about new features in my trademark stream of consciousness – no particular order kind of way.</p> <p><strong>Normalization</strong></p> <p><a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TQF7ywYl6JI/AAAAAAAAAD8/OBnxMEQ7Zug/s1600-h/image%5B17%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="image" border="0" alt="image" align="left" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TQF7zVXKnvI/AAAAAAAAAEA/T8JRcPlHmso/image_thumb%5B10%5D.png?imgmax=800" width="138" height="211" /></a> </p> <p>The Normalization allows you to normalize data in Excel or PowerPivot by Z-Score, Min-Max, or Logarithm.  It’s actually really cool because you can normalize <em>all</em> the data at the same time instead of column per column.    Also, if you’re normalizing PowerPivot data, you can <em>conditionally</em> normalize so you can get Z-scores by group.  Way cool.</p> <p> </p> <p> </p> <p> </p> <p><strong>Explore Data</strong></p> <p><a href="http://lh4.ggpht.com/_ESUcEp-EcL8/TQF7zj0IHdI/AAAAAAAAAEE/E-JI9ZY1ffk/s1600-h/image%5B21%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="image" border="0" alt="image" align="left" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TQF7z4dISCI/AAAAAAAAAEI/FOrerjLa9Xw/image_thumb%5B12%5D.png?imgmax=800" width="170" height="172" /></a> </p> <p>We added additional data exploration options so the “Explore Data” button became the “Explore Data” menu and the previous “Explore Data Wizard” became the “Explore Column Wizard”.</p> <p>Profile Data provides descriptive statistics about your Excel or PowerPivot data.  I won’t say too much about it here since Bogdan already wrote up an <a href="http://www.bogdancrivat.net/dm/archives/76" target="_blank">excellent post</a> just on this feature!  </p> <p>Also, Explore Column has an added button to allow you to look at your data (and bin it!) in log space.  All your data scrunched up on the side?  Click the “log” button and see it spread out with a nicer, friendlier distribution.  This button also shows up in the Clean Data/Outliers Wizard as well.</p> <p><strong>Getting Started</strong></p> <p><a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TQF70JpzTnI/AAAAAAAAAEM/2yLUo8aMjQ8/s1600-h/image%5B25%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="image" border="0" alt="image" align="left" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TQF70k4RzZI/AAAAAAAAAEQ/rAB3GcpVq_g/image_thumb%5B14%5D.png?imgmax=800" width="244" height="160" /></a> </p> <p>I guess I really should have put this one first, despite this being stream of consciousness.  Anyway, noting that customers were having a hard time finding our sample data, <a href="http://www.predixionsoftware.com/predixion/Help" target="_blank">help</a> and our <a href="http://www.predixionsoftware.com/predixion/Support.aspx" target="_blank">support forums</a>, among other things, we added a handy (and pretty) Getting Started page.</p> <p> </p> <p><strong>PMML Support</strong></p> <p><a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TQF703XQ6GI/AAAAAAAAAEU/HFefymek5CU/s1600-h/image%5B29%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="image" border="0" alt="image" align="left" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TQF71CYxYWI/AAAAAAAAAEY/T6zl3xfhc-8/image_thumb%5B16%5D.png?imgmax=800" width="129" height="60" /></a> </p> <p>In the Manage My Stuff dialog you can import models from SAS, SPSS, R, and any other PMML source.  You can then use all of the model validation tools and the query wizard to score and validate against data in Excel and PowerPivot.  Way to leverage your extend investment in predictive analytics tools to the Excel BI user base!</p> <p><strong>Insight Index and Insight Log</strong></p> <p><a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TQF71BtZFkI/AAAAAAAAAEc/FwsNZvf8TJs/s1600-h/image%5B34%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="image" border="0" alt="image" align="left" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TQF71R80acI/AAAAAAAAAEg/BJ4ADPKwQBA/image_thumb%5B23%5D.png?imgmax=800" width="47" height="52" /></a> The Insight Index is an automatically created guide to all of the predictive insights and results you create with Predixion Insight.  This guide provides descriptions of every generated report along with links to each report worksheet that are automatically updated when you rename worksheets and change report titles.  The Insight Log automatically maintains a trace of all Predixion Insight operations that generate Visual Macros.  These operations can be re-executed individually or copied to additional worksheets to easily create custom predictive workflows.  Both of these features can be turned on or off in the options dialog.</p> <p><strong>VBA Connexion</strong></p> <p>Predixion Insight now supports VBA programming interfaces allowing you to create custom predictive applications inside Excel.  It’s like super hard – for example, look at this excerpt I wrote for a demo that applies new data to an existing time series model so you can forecast off of a short series:</p> <p><a href="http://lh6.ggpht.com/_ESUcEp-EcL8/TQF71kU6ZtI/AAAAAAAAAEk/FjmQRewrQNM/s1600-h/image%5B38%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TQF71wkkw7I/AAAAAAAAAEo/WkKGriQYWMs/image_thumb%5B25%5D.png?imgmax=800" width="593" height="58" /></a> </p> <p>Oh, wait – it’s not hard – it’s easy!</p> <p><strong>Visual Macros</strong></p> <p><a href="http://lh5.ggpht.com/_ESUcEp-EcL8/TQF72HXrCqI/AAAAAAAAAEs/Jg_fTHaUnf0/s1600-h/image%5B42%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="image" border="0" alt="image" align="left" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TQF72R049zI/AAAAAAAAAEw/OOwa4BcI2Hc/image_thumb%5B27%5D.png?imgmax=800" width="44" height="69" /></a> </p> <p>While we’re on the topic of Visual Macros, I should mention that Visual Macros now support all of the Insight Now tasks as well as the Insight Analytics tasks.  This means that you can easily encapsulate any operation we perform on the server in a macro and use it to string together your own workflows.</p> <p><strong>Other stuff working better</strong></p> <p>There are always little things here and there that are better as well that you don’t notice until you get there – the Query Wizard is cleaner and easier to use.  Exceptions – even ones caused by data entry – give you an immediate way to provide feedback directly to the dev team.  And I can’t even begin to talk about how much better our <a href="http://predixionsoftware.com" target="_blank">website</a> is!  (Mostly because I’m out of time to write this post – HA!)</p> <p>Really – try it out and let me know what you think.  If you’ve tried it before go check out what’s new – if not, now is a wonderful time to do so, it’s lookin’ pretty good.</p> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com0tag:blogger.com,1999:blog-2493137919500407537.post-16542810664634314972010-09-14T13:00:00.001-07:002010-09-14T13:00:40.185-07:00And launched! (plus some secrets)<p>Yesterday we officially launched our first  offering from  <a title="Predixion Software" href="http://predixionsoftware.com" target="_blank">Predixion</a> Software – titled “Predixion Insight”.  For those who haven’t been following along, Predixion Insight is a cloud predictive analytics service access through an embedded Exce<img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="yippee" border="0" alt="yippee" align="right" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TI_UYQ73hEI/AAAAAAAAADg/Pr3U6z8qH8Q/yippee%5B15%5D.png?imgmax=800" width="151" height="240" />l client (Predixion Insight for Excel) that has absolutely no infrastructure or procurement friction and works with Excel 2007 and botht he 32-bit and 64-bit versions on Microsoft Excel.  Oh – and it’s also directly integrated with Microsoft’s new PowerPivot offering allowing powerful analytics, business modeling and now, predictive analytics, right in the Excel working environment.</p> <p>We actually closed down the beta and turned on Insight’s lights a week ago today.  Coming from a “packaged software” background it was interesting that shipping software in a cloud environment all boils down to switching a DNS entry on GoDaddy.com and we’re up and running.  We actually had one customer forego the free trial and become our first paying customer on day one!  A lot of our beta customers came back to take advantage of the free trial (<a href="http://www.predixionsoftware.com/products" target="_blank">sign up here</a> if you haven’t already) and our service has been humming along making predictive magic for everyone quite nicely.  I do thank all of our beta customers for helping out and finding issues based on configurations and connection performance that would have been difficult for us to find out on our own – we also managed to incorporate quite a bit of feedback from the beta into the product.</p> <p>On Friday we celebrated the launch in the dev offices with a toast of <a href="http://www.pyratrum.com/" target="_blank">Pyrat Rum</a> and then heads down to keep the Predixion train going.  Personally I’ve been busy with Simon, our CEO, demonstrating the product to industry analysts – 20 so far, and many more to come – and getting great feedback and some exciting responses (first published notes <a href="http://www.enterprisemanagement.com/research/asset.php?id=1842" target="_blank">here</a>, <a href="http://jtonedm.com/2010/09/14/first-look-predixion" target="_blank">here</a>, and <a href="http://www.wiseanalytics.com/blog/2010/09/14/changing-the-predictive-analytics-game/" target="_blank">here</a>), while designing elements of our Enterprise offering as well as some exciting incremental functionality.</p> <p>One interesting thing about any product release is always what is the <em>last</em> feature to make the cut.  What is that last piece of functionality that you just need to put in or that you want so badly that it gets in no matter what.  In our case it’s all in the task pane of Predixion Insight for Excel – the task pane was definitely the runaway surprise success story for the beta – one participant even responded “I wish all software worked that way.”  Given how happy people were with this feature, and we always wanted to add a little more functionality, we made sure that search and filter made it into v1.</p> <p><a href="http://lh4.ggpht.com/_ESUcEp-EcL8/TI_UYwbpqGI/AAAAAAAAADk/-WdURARERIw/s1600-h/image%5B9%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TI_UZEqLgqI/AAAAAAAAADo/jrOvIzwHueQ/image_thumb%5B4%5D.png?imgmax=800" width="244" height="240" /></a> With this feature you can type arbitrary text and Predixion Insight for Excel will search all fields (including extended info) of each task and only display the results, or you can filter based on task type or items that are expiring soon.  Neat, huh?  Ok, maybe just “ok”.  Anyway, since you’ve already downloaded, installed the software, and signed up for a free trial, you may have noticed that if you select one of those filtering options it actually places something in the search box like this:<a href="http://lh3.ggpht.com/_ESUcEp-EcL8/TI_UZeh2jLI/AAAAAAAAADs/hE7ZlLgvD-M/s1600-h/image%5B13%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TI_UZt2olKI/AAAAAAAAADw/3tEMdYGFAzQ/image_thumb%5B6%5D.png?imgmax=800" width="183" height="32" /></a>This is <strong>totally</strong> meant to imply that there can be other things you could “type” into the search box to filter your tasks, and as of today, this is the only place you’ll ever find out about what they are.  Search tags usage is simply “tag:value”  if the engine finds the “tag:” starting the search string it uses it – plain and simple – it’s not even case-sensitive.  The super-secret Predixion Insight for Excel search tags are:</p> <table border="1" cellspacing="0" cellpadding="2" width="595"><tbody> <tr> <td valign="top" width="102">Tag</td> <td valign="top" width="113">Value</td> <td valign="top" width="270">Description</td> <td valign="top" width="108">Example</td> </tr> <tr> <td valign="top" width="102">tasktype</td> <td valign="top" width="113">InsightNow <br />Modeling <br />Test <br />Query <br />Management <br /></td> <td valign="top" width="270">Returns tasks of the specified task</td> <td valign="top" width="108">tasktype:Query <br /></td> </tr> <tr> <td valign="top" width="102">expires</td> <td valign="top" width="113">integer hours</td> <td valign="top" width="270">Returns tasks expiring within the specified number of hours</td> <td valign="top" width="108">expires:48 <br /></td> </tr> <tr> <td valign="top" width="102">created</td> <td valign="top" width="113">integer hours</td> <td valign="top" width="270">Returns tasks created within the specified number of hours</td> <td valign="top" width="108">created:2 <br /></td> </tr> <tr> <td valign="top" width="102">duration</td> <td valign="top" width="116">integer minutes</td> <td valign="top" width="270">Returns tasks with a duration less than the specified minutes</td> <td valign="top" width="108">duration:1</td> </tr> <tr> <td valign="top" width="102">completed</td> <td valign="top" width="116">(optional) <br />true <br />false</td> <td valign="top" width="270">Returns tasks that have completed, or not</td> <td valign="top" width="108">completed:</td> </tr> <tr> <td valign="top" width="102">results</td> <td valign="top" width="116">(optional) <br />true <br />false</td> <td valign="top" width="270">Returns tasks that have results to download, or not</td> <td valign="top" width="108">results:</td> </tr> <tr> <td valign="top" width="102">status</td> <td valign="top" width="116">succeeded <br />failed <br />pending</td> <td valign="top" width="270">Returns tasks with the specified status</td> <td valign="top" width="108">status:pending</td> </tr> <tr> <td valign="top" width="102">tag</td> <td valign="top" width="116">any string</td> <td valign="top" width="270">Searched only the “tag” field of the task – i.e. the large bolded text</td> <td valign="top" width="108">tag:Profit Chart</td> </tr> </tbody></table> <p>You can add a bang (!) to the beginning of the tag to return any results not specified by the filter – for example you can get all the tasks created in the last hour by using the string “created:1”, and you can return all the tasks <em>not</em> created in the last hour by using the string “!created:1” – very handy indeed.</p> <p>Oh, and just for a bonus, you can edit the “tag” (large bolded text) of any task for  your own personal edification – that is, you can change “Profit Chart” into “My Superbad Market Mayhem Chart” and then find it with the search “<strong>tag:Superbad</strong>”</p> <p>Anyway, that’s what is there for now – who knows what may come as Predixion Insight grows…..</p> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com1tag:blogger.com,1999:blog-2493137919500407537.post-45249502007967172682010-08-18T22:05:00.001-07:002010-08-18T22:05:36.334-07:00Closer to liftoff…<p>This week we officially launched our public beta, and it<img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="image" border="0" alt="image" align="right" src="http://lh3.ggpht.com/_ESUcEp-EcL8/TGy7nZRRUcI/AAAAAAAAADE/Rccwe1hgx8E/image%5B17%5D.png?imgmax=800" width="98" height="79" /> was one of those moments where you’ve pushed so hard running on adrenaline that when you’ve reached that summit you collapse because you can finally sleep a good sleep – if only for a moment.  With the beta launch we have people from around the world enjoying predictive analytics in the cloud via <a title="Predixion Software" href="http://predixionsoftware.com" target="_blank">Predixion</a> Insight.  It’s exciting watching from behind the scenes as customers launch asynchronous predictive tasks ranging in size from a few kilobytes to 100 megs.  The machinations of a Rube Goldberg contraption comes to mind as the pieces of the system coordinate – a user presses a button causing their data to be launched to the <img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="image" border="0" alt="image" align="left" src="http://lh6.ggpht.com/_ESUcEp-EcL8/TGy7nrVTmPI/AAAAAAAAADI/YSD7tAzf1Uc/image%5B16%5D.png?imgmax=800" width="128" height="79" /> cloud while simultaneously they are automatically provisioned across an array of servers.  The data shuttled seamlessly and invisibly between tasks on their behalf being dissected and analyzed before being dropped into a predictive report right back on their desktop.  </p> <p>The movement from development to beta deployment is really wonderful for me personally.  If you haven’t yet seen it, go to our <a href="http://predixionsoftware.com" target="_blank">website</a> and click on play video to get an overview of the company.  If there’s one word I can say about that piece of “marketing,” it is that it is <strong>sincere</strong>.   Go ahead – go watch it.  This <u>is</u> something I’ve been working toward a long time.  We’re releasing a version 1 product and we have a lot to do to fully reach our goals, but <strong>right now</strong>, any user, anywhere, can access powerful<img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="image" border="0" alt="image" align="right" src="http://lh4.ggpht.com/_ESUcEp-EcL8/TGy7n93kCNI/AAAAAAAAADM/eE3r1r8UB-I/image%5B22%5D.png?imgmax=800" width="132" height="80" /> easy-to-use predictive analytics without having to jump through hoops for procurement, acquisition, installation, management, etc. etc. etc.   By creating an Excel-native, subscription-based predictive service, we’re taking the traditional barriers barring people from even opening the door to predictive analytics and slashing them to the ground.   </p> <p>So I was going to write a longer post explaining some more details about the product, but you should try it <strong>now </strong>(and anyway, Bogdan already wrote a <a href="http://www.bogdancrivat.net/dm/archives/65" target="_blank">great post</a> with some feature details)<strong>  </strong>You can watch a demo that gives a lightning fast overview of the product <a href="http://www.predixionsoftware.com/media/predixioninsightintroanddemo/predixioninsightintroanddemo.htm" target="_blank">here</a> and then go download and enroll for the <a href="http://www.predixionsoftware.com/products" target="_blank">“free trial” beta</a>.  I have it on good authority that there may be some interesting beta events for accomplished users, and you still have the opportunity to get in early, so don’t wait!</p> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com2tag:blogger.com,1999:blog-2493137919500407537.post-45698322313634972312010-08-01T14:15:00.001-07:002010-08-01T14:48:21.748-07:00Predixion on the brink….<p>   I officially started my career as “founding CTO” with Predixion on January 6, 2010, and now, just 7 months later we are on the brink of launching the VIP beta of our new product and service, Predixion Insight, on August 2.  With<img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="LG1" border="0" alt="LG1" align="left" src="http://lh5.ggpht.com/_ESUcEp-EcL8/TFXlqgCdmUI/AAAAAAAAADA/frgC0eNIEbg/LG1%5B32%5D.png?imgmax=800" width="183" height="240" /> a development team of only 5 people we’ve created, what I think, is a truly disruptive entry in the predictive analytics space, and we’re just getting started.</p> <p>It’s been a very exciting time – meeting the co-founders of Predixion, deciding to venture off from Microsoft to start something new based on the ideas developed over the past several years, recruiting the best development team you could ask for, filming corporate videos at my house, meeting with customers, partners, and venture capitalists – there hasn’t been a boring day yet!</p> <p>This last week we’ve moved to a new office space in Redmond and wrapped up the bits for our VIP beta.  This beta is limited to only 12 select people.  We ran two incredible online demos and some feedback we received: “let me say that I loved it”, “Can't wait to play with the product!”, “Based on what I saw yesterday, Insight is more like a coral reef than a warm bath!”</p> <p>Anyway don’t be worried that you will be left out because you’re not part of our VIP beta – we are quickly filling up our next phase of the beta offered on a first come-first serve basis.   This phase will be launched on August 16th and you can sign up on our <a href="http://predixionsoftware.com" target="_blank">website</a>.  We’ve been working hard and fast at making a product that you can use immediately, every day, without boundaries, and we’re on the brink of delivering it to you.  Over the next few weeks we will be creating collateral materials that make using <a title="Predixion Software" href="http://predixionsoftware.com" target="_blank">Predixion</a> Insight even easier.  Stay tuned, true believers, you’ll like what we have coming!</p> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com1tag:blogger.com,1999:blog-2493137919500407537.post-51893594782381243042010-05-07T10:48:00.001-07:002010-05-11T07:54:00.336-07:00Bootstrapping Windows on GoGrid – getting your admin password on the box.<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUnvUTUQ5rvJ1IHTSibLBcwOgC8smEYbjnMAqVqHeBimFXhfJFppTYf5HjXgr1gOisP6ax3t_Nfmfsi0qxIMIGOCqTdbiPqOhSTmMPlCnx6TnEwKr2K0rV5iXp9fIuyv1a1hI-Ka1KHyWY/s1600/ga%5B13%5D.png" imageanchor="1" style="clear: left; cssfloat: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUnvUTUQ5rvJ1IHTSibLBcwOgC8smEYbjnMAqVqHeBimFXhfJFppTYf5HjXgr1gOisP6ax3t_Nfmfsi0qxIMIGOCqTdbiPqOhSTmMPlCnx6TnEwKr2K0rV5iXp9fIuyv1a1hI-Ka1KHyWY/s320/ga%5B13%5D.png" tt="true" /></a></div> I spent a lot of time this week working on trying to get our service running on <a href="http://www.gogrid.com/">GoGrid</a> as a potential alternative to Amazon’s EC2. They jury’s still out, but they seem to offer better hardware for the price. There are a lot of other pro’s and con’s between the two services, but maybe that’s a subject for a future article – maybe after we make a final decision! The nature of our service requires that we can perform on-demand machine requisitioning and provisioning. Using Amazon’s EC2, certain aspects were easier than GoGrid, due to the nature of the way they handle server images “AMI” in Amazon lingo, “MyGSI” in GoGrid. In short, the nature of the sysprep step performed on a newly provisioned machine at GoGrid causes some problems with certain services we need to run and user accounts we need to provision.<br />
Part of the issue has to do with the way GoGrid provisions administrator passwords – on a newly provisioned machine, the administrator account will have a new password, which you would expect, but also, any additional administrators you create are on the image aren’t valid after provisioning. So the GoGrid-provisioned password is pretty much all you have. This is OK if you can interactively logon to the machine after provisioning, but not so OK if you want to do this automatically. To solve this problem, I came up with a method to fetch the admin password from GoGrid itself from the machine after launch. We trigger this via a web service call after the machine is launched, but presumably you could do this on a startup event as well – I haven’t experimented with that as of yet, but <em>presumably</em> it should work.<br />
The difficulty in the solution is simply due to the limited information you have about your machine from your machine. The basic approach is to call the GoGrid API to get the list of passwords from all your machines, and then find the password that matches the public IP of your machine. In order to use this code, the first thing you need to do is to go to your GoGrid account page and add an API key which you will use to securely interact with the GoGrid service. The type of API key should be <strong>System User</strong>, as that is required to fetch passwords. This key will be embedded in your code on the GoGrid image, so you should take necessary steps to protect it.<br />
In this solution I use the <a href="http://wiki.gogrid.com/wiki/index.php/API:C_Sharp_API_Developer_Home">GoGridClient class</a> from the GoGrid Wiki Documentation – copy that code and specify your api_key and shared secret.<br />
The first task is to write a function to get the passwords from GoGrid (we wrote the GoGridIPType and GoGridIPState enums – they contain the values in the code):<br />
<span style="font-family: Courier New;"><br />
public static string GetPasswordsRaw() // returns the raw XML as provided by GoGrid <br />
{ <br />
string returnValue = String.Empty; </span><span style="font-family: Courier New;"> try <br />
{ <br />
GoGridClient grid = new GoGridClient(); </span><span style="font-family: Courier New;"> System.Collections.Hashtable parameters = new System.Collections.Hashtable(); <br />
parameters.Add("format", "xml"); <br />
string requestUrl = grid.getAPIRequestURL("/support/password/list", parameters); <br />
returnValue = grid.sendAPIRequest(requestUrl); <br />
} <br />
catch(Exception) <br />
{ <br />
} </span><span style="font-family: Courier New;"> return returnValue; <br />
}</span><br />
<div class="csharpcode"></div><div class="csharpcode">After you have this function, you need a function to get the list of ip addresses from your machine and compare it to the ip addresses from GoGrid. The function first grabs all of the ipaddresses from the local machine and then uses Xpath queries to isolate and iterate the password objects from the GoGrid response. Then it uses more Xpath queries to grab the ipaddress and password from each object. Finally it checks to see if the ipaddress matches any ipaddress on the machine and returns the associated password.</div><br />
<span style="font-family: Courier New;">private string GetAdminPassword() <br />
{ <br />
// Fetch ip addresses for the local machine and store into a list <br />
List<string> ipaddresses = new List<string>(); <br />
System.Net.IPHostEntry IPHost = System.Net.Dns.GetHostEntry(System.Net.Dns.GetHostName()); <br />
foreach (System.Net.IPAddress ip in IPHost.AddressList) <br />
{ <br />
// Only take the IPv4 addresses <br />
if (ip.AddressFamily == System.Net.Sockets.AddressFamily.InterNetwork) <br />
{ <br />
Report("Found ip: {0}", ip.ToString()); <br />
ipaddresses.Add(ip.ToString()); <br />
} <br />
} </span><span style="font-family: Courier New;"><br />
// Get the password information from GoGrid and load into an XML document <br />
string xml = GetPasswordsRaw(); <br />
XmlDocument d = new XmlDocument(); <br />
d.LoadXml(xml); </span><span style="font-family: Courier New;"><br />
// Use Xpath to select the "password" objects <br />
string path = "/gogrid/response/list/object[@name='password']"; <br />
XmlNodeList nodes = d.SelectNodes(path); <br />
</span><span style="font-family: Courier New;"> foreach (XmlNode node in nodes) <br />
{ <br />
// Extract the password and ipaddress from the password object <br />
XmlNode pwdnode = node.SelectSingleNode("attribute[@name='password']"); <br />
XmlNode ipnode = node.SelectSingleNode <br />
("attribute[@name='server']/object[@name='server']/attribute[@name='ip']" + <br />
"/object[@name='ip']/attribute[@name='ip']"); </span><span style="font-family: Courier New;"><br />
// API Key passwords will not have an ipnode <br />
if (pwdnode == null || ipnode == null) <br />
continue; </span><span style="font-family: Courier New;"><br />
string password = pwdnode.FirstChild.Value; <br />
string ipaddress = ipnode.FirstChild.Value; <br />
// Check to see if the ipaddress belongs to this machine <br />
if (ipaddresses.Contains(ipaddress)) <br />
return password; <br />
} <br />
throw(new SystemException("Did not find password")); <br />
}</span><br />
Once you have the admin password, you can use it to impersonate the box admin as necessary to run additional code requiring such privileges. It really helps in allowing us to automatically deploy boxes on GoGrid. Given the creative commons license of the GoGrid API, the same technique should apply to other cloud providers as necessary. <br />
Hope this helps with your cloud infrastructure deployments – love to hear your comments.Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com3tag:blogger.com,1999:blog-2493137919500407537.post-43488590911961778782010-04-30T19:07:00.001-07:002010-04-30T19:07:10.701-07:00Cheers from the Predixion Dev Team!!<p><a href="http://lh4.ggpht.com/_ESUcEp-EcL8/S9uMxzxB8wI/AAAAAAAAACQ/pA12v7fnXQ8/s1600-h/PDT%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="PDT" border="0" alt="PDT" src="http://lh4.ggpht.com/_ESUcEp-EcL8/S9uMzaEzeuI/AAAAAAAAACU/W68XWPtMJDs/PDT_thumb%5B2%5D.png?imgmax=800" width="541" height="366" /></a> </p> <p>We’re assembled and ready to rock!  Have a great weekend!</p> <p>-Jamie and the PX Devs</p> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com0tag:blogger.com,1999:blog-2493137919500407537.post-77052443425992944232010-04-13T19:12:00.001-07:002010-04-13T19:12:01.532-07:00Cases lost in Time<p><a href="http://lh6.ggpht.com/_ESUcEp-EcL8/S8UkaZgjDxI/AAAAAAAAAB4/5tHZNaYxpZE/s1600-h/CLT%5B9%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="CLT" border="0" alt="CLT" align="right" src="http://lh4.ggpht.com/_ESUcEp-EcL8/S8UkawHLOnI/AAAAAAAAAB8/KezCOYRRqr8/CLT_thumb%5B7%5D.png?imgmax=800" width="66" height="240" /></a> </p> <p>This post was inspired by a question on the <a href="http://social.msdn.microsoft.com/Forums/en-US/sqldatamining/threads" target="_blank">MSDN data mining forum</a> that we knew would come to us one day.  When developing the SQL Server Data Mining platform, we had made one of those design decisions that was kind of wonky, but made sense if you turn your head sideways and squint a bit.  It all resolved to the fact that since our Time Series algorithm was based on Decision Trees, we could use the Decision Tree viewer to show more information about your time series model than anyone had ever seen before – you could see a piecewise linear regressions for each distinct pattern over time – it was one of those “OMG – it’s full of stars….” moments.</p> <p><a href="http://lh6.ggpht.com/_ESUcEp-EcL8/S8UkbKmALCI/AAAAAAAAACA/xWh3aIlAkAw/s1600-h/monolith%5B3%5D.jpg"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="monolith" border="0" alt="monolith" src="http://lh3.ggpht.com/_ESUcEp-EcL8/S8UkboZXSPI/AAAAAAAAACE/B32L1nzwa24/monolith_thumb%5B1%5D.jpg?imgmax=800" width="244" height="156" /></a> Anyway, one of the things that you get to see when using the Decision Tree Viewer is the number of <em>cases</em> or facts or rows or however you want to call them.  This information shows up in the Mining Legend, like this:</p> <p><a href="http://lh5.ggpht.com/_ESUcEp-EcL8/S8UkbwUU4jI/AAAAAAAAACI/PSiJBwQ3zsk/s1600-h/image%5B3%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_ESUcEp-EcL8/S8UkcE-hknI/AAAAAAAAACM/DrAl3FtxGUg/image_thumb%5B1%5D.png?imgmax=800" width="244" height="136" /></a> So, when you create a time series model, you get the same kind of information – Total Cases = some number.  Nobody really considered that number too harshly in SQL Server 2005, but then we greatly improved the Time Series algorithm in 2008, and things changed.  The most obvious change is that we supplemented our 2005 decision tree algorithm, ARTXP, with a (fairly) standard implementation of the ARIMA time series algorithm.  A user noticed that if they created a model using <em>only</em> the ARIMA algorithm, the “Total Cases” number was <em>higher </em>than when they used ARTXP or the default blended mode.</p> <p>S0, is ARTXP eating cases?  Is it ignoring valuable slices of time lost to eternity?  No, not really – like I said, if you turn your head and squint it really does make sense that ARTXP will have less “cases” than ARIMA.  The part that doesn’t make sense is that to satisfy the devil of “consistency” we kind of overloaded the term “cases”.  ARIMA – Auto Regressive Integrated Moving Averages – is more of what you would naturally think of in a forecasting algorithm – it performs calculations on time slice values to determine patterns and make forecasts.  ARTXP – Auto Regressive Trees with cross (X) Predict - on the other hand, doesn’t work in a “way you would naturally think” kind of way.  ARTXP decomposes the time slices into a series of “cases” that it then feeds to the decision tree engine.</p> <p>Let’s examine how this works.  Let’s take a simple series with 10 values – this one should do:</p> <p align="center">11, 12, 13, 14, 15, 16, 17, 18, 19, 20</p> <p>If we assume AR(4), that is, using 4 values to predict our “target”, we get “cases” that look like this:</p> <table border="1" cellspacing="0" cellpadding="2" width="400"><tbody> <tr> <td valign="top" width="66">Case</td> <td valign="top" width="66">Input</td> <td valign="top" width="66">Input</td> <td valign="top" width="66">Input</td> <td valign="top" width="66">Input</td> <td valign="top" width="66">Predict</td> </tr> <tr> <td valign="top" width="66">1</td> <td valign="top" width="66">11</td> <td valign="top" width="66">12</td> <td valign="top" width="66">13</td> <td valign="top" width="66">14</td> <td valign="top" width="66">15</td> </tr> <tr> <td valign="top" width="66">2</td> <td valign="top" width="66">12</td> <td valign="top" width="66">13</td> <td valign="top" width="66">14</td> <td valign="top" width="66">15</td> <td valign="top" width="66">16</td> </tr> <tr> <td valign="top" width="66">3</td> <td valign="top" width="66">13</td> <td valign="top" width="66">14</td> <td valign="top" width="66">15</td> <td valign="top" width="66">16</td> <td valign="top" width="66">17</td> </tr> <tr> <td valign="top" width="66">4</td> <td valign="top" width="66">14</td> <td valign="top" width="66">15</td> <td valign="top" width="66">16</td> <td valign="top" width="66">17</td> <td valign="top" width="66">18</td> </tr> <tr> <td valign="top" width="66">5</td> <td valign="top" width="66">15</td> <td valign="top" width="66">16</td> <td valign="top" width="66">17</td> <td valign="top" width="66">18</td> <td valign="top" width="66">19</td> </tr> <tr> <td valign="top" width="66">6</td> <td valign="top" width="66">16</td> <td valign="top" width="66">17</td> <td valign="top" width="66">18</td> <td valign="top" width="66">19</td> <td valign="top" width="66">20</td> </tr> </tbody></table> <p>You see that for each time (t), we need to take the previous values (t-1), (t-2), (t-3), and (t-4).  This means that the first four values of the series aren’t available as case targets – they are preceded by nothing.  In the end, for 10 time slices using AR(4), you end up with only 6 “cases” to analyze.  Whereas if you used ARIMA, it would simply use all the slices and the “Total Cases” would be 10.</p> <p>So, like I said – turn your head and squint and it makes sense.  Of course, once you understand this, the “Total Cases” for the ARIMA models <em>doesn’t make sense</em>.  (cue evil laughter).  Yeah yeah – it doesn’t make sense, but you know what it means.</p> <p>Anyway, for other cases lost in time, I realized I missed an important series in my digest of postings of yore – the incredible Time Series Reporting Stored Procedure series – a three-part series in four parts – go figure – it’s kind of like that cases lost in time in reverse, I suppose.  This series shows how to create a report that contains both the historical data and predicted data from a Time Series model.</p> <p><a href="http://blogs.msdn.com/jamiemac/archive/2008/08/26/time-series-reporting-stored-procedure-part-1-of-3.aspx" target="_blank">TS Reporting Sproc Part 1</a> <br /><a href="http://blogs.msdn.com/jamiemac/archive/2008/08/27/time-series-reporting-stored-procedure-part-2-of-3.aspx" target="_blank">TS Reporting Sproc Part 2</a> <br /><a href="http://blogs.msdn.com/jamiemac/archive/2008/08/27/time-series-reporting-stored-procedure-part-3-of-3.aspx" target="_blank">TS Reporting Sproc Part 3</a> <br /><a href="http://blogs.msdn.com/jamiemac/archive/2008/08/27/time-series-reporting-stored-procedure-part-4-of-3.aspx" target="_blank">TS Reporting Sproc Part 4</a></p> <p>I do believe that is the last of the digested posts of yesteryear.  I’ll have some more coming up as Predixion motors on!</p> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com0tag:blogger.com,1999:blog-2493137919500407537.post-21931258994478899502010-03-16T12:19:00.001-07:002010-03-16T12:19:38.223-07:00Executing DMX DDL from a linked server<p>Luckily before I left MSFT, I had the foresight to change my contact email on that old wet blog of mine that I’m no longer able to contribute to – no hard feelings.  I received a question which is something that has come up frequently enough that it just needs to be dealt with so for all future posts, you can just say “look at the Executing DMX DML from a link server post on Jamie’s new blog – the only blog that matters,” and be done with it.</p> <p>So just for definition’s sake – DMX – Data Mining eXtensions to SQL, DDL – Data Definition Language, DMX DDL – DMX statements that create or modify objects!  You would think you can add two TLA’s and get an SLA, but that stands for “service level agreement” which has nothing to do with this post.  This post could also have been named “how to execute non-rowset returning commands on Analysis Services from SQL Server”, but not only do I digress, I like the actual title better with the dual unpronounceable acronyms..</p> <p>Anyway, in my <a href="http://jamiemaclennan.blogspot.com/2010/03/dmx-posting-digest.html" target="_blank">DMX Digest post</a>, I referenced this <a href="http://blogs.msdn.com/jamiemac/archive/2008/10/07/getting-data-mining-results-into-sql-tables.aspx" target="_blank">post</a> which showed how to execute DMX statements from SQL and put the results in SQL table.  In short (just in case you don’t want to click those links), you set up a linked server and then use OPENQUERY to make the DMX call.  One (well, at least one) adventurous reader sought fit to try other kinds of statements than queries – in particular a DROP MINING STRUCTURE statement.  The problem with DROP MINING STATEMENTS – and other DDL statements is that they don’t return a rowset, which is a requirement for OPENQUERY – which really wants some output columns to bind to.</p> <p>The nice way to do this would be to take advantage of the SQL EXECUTE command, which, at least in SQL Server 2008, has been extended to execute commands on linked servers.  Such a command would look very elegant, like this:</p> <pre class="csharpcode"><span class="kwrd">EXECUTE</span> ( <span class="str">'DROP MINING STRUCTURE [MyMiningStructure]'</span> )<br /><span class="kwrd">AT</span> MyDataMiningServer</pre><br /><br /><p>Wow – that would be nice!  If only it worked, that is.  If you endeavor to try such a think you’ll get the pleasant response of “<font size="2" face="cons">Server 'MyDataMiningServer' is not configured for RPC.</font>”  What this means, evidently, is that the nice way of doing things isn’t going to happen.</p><br /><br /><p>But, never fear, we can take advantage of all that boundless flexibility built in to SQL Server Data Mining to make it happen.  All we need to do is to create some kind of statement that can be called from SQL Server’s OPENQUERY that executes a statement of our choosing.  And the way to do this is to write a stored procedure that executes a statement and returns some sort of table.  This is the really big hammer solution to the problem.</p><br /><br /><p>And what do you know, I happen to have that stored procedure right here…..</p><br /><br /><pre class="csharpcode"><span class="kwrd">using</span> System;<br /><span class="kwrd">using</span> System.Collections.Generic;<br /><span class="kwrd">using</span> System.Linq;<br /><span class="kwrd">using</span> System.Text;<br /><span class="kwrd">using</span> System.Data;<br /><span class="kwrd">using</span> Microsoft.AnalysisServices.AdomdServer;<br /><br /><span class="kwrd">namespace</span> DMXecute<br />{<br /> <span class="kwrd">public</span> <span class="kwrd">class</span> Class1<br /> {<br /> [SafeToPrepare(<span class="kwrd">true</span>)]<br /> <span class="kwrd">public</span> DataTable DMXecute(<span class="kwrd">string</span> statement)<br /> {<br /> DataTable Result = <span class="kwrd">new</span> DataTable(<span class="str">"Table"</span>);<br /> Result.Columns.Add(<span class="str">"Column"</span>, <span class="kwrd">typeof</span>(<span class="kwrd">int</span>));<br /> <span class="kwrd">if</span> (Context.ExecuteForPrepare)<br /> <span class="kwrd">return</span> Result;<br /><br /> AdomdCommand cmd = <span class="kwrd">new</span> AdomdCommand(statement);<br /> cmd.ExecuteNonQuery();<br /><br /> <span class="kwrd">return</span> Result;<br /> }<br /> }<br />}</pre><br /><style type="text/css"><br /><br />.csharpcode, .csharpcode pre<br />{<br /> font-size: small;<br /> color: black;<br /> font-family: consolas, "Courier New", courier, monospace;<br /> background-color: #ffffff;<br /> /*white-space: pre;*/<br />}<br />.csharpcode pre { margin: 0em; }<br />.csharpcode .rem { color: #008000; }<br />.csharpcode .kwrd { color: #0000ff; }<br />.csharpcode .str { color: #006080; }<br />.csharpcode .op { color: #0000c0; }<br />.csharpcode .preproc { color: #cc6633; }<br />.csharpcode .asp { background-color: #ffff00; }<br />.csharpcode .html { color: #800000; }<br />.csharpcode .attr { color: #ff0000; }<br />.csharpcode .alt <br />{<br /> background-color: #f4f4f4;<br /> width: 100%;<br /> margin: 0em;<br />}<br />.csharpcode .lnum { color: #606060; }</style><br /><br /><p>And calling it from SQL Server – easy </p><br /><br /><pre class="csharpcode"><span class="kwrd">EXEC</span> sp_addlinkedserver @server=<span class="str">'MyDataMiningServer'</span>, <span class="rem">-- local SQL name given to the linked server</span><br />@srvproduct=<span class="str">''</span>, <span class="rem">-- not used </span><br />@provider=<span class="str">'MSOLAP'</span>, <span class="rem">-- OLE DB provider </span><br />@datasrc=<span class="str">'localhost'</span>, <span class="rem">-- analysis server name (machine name) </span><br />@<span class="kwrd">catalog</span>=<span class="str">'MyDMDatabase'</span> <span class="rem">-- default catalog/database </span><br /><br /><span class="kwrd">GO</span><br /><br /><span class="kwrd">SELECT</span> * <span class="kwrd">FROM</span> <span class="kwrd">OPENQUERY</span>(MyDataMiningServer,<span class="str">'CALL DMXecute.DMXecute("DROP MINING STRUCTURE [MyMiningStructure]")'</span>)<br /><br />GO</pre><br /><style type="text/css"><br /><br />.csharpcode, .csharpcode pre<br />{<br /> font-size: small;<br /> color: black;<br /> font-family: consolas, "Courier New", courier, monospace;<br /> background-color: #ffffff;<br /> /*white-space: pre;*/<br />}<br />.csharpcode pre { margin: 0em; }<br />.csharpcode .rem { color: #008000; }<br />.csharpcode .kwrd { color: #0000ff; }<br />.csharpcode .str { color: #006080; }<br />.csharpcode .op { color: #0000c0; }<br />.csharpcode .preproc { color: #cc6633; }<br />.csharpcode .asp { background-color: #ffff00; }<br />.csharpcode .html { color: #800000; }<br />.csharpcode .attr { color: #ff0000; }<br />.csharpcode .alt <br />{<br /> background-color: #f4f4f4;<br /> width: 100%;<br /> margin: 0em;<br />}<br />.csharpcode .lnum { color: #606060; }</style><style type="text/css"><br /><br />.csharpcode, .csharpcode pre<br />{<br /> font-size: small;<br /> color: black;<br /> font-family: consolas, "Courier New", courier, monospace;<br /> background-color: #ffffff;<br /> /*white-space: pre;*/<br />}<br />.csharpcode pre { margin: 0em; }<br />.csharpcode .rem { color: #008000; }<br />.csharpcode .kwrd { color: #0000ff; }<br />.csharpcode .str { color: #006080; }<br />.csharpcode .op { color: #0000c0; }<br />.csharpcode .preproc { color: #cc6633; }<br />.csharpcode .asp { background-color: #ffff00; }<br />.csharpcode .html { color: #800000; }<br />.csharpcode .attr { color: #ff0000; }<br />.csharpcode .alt <br />{<br /> background-color: #f4f4f4;<br /> width: 100%;<br /> margin: 0em;<br />}<br />.csharpcode .lnum { color: #606060; }</style><br /><br /><p>Of course, you can execute any DMX or MDX statement you want there, so this is simply dangerous is general – you definitely shouldn’t be sending unvalidated user input through here for fear of SQL Injection style attacks.  A better way, in general, would be to write stored procedures that performed exactly the operations you need taking just the object name as a parameter.</p><br /><style type="text/css"><br /><br />.csharpcode, .csharpcode pre<br />{<br /> font-size: small;<br /> color: black;<br /> font-family: consolas, "Courier New", courier, monospace;<br /> background-color: #ffffff;<br /> /*white-space: pre;*/<br />}<br />.csharpcode pre { margin: 0em; }<br />.csharpcode .rem { color: #008000; }<br />.csharpcode .kwrd { color: #0000ff; }<br />.csharpcode .str { color: #006080; }<br />.csharpcode .op { color: #0000c0; }<br />.csharpcode .preproc { color: #cc6633; }<br />.csharpcode .asp { background-color: #ffff00; }<br />.csharpcode .html { color: #800000; }<br />.csharpcode .attr { color: #ff0000; }<br />.csharpcode .alt <br />{<br /> background-color: #f4f4f4;<br /> width: 100%;<br /> margin: 0em;<br />}<br />.csharpcode .lnum { color: #606060; }</style> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com7tag:blogger.com,1999:blog-2493137919500407537.post-31860517724100122862010-03-14T22:17:00.001-07:002010-03-14T22:17:43.892-07:00Gluten Free Waffles<p>OK, this posting has nothing to do with SQL Server, Data Mining or Predictive Analytics, or even <a href="http://predixionsoftware.com" target="_blank">Predixion</a>.  It’s kind of a follow-up to my previous post – I’ve gotten some emails and other communiqué about my twins diet. </p> <p>Every Sunday is family waffle day, and I’ve come up with a pretty good waffle recipe for the boys.  I have to make a “regular” batch for the older kids, April and myself and I make a gluten-free, casein-free batch for the boys.  Of course, I have to use a separate waffle iron to avoid contamination.</p> <p>Anyway, the recipe I use is as follows:</p> <ul> <li>1 3/4 cup <a href="http://www.bobsredmill.com/gf-all_purpose-baking-flour.html" target="_blank">Bob’s Red Mill GF All Purpose Baking Flour</a></li> <li>1 tbsp Gluten-Free Aluminum-Free Baking powder</li> <li>1 tbsp Sugar</li> <li>1/2 tsp Salt</li> <li>1 tsp <a href="http://www.bobsredmill.com/xanthan-gum.html" target="_blank">Xanthan Gum</a></li> <li>2 Eggs, separated</li> <li>1 3/4 cups Rice Milk</li> <li>1/2 cup Canola oil</li> <li>1 tsp Vanilla</li> </ul> <p>Turn the waffle iron on to high to heat up while you prepare the ingredients.  In a medium-large bowl, mix all the dry ingredients.  In a separate medium bowl, beat the egg yolks up a little bit, and then add the vanilla, rice milk, and canola oil.   Pour the wet ingredients, save the egg whites, into the dry ingredients and mix well.</p> <p>Using an electric mixer, beat the egg whites until stiff peaks form.  Gently fold the egg whites into the mixture so it is mixed but all the air doesn’t escape from the egg whites.</p> <p>Pour 1/3 cup of mixture onto each waffle area of the iron.  Gluten-free waffles take a bit longer to cook then their glutinous counterparts – I usually increase the time by 1 minute, which means it takes 6 minutes for a batch on our waffle iron, but your mileage may vary.</p> <p>NB:  I use a PAM cooking spray to keep the waffles from sticking.  PAM and all other cooking sprays contain soy lecithin.  Typically, we avoid soy, but it seems that my boys aren’t sensitive to small amounts of soy lecithin.  If your child is sensitive, you can brush on canola oil with a pastry brush or paper towel.</p> <p>Makes 10-12 waffles.</p> <p>Enjoy!</p> Jamie MacLennanhttp://www.blogger.com/profile/01763092476599830990noreply@blogger.com1