Jamie's Junk

Thursday, August 2, 2012

Yes Virginia, you can do Predictive Analytics!

In 1897, eight-year-old Virginia O’Hanlon posed a question to the New York Sun – is there a Santa Claus? The editors of the Sun could easily have dismissed the child’s simple question. Instead, they took the opportunity to answer it metaphorically and inspirationally, in a way that has shaped the American view of Christmas for over a century.

In the 13 years that I have been involved with Predictive Analytics, Machine Learning and Data Mining, I have been told countless times that there is no Santa Claus in this field... < continued on the Predixion Software blog >

Tuesday, April 10, 2012

Sharing and Collaborating with Predixion Insight

This post is a draft topic I wrote for the Getting Started section of the Predixion Insight documentation. These topics are higher level and “bloggy” so I’m posting here as well!

The most significant insights aren’t discovered in a vacuum. Predixion Insight provides the tools and methodologies allowing you to share predictive results with your colleagues from various aspects of your business in order to extract the most value from your data. This article will describe the variety of ways that you can share using Predixion Insight for collaboration and productionalization.

Shared Collections

The area where you perform the majority of work in Predixion Insight is in a private, automatically provisioned collection of objects called simply “My Workspace”. A “shared collection” is similar to your workspace except that the operations you can perform are limited, and many users can access the collection at the same time. A shared collection has an owner, who can assign permissions to other users, as well as authors who can modify the collection and readers who can only view or use objects in the collection.

Shared collections can either be standard collections created by a Predixion Insight administrator, or, more commonly, can be created by users to share their work on an ad-hoc basis.

To create a shared collection from Predixion Insight for Excel, open the “My Stuff” dialog from the Insight Analytics ribbon, and click the “New Collection” button.

The Add Collection dialog allows you to name and describe your collection as well as add permissions. For broad collaboration and sharing, it is more convenient to add Active Directory groups than individual users, which ensures that more colleagues will be able to interact with your results.

Once the collection is created, its permissions and details can be edited in the Manage My Stuff dialog or the Predixion Insight User Portal by selecting the collection name and choosing an option from the action list.

When you create a new collection make sure that you notify your colleagues that it is available!

Publishing Datasets and Models

With shared collections you can collaborate by publishing your datasets and models via the Publish tool in Predixion Insight for Excel or via the Predixion Insight User Portal. Using this dialog you simply move datasets and models from your workspace or other collections you can view to the destination collection. You can even move items from a shared collection back to your personal workspace if you have permission. This dialog also provides the option to create a new collection if you haven’t done so already.

After you select which items are to be published you will have the ability to determine how other people can use the items. For example, you can restrict other users from being able to move your items out of the shared collection via sharing or exporting.

Collaboration

Providing access alone is not sufficient for providing a collaborative environment for analytics. The Predixion Insight Viewers are an interactive environment for exploring and understanding the patterns discovered in your data. Whenever you discover interesting information within the viewer, the Comments and History control allows you to record your findings by posting a comment. This provides a collaborative, social environment for you and your colleagues to discuss data findings in real time.

Since every comment is relevant only in the context in which it was made, you can click on the view button next to any comment to see exactly what the commenter was looking at as they entered the comment.

Comments and collaboration are available in all Predixion Insight viewers, and also wherever you select a model or dataset. Click on the Comments and History tab next to the selector to see all current comments or add your own.

New comments will automatically appear in the Predixion Insight for Excel and the Predixion User Portal interfaces. If configured by your Predixion Insight Administrator, comments on datasets and models can be broadcast to a Twitter account of your choosing. This allows you to learn about new comments and collaboration through your Twitter client of choice. If this feature is enabled, it is recommended that the Twitter account be set to protect your tweets so that only approved users will see any comments that are made.

Sharing and Embedding Insights

Once you have published models and datasets to a collection, Predixion Insight provides a variety of methods to share your insights with your colleagues. The easiest way is to simply send a permalink to a model view via email. You can retrieve a permalink to any Predixion Insight view by clicking on the Information button on the viewer ribbon and then clicking on “Create Permalink”.

This will create a permanent link to precisely what you are viewing that you can share with others. Simply copy the direct link from the Links dialog and paste into an email. Any recipient that has permission to view models in the shared collection will be able to see exactly what you were looking at when you created the link. Note that if you create a permalink to a model in your workspace, only you will be able to view it, so make sure that you send colleagues permalinks to models that have been published to shared collections.

The Links dialog also provides the ability to save a thumbnail of the current view if you want to provide a preview in your messaging or to show an image to link from a web page. Additionally, you can embed the viewer itself into a web page by using the provided HTML snippet. In all cases, only users with appropriate permissions will be able to view models, so you don’t have to worry about results being forwarded to unauthorized users.

You can also retrieve a permalink from any comment that has been added to the model. These permalinks are automatically created at the time the comment is posted. Click on the information button associated with the specific comment to display the Links dialog.

Embedding Visualizations in SharePoint

A convenient way to share your insights and let your colleagues interact with models you have created and published is to embed the Predixion Insight visualizations directly in a SharePoint web page. To embed a Predixion Insight visualization into a SharePoint page, first edit the page, and then insert a new web part.

The web part you need to add is the Page Viewer web part under the “Media and Content” category.

The empty web part, once added, will contain a hyperlink to open the tool pane. Click this link and then paste in the permalink to your visualization. Note that you do not need to paste the HTML snippet from the Predixion Insight Links dialog as SharePoint will automatically create an IFrame to surround the Predixion Insight visualization. It is recommended to set a specific height for your visualization or SharePoint may make the viewer too small to be useful.

Finally, once you have configured and saved your SharePoint page, all authorized users will be able to collaborate and share using Predixion Insight directly from their SharePoint portal.

Sharing Results via Excel

Often it is convenient to embed your insights into Excel workbooks along with data and other work that you have created. This allows you to create a story or provide other evidence that supports your insights. When viewing a Predixion Insight model from Excel, you can copy your current view to the currently active Excel workbook by clicking the Copy to Excel button.

This copies a version of the current visualization into Excel, along with a permalink to the view. You can then share your findings by publishing the workbook to SharePoint via Excel Services, or by emailing or otherwise providing the Excel file to colleagues.

Sharing Named Results

In addition to sharing and collaborating on Predixion Insight models using the Predixion Insight visualizations, you can also directly share any specific predictive or descriptive results you generate with Predixion Insight. These results can come from predictive queries, accuracy tests, or any of the Insight Now insights.

Predixion Insight results appear as chiclets in the Predixion Pane in Excel or in the Predixion Insight User Portal. You can provide a custom name for the result by clicking on the pre-assigned name and typing over it. In order to share the results, click on the collaboration button.

In the Share Results dialog that appears, select the collection or individuals you wish to share the result with and click OK.

After the results are published to a collection, they will be visible to anyone who has collection permissions. The chiclet for the result will indicate that the result is shared and not in an individual’s Predixion Insight workspace.

Predixion Insight results can be accessed via the Predixion Insight for Excel client, the Predixion Job Source for SQL Server Integration Services, the Predixion Insight ODBC provider, or the Predixion Insight API. This allows for a wide variety of scenarios where you can embed predictive results directly into your line of business applications.

Using Visual Macros

Sharing named results is typically used to put your predictive results into a production scenario. Whether the results will be integrated into a business application, a data warehouse, or viewed using a reporting tool, any results you want to share will have to be generated on a regular basis. Visual Macros is a technology in Predixion Insight that allows you to describe a predictive action or workflow in an Excel worksheet and then combine and execute that action any number of times. This method, combined with Predixion Insight’s sharing ability, APIs, and integration with SQL Server Integration Services, makes for a powerful predictive application platform.

Most Predixion Insight server actions are scriptable via Visual Macros. In order to create a Visual Macro, click on the options arrow on the Finish or OK button on a Predixion Insight for Excel dialog or wizard. This will provide options where you can create a Visual Macro as well as execute the action of the dialog or wizard.

A Visual Macro itself is a worksheet description of the task you described using a dialog or wizard that you can edit in plain language without having to understand the details of any particular scripting syntax. Multiple Visual Macros can be pasted into a single Excel worksheet to create a multiple step operation. The following Visual Macro publishes a result to a shared collection and provides a new name.

Visual Macros can be executed by using the My Macros button in the Predixion Insight for Excel client, the Predixion Insight API – including the Predixion Insight for Excel VBA API – and the Predixion Insight Execute Visual Macro Integration Services task.

Saturday, November 12, 2011

Our own 11-11-11–the launch of Predixion Insight 2.0!

Although the weekend is likely to be more filled with Skyrim than Predixion Insight, today (being minutes before I posted this) we launched our groundbreaking Predixion Insight 2.0 cloud predictive analytics service. Accompanied by the musical stylings of our own Matthew Meadows, the dev team works out the last details in order to make sure we have a smooth deployment of the cloud platform. As the final build draws to a close, the team chats about the distribution of Now & Laters, laughs about political gaffes, and the third thing, oops, I can’t remember the third thing.

Over the last few weeks, as we’ve been able to step back and actually use the product we’ve built, it’s been truly a delight. Bogdan and I have both just stood in the hallway and related how fun it is. Who would think that a predictive analytics business application could be “fun”? The collaborative visualizations we’ve added to the product transform how we think about problems and how we talk about them. The features have even changed how we conduct bug triages, as we can link to the product directly from Visual Studio bug reports and have the issues presented right there. No need to capture a bitmap or rely on the descriptive range of test prose – we can just link to the freakin’ issue! Boom – there it is!

And it’s pretty too – overlaying tiles and animated graphics really transform the pattern discovery experience. Being able to share your explorations with colleagues through commenting and actually see exactly what they were seeing when they made the comment is remarkable once you start using it. I can’t imagine going back to a single analyst view of the world after using Insight 2.0 – it would just seem… stifling.

This release means a lot to me personally – my goal has always been to change the world through software, and with Predixion I believe I’m in the right place with the right team to make it happen. We have an incredible development org right now that deserves all the credit in the world for building such an amazing product that will only get better. I really feel honored to be working with this great team that made this possible:

Bogdan Crivat
Shuvro Mitra
Abdul Shameer
Duong Nguyen
Jeff Willis
Tatyana Yakushev
Matt David
Yimin Wu
Matthew Meadows
Raghu Ramachandran

Also – it goes without mentioning that the development team couldn’t make this happen in a vacuum and it’s the SoCal and roaming Predixion team members and of course our investors that create the environment that allows us to deliver our vision and technology to the world. So without listing our company directory, I’ll just say that I appreciate all the pulls and tugs and support from all the other Predixionites that make this possible.

Anyway - 2.0 is really just the beginning from which we will continue to make dramatic changes in not just “predictive analytics” but how people value and appreciate their data and how data can make a difference in people’s lives. And just to wrap it up, and since I made it today – here’s a tutorial video introducing the frame controls for Predixion Insight 2.0.

Predixion Insight 2.0 Viewer Controls Tutorial

Tuesday, November 8, 2011

Introducing Predixion Insight 2.0!

We are on the verge of releasing Predixion Insight 2.0 and I wanted to give everyone a heads up on some of the new capabilities. It’s been a very exciting release and the product (if I may say so myself) is looking beautiful! Every time I step back and actually use the product (rather than develop it) I’m stunned by what we’ve accomplished!

We have in short measure changed what it means to “do” predictive analytics. Predixion Insight 2.0 provides real advances in discovery and collaboration that significantly improve how users will even be able to talk about their predictive work.

In my mind there’s so much more road for us to travel, so much more we can – and will – be doing to transform predictive analytics to predictive intelligence – but this milestone puts quite a bit of that highway behind us.

I put together a short vblog showing some of the changes we’re releasing this week. Later I’m sure we’ll have a nice polished version that’s messaged appropriately, but this one is just fairly raw footage of me introducing 2.0 to you. Enjoy and come back for more!

A taste of Predixion Insight 2.0

Tuesday, August 16, 2011

Scoring R models against PowerPivot Data–Part 3–Evaluating and Scoring

This is part 3 of a three part series about integrating R and PowerPivot.

Part 1 – Intro (includes links to necessary components)
Part 2 – Creating a classification model in R
Part 3 – Evaluating and Scoring R models in Predixion

In my previous post, I walked through the creation of a classification model in R and imported the model into Predixion Insight. The necessary files for this part are on my SkyDrive – in order to follow through with the steps in this phase you will need to import the PMML file into Predixion Insight if you don’t already have it. (Go to Insight Analytics/My Stuff – click “Import”). Also, open the Excel file “R Demo.xlsx”.

In this installment I will show you how to create a profit chart using the R model against data in Excel, and then score the model against data in PowerPivot.

A profit chart allows you to determine how much profit (or loss) you will achieve by using a predictive model to select which cases – e.g. leads, customers, etc. – to target. The profit chart gives you information on how to apply the model to data in order to maximize your profit based on your cost metrics.

To launch the Profit Chart wizard, click on the Profit Chart button in the “Test” section of the Insight Analytics ribbon.

Once the Profit Chart wizard appears (you will be prompted to log in to Predixion Insight if you haven’t already), skip past the intro screen and select the model you imported from R – it will be under the PMML Models node, which appears after all of the Predixion datasets.

Note that when you select your PMML model you will see information about the model in the description pane. Unlike models created using Predixion Insight, not all PMML models support retrieving the probability of output states. If a PMML model supports this functionality it will be listed in the Output section in the Description pane. Retrieving the probability of the output states is required to use Profit Charts and Accuracy charts on PMML models.

The next step is to specify the business parameters of the profit chart – that is, the real factors that are important to how the model is expected to perform. In this case, you need to choose how much the campaign is going to cost and how much you expect to receive from successful contacts. Since there is a variable cost associated with contacting leads, you will need to click on the “…” button next to the individual cost field. Specify your costs according to these screen shots and click next.

Clicking Next on the dialog will bring you to the Select Input Data page. The R Demo Excel file has some prepared test data that you can use. You can also use data from PowerPivot, or test data that is automatically separated in Predixion datasets. If you were already looking at the test data when you launched the wizard, the table would already be selected – otherwise select the test data and click next to advance.

On the final page of the wizard, you need to specify the relationships between the model columns and the columns in your Excel spreadsheet. Predixion Insight will automatically bind the columns based on the column names if possible. At this point you can click the Finish button and be done with it. However, if you want to be able to easily rerun the profit chart with different parameters or even automate the process, you can click on the little arrow on the Finish button and select “Execute and Create a Visual Macro.” This will not only start the Profit Chart, but also create a worksheet with a Profit Chart Visual Macro – that is, a worksheet with the executable description of the job you just performed. This Visual Macro can be altered and re-run however you choose, and even executed from VBA or automated via SQL Server Integration Services.

When the progress dialog comes up, it’s a good idea to dismiss it by clicking on the “Minimize to Task Pane” button, which allows the operation to execute in the background. When complete, you can click on the “Results” button in the job chiclet, which renders the Profit Chart into your workbook. You can also click on the collaboration icon to send the job results directly to your friends and colleagues or publish to shared collections.

The actual Profit Chart itself shows us the maximum profit we can expect by using the R model to optimize our campaign. Additionally it tells us the probability threshold we need to consider when selecting which leads to contact. Specifically, in this case, we should only contact leads that have a probability of conversion, as dictated by the model, of 34.4% or greater.
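To make the idea behind the chart concrete, here is a minimal sketch in R of how a profit curve and its probability threshold can be derived from scored leads. This is not how Predixion Insight computes its Profit Chart internally; the data frame `scored`, the cost figures, and every variable name are made up for illustration, and a flat per-contact cost stands in for the variable cost entered in the wizard.

```r
# Hypothetical scored leads: model probability of conversion and the actual outcome.
scored <- data.frame(
  prob      = c(0.90, 0.75, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05),
  converted = c(1,    1,    0,    1,    0,    0,    0,    0)
)

# Assumed cost metrics (illustrative values only).
fixed_cost       <- 10   # one-time campaign cost
cost_per_contact <- 5    # cost of contacting one lead
revenue_per_sale <- 20   # revenue from one successful contact

# Contact leads from most to least likely and track the running profit.
ranked <- scored[order(-scored$prob), ]
ranked$contacted <- seq_len(nrow(ranked))
ranked$profit <- cumsum(ranked$converted) * revenue_per_sale -
  ranked$contacted * cost_per_contact - fixed_cost

# The best cut-off is the row where cumulative profit peaks.
best       <- which.max(ranked$profit)
max_profit <- ranked$profit[best]
threshold  <- ranked$prob[best]
```

With these made-up numbers, profit peaks after contacting the four most likely leads, so `threshold` is the probability of the fourth lead. Contacting anyone below that probability costs more than it is expected to return.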

Now that you know how to evaluate your R model – or more appropriately – how to evaluate the predictions from your R model, it’s time for the climax of this series – applying that R model to PowerPivot data. Actually, it’s almost an anticlimax because it’s so simple, but let’s walk through the process and I’ll point out some interesting bits along the way. To get started, click on the Query button on the Insight Analytics ribbon.

As before, you select the model you want to query (the R model!) and then select the input data. This time, you need to select that the Input Type is “Power Pivot Data.” If PowerPivot isn’t already running, Predixion Insight will launch it and then connect to it to collect the list of tables. One option you have with PowerPivot data that you don’t have with Excel data is that you can specify an arbitrary data filter. This allows you to, for instance, only score the leads that were received within the last month or so rather than the entire dataset – very handy for continual deployment of your models!

After selecting your data, you have to map the model columns to the data columns as you did with the Profit Chart and then you get to the results page. In the results page you choose what information you need from the model. Since we are looking for leads with a probability to convert of at least 34.4%, all we need to do is to check the Probability of Yes checkbox. Since we want to integrate the results directly into PowerPivot, we don’t need to check any of the boxes to copy source data. You generally use that option when you need to move the results to a different destination.

Finally, you tell Predixion Insight what to do with the result. In this case we are going to add the results from our R model back into PowerPivot. Specify a table name and check the box that appends the result to the original table. Predixion Insight automatically detects the best identifier column, so you don’t have to actually specify anything there.

Again, you can finish (even choose to create a Visual Macro) and either wait for the job to complete or dismiss it and keep working while the job executes in the background. Once the job finishes it will automatically update PowerPivot, or if you dismissed it, you click on the results to populate the PowerPivot table. Like the Profit Chart result, you can share the results with colleagues by clicking on the collaboration button – in which case, it would probably be a good idea to add some of the additional source columns for context! Note that you can also change the name of the job chiclet by selecting and typing over the default name. This can be useful when sharing or if you want to revisit past job submissions.

Finally after you’ve fetched results you can see the results in your PowerPivot table. In this screenshot you see that Predixion Insight created a new PowerPivot table called “R Query Results” and added a new calculated column to the source table that references the predicted result for each row.

Now you can analyze your PowerPivot data the typical way (PivotTables published to SharePoint!) integrating predictive results from R. My first step would be to add a calculated column indicating which leads crossed the threshold indicated by the Profit Chart, but the possibilities are limitless!
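In PowerPivot that first step would be a calculated column. The same logic, sketched in R for consistency with the rest of this series, looks roughly like this. The data frame `leads` and the column name `ProbabilityOfYes` are assumptions made for the example, not names produced by Predixion Insight.

```r
# Hypothetical copy of the scored leads; "ProbabilityOfYes" is an assumed column name.
leads <- data.frame(
  LeadId           = 1:5,
  ProbabilityOfYes = c(0.81, 0.344, 0.33, 0.52, 0.07)
)

# Threshold read off the Profit Chart in this walkthrough.
threshold <- 0.344

# TRUE for leads at or above the threshold, FALSE otherwise.
leads$Contact <- leads$ProbabilityOfYes >= threshold

# Ids of the leads that should be contacted.
contact_ids <- leads$LeadId[leads$Contact]
```

Note the `>=`: a lead sitting exactly on the threshold is still contacted, matching the "34.4% or greater" rule from the Profit Chart.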

Friday, May 20, 2011

Scoring R models against PowerPivot Data–Part 2–Creating an R classification model

This is part 2 of a three part series about integrating R and PowerPivot.

Part 1 – Intro (includes links to necessary components)
Part 2 – Creating a classification model in R
Part 3 – Evaluating and Scoring R models in Predixion

In this installment, I’m going to walk through the process of creating a predictive model in R, exporting the model as PMML and importing the model into Predixion Insight. The demo requires packages from Togaware to create models and export them– you can go to the Togaware website and follow their instructions, or you can just launch the R console and enter:

> install.packages("RGtk2") 
> install.packages("rattle", dependencies=TRUE)

Now, you may be prompted to install additional packages and you may have to restart R after the library installation is complete – it’s probably a safe bet in any case.

After you’ve installed R and the correct libraries and are in the correct state of R preparedness, you can load the necessary libraries into R using these commands:

> library(rpart)
> library(pmml)

The next step is to load the training dataset into R. As preparation for this article, I created a model (the same model as in our demo video) and used the technique in my previous post (for those who already read that, this was the point) to export the training set and saved the result as a .csv file. I’ve put the training data file up on my SkyDrive if you just want to grab it.

This line reads the .csv file and puts the result into an object called “insurance”. Note that R, due to its UNIX origins, uses forward slashes instead of backslashes as directory separators – also it’s case sensitive!

> insurance <- read.csv("Data/Insurance Demo/insurance.train.csv")
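If you would rather not type separators by hand, a small sketch like the following shows both points from the note above. The object name `insurance_demo` is made up for the example, so it does not clash with the real `insurance` object.

```r
# file.path() joins path pieces with forward slashes.
csv_path <- file.path("Data", "Insurance Demo", "insurance.train.csv")

# Object names are case sensitive: "insurance_demo" and "Insurance_Demo"
# are two different names as far as R is concerned.
insurance_demo <- data.frame(Converted = c("Yes", "No"))
found_lower <- exists("insurance_demo")
found_mixed <- exists("Insurance_Demo")
```

Passing `csv_path` to `read.csv` is equivalent to typing the path string directly.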

Next, I need to create a predictive model. The rpart command creates a tree model on a dataset given a target column. There are many additional parameters controlling how the tree is created, but for the purposes of this blog post, I’m just providing the shortest possible command. This line creates a tree model and places it into an object called “TreeModel” that we can use later.

> TreeModel <- rpart(Converted~.,insurance)
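For a taste of those additional parameters, here is a slightly longer sketch. It uses the small `kyphosis` dataset that ships with rpart as a stand-in for the insurance data, so it runs without the demo files. The control values shown are simply rpart's defaults written out, not tuned settings.

```r
library(rpart)

# kyphosis is an example dataset bundled with rpart;
# it stands in here for the insurance training data.
fit <- rpart(
  Kyphosis ~ .,                      # target column ~ all other columns
  data    = kyphosis,
  method  = "class",                 # ask for a classification tree explicitly
  control = rpart.control(minsplit = 20, cp = 0.01)  # rpart's defaults, spelled out
)

# Per-row probability of each output state, one column per class.
probs <- predict(fit, newdata = kyphosis, type = "prob")
head(probs)
```

Each row of `probs` sums to 1, with one column per value of the target. This per-state probability is the kind of output a scoring tool needs in order to rank cases by likelihood.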

The pmml command converts the tree model to PMML, assigning the result to “TreePmml”.

> TreePmml<-pmml(TreeModel)

Finally, use write to output the pmml into a file – note again the forward slashes in the file path!

> write(toString(TreePmml),"Data/Insurance Demo/RModel.pmml")

This creates the file “RModel.pmml” that, if you’re curious, you can open up and examine in notepad or any other text editor. Inspecting the text you’ll see that PMML is simply an XML representation of the patterns that were learned by the R algorithm – if you search for the word “node” you will find the structure of the tree itself.
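You can do the same search from R instead of a text editor. The sketch below runs against a tiny hand-written stand-in for a PMML tree, not real output from the pmml command, so it is self-contained. To inspect the real file, point `path` at "Data/Insurance Demo/RModel.pmml" instead.

```r
# A tiny hand-written stand-in for a PMML tree (NOT real pmml() output,
# and not a complete, valid PMML document).
pmml_text <- c(
  '<PMML>',
  '  <TreeModel functionName="classification">',
  '    <Node id="1" score="No">',
  '      <True/>',
  '      <Node id="2" score="Yes"><SimplePredicate field="Age" operator="lessThan" value="30"/></Node>',
  '      <Node id="3" score="No"><SimplePredicate field="Age" operator="greaterOrEqual" value="30"/></Node>',
  '    </Node>',
  '  </TreeModel>',
  '</PMML>'
)
path <- tempfile(fileext = ".pmml")
writeLines(pmml_text, path)

# Pull out every line that opens a tree node.
lines      <- readLines(path)
node_lines <- grep("<Node ", lines, fixed = TRUE, value = TRUE)
node_count <- length(node_lines)
```

Printing `node_lines` shows the nested structure of the tree: a root node followed by its child nodes, each carrying the predicate that routes a case down that branch.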

Now that I have a model in PMML, I can launch Excel with Predixion Insight for Excel installed, switch to the Insight Analytics ribbon and click the My Stuff button. In the My Stuff dialog, I need to click “Import PMML” to load the R model. Note that PMML support is available in the Predixion Insight Free Trial and the Predixion Insight with PMML products.

After I click Import PMML and select and import my R model, it appears in the list of datasets and models under a special PMML node in all dialogs where you select a model, such as the My Stuff dialog or the accuracy chart and query wizards. For most purposes PMML models are treated the same as native Predixion models in that you can import and export them, share them with other users, and publish them to shared collections.

In the next installment, I will show you how to use the R model in Predixion to generate accuracy charts, and, as promised, score data in PowerPivot!

Wednesday, May 18, 2011

Scoring R models against PowerPivot Data - Part 1 of 3–Introduction

This series is all about combining two completely different technologies that target two completely different audiences. However, it turns out that those two audiences often have to serve the same master in looking for information and insight within their data. And typically those two audiences don’t or can’t talk to each other, which makes driving common goals a bit difficult.

PowerPivot is a tool for preparing and creating data applications from a business analyst’s perspective. You can bring data in from multiple sources, match them together, derive new information, create some calculations, and present the results in an attractive and meaningful business context. You can download and find more information about PowerPivot at www.powerpivot.com. Go ahead – it’s free!

R is a statistical language for preparing and creating data applications from a data scientist’s perspective. You can import data, apply statistical tests and analysis, derive new information, and visualize the results in a scientific and statistical context. You can download and find more information about R from www.r-project.org. Go ahead – it’s free!

So the question remains: how can we make these two worlds collide in a meaningful way that takes the science performed in R and applies it to the business context of PowerPivot? You knew I was going to say this, but Predixion Insight is the way! Predixion Insight and Predixion Insight for Excel allow you to take predictive models created in R and apply them to data in PowerPivot (and Excel actually), thereby taking the scientific abstracts and providing them concrete business context.

In this three part series, I’m going to walk through the complete process of creating a model in R and then applying it to data in PowerPivot by means of Predixion Insight. The three parts will cover the following topics:

Part 1 – Intro (you’re reading this now)
Part 2 – Creating a classification model in R
Part 3 – Evaluating and Scoring R models in Predixion