Thursday, December 9, 2010

Predixion Insight Update



We just launched a new, incremental version of Predixion Insight with a whole bucketload of new features – some subtle, some more obvious and in-your-face.  Actually, many you shouldn’t even notice – you’ll just have a feeling of the product being “better” all around.

In this posting I’ll just give a quick overview of some of the new additions and later I’ll get around to providing details on all the goodies.  It’s better if you just try them out for yourself anyway – if you are a current user, just log in and you’ll be prompted to update, and if not, what are you waiting for?  (One great new “feature” is that we’ve extended the free trial to 30 days, to give you ample time to check everything out!)

So I’m going to tell you about the new features in my trademark stream-of-consciousness, no-particular-order kind of way.



The new Normalization feature allows you to normalize data in Excel or PowerPivot by Z-Score, Min-Max, or Logarithm.  It’s actually really cool because you can normalize all the data at the same time instead of column by column.  Also, if you’re normalizing PowerPivot data, you can normalize conditionally, so you can get Z-scores by group.  Way cool.
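For the curious, the math behind those options is simple.  Here’s a quick sketch in Python (purely illustrative – the product does all of this for you inside Excel/PowerPivot) of Z-score and min-max normalization over whole columns, plus Z-scores computed per group, like the conditional PowerPivot case:

```python
import math

def zscore(values):
    # Z-score: (x - mean) / standard deviation
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    return [(x - mean) / sd for x in values]

def minmax(values):
    # Min-max: rescale into [0, 1]
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def zscore_by_group(rows):
    # rows: (group, value) pairs; normalize each group independently
    groups = {}
    for g, v in rows:
        groups.setdefault(g, []).append(v)
    normalized = {g: zscore(vs) for g, vs in groups.items()}
    # reassemble the results in the original row order
    counters = {g: 0 for g in groups}
    out = []
    for g, _ in rows:
        out.append(normalized[g][counters[g]])
        counters[g] += 1
    return out
```

The per-group version is the interesting one – each group gets its own mean and standard deviation, so values are scored relative to their peers rather than the whole column.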




Explore Data


We added additional data exploration options so the “Explore Data” button became the “Explore Data” menu and the previous “Explore Data Wizard” became the “Explore Column Wizard”.

Profile Data provides descriptive statistics about your Excel or PowerPivot data.  I won’t say too much about it here since Bogdan already wrote up an excellent post just on this feature! 

Also, Explore Column has an added button to allow you to look at your data (and bin it!) in log space.  All your data scrunched up on the side?  Click the “log” button and see it spread out with a nicer, friendlier distribution.  This button shows up in the Clean Data/Outliers Wizard as well.
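If you’re wondering what binning “in log space” actually does, here’s a tiny Python sketch (again, just an illustration – not the product code) of equal-width binning with an optional log transform:

```python
import math

def equal_width_bins(values, nbins=4, log_space=False):
    # Optionally move into log space first: data "scrunched up"
    # near the low end spreads out under a log transform.
    xs = [math.log(v) for v in values] if log_space else list(values)
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / nbins
    # Assign each value to a bin index 0..nbins-1
    return [min(int((x - lo) / width), nbins - 1) for x in xs]
```

With values like 1, 10, 100, 1000, linear binning dumps almost everything into the first bin, while log binning places one value neatly in each bin.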

Getting Started


I guess I really should have put this one first, despite this being stream of consciousness.  Anyway, noting that customers were having a hard time finding our sample data, help and our support forums, among other things, we added a handy (and pretty) Getting Started page.


PMML Support


In the Manage My Stuff dialog you can import models from SAS, SPSS, R, and any other PMML source.  You can then use all of the model validation tools and the query wizard to score and validate against data in Excel and PowerPivot.  What a way to extend your investment in predictive analytics tools to the Excel BI user base!

Insight Index and Insight Log

The Insight Index is an automatically created guide to all of the predictive insights and results you create with Predixion Insight.  This guide provides descriptions of every generated report along with links to each report worksheet that are automatically updated when you rename worksheets and change report titles.  The Insight Log automatically maintains a trace of all Predixion Insight operations that generate Visual Macros.  These operations can be re-executed individually or copied to additional worksheets to easily create custom predictive workflows.  Both of these features can be turned on or off in the options dialog.

VBA Connexion

Predixion Insight now supports VBA programming interfaces allowing you to create custom predictive applications inside Excel.  It’s like super hard – for example, look at this excerpt I wrote for a demo that applies new data to an existing time series model so you can forecast off of a short series:


Oh, wait – it’s not hard – it’s easy!

Visual Macros


While we’re on the topic of Visual Macros, I should mention that Visual Macros now support all of the Insight Now tasks as well as the Insight Analytics tasks.  This means that you can easily encapsulate any operation we perform on the server in a macro and use it to string together your own workflows.

Other stuff working better

There are always little things here and there that get better that you don’t notice until you run into them – the Query Wizard is cleaner and easier to use.  Exceptions – even ones caused by data entry – give you an immediate way to provide feedback directly to the dev team.  And I can’t even begin to talk about how much better our website is!  (Mostly because I’m out of time to write this post – HA!)

Really – try it out and let me know what you think.  If you’ve tried it before go check out what’s new – if not, now is a wonderful time to do so, it’s lookin’ pretty good.

Tuesday, September 14, 2010

And launched! (plus some secrets)

Yesterday we officially launched our first offering from Predixion Software – titled “Predixion Insight”.  For those who haven’t been following along, Predixion Insight is a cloud predictive analytics service accessed through an embedded Excel client (Predixion Insight for Excel) that has absolutely no infrastructure or procurement friction and works with Excel 2007 and both the 32-bit and 64-bit versions of Microsoft Excel.  Oh – and it’s also directly integrated with Microsoft’s new PowerPivot offering, allowing powerful analytics, business modeling and now, predictive analytics, right in the Excel working environment.

We actually closed down the beta and turned on Insight’s lights a week ago today.  Coming from a “packaged software” background it was interesting that shipping software in a cloud environment all boils down to switching a DNS entry on and we’re up and running.  We actually had one customer forego the free trial and become our first paying customer on day one!  A lot of our beta customers came back to take advantage of the free trial (sign up here if you haven’t already) and our service has been humming along making predictive magic for everyone quite nicely.  I do thank all of our beta customers for helping out and finding issues based on configurations and connection performance that would have been difficult for us to find out on our own – we also managed to incorporate quite a bit of feedback from the beta into the product.

On Friday we celebrated the launch in the dev offices with a toast of Pyrat Rum and then heads down to keep the Predixion train going.  Personally I’ve been busy with Simon, our CEO, demonstrating the product to industry analysts – 20 so far, and many more to come – and getting great feedback and some exciting responses (first published notes here, here, and here), while designing elements of our Enterprise offering as well as some exciting incremental functionality.

One interesting thing about any product release is always the last feature to make the cut – that last piece of functionality that you just need to put in, or that you want so badly that it gets in no matter what.  In our case it’s all in the task pane of Predixion Insight for Excel – the task pane was definitely the runaway surprise success story of the beta – one participant even responded “I wish all software worked that way.”  Given how happy people were with this feature, and since we always wanted to add a little more functionality to it, we made sure that search and filter made it into v1.

With this feature you can type arbitrary text and Predixion Insight for Excel will search all fields (including extended info) of each task and only display the results, or you can filter based on task type or items that are expiring soon.  Neat, huh?  Ok, maybe just “ok”.  Anyway, since you’ve already downloaded and installed the software and signed up for a free trial, you may have noticed that if you select one of those filtering options it actually places something in the search box.  This is totally meant to imply that there are other things you could “type” into the search box to filter your tasks, and as of today, this is the only place you’ll ever find out what they are.  Search tag usage is simply “tag:value” – if the engine finds a “tag:” starting the search string it uses it – plain and simple – it’s not even case-sensitive.  The super-secret Predixion Insight for Excel search tags are:

Tag        Value            Description                                                     Example
tasktype   InsightNow       Returns tasks of the specified task type                        tasktype:Query
expires    integer hours    Returns tasks expiring within the specified number of hours     expires:48
created    integer hours    Returns tasks created within the specified number of hours      created:2
duration   integer minutes  Returns tasks with a duration less than the specified minutes   duration:1
completed  (optional)       Returns tasks that have completed, or not                       completed:
results    (optional)       Returns tasks that have results to download, or not             results:
status     succeeded        Returns tasks with the specified status                         status:pending
tag        any string       Searches only the “tag” field of the task – i.e. the large bolded text   tag:Profit Chart

You can add a bang (!) to the beginning of the tag to return any results not specified by the filter – for example you can get all the tasks created in the last hour by using the string “created:1”, and you can return all the tasks not created in the last hour by using the string “!created:1” – very handy indeed.
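Here’s a hypothetical Python sketch (definitely not the actual product code) of how such a “tag:value” filter, including the bang negation, could be interpreted:

```python
def parse_filter(search):
    # "tag:value" syntax with an optional leading "!" to negate
    negate = search.startswith("!")
    if negate:
        search = search[1:]
    if ":" not in search:
        return None  # not a tag filter; treat as plain text search
    tag, value = search.split(":", 1)
    return (tag.lower(), value, negate)

def matches(task, search):
    # task is a dict of field name -> value
    parsed = parse_filter(search)
    if parsed is None:
        # plain text: search across all fields of the task
        return search.lower() in " ".join(str(v) for v in task.values()).lower()
    tag, value, negate = parsed
    hit = str(task.get(tag, "")).lower().startswith(value.lower())
    return hit != negate  # the bang flips the result (an XOR)
```

The `hit != negate` trick is just an XOR: the bang flips whatever the filter would otherwise return, which is exactly the “everything not matching” behavior described above.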

Oh, and just for a bonus, you can edit the “tag” (large bolded text) of any task for your own personal edification – that is, you can change “Profit Chart” into “My Superbad Market Mayhem Chart” and then find it with the search “tag:Superbad”.

Anyway, that’s what is there for now – who knows what may come as Predixion Insight grows…..

Wednesday, August 18, 2010

Closer to liftoff…

This week we officially launched our public beta, and it was one of those moments where you’ve pushed so hard running on adrenaline that when you’ve reached that summit you collapse because you can finally sleep a good sleep – if only for a moment.  With the beta launch we have people from around the world enjoying predictive analytics in the cloud via Predixion Insight.  It’s exciting watching from behind the scenes as customers launch asynchronous predictive tasks ranging in size from a few kilobytes to 100 megs.  The machinations of a Rube Goldberg contraption come to mind as the pieces of the system coordinate – a user presses a button causing their data to be launched to the cloud while simultaneously they are automatically provisioned across an array of servers.  The data is shuttled seamlessly and invisibly between tasks on their behalf, dissected and analyzed before being dropped into a predictive report right back on their desktop.

The movement from development to beta deployment is really wonderful for me personally.  If you haven’t yet seen it, go to our website and click on play video to get an overview of the company.  If there’s one word I can say about that piece of “marketing,” it is that it is sincere.  Go ahead – go watch it.  This is something I’ve been working toward for a long time.  We’re releasing a version 1 product and we have a lot to do to fully reach our goals, but right now, any user, anywhere, can access powerful, easy-to-use predictive analytics without having to jump through hoops for procurement, acquisition, installation, management, etc. etc. etc.  By creating an Excel-native, subscription-based predictive service, we’re taking the traditional barriers barring people from even opening the door to predictive analytics and slashing them to the ground.

So I was going to write a longer post explaining some more details about the product, but you should just try it now (and anyway, Bogdan already wrote a great post with some feature details).  You can watch a demo that gives a lightning-fast overview of the product here and then go download and enroll in the “free trial” beta.  I have it on good authority that there may be some interesting beta events for accomplished users, and you still have the opportunity to get in early, so don’t wait!

Sunday, August 1, 2010

Predixion on the brink….

I officially started my career as “founding CTO” with Predixion on January 6, 2010, and now, just 7 months later, we are on the brink of launching the VIP beta of our new product and service, Predixion Insight, on August 2.  With a development team of only 5 people we’ve created what I think is a truly disruptive entry in the predictive analytics space, and we’re just getting started.

It’s been a very exciting time – meeting the co-founders of Predixion, deciding to venture off from Microsoft to start something new based on the ideas developed over the past several years, recruiting the best development team you could ask for, filming corporate videos at my house, meeting with customers, partners, and venture capitalists – there hasn’t been a boring day yet!

This last week we’ve moved to a new office space in Redmond and wrapped up the bits for our VIP beta.  This beta is limited to only 12 select people.  We ran two incredible online demos; some of the feedback we received: “let me say that I loved it”, “Can't wait to play with the product!”, “Based on what I saw yesterday, Insight is more like a coral reef than a warm bath!”

Anyway, don’t be worried that you will be left out because you’re not part of our VIP beta – we are quickly filling up our next phase of the beta, offered on a first-come, first-served basis.   This phase will be launched on August 16th and you can sign up on our website.  We’ve been working hard and fast at making a product that you can use immediately, every day, without boundaries, and we’re on the brink of delivering it to you.  Over the next few weeks we will be creating collateral materials that make using Predixion Insight even easier.  Stay tuned, true believers, you’ll like what we have coming!

Friday, May 7, 2010

Bootstrapping Windows on GoGrid – getting your admin password on the box.

I spent a lot of time this week trying to get our service running on GoGrid as a potential alternative to Amazon’s EC2.  The jury’s still out, but they seem to offer better hardware for the price.  There are a lot of other pros and cons between the two services, but maybe that’s a subject for a future article – maybe after we make a final decision!  The nature of our service requires that we can perform on-demand machine requisitioning and provisioning.  Using Amazon’s EC2, certain aspects were easier than GoGrid, due to the way they handle server images (“AMI” in Amazon lingo, “MyGSI” in GoGrid’s).  In short, the nature of the sysprep step performed on a newly provisioned machine at GoGrid causes some problems with certain services we need to run and user accounts we need to provision.
Part of the issue has to do with the way GoGrid provisions administrator passwords – on a newly provisioned machine, the administrator account will have a new password, which you would expect, but also, any additional administrator accounts you create on the image aren’t valid after provisioning.  So the GoGrid-provisioned password is pretty much all you have.  This is OK if you can interactively log on to the machine after provisioning, but not so OK if you want to do this automatically.  To solve this problem, I came up with a method to fetch the admin password from GoGrid itself, from the machine, after launch.  We trigger this via a web service call after the machine is launched, but presumably you could do this on a startup event as well – I haven’t experimented with that as of yet, but it should work.
The difficulty in the solution is simply due to the limited information you have about your machine from your machine.  The basic approach is to call the GoGrid API to get the list of passwords from all your machines, and then find the password that matches the public IP of your machine.  In order to use this code, the first thing you need to do is to go to your GoGrid account page and add an API key which you will use to securely interact with the GoGrid service.  The type of API key should be System User, as that is required to fetch passwords.  This key will be embedded in your code on the GoGrid image, so you should take necessary steps to protect it.
In this solution I use the GoGridClient class from the GoGrid Wiki Documentation – copy that code and specify your api_key and shared secret.
The first task is to write a function to get the passwords from GoGrid (we wrote the GoGridIPType and GoGridIPState enums – they contain the values in the code):

public static string GetPasswordsRaw() // returns the raw XML as provided by GoGrid
{
    GoGridClient grid = new GoGridClient();
    System.Collections.Hashtable parameters = new System.Collections.Hashtable();
    parameters.Add("format", "xml");
    string requestUrl = grid.getAPIRequestURL("/support/password/list", parameters);
    return grid.sendAPIRequest(requestUrl);
}

After you have this function, you need a function to get the list of IP addresses from your machine and compare them to the IP addresses from GoGrid.  The function first grabs all of the IP addresses from the local machine and then uses XPath queries to isolate and iterate over the password objects from the GoGrid response.  Then it uses more XPath queries to grab the IP address and password from each object.  Finally it checks to see if the IP address matches any address on the machine and returns the associated password.

private string GetAdminPassword()
{
    // Fetch the IP addresses of the local machine and store them in a list
    List<string> ipaddresses = new List<string>();
    System.Net.IPHostEntry IPHost = System.Net.Dns.GetHostEntry(System.Net.Dns.GetHostName());
    foreach (System.Net.IPAddress ip in IPHost.AddressList)
    {
        // Only take the IPv4 addresses
        if (ip.AddressFamily == System.Net.Sockets.AddressFamily.InterNetwork)
        {
            Report("Found ip: {0}", ip.ToString());
            ipaddresses.Add(ip.ToString());
        }
    }

    // Get the password information from GoGrid and load it into an XML document
    string xml = GetPasswordsRaw();
    XmlDocument d = new XmlDocument();
    d.LoadXml(xml);

    // Use XPath to select the "password" objects
    string path = "/gogrid/response/list/object[@name='password']";
    XmlNodeList nodes = d.SelectNodes(path);
    foreach (XmlNode node in nodes)
    {
        // Extract the password and ipaddress from the password object
        XmlNode pwdnode = node.SelectSingleNode("attribute[@name='password']");
        XmlNode ipnode = node.SelectSingleNode(
            "attribute[@name='server']/object[@name='server']/attribute[@name='ip']" +
            "/object[@name='ip']/attribute[@name='ip']"); // adjust to match the GoGrid response schema

        // API Key passwords will not have an ipnode
        if (pwdnode == null || ipnode == null)
            continue;

        string password = pwdnode.FirstChild.Value;
        string ipaddress = ipnode.FirstChild.Value;
        // Check to see if the ipaddress belongs to this machine
        if (ipaddresses.Contains(ipaddress))
            return password;
    }
    throw new SystemException("Did not find password");
}

Once you have the admin password, you can use it to impersonate the box admin as necessary to run additional code requiring such privileges.   It really helps us automatically deploy boxes on GoGrid.  Given the Creative Commons license of the GoGrid API, the same technique should apply to other cloud providers as necessary.
Hope this helps with your cloud infrastructure deployments – love to hear your comments.

Friday, April 30, 2010

Cheers from the Predixion Dev Team!!


We’re assembled and ready to rock!  Have a great weekend!

-Jamie and the PX Devs

Tuesday, April 13, 2010

Cases lost in Time


This post was inspired by a question on the MSDN data mining forum that we knew would come to us one day.  When developing the SQL Server Data Mining platform, we had made one of those design decisions that was kind of wonky, but made sense if you turn your head sideways and squint a bit.  It all came down to the fact that since our Time Series algorithm was based on Decision Trees, we could use the Decision Tree viewer to show more information about your time series model than anyone had ever seen before – you could see a piecewise linear regression for each distinct pattern over time – it was one of those “OMG – it’s full of stars….” moments.

Anyway, one of the things that you get to see when using the Decision Tree Viewer is the number of cases or facts or rows or however you want to call them.  This information shows up in the Mining Legend, like this:

So, when you create a time series model, you get the same kind of information – Total Cases = some number.  Nobody really considered that number too harshly in SQL Server 2005, but then we greatly improved the Time Series algorithm in 2008, and things changed.  The most obvious change is that we supplemented our 2005 decision tree algorithm, ARTXP, with a (fairly) standard implementation of the ARIMA time series algorithm.  A user noticed that if they created a model using only the ARIMA algorithm, the “Total Cases” number was higher than when they used ARTXP or the default blended mode.

So, is ARTXP eating cases?  Is it ignoring valuable slices of time, lost to eternity?  No, not really – like I said, if you turn your head and squint it really does make sense that ARTXP will have fewer “cases” than ARIMA.  The part that doesn’t make sense is that to satisfy the devil of “consistency” we kind of overloaded the term “cases”.  ARIMA – Auto Regressive Integrated Moving Averages – is more of what you would naturally think of in a forecasting algorithm – it performs calculations on time slice values to determine patterns and make forecasts.  ARTXP – Auto Regressive Trees with cross (X) Predict – on the other hand, doesn’t work in a “way you would naturally think” kind of way.  ARTXP decomposes the time slices into a series of “cases” that it then feeds to the decision tree engine.

Let’s examine how this works.  Let’s take a simple series with 10 values – this one should do:

11, 12, 13, 14, 15, 16, 17, 18, 19, 20

If we assume AR(4), that is, using 4 values to predict our “target”, we get “cases” that look like this:

Case   Input   Input   Input   Input   Predict
  1     11      12      13      14       15
  2     12      13      14      15       16
  3     13      14      15      16       17
  4     14      15      16      17       18
  5     15      16      17      18       19
  6     16      17      18      19       20

You see that for each time (t), we need to take the previous values (t-1), (t-2), (t-3), and (t-4).  This means that the first four values of the series aren’t available as case targets – they are preceded by nothing.  In the end, for 10 time slices using AR(4), you end up with only 6 “cases” to analyze, whereas if you used ARIMA, it would simply use all the slices and the “Total Cases” would be 10.
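To make the decomposition concrete, here’s a tiny Python sketch (illustrative only – not the ARTXP implementation) that builds the AR(4) cases from the series above:

```python
def make_cases(series, order):
    # Slide a window of `order` inputs across the series;
    # each case pairs the window with the value that follows it.
    cases = []
    for t in range(order, len(series)):
        cases.append((series[t - order:t], series[t]))
    return cases

series = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
cases = make_cases(series, 4)
# 10 time slices with AR(4) yield only 6 cases
```

The first four slices only ever appear as inputs, never as targets – exactly why ARTXP reports fewer “cases” than ARIMA for the same data.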

So, like I said – turn your head and squint and it makes sense.  Of course, once you understand this, the “Total Cases” for the ARIMA models doesn’t make sense.  (cue evil laughter).  Yeah yeah – it doesn’t make sense, but you know what it means.

Anyway, speaking of other cases lost in time, I realized I missed an important series in my digest of postings of yore – the incredible Time Series Reporting Stored Procedure series – a three-part series in four parts – go figure – it’s kind of like the cases lost in time, in reverse, I suppose.  This series shows how to create a report that contains both the historical data and predicted data from a Time Series model.

TS Reporting Sproc Part 1
TS Reporting Sproc Part 2
TS Reporting Sproc Part 3
TS Reporting Sproc Part 4

I do believe that is the last of the digested posts of yesteryear.  I’ll have some more coming up as Predixion motors on!

Tuesday, March 16, 2010

Executing DMX DDL from a linked server

Luckily, before I left MSFT, I had the foresight to change my contact email on that old blog of mine that I’m no longer able to contribute to – no hard feelings.  I received a question about something that has come up frequently enough that it just needs to be dealt with, so from now on you can just say “look at the Executing DMX DDL from a linked server post on Jamie’s new blog – the only blog that matters,” and be done with it.

So just for definition’s sake – DMX – Data Mining eXtensions to SQL; DDL – Data Definition Language; DMX DDL – DMX statements that create or modify objects!  You would think you can add two TLAs and get an SLA, but that stands for “service level agreement,” which has nothing to do with this post.  This post could also have been named “how to execute non-rowset returning commands on Analysis Services from SQL Server”, but not only do I digress, I like the actual title better with the dual unpronounceable acronyms.

Anyway, in my DMX Digest post, I referenced this post, which showed how to execute DMX statements from SQL and put the results in a SQL table.  In short (just in case you don’t want to click those links), you set up a linked server and then use OPENQUERY to make the DMX call.  One (well, at least one) adventurous reader saw fit to try other kinds of statements than queries – in particular a DROP MINING STRUCTURE statement.  The problem with DROP MINING STRUCTURE – and other DDL statements – is that they don’t return a rowset, which is a requirement for OPENQUERY – which really wants some output columns to bind to.

The nice way to do this would be to take advantage of the SQL EXECUTE command, which, at least in SQL Server 2008, has been extended to execute commands on linked servers.  Such a command would look very elegant, like this:

EXECUTE ('DROP MINING STRUCTURE [MyMiningStructure]') AT MyDataMiningServer

Wow – that would be nice!  If only it worked, that is.  If you endeavor to try such a thing you’ll get the pleasant response of “Server 'MyDataMiningServer' is not configured for RPC.”  What this means, evidently, is that the nice way of doing things isn’t going to happen.

But, never fear, we can take advantage of all that boundless flexibility built in to SQL Server Data Mining to make it happen.  All we need to do is to create some kind of statement that can be called from SQL Server’s OPENQUERY that executes a statement of our choosing.  And the way to do this is to write a stored procedure that executes a statement and returns some sort of table.  This is the really big hammer solution to the problem.

And what do you know, I happen to have that stored procedure right here…..

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Microsoft.AnalysisServices.AdomdServer;

namespace DMXecute
{
    public class Class1
    {
        public DataTable DMXecute(string statement)
        {
            // Return a dummy one-column table so OPENQUERY has a rowset to bind to
            DataTable Result = new DataTable("Table");
            Result.Columns.Add("Column", typeof(int));
            if (Context.ExecuteForPrepare)
                return Result;

            // Execute the caller-supplied statement in the current session
            AdomdCommand cmd = new AdomdCommand(statement);
            cmd.ExecuteNonQuery();

            return Result;
        }
    }
}

And calling it from SQL Server – easy

EXEC sp_addlinkedserver @server='MyDataMiningServer', -- local SQL name given to the linked server
@srvproduct='', -- not used
@provider='MSOLAP', -- OLE DB provider
@datasrc='localhost', -- analysis server name (machine name)
@catalog='MyDMDatabase' -- default catalog/database


SELECT * FROM OPENQUERY(MyDataMiningServer,'CALL DMXecute.DMXecute("DROP MINING STRUCTURE [MyMiningStructure]")')


Of course, you can execute any DMX or MDX statement you want there, so this is simply dangerous in general – you definitely shouldn’t be sending unvalidated user input through here for fear of SQL Injection-style attacks.  A better way, in general, would be to write stored procedures that perform exactly the operations you need, taking just the object name as a parameter.

Sunday, March 14, 2010

Gluten Free Waffles

OK, this posting has nothing to do with SQL Server, Data Mining or Predictive Analytics, or even Predixion.  It’s kind of a follow-up to my previous post – I’ve gotten some emails and other communiqués about my twins’ diet.

Every Sunday is family waffle day, and I’ve come up with a pretty good waffle recipe for the boys.  I have to make a “regular” batch for the older kids, April and myself and I make a gluten-free, casein-free batch for the boys.  Of course, I have to use a separate waffle iron to avoid contamination.

Anyway, the recipe I use is as follows:

Turn the waffle iron on to high to heat up while you prepare the ingredients.  In a medium-large bowl, mix all the dry ingredients.  In a separate medium bowl, beat the egg yolks up a little bit, and then add the vanilla, rice milk, and canola oil.   Pour the wet ingredients into the dry ingredients and mix well, keeping the egg whites aside for the next step.

Using an electric mixer, beat the egg whites until stiff peaks form.  Gently fold the egg whites into the mixture so it is mixed but all the air doesn’t escape from the egg whites.

Pour 1/3 cup of mixture onto each waffle area of the iron.  Gluten-free waffles take a bit longer to cook than their glutinous counterparts – I usually increase the time by 1 minute, which means it takes 6 minutes for a batch on our waffle iron, but your mileage may vary.

NB:  I use a PAM cooking spray to keep the waffles from sticking.  PAM and all other cooking sprays contain soy lecithin.  Typically, we avoid soy, but it seems that my boys aren’t sensitive to small amounts of soy lecithin.  If your child is sensitive, you can brush on canola oil with a pastry brush or paper towel.

Makes 10-12 waffles.


Friday, March 12, 2010

Just some doodles… and my favorite old post…

Since all my Facebook friends were kind enough to remind me that it was my birthday today, I decided I can post anything.   At the end I’ll let you in on my favorite posting of all time from my past life, but for now, something completely different.

I looked around my desk and found these weird scribbles that come about whenever I’m on the phone.  I can’t necessarily remember what I was talking about, but here they are:

This creature was on my daily todo list -

TODO Monster

These guys came out of my pen while I was talking to a lawyer about hiring processes and H1-B’s.

Legal Creatures

And, if I remember correctly, this was a stressful conversation.  I probably should have dropped the memory into that beaker thingie….


Anyway, enough of rambling doodles.  I’m sure to make more, since I got a lot of responses to my job posting on StackOverflow that I’ll be following up on.

And for my favorite previous-life posting of all time ….

These Kids Won’t Eat Anything! – this post describes a really cool demo I did at PASS 2008, building a model using the Excel Data Mining Add-ins to predict what possible foods my twin boys would eat, culminating in a deployment of that model to my mobile phone.  Lots of fun, and a pic of the boys as well.

So I think that wraps up all the posts worth mentioning – looking forward to providing you with new content – and maybe some doodles.

Thursday, March 11, 2010

SQL Server Data Mining Code Posting Digest

As promised, here is a digest of my old blog’s postings about coding with respect to SQL Server Data Mining.  I actually thought there would be a lot more, but it turns out that most of my evangelizing in that space ended up as Tips & Tricks on – what do you think?  Should I digest those as well?

Anyway, here are the relevant postings from “old blog” delivered to you on my new blog.

DMX Queries – The Datasource Hole – this is probably the most important coding post.  This post provides the source code for a stored procedure to allow you to create datasources from a DMX call, which are required in order to query external data.  Since almost all data you would mine is external, this is pretty important!

Tree Utilities in Analysis Services Stored Procedures – this post provides a set of stored procedures for getting a variety of information from tree models, for example, the shortest path, longest path, etc.  Neat stuff that I used to help reduce the size of a gargantuan online questionnaire.

The amazing flexibility of DMX Table Valued Parameters – this post shows how table-valued parameters were meant to be done and how you can use them.  No offense to the SQL relational engine – natch.

Automatic Generation of CREATE MINING MODEL statements – this post shows how to generate the DMX for a CREATE MINING MODEL statement given the model.  This is particularly useful, for example, when the model was created with BI Dev Studio or some other interface that uses XMLA.

The next set of links isn’t my own code, but references to other people’s great work adding to the SQL Server Data Mining experience:

Support Vector Machines for SQL Server Data Mining – A reference SVM plug-in implementation available on CodePlex by Joris Valkonet

Visual Numerics integration into SQL Server Data Mining – A great whitepaper by Visual Numerics discussing the C# plug-in algorithm model

Automatically Labeling Clusters Using Analysis Services Stored Procedures – another CodePlex project – this time from furmangg – giving sprocs containing some cluster-labeling heuristics


So that’s it for this digest and I think I’ve covered the most important posts – maybe next I’ll create a digest of the fluff pieces?  Let me know what you think….

Friday, March 5, 2010

Openings at Predixion

I wrote this job description up to capture a bunch of interest, but didn’t really have a place to post it, so why not here?  If you have comments on the posting – email me!  If you think you may be interested – email me!  Eventually this copy will be floating around elsewhere, but you can read it here first!

I’m the CTO of a new, well-funded startup company in the Seattle, WA (Eastside) area. I’m looking to hire around 10 developers over the next several months, and possibly a couple of skilled test engineers. Our company is building some unique software on the Microsoft stack, so MS-haters need not apply – I’m not much for dogma anyway. The software will be analytical in nature with a business focus. We’re offering competitive salaries, a great team environment, and a well-managed company put together with clear and careful forethought toward successful execution and high valuation.
I’m looking for developers who are talented, mature, creative and confident enough to strive forward in an ambiguous startup atmosphere, yet also maintain enough humility to understand that they are on a team and boastful pride will get you nowhere. Also, we’ll be in shared offices at least for a while, so a modicum of personal hygiene is expected. Strong CS fundamentals are a must; demonstrated ability and strong references are strong pluses. The vast majority of coding will be in .NET, although there may be some opportunities for C++ as well. Additional bonus points for any of the following skillsets:
  • Silverlight
  • SharePoint
  • SQL Server + SQL Server BI
  • Cloud Computing – EC2/Azure
  • R and other statistical languages
  • Predictive Analytics and Data Mining
Originally I was only considering candidates with unrestricted eligibility to work in the US, but after talking to our lawyer this morning I can consider candidates who need sponsorship, provided requirements are met.
As I stated above, we’re in the unique position to be offering competitive salaries in a startup company. We will be offering an excellent insurance package once we’ve grown to enough employees, but until then we can assist with any COBRA payments to continue your existing healthcare. We’re also located in one of the few (or so I’m told) “green-certified” buildings, and we’re within walking distance of many restaurants, a post office, a library, shopping and parks.
So, who am I? I have 18-19 years of experience in software development, shipping a wide variety of products, developing everything from the low-level architectures to the front ends, and in the meantime managing teams from 3 to 30+ people, traveling worldwide to present at conferences, and writing a couple of books. I’m a family guy with four kids (thus the reason we’re investing in a good insurance plan) and I believe in my product and my team. I enjoy the team environment, have been lucky enough to find excellent talent to work with, and am looking for more. My goal has always been to change the world through software in whatever ways I can.
So, if you are interested in these opportunities and live in the area or are willing to relocate on your own (sorry, no relo packages), drop me a note with a resume clearly stating relevant skills and experience, references and all that jazz to .

Thursday, March 4, 2010

DMX Posting Digest

OK, I promised a digest of all the great postings past – really, I’ll leave out the rather lame ones.  I figure I’ll start out with all of the DMX related posts.  Interestingly enough I see that never in any posting did I ever really introduce DMX – Data Mining eXtensions to SQL!  What a jerk – I must have been trying to sell a book or something.  In any case, here are the relevant DMX postings with some descriptions so you don’t have to actually go to the old blog for things that aren’t interesting.  Maybe after these digest thingies are done, I can start from the beginning, so to speak.

These are most of the postings dealing with DMX.  Some postings that are more “code-like” I’m saving for a future digest article.


To (a) or not to (a), that is the question – This posting demonstrates a neat modeling trick for transforming a multinomial target into a series of binomials.

Time Series Prediction – discusses the tricky nature of getting deviation information from the time series algorithm.  PROTIP – the posting is really just a redirect to this article I wrote.

Predicting the non-majority state – demonstrates how to specify a threshold probability for “true” using DMX
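
To give a flavor of the trick (the model, column and data source names below are hypothetical, not taken from the original post), a DMX prediction query can test the probability of the “true” state against an explicit threshold instead of relying on Predict’s default majority vote:

```sql
-- Flag a prospect as "true" whenever the probability of 1 (true)
-- clears 0.3, even if 0 (false) is still the majority prediction.
SELECT
    t.[CustomerKey],
    (PredictProbability([Bike Buyer], 1) > 0.3) AS [LikelyBuyer]
FROM [TM Decision Tree]
PREDICTION JOIN
    OPENQUERY([Adventure Works DW],
        'SELECT CustomerKey, Age, YearlyIncome FROM dbo.ProspectiveBuyer') AS t
ON  [TM Decision Tree].[Age] = t.[Age]
AND [TM Decision Tree].[Yearly Income] = t.[YearlyIncome]
```

Dropping the threshold below 0.5 is exactly what lets the rare state surface.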

Predict based on rules alone – shows how to filter Association Rules prediction queries to only show results that are based on learned rules and not simple popularity.

Predicting based on rules alone and getting everything you always wanted – Modifies the query in the previous post to use the TopCount function to filter the result set so you get the right number of results (assuming those results exist in the model).
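
As a rough sketch of the shape of that query (the model and product names are made up for illustration), Predict pulls a generous candidate list and TopCount trims it to the desired size:

```sql
-- Over-fetch 20 candidate recommendations, then keep the top 5
-- ranked by $AdjustedProbability, which discounts raw popularity.
SELECT
    TopCount(
        Predict([Products], $AdjustedProbability, 20),
        $AdjustedProbability,
        5) AS [Recommendations]
FROM [Market Basket Rules]
NATURAL PREDICTION JOIN
    (SELECT (SELECT 'Mountain Tire' AS [Product]
             UNION SELECT 'Water Bottle' AS [Product]) AS [Products]) AS t
```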

Executing multiple DMX statements from SSMS – Not really a “DMX” post, but a useful one that likely won’t show up in any other digest.  Shows how to use SQL Server Management Studio to execute multiple statements, essentially allowing you to create DMX “scripts”.

New DMX Syntax option in SQL Server SP2 – Shows the DMX generalization introduced in SQL Server 2005 SP2 (also in SQL Server 2008) that allows you to bind DMX function parameters to variables or even input columns.

Getting Data Mining results into SQL Tables – Demonstrates how to directly import the results of a DMX query into a SQL table – no middleman (that’s you, SSIS!) required.
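
The technique rests on a linked server from the relational engine to Analysis Services (the linked server, model and table names here are illustrative assumptions); the DMX text travels through OPENQUERY and the rowset lands straight in a table:

```sql
-- Assumes a linked server named SSAS created with the MSOLAP provider
-- (sp_addlinkedserver) pointing at the Analysis Services instance.
-- Note the doubled single quotes around the nested DMX data query.
INSERT INTO dbo.BikeBuyerPredictions (CustomerKey, PredictedBuyer)
SELECT CustomerKey, PredictedBuyer
FROM OPENQUERY(SSAS,
    'SELECT t.CustomerKey,
            Predict([Bike Buyer]) AS PredictedBuyer
     FROM   [TM Decision Tree]
     PREDICTION JOIN
        OPENQUERY([Adventure Works DW],
            ''SELECT CustomerKey, Age FROM dbo.ProspectiveBuyer'') AS t
     ON [TM Decision Tree].[Age] = t.[Age]');
```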

Querying the Dependency Net – Not particularly DMX, but it shows you how to call the stored procedure to get the information displayed in the dependency network view.

Unwinding MDX Flattening Semantics with DMX – And finally – totally NOT DMX, but there will never be another place for this great trick showing how to better understand MDX semantics by shoving the result through a DMX query!

Enjoy, and come back for future digests – I think the next one will be CODE….

Monday, March 1, 2010

The blog is dead! Long live the blog!

It took me a while to start up a new blog after I lost access to my old blog on MSDN by leaving Microsoft.  It's kind of crummy to have to leave it behind, but I guess it makes sense from an MS point of view - don't want ex-softies posting any bad vibes on the MSDN site.  Not that I'm going to be doing that anyway, since my new venture is building on top of the great data mining work inside SQL Server Analysis Services.  In any case, this blog is outside of any professional entanglements, so it can be my permanent home regardless of where I am.

Speaking of that, at my new company Predixion Software, we finally have a "parking page" where you can learn very little about what we're doing, but you can also sign up to get more info when it's available.  April (my lovely wife) already signed up - maybe to get independent verification that I do actually do something?  You should sign up as well (but maybe not for those reasons...)

Anyway, it's great to be back.  Given that I can't go back to the old site and there are over 100 posts there, the first thing I'm going to do is create a handy digest of the more useful posts over the next few days.

Hopefully you found me again - feel free to drop a line anytime, and you can follow Predixion Software on twitter (we'll see how that works out....)