Tuesday, March 16, 2010

Executing DMX DDL from a linked server

Luckily before I left MSFT, I had the foresight to change my contact email on that old wet blog of mine that I’m no longer able to contribute to – no hard feelings.  I received a question which is something that has come up frequently enough that it just needs to be dealt with so for all future posts, you can just say “look at the Executing DMX DML from a link server post on Jamie’s new blog – the only blog that matters,” and be done with it.

So just for definition’s sake – DMX – Data Mining eXtensions to SQL, DDL – Data Definition Language, DMX DDL – DMX statements that create or modify objects!  You would think you can add two TLA’s and get an SLA, but that stands for “service level agreement” which has nothing to do with this post.  This post could also have been named “how to execute non-rowset returning commands on Analysis Services from SQL Server”, but not only do I digress, I like the actual title better with the dual unpronounceable acronyms..

Anyway, in my DMX Digest post, I referenced this post which showed how to execute DMX statements from SQL and put the results in SQL table.  In short (just in case you don’t want to click those links), you set up a linked server and then use OPENQUERY to make the DMX call.  One (well, at least one) adventurous reader sought fit to try other kinds of statements than queries – in particular a DROP MINING STRUCTURE statement.  The problem with DROP MINING STATEMENTS – and other DDL statements is that they don’t return a rowset, which is a requirement for OPENQUERY – which really wants some output columns to bind to.

The nice way to do this would be to take advantage of the SQL EXECUTE command, which, at least in SQL Server 2008, has been extended to execute commands on linked servers.  Such a command would look very elegant, like this:

EXECUTE ( 'DROP MINING STRUCTURE [MyMiningStructure]' )
AT MyDataMiningServer


Wow – that would be nice!  If only it worked, that is.  If you endeavor to try such a think you’ll get the pleasant response of “Server 'MyDataMiningServer' is not configured for RPC.”  What this means, evidently, is that the nice way of doing things isn’t going to happen.



But, never fear, we can take advantage of all that boundless flexibility built in to SQL Server Data Mining to make it happen.  All we need to do is to create some kind of statement that can be called from SQL Server’s OPENQUERY that executes a statement of our choosing.  And the way to do this is to write a stored procedure that executes a statement and returns some sort of table.  This is the really big hammer solution to the problem.



And what do you know, I happen to have that stored procedure right here…..



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using Microsoft.AnalysisServices.AdomdServer;

namespace DMXecute
{
public class Class1
{
[SafeToPrepare(true)]
public DataTable DMXecute(string statement)
{
DataTable Result = new DataTable("Table");
Result.Columns.Add("Column", typeof(int));
if (Context.ExecuteForPrepare)
return Result;

AdomdCommand cmd = new AdomdCommand(statement);
cmd.ExecuteNonQuery();

return Result;
}
}
}



And calling it from SQL Server – easy



EXEC sp_addlinkedserver @server='MyDataMiningServer', -- local SQL name given to the linked server
@srvproduct='', -- not used
@provider='MSOLAP', -- OLE DB provider
@datasrc='localhost', -- analysis server name (machine name)
@catalog='MyDMDatabase' -- default catalog/database

GO

SELECT * FROM OPENQUERY(MyDataMiningServer,'CALL DMXecute.DMXecute("DROP MINING STRUCTURE [MyMiningStructure]")')

GO



Of course, you can execute any DMX or MDX statement you want there, so this is simply dangerous is general – you definitely shouldn’t be sending unvalidated user input through here for fear of SQL Injection style attacks.  A better way, in general, would be to write stored procedures that performed exactly the operations you need taking just the object name as a parameter.


Sunday, March 14, 2010

Gluten Free Waffles

OK, this posting has nothing to do with SQL Server, Data Mining or Predictive Analytics, or even Predixion.  It’s kind of a follow-up to my previous post – I’ve gotten some emails and other communiqué about my twins diet.

Every Sunday is family waffle day, and I’ve come up with a pretty good waffle recipe for the boys.  I have to make a “regular” batch for the older kids, April and myself and I make a gluten-free, casein-free batch for the boys.  Of course, I have to use a separate waffle iron to avoid contamination.

Anyway, the recipe I use is as follows:

Turn the waffle iron on to high to heat up while you prepare the ingredients.  In a medium-large bowl, mix all the dry ingredients.  In a separate medium bowl, beat the egg yolks up a little bit, and then add the vanilla, rice milk, and canola oil.   Pour the wet ingredients, save the egg whites, into the dry ingredients and mix well.

Using an electric mixer, beat the egg whites until stiff peaks form.  Gently fold the egg whites into the mixture so it is mixed but all the air doesn’t escape from the egg whites.

Pour 1/3 cup of mixture onto each waffle area of the iron.  Gluten-free waffles take a bit longer to cook then their glutinous counterparts – I usually increase the time by 1 minute, which means it takes 6 minutes for a batch on our waffle iron, but your mileage may vary.

NB:  I use a PAM cooking spray to keep the waffles from sticking.  PAM and all other cooking sprays contain soy lecithin.  Typically, we avoid soy, but it seems that my boys aren’t sensitive to small amounts of soy lecithin.  If your child is sensitive, you can brush on canola oil with a pastry brush or paper towel.

Makes 10-12 waffles.

Enjoy!

Friday, March 12, 2010

Just some doodles… and my favorite old post…

Since all my Facebook friends were kind enough to remind me that it was by birthday today, I decided I can post anything .   At the end I’ll let you in on my favorite posting of all time from my past life, but for now something completely different.

I looked around my desk and found these weird scribbles that come about whenever I’m on the phone.  I can’t necessarily remember what I was talking about, but here they are:

This creature was on my daily todo list -

TODO Monster

These guys came out of my pen while I was talking to a lawyer about hiring processes and H1-B’s.

Legal Creatures

And, if I remember correctly, this was a stressful conversation.  I probably should have dropped the memory into that beaker thingie….

Stress?

Anyway, enough of rambling doodles. I’m sure to make more since I got a lot of response to my job posting on StackOverflow and will be calling to follow up on.

And for my favorite previous-life posting of all time ….

These Kids Won’t Eat Anything!this post describes this really cool demo I did at PASS 2008 building a model using the Excel Data Mining Addins to predict what possible foods my twin boys would eat culminating in a deployment of that model to my mobile phone.  Lot’s of fun and a pic of the boys as well.

So I think that wraps up all the posts from blogs.msdn.com/jamiemac worth mentioning – looking forward to providing you with new content – and maybe some doodles.

Thursday, March 11, 2010

SQL Server Data Mining Code Posting Digest

As promised, here is a digest of my old blog’s postings about coding with respect to SQL Server Data Mining.  I actually thought there would be a lot more, but it turns out that most of my evangelizing in that space ended up as Tips & Tricks on sqlserverdatamining.com – what do you think?  Should I digest those as well?

Anyway, here are the relevant postings from “old blog” delivered to you on my new blog.

DMX Queries – The Datasource Hole – this is probably the most important coding post.  This post provides the source code for a stored procedure to allow you to create datasources from a DMX call, which are required in order to query external data.  Since almost all data you would mine is external, this is pretty important!

Tree Utilities in Analysis Services Stored Procedures – this post provides a set of stored procedures for getting a variety of information from tree models, for example, the shortest path, longest path, etc.  Neat stuff that I used to help reduce the size of a gargantuan online questionnaire.

The amazing flexibility of DMX Table Valued Parameters – this post shows how table-valued parameters were meant to be done and how you can use them.  No offense to the SQL relational engine – natch.

Automatic Generation of CREATE MINING MODEL statementsthis post shows how to generate the DMX for a CREATE MINING MODEL statement given the model.  This is particularly useful, for example, when the model was created with BI Dev Studio or some other interface that uses XMLA.

The next set of links aren’t my own code, but references to other people’s great work in adding to the SQL Server Data Mining experience

Support Vector Machines for SQL Server Data Mining – A reference SVM plug-in implementation available on CodePlex by Joris Valkonet

Visual Numerics integration into SQL Server Data Mining – A great whitepaper by Visual Numerics discussing the C# plug-in algorithm model

Automatically Labeling Clusters Using Analysis Services Stored Procedures – another codeplex project – this time from furmangg – giving sprocs containing some cluster labeling hueristics

 

So that’s it for this digest and I think I’ve covered the most important posts – maybe next I’ll create a digest of the fluff pieces?  Let me know what you think….

Friday, March 5, 2010

Openings at Predixion

I wrote this job description up to capture a bunch of interest, but didn’t really have a place to post it, so why not here?  If you have comments on the posting – email me!  If you think you may be interested – email me!  Eventually this copy will be floating around elsewhere, but you can read it here first!

I’m the CTO of a new, well-funded startup company in the Seattle, WA (Eastside) area. I’m looking to hire around 10 developers over the next several months, possibly a couple skilled test engineers. Our company is building some unique software on the Microsoft stack, so MS-haters need not apply, I’m not much for dogma anyway. The software will by analytical in nature with a business focus. We’re offering competitive salaries and a great team environment and a well-managed company put together with the clear and careful forethought towards successful execution and high valuation.
I’m looking for developers that are talented, mature, creative and confident enough to strive forward in an ambiguous startup atmosphere, yet also maintain enough humility to understand that they are on a team and boastful pride will get you nowhere. Also, we’ll be in shared offices at least for a while, so a modicum personal hygiene is respected. Strong CS fundamentals are a must, demonstrated ability and strong references strong plusses. The vast majority of coding will be in .NET, although there may be some opportunities for C++ as well. Additional bonus points for any of the following skillsets:
  • Silverlight
  • Sharepoint
  • SQL Server + SQL Server BI
  • Cloud Computing – EC2/Azure
  • R and other statistical languages
  • Predictive Analytics and Data Mining
Originally I was only considering candidates with unrestricted eligibility to work in the US, but after talking to our lawyer this morning I can consider candidates who need sponsorship, provided requirements are met.
As I stated above, we’re in the unique position to be offering competitive salaries in a startup company. We will be offering an excellent insurance package once we’ve grown to enough employees, but until then we can assist with any COBRA payments to continue your existing healthcare. We’re also located in one of the few (or so I’m told) “green-certified” buildings and we’re in walking distance of many restaurants, a post office, a library, shopping and parks.
So, who am I? I have 18-19 years experience in software development shipping a wide variety of products, developing the low-level architectures to the front ends and in the mean time managing teams from 3 to 30+ people, traveling worldwide to present at conferences, and also wrote a couple books. I’m a family guy with four kids (thus the reason we’re investing in a good insurance plan) and I believe in my product and my team. I enjoy the team environment and have been lucky enough to have found excellent talent to work with and am looking for more. My goal has always been to change the world through software in whatever ways that I can.
So, if you are interested in these opportunities and live in the area or are willing to relocate on your own (sorry, no relo packages), drop me a note with a resume clearly stating relevant skills and experience, references and all that jazz to jamiemac@predixionsoftware.com .

Thursday, March 4, 2010

DMX Posting Digest

OK, I promised a digest of all the great postings past – really, I’ll leave out the rather lame ones.  I figure I’ll start out with all of the DMX related posts.  Interestingly enough I see that never in any posting did I ever really introduce DMX – Data Mining eXtensions to SQL!  What a jerk – I must have been trying to sell a book or something.  In any case, here are the relevant DMX postings with some descriptions so you don’t have to actually go to the old blog for things that aren’t interesting.  Maybe after these digest thingies are done, I can start from the beginning, so to speak.

These are most of the postings dealing with DMX.  Some postings that are more “code-like” I’m saving for a future digest article.

 

To (a) or not to (a), that is the question- This posting demonstrates a neat modeling trick for transforming a multinomial target into a series of binomials

Time Series Prediction – discusses the tricky nature of getting deviation information from the time series algorithm.  PROTIP – the posting is really just a redirect to this SQLServerDataMining.com article I wrote.

Predicting the non-majority state – demonstrates how to specify a threshold probability for “true” using DMX

Predict based on rules alone – shows how to filter Association Rules prediction queries to only show results that are based on learned rules and not simple popularity.

Predicting based on rules alone and getting everything you always wanted – Modifies the query in the previous post to use the TopCount function to filter the result set so you get the right number of results (assuming those results exist in the model).

Executing multiple DMX statements from SSMS – Not really a “DMX” post, but a useful one that likely won’t show up in any other digest.  Shows how to use SQL Server Management studio to execute multiple statements, essentially allowing you to create DMX “scripts”.

New DMX Syntax option in SQL Server SP2 – Shows the DMX generalization introduced in SQL Server 2005 SP2 (also in SQL Server 2008) that allows you to bind DMX function parameters to variables or even input columns.

Getting Data Mining results into SQL Tables – Demonstrates how to directly import the results of a DMX query into a SQL table – no middleman (that’s you SSIS!) required.

Querying the Dependency Net – Not particularly DMX, but it shows you how to call the stored procedure to get the information displayed in the dependency network view.

Unwinding MDX Flattening Semantics with DMX – And finally – totally NOT DMX, but there will never be another place for this great trick showing how to better understand MDX semantics by shoving the result through a DMX query!

Enjoy, and come back for future digests – I think the next one will be CODE….

Monday, March 1, 2010

The blog is dead! Long live the blog!

It took me a while to start up a new blog after I lost access to my old blog on MSDN by leaving Microsoft.  It's kind of crummy to have to leave it behind, but I guess it makes sense from a MS point of view - don't want ex-softie's posting any bad vibes on the MSDN site.  Not that I'm going to be doing that anyway, since my new venture is building on top of the great data mining work inside SQL Server Analysis Services.  In any case, this blog is outside of any professional entanglements so it can be my permanent home regardless of where I am.

Speaking of that, at my new company Predixion Software, we finally have a "parking page" where you can learn very little about what we're doing, but you can also sign up to get more info when it's available.  April (my lovely wife) already signed up - maybe to get independent verification that I do actually do something?  You should sign up as well (but maybe not for those reasons...)

Anyway, it's great to be back.  Given that I can't go back to the old site and there are over 100 posts there, the first thing I'm going to do is create a handy digest of the more useful posts over the next few days.

Hopefully you found me again - feel free to drop a line anytime, and you can follow Predixion Software on twitter (we'll see how that works out....)