
IBM Analytics

Q2 - How much governance and process is required from data science teams?

Cortnie Abercrombie
I think we've gotten right into that

Cortnie Abercrombie
Many data science teams are frustrated right now about how long it takes to "govern" external data

Alfred Essa
Minimal. We want the DS team to focus on analysis and research.

jameskobielus
And how are CDO/CAO teams managing data science model governance in addition to data governance?

Cortnie Abercrombie
they want to bring it in right away and start exploring - they don't need heavy-handed data governance to start innovating

Alfred Essa
I do have members of DS team that have experience with IT infrastructure and process. They interface with IT for us.

Alfred Essa
@cortnie_cdo correct. we try to shield them from that.

Bob E. Hayes
If you've got a team of highly skilled #datascientists who already work as a team, they might govern themselves pretty well with minimal oversight (full disclosure: I don't govern any teams).

Alfred Essa
we follow a three-stage process.

Nicholas Marko
I agree. Governance certainly has a place, but the DS team should not be the ones doing this

Alfred Essa
Stage 1: Data Science Research

Alfred Essa
Stage 2: Product Validation

Alfred Essa
Stage 3: Product Development

Bob E. Hayes
Team members with complementary skills are more effective than any individual #datascientist alone > https://twitter.com/...

Alfred Essa
During stage 1 we try to keep all types of governance as light as possible.

jameskobielus
Nicholas: Why not? Why shouldn't data science teams be doing governance of the models they're building and deploying into apps?

Nicholas Marko
Well, let me clarify...

Shesh Ramachandran
@malpaso But aren't DS teams the owners of the process, and don't they know the data inside out?

Alfred Essa
there is governance under our chief product officers principally but it's very light; during this stage we also try to make minimal demands on other teams, including IT

Nicholas Marko
I think the DS teams should essentially be consumers of data that is already well governed as much as possible.

Cortnie Abercrombie
@malpaso how do you ensure enough governance for your models?

Alfred Essa
@rshesh91 say more...

jameskobielus
@malpaso How does data science research tie into governance? How does product validation? Product development?

Nicholas Marko
So, for us, governance happens at the enterprise level. We try to govern the data that lives in the stack and then have the data scientists use that data.

Shesh Ramachandran
@malpaso for instance, if there are issues with data quality or something wrong with the data model, wouldn't it eventually have to be handled by DS teams?

Alfred Essa
@cortnie_cdo this is where our stages come in. once the DS team comes up with a candidate product idea it moves to product for validation. more governance at that stage. and then at stage three, product development, even more governance.

Nicholas Marko
As an aside, however, it is worth noting that it is impossible to make all of an organization's data really "clean." In my opinion, that means that data scientists have to understand and manage sources of error...rather than spending too much of their time

Bob E. Hayes
I would guess you would need subject matter experts to understand if data you have are any good (with respect to what's being measured and how it is being measured). Data have meaning - > https://twitter.com/...

Nicholas Marko
cleaning data.

Alfred Essa
a lot of the work in initial data science exploration is messy. data is messy and ugly. it's like working in the coal mines. or as i imagine it to be.

Nicholas Marko
Also, I see model validation as being something very different than data governance. Certainly our DS teams are expected to validate their models.

jameskobielus
@bobehayes Right. But I'm interpreting "data scientist" to include at least three categories of complementary skills/roles: statistical modeler, data prep engineer, and data-driven domain expert.

Alfred Essa
@bobehayes good point. DS don't work in isolation. as we come up with possible product insights or features, we show them to customers and the business to make sure we are on the right track.

Alfred Essa
I am a big fan of Michael Schrage at MIT Sloan on rapid prototyping and iteration.

Nicholas Marko
Yes. Real world data is messy. In my mind, the solution is for data scientists to learn how to love messy data, not for them to spend all their time cleaning it.

Alfred Essa
@jameskobielus correct.

Alfred Essa
I agree completely with Nicholas

Bob E. Hayes
@jameskobielus That's a good definition of the three types of #datascientists; one supported by research.

Nicholas Marko
Another good point - terminology is key. I use the term "data scientist" to refer to a narrow group of very highly skilled modelers, computationalists, and mathematicians

Nicholas Marko
This is very different than data engineers, data managers, etc. These are all different skill sets, and I try to put different people in these roles

Nicholas Marko
We had a good discussion about that issue at last year's IBM CDO meeting.

Nicholas Marko
Not a shameless plug :) ...actually a very important issue that I think merits more consideration

jameskobielus
@malpaso Here's a blog that Michael Schrage published in IBM Big Data & Analytics Hub last year: http://www.ibmbigdat...