ibmCDO

CDO's, Data Science & Results
How are organizations, CDOs, and CAOs approaching data science? Join us and share your ideas!
   8 years ago
#IBMCDO Chief Data Officer Chat
IBM Analytics
Q2 - How much governance and process is required from data science teams?
Cortnie Abercrombie
I think we've gotten right into that
Cortnie Abercrombie
Many data science teams are frustrated right now about how long it takes to "govern" external data
Alfred Essa
Minimal. We want the DS team to focus on analysis and research.
jameskobielus
And how are CDO/CAO teams managing data science model governance in addition to data governance?
Cortnie Abercrombie
they want to bring it in right away and start exploring - they don't need heavy handed data governance to start innovating
Alfred Essa
I do have members of DS team that have experience with IT infrastructure and process. They interface with IT for us.
Alfred Essa
@cortnie_cdo correct. we try to shield them from that.
Bob E. Hayes
If you got a team of highly skilled #datascientists who already work as a team, they might govern themselves pretty well with minimal oversight (full disclosure: I don't govern any teams).
Alfred Essa
we follow a three stage process.
Nicholas Marko
I agree. Governance certainly has a place, but the DS team should not be the ones doing this
Alfred Essa
Stage 1: Data Science Research
Alfred Essa
Stage 2: Product Validation
Alfred Essa
Stage 3: Product Development
Bob E. Hayes
Team members with complementary skills are more effective than any individual #datascientist alone > https://twitter.com/...
Alfred Essa
During stage 1 we try to keep all types of governance as light as possible.
jameskobielus
Nicholas: Why not? Why shouldn't data science teams be doing governance of the models they're building and deploying into apps?
Nicholas Marko
Well, let me clarify...
Shesh Ramachandran
@malpaso But aren't DS teams owners of the process and know the data inside out
Alfred Essa
there is governance, principally under our chief product officers, but it's very light; during this stage we also try to make minimal demands on other teams, including IT
Nicholas Marko
I think the DS teams should essentially be consumers of data that is already well governed as much as possible.
Cortnie Abercrombie
@malpaso how do you ensure enough governance for your models?
jameskobielus
@malpaso How does data science research tie into governance? How does product validation? Product development?
Nicholas Marko
So, for us, governance happens at the enterprise level. We try to govern the data that lives in the stack and then have the data scientists use that data
Shesh Ramachandran
@malpaso for instance if there are issues with data quality or something wrong with the data model, wouldn't it eventually have to be handled by DS teams
Alfred Essa
@cortnie_cdo this is where our stages come in. once the DS team comes up with a candidate product idea it moves to product for validation. more governance at that stage. and then stage three, product development, even more governance.
Nicholas Marko
As an aside, however, it is worth noting that it is impossible to make all of an organization's data really "clean." In my opinion, that means that data scientists have to understand and manage sources of error...rather than spending too much of their time
Bob E. Hayes
I would guess you would need subject matter experts to understand if data you have are any good (with respect to what's being measured and how it is being measured). Data have meaning - > https://twitter.com/...
Alfred Essa
a lot of the work in initial data science exploration is messy. data is messy and ugly. it's like working in the coal mines. or as i imagine it to be.
Nicholas Marko
Also, I see model validation as being something very different than data governance. Certainly our DS teams are expected to validate their models
jameskobielus
@bobehayes Right. But I'm interpreting "data scientist" to include at least three categories of complementary skills/roles: statistical modeler, data prep engineer, and data-driven domain expert.
Alfred Essa
@bobehayes good point. DS don't work in isolation. as we come up with possible product insights or features, we show them to customers and the business to make sure we are on the right track.
Alfred Essa
I am a big fan of Michael Schrage at MIT Sloan on rapid prototyping and iteration.
Nicholas Marko
Yes. Real world data is messy. In my mind, the solution is for data scientists to learn how to love messy data, not for them to spend all their time cleaning it.
Alfred Essa
I agree completely with Nicholas
Bob E. Hayes
@jameskobielus That's a good definition of the three types of #datascientists; one supported by research.
Nicholas Marko
Another good point - terminology is key. I use the term "data scientist" to refer to a narrow group of very highly skilled modelers, computationalists, and mathematicians
Nicholas Marko
This is very different than data engineers, data managers, etc. These are all different skill sets, and I try to put different people in these roles
Nicholas Marko
We had a good discussion about that issue at last year's IBM cdo meeting.
Nicholas Marko
Not a shameless plug :) ...actually a very important issue that I think merits more consideration
jameskobielus
@malpaso Here's a blog that Michael Schrage published in IBM Big Data & Analytics Hub last year: http://www.ibmbigdat...
IBM Analytics
Q4 - How do CDO's or CAO's set up their data science teams, organizationally speaking?
Alfred Essa
the data science team at MHE is a central unit but we work closely with the BUs; depending on the project the DS could be embedded for a period of time.
Cortnie Abercrombie
I've seen CDOs and CAOs take a hybrid approach, keeping some in the Center of Excellence with them and some out close to where the business processes are actually taking place
jameskobielus
The data science teams may already be in existence. In those cases, the CDO/CAO's initiative may involve ensuring that they all converge around common role definitions, workflows, tools, practices, etc.
Nicholas Marko
Same here. My DS team is a mix of about 8 people who are mathematicians and programmers. We go wherever the particularly difficult or particularly valuable problems live
Cortnie Abercrombie
I tend to think that you almost need to pair up directly with the business on an ongoing basis so that you can gain new ideas on how to innovate in their areas
jameskobielus
The CDO/CAO may bootstrap the enterprise's DS practice by outsourcing it initially to strategic partners, with an eye toward bringing it in-house as DS proves itself out in terms of successful projects that drive ROI.
Cortnie Abercrombie
let the business also help you drive the new initiatives in other words... and not just from the top down...but from worker bees up
Alfred Essa
I should mention that there are also individuals in the organization that have "data science" knowledge but are not formally classified as data scientists. we make them part of our family.
Nicholas Marko
But we have lots of analysts and statisticians embedded in the business units. Again, to me "analyst" and "statistician" are very different from "data scientist"
Cortnie Abercrombie
@malpaso I like that you used the word "family"
Alfred Essa
as a culture what we are trying to develop is the idea that "data science" is not a specialization or a job function. everyone in the organization has to start thinking and acting as a data scientist. it's also cool and a lot of fun.
Cortnie Abercrombie
it takes a village to raise this data
Bob E. Hayes
To me, statistics is a key part of being a #datascientist... especially if you're a research data scientist.
Nicholas Marko
To be clear - different people for different jobs. You may only need 1 real data scientist for every 10 or 15 analysts or statisticians. Because only 10% of the problems we face are the "hardest 10% of problems!"
Alfred Essa
i also think of DS team as navy seals. small team. difficult, unsolved problems.
Nicholas Marko
This is what is interesting about the term "data scientist"... It takes on different meanings in different organizations!
Alfred Essa
agree with Nicholas. the best DS love the hardest problems.
Bob E. Hayes
Yes, Nicholas. We need to be clear about what we mean by datascientists. When somebody tells me they are a data scientist, I follow up with the questions, "What kind? What are your skills?"
Nicholas Marko
Yes, Alfred. That is how I think of them too
Nicholas Marko
Agree with Bob. The thing that sometimes gets me is that we already have terms for statistician, analyst, etc. So why not just use those terms for people who do those jobs, right?
Nicholas Marko
To me, the term "data scientist" is like SWAT officer
Nicholas Marko
Different than police officer or detective. Not "better" or "more important," just different
Seth Dobrin
Data Science is a team sport. If someone can solve a problem on their own, it's probably not worth solving