Has govt’s demand for data scientists been overblown?

Data Panel NSW

While governments have an obligation – and, indeed, interest – to share data freely (with citizens, fellow agencies, as well as external industry partners), and moreover to be transparent with the public about how it is used, there is increasing concern about the risks of getting it wrong: of oversharing sensitive data, of unintentionally misusing data, or perhaps of simply misreading it, and thus misrepresenting constituents.

We’ve taken a snapshot of The Future of Data in NSW Public Sector featured at the FST Government NSW 2022 conference. Expert panellists from across NSW Government weighed in on the challenges and opportunities arising from the proliferation and sharing of government data assets, on the legislative guardrails protecting agencies and their staff from unintended data mishaps, and why the demand for data scientists in government is heavily outweighed by the need for more on-the-ground data skillsets.

Featured panellists (as pictured from left to right):

  • Elizabeth Tydd, Information and Privacy Commissioner, NSW Information and Privacy Commission
  • Simon Herbert, Chief Data Officer, Department of Customer Service
  • Kate Carruthers, Chief Data & Insights Officer, University of New South Wales
  • Adam Oaten, Enterprise Account Manager Public Sector NSW, Splunk

Moderated by Katarina Ruszczyk, Founder & Director of Beam Ideation, and Academy at the NSW Department of Customer Service.


Ruszczyk (MOD): Data opens up tremendous opportunities to positively transform government. What do you think are the key opportunities to leverage data and insights within the New South Wales public sector?

Herbert (Dept Customer Service): Data, I think, can fundamentally shift the NSW Government to being a more customer-centric organisation. It can improve collaboration between our very siloed clusters, and we can use data to improve the services that we deliver to our citizens and to the state.

Oaten (Splunk): Data can be a blessing and a curse for a lot of places; trying to understand how to manage it effectively is a challenge, but also presents an opportunity.

We find organisations and agencies in the best position to take advantage of the data they’re generating today are those that understand where it is, who owns it, and who manages it.

When it’s fragmented and siloed and teams work in their bespoke houses, you have to overcome different challenges to realise the true potential of the incredible amounts of data being generated today, and data that’s still to come.

Ruszczyk (MOD): Looking to data governance. Is governance, in your opinion, still imperative? How do we further empower the public sector’s data capabilities through governance?

Tydd (NSW Info Commissioner): Governance remains important, and probably even more important because of that wider landscape of government doing business externally.

A very small example, in the contracts space, is a social housing tenant wanting to challenge the rental subsidy. This rental subsidy is calculated using an algorithm developed by an outsourced third-party provider. The Government agency, the then Department of Family and Community Services and now Department of Communities and Justice, wanted to know how the algorithm worked, because it was the one providing the subsidy. The social housing landlord, who was a third-party provider, wanted to know how the algorithm worked, because they wanted to know their liability. The tenant wanted to know how it was calculated, because she’d been earning [income] in between. However, the provider of the algorithm under contract relied successfully on their commercial in-confidence provisions, and that meant no one knew how that was being calculated. As we can see, there’s a risk there for everyone in that equation.

So, what are the governance measures that you could put in place to better manage those risks? There are different types of provisions under a contract:

  • Firstly, capture them under the GIPA Act.
  • Secondly, make sure that the contracting department retains access, and a right of access, to the test suite of data, to the government information that is shared, [and to the] audit trails – has anyone, for instance, been in there playing with that algorithm, opening that ‘black box’?
  • Thirdly, think about that notion of, if there’s harm recorded – and this is really an issue that’s apparent in North America right now – by way of litigation or complaints etcetera, notify the department responsible because they’re the ones assuming the risk. And that works both ways, for both the provider and the government agency.

So, there are some fundamental new governance issues to look at, along with a lot of other existing governance requirements.

Ruszczyk (MOD): This ties into a comment from the audience: ‘We’re shifting our risk appetite to strong, proactive publication, and accepting that means that we’ll sometimes get it wrong and publish something we shouldn’t. Our staff are nervous these mistakes will erode public trust.’ How would you respond?

Tydd (NSW Info Commissioner): I’ll first respond from a legislative perspective and then address the people issue.

Under the GIPA Act, if you make a mistake, but you’re acting in good faith, there’s no offence.

There’s a recognition that people sign up to do the right thing and that we have the checks and balances in place. As such, managing the people has to be around, ‘Well, what’s your intent?’ Your intent is to proactively disclose. And what does that do? It builds trust; it gives people information that they might need. So, how does bad faith sit against being proactive and being positive in terms of open government?

Most government sectors are acutely aware of this priority of ensuring that we have an open government. And some recent outcomes of elections might tell you that that’s very much at the forefront of the public’s mind.

Carruthers (Uni NSW): When we conceptualise data governance, we tend to think of it in terms of how you manage your data and how you govern it. However, it helps to start thinking about your data supply chains, with contracting as the first part of the data supply chain that you then need to manage. Data governance, at its essence, is a risk management function: how do you know what data you’ve got? How do you know the data that’s important and needs to be protected? And, how do you protect it? Ultimately, that’s what you’re trying to do.

If you understand your supply chain from the time you write your contract right through to when you dispose of your data, that’s going to be a better way to conceptualise it. It’s not just the bit when you’ve already got it. It’s end-to-end, and it’s about working out where the end-to-end is.

Herbert (Dept Customer Service): During Covid, just to give you an example, we published a lot of the data through the Data.NSW hub, enabling many citizen data scientists to pull that data out and perform their own analyses. We were, in fact, publishing at the record level. Now, as we all know, you can start to identify people if you publish at the level of individual records – even if you’ve taken out all of those typical personal information identifiers. One thing that we did, with Dr Ian Oppermann, was run something called a Personal Information Factor – a ‘PIF’.

What that technology did was mathematically work out the chance of a re-identification occurring. This was then cleared through the Information and Privacy Commissioner, which enabled us to publish the data.

That level of transparency, though, absolutely created a significant level of trust with our citizens about how we were managing Covid [and Covid-related data].

Ruszczyk (MOD): Moving on to the problem of skillsets in the data space. How do we equip and support wider technology and business groups with the right data skillsets?

Herbert (Dept Customer Service): Unfortunately, there are a few roles that people have latched onto: ‘data scientist’ is the classic one. But in actual fact, there are some unsung heroes in the data world. The first is the data steward, who makes sure that data governance occurs. You can automate a certain amount, but you still need a human at some point to look at a piece of data and go, ‘No, that’s not right. We’re not letting that out!’ The other key role is the data engineer, the ‘plumber’ as I like to call them, because the data is normally sitting in a silo, a transactional system, and we need to be able to get it out and into the analytics capability to understand what’s going on.

Whilst everybody says, ‘We need more data scientists!’, actually, we probably need more data engineers and data stewards.

Carruthers (Uni NSW): My ratio of data engineers to data scientists is about five to one: five data engineers for every data scientist you’ve got if you’re operating at scale. That’s my personal experience. But there is this crazy notion that every normal person out there trying to do their day job wants to become a data scientist. They don’t want that!

What they want is the insights and information they need to do their job to be accessible to them. That’s what we should be building as organisations.

A person who’s at the front line trying to service customers shouldn’t be having to work out how to use R [programming language]. That’s preposterous!

Ruszczyk (MOD): And I suppose that ties back into the governance question as well. While there might be that ‘bench strength’ of data knowledge within an organisation, what’s often lacking is that foundational, enterprise-wide understanding that everyone must work with data to some degree. What are your thoughts on this and what are you seeing within your organisation?

Oaten (Splunk): I’d describe it as being almost part of the DNA of the organisation; it’s really a data mindset.

We’re all currently going through significant digitisation of many, if not all, services. And the more you do it, the more people are exposed to it, and the more people embrace it, the more the DNA of the place starts to change. This gives you a real lifecycle, or ‘cradle-to-grave’, consciousness around the data.

But skillset is critical. Coming from industry, a lot of places I see are experiencing change and churn within that space; it’s a challenge for us, because we’re trying to help a lot of government agencies, and a lot of services that are being delivered for them.

And [government agencies] have to pay a lot of money, because they haven’t necessarily got their skills and resources in-house.

And augmenting that by paying market rates, unfortunately, becomes very expensive.

We’re currently looking to have graduate programs – not just from a vendor’s perspective, but also from a customer’s perspective. The idea, longer-term, is that those people eventually get placed within the public sector, not only bringing in new people who are given new opportunities, but also building a digital foundation within government agencies themselves. That then becomes the nursery and the workforce of the future.

Ruszczyk (MOD): Simon [Herbert], from a DAT perspective, there are significant limitations on agencies sharing customer data when acting in the interest of the customers to build processes or services.

How can the Information Commissioner support agencies in obtaining this information more easily? Also, how do we establish common, shared data across NSW Government?

Herbert (Dept Customer Service): There are two ways we can share data. The first is to share de-identified, aggregated data – and we can do so without [contravening] any privacy provisions. The second way, which is what I understand this question is specifically addressing, is in us sharing an individual’s data between two departments.

We used an example earlier around somebody in the fire service coming to a house and immediately being able to pull up relevant information from all across government related to that property or service. But to do so, we’d need to go through either a public interest declaration (what we call the PID) or a code of conduct. Both of these processes are very specific. They talk about ensuring that you’ve gained consent to do this and there must be a very specific purpose – and, often, we don’t actually know that purpose.

There are two pieces of legislation: the Privacy Act and the Data Sharing Act. And between these two pieces of legislation, we have got to work a way through to being able to support government made easy, that ‘tell me once’ capability, and being able to connect those things up. It’s a work in progress.

Tydd (NSW Info Commissioner): Thank you, Simon. And in the interest of having a controversial panel, to correct Simon, there are in fact three pieces of relevant legislation, including the GIPA or the Government Information (Public Access) Act.

That Act provides you with a lot of power in other ways. It provides that you must, as a government agency, set out the information that you hold, how you make it publicly available, and how you don’t make it publicly available. It also requires that you set out how decisions are made and how you will engage with the public. So, for example, for Simon’s purposes, think of the asset of data that’s held by every department. Consider how they’re mandated to publish that they’re holding that information; think of the power that gives the citizen to ask for that information and, therefore, to set up a situation where that information might become publicly available – and I’m not just talking about personal information here. But then, importantly, as government increasingly relies upon data – as it should – to make good, informed decisions and to better inform policy and service delivery, think about how our agencies are empowered under the GIPA Act to tell citizens that: ‘Actually, we run this data through this type of approach, and we apply these sorts of intelligence factors, we have this number of scientists and this number of engineers, and we’re able to come up with these datasets.’

Once you start explaining how your decisions are made, that really empowers citizens and it circulates that body of information, that body of knowledge, and makes for a more open government.

Ruszczyk (MOD): Finally, as we draw this session to a close, what would you offer as your key takeaway message? Perhaps I could prompt this by looking specifically at the critical lenses of real-time accessibility and innovation, and how we apply these lenses safely to customer data.

Carruthers (Uni NSW): One of the things that became obvious to me in fairly recent times at the University is, if you want to do some research, you have to go through an ethics approval process and have all your research in compliance with it. But for the simple running of the place as a business and the teaching of students, we don’t do any of that. As a result, there are people using AI and ML in the research space who’ve got permission and have gone through committees, and then there are people in the teaching space who are thinking they’ll just ‘Do some ML on my students!’, all unsupervised.

We’re now conscious of putting ethical lenses over what we do as a business, and we’re having conversations about how we can do that most efficiently – because everybody’s thinking, ‘Oh, that’ll just slow us down!’, but it is necessary. Indeed, that’s a really important consideration: How do you put an ethical lens over everything that you’re doing with your data?

Oaten (Splunk): The fact that we’re having this type of conversation, and the fact that we’re trying to figure out how to make this work effectively is a positive, because it’s helping us move forward.

Yes, there are always a lot of associated risks, and you’re not going to get it perfect from the start, because that’s just not possible. But, by actually testing the waters, taking some risks, going off and doing it – and maybe sometimes making some mistakes and having some information that shouldn’t be made available getting into the wrong hands – that’s the risk we take in progressing. That’s where I always find the positives, and I see NSW as a genuine leading state in that regard. The acceptance of us doing more of that going forward seems to be greater within the community.

Tydd (NSW Info Commissioner): In the context of partnerships and cross-agency work, there are three key questions: Who holds the information? In what form is it held? And how can access be provided? They take you to contract. They take you to the citizen. And they take you to your responsibilities.

Herbert (Dept Customer Service): I’m going to cover something completely different. And that’s a program we’re launching called ‘Data Ship’. The idea behind this is that, in order to create a data culture in NSW, we actually need the executive to become ‘data fluent’ – I don’t like the term ‘data literate’.

So far, we’ve done some interviews with various executives to understand where they’re at. Our research so far shows us two things. First, they lack the data basics. All executives have been telling us, ‘I have not been given the data basics! I haven’t been told this is a scatterplot diagram or this is how a correlation graph works.’ The second piece is around what I call ‘data habits’. The first thing when something’s presented to you as a leader should be: Where’s the evidence? Where’s the data to support this? What’s the data telling us? We got a lot better at this during Covid; we all started asking the data.

I believe that the data transformation that will occur in NSW Government will be people-led.

If we improve the data culture and we all have our own personal data habits, we will in fact use data far more effectively.


This is an edited extract from the Data Panel Discussion | The Future of Data in NSW Public Sector featured at the FST Government NSW 2022 event.