Victoria’s chief Information Commissioner has called on public agencies to strengthen their privacy protection provisions following an investigation which found the state’s public transport authority had likely breached the Privacy and Data Protection (PDP) Act 2014.
The investigation by the Office of the Victorian Information Commissioner (OVIC) found that Public Transport Victoria (PTV) had released de-identified data exposing myki users’ travel histories, data that researchers later found to be re-identifiable, according to a report commissioned by the OVIC.
Myki is a reloadable contactless smart card ticketing system used across Melbourne’s public transportation system. According to 2018 figures, more than 15 million active myki cards are in circulation.
In the wake of the findings, OVIC has issued Victoria’s Department of Transport with a compliance notice demanding it strengthen policies and procedures, data governance, training and reporting around its data holdings – extending these recommendations to all Victorian public agencies.
The transgression occurred during the Melbourne Datathon event in July last year, an annual ‘hackathon’ aimed at Victoria’s data science community, when PTV reportedly approved the release of a dataset containing de-identified information from 15.1 million myki cards, including 1.8 billion myki ‘tap on’ and ‘tap off’ events recorded between July 2015 and June 2018.
The dataset was disclosed to Data Science Melbourne in response to a request from the Department of Premier and Cabinet (DPC), which administers the Victorian Government’s open data platform.
While Information Commissioner Sven Bluemmel acknowledged that the decision to release the data was “well-intentioned”, he stressed that the breach had exposed “failures in governance and risk management [that had] undermined the protection of privacy”.
“Your public transport history can contain a wealth of information about your private life. It reveals your patterns of movement or behaviour – where you go and who you associate with,” Bluemmel said. “This is information that I believe Victorians expect to be well-protected.”
In the report, Bluemmel cautioned agencies against assuming de-identified data is anonymised and not traceable to an individual.
“Where a data set contains unit-level data about individuals, especially where it contains longitudinal unit-level data about behaviour, more recent research indicates such material may not be suitable for open release, even where extensive attempts have been made to de-identify it,” he said.
The suspected breach was reported by a participant at the Melbourne Datathon, who raised concerns over the potential for the dataset to be re-identifiable.
Separately, academics working at the University of Melbourne had located the myki dataset online and had been able to identify themselves, as well as persons known to them, according to the OVIC report. The academics notified OVIC of their findings and raised concerns regarding the release of the dataset, including the potential for numerous re-identification attacks on the dataset.
In an article published on the University of Melbourne website after the report’s release, the three academics, Dr Chris Culnane, Associate Professor Benjamin I. P. Rubinstein and Associate Professor Vanessa Teague, said they had found it “incredibly easy” to identify themselves in the supposedly anonymised dataset.
The team said they had re-identified not only themselves but also “a co-traveller and a member of the Victorian Parliament”.
“What’s concerning is that our analysis shows that most people in the released dataset are identifiable from just a handful of touch on or touch off events,” they said.
“Information like this can then lead to further revelations like home and work locations; it can reveal regular patterns of travel; it can also tell us who card holders travel with, like family members or ex-partners, or if they travel alone – like unaccompanied children returning from school often do.”
CSIRO’s Data61 found personal information could also be obtained from the PTV dataset “without expert skills or resources”.
“Our research found that when two myki card scans are known by time and stop location, more than three in five of those pairs of scans are unique and therefore more likely to be personally identifiable,” said Dr Paul Tyler, Data Privacy Team Leader at CSIRO’s Data61. “So-called ‘de-identified’ data can still carry re-identification risk, especially in linked transactional data.”
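To illustrate the kind of measure Tyler describes, the sketch below takes scan records of the form (card, time, stop), forms every pair of scans belonging to the same card, and counts how many of those pairs match exactly one card in the whole dataset. It is a minimal illustration under stated assumptions, not Data61’s actual method: the record layout, the hour-of-day time buckets, the function name `pair_uniqueness` and the toy records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pair_uniqueness(scans):
    """Fraction of within-card scan pairs that match exactly one card.

    Hypothetical layout: each scan is (card_id, time_bucket, stop_id).
    """
    cards_by_event = defaultdict(set)   # (time, stop) -> cards seen there
    events_by_card = defaultdict(set)   # card -> its (time, stop) events
    for card, t, stop in scans:
        cards_by_event[(t, stop)].add(card)
        events_by_card[card].add((t, stop))

    unique = total = 0
    for card, events in events_by_card.items():
        for a, b in combinations(sorted(events), 2):
            # Cards consistent with someone who knows both scans a and b.
            matches = cards_by_event[a] & cards_by_event[b]
            total += 1
            if len(matches) == 1:
                unique += 1
    return unique / total if total else 0.0


# Toy records (invented): cards A and B share a commute, card C does not.
scans = [
    ("A", 8, "Flinders St"), ("A", 17, "Richmond"),
    ("B", 8, "Flinders St"), ("B", 17, "Richmond"),
    ("C", 8, "Flinders St"), ("C", 18, "Box Hill"),
]
print(pair_uniqueness(scans))
```

On this toy data the function returns 1/3: the pairs for cards A and B are shared and so ambiguous, while card C’s pair points to a single card. In a real dataset, every unique pair is a potential handle for re-identification, since someone who knows two of a person’s trips (for example from a social media post) can isolate that person’s full travel history.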
The OVIC investigation concluded that the PTV had failed to address the possibility that individuals in the dataset could be re-identified by combining information in the dataset with information from other sources such as social media.
Curiously, the Department of Transport said it does not accept the Commissioner’s finding that the release of the myki dataset breached myki users’ privacy. However, the Department said it is committed to implementing the actions set out in the compliance notice.