Short of turning off all of the computers at your research facility (and maybe not even then), there is no way to keep your MRI research data 100% secure, according to Thomas Close, PhD, who discussed practical issues for protecting your MR data at a session May 16 at the 2021 International Society for Magnetic Resonance in Medicine (ISMRM) virtual meeting.
Close, a fellow at the University of Sydney's National Imaging Facility, provided an overview of the issues that need to be considered over the lifecycle of MRI research projects, touching on a wide range of security topics, from encryption of DICOM data transfer and defacing of structural MRI scans, to disclosure risks in published outputs.
Primarily, he proposed the use of the "Five Safes" method for ensuring privacy in MRI research: safe project, safe people, safe setting, safe data, and safe output.
"We look to apply multiple layers of privacy control that minimize the identifiable information and maximize the security of the data," he said.
Close asked -- and answered -- a number of questions related to MRI data safety.
Q: Is the use of the data in the research project design appropriate? Are the risks involved justifiable?
The safety of a research project is usually assured by the human research ethics committee (HREC) at your institution and involves a risk analysis that weighs the probability of a subject being identified from data versus the impact of a privacy breach.
According to Close, when identification of a subject from the metadata associated with the MR images -- a sample of names, for instance -- is probable, a high level of security will need to be applied, such as compliance with the ISO 27001 standard, even if the potential impact of a breach is low.
On the other hand, if the impact of a possible breach is low and it's unlikely that a person can be identified, it becomes more likely that the dataset can be made public. HREC risk analyses may put studies in an intermediate zone, where it might be suitable to store data on a standard research infrastructure with moderate levels of security.
"Ultimately, the principal investigator [PI] is responsible for the privacy of the data. However, institutions are expected to provide the means to make this feasible for the PI," Close said.
Q: Can the user be trusted to use the data in an appropriate manner? Are the people who have access to the data authorized?
"This basically means checking out that the user is who they say they are," Close said.
For people accessing data, institutional accounts, such as those at universities or hospital networks, are preferred because they are actually linked to the person's official identity, as opposed to personal email or Facebook accounts, for instance. They also provide support for multifactor authentication within the institution, as well as dedicated help desks for password resets.
Close recommends using purpose-built data sharing platforms for imaging data, such as XNAT, LORIS, COINS, or the commercial provider Flywheel.io. These platforms are effective because they store data in one central place, which reduces the proliferation of insecure copies of images or other data if someone involved in the study takes a copy and puts it on their home computer, he said.
Q: Does the facility limit unauthorized use or mistakes? Has appropriate and sufficient protection been applied to the data?
For those who aren't information technology security experts, Close said his overriding advice when it comes to securing your data is to leave it to the experts.
"Doing it yourself is challenging to say the least," he said.
Hackers with access to the physical storage infrastructure or your device can easily gain unauthorized access to the data stored on it unless it is encrypted. If you are using a cloud provider or an OpenStack system, make sure you set your firewall rules up properly. By all means, encrypt data at rest, but don't rely on encryption alone, because it really only protects the physical layer, Close said.
An important aspect to consider when transferring DICOM data between sites across the internet is that many instruments do not support encrypted transfer. You want to use a virtual private network (VPN) tunnel for data transfer between sites to ensure that the data is encrypted, he said.
Alternatively, you can use the Medical Imaging Resource Center (MIRC) clinical trials processor (CTP), which uses HTTPS to transfer data. As the name suggests, Close said, the CTP is approved for use in clinical trials.
"That's an added bonus," he said.
Q: Does the data itself contain sufficient information to allow confidentiality to be breached? Is there a disclosure risk in the data itself?
The first step in reducing disclosure risk is to redact as much sensitive metadata as you can. DICOM data typically contains quite a bit of protected health information (PHI). Unfortunately, the DICOM standard does not include a comprehensive list of the different fields used to store PHI.
It is safer to employ a policy of explicit inclusion of fields required for your analysis rather than explicitly excluding fields in case you miss any, Close said. A good option for neuroimaging data is the Brain Imaging Data Structure (BIDS), which is based on a minimum set of required data.
In addition, there are a number of ways to edit DICOM metadata to remove PHI, Close said. Consider DICOM editing tools such as DcmTK and Pydicom, for instance, which are fully automated and designed to help you avoid human error.
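The "explicit inclusion" policy Close recommends can be sketched in a few lines: rather than trying to delete every field that might hold PHI, copy only an allowlist of fields needed for analysis, so anything unanticipated is dropped by default. The sketch below uses a plain dictionary in place of a real DICOM dataset (with a tool like Pydicom you would iterate over the dataset's elements instead), and the chosen fields are a hypothetical example, not a recommended set.

```python
# Sketch of an allowlist ("explicit inclusion") redaction policy.
# A plain dict stands in for a real DICOM dataset to keep this self-contained.

# Fields assumed to be required for the analysis (hypothetical choice).
ALLOWED_FIELDS = {
    "Modality",
    "MagneticFieldStrength",
    "RepetitionTime",
    "EchoTime",
    "PixelData",
}

def redact(dataset: dict) -> dict:
    """Keep only allowlisted fields; everything else is dropped by default."""
    return {k: v for k, v in dataset.items() if k in ALLOWED_FIELDS}

scan = {
    "PatientName": "DOE^JANE",        # PHI: dropped, even without naming it
    "PatientBirthDate": "19800101",   # PHI: dropped
    "InstitutionAddress": "12 Example St",  # PHI: dropped
    "Modality": "MR",
    "RepetitionTime": 2000.0,
    "EchoTime": 30.0,
    "PixelData": b"\x00\x01",
}

clean = redact(scan)
print(sorted(clean))  # only the allowlisted fields survive
```

The advantage over explicit exclusion is exactly the failure mode Close describes: a PHI field you never thought to list is removed anyway.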
When it comes to facial data within MR images, a 2009 HIPAA ruling holds that high-resolution structural MRI datasets are comparable to full-face photographs, because faces can be reconstructed from them with freely available volume-rendering software.
The safest method is to skull-strip, Close said, since facial blurring can be reversed. Defacing methods and tools include afni_refacer, mridefacer, pydeface, and quickshear.
Defacing is required for public datasets, although the original scans should probably be kept in secure storage as a reference. AFNI Refacer may be suitable for automatic processing, according to Close.
Moreover, defacing has minimal impact on subsequent preprocessing, he said.
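To make the blur-versus-deface distinction concrete, here is a toy illustration of the idea behind shear-based defacing (as in quickshear): voxels on the "face" side of a plane are zeroed outright, which, unlike blurring, leaves nothing to reverse. Real tools operate on NIfTI volumes and fit the separating plane automatically; the tiny volume and hard-coded plane below are purely for illustration.

```python
# Toy sketch of shear-based defacing: destroy (zero) every voxel below a
# plane assumed to separate the face from the brain, rather than blurring.

DIM = 8  # tiny cubic volume for the sketch

# volume[z][y][x]; every voxel starts "on"
volume = [[[1 for _ in range(DIM)] for _ in range(DIM)] for _ in range(DIM)]

def deface(vol, slope=1.0, intercept=2.0):
    """Zero voxels below the plane z = slope * y + intercept (toy face region)."""
    for z in range(DIM):
        for y in range(DIM):
            if z < slope * y + intercept:
                for x in range(DIM):
                    vol[z][y][x] = 0  # irreversibly removed, not blurred
    return vol

deface(volume)
removed = sum(v == 0 for plane in volume for row in plane for v in row)
print(f"voxels removed: {removed} of {DIM ** 3}")
```

Because the removed voxels are set to a constant rather than smoothed, no deconvolution trick can bring the face back, which is the point of defacing over blurring.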
Q: Will the outputs of the data lead to a disclosure?
Close warned that one way research outputs can lead to disclosure is called a linkage attack, where multiple datasets are combined to narrow down the identity of individuals in the study.
For example, you may have data on the time and location of a scan and the hospital where it was performed in the scan's metadata, which an attacker can link with other geolocation data to eventually identify the person in the scan. The risk of such linkage attacks increases with the amount of metadata included in the dataset, Close said.
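The scenario above can be sketched in a few lines: a "de-identified" release still carries scan date and site, and joining those quasi-identifiers against a single external record narrows the candidates down to one scan. All records below are invented for illustration.

```python
# Toy demonstration of a linkage attack: residual metadata in a released
# dataset is joined with a hypothetical external record (e.g. a public
# social-media check-in mentioning an MRI appointment). All data is invented.

released_scans = [
    {"scan_id": "s001", "scan_date": "2021-03-02", "site": "Hospital A"},
    {"scan_id": "s002", "scan_date": "2021-03-02", "site": "Hospital B"},
    {"scan_id": "s003", "scan_date": "2021-03-05", "site": "Hospital A"},
]

external_record = {"name": "J. Doe", "date": "2021-03-05", "place": "Hospital A"}

# Join on the shared quasi-identifiers (date and location)
matches = [
    s for s in released_scans
    if s["scan_date"] == external_record["date"]
    and s["site"] == external_record["place"]
]

if len(matches) == 1:
    print(f"Scan {matches[0]['scan_id']} likely belongs to {external_record['name']}")
```

With only three scans, two metadata fields are already enough to isolate a single subject, which is why stripping metadata before release matters even when no field is identifying on its own.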
Ultimately, make sure you are using well-tested infrastructure and platforms where possible, and "strip" your data before publishing, he advised.
"The configuration you use will be dependent on the subject consent used in the study and the type of project you are working on," he concluded.