Data Management and Analysis Core (DMAC)

University of Miami Superfund Research Program
DMAC will provide comprehensive data management and analysis support to the research projects at the Center, including strategy development for metadata, geo-tags, data formatting, and de-identification.

Aim 1: Coordination with Projects and Other Cores

While DMAC understands the work the project and core teams will conduct, it is not in a position to decide the exact methodologies they will use or the nature of the data they will produce. There will also be a gap of approximately 1.5 years between grant submission and the start of the Center. It is therefore important to refresh DMAC’s understanding of the Center’s data management and analysis needs. To accomplish this, DMAC will conduct initial meetings with the project and core leaders.

In these meetings, DMAC will determine the following:

  • What techniques and protocols will be employed to generate the data. 
  • What formats the data will take, including the attributes, their expected ranges, the error margins, and the 
    file formats of the datasets (.csv, .xlsx, etc.); see the intake sketch after this list. 
  • What schedule will be used to generate the data, including how many collections will be made over 
    the course of the project and how much data each collection will contain. 
  • Who will handle the data generation process, and who will be responsible for submitting data to DMAC. 
  • Where the data will be stored initially. 
  • What postprocessing, if any, will be applied. 
  • Which parts of the collected and post-processed data will be shared or published, and in what format. 
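
The information gathered in these meetings can be captured in a structured intake record so that it is comparable across teams. The sketch below is one illustrative way to represent such a record in Python; the field names and example values are assumptions for illustration, not a finalized schema.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetIntakeRecord:
    """One entry in DMAC's intake inventory for a project or core team.

    Field names are illustrative placeholders, not a finalized schema.
    """
    team: str                      # project or core generating the data
    protocol: str                  # technique/protocol used to generate the data
    file_format: str               # e.g., ".csv", ".xlsx"
    attributes: dict = field(default_factory=dict)   # attribute -> (expected range, error margin)
    collections_planned: int = 0   # number of collections over the course of the project
    records_per_collection: int = 0
    data_steward: str = ""         # person responsible for submitting data to DMAC
    initial_storage: str = ""      # where the data are stored before transfer to DMAC
    postprocessing: str = ""       # planned post-processing, if any
    shared_subset: str = ""        # which portion will be shared/published, and in what format

# Example entry (hypothetical values):
example = DatasetIntakeRecord(
    team="Project 1",
    protocol="ICP-MS metal panel",
    file_format=".csv",
    attributes={"arsenic_ppb": ((0, 500), "±5%")},
    collections_planned=4,
    records_per_collection=250,
    data_steward="lab data manager",
    initial_storage="lab server",
    postprocessing="unit harmonization",
    shared_subset="de-identified summary tables (.csv)",
)
```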

Once these project- and core-level consultations are complete, DMAC will explore ways to pool the disparately generated data for integrative data analysis.

Aim 2: Fostering Data Sharing and Interoperability

DMAC will provide services for integrating data both within and across projects, and for sharing data with other Superfund Centers and with NIEHS. To support data sharing and interoperability, DMAC will follow NIH’s FAIR principles to create human- and machine-readable metadata [1], leveraging its experience and existing resources such as the CEDAR Workbench [2]. The metadata and annotations that DMAC provides for the data collected and received will be indexed and discoverable through keyword searches, enabling discovery of the data by Center members as well as external users, including NIH. In addition, DMAC’s data analytics members will support research team members in analyzing the data generated. The University of Miami has institutional subscriptions to many statistical and computational software tools, including SAS, SPSS, ArcGIS, and MATLAB, and the high-performance computing facilities at the Frost Institute for Data Science and Computing (IDSC) will be leveraged. DMAC will provide preliminary analyses within the capacity of these readily available tools. 
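
As an illustration of what human- and machine-readable metadata can look like, the sketch below builds a minimal dataset record using schema.org’s Dataset vocabulary and Python’s standard json module. The actual templates will follow CEDAR Workbench and NIEHS guidance; the field values here are hypothetical.

```python
import json

# A minimal, machine-readable metadata record sketched with schema.org's
# Dataset vocabulary. The values below are hypothetical placeholders.
metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Soil metal concentrations, Site A (hypothetical)",
    "description": "De-identified soil sampling results generated by a Center project.",
    "creator": {"@type": "Organization",
                "name": "University of Miami Superfund Research Program"},
    "keywords": ["soil", "heavy metals", "Superfund"],
    "dateCreated": "2025-01-15",
    "encodingFormat": "text/csv",
    "variableMeasured": [
        {"@type": "PropertyValue", "name": "arsenic", "unitText": "ppb"},
        {"@type": "PropertyValue", "name": "lead", "unitText": "ppb"},
    ],
    "spatialCoverage": {"@type": "Place",
                        "name": "Site A (geo-tag withheld pending de-identification)"},
}

print(json.dumps(metadata, indent=2))
```

Because the record is plain JSON, it can be indexed directly, and the keywords field supports the keyword searches described above.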

The existing software environment at IDSC can compile and run code written in Python, R, Julia, C/C++, Java, and other languages. DMAC will leverage IDSC’s and the UM Library’s training programs and coordinate with the Training Core to support the training of graduate students and postdoctoral researchers. If an analysis requires approaches beyond the capabilities of these software tools, DMAC will develop such tools; IDSC maintains a host of in-house analysis tools, and DMAC will re-purpose them where possible. Specifically, DMAC will: 

  • Develop strategies for writing data ontologies and metadata. 
  • Establish plans for de-identification. 
  • Establish a means to receive data from the project and core teams into the Center’s internal repository.  
  • Adopt the above means to receive data and verify formatting (see the sketch after this list). 
  • Control access privileges to the received data. 
  • Provide data format conversion tools to be used for the internal data. 
  • Provide version control of the data and programming scripts. 
  • Collect external data to be used in the analysis. 
  • Assist in data preparation for publication and upload to public repositories. 
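
A minimal sketch of the receive-and-verify and de-identification steps listed above is shown below, assuming tabular .csv submissions and pandas-based checks. The column names, expected types, and salted-hash approach are illustrative assumptions, not the Center’s finalized pipeline.

```python
import hashlib
import pandas as pd

# Expected layout for a hypothetical tabular submission; the real
# specification will come from the Aim 1 consultations.
EXPECTED_COLUMNS = {"participant_id": "object",
                    "arsenic_ppb": "float64",
                    "collection_date": "object"}

def verify_format(df: pd.DataFrame) -> list[str]:
    """Return a list of formatting problems found in a submitted table."""
    problems = []
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    for col, dtype in EXPECTED_COLUMNS.items():
        if col in df.columns and str(df[col].dtype) != dtype:
            problems.append(f"column {col!r} has dtype {df[col].dtype}, expected {dtype}")
    return problems

def deidentify(df: pd.DataFrame, salt: str) -> pd.DataFrame:
    """Replace direct identifiers with salted hashes (illustrative only;
    the Center's de-identification plan may require additional steps)."""
    out = df.copy()
    out["participant_id"] = [
        hashlib.sha256((salt + str(v)).encode()).hexdigest()[:16]
        for v in out["participant_id"]
    ]
    return out

# Usage (hypothetical file names):
# df = pd.read_csv("project1_collection2.csv")
# issues = verify_format(df)
# if not issues:
#     deidentify(df, salt="center-secret").to_csv("ingested/project1_collection2.csv", index=False)
```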

Aim 3: Data Quality Assurance and Quality Control

DMAC will establish a policy for data quality assurance and quality control and periodically review it with the project and core leaders. The policy will specify how the teams work with DMAC to determine the data management and analysis process. Its specifics include: 

  • How DMAC receives and ingests data from the teams. 
  • How the teams can review the data after DMAC’s pre-processing, including the method for verification (a minimal checksum sketch follows this list). 
  • How the teams specify access privilege to members of the center. 
  • How the teams set up the schedule for publishing data, including version control. 
  • How the data should be curated. 
  • How the data should be divided into data components for independent access.
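
One simple way to support the verification and version-control items above is a checksum manifest that travels with each data component. The sketch below, with hypothetical paths, illustrates the idea; it is not a prescribed part of the QA/QC policy.

```python
import hashlib
from pathlib import Path

def file_checksum(path: Path) -> str:
    """SHA-256 checksum so a team can verify that the file DMAC ingested
    or published is byte-identical to the one they approved."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(data_dir: Path) -> dict[str, str]:
    """Checksum manifest for every file in a data component directory."""
    return {p.name: file_checksum(p) for p in sorted(data_dir.glob("*")) if p.is_file()}

# Usage (hypothetical path):
# manifest = build_manifest(Path("components/project1_v2"))
# The manifest can be versioned alongside the data so any later change is detectable.
```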