Toolkit for Communities
Using Health Data
How to collect, use, protect, and share data responsibly
National Committee on Vital and Health Statistics
May 2015
U.S. DEPARTMENT OF HEALTH AND HUMAN SERVICES
Centers for Disease Control and Prevention
National Center for Health Statistics
Table of Contents
Introduction
Data Lifecycle
Data Stewardship
Accountability
Openness, Transparency, and Choice
Community and Individual Engagement and Participation
Purpose Specification
Quality and Integrity
Security
De-identification
Appendix A: Definitions
Appendix B: Federal and State Laws
Appendix C: Case Studies
Appendix D: Worksheet and Checklists
Introducon
The Naonal Commiee on Vital and Health Stascs (NCVHS)
is the U.S. Department of Health and Human Services’ (HHS)
statutory public advisory body on health data, stascs, and
naonal health informaon policy. NCVHS has historically made
recommendaons regarding stewardship of health informaon
collecon, use, and disclosure.
In recent years, NCVHS hearings and roundtable discussions
about how communies are using data to improve health at
the individual, subgroup, and community levels have shown
the need for guidance on the meaning and applicaon of
data stewardship. These eorts have focused on the needs of
community-level organizaons. NCVHS created the Community
Data User Toolkit to be a substanve introducon to the
elements of data stewardship for communies that want to use
data.
In this document, a community is dened broadly as a formal
or informal group with a shared interest, which could be
dened by a shared characterisc such as geography, race
or ethnicity, a shared medical diagnosis, or a combinaon
of characteriscs. For example, a community could be a
neighborhood in a city, an online community of individuals
aected by cancer, or a racial subgroup within a city.
This document also uses the term data broadly. Communies
may use many dierent types and sources of data to promote
the health of the community, subgroups, or individuals. Some
data will be related to health condions, but other data could
relate to environmental factors, such as locaons of grocery
stores or access to safe walking routes. Data related to health
condions could come to the community as aggregated data
collected for other purposes, such as disease surveillance.
Other health data could be abstracted from paent medical
records, or collected by the community user through a survey
or some other mechanism.
Community groups today are using data to tackle important
health issues in ways that were not even imagined a few years
ago. In the past, access was largely limited to
government-based public health agencies or health care systems.
Now communities can access data because data availability has
exploded, particularly data in digital formats. Federal and
state governments, local health information exchanges, and
other organizations have data that could be made available to
promote community and individual health. If used effectively,
data may help improve communities’ understanding of:
- Health of the community and members of the community
- Health challenges facing the community
- Health promotion successes within the community
- Opportunities to improve the health of the community
  as a whole and the health of individuals living in the
  community
Many organizaons have data that may be available for
communies to use. These organizaons may also provide
tools and guidance for communies wanng to use their
data. This Toolkit includes important themes in stewardship—
proper data protecon and use—and, where relevant, refers
community data users to some of these resources.
Eecve data use requires eecve stewardship pracces.
Failure to use good stewardship pracces could harm
individuals or communies. Improper data handling or the
failure to protect individuals’ privacy or condenality could
limit parcipaon and impede the use of data.
This Toolkit was created to support communies that are
using data by promong sound stewardship pracces, while
helping them avoid the missteps and potenal harm that can
result when data users do not follow sound data stewardship
pracces. The Toolkit is not meant to provide a comprehensive
explanaon of every aspect of data stewardship, nor is it
meant to be a substute for legal counsel or experse in
data collecon, use, disclosure, or security. We hope that
communies will nd this Toolkit helpful as they connue to
use data to improve health.
Why a Toolkit and Why Now?
Technology is changing everything. Thanks to technology,
information is now developed, shared, and used in new ways.
Communities have opportunities to use data to improve
community health and the health of individuals living in the
community, opportunities that did not exist in the past.
Another less obvious opportunity comes from the growing
realization that communities are in the best position to
identify the challenges they face and the strengths they enjoy.
Therefore, communities themselves may be best positioned
to find the most effective ways to use data to understand and
address their health needs.
By bringing technology and community-defined concerns
together, data can now be effectively used to address
community-defined problems and to secure and protect
community assets. Measurement and analysis are necessary
(not optional) pieces of the puzzle that allow communities
to know where, and why, health is improving or declining. In
addition to addressing what is known, data have the potential
to allow communities to discover unknown factors that matter
to them. Data also have the potential to yield findings that may
be surprising to, or unwelcomed by, community members.
Done right, using data builds the trust that is essential for
finding, defining, exploring, strengthening, and improving
health at the community and individual levels.
What the Toolkit Does
The Toolkit briey introduces each important principle of data
stewardship for communies using health data.
1
It provides
both broad background informaon and ps for data users.
Descripons of stewardship principles are provided, along with
checklists for each principle.
As experienced data stewards know, and as emerging data
stewards will learn, the different principles described in the
Toolkit do not divide neatly into separate categories, but
rather overlap and intertwine. For example, the two principles
Openness, Transparency, and Choice and Community and
Individual Engagement and Participation are relevant across
every step in the stewardship framework and throughout the
data lifecycle. To the extent that principles are interrelated,
they are introduced in a unique section, but are also referenced
in sections addressing other topics when relevant.
Different types of data trigger different approaches to
stewardship, with the burdens of stewardship and the
balancing of interests changing from one type of data to
another. Because of its likely sensitive character, health
information presents important issues for data stewards. A
data steward investigating the density of grocery stores in a
neighborhood is not likely to encounter major concerns about
privacy or confidentiality. But a data steward who wants to use
personally identifiable health records that contain the results
of genetic testing is very likely to encounter those concerns.
The primary focus of the Toolkit is health data, which will
typically require rigorous attention to all of the elements of
data stewardship. However, the principles in the Toolkit may
be more broadly applicable to many different types of data and
their uses for communities.
[1] For a more detailed discussion of the NCVHS framework of stewardship
principles, see National Committee on Vital and Health Statistics, Letter to
Secretary Kathleen Sebelius, “A Stewardship Framework for the Use of Community
Health Data,” (Dec. 5, 2012) available from:
http://www.ncvhs.hhs.gov/wp-content/uploads/2014/05/121205lt.pdf.
Appendices
Appendices are provided with supplemental information,
including:
- Definitions
- Legal Considerations
- Case Studies
- Checklists
Data Lifecycle
Data have a lifecycle, represented in the figure below. Effective
stewardship extends to all lifecycle phases. Examples of
communities using data across the lifecycle are provided
throughout the Toolkit.
Not all data move through all parts of the lifecycle. Some are
collected and never analyzed. Some analysis fails to produce
reportable results. Some data are never destroyed but are
stored in perpetuity.
There are also steps that communities using data to improve
health must take that are outside of the data lifecycle, such as
doing a literature review to learn about the current knowledge
on the topic and to better frame the purpose of the inquiry.
[Figure: Lifecycle of Data: From Collection to Disposition. Stages shown: collect/create, merge, process, repurpose, de-identify, discard, analyze, report/share, archive, destroy.]
Original or Repurposed Data
Community health data can be either original or repurposed.
Original data are gathered for an initially specified purpose;
they are data that did not previously exist. For example,
original data may be collected through a survey of community
members about access to fresh fruits and vegetables in local
markets, observation of activities of children in a playground,
or new survey research on the incidence of a health problem in
the community.
Repurposed data are collected for one purpose and then used
for a different purpose. Communities may want to repurpose
data from a variety of sources.
Until recently, the data in patient medical records were used
primarily for patient care, payment, and health care institution
operations. Data abstracted from paper medical records were
used for research and other purposes, but it was costly and
difficult to extract data. Uses of repurposed health data have
expanded sharply with access to digital data from electronic
health records and other information technology; these uses
are likely to continue to expand.
For example, an individual may complete a questionnaire about
health status as part of a doctor’s visit, and the responses are
then entered into the history and physical section of the
electronic medical record. Later, relevant responses are pulled
from the electronic health records of all patients who completed
the questionnaire into a new data set that will be used to
evaluate the prevalence of a condition among community members.
The responses to the initial health questionnaire collected for
the purpose of treatment are repurposed to determine disease
prevalence.
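The repurposing step described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the record layout (`patient_id`, `reports_condition`), the function name `estimate_prevalence`, and the sample values are hypothetical and do not come from the Toolkit or from any particular electronic health record system. The sketch counts each patient once and reports the share of respondents who indicated the condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionnaireResponse:
    """One questionnaire answer pulled from a record (hypothetical layout)."""
    patient_id: str          # hypothetical identifier field
    reports_condition: bool  # hypothetical yes/no answer about the condition


def estimate_prevalence(responses):
    """Return the share of distinct respondents who reported the condition."""
    # Keep one answer per patient so repeat visits are not double-counted.
    # A later response for the same patient overwrites an earlier one.
    latest = {}
    for response in responses:
        latest[response.patient_id] = response.reports_condition
    if not latest:
        raise ValueError("no responses to analyze")
    return sum(latest.values()) / len(latest)


# Made-up sample data, for illustration only.
responses = [
    QuestionnaireResponse("p1", True),
    QuestionnaireResponse("p2", False),
    QuestionnaireResponse("p3", False),
    QuestionnaireResponse("p1", True),   # repeat visit by the same patient
    QuestionnaireResponse("p4", False),
]
prevalence = estimate_prevalence(responses)
print(f"Estimated prevalence: {prevalence:.0%}")
```

In practice, direct identifiers would normally be removed or replaced before such a data set is shared, in line with the de-identification and security principles covered elsewhere in the Toolkit.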
Communies also oen repurpose public health data
generated by local, state, and federal government agencies. For
example, communies might invesgate changes in teen birth
rates, opiate deaths, cancer clusters, or suicide rates. In doing
so, they might use data that were collected for one purpose,
such as to determine cause of death, for another purpose, such
as to explore correlaons between social factors and suicide.
They might also combine these public health data sets with
other available data or data they collect themselves.
"Uses of
repurposed
health data have
expanded sharply
with access to
digital data from
electronic health
records and other
information
technology; these
uses are likely
to continue to
expand."
13
Data Lifecycle
Relaonship Between Technology and the Data
Lifecycle
Informaon technology has greatly changed how data are
managed at all lifecycle stages from creaon to destrucon or
archive. Technology speeds the capture of data and makes it
available for use sooner. It can help to keep a descripon of the
characteriscs of data, called metadata, including who collected
the data, when the data were collected, what permissions or
restricons are aached to the data, aws or limitaons of the
data, and other such characteriscs. Technology can also be
used to set up rules for data capture and collecon, processing,
storage, exchange, and disseminaon in ways not imagined just
a few years ago.
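As a minimal sketch of what such metadata might look like in practice, the Python example below stores the characteristics listed above alongside a data set. The class name `DatasetMetadata`, its field names, the `allows` rule, and the sample values are all assumptions made for illustration; the Toolkit does not prescribe a metadata format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatasetMetadata:
    """Descriptive record kept with a data set (hypothetical structure)."""
    collected_by: str
    collected_on: date
    permissions: list = field(default_factory=list)        # uses that were agreed to
    restrictions: list = field(default_factory=list)       # uses that are ruled out
    known_limitations: list = field(default_factory=list)  # flaws or gaps in the data

    def allows(self, use: str) -> bool:
        """True only if the use is explicitly permitted and not restricted."""
        return use in self.permissions and use not in self.restrictions


# Made-up example values, for illustration only.
meta = DatasetMetadata(
    collected_by="Example County Health Department",
    collected_on=date(2014, 6, 30),
    permissions=["disease surveillance", "community health assessment"],
    restrictions=["marketing"],
    known_limitations=["self-reported answers", "one neighborhood undersampled"],
)
print(meta.allows("community health assessment"))
print(meta.allows("marketing"))
```

Treating any use that is not explicitly listed as not allowed is one conservative design choice; a community could reasonably adopt a different rule, provided it is written down and applied consistently.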
New technology enables users to:
- Store large amounts of electronic data
- Process and analyze large data sets efficiently
- Enrich data sets by merging data from different sources
- Monitor trends over time to track changes
- Repurpose data in ways not conceived when the data were
  collected
- Access data remotely
- Copy or transmit data rapidly
Electronic health records are, like paper medical records,
used initially to support the delivery of patient care,
payment, provider operations, and quality improvement,
but the electronic format makes the records more useful to
researchers, public health agencies, and communities seeking
to improve the health of individuals and communities. For
example, electronic claims data are increasingly used to track
public health issues and to allocate limited funds to areas of
greatest potential impact.
Technological advances offer both opportunities and risks to
communities using health data.
Opportunies include:
Understanding health at a more detailed or granular level,
such as geo mapping health data to show how disease
aects individuals living on a specic block within a
community
Evaluang the impact of programs on health by linking
data about who received an intervenon with data from
a communitywide health informaon exchange or from
repurposed claims data
Risks include:
Data breaches: Data security is challenging, even for large
companies and governments with substanal resources.
Data elements: They can appear to be the same but have
dierent meanings across systems, causing incorrect
interpretaon.
Repurposing: This can cause harm when it happens
without appropriately engaging and involving individuals
and communies, as shown in many of the Case Studies
described later in this Toolkit (Appendix C ).
Problemac inferences due to analysis of electronically
processed data: These may result in social sgma
and harm to the reputaons of wrongly categorized
individuals.
The Toolkit can help data users take advantage of the
opportunies that technology oers while avoiding risks.
Governmental and Nongovernmental Data Collectors and Users
Data stewardship for nongovernmental data collectors or
users has much in common with, but is not the same as, data
stewardship for governmental data collectors or users. Still,
both government and nongovernment data stewards must
follow the laws, regulations, and policies designed to protect
the privacy and confidentiality of individuals and the integrity
and security of the data. Governmental data stewards hold
data in trust for the public; they have an affirmative duty to
serve the public by openly and transparently sharing data.
Nongovernmental data users and collectors do not share that
affirmative duty, although sharing data to serve the community
and public good is consistent with stewardship principles.
Data Stewardship
Data stewardship is a responsibility, guided by principles and
practices, to ensure the knowledgeable and appropriate use of
data. More specifically, stewardship of health data recognizes
the benefits to society of using personal health information to
improve understanding of health and health care, while at the
same time respecting individuals’ privacy and confidentiality.
The individual elements of data stewardship are driven by
ethical imperatives that require data users to respect the
individuals who are the subjects of health data.
Many people touch data as it moves through its lifecycle, and
each person who touches the data should have an awareness
of relevant stewardship principles and practices.
Communities are encouraged to use data to improve health,
while following responsible data use practices so that
individuals or groups whose data are used by communities
to improve health can trust that private or confidential
information is being used appropriately.
Nonlinear, Overlapping Concepts
The gure showing the elements of data stewardship below
suggests that stewardship elements follow a certain order. In
reality, as noted throughout the Toolkit, elements overlap, and
the stewardship process may require data users to loop back or
jump forward as needed.
Principles of Data Stewardship
"The individual
elements of data
stewardship are
driven by ethical
imperatives that
require data users
to respect the
individuals who
are the subjects of
health data."
16
ToolKit for Communies Using Health Data
"Failure to
identify and
address concerns
regarding proper
data stewardship
may lead to
downstream
consequences,
some mild, others
quite serious."
Accountability
The rst thing a community should do when thinking about
a new data analysis project is assign responsibility for
accountability for all parts of the project. Accountability means
that an individual or enty is responsible for:
Ensuring appropriate collecon or creaon, use,
disclosure, and retenon of data through policies and
pracces, and
Establishing mechanisms to nd and respond to any failure
to follow policy and procedures.
It should be made clear who is accountable at each phase
of the data lifecycle—from project planning, through inial
collecon and use, to data destrucon, storage, or repurposing.
Dierent people or enes might be accountable for dierent
phases, but this should be made explicit. Accountability for
each aspect of data stewardship should also be clearly assigned
so data users understand who is responsible. If there is a failure
of accountability, the responsible individual or enty should
face appropriate consequences and provide remediaon to
individuals aected by the lapse.
Failure to idenfy and address concerns regarding proper data
stewardship may lead to downstream consequences, some
mild, others quite serious.
Data Use Agreements and Accountability
Data use agreements (DUAs) can help an entity enforce
the various privileges and obligations involved in sharing
or obtaining data. In combination with other protective
measures, these agreements can be useful tools for managing
accountability.
DUAs are not a guarantee that data will not be misused. With
or without statutory authority, an entity that shares data may
need to take legal steps to enforce a DUA if a data user violates
the agreement.
Considerations in Signing DUAs
A DUA is a contract, a legal document with legal implications.
It should not be taken lightly. If a data user is asked to sign
a DUA, the user should consider the accountability checklist
items outlined in Appendix D. An organization that is asked to
sign a DUA should understand what the DUA requires of it and
should be confident that it can meet those requirements. If an
organization has questions or concerns about the document, it
may be useful to consult legal counsel.
Summary
- Accountability may lie with an individual or entity.
- Different people may be accountable for different phases
  of the data lifecycle or different stewardship elements.
- An accountable individual or entity should be named and
  held responsible for stewardship.
- DUAs are one way to establish accountability ground rules
  among data users.
Accountability ombudsman
Vanderbilt University, a member of the Electronic Medical
Records and Genomics (eMERGE) Network, identified accountable
individuals or groups for each stage in the data lifecycle,
but found that this was not enough. Communities that
Vanderbilt worked with needed one person who could be their
accountability contact. The eMERGE network at Vanderbilt
appointed an individual who could explain the organization’s
accountability policies and procedures to people in the
community and ensure that their concerns would reach the
accountable person. Members of the eMERGE Network describe
this approach as “a lifesaver.”
Openness, Transparency, and Choice
Openness, transparency, and choice promote trust among data
users, data sources, individuals, and communities. If data users
are not open and transparent or if they do not offer choices
to individuals and communities when required or appropriate,
this can create unwelcome surprises, destroy trust, and may
even reduce the ability to use health data to improve health
in the future. The Toolkit includes examples of such failures as
cautionary case studies.
Community engagement supports openness, transparency,
and choice. For example, community leaders, neighbors, or
advisory boards can serve as conduits for notice to community
members. Communities can also provide information to data
users about how community members view the data use,
the level of disclosure, and the range of choices necessary to
maintain the community’s trust, as depicted in the following
diagram.
Community engagement alone may not, however, be enough
to ensure openness, transparency, and choice in cases where
individuals’ preferences are not the same as the interests of the
community. To maintain trust, data users must be open about
expectations of data use.
Notice and consent are at the heart of openness, transparency,
and choice.
[Figure: Advancing Openness, Transparency, and Choice. Elements shown around the Data Steward: individuals provide informed consent; community and individual informed about data use and benefits; community and individual consulted about data use; community and individual consent to proposed data use; data source(s) informed about data use.]
Noce is informaon provided to the community about data
use.
Consent is the process of geng permission from a community
or individual to use data.
Noce
Data users should provide individuals and communies with
noce about:
What informaon is being collected
Goals and potenal benets of data use
Risks of data use
Communies and individuals whose data will be used should
be able to ask quesons about, comment on, or object to data
use. Data users may also need to give sources of data, such as
health care providers, public health agencies, or researchers,
the same type of informaon.
Individual noce
Individual noce may be needed when those whose data
are being used are idenable, for example, by name or
home address, and when the risk of compromising privacy or
condenality or sgmazing an individual or small group is
high.
Direct Individual Noce
If data users plan to use protected, personally idenable data
without other prior noce, they may need to provide individual
noce. In some instances, laws or regulaons require individual
noce, but stewardship pracces also may warrant individual
noce if the risk of violang an individual’s condenality or
privacy is signicant, or if disclosure could cause harm. Data
users may provide individual noce through a telephone call,
a face-to-face encounter, e-mail, or tradional mail. Mail is the
most costly and burdensome form of noce. For example, a
data user may have a name but no address, so the data user
would spend me and resources nding the person’s address
or other means of contact. Even where addresses or telephone
numbers are available, it is costly to place phone calls or to
mail nocaons to individuals for more than a small number
of individuals.
"Communities and
individuals whose
data will be used
should be able
to ask questions
about, comment
on, or object to
data use."
20
ToolKit for Communies Using Health Data
Data users should be careful when the notification itself could
reveal private or confidential information. For example, a
letter mailed from an organization that supports individuals
with a stigmatizing condition, such as substance abuse or HIV,
could inadvertently reveal information to others, such as other
members of the household.
Individual Notice Through Notice of Privacy Practices
A notice of privacy practices informs individuals about what
personal information may be collected and how it may be used.
Although not a notice of impending or actual use, this type of
notice alerts individuals to the possibility that their data may be
used in additional ways. Examples of this type of notice include
the notice of privacy practices required by the Health Insurance
Portability and Accountability Act (HIPAA) Privacy Rule or Terms
of Use notifications on social media sites.
Individual Notice of Opt-in/Opt-out Consent
In contrast to a notice of privacy practices, notice of an opt-in
or opt-out option gives individuals the notice of a consent
process, as discussed in more detail below.
Community Notice
In some cases, notice is given to the community, not
individuals. Different methods may be used to give notice to a
community, including:
- Community meetings or town halls
- Booths at community events
- Flyers or notices posted at libraries, community centers, or
  government offices
- Websites or Web-based advertising
- Media stories or advertisements
- Meetings with community leaders
In cases where data about small groups of individuals are being
used, more targeted notice may be needed. For example, if
data use were to affect Asian women with cancer, notice could
be given in a newsletter for this population, shared on blogs
for or by members of this group, or posted in cancer treatment
centers. Similarly, if a small geographic area is being studied,
everyone on the block or in a neighborhood could be sent a
letter explaining the data use that is planned.
Engaging the community to determine type of notice
MyHealth Access, a non-profit health information exchange in
Oklahoma, took on the challenge of engaging the residents of
Tulsa. The organization’s Privacy and Security Committee
explored two distinct choices: notice through the newspaper or
personal notification. They conducted focus groups in doctors’
waiting rooms, asking, “Where do you want to learn about the
sharing of your data?” Patients did not want to read about it
in the newspaper for a number of reasons. Rather, they wanted
to receive notice about data use in the doctor’s office;
overwhelmingly they wanted the engagement to occur on a
one-on-one basis.
Determining what noce should be provided
When determining the appropriate level and type of noce,
data users should rst determine whether laws, regulaons,
or agreements with a data source dictate the level and type
of noce required. See “Laws and Regulaons” for more
informaon.
If no legal mandates exist, data users should consider the risk
of:
Disclosing condenal or private informaon
Generang results that individuals or communies have
chosen not to know or that challenge fundamental beliefs
Sgmazing individuals, small groups, or communies
Data users should weigh the burdens of individual noce,
discussed above, against the benets of using data. When the
benets of use are great and compelling and the cost of noce
is very high or impraccal, the data user may determine that
individual noce is not required.
More targeted noce is warranted when individual privacy or
condenality is at risk and when individuals can be contacted
without undue expense or diculty.
Noce can be given broadly to communies or subgroups
within a community, or targeted to the individuals whose data
will be used.
Engaging individuals and communities preserves the use of
fetal blood spots to improve human health
When a baby is born, the hospital may collect a blood sample
by pricking the child’s heel. In some states, parents filed
legal actions to prevent the use of these fetal blood spots
for purposes that would not directly affect the child.
Researchers launched national and local efforts to understand
parents’ views on the issue. They learned that most parents
were willing to allow the use of the blood spots for research,
but parents wanted to know how the samples were being used,
and they wanted the ability to limit the use.
Reflecting these preferences, states passed laws and adopted
policies addressing parents’ concerns about use of the blood
spots. For example, in Michigan, the parents of newborns are
now notified that the Michigan Biotrust hosts a website where
parents can choose to limit the use of their child’s blood
spots through an opt-out system. If parents do not take action
to opt out, the child’s biological samples may be used for
research.
Consent
In addion to noce, individuals may have the opportunity
to choose whether their data may be used. Certain uses are
required by state public health laws and do not require or
oer the opportunity for individual consent. However, other
situaons mandate choice and consent. The HIPAA Privacy Rule
and the federal regulaons regarding the Protecon of Human
Subjects in Research, known as the Common Rule, mandate
choice in many situaons, as discussed in Appendix B.
2
Consent may be required for original data collecon, for
example, when an individual agrees to parcipate in a
research study. Or consent may be required for some ways
of repurposing data that were not included in the original
consent. For example, individuals who have consented to the
use of their data to study diabetes might need to be given a
chance to choose whether they want to parcipate in a study
of correlaons with mental illness or substance abuse. Even
if laws or regulaons do not say how data are to be used,
community stewards should assess whether ethical imperaves
or the need to maintain trust require a consent process. There
are several approaches to obtaining consent from individuals or
communies whose data are being used.
Individual Consent
Some instances of data use require individual informed
consent. This requires the user to inform the individual about
planned data use and to obtain the individual’s consent before
using the data. This type of consent is usually required in
research studies, especially those where the data use has a high
level of risk.
Although individual consent offers individuals the highest
level of choice, it may not always be possible or feasible. For
example, it may not be possible to link biological samples
collected by the U.S. Army from draftees during World War
II to the names of the people from whom the samples were
collected and thus to obtain individual consent for use of the
samples. In other cases, while it may be possible to identify
the source of data, that process itself may increase the risk of
violating the privacy rights or confidentiality of the person. In
other cases, the cost of obtaining individual consent may be
greater than the benefits.
2 Data users should take special care when requesting access to or using substance
abuse treatment records, which are strictly regulated under federal law. See 42
C.F.R. Pt. 2.
Community Consent
In cases where individual consent is not required, feasible,
or warranted, data users may obtain community consent.
For example, a local elected official may consent to
community data being used instead of obtaining consent
from individuals. This type of consent can be used when
the risks to community members are relatively low, but may
not be the best approach when risks to individuals or small
subsets of individuals in the community are high.
Opt-in/Opt-out
In some cases, individuals may be given the choice
between allowing their data to be used or not used. Opt-
in and opt-out provisions usually have a default. With
an opt-in approach, individuals must take action to have
their data included for a particular use. With an opt-out
approach, individuals’ data will be available for use unless
they take action to restrict or deny access to their data.
Local or regional health information exchange systems
typically include or exclude data based on opt-in or opt-out
defaults. As noted above, these systems require notice so
that individuals who are affected may exercise the choice
between options.
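The difference between the opt-in and opt-out defaults described in the preceding section can be made concrete with a short sketch. The Python below is illustrative only: the names Policy, Choice, and may_use are assumptions introduced here and do not describe any particular health information exchange system.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    OPT_IN = "opt-in"    # data are excluded unless the individual acts
    OPT_OUT = "opt-out"  # data are included unless the individual acts


class Choice(Enum):
    ALLOW = "allow"
    DENY = "deny"


def may_use(policy: Policy, choice: Optional[Choice]) -> bool:
    """Return True if an individual's data may be used.

    `choice` is None when the individual has taken no action,
    in which case the policy's default applies.
    """
    if choice is not None:
        return choice is Choice.ALLOW
    return policy is Policy.OPT_OUT
```

Under this sketch, an individual who takes no action is excluded under an opt-in policy and included under an opt-out policy, which is why notice matters so much when the default is inclusion.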
Community and Individual Engagement
and Participation
Data users have an ethical, and sometimes legal, obligation
to promote community and individual engagement and
participation in projects that use personally identifiable,
de-identified, or aggregated data and when data use could
stigmatize individuals, small groups, or communities.
When data are used without appropriately engaging
communities and individuals in data use decisions, trust may
erode. Negative consequences of a breach of trust can have
subsequent radiating effects, as shown in many case studies.
Communities can be effectively engaged at every phase of
the data lifecycle and when applying stewardship principles.
Engagement can be a way to protect the rights of individuals,
small groups, and communities. Engagement can also help
researchers or others in using data to improve health.
Mechanisms for engaging community members
Data users can engage community members in a number of
ways. When determining how to engage the community, data
users should think about which types of engagement would
provide legitimacy for the data effort. In a politically polarized
community, for example, elected officials may not be seen as
representing the interests of all voters. The following briefly
summarizes some approaches to community engagement.
Community Leaders
Community leaders can sometimes serve as representatives
for a community as a whole. Leaders may include elected
officials, leaders of community groups, leaders of religious
or spiritual organizations, or even informal leaders. Use
caution when using community leaders as representatives
of the community, as they may not accurately represent the
community’s view as a whole, and they may not understand
the concerns of subgroups or individuals within the
community.
Cautionary
Tale:
Repurposed
use of blood
samples
Members of the
Havasupai Tribe
volunteered to participate
in research studies on
diabetes by providing blood
samples. Years later, they
were surprised to find out
that the researcher had
used the samples to study
family lineage, schizophrenia,
alcoholism, and migration
patterns without obtaining
additional consent. In the
resulting lawsuit, Arizona
State University, which
employed the researcher,
paid the tribe a substantial
financial settlement and
returned the remaining
samples to the tribe.
Focus Groups
Focus groups provide another way to engage communities,
and are a good way to find out how individuals feel
about an issue. Guidelines on how to run a focus group
are available from: https://assessment.trinity.duke.edu/documents/How_to_Conduct_a_Focus_Group.pdf.
Like engagement through community leaders, focus groups
can miss issues that matter to subgroups if members of
subgroups are not among the focus group members.
Community Advisory Boards
Community advisory boards are a commonly used form of
community engagement. To be effective, advisory boards
should represent a range of interests and subgroups
within a community. One issue that must be addressed
in forming community advisory boards is how members
will be chosen, and whether members will be leaders of
community groups, or community members who are not
leaders. Some data repositories have specific requirements
about characteristics of representatives who serve on
advisory boards.
Community Surveys
Community surveys can be completed online, on paper, or
in personal interviews. They can help data users to gather
and analyze information from many people as a form of
community engagement. An example of a survey to assess
community members’ perceptions about community
health is available from: http://www.naccho.org/topics/infrastructure/mapp/framework/clearinghouse/upload/Example-Survey-CTSA-Community-Health.pdf.
While a community survey can get input from more individuals,
the scope of results may be limited because the scope of
information is defined by the questions asked and by the
characteristics of the individuals who choose to complete
the survey.
The community
takes the lead
In Denver, a community group
called Taking Neighborhood
Health to Heart is working to
address a variety of health
problems. The community
helps to determine the
questions to be asked,
research to be conducted,
and how and when data are
released. In some cases,
community members are
hired to collect survey data.
Because the community is an
active participant in all parts
of research, the initiative
has learned about issues
that might never have been
addressed for fear that results
would be used to stigmatize
community members.
Opportunities for engaging community members
across the data lifespan
Purpose Specification
When planning projects and framing research questions,
engaging the community can help data users to:
Understand community perspectives
Avoid mistakes that can occur when someone outside
of the community makes assumptions about dynamics
within a community
Target issues that are relevant and useful to the
community
Openness, Transparency, and Choice
The most important point in the engagement process
occurs when implementing the stewardship principle
of openness, transparency, and choice. See Openness,
Transparency, and Choice for specific recommendations on
community engagement.
Data Collection and Acquisition
Data users may engage communities in the data collection
process, and data holders can require that those who want
to use their data engage communities:
Community members can administer surveys, which
may improve participation and response rates (see
The community takes the lead and the case study in
Appendix C)
Community members can provide insight into how
unique characteristics of the community may affect
data collection efforts (see the case study, A Refugee
Community’s Expectations, describing the University of
Maine community data project in Appendix C)
Organizations sharing data may require that those using
their data involve community advisory boards
Data analysis
Community members can explain to data users aspects
of the community that may influence how data are
interpreted and analyzed by individuals who do not have an
understanding of community dynamics. Communities can
be very helpful in reviewing findings and interpretations of
findings before findings are released to the public.
Sgmazaon in
the AIDS epidemic
In the early days of the AIDS
epidemic, data suggested
that Hai was a source
of the infecon and that
Haian immigrants were
overrepresented among
the populaon subgroups
with the disease in the
United States. (See Ellio
Frank, et al. “AIDS in Haian-
Americans: A Reassessment.
Cancer Research 45 (Suppl
9):4619s–4620s. 1985.) The
result was widespread fear
of Haian immigrants and a
drop in tourism to Hai. One
of the doctors aempng to
treat this populaon later
reported that he encountered
widespread mistrust because
of the sgmazaon. (See
Ronald Bayer, Gerald M.
Oppenheimer. AIDS Doctors:
Voices from the Epidemic: An
Oral History. New York, NY:
Oxford University Press;
p 28–29. 2000.)
Subgroup Concerns
Some data use can trigger different concerns from different
communities, so data users must consider whether multiple
communities or subgroups within a community should
be represented. A subgroup can share a racial, ethnic, or
geographic trait, or even be affected by a shared disease.
Subgroup concerns can arise whether data are personally
identifiable, de-identified, or aggregated.
Avoiding Stigma and Discrimination
Data users may engage communities to avoid or address
concerns about data uses that have the potential to result in
discrimination against or stigmatization of the community or its
members. Community engagement can help data users identify
areas of sensitivity or concern and be a means of addressing
concerns. Data users from outside the community may not
see how the data could negatively affect communities. Studies
of prevalence of health issues such as sexually transmitted
diseases, substance abuse, behavioral health, or genetic
disorders, whether using data from medical records or public
health surveillance data, may be used to identify subgroups
in the population with increased risks for adverse health
outcomes and have the potential to stigmatize community
members. The data user should give thoughtful consideration
to the use and analysis of these data to avoid stigmatizing
groups or individuals.
Community engagement can also help data users to
communicate findings in ways that do not stigmatize
communities or subgroups, although in some cases it may not
be possible to publicly release certain types of data without
the risk of stigma and discrimination. Even then, community
engagement in purpose specification (see below) can help data
users to strike an acceptable balance between data use and the
interests of research participants and communities who may
want to learn from, but perhaps not publish, results.
Engaging a
distinct community
subgroup
The Population Study of
ChINese Elderly (PINE)
identified actionable concerns
among older Chinese adults
in Chicago, a community
cohort that was less well
understood. By engaging more
than 20 community groups
and by using multilingual
staff to interview participants
according to their preferred
languages and dialects, the
study achieved a survey
response rate of 91%.
The result of the effort was
reported in The PINE Report,
which showed that members
of this population are affected
by medical comorbidities,
physical disabilities, low
health care utilization rates,
psychological distress, social
isolation, and elder abuse
at higher rates than other
older adults in the United
States. The PINE Report
identified opportunities for
family members, community
stakeholders, health
professionals, and policy
makers to improve the health
and well-being of older
Chinese adults.
Summary
Evaluate opportunities for engaging communities and
individuals at every step in the data lifecycle and across all
elements of the stewardship framework
Be aware of the concerns of subgroups within
communities whose interests may be different from those
of the larger community
Consider the risk of stigmatization of communities or
small groups and engage the community or individuals to
determine an action plan for addressing the risk
Engaging the
community in
health information
exchange
Health information exchanges
enable providers to share
health information across
organizations and provider
types to improve patient
care. In some communities,
concerns about privacy and
confidentiality of health data
have decreased information
sharing through exchanges,
and adversely impacted
the quality of patient care.
To avoid similar concerns,
MyHealth Access, the Tulsa
exchange described earlier,
engaged the community in
a 100-day planning process
that involved 200–300
people. At the beginning,
participants agreed to focus
on the objectives of health
improvement and quality. This
focus allowed the community
to agree on a system of
privacy and confidentiality
protection that permitted the
flow of data needed to treat
patients optimally.
Purpose Specification
Researchers are trained to start every inquiry by framing
the question. What question is the project designed to
answer? Data users should explicitly and carefully frame the
question and be able to explain how the data will answer
the question. This process is called purpose specification.
Purpose specification helps data users reach the intended goal,
regardless of the data source or type.
Purpose specification is relevant whether data are personally
identifiable or de-identified. It is also important regardless of
data source. If obtaining a data set from an entity, data users
will typically need to explain the purpose for the data use. Even
for data that are publicly available, explaining the purpose is
important if the data use is to achieve its intended goal.
Purpose specification has many benefits:
When the data collected are carefully linked to the
purpose of the project and possible follow-on projects,
data collection will be targeted, focused, and thorough
Data collection efforts that contemplate repurposing at the
outset can increase efficiency while decreasing the data
collection burden
Purpose specification can help data users avoid
unwelcome surprises by emphasizing the need to
anticipate and plan to address negative impacts
Community engagement can support the purpose specification
process. Communities and individuals can help data users
to understand challenges or concerns about which the data
user may be unaware. Laws or regulations may dictate the
purpose of data collection by government agencies, such
as health surveys or infectious disease surveillance. Though the
overall purpose for these efforts may be broad, even these data
collection efforts are usually driven by a question that the data
may help to answer.
When engaged in purpose specification for a project involving
original data collection, data users should anticipate and adjust
for the possibility that data may be valuable for repurposing.
For example, biological samples may remain at the conclusion
of a study evaluating the prevalence of a vitamin deficiency.
A data user, aware that samples could be used to investigate
human health problems in the future, can anticipate
repurposing. To address anticipated repurposing, a data user
might ask for consent in the primary study for samples to be
used in later studies defined in the consent.
Deliberative
Democracy Model
A “biobank” collects,
processes, stores, and
distributes bio-specimens
and related data for use in
research. A biobank might
include specimens of blood,
saliva, plasma, or DNA. When
the Mayo Clinic started
biobanking and repurposing
data from their electronic
medical records, it adopted a
deliberative democracy model
that engaged community
members in open dialogue for
4 days. The deliberants were
provided with background
materials on biobanking,
biomedical research, and local
efforts at Mayo. They were
then given an opportunity to
interact with domain experts,
including scientists involved
in genetics research as well as
privacy advocates. The result
was community support and
an accepted framework for
the use of biological samples
and health data.
In the process of purpose specification, data users should
consider the balance between defining a specific and
narrow purpose and a less specific and broader purpose when
using data. The advantages of a narrow scope are that the
purposes are easily defined and described, so communities
and individuals may be more likely to trust users and allow
the desired uses of their data. However, future uses may be
circumscribed. A data project that specifies a more open-ended
or unknown purpose gains greater flexibility for future
uses, but runs the risk that individuals may be less likely to
participate because they do not understand the full extent of
potential future uses for which their consent is being sought, or
that future uses will surprise individuals or communities with
unexpected, perhaps even unwanted, results.
Repurposed Data
Repurposed data are collected for one purpose and then
used for another. Public health surveillance data collected by
state health departments are repurposed when shared with
communities or researchers to investigate a concern that
the data may help explain. Laboratory tests performed to
guide patient diagnosis and treatment are repurposed when
combined with many other tests to show the prevalence of a
condition in a subgroup of individuals.
When using repurposed data, users should consider concerns
that may be raised by those whose data are being repurposed.
The cases of the research study of the Havasupai tribe and
the collection of fetal blood spots show the harm that can
occur when data are repurposed without the consent of
the individuals whose data are being used. The case study
describing the community-based approach used by MyHealth
Access shows how data users can avoid the problems
encountered by data users who did not consider the risks of
repurposing.
Public health data used by communities might have been
originally collected for the purpose of controlling or preventing
injury and disease, or for legal and administrative reasons,
or both. For example, birth and death certificates include
information useful for legal purposes (such as establishing
rights to an estate), administrative purposes (establishing family
benefits or ceasing benefits to decedents), or surveillance for
unusual incidence of disease (such as genetic birth defects, or
deaths from suicide or cancer in a geographic area). Rates of
premature death, cancer, and obesity are examples of the types
of data communities can repurpose to improve community
health.
3
Cautionary
Tale:
Repurposing
data without
individual or
community
engagement
Most newborn babies receive
blood tests to determine if
they have treatable medical
conditions. Realizing that
these blood “spots” could also
be used for other purposes
that would benefit public
health, such as monitoring
rates of genetic disorders
or infectious diseases, the
holders of the blood spots
began to make them available
for research. Parents in
several states found out that
biological samples taken
from their babies were being
used without their consent
and brought legal actions. In
Texas, the legal settlement
resulted in the destruction of
more than 5 million biological
samples.
Users should also be aware of any limits to repurposing that
may be imposed by laws governing the collection and use of
the source data set or by data use agreements (DUAs). Laws in some
states, for example, explicitly address the repurposed use of
fetal blood spots. To take another example, state laws may
limit the repurposing of vital statistics, such as birth and death
records. However, many states have laws that allow the broad
use of health care data sets to measure health care cost,
quality, and access. In these states, the data steward will have a
data oversight committee and data release policies that strictly
govern data release and reporting uses, and these are included
in the DUA. In other cases, state laws or regulations allow the
sharing of government health data only for specific purposes.
Tensions between data used for improving
community health and for research
Purpose specification can also be used to address a tension
between the goals of academic research and the goals of
advancing community health. Research ethics and funding
sources sometimes mandate that researchers disseminate
their findings through publication or presentations at academic
meetings. Communities, by contrast, may want to use
funding to improve health, while limiting dissemination of
potentially stigmatizing or otherwise harmful results. Once
again, community engagement in the purpose specification
process can help address this tension at the outset of a project.
Summary
At the outset of any data project, explicitly and carefully
define the purpose of data collection or use of repurposed
data.
Consider how to most effectively engage the community in
the purpose specification process.
Consider and address possible adverse impacts of data use
or collection.
Be aware that data may be repurposed and design
collection accordingly.
When using repurposed data, consider how changing the
original purpose may trigger the need for additional notice
or consent and whether these changes are allowed under the DUA
with the data steward.
If the project brings together academic researchers and
communities using data to improve health, address any
tension among academic goals, funding mandates, and
community interests in protecting use limitations.
3 Community Commons offers tools to help communities use repurposed data
effectively. From its website: “Community Commons is an interactive mapping,
networking, and learning utility supporting broad-based and sustainable healthy
communities with free access to resources for registered users.”
See “About” available from: www.communitycommons.org.
Occasionally, rare events, even in the aggregate, in conjunction
with detailed local knowledge may inadvertently lead to clues
or speculation about specific individuals. These effects may
be in violation of explicit data use agreements or generally
recognized principles of privacy.
In such cases, another strategy may be to arrange with the
original data steward for some kind of trusted intermediary
through which a community can analyze data in a secure data
center, allowing access to the data in a controlled environment
while still honoring the need to protect the confidentiality of
the data in the custody of the original data steward.
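One common safeguard against the small-number problem described above is to suppress small counts before aggregate tables are released. The sketch below is a minimal illustration, not a prescribed method: the function name suppress_small_cells and the threshold of 10 are assumptions, and the appropriate threshold depends on the data use agreement and the data steward's release policies.

```python
from typing import Dict, Optional


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = 10
) -> Dict[str, Optional[int]]:
    """Replace any count below `threshold` with None (suppressed).

    The default threshold is an illustrative assumption only.
    """
    return {
        area: (n if n >= threshold else None)
        for area, n in counts.items()
    }


# Hypothetical counts of a rare condition by area.
released = suppress_small_cells({"County A": 42, "County B": 3, "County C": 10})
```

In this example only the County B count is withheld. Suppressing individual cells may not be enough if row or column totals are also published, because a withheld value can sometimes be recovered by subtraction.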
Using a trusted
intermediary
The Southern Illinoisan, a
newspaper, sought cancer
registry data in an Illinois
Freedom of Information Act
request in order to see if there
was a cancer cluster in an
area of petroleum extraction.
Dr. Latanya Sweeney, then
a Professor of Computer
Science at Carnegie Mellon
University, and an expert in
re-identification of supposedly
de-identified data sets,
testified that individuals
could be identified using the
requested data in conjunction
with publicly available
information because the
number of cases was small.
The newspaper was successful
in the lawsuit and obtained
the data. To avoid the suit,
Illinois could have suggested
disclosing the data through a
trusted intermediary such as
a university, which could have
permitted data analysis under
a promise of confidentiality in
a secure setting. Communities
seeking such cancer registry
data might want to try this
option if they encounter
confidentiality concerns.
Southern Illinoisan v. Illinois
Department of Public Health,
218 Ill. 2d 390 (2006).
Quality and Integrity
Stewardship principles require that the quality and integrity of
data are managed so that they are usable for their intended
purposes.
Data quality refers to the accuracy, relevance, timeliness,
completeness, validity, and reliability of the data. The data
collected or used for a particular purpose must have an
appropriate nexus to that purpose, and must be timely and as
complete as reasonably necessary to answer the questions
asked without bias, skewing, or other distortion. Data must be
recorded or captured accurately, and they must represent what
they are claimed to represent. For example, questions that are
ambiguous in a survey may not yield answers that correspond
to what the data user believes them to mean.
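Completeness, one of the quality dimensions named above, is among the easier ones to measure directly. The sketch below is a minimal illustration, assuming records are held as dictionaries and that a missing value appears as an absent key, None, or an empty string; the function name completeness and the field names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def completeness(
    rows: Sequence[Dict[str, Optional[str]]], fields: Sequence[str]
) -> Dict[str, float]:
    """Return, for each field, the share of rows with a non-missing value."""
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) not in (None, "")) / len(rows)
        for field in fields
    }


# Hypothetical survey records with some missing answers.
records = [
    {"age": "34", "zip": "04401"},
    {"age": "", "zip": "04401"},
    {"age": None, "zip": None},
    {"zip": "04469"},
]
report = completeness(records, ["age", "zip"])
```

A low share for a field does not by itself make the data unusable, but it signals that the data user should ask whether the missing values could bias the answer to the question being asked.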
Data integrity means that the data have not been corrupted.
Data users must be aware that data may
be modified or otherwise garbled as they are used. When
data sets are combined, there are risks that they may not
be properly matched. Therefore, the combined data may no
longer accurately reflect the sources.
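One simple way to detect whether a data file has been altered or corrupted between the source and the user is to compare a checksum computed before and after the file is transferred or stored. The sketch below uses SHA-256 from Python's standard hashlib module; the helper names sha256_of and is_unchanged are assumptions introduced for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, expected_hex: str) -> bool:
    """Return True if the file's current checksum matches the recorded one."""
    return sha256_of(path) == expected_hex
```

A matching checksum shows only that the file's bytes are the same as when the checksum was recorded. It says nothing about whether the data were accurate to begin with or whether two data sets were matched correctly.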
It is seldom possible or necessary to have perfect data, but
stewards should consider and make a judgment about whether
data accurately and adequately measure what is being studied,
and if the data can be trusted. The data stewards of large public
health databases should provide detailed documentation about
the underlying data and its limitations, and should be consulted
to validate and review findings prior to public release of reports
or statistics derived from these data sets.
Data Quality and Integrity Through the Lifecycle
Review of the Literature
Data users should research and evaluate what has already
been done; doing so helps to ensure the quality of data and
can answer the following questions:
Is further data collection needed, or is the necessary
information already available?
If others have addressed the issue in a different
population, can a proven methodology be used rather
than starting from scratch?
What methodologies have failed to work?
By starting with a scientific literature review, the data user
can avoid duplicating effort and avoid others’ past mistakes.
Data Collection, Data Entry, and Data Cleaning Processes
To ensure data quality, users should confirm that original
data are collected in accordance with generally accepted
procedures, and that sources of repurposed data are
trustworthy. For example, a trustworthy data source
would be able to provide assurances about how data were
collected, entered into a database, and stored. The Data
Quality and Integrity Checklist (Appendix D) outlines the
steps for users to follow.
Analysis
Data analysis should be conducted by trained and
experienced individuals or entities. If an organization
lacks internal experience, it may consider associating with
researchers who are interested in the issue being studied.
Reporting Results
Results, whether published in a journal or report or used
within an organization for internal purposes, should
accurately describe the findings of the analysis and
should avoid bias.
Special Consideration for Merged Data Sets
Data users sometimes merge data from two or more
sources to gain enriched data that are more useful than
either data set alone. However, data users must be careful
to combine only data sets whose measures use the same
populations, standards, and scale, so that they are not
comparing apples and oranges but using data to make valid
inferences.
Examples of Merged Data Sets
The results of a survey of nutritional habits of
adolescents, administered by different school districts
in different cities in a state, could be combined to
increase the study’s statistical power.
Two different data sets could be combined to better
understand a phenomenon. For example, obesity rates
obtained from government sources could be combined
with a map of safe walking routes to consider whether
lack of safe walking routes is associated with higher
rates of obesity.
Validity of Merged Data Sets
When two or more data sets are combined, users should
ensure that a merger or aggregaon is valid, and that the
data retain integrity. In determining validity, data users
should ask:
Are the populations the same for the different data
collection efforts?
Do survey questions and response categories match?
Might differences in survey administration dates affect
survey results?
What were the survey sample designs?
Many of the issues involved in determining if survey data
can be combined and how they should be combined are
substantive, and require consideration by subject-matter
experts. These issues should be resolved before any
statistical consultation takes place.
4
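One of the questions above, whether response categories match, can be partly checked mechanically before a merge, although the substantive judgments still belong to subject-matter experts. The sketch below is a minimal illustration, assuming each survey is held as a list of dictionaries; the function names and the question name fruit_daily are hypothetical.

```python
from typing import Dict, Iterable, Set


def response_categories(rows: Iterable[Dict[str, str]], question: str) -> Set[str]:
    """Collect the distinct answers recorded for one question."""
    return {row[question] for row in rows if question in row}


def category_mismatches(
    survey_a: Iterable[Dict[str, str]],
    survey_b: Iterable[Dict[str, str]],
    question: str,
) -> Dict[str, Set[str]]:
    """Report answers that appear in one survey but not in the other."""
    cats_a = response_categories(survey_a, question)
    cats_b = response_categories(survey_b, question)
    return {"only_in_a": cats_a - cats_b, "only_in_b": cats_b - cats_a}


# Hypothetical responses from two school districts.
district_a = [{"fruit_daily": "yes"}, {"fruit_daily": "no"}]
district_b = [{"fruit_daily": "yes"}, {"fruit_daily": "sometimes"}]
differences = category_mismatches(district_a, district_b, "fruit_daily")
```

A difference flagged here may mean that the two surveys offered different answer choices, or only that no one in one sample happened to pick a given answer. Comparing the survey instruments themselves is the way to tell which.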
Summary
Ensure that data quality and integrity are maintained
throughout the data lifecycle, as outlined on the Data
Quality and Integrity Checklist in Appendix D.
Before merging data sets, consider how the merger will
affect data quality and integrity.
4 For a detailed discussion on how to evaluate the validity and integrity of merging
data sets, see the U.S. Department of Health and Human Services, Office of the
Assistant Secretary for Planning and Evaluation’s Data on Health and Well-being
of American Indians, Alaska Natives, and Other Native Americans: Data Catalog,
Contract No. 233-02-0087, Appendix B: Data Set Aggregation, B-1 (Dec. 2006),
available from: http://aspe.hhs.gov/hsp/06/catalog-ai-an-na/report.pdf.
Security
Securing data means protecting the data’s confidentiality,
integrity, and availability. Good security protects data from
loss of control, and, therefore, potential unauthorized access,
damage, or manipulation. Security safeguards may be technical,
administrative, or physical controls and can range from using
locks on an office door and procedures for handling paper
forms to the use of sophisticated encryption software. Security
is particularly important for personally identifiable data that are
private or confidential.
Some of the primary causes of data loss include using weak
passwords; failure to back up electronic data; infection by
viruses or malware; and loss of portable electronic devices,
such as smart phones, thumb drives, and laptop computers.
Employees can increase the likelihood of security incidents
either by failing to follow policies and procedures designed
to protect data security, or by deliberately taking, altering,
or destroying data. Even paper can be at risk: for example,
completed surveys could accidentally be placed in a recycle bin
during an office cleanup.
Responsible data security includes these steps:
Evaluate anticipated risks
Develop a plan to reduce anticipated risks
Re-evaluate risks periodically
Elements of a security plan could include:
Identification of major risks
Adoption of methods to secure paper documents
Password protection for access to computers, networks,
and electronic devices
Encryption of data stored on portable devices such
as laptops, tablets, or phones, ensuring data cannot be
accessed if the device is lost or stolen
Automated backup processes to protect against accidental
data loss
Training for employees on security measures
Signed confidentiality agreements from all staff collecting,
managing, or analyzing data
As with the question of notice, data users must weigh the need
to secure data, and the costs of doing so, against the risk of
data loss, inappropriate access, or manipulation.
Ways to improve data security
Physical
Install locks on cabinets or rooms where paper records are stored
Keep records away from areas vulnerable to damage in a flood
Protect electronic storage facilities against break-ins or destruction
Back up data with off-site storage capabilities
Administrative
Run a risk analysis
Set up policies and procedures for accessing paper records, disposing of data, or adding new equipment on a network
Train those with access to sensitive information in data security
Require robust passwords
Control who has access to view or change the data
Conduct due diligence on employees who handle data
Implement an incident response program
Technical
Maintain logs of system access and unauthorized extraction of data
Add encryption
    Specific elements in a data set
    Data set as a whole
    Devices that allow access to the data set, such as laptop computers
Implement monitoring to scan for and identify cyber attacks
For more detailed information about security, see the National
Institute of Standards and Technology guides on assessing and
maintaining data security, [5] which are useful for nonfederal
organizations. The Office for Civil Rights of the U.S. Department
of Health and Human Services also publishes security guidance
in plain language [6] for entities covered by the HIPAA Security
Rule, which nonetheless is instructive for organizations not
covered by that rule.
Role of De-identification in Data Security
De-identification is a process where personal identifiers such as
name, address, telephone number, or date of birth are removed to
reduce the risk that private or confidential information will be
disclosed. The process of de-identification and protection from
re-identification are addressed in the next section.
5 A list of guides published by the National Institute of Standards and
Technology’s Computer Security Resource Center is available from:
http://csrc.nist.gov/publications/PubsSPs.html.
6 Educational materials from the Office for Civil Rights about the HIPAA Security
Rule and other sources of standards for safeguarding electronic protected
health information include the HIPAA Security Information Series, available
from: http://www.hhs.gov/ocr/privacy/hipaa/administrative/securityrule/securityruleguidance.html.
In particular, HIPAA Security Series 1: Security 101 for
Covered Entities gives an overview of basic concepts, and HIPAA Security Series
7: Security Standards: Implementation for the Small Provider describes basic
topics for data users. Other available resources include “Privacy and Security
Training Games” (http://www.healthit.gov/providers-professionals/privacy-security-training-games);
“Guide to Privacy and Security of Electronic Health Information”
(http://www.healthit.gov/sites/default/files/pdf/privacy/privacy-and-security-guide.pdf);
“Security Risk Assessment Tool”
(http://www.healthit.gov/providers-professionals/security-risk-assessment);
and “Your Mobile Device and Health Information Privacy and Security”
(http://www.healthit.gov/providers-professionals/your-mobile-device-and-health-information-privacy-and-security).
De-identification
De-identification is the process of removing or obscuring
any directly or indirectly identifying information from data
in a way that minimizes the risk of unintended disclosure of
individuals’ identity and information. Removing directly
identifying elements and otherwise treating data through
de-identification allows released information to be both
confidential and useful for legitimate purposes.
Good de-identification practices reduce risks of re-identification
to a level judged acceptable given the data’s sensitivity. Using
de-identified data whenever possible is a strong privacy
practice, because it reduces risks of a data breach and other
violations of personal privacy. Data de-identification makes
it very hard to link data to a specific individual, allowing the
study of a variety of sensitive issues while greatly reducing the
risk of disclosing personal or confidential information. Aside
from organizations that must follow HIPAA de-identification
methods, no standard, universally adopted de-identification
method is used throughout health care.
Identity and Attribute Disclosures
There are two areas of concern regarding re-identification.
The first is identity disclosure, which happens when an outside
party can assign an identity to a record in a disclosed data set;
the second is attribute disclosure.
Aribute disclosure allows an outside party to aribute
characteriscs to someone in a data set even if he or she has
not been individually idened. This form of disclosure is of
primary concern in summary data releases, and it may arise
from the presence of empty cells either in released tables
or linkable sets of tables. The presence of a zero cell within
a table could allow an outside person to infer that no one in
the parcular category had the characterisc in queson. This
could be very sensive informaon. For example, the zero
cell could indicate lack of control of blood glucose levels, and,
by inference, that no one in a specic category of diabetes
paents dened by race and sex had good control of their
blood glucose levels.
If the opposite is true (for instance, a cell has 100% of a
particular subgroup in a sample showing a specific attribute),
then membership in the subgroup implies having that attribute.
For example, if all of the homosexual men in a sample are
positive for Hepatitis C, then any homosexual man in the
sample can be assumed to have Hepatitis C.
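The two patterns described above, a zero cell and a cell that holds 100% of a subgroup, can be screened for mechanically before a table is released. The Python sketch below is illustrative only: the function name `flag_attribute_disclosure`, the subgroup labels, and the counts are assumptions made for this example and do not come from the Toolkit.

```python
def flag_attribute_disclosure(table):
    """Return (subgroup, category, reason) for cells that reveal an attribute.

    `table` maps each subgroup to a dict of category -> count.
    """
    flags = []
    for subgroup, counts in table.items():
        total = sum(counts.values())
        if total == 0:
            continue  # empty subgroup: nothing can be inferred about members
        for category, count in counts.items():
            if count == total:
                flags.append((subgroup, category, "all members share this attribute"))
            elif count == 0:
                flags.append((subgroup, category, "no member has this attribute"))
    return flags


# Hypothetical counts, invented for illustration only.
glucose_control = {
    "Group A": {"good control": 12, "poor control": 9},
    "Group B": {"good control": 0, "poor control": 7},
}

for flag in flag_attribute_disclosure(glucose_control):
    print(flag)
```

A flagged cell is a prompt for human review, not proof of harm; whether the inference is sensitive depends on the attribute and the community involved.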
Simple De-identification
De-identification in its simplest form means deleting a patient’s
name from the associated health record. However, even
before the advent of computer databases, this simple form
of de-identification would have been insufficient to maintain
confidentiality. Learning how re-identification attacks happen
provides some answers:
Even when an administrator removes all of the data fields he
or she thinks might be uniquely identifiable from a data set, an
attacker may still be able to unlock the identity of the subject
of a record by discovering pockets of uniqueness remaining in
the data. This type of re-identification is possible because, even
without a specific identifier, certain combinations of values
may be so rare that they create a “fingerprint” pointing to only
one person. A re-identification attack attempts to locate the
unique fingerprints in a de-identified data set, and then search
for that same fingerprint in another data set containing unique
identifiers. This technique is best shown using a Venn diagram:
Looking for Unique “Fingerprints” in a Database [7]
This process of re-identification can be as simple as doing a
reverse phone number lookup on a data set containing phone
numbers. In a more complex form, the re-identification attack
might identify a health record with a combination of age, zip
code, and sex that is unique in the data set, and then cross-
reference that information with a voter registry to determine
the one individual in that zip code of that sex whose birth date
matches that age. De-identification tries to protect against this
external linkage via uniqueness.
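To make the idea of a unique “fingerprint” concrete, the following Python sketch counts how often each combination of indirectly identifying values occurs and returns the records whose combination appears only once. It is a minimal illustration under stated assumptions: the function name `find_unique_fingerprints`, the field names, and the sample records are hypothetical.

```python
from collections import Counter


def find_unique_fingerprints(records, quasi_identifiers):
    """Return the records whose quasi-identifier combination occurs only once."""

    def fingerprint(record):
        return tuple(record[field] for field in quasi_identifiers)

    counts = Counter(fingerprint(record) for record in records)
    return [record for record in records if counts[fingerprint(record)] == 1]


# Hypothetical records, invented for illustration only.
records = [
    {"age": 34, "zip": "00001", "sex": "F", "diagnosis": "Asthma"},
    {"age": 34, "zip": "00001", "sex": "F", "diagnosis": "Diabetes"},
    {"age": 91, "zip": "10001", "sex": "F", "diagnosis": "Acid reflux"},
]

uniques = find_unique_fingerprints(records, ["age", "zip", "sex"])
for record in uniques:
    print(record)
```

Only the third record is returned, because the first two share the same age, zip code, and sex. Records flagged this way are the ones most exposed to external linkage.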
7 See “Understanding HIPAA Privacy,” published by the Office for Civil Rights, U.S.
Department of Health and Human Services, in Health Information Privacy:
Guidance Regarding Methods for De-identification of Protected Health Information in
Accordance with the Health Insurance Portability and Accountability Act (HIPAA)
Privacy Rule, available from:
http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html.
[Venn diagram labels: Data set (e.g., hospital records); Data set uniques; Potential links; Population uniques; Population records (e.g., registration list)]
Re-identification Using Public Records [8]
Data Considered for Sharing
Age   Zip Code   Gender   Diagnosis
15    00000      Male     Diabetes
21    00001      Female   Influenza
36    10000      Male     Broken Arm
91    10001      Female   Acid Reflux
Voter Registration Records (Identified Resource)
Birthdate   Zip Code   Gender   Name
2/2/1989    00001      Female   Alice Smith
3/3/1974    10000      Male     Bob Jones
4/4/1919    10001      Female   Charlie Doe
De-identification methods must not only attempt to remove
any information that would be personally identifiable, but also
manipulate the data set to ensure that it contains no unique
fingerprints.
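The linkage shown in the two tables above can be sketched in a few lines of Python. This is an illustration, not a description of any real attack tool: the function names `age_on` and `link` are hypothetical, and the reference date used to convert birthdates into ages is an assumption chosen so that the ages in the example line up.

```python
from datetime import date


def age_on(birthdate, reference):
    """Age in whole years on the reference date."""
    years = reference.year - birthdate.year
    if (reference.month, reference.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


shared = [
    {"age": 15, "zip": "00000", "gender": "Male", "diagnosis": "Diabetes"},
    {"age": 21, "zip": "00001", "gender": "Female", "diagnosis": "Influenza"},
    {"age": 36, "zip": "10000", "gender": "Male", "diagnosis": "Broken Arm"},
    {"age": 91, "zip": "10001", "gender": "Female", "diagnosis": "Acid Reflux"},
]
voters = [
    {"birthdate": date(1989, 2, 2), "zip": "00001", "gender": "Female", "name": "Alice Smith"},
    {"birthdate": date(1974, 3, 3), "zip": "10000", "gender": "Male", "name": "Bob Jones"},
    {"birthdate": date(1919, 4, 4), "zip": "10001", "gender": "Female", "name": "Charlie Doe"},
]
REFERENCE_DATE = date(2010, 12, 31)  # assumption: date on which ages were recorded


def link(shared, voters, reference):
    """Pair each shared record with a voter when exactly one voter matches."""
    index = {}
    for voter in voters:
        key = (age_on(voter["birthdate"], reference), voter["zip"], voter["gender"])
        index.setdefault(key, []).append(voter["name"])
    matches = []
    for record in shared:
        names = index.get((record["age"], record["zip"], record["gender"]), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches


for name, diagnosis in link(shared, voters, REFERENCE_DATE):
    print(name, "->", diagnosis)
```

Under that assumed date, three of the four shared records link to a named voter. The 15-year-old is not matched only because no corresponding entry appears in the voter list.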
Individual-level De-identification
Data users can de-identify individual records through a number
of methods. The most common are suppression, generalization,
and distortion.
Example: Data Set
Age (years)   Gender   ZIP code   Diagnosis
15            M        00000      Diabetes
21            F        00001      Influenza
36            M        10000      Broken arm
91            F        10001      Acid reflux
Suppression occurs when information is completely removed
from the data set. Direct identifiers such as names and Social
Security numbers are common examples of an individual’s data
that are completely suppressed. Some data such as birth dates
and zip codes, however, cannot be completely suppressed
without destroying the utility of the data set.
Example: Data Set—Suppressed
Age (years)   Gender   ZIP code   Diagnosis
              M        00000      Diabetes
21            F        00001      Influenza
36            M                   Broken arm
              F                   Acid reflux
Where complete suppression is impractical, data are often
generalized. In generalization, a particular variable, such as age,
is divided into broader categories, such as 5-year age spans.
Generalization is often extremely effective at balancing utility
and privacy in a data disclosure.
8 Id.
Example: Data Set—Generalized
Age (years)   Gender   ZIP code   Diagnosis
< 21          M        00000      Diabetes
21–34         F        00001      Influenza
35–44         M        10000      Broken arm
≥ 45          F        10001      Acid reflux
Distortion may also be used to de-identify data, but with health
data, distortion often destroys the reliability of the data for use
in drawing effective findings.
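Suppression and generalization, as described above, are simple enough to sketch in code. The Python example below is a minimal illustration, not a complete de-identification routine: the function names `suppress` and `generalize_age` are hypothetical, and reading the top age band of the generalized example as “45 or older” is an assumption.

```python
def suppress(record, fields):
    """Return a copy of the record with the named fields set to None."""
    return {key: (None if key in fields else value) for key, value in record.items()}


def generalize_age(age):
    """Map an exact age to a broad band (top band assumed to mean 45 or older)."""
    if age < 21:
        return "< 21"
    if age <= 34:
        return "21-34"
    if age <= 44:
        return "35-44"
    return "45 or older"


# One row from the example data set.
record = {"age": 36, "gender": "M", "zip": "10000", "diagnosis": "Broken arm"}

print(suppress(record, {"zip"}))
generalized = dict(record, age=generalize_age(record["age"]))
print(generalized)
```

Both functions return new records rather than changing the original, so the source data stays intact while different release versions are prepared and compared.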
De-identification Through Aggregation
Aggregation is another way to de-identify data. Instead of
removing identifiers from individual-level data, data can be
combined into aggregate, or statistical, reports. This form of
de-identification can be particularly effective at maintaining
utility while protecting the data’s confidentiality. However, the
risk of inadvertent attribute disclosure remains. For example,
the following table logically implies that all Hispanic females
enrolled in the Healthyville School District during the 2014–
2015 school year and included in the survey used illicit drugs.
Example: Data Set—Inadvertent Attribute Disclosure
2014–2015 Healthyville School District Drug Usage Survey
                  No drugs   Illicit   Illegal
White male        85         40        15
White female      90         12        7
Black male        45         15        8
Black female      50         11        13
Hispanic male     10         5         7
Hispanic female   0          3         0
When releasing aggregate or statistical reports, one
effective strategy is to avoid small “cell” counts. A small
cell in aggregated data increases the risk of
re-identification. For example, when a data set contains health
data representing thousands of patients, but only four patients
are affected by a specific type of cancer, those four patients are
at high risk of being identified. In the aggregated data shown in
the following figure, the number of persons of Hispanic origin
is so small that reporting the number of those individuals raises
the risk of re-identification.
[Figure: Aggregated Data]
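Avoiding small cell counts can be automated before a table is released. The Python sketch below masks every count under a chosen threshold. It is illustrative only: the function name `mask_small_cells` is hypothetical, and the default threshold of 10 is an assumption; each data steward should set the threshold according to its own policy.

```python
def mask_small_cells(counts, threshold=10):
    """Replace every count below the threshold with a '<threshold' marker."""
    marker = f"<{threshold}"
    return {
        cell: (count if count >= threshold else marker)
        for cell, count in counts.items()
    }


# One row from the survey example in this section.
hispanic_male = {"No drugs": 10, "Illicit": 5, "Illegal": 7}
print(mask_small_cells(hispanic_male))
```

Masking individual cells may not be enough on its own: if row or column totals are also published, a masked value can sometimes be recovered by subtraction, so totals may need the same review.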
The risk of re-identification also increases when data are
combined from more than one source, or when data represent
members of a small group of people, whether they are
members of an ethnic or racial minority, or members of a group
suffering from a specific illness.
Even when aggregation is used, and even if small cells are not
reported, some risk of re-identification may remain. If this is
the case, data users should seek expert advice on
methods to further reduce these risks.
In addition, data users can use data use agreements, discussed
in the following “De-identification, Limited Data Sets, and Data
Use Agreements” section, to limit attempts at re-identification.
Another approach is to ask individuals whose data are to be
used if they would consent to data use even if there were a risk
of re-identification.
Although the risk of re-identification may not be eliminated,
the risk may be outweighed by the benefits of using health
data. Data users should explicitly address the tension between
the desire to maintain confidentiality and privacy and the
desire to use data to advance public health. The data steward
must consider a series of tradeoffs, including the application
of rigorous statistical and data management controls to reduce
the risk of re-identification, while preserving as much data
utility as feasible.
State health data organizations that maintain hospital
discharge databases apply a layered approach to protecting
the data sets they release, following standards adopted
by the National Association of Health Data Organizations
(NAHDO). This methodology reduces the probability of
unique re-identification of individuals through statistical and
technical modifications that alter the data. De-identification
combined with data management measures (such as data
oversight boards, training and education of users, or penalties
for misuse) and information technology solutions (such as
encryption) may help to manage the risk of
release while making relevant health care information more
available to data users.
Quantifying and Evaluating the Risk of
Re-identification
Evaluating risk of re-identification can be a very technical
process that requires substantial expertise, but community
data users can use certain general principles as a guide. The
most important factor to consider is the number of individuals
who share a certain set of characteristics. Name, address, and
telephone number are obvious examples of data elements that
can reveal the identity of a person, but other data elements
may be less obvious.
Communities should be aware that merging data sets, in
particular, may increase the risk that individuals or small groups
could be identified. Merged data sets raise concerns when
people would not expect the data to be combined (for example,
correlations among prescriptions filled, food purchases, and
method of payment for food that could be obtained from
private supermarket data); when analysis of the combined data
sets may have negative consequences for those whose data are
used; or when merger raises the risk that private or confidential
data may be disclosed.
Good data stewardship practices require evaluating the
re-identification risks for new mergers of de-identified data sets,
and for all new uses of de-identified data sets. The Office
for Civil Rights of the U.S. Department of Health and Human
Services provides guidance [9] on how HIPAA-covered entities
can evaluate risk of re-identification, but community-based
data users should not undertake this process without expert
guidance.
De-identification, Limited Data Sets, and Data Use
Agreements
The HIPAA Privacy Rule requires data use agreements (DUAs)
when researchers use “limited data sets.” A limited data set
is created from protected health information by removing
all identifiers except certain information about dates and
locations. Users obtaining de-identified data sets also may
need to enter into a DUA with the entity supplying the data, to
promise to protect the data against re-identification or to make
additional privacy and security arrangements. Data stewards
of state data sets may be subject to laws prohibiting attempted
re-identification of records, and many DUAs impose penalties
for noncompliance with DUA requirements. Communities that
engage in the collection of original data may share de-identified
data with other organizations, and when doing so, should use
a DUA to make clear the expected arrangements for use of the
data, including limiting attempts to re-identify de-identified
data.
9 See Health Information Privacy: Guidance Regarding Methods for
De-identification of Protected Health Information in Accordance with the Health
Insurance Portability and Accountability Act (HIPAA) Privacy Rule, published by the
Office for Civil Rights, U.S. Department of Health and Human Services, available from:
http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html.
Cautionary Tale: Small cell sizes
An academic used state vital records data from death
certificates to study deaths from a variety of causes.
This researcher was able to identify one individual because
of small cell size. As a result, the government agency that
supplied the data decided to raise the suppression
threshold from 5 to 10. The agency put in place a system that
automatically checks, in the background, for cell sizes smaller
than 10 before results are reported back to a researcher. Now, if
someone runs an analysis for which any of the cell sizes are
less than 10, the cell will come up blank or just indicate “<10.”
Summary
De-identification can be used to limit the risk that individuals’ confidential or private data will be disclosed.
Two types of de-identification are:
    Individual de-identification
    Aggregation
Data users can use a number of strategies for limiting the risk of re-identification, such as:
    Suppressing small cell counts
    Grouping variables that could make re-identification easier
Data use agreements that prohibit attempts to re-identify individuals can add a layer of protection to other strategies for protecting confidentiality and privacy.
When de-identification interferes with the purpose of the data use, individuals can be asked if they accept the risk of re-identification.
Cautionary Tale: Washington State Hospital Discharge Data
A researcher purchased hospital discharge data from
the state of Washington. Although the data set did
not include patient names, the researcher was able to
corroborate highly sensitive information about specific
individuals by linking publicly available information from
newspaper reports about accidents to the information
contained in the data set. Washington did not use
the NAHDO Guidelines for release, which recommend a
layered approach of statistical, management, and regulatory
controls. Washington learned from the experience and put in
place a system using data use agreements that, among other
things, requires researchers accessing data to agree not to
try to re-identify individuals in the de-identified data set.
Appendix A: Definitions
The following definitions explain how terms are being used
in the Toolkit, although the definitions are similar to other
common uses.
Confidentiality
The treatment of information that a person has disclosed in
a relationship of trust with the expectation that it will not be
passed on to others in ways that are inconsistent with the
understanding of the original disclosure without permission.
Consent
A process through which a community or individual gives
permission for data to be collected or used by a specific entity
for a specific purpose.
De-identified Health Data
Health data about an individual that has had identifiers, such
as name, address, telephone number(s), and date of birth,
removed. For HIPAA-covered entities using protected health
information (PHI), the HIPAA Privacy Rule governs the specific
data elements that must be removed to create a de-identified
data set.
Health Data
Information about the health of specific individuals, such as
blood pressure, or about subgroups of individuals, such as
children under 5 years old with asthma living in a specific zip
code, or about a community, such as the number of residents
with stage 4 adenocarcinoma of the colon.
HIPAA
Health Insurance Portability and Accountability Act. The part
of HIPAA that most people have encountered is the Privacy
Rule, which gives certain rights to individuals (for example, to
obtain copies of their medical records) and imposes duties on
health care providers, their business associates, and insurance
companies or other payers to maintain the privacy and
confidentiality of patient information.
IRB
Institutional Review Board. A structure created by the
“Common Rule” for the Protection of Human Subjects in
Research that ensures that research involving people meets
legal and ethical requirements. Federal and state laws and
regulations determine what research must be approved by an
IRB.
Notice
Information given to the community or individuals about how
their data may be used.
Protected Health Information or PHI
Refers to information about an individual that is subject to
the HIPAA Privacy Rule. PHI receives specific legal protections
under the HIPAA Privacy Rule.
Stewardship
Health data stewardship is a responsibility, guided by principles
and practices, to ensure the knowledgeable and appropriate
use of data derived from individuals’ personal health
information.
User of Community Health Data
Entity within a community that collects, manipulates, stores,
analyzes, or disseminates data to improve the health of a
community, or of subgroups or individual members of the
community.
Appendix B: Federal and State Laws
Many federal and state laws and regulations could affect
community-level data use, but two sets of federal regulations
are most likely to affect local efforts to use data. Because there
are 50 states with 50 sets of laws that may affect data use, the
Toolkit does not address state law, but data users should learn
about the laws in their jurisdiction.
The Department of Health and Human Services (HHS)
regulations on the Protection of Human Subjects are found in
the U.S. Code of Federal Regulations, Title 45, Part 46 (45 CFR
46). These regulations govern human subjects research across a
range of settings, including research done by universities, state
and local governments, and nonprofit organizations. Research
activities covered under 45 CFR 46 must be approved by an
Institutional Review Board (IRB). This Toolkit provides guidance
to data users to help them determine if data use requires IRB
oversight.
The Health Insurance Portability and Accountability Act (HIPAA)
Privacy Rule may also apply to entities disclosing data when
communities are seeking access to health data, and it may be
useful to understand the duties and limitations of the entities
from which communities want to obtain data.
This Toolkit does not give data users everything they need
to know about HIPAA or human subject protection rules and
regulations. Instead, the goal is to alert data users to situations
when they need to ask for further guidance from attorneys
or compliance experts to ensure that data use complies with
major federal regulations that govern health data use and data
collection efforts.
General Principles
Although the HIPAA Privacy Rule and rules governing human
subjects research may not apply to community-level use of
data to improve health, the underlying principles of these
laws and regulations can be instructive to data users. These
laws and regulations were developed to respond to concerns
about perceived and actual harm resulting from data use in
the past. Data users who find themselves in an ungoverned area
should think about the types of protections and
inquiry required of data protection and sharing under the
HIPAA Privacy Rule and human subject protection laws and
regulations. These protections may sometimes put limits on
data sharing that would be unduly burdensome when using
data to promote community health; they may also be less
restrictive than some communities would want when the risk of
harm to small groups or individuals is very high.
How the Regulatory Structure of Data Can Allow
Community User Access to Data
By learning how data are regulated, communities may be more
effective in accessing data needed to promote community
health. For example, a community that understands which data
are regulated by HIPAA may be more confident in reaching out
to health information exchanges or providers to request data.
Similarly, communities may be more willing to engage with
researchers from a local college or university if they understand
the role of Institutional Review Boards for the Protection of
Human Subjects in Research. The final section of this Toolkit
gives community data users an introduction to these systems.
Human Subjects Research
A brief summary of the regulation at 45 CFR 46, Protection of
Human Subjects, also known as the Common Rule, is given to
prompt community groups using health data to think about
whether projects must comply with this federal regulation. The
most authoritative primary source of information about federal
human subjects regulations is the website of the
Office for Human Research Protections of the U.S. Department
of Health and Human Services: http://www.hhs.gov/ohrp/index.html.
Other federal and state laws and regulations may impose
requirements on data collection and use. For example, efforts
to test interventions or collect data in the schools may be
affected by education laws and regulations.
Federal law defines research as “a systematic investigation,
including research development, testing, and evaluation,
designed to develop or contribute to generalizable knowledge.”
A human subject is “a living individual about whom an
investigator doing research obtains either
data through intervention or interaction with the individual, or
identifiable private information.”
Interventions include physical procedures, such as collecting a
blood sample, or manipulating the person’s environment: for
example, changing the placement of fruits and vegetables in
a local market as part of a project to measure whether the
change affects the amount of fruits and vegetables purchased.
Interactions include any communication or contact between a
data collector and the person, which occurs, for example, when
a data collector interviews a person.
Private information is information about people collected in
a place where the person would expect privacy, such as inside
their home. An observation of mothers with their children
in a public playground would not be private information. But
private information does include information given by a person
for specific purposes that is expected to remain private (for
example, a medical record). Information that a person gives to
a reporter would not be private information. If the information
is not linked to a specific person who is or may be identified, it
is not considered private information under 45 CFR 46.
Systematic investigation
A systematic investigation is a plan to collect and analyze data
for answering a question. Systematic investigations include:
Medical chart reviews
Surveys and questionnaires
Interviews and focus groups
Analysis of biological specimens
Epidemiological studies
Psychological or sociological experiments
Analysis of repurposed data
Generalizable knowledge
Data collection that is “designed to develop or contribute
to generalizable knowledge” includes efforts to set up a
knowledge base that can be applied to other communities.
For example, a community group may want to influence
policy about school nutrition. They design a project where
their members interview students from a random sample
of schools across the city about their food choices in school
cafeterias. They expect that the results can be presented to the
news media, that they might be used to change laws on student
nutrition, and that they might be presented at a national
conference. This project would likely be considered research.
Some activities are usually not considered research:
Biographies or oral histories documenting past events
Employee or student evaluations
Data or evaluation collected for use internal to an organization that will not be shared with the public
Quality improvement activities that will not be shared with the public
An IRB may need to review a proposed project to ensure that
these activities are not research under 45 CFR 46.
Next Steps
Community data users who determine that a project is or may
be research with human subjects should consult an IRB or
compliance officer to determine what they must do to comply
with any laws and regulations governing their project.
[Figure: Flowchart, Is a project “human subjects research”? Yes/no decision points: living individual? collecting data through intervention or interaction with individual? identifiable private information? systematic investigations? seeking generalizable knowledge? Outcomes: “Likely will be considered human subjects research under 45 CFR 46” or “Likely not human subject research under 45 CFR 46.”]
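The definitions quoted in this appendix can be combined into a rough screening aid. The Python sketch below is an interpretation, not an official test: the class name `Project`, its field names, and the way the questions are combined are assumptions drawn from the definitions of “research” and “human subject” given above. It cannot replace review by an IRB or compliance officer.

```python
from dataclasses import dataclass


@dataclass
class Project:
    living_individuals: bool
    intervention_or_interaction: bool
    identifiable_private_information: bool
    systematic_investigation: bool
    seeks_generalizable_knowledge: bool


def likely_human_subjects_research(project: Project) -> bool:
    """Rough screen based on the definitions quoted from federal law."""
    involves_human_subjects = project.living_individuals and (
        project.intervention_or_interaction
        or project.identifiable_private_information
    )
    is_research = (
        project.systematic_investigation
        and project.seeks_generalizable_knowledge
    )
    return involves_human_subjects and is_research


# The school nutrition interview project described earlier in this appendix.
school_nutrition = Project(
    living_individuals=True,
    intervention_or_interaction=True,
    identifiable_private_information=False,
    systematic_investigation=True,
    seeks_generalizable_knowledge=True,
)
# An internal quality improvement review that will not be shared publicly.
internal_review = Project(True, True, False, True, False)

print(likely_human_subjects_research(school_nutrition))
print(likely_human_subjects_research(internal_review))
```

A result of "likely" in either direction is only a prompt to seek advice; borderline projects are exactly the ones an IRB may need to review.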
Health Insurance Portability and Accountability Act
(HIPAA) Privacy Rule
A brief summary of the HIPAA Privacy Rule is provided to
prompt community groups using health data to consider
whether they may be covered and to understand the duties of
entities providing data to them. [1]
Health data users should know:
Individuals and organizations covered by the HIPAA Privacy Rule
Information protected by the HIPAA Privacy Rule
Disclosures of information allowed by the HIPAA Privacy Rule
Notification that must be given to individuals whose data are being shared
1 A comprehensive, authoritative summary of the HIPAA Privacy Rule may be
obtained from the Office for Civil Rights, U.S. Department of Health and Human
Services, Summary of the HIPAA Privacy Rule, at
http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/. The full text of the
HIPAA Privacy Rule can be found at 45 CFR Part 160 and Subparts A and E of Part 164.
Who is covered by HIPAA?
    Covered entities
        Health plans
        Health care providers billing electronically
        Health care clearinghouses
    Others
        Business associates are subject to some parts of the rule and to terms of an agreement
        Researchers may be affected by HIPAA under a data use agreement (DUA)
What is covered by HIPAA?
    Protected health information (PHI)
        Identifies an individual’s:
        - Physical or mental health
        - Health care received
        - Payment for health care
What disclosures does HIPAA permit?
    No authorization required
        For treatment, payment, and health care operations
        Business associates
        Public health activities
        For research when approved by Privacy Board preparatory to research
    Data sets
        Limited data set with a DUA
        After removal of 18 specific identifiers
        Certified by a statistical expert that re-identification is unlikely
    When authorized by the patient
An entity asking to use data from a HIPAA-covered entity
(broadly speaking, health care providers, insurers, and health
care clearinghouses) may need more information than is
included in this Toolkit.
HIPAA Privacy Rule and Research
The Privacy Rule specifies when a covered entity may share
an individual’s data without an authorization for release
from the patient. The following is provided to help data users
understand the limitations on data sharing by covered entities.
The covered entity is allowed to share patient data only when
doing so complies with the HIPAA Privacy Rule. The Privacy
Rule addresses access to protected health information, not
human subjects research; projects using protected health
information from covered entities are governed by both the
Privacy Rule and the regulations protecting human subjects in
research.
De-identified Data
Researchers may be able to access de-identified patient data
from a covered entity. The Privacy Rule does not restrict
the use or disclosure of de-identified data, but there is no
requirement that a covered entity disclose de-identified data.
Data are considered de-identified if the 18 identifiers listed
below are excluded from the data used for research and the
covered entity does not know that remaining information can
be used to identify the individual, or if a qualified statistician
determines that the data are de-identified.
Privacy Rule De-idened Data Elements
To create a de-idened data set from HIPAA-protected health
informaon, a covered enty must remove the following
ideners:
*Ideners marked with an asterisk may be included in a
limited data set.
Names
*Geographic subdivisions smaller than a state
*Dates
Telephone numbers
Fax numbers
E-mail addresses
Social Security numbers
Medical record numbers
Health plan beneficiary numbers
Account numbers
Certificate/license numbers
Vehicle identifiers and serial numbers, including license plate numbers
Device identifiers and serial numbers
Web universal resource locators (URLs)
Internet protocol (IP) address numbers
Biometric identifiers, including fingerprints and voiceprints
Full-face photographic images and any comparable images
Any other unique identifying number, characteristic, or code, unless otherwise
permitted by the Privacy Rule for re-identification
Limited Data Set
Recognizing that de-idened data may be needed for research
to improve health, the HIPAA Privacy Rule allows covered
enes to use or share a limited data set. A limited data set
excludes most, but not all, elements excluded in a de-idened
data set. Specically, certain dates and geographic data may
be provided in a limited data set. A covered enty may use or
disclose a limited data set only for research, public health, or
health care operaons. In addion, the covered enty must
have a data use agreement when sharing a limited data set.
Relaonship Between HIPAA Privacy Rule and Protecon
of Human Subjects in Research
Meeng the Privacy Rule’s requirements for receiving health
data from a covered enty does not relieve an organizaon of
meeng requirements imposed on research involving human
subjects. An organizaon planning to use data from a covered
enty should consult with an Instuonal Review Board or
compliance ocer to determine addional requirements that
other federal or state laws or regulaons may impose.
Appendix C: Case Studies
eMERGE Network
The eMERGE network is studying the relationship between
genome-wide genetic variation and common human traits. The
eMERGE network has emphasized privacy and ethical data use.
Members of the eMERGE network have used a variety of
methods for engaging communities in discussions about the
use of individuals’ genetic samples. In Phase 1, four of five
sites used Community Advisory Boards; three of five sites used
focus groups; and fewer than three used telephone surveys,
consensus panels, deliberative engagement, Web surveys of
different populations, interviews, or newsletters.
Just as different network members used different ways to
engage the community, they have different approaches to
protecting individual privacy and confidentiality. The eMERGE
network is continuing to work to define what it means to
de-identify biospecimens, biological data, and clinical information.
Vanderbilt
Vanderbilt’s system involved a Web survey of 4,037 individuals
and a Community Advisory Board, set up to ensure that the
community had a voice. Board members worked with members
of the eMERGE network at Vanderbilt and brought information
back to the community. It initially consisted of 12 individuals
who represented interests including parenting, church groups,
civic communities, and education. Board members were
not expected to have educational or genetics backgrounds.
Vanderbilt found community board members to be inquisitive
and active participants. They were not passive; instead, they
wanted to know about what the eMERGE network was doing,
and they wanted to give recommendations.
Vanderbilt also found that community boards alone were not
enough: People in the community needed a specific person to
talk with about the project. That focal person, sometimes called
an ombudsman, can explain the organization’s accountability
policies and procedures when working with the community and
ensure that concerns reach the right person.
Members of the eMERGE Network have found that community
engagement has been “a lifesaver.”
Although Vanderbilt’s Institutional Review Board did not
view the project to be “human subjects research” (see Legal),
they added more layers of oversight, including evaluation
by the university’s Ethics Committee and three oversight
boards: Ethics, Scientific, and Community Advisory Boards.
Their de-identified repository allows individuals to opt out of
participation. In addition, researchers using eMERGE data must
register each study separately, and alert researchers when their
data use may violate policies and the intent of the persons in
the community whose data are being used.
Sources:
Bradley Malin, Ph.D., Vanderbilt University (testimony and
correspondence)
eMERGE website: http://emerge.mc.vanderbilt.edu/
Mayo Clinic
When the Mayo Clinic started biobanking and reuse of
electronic medical records, it adopted a deliberative democracy
model. The model engaged people in the community in open
dialogue for four days. The deliberants were given background
materials on biobanking, biomedical research, and local efforts
at Mayo. They were then given an opportunity to interact
with domain experts, including scientists involved in genetics
research, as well as privacy advocates.
Participants debated the issues and formulated specific
recommendations about how Mayo should address notice,
consent, and privacy within its biobanking and medical record
reuse system.
Sources:
eMERGE website: http://emerge.mc.vanderbilt.edu/
McGuire AL, Basford M, Dressler LG, Fullerton SM, Koenig BA, Li
R, McCarty CA, Ramos E, Smith ME, Somkin CP, Waudby C, Wolf
WA, Clayton EW. Ethical and practical challenges of sharing
data from genome-wide association studies: The eMERGE
Consortium experience. Genome Res 2011 21(7):1001-7.
Newborn Blood Spots
Almost every baby born in the United States is screened for
a range of diseases via the taking of a small amount of blood
shortly after birth. Parents have been routinely told that the
blood spots are used for diagnosis and quality improvement.
Over time, however, researchers realized that the blood spots
could be used for biomedical research that could potentially
benefit public and individual health. Officials in some states
allowed blood spots to be used for research purposes without
first notifying parents about the repurposing of the blood
spots.
When some parents learned that the samples were stored
long after the blood spots were used to diagnose diseases in
newborns, and were later used for research without consent
or notification, they brought lawsuits against states, academic
institutions, and researchers. Although a case in Minnesota was
dismissed, a Texas case was settled after the parties reached
an agreement to destroy 5.3 million newborn blood spots.
The destroyed samples were potentially a valuable source of
information about genetic variation, infectious disease, and
other public health challenges.
The U.S. Department of Health and Human Services engaged
researchers to evaluate parents’ preferences about future use
of newborn blood spots. The researchers reported that most
parents approved of using the samples for research, but they
wanted to be noed of the possible use. Some asked for the
ability to opt out of research.
For samples collected aer April 30, 2010, parents of children
born in Michigan can opt out of research on behalf of their
children, but if they do not opt out, the biological samples
default to an “opt in” status. Michigan BioTrust has created
a website where parents can learn more and complete the
process of opng in or out of research. This website is a
good example of how data users can promote openness,
transparency, and choice.
Sources:
Botkin JR, Goldenberg AJ, Rothwell E, Anderson RA, Lewis MH.
Retention and research use of residual newborn screening
bloodspots. Pediatrics. 2013; 131(1):120–7. Available from:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3529945/.
Michigan Department of Community Health. Biotrust
consent options. Available from:
http://www.michigan.gov/mdch/0,1607,7-132-2942_4911_4916_53246-244016--,00.html.
Community Engagement on the Community’s Terms
In tribal communities, leaders may be older individuals
who may not have “the demeanor that is expected in a
governmental, bureaucratic setting where efficiency is highly
valued.” Instead of blocking out a 15-minute period for a
meeting, the leader might say, “If this is important, let’s spend
a few days on it.” To effectively engage a community, data users
may have to ignore how management gurus say a meeting
should be run; instead, “just follow your grandmother's advice:
sometimes you just need to listen and not say anything.”
Source:
Tesmony of Dr. Phillip Smith, Indian Health Service
Instuonal Review Board, NCVHS Subcommiee on Privacy,
Condenality and Security, April 17, 2012.
A Refugee Communitys Expectaons
One community health promoon project found that people
in some immigrant and refugee communies did not expect
privacy and did not understand how sharing informaon might
cause harm. In the same project, researchers encountered
a clash between the U.S. emphasis on individuals and some
communies’ emphasis on the family unit. Families did not
want the “head of household” represenng the family on a
survey; instead, they wanted the family to complete the survey
as a unit. Although the organizaon’s Instuonal Review
Board found this approach disturbing because it would not
preserve condenality among family members, the board
agreed to proceed by following the preferences of people in
the community.
Source:
Linda Silka, Ph.D., University of Maine (interview and
correspondence)
Taking Neighborhood Health to Heart
Taking Neighborhood Health to Heart (TNH2H) started as a
community-based parcipatory research project involving
diverse urban neighborhoods in Denver, the University of
Colorado Denver, and the Stapleton Foundaon. Funding from
the Naonal Instutes of Health allowed TNH2H to study the
impact of the built and social environment on health and
health disparies among neighborhood residents. Informaon
about the project is available at TNH2H.org.
TNH2H involves community members at every stage. In
addion to involving them in creang the survey, community
members gave informaon that helped develop the surveys,
and people in the community were employed to administer
surveys. The outcomes of the original research project were
shared with neighborhoods. In addion, the community
idened and directed follow-up studies and outcome
disseminaon.
Laws and regulaons do not rounely require the level of
involvement from community members in research that
is found in TNH2H. By going beyond legal requirements of
openness, transparency, and choice, TNH2H earned the trust of
the community and has successfully engaged the community in
61
Appendix C: Case Studies
improving the health of its members.
Source:
Debbi Main, Ph.D., University of Colorado Denver (interview
and correspondence)
PINE Study
The PINE Study is the joint product of collaboration among the
Chinese Health, Aging, and Policy Program at Rush University,
Northwestern University, and more than 20 community services
organizations in the Chicago area, including the Chinese
American Service League and Xilin Asian Community Center
as the main community partners. This academic–community
partnership is guided by a community-based participatory
research approach. The PINE Study was designed to identify
actionable health policy concerns among a population of
individuals whose preferences and service needs are poorly
understood. Older Chinese adults are hard to reach because
they tend to distrust programs run by the federal government,
due to the Chinese community’s past experience with harsh
violence and decimation. The issue is further compounded by
vast cultural and linguistic barriers.
During 2011–2013, the PINE Study carried out face-to-face
interviews with 3,159 community-dwelling older adults from 60
to 105 years old. The multilingual staff interviewed participants
based on their preferred languages and dialects, including
English, Cantonese, Taishanese, Mandarin, or Teochew dialects.
Data were collected using Web-based software that recorded
simultaneously in English, Chinese traditional, and simplified
characters. Due to the careful planning and community
engagement, the response rate was 91%.
The result of the effort was The PINE Report, a comprehensive
study that examined the health and well-being of Chinese older
adults in the greater Chicago area, the largest cohort of older
Chinese adults ever assembled for epidemiological research
in Western countries. The report revealed that individuals in
this population are affected by medical comorbidities, physical
disabilities, low health care utilization rates, psychological
distress, social isolation, and elder abuse at higher rates than
the average older adult in the United States. Many experience
low acculturation levels, financial hardship, and insufficient
social support. The PINE Report identified opportunities for
family members, community stakeholders, health professionals,
and policymakers to improve the health and well-being of older
Chinese adults.
Source:
Dong XQ, Chang ES, Wong E, Wong B, Skarupski KA, Simon MA.
Assessing the health needs of Chinese older adults: Findings
from a community-based parcipatory research study in
Chicago's Chinatown. J Aging Res 2011 2010:1–12.
MyHealth Access
MyHealth Access Network is a nonprot coalion of more
than 200 organizaons in northeastern Oklahoma, with
a goal of improving health care quality and the health of
area residents while controlling costs. The organizaon was
chartered to facilitate communicaons and connecons among
parcipants in the health care systems. MyHealth does not
directly provide care, but gives those that do the technology,
informaon, communicaons, and analycs to support
improved care quality and reduced costs (see hp://www.
myhealthaccessnetwork.net/).
MyHealth Access Network engaged the community in a 100-
day planning process that involved 200–300 people. At the
beginning, parcipants agreed to focus on the objecves
of health improvement and quality. They recognized that a
primary focus on privacy and security, without starng by
dening the return on investment, would scule any eort to
share and use health data to improve health.
A subset of task forces was formed to address specic issues,
including content, clinical, privacy and security, and costs.
The recommendaons and ndings from these groups were
reviewed by top-level governance to create a plan.
Throughout the process, facilitators refused to allow conict
to become disengagement, which led to the model’s widely
recognized success.
Source:
Interview with David Kendrick, M.D., M.P.H., MyHealth Access
Research on biological samples from members of the
Havasupai Tribe
Members of the Havasupai tribe gave DNA samples to Arizona
State University (ASU) researchers in the early 1990s. The
researchers suggested that the DNA samples might provide
informaon about the tribe’s very high diabetes rates. In the
early 2000s, however, a tribal member heard a presentaon
about the data that addressed migraon, mental health, and
“inbreeding.
The tribe was deeply disturbed that biological samples taken to
63
Appendix C: Case Studies
assist tribal members with a specic health concern were used
in ways that directly challenged beliefs of tribal members, while
also sgmazing all people of the tribe. This shows that harm
is not only caused when personal health data are disclosed (as
in the hospital discharge data set), but when every person in a
small group can be sgmazed.
Aer a lawsuit was led, ASU agreed to a selement to “right
the wrong” in using the data in a way that violated tribal
members’ right to consent.
Source:
American Indian and Alaska Nave Genecs Resource Center
website: hp://genecs.ncai.org/case-study/havasupai-Tribe.
cfm.
Appendix D: Worksheet and Checklists
Purpose Specicaon Worksheet
Accountable enty or individual(s) ___________________________________________
Describe the purpose of data use
Describe the role of the community and aected individuals in specifying the purpose of data collecon
or use
Describe data elements needed to achieve the purpose
From what source(s) will you get the data?
Federal public data sets
State public data sets
Medical records
Original survey
Other
Will data be repurposed?
Yes No
What potenal adverse consequences, if any, do you ancipate:
Risk of breaching individual’s privacy or condenality
Negave impact on community
Sgmazaon of individuals or small groups
Describe plans to lessen possible adverse consequences (e.g., noce, data protecon, community
consultaon)
Describe possible future use/repurposing
Describe procedures for considering, and limits on, unplanned use
Describe how to evaluate the need to consider addional consent when repurposing data
Data Quality and Integrity Checklist
Data Collecon
Accountable individual/enty:________________________________________
Describe the plan for community engagement in the data collecon process.
Are either original or repurposed data collected following acceptable data collection and use practices?
If the organization lacks expertise in data collection best practices, look for outside assistance
from a researcher, health care provider, state health department, or other organization with
expertise in data collection and entry
Sample is representative of population of interest
Data collection procedures set up and documented before data collection
Training for those engaged in data collection
Require those collecting data to sign confidentiality agreements
Audit data collection processes
Training for those entering data (if a separate process)
Audit data entry processes
Repurposed Data
Data source is trustworthy
Merging Data Sets
Accountable individual/enty: ________________________________________
Are the populaons the same for the dierent data collecon eorts?
Do survey quesons and response categories match?
Might dierences in survey administraon dates aect survey results?
What were the survey sample designs?
Describe methods to be used when merging data sets.
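One of the questions above, whether survey questions and response categories match, can be checked mechanically before merging. The following is a minimal sketch, assuming each survey is described as a dictionary mapping a question name to its list of response categories; the `compare_surveys` function and the two example surveys are hypothetical and serve only as illustration.

```python
def compare_surveys(survey_a, survey_b):
    """Compare two surveys given as {question: [response categories]}."""
    shared = sorted(set(survey_a) & set(survey_b))
    # Shared questions whose response categories differ need recoding
    # (or exclusion) before the data sets are merged.
    mismatched = [q for q in shared if set(survey_a[q]) != set(survey_b[q])]
    return {
        "only_in_a": sorted(set(survey_a) - set(survey_b)),
        "only_in_b": sorted(set(survey_b) - set(survey_a)),
        "mismatched_categories": mismatched,
    }


# Made-up survey definitions, for illustration only.
survey_2012 = {
    "smokes": ["yes", "no"],
    "general_health": ["good", "fair", "poor"],
}
survey_2014 = {
    "smokes": ["yes", "no", "former"],
    "age_group": ["18-64", "65+"],
}
report = compare_surveys(survey_2012, survey_2014)
```

A check like this covers only question wording keys and categories; differences in populations, administration dates, and sample designs still need to be reviewed by someone with expertise in survey methods.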
Data Analysis
Accountable individual/enty: ________________________________________
Describe valid methods for analyzing qualitave or quantave data, or idenfy the individual or enty
that will do the analysis
Reporng Results
Accountable individual/enty: ________________________________________
Describe how reported results will protect communies, subgroups, or individuals from bias or sgma.
Describe protecons to ensure accurate reporng of results.
Data Security
Accountable individual/enty: ________________________________________
Idenfy ways to protect data integrity/security
Encrypt personally idenable informaon on mobile devices
Create a de-idened data set
Use valid methods if producing a de-idened data set
Limit password-protected access to idenable data to those with a need to know
Limit the ability to delete, add, or change data to those with appropriate training and need
Store paper records with idenable informaon in a dierent place from records that do not
contain ideners
Openness, Transparency, and Choice
Accountable enty or individual(s): ___________________________________________
Describe community engagement in the data collecon process
Determine the appropriate level of disclosure
Community noce (describe)
Small group noce (describe)
Individual noce (describe)
Create a feedback loop with parcipants/community to report ndings and recommendaons
(describe)
Data Use Agreement Checklist
Data use agreements designed to limit re-idencaon of de-idened data should, at a minimum,
address the following elements:
Dene the scope of data use
Require recipient to use safeguards to prevent use or disclosure not allowed in the scope of the
agreement
Require recipient to report to the data source any use or disclosure not allowed in the scope of
the agreement
Require recipients agents, such as subcontractors, that receive the data to agree to the same
restricons and condions that apply to the recipient
Require the recipient to agree to refrain from idenfying or contacng individuals whose health
informaon is contained in the shared data set.
Dene scheduled monitoring by data source and/or assurances by data recipient conrming that
terms of the agreement are being honored
Specify consequences of the data recipient’s failure to comply with terms of the agreement
Specify who bears the cost of enforcing the agreement if the data recipient is alleged to violate
the agreement
If you are being asked to sign a data use agreement in order to receive data, find out:
What laws or regulations, if any, govern the data sharing and what the laws or regulations
require of you as a recipient of data
What the document allows you to do and not do with the data
How does the document define the scope of use?
Limits on attempts to re-identify or contact individuals associated with the data
Who can see or work with the data, inside or outside of the organization
Can you provide physical or technological safeguards that must be in place to secure the data
under the agreement?
Can you meet requirements to audit data use or track access to data?
What are your duties if there is a breach of the agreement?
Reporting? To whom?
How will you address any allegation that you or your agents have breached the agreement?
Limited Data Set Checklist
If receiving a limited data set (LDS) from a covered entity, an organization should confirm that the data
use agreement includes the following elements [1]:
Identifies the receiving organization as the recipient of the LDS
States that the LDS will be used only for research, public health, or health care operations
Describes the purpose for using the LDS
LDS recipient agrees to refrain from using or disclosing the LDS for any purpose not specified in
the agreement
LDS recipient agrees to use appropriate safeguards to prevent use or disclosure not specified in
the data use agreement
LDS recipient agrees to report any use or disclosure of the LDS not specified in the DUA
LDS recipient agrees that its agents, such as subcontractors, that receive the LDS agree to the
same restrictions and conditions that apply to the LDS recipient
LDS recipient agrees to refrain from identifying or contacting individuals whose health
information is contained in the LDS
[1] The University of Wisconsin has compiled “HIPAA Privacy and Security Rule Policies and Procedures,” including limited data set
information as noted above. The compilation is available from: http://hipaa.wisc.edu/hipaa-policies.htm.
NCVHS Membership, September 2014
Larry A. Green, M.D., Chair
John J. Burke, M.B.A, MSPharm.*
Raj Chanderraj, M.D., F.A.C.C.
Bruce B. Cohen, Ph.D.
Llewellyn J. Cornelius, Ph.D.
Leslie Pickering Francis, J.D., Ph.D. *Subcommittee Co-Chair
Alexandra Goss
Linda L. Kloss, M.A., RHIA *Subcommittee Co-Chair
Vickie M. Mays, Ph.D., M.S.P.H.*
Sallie Milam, J.D., CIPP, CIPP/G*
Len Nichols, Ph.D.
W. Ob Soonthornsima
William W. Stead, M.D.
Walter G. Suarez, M.D., M.P.H.*
James M. Walker, M.D., FACP
* Member of the Privacy, Confidentiality and Security
Subcommittee
Lead Staff for the Subcommittee
Maya A. Bernstein, J.D. ASPE
Executive Staff Director
James Scanlon
Deputy Assistant Secretary,
Office of Science and Data Policy
Office of the Assistant Secretary for Planning and Evaluation,
DHHS, ASPE
Acting Executive Secretary
Debbie M. Jackson, M.A.
Senior Program Analyst
Classifications and Public Health Data Standards Staff,
Office of the Director
National Center for Health Statistics, CDC
This report was written by NCVHS Consultant Writer Maureen Henry,
in collaboration with NCVHS members and staff.
The Naonal Commiee on
Vital and Health Stascs
(NCVHS) is the statutory [42
U.S.C. 242k(k)] public advisory
body to the Secretary of Health
and Human Services (HHS) for
health data and stascs. The
Commiee provides advice and
assistance to the Department and
serves as a forum for interacon
with interested private-sector
groups on a variety of key health
data issues. The Commiee is
composed of 18 members from
the private sector who have
disnguished themselves in
the elds of health stascs,
electronic interchange of health
care informaon, privacy and
security of electronic informaon,
populaon-based public
health, purchasing or nancing
health care services, integrated
computerized health informaon
systems, health services
research, consumer interests in
health informaon, health data
standards, epidemiology, and
the provision of health services.
Sixteen of these members are
appointed by the HHS Secretary
to terms of four years each,
with about four new members
appointed each year. Two
addional members are selected
by Congress.
For more informaon, see the
NCVHS website:
hp://www.ncvhs.hhs.gov/
National Committee on Vital and Health Statistics