In the first instance, statisticians can follow well-established fair information practices to protect privacy, such as collecting only the information that is needed, articulating the purpose of the information collection, and providing informed consent. A critical element of informed consent is an accurate explanation of the assurances of confidentiality that are available.

Statisticians also have a long history of studying ways to protect the confidentiality of data while providing information to policymakers. The traditional way of ensuring confidentiality while disseminating data has been to aggregate information and report it in tables. This approach generally masks information that might identify a specific individual.

The challenge of safeguarding confidentiality has become more difficult for data custodians. Many new forms of data on human behavior, such as video data, biologic samples, or transaction data, are not particularly useful to researchers or policymakers in tabular form. As a result, such "micro-data" are often disseminated after the information has been "de-identified." Unfortunately, statistical research shows that such de-identification is often insufficient and could result in a breach of confidentiality if re-identification were attempted by an individual with the right skills, a computer, and access to publicly available databases.

In one example, a student at the Massachusetts Institute of Technology showed that 97 percent of the names and addresses on the 1997 voting list for Cambridge, Massachusetts, were unique using only ZIP code and date of birth.¹ The same research showed that files the Massachusetts Group Insurance Commission had made available to researchers contained these same fields, along with the medical insurance claims records of state employees. By comparing the two sources, the researcher re-identified the records of the state's Governor, even though his personal identifiers had been removed from the insurance records.

In another example, geneticists who have made substantial progress in mapping the human genome have also found reason for caution in making genetic information generally available.² The increased availability of genomic data for research, coupled with demonstrations that conventional protective procedures do not completely mask the presence of an individual's genetic material in certain databases, has led to measures for increased security.

Today we are developing better statistical tools that can help guide the proper release of data. First, these tools can help ensure the proper assessment of risk. Second, they help ensure the proper treatment of confidential information, so that confidential facts do not become public knowledge through the apparently harmless release of aggregated data or de-identified micro-data. Statisticians, working with computer scientists and others, can help ensure continued access to research data while protecting the privacy of the individuals from whom the data came.

A brief discussion of the available statistical resources follows. For further information, please contact the chair of the ASA's Privacy and Confidentiality Committee. This contact information can be obtained from the committee's website or by calling the American Statistical Association at (703) 684-1221.

Background

The ASA recognizes that risk assessment and confidentiality protection are not simple matters. It believes that statistical techniques are essential to identifying and preventing potential disclosures and invaluable to resolving them. However routine or unusual the information to be protected, the statistician can considerably enhance its usefulness while also protecting privacy. The ASA emphasizes the following points:

Confidentiality protection is important

The protection of personal privacy is of paramount importance in the production and distribution of statistical data. The quality of those data is strongly influenced by the public's trust that pledges of confidentiality will be rigorously observed.
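The Cambridge voter-list example works by linking two files on fields they share, here ZIP code and date of birth (often called quasi-identifiers). The following is a minimal Python sketch of that idea, not the method used in the cited study: the records, field names, and the helper functions `unique_fraction` and `link` are hypothetical, invented here for illustration. It measures what share of records is unique on the shared fields, then attaches a name to each "de-identified" record whose combination matches exactly one entry in the public list.

```python
from collections import Counter

# Hypothetical toy data: all names, ZIP codes, dates, and diagnoses are invented.
voter_list = [
    {"name": "A. Adams", "zip": "02138", "dob": "1945-07-31"},
    {"name": "B. Brown", "zip": "02139", "dob": "1960-01-15"},
    {"name": "C. Clark", "zip": "02139", "dob": "1960-01-15"},
    {"name": "D. Davis", "zip": "02140", "dob": "1972-03-02"},
]

# A "de-identified" claims file: names removed, quasi-identifiers retained.
claims = [
    {"zip": "02138", "dob": "1945-07-31", "diagnosis": "X"},
    {"zip": "02139", "dob": "1960-01-15", "diagnosis": "Y"},
    {"zip": "02140", "dob": "1972-03-02", "diagnosis": "Z"},
]

# Fields assumed to appear in both files (hypothetical choice for this sketch).
QUASI_IDS = ("zip", "dob")


def key(record):
    """Return the record's quasi-identifier combination as a tuple."""
    return tuple(record[field] for field in QUASI_IDS)


def unique_fraction(records):
    """Share of records whose quasi-identifier combination occurs exactly once."""
    if not records:
        return 0.0
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)


def link(public, deidentified):
    """Pair each de-identified record with a name when its key matches
    exactly one public record; ambiguous or missing matches are skipped."""
    names_by_key = {}
    for r in public:
        names_by_key.setdefault(key(r), []).append(r["name"])
    matches = []
    for r in deidentified:
        names = names_by_key.get(key(r), [])
        if len(names) == 1:
            matches.append((names[0], r["diagnosis"]))
    return matches


if __name__ == "__main__":
    print(unique_fraction(voter_list))  # 0.5
    print(link(voter_list, claims))     # [('A. Adams', 'X'), ('D. Davis', 'Z')]
```

Records that share a combination (here the two hypothetical voters born on 1960-01-15 in ZIP code 02139) cannot be linked unambiguously, which is why the share of unique records is one simple indicator of disclosure risk.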
¹ Sweeney, L. Computational Disclosure Control: A Primer on Data Privacy Protection. Doctoral dissertation, Massachusetts Institute of Technology, May 2001.
² NIH Background Fact Sheet, August 28, 2008. http://grants.nih.gov/grants/gwas. See also Homer N, Szelinger S, Redman M, et al. (2008). Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays. PLoS Genet 4(8): e1000167. doi:10.1371/journal.pgen.1000167
FEBRUARY 2009 AMSTAT NEWS 7