Data access is important

Optimizing access to high-quality data is critical to informed decision making and is the principal justification for their collection and dissemination. The assurance of confidentiality is a primary concern in considering what scope and extent of access to personal information will be granted.

Statisticians can play an important role

Statisticians can play an important role in ensuring that both goals are met, and need to work with data users, data producers, and data custodians to accomplish these goals.

The sharing and dissemination of information gathered under a pledge of confidentiality must be subject to rigorous statistical scrutiny to ensure consistency with the confidentiality pledges.

The profession of statistics has developed the requisite tools to help with the appropriate treatment of confidential information. Additionally, the profession is actively engaged in research to further refine these tools and to develop means to make useful information available for public policy and scientific advancement.

The Assessment of Risk in Statistical Data

The assessment of risk depends on the way in which the information is produced. Until fairly recently, the production of information for dissemination to the public relied principally on printed, tabular data. Statisticians have long been sensitive, therefore, to the potential risk of disclosure in such data. Although tables are intended to protect individual information by presenting grouped figures, there are situations in which the size and/or the distribution of those groups can reveal more information about individuals or business establishments than had been publicly known.

In contrast to tabular data, which are presented in aggregate form, the information contained in micro-data is disaggregated. The information contained is specific to the individual. An electronic micro-data file may contain many thousands of data records, each referring to a separate person. This format permits the researcher to specify with exactitude the kinds of questions that can be addressed and to utilize much more powerful analytic tools. This very advantage, however, carries with it the possibility of identifying one or more respondents, and the more detailed the information, the more individual records become distinct from each other, making study participants easier to identify.

It is a common misconception that once names and certain other direct identifiers (address, telephone number) that lead directly to a person have been removed from a body of information, the remaining data may be judged safe and can be shared without risk of compromising the privacy of the data providers. Statisticians and members of other professions, however, have demonstrated repeatedly that modern computational technology and the widespread availability of personal information on the Internet can render information just as revealing as if the names had not been removed. That is, a seemingly anonymous body of data could be rendered identifiable.

Techniques for Protecting Confidentiality

Taking the aforementioned facts into account, statistical scientists and agencies responsible for developing and distributing data have developed a variety of countermeasures to de-identify statistical databases and to block efforts to manipulate them to disclose personal information. Strategies for preventing unauthorized and inappropriate disclosure of identifiable information generally involve some combination of modification of data content and restriction of data access. The first strategy involves some loss of information detail, and the second, while permitting qualified users access to more complete data, limits who may use the data, under what conditions, and for what purpose. Thus the selection of a strategy involves a careful consideration of the interests of legitimate data users while strictly adhering to confidentiality protections promised to the subjects of the data; by selecting among a variety of strategies, a satisfactory resolution can often be found. Alternatives that have been considered include:

Modifying the values of information items to maintain statistical quality but avoid disclosures. One such strategy is to blur or disguise the data in such a way that individual data items cannot be uniquely associated with or attributed to a particular person or establishment.

Distributing synthetic data sets whose variables have the same statistical distributions and relationships as the original data from which they are derived but that contain no actual information from the original data. Partially synthetic files are another way to avoid disclosures while keeping the bulk of the data intact.
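
As a rough illustration of two ideas discussed in this article, the following Python sketch counts how many records in a small micro-data file are unique on a chosen set of variables, and then blurs a numeric variable by adding random noise. It is a minimal sketch under stated assumptions, not a method prescribed by the article: the records are invented, the function names unique_record_count and blur are hypothetical, and zero-mean Gaussian noise is only one of many possible ways to disguise values.

```python
import random
from collections import Counter


def unique_record_count(records, variables):
    """Count records whose combination of values on `variables`
    appears exactly once in the file (a simple distinctness measure)."""
    keys = [tuple(record[v] for v in variables) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1)


def blur(values, scale, seed=0):
    """Return a copy of `values` with zero-mean Gaussian noise added.
    `scale` is the noise standard deviation (an assumed tuning knob)."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    return [v + rng.gauss(0.0, scale) for v in values]


# Invented records with names already removed.
records = [
    {"age": 34, "zip": "20001", "sex": "F", "income": 52000},
    {"age": 34, "zip": "20001", "sex": "F", "income": 48000},
    {"age": 61, "zip": "20001", "sex": "M", "income": 91000},
    {"age": 45, "zip": "20002", "sex": "F", "income": 67000},
]

# More detail makes more records distinct from each other.
coarse = unique_record_count(records, ["sex"])
detailed = unique_record_count(records, ["age", "zip", "sex"])

# Disguise the income values before release.
incomes = [record["income"] for record in records]
blurred = blur(incomes, scale=1000.0)
```

In this toy file, one record is unique when only sex is considered and two are unique once age and ZIP code are added, which mirrors the point that greater detail makes records easier to single out. A larger noise scale gives more protection at the cost of more lost detail, which is the trade-off described above for strategies that modify data content.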
8 AMSTAT NEWS FEBRUARY 2009