Feature Article

i2b2 and the MHDR: New tools for rapid, easy access to clinical data for health outcome assessment and research
December 2012 Issue

William Adams, MD
Director, BU-CTSI Clinical Research Informatics
The author has nothing to disclose with regard to commercial support.



Are you aware that BUMC has two data repositories of patients’ clinical data from Boston Medical Center (BMC) and its affiliated Community Health Centers (CHCs) that can be used by researchers to perform self-service, aggregate data research queries? For more details about how these repositories might assist you with your research, read on.

What is i2b2?

i2b2 is an acronym that stands for “Informatics for Integrating Biology and the Bedside.” i2b2 is both an NIH-funded project within the National Center for Biomedical Computing (NCBC) and the name given to the scalable, open-source informatics framework and architecture that is a core component of the program.  The i2b2 framework provides: 1) a standardized way to organize clinical data; and 2) software tools to allow non-technical researchers to perform queries of clinical data.

The Boston University data repository, called BU-i2b2, is the core resource of the Boston University Clinical and Translational Science Institute (BU-CTSI). The repository is built from data held in the Boston Medical Center (BMC) Clinical Data Warehouse (BMC-CDW), which are extracted from two BMC clinical data systems: the Centricity Electronic Medical Record (EMR) and SDK (the BMC scheduling and billing system). The data in i2b2 are processed by non-research (technical) staff from the BMC-CDW and technical Business Associates via automated database scripts. The data in BU-i2b2 are de-identified and are updated every other month. The system is accessible via a website within the BMC intranet (behind the BMC firewall).

Background and Purpose

There is a pressing need and opportunity to use the massive amount of data stored within clinical information systems to better understand health and health care, especially for high-risk populations. Until recently, gaining access to large amounts of clinical data has been difficult, expensive, and time consuming, and has frequently involved the exchange of protected health information (PHI). One main purpose of i2b2 is to function as an institutional data repository that facilitates self-service queries by BUMC researchers in order to: 1) increase the pace of hypothesis generation; 2) speed the preparation of grants and new proposals; and 3) support population-based health services research. All of these activities are achieved within the i2b2 framework while strictly protecting patient privacy and confidentiality.

BU-i2b2 (http://www.i2b2.org) utilizes an open-source informatics architecture and consists of two major pieces. The first is the back-end infrastructure (the “Hive”), which is a collection of interoperable software modules (“Cells”) that manage things like security, access rights, and the underlying data repository. The second piece is a software application suite of query and mining tools that allows users to ask questions about the data (the “Workbench”). The i2b2 system was developed within the Partners HealthCare system at Massachusetts General Hospital (MGH), where it serves as the architecture for the Research Patient Data Registry (RPDR). Since its development, i2b2 has been adopted by more than 36 other academic medical centers.

i2b2 is a unique and important resource for three reasons: 1) It is powerful, effective, and secure; 2) The software is open-source; and 3) The system offers a standard way of storing data and representing clinical knowledge and concepts, thus supporting research query interoperability. Designed around cohort identification to facilitate translational research, i2b2 allows researchers to query large patient populations to identify small subsets based on certain inclusion and exclusion criteria. The i2b2 architecture can accommodate data from a wide range of clinical data systems. Data from each source system are extracted, transformed, and loaded (ETL) into a common data framework and, during this process, are also linked to standard reference codes to achieve a standards-based clinical system.
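As a rough illustration of the extract, transform, and load step described above, the Python sketch below maps rows from two hypothetical source systems into one common record shape and attaches a standard reference code. The field names, the local-to-standard code map, and the sample rows are all assumptions made for illustration; they are not the BMC-CDW schema or the actual i2b2 data model.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical map from (source system, local code) to a standard reference code.
LOCAL_TO_STANDARD = {
    ("emr", "DM2"): "ICD9:250.00",
    ("billing", "25000"): "ICD9:250.00",
    ("emr", "HTN"): "ICD9:401.9",
}


@dataclass(frozen=True)
class Fact:
    """One observation in the common format (illustrative fields only)."""
    patient_id: int
    concept_code: str
    event_date: date
    source: str


def transform(source, row):
    """Return a Fact, or None when the local code has no standard mapping."""
    code = LOCAL_TO_STANDARD.get((source, row["code"]))
    if code is None:
        return None
    return Fact(row["patient"], code, row["date"], source)


def load(rows_by_source):
    """Transform every row from every source and keep the mapped ones."""
    facts = []
    for source, rows in rows_by_source.items():
        for row in rows:
            fact = transform(source, row)
            if fact is not None:
                facts.append(fact)
    return facts


# Made-up sample rows from two source systems.
sample = {
    "emr": [
        {"patient": 1, "code": "DM2", "date": date(2011, 3, 2)},
        {"patient": 2, "code": "XYZ", "date": date(2011, 4, 9)},
    ],
    "billing": [
        {"patient": 3, "code": "25000", "date": date(2011, 5, 1)},
    ],
}
facts = load(sample)
```

In this sketch, rows whose local code has no mapping are simply dropped; a real pipeline would more likely log them for review.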

What kind of information is available from BU-i2b2?

BU-i2b2 currently includes data about patients such as:  medications, problems, visit/discharge diagnoses, visits, demographics, and clinical observations (over 5,000 clinical data types including labs, vital signs, answers to questions, billing codes, procedures, etc.). As time progresses, other sources will be identified and added from the BMC-CDW. Currently, approximately 1.5 million individuals are represented within the Centricity and SDK databases.  The plan is to continue to add individuals to the i2b2 repository at a rate of approximately 8,000 to 10,000 new individuals per month.

To create the BU-i2b2 data repository, Limited Data Sets are generated within the BMC-CDW, which is the core data resource for BMC clinical data reporting and management. The limited data sets are generated in the i2b2 data format and then transferred out of the BMC-CDW and into the BU-i2b2 data repository. The BU-i2b2 repository is located outside the BMC-CDW but within the BMC Data Center behind a secure firewall.

How are the data protected?

The BMC-CDW maintains a copy of the unique Registry ID for each patient medical record. This Registry ID is associated with each patient’s medical record number in the "BU-i2b2 Link Table" so that the BU-i2b2 dataset can be updated periodically. The Link Table is encrypted and stored within the BMC-CDW. Only three members of the BMC-CDW or Business Associate Technical Teams (non-researchers) have the key to decrypt the Link Table. This approach reduces the risk of breaches in confidentiality since, with each update, a much smaller amount of identifiable data needs to be extracted. All PHI resides within the BMC-CDW and is managed by members of the BMC-CDW or a technical Business Associate. The BU-i2b2 team, led by William Adams, MD, manages the BU-i2b2 servers and users. BU-i2b2 data will be gathered and updated on an ongoing, cumulative, long-term basis, with no pre-defined end date.
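The role of the Link Table can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the class name, the row fields, and the use of a random hexadecimal token as the Registry ID are hypothetical, and the encryption of the table is deliberately left out.

```python
import secrets


class LinkTable:
    """Holds the medical record number (MRN) to Registry ID mapping.

    Hypothetical sketch: in practice this table would be encrypted and
    kept inside the clinical data warehouse, which is not shown here.
    """

    def __init__(self):
        self._by_mrn = {}

    def registry_id(self, mrn):
        """Return the existing Registry ID for an MRN, creating one if needed."""
        if mrn not in self._by_mrn:
            self._by_mrn[mrn] = secrets.token_hex(8)
        return self._by_mrn[mrn]


def deidentify(rows, link):
    """Replace the MRN in each row with its Registry ID."""
    return [
        {"registry_id": link.registry_id(row["mrn"]), "code": row["code"]}
        for row in rows
    ]


link = LinkTable()
first_update = deidentify([{"mrn": "A100", "code": "HTN"}], link)
second_update = deidentify([{"mrn": "A100", "code": "DM2"}], link)
```

Because the same MRN always resolves to the same Registry ID, records from a later update attach to the same de-identified patient without the MRN ever leaving the warehouse.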

In i2b2, the 18 HIPAA identifiers (e.g., name, street address, telephone, fax numbers, email addresses, web URL address, IP addresses, Social Security numbers, medical record numbers, health plan beneficiary number, any account number, certificate number, license number, vehicle identifiers, serial numbers, and device identifiers) have been stripped from the data. The only information related to these identifiers that is retained in i2b2 is census tract or zip code and dates. The census tract/zip code data are available only in aggregate form. In order to protect patient privacy, the software will not allow an aggregate query to be run where the result contains a cell size of less than 6 individuals. In addition, the date of birth for each subject is adjusted to a random day within the previous or following month of birth, and all other dates for that subject are adjusted by the same number of days. For example, a person born on 2/15/1965 could be assigned a birth date of 1/31/1965; all other dates for that individual would then be adjusted by the number of days between the new birth date and the actual birth date (-15). In this way, the actual day and month of birth will never be stored in i2b2, and it will not be possible to determine the day or month in which any event occurred. However, the timing (including seasonality) and intervals between all events are preserved.
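The date adjustment described above can be sketched in Python as follows. The function names are hypothetical, and the rule used to pick the new birth date (a uniformly random day in the calendar month immediately before or after the birth month) is one reading of the description, not a statement of how the production scripts work.

```python
import calendar
import random
from datetime import date, timedelta


def pick_shifted_birth_date(birth, rng):
    """Pick a random day in the calendar month before or after the birth month."""
    step = rng.choice([-1, 1])
    month_index = birth.year * 12 + (birth.month - 1) + step
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, rng.randint(1, days_in_month))


def apply_offset(dates, offset_days):
    """Shift every date for one patient by the same number of days."""
    return [d + timedelta(days=offset_days) for d in dates]


# Worked example from the text: born 2/15/1965, assigned 1/31/1965.
birth = date(1965, 2, 15)
assigned = date(1965, 1, 31)
offset_days = (assigned - birth).days  # -15

# Made-up visit dates for the same patient, shifted by the same offset.
visits = [date(2010, 6, 1), date(2010, 6, 20)]
shifted_visits = apply_offset(visits, offset_days)

# The interval between the two visits is unchanged by the shift.
interval_kept = (shifted_visits[1] - shifted_visits[0]) == (visits[1] - visits[0])
```

Because one offset is applied to every date for a given patient, the gaps between that patient's events are unchanged even though no true calendar date is stored.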


A second, related data repository is the “Massachusetts Health Disparities Repository” (MHDR), which uses the same software platform as i2b2 but includes data from community health centers and selected insurance plans. The MHDR utilizes clinical data from adult and pediatric patients who have had a visit at either BMC or one of its affiliated Community Health Centers (CHCs) since January 1, 2000. The MHDR includes all problems, diagnoses, allergies, visit dates (and processes and procedures performed during visits, and results from these, if applicable), and medications, as well as claims data from the BMC Health Plan. As in BU-i2b2, the data are identified only by an internal Repository ID and census tract. Keys to the identities of these patients do not reside within the MHDR and are not accessible to recipient investigators. An additional feature of the MHDR is that all data for an individual are linked across the care continuum (health center, ER, hospital) so that a wide variety of population health research questions can be asked.

Like BU-i2b2, the MHDR is hosted within the BMC-CDW, and data are managed in the same way as BU-i2b2. The important differences between BU-i2b2 and the MHDR are how data access is provided and what types of data are available. Unlike BU-i2b2, the MHDR is accessible only via a secure research workspace, and all projects require approval, including approval from the Boston HealthNet Research Subcommittee. However, with these approvals and signed data use agreements, researchers can access raw data extracts from the MHDR database and also use a specialized i2b2 component called the “Health Outcome Monitoring and Evaluation” (HOME) Cell to assess a vast array of potential longitudinal health outcome and population health research questions. The differences between BU-i2b2 and the MHDR are shown in the figure below:



IRB Approval for use of i2b2 by recipient investigators

The Federal regulations define a human subject as “a living individual about whom an investigator…conducting research obtains 1) data through intervention or interaction with the individual, or 2) identifiable private information.” Based on this definition, the BUMC IRB has determined that aggregate data released from i2b2 and MHDR do not meet the definition of “human subjects.” Therefore, BUMC investigators who obtain aggregate data from i2b2 and MHDR in accordance with the “Investigators Agreement” (discussed below) are not engaged in human subjects research and do NOT have to submit an IRB application or obtain IRB approval for these projects.


Because a small portion of the data provided (census tract/zip codes and dates) is created from HIPAA identifiers, the data obtained by investigators from MHDR represent a HIPAA Limited Data Set. As such, in order to obtain the data from MHDR, recipient investigators must sign a Data Use Agreement.


What do recipient investigators do?

Recipient investigators who wish to obtain aggregate (non-patient level) data from BU-i2b2 or BU-MHDR for research proposals, including activities preparatory to research and quality assurance activities, need only complete a few simple steps.

  • For BU-i2b2, access is permitted after signing the BU-i2b2 Aggregate Research User Agreement.  A HIPAA Data Use Agreement for Limited Data Set (LDS) is NOT required for the BU-i2b2 data.
  • For BU-MHDR (HealthNet–i2b2) data, you will need to contact William Adams, MD (badams@bu.edu) to discuss the suitability and feasibility of the project. The best projects are those for which the investigators plan to seek independent grant funding that can help support the MHDR going forward. For data access to be allowed, researchers will need to sign both a user agreement and a Data Use Agreement (for LDS). Dr. Adams is currently working with legal counsel and the CHC Executive Directors to combine these two agreements into one document. The plan is to have this available in January. Please contact Dr. Adams for additional details.
  • At this time, to obtain MHDR data, you will also need a Project Summary signed by each CHC Executive Director and the Boston HealthNet Research Subcommittee. For more information, contact Judi Henderson at 617-638-6903.

Research Workspace servers have been set up within the BMC Data Center. As part of the MHDR Agreement, users agree never to remove patient-level data from the Research Workspace and to follow the rules and responsibilities related to accessing and using a limited data set. Only aggregate data outputs and analyses may be removed. In this way, no actual data from the limited data sets used for research will exist outside the workspace, and the data can be easily destroyed at the end of the study period.

Recipients will then be able to use the i2b2 and/or MHDR Workbench query tools to access the data for their research. Each researcher will be given a unique user name and password. Using the i2b2 Workbench, researchers will drag and drop search terms into a Venn diagram-like interface and execute data queries that return aggregate numbers of patients meeting the specified criteria. The results of these queries are numbers (also called "counts"). Researchers can view BU-i2b2 and MHDR data only via aggregate reports, and the reports are further restricted to include only counts with cell size greater than or equal to 5 subjects.
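Conceptually, a Workbench query of this kind behaves like set arithmetic followed by a count. The Python sketch below is a minimal illustration under stated assumptions, not the i2b2 implementation: the function name and the patient sets are made up, and the minimum cell size is a parameter whose default of 5 simply follows the restriction stated above.

```python
MIN_CELL_SIZE = 5  # assumption: default taken from the restriction stated above


def cohort_count(include, exclude=(), min_cell_size=MIN_CELL_SIZE):
    """Count patients in every inclusion set and in no exclusion set.

    Returns the count, or None when the count is below min_cell_size,
    so that small cells are never reported.
    """
    if not include:
        raise ValueError("at least one inclusion criterion is required")
    cohort = set.intersection(*(set(s) for s in include))
    for excluded in exclude:
        cohort -= set(excluded)
    n = len(cohort)
    return n if n >= min_cell_size else None


# Made-up patient ID sets standing in for concepts dragged into the query.
diabetes = set(range(1, 21))       # patients 1 through 20
hypertension = set(range(11, 31))  # patients 11 through 30
on_insulin = {11, 12, 13}

both = cohort_count([diabetes, hypertension])
both_not_insulin = cohort_count([diabetes, hypertension], [on_insulin])
too_small = cohort_count([diabetes, hypertension], [set(range(11, 18))])
```

Here `both` is 10, `both_not_insulin` is 7, and `too_small` is `None` because only 3 patients remain after the exclusion.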

If investigators need to obtain individual patient-level data from i2b2 or MHDR to conduct their research, then a separate IRB protocol must be submitted and BUMC IRB approval must be obtained before access to those data can be allowed.



This article provides an overview of the BU-i2b2 and BU-MHDR repositories.  It also provides information about how recipient investigators can obtain data from these repositories to conduct certain queries without having to go through the process of obtaining IRB approval.

Editor’s Note:

In addition to i2b2 and MHDR, researchers can also obtain clinical data (both identified and de-identified) from the BMC Clinical Data Warehouse through the services of the Research Data Warehouse and its manager, Linda Rosen. Once the IRB has approved the data that can be obtained, researchers provide the approved list to Ms. Rosen, who accesses the database, collects the data, and provides them to the researcher. These services are offered through the Office of Clinical Research and the Clinical and Translational Science Institute. More information can be found on the OCR website: http://www.bumc.bu.edu/ocr/clinical-research-clinical-warehouse-data-access/


This Quiz applies to the next recertification period from July 1, 2013 to June 30, 2015, but we recommend that you take this quiz now so you can stay up-to-date.
