E-health enables the sharing of patient-related data whenever and wherever necessary. Electronic health records (EHRs) promise to improve communication between health care providers, thus leading to better quality of patients’ treatment and reduced costs. However, as highly sensitive patient information is an attractive target for attackers and is also frequently demanded by insurance companies and employers, there is increasing social and political pressure regarding the prevention of health data misuse. This work addresses this problem and introduces a methodology that protects health records from unauthorized access and lets the patient as data owner decide who the authorized persons are, i.e., to whom the patient discloses her health information. The methodology thus prevents data disclosure that negatively influences the patient’s life (e.g., by being denied health insurance or employment). The framework provides health care providers with a unique solution that guarantees data privacy (e.g., according to HIPAA) and allows primary and secondary use of the data at the same time. The security analysis showed that the methodology is secure and protected against common intruder scenarios.
“In today’s health care system, the availability of reliable information has a tremendous impact on decisions regarding the patients’ care and, as a result, on the quality of treatment and patients’ health. Over the past years, electronic health records (EHRs) have been introduced as a method for improving communication between health care providers and access to data and documentation, potentially leading to better clinical and service quality (cf. ). The EHR promises the reduction of adverse drug events, which are estimated to account for about $175 billion a year in the US , and a reduction of the very high number of more than 150,000 cases of deaths related to adverse drug reactions each year in the US  as it provides physicians and their health care teams with decision support systems and guidelines for drug interactions. The EHR could achieve massive savings with the digitizing of the results of diagnostic tests and images. A study by the Rand Corporation found that adopting the EHR could result in more than $81 billion in annual savings in the US if 90% of the health care providers used it . However, the electronic storage of health data raises considerable privacy concerns. In fact, the discussion of privacy is one of the fundamental issues in health care today and is often seen as a trade-off between the patient’s requirement for privacy and the society’s needs for improving efficiency and reducing costs in the health care system. With informative and interconnected health-related data comes highly sensitive and personal information. Due to the high sensitivity of the data, there is increasing social and political pressure to prevent the misuse of health data. It is the fundamental right of every citizen to demand privacy, because the disclosure of medical data can cause serious problems for the patient. Insurance companies or employers could use the information to deny health coverage or employment. 
The disclosure of sensitive data, such as a history of substance abuse or HIV infection, could result in discrimination or harassment. In addition to social and political pressure, legal acts demand the protection of health data. The Health Insurance Portability and Accountability Act (HIPAA)  demands the protection of patients’ data that is shared from its original source of collection. In the EU, the processing and movement of personal data is regulated by Directive 95/46/EC . A citizen’s right to privacy is also recognized in Article 8  of the European Convention for the Protection of Human Rights and Fundamental Freedoms. In order to protect patients’ privacy when using, transferring and storing medical records, a variety of privacy enhancing technologies (cf.  for a definition) have been proposed. However, existing approaches often (i) do not comply with the current legal requirements (cf. [4,7,8,9,10]), (ii) do not fulfil basic security requirements (cf. [11,12,13]), and (iii) are not suitable for use with clinical studies (cf. Section 2). This work presents the pseudonymization methodology PIPE (Pseudonymization of Information for Privacy in e-Health). PIPE is used for decoupling the medical data from the patient-identifying data as well as restoring the link for authorized parties, while the actual medical records are maintained and accessed by external (health) applications. The pseudonymization methodology is based on cryptographic operations and, therefore, uses a server-side hardware security module (HSM, cf. ), a specially protected piece of hardware, for the execution of cryptographic operations. This ensures that the encryption and decryption operations are executed within a secure environment and that no secret key is present outside the HSM in plaintext at any time. 
Unlike other HSM applications that rely on the device as both a specially secured environment for encryption and decryption operations and a secure keystore, in PIPE the HSM is employed as a trusted cryptographic processor only. This research uses a combination of conceptual-analytical, artifact-building and artifact-evaluating research approaches. The article starts with a detailed exploration of existing privacy protection mechanisms, such as encryption, anonymization and pseudonymization, by comparing and analyzing related work (conceptual-analytical approach). Based on these results and the identified shortcomings, a pseudonymization methodology is defined and evaluated by means of a threat analysis. Finally, the research results are validated with the design and implementation of a prototype (artifact building and artifact evaluation).”
“Protection of the patients’ privacy can be achieved with two different techniques, anonymization and encryption, which unfortunately both suffer from major drawbacks. Anonymization – the removal of the identifier from the medical data – cannot be reversed and therefore prevents primary use of the records by health care providers, who obviously need to know the corresponding patient (as a minor point, patients cannot benefit from the results gained in clinical studies because they cannot be informed about new findings). Encryption of the medical records, in turn, prevents them from being used for clinical research (secondary use) without the explicit permission of the patient, who has to decrypt the data and, in doing so, reveals her identity. Considering that some medical records tend to be very large (up to hundreds of MB ), encryption could also be a very time-consuming operation . A method that resolves these issues is pseudonymization, where identification data is transformed and then replaced by a specifier that cannot be associated with the identification data without knowing a certain secret [17,12,18]. Pseudonymization allows the data to be associated with a patient only under specified and controlled circumstances. A pseudonymized database must contain at least two tables, one where all the personal data is stored, and one where the pseudonyms and the pseudonymized data are stored. The process of identifying and separating personal from other data is called depersonalization (cf. ). After depersonalization and subsequent pseudonymization, a direct association between individuals and their data can only be established under strictly defined circumstances.”
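The two-table layout described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the PIPE scheme: the keyed hash (HMAC-SHA-256) used to derive the pseudonym, the record fields, and the helpers `depersonalize`, `make_pseudonym`, and `lookup` are all hypothetical choices made for this example.

```python
import hashlib
import hmac
import secrets


def depersonalize(record: dict, identifying_fields: set) -> tuple:
    """Split a record into its personal part and its medical part."""
    personal = {k: v for k, v in record.items() if k in identifying_fields}
    medical = {k: v for k, v in record.items() if k not in identifying_fields}
    return personal, medical


def make_pseudonym(secret: bytes, patient_id: str) -> str:
    """Derive a specifier that cannot be tied to patient_id without the secret.

    Assumption: HMAC-SHA-256 stands in for the unspecified transformation.
    """
    return hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical example data.
secret = secrets.token_bytes(32)
record = {
    "patient_id": "P-1001",
    "name": "Jane Doe",
    "diagnosis": "J45.9",
    "lab": "HbA1c 5.4%",
}

personal, medical = depersonalize(record, {"patient_id", "name"})
pseudonym = make_pseudonym(secret, personal["patient_id"])

# Table 1: personal data only. Table 2: pseudonym plus pseudonymized data.
personal_table = {personal["patient_id"]: personal}
pseudonymized_table = {pseudonym: medical}


def lookup(candidate_secret: bytes, patient_id: str):
    """Restore the link between a patient and her data; needs the secret."""
    return pseudonymized_table.get(make_pseudonym(candidate_secret, patient_id))
```

A researcher who sees only `pseudonymized_table` gets the medical fields without any identifying data (secondary use), while a party holding the secret can recompute the pseudonym and re-associate the record with the patient (primary use).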
2.1. Pseudonymization
“However, existing pseudonymization approaches and systems have a variety of shortcomings: Thielscher et al. (cf. ) developed a system consisting of two databases, one for the patient’s personal identification data and the other for the medical data. While the datasets are stored in a decoupled way, the relationship between the patient and her medical records can be restored with secret keys stored on a smart card. The secret keys generate unique data identification codes, which are also stored in the database and do not contain any patient-identifying information. Authorizations are granted by sharing these codes between the patient and health care providers where these authorizations are valid for only a certain period of time. As a fallback mechanism in case a patient loses her smart card, a centralized patient pseudonym list is maintained, as otherwise there would be no way to recover the identifier. This centralized patient pseudonym list could be the target of intrusion attacks. Thielscher et al. circumvent this security flaw by operating the list offline. This organizational work-around seems to promise a higher level of security until an insider attack is conducted (cf. [21,22,23]) or an attacker gains physical access to the computer that holds the list. The approaches developed by Pommerening (cf. [24,25]) are only applicable for secondary use, pseudonymizing the data for transport, and rely on a combination of hashing and encryption techniques. The different approaches apply to different scenarios: (i) Beginning with medical data from overlapping sources for one-way secondary use, a unique patient identifier is replaced by a pseudonym generated by a one-time encryption operation conducted at a pseudonymization service provided by a trusted third party. Furthermore, the medical data is encrypted with the public key of the secondary user to ensure one-time secondary use. 
(ii) To extend this scenario with the possibility of re-identification, the one-way pseudonymization is replaced by a reversible encryption. An important prerequisite for this scenario is that the unique patient identifier is not available publicly, but generated specifically for the project in question by a second trusted third-party service. (iii) The final model involves multiple secondary users in a research network, where a central research database is introduced containing the medical data and the unique patient identifier, again generated by a trusted third-party service. Each secondary user accesses the medical data using the pseudonymization service. The drawback of the Pommerening approaches is their reliance on trusted third-party services. While these services provide security against external attacks, an insider, perhaps bribed by an attacker, could still abuse them. El Kalam et al.  propose a very similar approach that involves hashing and encryption for the secondary use of pseudonymized data. In this case, a unique patient identifier, which may not be publicly known, is used for deriving an anonymized identifier. Unlike in the Pommerening approaches, this patient identifier is not generated by a trusted third party but within the secured environment of a smart card. The identifier also stays within the smart card and should never be disclosed. Furthermore, a unique project identifier is required as input, so that the anonymized identifier is the result of a one-way hashing procedure with the concatenation of the unique patient identifier stored on the smart card and the unique project identifier as input. That way, the patient has to explicitly consent to each secondary use of her data in each individual project. 
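The project-specific anonymized identifier described for El Kalam et al. can be sketched in a few lines of Python. This is only an illustration under assumptions: the source specifies a one-way hash over the concatenation of the two identifiers but not the hash function or encoding, so SHA-256, the length prefix, the function name `anonymized_identifier`, and the example values are hypothetical.

```python
import hashlib


def anonymized_identifier(patient_identifier: bytes, project_identifier: str) -> str:
    """One-way hash over patient identifier || project identifier.

    Assumption: a two-byte length prefix keeps the concatenation unambiguous;
    the source only states that the two values are concatenated.
    """
    project = project_identifier.encode("utf-8")
    material = len(patient_identifier).to_bytes(2, "big") + patient_identifier + project
    return hashlib.sha256(material).hexdigest()


# Hypothetical value; in the described scheme it never leaves the smart card.
card_identifier = b"secret-id-held-on-smart-card"

study_a = anonymized_identifier(card_identifier, "project-A")
study_b = anonymized_identifier(card_identifier, "project-B")
```

Because the project identifier is part of the hash input, the same patient receives unrelated identifiers in different projects, and the card must take part in every derivation, which is how each secondary use comes to require the patient's consent.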
In order to prevent attacks where an external adversary tries to link data held by two different hospitals by knowing the fingerprint (hash value) of a certain patient/project and gaining unauthorized access to the database of the hospital where the project is conducted, the anonymized identifier is encrypted with a secret key only known to the hospital. The corresponding decryption key is only known to the project. Although this concept provides protection against external attackers, a malicious insider can still link the medical data to the corresponding patient by knowing the fingerprint. Another issue is the smart card as generator and storage provider for the unique patient key. As this key never leaves the smart card, its uniqueness cannot be ensured. Furthermore, lacking a suitable backup mechanism for the unique identifier, the medical data cannot be tracked back to the patient if she loses the smart card. Noumeir et al.  describe the pseudonymization of radiology data encoded as DICOM  files for secondary use. The unique patient identification numbers in the DICOM images are replaced with pseudonyms and the files are stored in a separate research database. Here, the authors distinguish between two kinds of pseudonyms: irreversible one-way pseudonyms (i.e., anonymization) and reversible pseudonyms. One-way pseudonyms are generated by processing the patient identification number with a hashing algorithm. As hashing is prone to collisions, the authors propose to additionally hash the patient’s medical history. In order to prevent dictionary attacks, the technique of salting, i.e., concatenating a random value to the input, can be applied. Alternatively, hash-based message authentication code techniques requiring a secret key may be used. Reversible pseudonyms also involve a secret key for the encryption of the patient identification number. Noumeir et al. 
propose the application of the DES algorithm using 64 bits of input, which is especially suitable for the DICOM patient identification field that is constrained to a maximum of 64 characters. Although the authors state that the secret keys need to be protected, no suggestion is made concerning how to protect the keys. The approach developed by Peterson  involves the use of multiple encryption keys and three data tables to make personal medical data available without compromising the patient’s privacy. During registration, the patient is issued a unique global key (GK) and a server side key (SSID). Furthermore, she has to provide a unique personal encryption key (PEK) and a password. This information is stored in the user table. The security table contains the reference to the user (SSID), a server side encryption key (SSEK) and the reference to the personal data in the personal data table. The personal data is doubly encrypted with the PEK and the SSEK. Data retrieval requires the knowledge of either the PEK or the GK (can be printed on an ID card), which is transferred to the server, which looks up the referenced SSID. With the SSID, the personal data record can be retrieved and decrypted with the corresponding keys. Data addition, deletion, and modification require the knowledge of the password in addition to the PEK or GK. As a fallback mechanism in case of a lost smart card, the patient can log in with the PEK and password and is issued a new GK, which can be printed on a new ID card, invalidating the old GK. This approach comes with some serious drawbacks: As all keys needed for decrypting the medical data are stored in the database, an attacker gaining access to the database could decrypt all information. Even more importantly, as the password and the keys are stored in the database, the attacker could change data stored in the database. 
Furthermore, the PEK poses a security flaw as the user is required to enter new PEKs during registration until a unique one has been found. This mechanism allows a possible attacker to immediately gain access to the medical data related to this key. The architecture for the realization of the German Electronic Health Card (EHC) proposed by the Fraunhofer Institute for Software and System Engineering [30,31] and supported by the German Federal Ministry of Health is designed as a service oriented architecture divided into five layers realizing different service applications, such as emergency data, electronic prescription, or electronic health record retrieval. Access to the medical data stored in a virtual file system is handled with the ticket toolkit concept involving hybrid encryption: The medical data is encrypted with a unique session key, and this key, in turn, is encrypted with the public key of the authorized user. The asymmetric keypair is stored on the health card. The ticket toolkit contains a ticket builder, a ticket verifier, access policy information and an encrypted link to the file (or directory) in the virtual file system. Each file and directory is assigned a default ticket toolkit and a number of private ticket toolkits defined by the patient for authorized users. If no private ticket toolkit is available for the user requesting a certain file, the system returns the default ticket toolkit based on a challenge that the data requestor has to solve in order to gain access rights. As the asymmetric keypair is stored on the health card, which is prone to loss or damage, a second private ticket toolkit is generated for each entry relying on a second asymmetric keypair stored on an emergency card (e.g., a relative’s card). This second keypair can then be used to decrypt the session keys in order to re-key them with the new public key stored on the new health card. 
While the ticket concept ensures confidentiality, fully encrypting the medical data is time-consuming, especially for large medical images, as well as unsuitable for secondary use. Stingl et al.  also rely on the encryption of medical data in order to realize a centralized and secure health data repository. Similar to the EHC concept, the actual medical data is encrypted with a symmetric key and the patient shares this key with her trusted associates, encrypted with the public key of their individual asymmetric keypair representing their data access authorizations as clearance packages. The documents are stored in a hierarchical manner and the data repository is pseudonymized by obscuring the master/detail relationship by encryption. Thus, only those who are in possession of the correct key are able to establish the link between patient and health data. The scheme also specifically distinguishes between data creator, owner, access grantor, and grantee. Like the EHC, this approach suffers from the problem of fully encrypted medical data. In [33,34], Stingl and Slamanig adapt this concept to be used in a pseudonymized e-health portal where the same roles are applied and the medical data is encrypted in a hybrid manner. In addition, the e-health portal concept introduces an identity management system where each user is in possession of multiple sub-identities, one publicly known and several private sub-identities. Each sub-identity is related to an asymmetric keypair stored on a smart card and accessed by individual PINs. The public sub-identity is used for data access authorizations (involving the public sub-identity of the access grantor and grantee, creator, and patient) where the public sub-identity of the receiver is then replaced by a private one. The publicly known sub-identity and the corresponding public key, however, open the architecture to impersonation attacks, e.g., for sending fake medical documents.”
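The one-way pseudonyms discussed for Noumeir et al. in this section (plain hashing, salting, and hash-based message authentication codes) can be compared with a short Python sketch. It is an illustration under assumptions rather than the authors' implementation: SHA-256, the `PID0000`-style identifier format, and all function names are hypothetical, and the salt is assumed to be kept from the attacker.

```python
import hashlib
import hmac
import secrets


def unsalted_pseudonym(patient_id: str) -> str:
    """Plain hash of the identifier; open to dictionary attacks."""
    return hashlib.sha256(patient_id.encode("utf-8")).hexdigest()


def salted_pseudonym(patient_id: str, salt: bytes) -> str:
    """Hash of a random value concatenated with the identifier."""
    return hashlib.sha256(salt + patient_id.encode("utf-8")).hexdigest()


def keyed_pseudonym(patient_id: str, key: bytes) -> str:
    """HMAC-based pseudonym; requires a secret key."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Dictionary attack: the attacker hashes every plausible identifier up front.
candidates = [f"PID{n:04d}" for n in range(10000)]
dictionary = {unsalted_pseudonym(c): c for c in candidates}

target = "PID0042"
salt = secrets.token_bytes(16)
key = secrets.token_bytes(32)

recovered_unsalted = dictionary.get(unsalted_pseudonym(target))
recovered_salted = dictionary.get(salted_pseudonym(target, salt))
recovered_keyed = dictionary.get(keyed_pseudonym(target, key))
```

With a small identifier space the unsalted pseudonym is reversed by a simple table lookup, whereas the salted and keyed variants are not found in the precomputed table. Both defences depend on the salt or key staying secret, which is the key-protection question the section notes is left open.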
2.2. Hardware security module
“The cryptographic operations required in the proposed pseudonymization procedures are best provided by a dedicated and secure hardware-based cryptographic system, i.e., a hardware security module (HSM) . An HSM is an encapsulated and tamper-resistant hardware module that is designed to withstand logical as well as physical attacks [35,36]. The protection measures range from solid metal casings to special switches that zeroize the memory when tampering is detected . HSMs provide standardized interfaces to communicate with their host computers such as PKCS#11  and serve as secure keystore and cryptographic processors, often also providing secure key generation. HSMs are deployed in multiple application areas where security is of utmost importance: Anderson et al.  identify the application of HSMs in automated teller machines for PIN management including PIN acquisition/verification/generation, in electronic payment schemes as an integral part of the back-end systems at banks processing the transactions, or in military applications as encryption and decryption modules for highly sensitive communication or as nuclear command and control tools. Wherry  recognizes the need for an HSM in public key infrastructures (PKIs) to protect the most important entities in PKIs, the cryptographic keys. Lorch et al.  utilize an HSM as a secured online credential repository in a grid PKI. Rössler et al.  apply the HSM to e-voting as an electronic ballot box. In this scheme, the voter receives a ballot and a voting token containing her voting ID after authentication with a signature card (smart card). The ballot is then encrypted and transferred to the election authority where the votes are decrypted only in the HSM during the counting procedure. The private decryption key is only available within the HSM so that the vote cannot be decrypted insecurely outside the HSM. 
Baldwin and Shiu extend the concept of HSMs to not only provide a secure keystore and secure cryptographic operations, but to provide complete security services, resulting in hardware security appliances (HSAs) as demonstrated in  to secure Web services. Another example of securing Web services is given by Mont et al. in , while Baldwin and Shiu apply HSAs to enhancing trust relationships in an outsourcing scenario of a merchant’s online presence . In , Ferreira et al. propose the application of an HSA in an accountability model suitable for health care environments where the HSA encapsulates the accountability service responsible for recording events and providing access for retrieving the recorded events only for authorized roles.”
 S. Märkle, K. Köchy, R. Tschirley, H.U. Lemke, The PREPaRe system—patient oriented access to the personal electronic medical record, in: Proceedings of the 17th International Congress and Exhibition on Computer Assisted Radiology and Surgery, ser. International Congress Series, no. 1256, 2001, pp. 849–854.
 F.R. Ernst, A.J. Grizzle, Drug-related morbidity and mortality: updating the cost-of-illness model, Journal of the American Pharmacists Association 41 (2) (2001) 192–199.
 United States Department of Health & Human Services, HIPAA administrative simplification: enforcement; final rule, Federal Register/Rules and Regulations 71 (32) (2006).
 European Union, Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, Official Journal of the European Communities L 281 (1995) 31–50.
 Council of Europe, European Convention on Human Rights, Martinus Nijhoff Publishers, 1987.
 S. Fischer-Hübner, IT-Security and Privacy: Design and Use of Privacy-Enhancing Security Mechanisms, Springer, 2001.
 S. Hinde, Privacy legislation: a comparison of the US and European approaches, Computers and Security 22 (5) (2003) 378–387.
 G. Hornung, C.F.-J. Goetz, A.J.W. Goldschmidt, Die künftige Telematik-Rahmenarchitektur im Gesundheitswesen, Wirtschaftsinformatik 47 (2005) 171–179.
 U.S. Department of Health & Human Services Office for Civil Rights, “Summary of the HIPAA Privacy Rule”, 2003. [Online]. Available: http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/.
 U.S. Congress, “Health Insurance Portability and Accountability Act of 1996”, 104th Congress, 1996. [Online]. Available: http://www.cms.hhs.gov/HIPAAGenInfo/Downloads/HIPAALaw.pdf.
 T. Schabetsberger, E. Ammenwerth, G. Göbel, G. Lechleitner, R. Penz, R. Vogl, F. Wozak, What are functional requirements of future shared electronic health records? Connecting Medical Informatics and Bio-Informatics (2005) 1070–1075.
 B. Riedl, T. Neubauer, G. Goluch, O. Boehm, G. Reinauer, A. Krumboeck, A secure architecture for the pseudonymization of medical data, in: Proceedings of the Second International Conference on Availability Reliability and Security, 2007, pp. 318–324.
 R.C. Barrows, P.D. Clayton, Privacy, confidentiality, and electronic medical records, Journal of the American Medical Informatics Association 13 (1996) 139–148.
 J. Attridge, An Overview of Hardware Security Modules, SANS Institute, Tech. Rep., 2002.
 J. Montagnat, F. Bellet, H. Benoit-Cattin, V. Breton, L. Brunie, H. Duque, Y. Legré, I.E. Magnin, L. Maigne, S. Miguet, J.M. Pierson, L. Seitz, T. Tweed, Medical images simulation, storage, and processing on the European DataGrid Testbed, Journal of Grid Computing 2 (4) (2004) 387–400.
 R. Sharp, Principles of Protocol Design, Springer, 2008.
 A. Pfitzmann, M. Koehntopp, Anonymity, unobservability and pseudonymity: a proposal for terminology, in: International Workshop on Designing Privacy Enhancing Technologies, Springer-Verlag, Inc., New York, 2001, pp. 1–9.
 K.A. Taipale, Technology, security and privacy: the fear of Frankenstein, the mythology of privacy and the lessons of King Ludd, International Journal of Communications Law & Policy 9 (2004).
 A. Rector, J. Rogers, A. Taweel, D. Ingram, D. Kalra, J. Milan, P. Singleton, R. Gaizauskas, M. Hepple, D. Scott, R. Power, Clef: joining up healthcare with clinical and post-genomic research, in: Proceedings of UK e-Science All Hands Meeting, 2003, pp. 203–211.
 C. Thielscher, M. Gottfried, S. Umbreit, F. Boegner, J. Haack, N. Schroeders, Patent: data processing system for patient data, Int. Patent, WO 03/034294 A2, 2005.
 K. Maris, The human factor in information technology, in: Proceedings of Hack.lu, Luxembourg, 2005.
 T. Thornburgh, Social engineering: The “Dark Art”, in: Proceedings of the First Annual ACM Conference on Information Security Curriculum Development, ACM Press, 2004, pp. 133–135.
 M. Bishop, D. Gollmann, J. Hunker, C.W. Probst, Countering insider threats, in: Dagstuhl Seminar Proceedings 08302, 2008.
 K. Pommerening, Medical requirements for data protection, in: Proceedings of IFIP Congress, vol. 2, 1994, pp. 533–540.
 K. Pommerening, M. Reng, Secondary use of the electronic health record via pseudonymisation, in: Medical and Care Compunetics, vol. 1, IOS Press, 2004, pp. 441–446.
 A.A.E. Kalam, Y. Deswarte, G. Trouessin, E. Cordonnier, A generic approach for healthcare data anonymization, 2004.
 R. Noumeir, A. Lemay, J. Lina, Pseudonymization of radiology data for research purposes, 2007.
 Digital Imaging Communications in Medicine, National Electrical Manufacturers Association Std., 2008.
 R.L. Peterson, Patent: encryption system for allowing immediate universal access to medical records while maintaining complete patient control over privacy, US Patent US 2003/0074564 A1, 2003.
 J. Caumanns, Der Patient bleibt Herr seiner Daten: Realisierung des eGK-Berechtigungskonzepts über ein ticketbasiertes, virtuelles Dateisystem, Informatik-Spektrum 29 (5) (2006) 323–331.
 Fraunhofer Institut, Spezifikation der Lösungsarchitektur zur Umsetzung der Anwendungen der elektronischen Gesundheitskarte, March 2005.
 C.D. Stingl, D. Slamanig, D. Rauner-Reithmayer, H. Fischer, Realisierung eines sicheren zentralen Datenrepositories, in: Tagungsband, DACH Security, 2006, pp. 1–15.
 C. Stingl, D. Slamanig, Berechtigungskonzept für ein e-health-portal, in: G. Schreier, D. Hayn, E. Ammenwerth (Eds.), eHealth 2007—Medical Informatics Meets eHealth, no. 227, Österreichische Computer Gesellschaft, 2007, pp. 135–140.
 C. Stingl, D. Slamanig, Privacy aspects of e-health, in: Proceedings of the Third International Conference on Availability, Reliability and Security, 2008, pp. 1226–1233.
 M.K. Bond, Understanding Security APIs, Ph.D. dissertation, University of Cambridge, Computer Laboratory, Emmanuel College, 2004.
 Federal Information Processing Standards Publication, “Security requirements for cryptographic modules (FIPS PUB 140-2),” National Institute of Standards and Technology (NIST), Tech. Rep., 05 2001.
 R. Anderson, M. Bond, J. Clulow, S. Skorobogatov, Cryptographic Processors—A Survey, University of Cambridge, Computer Laboratory Tech. Rep., 2005.
 PKCS#11 v2.20: Cryptographic Token Interface Standard, RSA Laboratories Std., 2004.
 D.C. Wherry, Secure Your Public Key Infrastructure with Hardware Security Modules, SANS Institute, Tech. Rep., 2003.
 M. Lorch, J. Basney, D. Kafura, A hardware-secured credential repository for grid PKIs, in: Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2004.
 T. Rössler, H. Leitold, R. Posch, E-voting: a scalable approach using XML and hardware security modules, in: Proceedings of the 2005 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE’05), 2005.
 A. Baldwin, S. Shiu, Hardware encapsulation of security services, in: 8th European Symposium on Research in Computer Security (ESORICS), ser. LNCS, vol. 2808, Springer, Berlin/Heidelberg, 2003.
 M. Casassa-Mont, A. Baldwin, J. Pato, Secure Hardware-based Distributed Authorisation Underpinning a Web Service Framework, Trusted Systems Laboratory, HP Laboratories Bristol, Tech. Rep., 2003.
 A. Baldwin, S. Shiu, Hardware security appliances for trust, in: First International Conference on Trust Management (iTrust 2003), ser. LNCS, vol. 2692, Springer, 2003.
 A. Ferreira, S. Shiu, A. Baldwin, Towards accountability for electronic patient records, in: Proceedings of the 16th IEEE Symposium on Computer-Based Medical Systems (CBMS’03), 2003, pp. 189–194.
 A. Shamir, How to share a secret, Communications of the ACM 22 (11) (1979) 612–613.
 T. Jurgensen, S. Guthery, Smart Cards: The Developer’s Toolkit, Pearson Education, Inc. Prentice Hall PTR, Upper Saddle River, 2002.
 B. Holcombe, Government Smart Card Handbook, U.S. General Services Administration (GSA), 2004.
 International Statistical Classification of Diseases and Related Health Problems (ICD), World Health Organization Std., 2007.
 R.H. Dolin, L. Alschuler, C. Beebe, The HL7 clinical document architecture, Journal of the American Medical Informatics Association 8 (6) (2001) 552–569.
 A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr, Basic concepts and taxonomy of dependable and secure computing, IEEE Transactions on Dependable and Secure Computing 1 (1) (2004) 11–33.
Neubauer, Thomas and Heurix, Johannes: A methodology for the pseudonymization of medical data, International Journal of Medical Informatics, Volume 80, Issue 3, 190–204