Sunday, November 17, 2013

De-identification and Informed Consent in Clinical Trials

Monday, October 7, 2013

The future of CDISC CT:s

A poll posted by Lex Jansen (@lexjansen) in the LinkedIn group for CDISC (Clinical Data Interchange Standards Consortium) prompted me to write down some thoughts on the future of CDISC's so-called Controlled Terminologies (CT:s):

When you import CDISC Controlled Terminology from NCI EVS at, which format do you use?
  (Excel, Text, ODM XML, or OWL/RDF)

My vote goes to the formats with the best potential for the future, that is, the formats that serialize RDF-modeled data, e.g. Turtle, JSON-LD, N-Triples, and RDF/XML. (See the blog post: Understanding RDF serialisation formats.)
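
To make the point about serialisations concrete, here is a minimal sketch in plain Python (no RDF library) that writes one and the same statement first as N-Triples and then as Turtle. The example.org URIs and the code C00000 are made-up placeholders, not real CDISC or NCI identifiers, and literal escaping is deliberately left out.

```python
# One RDF statement, two serialisations. All term URIs below are hypothetical.
SKOS = "http://www.w3.org/2004/02/skos/core#"
CT = "http://example.org/ct/"   # placeholder namespace
TERM = CT + "C00000"            # placeholder, not a real NCI code

triple = (TERM, SKOS + "prefLabel", "Example Term")


def to_ntriples(s, p, o):
    """N-Triples: full URIs in angle brackets, one statement per line."""
    # Assumes the literal needs no escaping (no quotes or backslashes).
    return f'<{s}> <{p}> "{o}" .'


def to_turtle(s, p, o, prefixes):
    """Turtle: the same statement, shortened with prefix declarations."""
    def shorten(uri):
        for name, base in prefixes.items():
            if uri.startswith(base):
                return f"{name}:{uri[len(base):]}"
        return f"<{uri}>"

    header = "\n".join(f"@prefix {n}: <{b}> ." for n, b in prefixes.items())
    return f'{header}\n\n{shorten(s)} {shorten(p)} "{o}" .'


nt = to_ntriples(*triple)
ttl = to_turtle(*triple, {"skos": SKOS, "ct": CT})
print(nt)
print(ttl)
```

The data is identical in both outputs; only the syntax differs, which is why the choice between RDF serialisations is mostly a matter of tooling and readability.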

Today's RDF version

The recently published OWL/RDF version of the CT:s (serialized in XML) uses the first version of the CDISC2RDF schema 1), implementing the model behind the existing export of a limited part of the content in NCI Thesaurus (NCIt).

It is modeled to support today's use of the CT:s only as text strings to populate variables in CDISC-defined data sets (e.g. SDTM domains) with submission values. That is, it provides study-specific clarity, making it easy for humans to read the clinical data and metadata.

Next RDF version

Based on very useful discussions with the terminology expert Julie James (LinkedIn profile) working for HL7, IMI EHR4CR and FDA/PhuSE Metadata definition project, these are my thoughts for the next RDF version:

To provide cross-study semantic interoperability, making it easy for machines to directly integrate and query clinical data and metadata across health care and clinical research, we need an enhanced model.

That is, a model that fully leverages the content in NCIt and addresses the issues people have experienced when attempting to implement the CT:s in BRIDG / ISO 21090, using the insights from the IMI EHR4CR project and from the development of the IHE DEX profile (Data Element Exchange).

I think there is also an opportunity to leverage the work on binding value sets to data elements that is part of the HL7 FHIR (Fast Healthcare Interoperability Resources) development 2). Julie also pointed me to a new ISO standard: ISO/CD 17583 3). The next version should also apply both the OID (Object Identifier) standard and the URI (Uniform Resource Identifier) standard to identify each value set and each value.

1) CDISC2RDF poster (presented at DILS 2013, Data Integration in the Life Sciences conference) and FDA/PhUSE Semantic Technology project
3) ISO/CD 17583: Health informatics -- Terminology constraints for coded data elements expressed in (ISO 21090) Harmonized Data Types used in healthcare information interchange.

Friday, September 13, 2013

Justifications of Mappings

A common theme in the Semantic Trilogy events in Montreal this summer (see Semantic Trilogy preparations and Semantic Trilogy report part 1) was mappings, such as the mappings provided via the NCBO BioPortal.

For example, the mappings in the BioPortal expressed as skos:closeMatch are the result of using the LOOM lexical algorithm. Examples of not so good mappings, such as this one, were highlighted:

<NCI Thesaurus: Chairperson (subclass to Person)> 
<Int. Classification for Patient Safety: Chair (subclass to Piece of Furniture)>

One view was: ‘Don’t use them!’ (tweet). Another view was “Give us the justification of the mappings so we can decide when it makes sense to use them.”
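
Why such mappings occur is easy to see in a toy example. The sketch below is not the LOOM algorithm; it is a simplified stand-in that matches two concepts whenever they share any normalised label or synonym. The synonym lists are invented for illustration and are not copied from the source terminologies.

```python
def normalise(label):
    """Lower-case a label and drop everything that is not a letter or digit."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


# Illustrative concepts; the synonyms are assumptions made for this sketch.
nci = {"label": "Chairperson", "synonyms": ["Chair"], "parent": "Person"}
icps = {"label": "Chair", "synonyms": [], "parent": "Piece of Furniture"}


def lexical_match(a, b):
    """Return the normalised names two concepts share (empty set = no match)."""
    names_a = {normalise(n) for n in [a["label"], *a["synonyms"]]}
    names_b = {normalise(n) for n in [b["label"], *b["synonyms"]]}
    return names_a & names_b


shared = lexical_match(nci, icps)

# Recording *why* the match was made lets a consumer decide whether to trust it.
mapping = {
    "source": nci["label"],
    "target": icps["label"],
    "justification": f"shared label(s): {sorted(shared)}",
    "parents_agree": nci["parent"] == icps["parent"],
}
print(mapping)
```

A purely lexical match is produced even though the parent classes disagree, which is exactly the kind of information a stated justification would expose.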

Mappings in chemical informatics

When I came back from the Semantic Trilogy and read about mappings, or linksets as they are called, in the new version of the Open PHACTS specification "Dataset Descriptions for the Open Pharmacological Space" I saw some opportunities to make mappings more explicit and hence more useful.

I think the editor, Alasdair Gray (@gray_alasdair), and the whole team of authors, have done a great job on this specification.
"The Dataset Descriptions for the Open Pharmacological Space is a specification for the metadata to described datasets, and the linksets that relate them, to enable their use within the Open PHACTS discovery platform. The specification defines the metadata properties that are expected to describe datasets and linksets; detailing the creation and publication of the dataset."
I especially liked the part on making the justification of mappings explicit. For example, what is the justification behind stating that there is a close match (skos:closeMatch), or exact match (skos:exactMatch), between what is described in two different chemical datasets, such as the RDF datasets sourced from ChemSpider and ChEMBL?

The figure depicts four distinct linksets: two sourced from ChemSpider
depicted in blue which use different link predicates; one sourced from ChEMBL
depicted in red; and one sourced from a third party depicted in green.
My understanding is that for the chemical informatics community the Open PHACTS specification will establish a vocabulary to express the justifications for links/mappings between chemical entities. This enables them to explicitly state justifications such as "Has isotopically unspecified parent" or "Have the same InChI key" (see B.2 Link Justification Vocabulary Terms to also get the URIs for these terms).

Mappings between medical terminologies

Together with members of the EU projects EHR4CR and SALUS, MedDRA MSSO, and W3C HCLS, I am now exploring the idea of establishing a similar approach for the medical terminology community. That is, a vocabulary of terms to express the justifications for different mappings between concepts/terms in terminologies across healthcare and clinical research, such as ICD9, SNOMED CT and MedDRA.
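
As a rough sketch of what explicit justifications could enable, the Python below attaches a justification to each mapping and lets a consumer keep only the mappings whose justification it trusts. The justification namespace, the term names in it, and the concept identifiers are all hypothetical; no such vocabulary has been agreed yet.

```python
from dataclasses import dataclass

SKOS_EXACT = "http://www.w3.org/2004/02/skos/core#exactMatch"
SKOS_CLOSE = "http://www.w3.org/2004/02/skos/core#closeMatch"
J = "http://example.org/justification#"  # hypothetical vocabulary namespace


@dataclass(frozen=True)
class Mapping:
    source: str         # concept in the source terminology (placeholder id)
    predicate: str      # SKOS mapping property
    target: str         # concept in the target terminology (placeholder id)
    justification: str  # URI of the term stating why the mapping holds


mappings = [
    Mapping("meddra:A", SKOS_EXACT, "snomed:X", J + "ReviewedByTerminologist"),
    Mapping("meddra:B", SKOS_CLOSE, "snomed:Y", J + "LexicalLabelMatch"),
    Mapping("icd9:C", SKOS_CLOSE, "snomed:Z", J + "LexicalLabelMatch"),
]


def accepted(all_mappings, trusted_justifications):
    """Keep only mappings whose justification the consumer trusts."""
    return [m for m in all_mappings if m.justification in trusted_justifications]


usable = accepted(mappings, {J + "ReviewedByTerminologist"})
for m in usable:
    print(m.source, "->", m.target, "because", m.justification)
```

The point of the design is that the decision to use or ignore a mapping moves from the publisher to the consumer, who can apply different trust rules for different purposes.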

This is part of a broader discussion on the use of terminologies in semantic web focused environments, with formal representations in RDF of both the terminologies themselves and of the mappings between them. Here's an example of a visualization from such a formal representation of MedDRA and SNOMED-CT terms and mappings between them in SKOS/RDF.

The example shows the hierarchy of cardiac disorders in both the MedDRA and
SNOMED-CT concept schemes, expressed using the skos:broader property. Mappings between
similar concepts in both concept schemes are stated using the skos:exactMatch property.
SALUS Harmonized Ontology for Post Market Safety Studies
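
One thing such a representation makes possible is a simple fallback: when a term has no direct mapping, walk up skos:broader until an ancestor with a skos:exactMatch is found. The sketch below assumes a tiny, made-up hierarchy; the identifiers are illustrative and are not real MedDRA or SNOMED-CT codes.

```python
# skos:broader links within one concept scheme (child -> parent).
# All identifiers are invented for illustration.
broader = {
    "meddra:Atrial_fibrillation": "meddra:Cardiac_arrhythmias",
    "meddra:Cardiac_arrhythmias": "meddra:Cardiac_disorders",
}

# skos:exactMatch links from the first scheme to the second.
exact_match = {
    "meddra:Cardiac_arrhythmias": "snomed:Cardiac_arrhythmia",
    "meddra:Cardiac_disorders": "snomed:Heart_disease",
}


def nearest_mapped(concept):
    """Return (target, steps_up) for the closest concept that has an exact match.

    steps_up is 0 when the concept itself is mapped. Returns None when neither
    the concept nor any of its ancestors has a mapping.
    """
    steps = 0
    while concept is not None:
        if concept in exact_match:
            return exact_match[concept], steps
        concept = broader.get(concept)  # None once the top of the hierarchy is passed
        steps += 1
    return None


print(nearest_mapped("meddra:Atrial_fibrillation"))
print(nearest_mapped("meddra:Unknown_term"))
```

The number of steps is returned alongside the target because a match found on an ancestor is less specific than a direct one, and a consumer may want to treat it differently.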

Monday, July 29, 2013

Semantic Trilogy report part 1

It's been two very nice summer weeks of vacation after I got home from a week at the Semantic Trilogy events in Montreal, Qc, Canada. (See my previous blog post: Semantic Trilogy preparations.) Here's the first part of my report from seven intensive days of conferences, tutorials, workshops and great discussions with researchers in biomedical ontologies and data integration in life sciences.

It was very nice to meet colleagues from other pharma companies: Sanofi, UCB and Novo Nordisk, and to discuss with early adopters in traditional software vendors, such as Siemens, and with experts from niche vendors, such as IO Informatics. It was also nice to discuss common topics, such as the use of semantic web standards and linked data principles on for example, with key individuals such as Olivier Bodenreider, NLM (National Library of Medicine).

During the two main conferences I used Twitter as my notebook, and in the evenings I gathered tweets and related links in two Storify items:
  • ICBO2013
    Storify: 4th International Conference on Biomedical Ontology (ICBO), 7-9 July
  • DILS2013
    Storify: 9th Conference on Data Integration in the Life Sciences (DILS), 11-12 July 
My poster
The last evening I presented a CDISC2RDF poster on our joint AstraZeneca and Roche CDISC2RDF project, now part of the FDA/PhUSE Semantic Technology working group. I really enjoyed the discussions it triggered.

I'll be back in mid-August, after a couple of days of trekking in the Swedish mountains, with more details about the papers, presentations and discussions I found most interesting. (For a first glimpse of two of them see this blog post from HL7 Watch by Barry Smith: An OGMS-Based Model for Clinical Information.)

Monday, June 24, 2013

Semantic Trilogy preparation

The Swedish Midsummer weekend is over and it's time to look forward. Saturday 6th to Friday 12th of July I'll attend the Semantic Trilogy in Montreal, Qc, Canada.

I plan to attend these events during the week:
In 2011 I, together with three colleagues, attended the ICBO 2011 event (see my three blog posts: Preparations part 1 and part 2, report). So, I look forward to reconnecting with people in the OBO (The Open Biological and Biomedical Ontologies) community.

I also look forward to meeting, face to face, interesting people in the W3C HCLS (Semantic Web Health Care and Life Sciences Interest Group), and people interested in ontologies and semantic web working for e.g. Sanofi, Novo Nordisk and Mayo Clinic.

I'm also very happy that I'll get the opportunity to attend my third semantic web related event in Canada.
  • In 2007 I attended the WWW2007 conference in wonderful Banff.
"During the WWW2007 conference a breakthrough of the Linked Data idea happened in a session where web experts demonstrated the power of a new generation of the web, a web of data. For us attending the session it was hard to imagine the full potential on what this idea would mean for individual scientists and for a pharmaceutical company." 
From  Linked Data, an opportunity to mitigate complexity in pharmaceutical research and development, Bo Andersson and Kerstin Forsberg, LWDM 2011 

And yes, I do hope to also get some time during the weekend to visit the Jazz Festival.

Tuesday, June 11, 2013

Standards for common aspects

Over the last three years I have been engaged with different groups working on standards, both for data exchange, such as CDISC, and for vocabularies, such as MedDRA MSSO and NCI EVS. They are now starting to see the value of using "standards for standards".

Push Back
From Flickr bitpuddle

Standards for standards

So, "I push back" to standard organisations to use semantic web standards and linked data principles to make their standards directly usable for humans and for machines.

A good example is CDISC and their growing interest in using semantic web standards (based on RDF, Resource Description Framework): CDISC2RDF. For some background see Clinical studies and the road to Linked Data. Today FDA, CDISC, pharma:s, CRO:s and software vendors are working together on this in an FDA working group for Semantic Technology organised by PhUSE.

Standards for common aspects

The last year or so, I have also tried to keep up to date with groups developing RDF-based standards for common aspects such as:
  • data descriptions (VoID)
  • data provenance and versioning (PROV and PAV)
  • concept based vocabularies and value sets (SKOS)
  • multi-dimensional statistical data (RDF Data Cube)
I try to ensure that we have a good view of the maturity and applicability of these standards so we can use them in our internal “integration factory”. But most of all I want to “push back” to vendors. I foresee that, in the same way we started to add requirements on web interfaces for better end user usability back in the late 90s, we should now start to add requirements on web interfaces for better machine usability. So we need to understand how to incorporate these common aspects in our URS:s, RFI:s, RFP:s etc.

That is, we should ask software vendors to use RDF-based standards for common aspects, for example:
  • Medidata's Rave and Perceptive's IMPACT to describe datasets using VoID.
  • Accelrys' Pipeline Pilot to use W3C PROV.
  • Microsoft's SharePoint to use term sets for tagging in SKOS.
  • SAS Institute's Drug Development to create analysis results using RDF Data Cube.

So, this interview with Reza B'Far, Vice President of Development, Oracle, on the W3C blog made me very glad: Oracle on Data on the Web
Oracle to use W3C provenance standard to create a single audit time line across systems
"One of the hugest problems we faced was maintaining transaction audit trails in a heterogeneous environment in a standard and compatible way. Audit trails are described with literally millions of different formats in different organizations. This used to mean it was impossible to create a single audit time line. PROV solves this problem. We now provide (and consume) a PROV feed that unifies the audit trails generated by transactions across heterogeneous systems."
See also the Implementation report with 60+ examples of usage of the W3C Provenance specifications.
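
To illustrate the idea of a single audit time line, here is a small sketch that maps audit records from two systems with different layouts onto a common set of PROV terms (prov:Activity, prov:startedAtTime, prov:wasAssociatedWith) and sorts them by time. This is not how Oracle implements it; the two record layouts, the user names and the actions are invented for the example.

```python
from datetime import datetime

# Audit records from two hypothetical systems, each with its own layout.
system_a = [{"ts": "2013-06-01T10:00:00", "user": "alice", "action": "create record"}]
system_b = [("bob", "approve record", "01/06/2013 09:30")]


def from_a(rec):
    """Map a system A record (dict with ISO timestamp) to PROV-style terms."""
    return {
        "rdf:type": "prov:Activity",
        "rdfs:label": rec["action"],
        "prov:startedAtTime": datetime.fromisoformat(rec["ts"]),
        "prov:wasAssociatedWith": rec["user"],
    }


def from_b(rec):
    """Map a system B record (tuple with day/month/year timestamp) to the same terms."""
    user, action, ts = rec
    return {
        "rdf:type": "prov:Activity",
        "rdfs:label": action,
        "prov:startedAtTime": datetime.strptime(ts, "%d/%m/%Y %H:%M"),
        "prov:wasAssociatedWith": user,
    }


# Once both sources share one vocabulary, a single time line is just a sort.
timeline = sorted(
    [from_a(r) for r in system_a] + [from_b(r) for r in system_b],
    key=lambda activity: activity["prov:startedAtTime"],
)
for activity in timeline:
    print(
        activity["prov:startedAtTime"].isoformat(),
        activity["prov:wasAssociatedWith"],
        activity["rdfs:label"],
    )
```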

For a nice intro to the W3C Provenance Specifications, see the tutorial by Paul Groth (@pgroth) at the Extended (European) Semantic Web conference.

Saturday, May 25, 2013

Three Linked Data meetings in Sweden

I'm back after two nice days in the south of Sweden. Yesterday, 24 May, I attended the first meetup for Linked Data in Malmö.

This was the third Linked Data meeting in Sweden. They have all been great events with more than 30 attendees each. I do hope these will encourage more friends and colleagues in Sweden across academia, industry, consulting companies and government to start applying the Linked Data principles and use the stack of Semantic Web standards.

Links to all three events:
Kudos to Bosse Andersson (@bBalsa), Marie Gustavsson-Friberg (@mariegus)
and Eva Blomqvist (@evabl444) for arranging. I look forward to the next one!

Sunday, March 31, 2013

Talking to machines

Last week, while commuting, I remotely followed two events related to Evidence Based Medicine (EBM), both of which took place in Oxford:

Ben Goldacre spoke at both events. At the Cochrane event he talked about getting better at talking to the Public, to Policy makers and to Machines. In the last part of his talk, Talking to Machines, he says "That it's odd how we share results of RCTs (Randomised, Controlled Trials) in C19th essay format!" This is also how the Cochrane Collaboration shares reviews and meta-analyses of clinical trial data.

Structured data in RDF

Instead we should use "C21th structured data standards". I was especially pleased to hear how he was even more explicit: "Publish in RDF a good, quality standard, nice data format" [at 36.50 mins]

See also what the web development director at Cochrane, Chris Mavergames, says in his excellent presentation on how linked data can help free content from the 'container of the article'.

This is related to the work we do on linked clinical data standards, see my recent blog post: CDISC2RDF. That is, semantic web versions of data standards for clinical data on subject/participant level.

Clinical Data Transparency 

Given the recent move towards clinical data transparency (see a good summary in Nature this week: Drug-company data vaults to be opened) I foresee a discussion also on data standards for the summary level data in clinical study reports and peer-reviewed papers using semantic web standards.

An alternative could be to represent tables in the reports and papers as RDF using the RDF Data Cube Vocabulary (for multi-dimensional statistical data); see the CSVImport and the CubeViz projects (Representing and browsing multi-dimensional statistical data as RDF using the RDF Data Cube Vocabulary, previously called Stats2RDF). This EU/FP7 project has used this vocabulary to publish biomedical statistical data, e.g. the WHO's Global Health Observatory dataset (see Publishing and Interlinking the Global Health Observatory Dataset).

A challenge is to express the clinical trial design and other contextual information as structured data to make it easier to make informed decisions for trial reviews and cross trial analyses.
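
As a rough sketch of what representing a results table with the RDF Data Cube Vocabulary involves, the Python below turns each table row into one qb:Observation, with one triple per column. The qb: terms (qb:Observation, qb:dataSet) are from the vocabulary; the example.org namespace, the column names and the numbers are invented for illustration and do not come from any real study.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/study/"  # hypothetical namespace

# A made-up summary table: two dimensions (arm, visit) and one measure.
table = [
    {"arm": "Placebo", "visit": "Week 12", "mean_change": -1.2},
    {"arm": "Active", "visit": "Week 12", "mean_change": -3.4},
]


def row_to_observation(index, row, dataset=EX + "dataset/table1"):
    """Turn one table row into the triples of a single qb:Observation."""
    obs = f"{EX}obs/{index}"
    triples = [
        (obs, RDF_TYPE, QB + "Observation"),
        (obs, QB + "dataSet", dataset),
    ]
    # Each column becomes a property of the observation (dimension or measure).
    for column, value in row.items():
        triples.append((obs, EX + "prop/" + column, value))
    return triples


triples = [
    t
    for i, row in enumerate(table, start=1)
    for t in row_to_observation(i, row)
]
for t in triples:
    print(t)
```

A complete cube would also declare a data structure definition stating which properties are dimensions and which are measures; that part is omitted here to keep the sketch short.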

Tuesday, February 12, 2013


In a recent article from (The Voice of Semantic Web Technology and Linked Data Business) the project CDISC2RDF is nicely described: Clinical Studies And The Road To Linked Data.

The project will be presented at the Conference on Semantics in Health Care & Life Sciences (CSHALS) meeting at the end of February by Charlie Mead, co-chair of the W3C’s Health Care and Life Sciences Interest Group (HCLSIG).

Here is a slide deck describing the first deliverable of the project. A refined slide deck will be presented at the CSHALS meeting together with a couple of CDISC2RDF blog posts to describe the transformation process.