Rick McCoy recently commented that data in a medical blockchain, or indeed in any medical big data system, is only useful if it is accurate. He went on to ask whether the patient forms were accurate. That’s a really good point, and I agreed that I should write an article discussing the issues around it.
The problem is exemplified by the expression “Garbage In, Garbage Out”, which George Fuechsel used as a teaching mantra while working on the IBM 305 RAMAC, and which he in turn may have picked up from US military usage. It later evolved into the acronym GIGO, and it applies to blockchain and big data today just as much as it did to computing then.
The concept is, of course, much older. It appears in Charles Babbage’s account of presenting the design for his “difference engine” to England’s Parliament. Recounting an early encounter with some members of that body, he wrote: “On two occasions I have been asked,— ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.” The honourable members of that august body had actually anticipated one of the important factors in data science: data quality.
There are a number of ways we handle this in MediChain; here are the first four, which give a flavor of the approach.
(1) Diagnostic and prescription data is definitive at a particular level: it is the actual raw data and the actual treatment prescribed. The choice of treatment may, of course, be open to discussion, but the database records exactly what was prescribed. The only inaccuracies that can arise in these cases are the wrong drug being handed out at the dispensary or the wrong identifiers being attached to diagnostic records. MediChain provides the means to cross-check these in the future, adding another level of veracity.
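To make the idea of cross-checking concrete, here is a minimal sketch, not MediChain’s actual implementation, of comparing dispensing events against the prescriptions they claim to fill. The record types `Prescription` and `Dispensing` and their fields are hypothetical; a real system would use whatever identifiers its EMR and pharmacy systems actually share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prescription:
    prescription_id: str
    patient_id: str
    drug_code: str  # hypothetical drug identifier


@dataclass(frozen=True)
class Dispensing:
    prescription_id: str  # the prescription this dispensing event claims to fill
    patient_id: str
    drug_code: str


def cross_check(prescriptions, dispensings):
    """Return (prescription_id, reason) pairs for dispensing events that
    do not agree with the prescription they reference."""
    by_id = {p.prescription_id: p for p in prescriptions}
    issues = []
    for d in dispensings:
        p = by_id.get(d.prescription_id)
        if p is None:
            issues.append((d.prescription_id, "no matching prescription"))
        elif d.patient_id != p.patient_id:
            issues.append((d.prescription_id, "patient identifier mismatch"))
        elif d.drug_code != p.drug_code:
            issues.append((d.prescription_id, "dispensed drug differs from prescribed drug"))
    return issues


rx = [Prescription("rx1", "p1", "DRUG-A"), Prescription("rx2", "p2", "DRUG-B")]
dx = [
    Dispensing("rx1", "p1", "DRUG-A"),  # matches its prescription
    Dispensing("rx2", "p2", "DRUG-C"),  # wrong drug handed out
    Dispensing("rx3", "p3", "DRUG-D"),  # no prescription on record
]
print(cross_check(rx, dx))
# [('rx2', 'dispensed drug differs from prescribed drug'), ('rx3', 'no matching prescription')]
```

Because both sides of the check are recorded independently, a mismatch points at the specific step (dispensary or record labelling) where the error entered.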
(2) The clinician’s treatment notes should be kept to a professional level of accuracy. Errors will occur, but if notes are entered into the EMR at the time of consultation, these should be minimal. Some offices use transcription, which adds a possible further source of errors; these are important to spot whether or not a blockchain is involved. If there are consistent errors, we should be able to raise an alert so that the standard of record-keeping in the physician’s records, and therefore the standard of care, can be improved.
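One simple way to detect “consistent errors” is to track the error rate per source of notes (a clinician, an office, or a transcription service) and raise an alert when it stays above a threshold. The sketch below assumes that some audit process already marks each reviewed note as erroneous or not; `error_rate_alerts`, the 5% `threshold` and the `min_reviews` cut-off are hypothetical choices for illustration only.

```python
from collections import Counter


def error_rate_alerts(reviews, threshold=0.05, min_reviews=20):
    """reviews: iterable of (source_id, had_error) pairs from audited notes.
    Returns {source_id: error_rate} for sources with at least min_reviews
    audited notes whose error rate exceeds threshold."""
    totals = Counter()
    errors = Counter()
    for source_id, had_error in reviews:
        totals[source_id] += 1
        if had_error:
            errors[source_id] += 1
    return {
        src: errors[src] / n
        for src, n in totals.items()
        if n >= min_reviews and errors[src] / n > threshold
    }


reviews = (
    [("clinic-A", False)] * 95 + [("clinic-A", True)] * 5             # 5% errors
    + [("transcriber-B", False)] * 40 + [("transcriber-B", True)] * 10  # 20% errors
)
print(error_rate_alerts(reviews))
# {'transcriber-B': 0.2}
```

The `min_reviews` floor avoids flagging a source on the strength of one or two bad notes, which matters if the alert is meant to support improvement rather than to penalise.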
(3) Different fields in the notes entered by receptionists and by patients themselves have measurable statistical discrepancies, and hence measurable error levels. ZIP codes, patient age and gender can be cross-checked, and in many cases height and weight will be entered by a nurse. More complex data entered by patients will be marked as such and would typically be excluded from high-veracity statistical trials. (As for why ZIP codes matter: consider that the first serious epidemiological study, John Snow’s mapping of the 1854 London cholera outbreak, used location data to save lives.)
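The sketch below illustrates two of the ideas above under stated assumptions: simple consistency checks (a ZIP code format check and age versus date of birth) and tagging each field with who entered it, so that patient-entered data can be left out of high-veracity analyses. The `Field` type, the `entered_by` tags and the choice of which roles count as high-veracity sources are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date

ZIP_RE = re.compile(r"\d{5}(-\d{4})?")  # 5-digit ZIP or ZIP+4

# Assumption for illustration: staff-entered fields count as high veracity.
HIGH_VERACITY_SOURCES = {"receptionist", "nurse"}


@dataclass
class Field:
    name: str
    value: object
    entered_by: str  # hypothetical provenance tag: "patient", "receptionist", "nurse"


def age_on(dob, today):
    """Age in whole years on a given date."""
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))


def check_record(fields, today):
    """Return a list of consistency problems found in one patient record."""
    f = {x.name: x for x in fields}
    problems = []
    if "zip" in f and not ZIP_RE.fullmatch(str(f["zip"].value)):
        problems.append("zip: not a valid 5-digit or ZIP+4 code")
    if "age" in f and "dob" in f and age_on(f["dob"].value, today) != f["age"].value:
        problems.append("age: inconsistent with date of birth")
    return problems


def high_veracity_fields(fields):
    """Keep only fields entered by sources treated as high veracity."""
    return [x for x in fields if x.entered_by in HIGH_VERACITY_SOURCES]


record = [
    Field("zip", "0213", "patient"),
    Field("dob", date(1980, 6, 15), "receptionist"),
    Field("age", 45, "patient"),
    Field("weight_kg", 72.5, "nurse"),
    Field("symptom_history", "long free-text description", "patient"),
]
print(check_record(record, date(2024, 6, 14)))
# ['zip: not a valid 5-digit or ZIP+4 code', 'age: inconsistent with date of birth']
print([x.name for x in high_veracity_fields(record)])
# ['dob', 'weight_kg']
```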
(4) Every set of data should be given a veracity index: the probability that a piece of data collected in that particular way, in that particular location (and perhaps by that particular doctor and hospital), is actually accurate. We would use AI to determine this. However, it is important for uptake and acceptance that this is seen not as a trap or a test, but as a help and training aid for physicians and hospitals.
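The article proposes AI for this; as a much simpler stand-in that shows what a veracity index could mean numerically, the sketch below estimates the probability of accuracy for each (collection method, location, clinician) context from audited records, using the posterior mean of a Beta prior (a smoothed frequency). This is a swapped-in technique for illustration, not MediChain’s model, and the audit data format is assumed.

```python
from collections import defaultdict


def veracity_index(audits, prior_accurate=1.0, prior_inaccurate=1.0):
    """audits: iterable of ((method, location, clinician), was_accurate) pairs.
    Returns the posterior mean probability of accuracy for each context key
    under a Beta(prior_accurate, prior_inaccurate) prior."""
    counts = defaultdict(lambda: [0, 0])  # key -> [accurate, total]
    for key, was_accurate in audits:
        counts[key][1] += 1
        if was_accurate:
            counts[key][0] += 1
    return {
        key: (acc + prior_accurate) / (total + prior_accurate + prior_inaccurate)
        for key, (acc, total) in counts.items()
    }


emr = ("EMR", "clinic-A", "dr-1")
transcribed = ("transcribed", "clinic-B", "dr-2")
audits = [(emr, True)] * 18 + [(emr, False)] * 2 + [(transcribed, True)] * 3 + [(transcribed, False)]
idx = veracity_index(audits)
for key, value in idx.items():
    print(key, round(value, 3))
# ('EMR', 'clinic-A', 'dr-1') 0.864
# ('transcribed', 'clinic-B', 'dr-2') 0.667
```

With few audits, the estimate stays close to the prior rather than swinging to an extreme, which fits the aim of using the index as a training aid rather than a verdict on a physician from a handful of records.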