An LR framework incorporating sensitivity analysis to model multiple direct and secondary transfer events on skin surface

Bayesian logistic regression is used to model the probability of DNA recovery following direct and secondary transfer, and persistence over a 24-hour period between deposition and sample collection. Sub-source level likelihood ratios provided the raw data for activity-level analysis. Probabilities of secondary transfer are typically low, and small data-sets with low numbers of positive observations pose challenges. However, the persistence of DNA over time can be modelled by a single logistic regression for both direct and secondary transfer, provided that, for the latter, the time since deposition is compensated by an offset value. This simplifies the analysis. Probabilities are used to inform an activity-level Bayesian Network that takes account of alternative propositions, e.g. time of assault and time of social activities. The model is extended to take account of multiple contacts between the person of interest and the 'victim'. Variables taken into account include probabilities of direct and secondary transfer, along with background DNA from unknown individuals. The logistic regression analysis is Bayesian: for each analysis, 4000 separate simulations were carried out. Quantile assignments enable calculation of a plausible range of probabilities, and sensitivity analysis is used to describe the corresponding variation of LRs that occurs when modelled by the Bayesian network. There is a need for consistent experimental design and analysis to facilitate inter-laboratory comparisons, and appropriate recommendations are made. The open-source R program ALTRaP (Activity Level, Transfer, Recovery and Persistence) enables analysis of complex multiple transfer propositions that are commonplace in casework, e.g. between those who cohabit. A number of case examples are provided. ALTRaP can be used to replicate the results and can easily be modified to incorporate different sets of data and variables.


Secondary transfer
A minimum of one hour after washing their hands, participants A and B shook hands for 30 seconds and then immediately stroked their own upper arm on a pre-marked 10 × 5 cm square, ten times with medium pressure. [...]

If there are two contributors: [...]

With three contributors:

$$LR_\phi = \frac{\Pr(E \mid H_p: \mathrm{POI} + V + U_1)}{\Pr(E \mid H_d: \ldots)}$$

The sub-source likelihood ratio is referred to as $LR_\phi$, whereas the activity-level likelihood ratio is $LR_a$; the $H_p$ proposition of $LR_\phi$ is conditioned as true when activity level is addressed [1].

[...] that previously described: the formulae and notation underlying the nodes are the same as described in supplement 1 of [1], except that the probability of recovering background DNA ($b$) is included. The sub-activity nodes are generalised: "X and Y had social contact", "X assaulted Y" and "Unknown person assaulted Y" (Supplement S1, Fig. S1). The BN will be adapted [...]

To inform the BN model outlined above, it is necessary to assign probabilities of DNA persistence and recovery following secondary and direct transfer.

To do this, a series of experimental data (supplements S7, S8) were generated as described in section 2, where the sub-source likelihood ratio ($\log_{10} LR_\phi$) was compared with the time difference between deposition and sampling (in hours). A longer time difference between the two events is expected to result in less DNA being recoverable from the POI, and this will be reflected in reduced probabilities of secondary and direct transfer.

Inspection of scatter-plots of the data (Fig. 1) showed the following: [...]

[...] $x$ is the decision threshold value. From the data, the range of sub-source $LR_\phi$ values extends from $< 1$ to $10^{20}$. It is convenient to express $x$ on a $\log_{10}$ scale, to be consistent with $\log_{10} LR_\phi$. [...] respectively: $\Pr(T \mid h = 0, x = 3) = 0.61$ and $\Pr(T \mid h = 0, x = 6) = 0.52$.
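Because logistic regression models a binary outcome, each observation must be coded as a success or failure relative to the decision threshold $x$ (in $\log_{10}$ units). The following is a minimal sketch, assuming the data are held as (hours, $\log_{10} LR_\phi$) pairs and that a success means $\log_{10} LR_\phi \geq x$. The function name `dichotomise` and the toy values are hypothetical and are not the study data.

```python
from typing import List, Tuple


def dichotomise(data: List[Tuple[float, float]], x: float) -> List[Tuple[float, int]]:
    """Convert (hours, log10 LR) pairs into (hours, 0/1) outcomes.

    Assumption: an observation is a success (1) when log10 LR >= x,
    and a failure (0) otherwise.
    """
    return [(hours, 1 if log10_lr >= x else 0) for hours, log10_lr in data]


# Hypothetical toy observations (hours since deposition, log10 LR); not study data.
toy = [(0, 12.5), (0, 2.1), (6, 4.7), (6, -0.4), (24, 0.8)]

for x in (3, 6):
    outcomes = dichotomise(toy, x)
    successes = sum(y for _, y in outcomes)
    print(f"x = {x}: {successes}/{len(outcomes)} successes")
```

Raising $x$ can only turn successes into failures, which is consistent with the lower probability quoted for $x = 6$ than for $x = 3$.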

Persistence and recovery of DNA decline over time, so that at 24 hours the probability assignments are in the region of $\Pr \approx 0.01$.
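The logistic function links the time since deposition $h$ to the probability of direct transfer, persistence and recovery, $\Pr(T \mid h, x) = 1/(1 + e^{-(\beta_0 + \beta_1 h)})$. As a hedged illustration only, the coefficients below are back-calculated from two figures quoted in the text: $\Pr(T \mid h = 0, x = 3) = 0.61$, and $\Pr \approx 0.01$ at 24 hours. They are not the posterior medians fitted in the paper.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Inverse of the logistic function (log-odds)."""
    return math.log(p / (1.0 - p))


# Illustrative coefficients back-calculated from two quoted values
# (Pr = 0.61 at h = 0 and Pr ~ 0.01 at h = 24); NOT the fitted medians.
beta0 = logit(0.61)
beta1 = (logit(0.01) - beta0) / 24.0


def pr_direct(h: float, b0: float = beta0, b1: float = beta1) -> float:
    """Pr(T | h, x) for direct transfer h hours before sampling."""
    return logistic(b0 + b1 * h)


for h in (0, 6, 12, 24):
    print(f"h = {h:2d} h: Pr(T) = {pr_direct(h):.3f}")
```

A negative Time coefficient ($\beta_1 < 0$) makes the probability decline monotonically with $h$.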

Summary statistics are shown in [...]. The secondary transfer logistic function was then calculated from the direct transfer logistic function, using the same coefficients, but with the time since deposition adjusted to $(h + f_x)$ (Fig. 3).
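Secondary transfer reuses the direct-transfer coefficients with the time shifted by the offset $f_x$. The secondary-transfer curve is therefore the direct-transfer curve evaluated at $h + f_x$: $\Pr(S \mid h, x) = 1/(1 + e^{-(\beta_0 + \beta_1 (h + f_x))})$. The sketch below uses hypothetical coefficients and offset, chosen only so the example runs.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def pr_direct(h: float, b0: float, b1: float) -> float:
    """Pr(T | h, x): direct transfer h hours before sampling."""
    return logistic(b0 + b1 * h)


def pr_secondary(h: float, b0: float, b1: float, f_x: float) -> float:
    """Pr(S | h, x): same coefficients, time shifted to (h + f_x)."""
    return pr_direct(h + f_x, b0, b1)


# Hypothetical values for illustration only.
b0, b1, f_x = 0.45, -0.21, 10.0

for h in (0, 6, 12):
    print(h, round(pr_direct(h, b0, b1), 3), round(pr_secondary(h, b0, b1, f_x), 3))
```

With $\beta_1 < 0$ and $f_x > 0$, secondary transfer at time $h$ behaves like direct transfer that occurred $f_x$ hours earlier, so it is always the less probable of the two.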
In Fig. 3, the x-axis for the secondary transfer logistic function is rescaled [...]. The offset calculation was programmed into ALTRaP as follows:

For a given value of $x$, the corresponding $\Pr(S \mid h, x)$ value was calculated from the Pareto distribution (Fig. S2) of the secondary transfer data.

The probability $\Pr(T \mid h, x)$ was calculated using the $\beta_0$ and $\beta_1$ coefficients, based on posterior medians, from the logistic regression model and [...] hand side of eq. 5 was determined by the exact method described in supplement S5. An alternative method is optimisation using the function 'optimizer' from the R package stats; this was used in the sensitivity analysis described in section 14.6.
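Equation S12, $\Pr(T_{h=f_x} \mid x) = \Pr(S_{h=0} \mid x)$, can be solved for $f_x$ in closed form by inverting the logistic function: $f_x = (\mathrm{logit}\,\Pr(S_{h=0} \mid x) - \beta_0)/\beta_1$. Whether this matches the 'exact method' of supplement S5 is an assumption. The sketch also includes a numerical alternative, plain bisection (standing in for an optimiser), as a cross-check. The coefficients and $\Pr(S)$ value are hypothetical.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def offset_closed_form(p_s: float, b0: float, b1: float) -> float:
    """Solve logistic(b0 + b1 * f) = p_s for f by inverting the logistic."""
    return (math.log(p_s / (1.0 - p_s)) - b0) / b1


def offset_bisection(p_s: float, b0: float, b1: float,
                     lo: float = 0.0, hi: float = 200.0, tol: float = 1e-10) -> float:
    """Numerical alternative: bisection on g(f) = logistic(b0 + b1 * f) - p_s.

    Assumes the root lies inside [lo, hi] (checked by a sign test).
    """
    def g(f: float) -> float:
        return logistic(b0 + b1 * f) - p_s

    if g(lo) * g(hi) > 0:
        raise ValueError("root not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid  # sign change in the lower half
        else:
            lo = mid  # sign change in the upper half
    return 0.5 * (lo + hi)


# Hypothetical inputs: direct-transfer coefficients and Pr(S | h = 0, x).
b0, b1, p_s = 0.45, -0.21, 0.05
f_exact = offset_closed_form(p_s, b0, b1)
f_numeric = offset_bisection(p_s, b0, b1)
print(round(f_exact, 4), round(f_numeric, 4))
```

The closed form needs no search interval. A numerical solver is mainly useful if the link function is changed to one without a simple inverse.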
Both $\Pr(S)$ and $\Pr(T)$ are defined by the same $\beta_0$ and $\beta_1$ logistic regression coefficients, calculated for a given $x$ value, and $\Pr(S)$ is assigned:

$$\Pr(S \mid h, x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 (h + f_x))}}$$

This calculation was carried out for each of the 4000 pairs of logistic [...]

[...] assumes independence between $S$ and $T$ given $h$: [...] The theory is outlined in Fig. 4 [...]

The victim had showered in the morning at 8am, but did not wash their hands or shower before the samples were collected. Before Mr. Y was apprehended, however, he had fully washed and showered. [...] $x$ were compared with the effect of $n = 1$ to $4$ contacts per hour [...]

Table 4: Activity level likelihood ratios from logistic regression, where the logistic regression decision threshold $x = 3$ and $n = 1$. All times ($h$) are measured from the time the sample was collected; e.g. in the first row, the assault occurred six hours before sampling, and social contact eight hours beforehand.

[...] DNA. This is also accommodated in our framework, and a sensitivity analysis is described in section 14.7. The framework provided here allows for multiple direct and secondary transfer events and includes simulation of cohabitation (Fig. 4). In casework, [...]

There will always be uncertainty relating to case circumstances. This is why it is necessary to carry out exploratory analysis of reasonable assumptions [...] Table 7. The scientist provides the information and the caveats. There is reliance placed upon the lawyers and the judiciary to ensure that the necessary steps

Figure 7: A total of 4000 $\log_{10} LR_a$ values simulated from logistic regression coefficients using $T_{h=6}$, $S_{h=18}$, $n = 1$, $x = 3$. The quantiles of $\Pr(S)$ shown on the y-axis are based upon analysis of 1000 bootstraps of the secondary transfer data ($h = 0, 1$). For each value of the logistic regression decision threshold ($x$), a density (violin) plot is shown in white. Superimposed is a box-and-whisker plot in green; behind it, the blue rectangle delimits the 0.05 and 0.95 quantiles, whereas the red rectangle delimits the 0.025 and 0.975 quantiles.

Multiple contacts and cohabitation
are followed to establish that agreement does indeed exist between the relevant [...]

Table 7: Comparison of $\log_{10} LR_a$ values where $S_{h=18}$, $T_{h=6}$, varying the probability of background $\Pr(B)$. There is no effect when the 'POI only' is recovered along with prevalent DNA from 'known' individuals. The LR is always lower when background is present. The $LR_a$ increases as $\Pr(B)$ increases.
[...] reiterate Champod [25], "the landscape is complicated"; it is pertinent to ask [...]

The question remains: how should the evidence be reported? Either a range of $LR_a$ values could be reported, or some lower limit corresponding to a percentile (e.g. [...]

[...] is that it provides a framework to promote discussion: either new data or new parameters can be explored to determine the effect upon the $LR_a$. This is particularly useful when the experimental data-set is limited. The plots illustrated in Fig. 6 should also be made available for disclosure. It is neither realistic nor necessary for models to be 'perfect'. Over time, as new information is gained, models will evolve to take account of it.

An outline of the generalised Bayesian Network is shown in Fig. S1, based upon Gill et al. [1], supplement 1, where a detailed explanation can be found.

Figure S1: Generalised Bayesian Network used to evaluate the DNA results programmed into ALTRaP, given the activity level propositions summarising the prosecution and defence views of events. Nodes are coloured as follows: black represents propositions; blue represents the sub-activities (social contact is modelled as common ground, $\Pr = 1$ for both $H_p$ and $H_d$; "X assaulted Y" is proposed by $H_p$ ($\Pr = 1$); "AO assaulted Y" is proposed by $H_d$ ($\Pr = 1$)); yellow represents transfer and accumulation probabilities; grey represents a background node; and red represents the results node. The first layer of yellow nodes employs the word 'transfer' in a broad sense; written in full: "... transfer from X, persistence and recovery from Y ...".

Given that DNA from an unknown contributor ($U_d$) is recovered under $H_d$, it could come from an unknown assailant, from background (one or more contributors), or both: [...]

where $t$ is the probability of direct transfer, persistence and recovery.
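If direct transfer (probability $t$) and background (probability $b$) are assumed to be independent, the $H_d$ condition described above reduces to simple products. The $H_p$ outcomes (POI only, or POI plus background from unknowns) can be written the same way. This is an illustration under that independence assumption only. The full node structure is given in supplement 1 of [1], and the function names and input values here are hypothetical.

```python
def pr_unknown_present_hd(t: float, b: float) -> float:
    """Under Hd: unknown DNA from the alternative assailant, background, or both."""
    return 1.0 - (1.0 - t) * (1.0 - b)


def pr_no_unknown_hd(t: float, b: float) -> float:
    """Under Hd: no assailant transfer and no background, concurrently."""
    return (1.0 - t) * (1.0 - b)


def pr_poi_only_hp(t: float, b: float) -> float:
    """Under Hp: POI DNA after direct transfer, with no background."""
    return t * (1.0 - b)


def pr_poi_plus_unknown_hp(t: float, b: float) -> float:
    """Under Hp: POI DNA after direct transfer, plus background from unknowns."""
    return t * b


t, b = 0.4, 0.1  # hypothetical values
print(pr_unknown_present_hd(t, b), pr_poi_only_hp(t, b), pr_poi_plus_unknown_hp(t, b))
```

The "absence of an unknown contributor" case is the complement of the "unknown present" case. The two $H_p$ outcomes together account for all ways the POI's DNA is recovered, so they sum to $t$.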

Under $H_p$, the probability of obtaining a DNA profile ($E$) contributed only by the POI, after direct transfer, with no background becomes: [...]

The probability of obtaining a DNA profile ($E$) contributed by the POI, with background from one or more unknowns: [...]

[...] is an alternative assailant; hence the absence of an unknown contributor is because of the concurrent absence of background transfer and of assailant transfer [...]

[...] success-events and can be calculated as: [...] where [...]

The proportion of data ($k$) where $\log_{10} LR_\phi > 0.1$ was excluded from the [...] Parameters were calculated as: $\alpha = 4.43$, $\beta = 5.78$. Data from $h \in \{0, 1\}$ were combined; hence the mean value $h = 0.5$ was estimated.
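The Pareto tail probability used for secondary transfer, $\Pr(\log_{10} LR_\phi \geq x) = (1-k)\left(\frac{\beta}{\beta + x}\right)^{\alpha}$ (eq. S10), can be evaluated directly. This is the Lomax (Pareto II) survival function scaled by $(1-k)$. The sketch uses the parameter values quoted here ($\alpha = 4.43$, $\beta = 5.78$) and the rescaling constant $k = 0.67$ from the Fig. S2 caption. Combining these particular values is an assumption made for illustration.

```python
def pr_secondary_pareto(x: float, alpha: float = 4.43, beta: float = 5.78,
                        k: float = 0.67) -> float:
    """Pr(log10 LR >= x) = (1 - k) * (beta / (beta + x)) ** alpha  (eq. S10).

    Defaults are the values quoted in the text; combining them this
    way is an illustrative assumption.
    """
    if x < 0:
        raise ValueError("threshold x is expected to be >= 0 on the log10 scale")
    return (1.0 - k) * (beta / (beta + x)) ** alpha


for x in (0, 3, 6):
    print(f"x = {x}: Pr(S) = {pr_secondary_pareto(x):.3f}")
```

At $x = 0$ the expression reduces to $1 - k$. It then decreases monotonically as the decision threshold rises.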

The exponential distribution ($\lambda = 0.598$) returned similar results to [...]

Figure S2: Pareto distribution analysis using 'fitdistrplus' for DNA recovery following secondary transfer at times $T \mid h = 0, 1$, before rescaling with $k = 0.67$ (y-axis: $\Pr(\log_{10} LR > x)$).
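The remark that an exponential fit returned similar results can be checked quickly. For the quoted parameters, the Pareto (Lomax) mean, $\beta/(\alpha - 1)$, and the exponential mean, $1/\lambda$, are close, although the Pareto has the heavier tail at larger $x$. The sketch compares the unscaled survival functions, that is, before the $(1-k)$ rescaling, using the parameter values quoted in the text.

```python
import math

ALPHA, BETA = 4.43, 5.78   # Pareto (Lomax) shape and scale quoted in the text
LAMBDA = 0.598             # exponential rate quoted in the text


def pareto_survival(x: float) -> float:
    """Pr(X >= x) for a Lomax/Pareto II distribution (before (1 - k) rescaling)."""
    return (BETA / (BETA + x)) ** ALPHA


def exp_survival(x: float) -> float:
    """Pr(X >= x) for an exponential distribution."""
    return math.exp(-LAMBDA * x)


pareto_mean = BETA / (ALPHA - 1.0)   # finite because ALPHA > 1
exp_mean = 1.0 / LAMBDA
print(f"means: Pareto {pareto_mean:.3f}, exponential {exp_mean:.3f}")
for x in (1, 3, 6, 10):
    print(f"x = {x:2d}: Pareto {pareto_survival(x):.4f}, exponential {exp_survival(x):.4f}")
```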

The package rstanarm was used to prepare logistic regression models for

In R, the model is generated by the following command line: [...]

where 'dat' is an array of the Time vs. $\log_{10} LR_\phi$ data for a given threshold value $x$.
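The rstanarm command itself is not reproduced here. To show what such a model computes, the sketch below fits a Bayesian logistic regression of a 0/1 outcome on Time using a plain random-walk Metropolis sampler. This is a deliberately simple substitute for the Hamiltonian Monte Carlo used by Stan. The normal(0, 2.5) priors, the simulated data, the step size and the burn-in length are all illustrative assumptions, not necessarily rstanarm's defaults or the paper's settings.

```python
import math
import random


def log_posterior(b0: float, b1: float, data, prior_sd: float = 2.5) -> float:
    """Unnormalised log posterior of y ~ Bernoulli(logistic(b0 + b1 * h)).

    Priors: independent normal(0, prior_sd) on b0 and b1 (illustrative assumption).
    """
    lp = -0.5 * (b0 ** 2 + b1 ** 2) / prior_sd ** 2
    for h, y in data:
        z = b0 + b1 * h
        # Numerically stable log(logistic(z)) for either sign of z.
        log_p = -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))
        # log(1 - logistic(z)) = log(logistic(z)) - z
        lp += log_p if y == 1 else log_p - z
    return lp


def metropolis(data, n_draws: int = 4000, burn_in: int = 2000,
               step: float = 0.15, seed: int = 1):
    """Random-walk Metropolis sampler returning n_draws (b0, b1) pairs after burn-in."""
    rng = random.Random(seed)
    b0, b1 = 0.0, 0.0
    current = log_posterior(b0, b1, data)
    draws = []
    for i in range(burn_in + n_draws):
        # Symmetric Gaussian proposal; the Time slope uses a smaller step.
        c0 = b0 + rng.gauss(0.0, step)
        c1 = b1 + rng.gauss(0.0, step / 10.0)
        proposed = log_posterior(c0, c1, data)
        # Accept with probability min(1, exp(proposed - current)).
        if math.log(1.0 - rng.random()) < proposed - current:
            b0, b1, current = c0, c1, proposed
        if i >= burn_in:
            draws.append((b0, b1))
    return draws


# Hypothetical simulated data: 20 binary outcomes at each of six sampling times.
sim_rng = random.Random(7)
TRUE_B0, TRUE_B1 = 0.5, -0.2
data = []
for h in (0, 3, 6, 12, 18, 24):
    p = 1.0 / (1.0 + math.exp(-(TRUE_B0 + TRUE_B1 * h)))
    for _ in range(20):
        data.append((h, 1 if sim_rng.random() < p else 0))

draws = metropolis(data)
slopes = sorted(b1 for _, b1 in draws)
print("posterior median Time coefficient:", round(slopes[len(slopes) // 2], 3))
```

Like the rstanarm output, the result is a set of (Intercept, Time) pairs rather than a single estimate, and each pair can be pushed through the logistic function.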

For the output, 4000 random draws of the Intercept and Time coefficients were generated (Table S3), where $\beta_0$ and $\beta_1$ are the Intercept and Time coefficients, respectively.
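Propagating all 4000 posterior draws, rather than a single point estimate, is what yields a range of probabilities. Each $(\beta_0, \beta_1)$ pair gives one value of $\Pr(T \mid h, x)$, and the median and quantiles summarise the spread. The sketch uses only the standard library. The draws are simulated from hypothetical normal distributions purely so the example runs, standing in for the actual output (Table S3).

```python
import math
import random
import statistics


def logistic(z: float) -> float:
    """Standard logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def quantile(sorted_vals, q: float) -> float:
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


rng = random.Random(2024)  # fixed seed for reproducibility
N_DRAWS = 4000
# Hypothetical stand-ins for posterior draws of (Intercept, Time).
draws = [(rng.gauss(0.45, 0.30), rng.gauss(-0.21, 0.03)) for _ in range(N_DRAWS)]

h = 6.0  # hours between deposition and sampling
probs = sorted(logistic(b0 + b1 * h) for b0, b1 in draws)

summary = {q: quantile(probs, q) for q in (0.025, 0.05, 0.5, 0.95, 0.975)}
for q, v in summary.items():
    print(f"q{q:<5}: {v:.3f}")
print("median check:", round(statistics.median(probs), 3))
```

The 0.05/0.95 and 0.025/0.975 pairs correspond to the inner and outer intervals used in the sensitivity plots.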

These probabilities are input into the BN calculations described in section S2, using the logistic function, and the median value is calculated (quantiles can also be derived). [...]

The Pareto distribution defines the probability of secondary transfer (eq. S8):

$$\Pr(\log_{10} LR_\phi \geq x) = (1 - k)\left(\frac{\beta}{\beta + x}\right)^{\alpha} \qquad \text{(S10)}$$

The probability of direct transfer is defined by the logistic regression (eq. S9):

$$\Pr(T) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 h)}} \qquad \text{(S11)}$$

where $\beta_0$ and $\beta_1$ are the Intercept and Time coefficients, respectively. We seek to find (from eq. 5):

$$\Pr(T_{h = f_x} \mid x) = \Pr(S_{h = 0} \mid x) \qquad \text{(S12)}$$

where $f_x$ is the offset value, which we need to determine from the following [...]

[...] found on the pin of a hand-grenade, and a statement was provided at trial:

"the conclusion was that the mixed DNA result was 1 billion times more likely if the DNA came from the appellant and two unknown and unrelated persons".
At the original trial this evidence was sufficient for conviction.