Temporal variability of surgical technical skill perception in real robotic surgery

Jason D. Kelly; Michael Nash; Nicholas Heller; Thomas S. Lendvay; Timothy M. Kowalewski

doi:10.1007/s11548-020-02253-5

Mechanical Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

Purpose: Summary score metrics, either from crowds of non-experts, faculty surgeons or from automated performance metrics, have been trusted as the prevailing method of reporting surgeon technical skill. The aim of this paper is to learn whether there exist significant fluctuations in the technical skill assessments of a surgeon throughout long durations of surgical footage.

Methods: A set of 12 videos of robotic surgery cases from common human patient robotic surgeries were used to evaluate the perceived technical skill at each individual minute of the surgical videos, which were originally 12–15 min in length. A linear mixed-effects model for each video was used to compare the ratings of each minute to those from every other minute in order to learn whether a change in scores over time can be detected and reliably measured apart from inter- and intrarater variation.

Results: Modeling the change over time of the global evaluative assessment of robotic skills scores significantly contributed to the prediction models for 11 of the 12 surgeons. This demonstrates that measurable changes in technical skill occur over time during robotic surgery.

Conclusion: The findings from this research raise questions about the optimal duration of footage needed to be evaluated to arrive at an accurate rating of surgical technical skill for longer procedures. This may imply non-negligible label noise for supervised machine learning approaches. In the future, it may be necessary to report a surgeon’s skill variability in addition to their mean score to have proper knowledge of a surgeon’s overall skill level.

Original language	English (US)
Pages (from-to)	2101-2107
Number of pages	7
Journal	International Journal of Computer Assisted Radiology and Surgery
Volume	15
Issue number	12
DOIs	https://doi.org/10.1007/s11548-020-02253-5
State	Published - Dec 2020

Bibliographical note

Publisher Copyright:
© 2020, CARS.

Keywords

Bias
Crowd sourcing
Surgical technical skill
Video segmentation

Cite this

@article{228cc8aa1b5a4a4fb591c8bad9ff4954,

title = "Temporal variability of surgical technical skill perception in real robotic surgery",

abstract = "Purpose: Summary score metrics, either from crowds of non-experts, faculty surgeons or from automated performance metrics, have been trusted as the prevailing method of reporting surgeon technical skill. The aim of this paper is to learn whether there exist significant fluctuations in the technical skill assessments of a surgeon throughout long durations of surgical footage. Methods: A set of 12 videos of robotic surgery cases from common human patient robotic surgeries were used to evaluate the perceived technical skill at each individual minute of the surgical videos, which were originally 12–15 min in length. A linear mixed-effects model for each video was used to compare the ratings of each minute to those from every other minute in order to learn whether a change in scores over time can be detected and reliably measured apart from inter- and intrarater variation. Results: Modeling the change over time of the global evaluative assessment of robotic skills scores significantly contributed to the prediction models for 11 of the 12 surgeons. This demonstrates that measurable changes in technical skill occur over time during robotic surgery. Conclusion: The findings from this research raise questions about the optimal duration of footage needed to be evaluated to arrive at an accurate rating of surgical technical skill for longer procedures. This may imply non-negligible label noise for supervised machine learning approaches. In the future, it may be necessary to report a surgeon{\textquoteright}s skill variability in addition to their mean score to have proper knowledge of a surgeon{\textquoteright}s overall skill level.",

keywords = "Bias, Crowd sourcing, Surgical technical skill, Video segmentation",

author = "Kelly, {Jason D.} and Michael Nash and Nicholas Heller and Lendvay, {Thomas S.} and Kowalewski, {Timothy M.}",

note = "Publisher Copyright: {\textcopyright} 2020, CARS.",

year = "2020",

month = dec,

doi = "10.1007/s11548-020-02253-5",

language = "English (US)",

volume = "15",

pages = "2101--2107",

journal = "International Journal of Computer Assisted Radiology and Surgery",

issn = "1861-6410",

publisher = "Springer Verlag",

number = "12",

}

TY - JOUR

T1 - Temporal variability of surgical technical skill perception in real robotic surgery

AU - Kelly, Jason D.

AU - Nash, Michael

AU - Heller, Nicholas

AU - Lendvay, Thomas S.

AU - Kowalewski, Timothy M.

PY - 2020/12

Y1 - 2020/12

AB - Purpose: Summary score metrics, either from crowds of non-experts, faculty surgeons or from automated performance metrics, have been trusted as the prevailing method of reporting surgeon technical skill. The aim of this paper is to learn whether there exist significant fluctuations in the technical skill assessments of a surgeon throughout long durations of surgical footage. Methods: A set of 12 videos of robotic surgery cases from common human patient robotic surgeries were used to evaluate the perceived technical skill at each individual minute of the surgical videos, which were originally 12–15 min in length. A linear mixed-effects model for each video was used to compare the ratings of each minute to those from every other minute in order to learn whether a change in scores over time can be detected and reliably measured apart from inter- and intrarater variation. Results: Modeling the change over time of the global evaluative assessment of robotic skills scores significantly contributed to the prediction models for 11 of the 12 surgeons. This demonstrates that measurable changes in technical skill occur over time during robotic surgery. Conclusion: The findings from this research raise questions about the optimal duration of footage needed to be evaluated to arrive at an accurate rating of surgical technical skill for longer procedures. This may imply non-negligible label noise for supervised machine learning approaches. In the future, it may be necessary to report a surgeon’s skill variability in addition to their mean score to have proper knowledge of a surgeon’s overall skill level.

KW - Bias

KW - Crowd sourcing

KW - Surgical technical skill

KW - Video segmentation

UR - http://www.scopus.com/inward/record.url?scp=85089961331&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85089961331&partnerID=8YFLogxK

U2 - 10.1007/s11548-020-02253-5

DO - 10.1007/s11548-020-02253-5

M3 - Article

C2 - 32860549

AN - SCOPUS:85089961331

SN - 1861-6410

VL - 15

SP - 2101

EP - 2107

JO - International Journal of Computer Assisted Radiology and Surgery

JF - International Journal of Computer Assisted Radiology and Surgery

IS - 12

ER -
