Artificial intelligence (AI) and machine learning (ML) applications are widely available across commercial, industrial, and intelligence domains. The use of AI applications in the security environment in particular requires standards that manage user expectations and make clear how results were derived. A reliance on “black boxes” to generate predictions and inform decisions could lead to errors of analysis. This paper explores the development of potential standards for each stage of AI/ML system development to help enable trust, transparency, and explainability. Specifically, the paper applies the standards outlined in Intelligence Community Directive 203 (Analytic Standards) to hold machine outputs to the same rigorous accountability standards expected of human analysts. Building on ICD 203, the Multi-Source AI Scorecard Table (MAST) was developed to support the community in the test and evaluation of AI/ML techniques. The paper discusses using MAST to rate a semantic processing tool for handling noisy, unstructured, and complex microtext in the form of streaming chat for video callouts. The scoring is notional, but it illustrates how MAST could serve as a standard for comparing AI/ML methods that complements datasheets and model cards.