UNCLASSIFIED DISTRIBUTION STATEMENT A Reference Number 18-S-1486; May 9, 2018
TRMC Big Data Analysis /
Knowledge Management
Initiative
Ryan Norman
Big Data and Knowledge Management Initiative Lead
Test Resource Management Center
ryan.t.norman.civ@mail.mil
What is Big Data Analytics?
The use of advanced statistical analytic techniques in a parallel
processing high-performance computing environment against very large
diverse data sets that include different types of data
Allows analysts to make better and faster decisions using data that was
previously inaccessible or unusable
Previously under-utilized data sources can be analyzed to gain new
insights resulting in significantly better and faster decisions
Instead of analyzing small chunks of data, Big Data Analytics can give
the analyst a broad view of the system, allowing the discovery of
“unknown unknowns.”
Most important (and relevant to T&E) big data analytics techniques:
Anomaly Detection – Did something go wrong?
Causality Detection – What contributed to it?
Trend Analysis – What’s happening over time?
Predicting Equipment Function and Failure – When will something go wrong?
Regression Analysis – How is today’s data different from the past?
Data Set Comparison – Is the test repeatable? Is the simulation the same as the test? Is the perceived truth the same as the ground truth?
Pattern Recognition – Are there hidden relationships in the data set?
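As a minimal sketch of two of the techniques listed above (anomaly detection and trend analysis), the following uses a median-based modified z-score and a least-squares slope. The readings, the 3.5 threshold, and the function names are illustrative assumptions, not part of any TRMC tool.

```python
from statistics import mean, median

def find_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: more than half the values are identical.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

def linear_trend(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den

# Hypothetical sensor readings with one spike at index 5.
readings = [70.1, 70.3, 70.2, 70.4, 70.5, 95.0, 70.6, 70.8, 70.7, 70.9]

anomalies = find_anomalies(readings)        # "Did something go wrong?"
clean = [v for i, v in enumerate(readings) if i not in anomalies]
slope = linear_trend(clean)                 # "What's happening over time?"

print("anomalous indices:", anomalies)
print("trend per sample:", round(slope, 4))
```

The median-based score is used here because a single large spike inflates the ordinary mean and standard deviation, which can hide the very point being looked for.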
Better tools and techniques so analysts can do their jobs
Example Big Data Analytics
Return on Investment
Analysis: Brief (~300 ms) false on-ground event for sensor during flight
Big Data Analytics enables faster & more comprehensive
analysis across the lifecycle of a program
Result: JSF-KM project discovered unknown problem with ground sensor
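The kind of check that could surface a brief false on-ground indication can be sketched as a scan for short runs where an on-ground flag is set while the aircraft is airborne. This is only an illustration: the tuple layout, the `airborne` signal, the 100 ms sample spacing, and the one-second cutoff are all assumptions, not the JSF-KM project's actual method.

```python
from dataclasses import dataclass

@dataclass
class Pulse:
    start_ms: int
    duration_ms: int

def find_short_on_ground_pulses(samples, max_duration_ms=1000):
    """Find short runs where on_ground is True while airborne is True.

    samples: iterable of (time_ms, airborne, on_ground), sorted by time.
    A run still open when the data ends is ignored.
    """
    pulses = []
    run_start = None
    for time_ms, airborne, on_ground in samples:
        contradictory = airborne and on_ground
        if contradictory and run_start is None:
            run_start = time_ms                     # run begins
        elif not contradictory and run_start is not None:
            duration = time_ms - run_start          # run ended at this sample
            if duration <= max_duration_ms:
                pulses.append(Pulse(run_start, duration))
            run_start = None
    return pulses

# Hypothetical 10-second flight segment sampled every 100 ms, with the
# on-ground flag set for three consecutive samples starting at t = 5000 ms.
samples = [(i * 100, True, 50 <= i < 53) for i in range(100)]

pulses = find_short_on_ground_pulses(samples)
print(pulses)
```

Run across every flight in an archive rather than a single test, a scan like this is the sort of query that becomes practical once the data is indexed and searchable.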
Need: An Evaluation Revolution
Most T&E investments have been focused on the “T” rather than the “E”
Our analysis & evaluation capabilities are not keeping up with the complexity and speed
required by today’s acquisition systems
The next generation of acquisition systems will be exponentially more complex than today
Impact: T&E quality is inadequate for our needs
More data is being collected than can be properly analyzed
Only a tiny fraction of data is looked at
Analysis occurs on a small fraction of data
Focus is on a single test, rather than data collected across the system lifecycle
No systematic anomaly detection, trend analysis, regression analysis, causality analysis,
pattern recognition, simulation/test comparison, or perceived truth/ground truth comparison
is being done
Impact: T&E timeliness is inadequate for our needs
Analyst retrieval of test data in many cases takes days/weeks rather than seconds/minutes
Sometimes it’s easier (though not cheaper) to just re-run a test rather than find old data that
may answer the question
Long data ingest times prevent proper debriefing of test participants after a test is over,
since their statements cannot be correlated with data in real time
Impact: T&E dollars are being spent unnecessarily
More tests than necessary are being done, sometimes at enormous expense
Cross-program lessons learned only occur anecdotally
A systematic approach to Big Data Analytics and Knowledge Management
is required to address these three serious issues
Long-term DoD T&E Big Data and
Knowledge Management Vision
Result: T&E data used more effectively & efficiently during acquisition
The primary product of T&E is data & knowledge
Embrace KM & Big Data Analytics to efficiently handle & securely share T&E data
Organize T&E data to build knowledge across all DoD acquisitions
Federate distributed data repositories to enable execution & automated search scenarios that cannot occur today
Use modern mechanisms to enable collaboration between SMEs in government and industry
Fundamental Functions
Performed by KM and BDA
1. Understand and Document T&E challenges & needs
(FY12) Completed Data Management for Distributed Testing (DM-DT) Study
Result: Developed functional requirements for T&E enterprise distributed Data Management
(FY13) Comprehensive Review of T&E Infrastructure report published
Key Recommendation: Use DoD cloud solution for T&E data
Key Recommendation: USD(AT&L) establish a DoD-wide KM capability for T&E to help achieve
better acquisition outcomes and reduce costs
2. Execute proofs of concept that inform an enterprise approach to T&E
Knowledge Management
(FY15-18) Joint Strike Fighter Knowledge Management (JSF-KM) project
Goal: Assess KM technologies and methodologies in support of an existing acquisition program
(FY15-17) Collected Operational Data Analytics for Continuous Test & Evaluation
(CODAC-TE) project
Goal: Apply KM technologies and methodologies across the lifecycle
3. Develop investment plan that achieves strategic objectives:
Integrate T&E infrastructure into cohesive Knowledge Management enterprise
Modernize T&E practices & processes to leverage Big Data analytics techniques
Apply Big Data analytics tools & techniques to the T&E mission space
Realizing Big Data and Improved
DoD T&E Knowledge Management
Investments Path Forward:
It Starts with Architecture
The Big Data and Knowledge Management Architecture
Reference Document (ARD) identifies:
Deficiencies in current T&E data analysis and knowledge management
practices
Government, commercial and open source software and hardware that
could address these deficiencies
The end state we are looking to achieve
TRMC has released the ARD for feedback in preparation for
making it a JMETC community standard
Reviewers should request access to BDKM User Group on TRMC website
Standardization scheduled for August JMETC Configuration Review Board
Once a reference architecture is standardized, we can build it
Goal: Synergize evaluation investments across DoD T&E
https://www.tena-sda.org/display/BDKM/Documentation
What do we need?
1. Integrated Local Data
2. Cloud Analytics Capability
3. Big Data Tools
4. Trained Data Science Workforce
Integrated, Scalable, Cost-Effective, State-of-the-Art
[Diagram: Individual Range, Regional Analytics Capability (shown three times), and Cloud-Based Big Data Analytics and Knowledge Management System; legend: New, Existing]
Individual Range
Current Range Infrastructure: Existing Tools, Existing Storage, Existing Ingest Capabilities
Range Augmentation: Virtualized Big Data Tools, Some Processing, Some Tiered Storage, MLS Security, Enhance Ingest
Regional Analytics Capability: Virtualized Big Data Tools, Processing, Tiered Storage, MLS Security, Data Scientists
Data types shown: Working Files, Quick-Look, Schedule Info, Application Repository, Reports, Data, Video, Audio, Imagery
Big Data and TENA Relationship:
The Big Data Analytics Architecture is an Extension of TENA Into the Analytic World
Seamless Integration
Event Data Is Ingested into Big Data Enterprise System
[Diagram: Individual Range, combining Current Range Infrastructure (Existing Tools, Existing Storage, Existing Ingest Capabilities) with Range Augmentation (Virtualized Big Data Tools, Some Processing, Some Tiered Storage, MLS Security, Enhance Ingest); also labeled: Working Files, Quick-Look]
Big Data Software Architecture Overview
[Architecture diagram; labels are grouped approximately by the order in which they were extracted]
Legend: COTS/GOTS Software; New Hardware/Network; TRMC-Developed Software; Existing Range HW/SW
Main layers: Data Sources; Extract-Transform-Load; Structured Database; Unstructured/Semi-Structured Database (Hadoop); Structured Data Engine; Unstructured Data Engine; Query Engine (federated access for both structured and unstructured data); Data Analysis Packages; User-Defined Analytic Plugins; Analytic Services; Big Data Visualization; Security
Infrastructure: Existing Range Computing and Storage; Massively Parallel Tiered Computing, Storage, and Network Infrastructure at Multiple Independent Levels of Security; MILS Secure Cloud; Load Balancing; Fault/Recovery; Existing Computers; Computing Resources
Security levels: UC, S, TS, SAP, SAR
Security functions: Authenticate; Authorize; Access Control; Enforce Policies; Enforce Workflow; Threat Detection; Intrusion Detection; Active Defenses; Encryption; Audit; Alerts
Data source labels: Existing Range Databases; New Databases; Flat Files; Raw Files; Structured; Unstructured; Audio/Video; Range Protocols; TENA
Extract-Transform-Load labels: Setup, Configure, and Manage; Policies; Security; Define Metadata; Prioritization; Streams; Micro-batch; Mega-batch; Parallel; Verify; Transform; Add Metadata; Index; Warehouse; Configuration; Metadata Replication
Visualization labels: Build Queries; Quick-Look; Real-Time; Continuous; 2D/3D/Anim; Display Reports; Design Reports; Customized Displays; Display Alerts; User Interface; Customized UIs
Storage and engine labels: Working Sets; Tables; Statistics; Key-Value Store; Distributed File System; Schema; Graph-Based Schema; MPP Programming and Execution Engine; C/R/U/D Consistency; Remote Data Replication
Analysis labels: Generate Reports; AI Tools; Simulation; Analysis Tools; Alerting; Scheduling/Automation; Legacy Tools; SQL Services; Machine Learning; Data Mining; Audio/Video Analysis; Filter; Sort; Summarize; Parallelize; Optimize; Create Automated Products
T&E Specific Custom BDA Services: Anomaly Detection; Trend Analysis; Causality Detection; Regression Analysis; Ground Truth Comparison; Pattern Recognition
Virtualization labels: Abstraction Layer (Virtualization); Hypervisor; Virtualized Legacy Tools; Virtualized New Tools; Infrastructure as a Service; Platform as a Service; Software as a Service; Simulation as a Service; Provisioning; Streaming; Scripting; Applications; Resource Mgmt; VM Library; Cloud; License; Customization
Data Services labels: Organization; Core; Operations; Share; Serve; Messaging; Metadata; Store; Retrieve; Versioning; Tagging; Publish/Subscribe; Crawl/Index; Transfer; Transform; Catalog; Search; Verify; Sync Data/Video; Spatio-temporal; Ontologies
Administrative labels: COO/DR; Enforce Policies; Archive Tools; DB Admin; Config Mgmt
Workflow and development labels: Pipeline; Workflow; Data Lifecycle; Workflow; Create Software; IDE; SDK
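The Extract-Transform-Load labels in the architecture overview (Verify, Transform, Add Metadata, Index, Warehouse) suggest a staged ingest pipeline. A minimal sketch of such a pipeline follows; the record fields, the `source` tag, and the in-memory list standing in for a warehouse are all hypothetical and do not describe the actual TRMC implementation.

```python
import hashlib
import json

REQUIRED_FIELDS = ("event_id", "timestamp", "payload")  # hypothetical schema

def verify(record):
    """Accept only records that carry every required field."""
    return all(field in record for field in REQUIRED_FIELDS)

def transform(record):
    """Normalize the payload to a canonical JSON string."""
    out = dict(record)
    out["payload"] = json.dumps(record["payload"], sort_keys=True)
    return out

def add_metadata(record, source):
    """Tag the record with its source and a checksum of the payload."""
    out = dict(record)
    out["source"] = source
    out["checksum"] = hashlib.sha256(out["payload"].encode("utf-8")).hexdigest()
    return out

def build_index(records):
    """Map each event_id to the positions of its records in the warehouse."""
    index = {}
    for position, record in enumerate(records):
        index.setdefault(record["event_id"], []).append(position)
    return index

def etl(raw_records, source):
    warehouse = [add_metadata(transform(r), source)
                 for r in raw_records if verify(r)]
    return warehouse, build_index(warehouse)

raw = [
    {"event_id": "E1", "timestamp": 0, "payload": {"alt_ft": 1000}},
    {"event_id": "E1", "timestamp": 1, "payload": {"alt_ft": 1010}},
    {"timestamp": 2, "payload": {}},  # missing event_id, fails verification
    {"event_id": "E2", "timestamp": 3, "payload": {"alt_ft": 500}},
]

warehouse, index = etl(raw, source="range-A")
print(index)
```

Building the index at ingest time, rather than at query time, is what would let later retrieval take seconds instead of days.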
Security Architecture: Notional MILS and CDS
[Diagram labels: Regional Analytics Capability; Classification A; Classification B; Classification C; MLS Database; Enterprise Big Data Analysis; MILS-CDS. Each of the four storage groups shown has Long-Term Storage, Med-Speed, and High-Speed tiers.]
[Venn diagram labels: Data Science; Computer Science; Machine Learning; Math and Statistics; Traditional Software; Traditional Research; Subject Matter Expertise; Big Data Analytics]
Unique DoD data challenges require an interdisciplinary approach, with skills & analytical techniques drawn from 3 broad areas:
Statistics
Especially Bayesian statistics with multivariate analysis
Knowledge of probability, distributions, hypothesis testing, and multivariate analysis
Computer Science
Databases, SQL, data structures, algorithms, parallel computing, distributed computing, etc.
Subject Matter Expertise
Ability to assess which models are feasible, desirable, and practical in different settings
Clear idea of the distinction between correlation and causality
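The distinction between correlation and causality can be illustrated with a small synthetic example: two readings that never influence each other but are both driven by a third quantity. All data and variable names below are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical shared driver z (say, ambient temperature) pushes both
# sensor readings x and y; neither sensor influences the other.
z = list(range(105))
noise_x = [(2 * i) % 5 - 2 for i in z]   # small deterministic "noise"
noise_y = [(3 * i) % 7 - 3 for i in z]
x = [zi + nx for zi, nx in zip(z, noise_x)]
y = [2 * zi + ny for zi, ny in zip(z, noise_y)]

# The raw series are almost perfectly correlated.
r_raw = pearson(x, y)

# Remove the known contribution of z from each series, then correlate
# what is left: the apparent relationship disappears.
resid_x = [xi - zi for xi, zi in zip(x, z)]
resid_y = [yi - 2 * zi for yi, zi in zip(y, z)]
r_resid = pearson(resid_x, resid_y)

print("correlation of raw readings:", round(r_raw, 4))
print("correlation after removing the shared driver:", round(r_resid, 4))
```

Knowing that a shared driver exists, and what it is, is exactly the contribution of subject matter expertise; the statistics alone would not reveal it.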
Help Wanted: DoD Data Scientists
“Proliferation of sensors and large data sets are overwhelming analysts, as they lack
the tools to efficiently process, store, analyze, and retrieve vast amounts of data”
ASD(R&E) Department of Defense Research & Engineering website
TRMC Investments Support Joint Strike Fighter Needs
[Diagram labels, in extraction order]
Realistic Distributed Mission Environments
On-Board Instrumentation; CRIIS
Miniaturized Data Capture; QRIP; On Board JSF
Model Validation & Improvement; Interoperable With JSE
TENA; LVC Integration
Next-Generation Threats; NCR; Cyber T&E; EWIIP; EW
Analysts, Evaluators, & Decision-Makers
Data Ingest & Validation; RAPIDS
Big Data Analytics & Evaluation; JSF-KM
KM Data Archive; MLS-JCNE
Cross Domain Solutions; MILS Network; JMETC
JSF T&E KM & Data Needs
Addressed by TRMC
1. Data Capture: DART Pod is too large, requires significant jet modifications, and
is not certified to support F-35 full operational profile
2. Data Warehousing: Flight test data should be stored in a government facility to
expedite data access & discovery
3. Data Ingest: Current DART Pod test data ingest is too slow to meet multi-ship
quick-look and quick-turn requirements
examples: 2 on 2; 4 turn 2; 4 on 4 turn 4 on 4
4. Data Access: Test data should be available for quick-look analysis during
mission debrief to inform decision making
5. Video: DART Pod video should be available for quick-look analysis during
mission debrief to inform decision making
6. Big Data Analytics: Analysis capabilities need to proactively identify “unknown
unknowns” and other anomalies impossible for a human to discern
7. Remote Operations: Analysts need a rapid reaction capability to harvest data
and conduct quick-look analyses in situations / locations where a network
connection is not possible
JSF-KM Improvements to Existing T&E Capabilities
[Timeline figure comparing DT Today, OT Today, and With JSF-KM; labels and values in extraction order]
Note: Numbers reflect single 2 hour flight mission
Parallel Data Ingest; 30 minutes (multiple aircraft); Raw Data Available; Video/Data at Post-Mission Debrief; Big Data Analytics; Govt. Analyst Data Request; Analysis
Data Ingest; Raw Data Available; Govt. Analyst Data Request; Analysis; 2 hours (per aircraft); 1 day; 1 week; 30 seconds
Data Ingest; Raw Data Available; Govt. Analyst Data Request; Analysis; 1-2 hours (per aircraft); 10 minutes; 4-5 hours
Data Ready for Use @ (Govt); 30 seconds; 90 minutes; Data Ready for Use @ LM
>20 weeks of data available online; Data Ready for Use @ (Govt); 3 weeks of data available online
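To see why parallel ingest matters for multi-ship missions, the slide's "2 hours (per aircraft)" and "30 minutes (multiple aircraft)" figures can be compared with simple arithmetic. The four-aircraft mission size and the assumption that per-aircraft ingests run back to back are illustrative assumptions, not figures from the briefing.

```python
def serial_ingest_minutes(num_aircraft, minutes_per_aircraft):
    """Total ingest time if aircraft are processed one after another."""
    return num_aircraft * minutes_per_aircraft

PER_AIRCRAFT_MIN = 120     # from the slide: "2 hours (per aircraft)"
PARALLEL_INGEST_MIN = 30   # from the slide: "30 minutes (multiple aircraft)"
NUM_AIRCRAFT = 4           # assumption: a four-ship mission

serial_min = serial_ingest_minutes(NUM_AIRCRAFT, PER_AIRCRAFT_MIN)
ratio = serial_min / PARALLEL_INGEST_MIN

print(f"serial ingest:   {serial_min} minutes")
print(f"parallel ingest: {PARALLEL_INGEST_MIN} minutes")
print(f"ratio:           {ratio:.0f}x")
```

Under these assumptions the serial case takes a full working day, which is why data would not be available for a post-mission debrief.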
Sample JSF-KM Success Stories
Identified flights which experienced propulsion component failure
During a blind analysis of propulsion data from 1,392 flights, a JSF-KM data scientist identified 7 of the 10 flights with engine issues known to JSF analysts
Led to creation of a predictive model* for identifying future failures (*model validation pending)
Without JSF-KM, this predictive model may not have been generated
Video available during post-mission debrief due to JSF-KM data ingest improvements from DART Pod
Existing tools could not process video in time to support post-mission debrief
Without JSF-KM, there would be no flight video during post-mission debrief
Discovery of avionics box issue during first night mission
Pilot and analyst discovered problem from video data available 30 minutes after landing
Avionics box was replaced before another mission was flown
Without JSF-KM, problem would not have been discovered for several days
Reduced data profile time from 5+ hours to 47 seconds per query
Big Data tool enabled massive improvement to data profile generation
Without JSF-KM, it would still take 5+ hours to perform data profile runs
9-hour routine analysis process reduced to 23 milliseconds
Patuxent River system reduced a routine MATLAB analysis process from 9 hours to 23 milliseconds before the KM system was even fully deployed
Patuxent River leadership is already identifying other airframes which could use the system
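The figures quoted in these success stories can be turned into ratios with a few lines of arithmetic. The sketch below uses only the numbers stated on the slide; since the data profile time is given as "5+ hours", that speedup is a lower bound.

```python
def speedup(before_seconds, after_seconds):
    """How many times faster the new process is than the old one."""
    return before_seconds / after_seconds

SECONDS_PER_HOUR = 3600

# Data profile: 5+ hours down to 47 seconds per query.
profile_speedup = speedup(5 * SECONDS_PER_HOUR, 47)

# Routine analysis: 9 hours down to 23 milliseconds.
routine_speedup = speedup(9 * SECONDS_PER_HOUR, 0.023)

# Blind analysis: 7 of the 10 flights with known engine issues were identified.
detection_rate = 7 / 10

print(f"data profile speedup:     at least {profile_speedup:,.0f}x")
print(f"routine analysis speedup: about {routine_speedup:,.0f}x")
print(f"known-issue flights found: {detection_rate:.0%}")
```

The detection rate covers only the flights already known to have issues; the slide does not state how many other flights were flagged, so no false-alarm rate can be computed from it.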
Big Data Initiative Summary
TRMC is acting upon recommendations from the Comprehensive Review of
T&E Infrastructure. Strategic Goals:
Integrate T&E infrastructure into cohesive Knowledge Management enterprise
Modernize T&E practices & processes to leverage Big Data analytics techniques
Apply Big Data analytics tools & techniques to the T&E mission space
TRMC-funded proofs of concept are delivering proven capabilities
Enabling Big Data analytics for JSF T&E
Improving transfer of knowledge between fielded and next-gen systems
Informing an investment roadmap that advises future infrastructure, process, and
workforce decision-making
Big Data Architecture Reference Document (ARD) will ensure interoperability
and efficiencies in next-generation range knowledge management
Big Data ARD will be standardized through JMETC Configuration Review Board (JCRB)
https://www.tena-sda.org/display/BDKM/Documentation
TRMC will consider additional pilots that continue to
expand big data analytics in acquisition
Event Scheduling / Event Questions
Interoperability Events
Keith Poch
(850) 389-6044
keith.poch@tena-sda.org
Help Desk
Connectivity / Network Questions
NCRC Expansion / Site Questions
JMETC Points of Contact (POCs)
JMETC Program Manager
George Rumford
(571) 372-2724
george.j.rumford.civ@mail.mil
TENA Software Development Activity Director
Ryan Norman
(571) 372-2725
ryan.t.norman.civ@mail.mil
National Cyber Range Complex Director
AJ Pathmanathan
(571) 372-2702
arjuna.pathmanathan.civ@mail.mil
NCRC, Deputy Director
Rob Tamburello
(501) 372-2753
Cyber Events
Lizann Messerschmidt
(571) 451-4295
JMETC MILS Network (JMN)
Ben Wilson
(757) 492-7621
JMETC Secret Network (JSN)
Jeff Braget
(850) 389-6031
jeff.braget@tena-sda.org
Action Items, Questions, Tasks, Software Needs, Bug Reports: https://www.tena-sda.org/helpdesk
TENA Products / Software Repository
TENA Software Development Manager
Steve Bachinsky
(703) 253-1068
steve.bachinsky@tena-sda.org
Miscellaneous Questions
For JMETC questions: [email protected]g
For TENA questions: feedback@tena-sda.org
Websites
Unclassified, FOUO, DoD-Restricted (CAC required): https://www.trmc.osd.mil
Distribution A, Industry, non-DoD (username/password required): https://www.tena-sda.org
Range Support and Training
TENA User Support Manager
Gene Hudgins
(850) 803-3902
gene.hudgins@tena-sda.org
JMETC Information Assurance Lead
Robin Deiulio
(540) 553-4098
JTEX-03: August 21-23, 2018; Orlando, FL