19 December 2002 Statistical methodology for massive datasets and model selection
G. Jogesh Babu, James P. McDermott
Abstract
Astronomy is facing a revolution in the collection, storage, analysis, and interpretation of large datasets. The data volumes involved are several orders of magnitude larger than what astronomers and statisticians are used to dealing with, and the old methods simply do not work. The National Virtual Observatory (NVO) initiative has recently emerged in recognition of this need, aiming to federate numerous large digital sky archives, both ground-based and space-based, and to develop tools to explore and understand these vast volumes of data. In this paper, we address some of the critically important statistical challenges raised by the NVO. In particular, a low-storage, single-pass, sequential method for simultaneous estimation of multiple quantiles for massive datasets will be presented. Density estimation based on this procedure and a multivariate extension will also be discussed. The NVO also requires statistical tools to analyze moderate-size databases. Model selection is an important issue for many astrophysical databases. We present a simple likelihood-based 'leave one out' method to select the best among several possible alternatives. The performance of the method is compared with that of methods based on the Akaike Information Criterion and the Bayesian Information Criterion.
© (2002) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
G. Jogesh Babu and James P. McDermott "Statistical methodology for massive datasets and model selection", Proc. SPIE 4847, Astronomical Data Analysis II, (19 December 2002); https://doi.org/10.1117/12.460339
CITATIONS
Cited by 5 scholarly publications and 1 patent.
KEYWORDS
Statistical analysis
Data modeling
Databases
Data storage
Astronomy
Statistical modeling
Computer simulations

