KEYWORDS: Clouds, Astronomy, Observatories, Data storage, Prototyping, Computer architecture, Data centers, Large Synoptic Survey Telescope, Web services, Data processing
The advancement of telescope and instrumentation technology has made ever more data available at higher resolution and quality, but the way this data is obtained remains the same: astronomers must be granted observation time. Once the data become public, in most cases they fall into disuse; apart from initiatives such as virtual observatories, which offer these data through standards and protocols for storage and access, the data are not used again. Our proposal seeks to recover these large volumes of existing observations and offer them as a cloud service from a High Performance Computing environment, so that the scientific community can reprocess them with the aim of generating new science. To this end, at the Chilean Virtual Observatory we have created ChiVOLabs, a system with a microservice architecture on Docker that, starting from searches in our virtual observatory, offers the possibility of reprocessing ALMA data through a Jupyter Notebook interface. The process starts by using the IVOA Simple Cone Search protocol to find the astronomical data and then selecting the option to reprocess them. Our architecture creates a Docker environment in our data center with the full ChiVO and Python analysis stack, and automatically transforms the reduction script used to generate the original product into a Jupyter notebook connected to our CASA cluster (the ALMA reduction pipeline). Finally, we search our data lake for the raw ALMA data, known as ASDM, from which the selected product was generated, and create a link to these raw data inside the container. With this we offer the scientific community the possibility of reprocessing these data according to their scientific interests and generating new data products, which can then be downloaded or further processed with our libraries using the full power of our data center.
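As an illustration of the discovery step of this workflow, the following is a minimal sketch of an IVOA Simple Cone Search query using the pyvo library; the service URL and the find_products helper are placeholders for illustration only, not parts of the actual ChiVOLabs API.

```python
# Minimal sketch of the first step described above: locating data products
# via an IVOA Simple Cone Search query before requesting reprocessing.
# The endpoint below is hypothetical, not the actual ChiVO service URL.
import pyvo
import astropy.units as u
from astropy.coordinates import SkyCoord

CONE_SEARCH_URL = "https://vo.example.org/chivo/scs"  # hypothetical endpoint

def find_products(ra_deg, dec_deg, radius_deg=0.1):
    """Query a Simple Cone Search service and return matching records."""
    service = pyvo.dal.SCSService(CONE_SEARCH_URL)
    pos = SkyCoord(ra=ra_deg * u.deg, dec=dec_deg * u.deg)
    results = service.search(pos=pos, radius=radius_deg * u.deg)
    return results.to_table()

# Example: search around the Orion KL region, then pick a product
# from the returned table to send to the reprocessing environment.
table = find_products(83.808, -5.372)
print(table.colnames)
```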
KEYWORDS: Astronomy, Data modeling, Principal component analysis, Matrices, Data archive systems, Spectroscopy, Chemical elements, Data conversion, Data storage, Spectral data processing
The big data problem in astronomy is a well-known issue, but in the majority of cases it is framed only in terms of data volume. Our proposal addresses another aspect of the problem: dimensionality, in the scope of multidimensional data and especially astronomical data cubes. We use tensor decompositions for two goals: first, using the Tucker decomposition we achieve compression rates that save up to 91% of disk space and network traffic; second, using CANDECOMP/PARAFAC, or Canonical Decomposition (CP), we build a system to find the multilinear manifold in these astronomical cubes. Because this is a big data problem, for our library we tested three implementations, including one making intensive use of GPUs through PyTorch and one following the traditional HPC approach with MPI. Our proposal starts from a simple but powerful idea: if we are dealing with multidimensional data (astronomical cubes), why are we limited to two-dimensional techniques? For example, PCA is commonly used for dimensionality reduction in spectral cubes instead of a multidimensional approach that preserves the multilinear manifold inside the data; we propose to move from a linear algebra approach to a multilinear algebra approach using tensor theory.
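To make the compression idea concrete, the following is a hedged sketch of a Tucker decomposition of a synthetic spectral cube using the TensorLy library; the ranks, cube size, and library choice are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch: Tucker decomposition of a synthetic data cube, showing how the
# truncated core and factor matrices yield a compressed representation.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Synthetic "data cube": two spatial axes and one spectral axis.
cube = tl.tensor(np.random.rand(128, 128, 512))

# Truncated multilinear ranks control the compression/accuracy trade-off
# (illustrative values only).
core, factors = tucker(cube, rank=[32, 32, 64])

# Storage of the compressed representation vs. the original cube.
compressed = core.size + sum(f.size for f in factors)
print(f"compression ratio: {compressed / cube.size:.3f}")

# Approximate reconstruction from the Tucker factors.
approx = tl.tucker_to_tensor((core, factors))
rel_err = np.linalg.norm(cube - approx) / np.linalg.norm(cube)
print(f"relative reconstruction error: {rel_err:.3f}")
```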
KEYWORDS: Astronomy, Data storage, Data centers, Clouds, Data processing, Algorithm development, Observatories, Prototyping, Computer architecture, Computing systems
The research on computational methods for astronomy performed in the first phase of the Chilean Virtual Observatory (ChiVO) led to the development of functional prototypes, implementing state-of-the-art computational methods and proposing new algorithms and techniques. The ChiVO software architecture is based on the IVOA protocols and standards. These protocols and standards are grouped in layers, with emphasis on the application and data layers, because their basic standards define the minimum operation a VO should support. As a preliminary verification, the current implementation works with a 1 TB dataset coming from the reduction of ALMA Cycle 0. This research focused mainly on spectroscopic data cubes from the ALMA Cycle 0 public data. As the dataset grows, with ALMA Cycle 1 public data also increasing every month, data processing is becoming a major bottleneck for scientific research in astronomy. When designing the ChiVO, we focused on improving both computation and I/O costs, which led us to configure a data center with 424 high-speed cores at 2.6 GHz, 1 PB of storage (distributed across hard disk drives, HDD, and solid state drives, SSD) and a high-speed InfiniBand interconnect. We are developing a cloud-based e-infrastructure for ChiVO services, in order to have a coherent framework for developing novel web services for on-line data processing in the ChiVO. We are currently parallelizing these new algorithms and techniques using HPC tools to speed up big data processing, and we will report our results in terms of data size, data distribution, number of cores and response time, in order to compare different processing and storage configurations.
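As a rough illustration of the kind of HPC parallelization described above, the sketch below distributes the spectral channels of a data cube across MPI ranks with mpi4py; the per-channel statistic is a stand-in assumption, not one of ChiVO's actual algorithms, and the cube here is synthetic rather than read from the data center's storage.

```python
# Sketch: channel-wise distribution of a spectral cube across MPI ranks.
# Run with, e.g.: mpiexec -n 4 python cube_rms.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Synthetic cube: (channels, y, x). In practice this would come from a
# FITS file on shared storage.
cube = np.random.rand(512, 64, 64) if rank == 0 else None

# Split channels into contiguous chunks, one per rank.
chunks = np.array_split(cube, size, axis=0) if rank == 0 else None
local = comm.scatter(chunks, root=0)

# Each rank computes a per-channel statistic on its chunk
# (a placeholder for the real per-channel processing).
local_rms = np.sqrt((local ** 2).mean(axis=(1, 2)))

# Gather the partial results back on rank 0.
rms = comm.gather(local_rms, root=0)
if rank == 0:
    rms = np.concatenate(rms)
    print(f"processed {len(rms)} channels on {size} ranks")
```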