Using Principal Component Analysis to Remove Correlated Signal from Astronomical Images
Kim Scott
National Radio Astronomy Observatory
Data Science Meet-up
February 18, 2014
Galaxy Evolution in One Slide...
Galaxy Surveys – What Are We Missing?

• Optical surveys miss ~50% of the star formation in galaxies
• Optical surveys are therefore biased
• Dust absorbs stellar radiation and re-emits it at infrared to millimeter wavelengths (λ ~ 20–2000 μm)
Galaxy Surveys at (Sub)mm Wavelengths
• Atmospheric emission is 1000× stronger than the signal from galaxies
[Figure: extragalactic emission passing through the atmosphere; labels: Transmitted, Absorbed]
Removing the Atmosphere by Modulating the Signal in Time
[Figure: detector array and galaxy at time samples i = 1, 2, 3]
$x_{ij}$: power measured for time sample $i$ on detector $j$
Surveys at λ = 1.1 mm with AzTEC
[Figure panels: ASTE Telescope; AzTEC Dewar; AzTEC Array (117 detectors)]
Raw Time-stream Data
Sample rate = 1∕(15.625 ms) = 64 Hz (20 s = 1280 samples)
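The sample-count figures quoted in these slides are simple arithmetic on the 15.625 ms sample period. The sketch below checks them: the 20 s example, the duration implied by the 16640 samples/detector quoted later for the PKS J1127-1857 map, and the ~2×10⁷ samples/detector for an ~80 hr survey. The helper names are illustrative only.

```python
# Sketch: sample-count arithmetic for the AzTEC time streams.
# SAMPLE_PERIOD_S comes from the slide; the helper names are illustrative.
SAMPLE_PERIOD_S = 15.625e-3               # 15.625 ms between samples
SAMPLE_RATE_HZ = 1.0 / SAMPLE_PERIOD_S    # 64 Hz


def n_samples(duration_s: float) -> int:
    """Samples recorded per detector in `duration_s` seconds."""
    return round(duration_s * SAMPLE_RATE_HZ)


def duration_s(samples: int) -> float:
    """Observing time (s) needed to record `samples` samples per detector."""
    return samples / SAMPLE_RATE_HZ


if __name__ == "__main__":
    print(SAMPLE_RATE_HZ)          # 64.0
    print(n_samples(20))           # 1280
    print(duration_s(16640))       # 260.0, roughly the 4 min quoted for PKS J1127-1857
    print(n_samples(80 * 3600))    # 18432000, i.e. ~2e7 for an ~80 hr survey
```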
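Principal Component Analysis (PCA)

[Often used in supervised learning to compress the data, i.e. to fit with fewer features; PCA itself is an unsupervised method]
• $x_{ij}$: power measured for time sample $i$ on detector $j$
• $n$ = number of detectors; $m$ = number of time samples
• $X = [\,x_1\ x_2\ \dots\ x_m\,]$ → $n \times m$ matrix (column $x_i$ holds the readings of all $n$ detectors at time sample $i$)

*Only input needed for PCA*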
Principal Component Analysis (PCA)
Step 1: Mean normalization (and feature scaling)
• Compute $\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_{ij}$ for each detector
• Compute $\sigma_j^2 = \frac{1}{m-1}\sum_{i=1}^{m} (x_{ij} - \mu_j)^2$ for each detector
• Set $x_{ij} \leftarrow (x_{ij} - \mu_j)/\sigma_j$
• $X = [\,x_1\ x_2\ \dots\ x_m\,]$ → $n \times m$ matrix
[Figure: detector time streams; scale bar 1 mV]

*PCA can also identify weaker (lower-level) correlations among subsets of the detectors*
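Step 1 can be sketched in NumPy as follows. This is a minimal illustration, not the original pipeline code; it assumes the data are stored as an n × m array with one row per detector (so `X[j, i]` plays the role of $x_{ij}$), and it returns $\mu_j$ and $\sigma_j$ so the scaling can be undone later if needed. The input in the usage block is synthetic.

```python
import numpy as np


def normalize(X: np.ndarray):
    """Step 1: mean-normalize and scale each detector's time stream.

    X is an n x m array: row j is detector j and column i is time sample i,
    so X[j, i] plays the role of x_ij on the slides.
    Returns the normalized array together with mu_j and sigma_j (as n x 1 columns).
    """
    mu = X.mean(axis=1, keepdims=True)               # mu_j = (1/m) sum_i x_ij
    sigma = X.std(axis=1, ddof=1, keepdims=True)     # sigma_j with the 1/(m-1) normalization
    return (X - mu) / sigma, mu, sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(loc=5.0, scale=2.0, size=(117, 1280))   # synthetic stand-in data
    Xn, mu, sigma = normalize(X)
    print(Xn.shape)                                          # (117, 1280)
```

`ddof=1` selects the $1/(m-1)$ normalization used on the slide; `keepdims=True` keeps $\mu$ and $\sigma$ as columns so they broadcast across all time samples.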
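Principal Component Analysis (PCA)
Step 2: Calculate the covariance matrix
• $C = \frac{1}{m} X X^{T}$ (recall $m$ = number of time samples)
• $C$ → $n \times n$ symmetric matrix (recall $n$ = 117 detectors)
Step 3: Eigendecomposition
• $C = Q \Lambda Q^{-1} = Q \Lambda Q^{T}$, because $C$ is symmetric and $Q$ is therefore orthogonal (*solve using SVD*)
• $Q = [\,q_1\ q_2\ \dots\ q_n\,]$ → $n \times n$ matrix containing the eigenvectors $q_i$
• $\Lambda$ → $n \times n$ diagonal matrix containing the eigenvalues $\lambda_i = \Lambda_{ii}$, ordered so that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$
• Principal components = uncorrelated variables: the projections $Q^{T} X$ have the diagonal covariance $\Lambda$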
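One way to read "solve using SVD" is sketched below: take the SVD of the normalized data $X = U S V^{T}$ rather than forming $C$ explicitly. Then $C = U (S^2/m) U^{T}$, so the columns of $U$ are the eigenvectors $q_i$ and $s_i^2/m$ are the eigenvalues, already in descending order. This is an illustration under the assumption $n \le m$ (117 detectors, many more time samples), not necessarily the exact route the original pipeline took.

```python
import numpy as np


def principal_components(X: np.ndarray):
    """Steps 2-3 for a normalized n x m array X (rows = detectors), assuming n <= m.

    Instead of forming C = (1/m) X X^T explicitly, take the SVD X = U S V^T.
    Then C = U (S^2 / m) U^T, so the columns of U are the eigenvectors q_i of C
    and s_i^2 / m are its eigenvalues lambda_i, already sorted largest first.
    """
    m = X.shape[1]
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    eigenvalues = s**2 / m
    return eigenvalues, U      # U plays the role of Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 500))
    X -= X.mean(axis=1, keepdims=True)
    lam, Q = principal_components(X)
    C = X @ X.T / X.shape[1]
    print(np.allclose(C @ Q, Q * lam))   # True: C q_i = lambda_i q_i
```

Working from the SVD of $X$ avoids squaring the data into $C$, which is generally the better-conditioned choice numerically.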
Principal Component Analysis (PCA)
Step 4: Choose the number of components to remove
• Goal: choose the fewest components ($k$) to REMOVE that account for most of the observed variance in the data
• $Q_R = [\,q_{k+1}\ q_{k+2}\ \dots\ q_n\,]$ → $n \times (n-k)$ matrix, $k < n$
• $Z = [\,z_1\ z_2\ \dots\ z_m\,] = Q_R^{T} X$ → $(n-k) \times m$ matrix
• To derive a model of galaxy intensities on the sky, use $Z$ instead of $X$ (but...)
Choosing $k$:
$$\frac{\text{Variance after PCA (given } k\text{)}}{\text{Variance with average subtraction only}} < 0.05$$
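The selection rule for $k$ can be sketched as a loop over candidate values. The slides do not define "average subtraction", so this sketch *assumes* it means subtracting the mean over all detectors at each time sample (a common-mode estimate); the function names and the synthetic test signal are hypothetical. The cleaned variance is measured after projecting back into detector space, which anticipates Step 5.

```python
import numpy as np


def remove_components(X, Q, k):
    """Step 4: keep the n-k weakest components of normalized data X (n x m).

    Q must hold the eigenvectors as columns, ordered by decreasing eigenvalue.
    Returns Q_R (n x (n-k)) and Z = Q_R^T X ((n-k) x m).
    """
    Q_R = Q[:, k:]
    return Q_R, Q_R.T @ X


def average_subtract(X):
    # ASSUMPTION: "average subtraction" = subtract the mean over all detectors
    # at each time sample (a common-mode estimate); the slides do not define it.
    return X - X.mean(axis=0, keepdims=True)


def choose_k(X, Q, threshold=0.05):
    """Smallest k with Var(PCA-cleaned) / Var(average-subtracted) < threshold."""
    baseline = average_subtract(X).var()
    for k in range(1, X.shape[0]):
        Q_R, Z = remove_components(X, Q, k)
        cleaned = Q_R @ Z                 # back in detector space (see Step 5)
        if cleaned.var() < threshold * baseline:
            return k
    return None                           # no k < n meets the threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 8, 2000
    gains = np.linspace(0.5, 1.5, n)                  # synthetic detector gains
    atmosphere = rng.normal(scale=10.0, size=m)       # synthetic shared signal
    X = np.outer(gains, atmosphere) + rng.normal(scale=0.1, size=(n, m))
    Q = np.linalg.svd(X, full_matrices=False)[0]      # eigenvectors of X X^T, strongest first
    print(choose_k(X, Q))
```

In this synthetic case the shared signal has unequal gains across detectors, so common-mode subtraction leaves a residual that a single principal component removes.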
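Principal Component Analysis (PCA)
Step 5: Reconstruct the data without the correlated signal
• The sky position (RA/Dec) is known for each detector, not for each component, so an approximation of the data must be reconstructed in detector space to make an image
• $X_R = Q_R Z = Q_R Q_R^{T} X$ → $n \times m$ matrix with the correlated signal removed!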

[Figure: time streams before and after removing the correlated signal; scale bars 1 mV and 20 μV]

*Variance reduced by a factor of 50*
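Steps 3 to 5 combine into a short cleaning function, sketched here under the same assumptions as above (normalized n × m input with one row per detector, $n \le m$). It returns $X_R = Q_R Q_R^{T} X$, i.e. the data with the $k$ strongest components projected out; the function name and synthetic input are illustrative.

```python
import numpy as np


def clean_timestreams(X: np.ndarray, k: int) -> np.ndarray:
    """Step 5: return X_R = Q_R Q_R^T X, i.e. X with its k strongest
    principal components removed (X is normalized, n x m, rows = detectors)."""
    Q = np.linalg.svd(X, full_matrices=False)[0]   # eigenvectors of C, strongest first
    Q_R = Q[:, k:]                                  # n x (n-k)
    Z = Q_R.T @ X                                   # (n-k) x m
    return Q_R @ Z                                  # n x m, one cleaned time stream per detector


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 8, 2000
    shared = np.outer(np.linspace(0.5, 1.5, n), rng.normal(scale=10.0, size=m))
    X = shared + rng.normal(scale=0.1, size=(n, m))   # synthetic: shared signal + noise
    X_R = clean_timestreams(X, k=1)
    print(X_R.shape, X.var() / X_R.var())              # variance drops by a large factor
```

Because $Q_R$ has orthonormal columns, $Q_R Q_R^{T}$ is a projector: the cleaned data carry no remaining projection onto the removed components.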
Image of PKS J1127-1857
Make the map:
• Use the sky position of each detector at each time sample ($\mathrm{RA}_{ij}$, $\mathrm{Dec}_{ij}$) to bin the data onto an image grid
• Set the intensity of each image pixel to the average of the $x^{R}_{ij}$ values that fall into that bin
• Smooth the image with the telescope's point-spread response function (a Gaussian with FWHM = 30″)

[Figure: maps of PKS J1127-1857; panels: Average Subtraction, PCA Cleaned]

• raw data = 30 MB
• $t_{\mathrm{tot}}$ = 4 min
• 16640 samples/detector
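The map-making recipe can be sketched with NumPy alone: `np.histogram2d` sums and counts the samples per pixel, and a separable Gaussian kernel smooths the result. This is an illustrative stand-in, not the AzTEC pipeline. The pixel size (3″ in the usage block, giving FWHM = 10 pixels), the treatment of empty pixels as zero before smoothing, and all names are assumptions.

```python
import numpy as np

# sigma = FWHM / (2 sqrt(2 ln 2)) ~ FWHM / 2.355 for a Gaussian
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def bin_map(ra, dec, values, ra_edges, dec_edges):
    """Average every x^R_ij sample that lands in each (RA, Dec) pixel.

    ra, dec, values share a shape (e.g. n x m); pixels with no samples are NaN.
    """
    ra, dec, values = np.ravel(ra), np.ravel(dec), np.ravel(values)
    total, _, _ = np.histogram2d(ra, dec, bins=[ra_edges, dec_edges], weights=values)
    hits, _, _ = np.histogram2d(ra, dec, bins=[ra_edges, dec_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / hits


def smooth(image, fwhm_pix):
    """Convolve with a circular Gaussian beam whose FWHM is given in pixels.

    ASSUMPTION: empty (NaN) pixels are treated as zero before smoothing.
    """
    sigma = fwhm_pix * FWHM_TO_SIGMA
    half = int(np.ceil(4 * sigma))
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    img = np.nan_to_num(image)
    # A 2-D Gaussian is separable: smooth along RA, then along Dec.
    img = np.apply_along_axis(np.convolve, 0, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ra = rng.uniform(0.0, 180.0, size=(117, 1000))    # hypothetical RA offsets (arcsec)
    dec = rng.uniform(0.0, 180.0, size=(117, 1000))   # hypothetical Dec offsets (arcsec)
    x_R = rng.normal(size=(117, 1000))                # stand-in for cleaned data
    edges = np.arange(0.0, 181.0, 3.0)                # hypothetical 3" pixels
    image = smooth(bin_map(ra, dec, x_R, edges, edges), fwhm_pix=30.0 / 3.0)
    print(image.shape)                                # (60, 60)
```

The `mode="same"` convolution assumes the image is larger than the kernel along each axis; a production map maker would also weight pixels by their hit counts rather than zero-filling gaps.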
An Extragalactic Survey at λ = 1.1 mm
• Most galaxies are 100× fainter than PKS J1127-1857
• raw data ~ 25 GB
• $t_{\mathrm{tot}}$ ~ 80 hrs
• $\sim 2\times10^{7}$ samples/detector
• AzTEC/COSMOS survey
• 0.7 deg²
• 500× the area of the HUDF (Hubble Ultra Deep Field)
• 160 hrs versus 270 hrs (~11 days) for the HUDF
• 130 mm-bright galaxies

Aretxaga et al. 2011
An Extragalactic Survey at λ = 1.1 mm
• AzTEC-3
• Observed 1 Gyr after the Big Bang
• Starburst galaxy (SFR ~ 1000 M☉/yr)

Capak et al. 2011
