Various data mining projects have been carried out by experts in this field

Astronomers have applied data mining algorithms to most of the different
applications in astronomy. In addition, several long-term research and mining
projects have been carried out by data mining experts using astronomical data,
because astronomy has produced numerous large datasets that are amenable to the
approach, as have other fields such as medicine and high energy physics.
Examples of such projects are SKICAT (the Sky Image Cataloging and Analysis
System), used for catalog production and for the analysis of catalogs from
digitized sky surveys, in particular the scans of the second Palomar
Observatory Sky Survey; JARtool (the Jet Propulsion Laboratory Adaptive
Recognition Tool), used to recognize volcanoes in the more than 30,000 images
of Venus returned by the Magellan mission; the subsequent and more general
Diamond Eye; and the Lawrence Livermore National Laboratory Sapphire project.





Classification is often a crucial preliminary step in the scientific method,
as it provides a way of organizing information that can be used to make
hypotheses and to compare with models. Two of the most useful concepts in
object classification are the completeness and the efficiency, also known as
recall and precision. They are defined in terms of true and false positives
(TP and FP) and true and false negatives (TN and FN). The completeness is the
fraction of objects that are genuinely of a given type that are classified as
that type, TP / (TP + FN), and the efficiency is the fraction of objects
classified as a given type that are genuinely of that type, TP / (TP + FP).
These two quantities are interesting astrophysically because, while one wants
both high completeness and high efficiency, there is usually a tradeoff
between them. The relative importance of each depends on the application. For
instance, an investigation of rare objects generally requires high
completeness while tolerating some contamination (lower efficiency), whereas
statistical clustering of cosmological objects requires high efficiency, even
at the cost of completeness.
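
As a minimal sketch of these two definitions, the following Python snippet counts TP, FN, and FP for one class and returns the completeness and efficiency. The function name and the toy labels are hypothetical and were invented for illustration; they are not taken from any survey.

```python
def completeness_and_efficiency(true_labels, predicted_labels, target):
    """Return (completeness, efficiency), i.e. (recall, precision), for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)  # true positives
    fn = sum(1 for t, p in pairs if t == target and p != target)  # false negatives
    fp = sum(1 for t, p in pairs if t != target and p == target)  # false positives
    completeness = tp / (tp + fn) if (tp + fn) else float("nan")
    efficiency = tp / (tp + fp) if (tp + fp) else float("nan")
    return completeness, efficiency


# Hypothetical toy labels, invented purely for illustration.
truth = ["quasar", "quasar", "quasar", "star", "star", "star", "star", "star"]
predicted = ["quasar", "quasar", "star", "quasar", "star", "star", "star", "star"]

completeness, efficiency = completeness_and_efficiency(truth, predicted, "quasar")
print(f"completeness = {completeness:.3f}, efficiency = {efficiency:.3f}")
```

In this toy case one quasar is missed and one star is mistaken for a quasar, so both quantities fall below 1. Loosening a classifier's selection would typically recover the missed quasar at the price of admitting more stars, which is the tradeoff described above.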



Star-Galaxy Separation

Because of their small physical size compared with their distance from us,
most stars are unresolved in photometric datasets and therefore appear as
point sources. Galaxies, despite being further away, generally subtend a
larger angle and appear as extended sources. However, other astrophysical
objects, such as quasars and supernovae, also appear as point sources. Thus,
the separation of a photometric catalog into stars and galaxies, or more
generally into stars, galaxies, and other objects, is an important problem.
The number of stars and galaxies in typical surveys (of order 10^8 or above)
requires that such separation be automated. This problem is well studied, and
automated approaches were employed before the current data mining algorithms
became popular, for instance during the digitization of photographic plates by
scanning machines such as the APM and DPOSS. Several data mining algorithms
have been applied, including artificial neural networks (ANN), decision trees
(DT), mixture modelling, and self-organizing maps (SOM), with most algorithms
achieving efficiencies of around 95% or better. Typically, this is done using
a set of measured morphological parameters derived from the survey photometry,
perhaps supplemented by colors or other information such as the seeing. The
advantage of the data mining approach is that all such information about each
object is easily incorporated.
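
To illustrate the idea of separating point sources from extended sources with a measured morphological parameter, here is a minimal sketch of a one-feature decision stump, the simplest possible decision tree. It is an assumption-laden toy, not the method of any particular survey: the "extendedness" feature, its values, and the function names are all hypothetical.

```python
def fit_stump(values, labels):
    """Find the threshold on one feature that best splits 'galaxy' (above) from 'star' (below)."""
    ordered = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct feature values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best_threshold, best_correct = None, -1
    for threshold in candidates:
        correct = sum(
            1
            for v, label in zip(values, labels)
            if ("galaxy" if v > threshold else "star") == label
        )
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold, best_correct / len(values)


def classify(value, threshold):
    """Label an object as extended ('galaxy') or point-like ('star')."""
    return "galaxy" if value > threshold else "star"


# Hypothetical training set: (extendedness in magnitudes, label). Invented values.
training = [
    (0.01, "star"), (0.03, "star"), (-0.02, "star"), (0.05, "star"),
    (0.30, "galaxy"), (0.55, "galaxy"), (0.21, "galaxy"), (0.80, "galaxy"),
]
values = [v for v, _ in training]
labels = [label for _, label in training]

threshold, training_accuracy = fit_stump(values, labels)
print(f"threshold = {threshold:.3f}, training accuracy = {training_accuracy:.2f}")
print(classify(0.02, threshold), classify(0.40, threshold))
```

A real classifier would use several parameters at once (morphology, colors, seeing) and would be evaluated on objects held out from training. The single cut is only meant to show where such extra information would enter.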


Galaxy Morphology

Galaxies come in a range of sizes and shapes, collectively known as
morphology. The best-known system for the morphological classification of
galaxies is the Hubble sequence of elliptical, spiral, barred spiral, and
irregular galaxies, along with various subclasses. This system correlates with
many physical properties known to be important in the formation and evolution
of galaxies. Because galaxy morphology is a complex phenomenon that correlates
with the underlying physics but is not unique to any one process, the Hubble
sequence has endured, despite being rather subjective and based on
visible-light morphology originally derived from blue-biased photographic
plates. The Hubble sequence has been extended in various ways, and for data
mining purposes the T system has been used extensively. This system maps the
categorical Hubble types E, S0, Sa, Sb, Sc, Sd, and Irr onto the numerical
values -5 to 10. One can train a supervised algorithm to assign T types to
images for which measured parameters are available. Such parameters can be
purely morphological, or can include other information such as color. A series
of papers by Lahav and collaborators does exactly this, applying ANNs to
predict the T type of galaxies at low redshift and finding accuracy similar to
that of human experts. ANNs have also been applied to higher-redshift data to
distinguish between normal and peculiar galaxies, and the fundamentally
topological and unsupervised SOM ANN has been used to classify galaxies from
Hubble Space Telescope images, where the initial distribution of classes is
not known. Likewise, ANNs have been used to obtain morphological types from
galaxy spectra.
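
The T system mapping can be sketched as a simple lookup that turns a categorical Hubble type into a numerical target for a supervised algorithm, and turns a continuous prediction back into a class. The end points of -5 and 10 come from the text. The intermediate values below are commonly quoted representative values and should be treated as an assumption of this sketch, as should the function names.

```python
# Representative T values for the main Hubble classes. The end points (-5, 10)
# come from the text; the intermediate values are assumed for this sketch.
T_TYPE = {"E": -5, "S0": -2, "Sa": 1, "Sb": 3, "Sc": 5, "Sd": 7, "Irr": 10}


def hubble_to_t(hubble_class):
    """Encode a categorical Hubble class as a numerical regression target."""
    return T_TYPE[hubble_class]


def t_to_hubble(t_value):
    """Decode a continuous predicted T value to the nearest listed Hubble class."""
    return min(T_TYPE, key=lambda name: abs(T_TYPE[name] - t_value))


targets = [hubble_to_t(name) for name in ["E", "Sb", "Irr"]]
print(targets)
print(t_to_hubble(2.6), t_to_hubble(-4.2), t_to_hubble(8.9))
```

Treating the type as a number lets a regression algorithm express that an Sa galaxy is "closer" to an Sb than to an Irr, which a purely categorical label would not capture.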






An area of astrophysics that has grown greatly in popularity in the last few
years is the estimation of redshifts from photometric data (photo-zs). This is
because, although the resulting distances are less accurate than those
obtained from spectra, the sheer number of objects with photometric
measurements can often make up for the reduction in individual accuracy by
suppressing the statistical noise of an ensemble calculation. The two most
common approaches to photo-zs are the template method and the empirical
training set method. The template approach has many complicating issues,
including calibration, zero-points, priors, multi-wavelength performance
(e.g., poor in the mid-infrared), and difficulty handling missing or
incomplete training data. This review focuses on the empirical approach, as it
is an implementation of supervised learning.
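
The point about ensemble noise can be made concrete with a short calculation. The sketch below assumes independent, unbiased errors of equal size, so that the uncertainty on an ensemble mean scales as the per-object scatter divided by the square root of the number of objects. The sample sizes and scatters are hypothetical, and systematic errors, which do not average down, are ignored.

```python
import math


def ensemble_uncertainty(per_object_sigma, n_objects):
    """Standard error of a mean over n independent objects with equal scatter."""
    return per_object_sigma / math.sqrt(n_objects)


# Hypothetical numbers: a small, precise spectroscopic sample versus a large,
# less precise photometric sample.
spec_error = ensemble_uncertainty(per_object_sigma=0.001, n_objects=1_000)
photo_error = ensemble_uncertainty(per_object_sigma=0.02, n_objects=1_000_000)

print(f"spectroscopic ensemble error: {spec_error:.2e}")
print(f"photometric ensemble error:   {photo_error:.2e}")
```

Under these assumed numbers the photometric sample gives the smaller ensemble error even though each individual redshift is twenty times less precise.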


Galaxies

At low redshifts, the calculation of photometric redshifts for normal galaxies
is quite straightforward because of the break in the typical galaxy spectrum
at 4000 Å. Thus, as a galaxy is redshifted with increasing distance, its color
(measured as a difference in magnitudes) changes relatively smoothly. As a
result, both template and empirical photo-z approaches obtain similar results:
a root-mean-square deviation of about 0.02 in redshift, which is close to the
best possible result given the intrinsic spread in galaxy properties. This has
been shown with ANNs, SVMs, DTs, kNN, empirical polynomial relations, numerous
template-based studies, and several other methods.

At higher redshifts, achieving accurate results becomes more difficult because
the 4000 Å break is shifted redward of the optical, galaxies are fainter and
thus spectral data are sparser, and galaxies intrinsically evolve over time.
While supervised learning has been used successfully, beyond the spectroscopic
regime the obvious limitation arises that extrapolation would be required in
order to reach the limiting magnitude of the photometric portions of surveys.
In this regime, or where only small training sets are available,
template-based results can be used, but without spectral information the
templates themselves are being extrapolated. That extrapolation is, however,
done in a more physically motivated manner. It is likely that the more general
hybrid approach of using empirical data to iteratively improve the templates,
or a semi-supervised procedure, will ultimately provide a more elegant
solution. Another issue at higher redshift is that the available number of
objects can become quite small (in the hundreds or fewer), thus reintroducing
the curse of dimensionality through a simple lack of objects compared with the
number of measured wavebands. Dimension reduction methods can help to mitigate
this effect.
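
The empirical training set approach can be sketched with kNN, one of the methods listed above. The example assumes a hypothetical, smooth, linear color-redshift relation with Gaussian photometric noise. The data are synthetic, and the coefficients, noise level, and function names are invented for illustration. The redshift of each test object is estimated as the mean known redshift of its nearest training objects in color space.

```python
import math
import random


def knn_photo_z(colors, training_colors, training_z, k=5):
    """Estimate a redshift as the mean redshift of the k nearest training objects in color space."""
    order = sorted(
        range(len(training_colors)),
        key=lambda i: math.dist(colors, training_colors[i]),
    )
    nearest = order[:k]
    return sum(training_z[i] for i in nearest) / k


def synthetic_colors(z, rng):
    """Hypothetical smooth color-redshift relation plus Gaussian photometric noise."""
    return (
        0.5 + 2.0 * z + rng.gauss(0.0, 0.02),
        0.3 + 1.0 * z + rng.gauss(0.0, 0.02),
    )


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic "spectroscopic" training set with known redshifts from 0 to 0.4.
train_z = [0.002 * i for i in range(201)]
train_colors = [synthetic_colors(z, rng) for z in train_z]

# Synthetic test set drawn from the same relation at different redshifts.
test_z = [0.001 + 0.008 * i for i in range(50)]
test_colors = [synthetic_colors(z, rng) for z in test_z]

predicted = [knn_photo_z(c, train_colors, train_z) for c in test_colors]
rms = math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, test_z)) / len(test_z))
print(f"RMS redshift error on the synthetic test set: {rms:.3f}")
```

Because a kNN estimate is an average of training redshifts, it can never fall outside the redshift range of the training set. This is a direct illustration of why purely empirical methods cannot extrapolate beyond the spectroscopic regime.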