An ASCII data table refers to a tabular data set where the values are encoded in ASCII and are written in a row-oriented format where each table row is (effectively) on one line. Frequently these data files are somewhat free format in nature and do not conform to precisely definition standards such as FITS.
ASCII data tables are a fact of life for scientists and you need to be able to read and write them effectively.
astropy.io.ascii
¶The astropy.io.ascii package provides methods for reading and writing a wide range of ASCII data table formats from basic character-separated tables to astronomy-specific or even application-specific formats.
This package isn't the fastest ASCII table reader/writer available, but it is one of the most flexible and most of the time it should just work out of the box. [About speed, there is a Google Summer of Code 2014 student working on making it fast!]
The following shows a few of the ASCII formats that are available, while the section on Supported formats contains the full list.
Format | Description |
---|---|
Basic | basic table with customizable delimiters and header configurations |
CDS | CDS format table (also Vizier and ApJ machine readable tables) |
DAOphot | table from the IRAF DAOphot package |
Fixed width | table with fixed-width columns (see also Fixed-width Gallery) |
IPAC | IPAC format table |
LaTex | LaTeX table with data in the tabular environment |
SExtractor | SExtractor format table |
There are two equally valid ways of reading and writing ASCII tables within astropy:
astropy.io.ascii
moduleInternally there is no difference since they both end up calling the astropy.io.ascii
routines. It depends somewhat on your taste:
Functional interface
>>> from astropy.io import ascii
>>> pets = ascii.read(data)
>>> ascii.write(pets, 'pets.tex', format='latex') # format='<format_name>'
Table interface
>>> from astropy.table import Table
>>> pets = Table.read(data, format='ascii') # format='ascii'
>>> pets.write('pets.tex', format='ascii.latex') # format='ascii.<format_name>'
Overall we recommend using the Table interface, because this also lets you read and write many tables in FITS or VOTable format using the same syntax. Note that for ASCII tables there is absolutely no functional difference between the interfaces, while for FITS and VOTable the direct interfaces (astropy.io.fits
and astropy.io.votable
) have more functionality than the Table
interface.
First example:
import numpy as np
from astropy.table import Table
lines = """animal number
"siamese cat" 1
dog 3 """
pets = Table.read(lines, format='ascii')
pets
read()
¶The online documentation on Parameters for reading has more information.
table
argument -- Show me the table¶The required first argument for every table read call is table
, which provides access to the actual table data. It can be provided several different formats:
format
argument -- What's the format Kenneth?¶This specifies the file format, with available options describe in Supported Formats.
Table.read()
: specifying format='ascii'
is the generic "try to guess the format" option, or format='ascii.<format_name>'
.ascii.read()
: do not provide format
for the generic "guess the format" option, or format='<format_name>'
.lines = [r'\begin{table}',
r'\begin{tabular}{ l c r }',
r'X & Y & Z \\',
r'1 & 2 & 3 \\',
r'4 & 5 & 6 \\',
r'7 & 8 & 9 \\',
r'\end{tabular}',
r'\end{table}']
t = Table.read(lines, format='ascii.latex')
t
guess
argument -- To guess or not to guess¶The default behavior of ascii.read()
is that it tries to guess the table format. This often works:
Table.read(['# a | b', # Commented header contains column names
'1 | 2'],
format='ascii')
The reader uses the Will it blend strategy in reverse: throw the input data into each format reader and see if something nice comes out.
Guessing doesn't always work:
lines = ['year & event',
'2015 & Pigs fly']
try:
Table.read(lines, format='ascii')
except Exception as err:
print(err)
When this happens you need to give ascii.read
a bit of help. The first step is to turn off guessing and provide what you think is the right format. In this case this is a character-delimited table (using &
).
try:
Table.read(lines, format='ascii.basic', guess=False)
except Exception as err:
print(err)
Table.read(lines, format='ascii.basic', delimiter='&', guess=False)
delimiter
, comment
, and quotechar
-- Define table characteristics¶These arguments mostly apply to the basic
format:
delimiter
: the characters <SPACE>
, '|'
, ','
, '\t'
, and '\s'
(tab or space) are defaults.comment
: if this regular expression matches the beginning of a table line then that line will be discarded from header or data processing.quotechar
: This specifies the quote character and will typically be either the single or double quote character.lines = ['a * b', # Don't ever make a table like this!
'^hello * world^ * 5',
'! Comment line',
'more * values']
Table.read(lines, format='ascii.basic', delimiter='*', comment='!', quotechar='^')
header_start
, data_start
, and data_end
-- Locate the header and data¶The header is defined as the part of the table which contains column names and any other information such as column types and other metadata. Frequently the header is a single line of column names, but it maybe long or might not exist.
The data section is the part which has the data values.
header_start
: This specifies in the line index where the header begins. Comment lines are not included in this count and the counting starts from 0 (first non-comment line has index=0). If set to None this indicates that there is no header line and the column names will be auto-generated. The default is dependent on the format.data_start
: This specifies in the line index where the data lines begin where the counting starts from 0 and does not include comment lines. The default is dependent on the format.data_end
: If this is not None then it allows for excluding lines at the end that are not valid data lines. A negative value means to count from the end, so -1 would exclude the last line, -2 the last two lines, and so on.lines = ['Some irrelevant stuff here', # Line 0
'', # Blank (only whitespace) => Not included in count
'# Properly commented out line', # Comment => Not included
' col0 col1', # Line 1 (header start)
'', # Blank
'More junk from table creator', # Line 2
' 10 20 ', # Line 3 (data start)
'# comment line', # Comment
' 20 40 ', # Line 4 (-2, data end)
'/ Why do people do this?', # Line 5 (-1)
'/ Just to make life hard.' # Line 6 (-0)
]
Table.read(lines, format='ascii', header_start=1, data_start=3, data_end=-2, guess=False)
converters
argument -- Give me the right types¶Sometimes the automatic type inference doesn't do you wanted.
lines = """
char float
"1" 1
"2" 2
"3" 3
"4" 4
"""
t = Table.read(lines, format='ascii')
t.dtype
from astropy.io import ascii
converters = {'char': [ascii.convert_numpy(np.str)],
'float': [ascii.convert_numpy(np.float)]}
t = Table.read(lines, format='ascii', converters=converters)
t.dtype
names
, include_names
, exclude_names
-- What to call the columns and what you find interesting¶Some tables arrive nameless, that is to say there is no definition of the column names within the table. When that is the case ascii.read()
will automatically assign names like col0
, col1
and so on. These aren't very descriptive and the names
argument gives you a way to fix that on reading the table.
lines = """
1 2 3
4 5 6
"""
Table.read(lines, format='ascii', names=['a', 'b', 'c'])
Maybe someone has given you a table with tons of columns that are mostly uninteresting to you:
lines = """
a b c d e f g h i j k planet color
1 2 3 4 5 6 7 8 1 2 3 mars reddish
9 8 7 6 5 4 3 2 1 0 1 earth blue-ish
"""
Table.read(lines, format='ascii', include_names=['planet', 'color'])
fill_values
, fill_include_names
, fill_exclude_names
-- Handling missing or bad values¶This can be used to fill missing values in the table or replace strings with special meaning. See the Bad or missing values section for more information and examples.
The default is that any blank table values are treated as missing.
Unfortunately there isn't time here for the joy and madness of fill values!
Once you understand reading tables then writing is pretty simple and comparitively boring. The online documentation on Parameters for write() describes the arguments, most of which should be familiar by now.
Instead of going through the parameters exhaustively we'll just show a few examples here.
lines = """
X Y Z
1 2.0022 'hello world'
-99 4.01234 'so long world'
"""
t = ascii.read(lines) # For variety we show the functional interface
t['Y'].unit = 'm/s'
import sys
t.write(sys.stdout, format='ascii')
t.write(sys.stdout, format='ascii', delimiter=',') # Or just format='ascii.csv' in astropy 0.4 and later
ascii.write(t, format='aastex') # default functional interface output is sys.stdout
t.write(sys.stdout, format='ascii.ipac', formats={'Y': '%.4f'})
t.write(sys.stdout, format='ascii.ipac', names=['A', 'B', 'C'])
t.write(sys.stdout, format='ascii.latex', fill_values=[('-99', '...')])
First familiarize yourself with the available supported formats:
Read the following files in the data/ascii/
sub-directory and show the contents for each one. Try to use the minimum effort and number of non-default options possible.
The pattern will be Table.read('data/ascii/<filename>', format='...', ...)
.
HINT: You can examine each file with %cat <filename>
in IPython notebook (e.g. %cat data/ascii/basic.dat
).
'basic.dat'
'ipac.dat'
-- from NASA/IPAC Infrared Science Archive'daophot.dat.gz'
-- Hint: unzipping is automatic'cds/lhs2065.dat'
-- Hint: read the docs! Note that Vizier catalog results are in CDS format.'fixed_width.dat'
-- Hint: check out the Fixed width gallery.For the table from 'ipac.dat'
, there is table metadata that is in an ordered dictionary in the table meta
attribute. If you read the table into a variable named dat
then you would be looking at dat.meta
.
dat.meta
. What are they?keywords
and their values from dat.meta
, one per line in the format keyword: value
.colnames
.Read two slightly challenging tables:
hard1.dat
-- This is part of a real data file produced by a Sybase SQL query.hard2.dat
-- Your neighbor asked you to help computerize his garden and handed you this data file to read. You'll need to study up on fill_values
.Start from the table from 'ipac.dat'
and make a new table that has just first 2 lines of the first 3 columns. This is done with:
>>> dat3 = dat['designation', 'ra', 'dec'][:2]
Find the output format output that will create each of the following three outputs from the table t3
. You can use the pattern:
import sys
dat3.write(sys.stdout, format='...', ...)
designation,ra,dec,sigra,sigdec
J095542.55+690421.2,148.9273263,69.0725621,0.0401,0.039
J095540.45+690439.1,148.9185759,69.0775361,0.0519,0.0494
designation ra dec
------------------- ----------- ----------
J095542.55+690421.2 148.9273263 69.0725621
J095540.45+690439.1 148.9185759 69.0775361
\begin{table}
\begin{tabular}{ccc}
designation & ra & dec \\
J095542.55+690421.2 & 148.9273263 & 69.0725621 \\
J095540.45+690439.1 & 148.9185759 & 69.0775361 \\
\end{tabular}
\end{table}
dat3
table back out in IPAC format. Does it look like the output has the same units and metadata as the original?