org.das2.qds.util.AsciiParser
Class for reading ASCII tables into a QDataSet. This parses a file by breaking
it up into records, and passing the record off to a delegate record parser.
The record parser then breaks up the record into fields, and each field is
parsed by a delegate field parser. Each column of the table has a Unit, field name,
and field label associated with it.
Examples of record parsers include
DelimParser, which splits the record by a delimiter such as a tab or comma,
RegexParser, which processes each record with a regular expression to get the fields,
and FixedColumnsParser, which splits the record by character positions.
Examples of field parsers include DOUBLE_PARSER, which parses the value
as a double, and UNITS_PARSER, which uses the Unit attached to the column
to interpret the value.
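The two-stage delegation described above (split the record into fields, then parse each field) can be sketched with the standard library alone. This is an illustration only, not the AsciiParser implementation: the class RecordSplitSketch and all of its method names are hypothetical, and the three splitting strategies merely mirror the roles of DelimParser, RegexParser, and FixedColumnsParser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from org.das2.qds.util.AsciiParser.
class RecordSplitSketch {

    // Delimiter-style splitting: the delimiter is a regular expression such as "," or "\\s+".
    static String[] splitByDelim(String record, String delimRegex) {
        return record.trim().split(delimRegex);
    }

    // Regex-style splitting: each capturing group is one field; null means the record did not match.
    static String[] splitByRegex(String record, Pattern p) {
        Matcher m = p.matcher(record);
        if (!m.matches()) return null;
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) fields[i] = m.group(i + 1);
        return fields;
    }

    // Fixed-column splitting: field i spans [starts[i], ends[i]) in the record.
    static String[] splitByColumns(String record, int[] starts, int[] ends) {
        String[] fields = new String[starts.length];
        for (int i = 0; i < starts.length; i++) fields[i] = record.substring(starts[i], ends[i]);
        return fields;
    }

    // Field-parsing stage: every field is handed to Java's double parser.
    static double[] parseDoubles(String[] fields) {
        double[] out = new double[fields.length];
        for (int i = 0; i < fields.length; i++) out[i] = Double.parseDouble(fields[i].trim());
        return out;
    }

    public static void main(String[] args) {
        double[] a = parseDoubles(splitByDelim("1.0, 2.5, 3", "\\s*,\\s*"));
        double[] b = parseDoubles(splitByRegex("X 00005 00006", Pattern.compile("X (\\d+) (\\d+)")));
        double[] c = parseDoubles(splitByColumns("  1.5  2.5", new int[]{0, 5}, new int[]{5, 10}));
        System.out.println(a[1] + " " + b[0] + " " + c[1]);
    }
}
```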
When the first record with the correct number of fields is found but cannot
be parsed, that record is examined for field labels and units.
The skipLines property tells the parser to skip a given number of header lines
before attempting to parse records. The commentPrefix property identifies lines
to be ignored. In both the header and the comments, the parser looks for
propertyPattern, and when a line matches, the corresponding builder property
is set. Two Patterns are provided for convenience: NAME_COLON_VALUE_PATTERN and
NAME_EQUAL_VALUE_PATTERN.
Adapted to QDataSet model, Jeremy, May 2007.
NAME_COLON_VALUE_PATTERN
pattern for name:value.
NAME_EQUAL_VALUE_PATTERN
pattern for name=value.
PROPERTY_FIELD_NAMES
PROPERTY_FILE_HEADER
PROPERTY_FIRST_RECORD
PROPERTY_FIELD_PARSER
DELIM_COMMA
DELIM_TAB
DELIM_WHITESPACE
UNIT_UTC
Convenient unit for parsing UTC times.
PROP_HEADERDELIMITER
DOUBLE_PARSER
parses the field using Double.parseDouble, Java's double parser.
UNITS_PARSER
delegates to the unit object set for this field to parse the data.
ENUMERATION_PARSER
uses the EnumerationUnits for the field to create a Datum.
PROP_VALIDMIN
PROP_VALIDMAX
getDelimParser
getDelimParser( int fieldCount, String delim ) → DelimParser
provide more control to external codes by providing a way to assert that
an N-column delim parser should be used.
Parameters
fieldCount - the number of fields
delim - the delimiter pattern, such as "," or "\s+"
Returns:
the DelimParser.
getFieldCount
getFieldCount( ) → int
return the number of fields in each record. Note the RecordParsers
also have a fieldCount, which should be equal to this. This allows them
to be independent of the parser.
Returns:
int
getFieldIndex
getFieldIndex( String string ) → int
returns the index of the field. The field may be identified by its name, by a
default name such as "field0", or by its number, such as "0".
Returns -1 when the column is not identified.
Parameters
string - the label for the field, such as "field2" or "time"
Returns:
-1 or the index of the field.
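The lookup described above could be sketched as follows. The class FieldIndexSketch is hypothetical, and the order of the checks (discovered name first, then "fieldN", then a bare integer) is an assumption rather than something this page states.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of resolving a column identifier; not the library's code.
class FieldIndexSketch {

    // Returns the column index for a name, "fieldN", or "N"; -1 when nothing matches.
    static int fieldIndex(List<String> fieldNames, String s) {
        int i = fieldNames.indexOf(s);            // assumption: discovered names win
        if (i >= 0) return i;
        String digits = s.startsWith("field") ? s.substring(5) : s;
        try {
            int n = Integer.parseInt(digits);
            return (n >= 0 && n < fieldNames.size()) ? n : -1;
        } catch (NumberFormatException ex) {
            return -1;                            // neither a known name nor a number
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("time", "density", "speed");
        System.out.println(fieldIndex(names, "density"));
        System.out.println(fieldIndex(names, "field2"));
        System.out.println(fieldIndex(names, "bogus"));
    }
}
```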
getFieldLabels
getFieldLabels( ) → String[]
return the labels found for each field. If a label wasn't found,
then the name is returned.
Returns:
java.lang.String[]
getFieldNames
getFieldNames( ) → String[]
return the name of each field. field0, field1, ... are the default names when
names are not discovered in the table. Changing the array will not affect
internal representation.
Returns:
java.lang.String[]
getFieldUnits
getFieldUnits( ) → String[]
return the units that were associated with each field. This might also be
the channel label for spectrograms.
In "field0(str)" or "field0[str]" this is str.
Elements may be null if no units were found.
Returns:
java.lang.String[]
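As an illustration of the "field0(str)" and "field0[str]" convention, a column header can be separated into a name and a units string with one regular expression. The class HeaderUnitsSketch and its pattern are hypothetical and are not taken from AsciiParser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the pattern is an assumption, not the library's.
class HeaderUnitsSketch {

    // A name followed by optional "(units)" or "[units]".
    private static final Pattern NAME_UNITS =
        Pattern.compile("\\s*([^\\s\\(\\[]+)\\s*(?:\\(([^\\)]*)\\)|\\[([^\\]]*)\\])?\\s*");

    // Returns {name, units}; units is null when the header carries none.
    static String[] split(String header) {
        Matcher m = NAME_UNITS.matcher(header);
        if (!m.matches()) return new String[]{header.trim(), null};
        String units = m.group(2) != null ? m.group(2) : m.group(3);
        return new String[]{m.group(1), units};
    }

    public static void main(String[] args) {
        for (String h : new String[]{"density(cm^-3)", "Bz[nT]", "time"}) {
            String[] nu = split(h);
            System.out.println(nu[0] + " -> " + nu[1]);
        }
    }
}
```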
getFillValue
getFillValue( ) → double
return the fillValue. Numbers that parse to this value are considered
to be fill. Note that validMin and validMax may be used as well.
Returns:
Value of property fillValue.
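How fillValue, validMin, and validMax might combine can be sketched as a single predicate. The class FillSketch is hypothetical. The inclusive bounds follow the wording of setValidMin and setValidMax (values below the minimum or above the maximum are invalid), and treating NaN as invalid is an assumption.

```java
// Hypothetical sketch of a validity test; not the library's code.
class FillSketch {

    // A value is usable when it is not the fill value and lies within [validMin, validMax].
    static boolean isValid(double v, double fillValue, double validMin, double validMax) {
        if (Double.isNaN(v)) return false;        // assumption: NaN is never valid
        return v != fillValue && v >= validMin && v <= validMax;
    }

    public static void main(String[] args) {
        double fill = -1e31;
        System.out.println(isValid(5.0, fill, 0.0, 10.0));
        System.out.println(isValid(fill, fill, -1e38, 1e38));
        System.out.println(isValid(11.0, fill, 0.0, 10.0));
    }
}
```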
getHeaderDelimiter
getHeaderDelimiter( ) → String
get the header delimiter
Returns:
the header delimiter.
getRecordParser
getRecordParser( ) → RecordParser
Getter for property recordParser.
Returns:
Value of property recordParser.
getRegexForFormat
getRegexForFormat( String format ) → String
Convert FORTRAN (F77) style format to C-style format specifiers.
Parameters
format - for example "%5d%5d%9f%s"
Returns:
for example "d5,d5,f9,a"
See Also:
org.autoplot.metatree.MetadataUtil#normalizeFormatSpecifier
getRegexParser
getRegexParser( String regex ) → RegexParser
return a regex parser for the given regular expression. Groups are used
for the fields, for example getRegexParser( 'X (\d+) (\d+)' ) would
parse lines like "X 00005 00006".
Parameters
regex -
Returns:
the regex parser
getRegexParserForFormat
getRegexParserForFormat( String format ) → RegexParser
see private TimeParser(String formatString, Map fieldHandlers),
which is very similar.
- "%5d%5d%9f%s"
- "d5,d5,f9,a"
Parameters
format -
Returns:
org.das2.qds.util.AsciiParser.RegexParser
See Also:
org.das2.datum.TimeParser
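To give a feel for how a format such as "%5d%5d%9f%s" can drive a regular expression, here is a minimal sketch. The class FormatRegexSketch is hypothetical, and the regular expression it produces is an assumption that is probably not the one AsciiParser builds. Sized specifiers consume exactly that many characters, an unsized %s takes the next whitespace-delimited token, and literal text between specifiers is ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the regular expression AsciiParser actually builds.
class FormatRegexSketch {

    // Recognizes %<width>d, %<width>f (optionally with .precision) and %s.
    private static final Pattern SPEC = Pattern.compile("%(\\d*)(?:\\.\\d+)?([dfs])");

    // Build a regular expression with one capturing group per format specifier.
    static String regexFor(String format) {
        StringBuilder sb = new StringBuilder();
        Matcher m = SPEC.matcher(format);
        while (m.find()) {
            String width = m.group(1);
            if (width.isEmpty()) {
                sb.append("\\s*(\\S+)");                       // unsized: next token
            } else {
                sb.append("(.{").append(width).append("})");  // sized: exact character count
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String regex = regexFor("%5d%5d%9f%s");
        Matcher m = Pattern.compile(regex).matcher("    1    2   3.1400 abc");
        if (m.matches()) {
            System.out.println(m.group(1).trim() + " " + m.group(3).trim() + " " + m.group(4));
        }
    }
}
```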
getRichFields
getRichFields( ) → Map
returns the high rank rich fields in a map from NAME to LABEL.
NAME:<fieldX> or NAME:<fieldX-fieldY>
Returns:
the high rank rich fields in a map from NAME to LABEL.
getValidMax
getValidMax( ) → double
get the maximum valid value for any field.
Returns:
the validMax
getValidMin
getValidMin( ) → double
get the minimum valid value for any field.
Returns:
validMin
guessDelimParser
guessDelimParser( String line ) → DelimParser
Returns:
org.das2.qds.util.AsciiParser.DelimParser
guessFieldCount
guessFieldCount( String filename ) → int
return the field count that would result in the largest number of records parsed. The
entire file is scanned, and for each line the number of decimal fields is counted. At the end
of the scan, the fieldCount with the highest record count is returned.
Parameters
filename - the file name, a local file opened with a FileReader
Returns:
the apparent field count.
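The voting scheme described above can be sketched as follows. The class FieldCountSketch is hypothetical. It works on lines already held in memory rather than on a file name, and it counts tokens that Double.parseDouble accepts, which is only an approximation of "decimal fields".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of picking the most common field count; not the library's code.
class FieldCountSketch {

    // Count whitespace- or comma-separated tokens that Double.parseDouble accepts.
    static int countDecimalFields(String line) {
        int n = 0;
        for (String tok : line.trim().split("[\\s,]+")) {
            try {
                Double.parseDouble(tok);
                n++;
            } catch (NumberFormatException ex) {
                // not a number, so it does not count as a field
            }
        }
        return n;
    }

    // Return the field count seen on the largest number of lines, or 0 if no line has numbers.
    static int guessFieldCount(List<String> lines) {
        Map<Integer, Integer> votes = new HashMap<>();
        int best = 0, bestVotes = 0;
        for (String line : lines) {
            int n = countDecimalFields(line);
            if (n == 0) continue;
            int v = votes.merge(n, 1, Integer::sum);
            if (v > bestVotes) { bestVotes = v; best = n; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# header", "Time Density", "1 2 3", "4 5 6", "7 8");
        System.out.println(guessFieldCount(lines));
    }
}
```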
guessSkipAndDelimParser
guessSkipAndDelimParser( String filename ) → DelimParser
read in records, allowing for a header of non-records before
guessing the delim parser. This will return a reference to the
DelimParser and set skipLines. The DelimParser header field is set as well.
Parameters
filename -
Returns:
the record parser to use, or null if no records are found.
guessSkipLines
guessSkipLines( String filename, org.das2.qds.util.AsciiParser.RecordParser recParser ) → int
try to figure out how many lines to skip by looking for the line where
the number of fields becomes stable.
Parameters
filename -
recParser -
Returns:
int
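One way to look for the line where the field count becomes stable is sketched below. The class SkipLinesSketch, the stableRun parameter, and the requirement of more than one field per line are all assumptions made for illustration. The real method takes a file name and a RecordParser instead.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of finding where the field count settles; not the library's code.
class SkipLinesSketch {

    // Returns the index of the first line in the first run of at least `stableRun`
    // consecutive lines that split into the same number (more than one) of fields,
    // or 0 when no such run exists.
    static int guessSkipLines(List<String> lines, String delimRegex, int stableRun) {
        int runStart = 0;
        int runCount = -1;
        for (int i = 0; i < lines.size(); i++) {
            int n = lines.get(i).trim().split(delimRegex).length;
            if (n != runCount) {          // the field count changed, so start a new run
                runStart = i;
                runCount = n;
            }
            if (i - runStart + 1 >= stableRun && runCount > 1) return runStart;
        }
        return 0;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Title", "Created 2007", "a,b,c", "1,2,3", "4,5,6", "7,8,9");
        System.out.println(guessSkipLines(lines, ",", 3));
    }
}
```

In this sketch the column-name line "a,b,c" has the same field count as the data, so it is treated as the start of the stable region.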
isHeader
isHeader( int iline, String lastLine, String thisLine, int recCount ) → boolean
returns true if the line is a header or comment.
Parameters
iline - the line number in the file, starting with 0.
lastLine - the last line read.
thisLine - the line we are testing.
recCount - the number of records successfully read.
Returns:
true if the line is a header line.
isIso8601Time
isIso8601Time( String s ) → boolean
quick-n-dirty check to see if a string appears to be an ISO8601 time,
minimally 2000-002T00:00, but also 2000-01-01T00:00:00Z etc.
Note that an external code may explicitly indicate that the field is a time.
This is just to catch things that are obviously times.
Parameters
s -
Returns:
true if this is clearly an ISO time.
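A quick check of this kind can be written as one regular expression. The class IsoTimeSketch and its pattern are hypothetical. The pattern covers the two examples given above (year with day-of-year, and year-month-day) and is not a full ISO8601 validator.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a quick ISO8601 check; not the library's code.
class IsoTimeSketch {

    // Year, then month-day or 3-digit day of year, then "T" and at least HH:MM.
    private static final Pattern ISO = Pattern.compile(
        "\\d{4}-(\\d{2}-\\d{2}|\\d{3})T\\d{2}:\\d{2}(:\\d{2}(\\.\\d+)?)?Z?");

    static boolean looksLikeIsoTime(String s) {
        return ISO.matcher(s.trim()).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[]{"2000-002T00:00", "2000-01-01T00:00:00Z", "2000.5"}) {
            System.out.println(s + " " + looksLikeIsoTime(s));
        }
    }
}
```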
isKeepFileHeader
isKeepFileHeader( ) → boolean
Getter for property keepHeader.
Returns:
Value of property keepHeader.
isRichHeader
isRichHeader( String header ) → boolean
return true if the header appears to contain JSON code which could be
interpreted as a "Rich Header" (a.k.a. JSONHeadedASCII). This is
a very simple test, simply looking for #{ and #}
with a colon contained within.
Parameters
header - string containing the commented header.
Returns:
true if parsing as a Rich Header should be attempted.
See Also:
https://github.com/JSONheadedASCII/examples
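The simple test described above (a "#{", a later "#}", and a colon in between) could look like the following. The class RichHeaderSketch is hypothetical and the details of the real check may differ.

```java
// Hypothetical sketch of the Rich Header heuristic; not the library's code.
class RichHeaderSketch {

    // True when the header contains "#{" followed by "#}" with a colon between them.
    static boolean looksLikeRichHeader(String header) {
        int open = header.indexOf("#{");
        if (open < 0) return false;
        int close = header.indexOf("#}", open + 2);
        if (close < 0) return false;
        return header.substring(open + 2, close).indexOf(':') >= 0;
    }

    public static void main(String[] args) {
        String rich = "#{\n# \"density\": { \"UNITS\": \"cm^-3\" }\n#}\n";
        String plain = "# plain comment\n# another comment\n";
        System.out.println(looksLikeRichHeader(rich));
        System.out.println(looksLikeRichHeader(plain));
    }
}
```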
newParser
newParser( int fieldCount ) → AsciiParser
creates a parser with fieldCount fields, named "field0", "field1", ...
Parameters
fieldCount - the number of fields
Returns:
the file parser
readFile
readFile( String filename, ProgressMonitor mon ) → WritableDataSet
Parse the file using the current settings.
Parameters
filename - the file to read
mon - a monitor
Returns:
a rank 2 dataset.
readFirstParseableRecord
readFirstParseableRecord( String filename ) → String
returns the first record that the record parser parses successfully. The
recordParser should be set and configured enough to identify the fields.
If no records can be parsed, then null is returned.
The first record should be in the first 1000 lines.
Parameters
filename -
Returns:
the first parseable line, or null if no such line exists.
readFirstRecord
readFirstRecord( String filename ) → String
return the first record that the parser would parse. If skipLines is
more than the total number of lines, or all lines are comments, then null
is returned.
Parameters
filename -
Returns:
the first line after skip lines and comment lines.
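The combined effect of skipLines and commentPrefix on finding the first record can be sketched as below. The class FirstRecordSketch is hypothetical. It reads from a Reader instead of a file name so that the example is self-contained, and it wraps IOException in an unchecked exception for brevity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch of locating the first record; not the library's code.
class FirstRecordSketch {

    // Skip `skipLines` lines, then return the first line that does not start with
    // commentPrefix. Returns null when the input runs out first.
    static String firstRecord(Reader in, int skipLines, String commentPrefix) {
        try {
            BufferedReader r = new BufferedReader(in);
            for (int i = 0; i < skipLines; i++) {
                if (r.readLine() == null) return null;
            }
            String line;
            while ((line = r.readLine()) != null) {
                // a null prefix disables the comment check
                if (commentPrefix != null && line.startsWith(commentPrefix)) continue;
                return line;
            }
            return null;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        String text = "title\n# comment\n1 2 3\n4 5 6\n";
        System.out.println(firstRecord(new StringReader(text), 1, "#"));
    }
}
```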
readStream
readStream( java.io.Reader in, ProgressMonitor mon ) → WritableDataSet
Parse the stream using the current settings.
Parameters
in - the input stream
mon - a monitor
Returns:
org.das2.qds.WritableDataSet
setCommentPrefix
setCommentPrefix( String comment ) → void
Records starting with this are not processed as data, for example "#".
This is initially "#". Setting this to null disables this check.
Parameters
comment - the prefix
Returns:
void (returns nothing)
setDelimParser
setDelimParser( String filename, String delim ) → DelimParser
The DelimParser splits each record into fields using a delimiter like ","
or "\s+".
Parameters
filename - filename to read in.
delim - the delimiter, such as "," or "\t" or "\s+"
Returns:
the record parser that will split each line into fields
setFieldParser
setFieldParser( int field, org.das2.qds.util.AsciiParser.FieldParser fp ) → void
set the special parser for a field.
Parameters
field - the field number, 0 is the first column.
fp - the parser
Returns:
void (returns nothing)
setFillValue
setFillValue( double fillValue ) → void
numbers that parse to this value are considered to be fill.
Parameters
fillValue - New value of property fillValue.
Returns:
void (returns nothing)
setFixedColumnsParser
setFixedColumnsParser( String filename, String delim ) → FixedColumnsParser
looks at the first line after skipping, and splits it to calculate where
the columns are. The FixedColumnsParser is the fastest of the three parsers.
Parameters
filename - filename to read in.
delim - regex to split the initial line into the fixed columns.
Returns:
the record parser that will split each line.
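Deriving column positions from a template line might be done as in this sketch. The class FixedColumnsSketch is hypothetical. It assumes a whitespace-like delimiter that never matches the empty string, and it assumes each column runs from the end of the previous field to the end of the current one, so that right-justified values of differing widths still fall inside their column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of fixed-column splitting; not the library's code.
class FixedColumnsSketch {

    // Compute {start, end} character bounds for each column from the first line.
    static int[][] columnBounds(String firstLine, String delimRegex) {
        Matcher m = Pattern.compile(delimRegex).matcher(firstLine);
        List<int[]> bounds = new ArrayList<>();
        int fieldStart = 0;   // where the current field's text begins
        int colStart = 0;     // where the current column begins (includes padding)
        while (m.find()) {
            if (m.start() > fieldStart) {             // a non-empty field ends here
                bounds.add(new int[]{colStart, m.start()});
                colStart = m.start();
            }
            fieldStart = m.end();
        }
        if (fieldStart < firstLine.length()) {
            bounds.add(new int[]{colStart, firstLine.length()});
        }
        return bounds.toArray(new int[0][]);
    }

    // Cut a record at the precomputed bounds; no regular expression is needed per record.
    static String[] split(String record, int[][] bounds) {
        String[] fields = new String[bounds.length];
        for (int i = 0; i < bounds.length; i++) {
            int s = Math.min(bounds[i][0], record.length());
            int e = Math.min(bounds[i][1], record.length());
            fields[i] = record.substring(s, e).trim();
        }
        return fields;
    }

    public static void main(String[] args) {
        int[][] bounds = columnBounds("  1.50   2.25 10.0", "\\s+");
        String[] fields = split(" 12.75  -3.00 99.9", bounds);
        System.out.println(String.join("|", fields));
    }
}
```

Because each record is cut with plain substring calls, this style of parser avoids per-record pattern matching, which is consistent with the speed claim above.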
setHeaderDelimiter
setHeaderDelimiter( String headerDelimiter ) → void
set the delimiter which explicitly separates header from the data.
For example "-------" could be used. Normally the parser just looks at
the number of fields and this is sufficient.
Parameters
headerDelimiter -
Returns:
void (returns nothing)
setKeepFileHeader
setKeepFileHeader( boolean keepHeader ) → void
Setter for property keepHeader. This is false by default. When true, the file
header skipped by skipLines is stored in the property PROPERTY_FILE_HEADER.
Parameters
keepHeader - New value of property keepHeader.
Returns:
void (returns nothing)
setPropertyPattern
setPropertyPattern( java.util.regex.Pattern propertyPattern ) → void
specify the Pattern used to recognize properties. Note that property
values are not parsed; they are provided as Strings. This is a regular
expression with two groups, one for the property name and one for the value.
For example, (.+)=(.+)
Parameters
propertyPattern - regular expression Pattern with two groups.
Returns:
void (returns nothing)
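A two-group pattern of this kind can be applied to header lines as sketched below. The class PropertyPatternSketch and the particular pattern are hypothetical. The pattern is stricter than (.+)=(.+) so that the leading "#" is not captured as part of the name, and it is not claimed to equal NAME_EQUAL_VALUE_PATTERN.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of collecting name=value properties; not the library's code.
class PropertyPatternSketch {

    // Group 1 is the property name, group 2 the value; an optional "#" prefix is allowed.
    static final Pattern NAME_EQUALS_VALUE = Pattern.compile("#?\\s*(\\w+)\\s*=\\s*(.+?)\\s*");

    // Collect every matching line; values stay as Strings and are not interpreted.
    static Map<String, String> scan(List<String> lines, Pattern propertyPattern) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = propertyPattern.matcher(line);
            if (m.matches()) props.put(m.group(1), m.group(2));
        }
        return props;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("# instrument = MAG", "# sample_rate=8 Hz", "1 2 3");
        System.out.println(scan(header, NAME_EQUALS_VALUE));
    }
}
```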
setRecordCountLimit
setRecordCountLimit( int recordCountLimit ) → void
limit the number of records read. Parsing will stop once this number of
records is read. This is Integer.MAX_VALUE by default.
Parameters
recordCountLimit -
Returns:
void (returns nothing)
setRecordParser
setRecordParser( org.das2.qds.util.AsciiParser.RecordParser recordParser ) → void
Setter for property recordParser.
Parameters
recordParser - New value of property recordParser.
Returns:
void (returns nothing)
setRegexParser
setRegexParser( java.lang.String[] fieldNames ) → RecordParser
The regex parser is a slow parser, but gives precise control.
Parameters
fieldNames -
Returns:
the parser for each record.
setSkipLines
setSkipLines( int skipLines ) → void
skip a number of lines before trying to parse anything. This can be
set to point at the first valid line, and the RecordParser will be
configured using that line.
Parameters
skipLines -
Returns:
void (returns nothing)
setValidMax
setValidMax( double validMax ) → void
set the maximum value for any field. Values above this are to be
considered invalid.
Parameters
validMax -
Returns:
void (returns nothing)
setValidMin
setValidMin( double validMin ) → void
set the minimum valid value for any field. Values less than
this are to be considered invalid.
Parameters
validMin -
Returns:
void (returns nothing)
setWhereConstraint
setWhereConstraint( String sparm, String op, String sval ) → void
allow only records where the condition is true. For "eq" the data does not
need to be interpreted, because string equality is checked
for nominal data. Note that sval is compared after trimming surrounding spaces.
Parameters
sparm - column name, such as "field4"
op - constraint, one of eq gt ge lt le ne
sval - String value. For nominal columns, String equality is used.
Returns:
void (returns nothing)
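The six operators can be illustrated with a small predicate. The class WhereSketch is hypothetical, and the decision to try a numeric comparison first and fall back to string comparison for nominal values is an assumption made for this sketch.

```java
// Hypothetical sketch of evaluating a where constraint on one field; not the library's code.
class WhereSketch {

    // Returns true when the record should be kept. Both sides are trimmed first.
    static boolean accept(String fieldText, String op, String sval) {
        String a = fieldText.trim();
        String b = sval.trim();
        int cmp;
        try {
            cmp = Double.compare(Double.parseDouble(a), Double.parseDouble(b));
        } catch (NumberFormatException ex) {
            cmp = a.compareTo(b);   // nominal data: compare as strings
        }
        switch (op) {
            case "eq": return cmp == 0;
            case "ne": return cmp != 0;
            case "gt": return cmp > 0;
            case "ge": return cmp >= 0;
            case "lt": return cmp < 0;
            case "le": return cmp <= 0;
            default: throw new IllegalArgumentException("unknown op: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(accept("quiet", "eq", " quiet "));
        System.out.println(accept("3", "gt", "10"));
        System.out.println(accept(" 5.0", "ge", "5"));
    }
}
```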