org.das2.qds.util.AsciiParser

Class for reading ASCII tables into a QDataSet. This parses a file by breaking it up into records and passing each record off to a delegate record parser. The record parser then breaks up the record into fields, and each field is parsed by a delegate field parser. Each column of the table has a Unit, field name, and field label associated with it.

Examples of record parsers include DelimParser, which splits the record by a delimiter such as a tab or comma; RegexParser, which processes each record with a regular expression to get the fields; and FixedColumnsParser, which splits the record by character positions. Examples of field parsers include DOUBLE_PARSER, which parses the value as a double, and UNITS_PARSER, which uses the Unit attached to the column to interpret the value.

When the first record with the correct number of fields is found but is not parseable, we look for field labels and units. The skipLines property tells the parser to skip a given number of header lines before attempting to parse the record. Also, commentPrefix identifies lines to be ignored. In either the header or in comments, we look for propertyPattern, and if a property is matched, then the builder property is set. Two Patterns are provided for convenience: NAME_COLON_VALUE_PATTERN and NAME_EQUAL_VALUE_PATTERN.

Adapted to QDataSet model, Jeremy, May 2007.
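
The two-stage design described above (a record stage that splits a line into fields, then a field stage that parses each field) can be illustrated with a minimal sketch that uses only the Java standard library. It is not the AsciiParser implementation: the class RecordPipelineSketch and its parse method are hypothetical names, and the handling of blank lines is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative two-stage parse; hypothetical, not the AsciiParser code. */
final class RecordPipelineSketch {

    /**
     * Skips skipLines header lines and any comment lines, splits each
     * remaining record on the delimiter regex, and parses every field
     * as a double.
     */
    static List<double[]> parse(List<String> lines, int skipLines,
                                String commentPrefix, String delimRegex) {
        List<double[]> records = new ArrayList<>();
        for (int i = skipLines; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines carry no record
            }
            if (commentPrefix != null && line.startsWith(commentPrefix)) {
                continue; // comment lines are ignored
            }
            String[] fields = line.trim().split(delimRegex); // record stage
            double[] values = new double[fields.length];
            for (int j = 0; j < fields.length; j++) {
                values[j] = Double.parseDouble(fields[j].trim()); // field stage
            }
            records.add(values);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Example file", "# a comment", "1.0, 2.5", "3.0, 4.5");
        List<double[]> recs = parse(lines, 1, "#", ",");
        System.out.println(recs.size() + " records parsed");
    }
}
```

Keeping the record stage and the field stage separate is what lets a delimiter, regex, or fixed-column splitter be swapped in without touching how individual values are interpreted.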


NAME_COLON_VALUE_PATTERN

pattern for name:value.


NAME_EQUAL_VALUE_PATTERN

pattern for name=value.


PROPERTY_FIELD_NAMES


PROPERTY_FILE_HEADER


PROPERTY_FIRST_RECORD


PROPERTY_FIELD_PARSER


DELIM_COMMA


DELIM_TAB


DELIM_WHITESPACE


UNIT_UTC

Convenient unit for parsing UTC times.


PROP_HEADERDELIMITER


DOUBLE_PARSER

parses the field using Double.parseDouble, Java's double parser.


UNITS_PARSER

delegates to the unit object set for this field to parse the data.


ENUMERATION_PARSER

uses the EnumerationUnits for the field to create a Datum.


PROP_VALIDMIN


PROP_VALIDMAX


getDelimParser

getDelimParser( int fieldCount, String delim ) → DelimParser

Gives external code more control by providing a way to assert that an N-column DelimParser should be used.

Parameters

fieldCount - the number of columns (N) in each record
delim - the delimiter pattern, such as "," or "\s+"

Returns:

the DelimParser.



getFieldCount

getFieldCount( ) → int

return the number of fields in each record. Note the RecordParsers also have a fieldCount, which should be equal to this. This allows them to be independent of the parser.

Returns:

int



getFieldIndex

getFieldIndex( String string ) → int

Returns the index of the field. The field may be identified by its name, by a default name such as field0, or by a number such as 0. Returns -1 when the column is not identified.

Parameters

string - the label for the field, such as "field2" or "time"

Returns:

-1 or the index of the field.
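
The lookup rules described for this method (a field name, a default name such as "field2", or a bare index, with -1 for anything else) can be sketched as follows. This is an illustration only, assuming names are tried before numeric forms; FieldIndexSketch and fieldIndex are hypothetical names, not part of AsciiParser.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of resolving a field specifier to a column index. */
final class FieldIndexSketch {

    /** Returns the column index for a name, "fieldN", or "N"; -1 if none applies. */
    static int fieldIndex(List<String> names, String s) {
        int byName = names.indexOf(s);
        if (byName >= 0) {
            return byName; // assumption: an exact name match wins
        }
        String digits = s.startsWith("field") ? s.substring(5) : s;
        if (digits.matches("\\d{1,9}")) { // at most 9 digits, so parseInt cannot overflow
            int i = Integer.parseInt(digits);
            if (i < names.size()) {
                return i;
            }
        }
        return -1; // column not identified
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("time", "density", "speed");
        System.out.println(fieldIndex(names, "density"));
        System.out.println(fieldIndex(names, "field2"));
        System.out.println(fieldIndex(names, "bogus"));
    }
}
```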



getFieldLabels

getFieldLabels( ) → String[]

return the labels found for each field. If a label wasn't found, then the name is returned.

Returns:

java.lang.String[]



getFieldNames

getFieldNames( ) → String[]

return the name of each field. field0, field1, ... are the default names when names are not discovered in the table. Changing the returned array will not affect the internal representation.

Returns:

java.lang.String[]



getFieldUnits

getFieldUnits( ) → String[]

return the units that were associated with each field. This might also be the channel label for spectrograms. In "field0(str)" or "field0[str]" this is str. Elements may be null if no units were found.

Returns:

java.lang.String[]
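
The "field0(str)" and "field0[str]" forms mentioned above can be separated into a name and a unit string with a regular expression. The sketch below is a standard-library illustration under the assumption that the unit is whatever sits inside the trailing parentheses or brackets; HeaderTokenSketch and split are hypothetical names, and the actual parser may treat edge cases differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: split a column header token into name and unit. */
final class HeaderTokenSketch {

    // A name, followed by an optional unit in parentheses or square brackets.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*([^\\[(]+?)\\s*(?:\\(([^)]*)\\)|\\[([^\\]]*)\\])?\\s*");

    /** Returns {name, unit}; unit is null when the token carries none. */
    static String[] split(String token) {
        Matcher m = TOKEN.matcher(token);
        if (!m.matches()) {
            return new String[] { token.trim(), null };
        }
        String unit = m.group(2) != null ? m.group(2) : m.group(3);
        return new String[] { m.group(1), unit };
    }

    public static void main(String[] args) {
        String[] parts = split("Bx(nT)");
        System.out.println(parts[0] + " has unit " + parts[1]);
    }
}
```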



getFillValue

getFillValue( ) → double

return the fillValue. numbers that parse to this value are considered to be fill. Note validMin and validMax may be used as well.

Returns:

Value of property fillValue.



getHeaderDelimiter

getHeaderDelimiter( ) → String

get the header delimiter

Returns:

the header delimiter.



getRecordParser

getRecordParser( ) → RecordParser

Getter for property recordParser.

Returns:

Value of property recordParser.



getRegexForFormat

getRegexForFormat( String format ) → String

Convert FORTRAN (F77) style format to C-style format specifiers.

Parameters

format - for example "%5d%5d%9f%s"

Returns:

for example "d5,d5,f9,a"

See Also:

org.autoplot.metatree.MetadataUtil#normalizeFormatSpecifier




getRegexParser

getRegexParser( String regex ) → RegexParser

return a regex parser for the given regular expression. Groups are used for the fields; for example, getRegexParser( 'X (\d+) (\d+)' ) would parse lines like "X 00005 00006".

Parameters

regex -

Returns:

the regex parser
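
How capturing groups map to fields can be seen with java.util.regex directly. The sketch below reuses the expression and sample line from the description; it assumes the whole line must match, which is a choice made for the example rather than documented behavior, and RegexRecordSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: one field per capturing group. */
final class RegexRecordSketch {

    /** Returns one string per group, or null when the line does not match. */
    static String[] fields(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.matches()) { // assumption: the entire line must match
            return null;
        }
        String[] result = new String[m.groupCount()];
        for (int i = 0; i < result.length; i++) {
            result[i] = m.group(i + 1); // group 0 is the whole match, so fields start at 1
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("X (\\d+) (\\d+)");
        System.out.println(Arrays.toString(fields(p, "X 00005 00006")));
    }
}
```

Note that in Java source the backslashes are doubled, so the expression 'X (\d+) (\d+)' is written "X (\\d+) (\\d+)".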



getRegexParserForFormat

getRegexParserForFormat( String format ) → RegexParser

see private TimeParser(String formatString, Map fieldHandlers), which is very similar.

Parameters

format -

Returns:

org.das2.qds.util.AsciiParser.RegexParser

See Also:

org.das2.datum.TimeParser




getRichFields

getRichFields( ) → Map

returns the high rank rich fields in a map from NAME to LABEL. NAME:>fieldX< or NAME:>fieldX-fieldY<

Returns:

the high rank rich fields in a map from NAME to LABEL.



getValidMax

getValidMax( ) → double

get the maximum valid value for any field.

Returns:

the validMax



getValidMin

getValidMin( ) → double

get the minimum valid value for any field.

Returns:

validMin



guessDelimParser

guessDelimParser( String line ) → DelimParser

Returns:

org.das2.qds.util.AsciiParser.DelimParser



guessFieldCount

guessFieldCount( String filename ) → int

return the field count that would result in the largest number of records parsed. The entire file is scanned, and for each line the number of decimal fields is counted. At the end of the scan, the fieldCount with the highest record count is returned.

Parameters

filename - the file name, a local file opened with a FileReader

Returns:

the apparent field count.
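
The voting scheme described above (count the decimal fields on every line, then return the count seen on the most lines) can be sketched with the standard library alone. This illustration works on lines already in memory rather than on a file, and it assumes fields are separated by whitespace, commas, or semicolons; FieldCountSketch is a hypothetical name and the real scan may differ in detail.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of guessing the field count by majority vote. */
final class FieldCountSketch {

    /** Returns the numeric-field count shared by the most lines, or 0 if none. */
    static int guessFieldCount(List<String> lines) {
        Map<Integer, Integer> votes = new HashMap<>();
        for (String line : lines) {
            int n = 0;
            for (String tok : line.trim().split("[\\s,;]+")) {
                if (isDecimal(tok)) {
                    n++;
                }
            }
            if (n > 0) {
                votes.merge(n, 1, Integer::sum); // one vote per line
            }
        }
        int best = 0;
        int bestVotes = 0;
        for (Map.Entry<Integer, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestVotes) { // ties are broken arbitrarily here
                best = e.getKey();
                bestVotes = e.getValue();
            }
        }
        return best;
    }

    private static boolean isDecimal(String tok) {
        try {
            Double.parseDouble(tok);
            return true;
        } catch (NumberFormatException ex) {
            return false; // also covers the empty token
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Title line", "time dens temp", "1 2 3", "4 5 6", "7 8");
        System.out.println(guessFieldCount(lines));
    }
}
```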



guessSkipAndDelimParser

guessSkipAndDelimParser( String filename ) → DelimParser

read in records, allowing for a header of non-records before guessing the delim parser. This will return a reference to the DelimParser and set skipLines. DelimParser header field is set as well.

Parameters

filename -

Returns:

the record parser to use, or null if no records are found.



guessSkipLines

guessSkipLines( String filename, org.das2.qds.util.AsciiParser.RecordParser recParser ) → int

try to figure out how many lines to skip by looking for the line where the number of fields becomes stable.

Parameters

filename -
recParser -

Returns:

int



isHeader

isHeader( int iline, String lastLine, String thisLine, int recCount ) → boolean

returns true if the line is a header or comment.

Parameters

iline - the line number in the file, starting with 0.
lastLine - the last line read.
thisLine - the line we are testing.
recCount - the number of records successfully read.

Returns:

true if the line is a header line.



isIso8601Time

isIso8601Time( String s ) → boolean

quick-n-dirty check to see if a string appears to be an ISO8601 time. Minimally 2000-002T00:00, but also 2000-01-01T00:00:00Z, etc. Note that external code may explicitly indicate that the field is a time; this is just to catch things that are obviously times.

Parameters

s -

Returns:

true if this is clearly an ISO time.
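
A check of this kind can be approximated with a single regular expression. The sketch below accepts the two example forms given in the description (day-of-year and year-month-day dates, with a time to at least minutes and an optional Z); the exact set of forms the real method accepts is not stated here, so the pattern is an assumption and IsoTimeSketch is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of a quick ISO8601-looking test. */
final class IsoTimeSketch {

    // yyyy-ddd or yyyy-mm-dd, then 'T', hh:mm, optional :ss(.fff), optional Z
    private static final Pattern ISO = Pattern.compile(
        "\\d{4}-(?:\\d{3}|\\d{2}-\\d{2})T\\d{2}:\\d{2}(?::\\d{2}(?:\\.\\d+)?)?Z?");

    static boolean looksLikeIsoTime(String s) {
        return ISO.matcher(s.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIsoTime("2000-002T00:00"));
        System.out.println(looksLikeIsoTime("12.5"));
    }
}
```

The pattern only checks shape; it does not verify that the month, day, or hour values are in range.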



isKeepFileHeader

isKeepFileHeader( ) → boolean

Getter for property keepHeader.

Returns:

Value of property keepHeader.



isRichHeader

isRichHeader( String header ) → boolean

return true if the header appears to contain JSON code which could be interpreted as a "Rich Header" (a.k.a. JSONHeadedASCII). This is a very simple test, simply looking for #{ and #} with a colon contained within.

Parameters

header - string containing the commented header.

Returns:

true if parsing as a Rich Header should be attempted.

See Also:

https://github.com/JSONheadedASCII/examples
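
The simple test described above (find #{ and #} with a colon somewhere between them) might look like the following sketch. It is an illustration written from the description, not the library code; RichHeaderSketch and looksLikeRichHeader are hypothetical names.

```java
/** Hypothetical sketch of the "#{ ... : ... #}" rich-header test. */
final class RichHeaderSketch {

    static boolean looksLikeRichHeader(String header) {
        int open = header.indexOf("#{");
        if (open < 0) {
            return false;
        }
        int close = header.indexOf("#}", open + 2);
        if (close < 0) {
            return false;
        }
        // A colon between the markers suggests a JSON name:value pair.
        return header.substring(open + 2, close).contains(":");
    }

    public static void main(String[] args) {
        String header = "#{\n# \"B\": { \"UNITS\": \"nT\" }\n#}\n";
        System.out.println(looksLikeRichHeader(header));
    }
}
```

A test this loose only decides whether a full JSON parse is worth attempting; the parse itself is what determines whether the header is valid.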




newParser

newParser( int fieldCount ) → AsciiParser

creates a parser with fieldCount fields, named "field0", "field1", and so on.

Parameters

fieldCount - the number of fields

Returns:

the file parser



readFile

readFile( String filename, ProgressMonitor mon ) → WritableDataSet

Parse the file using the current settings.

Parameters

filename - the file to read
mon - a monitor

Returns:

a rank 2 dataset.



readFirstParseableRecord

readFirstParseableRecord( String filename ) → String

returns the first record that the record parser parses successfully. The recordParser should be set and configured enough to identify the fields. If no records can be parsed, then null is returned. The first record should be in the first 1000 lines.

Parameters

filename -

Returns:

the first parseable line, or null if no such line exists.



readFirstRecord

readFirstRecord( String filename ) → String

return the first record that the parser would parse. If skipLines is more than the total number of lines, or all lines are comments, then null is returned.

Parameters

filename -

Returns:

the first line after skip lines and comment lines.



readStream

readStream( java.io.Reader in, ProgressMonitor mon ) → WritableDataSet

Parse the stream using the current settings.

Parameters

in - the input stream
mon - a monitor

Returns:

org.das2.qds.WritableDataSet



setCommentPrefix

setCommentPrefix( String comment ) → void

Records starting with this are not processed as data, for example "#". This is initially "#". Setting this to null disables this check.

Parameters

comment - the prefix

Returns:

void (returns nothing)



setDelimParser

setDelimParser( String filename, String delim ) → DelimParser

The DelimParser splits each record into fields using a delimiter like "," or "\\s+".

Parameters

filename - filename to read in.
delim - the delimiter, such as "," or "\t" or "\s+"

Returns:

the record parser that will split each line into fields



setFieldParser

setFieldParser( int field, org.das2.qds.util.AsciiParser.FieldParser fp ) → void

set the special parser for a field.

Parameters

field - the field number, 0 is the first column.
fp - the parser

Returns:

void (returns nothing)



setFillValue

setFillValue( double fillValue ) → void

numbers that parse to this value are considered to be fill.

Parameters

fillValue - New value of property fillValue.

Returns:

void (returns nothing)



setFixedColumnsParser

setFixedColumnsParser( String filename, String delim ) → FixedColumnsParser

looks at the first line after skipping, and splits it to calculate where the columns are. The FixedColumnsParser is the fastest of the three parsers.

Parameters

filename - filename to read in.
delim - regex to split the initial line into the fixed columns.

Returns:

the record parser that will split each line.
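
The idea of deriving column positions from one line and then cutting every later line at those same offsets can be sketched as below. This is a standard-library illustration that assumes each field starts at the same character offset on every line as it does on the first; FixedColumnsSketch and its methods are hypothetical names, not the FixedColumnsParser API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of fixed-column splitting. */
final class FixedColumnsSketch {

    /** Start offset of each column, located by finding the delimiter in the first line. */
    static int[] columnStarts(String firstLine, String delimRegex) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        Matcher m = Pattern.compile(delimRegex).matcher(firstLine);
        while (m.find()) {
            // Ignore empty matches and leading or trailing delimiters.
            if (m.end() > m.start() && m.start() > 0 && m.end() < firstLine.length()) {
                starts.add(m.end());
            }
        }
        return starts.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Cuts a line at the given offsets; no regex work is needed per record. */
    static String[] split(String line, int[] starts) {
        String[] fields = new String[starts.length];
        for (int i = 0; i < starts.length; i++) {
            int begin = Math.min(starts[i], line.length());
            int end = i + 1 < starts.length
                ? Math.min(starts[i + 1], line.length())
                : line.length();
            fields[i] = line.substring(begin, end).trim();
        }
        return fields;
    }

    public static void main(String[] args) {
        int[] starts = columnStarts("2000-001  1.50  20.0", "\\s+");
        System.out.println(Arrays.toString(split("2000-002  2.75  31.5", starts)));
    }
}
```

Because each record is cut with substring calls at precomputed offsets, no pattern matching happens per line, which is one plausible reason a fixed-column approach can be fast.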



setHeaderDelimiter

setHeaderDelimiter( String headerDelimiter ) → void

set the delimiter which explicitly separates header from the data. For example "-------" could be used. Normally the parser just looks at the number of fields and this is sufficient.

Parameters

headerDelimiter -

Returns:

void (returns nothing)



setKeepFileHeader

setKeepFileHeader( boolean keepHeader ) → void

Setter for property keepHeader. This is false by default; when true, the file header ignored by skipLines is put into the property PROPERTY_FILE_HEADER.

Parameters

keepHeader - New value of property keepHeader.

Returns:

void (returns nothing)



setPropertyPattern

setPropertyPattern( java.util.regex.Pattern propertyPattern ) → void

specify the Pattern used to recognize properties. Note property values are not parsed, they are provided as Strings. This is a regular expression with two groups for the property name and value. For example, (.+)=(.+)

Parameters

propertyPattern - regular expression Pattern with two groups.

Returns:

void (returns nothing)
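
A two-group pattern such as (.+)=(.+) can be applied to header or comment lines as in the sketch below. It uses only java.util.regex and keeps values as Strings, as noted above. Stripping the comment prefix and trimming whitespace before matching are assumptions made for the example, and PropertyPatternSketch is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of collecting name/value properties from header lines. */
final class PropertyPatternSketch {

    static Map<String, String> scan(List<String> headerLines,
                                    Pattern propertyPattern, String commentPrefix) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : headerLines) {
            // Assumption: the comment prefix is removed before matching.
            String text = (commentPrefix != null && line.startsWith(commentPrefix))
                ? line.substring(commentPrefix.length())
                : line;
            Matcher m = propertyPattern.matcher(text.trim());
            if (m.matches()) {
                // Group 1 is the property name, group 2 the unparsed value.
                props.put(m.group(1).trim(), m.group(2).trim());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(.+)=(.+)");
        System.out.println(scan(List.of("# units=nT", "# just a comment"), p, "#"));
    }
}
```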



setRecordCountLimit

setRecordCountLimit( int recordCountLimit ) → void

limit the number of records read. Parsing will stop once this number of records is read. This is Integer.MAX_VALUE by default.

Parameters

recordCountLimit -

Returns:

void (returns nothing)



setRecordParser

setRecordParser( org.das2.qds.util.AsciiParser.RecordParser recordParser ) → void

Setter for property recordParser.

Parameters

recordParser - New value of property recordParser.

Returns:

void (returns nothing)



setRegexParser

setRegexParser( java.lang.String[] fieldNames ) → RecordParser

The regex parser is a slow parser, but gives precise control.

Parameters

fieldNames -

Returns:

the parser for each record.



setSkipLines

setSkipLines( int skipLines ) → void

skip a number of lines before trying to parse anything. This can be set to point at the first valid line, and the RecordParser will be configured using that line.

Parameters

skipLines -

Returns:

void (returns nothing)



setValidMax

setValidMax( double validMax ) → void

set the maximum value for any field. Values above this are to be considered invalid.

Parameters

validMax -

Returns:

void (returns nothing)



setValidMin

setValidMin( double validMin ) → void

set the minimum valid value for any field. Values less than this are to be considered invalid.

Parameters

validMin -

Returns:

void (returns nothing)



setWhereConstraint

setWhereConstraint( String sparm, String op, String sval ) → void

allow a constraint so that only records where the condition is true are accepted. For "eq", the data need not be interpreted: string equality is checked for nominal data. Note sval is compared after trimming outside spaces.

Parameters

sparm - column name, such as "field4"
op - constraint, one of eq, gt, ge, lt, le, ne
sval - String value. For nominal columns, String equality is used.

Returns:

void (returns nothing)
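
The six operators and the string-equality rule for nominal columns can be illustrated with a small predicate. This is a sketch written from the description, assuming numeric columns are compared as doubles and that both sides are trimmed; WhereSketch and accept are hypothetical names, and restricting nominal data to eq and ne is an assumption of the example.

```java
/** Hypothetical sketch of evaluating a where constraint on one field. */
final class WhereSketch {

    static boolean accept(String fieldText, String op, String sval, boolean nominal) {
        String a = fieldText.trim();
        String b = sval.trim(); // outside spaces are trimmed before comparing
        if (nominal) {
            // Nominal data: plain string equality, no numeric interpretation.
            switch (op) {
                case "eq": return a.equals(b);
                case "ne": return !a.equals(b);
                default: throw new IllegalArgumentException("unsupported for nominal data: " + op);
            }
        }
        int c = Double.compare(Double.parseDouble(a), Double.parseDouble(b));
        switch (op) {
            case "eq": return c == 0;
            case "ne": return c != 0;
            case "gt": return c > 0;
            case "ge": return c >= 0;
            case "lt": return c < 0;
            case "le": return c <= 0;
            default: throw new IllegalArgumentException("unknown op: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(accept(" A ", "eq", "A", true));
        System.out.println(accept("5.0", "gt", "3", false));
    }
}
```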
