Date: Mon, 5 Apr 2010 10:57:27 -0700
Reply-To: "Terjeson, Mark" <Mterjeson@RUSSELL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Terjeson, Mark" <Mterjeson@RUSSELL.COM>
Subject: Re: Fixed Width file
Basically, the record length (line length)
and the fields humans define within each
record are all fixed length. For example, if
we take the infamous sashelp.class sample
dataset and export it as CSV, we get a comma
separated text file whose header line is
Name,Sex,Age,Height,Weight and whose first
data record is Alfred,M,14,69,112.5.
Obviously, if we removed the comma delimiters,
the fields within the records would be tough
to locate and parse out.
Back in the day, when CPU speed and storage
were both very costly, you could weigh file
size against processing time and pick
whichever served you better: taking the time
to locate and parse out the delimiters took
a lot of horsepower, whereas fixed-length
fields added virtually no processing overhead,
since you merely grab the known positions and
lengths.
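To make that trade-off concrete, here is a minimal sketch in Python (used only for illustration; it is not how SAS itself reads files) that pulls the fields out of one sashelp.class record both ways. The column positions in LAYOUT are an assumed layout made up for this example, not any standard, and the helper names are hypothetical.

```python
# Assumed layout for this example only: (field name, start, end) using
# Python's 0-based, end-exclusive slice positions.
LAYOUT = [("Name", 0, 9), ("Sex", 9, 14), ("Age", 14, 19),
          ("Height", 19, 27), ("Weight", 27, 33)]


def parse_delimited(line, sep=","):
    """Split a delimited record; the line must be scanned for every separator."""
    return line.split(sep)


def parse_fixed(line):
    """Grab each field from its known position and length, then drop the padding."""
    return [line[start:end].strip() for _, start, end in LAYOUT]


csv_record = "Alfred,M,14,69,112.5"
fixed_record = "Alfred   M    14   69      112.5 "

print(parse_delimited(csv_record))  # ['Alfred', 'M', '14', '69', '112.5']
print(parse_fixed(fixed_record))    # ['Alfred', 'M', '14', '69', '112.5']
```

Both give the same values; the fixed-width version never looks at the characters between fields, it just cuts at known offsets.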
Here is the same data as a fixed-width file:
Name     Sex  Age  Height  Weight
Alfred   M    14   69      112.5
Alice    F    13   56.5    84
Barbara  F    13   65.3    98
Carol    F    14   62.8    102.5
Henry    M    14   63.5    102.5
James    M    12   57.3    83
Jane     F    12   59.8    84.5
Janet    F    15   62.5    112.5
Jeffrey  M    13   62.5    84
John     M    12   59      99.5
Joyce    F    11   51.3    50.5
Judy     F    14   64.3    90
Louise   F    12   56.3    77
Mary     F    15   66.5    112
Philip   M    16   72      150
Robert   M    12   64.8    128
Ronald   M    15   67      133
Thomas   M    11   57.5    85
William  M    15   66.5    112
You may recall that the Text Import Wizard
in Excel offers two main choices for parsing
an incoming text file: delimited or fixed
width. You can see that each of the fields
has a fixed width and is padded out with
spaces, including the last field, which
means each line (record) has a fixed total
length as well.
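Writing such records can be sketched in Python as follows, assuming made-up field widths of 9, 5, 5, 8, and 6 columns: each value is left-justified and padded with spaces, the last field included, so every record comes out the same total length. The helper name to_fixed is hypothetical.

```python
# Assumed field widths for this example: Name, Sex, Age, Height, Weight.
WIDTHS = [9, 5, 5, 8, 6]


def to_fixed(fields, widths=WIDTHS):
    """Left-justify each value and pad it with spaces to its field width.

    The last field is padded too, so every record has the same total length.
    """
    if len(fields) != len(widths):
        raise ValueError("need exactly one value per field")
    parts = []
    for value, width in zip(fields, widths):
        text = str(value)
        if len(text) > width:
            raise ValueError(f"{text!r} does not fit in {width} columns")
        parts.append(text.ljust(width))
    return "".join(parts)


rows = [("Name", "Sex", "Age", "Height", "Weight"),
        ("Alfred", "M", 14, 69, 112.5),
        ("Alice", "F", 13, 56.5, 84),
        ("William", "M", 15, 66.5, 112)]

records = [to_fixed(row) for row in rows]
for record in records:
    print(repr(record))
print({len(record) for record in records})  # {33}
```

Raising an error on an over-long value is a design choice; silently truncating would also keep the record length fixed, but at the cost of losing data.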
Nowadays, with storage so cheap and CPU speed
so cheap and readily available, the choice is
driven more by what is easiest, or by what is
compatible with the different languages and
application products in use.
The comments above mostly discuss the
characteristics of incoming data files, or
the reading thereof. For output, various
display and reporting needs will often
require us to turn a record into some
variation of a fixed-length text string so
that the displayed fields land in vertically
uniform columns regardless of the individual
field values. In doing so we have created a
fixed-length record for each text string,
even though it may never be written to an
output file.
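For display output the same idea can be sketched with Python's format specifications. The report_line helper and the column widths are assumptions picked only for this example: text is left-aligned and numbers are right-aligned with one decimal place, so each column lines up regardless of how long the individual values are.

```python
rows = [("Alfred", "M", 14, 69.0, 112.5),
        ("Alice", "F", 13, 56.5, 84.0),
        ("Barbara", "F", 13, 65.3, 98.0)]


def report_line(name, sex, age, height, weight):
    """One display line: text left-aligned, numbers right-aligned to fixed widths."""
    return f"{name:<8}{sex:<4}{age:>3}{height:>8.1f}{weight:>8.1f}"


print(f"{'Name':<8}{'Sex':<4}{'Age':>3}{'Height':>8}{'Weight':>8}")
for row in rows:
    print(report_line(*row))
```

Note that a format width only pads; a name longer than 8 characters would push the rest of its line out of alignment, so the widths have to be chosen to fit the data.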
So FIXED WIDTH and FIXED LENGTH mean the
same thing. As for ASCII FILE, people
commonly say "ASCII file" to mean a text file.
PC-type computers running DOS or Windows
primarily use the ASCII standard character set.
At times you may work with files going to
or coming from mainframes, which use the
EBCDIC character set.
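If you do receive character data from a mainframe, converting it amounts to decoding the bytes with the right EBCDIC code page and re-encoding them as ASCII. A minimal Python sketch, assuming code page 037 (EBCDIC US/Canada, available in Python as the cp037 codec); your mainframe may well use a different EBCDIC code page, so check before relying on this. The helper names are hypothetical.

```python
# Assumption: the mainframe uses EBCDIC code page 037 (US/Canada).
# Other sites may use a different page (cp500, cp1140, ...).
EBCDIC = "cp037"


def ebcdic_to_ascii(data: bytes) -> bytes:
    """Decode EBCDIC character data, then re-encode it as ASCII bytes."""
    return data.decode(EBCDIC).encode("ascii")


def ascii_to_ebcdic(data: bytes) -> bytes:
    """The reverse trip, e.g. for a text file headed to the mainframe."""
    return data.decode("ascii").encode(EBCDIC)


mainframe_bytes = ascii_to_ebcdic(b"Alfred   M    14")
print(mainframe_bytes.hex(" "))          # c1 93 86 99 85 84 40 40 40 d4 40 40 40 40 f1 f4
print(ebcdic_to_ascii(mainframe_bytes))  # b'Alfred   M    14'
```

This only makes sense for character fields; packed-decimal or binary fields must not be translated byte by byte. Also, encode("ascii") raises UnicodeEncodeError for EBCDIC characters with no 7-bit ASCII equivalent, such as the cent sign.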
Of course, any file is just made up of bytes,
and a byte-is-a-byte-is-a-byte-is-a-byte-is-a-byte.
It is only the meaning and purpose we humans place
on these bytes, or groups of bytes, that leads us
to give files different names: we are grouping
bytes under a common set of rules for the file's
content. For ASCII or EBCDIC text files, this just
means, for example, that in ASCII, whenever you see
a byte value of 65, we humans consistently represent
it as an uppercase A.
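A short Python illustration of the byte-is-a-byte point: the same letter is stored as different byte values under ASCII and EBCDIC (assuming code page 037), and the same byte value means different things under different rules (ISO-8859-1, also called Latin-1, is picked here only as an example).

```python
# One character stored under two different sets of rules.
print("A".encode("ascii")[0])  # 65 in ASCII
print("A".encode("cp037")[0])  # 193 (hex C1) in EBCDIC code page 037

# One byte value read under two different sets of rules.
raw = bytes([0xC1])
print(raw.decode("cp037"))    # A  (EBCDIC code page 037)
print(raw.decode("latin-1"))  # capital A with acute accent (ISO-8859-1 / Latin-1)
```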
Since a byte is 8 bits, in base-2 it can hold
2^8 = 256 distinct values, 0-255.
In ASCII, the digit characters 0-9 are byte values
48-57, uppercase A-Z are 65-90, lowercase a-z are
97-122, control characters are 0-31 (null is 0 and
^A-^Z are 1-26), space is 32, various punctuation
characters fill in 33-47, 58-64, 91-96, and 123-126,
and DEL (delete/rubout) is 127.
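Those ranges can be written down as a small Python classifier. The function name ascii_class and its category labels are hypothetical; it treats null separately from the other control characters, as the list does.

```python
def ascii_class(value):
    """Classify a 7-bit ASCII byte value (0-127) by the standard ranges."""
    if not 0 <= value <= 127:
        raise ValueError("not a 7-bit ASCII value")
    if value == 0:
        return "null"
    if value < 32:
        return "control"
    if value == 32:
        return "space"
    if 48 <= value <= 57:
        return "digit"
    if 65 <= value <= 90:
        return "uppercase"
    if 97 <= value <= 122:
        return "lowercase"
    if value == 127:
        return "DEL"
    return "punctuation"  # everything left: 33-47, 58-64, 91-96, 123-126


print([ascii_class(b) for b in b"A1 a!"])
# ['uppercase', 'digit', 'space', 'lowercase', 'punctuation']
print(ord("Z") - 64)  # 26: Ctrl-Z is byte value 26
```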
ASCII values 0-127 make up the lower, or
non-high-bit, half of the 0-255 range. In base-2
(binary), when the uppermost bit (bit 7) is 0, you
get decimal values 0-127; this is otherwise known
as 7-bit ASCII.
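Testing bit 7 is just a bitwise AND with 0x80 (binary 1000 0000). A minimal Python sketch, with hypothetical helper names, that checks whether a chunk of data is pure 7-bit ASCII:

```python
def high_bit_set(value):
    """True when bit 7 (0x80, binary 1000 0000) is on, i.e. value is 128-255."""
    return (value & 0x80) != 0


def is_7bit_ascii(data):
    """True when every byte in data is 0-127 (high bit off)."""
    return not any(high_bit_set(b) for b in data)


print(format(127, "08b"), high_bit_set(127))    # 01111111 False
print(format(128, "08b"), high_bit_set(128))    # 10000000 True
print(is_7bit_ascii(b"Alfred,M,14,69,112.5"))   # True
print(is_7bit_ascii("Zo\u00eb".encode("latin-1")))  # False: the e-umlaut is byte 235
```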
The meanings of the byte values in the lower half
of the ASCII character set have rarely changed for
several decades now. In base-2 (binary), when the
uppermost bit (bit 7) is 1, you get decimal values
128-255. These high-bit values have been used for
several purposes over the years, with different
meanings applied, but much of the time they have
been used for graphic characters such as line and
box drawing. So-called 8-bit ASCII varies in the
meanings and uses it assigns to these values.
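To see how the high-bit values vary, here is a Python sketch that decodes the same byte under two 8-bit code pages chosen only as examples: cp437, the original IBM PC (DOS) character set with its box-drawing graphics, and ISO-8859-1 (Latin-1). The meanings helper is hypothetical.

```python
# Two 8-bit code pages picked only as examples.
CODE_PAGES = ("cp437", "latin-1")


def meanings(value):
    """Decode one byte value (0-255) under each code page in CODE_PAGES."""
    raw = bytes([value])
    return {page: raw.decode(page) for page in CODE_PAGES}


for value in (65, 0xB3, 0xC4):
    print(value, meanings(value))
```

Byte 65 decodes to A under both, while 0xC4 is a horizontal box-drawing line in cp437 but a capital A with umlaut in Latin-1, and 0xB3 is a vertical box-drawing line in cp437 but a superscript three in Latin-1.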
EBCDIC, found mostly on mainframe operating
systems, also has some roots in earlier days of
teletype and other 4- and 8-bit processor
devices. For example, you will find the alphabet
chopped into segments that use only the 1-9
portion of each row of hex byte values: A-I are
C1-C9, then after a break J-R are D1-D9, and S-Z
are E2-E9. Opposite to ASCII, lowercase letters
in EBCDIC have lower values than uppercase
letters, and the digit characters 0-9 are F0-F9.
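The EBCDIC layout and its reversed collating order can be seen with a short Python sketch, again assuming code page 037; the ebcdic_hex helper is hypothetical.

```python
EBCDIC = "cp037"  # assumption: EBCDIC code page 037 (US/Canada)


def ebcdic_hex(text):
    """Return the EBCDIC byte values of text as space-separated hex pairs."""
    return text.encode(EBCDIC).hex(" ").upper()


print(ebcdic_hex("ABCDEFGHI"))   # C1 C2 C3 C4 C5 C6 C7 C8 C9
print(ebcdic_hex("JKLMNOPQR"))   # D1 D2 D3 D4 D5 D6 D7 D8 D9
print(ebcdic_hex("STUVWXYZ"))    # E2 E3 E4 E5 E6 E7 E8 E9
print(ebcdic_hex("0123456789"))  # F0 F1 F2 F3 F4 F5 F6 F7 F8 F9

# Sorting the same strings by their byte values gives opposite orders.
words = ["apple", "Apple", "42"]
print(sorted(words, key=lambda w: w.encode("ascii")))  # ['42', 'Apple', 'apple']
print(sorted(words, key=lambda w: w.encode(EBCDIC)))   # ['apple', 'Apple', '42']
```

The second pair of lines is why a file sorted on a mainframe can come out in a different order from the same file sorted on a PC.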
The EBCDIC layout did make letters and numbers
easier to spot when dumping the raw contents of
EBCDIC files as hex byte values. But part of the
reason was ease of circuit design for the discrete
wiring of more primitive devices that predated
integrated-circuit chips.
There is a myriad of reasons and uses that
different folks have come up with; I have
mentioned only a few.
By far, I would guess it is safe to say that the
7-bit ASCII character set is the most commonly seen
in computers, hand-held devices, and current-day
radio.
Hope this is helpful.
Investment Business Intelligence
Investment Management & Research
Global Leaders in Multi-Manager Investing
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Tom
Sent: Monday, April 05, 2010 9:49 AM
Subject: Fixed Width file
What is usually meant by the wording FIXED WIDTH ASCII file?