| Date: | Fri, 21 Jun 1996 17:07:07 +0000 |
| Reply-To: | "Bruce A. Rayton" <rayton@WUECONA.WUSTL.EDU> |
| Sender: | "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU> |
| From: | "Bruce A. Rayton" <rayton@WUECONA.WUSTL.EDU> |
| Subject: | SIC codes -- Leading zeros |
|---|
I need to break down some government data into component industries for
merging into another dataset.
This government file looks something like this:
INDUSTRY   VALUE
07            12
007           30
0007           7
7             99
70           200
700          300
7000         400
The firm data file looks like this:
FIRM    SIC   SIC3   SIC2   SIC1
A      7000    700     70      7
B      7001    700     70      7
C       721     72      7      0
D       728     72      7      0
D        79      7      0      0
E         7      0      0      0
I need to construct SIC, SIC3, SIC2, and SIC1 in the government dataset so
that I can merge values into the firm dataset by the appropriate industry
grouping. The goal is to put the least aggregated government data available
into the firm dataset.
E.g., if (4-digit data available) then merge it in;
else if (3-digit data available) then merge it in;
else if (2-digit data available) then merge it in;
** The data is available for every 2-digit industry.
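The fallback logic above might be sketched in SAS roughly as follows. This
is only an illustration under assumptions: the dataset names FIRMS, GOV, and
FIRM_VALUES are made up, and so are the GOV variables NDIGITS (how many
digits the industry code was published with) and CODE (its numeric value).
Building those two variables is the part I am stuck on.

```sas
/* Hypothetical firm file, using a few rows from the example above. */
data firms;
  input firm $ sic sic3 sic2 sic1;
  datalines;
A 7000 700 70 7
B 7001 700 70 7
C 721 72 7 0
;
run;

/* Hypothetical government file: NDIGITS and CODE are assumed to exist. */
data gov;
  input ndigits code value;
  datalines;
2 7 12
3 7 30
4 7 7
2 70 200
3 700 300
4 7000 400
;
run;

/* Take the 4-digit value if there is one, else 3-digit, else 2-digit. */
proc sql;
  create table firm_values as
  select f.firm, f.sic,
         coalesce(g4.value, g3.value, g2.value) as value
  from firms as f
  left join gov as g4 on g4.ndigits = 4 and g4.code = f.sic
  left join gov as g3 on g3.ndigits = 3 and g3.code = f.sic3
  left join gov as g2 on g2.ndigits = 2 and g2.code = f.sic2;
quit;
```

With these sample rows, firm A should pick up the 4-digit value, firm B the
3-digit value for 700, and firm C the 2-digit value for 07.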
I need the first observation to be associated with the two-digit industry
and the second to be associated with the three-digit industry. Suppose I
read in the INDUSTRY variable as numeric: this method would properly
classify observations 4-7, but it would fail on observations 1-3. It would
cause the first four observations in the government dataset to get an
industry number of 7. (Perhaps I've discovered the REAL reason everyone
focuses on manufacturing firms -- they have SIC codes 2000-3900 <g>)
Notice that this isn't a problem for industries that don't start with a
zero. The numeric representation works just fine in that instance, and I
can separate the data based on the range of the INDUSTRY variable. For
example, all industry numbers between 100 and 999 are three-digit
industries (if we ignore this problem). The trick is distinguishing the
industries that start with zero.
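To make the failure concrete, here is a small sketch of the numeric read
combined with the range test. The dataset name GOV_NUMERIC and the variable
LEVEL are made up for illustration.

```sas
data gov_numeric;
  input industry value;              /* numeric read drops leading zeros */
  /* Classify by range, as described above. */
  if industry >= 1000 then level = 4;
  else if industry >= 100 then level = 3;
  else if industry >= 10 then level = 2;
  else level = 1;
  datalines;
07 12
007 30
0007 7
7 99
70 200
700 300
7000 400
;
run;

proc print data=gov_numeric noobs;
run;
```

The first four observations all end up with INDUSTRY = 7 and LEVEL = 1, even
though only the fourth is really a one-digit industry.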
If I read the INDUSTRY in as a character variable then it doesn't match the
SIC codes I have in the main dataset. I can pull SUBSTRings of these
variables, but I can't figure out a way to make this help me.
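One direction that might work, offered as an untested sketch rather than a
known solution: read INDUSTRY as character, record how many digits were
written, and then convert it to numeric so it can be compared with the firm
file. The names GOV, NDIGITS, and CODE are made up for illustration.

```sas
data gov;
  length industry $4;
  input industry $ value;
  ndigits = length(industry);        /* digits as written, leading zeros included */
  code    = input(industry, 4.);     /* numeric value, e.g. '007' becomes 7 */
  datalines;
07 12
007 30
0007 7
7 99
70 200
700 300
7000 400
;
run;
```

Under this scheme, NDIGITS would say which firm variable to match against
(4 for SIC, 3 for SIC3, 2 for SIC2, 1 for SIC1) and CODE would be the value
to match, so that '07' pairs with SIC2 = 7 and '007' with SIC3 = 7.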
Help greatly appreciated.
Bruce
rayton@wuecona.wustl.edu
***********************************************************************
| Dr. Bruce A. Rayton Office: (0115) 941-8418 |
| Nottingham Trent University, Dept of EPA Home: (0115) 985-6821 |
| Burton Street Fax: (0115) 948-6808 |
| Nottingham NG1 4BU rayton@wuecona.wustl.edu |
| England http://wuecon.wustl.edu/~bruceray |
***********************************************************************
<<< I prefer electronic deliveries of manuscripts >>>