LISTSERV at the University of Georgia
Date:   Fri, 21 Jun 1996 17:07:07 +0000
Reply-To:   "Bruce A. Rayton" <rayton@WUECONA.WUSTL.EDU>
Sender:   "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:   "Bruce A. Rayton" <rayton@WUECONA.WUSTL.EDU>
Subject:   SIC codes -- Leading zeros

I need to break down some government data into component industries for merging into another dataset.

This government file looks something like this:

INDUSTRY  VALUE
07           12
007          30
0007          7
7            99
70          200
700         300
7000        400

The firm data file looks like this:

FIRM   SIC  SIC3  SIC2  SIC1
A     7000   700    70     7
B     7001   700    70     7
C      721    72     7     0
D      728    72     7     0
D       79     7     0     0
E        7     0     0     0

I need to construct SIC, SIC2, SIC3, and SIC1 in the government dataset so that I can merge values into the firm dataset by appropriate industry grouping. The goal is to put the least aggregated government data available into the firm dataset.

E.g.,

if (4-digit data available) then merge it in;
else if (3-digit data available) then merge it in;
else if (2-digit data available) then merge it in; **

** The data is available for every 2-digit industry.
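The fallback rule above can be sketched as follows. This is only an illustration in Python, not SAS, and it assumes the hard part is already solved: the hypothetical `gov_by_level` lookup presumes each government observation has been assigned its correct digit level. The function name and the dictionary layout are made up for the example; the numbers come from the sample data in this post.

```python
# Government values keyed by digit level, then by numeric industry code.
# ASSUMPTION: the digit level of every government row is already known.
gov_by_level = {
    4: {7000: 400, 7: 7},   # INDUSTRY 7000 and 0007
    3: {700: 300, 7: 30},   # INDUSTRY 700 and 007
    2: {70: 200, 7: 12},    # INDUSTRY 70 and 07
}

def least_aggregated_value(firm, gov):
    """Return (digit level, value) for the most detailed match available."""
    for level, column in ((4, "SIC"), (3, "SIC3"), (2, "SIC2")):
        code = firm[column]
        if code in gov[level]:
            return level, gov[level][code]
    return None  # should not happen if every 2-digit industry is present

firm_a = {"FIRM": "A", "SIC": 7000, "SIC3": 700, "SIC2": 70, "SIC1": 7}
firm_b = {"FIRM": "B", "SIC": 7001, "SIC3": 700, "SIC2": 70, "SIC1": 7}
firm_c = {"FIRM": "C", "SIC": 721, "SIC3": 72, "SIC2": 7, "SIC1": 0}

for firm in (firm_a, firm_b, firm_c):
    print(firm["FIRM"], least_aggregated_value(firm, gov_by_level))
```

Firm A matches at the 4-digit level, firm B falls back to the 3-digit industry 700, and firm C falls back to the 2-digit industry 07.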

I need the first observation to be associated with the two-digit industry and the second to be associated with the three-digit industry. Suppose I read in the INDUSTRY variable as numeric. This method would properly classify observations 4-7, but it would fail on observations 1-3: it would cause the first four observations in the government dataset to get an industry number of 7. (Perhaps I've discovered the REAL reason everyone focuses on manufacturing firms -- they have SIC codes 2000-3900 <g>)
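To make the collapse concrete, here is a small Python illustration (again a stand-in for a numeric read in SAS, using the seven INDUSTRY values from the sample government file). The variable names are invented for the example.

```python
# The seven INDUSTRY values exactly as they appear in the government file.
industry_codes = ["07", "007", "0007", "7", "70", "700", "7000"]

# Reading them as numbers drops the leading zeros.
as_numbers = [int(code) for code in industry_codes]
print(as_numbers)  # [7, 7, 7, 7, 70, 700, 7000]

# Seven distinct industries go in, only four distinct numbers come out.
print(len(set(industry_codes)), len(set(as_numbers)))  # 7 4
```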

Notice that this isn't a problem for industries that don't start with a zero. The numeric representation works just fine in that instance, and I can separate the data based on the range of the INDUSTRY variable. For example, all industry numbers between 100 and 999 are three-digit industries (if we ignore this problem). The trick is distinguishing the industries that start with zero.
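The range test described above, and the place where it breaks, might look like this in Python (a sketch only; `digit_level_from_range` is a hypothetical helper, not an existing SAS or Python function). The true level of each sample code is taken to be the number of characters in the raw file.

```python
def digit_level_from_range(industry_number):
    """Guess the aggregation level from the size of the numeric code."""
    if 1000 <= industry_number <= 9999:
        return 4
    if 100 <= industry_number <= 999:
        return 3
    if 10 <= industry_number <= 99:
        return 2
    return 1

# Raw codes from the government file; the true level is the code's length.
for raw in ["700", "7000", "70", "07", "007", "0007"]:
    guessed = digit_level_from_range(int(raw))
    status = "ok" if guessed == len(raw) else "WRONG"
    print(raw, "true level", len(raw), "guessed", guessed, status)
```

The codes 700, 7000, and 70 are classified correctly, while 07, 007, and 0007 are all guessed to be one-digit industries.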

If I read INDUSTRY in as a character variable, then it doesn't match the numeric SIC codes I have in the main dataset. I can pull SUBSTRings of these variables, but I can't figure out a way to make this help me.

Help greatly appreciated.

Bruce

rayton@wuecona.wustl.edu

***********************************************************************
| Dr. Bruce A. Rayton                        Office: (0115) 941-8418  |
| Nottingham Trent University, Dept of EPA   Home:   (0115) 985-6821  |
| Burton Street                              Fax:    (0115) 948-6808  |
| Nottingham NG1 4BU                         rayton@wuecona.wustl.edu |
| England                            http://wuecon.wustl.edu/~bruceray |
***********************************************************************
<<< I prefer electronic deliveries of manuscripts >>>

