| Date: | Mon, 27 Aug 2001 17:00:40 -0400 |
| Reply-To: | Mike Rhoads <RHOADSM1@WESTAT.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Mike Rhoads <RHOADSM1@WESTAT.COM> |
| Subject: | Re: tcpip parsing |
|---|
Brad,
I had a couple thoughts on your problem.
One alternative to treating each TCP/IP address as a base-256 number would be
to zero-fill each of the 4 parts to the maximum of 3 digits/characters. You
could make the final variable either numeric (e.g. 12010117208, i.e.
12,010,117,208 with commas) or character (012.010.117.208), as you prefer.
Either of these would allow the addresses to be compared, and might be easier
to develop and debug since it is easier to "map" back into the original
representation. Your idea should certainly work, however.
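To make the two representations concrete, here is a minimal sketch in Python rather than SAS (it is not from the original thread). The function names zero_fill and to_base256 are made up for illustration, and the input is assumed to be a well-formed dotted address.

```python
def zero_fill(addr: str) -> str:
    """Zero-fill each of the 4 parts to 3 digits.

    Example: '12.10.117.208' becomes '012.010.117.208'.
    """
    parts = addr.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated parts, got {addr!r}")
    return ".".join(f"{int(p):03d}" for p in parts)


def to_base256(addr: str) -> int:
    """Treat the address as a 4-digit base-256 number (Brad's idea)."""
    value = 0
    for p in addr.strip().split("."):
        value = value * 256 + int(p)
    return value


# Raw strings compare incorrectly ('32' sorts after '208'),
# but both converted forms order the addresses as expected.
a, b = "12.10.117.32", "12.10.117.208"
print(a < b)                          # raw comparison: False
print(zero_fill(a) < zero_fill(b))    # zero-filled comparison: True
print(to_base256(a) < to_base256(b))  # numeric comparison: True
```

The zero-filled form stays readable when you are debugging, while the base-256 form is more compact; either one gives a correct ordering.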
Also, I don't see why you'd need to compare "every number to every other
number", unless your input data set is so huge that you can't easily sort
it. I would start by sorting by LoTcpIp, and within that by descending
HiTcpIp. After that I think it's just a matter of a DATA step based on a
"look-ahead" read to the next record, where your decision algorithm is
something like:
Hold on to the current record if and only if LoTcpIp from the next record is
greater than the HiTcpIp from the current record. (The last record has no
next record to contain, so it is always kept.)
I am assuming you have no "partial overlaps" (e.g. 1 - 100 followed by 50 -
200) -- seems a little unlikely given your data, and I don't know what you'd
want to do with them anyway.
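The sort-and-look-ahead idea above can be sketched as follows. This is Python rather than a SAS DATA step, and like the algorithm itself it is only an illustration: the names ip_to_int and most_specific are hypothetical, and it relies on the same assumption of no partial overlaps.

```python
def ip_to_int(addr: str) -> int:
    """Convert a dotted address to a single base-256 integer."""
    value = 0
    for part in addr.strip().split("."):
        value = value * 256 + int(part)
    return value


def most_specific(ranges):
    """Keep only ranges that do not contain another range.

    ranges: list of (host, lo, hi) tuples with integer lo and hi.
    Assumes no partial overlaps, as in the description above.
    """
    # Sort by lo ascending, and within equal lo by hi descending,
    # so a containing range always comes right before what it contains.
    ordered = sorted(ranges, key=lambda r: (r[1], -r[2]))
    kept = []
    for i, cur in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        # Look ahead: keep the current record only if the next one
        # starts after the current one ends (or there is no next one).
        if nxt is None or nxt[1] > cur[2]:
            kept.append(cur)
    return kept


raw = [
    ("AT&T", "12.0.0.0", "12.255.255.255"),
    ("Meddve, Inc.", "12.10.117.208", "12.10.117.223"),
    ("Husky Corp", "12.10.117.32", "12.10.117.47"),
]
data = [(host, ip_to_int(lo), ip_to_int(hi)) for host, lo, hi in raw]
for host, lo, hi in most_specific(data):
    print(host)
```

With the three sample rows from the original message, only the Husky Corp and Meddve, Inc. ranges survive; the AT&T range is dropped because the record after it starts inside it.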
Note that this is a late-afternoon, completely untested algorithm ...
Mike Rhoads
Westat
RhoadsM1@Westat.com
-----Original Message-----
From: Brad Goldman [mailto:Brad.Goldman@AUTOTRADER.COM]
Sent: Monday, August 27, 2001 2:40 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: tcpip parsing
I have a dataset that I created using the whois command on Unix.
This dataset has variables for the hostname and tcpip range, for example:
host            range
------          ------
AT&T            12.0.0.0 - 12.255.255.255
Meddve, Inc.    12.10.117.208 - 12.10.117.223
Husky Corp      12.10.117.32 - 12.10.117.47
...
What I would like to do is choose only the most specific range(s). In
this case, the latter two lines are "included" in the AT&T range, so I want
to discard the AT&T entry. If there were another host with range
12.10.117.0 - 12.10.117.255 I would discard that also. Any bright ideas how
to proceed? All I can see is to convert the beginning and ending tcpips into
two big numbers (as if the tcpip were a 4-digit, base-256 number). Then these
numbers can be compared to each other. (I see no way to avoid comparing
every number to every other one; any tricks there would help also.)
My eventual goal is to create a format where a given tcpip can be mapped to
a host name.
Much thanks in advance,
Brad Goldman