Date: Tue, 23 Jul 2002 09:50:41 -0400
Reply-To: Steve Albert <salbert@AOL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Steve Albert <salbert@AOL.COM>
Subject: Re: How to sift data to find Time Gaps
All the responses so far have sought gaps between observations that exceed
60 seconds; however, I don't think that's what the original poster
requested:
>I now need to create a process that will show where the time gaps are.
> All data should be consecutive for the day. I need a routine that will
> validate that there is at least one entry per minute per hour per day
> per month. Then produce a report/graph that only shows me where the
> data is missing.
So it seems we need to find minutes during which there is no observation,
not observations that are more than 60 seconds after the previous one. (So
if we have an observation at 12:00:01 and another at 12:01:59, we have
observations in both the 12:00 and the 12:01 minutes, even though the time
between them exceeded 60 seconds.)
One possible approach:
1. Truncate all times to the beginning of the minute involved.
2. Search for gaps using the LAG approach others have suggested.
If the difference exceeds 60 seconds, we know that we have at least one
minute with no observation; keeping the difference would let us infer the
length of the gap. (Note number of missing minutes is difference in
minutes minus one -- a gap of one is OK.) If need be, we could generate a
dataset the included every minute that was part of the gap, from the end
time of the gap together with the gap length; alternatively, using LAG we
could add the last good time onto the record, so we'd generate a set of
records with a "previous good minute" and "next good minute" for each gap.
Another possible approach:
1. Truncate all times to the beginning of the minute involved.
2. Generate a dataset consisting only of date times, one observation for
each minute over the relevant time period.
3. Merge that dataset against the real dataset; anything that doesn't have
a match will then be part of a gap, so keeping the non-match cases produces
precisely the set of gap minutes that the poster seemed to originally
request.
Not as elegant, and probably requiring more CPU time, but perhaps a
conceptually easier way to directly produce the required dataset with every
missing minute included, for the report and graph requested.
Steve Albert
Director of Biostatistics
Spectrum Pharmaceutical Research Corp.
San Antonio, TX
SAlbert at SpectrumCRO dot com
|