Date: Mon, 1 Oct 2007 11:19:47 -0300
Reply-To: Hector Maletta <hmaletta@fibertel.com.ar>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Hector Maletta <hmaletta@fibertel.com.ar>
Subject: Re: "Match files" with duplicates in table
In-Reply-To: <000301c8042c$18288540$48798fc0$@com>
Content-Type: text/plain; charset="iso-8859-1"
The problem with Catherine's situation is that she first wants to
assign student data to all course registrations, and then wants to add
student data for all students not registered in any course.
The first problem is well solved with a TABLE subcommand, using the
general student list as a TABLE and the course registration list as a file.
Of course, the latter may contain duplicate ID records since the same
students may have registered in more than one course.
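For readers who do not use SPSS, the idea of a TABLE lookup can be sketched in Python. This is only an illustration, not SPSS's implementation. The names `students`, `registrations` and `table_lookup` are hypothetical, and the test data are modelled on the listing quoted further down in this thread.

```python
# Hypothetical test data modelled on the listing quoted later in this thread.
# The student list has unique IDs (it plays the role of the TABLE).
students = {1: "Ann", 2: "Bill", 3: "Chris", 4: "David", 5: "Ethel", 6: "Frank"}
# The registration list may repeat an ID (it plays the role of the FILE).
registrations = [(2, "ENGL"), (3, "HIST"), (3, "THEO"), (5, "ENGL")]


def table_lookup(file_rows, table):
    """Attach the table's data to every file row.

    Rows that share an ID each receive the same table data, and IDs that
    appear only in the table contribute no rows to the result.
    """
    return [(sid, table.get(sid), major) for sid, major in file_rows]


course_file = table_lookup(registrations, students)
for row in course_file:
    print(row)
```

The result has exactly one row per registration, which is why students with no registration still have to be added in a second step.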
For the second problem, I responded in haste and late at night,
sorry. Before matching or adding, one needs to exclude from the student list
all students already included, i.e. all students registered for courses,
and then ADD to the course file the remaining students from the
student list. This complicates things a bit.
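The exclude-then-append step just described can be sketched in Python as follows. It is a rough illustration under assumptions: the variable names are hypothetical, the data are modelled on the listing quoted later in the thread, and the final sort by ID is an addition made here only so the result reads in the same order as that listing.

```python
# Hypothetical data: the full student list and the course file that
# already carries student names on every registration record.
students = {1: "Ann", 2: "Bill", 3: "Chris", 4: "David", 5: "Ethel", 6: "Frank"}
course_file = [
    (2, "Bill", "ENGL"),
    (3, "Chris", "HIST"),
    (3, "Chris", "THEO"),
    (5, "Ethel", "ENGL"),
]

# IDs of students who have at least one registration record.
registered = {sid for sid, _name, _major in course_file}

# Keep only the students NOT registered in any course; they have no major.
unregistered = [
    (sid, name, None) for sid, name in students.items() if sid not in registered
]

# Append the remaining students to the course file. Sorting by ID is an
# assumption made for readability; Python's sort is stable, so the two
# records for ID 3 keep their original order.
final = sorted(course_file + unregistered, key=lambda row: row[0])
for row in final:
    print(row)
```

With these data the result has seven rows: four registration records plus one row each for the three students without a registration.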
The complete process may look like this:
*Assign student data to course registration records.
MATCH FILES /TABLE='UNDUPCOMPLETE.SAV' /FILE='DupPartial.SAV' /BY ID.
SAVE OUTFILE='COURSEFILE.SAV'.
*Flag records with course registration.
COMPUTE REGCOURSE=1.
*Aggregate registration records by student.
AGGREGATE OUTFILE=* /PRESORTED /BREAK=ID
/REGCOURSE=MAX(REGCOURSE).
*Match records of students registered in courses with the student list.
MATCH FILES /FILE='UNDUPCOMPLETE.SAV' /FILE=* /BY ID.
*Exclude from the list all students registered in courses.
SELECT IF SYSMIS(REGCOURSE) OR REGCOURSE=0.
*Add registered and non-registered students in a single list.
ADD FILES /FILE='COURSEFILE.SAV' /FILE=*.
SAVE OUTFILE='FINAL.SAV'.
This is untested. Hope it works.
Hector
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
ViAnn Beadle
Sent: 01 October 2007 10:08
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: "Match files" with duplicates in table
That "warning" message is more of a note and can usually be ignored for a
FILE file. It is, however, a fatal error for a TABLE file.
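One way to see which side of a match can tolerate repeated keys is to check each key list for duplicates before matching. The Python sketch below is only an illustration of that check, not anything SPSS provides. The helper name `duplicate_keys` and the ID lists are hypothetical, modelled on the test data quoted later in the thread.

```python
from collections import Counter


def duplicate_keys(ids):
    """Return the sorted list of key values that occur more than once."""
    counts = Counter(ids)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical key lists: the student list (used as the TABLE) and the
# registration list (used as the FILE).
table_ids = [1, 2, 3, 4, 5, 6]
file_ids = [2, 3, 3, 5]

print("Duplicates in table:", duplicate_keys(table_ids))
print("Duplicates in file:", duplicate_keys(file_ids))
```

An empty list for the table side means the lookup is safe; a non-empty list for the file side is the situation the warning reports.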
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Richard Ristow
Sent: Sunday, September 30, 2007 4:40 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: "Match files" with duplicates in table
At 02:30 PM 9/28/2007, Hector Maletta wrote:
>My final result is thus:
SPSS 14 draft output, test data as before:
MATCH FILES
/TABLE = UNDUPCOMPLETE
/FILE = DupPartial/by ID.
MATCH FILES /FILE = * /FILE= UNDUPCOMPLETE /BY ID.
LIST.
List
|-----------------------------|---------------------------|
|Output Created |30-SEP-2007 17:47:58 |
|-----------------------------|---------------------------|
id name Major
1 Ann
2 Bill ENGL
File #1
KEY: 3
>Warning # 5132
>Duplicate key in a file. The BY variables do
>not uniquely identify each case on the indicated file. Please check the
>results carefully.
3 Chris HIST
3 Chris THEO
4 David
5 Ethel ENGL
6 Frank
Number of cases read: 7 Number of cases listed: 7
(Gary Moser, <9000.hal@gmail.com>, tested
earlier, and reported the same error message that's here.)
Hector wrote,
>There might be a more elegant solution, but it
>is late here and I cannot think of anything better right now.
THERE'S a question of aesthetics. I expected,
myself, that a neat double MATCH FILES would do
it. I rejected it without writing the code
because of the problem of non-unique keys, as
reported in the warning message above. But
Hector's code gives the right result, and I think will do so
reliably.
I don't like to put code in production that gives
warning messages. The last thing you want is to
leave users either blasé or confused about
whether to pay attention to warning messages. But
Hector's code does work, and it's simpler and
clearer than the MATCH FILES/ADD FILES logic I posted.
Now we can be like one of those TV shows where
viewers vote on which singer, or something, is
the best. Or maybe, simply, de gustibus non disputandum est.