| Date: | Sat, 7 Nov 1998 20:33:43 -0600 |
| Reply-To: | "Matheson, David" <davidm@SPSS.COM> |
| Sender: | "SPSSX(r) Discussion" <SPSSX-L@UGA.CC.UGA.EDU> |
| From: | "Matheson, David" <davidm@SPSS.COM> |
| Subject: | Re: Pivoting Data...Is it posiible? |
|
Sandy,
The following SPSS syntax would restructure the data as you describe.
In this code, the 2 existing variables are called v1 (the question number)
and v2 (the observed value). The code results in a 70,000 case file with
the data for questions 1 to 10 in the variables ques1, ques2,...ques10.
To adapt for the number of questions in your survey, replace the '10's in
the VECTOR and AGGREGATE commands with the proper number of questions.
compute id = mod(($casenum - 1),70000) + 1 .
execute.
sort cases by id .
vector ques (10).
compute ques(v1) = v2.
aggregate outfile = * / presorted
/break = id
/ques1 to ques10 = max(ques1 to ques10).
Note that I sorted the file before aggregating. The AGGREGATE command
will work with unsorted files, but the memory demand increases with the
number of break groups. With 70k break groups, the memory demand
would be extremely large. If the file is sorted by the break variable and
you
use the PRESORTED subcommand in AGGREGATE, the memory demand is
limited to each break group in turn.
Note also that I sorted before creating ques1 to ques10, so that the file to
be
sorted would be as small as possible. You generally need free disk space on
the order of 3 times the file to be sorted, or nearly 60mb in this case.
The above code works on the assumption that v1 is a numeric variable
that has a number from 1 to k, where there are k questions. If this is not
the
case, we may need more information on the nature of v1 - perhaps
an autorecode or use of the NUMBER function in COMPUTE will work.
However, the validity of the whole solution turns on the assumptions that
there are no missing rows of data (so that every case is represented for
every question, even if the value of v2 is missing) and that the order
of respondents is identical across all questions. Under the 1st assumption,
you could create a numeric question-number variable as follows:
compute q1 = trunc(($casenum-1)/70000) + 1.
An alternate approach, that does not assume an equal number of rows
for each question, is provided below:
do if ($casenum = 1) .
compute q1 = 1.
else if (v1 = lag(v1)).
compute q1 = lag(q1).
else if (v1 ne lag(v1)) .
compute q1 = lag(q1) + 1.
end if.
However, if the 2 assumptions above are not met, there would
have to be an existing ID variable that you haven't mentioned or the
whole solution falls apart.
The command(s) to compute q1 would be placed just after (or just before) the
COMPUTE ID
command, before the EXECUTE command above. You would then use
q1 in place of v1 as the subscript in the COMPUTE QUES() command.
Finally, if v2 was a string variable and ques1 to ques10 were also to be
string
variables (i.e., barring string-to-number transformation commands), you
would
need to adjust the VECTOR command to designate the new variables as
strings, as in
vector ques (10,a6).
for 10 strings of length 6.
I hope this helps.
David Matheson
SPSS Technical Support
> -----Original Message-----
> From: Sandra Murray [SMTP:smurray@DOE.STATE.VT.US]
> Sent: Friday, November 06, 1998 12:10 PM
> To: SPSSX-L@UGA.CC.UGA.EDU
> Subject: Pivoting Data...Is it posiible?
>
> Hi - I am working with a huge database in that is structured
> ineffectively for the work I wish to do with it. Is it possible for me
> to "pivot" the data?
> More specifically, this data captures responses for a survey
> consisting of several questions, conducted with 70,000 people. The data
> was all entered vertically - Variable 1 is a question and variable 2 are
> the responses. On line 70,001 varaible 1 becomes the second question and
> the for the next 70,000 lines, varaible 2 are the responses to that
> question, and so on.
> I want to arrange the questions across the spreadsheet horizontally
> (as variables)and have just 70,000 rows in the spreadsheet - with each
> case's responses appearing under the various questions that are spread
> across the columns. Is this possible at the data editor level, or do we
> have to re-enter the data?
> If anyone has an answer or suggestion, I'd be most appreciative!!
> Thank you - Sandy Murray
|