LISTSERV at the University of Georgia
Date:   Tue, 24 Jun 2008 23:01:30 -0400
Reply-To:   Arthur Tabachneck <art297@NETSCAPE.NET>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Arthur Tabachneck <art297@NETSCAPE.NET>
Subject:   Re: help: reading in open-ended text word-by-word
Comments:   To: Matthew Yurdin <Matthew.Yurdin@UCOP.EDU>

Matt,

As Jack indicated, what you are seeking is far from trivial. However, if all of your sentences are like those in your example, something like the following might come close:

data want (drop=hold_word);
  length word hold_word $50.;
  retain hold_word;
  infile cards end=eof;
  input word $ @@;
  if _n_ eq 1 or (substr(reverse(trim(hold_word)),1,1) eq '.'
                  and rank(substr(word,1,1)) in (65:90)) then do;
    count=1;
    sentence+1;
  end;
  else count+1;
  hold_word=word;
  word=lowcase(compress(word,'.'));
cards;
Siddhartha was thus loved by everyone. He was a source of joy for everybody.
;
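Since the real input is a text file rather than in-stream data, here is a minimal sketch of the same step pointed at an external file. The fileref name and the path are hypothetical placeholders, and the LRECL value is only an assumption to allow for long lines. The sentence test is unchanged: a new sentence starts when the previous word ended in a period and the current word begins with a capital letter.

```sas
/* Hypothetical path: replace with the location of your own text file. */
filename book 'C:\temp\book.txt';

data want (drop=hold_word);
  length word hold_word $50;
  retain hold_word;
  /* LRECL=32767 is an assumption so that long lines are not truncated. */
  infile book lrecl=32767;
  /* List input with a double trailing @ reads one blank-delimited word
     per iteration and moves on to the next line when one is used up. */
  input word $ @@;
  if _n_ eq 1 or (substr(reverse(trim(hold_word)),1,1) eq '.'
                  and rank(substr(word,1,1)) in (65:90)) then do;
    count=1;     /* first word of a new sentence */
    sentence+1;
  end;
  else count+1;
  hold_word=word;                     /* keep the raw word for the next test */
  word=lowcase(compress(word,'.'));   /* strip periods, fold to lower case */
run;
```

As with the in-stream version, this would likely miscount sentences that end in '?' or '!', abbreviations such as "Mr.", and words longer than 50 characters, so treat it as a starting point only.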

HTH,
Art
---------
On Tue, 24 Jun 2008 16:05:02 -0700, Matthew Yurdin <Matthew.Yurdin@UCOP.EDU> wrote:

>I am looking for a way to read a txt file containing one very long piece
>of open-ended text (i.e., a book-like document) into a
>one-word-per-observation dataset with a within-sentence word count and a
>sentence count. For example, I'd like to turn these two sentences:
>
>Siddhartha was thus loved by everyone. He was a source of joy for
>everybody.
>
>into these rows:
>
>VALUE       COUNT  SENTENCE
>siddhartha  1      1
>was         2      1
>thus        3      1
>loved       4      1
>by          5      1
>everyone    6      1
>he          1      2
>was         2      2
>a           3      2
>source      4      2
>of          5      2
>joy         6      2
>for         7      2
>everybody   8      2
>
>Any advice on how to go about this? My apologies if this is something
>that's already been asked and answered; I didn't see examples with
>anything this unstructured in the list archive.
>
>Thanks
>-Matt

