Date: Thu, 20 Dec 2007 11:34:43 -0500
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: SPEEDIS - underlying algorithm?
In-Reply-To: <884F070B5CB05742ACAE51AC80574CE201895E82@claexch01.licence.cla.co.uk>
Ben:
See /*** TIP 00392 ***/ at http://sconsig.com/ for an example.
S
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Ben Powell
Sent: Thursday, December 20, 2007 4:52 AM
To: Paul Dorfman; SAS-L@LISTSERV.UGA.EDU
Subject: RE: SPEEDIS - underlying algorithm?
No no no :) Against a single search term!
%spedme(string=To be or not to be);
/* array-based lookup (?) versus running SPEDIS against a table of 1m obs,
   each up to 100 chars */ ...
Cheers
Ben.
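
A minimal sketch of the single-term case, assuming the strings sit in a table with one character variable: a single sequential DATA step pass computes the distance from the search term to every row and keeps the close ones, with no array or index involved. The names WORK.TITLES, TITLE, TERM, and the cutoffs are hypothetical and chosen only for illustration; this is not the body of %spedme, which the post does not show. COMPLEV appears next to SPEDIS because Paul's reply quoted below suggests it.

```sas
/* Hypothetical sample table standing in for the 1m-row table. */
data work.titles;
   infile datalines truncover;
   input title $100.;
   datalines;
To be or not to be
To be, or not to be
Two bee or not two bee
Something else entirely
;
run;

/* Hypothetical search term; a term containing quotes or ampersands
   would need macro quoting. */
%let term = To be or not to be;

data work.close_matches;
   set work.titles;                   /* one sequential read of the table     */
   sped = spedis(title, "&term");     /* asymmetric spelling distance         */
   lev  = complev(title, "&term");    /* Levenshtein edit distance            */
   if sped <= 20 or lev <= 3;         /* arbitrary cutoffs, tune for the data */
run;
```

The SAS documentation describes COMPLEV as faster than SPEDIS on long strings, so at 1m rows it may be worth dropping the SPEDIS call and filtering on COMPLEV alone.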
-----Original Message-----
From: Paul Dorfman [mailto:sashole@bellsouth.net]
Sent: 19 December 2007 18:07
To: Ben Powell; SAS-L@LISTSERV.UGA.EDU
Subject: Re: SPEEDIS - underlying algorithm?
Ben,
??? You are not saying you want to compare each of the 1m strings to
each other, are you? Because if you are, then you will have to do lots of
preliminary work before you get to SPEDIS (or, better, COMPLEV), and
this space is way too small to even scratch the surface. I would recommend
that you find Sigurd Hermansen's SEUGI paper on probabilistic record
linkage (I do not remember exactly which one, circa 2001-2002) and absorb
its essence carefully before diving head first into fuzzy matching like
this.
Kind regards
------------
Paul Dorfman
Jax, FL
------------
-------------- Original message ----------------------
From: ben.powell@CLA.CO.UK
>
> And for a bonus point how would I run a spedis query against an index
> of say, 1 million $100 char strings :-)
>
> Please, someone say this needs an array!
>
> Rgds.