Date: Fri, 26 May 2006 14:00:28 -0500
Reply-To: Duck-Hye Yang <dyang@CHAPINHALL.ORG>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Duck-Hye Yang <dyang@CHAPINHALL.ORG>
Subject: Re: grouping features based on position
Content-Type: text/plain; charset=US-ASCII
Thanks for your help. Could you let me know how to start with SAS?
You are right. Many districts do not have any foster-care kids. Most
kids are concentrated within the city (Chicago), and fewer are in its
suburbs. Chicago consists of 409 areas (elementary school catchment
areas, which I will treat as school districts), and the suburbs consist
of 118 areas (school districts).
When I mapped the kids' locations, the distribution did not look that
skewed. Drawing boundaries seems workable, mainly because we have as
many as 527 districts compared to 594 kids.
The goal is to assign an approximately equal number of abused/neglected
(A/N) kids to each judge, and all the judges work in the same building.
So minimizing the distance between a child's and a judge's location is
not the issue, and outliers are not a concern.
As Richard pointed out earlier, one important criterion is contiguity.
Can I use PROC CLUSTER? My concerns are: how can I ensure an even
distribution of A/N kids across the 13 clusters, and how can I ensure
that each cluster is contiguous?
I tried writing the code below, using FREQ to weight each district by
its number of A/N kids. What about districts that have zero kids?

   proc cluster data=districts method=ward outtree=ward print=15 pseudo ccc;
     id district_id; var x y; freq n_kids;
   run;
   proc tree data=ward out=clusters ncl=13 horizontal spaces=2;
     id district_id;
   run;
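One way to see how even the 13 Ward clusters turn out is to total the kids in each cluster after PROC TREE. This is a minimal sketch, assuming the OUT= data set CLUSTERS carries DISTRICT_ID (the ID variable) and CLUSTER, and that DISTRICTS holds N_KIDS; the names CLUSTER_SIZES and TOTAL_KIDS are hypothetical. Two caveats: as I read the PROC CLUSTER documentation, observations whose FREQ value is less than 1 are not used, so zero-kid districts will be left out of the tree entirely; and Ward's method minimizes within-cluster variance, so by itself it guarantees neither equal counts nor contiguous clusters.

```sas
/* Sketch: total A/N kids and number of districts per cluster.        */
/* Assumes CLUSTERS (from PROC TREE) has DISTRICT_ID and CLUSTER, and */
/* DISTRICTS has DISTRICT_ID and N_KIDS.                              */
proc sql;
  create table cluster_sizes as
  select c.cluster,
         count(*)      as n_districts,
         sum(d.n_kids) as total_kids
  from clusters as c
       inner join districts as d
       on c.district_id = d.district_id
  group by c.cluster
  order by total_kids desc;
quit;

proc print data=cluster_sizes noobs;
run;
```

A large gap between the first and last rows would indicate that the Ward solution needs adjusting before it can serve as a balanced assignment.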
>>> "Sigurd Hermansen" <HERMANS1@WESTAT.com> 5/26/2006 12:47:43 PM >>>
I'd take a close look first at the number of school districts that have
zero abused/neglected kids who entered the system for the first time in
2005. A highly skewed distribution could make the locations of school
districts irrelevant. In an extreme case, if all of the kids go to
school in a single district, all boundary lines would likely go through
that district.
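A quick way to follow this advice in SAS is to count the zero-kid districts and look at how concentrated the per-district counts are. A minimal sketch, assuming the DISTRICTS data set used in the PROC CLUSTER code above, with one row per district and N_KIDS set to 0 where a district has no A/N kids; the output column names are hypothetical.

```sas
/* Sketch: how skewed is the distribution of A/N kids across districts? */
proc sql;
  select count(*)        as n_districts,
         sum(n_kids = 0) as n_zero_districts,  /* (n_kids = 0) is 1 or 0 */
         sum(n_kids)     as total_kids,
         max(n_kids)     as max_kids_in_one_district
  from districts;
quit;

/* Frequency table of the per-district counts (0, 1, 2, ...) */
proc freq data=districts;
  tables n_kids / nocum;
run;
```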
I gather that you are looking for an assignment model that will work
well in the future as more cases arise. If you weight school districts
by projected numbers of A/N kids, geographic clustering of the 527
districts would give you a starting point. At least you will be able to
see how many clusters it takes to minimize distances among weighted
school districts. Perhaps you could then ask David C for a step-wise
method of increasing or decreasing the number of clusters ;>
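Weighting by projected counts could look roughly like the sketch below. It assumes a hypothetical data set DISTRICTS_PROJ with DISTRICT_ID, X, Y and a projected count PROJ_KIDS (for example from a regression model); PROJ_FREQ, WARD_PROJ and CLUSTERS_PROJ are hypothetical names. Because PROC CLUSTER treats FREQ values as integer counts (as I understand it, non-integers are truncated), the projections are rounded explicitly first. Rerunning PROC TREE with different NCL= values is one simple way to compare numbers of clusters.

```sas
/* Hypothetical input: DISTRICTS_PROJ with DISTRICT_ID, X, Y and a     */
/* projected count PROJ_KIDS (e.g., from a regression model).          */
data districts_w;
  set districts_proj;
  /* FREQ values act as integer counts, so round explicitly            */
  proj_freq = round(proj_kids);
run;

proc cluster data=districts_w method=ward outtree=ward_proj
             print=15 pseudo ccc;
  id district_id;
  var x y;
  freq proj_freq;
run;

/* Cut the tree at 13 clusters; repeat with other NCL= values to compare */
proc tree data=ward_proj out=clusters_proj ncl=13 noprint;
  id district_id;
run;
```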
On Behalf Of Duck-Hye Yang
Sent: Friday, May 26, 2006 11:11 AM
Subject: grouping features based on position
The juvenile court wants each judge to get an equal number of kids
(cases) for foster-home placement court hearings. Each judge is
currently assigned a group of cases from a designated geographic area.
The problem is that some judges have too many cases.
The task is to delineate the boundaries of 13 geographical areas with
approximately equal numbers of cases (594 kids entered the system for
the first time in 2005). The boundaries are supposed to be based on
school districts. There are 527 school districts (polygons) and 594 kids.
The essence of the solution should be 1) grouping the school districts
into 13 groups based on proximity while, at the same time, 2) keeping
approximately equal numbers of kids within each of the 13 groups.
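The balance target in 2) is simple arithmetic: dividing the 594 kids among the 13 judges gives

```latex
\frac{594}{13} \approx 45.7, \qquad 594 = 9 \times 46 + 4 \times 45
```

so a perfectly balanced split would have 9 areas with 46 kids and 4 areas with 45 kids.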
I have information on the kids' locations and on the centroids of the
school districts (longitude/latitude). Alternatively, I can arrange the
data so that each district has its number of foster-care kids.
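Arranging the data with one row per district, including districts with no kids, can be done with a left join. A minimal sketch, assuming two hypothetical data sets: DIST_CENTROIDS (DISTRICT_ID, X, Y) and KIDS (one row per child, already tagged with the DISTRICT_ID the child falls in). COUNT(column) counts only non-missing values, so districts with no matching kids get N_KIDS = 0 rather than being dropped.

```sas
/* Sketch: one row per district with its count of A/N kids.           */
/* Hypothetical inputs: DIST_CENTROIDS (DISTRICT_ID, X, Y) and        */
/* KIDS (one row per child, with the child's DISTRICT_ID).            */
proc sql;
  create table districts as
  select d.district_id, d.x, d.y,
         count(k.district_id) as n_kids   /* 0 when no kids match */
  from dist_centroids as d
       left join kids as k
       on d.district_id = k.district_id
  group by d.district_id, d.x, d.y;
quit;
```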
Another twist: because boundaries based on 2005 data may no longer be
valid 5 or 10 years from now, the delineation may need to be modeled on
some predictors (the projected number of kids based on demographic and
socioeconomic characteristics of school districts) so that adjustments
can be made each year.
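One possible way to get such projections is a count-data regression of each district's number of new A/N cases on district characteristics, for example a Poisson model in PROC GENMOD. This is a swapped-in technique, not something proposed in the thread; DISTRICT_DATA, POV_RATE, CHILD_POP and PROJ_KIDS are hypothetical names.

```sas
/* Sketch: Poisson regression of district case counts on hypothetical  */
/* predictors POV_RATE (poverty rate) and CHILD_POP (child population). */
/* DISTRICT_DATA is a hypothetical data set with one row per district.  */
proc genmod data=district_data;
  model n_kids = pov_rate child_pop / dist=poisson link=log;
  /* Save each district's predicted (projected) count                  */
  output out=district_proj p=proj_kids;
run;
```

The predicted counts could then be rounded and used as the FREQ variable when re-clustering each year, so the boundaries track projected rather than past caseloads.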
Hope that someone will share his/her experience with me.