Respondent Driven Sampling (RDS) is a method for sampling from hard-to-reach human populations. First developed in 1997 by Douglas Heckathorn, it is commonly used in HIV research, where groups at 'high risk' of disease exposure, such as sex workers and illicit drug users, are hard to reach. It has also been used in public health to sample other groups, such as the homeless or street youth. RDS is a link-tracing network sampling technique: data are collected through a peer referral process over social networks. Because it can be used in difficult settings, RDS has been adopted by a number of well-known public health organisations, such as the World Health Organisation (WHO) and the US Centers for Disease Control and Prevention (CDC).

RDS essentially combines "snowball sampling" (individuals refer those they know, who in turn refer those they know, and so on) with a mathematical model that weights the sample to compensate for the fact that it was collected in a non-random way. This means the results can be analysed using statistical methods and conclusions can be drawn about the target population. It overcomes the problem of sampling from a very small population whose members are difficult to identify or for which there are no sampling or population lists. It aims to combine the breadth of coverage of network-based methods with the statistical validity of standard probability sampling methods, making it possible to draw statistically valid samples of previously unreachable groups, provided its assumptions hold. Respondents recruit their peers, and researchers keep track of who recruited whom and of each respondent's number of social contacts. The weighting model is based on a synthesis and extension of two areas of mathematics: Markov chain theory and biased network theory.
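To make the idea of weighting concrete, here is a minimal sketch of one widely used estimator, often called RDS-II or the Volz-Heckathorn estimator, which weights each respondent by the inverse of their reported number of social contacts. It is offered as an illustration only and is not necessarily the exact model described above; the function name and the data are hypothetical.

```python
from typing import Sequence


def rds_ii_estimate(has_trait: Sequence[bool], degrees: Sequence[int]) -> float:
    """Inverse-degree weighted proportion (RDS-II style).

    has_trait[i] says whether respondent i has the characteristic of interest;
    degrees[i] is that respondent's self-reported number of social contacts.
    """
    if len(has_trait) != len(degrees):
        raise ValueError("has_trait and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every respondent must report at least one contact")

    # Well-connected people are more likely to be recruited, so they are
    # down-weighted in proportion to their number of contacts.
    weights = [1.0 / d for d in degrees]
    weighted_with_trait = sum(w for w, t in zip(weights, has_trait) if t)
    return weighted_with_trait / sum(weights)


# Hypothetical sample: the two respondents with the trait have large networks.
has_trait = [True, True, False, False]
degrees = [10, 10, 2, 2]

naive = sum(has_trait) / len(has_trait)         # unweighted sample proportion
weighted = rds_ii_estimate(has_trait, degrees)  # degree-adjusted estimate
print(f"naive={naive:.3f} weighted={weighted:.3f}")
```

In this made-up sample the unweighted proportion is one half, while the weighted estimate is lower because the respondents with the trait report many more contacts and were therefore easier to reach.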

How does it work?

Researchers select a small number of members of the target population, typically 5-10, in an ad hoc manner to serve as 'seeds'. Each seed is interviewed and given a fixed number of coupons (three is common practice) that they may use to recruit other members of the target population. These recruits are in turn provided with coupons that they use to recruit others. In this way the sample grows in 'waves', producing what are termed 'recruitment trees'. Respondents are encouraged to participate and recruit through financial and other incentives. It is recommended that people have a week to recruit, as there is evidence that 92-95% of participants distribute their coupons within the first week. The majority of participants are recruited by other respondents, not by the researchers. RDS is designed to begin as a convenience sample, select subsequent respondents dependent on previous ones, and then treat the final sample as a probability sample. Analysis can be done using specialised software packages such as RDS and RDS Analyst.
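The bookkeeping behind waves and recruitment trees can be sketched as follows. This is a minimal illustration that assumes the coupon records have already been reduced to a mapping from each respondent to their recruiter, with seeds mapped to None and counted as wave 0; the identifiers and function names are hypothetical.

```python
from typing import Dict, Optional


def assign_waves(recruiter_of: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Return each respondent's wave number; seeds (no recruiter) are wave 0."""
    waves: Dict[str, int] = {}

    def wave(person: str) -> int:
        # A recruit's wave is one more than the wave of their recruiter.
        if person not in waves:
            parent = recruiter_of[person]
            waves[person] = 0 if parent is None else wave(parent) + 1
        return waves[person]

    for person in recruiter_of:
        wave(person)
    return waves


def seed_of(person: str, recruiter_of: Dict[str, Optional[str]]) -> str:
    """Follow the recruitment chain upwards to find the originating seed."""
    while recruiter_of[person] is not None:
        person = recruiter_of[person]
    return person


# Hypothetical coupon records: respondent -> recruiter (None marks a seed).
recruiter_of = {
    "s1": None, "s2": None,
    "a": "s1", "b": "s1",
    "c": "a", "d": "c",
    "e": "s2",
}

waves = assign_waves(recruiter_of)
print(waves)
print("d descends from seed", seed_of("d", recruiter_of))
```

Because each respondent can take part only once, the records form a set of trees, one per seed, so following recruiters upwards always ends at a seed.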

Assumptions

Researchers assume reciprocity, i.e. that the recruiter and recruit are known to each other and that each would be willing to recruit the other. To check this, researchers should collect information about the relationship between recruiters and recruits. Researchers should also be aware that respondents' decision making affects the sampling process. Respondents are not allowed to recruit people who have already participated. Researchers also have to be aware of the influence of any small, well-connected subgroup that is sampled at a high rate, as this influences the future referral choices of other subgroup members.

Researchers are recommended to test the following assumptions: 

  1. finite population effects on sampling; 
  2. reciprocity; 
  3. recruitment bias; 
  4. non-response;
  5. validity of the timeframe used. 

Assumptions can be tested through computer simulation or by analytical methods that detect violation of the assumptions in practice.
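As one simple example of an analytical check, the reciprocity assumption can be examined using the recruiter-recruit relationship information mentioned above. The sketch below assumes each recruit was asked how they know the person who gave them their coupon; the answer categories and the function name are hypothetical, and a real survey would define its own. Seeds are left out because they have no recruiter.

```python
from typing import Iterable

# Hypothetical answer categories that count as a pre-existing social tie.
KNOWN_TIES = {"friend", "acquaintance", "relative", "partner"}


def share_without_known_tie(answers: Iterable[str]) -> float:
    """Fraction of recruits who did not describe their recruiter as a known tie."""
    cleaned = [a.strip().lower() for a in answers]
    if not cleaned:
        raise ValueError("no recruit answers supplied")
    unknown = sum(1 for a in cleaned if a not in KNOWN_TIES)
    return unknown / len(cleaned)


# Hypothetical answers from four recruits.
answers = ["friend", "Friend", "stranger", "acquaintance"]
share = share_without_known_tie(answers)
print(f"{share:.0%} of recruits were recruited by someone they did not know")
```

A large share of recruits reporting no prior tie to their recruiter would suggest that coupons are travelling outside existing social relationships, which is at odds with the reciprocity assumption.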

Advantages of Respondent Driven Sampling

  1. It can generate large samples of a wide variety of hard-to-reach populations.
  2. It is designed to reduce the biases of network-based, snowball or chain-based sampling such as the choice of initial participants, volunteerism, and masking.

Issues with Respondent Driven Sampling

  1. The sampling design is beyond the control of the researcher and not fully observable. Researchers may not know the size of the personal networks of recruits.
  2. It requires researchers to make assumptions about the recruitment process and the social network that connects the target population. The dependency between recruiters and recruits is unknown.
  3. It can be difficult to attain the target sample size, as the available population may be smaller than the target population or participants may be unable to recruit additional members of the target population. Other risks are insufficient incentives, inadequate network connections in the population and a negative perception of the study among the target population.
  4. Recruitment may be difficult: respondents may be influenced by people they know who have already participated in the study.
  5. The recruitment process is affected by three different types of decision making: 
    1. the decision by the recruiter to pass on the coupons; 
    2. the decision of the recruit to accept a coupon;
    3. the decision of the recruit to participate in the study.
     At any point there could be non-response.
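The combined effect of these three decisions can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the three decisions are independent and that every coupon faces the same probabilities; the function name and the numbers are hypothetical and do not come from any study.

```python
def expected_recruits(coupons: int, p_pass: float, p_accept: float,
                      p_participate: float) -> float:
    """Expected number of enrolled recruits per respondent.

    A coupon produces a new respondent only if it is passed on, accepted,
    and then redeemed by someone who takes part in the study.
    """
    for p in (p_pass, p_accept, p_participate):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    if coupons < 0:
        raise ValueError("coupons must be non-negative")
    return coupons * p_pass * p_accept * p_participate


# Hypothetical values: 3 coupons, 90% passed on, 60% accepted, 50% enrol.
yield_per_respondent = expected_recruits(3, 0.9, 0.6, 0.5)
print(f"expected recruits per respondent: {yield_per_respondent:.2f}")
```

With these made-up numbers each respondent brings in fewer than one new participant on average, so under the stated assumptions recruitment chains would be expected to die out. This is one way to see why non-response at any of the three points can make the target sample size hard to reach.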

Further Reading

Gile, K.J., Johnston, L.G. & Salganik, M.J. (2015) "Diagnostics for respondent-driven sampling". Journal of the Royal Statistical Society: Series A, 178: 241-269.

Heckathorn, D. (1997) Respondent-driven Sampling.

Schonlau et al. (2012) Respondent Driven Sampling for Stata.

www.respondentdrivensampling.org

Evaluation of Respondent Driven Sampling

Respondent driven sampling (RDS) is a sampling method used worldwide in surveys of "hard to reach" socially networked populations. RDS is a type of chain referral sampling that is useful where traditional probability sampling methods are infeasible. RDS uses several theoretical premises borrowed from sociological statistics to mitigate the biases associated with chain referral sampling. Specifically, RDS draws on a branch of social science known as social network theory, which attempts to map the relationships and characteristics shared by groups. Information about the social networks of persons recruited into an RDS survey is used to determine each recruit's probability of selection and to mitigate the biases associated with over- or under-sampling certain groups.

RDS recruitment is initiated with a small, diverse and influential group of "seeds" (eligible respondents) selected by the researchers. Each seed receives a set number of recruitment coupons to recruit their peers, who then present the coupons at a fixed site to enroll in the survey. Eligible recruits who finish the survey process are also given a set number of coupons to recruit their peers. The recruited peers of seeds who enroll in the survey become wave one respondents, and the recruits of wave one respondents become wave two respondents. This process of recruitment continues through successive waves until the calculated sample size is reached. In the end, the waves produced by effective seeds make up recruitment chains of varying lengths. The goal is to acquire long recruitment chains made up of multiple waves. In the figure below, there are four seeds (in black) and a maximum of 10 waves in one of the chains. For more information on RDS methods, see the RDS manual I wrote for the World Health Organization.

[Figure: RDS recruitment chains, with seeds shown in black]

RDS has been most widely used in epidemiological surveys to monitor HIV prevalence and risk behaviors in key populations at higher risk of HIV exposure, such as female sex workers, men who have sex with men, and people who inject drugs. I have provided technical assistance to over 300 surveys worldwide to measure HIV prevalence and behavioral risks among these populations. However, as RDS becomes known in other disciplines, it is being used in other socially networked populations. I have also provided assistance to design and implement RDS surveys among migrants and mobile populations, survivors of rape, lesbians, transgender persons, high risk heterosexual men, high risk youth and youth living in the streets, people living with HIV and other populations.

Over the years, I have produced many materials which I offer for your use here:

RDS is both a sampling and an analysis technique, so it is necessary to use diagnostic tools and appropriate estimators for analysis. These can be found in the open source software package RDS Analyst, from the Hidden Populations Methods Research Group (HPMRG).