### Description

This tool simulates movement paths based on a step selection function (SSF) model (Fortin et al. 2005). Given an observed movement path (e.g. using GPS telemetry collar data), an SSF model characterizes the relative probability of selecting a step based on a set of covariates. It is a use versus availability design in which each observed step is compared to a sample of available steps at each point along the movement path. A 'step' in this context refers to the straight line that connects two consecutive locations. The model does not assume that animals move in a straight line between consecutive locations, only that the environmental characteristics along that line are correlated with the likelihood of moving to that destination point.

There are a large number of ways in which SSF models can be formulated, so writing a generic simulation program is difficult. Various assumptions must be made regarding model structure. This simulator is flexible, but it does assume that: 1) step lengths and turn angles can be described by a single pair of distributions (i.e. there is only one movement state), 2) all of the covariates can be described by spatial raster datasets, and 3) these raster datasets are static (they do not change through time).

The model is specified using a 'model file', which describes the covariate raster datasets, the model coefficients, and the summary statistic used for each covariate. The raster dataset is simply the path to the raster, e.g. C:\data\dem (for a grid) or C:\data\ndvi.img (for a raster in Imagine format). The rasters do not have to be in the same folder, but they should be stored locally (not on a network drive). It is highly recommended that you avoid spaces and unusual characters in the folder and file names associated with these raster datasets, and that the rasters are not buried in a deep directory structure. The model coefficients should be expressed to the maximum level of precision. The summary statistic is expressed as text and must be one of the following: MEAN, MIN, MAX, START, END, MED, SUM. 'MEAN' refers to the length-weighted mean of the covariate along the step. If your covariate represents a dummy variable, then the mean corresponds to the proportion of the step that passes through that covariate. 'MIN' and 'MAX' are the minimum and maximum values encountered along the step. 'START' and 'END' are the values of the covariate at the beginning and end of the step, respectively. 'MED' refers to the median value along the step (note that this does not take into account the length of the segment that passes through a cell: all cell values encountered along the step contribute equally to the calculation of the median).
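As a concrete illustration, the length-weighted MEAN and the MED statistics can be approximated by sampling points densely along the step. This is a sketch only, not the tool's internal (exact) implementation; the grid layout, origin convention, and function name are assumptions made for the example:

```python
from statistics import median

def step_stats(raster, cell_size, origin, p0, p1, n=1000):
    """Approximate the MEAN and MED summary statistics for the step
    from p0 to p1 over a grid of cell values.

    raster is a 2D list indexed as raster[row][col], with origin the
    (x, y) coordinate of its lower-left corner. Dense, evenly spaced
    sample points stand in for exact segment-cell intersections, so
    the mean converges on the length-weighted mean as n grows.
    """
    (x0, y0), (x1, y1) = p0, p1
    nrows = len(raster)
    samples, cells = [], []
    for i in range(n + 1):
        t = i / n
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        col = int((x - origin[0]) // cell_size)
        row = nrows - 1 - int((y - origin[1]) // cell_size)
        samples.append(raster[row][col])
        # Track the sequence of distinct cells the step passes through.
        if not cells or cells[-1][0] != (row, col):
            cells.append(((row, col), raster[row][col]))
    mean = sum(samples) / len(samples)     # approx. length-weighted mean
    med = median([v for _, v in cells])    # each crossed cell counts once
    return mean, med
```

Note how MED ignores segment lengths: each crossed cell contributes one value, exactly as described above.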

The format of the model file must be strictly observed. All values are separated by commas, and each line must contain only a single covariate description. Blank lines and comments are not permitted. The first line of the model file will always be 'INTERCEPT,value', where value is the intercept value from the model, e.g. -13.2893934. Each subsequent line follows the format 'raster-dataset, value, summary-statistic', where raster-dataset is the full path to the raster dataset, value is the model coefficient, and summary-statistic is one of the keywords described above.
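A minimal sketch of a parser for this file format (the function name and error handling are illustrative, not part of the tool):

```python
def read_model_file(path):
    """Parse an SSF model file into (intercept, terms), where terms
    is a list of (raster_path, coefficient, statistic) tuples.
    A minimal sketch: the tool's own parser is stricter (the format
    forbids blank lines, which are simply skipped here).
    """
    valid = {"MEAN", "MIN", "MAX", "START", "END", "MED", "SUM"}
    with open(path) as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    key, value = (s.strip() for s in lines[0].split(","))
    if key.upper() != "INTERCEPT":
        raise ValueError("first line must be 'INTERCEPT,value'")
    terms = []
    for ln in lines[1:]:
        raster, coef, stat = (s.strip() for s in ln.split(","))
        if stat.upper() not in valid:
            raise ValueError("unknown summary statistic: " + stat)
        terms.append((raster, float(coef), stat.upper()))
    return float(value), terms
```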

The user is required to prepare the raster datasets representing covariates prior to running this tool. First, all of the raster datasets must be in the same projection. The cell sizes and alignment of cells among the raster datasets can differ. In most cases the raster datasets used to parameterize the model will be used in the simulations. If they are not, then you must take care to ensure that consistent units are maintained. For instance, if a digital elevation model (DEM) that measures elevation in meters is used as a covariate in the model, then you must ensure that the units of the DEM used in the simulations are also meters, not feet. Changes in units will influence the coefficient values that are estimated by the model, hence the need to ensure units are consistent.

You must also convert any thematic (categorical) rasters to dummy variable rasters. For instance, if a raster with 5 categorical habitat types is used in the SSF model, you must create 4 separate rasters coded as 1 or 0 to represent those variables in the simulation. (Note that you only need to create 4 rasters even though there are 5 habitat types in the model because one of those habitat types is the 'reference' category and is therefore omitted).
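For example, a 5-class habitat raster with class 3 as the reference category expands into four 0/1 rasters. This sketch uses plain Python lists standing in for rasters; in practice you would do this with your GIS's raster algebra tools:

```python
def dummy_rasters(categorical, classes, reference):
    """Expand a thematic raster (a 2D list of class codes) into 0/1
    dummy-variable rasters, one per class, omitting the reference
    category. Returns {class_code: raster}."""
    return {
        cls: [[1 if v == cls else 0 for v in row] for row in categorical]
        for cls in classes if cls != reference
    }
```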

If you have performed any transformations of variables in the model, then you must also apply those transformations to the raster layer before running the simulation. For instance, you might have centred and log transformed a variable prior to fitting the model, in which case you would use Raster Calculator to centre and log transform the raster dataset. Or you might have a quadratic expression for a covariate like slope, in which case you must provide both the slope and slope^2 rasters. The key point is that the raster you use in this simulation must be directly related to the coefficient that the SSF model has estimated.
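As an illustration, assuming the variable was log transformed and then centred on the mean computed at model-fitting time (the order and the centring constant are assumptions for this example), the equivalent raster operations look like this, with nested lists standing in for rasters:

```python
import math

def log_centre(raster, centring_constant):
    """Apply the same log-then-centre transformation to the raster
    that was applied to the covariate before model fitting. The
    centring constant must be the value used at fitting time, not
    one recomputed from this raster. Raster Calculator does the
    same job on real raster datasets."""
    return [[math.log(v) - centring_constant for v in row] for row in raster]

def square(raster):
    """Companion raster for a quadratic term such as slope^2."""
    return [[v * v for v in row] for row in raster]
```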

It is also important that you specify an appropriate boundary polygon. The most important aspect of this boundary is that it does not exceed the limits of any of the covariate raster datasets. Often, raster datasets cover different extents so you must take care to ensure that the polygon you create does not exceed the boundaries of any of these raster datasets. If you are simulating paths within the context of a home range, then the boundary polygon would be the home range polygon, and might only cover a small fraction of your raster datasets. But if you are interested in more of a landscape level simulation in which the simulated paths are not constrained to a pre-defined home range, then the polygon would be the common minimum limit of the raster datasets. The projection of the boundary polygon must also match the raster datasets.
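The constraint that steps must end inside the boundary polygon amounts to a point-in-polygon test at each candidate step endpoint. A sketch of the standard ray-casting test (the tool's actual geometry handling is not shown here; this version assumes a simple polygon without holes):

```python
def inside(polygon, x, y):
    """Ray-casting point-in-polygon test. polygon is a list of
    (x, y) vertices in order. Counts how many polygon edges a
    horizontal ray from (x, y) crosses; an odd count means the
    point is inside."""
    n = len(polygon)
    hit = False
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y level
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                hit = not hit
    return hit
```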

The start locations can be random or based on actual animal locations. In either case, you must ensure that the projection of the start locations is identical to that of the raster datasets and boundary polygon. Note that there are a number of different strategies you can employ with regard to start locations. For instance, if you were generating a total of 10000 simulated paths you might have: 1) a single point from which all simulated paths start (iterations=10000), 2) a set of 100 random points from which simulations start (iterations=100), or 3) 10000 random points, with one path starting from each location (iterations=1). The strategy you adopt will depend on the question you are interested in addressing.

This simulation tool functions as follows. From the start location, a random initial bearing is drawn from a uniform distribution. The code then generates a number of available steps (the number of steps is controlled by the 'nsamples' option), and calculates the likelihood of each step based on the model, i.e. if we let w = exp(beta0 + beta1 * X1 + ... + betaN * XN), then the likelihood is calculated as L=w/(1+w). All available steps must end inside the boundary polygon (sampling continues until a full set of available steps that end inside the boundary polygon is acquired). Of these available steps, a single step is selected as the 'used' step where the probability of selection is proportional to the likelihood. Each of the available steps thus has a chance of being selected (if the likelihood of the step is non-zero).
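The candidate evaluation and selection step described above can be sketched as follows (covariate summaries are assumed to be pre-computed for each candidate step; the function name is illustrative):

```python
import math
import random

def choose_step(candidates, intercept, coefs):
    """Pick the 'used' step from a set of candidate steps.

    candidates: one list of covariate summary values per candidate.
    For each candidate, w = exp(b0 + b1*x1 + ... + bn*xn) and
    L = w / (1 + w); one candidate is then drawn with probability
    proportional to L. Returns the index of the selected candidate.
    Illustrative sketch: very large linear predictors would
    overflow exp() here."""
    likes = []
    for x in candidates:
        w = math.exp(intercept + sum(b * v for b, v in zip(coefs, x)))
        likes.append(w / (1.0 + w))
    # Weighted draw: every candidate with a non-zero likelihood has
    # some chance of being selected.
    return random.choices(range(len(candidates)), weights=likes, k=1)[0]
```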

Step length and turn angle distributions can be specified as either empirical distributions (which are loaded from a text file), or as statistical distributions from which random values are generated using R. Please refer to the section 'Specifying statistical and empirical distributions' for detailed instructions on how to specify distributions.
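For reference, values from the two statistical distributions used in the examples below (EXPONENTIAL step lengths and WRAPPEDCAUCHY turn angles) can be drawn as follows. This is an independent sketch of standard samplers, not the tool's R-based implementation:

```python
import math
import random

def draw_step_length(rate):
    """Draw an EXPONENTIAL step length with the given rate parameter
    (e.g. 0.0015, as in the examples)."""
    return random.expovariate(rate)

def draw_turn_angle(mu, rho):
    """Draw a WRAPPEDCAUCHY turn angle (radians) with mean direction
    mu and concentration rho in (0, 1), by wrapping a Cauchy draw
    with scale -ln(rho) onto (-pi, pi]. One standard sampler."""
    theta = mu + (-math.log(rho)) * math.tan(math.pi * (random.random() - 0.5))
    return math.atan2(math.sin(theta), math.cos(theta))  # wrap into (-pi, pi]
```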

If you specify an empirical distribution that represents turn angles in radians then you must also set the radians=TRUE option. If you fail to do this the values will be interpreted as degrees and your paths will turn very little. If you specify a probability density function for the turn angle distribution then this is always interpreted as radians and you cannot override this setting using the 'radians' option (it is ignored).

If you specify a turn angle distribution that is too restrictive (strong directional persistence), or a step length distribution with a mean value that is large relative to the size of the boundary polygon, then the simulations may take a very long time to run. The risk is that the simulated path reaches the edge of the reflective boundary and then the restrictive movement parameters ensure that the vast majority of potential steps are rejected. This will eventually cause the simulator to fail.

It is recommended that you first use the movement.simplecrw command to simulate movement paths based only on the set of start locations, the step length and turn angle distributions, and the boundary polygon. This allows you to check that the distributions you have specified are reasonable before running the more complicated SSF simulation.

### References

Fortin, D., Beyer, H. L., Boyce, M. S., Smith, D. W., Duchesne, T. & Mao, J. S. 2005. Wolves influence elk movements: behavior shapes a trophic cascade in Yellowstone National Park. Ecology 86: 1320-1330.

### Syntax

movement.ssfsim1(model, inpoint, uidfield, tad, sld, nsamples, nsteps, iterations, outpoint, bnd, [outline], [radians], [where]);

| Parameter | Description |
| --- | --- |
| model | the SSF model specification file (see full help documentation for details) |
| inpoint | the input start locations (a point feature source) |
| uidfield | the name of the unique ID field in the input data source |
| tad | the turn angle distribution (see full help documentation for details) |
| sld | the step length distribution (see full help documentation for details) |
| nsamples | the number of potential steps to evaluate at each simulated step (see full help documentation for details) |
| nsteps | the number of steps in the path, or a field name in the start location data source containing these values |
| iterations | the number of paths to generate per input point or input polygon |
| outpoint | the output point data source to create |
| bnd | a polygon data source containing a single polygon that defines the reflective boundary for simulated paths |
| [outline] | also generates the output in line format (one line per step) in this data source |
| [radians] | (TRUE/FALSE) specifies whether the turn angle distribution is in radians (default=FALSE) |
| [where] | the selection statement that will be applied to the point feature data source to identify a subset of points to process (see full help documentation for further details) |

### Example

```
movement.ssfsim1(model="C:\data\ssfmodel1.txt", inpoint="C:\data\startlocs.shp", uidfield="STARTID", tad="C:\data\turns.csv", sld="C:\data\steps.csv", nsamples=20, nsteps=1000, iterations=100, outpoint="C:\data\sims.shp", bnd="C:\data\parkbnd.shp");

movement.ssfsim1(model="C:\data\ssfmodel1.txt", inpoint="C:\data\startlocs.shp", uidfield="STARTID", tad="C:\data\turns.csv", sld="C:\data\steps.csv", nsamples=50, nsteps="STEPCNT", iterations=1, outpoint="C:\data\sims.shp", outline="C:\data\simsline.shp", bnd="C:\data\parkbnd.shp", radians=TRUE);

movement.ssfsim1(model="C:\data\ssfmodel1.txt", inpoint="C:\data\startlocs.shp", uidfield="STARTID", tad=c("WRAPPEDCAUCHY",0,0.3), sld=c("EXPONENTIAL",0.0015), nsamples=20, nsteps=20000, iterations=100, outpoint="C:\data\sims.shp", bnd="C:\data\hrbnd.shp", radians=TRUE);
```

An example of the model file structure:

```
INTERCEPT, 12.3456789
C:\data\habforest.img, 0.987654, MEAN
C:\data\habmead.img, -1.234567, MEAN
C:\data\slope, 2.9589334, MIN
C:\data\slope2, -0.0290340, MIN
C:\data\biomass, 6.290384, MAX
```