Spider

From CSEE Documentation
Revision as of 14:47, 30 May 2007 by Jlee23 (talk | contribs)

Spider is a file scanner that uses a list of regular expressions to look for sensitive information in plain text files, Microsoft Word documents, PDFs, PostScript files, and ZIP archives. It has been heavily modified to add features and to work in the CSEE environment.

Using Spider

You may use spider to analyze any files you have read access to, such as those in your home directory. It can only be run from a Solaris machine:

$ ssh sunserver1.cs.umbc.edu
$ /cs/bin/check-dir-with-spider.sh ~

When the scan finishes, spider emails you a report of any possibly sensitive files it found.

Customizing Spider

CSEE keeps a set of sensible default rules for spider in /cs/etc/spider-4.0. REGEXES contains a list of Perl-compatible regular expressions which, when matched, indicate sensitive data. Each match is then compared against the regexes in IGNORE; if any of them matches, the data is no longer considered sensitive. This weeds out example data like "123-45-6789". SKIP_TYPES contains a list of file types, as described by /etc/magic, that spider should skip.
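
The two-stage check described above (match against REGEXES, then discard anything IGNORE also matches) can be sketched with grep. This is only an illustration, not how spider itself is implemented: both patterns are hypothetical stand-ins for one line of each file, grep's extended regular expressions are not identical to Perl-compatible ones, and the -o option assumes a grep such as GNU or BSD grep.

```shell
#!/bin/sh
# Hypothetical patterns for illustration; these are not the CSEE rules.
REGEX='[0-9]{3}-[0-9]{2}-[0-9]{4}'   # stands in for one line of REGEXES
IGNORE='^123-45-6789$'               # stands in for one line of IGNORE

# Read text on stdin and print every REGEX match that IGNORE does not excuse.
sensitive_matches() {
    # Stage 1: -o prints each match on its own line.
    # Stage 2: -v drops the matches that the IGNORE pattern covers.
    grep -Eo "$REGEX" | grep -Ev "$IGNORE"
}

printf 'example: 123-45-6789\nreal: 987-65-4321\n' | sensitive_matches
```

In this sketch only the second number is printed, because the first is filtered out as example data.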

If you wish to add to any of these rules, create the corresponding file in ~/.spider.
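
As a rough sketch of adding a personal rule, the helper below appends a pattern to a file under ~/.spider. The helper name add_rule, the SPIDER_DIR variable, and the sample pattern are all hypothetical, and the one-entry-per-line layout is an assumption; check the files in /cs/etc/spider-4.0 for the actual format before relying on it.

```shell
#!/bin/sh
# Per-user rule directory named in the text; SPIDER_DIR is a hypothetical override.
SPIDER_DIR="${SPIDER_DIR:-$HOME/.spider}"

# Append one entry to a per-user rule file (REGEXES, IGNORE, or SKIP_TYPES).
# Assumption: each rule file holds one entry per line.
add_rule() {
    rule_file=$1
    pattern=$2
    mkdir -p "$SPIDER_DIR" || return 1
    printf '%s\n' "$pattern" >> "$SPIDER_DIR/$rule_file"
}

# Example with a hypothetical pattern: treat 000-00-0000 as example data.
# add_rule IGNORE '000-00-0000'
```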

Ignoring False-Positives

After you first run spider, it may report files that appear to contain sensitive data when in fact they do not. You can add such a file to the false positives list by running:

$ /cs/bin/check-dir-with-spider.sh -a <filename>

The file must be added to the false positives list again every time it is modified, or spider will rescan it.