<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.cs.umbc.edu/history/Spider?feed=atom</id>
	<title>Spider - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.cs.umbc.edu/history/Spider?feed=atom"/>
	<link rel="alternate" type="text/html" href="https://wiki.cs.umbc.edu/history/Spider"/>
	<updated>2026-04-29T00:30:14Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.44.2</generator>
	<entry>
		<id>https://wiki.cs.umbc.edu/index.php?title=Spider&amp;diff=303&amp;oldid=prev</id>
		<title>Jlee23 at 19:47, 30 May 2007</title>
		<link rel="alternate" type="text/html" href="https://wiki.cs.umbc.edu/index.php?title=Spider&amp;diff=303&amp;oldid=prev"/>
		<updated>2007-05-30T19:47:42Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 15:47, 30 May 2007&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[http://www.cit.cornell.edu/security/tools/ Spider] is a file scanner that looks for sensitive information in plain text files, Microsoft Word documents, PDFs, PostScripts, and ZIP archives based on a list of [http://en.wikipedia.org/wiki/PCRE regular expressions].  It has been heavily modified add features and work in the CSEE environment.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[http://www.cit.cornell.edu/security/tools/ Spider] is a file scanner that looks for sensitive information in plain text files, Microsoft Word documents, PDFs, PostScripts, and ZIP archives based on a list of [http://en.wikipedia.org/wiki/PCRE regular expressions].  It has been heavily modified &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;to &lt;/ins&gt;add features and work in the CSEE environment.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Using Spider==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Using Spider==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;/table&gt;</summary>
		<author><name>Jlee23</name></author>
	</entry>
	<entry>
		<id>https://wiki.cs.umbc.edu/index.php?title=Spider&amp;diff=302&amp;oldid=prev</id>
		<title>Jlee23 at 19:47, 30 May 2007</title>
		<link rel="alternate" type="text/html" href="https://wiki.cs.umbc.edu/index.php?title=Spider&amp;diff=302&amp;oldid=prev"/>
		<updated>2007-05-30T19:47:05Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;[http://www.cit.cornell.edu/security/tools/ Spider] is a file scanner that looks for sensitive information in plain text files, Microsoft Word documents, PDFs, PostScripts, and ZIP archives based on a list of [http://en.wikipedia.org/wiki/PCRE regular expressions].  It has been heavily modified add features and work in the CSEE environment.&lt;br /&gt;
&lt;br /&gt;
==Using Spider==&lt;br /&gt;
You may use Spider to scan any files you have read access to, such as those in your home directory.  It runs only on Solaris, so first log in to a Solaris server:&lt;br /&gt;
&lt;br /&gt;
 $ ssh sunserver1.cs.umbc.edu&lt;br /&gt;
 $ /cs/bin/check-dir-with-spider.sh ~&lt;br /&gt;
&lt;br /&gt;
If Spider finds any possibly sensitive files, it emails you a report listing them when the scan finishes.&lt;br /&gt;
&lt;br /&gt;
==Customizing Spider==&lt;br /&gt;
CSEE provides a set of sensible default rules for Spider in &amp;lt;tt&amp;gt;/cs/etc/spider-4.0&amp;lt;/tt&amp;gt;.  &amp;lt;tt&amp;gt;REGEXES&amp;lt;/tt&amp;gt; contains a list of Perl-compatible regular expressions; text matching any of them is treated as possibly sensitive.  Each match is then compared against the regular expressions in &amp;lt;tt&amp;gt;IGNORE&amp;lt;/tt&amp;gt;, and if any of those also match, the data is no longer considered sensitive.  This weeds out example data such as the placeholder Social Security number &amp;quot;123-45-6789&amp;quot;.  &amp;lt;tt&amp;gt;SKIP_TYPES&amp;lt;/tt&amp;gt; lists file types, as described by &amp;lt;tt&amp;gt;/etc/magic&amp;lt;/tt&amp;gt;, that Spider should not process at all.&lt;br /&gt;
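&lt;br /&gt;
The REGEXES-then-IGNORE check can be illustrated with a short Python sketch.  This is not Spider's own code: it assumes one pattern per line in each rule file, uses Python's &amp;lt;tt&amp;gt;re&amp;lt;/tt&amp;gt; module (which handles simple Perl-style patterns like the ones shown), and the sample patterns, the &amp;lt;tt&amp;gt;sensitive_matches&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;should_skip&amp;lt;/tt&amp;gt; helpers, and the substring test used for &amp;lt;tt&amp;gt;SKIP_TYPES&amp;lt;/tt&amp;gt; are all hypothetical.&lt;br /&gt;
&lt;pre&gt;
```python
import re

# Hypothetical sample rules.  Spider's REGEXES and IGNORE hold
# Perl-compatible patterns; Python's re module handles simple ones like these.
REGEXES = [r"\b\d{3}-\d{2}-\d{4}\b"]  # SSN-shaped numbers
IGNORE = [r"^123-45-6789$"]           # well-known example number


def sensitive_matches(text, regexes, ignore):
    """Return every REGEXES match that no IGNORE pattern also matches."""
    found = []
    # Stage 1: find candidate matches with each REGEXES pattern.
    for pattern in regexes:
        for m in re.finditer(pattern, text):
            hit = m.group(0)
            # Stage 2: a match that any IGNORE regex accepts is not sensitive.
            if not any(re.search(ign, hit) for ign in ignore):
                found.append(hit)
    return found


def should_skip(file_type, skip_types):
    """True if a file-type description (e.g. from file(1)) names a SKIP_TYPES entry."""
    return any(t in file_type for t in skip_types)


sample = "Example: 123-45-6789. Real record: 987-65-4321."
print(sensitive_matches(sample, REGEXES, IGNORE))
print(should_skip("JPEG image data, JFIF standard 1.01", ["JPEG image data"]))
```
&lt;/pre&gt;
Running it reports only 987-65-4321: the example number is matched by REGEXES but then discarded by IGNORE.&lt;br /&gt;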
&lt;br /&gt;
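To add your own rules, create a file with the same name (&amp;lt;tt&amp;gt;REGEXES&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;IGNORE&amp;lt;/tt&amp;gt;, or &amp;lt;tt&amp;gt;SKIP_TYPES&amp;lt;/tt&amp;gt;) in &amp;lt;tt&amp;gt;~/.spider&amp;lt;/tt&amp;gt;; its entries are added to the CSEE defaults.&lt;br /&gt;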
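&lt;br /&gt;
Combining the CSEE defaults with your own files might look like the following Python sketch.  It assumes that a file in &amp;lt;tt&amp;gt;~/.spider&amp;lt;/tt&amp;gt; simply extends the system file of the same name and that each non-blank line is one rule; the &amp;lt;tt&amp;gt;read_rules&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;load_rules&amp;lt;/tt&amp;gt; helpers are hypothetical, and the demo uses temporary directories so it runs anywhere.&lt;br /&gt;
&lt;pre&gt;
```python
import os
import tempfile

# Location of the CSEE default rules, as given on this page.
SYSTEM_RULES_DIR = "/cs/etc/spider-4.0"


def read_rules(path):
    """Return the non-blank lines of a rule file, or [] if the file is absent."""
    if not os.path.isfile(path):
        return []
    with open(path) as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def load_rules(name, system_dir=SYSTEM_RULES_DIR, user_dir=None):
    """Return the system rules for `name` followed by the user's additions."""
    if user_dir is None:
        user_dir = os.path.expanduser("~/.spider")
    system = read_rules(os.path.join(system_dir, name))
    personal = read_rules(os.path.join(user_dir, name))
    return system + personal


# Demo with throwaway directories standing in for the real locations.
with tempfile.TemporaryDirectory() as sysdir, tempfile.TemporaryDirectory() as userdir:
    with open(os.path.join(sysdir, "IGNORE"), "w") as f:
        f.write("^123-45-6789$\n")
    with open(os.path.join(userdir, "IGNORE"), "w") as f:
        f.write("^987-65-4321$\n")
    print(load_rules("IGNORE", sysdir, userdir))
```
&lt;/pre&gt;
Appending rather than replacing keeps the CSEE defaults in force while your own patterns extend them.&lt;br /&gt;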
&lt;br /&gt;
==Ignoring False Positives==&lt;br /&gt;
The first time you run Spider, it may report files that appear to contain sensitive data but in fact do not.  You can add such files to a false-positives list by running:&lt;br /&gt;
&lt;br /&gt;
 $ /cs/bin/check-dir-with-spider.sh -a &amp;lt;filename&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You must re-add a file to the false-positives list each time it is modified; otherwise Spider will scan it again.&lt;/div&gt;</summary>
		<author><name>Jlee23</name></author>
	</entry>
</feed>