
Glossary Of Terms

ACE: Adaptive Contrast Enhancement. Bell & Howell’s proprietary dynamic thresholding technology for image enhancement.

ADF: Automatic Document Feeder. This is the means by which a scanner feeds the paper document.

Alphanumeric: Set of characters composed of letters and numbers; may include punctuation marks or other symbols; excludes printer control characters such as Carriage Return and flow control characters such as XON and XOFF.

Annotations: The changes or additions made to a document using sticky notes, a highlighter, or other electronic tools. Document images or text can be highlighted in different colors, redacted (blacked-out or whited-out), stamped (e.g. “FAXED” or “CONFIDENTIAL”), or posted with electronic sticky notes without changing the original document.

Aperture Card: A card that holds a piece of microfilm, protecting the film and making it easier to load into a scanner or viewer.

ASCII: American Standard Code for Information Interchange. A character encoding for computer text built on a set of 128 alphanumeric, punctuation, and control characters. ASCII has been a standard, nonproprietary text format since 1963.

Bar Code: A small pattern of vertical lines which is read by a laser or an optical scanner and corresponds to a record in a database. An add-on component to imaging software, this feature is designed to increase the speed with which documents can be archived.

Batch Processing: The name of the technique used to input a large amount of information in a single step, as opposed to individual processes.

Bitmap/ Bitmapped: see Raster/ Bitmap

Bitonal: An image or file comprised of pixel or dot values of either black or white.

BMP: A native Windows format for storing images called “bitmaps.”

Boolean Logic: The use of the terms “AND,” “OR,” and “NOT” in conducting searches. Used to widen or narrow the scope of a search.
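
Example (Python): a minimal sketch of how the three Boolean operators widen or narrow a result set. The index contents and the docs() helper are hypothetical and are not taken from any particular imaging product.

```python
# Hypothetical index: each word maps to the set of document IDs containing it.
index = {
    "invoice": {1, 2, 5},
    "paid": {2, 3, 5},
    "overdue": {5, 7},
}

def docs(word):
    """Return the set of document IDs that contain the word."""
    return index.get(word, set())

narrowed = docs("invoice") & docs("paid")     # invoice AND paid
widened = docs("invoice") | docs("overdue")   # invoice OR overdue
excluded = docs("invoice") - docs("overdue")  # invoice NOT overdue

print(sorted(narrowed), sorted(widened), sorted(excluded))
```

AND returns fewer documents than either term alone, OR returns more, and NOT removes documents from an existing result.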

Briefcase: A method developed to simplify the transport of a group of documents from one computer to another.

Burn (CDs): To record or write data on a CD.

Caching (of Images): The temporary storage of image files on a hard disk for later migration to permanent storage, like an optical or CD jukebox.

CD Publishing: An alternative to photocopying large volumes of paper documents. This method involves coupling image and text documents with viewer software on CDs. Sometimes search software is included on the CDs to enhance search capabilities.

CD-R: Short for CD Recordable. These are CDs which can be written (or recorded) only once. They can be copied to distribute a large amount of data. CD-Rs can be read on any CD-ROM reader, whether it is on a standalone computer or a network system. This makes interchange between systems easier. CD-Rs are recognized as the preferred archival media for imaging systems in the 1990s.

CD-ROM: Compact Disc Read Only Memory. CD-ROMs are mass-produced rather than written on a standard computer CD burner (CD writer). They are an optical disk storage medium popular for storing computer files as well as digitally recorded music.

CD-ROM Drive: A drive on the computer that reads compact discs.

Client-Server Architecture vs. File-Sharing: Two common application software architectures found on computer networks. With file-sharing applications, all searches occur on the workstation, while the document database resides on the server. With client-server architecture, CPU intensive processes (such as searching and indexing) are completed on the server, while image viewing and OCR occur on the client. File-sharing applications are easier to develop, but they tend to generate tremendous network data traffic in document imaging applications. They also expose the database to corruption through workstation interruptions. Client-server applications are harder to develop, but dramatically reduce network data traffic and insulate the database from workstation interruptions.

COLD: Computer Output to Laser Disk. A computer programming process which outputs electronic records and printed reports to laser disk instead of a printer. Can be used to replace COM (Computer Output to Microfilm) or printed reports like green-bar.

COM: Computer Output to Microfilm. A process which outputs electronic records and computer generated reports to microfilm.

Compression Ratio: The ratio of the size of an uncompressed file to the size of the same file after compression, e.g. with a 20:1 compression ratio, an uncompressed file of 1MB is compressed to 50KB.
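
Example (Python): a small worked version of the 20:1 figure in this entry. It assumes 1MB means 1,000,000 bytes and 1KB means 1,000 bytes; the function names are illustrative only.

```python
def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Ratio expressed as N:1, i.e. uncompressed size divided by compressed size."""
    return uncompressed_bytes / compressed_bytes

def compressed_size(uncompressed_bytes, ratio):
    """Size after compression for a given N:1 ratio."""
    return uncompressed_bytes / ratio

one_megabyte = 1_000_000  # assumption: decimal megabyte
print(compressed_size(one_megabyte, 20))        # bytes after 20:1 compression
print(compression_ratio(one_megabyte, 50_000))  # ratio for a 50KB result
```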

CPU: Central Processing Unit. The “brain” of the computer.

Data rate: The speed of a data communications channel, measured in bits per second.

De-shading: Removing shaded areas to render images more easily recognizable by OCR. De-shading software typically searches for areas with a regular pattern of tiny dots.

De-skewing: The process of straightening skewed (tilted) images. De-skewing is one of the image enhancements that can improve OCR accuracy. Documents often become skewed when they are scanned or faxed.

De-speckling: Removing speckles from an image file. Speckles often develop when a document is scanned or faxed.
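
Example (Python): one simple way to de-speckle a bitonal image is to erase any black pixel that has no black neighbors. This is a sketch under that assumption only; commercial de-speckling filters usually also remove larger specks and are not described in this glossary. The image is assumed to be a list of rows, with 1 for black and 0 for white.

```python
def despeckle(image):
    """Return a copy of the image with isolated single black pixels removed."""
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Count black pixels among the (up to) eight surrounding pixels.
            black_neighbors = sum(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if black_neighbors == 0:
                cleaned[y][x] = 0
    return cleaned

page = [
    [1, 0, 0, 0],  # lone speck in the corner
    [0, 0, 0, 0],
    [0, 0, 1, 1],  # part of a real stroke, must survive
    [0, 0, 0, 0],
]
cleaned_page = despeckle(page)
```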

Dithering: The process of converting grays to different densities of black dots, usually for the purposes of printing or storing color or gray-scale images as black and white images.
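
Example (Python): a minimal sketch of ordered dithering, one of several dithering methods and chosen here only because it is short. It assumes an 8-bit gray image stored as a list of rows (0 is black, 255 is white) and uses a 2x2 threshold matrix; real software typically uses larger matrices or error diffusion.

```python
# 2x2 Bayer threshold matrix (assumption: the smallest common ordered-dither pattern).
BAYER_2X2 = [[0, 2],
             [3, 1]]

def ordered_dither(gray):
    """Convert 0-255 gray values to dots: 1 = black dot, 0 = white."""
    dots = []
    for y, row in enumerate(gray):
        dot_row = []
        for x, value in enumerate(row):
            # Spread the four thresholds evenly across the 0-255 range.
            threshold = (BAYER_2X2[y % 2][x % 2] + 0.5) * 256 / 4
            dot_row.append(1 if value < threshold else 0)
        dots.append(dot_row)
    return dots

mid_gray = ordered_dither([[128, 128], [128, 128]])
dark_gray = ordered_dither([[64, 64], [64, 64]])
```

A mid-gray block becomes two black dots out of four, and a darker gray becomes three out of four, which is the "different densities of black dots" the entry describes.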

Document Imaging: Software used to store, manage, and retrieve documents on the computer. When paper documents are stored with a document imaging system, they can be retrieved quickly, managed easily and distributed rapidly.

DPI: ‘Dots Per Inch’. A measurement of scanner resolution. The number of pixels a scanner can physically distinguish in each vertical and horizontal inch of an original image. Documents are normally scanned at a resolution of between 200 dpi and 400 dpi.

Drag-and-drop: The movement of on-screen objects by dragging them across the screen with the mouse.

Duplex: The ability of a scanner to scan both sides of a sheet simultaneously. Requires two scanner cameras and often two processing boards.

Duplex Scanners v. Double-Sided Scanning: Duplex scanners automatically scan both sides of a double-sided page, producing two images at once. Double-sided scanning uses a single-sided scanner to scan double-sided pages, scanning one collated stack of paper, then flipping it over and scanning the other side.

Electronic Document Management: Imaging software which helps manage electronic documents.

Erasable Optical Drive: A type of optical drive that uses erasable optical discs.

Firewall: A network security tool designed to prevent unauthorized users from gaining access to network resources.

Flatbed Scanner: A flat surface scanner which allows users to input books and other documents.

Folder Browser: A system of on-screen folders (usually hierarchical or “stacked”) used to organize documents. For example, the File Manager program in Microsoft Windows is a type of folder browser which displays the directories on your disk.

Forms Processing: A specialized imaging application designed for handling pre-printed forms. Forms processing systems often use high-end (or multiple) OCR engines and elaborate data validation routines to extract hand-written or poor quality print from forms and enter it into a database. This type of imaging application faces major challenges, since many of the documents scanned were never designed for imaging or OCR.

Full Text Indexing and Search: Enables the retrieval of documents by either their word or phrase content. Every word in the document is indexed into a master word list with pointers to the documents and pages where each occurrence of the word appears.
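
Example (Python): a minimal sketch of a master word list with pointers to documents and pages. The file names and page text are made up, and for brevity the sketch records one pointer per page rather than one per occurrence; phrase searching would additionally need word positions.

```python
import re
from collections import defaultdict

def build_index(documents):
    """documents: {name: [page_text, ...]} -> {word: [(name, page_number), ...]}"""
    index = defaultdict(list)
    for name, pages in documents.items():
        for page_number, text in enumerate(pages, start=1):
            # Lowercase and split into words; index each word once per page.
            words = set(re.findall(r"[a-z0-9]+", text.lower()))
            for word in sorted(words):
                index[word].append((name, page_number))
    return index

documents = {
    "lease.tif": ["Lease agreement for the warehouse", "Signed by the tenant"],
    "memo.tif": ["Memo about the warehouse roof"],
}
index = build_index(documents)
print(index["warehouse"])
print(index["tenant"])
```

A search then becomes a dictionary lookup instead of a scan through every stored page.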

Fuzzy Logic: A search procedure that looks for exact matches as well as similarities to the search criteria, in order to compensate for spelling errors that may occur in full-text searches.
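
Example (Python): a sketch of similarity-based matching using the standard library's difflib module. This is a stand-in technique chosen for illustration, not necessarily the algorithm any imaging product uses; the word list and the 0.6 cutoff are assumptions.

```python
import difflib

# Hypothetical master word list from a full-text index.
indexed_words = ["receive", "receipt", "recital", "invoice"]

def fuzzy_lookup(query, words, cutoff=0.6):
    """Return indexed words that match the query exactly or closely, best match first."""
    return difflib.get_close_matches(query, words, n=3, cutoff=cutoff)

misspelled = fuzzy_lookup("recieve", indexed_words)  # transposed letters
exact = fuzzy_lookup("invoice", indexed_words)
print(misspelled, exact)
```

The misspelled query still finds "receive" as its best match, which is the compensation for spelling errors that the entry describes.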

GIF: CompuServe’s native file format for storing images.

Gigabyte: One billion bytes. Also expressed as one thousand megabytes. In terms of image storage capacity, one gigabyte holds approximately 17,000 8.5″ x 11″ pages scanned at 300 dpi and stored as TIFF Group IV images.
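
Example (Python): a rough check of the pages-per-gigabyte figure. It assumes bitonal pages (1 bit per pixel) and the typical 20:1 Group IV ratio quoted elsewhere in this glossary; actual ratios vary with page content, so the result is an order-of-magnitude estimate rather than an exact capacity.

```python
def bitonal_page_bytes(width_in, height_in, dpi):
    """Uncompressed size of a 1-bit-per-pixel page, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels / 8

raw_bytes = bitonal_page_bytes(8.5, 11, 300)  # 2550 x 3300 pixels
compressed_bytes = raw_bytes / 20             # assumption: 20:1 compression
pages_per_gigabyte = 1_000_000_000 / compressed_bytes

print(raw_bytes, compressed_bytes, int(pages_per_gigabyte))
```

Under these assumptions a page is roughly 53KB and a gigabyte holds about 19,000 pages, in line with the approximate figure in the entry once slightly less compressible pages are allowed for.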

Greyscale: An image type that uses black, white, and a range of shades of gray. The number of shades of gray depends on the number of bits per pixel. The larger the number of shades of gray, the better the image will look, and the larger the file will be.
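
Example (Python): the relationship between bits per pixel, number of shades, and uncompressed file size, worked for a letter-size page at 300 dpi (2550 x 3300 pixels). The function names are illustrative, and the sizes ignore file headers and compression.

```python
def shades_of_gray(bits_per_pixel):
    """Each extra bit doubles the number of values a pixel can take."""
    return 2 ** bits_per_pixel

def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Raw image size in bytes, before any compression."""
    return width_px * height_px * bits_per_pixel // 8

for bits in (1, 4, 8):
    print(bits, shades_of_gray(bits), uncompressed_bytes(2550, 3300, bits))
```

Going from 1 bit (bitonal) to 8 bits per pixel raises the number of shades from 2 to 256 and makes the raw file eight times larger.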

Hierarchical Storage Management (HSM): Software that automatically migrates files from on-line to near-line storage media, usually on the basis of the age or frequency of use of the files.

Host: Computer in which an application or database resides.

Hot Spare: A drive (or drives) in a RAID storage system that automatically takes over for a failed or non-functioning drive without any operator intervention.

Hz: Abbreviation for hertz; cycles per second. Often used with metric prefixes, as in kilohertz (kHz).

ICR: Intelligent Character Recognition. A software process that recognizes handwritten and printed text as alphanumeric characters.

Image compression boards: Imaging-dedicated processors that relieve the CPU (Central Processing Unit, the computer’s main chip) of many imaging-specific tasks such as compression, decompression, display, zooming, shrinking, and scale-to-grey. They often perform these tasks better than the CPU can.

Image Enabling: A software function that creates links between existing applications and stored images.

Image Processing: Think of “data processing”: it refers to the manipulation of raw data to solve some problem or enlighten the user in some way not possible without manipulation.

Image Processing Card (IPC): A board mounted in the computer, scanner, or printer that facilitates the acquisition and displaying of images. The primary function of most IPCs is the rapid compression and decompression of image files.

Index Fields: Database fields used to categorize and organize documents. Often user-defined, these fields can be used for searches.

Interface: 1. A mechanical or electrical link connecting two or more pieces of equipment together. 2. A point of demarcation between two devices where the electrical signals, connectors, timing and handshaking are defined.

ISO-9660: A file system format standard developed for CD-ROMs using the CD-XA encoding standard. It is supported by Microsoft operating systems, UNIX, and Macintosh.

Internet Publishing: Specialized imaging software that allows large volumes of paper documents to be published on the Internet or intranet and can be made available to the public for searching, viewing, and printing.

IPS: Inches per second. A scanner transport measurement of speed.

IPX/SPX: Communications protocol used by Novell networks.

ISIS and TWAIN Scanner Drivers: Specialized software used for communication between scanners and computers. TWAIN drivers were developed primarily for photo image editing and desktop publishing. They handle color and gray-scale images well, but were not designed for high-speed scanning. ISIS drivers were developed primarily for high-speed document imaging. They were designed for the rapid scanning of black and white images through an ADF. In recent years, the difference has narrowed: ISIS drivers now include gray-scale and color support, while TWAIN drivers now support ADFs.

ISO 9660 CD Format: The International Organization for Standardization (ISO) format for creating CD-ROMs that can be read worldwide.

JPEG: An image compression format used for storing color photographs and images.

Jukebox: A mass storage device that holds optical disks and loads them into a drive.

Key Field: Database fields used for document searches and retrieval. Synonymous with “index field.”

Magneto-Optical Drive: A drive that combines laser and magnetic technology to create high-capacity erasable storage.

MAPI: Messaging Application Programming Interface. The Windows software standard that has become a popular e-mail interface (used by MS Exchange, GroupWise, and several other e-mail packages).

Mean Time Between Failures (MTBF): A statistical measure of reliability, calculated to indicate the anticipated average time between failures of a device. The longer, the better.

Near-Line: Documents stored on optical disks or compact disks that are housed in the jukebox or CD changer and can be retrieved without human intervention.

NetWare Loadable Module (NLM): An application that runs as part of the network operating system (NOS) of a Novell NetWare server.

NT: New Technology. Refers to Microsoft Windows NT server and workstation software.

OCR: Optical Character Recognition. A software process that recognizes printed text as alphanumeric characters.

Off-Line: Archival documents stored on optical disks or compact disks that are not connected or installed in the computer, but require human intervention to be accessed as needed.

On-Line: Documents stored on the hard drive or magnetic disk of a computer which are available immediately.

Optical Disks: Computer media similar to a Compact Disk that cannot be rewritten. An optical drive uses a laser to read the stored data.

Optical Jukebox: see “Jukebox.”

PDF: Adobe’s Portable Document Format. The term Adobe uses to describe Acrobat files.

Phase Change: A method of storing information on rewritable optical disks.

Pixel: Picture Element. The basic building block of all images: a simple, single dot in an image. In bitonal images, it is merely a black or white dot (see “Bitonal” definition above). In grey scale images, each dot has one of 256 possible values of grey (for an 8-bit grey scale image).

Portable Volumes: A feature which facilitates the moving of large volumes of documents without having to copy multiple files. Portable volumes enable individual CDs to be easily regrouped, detached and reattached to different databases for a broader information exchange.

Portrait Orientation: An image registered so that it is taller than it is wide, with the narrow edge running along top and bottom. When scanning, orientation is determined by the leading edge of the document.

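PPM: Pages per minute. A measurement of the throughput speed of a scanner: how many letter-size pages the scanner can scan in one minute. Beware: ppm ratings can be misleading, because they depend on the resolution, page orientation, and image type used for the measurement.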

Queue: A queue is a series of folders that contain image files of scanned paper documents and a database of lookup values to facilitate searching, retrieving and viewing of the image files.

RAID: Redundant Array of Inexpensive Disks. A collection of hard disks that act as a single unit. Files on RAID drives can be duplicated (“mirrored”) to preserve data. RAID levels vary in redundancy: level 0 spreads (“stripes”) data across disks with no redundancy, level 1 uses two disks that mirror each other, and so on up to level 5, the most common.

RAID 5: A RAID implementation that stores parity information, distributed across the drives in the array, alongside the data. This allows data to be rebuilt onto a hot spare drive in the event of a hard drive failure within the RAID system.
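
Example (Python): a minimal sketch of how parity lets a RAID system rebuild a lost drive. Parity here is the byte-wise XOR of the data blocks; the block contents are made up, and a real controller also rotates which drive holds the parity, which this sketch omits.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Three data blocks, one per drive, plus a parity block on a fourth drive.
data = [b"SCAN", b"0001", b"TIFF"]
parity = xor_blocks(data)

# Suppose the second drive fails: XOR the survivors with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)
```

Because XOR-ing a value twice cancels it out, combining the surviving blocks with the parity block leaves exactly the missing block.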

Raster/ Bitmap: Raster or Bitmap Drawing. A method of representing an image with a grid (or “map”) of dots or pixels. Typical raster file formats are GIF, JPEG, TIFF, PCX, BMP, etc.

Resolution: Indicates the number of dots, often measured in dpi, that make up an image on a screen or printer. The larger the number of dots, and thus the higher the resolution, the finer and smoother images can appear when displayed at a given size. Low resolution causes jagged characters. The ideal resolution is a trade-off between quality and the overhead in storage power and processing strength required to use it.

Region (of an image): An area of an image file that is selected for specialized processing. Also called a “zone”.

Scale-To-Gray: An option to display a black and white image file in an enhanced mode, making it easier to view. A scale-to-gray display uses gray shading to fill in gaps or jumps (known as aliasing) that occur when displaying an image file on a computer screen (also known as gray-scale).

Scalability: The capacity of a system to expand without requiring major reconfiguration or re-entry of data. Multiple servers or additional storage can be easily added.

Scanner: An input device commonly used to convert paper documents into computer images. Scanner devices are also available to scan microfilm and microfiche.

SCSI: Small Computer System Interface. Pronounced “skuzzy.” A standard for attaching peripherals (notably mass storage devices and scanners) to computers. SCSI allows for up to 7 devices to be attached in a chain via cables. The current SCSI standard is “SCSI-2,” also known as “Fast SCSI.”

SCSI Scanner Interface: The device used to connect a scanner with a computer.

Simplex: A document scanner that scans only one side of each sheet.

Skew: During printing or scanning, the contents of a page are almost never exactly vertical, which is referred to as being skewed. De-skewing is a process where the computer detects and corrects the skew in an image file.

Snapshot: An add-on feature of imaging software that allows electronic documents to be archived internally, within the computer system. Electronic documents are internally “printed” into the database, thereby eliminating the need for any physical paper printing or scanning.

SQL: Structured Query Language. The popular standard for running database searches (queries) and reports.

SSL: Acronym for Secure Sockets Layer, a protocol designed to provide privacy between a web client and a web server. The protocol begins with a handshake phase that negotiates an encryption algorithm and keys and authenticates the server to the client. Once the handshake is complete and transmission of application data begins, all data is encrypted using the session keys negotiated during the handshake.

TCP/IP: Network communications protocol. The Internet uses this protocol.

Templates, Document: Sets of index fields for documents.

Throughput: The actual amount of useful and non-redundant information which is transmitted or processed. The relationship of what went in one end and what came out the other is a measure of the efficiency of that communications link – a function of cleanliness, speed, etc.

Thumbnails: Small versions of an image used for quick overviews or to get a general idea of what an image looks like.

TIFF: Tagged Image File Format. A non-proprietary raster graphics image format that supports many different compression formats. TIFF has been in use since 1986.

TIFF Group III (compression): A one-dimensional compression format for storing black and white images that is utilized by most fax machines.
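
Example (Python): one-dimensional compression works on a single scan line at a time by describing it as alternating runs of white and black pixels. The sketch below shows only that run-length step, with 0 for white and 1 for black; the actual Group III format then replaces each run length with a code from standardized tables, which are not reproduced here.

```python
def run_lengths(scanline):
    """Return alternating run lengths for one scan line, starting with a white run.

    If the line starts with a black pixel, the first (white) run has length 0.
    """
    runs = []
    current_color = 0  # runs alternate, beginning with white
    count = 0
    for pixel in scanline:
        if pixel == current_color:
            count += 1
        else:
            runs.append(count)
            current_color = pixel
            count = 1
    runs.append(count)
    return runs

line = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
print(run_lengths(line))
```

Typical business documents are mostly long white runs, so a few run lengths can stand in for thousands of pixels.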

TIFF Group IV (compression): A two-dimensional compression format for storing black and white images. Typically compresses at a 20 to 1 ratio for standard business documents.

Transport Speed: The speed at which the mechanical transport runs, measured in inches/centimeters per second (ips/cps).

TWAIN: An industry standard software interface that allows applications to communicate with and control document scanners.

Video Scanner Interface Board: An add-in board residing in the host computer which enables communication and control of the scanner device. The board provides device control and file or data compression. Also known as an accelerator or compression board. Scanners with this interface require a scanner control board designed by Kofax, Xionics, or Dunord.

Work Flow, Ad Hoc: A simple manual process by which documents can be moved around a multi-user imaging system on an “as-needed” basis.

Work Flow (Rule-Based): A programmed series of automated steps that route documents to various users on a multi-user imaging system.

WORM Disks: Write Once Read Many Disks. A popular archival storage medium of the 1980s. Acknowledged as the first optical disks, they are primarily used to store archives of data which cannot be altered. Unlike CD-Rs, WORM disks are created by standalone PCs and cannot be used on the network.

ZIP: A common file compression format which allows for quick and easy storage for transport.

Zone OCR: An add-on feature of the imaging software which populates document templates by reading certain regions or zones of a document and placing the text into a document index field.