Concept of Computer Files

TOPIC: Concept of Computer Files

CLASS: SSS Two

Definition of Computer File

A computer file is a fundamental unit for storing data on a computer.
It is essentially a block of arbitrary information, or a resource, that a computer program can access. It is typically stored on a durable storage device such as a hard drive, SSD, or USB drive.
To the operating system, it is the smallest unit of stored data that can be named and managed independently (created, copied, renamed, or deleted).
It is a collection of related information stored under a single name and accessible to computer programs.

File Contents

At the most basic level, all information in a file is binary, meaning it's just a series of ones and zeros (bits). However, a computer file can contain various types of information, broadly categorized as:

Document Files: These include any file created by a user, typically with an application program. Examples are text documents (.docx, .txt), spreadsheets (.xlsx), presentations (.pptx), images (.jpg, .png), audio (.mp3), and video (.mp4).
Program Files: These files contain executable instructions for the computer's microprocessor. They allow software applications and the operating system to run. Examples include .exe files in Windows or .app bundles in macOS.
Data Files: This category includes all other files that aren't programs or user-created documents, but instead store raw data that programs use. Examples might include database files, configuration files, or temporary files.
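As a small illustration of the point that every file is ultimately a series of bits, the sketch below encodes a short piece of text and prints the bits behind each character. The sample text "Hi" and the variable names are chosen only for this example.

```python
# Show the bits behind a short piece of text.
text = "Hi"                      # hypothetical sample content
data = text.encode("ascii")      # two bytes, one per character

# Format each byte as eight binary digits.
bits = " ".join(f"{byte:08b}" for byte in data)
print(bits)  # 01001000 01101001
```

A document, a program, and a data file all look like this at the lowest level; only the way a program interprets the bits differs.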

File Functions

Computer files are used to perform one or more of the following essential functions:

They provide machine-executable code, allowing programs and operating systems to run.
They store application programs or operating system configuration settings, ensuring software functions correctly.
They store data used by users, such as Microsoft Word documents, pictures, music, and videos.

Computer File Terms (Data Hierarchy)

To understand how data is organized within files, it's helpful to know these terms, which represent a hierarchy from smallest to largest meaningful units:

Data Item: This is the smallest unit of data that has meaning on its own. It is the actual value stored in a field (for example "John" or 25) and is made up of one or more characters (such as 'A', '1', or '$'), each typically stored as a byte.
Field: A field is a single piece of information about an object or entity. It's a space that holds a specific type of data. Examples of fields are NAME, ADDRESS, QUANTITY, AGE, etc. A data element is the logical definition of a field.
Record: A record is a collection of related fields that together describe a single entity. For instance, a student record might include fields like Name, Age, Class, and ID Number.
File: A file is a collection of related records, treated as a single unit. For example, all student records combined would form a "Student File."
Database: A database is a collection of related files, organized for efficient storage and retrieval.
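The hierarchy above can be sketched in a few lines of Python. This is only an illustration: the class name StudentRecord, the sample students, and the dictionary standing in for a database are assumptions made for the example, not part of any real system.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    """A record: related fields that together describe one student."""
    name: str           # field
    age: int            # field
    student_class: str  # field
    id_number: str      # field


# A file: a collection of related records treated as one unit.
student_file = [
    StudentRecord("Ada", 16, "SSS 2", "S001"),
    StudentRecord("Bola", 17, "SSS 2", "S002"),
]

# A database: a collection of related files (the teacher file is left empty here).
school_database = {"students": student_file, "teachers": []}

# A data item: the actual value stored in one field of one record.
first = student_file[0]
print(first.name)  # Ada
```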

Types of Data Items

Each data element (or field) typically consists of a single item, which will fall into one of three basic types:

Numeric Data: Data consisting solely of digits (0-9). This type of data is used for calculations. Examples: Age (25), Quantity (150).
Alphabetic Data: Data consisting only of letters of the alphabet (A-Z, a-z). Examples: Name ("John"), City ("Lagos").
Alphanumeric Data: Data consisting of a combination of digits, letters, and/or special characters (e.g., #, @, $, %). Examples: Address ("123 Main St."), Phone Number ("+2348012345678").
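The three types can be told apart with a simple check. The function below is a minimal sketch using the definitions given above; the name classify_data_item is hypothetical, and treating every other combination of characters as alphanumeric is an assumption of this example.

```python
def classify_data_item(value: str) -> str:
    """Classify a data item as numeric, alphabetic, or alphanumeric."""
    if not value:
        raise ValueError("a data item must contain at least one character")
    if value.isascii() and value.isdigit():
        return "numeric"        # digits 0-9 only
    if value.isascii() and value.isalpha():
        return "alphabetic"     # letters A-Z, a-z only
    return "alphanumeric"       # any mix of digits, letters, special characters


print(classify_data_item("25"))            # numeric
print(classify_data_item("Lagos"))         # alphabetic
print(classify_data_item("123 Main St."))  # alphanumeric
```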

File Organization

The term "file organization" refers to how data records are physically stored in a file on a storage device, and, consequently, the method by which they can be accessed. Different organizations are chosen based on how the data will be used.

File Organization Terms

Here are some terms related to file organization:

Block: A block is the physical unit of data transfer between a storage device (like a hard drive) and the computer's main memory. Data is read and written in blocks, not individual bytes.
Bucket: In the context of file organization, a bucket is a storage area (often one or more blocks) used to hold records. It's particularly relevant in hashing schemes where multiple records might "hash" to the same address, and they are then placed into a bucket.
Hit: A hit refers to a successful access or retrieval of a desired record or item of data from a storage location. For example, if you search a file for a record with a particular key and that record is found, that is a "hit."
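To make the idea of a block concrete, the sketch below reads some stored bytes one block at a time. The 4-byte block size is deliberately tiny and purely illustrative (real devices use much larger blocks), and an in-memory io.BytesIO object stands in for the storage device.

```python
import io

BLOCK_SIZE = 4  # hypothetical size, kept tiny so the output is easy to read


def read_blocks(stream, block_size=BLOCK_SIZE):
    """Yield the contents of the stream one block at a time."""
    while True:
        block = stream.read(block_size)
        if not block:      # an empty result means the end of the data
            break
        yield block


storage = io.BytesIO(b"ABCDEFGHIJ")   # stands in for a storage device
blocks = list(read_blocks(storage))
print(blocks)  # [b'ABCD', b'EFGH', b'IJ']
```

Ten bytes therefore take three transfers, and the last block is only partly filled.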

Types of File Organization

There are four common types of file organization, each suited to a different access pattern:

Serial File Organization:
Serial files store records in chronological order (the order they are received). Each new record is simply added to the next available storage position. This method is generally used on serial media such as magnetic tape, where data must be accessed from the beginning to the end.
Sequential File Organization:
Sequential files are files whose records are sorted and stored in ascending or descending order based on a particular key field (e.g., student ID, product name). To access a record, you often have to read through the file from the beginning until you find the desired record.
Indexed Sequential File Organization:
Indexed Sequential file organization is logically similar to sequential organization, but it includes an index. An index is like an alphabetical list of names or subjects with references to the specific storage blocks where the corresponding records are located. This index allows for faster direct access to individual records by first looking up their location in the index, then going directly to that block, without having to read the entire file from the start.
Random (Direct) File Organization:
A randomly organized file contains records arranged physically without regard to the sequence of the primary key. Records are loaded onto the disk by establishing a direct relationship between the key of the record and its physical address in the file. This relationship is typically achieved using a formula (or hashing algorithm) that converts the primary key into a disk address. The same formula is applied when a record is needed, which gives very fast retrieval of individual records.
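The four organizations can be compared side by side in a short sketch. Everything here is simplified and assumed for illustration: the records are (key, name) pairs held in Python lists, the index points to list positions rather than disk blocks, and the hashing formula is simply the key modulo a small number of buckets.

```python
# Records as (key, name), listed in the order they arrive.
records = [(30, "Chidi"), (10, "Ada"), (20, "Bola")]

# Serial: records are kept in arrival order; new ones go on the end.
serial_file = list(records)

# Sequential: records are stored sorted on the key field.
sequential_file = sorted(records)

# Indexed sequential: the sorted records plus an index from key to position.
index = {key: pos for pos, (key, _name) in enumerate(sequential_file)}


def indexed_lookup(key):
    """Look up the position in the index, then go straight to the record."""
    return sequential_file[index[key]]


# Random (direct): a formula converts the key into a bucket address.
NUM_BUCKETS = 4  # hypothetical number of buckets


def address(key):
    return key % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]
for record in records:
    buckets[address(record[0])].append(record)


def direct_lookup(key):
    """Search only the bucket that the key hashes to."""
    for record in buckets[address(key)]:
        if record[0] == key:
            return record
    return None


print(serial_file[0])      # (30, 'Chidi')
print(sequential_file[0])  # (10, 'Ada')
print(indexed_lookup(20))  # (20, 'Bola')
print(direct_lookup(10))   # (10, 'Ada')
```

Keys 30 and 10 both give address 2, so the two records share a bucket; this is the situation in which a bucket holds more than one record.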

Methods for Accessing Files

Files can be accessed using different methods, which often correspond to their organization:

Serial Access: Accessing records one after another, from the beginning of the file. (e.g., tape drives)
Sequential Access: Accessing records in a specific, sorted order. (e.g., reading a list of names alphabetically)
Random (Direct) Access: Accessing any record directly without having to read through preceding records. (e.g., directly jumping to a specific customer record)
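The difference between reading from the beginning and jumping straight to a record can be sketched with fixed-length records. The record layout (a 4-byte ID followed by a 10-byte name), the customer data, and the function names are all assumptions made for this example; io.BytesIO stands in for a file on disk.

```python
import io
import struct

# Each record: a 4-byte unsigned ID followed by a 10-byte name (14 bytes in total).
RECORD = struct.Struct("<I10s")

customers = [(101, b"Ada"), (102, b"Bola"), (103, b"Chidi")]
storage = io.BytesIO()
for customer_id, name in customers:
    storage.write(RECORD.pack(customer_id, name))  # short names are padded with null bytes


def decode(chunk):
    customer_id, name = RECORD.unpack(chunk)
    return customer_id, name.rstrip(b"\x00").decode("ascii")


def read_serially(stream):
    """Serial access: start at the beginning and read every record in turn."""
    stream.seek(0)
    result = []
    while True:
        chunk = stream.read(RECORD.size)
        if not chunk:
            break
        result.append(decode(chunk))
    return result


def read_directly(stream, position):
    """Direct access: compute the record's offset and jump straight to it."""
    stream.seek(position * RECORD.size)
    return decode(stream.read(RECORD.size))


print(read_serially(storage))     # [(101, 'Ada'), (102, 'Bola'), (103, 'Chidi')]
print(read_directly(storage, 2))  # (103, 'Chidi')
```

Direct access works here because every record has the same length, so the position of any record can be calculated without reading the ones before it.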

File Classification

Files can be classified into different types based on their purpose and how they are used in data processing systems:

a. Transaction Files

Transaction files contain details of all activities or changes that have occurred over a specific period. These files are typically temporary and are used to update master files. Examples include customer orders, daily sales records, or new data entries into a database.

Features of Transaction Files:

The data stored in these files is temporary by nature.
Any data to be modified or processed in a system is often first recorded in a transaction file.

b. Master Files

Master files are permanent files that hold stable, core data about entities within a system (e.g., customers, products, students). They are kept up-to-date by applying the changes recorded in transaction files.

Features of Master Files:

The data stored in these files is permanent by nature, representing the current state of information.
These files are updated by applying the transactions recorded in transaction files, rather than being edited directly during day-to-day operations.
Master files typically store a large amount of crucial data.

c. Reference Files

Reference files are a specific type of master file containing referential data. They hold data that is necessary to support data processing but doesn't change frequently. This data is used for lookups or validation.

Features of Reference Files:

They are stable and permanent, rarely changing.
They contain data primarily used for reference, validation, or lookup purposes.
Examples: price lists, dictionary files, tax tables, inventory codes, lookup tables.
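The way the three kinds of file work together can be sketched as follows. The item codes, prices, stock levels, and the function name apply_transactions are all made up for this example; dictionaries and a list stand in for the actual files.

```python
# Reference file: stable lookup data (a hypothetical price list).
price_list = {"PEN": 50, "BOOK": 200}

# Master file: the current stock level for each item.
master_stock = {"PEN": 100, "BOOK": 40}

# Transaction file: the day's sales as (item code, quantity sold).
transactions = [("PEN", 10), ("BOOK", 5), ("PEN", 20), ("RULER", 1)]


def apply_transactions(master, txns, prices):
    """Return the updated master, the total sales value, and rejected transactions."""
    updated = dict(master)   # work on a copy so the old master is kept
    rejected = []
    sales_value = 0
    for code, quantity in txns:
        # Validate each transaction against the reference and master files.
        if code not in prices or code not in updated:
            rejected.append((code, quantity))
            continue
        updated[code] -= quantity
        sales_value += prices[code] * quantity
    return updated, sales_value, rejected


new_master, sales_value, rejected = apply_transactions(
    master_stock, transactions, price_list
)
print(new_master)   # {'PEN': 70, 'BOOK': 35}
print(sales_value)  # 2500
print(rejected)     # [('RULER', 1)]
```

Once the run is complete, the transaction data has served its purpose, while the updated master carries the current state forward and the price list is left unchanged.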

Criteria for File Classification

Files are typically classified based on several criteria to determine their role and appropriate handling in a system:

How the file is to be used: Its primary purpose (e.g., storing user data, running programs, configuring systems).
How many records are processed each time the file is updated: This influences the choice of file organization and access method.
Whether individual records need to be quickly accessible: Determines if direct or indexed access is required.
Nature of content: What kind of data it holds (e.g., text, images, executable code).
Organization method: How the data within the file is structured (e.g., sequential, random, indexed).
Storage medium: The type of storage device where the file resides (e.g., hard disk, SSD, cloud).
