Encoding Schemes

July 17, 2025

4.1 Introduction to Encoding Schemes

Computers operate using binary digits (bits – 0s and 1s). However, humans communicate using characters, symbols, and languages. To bridge this gap, encoding schemes are used to represent these human-readable characters as binary numbers that computers can understand and process. An encoding scheme defines a unique numeric code for each character.
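
The mapping from character to numeric code to bits can be seen directly in Python. This is a minimal sketch; the helper name to_binary is chosen here for illustration and is not part of the original note.

```python
def to_binary(ch: str, width: int = 8) -> str:
    """Return the character's numeric code as a zero-padded binary string."""
    return format(ord(ch), f"0{width}b")

# ord() gives the numeric code of a character
for ch in "Hi!":
    print(ch, ord(ch), to_binary(ch))

# chr() reverses the mapping: numeric code -> character
print(chr(72))
```

Each printed line shows the character, its numeric code, and the 8-bit pattern that would be stored.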

4.2 ASCII (American Standard Code for Information Interchange)

  • Purpose: ASCII was one of the earliest and most widely used encoding standards, primarily designed for the English language.
  • Structure:
    • Standard ASCII: Uses 7 bits to represent 128 characters (codes 0-127). This includes:
      • Uppercase English letters (A-Z)
      • Lowercase English letters (a-z)
      • Digits (0-9)
      • Punctuation symbols (!, @, #, etc.)
      • Control characters (like newline, tab, backspace).
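    • Extended ASCII: Uses 8 bits (1 byte) to represent 256 characters (codes 0-255). The first 128 characters are the same as standard ASCII, while the additional 128 are assigned by a particular code page and were typically used for graphical symbols and letters with accents (e.g., é, ñ). "Extended ASCII" is therefore a family of 8-bit code pages rather than one single standard.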
  • Limitations:
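    • Limited character set: With only 128 codes (256 in the extended variants), it cannot represent the scripts and symbols of most languages other than English.
    • Inconsistent upper range: Different extended ASCII code pages assign different characters to codes 128-255, so the same byte can display differently from one system to another.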
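
Both the 7-bit limit and the ambiguity of the upper 128 codes can be checked with Python's built-in codecs. This is a small sketch: the helper is_ascii_encodable is an illustrative name, and the two code pages used ("latin-1" and "cp437") are just two examples picked here, not ones named in the note.

```python
def is_ascii_encodable(text: str) -> bool:
    """Return True if every character fits in 7-bit ASCII (codes 0-127)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(is_ascii_encodable("Hello"))       # plain English letters
print(is_ascii_encodable("caf\u00e9"))   # contains é, which has no ASCII code

# One byte above 127, decoded under two different 8-bit code pages
byte = bytes([0xE9])
as_latin1 = byte.decode("latin-1")  # ISO 8859-1
as_cp437 = byte.decode("cp437")     # original IBM PC code page
print(repr(as_latin1), repr(as_cp437))
```

The last line prints two different characters for the same byte value, which is why text in an 8-bit encoding is only readable when the code page is known.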

4.3 ISCII (Indian Standard Code for Information Interchange)

  • Purpose: Developed in India to address the need for representing characters from various Indian scripts (Devanagari, Bengali, Tamil, Telugu, etc.).
  • Structure: ISCII is an 8-bit encoding standard. The first 128 codes are the same as ASCII. The remaining codes are used to represent characters from Indian scripts.
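  • Key Feature: ISCII uses one common code layout for the Brahmi-derived Indian scripts, so corresponding letters have the same code in each script. This makes transliteration, the conversion of text from one Indian script to another, straightforward, which is useful for multilingual applications and data exchange.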
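
The idea behind transliteration can be sketched in Python, with one important caveat: the standard codec list does not appear to include ISCII, so this example works on Unicode instead. It relies on the fact that the Unicode blocks for the major Indian scripts were laid out following ISCII, so corresponding letters sit at the same position in each block. The function name and the choice of Devanagari to Bengali are assumptions made here for illustration, and not every position has a counterpart in both scripts.

```python
DEVANAGARI_START = 0x0900
DEVANAGARI_END = 0x097F
BENGALI_OFFSET = 0x0980 - 0x0900  # distance between the two Unicode blocks

def devanagari_to_bengali(text: str) -> str:
    """Shift each Devanagari code point to the same position in the Bengali block."""
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_START <= cp <= DEVANAGARI_END:
            out.append(chr(cp + BENGALI_OFFSET))
        else:
            out.append(ch)  # characters outside the Devanagari block are left unchanged
    return "".join(out)

# "\u0915\u092e\u0932" is the Devanagari word कमल
print(devanagari_to_bengali("\u0915\u092e\u0932"))
```

A real transliterator needs per-script exception tables; the fixed offset only demonstrates why a shared layout makes the conversion simple.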

4.4 Unicode

  • Purpose: Unicode is a modern, universal character encoding standard designed to represent every character from every language in the world. It aims to unify all other encoding schemes into one.
  • Concept: Unicode assigns a unique number, called a code point, to every character. For example, the code point for the letter 'A' is U+0041, and for the Devanagari letter 'क' is U+0915.
  • Advantages:
    • Universal character support: Can represent characters from all languages.
    • Solves compatibility issues: Eliminates the need for separate, mutually incompatible encoding schemes for different languages.
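
The two code points quoted above can be verified with Python's built-in ord() and the standard unicodedata module. The helper name code_point is illustrative only.

```python
import unicodedata

def code_point(ch: str) -> str:
    """Format a character's code point in the U+XXXX notation."""
    return f"U+{ord(ch):04X}"

# "\u0915" is the Devanagari letter क
for ch in ["A", "\u0915"]:
    print(ch, code_point(ch), unicodedata.name(ch))
```

The output lists each character with its code point and its official Unicode name.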

Unicode Encoding Formats

The Unicode standard itself doesn't define how to store the code points in memory; that job is done by different Unicode encodings.

a) UTF-8 (Unicode Transformation Format - 8-bit)

  • Mechanism: UTF-8 is a variable-width encoding. This means it uses a variable number of bytes to represent each character.
    • For characters that are also in ASCII (like 'A', 'b', '7'), it uses only 1 byte. This makes it fully backward compatible with ASCII.
    • For other characters (like 'é', '€', 'क', '😂'), it uses 2, 3, or 4 bytes as needed.
  • Advantages:
    • Space-efficient: For text that contains only ASCII characters, a UTF-8 file is exactly the same size as an ASCII file; for text that is primarily English, it is only slightly larger.
    • Compatibility: Its backward compatibility with ASCII is a huge advantage.
    • Widely Used: It is the dominant character encoding for the World Wide Web.
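
The variable width is easy to observe by encoding the example characters from the list above. This is a small sketch; the labels in the samples dictionary are chosen here for readability, and the characters are written as escapes so the exact code points are unambiguous.

```python
samples = {
    "A": "A",
    "e-acute": "\u00e9",        # é
    "euro sign": "\u20ac",      # €
    "Devanagari KA": "\u0915",  # क
    "emoji": "\U0001F602",      # 😂
}

utf8_lengths = {}
for label, ch in samples.items():
    encoded = ch.encode("utf-8")
    utf8_lengths[label] = len(encoded)
    print(f"{label:14} U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} bytes)")

# ASCII-only text produces identical bytes under both encodings
same_as_ascii = "Hello".encode("utf-8") == "Hello".encode("ascii")
print(same_as_ascii)
```

The final check illustrates the backward compatibility: any valid ASCII file is already a valid UTF-8 file.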

b) UTF-16 (Unicode Transformation Format - 16-bit)

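  • Mechanism: UTF-16 is also a variable-width encoding: it uses 2 or 4 bytes per character. Characters in the Basic Multilingual Plane (BMP, code points U+0000 to U+FFFF), which covers most commonly used characters, take 2 bytes. All other characters, such as most emoji, take 4 bytes, stored as two 2-byte units called a surrogate pair.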
  • Advantages:
    • Efficient for languages with a large number of characters that fall within the Basic Multilingual Plane (BMP).
  • Disadvantages:
    • Less space-efficient for English text than UTF-8: every ASCII character takes 2 bytes instead of 1.
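
How UTF-16 chooses between 2 and 4 bytes can be sketched as follows. Characters up to U+FFFF are stored as one 16-bit unit; anything above is split into two 16-bit units, known as a surrogate pair. The function name utf16_units is illustrative, and the sketch assumes its input is a single valid character.

```python
def utf16_units(ch: str) -> list:
    """Return the 16-bit code units UTF-16 uses for one character."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return [cp]                   # BMP character: one 2-byte unit
    offset = cp - 0x10000             # 20-bit value to split across two units
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]

for ch in ["A", "\u0915", "\U0001F602"]:
    units = utf16_units(ch)
    print(f"U+{ord(ch):04X}", [f"{u:04X}" for u in units], f"{2 * len(units)} bytes")

# Cross-check against Python's own big-endian UTF-16 encoder
print("\U0001F602".encode("utf-16-be").hex())
```

The cross-check at the end should show the same two units as the hand-computed pair for the emoji.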

c) UTF-32 (Unicode Transformation Format - 32-bit)

  • Mechanism: UTF-32 is a fixed-width encoding. It uses exactly 4 bytes (32 bits) to store every single character, regardless of what that character is.
  • Advantages:
    • Simplicity: Since every character occupies exactly 4 bytes, the Nth character in a string always starts at byte offset 4 × (N − 1), so it can be located directly without scanning the text.
  • Disadvantages:
    • Inefficient Storage: It is very wasteful of space. A text file containing only the word "Hello" would take 5 * 4 = 20 bytes in UTF-32, whereas it would only take 5 bytes in UTF-8 or ASCII.
  • Usage: UTF-32 is rarely used for storing or transmitting data but may be used internally by some programs for easier processing.
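
The "Hello" size comparison and the fixed-width indexing can both be checked in a few lines. This is a sketch under one assumption worth stating: it uses the "utf-32-le" codec, because Python's plain "utf-32" codec prepends a 4-byte byte-order mark and would report 24 bytes instead of 20. The helper name nth_char_utf32 is illustrative.

```python
def nth_char_utf32(data: bytes, n: int) -> str:
    """Return the character at zero-based index n from little-endian UTF-32 bytes."""
    start = 4 * n  # fixed width: every character starts at a multiple of 4
    return data[start:start + 4].decode("utf-32-le")

text = "Hello"
utf32 = text.encode("utf-32-le")
utf8 = text.encode("utf-8")

print(len(utf32), len(utf8))       # sizes in bytes
print(nth_char_utf32(utf32, 1))    # second character, found by offset alone
```

With UTF-8 or UTF-16, the same lookup would require walking through the bytes from the start, since earlier characters may differ in width.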

Summary Table:

Encoding Scheme  Size per Character  ASCII Compatible  Space Efficiency  Usage
ASCII            7 or 8 bits         Yes               High              Older systems, limited applications
ISCII            8 bits              Yes               Moderate          Indian languages
UTF-8            1 to 4 bytes        Yes               Very High         Web, general-purpose
UTF-16           2 or 4 bytes        No                Moderate          Windows, Java
UTF-32           4 bytes             No                Low               Internal processing

ASCII Character Table (Standard 7-bit)

Dec  Hex  Oct  Char   Description
0    00   00   NUL    Null character
1    01   01   SOH    Start of Heading
2    02   02   STX    Start of Text
3    03   03   ETX    End of Text
4    04   04   EOT    End of Transmission
5    05   05   ENQ    Enquiry
6    06   06   ACK    Acknowledge
7    07   07   BEL    Bell
8    08   10   BS     Backspace
9    09   11   HT     Horizontal Tab
10   0A   12   LF     Line Feed
11   0B   13   VT     Vertical Tab
12   0C   14   FF     Form Feed
13   0D   15   CR     Carriage Return
14   0E   16   SO     Shift Out
15   0F   17   SI     Shift In
16   10   20   DLE    Data Link Escape
17   11   21   DC1    Device Control 1
18   12   22   DC2    Device Control 2
19   13   23   DC3    Device Control 3
20   14   24   DC4    Device Control 4
21   15   25   NAK    Negative Acknowledge
22   16   26   SYN    Synchronous Idle
23   17   27   ETB    End of Transmission Block
24   18   30   CAN    Cancel
25   19   31   EM     End of Medium
26   1A   32   SUB    Substitute
27   1B   33   ESC    Escape
28   1C   34   FS     File Separator
29   1D   35   GS     Group Separator
30   1E   36   RS     Record Separator
31   1F   37   US     Unit Separator
32   20   40   Space  Space
33   21   41   !      Exclamation Mark
34   22   42   "      Double Quote
35   23   43   #      Number Sign/Hash
36   24   44   $      Dollar Sign
37   25   45   %      Percent Sign
38   26   46   &      Ampersand
39   27   47   '      Single Quote
40   28   50   (      Left Parenthesis
41   29   51   )      Right Parenthesis
42   2A   52   *      Asterisk
43   2B   53   +      Plus Sign
44   2C   54   ,      Comma
45   2D   55   -      Hyphen/Minus Sign
46   2E   56   .      Period/Dot
47   2F   57   /      Slash
48   30   60   0      Digit Zero
49   31   61   1      Digit One
50   32   62   2      Digit Two
51   33   63   3      Digit Three
52   34   64   4      Digit Four
53   35   65   5      Digit Five
54   36   66   6      Digit Six
55   37   67   7      Digit Seven
56   38   70   8      Digit Eight
57   39   71   9      Digit Nine
58   3A   72   :      Colon
59   3B   73   ;      Semicolon
60   3C   74   <      Less-than Sign
61   3D   75   =      Equals Sign
62   3E   76   >      Greater-than Sign
63   3F   77   ?      Question Mark
64   40   100  @      At Sign
65   41   101  A      Uppercase A
66   42   102  B      Uppercase B
67   43   103  C      Uppercase C
68   44   104  D      Uppercase D
69   45   105  E      Uppercase E
70   46   106  F      Uppercase F
71   47   107  G      Uppercase G
72   48   110  H      Uppercase H
73   49   111  I      Uppercase I
74   4A   112  J      Uppercase J
75   4B   113  K      Uppercase K
76   4C   114  L      Uppercase L
77   4D   115  M      Uppercase M
78   4E   116  N      Uppercase N
79   4F   117  O      Uppercase O
80   50   120  P      Uppercase P
81   51   121  Q      Uppercase Q
82   52   122  R      Uppercase R
83   53   123  S      Uppercase S
84   54   124  T      Uppercase T
85   55   125  U      Uppercase U
86   56   126  V      Uppercase V
87   57   127  W      Uppercase W
88   58   130  X      Uppercase X
89   59   131  Y      Uppercase Y
90   5A   132  Z      Uppercase Z
91   5B   133  [      Left Square Bracket
92   5C   134  \      Backslash
93   5D   135  ]      Right Square Bracket
94   5E   136  ^      Circumflex
95   5F   137  _      Underscore
96   60   140  `      Grave Accent
97   61   141  a      Lowercase a
98   62   142  b      Lowercase b
99   63   143  c      Lowercase c
100  64   144  d      Lowercase d
101  65   145  e      Lowercase e
102  66   146  f      Lowercase f
103  67   147  g      Lowercase g
104  68   150  h      Lowercase h
105  69   151  i      Lowercase i
106  6A   152  j      Lowercase j
107  6B   153  k      Lowercase k
108  6C   154  l      Lowercase l
109  6D   155  m      Lowercase m
110  6E   156  n      Lowercase n
111  6F   157  o      Lowercase o
112  70   160  p      Lowercase p
113  71   161  q      Lowercase q
114  72   162  r      Lowercase r
115  73   163  s      Lowercase s
116  74   164  t      Lowercase t
117  75   165  u      Lowercase u
118  76   166  v      Lowercase v
119  77   167  w      Lowercase w
120  78   170  x      Lowercase x
121  79   171  y      Lowercase y
122  7A   172  z      Lowercase z
123  7B   173  {      Left Curly Brace
124  7C   174  |      Vertical Bar
125  7D   175  }      Right Curly Brace
126  7E   176  ~      Tilde
127  7F   177  DEL    Delete
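
The Dec, Hex, and Oct columns are the same code written in base 10, base 16, and base 8. A short sketch of how one row could be computed in Python follows; the function name ascii_row is an illustrative choice, not something defined in the note.

```python
def ascii_row(code: int) -> tuple:
    """Return (decimal, hex, octal, character) for one standard ASCII code."""
    if not 0 <= code <= 127:
        raise ValueError("standard ASCII codes run from 0 to 127")
    return (code, format(code, "02X"), format(code, "o"), chr(code))

for code in (48, 65, 97):
    print(ascii_row(code))

# Uppercase and lowercase letters are always 32 codes apart
print(ord("a") - ord("A"))
```

Control characters (codes 0-31 and 127) have no printable form, so chr() returns them as invisible characters rather than the abbreviations shown in the table.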

Arbind Singh

Teacher, Software developer
