4.1 Introduction to Encoding Schemes

Computers operate using binary digits (bits – 0s and 1s). However, humans communicate using characters, symbols, and languages. To bridge this gap, encoding schemes are used to represent these human-readable characters as binary numbers that computers can understand and process. An encoding scheme defines a unique numeric code for each character.
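
The idea of "one numeric code per character" can be tried directly in Python. The following is a minimal sketch; the helper name `char_to_bits` is made up for this note, while `ord` and `format` are Python built-ins.

```python
def char_to_bits(ch: str, width: int = 8) -> str:
    """Return the numeric code of one character as a binary string.

    `width` is the number of binary digits to pad to (8 by default).
    """
    return format(ord(ch), f"0{width}b")


# Show each character, its numeric code, and that code in binary.
for ch in "Hi!":
    print(ch, ord(ch), char_to_bits(ch))
```

Each printed line shows a character, the number assigned to it, and the bits a computer would actually store for that number.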

4.2 ASCII (American Standard Code for Information Interchange)

Purpose: ASCII was one of the earliest and most widely used encoding standards, primarily designed for the English language.
Structure:
- Standard ASCII: Uses 7 bits to represent 128 characters (codes 0-127). This includes:
  - Uppercase English letters (A-Z)
  - Lowercase English letters (a-z)
  - Digits (0-9)
  - Punctuation symbols (!, @, #, etc.)
  - Control characters (like newline, tab, backspace).
- Extended ASCII: Uses 8 bits (1 byte) to represent 256 characters (codes 0-255). The first 128 characters are the same as standard ASCII, while the additional 128 codes (128-255) were used for graphical symbols and letters with accents (e.g., é, ñ). There is no single "extended ASCII" standard: different systems assigned these upper 128 codes differently.
Limitations:
- Limited character set: With only 128 codes (256 in the extended versions), it cannot represent the characters and symbols used in most languages other than English.
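
Both the 7-bit codes and the limitation above can be checked with a short Python sketch. The function name `is_ascii_encodable` is a hypothetical helper for this note; `str.encode("ascii")` and `UnicodeEncodeError` are standard Python.

```python
def is_ascii_encodable(text: str) -> bool:
    """Return True if every character in `text` has a standard ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Raised when a character falls outside codes 0-127.
        return False


print(ord("A"))                     # numeric ASCII code of 'A'
print(format(ord("A"), "07b"))      # the same code written with 7 bits
print(is_ascii_encodable("cafe"))   # plain English letters only
print(is_ascii_encodable("café"))   # contains 'é', which ASCII lacks
```

The last call reports failure because 'é' has no code in the 0-127 range.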

4.3 ISCII (Indian Standard Code for Information Interchange)

Purpose: Developed in India to address the need for representing characters from various Indian scripts (Devanagari, Bengali, Tamil, Telugu, etc.).
Structure: ISCII is an 8-bit encoding standard. The first 128 codes are the same as ASCII. The remaining codes are used to represent characters from Indian scripts.
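Key Feature: ISCII makes transliteration (the conversion of text from one Indian script to another) straightforward, because the supported scripts share a common code table in which the same code stands for the corresponding letter in each script. This is useful for multilingual applications and data exchange.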
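
The shared-layout idea behind transliteration can be illustrated with a rough sketch. This note does not assume an ISCII codec is available in Python, so the example below substitutes the Unicode blocks for Indian scripts, whose arrangement was modeled on ISCII: corresponding letters sit at the same offset within each script's block. The names `BLOCK_START` and `transliterate` are invented for this example, and real transliteration needs more care because not every position is filled in every script.

```python
# First code point of each script's Unicode block (each block holds 128 codes).
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
}


def transliterate(text: str, source: str, target: str) -> str:
    """Shift characters from the source script's block to the target's block."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    result = []
    for ch in text:
        code = ord(ch)
        if src <= code < src + 0x80:
            # Same offset inside the block, different script.
            result.append(chr(code - src + dst))
        else:
            # Leave characters from other ranges (spaces, digits) unchanged.
            result.append(ch)
    return "".join(result)


print(transliterate("क", "devanagari", "bengali"))
print(transliterate("क", "devanagari", "telugu"))
```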

4.4 Unicode

Purpose: Unicode is a modern, universal character encoding standard designed to represent every character from every language in the world. It aims to unify all other encoding schemes into one.
Concept: Unicode assigns a unique number, called a code point, to every character. For example, the code point for the letter 'A' is U+0041, and for the Devanagari letter 'क' is U+0915.
Advantages:
- Universal character support: Can represent characters from all languages.
- Solves compatibility issues: Eliminates the need for multiple encoding schemes.
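
Code points are easy to inspect in Python, since `ord` returns a character's code point and `chr` goes the other way. A small sketch follows; `code_point_label` is a hypothetical helper name chosen for this note.

```python
def code_point_label(ch: str) -> str:
    """Format a character's code point in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


print(code_point_label("A"))   # U+0041
print(code_point_label("क"))   # U+0915
print(chr(0x0915))             # back from the code point to the character
```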

Unicode Encoding Formats

The Unicode standard itself doesn't define how to store the code points in memory; that job is done by different Unicode encodings.

a) UTF-8 (Unicode Transformation Format - 8-bit)

Mechanism: UTF-8 is a variable-width encoding. This means it uses a variable number of bytes to represent each character.
- For characters that are also in ASCII (like 'A', 'b', '7'), it uses only 1 byte. This makes it fully backward compatible with ASCII.
- For other characters (like 'é', '€', 'क', '😂'), it uses 2, 3, or 4 bytes as needed.
Advantages:
- Space-efficient: For text that is primarily English, a UTF-8 file is roughly the same size as an ASCII file.
- Compatibility: Its backward compatibility with ASCII is a huge advantage.
- Widely Used: It is the dominant character encoding for the World Wide Web.
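
The variable width of UTF-8 can be observed by encoding a few characters and counting the bytes. This is only an illustrative sketch; `utf8_length` is a made-up helper, and `bytes.hex(" ")` with a separator assumes Python 3.8 or later.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs for the given character."""
    return len(ch.encode("utf-8"))


# One ASCII letter, then characters that need 2, 3, 3, and 4 bytes.
for ch in ["A", "é", "€", "क", "😂"]:
    encoded = ch.encode("utf-8")
    print(ch, utf8_length(ch), "byte(s):", encoded.hex(" "))
```

The ASCII letter 'A' comes out as the single byte 41 (hex), the same value it has in ASCII, which is what backward compatibility means in practice.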

b) UTF-16 (Unicode Transformation Format - 16-bit)

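Mechanism: UTF-16 is also a variable-width encoding: it uses 2 or 4 bytes to represent each character. Characters in the Basic Multilingual Plane (BMP, code points U+0000 to U+FFFF), which covers most commonly used characters, take 2 bytes; all other characters take 4 bytes.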
Advantages:
- Efficient for languages with a large number of characters that fall within the Basic Multilingual Plane (BMP).
Disadvantages:
- Less space-efficient for English text compared to UTF-8, because every ASCII character takes 2 bytes instead of 1.
- Not backward compatible with ASCII.
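
A quick way to see the 2-byte and 4-byte cases is to encode sample characters in Python. In this sketch `utf16_length` is a hypothetical helper; the `"utf-16-le"` codec is used because the plain `"utf-16"` codec adds a 2-byte byte order mark at the start of the output, which would distort the count.

```python
def utf16_length(ch: str) -> int:
    """Return how many bytes UTF-16 needs for the given character."""
    return len(ch.encode("utf-16-le"))  # little-endian, no byte order mark


for ch in ["A", "क", "😂"]:
    print(ch, utf16_length(ch), "bytes")
```

'A' and 'क' each take 2 bytes, while the emoji lies outside the BMP and takes 4.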

c) UTF-32 (Unicode Transformation Format - 32-bit)

Mechanism: UTF-32 is a fixed-width encoding. It uses exactly 4 bytes (32 bits) to store every single character, regardless of what that character is.
Advantages:
- Simplicity: Since every character has the same length, finding the Nth character in a string is very simple and fast.
Disadvantages:
- Inefficient Storage: It is very wasteful of space. A text file containing only the word "Hello" would take 5 * 4 = 20 bytes in UTF-32, whereas it would only take 5 bytes in UTF-8 or ASCII.
Usage: UTF-32 is rarely used for storing or transmitting data but may be used internally by some programs for easier processing.
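
The "Hello" arithmetic and the easy indexing can both be checked with a short sketch. The helper names `encoded_size` and `nth_char_utf32` are invented for this note; the `"utf-32-le"` codec is used because the plain `"utf-32"` codec adds a 4-byte byte order mark at the start.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


def nth_char_utf32(data: bytes, n: int) -> str:
    """Fetch character number n (counting from 0) from UTF-32 bytes.

    Every character is exactly 4 bytes, so its position is simply 4 * n.
    """
    return data[4 * n : 4 * n + 4].decode("utf-32-le")


print(encoded_size("Hello", "utf-32-le"))  # 5 characters * 4 bytes = 20
print(encoded_size("Hello", "utf-8"))      # 5 bytes
print(nth_char_utf32("Hello".encode("utf-32-le"), 1))  # e
```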

Summary Table:

Encoding Scheme	Bits per Character	Compatibility with ASCII	Space Efficiency	Usage
ASCII	7 (8 for extended)	Yes	High	Older systems, limited applications
ISCII	8	Yes	Moderate	Indian languages
UTF-8	8, 16, 24, or 32	Yes	Very High	Web, general-purpose
UTF-16	16 or 32	No	Moderate	Windows, Java
UTF-32	32	No	Low	Internal processing

ASCII Character Table (Standard 7-bit)

Dec	Hex	Oct	Character	Description
0	00	00	NUL	Null character
1	01	01	SOH	Start of Heading
2	02	02	STX	Start of Text
3	03	03	ETX	End of Text
4	04	04	EOT	End of Transmission
5	05	05	ENQ	Enquiry
6	06	06	ACK	Acknowledge
7	07	07	BEL	Bell
8	08	10	BS	Backspace
9	09	11	HT	Horizontal Tab
10	0A	12	LF	Line Feed
11	0B	13	VT	Vertical Tab
12	0C	14	FF	Form Feed
13	0D	15	CR	Carriage Return
14	0E	16	SO	Shift Out
15	0F	17	SI	Shift In
16	10	20	DLE	Data Link Escape
17	11	21	DC1	Device Control 1
18	12	22	DC2	Device Control 2
19	13	23	DC3	Device Control 3
20	14	24	DC4	Device Control 4
21	15	25	NAK	Negative Acknowledge
22	16	26	SYN	Synchronous Idle
23	17	27	ETB	End of Transmission Block
24	18	30	CAN	Cancel
25	19	31	EM	End of Medium
26	1A	32	SUB	Substitute
27	1B	33	ESC	Escape
28	1C	34	FS	File Separator
29	1D	35	GS	Group Separator
30	1E	36	RS	Record Separator
31	1F	37	US	Unit Separator
32	20	40	Space	Space
33	21	41	!	Exclamation Mark
34	22	42	"	Double Quote
35	23	43	#	Number Sign/Hash
36	24	44	$	Dollar Sign
37	25	45	%	Percent Sign
38	26	46	&	Ampersand
39	27	47	'	Single Quote
40	28	50	(	Left Parenthesis
41	29	51	)	Right Parenthesis
42	2A	52	*	Asterisk
43	2B	53	+	Plus Sign
44	2C	54	,	Comma
45	2D	55	-	Hyphen/Minus Sign
46	2E	56	.	Period/Dot
47	2F	57	/	Slash
48	30	60	0	Digit Zero
49	31	61	1	Digit One
50	32	62	2	Digit Two
51	33	63	3	Digit Three
52	34	64	4	Digit Four
53	35	65	5	Digit Five
54	36	66	6	Digit Six
55	37	67	7	Digit Seven
56	38	70	8	Digit Eight
57	39	71	9	Digit Nine
58	3A	72	:	Colon
59	3B	73	;	Semicolon
60	3C	74	<	Less-than Sign
61	3D	75	=	Equals Sign
62	3E	76	>	Greater-than Sign
63	3F	77	?	Question Mark
64	40	100	@	At Sign
65	41	101	A	Uppercase A
66	42	102	B	Uppercase B
67	43	103	C	Uppercase C
68	44	104	D	Uppercase D
69	45	105	E	Uppercase E
70	46	106	F	Uppercase F
71	47	107	G	Uppercase G
72	48	110	H	Uppercase H
73	49	111	I	Uppercase I
74	4A	112	J	Uppercase J
75	4B	113	K	Uppercase K
76	4C	114	L	Uppercase L
77	4D	115	M	Uppercase M
78	4E	116	N	Uppercase N
79	4F	117	O	Uppercase O
80	50	120	P	Uppercase P
81	51	121	Q	Uppercase Q
82	52	122	R	Uppercase R
83	53	123	S	Uppercase S
84	54	124	T	Uppercase T
85	55	125	U	Uppercase U
86	56	126	V	Uppercase V
87	57	127	W	Uppercase W
88	58	130	X	Uppercase X
89	59	131	Y	Uppercase Y
90	5A	132	Z	Uppercase Z
91	5B	133	[	Left Square Bracket
92	5C	134	\	Backslash
93	5D	135	]	Right Square Bracket
94	5E	136	^	Circumflex
95	5F	137	_	Underscore
96	60	140	`	Grave Accent
97	61	141	a	Lowercase a
98	62	142	b	Lowercase b
99	63	143	c	Lowercase c
100	64	144	d	Lowercase d
101	65	145	e	Lowercase e
102	66	146	f	Lowercase f
103	67	147	g	Lowercase g
104	68	150	h	Lowercase h
105	69	151	i	Lowercase i
106	6A	152	j	Lowercase j
107	6B	153	k	Lowercase k
108	6C	154	l	Lowercase l
109	6D	155	m	Lowercase m
110	6E	156	n	Lowercase n
111	6F	157	o	Lowercase o
112	70	160	p	Lowercase p
113	71	161	q	Lowercase q
114	72	162	r	Lowercase r
115	73	163	s	Lowercase s
116	74	164	t	Lowercase t
117	75	165	u	Lowercase u
118	76	166	v	Lowercase v
119	77	167	w	Lowercase w
120	78	170	x	Lowercase x
121	79	171	y	Lowercase y
122	7A	172	z	Lowercase z
123	7B	173	{	Left Curly Brace
124	7C	174	|	Vertical Bar
125	7D	175	}	Right Curly Brace
126	7E	176	~	Tilde
127	7F	177	DEL	Delete
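
Rows of this table can be regenerated in Python, which is a handy way to check any entry. The sketch below covers only the printable characters (codes 33 to 126); `ascii_row` is a hypothetical helper name, and the columns are decimal, hexadecimal, octal, and the character itself, separated by tabs.

```python
def ascii_row(code: int) -> str:
    """Build one table row: decimal, hex, octal, and the character."""
    return f"{code}\t{code:02X}\t{code:02o}\t{chr(code)}"


print("Dec\tHex\tOct\tCharacter")
for code in range(33, 127):  # printable characters, '!' through '~'
    print(ascii_row(code))
```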

Arbind Singh

Part of Course

Foundations of Computer Science and Python Programming (CBSE Class XI - 083)