A lexeme is a sequence of characters in the source program that matches the pattern for a token. Every lexeme must satisfy predefined rules to be identified as a valid token. These rules are given by the grammar of the language in the form of patterns. A pattern describes what can be a token, and patterns are defined by means of regular expressions.

In a programming language, keywords, constants, identifiers, strings, numbers, operators, and punctuation symbols can be considered tokens.

For example, in the C language, the variable declaration line

int value = 100;

contains the tokens:

int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol).
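
The classification above can be reproduced with regular expressions. The following is a minimal sketch in Python, assuming a tiny hypothetical token set (a few C keywords, identifiers, integer constants, `=` and `;`). The names `TOKEN_SPEC`, `MASTER` and `tokenize` are illustrative only and do not come from any real compiler.

```python
import re

# Hypothetical patterns for a tiny subset of C. Order matters: keywords
# must be tried before the more general identifier pattern.
TOKEN_SPEC = [
    ("keyword",    r"\b(?:int|char|float|return)\b"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("constant",   r"\d+"),
    ("operator",   r"="),
    ("symbol",     r";"),
    ("skip",       r"\s+"),   # whitespace separates lexemes but is not a token
]

# One combined pattern with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (lexeme, token_class) pairs for the source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.group(), match.lastgroup))
        pos = match.end()
    return tokens

print(tokenize("int value = 100;"))
```

Each pair holds the lexeme (the matched characters) and the token class whose pattern it satisfied. A character that fits no pattern raises an error, which corresponds to a lexeme that is not a valid token.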

**Specifications of Tokens**

Let us understand how language theory defines the following terms:

**Alphabets**

Any finite set of symbols is an alphabet. For example, {0,1} is the binary alphabet, {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the hexadecimal alphabet, and {a-z, A-Z} is the alphabet of English letters.

**Strings**

Any finite sequence of symbols drawn from an alphabet is called a string. The length of a string is the total number of symbol occurrences in it; e.g., the length of the string tutorialspoint is 14, denoted by |tutorialspoint| = 14. A string containing no symbols, i.e. a string of length zero, is known as the empty string and is denoted by ε (epsilon).
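
These definitions map directly onto sets and strings in code. The sketch below, written in Python, models each alphabet as a set of one-character symbols. The names `BINARY`, `HEXADECIMAL`, `ENGLISH`, `EPSILON` and `is_string_over` are hypothetical helpers introduced only for illustration.

```python
import string

# An alphabet is a finite set of symbols.
BINARY = set("01")
HEXADECIMAL = set("0123456789ABCDEF")
ENGLISH = set(string.ascii_letters)   # a-z and A-Z

EPSILON = ""   # the empty string, written ε in the text

def is_string_over(s, alphabet):
    """Return True if every symbol of s belongs to the given alphabet."""
    return all(symbol in alphabet for symbol in s)

print(len("tutorialspoint"))                    # |tutorialspoint| = 14
print(len(EPSILON))                             # |ε| = 0
print(is_string_over("1011", BINARY))           # True
print(is_string_over("1F2G", HEXADECIMAL))      # False, since G is not a hexadecimal symbol
print(is_string_over("tutorialspoint", ENGLISH))  # True
```

Note that the empty string counts as a string over every alphabet, because it contains no symbol that could fall outside the set.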

**Special Symbols**

A typical high-level language contains the following symbols:

| Category | Symbols |
| --- | --- |
| Arithmetic Symbols | Addition(+), Subtraction(-), Modulo(%), Multiplication(*), Division(/) |
| Punctuation | Comma(,), Semicolon(;), Dot(.), Arrow(->) |
| Assignment | = |
| Special Assignment | +=, /=, *=, -= |
| Comparison | ==, !=, <, <=, >, >= |
| Preprocessor | # |
| Location Specifier | & |
| Logical | &, &&, \|, \|\|, ! |
| Shift Operator | >>, >>>, <<, <<< |
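
Several of these symbols share prefixes (for example `>`, `>=`, `>>` and `>>>`), so a scanner needs a rule for choosing between them. A common convention, assumed here and not stated in the text above, is to take the longest symbol that matches at the current position. The Python sketch below illustrates that convention on a hypothetical subset of the symbols; `OPERATORS`, `BY_LENGTH` and `scan_operators` are illustrative names.

```python
# Hypothetical subset of the special symbols listed above.
OPERATORS = ["+", "-", "*", "/", "%", "=", "+=", "-=", "*=", "/=",
             "==", "!=", "<", "<=", ">", ">=", "&", "&&", "|", "||", "!",
             ">>", ">>>", "<<", "->"]

# Try longer symbols first so that ">>" is preferred over two ">" lexemes.
BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)

def scan_operators(text):
    """Split text into operator lexemes using the longest-match rule."""
    lexemes = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for op in BY_LENGTH:
            if text.startswith(op, pos):
                lexemes.append(op)
                pos += len(op)
                break
        else:
            # No known symbol starts at this position.
            raise ValueError(f"unknown symbol at position {pos}: {text[pos]!r}")
    return lexemes

print(scan_operators("<= >> &&& !=="))
```

With this rule the input `<=` is read as one comparison symbol instead of `<` followed by `=`, and `&&&` is split into `&&` followed by `&`.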