Packs data into a binary string or unpacks it, displaying its contents in a readable way. Supports recursion and more.
This tool takes a number of values (strings, integers, floats, etc.) and stores them in a single binary string. It also does the reverse, unpacking components from a binary string.
The basic syntax of the format string resembles PHP's and Perl's packing functions (and is in fact backward-compatible). However, in addition to their basic features, a number of strong enhancements were made, such as recursive subformats and value-based repetitions.
i-Tools's pack: spartan but strong.
Packing is a way of converting certain data (which might be textual or binary) into a binary form with a specific structure. Conversely, unpacking is the process of extracting those components from a binary stream.
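
To make the idea concrete outside of this tool, here is a minimal sketch using Python's standard `struct` module. It is only an analogy: `struct` format codes are not this tool's format characters, and the sample values are made up for illustration.

```python
import struct

# Pack three values (an unsigned 32-bit integer, a signed 8-bit integer and
# a 4-byte string) into one binary string, little-endian, without padding.
packed = struct.pack("<Ib4s", 1024, -1, b"DATA")

# Unpack the same components back from the binary string.
size, flag, tag = struct.unpack("<Ib4s", packed)
```

The packed result is 9 bytes long: 4 bytes for the integer, 1 for the small number and 4 for the string.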
To perform packing or unpacking the following two components are necessary:
| Format characters | Description |
|---|---|
| c C s S n v | 8- and 16-bit numbers |
| i I l L N V | 32-bit numbers |
| f d | Float/double |
| a A | Regular strings |
| h H | Hexadecimal strings |
| @ | Offset |
| X | Backspace |
| x | NULL byte |
| R | Argument/value manipulation |
Note that the syntax of this tool's format string is compatible with PHP pack()'s but is significantly extended.
A format string designates how to pack values into, or pull them out of, the data stream. To demonstrate its syntax let's look at an example (spaces are optional and can be omitted):
a SomeString * / cSomeNum3 / ...
└ format
  └ value name
             └ length
               └ the beginning of next format
                 └ next format
Or, in schematic notation:
format [name] [length] [/ format [name] ...]
In other words, a format string consists of components separated by a slash (/); each component is a format character optionally followed by its name and/or length.
Note: names are case-sensitive.
When a format string is processed, values are either appended to the end of the current output buffer (when packing) or read from the current input stream position (when unpacking). If a name is given, a variable with that name is set to a value that depends on the format character used. This happens both when packing and unpacking, although the exact value differs per character; refer to the corresponding format section for details or experiment with the tool to find out.
Sometimes length can also be specified as a name. In this case an expression such as Xlength`name might look confusing, so it may help to think in terms of source and destination, similar to MOV src, dest: Xsrc`dest. This is especially true of the R character (e.g. Rval`copy).
Length (or repeater in PHP terminology) is the size of the data to be consumed from the input stream (when unpacking) or the number of arguments to repeat the format for (when packing). This differs slightly for some format characters (like @ and a A h H) but generally follows the same idea.
Length can take one of 3 forms:
In addition to the basic syntax explained above, format strings can contain recursive subformats. This is done by placing parentheses (round brackets, ()) around the nested format; putting a normal format separator (slash, /) before and after them is optional.
Recursions can be nested.
For example (try it out):
VCount (`Files/ a*Name / VSize / VOffset)Count
└ 1st └ subformat name └ times to repeat
└ 1st └ 2nd └ 3rd
The above example describes a basic file table:
Times to repeat works the same way as length for format characters: it can be a number (0-8), an asterisk (*) or the name of a previously read value. That value can be a field read from within the same subformat; in this case its first value is stored and used as the counter, so even if it's overridden (read over) later this won't be taken into account.
Subformats can be named or unnamed. To name a subformat put a backtick (`) immediately after the opening bracket, then write the desired name and close it off with a slash (/). As you can see, the example above uses a named subformat.
When you name a subformat you create a namespace for it meaning that names inside will be relative to that subformat. When subformat is unnamed it uses its parent namespace (or global namespace if none).
As a consequence, values inside an unnamed repeating subformat override values previously read from within itself or its parent, because no namespace was created for it.
For example, assuming the above sample format string and 3 files in a data stream the following values will be defined:
However, if the `Files/ part is removed from the subformat string, turning it into an unnamed subformat, only the last set of fields, named just Name, Size and Offset, will be defined.
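
The difference between the two cases can be sketched in Python. This is an illustration and not the tool's implementation. The function names are hypothetical, and it assumes that `a*Name` is stored as a NULL-terminated string and that `V` fields are 32-bit little-endian numbers. A list of dictionaries stands in for the per-repetition namespaces of a named subformat, and a single flat dictionary stands in for the parent namespace used by an unnamed one.

```python
import struct

def read_file_table(data: bytes):
    """Read a count-prefixed table of (Name, Size, Offset) records."""
    (count,) = struct.unpack_from("<I", data, 0)      # VCount
    pos = 4
    named = []   # named subformat: one namespace (dict) per repetition
    flat = {}    # unnamed subformat: fields land in the parent namespace
    for _ in range(count):
        end = data.index(b"\x00", pos)                # a*Name (assumed NULL-terminated)
        name = data[pos:end]
        size, offset = struct.unpack_from("<II", data, end + 1)  # VSize, VOffset
        pos = end + 1 + 8
        record = {"Name": name, "Size": size, "Offset": offset}
        named.append(record)
        flat.update(record)   # later repetitions overwrite earlier ones
    return named, flat

def build_sample() -> bytes:
    """Build a 3-file table to feed into read_file_table (made-up data)."""
    files = [(b"a.txt", 10, 100), (b"b.txt", 20, 110), (b"c.txt", 30, 130)]
    out = struct.pack("<I", len(files))
    for name, size, offset in files:
        out += name + b"\x00" + struct.pack("<II", size, offset)
    return out
```

Calling `read_file_table(build_sample())` returns three separate records in the first result, while the second result holds only the fields of the last file.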
Note: the repeater of a subformat is relative to its parent, not to the subformat it repeats: cCount(ccc)Count (demo) is correct while (`Sub/cCount/c3)Count is not (demo), because no «Count» value is read within the global namespace. It could be written as (`Sub/cCount/c3)Sub#1:Count instead (demo) or just (cCount/c3)Count (demo), because unnamed subformats write values to their parent format (in this case there's none, so the global namespace is used).
When referring to a value, the given name is expanded to form an absolute name. This happens transparently in format strings that don't use recursion. However, in more complex cases it's good to be aware of relative names.
Any name can start with either of two characters:
The prefix character (dot or backslash) can be repeated to go further up or down the stack. Different prefixes cannot be combined in one name: \\var is correct but ..\var is not.
For example, if we take the following format string:
(`f1/ (`f2/ (`f3/ c/f3Var) c/f2Var) c/f1Var) cGlobal
…then from within the f3 subformat the following names resolve to the corresponding values: f3Var, .f2Var, ..f1Var, ...Global, \Global, \\f1Var, \\\f2Var, \\\\f3Var.
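
The resolution rule implied by this example can be sketched in Python. The function `resolve` and its return shape are hypothetical and only model the behaviour shown above: each dot climbs one level from the current subformat, a single backslash denotes the global namespace, and each further backslash descends one level along the current stack. Raising an error for names that point outside the stack is an assumption.

```python
def resolve(name: str, stack: list[str]) -> tuple[tuple[str, ...], str]:
    """Return (absolute namespace path, bare name) for a relative name.

    `stack` lists the enclosing subformat names from outermost to innermost.
    """
    bare = name.lstrip(".\\")
    prefix = name[: len(name) - len(bare)]
    if "." in prefix and "\\" in prefix:
        raise ValueError("different prefixes cannot be combined: " + name)
    if not prefix:
        depth = len(stack)                  # current namespace
    elif prefix[0] == ".":
        depth = len(stack) - len(prefix)    # each dot goes one level up
    else:
        depth = len(prefix) - 1             # a single backslash is the global namespace
    if not 0 <= depth <= len(stack):
        raise ValueError("name points outside the stack: " + name)
    return tuple(stack[:depth]), bare
```

With `stack = ["f1", "f2", "f3"]`, both `...Global` and `\Global` resolve to the empty (global) path, matching the list above.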
The above rules apply to any place where a value is referred to, such as here:
A slash symbol (/) separates format fields from each other (see the syntax). However, a more concise notation is possible if you don't want to name anything.
Shortcut syntax is enabled for format strings that consist only of digits (0-9), asterisks (*) and format characters; /, @ and other symbols disable it. Each value in such a string is unnamed. This allows shorter notations similar to those of PHP's pack(), so VccA*s2 turns into V/c/c/A*/s2.
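
The expansion described above can be sketched in Python. The function name and the exact set of format characters are assumptions (the set is taken from the overview table, with @ excluded as stated); the tool itself may apply additional rules.

```python
import re

# Assumed set of format characters (from the overview table, without @).
FORMAT_CHARS = "cCsSnviIlLNVfdaAhHxXR"

def expand_shortcut(fmt: str) -> str:
    """Expand e.g. 'VccA*s2' to 'V/c/c/A*/s2'; return other strings unchanged."""
    token = "[" + FORMAT_CHARS + r"](?:\*|[0-9]+)?"
    if not re.fullmatch("(?:" + token + ")+", fmt):
        return fmt  # contains names, slashes or other symbols: not a shortcut
    return "/".join(re.findall(token, fmt))
```

A string that contains a slash or a name, such as `cCount/c3`, is returned unchanged because it does not qualify as a shortcut.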
Examples:
Because the shortcut is evaluated for each format separately, it also works for recursions, since a recursion is essentially a set of formats. The following two expressions are the same:
Although most format characters are the same as for PHP's pack() function, some of them behave differently.
There are 12 number formats which are as follows (corresponding to pack(), see also differences* from PHP):
| Format | Length (in bytes) | Sign | Little/big-endian | C/PHP-style name |
|---|---|---|---|---|
| c | 1 | signed | — | signed char |
| C | 1 | unsigned | — | unsigned char |
| s | 2 | signed | LE* | signed short |
| S | 2 | unsigned | LE* | unsigned short |
| n | 2 | unsigned | BE | unsigned short |
| v | 2 | unsigned | LE | unsigned short |
| i | 4* | signed | LE* | signed integer |
| I | 4* | unsigned | LE* | unsigned integer |
| l | 4 | signed | LE* | signed long |
| L | 4 | unsigned | LE* | unsigned long |
| N | 4 | unsigned | BE | unsigned long |
| V | 4 | unsigned | LE | unsigned long |
| f | 4* | signed | —* | float |
| d | 8* | signed | —* | double |
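
For readers more familiar with Python, the table can be approximated with `struct` codes. The mapping below is an assumption based only on the sizes, signs and byte orders listed in the table; it treats every starred entry as fixed-size and little-endian and is not taken from the tool's source.

```python
import struct

# Assumed correspondence between the table's format characters and Python
# struct codes; "<" means little-endian, ">" means big-endian.
STRUCT_CODES = {
    "c": "<b", "C": "<B",
    "s": "<h", "S": "<H", "n": ">H", "v": "<H",
    "i": "<i", "I": "<I", "l": "<l", "L": "<L", "N": ">L", "V": "<L",
    "f": "<f", "d": "<d",
}

def pack_number(fmt_char: str, value) -> bytes:
    """Pack one number using the struct code assumed for `fmt_char`."""
    return struct.pack(STRUCT_CODES[fmt_char], value)
```

For instance, `pack_number("n", 0x1234)` and `pack_number("v", 0x1234)` produce the same two bytes in opposite order.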
There are 4 string formats.
Tip: although, as explained later in this section, string formats operate on a single value, you can read a series of strings using recursion: (`String/a*Substr)3 (demo).
Tip: when packing a named regular string, the length of the packed string is stored in that value (excluding the terminator character for a & A). This allows packing of Pascal-style strings (see Pascal string packing).
a (NULL-padded string): a sequence of bytes (1 byte per character) terminated with one or more zero characters. When packing, one terminator is appended; when unpacking, all trailing terminators are removed.
Unlike PHP's pack(), the terminator is written into the output stream after the string itself. If you want to remove it, use the X format character.
This format treats length following its character like this:
A (space-padded string) is identical to a but its terminator character is a space (0x20) instead of NULL (0x00).
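
The terminator behaviour of a and A described above can be modelled with two small Python helpers. The helper names are hypothetical, and the sketch covers only the case where the whole string is packed or unpacked; it does not model the length rules.

```python
def pack_terminated(text: bytes, terminator: bytes = b"\x00") -> bytes:
    """Packing: the string is followed by exactly one terminator."""
    return text + terminator

def unpack_terminated(data: bytes, terminator: bytes = b"\x00") -> bytes:
    """Unpacking: all trailing terminators are removed."""
    return data.rstrip(terminator)
```

Passing `b" "` as the terminator models A, while the default models a.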
H (hex string, high nibble first):
h (hex string, low nibble first) is identical to H but swaps the two 4-bit halves (nibbles) of each byte. Compare:
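
As an illustration in Python (not the tool's own output), the nibble order can be modelled like this. The function name is hypothetical, and padding an odd-length string with a zero nibble is an assumption borrowed from PHP's pack() behaviour.

```python
def pack_hex(digits: str, high_nibble_first: bool = True) -> bytes:
    """Pack a string of hex digits the way H (default) or h would."""
    if len(digits) % 2:
        digits += "0"  # assumption: pad an odd-length string with a zero nibble
    if not high_nibble_first:
        # h: swap the two hex digits that make up every byte
        digits = "".join(digits[i + 1] + digits[i] for i in range(0, len(digits), 2))
    return bytes.fromhex(digits)
```

With the input `1f2e`, the H-style call yields bytes 1F 2E while the h-style call yields F1 E2.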
The two regular string formats, a & A, write C-style strings, meaning that a string doesn't have a length field: its bytes just go in sequence and are followed by either a NULL byte (a) or a space (A).
In contrast, Pascal strings have a length field (typically 1 byte for ANSI and 2 bytes for Unicode strings) which precedes the string's data.
There's no direct way of operating on Pascal strings, but there's a workaround involving the argument manipulator metacharacter (R):
To pack a Pascal string the following format string can be used (try it):
R/a*`length/X/Xlength/Rlength`/c/a*/X
On the other hand, unpacking is more straightforward (try it):
Clength/alength`
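
For comparison, here is what the same layout looks like when handled directly in Python. This is a sketch assuming a 1-byte length prefix as in the Clength field above; the function names are hypothetical.

```python
import struct

def pack_pascal(text: bytes) -> bytes:
    """Write a 1-byte length field followed by the string's bytes."""
    if len(text) > 255:
        raise ValueError("a 1-byte length field holds at most 255")
    return struct.pack("B", len(text)) + text

def unpack_pascal(data: bytes, pos: int = 0) -> tuple[bytes, int]:
    """Read one Pascal string at `pos`; return it and the next position."""
    (length,) = struct.unpack_from("B", data, pos)      # Clength
    start = pos + 1
    return data[start:start + length], start + length   # alength
```

Returning the next position makes it easy to read several Pascal strings stored back to back.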
The @ character's role is twofold.
When packing, it pads the output with NULL bytes until it reaches the desired size (the length passed). Additionally (this is an extension to PHP's pack), if the length starts with a plus (+) symbol, exactly that many NULL bytes are appended to the output stream.
If the offset is omitted it's set to 1 (which makes the stream at least 1 byte in size); this is also useful for storing the current output stream length in a value (like @`current_pos).
Examples: @100 (ensures output is at least 100 bytes long), @+4 (inserts 4 NULL bytes into the output).
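
The two packing behaviours can be modelled in Python as follows. The helper names are hypothetical, and the sketch covers only the packing side described above.

```python
def pad_to(buf: bytes, size: int = 1) -> bytes:
    """Model of @size: NULL-pad so the output is at least `size` bytes long."""
    return buf.ljust(size, b"\x00")

def append_nulls(buf: bytes, count: int) -> bytes:
    """Model of @+count: always append exactly `count` NULL bytes."""
    return buf + b"\x00" * count
```

Note that `pad_to` leaves a buffer untouched when it is already long enough, whereas `append_nulls` always grows it.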
When unpacking, its length argument can have several different forms:
Additionally, if you're specifying a number as the offset you can write it in 4 different notations:
Keeping the above information in mind, let's look at the 4 possible forms of the @ format character:
@ [offset|value name`] [value to store new position in]
The R format character doesn't directly affect the input or output streams and is more of a metacharacter. It's used to manipulate arguments and values. Its behaviour is the same both when packing and unpacking.
It's best to explain how this character works by looking at its usages; let's assume that the input arguments are 3 numbers: 1, 2, 3.
For more examples see Pascal string packing.
x: repeats a NULL byte (0x00) the given number of times (length) and appends the result to the output (when packing) or, if named, sets the corresponding value to it (when unpacking).
Note: when packing x* and x0 do nothing.
X (backspace):
This section describes the differences between the syntax of format strings used by this tool and the standard PHP pack() and unpack() functions.
Headers (BITMAPFILEHEADER and BITMAPINFOHEADER) of a Windows bitmap (.bmp) file can be both packed and unpacked using the following format string (try it out):
C2signature / Vfile_size / S2reserved / Vcolor_data_offset / Vinfohdr_size / lwidth / lheight / Splanes / Sbit_count / Vcompression / Vimage_size / lx_pels_per_meter / ly_pels_per_meter / Vused_color_count / Vimportant_color_count
Or, without names, just this: C2VS2VVllSSVVllVV (compare).
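
The same 54-byte header layout can be expressed with Python's `struct` module. This is a sketch for comparison, not part of the tool: the tuple type and function names are hypothetical, and the field names follow the named format string, with the two signature bytes and the two reserved words split into separate fields.

```python
import struct
from collections import namedtuple

# Same field order as the named format string; "<" selects little-endian
# standard sizes and disables alignment padding (54 bytes in total).
BMP_HEADERS = struct.Struct("<2B I 2H I I l l H H I I l l I I")

BmpHeaders = namedtuple("BmpHeaders", [
    "sig0", "sig1", "file_size", "reserved0", "reserved1",
    "color_data_offset", "infohdr_size", "width", "height", "planes",
    "bit_count", "compression", "image_size", "x_pels_per_meter",
    "y_pels_per_meter", "used_color_count", "important_color_count",
])

def unpack_bmp_headers(data: bytes) -> BmpHeaders:
    """Read both headers from the first 54 bytes of `data`."""
    return BmpHeaders._make(BMP_HEADERS.unpack_from(data))

def pack_bmp_headers(headers: BmpHeaders) -> bytes:
    """Write both headers back into their 54-byte binary form."""
    return BMP_HEADERS.pack(*headers)
```

Because width and height are read as signed values, a negative height (used by top-down bitmaps) survives a pack/unpack round trip.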
For simple uncompressed RGB bitmaps you can also display all of their color data by using recursive subformats: append the following pattern to the above (try it out):
(`colors/Cred/Cgreen/Cblue)*
By using @-offset with a named argument you can skip most of the preamble and go straight to the color data (try it out):
@Ah / Vcolor_data_offset / @color_data_offset / (`colors/ Cred / Cgreen / Cblue)*
ID3 tagging is a well-known way of adding information to music tracks (mainly MP3). ID3v1 is the first version of the standard; it is very simple to implement (both reading and writing) but also very limited. ID3v2 is much more extensible than ID3v1 but is more complex to deal with. For this demonstration we'll read ID3v1 (or, more specifically, ID3v1.1) tags.
ID3v1's technical side is described here. In short, a structure 128 bytes in size containing 7 fields is written to the end of the (audio) file. In v1.1 there are 8 fields.
You can unpack ID3v1.1 information from a file by using this format string (try it out):
@* / @-127 / a3TAG / a30title / a30artist / a30album / a4year / a29comment / Ctrack_id / Cgenre
As explained here, the @* construct seeks to just before the last byte in the file, then @-127 moves the pointer 127 bytes backwards (thus to the 128th byte from the file end), which is where the ID3v1.1 structure is read from.
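
An equivalent reader in Python might look like the sketch below. It mirrors the field sizes of the format string above (including the 29-byte comment field); the function name, the dictionary keys and the Latin-1 decoding are assumptions made for the example.

```python
import struct

# Field sizes mirror the format string: 3+30+30+30+4+29+1+1 = 128 bytes.
ID3V1_1 = struct.Struct("<3s30s30s30s4s29sBB")

def read_id3v1_1(data: bytes):
    """Return the ID3v1.1 fields as a dict, or None if no tag is found."""
    if len(data) < ID3V1_1.size:
        return None
    fields = ID3V1_1.unpack(data[-ID3V1_1.size:])   # last 128 bytes of the file
    tag, title, artist, album, year, comment, track_id, genre = fields
    if tag != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Assumption: fields are NULL- or space-padded Latin-1 text.
        return raw.rstrip(b"\x00 ").decode("latin-1")

    return {
        "title": text(title), "artist": text(artist), "album": text(album),
        "year": text(year), "comment": text(comment),
        "track_id": track_id, "genre": genre,
    }
```

Slicing from the end of the data plays the role of the two @ seeks, and checking the `TAG` marker avoids misreading files that carry no ID3v1 tag.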
ZIP files are archives representing a basic file system. Their main parts are file records and directory records, with directory records following file records. There might also be other record types (like an ending record).
In this demo we'll see how to read the list of files (file records). A ZIP file doesn't have a file-level signature and starts right with the first file record, which has a signature (the 2 characters «PK» followed by bytes 03 04), file name length, compressed and uncompressed data size and other fields (more info on Wikipedia). A file record is followed by its compressed data, then by the next record, and so on.
Directory entries usually follow all file records but they are less interesting for our demo, so we won't consider them.
Below is a format string that can be used to unpack file records (try it out):
(`files/a4signature / vmin_version / vflags / vcomp_level / vtime/vdate / Vcrc / Vsize_comp / Vsize_uncomp / vname_length / vextra_length / aname_length`name / asize_comp`data)*
Note: since nothing in the file records themselves tells how many of them there are, the unpacking process cannot know when to stop reading them. In the above sample a greedy repeater (*) is used to read as many records as possible; because of this the last record will most likely contain junk, which means that there are no more file entries.
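
In a general-purpose language the junk record can be avoided by checking each signature before reading further. The Python sketch below illustrates this; the function name and the returned dictionary keys are hypothetical. Unlike the format string above, it also skips the extra field that sits between the file name and the data. It assumes the sizes are present in each file record, which is not the case for archives that store them in a trailing data descriptor.

```python
import struct

# 30-byte fixed part of a file record, little-endian.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")

def read_file_records(data: bytes):
    """Collect file records until the signature stops matching."""
    records = []
    pos = 0
    while pos + LOCAL_HEADER.size <= len(data):
        (signature, min_version, flags, comp_level, time, date, crc,
         size_comp, size_uncomp, name_length, extra_length) = \
            LOCAL_HEADER.unpack_from(data, pos)
        if signature != b"PK\x03\x04":
            break  # directory records or junk: no more file records
        pos += LOCAL_HEADER.size
        name = data[pos:pos + name_length]
        pos += name_length + extra_length   # the extra field is skipped
        payload = data[pos:pos + size_comp]
        pos += size_comp
        records.append({"name": name, "size_comp": size_comp,
                        "size_uncomp": size_uncomp, "crc": crc,
                        "data": payload})
    return records
```

Stopping at the first non-matching signature means the directory records at the end of the archive are never interpreted as file entries.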