Pack/unpack binary data

Data streams

Packs data into a binary string or unpacks it, displaying its contents in a readable way. Supports recursion and more.

This tool takes a number of values (strings, integers, floats, etc.) and stores them in a single binary string. It also does the reverse, unpacking components from a binary string.

The basic syntax of the format string resembles PHP's and Perl's packing functions (and is in fact backward-compatible). However, in addition to their basic features, a number of strong enhancements such as recursive subformats and value-based repetitions were made.

  • input values (when packing) should be put one per line (leading & trailing spaces are ignored but you can use \x20); integers and floats are detected automatically; to force a value to be treated as a string, prefix it with a double quote (e.g. "1), which is then removed. Escape sequences (e.g. \xFF) are processed, so you can use the \x22 code to represent " if you want to keep a string like "1 untouched.
    • different notations can be used for numbers: h, b and o suffixes are recognized, e.g. FFh defines a number that is autoconverted from hex to dec notation (255).
    • you can insert ANSI strings to be autoconverted into hex – wrap them in single (') or double (") quotes. Example: 00 00 'STR' 00.
  • packed string (when unpacking) is a sequence of bytes represented according to the Hex format option below. New lines, spaces and character case are ignored.

Input values (when packing) have the same format as if they were entered using the previous option.

Max size of the input data is 10 MiB.


Wrappers are executed in sequence on a shared data string. For example, to supply a base64-encoded stream of GZIP-compressed data, enter the wrappers base64, gzip and fill the data input with the base64 stream.
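The idea of chained wrappers can be sketched in Python. This is only an illustration, not the tool's implementation: the function name apply_wrappers and the two-entry decoder table are assumptions, and only the base64 and gzip wrappers are modelled.

```python
import base64
import gzip

def apply_wrappers(data: bytes, wrappers: list[str]) -> bytes:
    """Run each named decoder, left to right, on the same data."""
    decoders = {
        "base64": base64.b64decode,   # base64 text -> raw bytes
        "gzip": gzip.decompress,      # GZIP stream -> original bytes
    }
    for name in wrappers:
        data = decoders[name](data)
    return data

# Build a sample the other way round: compress first, then base64-encode.
sample = base64.b64encode(gzip.compress(b"hello"))
print(apply_wrappers(sample, ["base64", "gzip"]))
```

The order matters: the outermost encoding (base64 here) has to be undone first.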

If data is not entered, information from the other tabs (upload, by URL and direct) is used instead of the data on this page. This lets you enter gzip as a wrapper here and upload a file using the upload tab while leaving the data on this page empty.

Currently supported wrappers are: base64, bzip, cslashes, datauri, direct, ftp, ftps, gzip, hex2bin, http, https, qprintable, upload, url, urlencoding, zlib


Pocket names can be of any language, e.g.: aСтрока/cБайт.

Pack/unpack binary data
  1. Quick format characters lookup
  2. Format string
    2.1. Specifying length (repeater)
    2.2. Recursion
    2.3. Relative names
    2.4. Shortcut syntax
  3. Format characters
    3.1. Numbers
    3.2. Strings
      3.2.1. Regular (a A)
      3.2.2. Hex (h H)
      3.2.3. Pascal-style strings
    3.3. Offset (@)
    3.4. Argument/value (R)
    3.5. Others (x X)
  4. Differences from PHP
  5. Sample format strings
    5.1. Windows Bitmap
    5.2. ID3v1
    5.3. ZIP (PKZIP)

i-Tools's pack: spartan but strong.

Packing is a way of converting certain data (which might be textual or binary) into a binary form of specific structure. Conversely, unpacking is a process of extracting those components from a binary stream. 

To perform packing or unpacking the following two components are necessary: 

  1. Format string which explains how exactly the data is to be (for packing) or was (for unpacking) packed. Mainly consists of format characters with possible recursions.
  2. Input values (when packing) or input stream (when unpacking).
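For readers who know Python better than PHP or Perl, the same two ingredients (a format string plus values or a stream) appear in the standard struct module. The sketch below uses struct's own format codes, which are not this tool's format characters.

```python
import struct

# "<" = little-endian, "I" = unsigned 32-bit, "H" = unsigned 16-bit,
# "3s" = a 3-byte string.
FORMAT = "<IH3s"

# Packing: format string + input values -> one binary string.
packed = struct.pack(FORMAT, 1024, 7, b"abc")
print(packed.hex(" "))

# Unpacking: format string + input stream -> the original components.
values = struct.unpack(FORMAT, packed)
print(values)
```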

Quick format characters lookup 

c C s S n v 8- and 16-bit numbers
i I l L N V 32-bit numbers
f d Float/double
a A Regular strings
h H Hexadecimal strings
@ Offset
X Backspace
x NULL byte
R Argument/value manipulation

Format string 

Note that the syntax of this tool's format string is compatible with PHP pack()'s but is significantly extended.

A format string designates how to pack values into, or pull them out of, the data stream. To demonstrate its syntax let's look at an example (spaces are optional and can be omitted):

a         SomeString    *         /                               cSomeNum3 / ...
└ format  └ value name  └ length  └ the beginning of next format  └ next format

Or, in schematic notation: 

format [name] [length] [/ format [name] ...]

In other words, format string consists of components separated by a slash (/); each component is a format character optionally followed by its name and/or length. 

Note: names are case-sensitive. 

When format string is processed values are either appended to the end of current output buffer (when packing) or read from current input stream position (when unpacking). If name was not omitted a variable named accordingly is set to some specific value that depends on the format character used. This happens both when packing and unpacking although the exact value being set differs per-character – refer to corresponding format section for details or just experiment with the tool to find out. 

Sometimes length can also be specified as a name. In this case an expression like Xlength`name might look confusing, so it may be better to think in terms of source and destination, similar to MOV src, dest: Xsrc`dest. This is especially true with the R character (e.g. Rval`copy).

Specifying length (repeater) 

Length (or repeater in PHP terminology) is size of data to be consumed from input stream (when unpacking) or the number of arguments to repeat the format for (when packing). This differs slightly for some format characters (like @ and a A h H) but generally follows the same idea. 

Length can take one of 3 forms:

  1. Fixed number of repetitions specified by a positive number (without sign): V1, a50.
    • the @ format adds more possibilities here.
  2. Greedy length (also greedy repeater), written as an asterisk (*), taking up all available arguments (packing) or all remaining input (unpacking).
    • note that string formats always operate on a single argument thus their length refers to the number of characters to take from that argument-string.
  3. Name of previously read value – in this case it's separated from the name of this value itself by a trailing backtick (`) so it's not mistaken for the value name: VLength/aLength`String – here are 2 values:
    1. V named «Length»;
    2. a named «String» which takes «Length» bytes from the input stream.
    3. If you are not giving a name you can just leave «`» at the end: VLength/aLength`
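The third form, a length taken from a previously read value, corresponds to the common "length prefix" pattern. A minimal Python sketch of what VLength/aLength`String does when unpacking, assuming a 32-bit little-endian length; the function name is hypothetical, and unlike the a format this sketch does not strip trailing NULL bytes:

```python
import struct

def read_length_prefixed(stream: bytes, pos: int = 0) -> tuple[bytes, int]:
    """Read a 32-bit little-endian length, then that many bytes.

    Returns the bytes read and the position right after them.
    """
    (length,) = struct.unpack_from("<I", stream, pos)
    start = pos + 4
    return stream[start:start + length], start + length

data = struct.pack("<I", 5) + b"hello" + b"rest"
text, next_pos = read_length_prefixed(data)
print(text, next_pos)
```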

Recursion 

In addition to the basic syntax explained above, format strings can contain recursive subformats. This is done by placing parentheses (round brackets, ()) around the nested format; it's optional to put a normal format separator symbol (slash, /) before and after them.
Recursions can be nested.

For example (try it out): 

VCount (`Files/ a*Name / VSize / VOffset)Count
└ 1st   └ subformat name                 └ times to repeat
                └ 1st    └ 2nd   └ 3rd

The above example describes a basic file table:

  1. First goes a DWord which specifies how many file records there are;
  2. Then file table starts and repeats for count times (the value of VCount read earlier):
    1. a*Name is null-terminated file name (see regular strings);
    2. VSize is file size (see numeric format characters);
    3. VOffset is file offset in an imaginary data stream.
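Written by hand in Python, reading such a file table might look like the sketch below. It assumes little-endian DWords and that each name ends at the first NULL byte; the function name and the sample data are made up for illustration.

```python
import struct

def read_file_table(stream: bytes) -> list[dict]:
    """Read a DWord count, then that many (name, size, offset) records."""
    (count,) = struct.unpack_from("<I", stream, 0)
    pos = 4
    files = []
    for _ in range(count):
        end = stream.index(b"\x00", pos)      # name runs up to the NULL terminator
        name = stream[pos:end].decode("latin-1")
        size, offset = struct.unpack_from("<II", stream, end + 1)
        pos = end + 1 + 8                     # skip terminator and two DWords
        files.append({"Name": name, "Size": size, "Offset": offset})
    return files

sample = (struct.pack("<I", 2)
          + b"a.txt\x00" + struct.pack("<II", 10, 0)
          + b"b.bin\x00" + struct.pack("<II", 20, 10))
print(read_file_table(sample))
```

The loop plays the role of the subformat in parentheses, and count plays the role of the trailing Count repeater.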

Times to repeat works the same way length is specified for format characters – it can be either a number (digits 0-9), an asterisk (*) or the name of a previously read value (which can be a field read from within the same subformat – in this case its first value is stored and used as a counter, so even if it's overridden (read over) later the new value won't be taken into account).

Subformats can be named or unnamed. To name a subformat put a backtick (`) immediately after the opening bracket, then write the desired name and close it off with a slash (/). As you can see, the example above uses a named subformat.

When you name a subformat you create a namespace for it meaning that names inside will be relative to that subformat. When subformat is unnamed it uses its parent namespace (or global namespace if none).
As a consequence, values inside an unnamed repeating subformat override values previously read from within itself or its parent because there was no namespace created for it. 

For example, assuming the above sample format string and 3 files in a data stream the following values will be defined: 

  1. Files#1:name, Files#1:size and Files#1:offset – 3 fields where actual field name goes after a colon.
  2. Files#2:name and so on – for the second set of fields.
  3. Files#3:name and so on – for the third file record.

However, if `Files/ part is removed from the subformat string turning it into an unnamed subformat only the last set of fields named just name, size and offset will be defined. 

Note: the repeater of a subformat is relative to its parent, not to the subformat it repeats: cCount(ccc)Count (demo) is correct while (`Sub/cCount/c3)Count is not (demo) – there's no «Count» value read within the global namespace. It could be written as (`Sub/cCount/c3)Sub#1:Count instead (demo) or just (cCount/c3)Count because unnamed subformats write values to their parent format (in this case there's none so the global namespace is used) – demo.

Relative names 

When referring to a value, the given name is expanded to form an absolute name. This happens transparently in format strings not using recursions. However, in more complex cases it's good to be aware of relative names.

Any name can start with either of two characters: a dot (.) or a backslash (\).

Prefix character (dot or backslash) can be repeated to go further up or down the stack. Different prefixes cannot be combined in one name: \\var is correct but ..\var is not. 

For example, if we take the following format string: 

(`f1/ (`f2/ (`f3/ c/f3Var) c/f2Var) c/f1Var) cGlobal

…then from within the f3 subformat the following names properly resolve to the corresponding values: f3Var, .f2Var, ..f1Var, ...Global, \Global, \\f1Var, \\\f2Var, \\\\f3Var.
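The resolution rule can be inferred from this example: each leading dot steps one namespace up from the current one, while backslashes start at the global namespace and step down. The Python sketch below encodes exactly that inference; the function name and the (namespace path, name) return shape are assumptions, not the tool's internals.

```python
def resolve(name: str, stack: list[str]) -> tuple[list[str], str]:
    """Return (namespace path, bare name) for a possibly prefixed name.

    `stack` lists the enclosing named subformats, outermost first.
    """
    if name.startswith("."):
        bare = name.lstrip(".")
        ups = len(name) - len(bare)            # one dot = one level up
        return stack[:max(len(stack) - ups, 0)], bare
    if name.startswith("\\"):
        bare = name.lstrip("\\")
        downs = len(name) - len(bare) - 1      # a single backslash = global
        return stack[:downs], bare
    return list(stack), name                   # no prefix: current namespace

stack = ["f1", "f2", "f3"]                     # we are inside f3
for example in ["f3Var", ".f2Var", "..f1Var", "...Global",
                "\\Global", "\\\\f1Var", "\\\\\\f2Var", "\\\\\\\\f3Var"]:
    print(example, "->", resolve(example, stack))
```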

The above rules apply to any place where a value is referred to.

Shortcut syntax 

A slash symbol (/) separates format fields from each other (see the syntax). However, it's possible to use a more concise notation if you don't want to name anything.

Shortcut syntax is enabled for format strings consisting of valid shortcut characters only (which are 0-9, * and format characters; /, @ and other symbols are not). Each value in such a string will be unnamed. This allows shorter notations similar to that of PHP's pack(), so VccA*s2 turns into V/c/c/A*/s2.

Examples: 

V/C*
a DWord followed by a greedy byte (digits instead of asterisk would be also fine here)
VC*
the same as above
VC*/
one value: dword named «C*» (format string contains "/" albeit there's nothing after it)
Vz
one value: dword named «z» (since there's no format char «z» it's always a value name)
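The examples above can be reproduced with a small Python sketch. It assumes the set of shortcut-eligible format characters is the quick lookup list minus @ (whether R qualifies is a guess), and the function name is hypothetical.

```python
import re

# Assumed set of format characters allowed in shortcut notation.
FORMAT_CHARS = "aAhHcCsSnviIlLNVfdxXR"
_TOKEN = re.compile(rf"[{FORMAT_CHARS}](?:\d+|\*)?")
_SHORTCUT = re.compile(rf"(?:[{FORMAT_CHARS}](?:\d+|\*)?)+")

def expand_shortcut(fmt: str) -> str:
    """Insert slashes into a shortcut string; leave anything else untouched."""
    if not _SHORTCUT.fullmatch(fmt):
        return fmt                      # contains a name, "/" or another symbol
    return "/".join(_TOKEN.findall(fmt))

for fmt in ["VccA*s2", "VC*", "VC*/", "Vz"]:
    print(fmt, "->", expand_shortcut(fmt))
```

Strings such as VC*/ and Vz fail the whole-string check and are returned unchanged, which matches their reading as a single named value.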

Because shortcut is considered for a separate format it also works for recursions since it's essentially a set of formats. The following two expressions are the same: 

Format characters 

Although most format characters are the same as for PHP's pack() function, some of them behave differently.

Numbers 

There are 12 integer formats and 2 floating-point formats, which are as follows (corresponding to pack(); for entries marked with * see also differences from PHP):

Format  Length (bytes)  Sign      Endianness  C/PHP-style name
c       1               signed    -           signed char
C       1               unsigned  -           unsigned char
s       2               signed    LE*         signed short
S       2               unsigned  LE*         unsigned short
n       2               unsigned  BE          unsigned short
v       2               unsigned  LE          unsigned short
i       4*              signed    LE*         signed integer
I       4*              unsigned  LE*         unsigned integer
l       4               signed    LE*         signed long
L       4               unsigned  LE*         unsigned long
N       4               unsigned  BE          unsigned long
V       4               unsigned  LE          unsigned long
f       4*              signed    *           float
d       8*              signed    *           double
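As a cross-check, the table can be mapped onto Python's struct codes. The mapping below is a sketch: entries that the table marks with * are assumed to be little-endian with the listed sizes, and STRUCT_CODES is a name introduced here for illustration.

```python
import struct

# Tool format character -> assumed equivalent struct code.
STRUCT_CODES = {
    "c": "<b", "C": "<B",
    "s": "<h", "S": "<H", "n": ">H", "v": "<H",
    "i": "<i", "I": "<I", "l": "<l", "L": "<L", "N": ">L", "V": "<L",
    "f": "<f", "d": "<d",
}

for char, code in STRUCT_CODES.items():
    print(char, struct.calcsize(code), "byte(s)")

# The same 16-bit value in both byte orders:
print(struct.pack(STRUCT_CODES["n"], 0x1234).hex())  # big-endian
print(struct.pack(STRUCT_CODES["v"], 0x1234).hex())  # little-endian
```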

Strings 

There are 4 string formats. 

Tip: although, as explained later in this section, string formats operate on a single value you can read a series of strings using recursion: (`String/a*Substr)3 (demo). 

Tip: when packing a named regular string the length of packed string is stored in that value (excluding the terminator character for a & A). This allows packing of Pascal-style strings – more info here. 

Regular (a A) 

a (NULL-padded string): a sequence of bytes (1 byte per character) terminated with one or more zero-characters. When packing, one terminator is appended, when unpacking all trailing terminators are removed.
Unlike PHP's pack(), the terminator is written into the output stream after the string itself. If you want to remove it, use the X format character.

This format treats length following its character like this: 

A (space-padded string) is identical to a but terminator character is space (0x20) instead of NULL (0x00). 

Hex (h H) 

H (hex string, high nibble first)

h (hex string, low nibble first) is identical to H but swaps 4 bits on each byte's left and right. Compare: 
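The difference between the two nibble orders can be sketched in Python. The function below follows the behaviour of PHP's H and h codes as an assumption about this tool; padding an odd trailing digit with 0 is likewise assumed, and pack_hex is a hypothetical name.

```python
def pack_hex(digits: str, high_first: bool = True) -> bytes:
    """Pack pairs of hex digits into bytes.

    high_first=True  models H: the first digit of a pair is the high nibble.
    high_first=False models h: the first digit of a pair is the low nibble.
    """
    if len(digits) % 2:
        digits += "0"                       # pad an odd trailing digit
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    if not high_first:
        pairs = [pair[::-1] for pair in pairs]   # swap the two nibbles
    return bytes(int(pair, 16) for pair in pairs)

print(pack_hex("1f2e").hex())                    # H-style
print(pack_hex("1f2e", high_first=False).hex())  # h-style
```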

Pascal-style strings 

Two regular string formats, a & A, write C-style strings meaning that each string doesn't have a length field – its bytes just go in sequence and in the end there's either a NULL byte (a) or a space (A).
In contrast, Pascal strings have a length field (typically 1 byte for ANSI and 2 bytes for Unicode strings) which precedes the string's data.

There's no direct way of operating on Pascal strings but there's a workaround involving the argument manipulator metacharacter (R).

To pack a Pascal string the following format string can be used (try it): 

R/a*`length/X/Xlength/Rlength`/c/a*/X
  1. First, next string argument to be packed is copied (R).
  2. Then it's packed and its length (not including trailing NULL character) is stored in value «length» (a*`length).
  3. After that it's removed from the output stream by using backspace format character – at first string's trailing terminator is cut off (X) and then its character bytes are removed (Xlength).
  4. String length stored in «length» is pushed in front of other arguments (Rlength`).
  5. Finally, the string whose argument was copied in step #1 is written again, this time preceded by its length (c/a*).
  6. As the final step newly written string's terminator is removed (X) because Pascal string doesn't use one.

On the other hand, unpacking is more straightforward (try it): 

Clength/alength`
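For comparison, here is the end result of both format strings expressed directly in Python, assuming a 1-byte length field; the function names are made up for this sketch.

```python
def pack_pascal(text: bytes) -> bytes:
    """Write a 1-byte length followed by the string bytes, no terminator."""
    if len(text) > 255:
        raise ValueError("a 1-byte length field holds at most 255")
    return bytes([len(text)]) + text

def unpack_pascal(stream: bytes, pos: int = 0) -> tuple[bytes, int]:
    """Read the length byte, then that many bytes; return (string, next position)."""
    length = stream[pos]
    start = pos + 1
    return stream[start:start + length], start + length

print(pack_pascal(b"hello"))
print(unpack_pascal(b"\x05helloXYZ"))
```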

Offset (@) 

@ character's role is twofold. 

When packing it pads the output with NULL bytes until it reaches the desired size (the passed length). Additionally (this is an extension to PHP's pack), if the length passed starts with a plus (+) symbol, exactly that many NULL bytes are appended to the output stream.
If the offset is omitted it's set to 1 (which makes the stream at least 1 byte in size) – this is also useful for storing the current output stream length in a value (like @`current_pos).

Examples: @100 (ensures output is at least 100 bytes long), @+4 (inserts 4 NULL bytes into the output). 
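The two packing behaviours differ only in whether the current size is taken into account. A Python sketch on a bytearray, with hypothetical helper names:

```python
def pad_to(output: bytearray, size: int) -> None:
    """Like @size: add NULL bytes only until the output is `size` bytes long."""
    if len(output) < size:
        output.extend(b"\x00" * (size - len(output)))

def append_nulls(output: bytearray, count: int) -> None:
    """Like @+count: always append exactly `count` NULL bytes."""
    output.extend(b"\x00" * count)

buf = bytearray(b"abc")
pad_to(buf, 5)         # grows from 3 to 5 bytes
pad_to(buf, 2)         # already long enough, nothing happens
append_nulls(buf, 4)   # grows by 4 regardless of current size
print(len(buf), bytes(buf))
```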

When unpacking its length argument can have several different forms: 

  1. Empty (omitted) – @ saves current stream position to a value if @ is named; does nothing otherwise. Example: @`current.
  2. Starting with a plus (+) or minus (-) sign – the current pointer is moved forward or backward by the given length so that later reads occur at that point. Remember that once the current position is moved beyond the stream's end, unpacking stops.
    • example: Vdword/@-4/Vits_copy – reads a value, moves 4 bytes backwards and rereads it to a value with different name.
  3. Asterisk (*) – seeks to the last byte of the input stream: @*/cLast_byte. See also ID3v1 tag reading.
  4. Fixed number – seeks to an absolute position: @1024.
  5. Other (non-numeric) – the same as fixed number but the new position is read from variable with given name: @offset.
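The five forms can be summarised as one function that computes the new read position. This is a sketch under stated assumptions: positions are 0-based, only decimal numbers are handled, and the function name and the values dictionary are hypothetical.

```python
def seek(spec: str, pos: int, stream_len: int, values: dict[str, int]) -> int:
    """Return the new position for the length part of an @ format."""
    if spec == "":
        return pos                    # form 1: position is unchanged
    if spec[0] in "+-":
        return pos + int(spec)        # form 2: relative move
    if spec == "*":
        return stream_len - 1         # form 3: last byte of the stream
    if spec.isdigit():
        return int(spec)              # form 4: absolute position
    return values[spec]               # form 5: position stored in a value

print(seek("-4", 4, 100, {}))                   # back to the start of a DWord
print(seek("*", 0, 100, {}))
print(seek("offset", 0, 100, {"offset": 54}))
```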

Additionally, if you're specifying a number as offset you can specify it in 4 different notations: 

@255
decimal
@FFh
hex
@377o
octal
@11111111b
binary
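Parsing these suffix notations is straightforward; a minimal Python sketch (the function name is an assumption):

```python
def parse_number(text: str) -> int:
    """Parse decimal, or a number with an h (hex), o (octal) or b (binary) suffix."""
    bases = {"h": 16, "o": 8, "b": 2}
    suffix = text[-1].lower()
    if suffix in bases:
        return int(text[:-1], bases[suffix])
    return int(text)

for text in ["255", "FFh", "377o", "11111111b"]:
    print(text, "->", parse_number(text))
```

All four inputs denote the same value, 255.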

Keeping the above information in mind, let's look at the 4 possible forms of the @ format character:

@ [offset|value name`] [value to store new position in]
  1. @ – when packing ensures output isn't empty; when unpacking shows current position.
  2. @name or @377o:
    • when packing, ensures the stream is at least name or 255 bytes long;
    • when unpacking seeks (without saving new position) to value name or 256th byte (@0 seeks to the 1st byte).
  3. @`cur – stores current stream size (packing) or position (unpacking) into value cur.
  4. @+4`new_size:
    • when packing, enlarges output by 4 NULL bytes and stores its new length to new_size;
    • when unpacking, seeks 4 bytes forward (might be after the stream's end) and stores new position to new_size.

Argument/value (R)

The R format character doesn't directly affect the input or output streams and is more of a metacharacter. It's used to manipulate arguments and values. Its behaviour is the same both when packing and unpacking.

It's best to explain how this character works by looking at its usages; let's assume that the input arguments are 3 numbers: 1, 2, 3.

R or R1
Duplicates the next input argument; the new arguments would be 1, 1, 2, 3.
R3
Copies 3rd argument in front of the first one: 3 1 2 3. Similarly, R2 would result in 2 1 2 3.
  • not existing arguments won't be copied: R4 does nothing.
R-1
Removes next (first) argument as if it was consumed by a normal format character. Result: 2 3. R-2 removes the second argument only: 1 3.
  • as in previous case, R-4 and below do nothing.
  • tip: you can remove the next four arguments by using recursion: (R-1)4; the same trick allows removing 4 arguments behind the current one: (R-2)4 and so on.
R`name or R1`name
Copy next (first) argument into value «name». Arguments that don't exist set the destination to an empty string.
Rfrom_val`to_val
Copies one value to another. Undeclared values are empty strings.
Rname` or Rname`1
Copy value «name» in front of the first input argument: value_of_name 1 2 3.
Rname`3
Puts value «name» in front of argument #3: 1 2 value_of_name 3.
  • if argument number points behind the last argument the value is inserted after all of them.
R3`2
Copies argument #3 before argument #2: 1 3 2 3.
R-3`2
Moves argument #3 before argument #2 (by copying it first and then removing): 1 3 2.
R-3`name
Moves argument #3 into value «name»: 1 2 and «name» = «3».
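The argument-only variants above behave like list operations. The Python sketch below models copying, removing and moving with 1-based argument numbers; it covers only the forms whose results are listed above, and the function names are hypothetical.

```python
def copy_arg(args: list, src: int, dest: int = 1) -> list:
    """Like R<src>`<dest>: copy argument #src in front of argument #dest."""
    if not 1 <= src <= len(args):
        return list(args)                 # non-existent arguments are not copied
    result = list(args)
    result.insert(dest - 1, args[src - 1])
    return result

def remove_arg(args: list, index: int) -> list:
    """Like R-<index>: drop argument #index."""
    if not 1 <= index <= len(args):
        return list(args)
    return args[:index - 1] + args[index:]

def move_arg(args: list, src: int, dest: int) -> list:
    """Like R-<src>`<dest>: copy #src before #dest, then remove the original."""
    if not 1 <= src <= len(args):
        return list(args)
    result = copy_arg(args, src, dest)
    # After the insert the original sits one slot later if dest <= src.
    original = src + 1 if dest <= src else src
    return remove_arg(result, original)

args = [1, 2, 3]
print(copy_arg(args, 1))      # R
print(copy_arg(args, 3))      # R3
print(remove_arg(args, 2))    # R-2
print(copy_arg(args, 3, 2))   # R3`2
print(move_arg(args, 3, 2))   # R-3`2
```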

For more examples see Pascal string packing.

Others (x X) 

x: repeats NULL byte (0x00) given number of times (length) and appends it to the output (when packing) or, if named, sets corresponding value to it (when unpacking). 

Note: when packing x* and x0 do nothing. 

X (backspace): 

Differences from PHP 

This section describes the differences between the syntax of format strings used by this tool and standard PHP pack() and unpack() functions. 

Sample format strings 

Windows Bitmap 

Headers (BITMAPFILEHEADER and BITMAPINFOHEADER) of a Windows bitmap (.bmp) file can be both packed and unpacked using the following format string (try out): 

C2signature / Vfile_size / S2reserved / Vcolor_data_offset / Vinfohdr_size / lwidth / lheight / Splanes / Sbit_count / Vcompression / Vimage_size / lx_pels_per_meter / ly_pels_per_meter / Vused_color_count / Vimportant_color_count

Or, without names, just this: C2VS2VVllSSVVllVV (compare). 
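The same 54-byte header layout can be read with Python's struct module. The sketch follows the unnamed form C2VS2VVllSSVVllVV; because C2 and S2 each yield two values, the field names signature1/signature2 and reserved1/reserved2 are introduced here for illustration only.

```python
import struct

# C2 V S2 V | V l l S S V V l l V V, all little-endian.
BMP_HEADER = struct.Struct("<2B I 2H I I i i H H I I i i I I")

FIELDS = ("signature1", "signature2", "file_size", "reserved1", "reserved2",
          "color_data_offset", "infohdr_size", "width", "height", "planes",
          "bit_count", "compression", "image_size", "x_pels_per_meter",
          "y_pels_per_meter", "used_color_count", "important_color_count")

def read_bmp_header(data: bytes) -> dict:
    """Unpack both headers from the start of a .bmp file into a dict."""
    return dict(zip(FIELDS, BMP_HEADER.unpack_from(data)))

# A made-up header for a 1x1, 24-bit image ("BM" signature = 0x42, 0x4D).
sample = BMP_HEADER.pack(0x42, 0x4D, 58, 0, 0, 54, 40, 1, 1, 1, 24,
                         0, 4, 2835, 2835, 0, 0)
print(BMP_HEADER.size, read_bmp_header(sample)["color_data_offset"])
```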

For simple uncompressed RGB bitmaps you can also display all of its color data by using recursive subformats – append the following pattern to the above (try out): 

(`colors/Cred/Cgreen/Cblue)*

By using @-offset with a named argument you can skip most of the preamble and go straight to the color data (try out):

@Ah / Vcolor_data_offset / @color_data_offset / (`colors/ Cred / Cgreen / Cblue)*

ID3v1 

ID3-tagging is a well-known way of adding information to music tracks (mainly MP3). ID3v1 is the first version of the standard which is very simple to implement (both reading and writing) but is very limited as well. ID3v2 is much more extensible than ID3v1 but is more complex to deal with. For this demonstration we'll read ID3v1 (or, more specifically, ID3v1.1) tags here. 

ID3v1's technical side is described here. In short, a structure 128 bytes in size containing 7 fields is written to the end of the (audio) file. In v1.1 there are 8 fields.

You can unpack an ID3v1.1 information from a file by using this format string (try out): 

@* / @-127 / a3TAG / a30title / a30artist / a30album / a4year / a29comment / Ctrack_id / Cgenre

As explained here, @* construct seeks before the last byte in the file, then @-127 moves the pointer 127 bytes backwards (thus to the 128th byte from file end) where the ID3v1.1 structure is being read from. 
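An equivalent reader in Python takes the last 128 bytes of the file and splits them with the same field widths. This is a sketch: the function name is hypothetical, text is decoded as Latin-1 by assumption, and trailing NULL bytes are stripped to mimic the a format.

```python
import struct

# a3 a30 a30 a30 a4 a29 C C = 128 bytes.
ID3V1 = struct.Struct("<3s30s30s30s4s29sBB")

def read_id3v1(data: bytes):
    """Return the ID3v1.1 fields from the end of `data`, or None if absent."""
    if len(data) < ID3V1.size:
        return None
    tag, title, artist, album, year, comment, track_id, genre = \
        ID3V1.unpack(data[-ID3V1.size:])
    if tag != b"TAG":
        return None

    def text(raw: bytes) -> str:
        return raw.rstrip(b"\x00").decode("latin-1")

    return {"title": text(title), "artist": text(artist), "album": text(album),
            "year": text(year), "comment": text(comment),
            "track_id": track_id, "genre": genre}
```

Checking the «TAG» marker guards against files that carry no ID3v1 tag at all.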

ZIP (PKZIP) 

ZIP files are archives representing a basic file system. Their main parts are file records and directory records, with directory records following file records. There might also be other record types (like the ending record).

In this demo we'll see how it's possible to read its list of files (file records). A ZIP file doesn't have a specific signature and starts right with the first file record which has a signature (2 symbols «PK» and bytes 03 04 following it), file name length, compressed and uncompressed data size and other fields (more info on Wikipedia). After a file record its compressed data follows, then next record follows and so on until the end of file. 

Directory entries usually follow all file records but they are less interesting in our demo so we won't consider them. 

Below is a format string that can be used to unpack file records (try it out): 

(`files/a4signature / vmin_version / vflags / vcomp_level / vtime/vdate / Vcrc / Vsize_comp / Vsize_uncomp / vname_length / vextra_length / aname_length`name / asize_comp`data)*

Note: since there is no field in the ZIP specification that tells how many file records there are, it's impossible to stop reading them during the unpacking process. In the above sample a greedy repeater (*) is used to read as many records as possible, and because of this the last record will most likely contain junk, meaning that there are no more file entries.
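In a general-purpose language the junk record can be avoided by checking each signature before reading on. The Python sketch below does that; it is an illustration with a hypothetical function name, it also skips the extra field (extra_length bytes), which the format string above does not, and it does not handle archives that store sizes in a trailing data descriptor.

```python
import struct

# a4 v v v v v V V V v v = 30 bytes, little-endian.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
SIGNATURE = b"PK\x03\x04"

def read_file_records(data: bytes) -> list[dict]:
    """Read consecutive file records until the signature stops matching."""
    records, pos = [], 0
    while pos + LOCAL_HEADER.size <= len(data):
        (signature, min_version, flags, comp_level, time, date, crc,
         size_comp, size_uncomp, name_length, extra_length) = \
            LOCAL_HEADER.unpack_from(data, pos)
        if signature != SIGNATURE:
            break                              # directory or other record: stop
        pos += LOCAL_HEADER.size
        name = data[pos:pos + name_length]
        pos += name_length + extra_length      # skip the extra field
        records.append({"name": name, "size_comp": size_comp,
                        "size_uncomp": size_uncomp,
                        "data": data[pos:pos + size_comp]})
        pos += size_comp
    return records
```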