Scratch Project File Format
Copyright (c) 2007 Massachusetts Institute of Technology
John Maloney
August, 2007

1. Overview

Scratch projects use a binary object serialization format called an "object store" that records an arbitrary network of objects with interconnecting pointers. An object store typically includes both fixed-format objects (e.g. strings) and "user-class" objects, whose formats can evolve over time (e.g. sprites). User-class objects include a version number that allows later versions of the software to read objects in older formats. This design has allowed Scratch to evolve gracefully over the course of many years.

The result of reading an object store is an array of objects called an object table. The first entry in this table is the root object of the network of objects that was stored.

2. Project File Structure

A Scratch project file has the following top-level structure:

    header            10 bytes          the ASCII string "ScratchV01"
    infoSize          4 bytes           32-bit big-endian integer
    infoObjects       infoSize bytes    object store for info (author, notes, thumbnail, etc.)
    contentsObjects   remaining bytes   object store for contents, including the stage, sprites, and media

3. Info Object Store

The info object store contains information about the project such as the project author, notes, and a thumbnail image of the project. This information is kept separate from the project contents so that a project preview can be displayed quickly and easily in the open dialog.

The first object table entry is a Dictionary, an alternating sequence of keys (strings) and values. Keys currently in use include:

    "thumbnail"         image showing a small picture of the stage when the project was saved
    "author"            name of the user who saved or shared this project
    "comment"           author's comments about the project
    "history"           a string containing the project save/upload history
    "scratch-version"   the version of Scratch that saved the project

This set of keys has changed over time.
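A minimal Python sketch of the first steps a reader might take: splitting a project file into its two object stores according to the layout in section 2, and turning the decoded info Dictionary into a Python dict. The function names `split_project_file` and `info_dict_from_pairs` are hypothetical, and the sketch assumes the info object store has already been decoded into a flat, alternating key/value list; decoding the object stores themselves is the subject of section 5.

```python
import struct

PROJECT_HEADER = b"ScratchV01"  # section 2: 10-byte ASCII header


def split_project_file(data: bytes) -> tuple[bytes, bytes]:
    """Return (info object store bytes, contents object store bytes)."""
    if data[:10] != PROJECT_HEADER:
        raise ValueError("not a Scratch project file (bad header)")
    # infoSize: 32-bit big-endian unsigned integer right after the header.
    (info_size,) = struct.unpack_from(">I", data, 10)
    start = 14  # 10-byte header + 4-byte infoSize
    end = start + info_size
    if end > len(data):
        raise ValueError("infoSize runs past the end of the file")
    # The contents object store occupies all remaining bytes.
    return data[start:end], data[end:]


def info_dict_from_pairs(items: list) -> dict:
    """Hypothetical helper: turn an alternating [key, value, key, value, ...]
    sequence (the already-decoded info Dictionary) into a Python dict."""
    if len(items) % 2 != 0:
        raise ValueError("expected an even number of items")
    return dict(zip(items[0::2], items[1::2]))
```

Since the key set varies between Scratch versions, looking keys up with `info.get("author")` rather than `info["author"]` avoids failing on projects that lack a particular key.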
Older projects may contain keys not listed here.

4. Contents Object Store

The contents object store contains the stage, sprites, sounds, and images in the project. The first object table entry is the stage object. The stage object includes the stage backgrounds, sounds, and scripts, plus a list of the objects on the stage (in the "submorphs" field), including sprites and variable watchers. The types of objects that can appear on the stage have evolved over time.

5. Object Store Format

Note: For additional details, see the ObjStream class in the Smalltalk code or the ObjReader class in the Java code.

An object store holds a table of serialized objects. The first object in this table is the "root" object that was serialized; the rest of the objects in the table are objects reachable via pointers from this root object. Inter-object references are stored as indices into this table. These indices are mapped to actual object references when the structure is deserialized. (Note: Unlike C or Java arrays, the first index in this table is 1, not 0.)

Objects are stored as a sequence of bytes in one of these formats:

    a. immediate values: the constants nil, true, false, integers, and floats
    b. fixed-format objects, whose serialization format does not change (e.g. strings or arrays)
    c. user-class objects, whose formats may change over time
    d. object references

Every serialized value begins with a one-byte classID that determines its format.

5.1 Object store header

Each object store starts with the ten-byte sequence:

    79, 98, 106, 83, 1, 83, 116, 99, 104, 1

This corresponds to the string "ObjS", the byte 1, the string "Stch", and the byte 1. As the example in Appendix 3 shows, the header is followed by the number of objects in the object table, stored as a 4-byte big-endian integer.

5.2 Immediate values

Immediate values are encoded in-line; they do not appear in the object table. Immediate values include nil, booleans, integers, large integers, and floats.

5.3 Fixed-format objects

Fixed-format objects have stable storage formats, so they do not need a version number.
These objects are stored in the format:

    <classID: 1 byte> <...data...>

In some cases, the data is of fixed size (e.g. a float is always 8 bytes). In other cases, the object's representation includes a field count (e.g. a string or array). The data of a fixed-format object may include immediate values (e.g. integers) or references to other objects in the object table (section 5.5).

Example: The string "cat" is encoded as eight bytes: 9, 0, 0, 0, 3, 99, 97, 116. The first byte is the String classID (9), the next four bytes hold the length (3) as a 32-bit big-endian integer, and the last three bytes are the character codes of "c", "a", and "t".

5.4 User-class objects

User-class objects have representations that may evolve over time. These objects are stored in the format:

    <classID: 1 byte> <version: 1 byte> <fieldCount: 1 byte> <...field objects...>

The fields of a user-class object may include immediate values (e.g. integers) or references to other objects in the object table (section 5.5).

5.5 Object References

An object reference allows a field in one object to contain a pointer to another object. It has the following format:

    <99: 1 byte constant> <object table index: 3 bytes, big-endian>

The value 99 is a reserved classID value used to indicate an object reference. The first object table index is 1, unlike C or Java arrays where the first entry is at index 0.

Example: An object reference to the second entry in the object table is encoded as four bytes: 99, 0, 0, 2.

5.6 Reading an Object Table

Reading an object table is usually done in several passes. The first pass builds the object table, creating an entry and the resulting object for each serialized object. During this first pass, any fields in the resulting object that refer to other objects are recorded but not resolved, since they may be forward references to objects that have not yet been created.

In the Java code, a second pass converts images and sounds into the equivalent Java media objects. Another pass over the object table then replaces the object references in the fields of fixed-format objects and in the field lists of user-class objects with the objects they refer to.
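The reference-resolution pass can be sketched in Python as follows. This is not the Smalltalk or Java implementation: `Ref`, `UserObject`, and `resolve_references` are hypothetical names, and the sketch assumes the first pass has produced a table in which arrays are Python lists, user-class objects carry a `fields` list, and each unresolved reference is a `Ref` holding its 1-based table index. Other containers (e.g. Dictionary) are not handled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Unresolved object reference recorded during the first pass."""
    index: int  # 1-based object table index


@dataclass
class UserObject:
    """Hypothetical in-memory form of a user-class object."""
    class_id: int
    version: int
    fields: list


def resolve_references(table: list) -> None:
    """Replace every Ref in array or user-class field lists, in place.

    Because every entry already exists after the first pass, forward
    references (to entries later in the table) resolve exactly like
    backward ones, and cycles are fine.
    """
    for entry in table:
        fields = entry.fields if isinstance(entry, UserObject) else entry
        if not isinstance(fields, list):
            continue  # strings, numbers, etc. hold no references
        for i, value in enumerate(fields):
            if isinstance(value, Ref):
                fields[i] = table[value.index - 1]  # 1-based -> 0-based
```

Resolving in place means two fields that refer to the same table entry end up pointing at the same Python object, which preserves the shared structure of the original network.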
Finally, the client code scans the object table to extract the stage and sprites, along with their costumes, sounds, and scripts. (In the Java player, this last step is done by code written in Logo.)

Appendix 1: Fixed-format Class IDs

Here is the current list of fixed-format class IDs. Please refer to the Smalltalk class ObjStream or the Java class ObjReader for the data formats of these objects.

    1       nil
    2       True
    3       False
    4       SmallInteger
    5       SmallInteger16
    6       LargePositiveInteger
    7       LargeNegativeInteger
    8       Float
    9       String
    10      Symbol
    11      ByteArray
    12      SoundBuffer
    13      Bitmap
    14-19   reserved
    20      Array
    21      OrderedCollection
    22      Set
    23      IdentitySet
    24      Dictionary
    25      IdentityDictionary
    26-29   reserved
    30      Color
    31      TranslucentColor
    32      Point
    33      Rectangle
    34      Form
    35      ColorForm
    36-98   reserved

Appendix 2: User-class IDs

Here is the current list of user-class IDs. User-class IDs are in the range 100..255. Some user-class IDs below 175 that are not listed here were used in older versions of Scratch, and many of the IDs listed here are not currently in use. Common classes in current use are marked with an asterisk (*). All user-class IDs not listed here are reserved for future use. Please refer to the Smalltalk or Java code for the formats of these objects.
    100   Morph*
    101   BorderedMorph
    102   RectangleMorph
    103   EllipseMorph
    104   AlignmentMorph*
    105   StringMorph*
    106   UpdatingStringMorph*
    107   SimpleSliderMorph
    108   SimpleButtonMorph
    109   SampledSound*
    110   ImageMorph*
    111   SketchMorph
    123   SensorBoardMorph*
    124   ScratchSpriteMorph*
    125   ScratchStageMorph*
    140   ChoiceArgMorph
    141   ColorArgMorph
    142   ExpressionArgMorph
    145   SpriteArgMorph
    147   BlockMorph
    148   CommandBlockMorph
    149   CBlockMorph
    151   HatBlockMorph
    153   ScratchScriptsMorph*
    154   ScratchSliderMorph
    155   WatcherMorph*
    157   SetterBlockMorph
    158   EventHatMorph
    160   VariableBlockMorph
    162   ImageMedia*
    163   MovieMedia
    164   SoundMedia*
    165   KeyEventHatMorph
    166   BooleanArgMorph
    167   EventTitleMorph
    168   MouseClickEventHatMorph
    169   ExpressionArgMorphWithMenu
    170   ReporterBlockMorph
    171   MultilineStringMorph*
    172   ToggleButton
    173   WatcherReadoutFrameMorph*
    174   WatcherSliderMorph*

Appendix 3: Object store example

Here is an annotated example of an object store. The top-level object is a SampledSound object. This object contains references to an empty array of envelopes and an empty sound buffer.

    79 98 106 83 1 83 116 99 104 1   ; object store header
    0 0 0 3                          ; object table size, 3 objects
    109 1 8                          ; [1] a user-class SampledSound object, version 1, 8 fields
      99 0 0 2                       ;   envelopes: ref to object table entry 2
      4 0 0 128 0                    ;   scaledVol: 32768
      5 0 0                          ;   initialCount: 0
      99 0 0 3                       ;   samples: ref to object table entry 3
      5 86 34                        ;   samplingRate: 22050
      5 0 0                          ;   samplesSize: 0
      4 0 1 0 0                      ;   scaledIncrement: 65536
      1                              ;   scaledInitialIndex: nil
    20 0 0 0 0                       ; [2] a zero-length array
    12 0 0 0 0                       ; [3] a zero-length sound buffer
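To check the byte layouts against this example, here is a minimal Python sketch of a first-pass reader that decodes it. It covers only the class IDs that appear above plus String (9), and it leaves object references as `Ref` placeholders rather than resolving them (section 5.6). The names `StoreReader`, `read_object_store`, `Ref`, and `UserObject` are hypothetical, and several details are assumptions not stated in this document: that SmallInteger (4) and SmallInteger16 (5) are signed big-endian integers, that an Array's 4-byte count is followed by that many fields, and that string bytes are Latin-1. A non-empty SoundBuffer is rejected because its sample layout is not described here.

```python
from dataclasses import dataclass

# Section 5.1: "ObjS", 1, "Stch", 1.
OBJ_STORE_HEADER = bytes([79, 98, 106, 83, 1, 83, 116, 99, 104, 1])


@dataclass
class Ref:
    index: int  # 1-based object table index (section 5.5)


@dataclass
class UserObject:
    class_id: int
    version: int
    fields: list


class StoreReader:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError("unexpected end of object store")
        self.pos += n
        return chunk

    def uint(self, n: int) -> int:
        return int.from_bytes(self.take(n), "big")

    def read_field(self):
        """Read an inline field: an immediate value or an object reference."""
        cid = self.uint(1)
        if cid == 1:
            return None
        if cid in (2, 3):
            return cid == 2  # 2 = true, 3 = false
        if cid == 4:  # SmallInteger: 4 bytes (signedness assumed)
            return int.from_bytes(self.take(4), "big", signed=True)
        if cid == 5:  # SmallInteger16: 2 bytes (signedness assumed)
            return int.from_bytes(self.take(2), "big", signed=True)
        if cid == 99:  # object reference: 3-byte table index
            return Ref(self.uint(3))
        raise NotImplementedError(f"field classID {cid} not handled in this sketch")

    def read_entry(self):
        """Read one object table entry (fixed-format or user-class object)."""
        cid = self.uint(1)
        if cid == 9:  # String: 4-byte length, then characters (Latin-1 assumed)
            return self.take(self.uint(4)).decode("latin-1")
        if cid == 20:  # Array: 4-byte count, then that many fields (assumed)
            return [self.read_field() for _ in range(self.uint(4))]
        if cid == 12:  # SoundBuffer: only the empty case is handled
            if self.uint(4) != 0:
                raise NotImplementedError("non-empty SoundBuffer")
            return bytearray()  # placeholder for an empty sound buffer
        if 100 <= cid <= 255:  # user-class: version, field count, fields
            version, count = self.uint(1), self.uint(1)
            return UserObject(cid, version, [self.read_field() for _ in range(count)])
        raise NotImplementedError(f"classID {cid} not handled in this sketch")


def read_object_store(data: bytes) -> list:
    """First pass only: build the object table, leaving Ref placeholders."""
    reader = StoreReader(data)
    if reader.take(10) != OBJ_STORE_HEADER:
        raise ValueError("missing ObjS/Stch object store header")
    count = reader.uint(4)  # object table size
    return [reader.read_entry() for _ in range(count)]


# The annotated bytes from Appendix 3.
APPENDIX_3 = bytes([
    79, 98, 106, 83, 1, 83, 116, 99, 104, 1,  # object store header
    0, 0, 0, 3,                               # object table size: 3
    109, 1, 8,                                # [1] SampledSound, version 1, 8 fields
    99, 0, 0, 2, 4, 0, 0, 128, 0, 5, 0, 0, 99, 0, 0, 3,
    5, 86, 34, 5, 0, 0, 4, 0, 1, 0, 0, 1,
    20, 0, 0, 0, 0,                           # [2] zero-length array
    12, 0, 0, 0, 0,                           # [3] zero-length sound buffer
])

table = read_object_store(APPENDIX_3)
sound = table[0]
print(sound.fields)
# [Ref(index=2), 32768, 0, Ref(index=3), 22050, 0, 65536, None]
```

The printed field values match the annotations above, which is a quick way to confirm that SmallInteger uses 4 data bytes, SmallInteger16 uses 2, and an object reference uses a 3-byte index.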