Go Built-in Data Types Implementation

Go Built-in Data Types Implementation

Go Built-in Data Types Implementation

All built-in extended types are structs.

String

struct string {
  byte* str;
  intgo len;
}

String constants usually live in read-only sections (.rodata) allocated by the compiler. The corresponding data addresses are not writable. Strings generated by fmt.Sprintf live on the heap, and their data addresses can be modified.

Underlying structure (the structure copied when assigning strings):

type StringHeader struct {
	Data uintptr
	Len  int
}

String and byte conversion:

//return GoString's buffer slice(enable modify string)
func StringBytes(s string) Bytes {
    return *(*Bytes)(unsafe.Pointer(&s))
}

// convert b to string without copy
func BytesString(b []byte) String {
    return *(*String)(unsafe.Pointer(&b))
}

func unsafeEqual(a string, b []byte) bool {
    bbp := *(*string)(unsafe.Pointer(&b))
    return a == bbp
}

Converting []rune to string always causes memory reallocation because of different memory structures:

var p []byte
buf := make([]byte, 3)
for _, r := range s {
	n := utf8.EncodeRune(buf, r)
	p = append(p, buf[:n]...)
}
return string(p)

String assignment only copies the data address and length. It does not cause the underlying byte array to be copied. Strings also support slice operations.

Arrays

Multiple initialization methods exist:

var a [3]int // Initialize to zero values, 3 elements
var b = [...]int{1, 2, 3} // Literal initialization
var c = [...]int{2: 3, 1: 2} // Index:value initialization, unspecified ones are 0

Slices

type slice struct {
	array unsafe.Pointer
	len   int
	cap   int
}

Iterating with for-range performs better because it skips bounds checking.

Iterate by length count only, no extra memory cost:

for range times {
	// do something
}

Append supports chaining operations, but generates intermediate temporary slices when adding. Use this approach to insert elements:

a = append(a, 0)
copy(a[i+1:], a[i:]) // Move elements backward
a[i] = x // Then insert
a = a[:copy(a, a[N:])] // Delete first N elements

When checking if a slice is empty, get the slice length through len. We rarely compare slices directly with nil.

Use zero-length slices for in-place modification:

// The underlying array of s passed in will be modified by b
func TrimSpace(s []byte) []byte {
	b := s[:0]
	for _, x := range s {
		if x != ' ' {
			b = append(b, x)
		}
	}
	return b
}

The key to efficient operations is reducing memory allocation frequency. Keep append operations within cap capacity.

If we use only a small part of a large slice array, the rest still won’t be garbage collected. Better to copy and return a truly smaller-capacity slice so the original large slice can be reclaimed.

Force slice type conversion by directly modifying the slice header:

// Fast sort for float slice
func SortFloat64FastV2(a []float64) {
	var c []int
	aHdr := (*reflect.SliceHeader)(unsafe.Pointer(&a))
	cHdr := (*reflect.SliceHeader)(unsafe.Pointer(&c))
	*cHdr = *aHdr
	sort.Ints(c)
}

Maps

// A header for a Go map.
type hmap struct {
	count int // len() returns map size, number of kv pairs
	flags uint8
	B     uint8  // Hash table has 2^B buckets total
	hash0 uint32 // hash seed

	buckets    unsafe.Pointer // Contiguous allocated array searchable by low hash value, initially 16 Buckets
	oldbuckets unsafe.Pointer 
	nevacuate  uintptr      

	overflow *[2]*[]*bmap // Overflow chain used after initial buckets fill up
}

Buckets is a one-dimensional array storing buckets. Overflow is a linked list mounted on buckets after they fill up.

Each bucket stores up to 8 KV pairs, plus pointers to next buckets. To save memory, storage order is kkkkkkkkvvvvvvvv, not paired, to prevent memory alignment from wasting space.

Incremental expansion: Similar to Redis implementation. To reduce latency during use, the map maintains both current and old buckets. New kv uses new hash to store in new buckets. During each insertion or deletion, migrate 1-2 key-value pairs. Migrated KV gets marked. When all old table data migrates to new table, remove old reference and free memory.

Insert: After hashing, use lower 8 bits to find the bucket for the key, then find an empty position to store the upper 8 bits of the hash value.

Lookup: Same as insert process. If not yet migrated to new map, read from old one. Compare against upper 8 bits of 8 keys in bucket. If not in bucket, check overflow chain.

Delete: Does not shrink unused space, keeping it for future use.

Map generics actually determine maptype and _type values at compile time. Compiled programs contain only user-defined map type implementations.

Hash and equal methods exist just like other languages:

type typeAlg struct {	
  // function for hashing objects of this type	
  // (ptr to object, seed) -> hash
  hash func(unsafe.Pointer, uintptr) uintptr	
  // function for comparing objects of this type	
  // (ptr to object A, ptr to object B) -> ==?
      equal func(unsafe.Pointer, unsafe.Pointer) bool	
}

Every Go type has a corresponding typeAlg.

Interfaces

type _interface struct {
	dynamicTypeInfo *_implementation
	dynamicValue    unsafe.Pointer // Generic pointer type
}

type _implementation struct {
	itype   *_type   // Interface type
	dtype   *_type   // Runtime type, must implement itype
	methods []*_func // Function pointers implementing corresponding itype methods
}

To avoid other types accidentally matching interfaces, design interfaces with specific function names to avoid this problem.

Pure virtual inheritance: Achieve through embedding anonymous pointer objects or anonymous interfaces, then dynamic injection at runtime.

Byte

Byte is an alias for uint8.

Rune

Rune is an alias for int32.

Int

Comparison Operations

Most types support comparison operations except func, map, slice. For interface types, concrete types need to support == and != operations.