Very recently I was asked by one of my professors to compile a custom kernel to his phone. He gave me an old Samsung Galaxy Fit phone. This post documents the process carried out to do the same.
The phone runs ARMv6-compatible processor rev 5 (v6l) and supports features like
swp [SWaP] & swpb [SWaPByte]
This instruction is deprecated as of ARMv7 even in some later versions of ARMv6
This instruction was used to implement exclusive access to semaphores themselves.
Semaphores are used to manage access to a shared resource. Depending on the type of semaphore, one or more clients may be granted access.
Before accessing a resource, a client must read the semaphore value and check that it indicates whether the client can proceed, or whether it must wait. When the client is about to proceed it must change the semaphore value to inform other clients.
A fundamental issue with semaphores is that they themselves are shared resources, which – as we just learned – must be protected by semaphores.
SWP (Swap) and SWPB
(Swap Byte) provide a method for software synchronization that does not require disabling interrupts. This is achieved by performing a special type of memory access, reading a value into a processor register and writing a value to a memory location as an atomic operation. The example code below shows the implementation of simple mutex functions using the SWP
instruction. SWP
and SWPB
are not supported in the Thumb instruction set, so the example must be assembled for ARM.
binary mutex functions
EXPORT lock_mutex_swp
lock_mutex_swp PROC
LDR r2, =locked
SWP r1, r2, [r0] ; Swap R2 with location [R0], [R0] value placed in R1
CMP r1, r2 ; Check if memory value was ‘locked’
BEQ lock_mutex_swp ; If so, retry immediately
BX lr ; If not, lock successful, return
ENDP
EXPORT unlock_mutex_swp
unlock_mutex_swp
LDR r1, =unlocked
STR r1, [r0] ; Write value ‘unlocked’ to location [R0]
BX lr
ENDP
In the SWP
instruction in above example, R1
is the destination register that receives the value from the memory location, and R2
is the source register that is written to the memory location. You can use the same register for destination and source. For example
SWP r2, r2, [r0] ; Swap R2 with location [R0], [R0] value placed in R2
is a valid instruction.
But there are certain limitations to this. If an interrupt triggers while a swap operation is taking place, the processor must complete both the load and the store part of the instruction before taking the interrupt, increasing interrupt latency. Because Load-Exclusive and Store-Exclusive are separate instructions, this effect is reduced when using the new synchronization primitives. In our source for Galaxy Fit we will find that swp or swpb aren't used to create atomic operations but Load-Exclusive and Store-Exclusive. Here is an example
static inline void atomic_add(int i, atomic_t *v)
{
unsigned long tmp;
int result;
__asm__ __volatile__("@ atomic_add\n"
"1: ldrex %0, [%3]\n"
" add %0, %0, %4\n"
" strex %1, %0, [%3]\n"
" teq %1, #0\n"
" bne 1b"
: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
: "r" (&v->counter), "Ir" (i)
: "cc");
}
This is taken from atomic.h of /arch/arm/include/asm/ directory.
In a multi-core system, preventing access to main memory for all processors for the duration of a swap instruction can reduce overall system performance. This is especially true in a multi-core system where processors operate at different frequencies but share the same main memory.
half [Half - Precision Floating Point Support]
In the ARM's VFP [Vector Floating Point Co-Processor] the support for 'half' means that it supports 16 bit floating point numbers and conversions between 16-32 bit floating point numbers. Half-precision floating-point numbers are provided as an optional extension to the VFPv3 architecture.
S (bit[15]): Sign bit
E (bits[14:10]): Biased exponent
T (bits[9:0]): Mantissa.
The meanings of these fields depend on the format that is selected.
thumb [The Thumb Instruction set]
The Thumb instruction set is a subset of the most commonly used 32-bit ARM instructions. Thumb instructions are each 16 bits long, and have a corresponding 32-bit ARM instruction that has the same effect on the processor model. Thumb instructions operate with the standard ARM register configuration, allowing excellent interoperability between ARM and Thumb states.
On execution, 16-bit Thumb instructions are transparently decompressed to full 32-bit ARM instructions in real time, without performance loss.
Thumb has all the advantages of a 32-bit core:
Thumb therefore offers a long branch range, powerful arithmetic operations, and a large address space.
Thumb code is typically 65% of the size of ARM code, and provides 160% of the performance of ARM code when running from a 16-bit memory system.
The availability of both 16-bit Thumb and 32-bit ARM instruction sets gives designers the flexibility to emphasize performance or code size on a subroutine level, according to the requirements of their applications. For example, critical loops for applications such as fast interrupts and DSP algorithms can be coded using the full ARM instruction set then linked with Thumb code
fastmult [Fast Multiplication]
This refers to the fact that the processor can do 32 bit multiplication. The processor implementing this provides hardware to preform 32bit x 32 bit = 64 bit operation. This is a common feature in almost all the processors nowadays. But still they guys who built the kernel felt the need to mention it.
vfp [Vector Floating Point Co-Processor]
The VFP coprocessor supports floating point arithmetic operations and is a functional block within.
The VFP has its own bank of 32 registers for single-precision operands that you can:
The VFP supports a wide range of single and double precision operations, including ABS, NEG, COPY, MUL, MAC, DIV, and SQRT. The VFP effectively executes most of these in a single cycle. Sometime in future I promise to do a tutorial on this and NEON instructions.
edsp [DSP extensions]
The ARM DSP instruction set extensions increase the DSP processing capability of ARM solutions in high-performance applications, while offering the low power consumption required by portable, battery-powered devices. DSP extensions are optimized for a broad range of software applications including servo motor control, Voice over IP (VOIP) and video & audio codecs, where the extensions increase the DSP performance to enable efficient processing of the required tasks.
Features
- Single-cycle 16x16 and 32x16 MAC implementations
- 2-3 x DSP performance improvement over ARM7™ processor-based CPU products
- Zero overhead saturation extension support
- New instructions to load and store pairs of registers, with enhanced addressing modes
- New CLZ instruction improves normalization in arithmetic operations and improves divide performance
- Full support in the ARMv5TE, ARMv6 and ARMv7 architectures
Applications
- Audio encode/decode (MP3: AAC, WMA)
- Servo motor control (HDD/DVD)
- MPEG4 decode
- Voice and handwriting recognition
- Embedded control
- Bit exact algorithms (GSM-AMR)
java [Jazelle]
Jazelle DBX (Direct Bytecode eXecution) allows some ARM processors to execute Java bytecode in hardware as a third execution state alongside the existing ARM and Thumb modes. Jazelle functionality was specified in the ARMv5TEJ architecture and the first processor with Jazelle technology was the ARM926EJ-S. Jazelle is denoted by a 'J' appended to the CPU name, except for post-v5 cores where it is required (albeit only in trivial form) for architecture conformance.
These are the extra features that the processor supports over the regular ones. The kernel can only be complied over linux or Mac. For this one I will be using linux. First we need to install a few requirements and then
download the
source.
You will find the following files in the kernel source. Next install the required programs to compile the source. Run the following line in your terminal.
sudo apt-get install git-core gnupg flex bison gperf libsdl-dev libesd0-dev libwxgtk2.6-dev build-essential zip curl libncurses5-dev zlib1g-dev valgrind
Extract the kernel source. And also download the
toolchain. Now extract the toolchain. and place it in a folder. You'll find the following in the toolchain folder.
Go to the kernel source and edit the Makefile to add the CROSS_COMPILE to point to the toolchain. It must look like this
Make sure the path is right. For me the toolchain was in /home/regnarts/S3/arm-2011.03 check the full path before updating.
Now to make this kernel there is a small problem. The original kernel source that was released by the samsung didn't have any support for Ext filesystem. That was because the original rom for this phone runs on samsung's proprietary RFS file system. So if you want to only run the stock rom over the kernel then just jump this part. Else you will need to enable the support for this in the config file. Find
/arch/arm/configs/beni_rev01_defconfig and change the File System settings as shown below.
There is another important thing. The phone uses proprietary wifi/wireless drivers. So their source code is not provided. Therefore we need to just use the already existing ones. But the problem is that insmod checks for the kernel version before loading the modules. So if there is a mismatch. The kernel won't load the modules. Therefore you'll need to name the kernel as the module wishes to see it. Run
make menuconfig
This opens up the menu to configure the kernel.
In general setup append the name "
-pref-CL928291" to the Local version. Now you are good to go. Run the make script that came with the kernel make_kernel_GT-S5670.sh. Or run
make beni_rev01_defconfig
make
That should give us the kernel. You will find it as
/arch/arm/boot/zImage. Congratulations you've built the kernel for your phone. However you will need to still put it up in to the phone. For this we shall make the android's very popular "recovery flashable" zip file.
Download a
sample file. Extract the zip, you'll find this
Now the file boot.img is the kernel+ramdisk file. But in order to build a boot.img you'll first need a ramdisk. We shall take the ramdisk from the sample previously downloaded. Use
boot-tools to unpack and use the
boot.img. The following must help
To unpack the
boot.img
tools/unpackbootimg -i source_img/boot.img -o unpack
To unpack ramdisk (If you want to)
gzip -dc ../unpack/boot.img-ramdisk.gz | cpio -i
Replace
boot.img-zImage with your own
zImage and then run
tools/mkbootimg --kernel unpack/boot.img-zImage --ramdisk unpack/boot.img-ramdisk-new.gz -o target_img/boot.img --base `cat unpack/boot.img-base`
With the new boot.img replace it in the update.zip and zip it again.
This will flash in most of the recoveries if you turn of the signature verification. If you are unable to do it in your recovery. Then you'll need to
sign it. Use this to sign it from linux.
export CLASSPATH=$CLASSPATH:./lib.jar
java testsign update.zip update-finished.zip
Now flash the zip file from recovery.
I'll discuss overclocking this phone in the next blog post.