Sunday, May 18, 2014

Android Kernel Compile | Galaxy Fit S5670 | ARM Basics

Very recently one of my professors asked me to compile a custom kernel for his phone, and he gave me an old Samsung Galaxy Fit. This post documents the process.

The phone runs an ARMv6-compatible processor rev 5 (v6l), which supports the following features:



swp [SWaP] & swpb [SWaPByte]

These instructions are deprecated in ARMv7, and even in some later versions of ARMv6.

These instructions were used to implement exclusive access to semaphores themselves.
Semaphores are used to manage access to a shared resource. Depending on the type of semaphore, one or more clients may be granted access.
Before accessing a resource, a client must read the semaphore value and check whether it can proceed or must wait. When the client is about to proceed, it must change the semaphore value to inform other clients.
A fundamental issue with semaphores is that they are themselves shared resources, which, as we just learned, must be protected by semaphores.
SWP (Swap) and SWPB (Swap Byte) provide a method for software synchronization that does not require disabling interrupts. This is achieved by performing a special type of memory access that reads a value into a processor register and writes a value to a memory location as one atomic operation. The example code below shows the implementation of simple mutex functions using the SWP instruction. SWP and SWPB are not supported in the Thumb instruction set, so the example must be assembled for ARM.
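Binary mutex functions (armasm syntax; the locked/unlocked values are assumed):
    AREA |.text|, CODE, READONLY
    ARM                            ; SWP is not available in Thumb state
locked   EQU 1                     ; value stored while the mutex is held
unlocked EQU 0                     ; value stored while the mutex is free

    ; C prototype: void lock_mutex_swp(void *mutex); mutex address in R0
    EXPORT lock_mutex_swp
lock_mutex_swp PROC
    LDR r2, =locked
    SWP r1, r2, [r0]       ; Swap R2 with location [R0], [R0] value placed in R1
    CMP r1, r2             ; Check if memory value was 'locked'
    BEQ lock_mutex_swp     ; If so, retry immediately
    BX  lr                 ; If not, lock successful, return
    ENDP

    ; C prototype: void unlock_mutex_swp(void *mutex); mutex address in R0
    EXPORT unlock_mutex_swp
unlock_mutex_swp PROC
    LDR r1, =unlocked
    STR r1, [r0]           ; Write value 'unlocked' to location [R0]
    BX  lr
    ENDP
    END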
In the SWP instruction in the above example, R1 is the destination register that receives the value from the memory location, and R2 is the source register whose value is written to the memory location. You can use the same register as both destination and source. For example,
        SWP r2, r2, [r0]       ; Swap R2 with location [R0], [R0] value placed in R2
is a valid instruction.
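For comparison, here is a minimal sketch of the same lock/unlock logic written in portable C11 using atomic_exchange from <stdatomic.h>, which has the same "store new value, return old value" semantics as SWP. The function and constant names are hypothetical, chosen to mirror the assembly. On ARMv6 and later, compilers typically implement atomic_exchange with an LDREX/STREX loop rather than with SWP.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical constants mirroring the assembly: 1 = locked, 0 = unlocked. */
enum { MUTEX_UNLOCKED = 0, MUTEX_LOCKED = 1 };

/* One SWP-style attempt: unconditionally store LOCKED and look at the value
 * that was there before. Success means it was not already LOCKED. */
bool try_lock_mutex(atomic_int *m)
{
    return atomic_exchange(m, MUTEX_LOCKED) != MUTEX_LOCKED;
}

/* Spin until a swap finds the mutex unlocked, as lock_mutex_swp does. */
void lock_mutex(atomic_int *m)
{
    while (!try_lock_mutex(m))
        ; /* retry immediately */
}

/* Release by storing UNLOCKED, like the STR in unlock_mutex_swp. */
void unlock_mutex(atomic_int *m)
{
    atomic_store(m, MUTEX_UNLOCKED);
}
```

As in the assembly version, a waiting client simply spins. That is acceptable for very short critical sections but wastes cycles otherwise.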

But this approach has limitations. If an interrupt triggers while a swap operation is taking place, the processor must complete both the load and the store parts of the instruction before taking the interrupt, which increases interrupt latency. Because Load-Exclusive (LDREX) and Store-Exclusive (STREX) are separate instructions, this effect is reduced when using these newer synchronization primitives. In the kernel source for the Galaxy Fit, we find that atomic operations are built not with SWP or SWPB but with Load-Exclusive and Store-Exclusive. Here is an example:

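static inline void atomic_add(int i, atomic_t *v)
{
    unsigned long tmp;
    int result;

    /* ldrex loads the counter and marks it for exclusive access, add i,
     * then strex tries to store the sum. strex writes 0 to tmp on success
     * and 1 if exclusivity was lost, in which case we loop back to 1. */
    __asm__ __volatile__("@ atomic_add\n"
"1:     ldrex   %0, [%3]\n"
"       add     %0, %0, %4\n"
"       strex   %1, %0, [%3]\n"
"       teq     %1, #0\n"
"       bne     1b"
    : "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
    : "r" (&v->counter), "Ir" (i)
    : "cc");
}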

This is taken from atomic.h in the /arch/arm/include/asm/ directory of the kernel source.
SWP has a further drawback in multi-core systems: preventing all processors from accessing main memory for the duration of a swap instruction can reduce overall system performance. This is especially true where processors operate at different frequencies but share the same main memory.
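The retry structure of the kernel's atomic_add above (load, compute, attempt a conditional store, loop on failure) can be sketched in portable C11 with a compare-and-exchange loop. This is an illustration, not the kernel's code. The function name is hypothetical. The orderings are relaxed because the kernel's atomic_add does not imply a memory barrier either. Note that compare-and-exchange is not identical to LDREX/STREX: it checks the value rather than an exclusive monitor, but it plays the same role here.

```c
#include <stdatomic.h>

/* Add `i` to *counter with a load / compute / conditional-store retry loop. */
void atomic_add_sketch(int i, atomic_int *counter)
{
    int old = atomic_load_explicit(counter, memory_order_relaxed);

    /* If another CPU changed the counter in between, the exchange fails,
     * `old` is reloaded with the current value and we try again; this plays
     * the role of "teq %1, #0 / bne 1b" after a failed strex. The weak form
     * may also fail spuriously, much as strex can. */
    while (!atomic_compare_exchange_weak_explicit(counter, &old, old + i,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;
}
```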

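half [Half-Precision Floating Point Support]

In ARM's VFP (Vector Floating Point) coprocessor, half-precision support means handling 16-bit floating-point numbers and conversions between 16-bit and 32-bit floating-point numbers. Half-precision floating-point numbers are provided as an optional extension to the VFPv3 architecture. Strictly speaking, though, the 'half' flag that the Linux kernel reports (HWCAP_HALF) indicates support for half-word (16-bit) load and store instructions such as LDRH and STRH. The half-precision floating-point support described here is a separate VFPv3 feature.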
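The 16-bit half-precision floating-point format is laid out as follows.

Where:
   S (bit[15]):      Sign bit
   E (bits[14:10]):  Biased exponent
   T (bits[9:0]):    Fraction (mantissa)
The interpretation of these fields depends on which half-precision format is selected: the IEEE 754-2008 format or ARM's alternative half-precision format, chosen by the FPSCR.AHP bit.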
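To make the S/E/T layout concrete, here is a minimal sketch that decodes an IEEE half-precision value into a 32-bit float in software. It assumes the IEEE format with an exponent bias of 15. ARM's alternative format, which has no infinities or NaNs, is not handled. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 value (S: bit 15, E: bits 14-10, T: bits 9-0)
 * into a 32-bit float by rebuilding the float's bit pattern. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;   /* biased by 15 */
    uint32_t frac = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1F) {                   /* infinity or NaN */
        bits = sign | 0x7F800000u | (frac << 13);
    } else if (exp != 0) {               /* normal: rebias 15 -> 127 */
        bits = sign | ((exp - 15 + 127) << 23) | (frac << 13);
    } else if (frac == 0) {              /* signed zero */
        bits = sign;
    } else {                             /* subnormal: value = frac * 2^-24 */
        int e = -14;
        while ((frac & 0x400u) == 0) {   /* shift until the implicit 1 appears */
            frac <<= 1;
            e--;
        }
        frac &= 0x3FFu;                  /* drop the now-implicit leading 1 */
        bits = sign | ((uint32_t)(e + 127) << 23) | (frac << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);         /* reinterpret the bit pattern */
    return f;
}
```

For example, 0x3C00 (S = 0, E = 15, T = 0) decodes to 1.0, and 0x7BFF decodes to 65504, the largest finite half-precision value.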

thumb [The Thumb Instruction set]

The Thumb instruction set is a subset of the most commonly used 32-bit ARM instructions. Thumb instructions are each 16 bits long, and have a corresponding 32-bit ARM instruction that has the same effect on the processor model. Thumb instructions operate with the standard ARM register configuration, allowing excellent interoperability between ARM and Thumb states.
On execution, 16-bit Thumb instructions are transparently decompressed to full 32-bit ARM instructions in real time, without performance loss.
Thumb has all the advantages of a 32-bit core:
  • 32-bit address space
  • 32-bit registers
  • 32-bit shifter, and Arithmetic Logic Unit (ALU)
  • 32-bit memory transfer.
Thumb therefore offers a long branch range, powerful arithmetic operations, and a large address space.
Thumb code is typically 65% of the size of ARM code, and provides 160% of the performance of ARM code when running from a 16-bit memory system.
The availability of both 16-bit Thumb and 32-bit ARM instruction sets gives designers the flexibility to emphasize performance or code size on a subroutine level, according to the requirements of their applications. For example, critical loops for applications such as fast interrupts and DSP algorithms can be coded using the full ARM instruction set and then linked with Thumb code.

fastmult [Fast Multiplication]

This refers to the fact that the processor can do fast long multiplication: it provides hardware to perform a 32-bit x 32-bit = 64-bit multiply. This is a common feature in almost all processors nowadays, but the people who built the kernel still felt the need to mention it.
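A minimal C illustration of what this flag is about follows. Casting one operand to 64 bits asks the compiler for a widening multiply, which on ARM cores with the long-multiply instructions typically becomes a single UMULL or SMULL. The "soft" variant shows, for comparison, how the same product has to be assembled from four 16x16 partial products when no such hardware is available. All function names are hypothetical.

```c
#include <stdint.h>

/* Widening unsigned multiply: typically a single UMULL on ARM. */
uint64_t umul32x32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Widening signed multiply: typically a single SMULL on ARM. */
int64_t smul32x32(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}

/* Without a 32x32->64 multiplier the product is built from 16-bit halves:
 * a*b = hi*2^32 + (a_hi*b_lo + a_lo*b_hi)*2^16 + lo. */
uint64_t umul32x32_soft(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint64_t lo  = a_lo * b_lo;                    /* each fits in 32 bits */
    uint64_t mid = (uint64_t)(a_hi * b_lo) + (a_lo * b_hi);
    uint64_t hi  = a_hi * b_hi;

    return lo + (mid << 16) + (hi << 32);
}
```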

vfp [Vector Floating Point Co-Processor]

The VFP coprocessor supports floating-point arithmetic operations and is a functional block within the processor.
The VFP has its own bank of 32 registers for single-precision operands that you can:
  • use in pairs for double-precision operands
  • load and store in parallel with arithmetic operations.
The VFP supports a wide range of single- and double-precision operations, including ABS, NEG, COPY, MUL, MAC, DIV, and SQRT. The VFP effectively executes most of these in a single cycle. Some time in the future I promise to do a tutorial on this and on NEON instructions.

edsp [DSP extensions]

The ARM DSP instruction set extensions increase the DSP processing capability of ARM solutions in high-performance applications, while offering the low power consumption required by portable, battery-powered devices. DSP extensions are optimized for a broad range of software applications including servo motor control, Voice over IP (VoIP) and video and audio codecs, where the extensions increase DSP performance to enable efficient processing of the required tasks.

Features

  • Single-cycle 16x16 and 32x16 MAC implementations
  • 2-3 x DSP performance improvement over ARM7™ processor-based CPU products
  • Zero overhead saturation extension support
  • New instructions to load and store pairs of registers, with enhanced addressing modes
  • New CLZ instruction improves normalization in arithmetic operations and improves divide performance
  • Full support in the ARMv5TE, ARMv6 and ARMv7 architectures

Applications

  • Audio encode/decode (MP3, AAC, WMA)
  • Servo motor control (HDD/DVD)
  • MPEG4 decode
  • Voice and handwriting recognition
  • Embedded control
  • Bit exact algorithms (GSM-AMR)
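To show what some of these extensions do, here is a sketch that emulates three of them in plain C: QADD (saturating add), CLZ (count leading zeros) and SMLABB (16x16 multiply-accumulate on the bottom halfwords). Each of these is a single ARM instruction on ARMv5TE and later. These helpers are hypothetical names that only model the results, not the instructions' flag side effects such as the sticky Q flag. The halfword extraction relies on two's-complement truncation, as GCC and Clang implement it.

```c
#include <stdint.h>

/* QADD: signed 32-bit add that clamps to INT32_MIN/INT32_MAX instead of
 * wrapping. */
int32_t qadd32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* CLZ: number of leading zero bits; ARM's CLZ returns 32 for an input of 0. */
unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* SMLABB: multiply the signed bottom halfwords of a and b and add to acc.
 * The add is done in unsigned arithmetic so it wraps like the hardware
 * instead of being undefined behaviour in C. */
int32_t smlabb(int32_t acc, int32_t a, int32_t b)
{
    int32_t prod = (int32_t)(int16_t)a * (int32_t)(int16_t)b;
    return (int32_t)((uint32_t)acc + (uint32_t)prod);
}
```

Each function needs several C operations (or a loop, for clz32) where the DSP extensions need one instruction. That gap is where the speed-up listed above comes from.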

java [Jazelle]

Jazelle DBX (Direct Bytecode eXecution) allows some ARM processors to execute Java bytecode in hardware as a third execution state alongside the existing ARM and Thumb modes. Jazelle functionality was specified in the ARMv5TEJ architecture and the first processor with Jazelle technology was the ARM926EJ-S. Jazelle is denoted by a 'J' appended to the CPU name, except for post-v5 cores where it is required (albeit only in trivial form) for architecture conformance.
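The flag names discussed above (swp, half, thumb, fastmult, vfp, edsp, java) are the ones the Linux kernel prints on the "Features" line of /proc/cpuinfo on ARM. Below is a small sketch for checking one of them from C. The helper names are hypothetical, and the test line is an illustrative example of the format. On x86 machines the equivalent line is called "flags", so cpu_has_feature simply returns false there.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` appears as a whole word after the ':' in a line such as
 * "Features\t: swp half thumb fastmult vfp edsp java". */
bool line_has_feature(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = strchr(line, ':');
    if (len == 0 || p == NULL)
        return false;
    for (p++; *p && *p != '\n'; ) {
        p += strspn(p, " \t");               /* skip separators */
        size_t n = strcspn(p, " \t\n");      /* length of this word */
        if (n == len && strncmp(p, flag, len) == 0)
            return true;
        p += n;
    }
    return false;
}

/* Scan /proc/cpuinfo for the first "Features" line and test it. */
bool cpu_has_feature(const char *flag)
{
    char line[512];
    bool found = false;
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL)
        return false;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "Features", 8) == 0) {
            found = line_has_feature(line, flag);
            break;
        }
    }
    fclose(f);
    return found;
}
```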
These are the extra features that the processor supports over the regular ones. The kernel can only be compiled on Linux or Mac; for this one I will be using Linux. First we need to install a few requirements and then download the source.


You will find the following files in the kernel source. Next, install the programs required to compile the source by running the following line in your terminal:

sudo apt-get install git-core gnupg flex bison gperf libsdl-dev libesd0-dev libwxgtk2.6-dev build-essential zip curl libncurses5-dev zlib1g-dev valgrind
Extract the kernel source. Then download the toolchain, extract it, and place it in a folder. You'll find the following in the toolchain folder:

Go to the kernel source and edit the Makefile so that CROSS_COMPILE points to the toolchain. It should look like this:


Make sure the path is right. For me the toolchain was in /home/regnarts/S3/arm-2011.03, so check the full path on your machine before updating the Makefile.

Building this kernel has one small problem. The original kernel source released by Samsung didn't include support for the Ext filesystems, because the stock ROM for this phone runs on Samsung's proprietary RFS file system. If you only want to run the stock ROM on this kernel, skip this part. Otherwise you will need to enable Ext support in the config file: find /arch/arm/configs/beni_rev01_defconfig and change the File System settings as shown below.


There is another important thing. The phone uses proprietary Wi-Fi/wireless drivers whose source code is not provided, so we have to reuse the existing binary modules. The problem is that when a module is loaded with insmod, the kernel checks the version string the module was built for (its vermagic) against its own, and if they don't match, the module is not loaded. Therefore you need to give your kernel the exact version name the modules expect. Run

make menuconfig

This opens up the menu to configure the kernel.
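Before changing the name, it helps to see what "matching" means here. A module's vermagic string (as shown by modinfo) starts with the kernel release it was built against, and that release must equal the running kernel's release, including the Local version suffix. The sketch below checks only that first word. The real check in the kernel's module loader compares more, typically the whole vermagic string including flags such as preempt and mod_unload. The helper name and the version strings in the test are made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does the first word of a module's vermagic string
 * (the kernel release it was built against) equal the running release? */
bool vermagic_release_matches(const char *vermagic, const char *kernel_release)
{
    size_t n = strcspn(vermagic, " ");  /* length of the release word */
    return strlen(kernel_release) == n &&
           strncmp(vermagic, kernel_release, n) == 0;
}
```

With a made-up vermagic of "2.6.35.7-pref-CL928291 preempt mod_unload ARMv6", a kernel whose release is only "2.6.35.7" fails this check. That is why the suffix has to be appended.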

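Under General setup, append "-pref-CL928291" to Local version (CONFIG_LOCALVERSION). Now you are good to go. Run the build script that came with the kernel, make_kernel_GT-S5670.sh, or run the commands below. Keep in mind that make beni_rev01_defconfig regenerates .config from the defconfig file and discards changes made in menuconfig, so either run it before make menuconfig or put CONFIG_LOCALVERSION="-pref-CL928291" directly into /arch/arm/configs/beni_rev01_defconfig.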

make beni_rev01_defconfig
make

That should give us the kernel; you will find it at /arch/arm/boot/zImage. Congratulations, you've built the kernel for your phone! However, you still need to get it onto the phone. For this we will make Android's very popular "recovery-flashable" zip file.

Download a sample file and extract the zip. You'll find this:

The file boot.img contains the kernel plus the ramdisk, so to build a boot.img you first need a ramdisk. We will take the ramdisk from the sample downloaded earlier. Use boot-tools to unpack and repack boot.img; the following commands should help.
To unpack the boot.img:

tools/unpackbootimg -i source_img/boot.img -o unpack

To unpack the ramdisk (optional; run this from an empty directory created next to unpack/, since the path below is relative):

gzip -dc ../unpack/boot.img-ramdisk.gz | cpio -i

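Replace boot.img-zImage with your own zImage. If you unpacked and modified the ramdisk, repack it from the same directory with

find . | cpio -o -H newc | gzip > ../unpack/boot.img-ramdisk-new.gz

otherwise use unpack/boot.img-ramdisk.gz in place of boot.img-ramdisk-new.gz below. Then run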

tools/mkbootimg --kernel unpack/boot.img-zImage --ramdisk unpack/boot.img-ramdisk-new.gz -o target_img/boot.img --base `cat unpack/boot.img-base`
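The boot.img assembled here is a legacy Android boot image: a one-page header followed by the kernel, the ramdisk and an optional second-stage image, each padded to a whole number of pages. The sketch below, assuming the version-0 header layout from AOSP's bootimg.h of that era and a little-endian host, shows how the header locates the payloads. It also shows where --base comes in, since mkbootimg by default loads the kernel at base + 0x8000. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define BOOT_MAGIC      "ANDROID!"
#define BOOT_MAGIC_SIZE 8
#define BOOT_NAME_SIZE  16
#define BOOT_ARGS_SIZE  512

/* Legacy (version 0) boot image header; fields are little-endian. */
struct boot_img_hdr {
    uint8_t  magic[BOOT_MAGIC_SIZE];
    uint32_t kernel_size;   /* size in bytes */
    uint32_t kernel_addr;   /* physical load address */
    uint32_t ramdisk_size;
    uint32_t ramdisk_addr;
    uint32_t second_size;   /* optional second-stage image */
    uint32_t second_addr;
    uint32_t tags_addr;
    uint32_t page_size;     /* flash page size, e.g. 2048 */
    uint32_t unused[2];
    uint8_t  name[BOOT_NAME_SIZE];
    uint8_t  cmdline[BOOT_ARGS_SIZE];
    uint32_t id[8];         /* checksum/hash of the payloads */
};

/* Number of pages needed to hold `size` bytes. */
static uint32_t pages(uint32_t size, uint32_t page_size)
{
    return (size + page_size - 1) / page_size;
}

/* The kernel starts right after the one-page header. */
uint32_t kernel_offset(const struct boot_img_hdr *h)
{
    return h->page_size;
}

/* The ramdisk follows the page-padded kernel. */
uint32_t ramdisk_offset(const struct boot_img_hdr *h)
{
    return (1 + pages(h->kernel_size, h->page_size)) * h->page_size;
}

/* With mkbootimg's default offsets the kernel sits at base + 0x8000. */
uint32_t base_from_kernel_addr(uint32_t kernel_addr)
{
    return kernel_addr - 0x00008000u;
}

/* Copy the header out of a raw image buffer; returns 0 on success. */
int parse_boot_header(const uint8_t *buf, size_t len, struct boot_img_hdr *out)
{
    if (len < sizeof *out || memcmp(buf, BOOT_MAGIC, BOOT_MAGIC_SIZE) != 0)
        return -1;
    memcpy(out, buf, sizeof *out);
    return 0;
}
```

This is also why the ramdisk from the sample image can be reused: as long as the page size and base match what the phone's bootloader expects, only the kernel payload changes.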

Replace the boot.img in update.zip with the new one and zip it again.
This will flash in most recoveries if you turn off signature verification. If your recovery can't do that, you'll need to sign the zip. Use this to sign it on Linux:

export CLASSPATH=$CLASSPATH:./lib.jar
java testsign update.zip update-finished.zip

Now flash the zip file from recovery.
I'll discuss overclocking this phone in the next blog post.






Thursday, May 15, 2014

Serial Communication Level Converters | RPi / BeagleBoard to Arduino / ATMegaX | UART

While working on my college project, I came across the problem of interfacing a custom-designed control board, based on an ATMega8 microcontroller, with an RPi that would run the main program. UART seemed to be the best option. The problem is that the RPi runs its UART at 3.3V, the BeagleBoard runs its UART at 1.8V and the ATMegaX runs its UART at 5V, so a level converter was needed. Here are the schematics and the design of the one I built.
It is simply two transistor-based inverters shifting the logic level up and a resistor network (a voltage divider) bringing it down. The resistor values are chosen to step the voltage down for the RPi; if you intend to use this with a BeagleBoard, please change the resistances to appropriate values. You can download the PCB file and the PDF.
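Choosing resistances for the step-down side is just the voltage-divider formula, Vout = Vin * R2 / (R1 + R2), with R1 from the 5V signal to the output and R2 from the output to ground. The sketch below computes it both ways. It assumes an unloaded divider, which is reasonable for a high-impedance UART input. The resistor values in the test are hypothetical examples, not the ones on my board.

```c
/* Unloaded divider output: Vout = Vin * R2 / (R1 + R2). */
double divider_vout(double vin, double r1, double r2)
{
    return vin * r2 / (r1 + r2);
}

/* Bottom resistor for a target output, given the top resistor:
 * R2 = R1 * Vout / (Vin - Vout). */
double divider_r2(double vin, double vout, double r1)
{
    return r1 * vout / (vin - vout);
}
```

For example, 1.8 kΩ over 3.3 kΩ gives about 3.24V from a 5V signal, which is safe for the RPi. For a 1.8V BeagleBoard input with R1 = 1 kΩ, divider_r2(5.0, 1.8, 1000.0) suggests roughly 560 Ω, and you would then pick the nearest standard value that stays at or below 1.8V.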

Note:
On the Raspberry Pi, if you wish to use the serial port for purposes other than terminal access, replace the line
dwc_otg.lpm_enable=0 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait
with
dwc_otg.lpm_enable=0 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait
in /boot/cmdline.txt, and comment out the line
#Spawn a getty on Raspberry Pi serial line
T0:23:respawn:/sbin/getty -L ttyAMA0 115200 vt100
so that it reads
#Spawn a getty on Raspberry Pi serial line
#T0:23:respawn:/sbin/getty -L ttyAMA0 115200 vt100
in /etc/inittab.
On the BeagleBoard you will need to set up the pins (pin multiplexing) to work as a serial port; please refer to the relevant documentation.
Use software such as minicom to access the ports.
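If your own program on the RPi should talk to the ATMega directly instead of going through minicom, the port can be configured with the POSIX termios API. The sketch below puts a port into raw 8N1 mode at 115200 baud. /dev/ttyAMA0 and the baud rate come from the cmdline.txt above, the helper names are hypothetical, and the ATMega's UART must be set to the same settings. B115200 is not part of base POSIX but is available on Linux.

```c
#include <fcntl.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Put a termios structure into raw 8N1 mode at 115200 baud. */
int configure_8n1_115200(struct termios *t)
{
    /* No input translation, no software flow control. */
    t->c_iflag &= ~(tcflag_t)(IGNBRK | BRKINT | PARMRK | ISTRIP |
                              INLCR | IGNCR | ICRNL | IXON);
    t->c_oflag &= ~(tcflag_t)OPOST;                 /* no output processing */
    t->c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    t->c_cflag &= ~(tcflag_t)(CSIZE | PARENB | CSTOPB);
    t->c_cflag |= CS8 | CREAD | CLOCAL;             /* 8 data bits, 1 stop, no parity */
    t->c_cc[VMIN] = 1;                              /* read() blocks for >= 1 byte */
    t->c_cc[VTIME] = 0;
    if (cfsetispeed(t, B115200) != 0 || cfsetospeed(t, B115200) != 0)
        return -1;
    return 0;
}

/* Open a port such as /dev/ttyAMA0 and apply the settings; returns fd or -1. */
int open_serial(const char *path)
{
    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;
    struct termios t;
    if (tcgetattr(fd, &t) != 0 || configure_8n1_115200(&t) != 0 ||
        tcsetattr(fd, TCSANOW, &t) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

After open_serial("/dev/ttyAMA0") succeeds, plain read() and write() calls on the returned descriptor exchange bytes with the ATMega through the level converter.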