Wednesday, October 31, 2007

Broke down...

I finally broke down and ordered the missing pieces that I'd need to build my DVR machine. I originally started by using a stripped down NetX Embedded board, but was quickly frustrated with the fact that it used an odd case design making my PCI Riser card unusable.

So, with a lightening of my wallet, I purchased the following (from

  1. EPIA ME 6000G (170mm x 170mm size, 2x ATA133 connectors, onboard MPEG2 playback accelerator, S-Video, LVDS, 1x PCI, 1x Serial, 600MHz C7, 1GB memory, fanless)
  2. 2699R black case
  3. Slimline DVDRW
  4. 160GB HDD (2.5")
  5. assorted lengths of wire
TOTAL COST (including 2-day shipping): $450

Why did I choose these things?
The 6000G is not a powerhouse for computing. However, I don't need anything super duper since it'll just be controlling the Hauppauge PVR-250 card. I'm gonna front-end with MythTV (probably just use the KnoppMyth distribution since I'm too busy to spin my own custom linux build). The nice thing about having hardware encoder/decoder for MPEG2 is that my CPU could have all the processing power of a wet turd, and it wouldn't matter. I'm hoping to someday get a wifi connection to the box, but we'll save those hopes and dreams for later.

My plans for this are semi-big. It'll be my set top box. It'll control everything. Sound, video, pictures, and all manner of other things. I'll hopefully have a full media box in a week or two.

As a link to the past, I've gone back to try and get the awesome code I wrote below for stack tracing working on an alpha.

What I found was the axp platform does not keep a reference to the previous frame pointer on the stack all nice and neat. Therefore, we have to do some wonky hacking. We basically traverse the stack, one byte at a time mind you, dereferencing it until we locate what _could_ be a frame (by either an lda or subq instruction). Not exactly a good time.

But at least it's still possible to do entirely in C, albeit this time you'll have to know the assembly.
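
To give a feel for what "knowing the assembly" means here, below is a minimal sketch of recognizing an Alpha prologue instruction. It is not the code I actually used; the function names are made up for illustration, and it only handles the lda form (lda $sp, -N($sp)), not subq. It assumes the standard Alpha memory-format encoding, where opcode 0x08 with both registers set to $sp gives 0x23de in the upper 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Alpha memory format: opcode[31:26] Ra[25:21] Rb[20:16] disp[15:0].
 * "lda $sp, disp($sp)" has opcode 0x08 and Ra = Rb = 30 ($sp), so the
 * upper 16 bits are 0x23de. A prologue uses a negative displacement.
 * Returns the frame size in bytes, or -1 if the word is not a prologue. */
long alpha_lda_frame_size(uint32_t insn)
{
    if ((insn >> 16) != 0x23deu)
        return -1;                      /* not lda $sp, x($sp) */

    long disp = (long)(insn & 0xffffu);
    if (disp & 0x8000)
        disp -= 0x10000;                /* sign-extend the 16-bit displacement */
    if (disp >= 0)
        return -1;                      /* stack is being released, not reserved */
    return -disp;
}

/* Scan instruction words backwards from index `from` and return the frame
 * size of the nearest candidate prologue, or -1 if none is found. */
long alpha_find_frame_size(const uint32_t *code, long from)
{
    for (long i = from; i >= 0; i--) {
        long size = alpha_lda_frame_size(code[i]);
        if (size > 0)
            return size;
    }
    return -1;
}
```

A match only means the word _could_ be a prologue; data that happens to look like 0x23deXXXX will fool it, which is why this approach is not exactly a good time.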

Monday, October 22, 2007

Creative Solutions

I had an interesting conversation a few days ago with a friend. I used some seemingly mundane tricks of reverse engineering the Apple iTunes Music Store (R) communications protocol a few years back (2 years ago, actually). I figured they were rather mundane. Load a sniffer, sniff communications, snoop the binary for whatever strings I might use to fake a validator, and voila! His comment was that such actions constituted creative ways of solving the problem.

I'm not convinced on that. However, I _AM_ convinced that there are creative solutions available in the computing world. Just as there are literally hundreds of ways of representing the number 1 (integrals and differentiations) there are multiple creative ways of solving a problem in computer science.

My most creative solution? I'm not sure. I attempted to patent a resynchronization algorithm for A11 and MIP during error code 133. Airvana decided they didn't want that patent. Oh well. I also wrote a rather involved and cool A13 emulation tool (it would send all A13 messages, including the 2 new messages for A16 transfers).

The point in all this is, there should be a much more creative way of neural networking and AI than currently exists; I can't think of anything, but someone MUST be able to figure something out.

Oh well...this isn't one of my better rambles because I'm a little ill today. Hopefully, I can make some better updates tomorrow or the next day.

Wednesday, October 17, 2007

De bugs...oh why?

So, I believe I found an issue with the pthread_create routine. Why do I believe this?

During the course of writing my own userspace heap, I noticed that during multi-threaded test execution I had 1 block being leaked. The block was obviously more than 64b, but less than 128b. I wrote a backtrace routine (described in earlier posts) to try and quickly figure out where the leak was taking place. And here is what I found:

This is where the leak occurs
0x4000dcc4 : call 0x4000076c
0x4000dcc9 : test %eax,%eax

This is the backtrace:
#0 0x4000dc9b in allocate_dtv () from /lib/
#1 0x4000df3c in _dl_allocate_tls () from /lib/
#2 0x441a78e5 in pthread_create@@GLIBC_2.1 () from /lib/tls/
#3 0x0804885a in main () at test.c:59

Valgrind reports a similar issue, so I can't be wrong here, I think.
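
For anyone wondering how a heap overlay notices a single leaked block, here is a minimal sketch of the bookkeeping. This is not my heap implementation; the names are hypothetical, and unlike the real thing it is not thread-safe. It only shows the idea of counting blocks that were handed out and never returned.

```c
#include <stdlib.h>

/* Hypothetical tracking wrappers; illustrative only, not thread-safe. */
static long g_live_blocks = 0;   /* allocations that have not been freed yet */

void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        g_live_blocks++;         /* one more block outstanding */
    return p;
}

void tracked_free(void *p)
{
    if (p != NULL) {
        g_live_blocks--;         /* block returned to the heap */
        free(p);
    }
}

/* Anything above zero at exit is a block nobody freed. */
long tracked_live_blocks(void)
{
    return g_live_blocks;
}
```

If the counter is still 1 after every thread has been joined and every known allocation released, something below you is holding on to a block, and a backtrace taken at allocation time tells you who.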

I write up a bug, detailing this. Response is something along the lines of "nptl has no memory leaks, and valgrind isn't always right. Also, you're using an old version. Please upgrade and rerun test."

BULL! It's not just valgrind reporting this. I see the same issue in my own heap implementation. It is an overlay of the malloc() and free() routines (malloc, free, calloc and realloc to be precise). Open source community, want to know why you have such a bad rep? You believe that because you wrote an ethernet driver, or a RAID array controller, somehow you're untouchable. WRONG. I've written drivers, heaps, userspace apps, cell tower apps, and a whole plethora of things. I don't spout off nonsense to try and get a big e-rection. I actually try and solve problems.


Tuesday, October 9, 2007

Distilling some information

Did you know that in the US, it is illegal to distill alcohol? Distillation is the process by which we apply a thermal change (either colder or hotter) to separate fluids with different boiling/freezing points from a solution.

In fact, in the US, it's illegal to ferment alcohol unless one happens to be 21. This means that if you, perchance, drop some yeast into a bucket of sugar water then walk away and it ferments out you _might_ have broken the law (if you're 3 years old). More realistically, what about the 13 year old kid who buys unpasteurized cider, and accidentally leaves it in his or her basement for a month. It will most likely ferment out into hard cider.

Even worse, if he decides "Oh, I don't want to get in trouble for having this" and leaves it with his trash, on a cold winter day, it will distill out into applejack, which is a felony. Seems to me the country has some 'splaining to do, Lucy.

As far as an interesting topic for computer science goes, it's interesting to me that so many people don't look beyond the first five lines of code before they believe they understand everything about the function. A good example is the following:

rcopy - Algorithm 1
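void *rcopy(void *dst, const void *src, size_t size)
{
    void *ret = dst;
    char *d = dst;
    const char *s = (const char *)src + size; /* one past the last source byte */

    while (size--)
        *d++ = *--s; /* walk dst forward and src backward */
    return ret;
}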

and the following algorithm by Paul Lovvik (as published in the Sun Developer's Network, June 2004)

rcopy - Algorithm 2
void *
rcopy(void *dest, const void *src, size_t size) {
    int srcIndex = size - 1;
    int destIndex = 0;

    while (srcIndex >= 0) {
        ((char *)(dest))[destIndex++] =
            ((char *)(src))[srcIndex--];
    }
    return (dest);
}

When comparing the two, most pick Algorithm 1 as the faster algorithm. On the surface, sure, it does look faster. But Algorithm 2 has two important differences:
1) It is much more readable
2) It has one fewer arithmetic operator in the loop. This means that in the long run, it's faster by 1 instruction per byte copied, which can add up if you're reverse copying a few thousand MB.
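
Before arguing about which one is faster, it's worth confirming that both actually do the same thing. The sketch below is my own restatement of the two approaches (the names rcopy_ptr, rcopy_idx and rcopy_agree are made up here, and the index version uses size_t rather than int), plus a small check that their outputs match. Counting operators in the source is only a guess at speed; what the compiler emits is what counts, so measure if it matters.

```c
#include <stddef.h>
#include <string.h>

/* Pointer-walking variant, in the spirit of Algorithm 1. */
void *rcopy_ptr(void *dst, const void *src, size_t size)
{
    char *d = dst;
    const char *s = (const char *)src + size;   /* one past the last byte */
    while (size--)
        *d++ = *--s;
    return dst;
}

/* Index-based variant, in the spirit of Algorithm 2. */
void *rcopy_idx(void *dst, const void *src, size_t size)
{
    size_t di = 0;
    for (size_t si = size; si > 0; si--)
        ((char *)dst)[di++] = ((const char *)src)[si - 1];
    return dst;
}

/* Returns 1 if both variants produce identical output for this input.
 * Inputs larger than the scratch buffers are rejected with 0. */
int rcopy_agree(const void *src, size_t size)
{
    char a[256], b[256];
    if (size > sizeof a)
        return 0;
    rcopy_ptr(a, src, size);
    rcopy_idx(b, src, size);
    return memcmp(a, b, size) == 0;
}
```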

Ultimately, we need to take some time to just analyze the costs associated with anything in our lives. Whether it be taking the law into our own hands to enjoy some freshly distilled applejack at 13, or whether it's the obscure cost of an extra comparison within our code, we need to carefully weigh each nugget of information and compare it with who we are and who we want to be.

Sunday, October 7, 2007


It's funny how timing is important in all aspects of life. For instance, had our parents copulated a day later or earlier than our conception, we'd never be around. If we had changed our alarm clocks by 15 minutes, perhaps we'd end up jumping in front of a bus to save a little boy (yes, that was a Will Ferrell reference).

With that in mind, I've been thinking about the two different mechanisms available for timing in user-space applications. The first is, of course, the POSIX family of signal-based timers which operating systems provide. These timers have one key advantage: the expiry time is usually very accurate. However, they have drawbacks.

The biggest drawback is the context within which timers are processed. The signal context is an interrupt to the process, but cannot itself be interrupted. This means that while in the timer context, you can lose other signals (POSIX queued signals) and potentially drop a lot of information. Secondly, your application backtrace will be interrupted with a signal. If there is a crash in the timer code you'll have a confusing mess on your hands.

The second approach to timers is a separate thread which does polling. The granularity on this brand of timers is much worse. A timer which is set to fire every 5 milliseconds may have to wait until the 12 or 13 ms mark before it can fire. In a system where every millisecond is critical (nuclear power plant controller, perhaps) waiting 12 is a lifetime.
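
The arithmetic behind that lateness is simple enough to sketch. The model below is an assumption of mine, not a description of any particular implementation: a polling thread wakes every poll_ms milliseconds, starting at t = poll_ms, and fires whatever has expired. The function name is hypothetical.

```c
/* Hypothetical model of a polling timer thread.
 * The thread wakes at poll_ms, 2*poll_ms, 3*poll_ms, ... and fires any
 * timer whose deadline has passed. Returns the time (in ms) at which a
 * timer due at deadline_ms actually fires, or -1 on invalid input. */
long polled_fire_time_ms(long deadline_ms, long poll_ms)
{
    if (poll_ms <= 0 || deadline_ms < 0)
        return -1;

    long polls = (deadline_ms + poll_ms - 1) / poll_ms;  /* ceil(deadline / poll) */
    if (polls == 0)
        polls = 1;               /* nothing can fire before the first wakeup */
    return polls * poll_ms;
}
```

With a 12 ms poll period, a timer due at 5 ms fires at 12 ms, which is 7 ms late. Shrinking the poll period helps, but then the thread burns more CPU waking up to find nothing to do.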

This random musing was inspired by the following question on LQ:

Hi I don't understand how this isn't working correctly...I'm running Linux Kernel 2.6.9. What's happening is that as soon as I call the timer_settime function it immediately triggers the signal handler. No matter how big I set the time.

I suggest we read the manpages :)
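
The question didn't include the code, so I can only guess at the cause. One thing the manpage does spell out is that with the TIMER_ABSTIME flag the value is an absolute time, and an absolute time that has already passed expires immediately. So passing a small "relative" value together with TIMER_ABSTIME would produce exactly this symptom. Below is a minimal sketch of filling in a relative one-shot timeout; the helper name is made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose struct itimerspec in <time.h> */
#endif
#include <time.h>

/* Hypothetical helper: fill an itimerspec for a one-shot timer that fires
 * `ms` milliseconds from now. The value is relative, so it must be passed
 * to timer_settime with flags = 0, not TIMER_ABSTIME. */
void make_relative_oneshot(struct itimerspec *its, long ms)
{
    its->it_value.tv_sec = ms / 1000;
    its->it_value.tv_nsec = (ms % 1000) * 1000000L;
    its->it_interval.tv_sec = 0;     /* zero interval: do not re-arm */
    its->it_interval.tv_nsec = 0;
}
```

The result would then be handed to timer_settime(timerid, 0, &its, NULL). Setting every field explicitly also avoids leaving stack garbage in it_interval.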

Friday, October 5, 2007

So, decided I'd do the ol' blog thing. I'm gonna try to update this at least once a week with random thoughts, comments, and opinions (many of which will be wrong) on software development and engineering.

As I write this first entry, I reflect on the many firsts upon which I am currently embarking. I'm leaving my job at Airvana, Inc. to accept a position at another Chelmsford, MA based company. I'm finishing up my first heap management package (not open sourced...I'd like to keep a few things up my sleeve for later) and I'm trying a $5 bottle of Fiji water (free at the company as a sampler).

The water isn't bad; the package at the new job isn't bad, and certainly a constant time heap isn't bad. All in all, lots of good that I hope continues.

And now for a bit of code, and some musings.

A while ago, I was interested in attaining the backtrace of a particular stack address. I could use the gnu backtrace() routine, but that carries with it a few issues:

1) It does an internal malloc (not so good when you want a backtrace after running out of memory)
2) Because it then touches the memory it allocates, it'll potentially cause a pagefault (suspends the entire process, including all threads)
3) It isn't supported on all platforms anyway

So, I decided I'd use a little trick to get the frames prior. This isn't exactly portable, but it should work on linux for most architectures. You may have to play with the math a bit to get the correct previous frame addr.

First, we can tell where we currently are by getting a stack variable address. This means that if we have a function:

int foo(int someVar)

and inside that function, we do:
unsigned char *stackAddr = (unsigned char *)&someVar;

stackAddr will contain the address of that variable within the frame. This is great for knowing where in the frame we are, but we really care about the return address which sits at the bottom (or top depending on your nomenclature) of the frame. On x86 the stack grows downward, so the parameter, which was pushed before the return address, sits at a higher address. That means the return pointer exists somewhere before our current location. We'll be subtracting from the address.

In order to correctly align the pointer for the linux stack, we need to know how the linux stack frame looks. According to, the frame is laid out such that:

offset 0 - The ptr to the callee address frame
offset 4 - The link register save location
offset 8 - The parameter area

This means that in our example above, we're currently pointing to offset 8. Going back 8 bytes then should get us to the ptr to the callee address frame.

So, let's write a quick 'n dirty test function, and use gdb to check on the results.


#include <stdio.h>

int backtrace_test(int nFrameParameterOffset)
{
    unsigned char *pStackPtr = (unsigned char *)&nFrameParameterOffset;
    unsigned int *pCalleePtr = 0;

    pStackPtr -= 8; //this will back us up to the callee ptr
    pCalleePtr = (unsigned int *)pStackPtr;

    printf("address of callee [%p] points to [%x]\n", (void *)pCalleePtr, *pCalleePtr);
    return 0;
}

int main()
{
    return backtrace_test(1234);
}

Compile with: gcc -g -o backtrace_test backtrace_test.c

Running this produces the following:

aconole@linuxws220 /localhome/aconole/bt-test
address of callee [0xbffff618] points to [bffff648]

Under gdb, if we look at these values, we'll notice that at 0xbffff64c (4 bytes ahead of the frame to which we are pointed) we see (0x080483ee). Attempting to disassemble this address (disas 0x080483ee) puts us at the return address after the call to _init. Looks like we're getting a clue as to the stack workings. Let's put in another frame and see what we get. We'll add the following to our code:

int foo(int bar)
{
    return backtrace_test(bar);
}

And we'll change main to call foo.

The new result is:
aconole@linuxws220 /localhome/aconole/bt-test
address of callee [0xbffff5f8] points to [bffff618]

Notice, instead of 0xbffff618 pointing to 0xbffff648, we have 0xbffff5f8 pointing to 0xbffff618. The extra frame is at 0xbffff5f8! Let's follow the rest of the frames in gdb:

(gdb) p/x *0xbffff618
$2 = 0xbffff638
(gdb) p/x *0xbffff638
$3 = 0xbffff668
(gdb) p/x *0xbffff668
$4 = 0xbffff6c8
(gdb) p/x *0xbffff6c8
$5 = 0x0

We can see that all the frames link back until we hit 0. If we look at 0xbffff61c now (4 bytes after the callee ptr), we see 0x80483b7. gdb gives us more information:

(gdb) disassemble 0x80483b7
Dump of assembler code for function foo:
0x080483a6 : push %ebp
0x080483a7 : mov %esp,%ebp
0x080483a9 : sub $0x8,%esp
0x080483ac : sub $0xc,%esp
0x080483af : pushl 0x8(%ebp)
0x080483b2 : call 0x8048368
0x080483b7 : add $0x10,%esp
0x080483ba : leave
0x080483bb : ret
End of assembler dump.

Success! 0x80483b7 is the instruction right after the call (add $0x10,%esp). It's the return address inside foo!

We know two things now: Following the return register back will give us all the frames (until we hit frame 0), and 4 bytes after the callee ptr, the link register gives us the program counter for the frame. Let's write a stack dumper function which will give us a simple stack trace:

unsigned int *dumpAllFrames(unsigned int nFramesMax)
{
    unsigned char *pStackAddrPtr = (unsigned char *)&nFramesMax;
    unsigned int *pFramePointer = 0;

    pStackAddrPtr -= 8;

    pFramePointer = (unsigned int *)pStackAddrPtr; //at this point we have the stack list

    //stop at frame 0, or once nFramesMax frames have been printed
    while(pFramePointer && nFramesMax--)
    {
        printf("Current Frame=<%p>, Next Frame=<0x%x>, PC=<0x%x>\n",
               (void *)pFramePointer, *pFramePointer, *(pFramePointer+1));
        pFramePointer = (unsigned int *)*pFramePointer;
    }

    return pFramePointer;
}

and call that from within backtrace_test.

The results:
Current Frame=<0xbffff5d8>, Next Frame=<0xbffff5f8>, PC=<0x80483d2>
Current Frame=<0xbffff5f8>, Next Frame=<0xbffff618>, PC=<0x80483ed>
Current Frame=<0xbffff618>, Next Frame=<0xbffff648>, PC=<0x804841b>
Current Frame=<0xbffff648>, Next Frame=<0xbffff6a8>, PC=<0x418de3>
Current Frame=<0xbffff6a8>, Next Frame=<0x0>, PC=<0x80482e1>

And using gdb:
(gdb) disas 0x80483d2
Dump of assembler code for function backtrace_test:
0x080483c2 : push %ebp
0x080483c3 : mov %esp,%ebp
0x080483c5 : sub $0x8,%esp
0x080483c8 : sub $0xc,%esp
0x080483cb : push $0xa
0x080483cd : call 0x8048368
0x080483d2 : add $0x10,%esp
0x080483d5 : mov $0x0,%eax
0x080483da : leave
0x080483db : ret
End of assembler dump.
(gdb) disas 0x80483ed
Dump of assembler code for function foo:
0x080483dc : push %ebp
0x080483dd : mov %esp,%ebp
0x080483df : sub $0x8,%esp
0x080483e2 : sub $0xc,%esp
0x080483e5 : pushl 0x8(%ebp)
0x080483e8 : call 0x80483c2
0x080483ed : add $0x10,%esp
0x080483f0 : leave
0x080483f1 : ret
End of assembler dump.
(gdb) disas 0x804841b
Dump of assembler code for function main:
0x080483f2 : push %ebp
0x080483f3 : mov %esp,%ebp
0x080483f5 : sub $0x8,%esp
0x080483f8 : and $0xfffffff0,%esp
0x080483fb : mov $0x0,%eax
0x08048400 : add $0xf,%eax
0x08048403 : add $0xf,%eax
0x08048406 : shr $0x4,%eax
0x08048409 : shl $0x4,%eax
0x0804840c : sub %eax,%esp
0x0804840e : sub $0xc,%esp
0x08048411 : push $0x4d2
0x08048416 : call 0x80483dc
0x0804841b : add $0x10,%esp
0x0804841e : leave
0x0804841f : ret
End of assembler dump.
(gdb) disas 0x80482e1
Dump of assembler code for function _start:
0x080482c0 <_start+0>: xor %ebp,%ebp
0x080482c2 <_start+2>: pop %esi
0x080482c3 <_start+3>: mov %esp,%ecx
0x080482c5 <_start+5>: and $0xfffffff0,%esp
0x080482c8 <_start+8>: push %eax
0x080482c9 <_start+9>: push %esp
0x080482ca <_start+10>: push %edx
0x080482cb <_start+11>: push $0x8048474
0x080482d0 <_start+16>: push $0x8048420
0x080482d5 <_start+21>: push %ecx
0x080482d6 <_start+22>: push %esi
0x080482d7 <_start+23>: push $0x80483f2
0x080482dc <_start+28>: call 0x80482a0
0x080482e1 <_start+33>: hlt
0x080482e2 <_start+34>: nop
0x080482e3 <_start+35>: nop
End of assembler dump.

As you can see, we don't have the frame before main(). I didn't include that because it is most likely part of some runtime generated code, and therefore we can't actually get it. However, we do see that every other frame is accounted for. We can see exactly where in memory we are, and we should be able to use this information to dump a stack wherever we choose, without the issues with gnu backtrace().
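
If you'd rather not do the offset math by hand, GCC and Clang provide __builtin_frame_address(0), which returns the current function's frame pointer directly. The sketch below is a variation on the walk above, not the code from this post. It assumes the frame record layout seen in the gdb sessions (saved frame pointer first, return address right after it) and that frame pointers have not been omitted; the struct and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed frame record layout (frame pointers kept): slot 0 holds the
 * caller's frame pointer, slot 1 holds the return address. */
struct frame_record {
    struct frame_record *next;   /* saved frame pointer of the caller */
    void *return_addr;           /* where this function returns to */
};

/* Hypothetical helper: store up to `max` return addresses in `out`.
 * Returns how many were stored. */
__attribute__((noinline))
size_t walk_frames(void **out, size_t max)
{
    struct frame_record *fp = __builtin_frame_address(0);
    size_t n = 0;

    while (fp != NULL && n < max) {
        out[n++] = fp->return_addr;

        struct frame_record *next = fp->next;
        /* Sanity checks: the chain must move to a higher, aligned address
         * by a plausible step (1 MiB is an arbitrary limit). Otherwise we
         * are probably looking at garbage, so stop. */
        if ((uintptr_t)next <= (uintptr_t)fp ||
            ((uintptr_t)next & (sizeof(void *) - 1)) != 0 ||
            (uintptr_t)next - (uintptr_t)fp > 1024u * 1024u)
            break;
        fp = next;
    }
    return n;
}
```

Like the hand-rolled version, this does no allocation. It still depends on the compiler keeping frame pointers, so building with -fno-omit-frame-pointer is the safe bet.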

A word of caution: This has only been tested on cygwin, and powerpc & x86 linux. Your mileage may vary.