The random Copy&Paste thread
category: residue [glöplog]
http://www.nopantsday.com/wp/
tabyspelsallskap.ning.com
http://content4.novojoy.com/g16033/14.jpg
AddHandler php5-script .whale
Sophie Wilson, formerly Roger Wilson, is a British computer scientist. She was educated at Cambridge University. In 1978, she designed the Acorn Micro-Computer, the first of a long line of computers sold by Acorn Computers Ltd
(oh hauerha! :))
added on the 2009-05-14 21:51:17 by hermes
hermes
y once I believe..
added on the 2009-05-16 05:27:16 by bittin
bittin
So it's just Barkley and Cyberdwarf. Great. Gold Encrusted Pick Toss hits four
times, each dealing around 85 damage per hit. Gemstorm also hits all party
members for around 200 damage, so it's a real doozy. Level 2 Melf's Acid Arrow
not only inflicts around 220 damage, but also inflicts Diabetes on its
intended target. As you can see, Duergar is very powerful, and having only two
party members makes this battle a lot harder than normal bouts.
added on the 2009-05-16 06:13:27 by :confused:
:confused:
The random Copy&Paste thread
added on the 2009-05-16 11:32:09 by toxie
toxie
The crappy random paste thread
added on the 2009-05-16 14:23:08 by Joghurt
Joghurt
damn, saw naked pics of yours or maybe the one in pic is similar to you .... crazy lol
Terrornews #21 written as always by Dj mElLoW nOiSe ... [c] 1995 INFECT TM
-------------------------------------------------------------------------------
Today : Friedhuhn, die Ossischlampe !
Nein! Das sah gar nicht gut aus... Friedhuhn watschelte ueber den Gehweg.
Heute war der 31.12. und für's Anschaffen sah's gar nicht gut aus. 2 Kunden
bis jetzt und einer steckte ihr auch noch nen China-Böller unten rein.
Mit völlig zerfetzter Kiste rennt sie seit 20 Minuten hin und her. Kein Kunde
in Sicht. Jetzt ist Schluß. Sie hielt einen Polizeiwagen an , haute dem Beamten
den Kopf ab, packte ihn in das Handschuhfach und parkte CS-Gas dem Fahrer in's
Maul. Ja... Das war der blanke Haß. Sie nahm die Pumpgun aus dem Kofferraum
und hielt den nächsten Laster an. Nach Extremvergewaltigung an einem 78jährigen
Lasterfahrer rauschte sie mit dem Teil in die nächste City. Yo! Nun waren sie
dran die Rentner. Nix mit Konsummarken und mehr Kartoffeln im Heim. BLut,
Köpfe und Eingeweide standen bei Friedhuhn ganz oben auf der Liste. Ups --
Der Kinderwagen und die Frau... Passiert ! Alk in's Maul und ab geht der Gaul!
Friedhuhn kam 53mal in der Minute. Nach 2 Pommesständen, 1 Waldhütte und 4
MFS-Lebkuchen Fabriken fuhr sie sich über den Haufen, starb und erwürgte ...
Eigentlich nicht schlecht für'ne 42 jährige angebombte Drecksau. Bissl mehr
Action hätte sein können, aber es heißen ja nicht alle Gertrud und Rudolf-
NSLog(@"Book initialised.\nTitle: %@\nDescription: %@\nDownloaded: %@", self.title, self.description, self.isDownloaded ? @"Yes" : @"No");
(i'm wondering why the hell this is in one of my class files, the app is nothing to do with books :( )
11111100000000000000011111100000000000000011111110000000000000011111111110000000111111111
11110000000000000000011111000000000000000011111000000000000000011111111100000000011111111
11100000000000000000011110000000000000000011110000000000000000011111111000011000011111111
11000001111111111111111100001111111111111111100000111111111111111111111000111110001111111
10000111111111111111111000011111111111111111100001111111111111111111111000100110001111111
10000110000000000000011000110000000000000011100011000000000000011111110000100010000111111
10001100000000000000011000110000000000000011000010000000000000011111110001100011000111111
10001000000000000000011000110000000000000011000110000000000000011111110001000011000111111
10001000011111111111111000110001111111111111000110001111111111111111100001000001000011111
10001000000000011111111000110000000000001111000110001100000000011111100011000001100011111
10001100000000000111111000110000000000001111000110001100000000011111000011000001100011111
10000110000000000011111000110000000000001111000110001100000000011111000010000000100001111
11000011111111000001111000111111111111111111000110001111111100011111000110000000110001111
11000001111111100000111000111111111111111111000110001111111100011111000110000000110001111
11100000000000110000111000110000000000001111000110001100000100011110000100001000010000111
11110000000000011000111000110000000000001111000110001100000100011110001100011000011000111
11111110000000001000011000110000000000001111000110001100000100011110001000011100011000111
11111111111110001000011000110001111111111111000110001111000100011100011000011100001000011
10000000000000001000011000110000000000000011000110000000000100011100011000000000001100011
10000000000000011000111000110000000000000011000010000000000100011100010000100000001100011
00000000000000010000111000110000000000000011100011000000000100011000110000100000000100001
11111111111111110000111000011111111111111111100001111111111100011000110001111111111110001
11111111111111000001111100001111111111111111100000111111111100010000100001111111111110001
00000000000000000011111110000000000000000011110000000000000000000001100001100000000000000
10000000000000000111111111000000000000000011111000000000000000000001100001100000000000000
10000000000000011111111111100000000000000011111110000000000000000001000011100000000000000
0001110011111111100000000000000000111111111111111111111111111111111110000
0001100001111111100000000000000000111111111111111111111111111111111111000
0001100001111111000000000011110000111111111111111111111111111111111111100
0001100000111111000000000011111000011111111111111111111111111111111111110
0001100000111110000000000011111000011111111111111111111111111111110000000
0001100000111110000000000011111100011111111111111111111111100000000000000
0001100001111110000000000011111100111111111111111111111100000000000000000
0000100001111000000000000001111100111111111111111111110000000000000000000
0000110000111100000000000001111100111111111111111111111111000000000000000
0000011000111100000000000000010001111111111111111111111111111000000000000
0000001000011100000000000000000011111111111111111111111111111110000000000
0000011111111000000000000000001111111111111111111111111111111111000000000
0001111111111111111111000011111111111111111111111111111111111111100000000
0000111111111111111111111111111111111111111111111111111111111111110000000
0000000000111111111111111111111111111111111111111111111111111111111000000
You know text-completion? Fucking US patent department just gave IBM a patent on this, idiots.
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220090132950%22.PGNR.&OS=DN/20090132950&RS=DN/20090132950
Code:
forceware 183, private working set:
- 4072k: end: new engine
- 3908k: begin: load plugin tkfreetype2
- 3988k: end: load plugin tkfreetype2 +80k
- 3988k: begin: load plugin tkopengl
- 4788k: end: load plugin tkopengl +800k
- 5032k: begin: load plugin tksdl
- 7072k: end: load plugin tksdl +2040k
- 7332k: begin: load plugin tkui
- 7392k: end: load plugin tkui +60k
- 9644k end: scanner pass +5572k (since end: new engine) (2980k plugins, 2592k other)
- 16284k end: compiler pass +6640k
- 20936k <before openwindow>
(total = 20936-4072 = 16864k for scan/compile/init pass)
-> 16864/98 = ~172kbyte per class (and module) (inc. everything)
(vs. 18-Apr-2009 tks-preview.exe: -5MBytes, ~30% saved)
- 29268k <after openwindow> +8332k
- 29476k <ui running> (stable) +208k
[...] compile time: 266 millisec (2.71 millisec avg. per module).
[...] optimize time: 16 millisec (0.16 millisec avg. per module).
[...] #modules = 98
[...] #tokens = 134285
[...] #strings = 10011
[...] #tokenchars = 192071
[...] #lines = 28379
[...] #unopt.nodes = 50851
[...] #opt.nodes = 50076
[...] #nodebytes = 2670816 / 2883584
[...] #classes = 125
[...] #methods = 1330
[...] #members = 505
[...] #t.methods = 11760 / 24144
[...] #t.members = 2920 / 5442
[...] #classinits = 200
[dbg] Object::object_counter = 44212
[dbg] (38950 strings)
[dbg] (3474 script class instances)
[dbg] CachedObject::object_counter = 19584
[dbg] CachedObject::refname_counter = 16311
[dbg] CachedObject::copyname_counter = 3118
[dbg] ObjectPool: current pool usage:
[dbg] priority class 0: 8 / 2048 kbytes allocated, 0 kbytes used.
[dbg] priority class 1: 136 / 2048 kbytes allocated, 48 kbytes used.
[dbg] priority class 2: 492 / 2048 kbytes allocated, 398 kbytes used.
[dbg] peak_char_size = 345344
[dbg] total_char_size = 240178
-> ~4.73 tokens per LOC
-> ~0.35 strings per LOC
-> ~1.7645 nodes per LOC
-> ~21 lines per method
-> ~608 bytes memory per LOC
-> ~12614 total bytes memory per method
-> ~2008 node bytes per method (2670816/1330)
-> ~390 bytes memory per object
-> ~4970 bytes memory per script class instance
(the -> lines are just estimates but interesting anyway)
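For anyone who wants to re-derive the "->" estimates, here is a rough sketch that recomputes most of them from the counters in the log. The variable names are made up for illustration, and it assumes that "k" in the working-set figures means 1024 bytes, which the log does not actually say.

```python
# Counters copied from the log above. Names are illustrative only.
modules = 98
tokens = 134285
strings = 10011
lines = 28379
opt_nodes = 50076
node_bytes = 2670816
methods = 1330
objects = 44212
script_instances = 3474

# Working-set growth across the scan/compile/init pass.
# Assumption: 1k = 1024 bytes (not stated in the log).
delta_kb = 20936 - 4072
delta_bytes = delta_kb * 1024

ratios = {
    "kbytes per module": delta_kb / modules,
    "tokens per LOC": tokens / lines,
    "strings per LOC": strings / lines,
    "nodes per LOC": opt_nodes / lines,
    "lines per method": lines / methods,
    "bytes per LOC": delta_bytes / lines,
    "node bytes per method": node_bytes / methods,
    "bytes per object": delta_bytes / objects,
    "bytes per script class instance": delta_bytes / script_instances,
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Everything is a plain division, so swapping in the numbers from another run only means editing the constants at the top.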
(kudos to nvidia for fixing the forceware 175 bug regarding fluctuating and excessive memory use with gf8500gt on vista32 (the working set is now ~38mb smaller, ~29 instead of ~67 mbytes)
JSYK.
(and yea, it's a debug/test-run for a UI toolkit written in a VHLL)
(correction: forceware 185.85 instead of 183)
and it looks like this ;)
Marilyn Manson has admitted he cried when his tour manager refused to supply him with drugs recently. The shock-rocker revealed he was left upset by the move, saying: "About two hours ago Steve [Manson's tour manager] wouldn't get me any drugs. My eyes watered. Then he asks me if I am OK! It's ridiculous. It should be, 'Here's the mirror - and now are you OK?'"
0
Langeweile? Zur Bewußtseinserweiterung. Informationsgewinnung.
1
Der eine mit dem Pferdeschwanz schnitt ein paar Hanfblüten klein und vermischte sie mit Tabak. Dann stopfte der andere einen kleinen Metallbecher und zog aus einer umgebauten Cola-Flasche zwei Gelbe. Die Luft war dick und haschischgeschwängert. Alle Wege führen nach Bagdad. Alter. Er wollte eigentlich nur kurz vorbeischauen um etwas zu fragen. Der Eimer teilte eine Runde langsamer aber tiefer Schläge aus.
2
Der eine saß auf dem Sofa, weder in Erwartung was war, noch was kommen würde. Der mit den Kotletten grinste plötzlich breit. Er stand auf und holte eine Videokassette. Sie zogen sich "Speed" rein. Danach zogen sie sich Speed rein. Auf der "Speed"-Kassette. Der andere schaltete irgendwann, keiner bemerkte es, von MTV auf einen Pornofilm um. Und die Pizza kam nicht.
3
Der mit der Brille schaute immer nur zu. Ihm war das ganze neu, schien es den anderen. Und er sagte das auch so. Wenn sie wüßten. Manche Dinge nimmt man eben lieber als Scherz, oder Übertreibung.
4
Er erkannte sie nur undeutlich durch den Lärm und den Nebel. Aber er hatte Recht gehabt. Etwas anderes konnte er sich auch nicht leisten. Der Entzug zehrte an seinen Kräften. Und trieb ihn hinaus, ins Freie. Doch er riß sich zusammen, schaffte eine Annäherung und fand heraus was er wissen wollte. Sie hatte drei Pillen eingeworfen, eine davon LSD-beschichtet. Und sie wollte noch mehr. Er hatte mehr. Sagte er.
Er schob sie vor sich her durch den Durchgang zum Klo. Nach Links, zum Männerklo. Er schob sie in eine Kabine und schloß ab. Die Stille ließ alles unwirklich wirken. Sie war bereits mißtrauisch geworden. Er setzte sie grob auf die Schüssel. Blitzschnell zog er einen Plastikstreifen aus der Tasche und fesselte ihre Hände zusammen. Mit einem Klebeband verschloß er ihre Lippen.
Sie sagte mmh mmh.
Er sagte pssst, kein Problem, wir regeln das mit der Bezahlung heute nur ein bißchen anders. Sein Phallus sprang aus seiner Hose. Er zog die Vorhaut zurück und klebte eine kleine weiße Pille auf die schwitzende Eichel. Dann schob er seinen Schwanz unter ihr enges, bunt gestreiftes Hemdchen. Er schob ihn bis oben zwischen ihren Brüsten durch gegen ihr Kinn und drückte es hoch, so daß sie ihn durch ihre Brille entsetzt anstarren mußte. Er zog seinen Schwanz wieder zurück und stemmte ihn gegen das Klebeband vor ihren weichen Lippen. Mit einer Hand zog er es ab, mit der anderen blockierte er ihre Kiefer. Sein Phallus fuhr bis tief in ihre Kehle. Sie schnaufte und heulte durch die Nase, Rotz und Wasser liefen auf sein Schamhaar. Aber gierig leckte sie seine Eichel, bis ihre Zunge die kleine weiße Pille gefunden hatte. Er ergoß sich in ihren Mund und sie spülte die Tablette mit seinem Sperma
herunter.
Es war nicht so eine wie sie dachte, und sie fiel alsbald in tiefen Schlaf. Er streichelte ihr Haar, packte seinen Schwanz wieder ein und verschwand in die Nacht. Das Zittern hatte nachgelassen, aber nicht aufgehört.
5
Er war wieder auf dem Weg, auf der Suche. Schweißperlen standen auf seiner Stirn. Er bebte. Sein Schwanz zitterte. Er wartete auf den Aufzug, stieg ein und fuhr in den 15. Stock. Auf dem Weg kam ihm die rettende Idee. Dies war ein Studentenwohnheim. Er blieb im Aufzug stehen. Er band sich eine Schutzmaske aus Papier um.
Nach einer kurzen Zeit stieg sie ein. Wollte sie zum Cafe auf dem Dach. Zum Ausgang unten. Zwischen zwei Stockwerken hielt er den Aufzug an und konsumierte sie.
6
Es war schon fast hell, als er aus der letzten Diskothek kam, die Knochen bebten vom Entzug. Auf der gegenüberliegenden Straßenseite waren zwei Mädchen. Die eine lag auf dem Rücken auf dem Asphalt. Die andere stand daneben. Gibt es ein Problem fragte er und sein Herz schlug bereits höher. Er schaffte es, die beiden in seine Höhle zu bugsieren. Stunden später lag die eine bewußtlos in einer verrenkten Seitenlage nackt in der Ecke. Ihre blonden kurzen Haare war von ihrem eigenen Erbrochenen verklebt. Die andere hatte ihren Kopf auf seinem Schoß. Sie war verpackt mit Plastiklaschen und Paketklebeband, sonst trug sie nichts. Sie zitterte stark, doch ihre vor Schreck geweiteten Augen waren stumpf geworden. Er saß vollkommen zufrieden und beruhigt auf der Matratze und streichelte das Bündel auf seinem Schoß. Nachdem er sich Gedanken über die Entsorgung gemacht hatte, zückte er sein kleines Büchlein und trug die Dosis ein.
7
Er saß in seinem anderen Zimmer auf dem Boden vor dem Computer. Das einzige Licht kam aus dem Monitor. Er codete. Er konnte es, während die Dosis noch wirkte.
==Phrack Inc.==
Volume 0x0d, Issue 0x42, Phile #0x0D of 0x11
|=----------------------------------------------------------------------=|
|=---------=[ Hacking the Cell Broadband Engine Architecture ]=---------=|
|=-------------------=[ SPE software exploitation ]=--------------------=|
|=----------------------------------------------------------------------=|
|=--------------=[ By BSDaemon ]=----------=|
|=--------------=[ <bsdaemon *noSPAM* risesecurity_org> ]=----------=|
|=----------------------------------------------------------------------=|
"There are two ways of
constructing a software design.
One way is to make it so simple
that there are obviously no
deficiencies. And the other way
is to make it so complicated that
there are no obvious deficiencies"
- C.A.R. Hoare
------[ Index
1 - Introduction
1.1 - Paper structure
2 - Cell Broadband Engine Architecture
2.1 - What is Cell
2.2 - Cell History
2.2.1 - Problems it solves
2.2.2 - Basic Design Concept
2.2.3 - Architecture Components
2.2.4 - Processor Components
2.3 - Debugging Cell
2.3.1 - Linux on Cell
2.3.2 - Extensions to Linux
2.3.2.1 - User-mode
2.3.2.2 - Kernel-mode
2.3.3 - Debugging the SPE
2.4 - Software Development for Linux on Cell
2.4.1 - PPE/SPE hello world
2.4.2 - Standard Library Calls from SPE
2.4.3 - Communication Mechanisms
2.4.4 - Memory Flow Control (MFC) Commands
2.4.5 - Direct Memory Access (DMA) Commands
2.4.5.1 - Get/Put Commands
2.4.5.2 - Resources
2.4.5.3 - SPE 2 SPE Communication
3 - Exploiting Software Vulnerabilities on Cell SPE
3.1 - Memory Overflows
3.1.1 - SPE memory layout
3.1.2 - SPE assembly basics
3.1.2.1 - Registers
3.1.2.2 - Local Storage Addressing Mode
3.1.2.3 - External Devices
3.1.2.4 - Instruction Set
3.1.3 - Exploiting software vulnerabilities in SPE
3.1.3.1 - Avoiding Null Bytes
3.1.4 - Finding software vulnerabilities on SPE
4 - Future and other uses
5 - Acknowledgements
6 - References
7 - Notes on SDK/Simulator Environment
8 - Sources
------[ 1 - Introduction
This article is all about the Cell Broadband Engine Architecture [1], new
hardware designed by a joint venture between Sony [2], Toshiba [3] and
IBM [4]. As such, lots of architecture details will be explained, and also
many of the differences in developing for this platform.
The biggest difference between this article and others released on this
subject is the focus on exploiting the architecture itself and not on
using the powerful processor resources to break code [5]. It also focuses
on what sets the architecture apart, which means the SPU (synergistic
processor unit) and not the core (PPU - power processor unit) [6], since
the core is a slightly modified Power processor (which means all
shellcodes for Linux on Power will also work for the core, with just
small differences in code allocation and things like that).
It's important to mention that everything here about Cell tries to focus
on the Playstation3 hardware, since it's cheap and widely deployed, but
there are also big machines made with this processor [7], including the
#1 in the list of supercomputers [8].
---[ 1.1 - Paper structure
The idea of this paper is to complete the studies about Cell, putting
together all the information needed to do security research focused on
software exploitation for this architecture.
For that, the paper has been structured in two important portions:
Chapter 2 is all about the Cell Architecture and how to develop for
this architecture. It includes many samples and explains the
modifications done to Linux in order to get the best from this
architecture. It also gives the knowledge needed in order to go further
into software exploitation for this arch. Chapter 3 focuses on the
exploitation of the SPU processor, showing the simple memory layout it has
and how to write a shellcode for the purpose of gaining control over an
application running inside the SPU.
------[ 2 - Cell Broadband Engine Architecture
From the IBM Research [9]: "The Cell Architecture grew from a challenge
posed by Sony and Toshiba to provide power-efficient and cost-effective
high-performance processing for a wide range of applications, including
the most demanding consumer appliance: game consoles. Cell - also known as
the Cell Broadband Engine Architecture (CBEA) - is an innovative solution
whose design was based on the analysis of a broad range of workloads in
areas such as cryptography, graphics transform and lighting, physics,
fast-Fourier transforms (FFT), matrix operations, and scientific
workloads. As an example of innovation that ensures the clients' success,
a team from IBM Research joined forces with teams from IBM Systems
Technology Group, Sony and Toshiba, to lead the development of a novel
architecture that represents a breakthrough in performance for consumer
applications. IBM Research participated throughout the entire development
of the architecture, its implementation and its software enablement,
ensuring the timely and efficient application of novel ideas and
technology into a product that solves real challenges."
It's impossible not to get excited about this. Such a 'powerful' and
versatile architecture, completely different from what we usually see, is
an amazing thing to research for software vulnerabilities. Also, since
it's supposed to be widely deployed, there will be an infinite number of
new vulnerabilities coming in the near future. I wanted to exploit
those vulnerabilities.
---[ 2.1 - What is Cell
As must already be clear to the reader, I'm not talking about phones here.
Cell is a new architecture, which comes to solve some of the current
problems in the computer industry.
It's compatible with a well-known architecture, the Power Architecture,
keeping most of its advantages and solving most of its problems (if you
cannot wait to know which problems, go to section 2.2.1).
---[ 2.2 - Cell History
The focus of this section is just to give the reader a timeline, without
going into detail.
The architecture was born from a joint venture between IBM, Sony and
Toshiba, formed in 2000.
They opened a design center in March 2001, based in Austin, Texas (USA).
In the spring of 2004, a single Cell BE became operational. In the summer
of the same year, a 2-way SMP version was released.
The first technical disclosures came only in February 2005, with the
simulator [10] and open-source SDK [11] (more on that later) being released
in November of the same year. In the same month, Mercury started to sell
Cell (yeah, sell Cell sounds funny) machines.
Cell Blades were announced by IBM in February of 2006. The SDK 1.1 was
released in July of the same year, with many improvements. The latest
version is 3.1.
---[ 2.2.1 - Problems it solves
Computer technology has been evolving over the years, but always
struggling with some barriers and trying to avoid them.
Those barriers are physically impossible to bypass, and that's why the
processor clock stopped growing and multi-core architectures came into
focus.
Basically we have three big walls (barriers) to speed growth:
- Power wall
It's related to the CMOS technology limits and the hard limit to
the acceptable system power
- Memory wall
Many comparisons and improvements trying to avoid the DRAM latency
when compared to the processor frequency
- Frequency wall
Diminishing return from deeper pipelines
For a new architecture to work and be widely deployed, it was also
important to preserve existing investments in software development.
Cell accomplishes that by being compatible with the 64-bit Power
Architecture, and attacks the walls in the following ways:
- Non-homogeneous coherent multi-processor and high design
frequency at a low operating voltage with advanced power
management attacks the 'power wall'.
- Streaming DMA architecture and three-level memory model (main
storage, local storage and register files) attacks the 'memory
wall'.
- Non-homogeneous coherent multi-processor, highly-optimized
implementation and large shared register files with software controlled
branching to allow deeper pipelines attacks the 'frequency wall'.
It has been developed to support any OS, which means it supports
real-time operating systems as well as non-real-time operating systems.
---[ 2.2.2 - Basic Design Concept
The basic concept behind Cell is its asymmetric multi-core design. That
permits a powerful design, but of course requires specifically developed
applications to get the most out of the architecture.
Knowing that, it becomes clear that understanding the new component,
which is called SPU (synergistic processor unit) or SPE (synergistic
processor element), proves to be essential - see the next section for a
better understanding of the differences between SPU and SPE.
---[ 2.2.3 - Architecture Components
In Cell what we have is a core processor, called the Power Processor
Element (PPE), which controls tasks, and synergistic processor elements
(SPEs) for data-intensive processing.
The SPE consists of the synergistic processor unit (SPU), which is a
processor itself, and the memory flow control (MFC), responsible for
data movement and synchronization, as well as for the interface with the
high-performance element interconnect bus (EIB).
Communication with the EIB is done at 16B/cycle, which means that each
SPU is connected at that speed to the bus, which supports
96B/cycle.
Refer to the picture architecture-components.jpg in the directory images
of the attached file for a visual of the above explanation.
---[ 2.2.4 - Processor Components
As said, the Power Processor Element (PPE) is the core processor which
controls tasks (scheduling). It is a general purpose 64-bit RISC processor
(Power architecture).
It's 2-way hardware multithreaded, with 32KB L1 I and D caches and a
512KB L2 cache.
It has support for real-time operations, like locking the L2 cache and the
TLB (it also supports a TLB managed by hardware and software). It has
bandwidth and resource reservation and mediated interrupts.
It's also connected to the EIB using a 16B/cycle channel (figure
processor-components.jpg).
The EIB itself supports four 16-byte data rings with simultaneous
transfers per ring (it will be clarified later).
This bus supports over 100 simultaneous transactions, achieving more than
25.6 Gbytes/sec in each direction at each bus data port.
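The per-port figure follows from the port width and the bus clock. The sketch below shows the arithmetic. The 1.6 GHz bus clock (half of a 3.2 GHz core clock) is an assumption brought in for illustration and is not stated in the text above, and the names are hypothetical.

```python
# Assumption: the EIB is clocked at 1.6 GHz, half of a 3.2 GHz core clock.
# This value is not given in the text above.
EIB_CLOCK_HZ = 1_600_000_000

PORT_BYTES_PER_CYCLE = 16  # per-port width quoted in the text
BUS_BYTES_PER_CYCLE = 96   # aggregate bus figure quoted in the text


def gbytes_per_sec(bytes_per_cycle, clock_hz=EIB_CLOCK_HZ):
    """Convert a bytes-per-cycle figure into Gbytes/sec (1 GB = 10**9 bytes)."""
    return bytes_per_cycle * clock_hz / 1e9


port_bandwidth = gbytes_per_sec(PORT_BYTES_PER_CYCLE)
bus_bandwidth = gbytes_per_sec(BUS_BYTES_PER_CYCLE)

print(f"per port, per direction: {port_bandwidth:.1f} GB/s")
print(f"aggregate bus: {bus_bandwidth:.1f} GB/s")
```

Under that clock assumption, 16B/cycle works out to the 25.6 Gbytes/sec quoted for each port.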
On the other side, the synergistic processor element is a simple RISC
user-mode architecture supporting dual-issue VMX-like, graphics SP-float
and IEEE DP-float.
It is important to note that the SPE itself has dedicated resources: a
unified 128 x 128 bit register file and 256KB of local storage. Each SPE
has a dedicated DMA engine, supporting 16 requests.
The memory management on this architecture simplifies its use, with the
local storage of the SPE being aliased into the PPE system memory (figure
processor-components2.jpg).
The MFC in the SPE acts as the MMU, providing control over SPE DMA
accesses. It's compatible with the PowerPC Virtual Memory layout and is
software controllable using PPE MMIO.
DMA access supports 1,2,4,8...n*16 bytes transfer, with a maximum of 16 KB
for I/O, and with two different queues for DMA commands: Proxy & SPU
(more on this later).
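The size rule above can be read as: a transfer is 1, 2, 4 or 8 bytes, or a multiple of 16 bytes up to 16 KB. A minimal sketch of that reading follows. The helper names are hypothetical and not part of any SDK, and alignment rules are ignored because the text does not cover them.

```python
MAX_DMA_BYTES = 16 * 1024  # 16 KB upper bound quoted in the text


def is_valid_dma_size(size):
    """Return True if size is 1, 2, 4, 8 or a multiple of 16 up to 16 KB."""
    if size in (1, 2, 4, 8):
        return True
    return 0 < size <= MAX_DMA_BYTES and size % 16 == 0


def split_transfer(total):
    """Split a length that is a multiple of 16 into chunks of at most 16 KB."""
    if total <= 0 or total % 16 != 0:
        raise ValueError("total must be a positive multiple of 16")
    chunks = []
    while total > 0:
        chunk = min(total, MAX_DMA_BYTES)
        chunks.append(chunk)
        total -= chunk
    return chunks


print(is_valid_dma_size(8), is_valid_dma_size(24), is_valid_dma_size(32))
print(split_transfer(40000))
```

Anything larger than 16 KB has to be issued as several commands, which is what split_transfer models.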
The EIB is also connected to a broadband interface controller (BIC). The
purpose of this controller is to provide external connectivity for
devices. It supports two configurable interfaces (60 GB/s) with a
configurable number of bytes, coherent (BIF) and/or I/O (IOIFx) protocols,
using two virtual channels per interface, and multiple system
configurations.
The memory interface controller (MIC) is also connected to the EIB and is
a Dual XDR controller (25.6 GB/s) with ECC and suspended DRAM support
(figure processor-components3.jpg).
Two more components are still missing: the internal interrupt controller
(IIC) and the I/O Bus Master Translation (IOT) (figure
processor-components4.jpg).
The IIC handles the SPE interrupts as well as the external interrupts and
interrupts coming from the coherent interconnect and the IOIF0 and IOIF1.
It is also responsible for the interrupt priority level control and for
the interrupt generation ports for IPI. Note that the IIC is duplicated
for each PPE hardware thread.
The IOT translates bus addresses to system real addresses, supporting two
levels of translation:
- I/O segments (256 MB)
- I/O pages (4K, 64K, 1M, 16M bytes)
Other interesting features are the per-page I/O device identifier for LPAR
use (blades) and the IOST/IOPT caches managed by software and hardware.
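As a rough illustration of the two-level lookup, the sketch below splits
an I/O bus address into a segment index, a page index inside that segment
and a page offset. It only uses the sizes listed above. The structure and
function names are hypothetical and the real IOST/IOPT entry formats are
not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define IO_SEGMENT_SHIFT 28u  /* 256 MB I/O segments */

/* Hypothetical result of splitting a bus address (not a hardware format). */
struct io_addr_parts {
    uint64_t segment;  /* index of the 256 MB segment */
    uint64_t page;     /* page index inside that segment */
    uint64_t offset;   /* byte offset inside the page */
};

/* page_shift selects the page size: 12 (4K), 16 (64K), 20 (1M) or 24 (16M). */
struct io_addr_parts io_addr_split(uint64_t bus_addr, unsigned page_shift)
{
    struct io_addr_parts p;
    uint64_t in_segment;

    assert(page_shift == 12 || page_shift == 16 ||
           page_shift == 20 || page_shift == 24);

    in_segment = bus_addr & ((UINT64_C(1) << IO_SEGMENT_SHIFT) - 1);
    p.segment = bus_addr >> IO_SEGMENT_SHIFT;
    p.page    = in_segment >> page_shift;
    p.offset  = in_segment & ((UINT64_C(1) << page_shift) - 1);
    return p;
}
```

For example, with 4K pages the address 0x12345678 falls in segment 1,
page 0x2345, offset 0x678.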
---[ 2.3 - Debugging Cell
As the bus is a high-speed circuit, it's really difficult to debug the
architecture and to see what is going on.
For that reason, and also to make it easier to develop software for Cell,
IBM Research developed a Cell simulator [10] in which you may run Linux
and install the software development kit [11].
The Brazilian team of the IBM Linux Technology Center developed an Eclipse
plugin that works as an IDE for the debugger and the SDK. Putting it all
together, it is possible to have the toolkit installed on a Linux machine,
running the frontends for the simulator and for the SDK. The debugging
interface is much better using these frontends. Anyway, it's important to
notice that it's just a frontend for the normal and well-known Linux tools
with extended support for the Cell processor (GDB and GCC).
---[ 2.3.1 - Linux on Cell
Linux on Cell is an open-source git branch and is provided in the PowerPC
64 kernel line.
It started in 2.6.15 and is evolving to support many new features, like
the scheduling improvements for the SPUs (SPU contexts can now be
preempted, and my big friend Andre Detsch, who reviewed this article, was
one of the biggest contributors to making this code stable).
Linux gained a heterogeneous lwp/thread model, with a new SPE thread
model (really similar to the pthreads library, as we will see later),
supporting user-mode direct and indirect SPE access and fully preemptive
SPE context management. For that, spe_ptrace() was created and support
for it was added to GDB, along with spe_schedule() for thread to physical
SPE assignment (it is no longer FIFO - run until completion).
As a note, the SPE threads share their address space with the parent PPE
process (using DMA), with demand paging for SPE accesses and a hardware
page table shared with the PPE.
An implementation detail is the PPE proxy thread allocated for each SPE to
provide a single namespace for both PPE and SPE and to assist in
SPE-initiated C99 and POSIX library services.
All the event, error and signal handling for SPEs is done by the parent
PPE thread.
The ELF objects for SPE are wrapped into PPE objects with an extended GLD.
---[ 2.3.2 - Extensions to Linux
Here I'll try to provide some details about Linux running on Cell
hardware. The base hardware used for this reference is a Playstation 3,
which has 8 SPUs, but one is reserved for redundancy and another one is
used by the hypervisor for a custom OS (in this case, Linux).
All the details are valid for any Linux on Cell and we will take a
top-down approach.
---[ 2.3.2.1 - User-mode
Cell supports both 32- and 64-bit Power applications, as well as 32- and
64-bit Cell workloads. It has different programming modes, like RPC,
device subsystems and direct/indirect access.
As already said, it has heterogeneous threads: single SPU, SPU groups and
shared memory support.
It runs over an SPE management runtime library, in 32- and 64-bit
versions. This library interacts with the SPUFS filesystem
(/spu/thread#/) in the following ways:
* Open, close, read, write the files:
- mem
This file provides access to the local storage
- regs
Access to the 128 registers of 128 bits each
- mbox
SPE to PPE mailbox
- ibox
SPE to PPE interrupt mailbox
- xbox_stat
Get the mailbox status
- signal1
Signal notification access
- signal2
Signal notification access
- signalx_type
Signal type
- npc
Read/write SPE next program counter (for debugging)
- fpcr
SPE floating point control/status register
- decr
SPE decrementer
- decr_status
SPE decrementer status
- spu_tag_mask
Access tag query mask
- event_mask
Access spe event mask
- srr0
Access spe state restore register 0
* Open, close, mmap the files:
- mem
Program State access of the Local Storage
- signal1
Direct application access to signal 1
- signal2
Direct application access to signal 2
- cntl
Direct application access to SPE controls, DMA queues and
mailboxes
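To make the file-based interface more concrete, here is a minimal sketch
of how a PPE program could fetch one mailbox entry by reading 4 bytes from
a spufs file. The function name is hypothetical, the path is whatever the
caller passes in (for example a file under /spu/thread#/), and the runtime
library certainly does more bookkeeping than this.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read one 32-bit mailbox entry from a spufs file such as
 * /spu/thread#/mbox (the path is supplied by the caller).
 * Returns 0 on success and -1 if the file cannot be opened or no
 * complete 4-byte entry could be read.
 */
int read_mbox_word(const char *path, uint32_t *value)
{
    ssize_t n;
    int fd = open(path, O_RDONLY | O_NONBLOCK);

    if (fd < 0) {
        perror(path);
        return -1;
    }
    n = read(fd, value, sizeof(*value));
    close(fd);
    return (n == (ssize_t)sizeof(*value)) ? 0 : -1;
}
```

Opening the file on every read is only done to keep the sketch short. A
real program would keep the descriptor open and poll it.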
The library also provides SPE task control system calls (to interact with
the SPE system calls implemented in kernel-mode), which are:
- sys_spu_create_thread
Allocates a SPE task/context and creates a directory in SPUFS
- sys_spu_run
Activates a SPU task/context on a physical SPE and
blocks in the kernel as a proxy thread to handle the events
already mentioned
Some functions provided by the library are related to the management of
the spe tasks, like spe create group, create thread, get/set affinity,
get/set context, get event, get group, get ls, get ps area, get threads,
get/set priority, get policy, set group defaults, group max, kill/wait,
open/close image, write signal, read in_mbox, write out_mbox, read mbox
status.
Besides the standard 32- and 64-bit PowerPC ELF (binary) interpreters, an
SPE object loader is provided. It is responsible for understanding the
extension to the normal objects already mentioned and for initiating the
loading of the SPE threads.
Going down, we have the glibc and other GNU libraries, both supporting 32
and 64 bits.
---[ 2.3.2.2 - Kernel-mode
The next layer is the normal system-call interface, where we have the SPU
management framework (through special files in the spufs) and
modifications in the exec* interface, in a 64-bit kernel.
This modification is done through a special misc binary format, called the
SPU object loader extension.
Of course there are other kernel extensions, such as the SPUFS filesystem,
which provides the management interface and the SPU allocation, scheduling
and dispatch.
Also, we have the Cell BE architecture-specific code, supporting multiple
and large pages, SPE event & fault handling, IIC and IOMMU.
Everything is controlled by a hypervisor, since Linux is what is called a
custom OS when running on Playstation 3 hardware. The hypervisor is
responsible for protecting the 'secret key' of the hardware, so knowing
how to exploit SPU vulnerabilities, plus some fuzzing of the hypervisor,
may be the knowledge needed to break the game copy protection of this
hardware.
---[ 2.3.3 - Debugging the SPE
The SDK for Linux on Cell provides good resources for debugging and for
better understanding what is going on.
It's important to note the environment variables that control the
behaviour of the system.
So, if you set SPU_INFO, for example, the SPE runtime library will print
messages when loading an SPE ELF executable (see below).
---------- begin output ----------
# export SPU_INFO=1
# ./test
Loading SPE program: ./test
SPU LS Entry Addr : XXX
---------- end output ----------
And it will also print messages before starting up a new SPE thread, like:
---------- begin output ----------
Starting SPE thread 0x..., to attach debugger use: spu-gdb -p XXX
---------- end output ----------
When planning to use spu-gdb to debug an SPU thread, it's important to
remember the SPU_DEBUG_START environment variable, which includes
everything provided by SPU_INFO and also stops the thread until a
debugger is attached or a signal is received.
Since each SPU register can hold multiple fixed (or floating) point values
of different sizes, GDB presents each register as a data structure that
can be accessed in different formats. So, by specifying the field in the
data structure, we can also update it using different sizes:
---------- begin output ----------
(gdb) ptype $r70
type = union __gdb_builtin_type_vec128 {
int128_t uint128;
float v4_float[4];
int32_t v4_int32[4];
int16_t v8_int16[8];
int8_t v16_int8[16];
}
(gdb) p $r70.uint128
$1 = 0x00018ff000018ff000018ff000018ff0
(gdb) set $r70.v4_int32[2]=0xdeadbeef
(gdb) p $r70.uint128
$2 = 0x00018ff000018ff0deadbeef00018ff0
---------- end output ----------
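The effect of that update can be reproduced outside GDB. The following is
a small model, in portable C, of a 128-bit register seen as four
big-endian 32-bit words. The type and function names are mine, not GDB's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* A 128-bit SPU register modeled as 16 bytes, most significant byte first. */
struct vec128 {
    uint8_t byte[16];
};

/* Store a 32-bit value into word slot 0..3, big-endian like the SPU. */
void vec128_set_word(struct vec128 *r, unsigned slot, uint32_t value)
{
    assert(slot < 4);
    r->byte[slot * 4 + 0] = (uint8_t)(value >> 24);
    r->byte[slot * 4 + 1] = (uint8_t)(value >> 16);
    r->byte[slot * 4 + 2] = (uint8_t)(value >> 8);
    r->byte[slot * 4 + 3] = (uint8_t)value;
}

/* Format the whole register as 32 hex digits (out needs 33 bytes). */
void vec128_to_hex(const struct vec128 *r, char out[33])
{
    int i;

    for (i = 0; i < 16; i++)
        sprintf(out + 2 * i, "%02x", r->byte[i]);
}
```

Filling the four slots with 0x00018ff0 and then writing 0xdeadbeef into
slot 2 gives the same 00018ff000018ff0deadbeef00018ff0 pattern printed by
GDB above: slot 2 is the third word counting from the left.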
To help you see when the SPU code starts executing, and to follow it from
there, GDB also includes an interesting option:
---------- begin output ----------
(gdb) set spu stop-on-load
(gdb) run
...
(gdb) info registers
---------- end output ----------
Another important point when debugging your code is to understand the
internal sizes and be prepared for overlapping. Useful information can
be obtained using the following code fragment inside your SPU program
(careful: it does not free the allocated memory).
--- code ---
#include <stdio.h>
#include <stdlib.h>   /* malloc() */
#include <unistd.h>   /* sbrk() */

/* Symbols provided by the linker: end of text, of data and of bss */
extern int _etext;
extern int _edata;
extern int _end;

void meminfo(void)
{
    printf("\n&_etext: %p", (void *) &_etext);
    printf("\n&_edata: %p", (void *) &_edata);
    printf("\n&_end: %p", (void *) &_end);
    printf("\nsbrk(0): %p", sbrk(0));
    /* Never freed: it only shows where the heap grows */
    printf("\nmalloc(1024): %p", malloc(1024));
    printf("\nsbrk(0): %p", sbrk(0));
}
--- end code ---
And of course you can also play with the GCC and LD arguments to have more
debugging info:
--- code ---
# vi Makefile
CFLAGS += -g
LDFLAGS += -Wl,-Map,map_filename.map
--- end code ---
---[ 2.4 - Software Development for Linux on Cell
In this chapter I will introduce the internals of Cell development,
giving the basic knowledge necessary to better understand the further
chapters.
---[ 2.4.1 - PPE/SPE hello world
Every program in Cell that uses the SPEs needs to have at least two source
files: one for the PPE and another one for the SPE.
The following is a simple program to run on the SPE (it's also in the
attached tar file):
--- code ---
#include <stdio.h>

/* SPE entry point: speid, argp and envp are supplied by the PPE side */
int main(unsigned long long speid, unsigned long long argp, unsigned long long envp)
{
    printf("\nHello World!\n");
    return 0;
}
--- end code ---
The Makefile for this code will look like:
--- code ---
PROGRAM_spu = hello_spu
LIBRARY_embed = hello_spu.a
IMPORTS = $(SDKLIB_spu)/libc.a
include $(TOP)/make.footer
--- end code ---
Of course it looks like any normal code. The PPE, as already explained, is
responsible for creating the new thread and allocating it on the SPE:
--- code ---
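#include <stdio.h>
#include <libspe.h>

/* Handle to the SPE program embedded in this binary (see the Makefile) */
extern spe_program_handle_t hello_spu;

int main(void)
{
    speid_t speid;
    int status;

    /* Create the SPE thread; argp and envp are both NULL here */
    speid = spe_create_thread(0, &hello_spu, NULL, NULL, -1, 0);
    /* Block until the SPE thread finishes */
    spe_wait(speid, &status, 1);
    return 0;
}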
--- end code ---
With the following Makefile:
--- code ---
DIRS = spu
PROGRAM_ppu = hello_ppu
IMPORTS = ../spu/hello_spu.a -lspe
include $(TOP)/make.footer
--- end code ---
The reader will notice that the speid in the PPE program has the same
value as the speid in the main of the SPE.
Also, the arguments passed to spe_create_thread() are the ones received
by the SPE program when running (argp and envp are equal to NULL in our
sample).
It is important to remember that, when compiled, this program will
generate a binary in the spu directory, called hello_spu, and another one
in the root directory of this example, called hello_ppu, which CONTAINS
the embedded hello_spu.
---[ 2.4.2 - Standard Library Calls from SPE
When the SPE program needs to use any standard library call, like, for
example, printf or exit, it has to call back to the PPE main thread.
It uses a simple stop-and-signal assembly instruction with standardized
argument values (this is important to remember, since it is needed in
shellcodes for the SPE).
That value is returned from the ioctl call and the user thread must react
to that. This means copying the arguments from the SPE Local Storage,
executing the library call and then calling ioctl again.
The instruction according to the manual:
"stop u14 - Stop and signal. Execution is stopped, the current
address is written to the SPU NPC register, the value u14 is
written to the SPU status register, and an interrupt is sent to
the PPU."
This is a disassembly output of the hello_spu program:
---------- begin output ----------
# spu-gdb ./hello_spu
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "--host=powerpc64-unknown-linux-gnu --target=spu"...
(gdb) disassemble main
Dump of assembler code for function main:
0x00000170 <main+0>: ila $3,0x340 <.rodata>
0x00000174 <main+4>: stqd $0,16($1)
0x00000178 <main+8>: nop $127
0x0000017c <main+12>: stqd $1,-32($1)
0x00000180 <main+16>: ai $1,$1,-32
0x00000184 <main+20>: brsl $0,0x1a0 <puts> # 1a0
0x00000188 <main+24>: ai $1,$1,32 # 20
0x0000018c <main+28>: fsmbi $3,0
0x00000190 <main+32>: lqd $0,16($1)
0x00000194 <main+36>: bi $0
0x00000198 <main+40>: stop
0x0000019c <main+44>: stop
End of assembler dump.
(gdb)
---------- end output ----------
---[ 2.4.3 - Communication Mechanisms
The architecture offers three main communication mechanisms:
- DMA
Used to move data and instructions between main storage and
a local storage. SPEs rely on asynchronous DMA transfers to hide
memory latency and transfer overhead by moving information in
parallel with SPU computation.
- Mailbox
Used for control communications between an SPE and the
PPE or other devices. Mailboxes hold 32-bit messages. Each
SPE has two mailboxes for sending messages and one mailbox for
receiving messages.
- Signal Notification
Used for control communications from the PPE or
other devices. Signal notification (also known as signalling)
uses 32-bit registers that can be configured for
one-sender-to-one-receiver signalling or
many-senders-to-one-receiver signalling.
All three are controlled and implemented by the SPE MFC, and their
importance is related to the way the vulnerable program will receive its
input.
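The difference between the two signalling configurations can be modeled
in a few lines of C. This is only a sketch of my understanding of the
hardware behaviour (a single sender overwrites the register, many senders
are OR-ed together, and a read clears it). All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names for the two signalling configurations. */
enum signal_mode {
    SIGNAL_ONE_TO_ONE,   /* one sender: a new value replaces the old one */
    SIGNAL_MANY_TO_ONE   /* many senders: new values are OR-ed together */
};

struct signal_reg {
    enum signal_mode mode;
    uint32_t value;
};

/* What a sender's write does to the 32-bit signal register. */
void signal_write(struct signal_reg *reg, uint32_t value)
{
    if (reg->mode == SIGNAL_MANY_TO_ONE)
        reg->value |= value;
    else
        reg->value = value;
}

/* Reading returns the pending value and resets the register (assumed). */
uint32_t signal_read(struct signal_reg *reg)
{
    uint32_t v = reg->value;

    reg->value = 0;
    return v;
}
```

With two writes of 0x1 and 0x4, the one-to-one register ends up holding
0x4 while the many-to-one register holds 0x5, which is why the second mode
lets every sender own one bit.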
---[ 2.4.4 - Memory Flow Control (MFC) Commands
This is the main mechanism for the SPE to access the main storage and
maintain synchronization with other processors and devices in the system.
MFC commands can be issued either by the SPE itself, or by the processor
and other devices, as follows:
- Code running on the SPU issues an MFC command by executing a
series of writes and/or reads using channel instructions.
- Code running on the PPU or any other device issues an MFC
command by performing a series of stores and/or loads to
memory-mapped I/O (MMIO) registers in the MFC.
The MFC commands are then queued in one of two independent queues:
- MFC SPU Command Queue - For channel-initiated commands by the
associated SPU
- MFC Proxy Command Queue - For MMIO-initiated commands by the PPE
or other devices.
---[ 2.4.5 - Direct Memory Access (DMA) Commands
The MFC commands that transfer data are referred to as DMA commands. The
transfer direction of a DMA command is named from the SPE point of view:
- Into a SPE (from main storage to the local storage) -> get
- Out of a SPE (from local storage to the main storage) -> put
---[ 2.4.5.1 - Get/Put Commands
DMA get from the main memory to the local storage:
(void) mfc_get (volatile void *ls, uint64_t ea, uint32_t size,
uint32_t tag, uint32_t tid, uint32_t rid)
DMA put into the main memory from the local storage:
(void) mfc_put (volatile void *ls, uint64_t ea, uint32_t size,
uint32_t tag, uint32_t tid, uint32_t rid)
To guarantee the ordering of the writes to the main memory, there are
these options:
- mfc_putf: the 'f' means fenced, that is, all commands issued
before it within the same tag group must finish first, while
later ones may complete before it
- mfc_putb: the 'b' here means barrier, that is, the barrier
command and all commands issued thereafter are NOT executed
until all previously issued commands in the same tag group have
been performed
---[ 2.4.5.2 - Resources
For DMA operations the system uses DMA transfers with variable sizes (1,
2, 4, 8 and n*16 bytes, n being an integer, of course). There is a
maximum of 16 KB per DMA transfer, and 128-byte alignment offers better
performance.
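These rules are easy to get wrong when computing sizes by hand, so here
is a minimal sketch that checks them. The function names are hypothetical,
a size of 0 is treated as invalid for simplicity, and the alignment hint
is read here as 128 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DMA_MAX_SIZE 16384u  /* 16 KB per DMA transfer */

/* True if size is 1, 2, 4, 8 or a multiple of 16 bytes up to 16 KB. */
bool dma_size_valid(uint32_t size)
{
    if (size == 1 || size == 2 || size == 4 || size == 8)
        return true;
    return size != 0 && size <= DMA_MAX_SIZE && (size % 16) == 0;
}

/* Round a byte count up to the next legal DMA size (size must be 1..16 KB). */
uint32_t dma_round_up(uint32_t size)
{
    uint32_t r;

    assert(size >= 1 && size <= DMA_MAX_SIZE);
    if (size <= 8) {
        r = 1;
        while (r < size)
            r *= 2;
    } else {
        r = (size + 15u) & ~15u;
    }
    assert(dma_size_valid(r));
    return r;
}

/* True if both addresses sit on a 128-byte boundary (the fast case). */
bool dma_is_128_aligned(uint32_t ls_addr, uint64_t ea)
{
    return (ls_addr % 128) == 0 && (ea % 128) == 0;
}
```

So a 24-byte structure cannot be moved as is: it has to be padded to 32
bytes, which is what dma_round_up(24) returns.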
The DMA queues are defined per SPU, with a 16-element queue for
SPU-initiated requests and an 8-element queue for PPU-initiated requests.
SPU-initiated requests always have a higher priority.
To differentiate each DMA command, they receive a tag, a 5-bit
identifier. The same identifier can be applied to multiple commands, since
it's used for polling the status or waiting on the completion of the DMA
commands.
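Since a tag is 5 bits wide, the 32 tag groups map naturally onto the bits
of one 32-bit word, which is how masks and status values are handled. The
helpers below are a sketch of that bookkeeping only (hypothetical names,
no real MFC access).

```c
#include <assert.h>
#include <stdint.h>

#define DMA_TAG_COUNT 32u  /* a 5-bit tag gives groups 0..31 */

/* Bit that represents one tag group in a tag mask or tag status word. */
uint32_t dma_tag_bit(unsigned tag)
{
    assert(tag < DMA_TAG_COUNT);
    return UINT32_C(1) << tag;
}

/* True once every group selected in mask is reported complete in status. */
int dma_tags_all_done(uint32_t mask, uint32_t status)
{
    return (status & mask) == mask;
}

/* True as soon as any group selected in mask is reported complete. */
int dma_tags_any_done(uint32_t mask, uint32_t status)
{
    return (status & mask) != 0;
}
```

Waiting on tags 3 and 31 together means building the mask
dma_tag_bit(3) | dma_tag_bit(31) and checking the status against it.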
A great feature provided is the DMA list, where a single DMA command can
cause the execution of a list of transfer requests (stored in local
storage). Lists implement scatter-gather functions and may contain up to
2K transfer requests.
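To give an idea of what building such a list involves, the sketch below
splits one contiguous region of main storage into list elements of at
most 16 KB each. The element structure is a simplified, hypothetical one
(the real element format is defined by the SDK headers), and a true
scatter-gather list would of course use unrelated addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_MAX_SIZE      16384u  /* 16 KB per transfer */
#define DMA_LIST_MAX_ELEM 2048u   /* up to 2K transfer requests per list */

/* Hypothetical list element: where in main storage, and how many bytes. */
struct dma_list_elem {
    uint64_t ea;
    uint32_t size;
};

/*
 * Split one contiguous main-storage region into list elements of at most
 * 16 KB each. total must be a multiple of 16 bytes. Returns the number of
 * elements written, or 0 if the region does not fit into max_elems.
 */
size_t dma_list_build(struct dma_list_elem *list, size_t max_elems,
                      uint64_t ea, uint32_t total)
{
    size_t n = 0;

    assert(total % 16 == 0);
    if (max_elems > DMA_LIST_MAX_ELEM)
        max_elems = DMA_LIST_MAX_ELEM;

    while (total > 0) {
        uint32_t chunk = total < DMA_MAX_SIZE ? total : DMA_MAX_SIZE;

        if (n == max_elems)
            return 0;
        list[n].ea = ea;
        list[n].size = chunk;
        ea += chunk;
        total -= chunk;
        n++;
    }
    return n;
}
```

A 40000-byte region starting at 0x10000 becomes three elements: two of
16384 bytes and a last one of 7232 bytes at 0x18000.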
---[ 2.4.5.3 - SPE 2 SPE Communication
An address in another SPE's local storage is represented as a 32-bit
effective address (global address).
An SPE issuing a DMA command needs a pointer to the other SPE's local
storage. The PPE code can obtain the effective address of an SPE's local
storage:
--- code ---
#include <libspe.h>
speid_t speid;
void *spe_ls_addr;
spe_ls_addr=spe_get_ls(speid);
--- end code ---
This permits the PPE to give the SPEs each other's local storage addresses
and to control the communications. Vulnerabilities may arise no matter
what the communication flow is, even without involving the PPE itself.
The following is a simple DMA demo program between PPE and SPE (see the
attached file for the complete version). This program will send an address
in the PPE to the SPE through DMA:
--- PPE code ---
/* Argument block fetched by the SPE through DMA: keep it 128-byte aligned */
information_sent is[1] __attribute__ ((aligned (128)));
spe_gid_t gid;
speid_t speid;
int status;
int *pointer = (int *) malloc(128);

gid = spe_create_group(SCHED_OTHER, 0, 1);
if (spe_group_max(gid) < 1) {
    printf("\nOops, there is no free SPE to run it...\n");
    exit(EXIT_FAILURE);
}
is[0].addr = (unsigned int) pointer;
/* Create the SPE thread */
speid = spe_create_thread(gid, &hello_dma, (unsigned long long *) &is[0], NULL, -1, 0);
/* Wait for the SPE to complete */
spe_wait(speid, &status, 0);
/* Best practice: Issue a sync before ending - This is good for us ;) */
__asm__ __volatile__ ("sync" : : : "memory");
--- end code ---
--- SPE code ---
#include <spu_mfcio.h>

/* Local storage copy of the argument block, 128-byte aligned for DMA */
information_sent is __attribute__ ((aligned (128)));

int main(unsigned long long speid, unsigned long long argp, unsigned long long envp)
{
    /* Where:
       is -> Address in local storage to place the data
       argp -> Main memory address
       sizeof(is) -> Number of bytes to read
       31 -> Associated tag to this DMA (from 0 to 31)
       0 -> Not useful here (just when using caching)
       0 -> Not useful here (just when using caching)
    */
    mfc_get(&is, argp, sizeof(is), 31, 0, 0);
    /* The tag mask is always 1 left-shifted by the value of your tag */
    mfc_write_tag_mask(1u << 31);
    /* Wait until the DMA issued above completes */
    mfc_read_tag_status_all();
    return 0;
}
--- end code ---
And now between two SPEs (again, for the complete code, please refer to
the attached sources):
--- PPE code ---
speid_t speid[2];
speid[0]=spe_create_thread (0, &dma_spe1, NULL, NULL, -1, 0);
speid[1]=spe_create_thread (0, &dma_spe2, NULL, NULL, -1, 0);
/* Get the local storage address of each SPE */
for (i=0; i<2; i++) local_store[i]=spe_get_ls(speid[i]);
/* Send SIGKILL to the SPE threads */
for (i=0; i<2; i++) spe_kill(speid[i], SIGKILL);
--- end code ---
--- SPE code ---
/* Write something to the PPE */
spu_write_out_mbox(buffer);
/* Read something from the PPE */
pointer = spu_read_in_mbox();
/* DMA interface */
mfc_get(buffer, pointer, size, tag, 0, 0);
wait_on_mask(1<<tag);
/* DMA something to the second SPE */
mfc_put(buffer, local_store[1], size, tag, 0, 0);
wait_on_mask(1<<tag);
/* Notify the PPE */
spu_write_out_mbox(1);
--- end code ---
------[ 3 - Exploiting Software Vulnerabilities on Cell SPE
I love the architecture manuals and the engineers and the way they talk
about really dumb design choices:
"The SPU Local Store has no memory protection, and memory access wraps
from the end of the Local Store back to the beginning. An SPU program is
free to write anywhere in the Local Store including its own instruction
space. A common problem in SPU programming is the corruption of the SPU
program text when the stack area overflows into the program area. This
problem typically does not become apparent until some later point in the
program execution when the program attempts to execute code in area that
was corrupted, which typically results in illegal instruction exception.
Even with a debugger it can be difficult to track down this type of
problem because the cause and effect can occur far apart in the program
execution. Adding printf's just moves failure point around".
---[ 3.1 - Memory Overflows
Given the aforementioned memory design of the SPU, it is already clear
that when an attacker controls the overwrite size it's really easy to
exploit an SPU vulnerability, just replacing the original program .text
with the attacker's one.
It's important to note that the SPU interrupt facility can be configured
to branch to an interrupt handler at address 0 if an external condition is
true (bisled, "branch indirect and set link if external data", is the
instruction used to check whether there is external data available).
Since the memory layout wraps around, it's always possible to overwrite
this handler if it's being used.
Another important note is the fact that instructions in memory MUST be
aligned on word boundaries.
There are instruction and data caches for the local storage (depending on
the implementation details), so it's important to make sure that:
- You are overflowing a large enough amount of data to avoid
caching
- You are not using a self-modifying shellcode unless you issue
the sync instruction (see [13] for references)
---[ 3.1.1 - SPE memory layout
The memory layout for the SPE looks like:
------------------------ -> 0x3FFFF
SPU ABI Reserved Usage
------------------------ | Stack grows from the
Runtime Stack | higher addresses to
------------------------ | the lower addresses.
Global Data |
------------------------ \/
.Text
------------------------ -> 0x00000
To test your application, it's really useful to run the 'size' utility on
it:
---------- begin output ----------
# size hello_spu
text data bss dec hex filename
1346 928 32 2306 902 hello_spu
---------- end output ----------
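From those numbers it is simple to estimate how much room is left for the
heap and the stack before they collide with the program. A minimal
sketch, assuming the 256 KB local storage shown in the layout above and
ignoring the ABI reserved area and any alignment padding (the function
name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define LS_SIZE 0x40000u  /* 256 KB of local storage, 0x00000..0x3FFFF */

/*
 * Bytes left for the heap and the stack once text, data and bss (the
 * first three columns printed by size) are loaded. ABI reserved areas
 * and alignment padding are ignored.
 */
uint32_t ls_free_bytes(uint32_t text, uint32_t data, uint32_t bss)
{
    uint32_t used = text + data + bss;

    assert(used <= LS_SIZE);
    return LS_SIZE - used;
}
```

For hello_spu this gives 262144 - 2306 = 259838 bytes, so an overflow
has a long way to go before it reaches the program text, unless it wraps
around.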
---[ 3.1.2 - SPE assembly basics
In order to develop a shellcode, it's important to understand how the SPE
assembly differs from PowerPC assembly.
The SPE uses RISC-based assembly, which means there is a small set of
instructions, and everything in the SPE runs in user-mode (there is no
kernel-mode for the SPE). That said, we need to remember that there are no
system calls; instead, there are PPE calls (stop instructions).
It is also a big-endian architecture (keep that in mind while reading the
following sections).
This architecture provides many ways to avoid branches in the code for
maximum efficiency. Since that is not a real concern while exploiting
software, I won't cover it, and I will also skip the SIMD instructions.
For more information, refer to the SPU Instruction Set Architecture
document [12].
---[ 3.1.2.1 - Registers
I already explained a little about the way the architecture works, so in
this section I'll just describe the available register set and how to use
it.
The SPE does not define a condition register, so the comparison
operations produce results that are either all zeros (false) or all ones
(true), with the same width as the operands being tested. These results
are used to do bitwise masking, instruction selection or conditional
branching.
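The masking idea is easier to see in code. The sketch below models it in
plain C, assuming a true comparison sets every bit of the result and a
false one clears them all. The function names are mine and this is not
SPU intrinsic code.

```c
#include <stdint.h>

/* Word compare modeled as all ones if equal, all zeros otherwise. */
uint32_t cmp_eq_mask(uint32_t a, uint32_t b)
{
    return (uint32_t)0 - (uint32_t)(a == b);
}

/* Bitwise select: take bits from if_true where mask is 1, else if_false. */
uint32_t select_bits(uint32_t if_false, uint32_t if_true, uint32_t mask)
{
    return (if_true & mask) | (if_false & ~mask);
}

/* Branch-free version of "x == key ? hit : miss". */
uint32_t pick_if_equal(uint32_t x, uint32_t key, uint32_t hit, uint32_t miss)
{
    return select_bits(miss, hit, cmp_eq_mask(x, key));
}
```

No branch is taken in pick_if_equal(): the comparison result itself
decides which bits survive.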
As on any other platform, there are general purpose registers and special
purpose registers in the SPE:
- General Purpose Registers (0-127) Used in different ways by the
instructions. In the second word of R1 you have the information
about the amount of free space in the stack (the room between
end of the heap and the start of the stack).
- Special Purpose Registers
The SPE also supports 128 special-purpose registers. Some
interesting ones:
* SRR0 - Save and Restore Register 0 - Holds the address
used by the interrupt return (iret) instruction
* LR - Link Register - All branch instructions that set
the link register will force the address of the next
instruction to be loaded on this register
* CTR - Count Register - Usually it's used to hold a loop
counter (like the loop instruction and %ecx register in
intel x86 architecture)
* CR - Condition Register - Used to perform conditional
comparisons
To move data between Special Purpose Registers and General Purpose
Registers we have the instructions:
* mtspr (move to special purpose register)
* mfspr (move from special purpose register)
---[ 3.1.2.2 - Local Storage Addressing Mode
In order to address information to/from Local Storage the instructions
use the following structure:
Instruction_Opcode l10_field RA_field RT_field
8-bit 10-bit 7-bit 7-bit
Where: the signed value of the l10 field is appended with 4 zeros and then
added to the preferred slot of RA, forcing the 4 rightmost bits of the
sum to zero. Then, the 16 bytes at that local storage address are loaded
into RT (or stored from it).
The preferred slot, from the architecture point of view, is the leftmost
4 bytes (not bits) of the register.
It is important to note here that the IBM convention specifies that:
l10 means a 10-bit immediate value
RA means a general purpose register to be used as
source/destination
RT means a general purpose register to be used as destination
(target)
Knowing that makes it easier to understand why the Local Storage Address
Space is limited to 4 GB.
The actual size of the Local Storage can be viewed by accessing the LSLR
(local storage limit register). All effective addresses are ANDed with
the value in the LSLR before being used.
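Putting the addressing rule and the LSLR together, the address touched by
a load or store can be computed as in the sketch below. It is a model in
portable C with hypothetical names, assuming a 256 KB local store (LSLR =
0x3FFFF). Note that the assembler syntax takes a byte offset, so
"lqd $0, 16($1)" carries the value 1 in its l10 field.

```c
#include <assert.h>
#include <stdint.h>

#define LSLR_256K 0x3FFFFu  /* limit register value for a 256 KB local store */

/* Sign-extend the 10-bit immediate field of a d-form load/store. */
int32_t sext10(uint32_t i10)
{
    assert(i10 < 1024u);
    return (i10 & 0x200u) ? (int32_t)i10 - 1024 : (int32_t)i10;
}

/*
 * Local storage address used by "lqd rt, imm(ra)" / "stqd rt, imm(ra)":
 * (immediate * 16 + preferred slot of RA), low four bits cleared, then
 * ANDed with the LSLR so that it wraps inside the local store.
 */
uint32_t dform_ls_address(uint32_t ra_pref, uint32_t i10, uint32_t lslr)
{
    uint32_t sum = ra_pref + (uint32_t)(sext10(i10) * 16);

    return sum & lslr & ~0xFu;
}
```

With $1 = 0x3FFF0, an offset of 16 gives address 0x00000 and not 0x40000:
this is the wrap-around that makes overwriting the start of the local
storage possible from the top of the stack.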
---[ 3.1.2.3 - External Devices
The SPU can send/receive data to/from external devices using the channel
interface. The channel instructions use quadwords (128 bits) to transfer
data between general purpose registers and the channel device (which
supports 128 channels).
---[ 3.1.2.4 - Instruction Set
Here are some useful instructions to be used while developing a shellcode
for the SPE.
Instruction                                  Operands
    Description
    Sample
-------------------------------------------------------------------------
lqd (load quadword)                          rt,symbol(ra)
    Loads a value (16 bytes) from Local Storage (pointed to by RA) into
    the general purpose register RT.
    lqd $0, 16($1)
stqd (store quadword)                        rt,symbol(ra)
    The contents of the register RT are stored at the local storage
    address pointed to by RA.
    stqd $0, 16($1)
ilh (immediate load halfword)                rt,symbol
    The value of l16 is placed in register RT.
    ilh $0, 0x1a0
il (immediate load word)                     rt,symbol
    The value of l16 is expanded to 32 bits by replicating the leftmost
    bit and then written to RT.
    il $0, 0x1a0
nop (no operation)                           rt
    This instruction uses a false RT and nothing is changed.
    nop $127
ila (immediate load address)                 rt,symbol
    The value of l18 is placed in the rightmost 18 bits of RT (the
    remaining bits of RT are zeroed).
    ila $3, 0x340
a (add word)                                 rt,ra,rb
    The operand in register RA is added to the operand in register RB
    and the result is written to RT.
    a $0, $1, $2
ai (add word immediate)                      rt,ra,value
    The value (l10 field) is added to the operand in RA and the result
    is written to RT.
    ai $1, $1, -32
brsl (branch relative and set link)          rt,symbol
    Execution proceeds to the target instruction and a link register is
    set (the symbol is of l16 type and is extended on the right with
    two 0 bits). The address of the current instruction is added to the
    symbol address for the branch. The address of the next instruction
    is written to the preferred slot of the RT register.
    brsl $0, 0x1a0
fsmbi (form select mask for bytes immediate) rt,symbol
    The symbol is a l16 value used to create a mask in the register RT
    by copying each bit eight times. Bits in the operand are related to
    bytes in the result in a left-to-right correspondence.
    fsmbi $3, 0
bi (branch indirect)                         ra
    Execution proceeds to the address in the preferred slot of RA. The
    two rightmost bits of RA are ignored (supposed to be zero). There
    are two flags, D and E, to disable and enable interrupts.
    bi $0
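Some of these descriptions are clearer when written down as code. The
following sketch models the value computed by il, ila, brsl and fsmbi in
portable C, following only the descriptions above. The function names are
hypothetical, and the LSLR masking of the branch target is left out.

```c
#include <stdint.h>

/* Sign-extend a 16-bit immediate field. */
int32_t sext16(uint16_t i16)
{
    return (i16 & 0x8000u) ? (int32_t)i16 - 65536 : (int32_t)i16;
}

/* il: 16-bit immediate sign-extended to a 32-bit word. */
uint32_t il_value(uint16_t i16)
{
    return (uint32_t)sext16(i16);
}

/* ila: 18-bit immediate placed in the low bits, upper bits zeroed. */
uint32_t ila_value(uint32_t i18)
{
    return i18 & 0x3FFFFu;
}

/* brsl: the target is pc + (sign-extended i16 * 4). */
uint32_t brsl_target(uint32_t pc, uint16_t i16)
{
    return pc + (uint32_t)(sext16(i16) * 4);
}

/* brsl: the link value is the address of the next instruction. */
uint32_t brsl_link(uint32_t pc)
{
    return pc + 4u;
}

/* fsmbi: bit i of the immediate (leftmost first) becomes byte i. */
void fsmbi_mask(uint16_t i16, uint8_t out[16])
{
    int i;

    for (i = 0; i < 16; i++)
        out[i] = (i16 & (0x8000u >> i)) ? 0xFFu : 0x00u;
}
```

Taking the brsl from the hello_spu disassembly as an example: it sits at
0x184 and calls puts at 0x1a0, so the encoded l16 value is
(0x1a0 - 0x184) / 4 = 7 and the link value left in $0 is 0x188.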
---[ 3.1.3 - Exploiting Software Vulnerabilities in SPE
First of all, it's important to make it even clearer that it is
impossible to, for example, force the SPE process to execute a new command
(a.k.a. execve() shellcodes). The same goes for network-based library
functions and others: as already explained, we need the PPE to proxy them
for us.
So this opens two new paths:
- Create a PPE shellcode to be used while exploiting PPE software
vulnerabilities that will spawn a proxy for commands received by
the SPE and will create a SPE thread to do all the job -> This
is pure PPC shellcode and this article already discussed
everything needed to achieve that. In the attached sources you
have samples in the directory cell-ppe/ [16].
- Create vulnerability-specific code for the SPE, which will
print out internal program information related to the exploited
SPE. This is especially interesting and difficult because:
* You need to remember that the SPE uses an instruction cache,
so sometimes, if you overflow just a small number of bytes,
it will be especially difficult to get it executed
* If you use the wrap-around characteristics of the memory
layout of the SPE, you will probably also overwrite the
information you are interested in.
On the other hand, it's important to say that all the information
will always be in the same place (or, easier to understand: there is no
ASLR in the SPE). Running the attached samples (especially the SPE-SPE
communication one, because it prints the pointer addresses) will make
this clear to the reader.
---[ 3.1.3.1 - Avoiding Null Bytes
It is important to avoid null bytes, so we cannot use the NOP instruction
in our shellcode.
This creates a problem, since the ori instruction will also generate a
null byte if used with 0 as an argument (e.g: ori $1, $1, 0).
A good replacement is the or instruction (e.g: or $1, $1, $1) or the use
of multiple instructions (which will reduce the probability of hitting
your return address).
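When assembling a payload by hand it helps to check it mechanically. The
sketch below scans an array of 32-bit instruction words for null bytes.
The function names are hypothetical and it knows nothing about the
instruction encodings themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* True if any of the four bytes of a 32-bit instruction word is 0x00. */
int insn_has_null_byte(uint32_t insn)
{
    return ((insn >> 24) & 0xFFu) == 0 || ((insn >> 16) & 0xFFu) == 0 ||
           ((insn >> 8) & 0xFFu) == 0 || (insn & 0xFFu) == 0;
}

/* Index of the first word containing a null byte, or -1 if all are clean. */
long shellcode_first_null(const uint32_t *code, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (insn_has_null_byte(code[i]))
            return (long)i;
    return -1;
}
```

Run it over the assembled words before sending the payload: any index
other than -1 points at an instruction that has to be replaced.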
---[ 3.1.4 - Finding software vulnerabilities on SPE
The simulator provided by IBM has a feature that monitors selected
addresses or regions of the Local Store for read and write accesses. This
feature can help identify stack overflow conditions.
It is invoked from the simulator command window as follows:
enable_stack_checking [spu_number] [spu_executable_filename]
This procedure uses the nm system utility to determine the area of the
Local Storage that will contain the program code and creates trigger
functions to trap writes by the SPU into this region.
It is important to notice that this approach only looks for writes to the
text and static data, not to the heap. Of course, the same approach
used by this feature could be used to help the creation of a fuzzer using
TCL scripts based on the one provided.
------[ 4 - Future and other uses
I can't foresee the future, but this kind of architecture is becoming
more and more common and will open a wide range of new vulnerabilities.
The complexity behind this kind of asymmetric multi-threaded architecture
is even higher than that of the normal ones. The lack of memory protection
will also help attackers to subvert those systems. The main processor
being based on an already well-known architecture (PowerPC) also helps the
dissemination of malicious code.
Many other researchers are doing stuff using Cell:
- Nick Breese presented the Crackstation project at BlackHat [5].
Basically, he used the SIMD capabilities and big registers
provided by the architecture to crack passwords [5]
- IBM Researchers released a study about the usage of the Cell SPU
as a Garbage Collector Co-processor [14]
- Maybe there are JTAG-based interfaces on the Cell machines to try
to use RiscWatch [15]
- Unfortunately, SPU access is controlled by the PPE, so running
integrity protection mechanisms from the SPU seems infeasible ->
Anyway, I wrote a network traffic analyzer using Cell as the base
architecture.
------[ 5 - Acknowledgments
A lot of people helped me along the way in the research that
resulted in something fun to publish; you all know who you are.
Special thanks to the Phrack Staff for the great review of the article,
giving a lot of important insights about how to better structure it and
giving real value to it.
I always need to thank Filipe Balestra, my research partner, for
sharing with me his ideas, feedback, comments and experience, improving
the article and the samples a lot.
I'll never forget to thank my research team and friends at
RISE Security (http://www.risesecurity.org) for always keeping me
motivated to study completely new things. Be sure that the unix-asm [16]
project will be updated soon with all the material shown here and many
more types of shellcode for the architecture. Also, of course, the
updates will be available for Metasploit.
Big thanks to the Cell kernel guru, Andre Detsch, for sharing with me his
ideas and discussing the internals of the Linux implementation for Cell.
Thanks to the conference organizers who invited me to talk about Cell
software exploitation; even after many people had already talked about
Cell, they trusted that my talk was not about brute-forcing (yeah, a lot
of fun in completely different cultures).
To my girlfriend, who waited for me (alone, I suppose) during these
travels.
It's impossible not to thank COSEINC for letting me keep doing this
research on important company time.
------[ 6 - References
[1] Cell Broadband Engine Architecture, v1.01 October 2006
http://cell.scei.co.jp/pdf/CBE_Architecture_v101.pdf
[2] Sony Computer Entertainment
http://www.sony.com
[3] Toshiba Corporation
http://www.toshiba.com
[4] IBM Corporation
http://www.ibm.com
[5] Breese, Nick; "Crackstation"; Black Hat Europe 2008
http://www.blackhat.com/presentations/bh-europe08/Bresse/Presentation/bh-eu-08-breese.pdf
[6] IBM Power Architecture
http://www-03.ibm.com/chips/power/
[7] IBM Bladecenter QS21
http://www.ibm.com/systems/bladecenter/hardware/servers/qs21/index.html
[8] IBM Roadrunner Supercomputer
http://en.wikipedia.org/wiki/IBM_Roadrunner
[9] The cell project at IBM Research
http://www.research.ibm.com/cell/
[10] Cell Simulator
http://www.alphaworks.ibm.com/tech/cellsystemsim
[11] Cell resource center at developerWorks (SDK download)
http://www-128.ibm.com/developerworks/power/cell/
[12] Synergistic Processor Unit Instruction Set Architecture v1.2
http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/76CA6C7304210F3987257060006F2C44/$file/SPU_ISA_v1.2_27Jan2007_pub.pdf
[13] Moore, H.D; "Mac OS X PPC Shellcode Tricks"; Uninformed Magazine 2005
http://www.uninformed.org/?v=1&a=1&t=txt
[14] Cher, Chen-Yong; Gschwind, Michael; "Cell GC: Using the Cell Synergistic Processor as a Garbage Collector Coprocessor"; 2008
http://www.research.ibm.com/cell/papers/2008_vee_cellgc_slides.pdf
[15] RISCWatch Debugger
http://www.ibm.com/chips/techlib/techlib.nsf/products/RISCWatch_Debugger
[16] Carvalho, Ramon de; "Cell PPE Shellcodes"; RISE Security;
http://www.risesecurity.org/papers/lopbuffer.pdf
Others:
PowerPC User Instruction Set Architecture, Book I, v2.02 January 2005
http://moss.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/arch/PPC_Vers202_Book1_public.pdf
PowerPC Virtual Environment Architecture, Book II, v2.02 January 2005
http://moss.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/arch/PPC_Vers202_Book2_public.pdf
PowerPC Operating Environment Architecture, Book III, v2.02 January 2005
http://moss.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/arch/PPC_Vers202_Book3_public.pdf
Cell developer's corner at power.org
http://www.power.org/resources/devcorner/cellcorner/
Linux info at the Barcelona Supercomputing Center website
http://www.bsc.es/projects/deepcomputing/linuxoncell
------[ 7 - Notes on SDK/Simulator Environment
There are some pictures of the simulator and SDK running in the attached
file: images/cell-sim1.jpg and images/cell-sim2.jpg
To install the SDK/Simulator:
- Download the Cell SDK ISO image from the IBM alphaWorks website.
- Mount the disk image on the mount directory:
mount -o loop CellSDK<version>.iso /mnt/phrack
- Change directory to /mnt/phrack/software.
- Install the SDK by using the following command and answer any
prompts:
./cellsdk install
To start the simulator:
cd /opt/IBM/systemsim-cell/run/cell/linux
../run_gui
Click on the 'go' button to start the simulated system.
To copy files to the simulated system (run this inside it):
callthru source /home/bsdaemon/Phrack/hello_ppu > hello_ppu
Then give the file the correct permissions and execute it:
chmod +x hello_ppu
./hello_ppu
------[ 8 - Sources [cell_samples.tgz]
Attached are all the samples used in this article, to be compiled on a
Linux system running on a Cell machine.
Further updates will be available in the RISE Security website at:
http://www.risesecurity.org
For the author's public key:
http://www.kernelhacking.com/rodrigo/docs/public.txt
begin 644 cell_samples.tgz
end
--------{ EOF
Volume 0x0d, Issue 0x42, Phile #0x0D of 0x11
|=----------------------------------------------------------------------=|
|=---------=[ Hacking the Cell Broadband Engine Architecture ]=---------=|
|=-------------------=[ SPE software exploitation ]=--------------------=|
|=----------------------------------------------------------------------=|
|=--------------=[ By BSDaemon ]=----------=|
|=--------------=[ <bsdaemon *noSPAM* risesecurity_org> ]=----------=|
|=----------------------------------------------------------------------=|
"There are two ways of
constructing a software design.
One way is to make it so simple
that there are obviously no
deficiencies. And the other way
is to make it so complicated that
there are no obvious deficiencies"
- C.A.R. Hoare
------[ Index
1 - Introduction
1.1 - Paper structure
2 - Cell Broadband Engine Architecture
2.1 - What is Cell
2.2 - Cell History
2.2.1 - Problems it solves
2.2.2 - Basic Design Concept
2.2.3 - Architecture Components
2.2.4 - Processor Components
2.3 - Debugging Cell
2.3.1 - Linux on Cell
2.3.2 - Extensions to Linux
2.3.2.1 - User-mode
2.3.2.2 - Kernel-mode
2.3.3 - Debugging the SPE
2.4 - Software Development for Linux on Cell
2.4.1 - PPE/SPE hello world
2.4.2 - Standard Library Calls from SPE
2.4.3 - Communication Mechanisms
2.4.4 - Memory Flow Control (MFC) Commands
2.4.5 - Direct Memory Access (DMA) Commands
2.4.5.1 - Get/Put Commands
2.4.5.2 - Resources
2.4.5.3 - SPE 2 SPE Communication
3 - Exploiting Software Vulnerabilities on Cell SPE
3.1 - Memory Overflows
3.1.1 - SPE memory layout
3.1.2 - SPE assembly basics
3.1.2.1 - Registers
3.1.2.2 - Local Storage Addressing Mode
3.1.2.3 - External Devices
3.1.2.4 - Instruction Set
3.1.3 - Exploiting software vulnerabilities in SPE
3.1.3.1 - Avoiding Null Bytes
3.1.4 - Finding software vulnerabilities on SPE
4 - Future and other uses
5 - Acknowledgements
6 - References
7 - Notes on SDK/Simulator Environment
8 - Sources
------[ 1 - Introduction
This article is all about the Cell Broadband Engine Architecture [1], new
hardware designed by a joint effort between Sony [2], Toshiba [3] and
IBM [4]. As such, lots of architecture details will be explained, along
with many of the development differences for this platform.
The biggest difference between this article and others released on
this subject is the focus on exploiting the architecture rather than on
using the powerful processor resources to break code [5] and, of course,
the focus on what sets the architecture apart, which means the SPU
(synergistic processor unit) and not the core (PPU - power processor
unit) [6], since the core is a slightly modified Power processor (which
means all shellcodes for Linux on Power will also work on the core, and
there are just small differences in code allocation and things like
that).
It's important to mention that everything about Cell here tries to focus
on the Playstation3 hardware, since it's cheap and widely deployed, but
there are also big machines made with this processor [7], including the
#1 in the list of supercomputers [8].
---[ 1.1 - Paper structure
The idea of this paper is to round out the existing studies of Cell by
putting together all the information needed to do security research
focused on software exploitation for this architecture.
For that, the paper has been structured in two main parts:
Chapter 2 is all about the Cell architecture and how to develop for
it. It includes many samples and explains the modifications made to
Linux in order to get the best from this architecture. It also gives the
knowledge needed to go further into software exploitation for this arch.
Chapter 3 focuses on the exploitation of the SPU processor, showing the
simple memory layout it has and how to write shellcode for the purpose of
gaining control over an application running inside the SPU.
------[ 2 - Cell Broadband Engine Architecture
From the IBM Research [9]: "The Cell Architecture grew from a challenge
posed by Sony and Toshiba to provide power-efficient and cost-effective
high-performance processing for a wide range of applications, including
the most demanding consumer appliance: game consoles. Cell - also known as
the Cell Broadband Engine Architecture (CBEA) - is an innovative solution
whose design was based on the analysis of a broad range of workloads in
areas such as cryptography, graphics transform and lighting, physics,
fast-Fourier transforms (FFT), matrix operations, and scientific
workloads. As an example of innovation that ensures the clients' success,
a team from IBM Research joined forces with teams from IBM Systems
Technology Group, Sony and Toshiba, to lead the development of a novel
architecture that represents a breakthrough in performance for consumer
applications. IBM Research participated throughout the entire development
of the architecture, its implementation and its software enablement,
ensuring the timely and efficient application of novel ideas and
technology into a product that solves real challenges."
It's impossible not to get excited about this. Such a 'powerful' and
versatile architecture, completely different from what we usually see, is
an amazing thing to research for software vulnerabilities. Also, since
it's supposed to be widely deployed, there will be a huge number of
new vulnerabilities coming in the near future. I wanted to exploit
those vulnerabilities.
---[ 2.1 - What is Cell
As must already be clear to the reader, I'm not talking about phones here.
Cell is a new architecture that came to solve some of the current
problems in the computer industry.
It's compatible with a well-known architecture, the Power
Architecture, keeping most of its advantages and solving most of its
problems (if you cannot wait to know which problems, go to section
2.2.1).
---[ 2.2 - Cell History
The focus of this section is just to give the reader a timeline, without
going into detail.
The architecture was born from a joint effort between IBM, Sony and
Toshiba, formed in 2000.
They opened a design center in March 2001, based in Austin, Texas (USA).
In the spring of 2004, a single Cell BE became operational. In the summer
of the same year, a 2-way SMP version was released.
The first technical disclosures came only in February 2005, with the
simulator [10] and open-source SDK [11] (more on that later) being
released in November of the same year. In the same month, Mercury started
to sell Cell (yeah, sell Cell sounds funny) machines.
Cell blades were announced by IBM in February 2006. SDK 1.1 was
released in July of the same year, with many improvements. The latest
version is 3.1.
---[ 2.2.1 - Problems it solves
Computer technology has been evolving over the years, but has always
run up against certain barriers and tried to work around them.
Those barriers cannot physically be bypassed, which is why processor
clocks stopped growing and the focus moved to multi-core architectures.
Basically we have three big walls (barriers) to further speed growth:
- Power wall
It's related to the limits of CMOS technology and the hard limit on
acceptable system power
- Memory wall
DRAM latency is high compared to the processor frequency, and many
improvements have tried to hide it
- Frequency wall
Diminishing returns from deeper pipelines
For a new architecture to work and be widely deployed, it was also
important to preserve existing investments in software development.
Cell accomplishes that by being compatible with the 64-bit Power
Architecture, and attacks the walls in the following ways:
- A non-homogeneous coherent multi-processor and a high design
frequency at a low operating voltage with advanced power
management attack the 'power wall'.
- A streaming DMA architecture and a three-level memory model (main
storage, local storage and register files) attack the 'memory
wall'.
- A non-homogeneous coherent multi-processor, a highly optimized
implementation and large shared register files with software-controlled
branching to allow deeper pipelines attack the 'frequency wall'.
It has been developed to support any OS, which means it supports
real-time operating systems as well as non-real-time ones.
---[ 2.2.2 - Basic Design Concept
The basic concept behind Cell is its asymmetric multi-core design. That
permits a powerful design, but of course requires specifically developed
applications to get the most out of the architecture.
Knowing that, it becomes clear that understanding the new component,
which is called the SPU (synergistic processor unit) or SPE (synergistic
processor element), proves to be essential - see the next section for a
better understanding of the differences between SPU and SPE.
---[ 2.2.3 - Architecture Components
In Cell what we have is a core processor, called the Power Processor
Element (PPE), which controls tasks, and synergistic processor elements
(SPEs) for data-intensive processing.
The SPE consists of the synergistic processor unit (SPU), which is a
processor in itself, and the memory flow control (MFC), responsible for
data movement and synchronization, as well as for the interface with the
high-performance element interconnect bus (EIB).
Communication with the EIB happens at 16B/cycle, which means that each
SPU is connected at that speed to the bus, which supports
96B/cycle.
Refer to the picture architecture-components.jpg in the images directory
of the attached file for a visual of the above explanation.
---[ 2.2.4 - Processor Components
As said, the Power Processor Element (PPE) is the core processor which
controls tasks (scheduling). It is a general-purpose 64-bit RISC processor
(Power architecture).
It's 2-way hardware multithreaded, with 32KB L1 instruction and data
caches and a 512KB L2 cache.
It has support for real-time operations, like locking the L2 cache and the
TLB (it also supports both hardware- and software-managed TLBs). It has
bandwidth and resource reservation and mediated interrupts.
It's also connected to the EIB using a 16B/cycle channel (figure
processor-components.jpg).
The EIB itself supports four 16-byte data rings with simultaneous
transfers per ring (it will be clarified later).
This bus supports over 100 simultaneous transactions, achieving more than
25.6 Gbytes/sec in each direction at each bus data port.
On the other side, the synergistic processor element is a simple RISC
user-mode architecture supporting dual-issue VMX-like, graphics SP-float
and IEEE DP-float.
It is important to note that the SPE itself has dedicated resources: a
unified register file of 128 x 128-bit registers and 256KB of local
storage. Each SPE has a dedicated DMA engine, supporting 16 requests.
The memory management of this architecture simplifies its use, with the
local storage of the SPE being aliased into the PPE system memory (figure
processor-components2.jpg).
The MFC in the SPE acts as the MMU, providing control over the SPE DMA
accesses; it's compatible with the PowerPC Virtual Memory layout and is
software controllable using PPE MMIO.
DMA access supports 1, 2, 4, 8 and n*16 byte transfers, with a maximum of
16 KB for I/O, and with two different queues for DMA commands: Proxy & SPU.
The EIB is also connected to a broadband interface controller (BIC). The
purpose of this controller is to provide external connectivity for
devices. It supports two configurable interfaces (60 GB/s) with a
configurable number of bytes, coherent (BIF) and/or I/O (IOIFx) protocols,
using two virtual channels per interface, and multiple system
configurations.
The memory interface controller (MIC) is also connected to the EIB and is
a dual XDR controller (25.6 GB/s) with ECC and suspended DRAM support
(figure processor-components3.jpg).
Two more components are still missing: the internal interrupt controller
(IIC) and the I/O Bus Master Translation (IOT) (figure
processor-components4.jpg).
The IIC handles the SPE interrupts as well as the external interrupts and
interrupts coming from the coherent interconnect and the IOIF0 and IOIF1.
It is also responsible for the interrupt priority level control and for
the interrupt generation ports for IPI. Note that the IIC is duplicated
for each PPE hardware thread.
The IOT translates bus addresses to system real addresses, supporting
two-level translation:
- I/O segments (256 MB)
- I/O pages (4K, 64K, 1M, 16M bytes)
Also interesting are the per-page I/O device identifier for LPAR use
(blades) and the IOST/IOPT caches managed by software and hardware.
---[ 2.3 - Debugging Cell
As the bus is a high-speed circuit, it's really difficult to debug the
architecture and get a good view of what is going on.
For that, and also to make it easy to develop software for Cell, IBM
Research developed a Cell simulator [10] in which you can run Linux and
install the software development kit [11].
The Brazilian team of the IBM Linux Technology Center developed an Eclipse
plugin that serves as an IDE for the debugger and SDK. Putting it all
together, it is possible to have the toolkit installed on a Linux machine,
running the frontends for the simulator and for the SDK. The debugging
interface is much better using these frontends. Anyway, it's important to
notice that they are just frontends for the normal and well-known Linux
tools with extended support for the Cell processor (GDB and GCC).
---[ 2.3.1 - Linux on Cell
Linux on Cell is an open-source git branch and is provided in the PowerPC
64 kernel line.
It started in 2.6.15 and is evolving to support many new features,
like the scheduling improvements for the SPUs (SPU contexts can now be
preempted, and my good friend Andre Detsch, who reviewed this article, was
one of the biggest contributors to making this code stable).
Linux gained a heterogeneous lwp/thread model, with a new SPE thread
model (really similar to the pthreads library, as we will see later)
supporting user-mode direct and indirect SPE access and fully preemptive
SPE context management. For that, spe_ptrace() was created and support for
it was added to GDB, along with spe_schedule() for thread to physical SPE
assignment (it is no longer FIFO - run until completion).
As a note, SPE threads share their address space with the parent PPE
process (using DMA), with demand paging for SPE accesses and a hardware
page table shared with the PPE.
An implementation detail is the PPE proxy thread allocated for each SPE to
provide a single namespace for both PPE and SPE and to assist in
SPE-initiated C99 and POSIX library services.
All event, error and signal handling for SPEs is done by the parent
PPE thread.
The ELF objects for the SPE are wrapped into PPE objects with an extended
GLD.
---[ 2.3.2 - Extensions to Linux
Here I'll try to provide some details about Linux running on Cell
hardware. The base hardware used for this reference is a Playstation 3,
which has 8 SPUs, but one is reserved for redundancy and another one is
reserved for the hypervisor when running a custom OS (in this case,
Linux).
All the details are valid for any Linux on Cell, and we will take a
top-down approach.
---[ 2.3.2.1 - User-mode
Cell supports both 32-bit and 64-bit Power applications, as well as 32-bit
and 64-bit Cell workloads. It has different programming modes, like RPC,
device subsystems and direct/indirect access.
As already said, it has heterogeneous threads: single SPU, SPU groups and
shared memory support.
It runs on top of an SPE management runtime library, available in 32-bit
and 64-bit versions. This library interacts with the SPUFS filesystem
(/spu/thread#/) in the following ways:
* Open, close, read, write the files:
- mem
This file provides access to the local storage
- regs
Access to the 128 registers of 128 bits each
- mbox
spe to ppe mailbox
- ibox
spe to ppe interrupt mailbox
- xbox_stat
Get the mailbox status
- signal1
Signal notification access
- signal2
Signal notification access
- signalx_type
Signal type
- npc
Read/write SPE next program counter (for debugging)
- fpcr
SPE floating point control/status register
- decr
SPE decrementer
- decr_status
SPE decrementer status
- spu_tag_mask
Access tag query mask
- event_mask
Access spe event mask
- srr0
Access spe state restore register 0
* Open, close, mmap the files:
- mem
Program State access of the Local Storage
- signal1
Direct application access to signal 1
- signal2
Direct application access to signal 2
- cntl
Direct application access to SPE controls, DMA queues and
mailboxes
The library also provides SPE task control system calls (to interact with
the SPE system calls implemented in kernel-mode), which are:
- sys_spu_create_thread
Allocates a SPE task/context and creates a directory in SPUFS
- sys_spu_run
Activates a SPU task/context on a physical SPE and
blocks in the kernel as a proxy thread to handle the events
already mentioned
Some functions provided by the library are related to the management of
the spe tasks, like spe create group, create thread, get/set affinity,
get/set context, get event, get group, get ls, get ps area, get threads,
get/set priority, get policy, set group defaults, group max, kill/wait,
open/close image, write signal, read in_mbox, write out_mbox, read mbox
status.
In addition to the standard 32-bit and 64-bit PowerPC ELF (binary)
interpreters, an SPE object loader is provided, responsible for
understanding the extension to the normal objects already mentioned and
for initiating the loading of the SPE threads.
Going down, we have glibc and the other GNU libraries, all supporting 32
and 64 bits.
---[ 2.3.2.2 - Kernel-mode
The next layer is the normal system-call interface, where we have the SPU
management framework (through special files in spufs) and
modifications to the exec* interface, in a 64-bit kernel.
This modification is done through a special misc binary format, called
the SPU object loader extension.
Of course there are other kernel extensions: the SPUFS filesystem, which
provides the management interface, and the SPU allocation, scheduling and
dispatch code.
Also, we have the Cell BE architecture-specific code, supporting multiple
and large pages, SPE event & fault handling, IIC and IOMMU.
Everything is controlled by a hypervisor, since Linux is what is called a
custom OS when running on Playstation3 hardware (the hypervisor is
responsible for protecting the 'secret key' of the hardware, and
knowing how to exploit SPU vulnerabilities plus some fuzzing of the
hypervisor may be the knowledge needed to break the game copy protection
on this hardware).
---[ 2.3.3 - Debugging the SPE
The SDK for Linux on Cell provides good resources for debugging and for
better understanding what is going on.
It's important to note the environment variables that control the
behaviour of the system.
So, if you set SPU_INFO, for example, the SPE runtime library will
print messages when loading an SPE ELF executable (see below).
---------- begin output ----------
# export SPU_INFO=1
# ./test
Loading SPE program: ./test
SPU LS Entry Addr : XXX
---------- end output ----------
And it will also print messages before starting up a new SPE thread, like:
---------- begin output ----------
Starting SPE thread 0x..., to attach debugger use: spu-gdb -p XXX
---------- end output ----------
When planning to use spu-gdb to debug an SPU thread, it's important to
remember the SPU_DEBUG_START environment variable, which includes
everything provided by SPU_INFO and also stops the thread until a
debugger is attached or a signal is received.
Since each SPU register can hold multiple fixed (or floating) point values
of different sizes, GDB provides a data structure that can be
accessed in different formats. So, by specifying the field in the data
structure, we can also update it using different sizes:
---------- begin output ----------
(gdb) ptype $r70
type = union __gdb_builtin_type_vec128 {
int128_t uint128;
float v4_float[4];
int32_t v4_int32[4];
int16_t v8_int16[8];
int8_t v16_int8[16];
}
(gdb) p $r70.uint128
$1 = 0x00018ff000018ff000018ff000018ff0
(gdb) set $r70.v4_int32[2]=0xdeadbeef
(gdb) p $r70.uint128
$2 = 0x00018ff000018ff0deadbeef00018ff0
---------- end output ----------
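The same slot update can be modelled in portable C, which helps when
reasoning about which bytes a value lands in. A minimal sketch, assuming
the big-endian layout of the SPU and word slot 0 being the leftmost word
(both consistent with the output above); the vec128 type and its helpers
are hypothetical names, not GDB or SDK definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 128-bit SPU register modelled as 16 bytes, most significant byte
 * first, so the code behaves the same on any host. */
typedef struct {
    uint8_t byte[16];
} vec128;

/* Overwrite 32-bit word slot `slot` (0..3, slot 0 is the leftmost word),
 * mirroring an assignment to one v4_int32 element in spu-gdb. */
void vec128_set_word(vec128 *v, unsigned slot, uint32_t value)
{
    assert(v != NULL && slot < 4);
    uint8_t *p = &v->byte[slot * 4];
    p[0] = (uint8_t)(value >> 24);
    p[1] = (uint8_t)(value >> 16);
    p[2] = (uint8_t)(value >> 8);
    p[3] = (uint8_t)value;
}

/* Read word slot `slot` back as a host integer. */
uint32_t vec128_get_word(const vec128 *v, unsigned slot)
{
    assert(v != NULL && slot < 4);
    const uint8_t *p = &v->byte[slot * 4];
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Filling all four slots with 0x00018ff0 and then writing 0xdeadbeef to slot
2 leaves bytes 8 to 11 holding de ad be ef, which is the third word of the
128-bit value printed by gdb.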
To let you better understand when the SPU code starts executing
and follow it, gdb also includes an interesting option:
---------- begin output ----------
(gdb) set spu stop-on-load
(gdb) run
...
(gdb) info registers
---------- end output ----------
Another important piece of information for debugging your code is
understanding the internal sizes and being prepared for overlapping.
Useful information can be obtained using the following code fragment
inside your SPU program (careful: it does not free the allocated memory).
--- code ---
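#include <stdio.h>
#include <stdlib.h>   /* malloc */
#include <unistd.h>   /* sbrk */

/* Linker-defined symbols marking the end of text, data and bss. */
extern int _etext;
extern int _edata;
extern int _end;

void meminfo(void)
{
    printf("\n&_etext: %p", (void *)&_etext);
    printf("\n&_edata: %p", (void *)&_edata);
    printf("\n&_end: %p", (void *)&_end);
    /* sbrk(0) before and after malloc shows how far the heap grew. */
    printf("\nsbrk(0): %p", sbrk(0));
    printf("\nmalloc(1024): %p", malloc(1024));  /* intentionally leaked */
    printf("\nsbrk(0): %p", sbrk(0));
}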
--- end code ---
And of course you can also play with the GCC and LD arguments to have more
debugging info:
--- code ---
# vi Makefile
CFLAGS += -g
LDFLAGS += -Wl,-Map,map_filename.map
--- end code ---
---[ 2.4 - Software Development for Linux on Cell
In this chapter I will introduce the internals of Cell development,
giving the basic knowledge necessary to better understand the following
chapters.
---[ 2.4.1 - PPE/SPE hello world
Every program for Cell that uses the SPEs needs to have at least two
source files: one for the PPE and another one for the SPE.
Following is a simple program to run on the SPE (it's also in the attached
tar file):
--- code ---
#include <stdio.h>

/* SPE entry point: speid, argp and envp are supplied by the PPE side. */
int main(unsigned long long speid, unsigned long long argp,
         unsigned long long envp)
{
    printf("\nHello World!\n");
    return 0;
}
--- end code ---
The Makefile for this code will look like:
--- code ---
PROGRAM_spu = hello_spu
LIBRARY_embed = hello_spu.a
IMPORTS = $(SDKLIB_spu)/libc.a
include $(TOP)/make.footer
--- end code ---
Of course it looks like any normal code. The PPE, as already explained, is
responsible for creating the new thread and allocating it on the
SPE:
--- code ---
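#include <stdio.h>
#include <libspe.h>

/* Handle to the SPE program embedded through hello_spu.a. */
extern spe_program_handle_t hello_spu;

int main(void)
{
    speid_t speid;   /* same type used for speid later in this article */
    int status;

    /* Create the SPE thread; argp and envp are NULL in this sample. */
    speid = spe_create_thread(0, &hello_spu, NULL, NULL, -1, 0);
    /* Wait for the SPE thread and collect its exit status. */
    spe_wait(speid, &status, 1);
    return 0;
}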
--- end code ---
With the following Makefile:
--- code ---
DIRS = spu
PROGRAM_ppu = hello_ppu
IMPORTS = ../spu/hello_spu.a -lspe
include $(TOP)/make.footer
--- end code ---
The reader will notice that the speid in the PPE program has the same
value as the speid in the main of the SPE.
Also, the arguments passed to spe_create_thread() are the ones
received by the SPE program when running (argp and envp are NULL in
our sample).
It is important to remember that, when compiled, this program will
generate a binary in the spu directory called hello_spu, and another one
in the root directory of this example called hello_ppu, which CONTAINS the
embedded hello_spu.
---[ 2.4.2 - Standard Library Calls from SPE
When the SPE program needs to use any standard library call, like
printf or exit, it has to call back to the PPE main thread.
It uses a simple stop-and-signal assembly instruction with standardized
argument values (important to remember, since it's needed in
shellcodes for the SPE).
That value is returned from the ioctl call, and the user thread must react
to it. This means copying the arguments from the SPE Local Storage,
executing the library call and then calling ioctl again.
The instruction according to the manual:
"stop u14 - Stop and signal. Execution is stopped, the current
address is written to the SPU NPC register, the value u14 is
written to the SPU status register, and an interrupt is sent to
the PPU."
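As a rough illustration of the quoted description, the sketch below packs
and unpacks the 14-bit signal value. It assumes, from my reading of the
SPU ISA document [12], that stop uses an all-zero 11-bit opcode in the top
bits of the 32-bit instruction word and carries u14 in the low 14 bits;
check that against the manual before relying on it. The function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SPU_STOP_CODE_MASK 0x3fffu   /* u14 lives in the low 14 bits */

/* Build the instruction word for `stop code` (assumed encoding: opcode
 * and reserved bits are all zero, so the word is just the code). */
uint32_t spu_encode_stop(uint32_t code)
{
    assert(code <= SPU_STOP_CODE_MASK);
    return code;
}

/* Return the signal code carried by a stop instruction, or -1 when the
 * top 11 bits show that the word is some other instruction. The reserved
 * bits between opcode and code are ignored here. */
int spu_decode_stop(uint32_t insn)
{
    if ((insn >> 21) != 0)
        return -1;
    return (int)(insn & SPU_STOP_CODE_MASK);
}
```

A PPE-side handler would do the equivalent of spu_decode_stop() on the
value it gets back, then pick the library call to run from the code.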
This is a disassembly output of the hello_spu program:
---------- begin output ----------
# spu-gdb ./hello_spu
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "--host=powerpc64-unknown-linux-gnu --target=spu"...
(gdb) disassemble main
Dump of assembler code for function main:
0x00000170 <main+0>: ila $3,0x340 <.rodata>
0x00000174 <main+4>: stqd $0,16($1)
0x00000178 <main+8>: nop $127
0x0000017c <main+12>: stqd $1,-32($1)
0x00000180 <main+16>: ai $1,$1,-32
0x00000184 <main+20>: brsl $0,0x1a0 <puts> # 1a0
0x00000188 <main+24>: ai $1,$1,32 # 20
0x0000018c <main+28>: fsmbi $3,0
0x00000190 <main+32>: lqd $0,16($1)
0x00000194 <main+36>: bi $0
0x00000198 <main+40>: stop
0x0000019c <main+44>: stop
End of assembler dump.
(gdb)
---------- end output ----------
---[ 2.4.3 - Communication Mechanisms
The architecture offers three main communication mechanisms:
- DMA
Used to move data and instructions between main storage and
a local storage. SPEs rely on asynchronous DMA transfers to hide
memory latency and transfer overhead by moving information in
parallel with SPU computation.
- Mailbox
Used for control communications between an SPE and the
PPE or other devices. Mailboxes hold 32-bit messages. Each
SPE has two mailboxes for sending messages and one mailbox for
receiving messages.
- Signal Notification
Used for control communications from PPE or
other devices. Signal notification (also known as signalling)
uses 32-bit registers that can be configured for
one-sender-to-one-receiver signalling or
many-senders-to-one-receiver signalling.
All three are controlled and implemented by the SPE MFC, and their
importance lies in the way the vulnerable program will receive its
input.
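The two signalling configurations can be pictured with a tiny model. This
is a sketch, not SDK code: it assumes that the one-to-one configuration
overwrites the register, that the many-to-one configuration ORs incoming
values together, and that a read by the SPU clears the register. The names
signal_reg, signal_send and signal_read are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SIGNAL_OVERWRITE,   /* one sender to one receiver  */
    SIGNAL_OR           /* many senders to one receiver */
} signal_mode;

typedef struct {
    signal_mode mode;
    uint32_t    value;  /* the 32-bit notification register */
} signal_reg;

/* A sender (PPE or other device) writes `data` to the register. */
void signal_send(signal_reg *reg, uint32_t data)
{
    assert(reg != NULL);
    if (reg->mode == SIGNAL_OR)
        reg->value |= data;   /* accumulate bits from several senders */
    else
        reg->value = data;    /* last write wins */
}

/* The receiving SPU reads the register, which resets it to zero. */
uint32_t signal_read(signal_reg *reg)
{
    assert(reg != NULL);
    uint32_t v = reg->value;
    reg->value = 0;
    return v;
}
```

In the OR configuration each sender can own one bit, so the receiver
learns who signalled from a single read; in the overwrite configuration
an earlier unread value is simply lost.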
---[ 2.4.4 - Memory Flow Control (MFC) Commands
This is the main mechanism for the SPE to access main storage and
maintain synchronization with other processors and devices in the system.
MFC commands can be issued either by the SPE itself, or by the processor
and other devices, as follows:
- Code running on the SPU issues an MFC command by executing a
series of writes and/or reads using channel instructions.
- Code running on the PPU or any other device issues an MFC
command by performing a series of stores and/or loads to
memory-mapped I/O (MMIO) registers in the MFC.
The MFC commands are then queued in one of these independent queues:
- MFC SPU Command Queue - For channel-initiated commands by the
associated SPU
- MFC Proxy Command Queue - For MMIO-initiated commands by the PPE
or other devices.
---[ 2.4.5 - Direct Memory Access (DMA) Commands
The MFC commands that transfer data are referred to as DMA commands. The
transfer direction of DMA commands is named from the SPE point of view:
- Into a SPE (from main storage to the local storage) -> get
- Out of a SPE (from local storage to the main storage) -> put
---[ 2.4.5.1 - Get/Put Commands
DMA get from the main memory to the local storage:
(void) mfc_get (volatile void *ls, uint64_t ea, uint32_t size,
uint32_t tag, uint32_t tid, uint32_t rid)
DMA put into the main memory from the local storage:
(void) mfc_put (volatile void *ls, uint64_t ea, uint32_t size,
uint32_t tag, uint32_t tid, uint32_t rid)
To guarantee the ordering of the writes to the main memory, there are two
options:
- mfc_putf: the 'f' means fenced: all commands issued earlier in
the same tag group must complete before this one executes, while
commands issued later may still execute before it
- mfc_putb: the 'b' means barrier: the barrier command and all
commands issued after it are NOT executed until all previously
issued commands in the same tag group have been performed
---[ 2.4.5.2 - Resources
DMA transfers have variable sizes: 1, 2, 4, 8 or n*16 bytes (n being an
integer, of course). There is a maximum of 16 KB per DMA transfer, and
128-byte alignment offers better performance.
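The size rules above are easy to get wrong when you are crafting or auditing a transfer, so here is a minimal sketch of a checker in plain C. The function names are mine (not from the SDK), and rejecting a zero-byte transfer is an assumption of this sketch, not something the text above states.

```c
#include <stdbool.h>
#include <stdint.h>

#define DMA_MAX_BYTES (16u * 1024u)  /* 16 KB ceiling per transfer */

/* Hypothetical helper: true if `size` is 1, 2, 4, 8, or a multiple of
 * 16 bytes no larger than 16 KB. Zero is rejected (assumption). */
static bool dma_size_is_valid(uint32_t size)
{
    if (size == 1 || size == 2 || size == 4 || size == 8)
        return true;
    return size != 0 && size % 16 == 0 && size <= DMA_MAX_BYTES;
}

/* Hypothetical helper: true if `addr` sits on a 128-byte boundary,
 * the alignment recommended for best performance. */
static bool dma_is_128_aligned(uint64_t addr)
{
    return (addr & 127u) == 0;
}
```

A size such as 3 or 24 fails the check, while 8, 16 and 16384 pass.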
The DMA queues are defined per SPU, with a 16-element queue for
SPU-initiated requests and an 8-element queue for PPU-initiated requests.
SPU-initiated requests always have higher priority.
To differentiate DMA commands, each one receives a tag, a 5-bit
identifier. The same identifier can be applied to multiple commands, since
it's used for polling status or waiting on the completion of the DMA
commands.
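Since a tag is a 5-bit value (0 to 31), waiting on a tag group comes down to building a 32-bit mask with one bit per tag. The sketch below is host-side C with names of my own choosing. It only models the bit arithmetic, not the MFC itself.

```c
#include <stdint.h>

#define DMA_TAG_COUNT 32u  /* 5-bit tag: values 0..31 */

/* Hypothetical helper: mask bit for `tag`, or 0 if the tag is out of
 * range. The unsigned constant avoids shifting into the sign bit of a
 * signed int when tag == 31. */
static uint32_t dma_tag_mask(uint32_t tag)
{
    if (tag >= DMA_TAG_COUNT)
        return 0;
    return UINT32_C(1) << tag;
}

/* Hypothetical helper: true when every tag selected by `mask` is
 * reported as completed in `status`. */
static int dma_all_done(uint32_t status, uint32_t mask)
{
    return (status & mask) == mask;
}
```

Several commands sharing one tag are covered by a single mask bit, which is why reusing a tag is convenient.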
A great feature provided is DMA lists, where a single DMA command can
cause the execution of a list of transfer requests (held in local
storage). Lists implement scatter-gather functions and may contain up to
2K transfer requests.
---[ 2.4.5.3 - SPE 2 SPE Communication
An address in another SPE local storage is represented as a 32-bit
effective address (global address).
An SPE issuing a DMA command needs a pointer to the other SPE's local
storage. The PPE code can obtain the effective address of an SPE's local
storage:
--- code ---
#include <libspe.h>
speid_t speid;
void *spe_ls_addr;
spe_ls_addr=spe_get_ls(speid);
--- end code ---
This permits the PPE to give the SPEs each other's local storage
addresses and to control the communication. Vulnerabilities may arise no
matter what the communication flow is, even without involving the PPE
itself.
Following is a simple DMA demo program between PPE and SPE (see the
attached file for the complete version). This program sends an address in
the PPE to the SPE through DMA:
--- PPE code ---
information_sent is[1] __attribute__ ((aligned (128)));
spe_gid_t gid;
speid_t speid;
int status;
int *pointer = (int *) malloc(128);
gid = spe_create_group(SCHED_OTHER, 0, 1);
if (spe_group_max(gid) < 1) {
printf("\nOops, there is no free SPE to run it...\n");
exit(EXIT_FAILURE);
}
is[0].addr = (unsigned int) pointer;
/* Create the SPE thread */
speid = spe_create_thread(gid, &hello_dma, (unsigned long long *) &is[0], NULL, -1, 0);
/* Wait for the SPE to complete */
spe_wait(speid, &status, 0);
/* Best practice: Issue a sync before ending - This is good for us ;) */
__asm__ __volatile__ ("sync" : : : "memory");
--- end code ---
--- SPE code ---
information_sent is __attribute__ ((aligned (128)));
int main(unsigned long long speid, unsigned long long argp, unsigned long long envp)
{
/* Where:
is -> Address in local storage to place the data
argp -> Main memory address
sizeof(is) -> Number of bytes to read
31 -> Tag associated with this DMA (from 0 to 31)
0 -> Not useful here (just when using caching)
0 -> Not useful here (just when using caching)
*/
mfc_get(&is, argp, sizeof(is), 31, 0, 0);
/* The tag mask is always 1 left-shifted by your tag number */
mfc_write_tag_mask(1u << 31);
/* Wait until the tagged DMA completes */
mfc_read_tag_status_all();
return 0;
}
--- end code ---
And now between two SPEs (also for the complete code, please refer to the
attached sources):
--- PPE code ---
speid_t speid[2];
speid[0]=spe_create_thread (0, &dma_spe1, NULL, NULL, -1, 0);
speid[1]=spe_create_thread (0, &dma_spe2, NULL, NULL, -1, 0);
for (i=0; i<2; i++) local_store[i]=spe_get_ls(speid[i]); /* Get local storage address */
for (i=0; i<2; i++) spe_kill(speid[i], SIGKILL); /* Send SIGKILL to the SPE threads */
--- end code ---
--- SPE code ---
/* Write something to the PPE */
spu_write_out_mbox(buffer);
/* Read something from the PPE */
pointer = spu_read_in_mbox();
/* DMA interface */
mfc_get(buffer, pointer, size, tag, 0, 0);
wait_on_mask(1<<tag);
/* DMA something to the second SPE */
mfc_put(buffer, local_store[1], size, tag, 0, 0);
wait_on_mask(1<<tag);
/* Notify the PPE */
spu_write_out_mbox(1);
--- end code ---
------[ 3 - Exploiting Software Vulnerabilities on Cell SPE
I love the architecture manuals and the engineers and the way they talk
about really dumb design choices:
"The SPU Local Store has no memory protection, and memory access wraps
from the end of the Local Store back to the beginning. An SPU program is
free to write anywhere in the Local Store including its own instruction
space. A common problem in SPU programming is the corruption of the SPU
program text when the stack area overflows into the program area. This
problem typically does not become apparent until some later point in the
program execution when the program attempts to execute code in area that
was corrupted, which typically results in illegal instruction exception.
Even with a debugger it can be difficult to track down this type of
problem because the cause and effect can occur far apart in the program
execution. Adding printf's just moves failure point around".
---[ 3.1 - Memory Overflows
From the SPU memory design described above it is already clear that, when
an attacker controls the overwrite size, it's really easy to exploit an
SPU vulnerability: just replace the original program .text with the
attacker's own.
It's important to note that the SPU interrupt facility can be configured
to branch to an interrupt handler at address 0 if an external condition is
true (bisled, branch indirect and set link if external data, is the
instruction used to check whether external data is available). Since the
memory layout wraps around, it's always possible to overwrite this handler
if it is in use.
Another important note is the fact that instructions on memory MUST be
aligned on word boundaries.
There are instruction and data caches for the local storage (depending on
the implementation details), so it's important to ensure that:
- You are overflowing a large enough amount of data to avoid
caching
- You are not using a self-modifying shellcode unless you issue
the sync instruction (see [13] for references)
---[ 3.1.1 - SPE memory layout
The memory layout for the SPE looks like:
------------------------ -> 0x3FFFF
SPU ABI Reserved Usage
------------------------ | Stack grows from the
Runtime Stack | higher addresses to
------------------------ | the lower addresses.
Global Data |
------------------------ \/
.Text
------------------------ -> 0x00000
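To make the wrap-around concrete, here is a small host-side model in C of a write that starts near the top of the layout above and runs past 0x3FFFF. It is only a sketch: the 256 KB size is taken from the diagram, while the function name and the idea of counting the wrapped bytes are mine.

```c
#include <stddef.h>
#include <stdint.h>

#define LS_SIZE 0x40000u          /* 256 KB: addresses 0x00000..0x3FFFF */
#define LS_MASK (LS_SIZE - 1u)

/* Hypothetical model: copies `len` bytes into the modelled local store
 * `ls` starting at `start`, wrapping past 0x3FFFF back to 0x00000.
 * Returns how many bytes landed after the wrap (i.e. at the bottom,
 * where .text lives in the layout above). */
static size_t ls_write_wrapping(uint8_t *ls, uint32_t start,
                                const uint8_t *src, size_t len)
{
    size_t wrapped = 0;
    size_t first = start & LS_MASK;

    for (size_t i = 0; i < len; i++) {
        if (first + i >= LS_SIZE)
            wrapped++;
        ls[(first + i) & LS_MASK] = src[i];
    }
    return wrapped;
}
```

An 8-byte write starting at 0x3FFFC puts its last four bytes at addresses 0 to 3, which is exactly where the program text (or an interrupt handler at address 0) sits.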
To test your application, the 'size' utility is really useful:
---------- begin output ----------
# size hello_spu
text data bss dec hex filename
1346 928 32 2306 902 hello_spu
---------- end output ----------
---[ 3.1.2 - SPE assembly basics
To develop a shellcode it's important to understand how the SPE assembly
differs from PowerPC.
The SPE uses a RISC-based assembly, which means there is a small set of
instructions, and everything in the SPE runs in user-mode (there is no
kernel-mode for the SPE). That said, we need to remember there are no
system calls; instead there are PPE calls (stop instructions).
It is also a big endian architecture (keep that in mind while reading the
following sections).
This architecture provides many ways to avoid branches in the code for
maximum efficiency. Since that is not a real concern while exploiting
software, I won't discuss it here, nor will I discuss the SIMD
instructions. For more information on those, refer to the SPU
Instruction Set Architecture document [12].
---[ 3.1.2.1 - Registers
I already explained a little about the way the architecture works, so in
this section I'll just describe the available register set and how to use
it.
The SPE does not define a condition register, so the comparison
operations produce results that are either all zeros (false) or all ones
(true), with the same width as the operands being tested. These results
are used for bitwise masking, instruction selection or conditional
branching.
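A quick sketch in portable C of how a comparison result can drive selection without a condition register. It assumes, as I read the SPU ISA document [12], that a true word comparison sets every bit of the result word. The function names are hypothetical and this models the idea, not the actual SPU instructions.

```c
#include <stdint.h>

/* Word compare in the SPU style: all ones when equal, all zeros
 * otherwise (assumption stated in the lead-in). */
static uint32_t cmp_eq_mask(uint32_t a, uint32_t b)
{
    return (a == b) ? UINT32_MAX : 0u;
}

/* Bitwise select: take bits from `if_set` where `mask` is 1 and from
 * `if_clear` where `mask` is 0. */
static uint32_t select_bits(uint32_t if_clear, uint32_t if_set, uint32_t mask)
{
    return (if_clear & ~mask) | (if_set & mask);
}

/* Branch-free equivalent of: (x == key) ? hit : miss */
static uint32_t pick(uint32_t x, uint32_t key, uint32_t hit, uint32_t miss)
{
    return select_bits(miss, hit, cmp_eq_mask(x, key));
}
```

The same mask can just as well feed a conditional branch, which is the other use mentioned above.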
As on any other platform, there are general purpose registers and special
purpose registers in the SPE:
- General Purpose Registers (0-127) Used in different ways by the
instructions. In the second word of R1 you have the information
about the amount of free space in the stack (the room between
end of the heap and the start of the stack).
- Special Purpose Registers
The SPE also supports 128 special-purpose registers. Some
interesting ones:
* SRR0 - Save and Restore Register 0 - Holds the address
used by the interrupt return (iret) instruction
* LR - Link Register - All branch instructions that set
the link register will force the address of the next
instruction to be loaded on this register
* CTR - Count Register - Usually it's used to hold a loop
counter (like the loop instruction and %ecx register in
intel x86 architecture)
* CR - Condition Register - Used to perform conditional
comparisons
To move data between Special Purpose Registers and General Purpose
Registers we have the instructions:
* mtspr (move to special purpose register)
* mfspr (move from special purpose register)
---[ 3.1.2.2 - Local Storage Addressing Mode
To address information to/from Local Storage, the instructions use the
following structure:
Instruction_Opcode l10_field RA_field RT_field
8-bit 10-bit 7-bit 7-bit
Where: the signed value of the l10 field is extended with 4 zero bits on
the right and then added to the preferred slot of RA, and the 4 rightmost
bits of the sum are forced to zero. The 16 bytes at the resulting local
storage address are then loaded into the RT register.
The preferred slot, from the architecture's point of view, is the leftmost
4 bytes (not bits) of the register.
Important to note here that the IBM convention specifies that:
l10 means a 10-bit immediate value
RA means a general purpose register to be used as
source/destination
RT means a general purpose register to be used as destination
(target)
Knowing that makes it easier to understand why the Local Storage Address
Space is limited to 4 GB.
The actual size of the Local Storage can be read from the LSLR (local
storage limit register). All effective addresses are ANDed with the value
in the LSLR before being used.
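Putting the addressing rules of this section together, the address used by a quadword load or store can be modelled in a few lines of C. This is a sketch under my reading of the description above. The names are mine, and the 0x3FFFF limit is an assumption matching a 256 KB local store.

```c
#include <stdint.h>

#define LSLR_256K 0x3FFFFu  /* assumed LSLR value for a 256 KB local store */

/* Sign-extends the low 10 bits of `field` to a 32-bit signed value. */
static int32_t sext10(uint32_t field)
{
    field &= 0x3FFu;
    return (field & 0x200u) ? (int32_t)field - 0x400 : (int32_t)field;
}

/* Hypothetical model of the address computation:
 * (RA preferred slot + (l10 << 4)), ANDed with LSLR, low 4 bits cleared. */
static uint32_t dform_lsa(uint32_t ra_pref, uint32_t l10_field, uint32_t lslr)
{
    uint32_t offset = (uint32_t)sext10(l10_field) << 4;
    return (ra_pref + offset) & lslr & ~UINT32_C(0xF);
}
```

For `lqd $0, 16($1)` the immediate field holds 1 (16 bytes / 16). With $1 = 0x3FFF0 the sum is 0x40000, which the LSLR mask turns into address 0: the same wrap-around the manual complains about.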
---[ 3.1.2.3 - External Devices
The SPU can send/receive data to/from external devices using the channel
interface. The channel instructions use quadwords (128 bits) to transfer
data between general purpose registers and the channel device (which
supports 128 channels).
---[ 3.1.2.4 - Instruction Set
Here are some useful instructions to be used while developing a shellcode
for the SPE.
Each entry below gives the instruction, its operands, a description and a
sample.
-------------------------------------------------------------------------
lqd (load quadword)
Operands: rt,symbol(ra)
Description: load a value (16 bytes) from the Local Storage address
pointed to by RA into the general purpose register RT
Sample: lqd $0, 16($1)
stqd (store quadword)
Operands: rt,symbol(ra)
Description: the contents of the register RT are stored at the local
storage address pointed to by RA
Sample: stqd $0, 16($1)
ilh (immediate load halfword)
Operands: rt,symbol
Description: the value of l16 is placed in register RT
Sample: ilh $0, 0x1a0
il (immediate load word)
Operands: rt,symbol
Description: the value of l16 is expanded to 32 bits by replicating the
leftmost bit and then written to RT
Sample: il $0, 0x1a0
nop (no operation)
Operands: rt
Description: this instruction uses a dummy RT and nothing is changed
Sample: nop $127
ila (immediate load address)
Operands: rt,symbol
Description: the value of l18 is placed in the rightmost 18 bits of RT
(the remaining bits of RT are zeroed)
Sample: ila $3, 0x340
a (add word)
Operands: rt,ra,rb
Description: the operand in register RA is added to the operand in
register RB and the result is written to RT
Sample: a $0, $1, $2
ai (add word immediate)
Operands: rt,ra,value
Description: the value (l10 field) is added to the operand in RA and the
result is written to RT
Sample: ai $1, $1, -32
brsl (branch relative and set link)
Operands: rt,symbol
Description: execution proceeds to the target instruction and a link
register is set (the symbol is of l16 type and is extended on the right
with two 0 bits). The address of the current instruction is added to the
symbol address for the branch. The address of the next instruction is
written to the preferred slot of the RT register.
Sample: brsl $0, 0x1a0
fsmbi (form select mask for bytes immediate)
Operands: rt,symbol
Description: the symbol is an l16 value used to create a mask in the
register RT by copying each bit eight times. Bits in the operand map to
bytes in the result in a left-to-right correspondence.
Sample: fsmbi $3, 0
bi (branch indirect)
Operands: ra
Description: execution proceeds to the address in the preferred slot of
RA. The rightmost two bits of RA are ignored (supposed to be zero). There
are two flags, D and E, to disable and enable interrupts.
Sample: bi $0
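The two immediate-expansion rules that are easiest to misread (il and fsmbi) can be checked with a short model in C. This is a sketch of the semantics as described in the entries above. The function names are mine and nothing here comes from the SDK.

```c
#include <stdint.h>

/* il: the 16-bit immediate is extended to 32 bits by replicating its
 * leftmost bit. */
static uint32_t il_value(uint16_t imm)
{
    return (imm & 0x8000u) ? (UINT32_C(0xFFFF0000) | imm) : (uint32_t)imm;
}

/* fsmbi: each bit of the 16-bit immediate becomes one byte of the
 * 16-byte result (0xFF for 1, 0x00 for 0). The leftmost bit of the
 * immediate maps to byte 0, the leftmost byte of the register. */
static void fsmbi_mask(uint16_t imm, uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        int bit = (imm >> (15 - i)) & 1;
        out[i] = bit ? 0xFF : 0x00;
    }
}
```

So `fsmbi $3, 0` simply clears the whole register, which is handy when you need a zero without encoding one in a load.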
---[ 3.1.3 - Exploiting Software Vulnerabilities in SPE
First of all, it's important to make it even clearer that it is
impossible to, for example, force the SPE process to execute a new command
(a.k.a. execve() shellcodes). The same goes for network-based library
functions and others: as already explained, we need the PPE to proxy that
for us.
So this opens two new paths:
- Create a PPE shellcode, to be used while exploiting PPE software
vulnerabilities, that spawns a proxy for commands received from
the SPE and creates an SPE thread to do all the work -> This
is pure PPC shellcode and this article already discussed
everything needed to achieve that. In the attached sources you
have samples in the directory cell-ppe/ [16].
- Create vulnerability-specific code for the SPE that prints out
internal program information related to the exploited SPE. This
is especially interesting and difficult because:
* The SPE uses an instruction cache, so if you overflow just
a small number of bytes, it can be especially difficult to
get them executed
* If you use the wrap-around characteristics of the SPE memory
layout, you will probably also overwrite the information
you are interested in.
On the other hand, it's important to say that all the information will
always be in the same place (or, easier to understand: there is no ASLR in
the SPE). Running the attached samples (especially the SPE-SPE
communication one, because it prints the pointer addresses) will make
this clear to the reader.
---[ 3.1.3.1 - Avoiding Null Bytes
It is important to avoid null bytes, so we cannot use the NOP instruction
in our shellcode.
This creates a problem, since the ori instruction also generates null
bytes if used with 0 as an argument (e.g: ori $1, $1, 0).
A good replacement is the or instruction (e.g: or $1, $1, $1) or the usage
of multiple instructions (which reduces the chance that a guessed return
address lands at a valid starting point).
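Checking a payload for null bytes by eye gets old fast, so here is a minimal scanner in C. The helper names are mine. The instruction words used in the usage note are the encodings as I understand them from the SPU ISA document [12] (an assumption: verify them with your assembler before relying on them).

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the first 0x00 byte in `buf`, or -1 if there is none. */
static long first_null_byte(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0x00)
            return (long)i;
    return -1;
}

/* Serializes a 32-bit instruction word in big-endian byte order, the
 * order in which the SPU stores it in memory. */
static void put_be32(uint32_t word, uint8_t out[4])
{
    out[0] = (uint8_t)(word >> 24);
    out[1] = (uint8_t)(word >> 16);
    out[2] = (uint8_t)(word >> 8);
    out[3] = (uint8_t)word;
}

/* True if the encoded instruction word contains at least one 0x00 byte. */
static int word_has_null_byte(uint32_t word)
{
    uint8_t bytes[4];
    put_be32(word, bytes);
    return first_null_byte(bytes, sizeof bytes) >= 0;
}
```

With the assumed encodings, `ori $1, $1, 0` (0x04000081) and `nop $127` (0x4020007F) are flagged, while `or $1, $1, $1` (0x08204081) passes clean.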
---[ 3.1.4 - Finding software vulnerabilities on SPE
The simulator provided by IBM has a feature that monitors selected
addresses or regions of the Local Store for read and write accesses. This
feature can help identify stack overflow conditions.
It is invoked from the simulator command window as follows:
enable_stack_checking [spu_number] [spu_executable_filename]
This procedure uses the nm system utility to determine the area of the
Local Storage that will contain the program code and creates trigger
functions to trap writes by the SPU into this region.
It's important to notice that this approach only looks for writes to the
text and static data, not to the heap. Of course, the same approach
used by this feature could help in creating a fuzzer using TCL scripts
based on the one provided.
------[ 4 - Future and other uses
I can't foresee the future, but this kind of architecture is becoming
more and more common and will open a wide range of new vulnerabilities.
The complexity behind this kind of asymmetric multi-threaded architecture
is even higher than that of the normal ones. The lack of memory protection
will also help attackers subvert those systems. The main processor being
based on an already well-known architecture (PowerPC) also helps the
dissemination of malicious code.
Many other researchers are doing stuff using Cell:
- Nick Breese presented the Crackstation project at Black Hat.
Basically, he used the SIMD capabilities and big registers
provided by the architecture to crack passwords [5]
- IBM researchers released a study about the usage of the Cell SPU
as a Garbage Collector Co-processor [14]
- Maybe there are JTAG-based interfaces on the Cell machines to try
to use RISCWatch [15]
- Unfortunately, SPU access is controlled by the PPE, so running
integrity protection mechanisms from the SPU seems infeasible ->
Anyway, I wrote a network traffic analyzer using Cell as the base
architecture.
------[ 5 - Acknowledgments
A lot of people helped me along the long way of this research, which
resulted in something fun to publish; you all know who you are.
Special thanks to the Phrack Staff for the great review of the article,
giving a lot of important insights about how to better structure it and
giving real value to it.
I always need to thank Filipe Balestra, my research partner, for
sharing with me his ideas, feedback, comments and experience, which
improved the article and the samples a lot.
I'll never ever forget to say thanks to my research team and friends at
RISE Security (http://www.risesecurity.org) for always keeping me
motivated to study completely new things. Be sure that the unix-asm [16]
project will be updated soon with all the stuff shown here and many more
different types of shellcode for the architecture. Also, of course, the
updates will be available for Metasploit.
Big thanks to the Cell Kernel guru, Andre Detsch, for sharing with me his
ideas and discussing the internals of the Linux implementation for Cell.
Thanks to the conference organizers who invited me to talk about Cell
Software Exploitation; even after many people had already talked about
Cell, they trusted that my talk was not about brute-forcing (yeah, a lot
of fun in completely different cultures).
To my girlfriend, who waited for me (alone, I suppose) during these
travels.
It's impossible not to say thanks to COSEINC for letting me keep doing
this research using important company time.
------[ 6 - References
[1] Cell Broadband Engine Architecture, v1.01 October 2006
http://cell.scei.co.jp/pdf/CBE_Architecture_v101.pdf
[2] Sony Computer Entertainment
http://www.sony.com
[3] Toshiba Corporation
http://www.toshiba.com
[4] IBM Corporation
http://www.ibm.com
[5] Breese, Nick; "Crackstation"; Black Hat Europe 2008
http://www.blackhat.com/presentations/bh-europe08/Bresse/Presentation/bh-eu-08-breese.pdf
[6] IBM Power Architecture
http://www-03.ibm.com/chips/power/
[7] IBM Bladecenter QS21
http://www.ibm.com/systems/bladecenter/hardware/servers/qs21/index.html
[8] IBM Roadrunner Supercomputer
http://en.wikipedia.org/wiki/IBM_Roadrunner
[9] The cell project at IBM Research
http://www.research.ibm.com/cell/
[10] Cell Simulator
http://www.alphaworks.ibm.com/tech/cellsystemsim
[11] Cell resource center at developerWorks (SDK download)
http://www-128.ibm.com/developerworks/power/cell/
[12] Synergistic Processor Unit Instruction Set Architecture v1.2
http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/76CA6C7304210F3987257060006F2C44/$file/SPU_ISA_v1.2_27Jan2007_pub.pdf
[13] Moore, H.D; "Mac OS X PPC Shellcode Tricks"; Uninformed Magazine 2005
http://www.uninformed.org/?v=1&a=1&t=txt
[14] Cher, Chen-Yong; Gschwind, Michael; "Cell GC: Using the Cell Synergistic Processor as a Garbage Collector Coprocessor"; 2008
http://www.research.ibm.com/cell/papers/2008_vee_cellgc_slides.pdf
[15] RISCWatch Debugger
http://www.ibm.com/chips/techlib/techlib.nsf/products/RISCWatch_Debugger
[16] Carvalho, Ramon de; "Cell PPE Shellcodes"; RISE Security;
http://www.risesecurity.org/papers/lopbuffer.pdf
Others:
PowerPC User Instruction Set Architecture, Book I, v2.02 January 2005
http://moss.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/arch/PPC_Vers202_Book1_public.pdf
PowerPC Virtual Environment Architecture, Book II, v2.02 January 2005
http://moss.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/arch/PPC_Vers202_Book2_public.pdf
PowerPC Operating Environment Architecture, Book III, v2.02 January 2005
http://moss.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/arch/PPC_Vers202_Book3_public.pdf
Cell developer's corner at power.org
http://www.power.org/resources/devcorner/cellcorner/
Linux info at the Barcelona Supercomputing Center website
http://www.bsc.es/projects/deepcomputing/linuxoncell
------[ 7 - Notes on SDK/Simulator Environment
There are some pictures of the simulator and SDK running in the attached
file: images/cell-sim1.jpg and images/cell-sim2.jpg
To install the SDK/Simulator, do:
- Download the Cell SDK ISO image from the IBM alphaWorks website.
- Mount the disk image on the mount directory:
mount -o loop CellSDK<version>.iso /mnt/phrack
- Change directory to /mnt/phrack/software
- Install the SDK by using the following command and answer any
prompts:
./cellsdk install
To start the simulator:
cd /opt/IBM/systemsim-cell/run/cell/linux
../run_gui
Then click on the 'go' button to start the simulated system.
To copy files to the simulated system (run this inside it):
callthru source /home/bsdaemon/Phrack/hello_ppu > hello_ppu
Then give the correct permissions and execute:
chmod +x hello_ppu
./hello_ppu
------[ 8 - Sources [cell_samples.tgz]
Attached are all the samples used in this article, to be compiled on
Linux running on a Cell machine.
Further updates will be available in the RISE Security website at:
http://www.risesecurity.org
For the author's public key:
http://www.kernelhacking.com/rodrigo/docs/public.txt
begin 644 cell_samples.tgz
end
--------{ EOF
Ok, I will use the random copy paste thread now.
Found somewhere on the internet:
Much like manatees being mistaken for mermaids, this has some basis in fact. The way that it seems to work is that an incompetent person sees a teen perform some miracle of computing - changing system settings, troubleshooting a common network problem, etc. So, since this kid was able to outdo them, the kid must be A) Brilliant (so he could figure it out when they couldn't) B) Highly skilled (because admitting that anyone with a reasonable amount of knowledge could do it would be admitting to ignorance). So then this person tells a friend about this 'genius', which is combined in the friends mind with tales of 'hackers' who can break into computers. And there you have it - the teenage genius hacker.
; c64 tiny 303 driver - 4mat/orb
; compiles with c64asm
* = $1000
; init music
sei
lda #$00
tay
ldx #$fd
argha
sta $02,x
dex
bne argha
setsound
lda #$41
sta 54276,x
lda #$08
sta 54275,x
lda #$00
sta 54277,x
lda vols,y
sta 54278,x
lda datas,y
sta $6b,y
lda siddata,y
sta 54295,y
iny
clc
txa
adc #$07
tax
cpx #$15
bne setsound
lda #<musicloop
sta $0314
lda #>musicloop
sta $0315
cli
loop
jmp loop
; play music
musicloop dec $60
bpl updatedrums
lda #$06
sta $60
ldy $61
lda $e1b5,y
and #$0f
sta 54273
lda $e1b2,y
and #$0f
sta 54273+7
noresetsq
ldx #$ff
drumcheck
lda beat,y
and btt,x
beq nobit
lda wav,x
sta 54276+14
sta $66
lda not,x
sta $67
lda plu,x
sta $68
lda del,x
sta $69
nobit dex
bpl drumcheck
iny
tya
and #$0f
sta $61
bne noupdate
dec $6b
bpl noupdate
lda #$17
sta filtsweep+$01
lda #$a0
sta noupdate+$01
dec noresetsq+$01
lda noresetsq+$01
and #$03
sta noresetsq+$01
tax
lda basswaves,x
sta 54276
lda length,x
sta $6b
lsr
sta length,x
dec $6c
lda $6c
bpl noupdate
jmp 64738
noupdate
lda #$2f
sta $62
updatedrums
nogate lda $60
cmp $69
bne nodrumgate
dec $66
lda $66
sta 54276+14
nodrumgate
clc
lda $67
adc $68
sta $67
sta 54273+14
lda $62
sta 54294
filtsweep sbc #$00
sta $62
jmp $ea31
; music data
datas .byte $05,$07
siddata .byte $53,$1f,$00
basswaves .byte $11,$51,$21,$41
length .byte $01,$03,$07,$03
vols .byte $c9,$49,$d5
btt .byte $40,$02,$08,$00
wav .byte $41,$81,$81
not .byte $0a,$ff,$20
plu .byte $ff,$fe,$fc
del .byte $03,$02,$01
beat .byte $40,$01,$02,$01,$08,$01,$02,$40,$40,$01,$02,$01,$08,$01,$02,$01