make.conf

Jan Dušátko jan at dusatko.org
Thu Jun 27 01:48:12 CEST 2013


>
> make.conf is generally intended for setting global parameters, not for
> things that are heavily parameterized and therefore differ from one
> situation to another.
>
> However, the kernel and modules are not built with the CFLAGS variable but
> with COPTFLAGS, and if the NO_CPU_COPTFLAGS variable is defined as well, the
> architecture-based settings for the specific processor are not added to
> COPTFLAGS automatically (so you can, or rather must, put them there
> yourself). This opens up the possibility of having separate flags for
> building the kernel and modules, which you put, including the processor
> settings, into COPTFLAGS, while the flags for building everything else are
> set the usual way.
>
> All of this applies only to building C/C++ sources, though. Assembler code
> and its build are affected neither by CFLAGS nor by COPTFLAGS, nor by any
> other architecture setting or anything else. Assembler sources are simply
> built with no way to influence the options used.
>
> The compiler itself is determined by the CC variable, which you should
> always set to whichever compiler you think is needed for the build at hand.
>
> > Alternatively, do you have any experience compiling the kernel with gcc 4.9?
>
> No, but I remember that somewhere in the Handbook, or some such place,
> using your own optimization settings when building the kernel is
> considered something you do "at your own risk". Unsuitable optimization
> during the build can introduce race conditions, and the kernel may then
> crash at random or show other "strange" behavior.
>
> So I have never embarked on this adventure.

Hi,
after some experimenting I have settled, for now, on ~manual switching. I
have two machines, one with an Atom D525, the other with an i7 (see the
listings below). I was trying to come up with a reasonable kernel
optimization that would let me enable some of the extended instruction sets
and possibly increase performance. Some of you may find this useful; in any
case I would be interested in your ideas.
As Dan Lukes warned me, compatibility problems between the compiler and the
kernel can arise, which may ultimately end with a non-working system; that's
life. In any case, I still have not worked out how to switch the flags
automatically (some kind of .if setting); at most I can script it. The OS is
FreeBSD 9.1.
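
One way the switching could be scripted is sketched below. It is only an
illustration: the function name pick_cputype and the matching patterns are
made up for this example, the CPUTYPE values are the ones listed further
down in this mail, and the model string is assumed to come from
`sysctl -n hw.model`.

```shell
#!/bin/sh
# pick_cputype MODEL TARGET   (hypothetical helper, not part of FreeBSD)
# MODEL  : CPU model string, e.g. the output of `sysctl -n hw.model`
# TARGET : "userland" selects the gcc49 value; anything else selects the
#          value used for kernel, world and some packages
# Prints the CPUTYPE value; returns 1 for a CPU it does not know.
pick_cputype() {
    model=$1
    target=$2
    case "$model" in
        *Atom*D525*)
            if [ "$target" = userland ]; then echo atom; else echo nocona; fi
            ;;
        *Core*i7*)
            if [ "$target" = userland ]; then echo corei7-avx; else echo core2; fi
            ;;
        *)
            return 1
            ;;
    esac
}
```

For example, `make CPUTYPE="$(pick_cputype "$(sysctl -n hw.model)" kernel)" buildkernel`
would pass the value on the command line, where it takes precedence over a
`CPUTYPE?=` line in make.conf.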

For compiling userland I used gcc49, and the CPUTYPE values are taken from
http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html. The
compiler for the kernel is 4.2.1, which is the appropriate one for
preserving a certain level of safety, timing and other matters, but every
coin has two sides: in this case, limited support for newer instruction
sets.

Flags for GCC
CC=                             /usr/local/bin/gcc49
CXX=                            /usr/local/bin/g++49
CPP=                            /usr/local/bin/cpp49

I ran into a problem compiling some packages, which simply fail when a
compiler other than the system one is used (not even the configure step
passes), or packages that require the system compiler and the matching
CPUTYPE setting. So I have a rough procedure: compile with the CPU
optimization; if that fails, compile with the kernel definition; if that
fails, turn off GCC; and if that fails, download from ports. This can be
scripted; it is dishonest and unsportsmanlike, but so far it works.
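
The "try the next variant if this one fails" procedure can be sketched as a
small helper. The name first_success is invented for this example; it just
runs the given command lines in order and stops at the first one that
succeeds.

```shell
#!/bin/sh
# first_success CMD...   (hypothetical helper)
# Runs each argument as a shell command line, in order, and stops at the
# first one that exits with status 0. Prints the 1-based index of that
# command on stdout; the commands' own stdout is redirected to stderr so
# that it does not mix with the index. Returns 1 if every command fails.
first_success() {
    n=0
    for cmd in "$@"; do
        n=$((n + 1))
        if sh -c "$cmd" >&2; then
            echo "$n"
            return 0
        fi
    done
    return 1
}
```

A possible use, with a placeholder port path and my guess at how the first
three steps map to make variables (an assumption, not a tested recipe):

    port=/usr/ports/category/name
    first_success \
        "cd $port && make CC=/usr/local/bin/gcc49 CPUTYPE=corei7-avx install clean" \
        "cd $port && make CC=/usr/local/bin/gcc49 CPUTYPE=core2 install clean" \
        "cd $port && make CPUTYPE=core2 install clean"

The fourth step (downloading) is left out here because the exact command is
not given above.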

CPUTYPE for the D525
#userland
CPUTYPE?=                       atom
#kernel, world and some packages
CPUTYPE?=                       nocona

CPUTYPE for the i7
#userland
CPUTYPE?=                       corei7-avx
#kernel, world and some packages
CPUTYPE?=                       core2

Default flags. Originally I considered using the -march switch, but then
again, a lot of ports misbehaved, and it did not solve the problem of
choosing the compiler for the kernel and the ports.

CFLAGS=                         -O2 -pipe -fno-strict-aliasing
COPTFLAGS=                      -O2 -pipe -funroll-loops -ffast-math -fno-strict-aliasing

So far I have only run rough tests, but using the capabilities of the i7
definitely makes sense for VPNs and encryption in AES-CBC mode; the
difference is quite significant. Compilation does not help me there,
though; what matters more is loading the aesni module (of course only on i5,
i7 or newer), either with kldload or in:
/boot/loader.conf
	aesni_load="YES"
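
Whether a CPU offers AES-NI can be read from the "Feature flags set 2" word
in the CPUID listings at the end of this mail, where AES-NI is bit 25. A
minimal sketch of that check follows; the helper names has_feature_bit and
has_aesni are invented for this example.

```shell
#!/bin/sh
# has_feature_bit WORD BIT   (hypothetical helper)
# WORD : a CPUID feature word in hex, e.g. 0x7fbae3ff
# BIT  : bit position to test (0 = least significant)
# Returns 0 (true) if the bit is set, 1 otherwise.
has_feature_bit() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

# AES-NI is reported in bit 25 of "Feature flags set 2" (CPUID leaf 1, ECX).
has_aesni() {
    has_feature_bit "$1" 25
}
```

With the values from the listings, `has_aesni 0x7fbae3ff && kldload aesni`
would load the module on the i7, while `has_aesni 0x0040e31d` is false for
the Atom D525.
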
Otherwise, after changing the build to a different CPU type I noticed a
general reduction in response times under load (e.g. compiling all ports
finishes roughly 10-15% faster). As for the Atom, I noticed no difference,
so I will stay with compiling the kernel for nocona.
The only thing I unfortunately cannot measure is stability and security;
the fact that it works for me does not mean that everything is in order. As
someone once told me: "If only for that lovely moist feeling that I've got
it 0.0001% faster ...."

# openssl engine -c -tt
(cryptodev) BSD cryptodev engine
 [RSA, DSA, DH, AES-128-CBC]
     [ available ]
(dynamic) Dynamic engine loading support
     [ unavailable ]

# openssl speed aes-128-cbc
To get the most accurate results, try to run this
program when this computer is idle.
Doing aes-128 cbc for 3s on 16 size blocks: 21866431 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 5708626 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1435293 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 361581 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 45242 aes-128 cbc's in 3.00s
OpenSSL 0.9.8y 5 Feb 2013
built on: date not available
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc     116507.29k   121780.37k   122464.86k   123397.48k   123518.22k
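
The summary row can be roughly reproduced from the "Doing ..." lines:
operations times block size, divided by the elapsed time and by 1000. A
small sketch, with the function name kbytes_per_sec invented for this
example and integer arithmetic only:

```shell
#!/bin/sh
# kbytes_per_sec COUNT BLOCKSIZE SECONDS   (hypothetical helper)
# Integer estimate of throughput in 1000s of bytes per second.
kbytes_per_sec() {
    echo $(( $1 * $2 / $3 / 1000 ))
}
```

`kbytes_per_sec 21866431 16 3` gives 116620, close to the reported
116507.29k; the small gap is presumably because the elapsed time is printed
rounded to two decimals and was slightly above 3.00s.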

I am attaching the CPUID output
# cpuid
Vendor ID: "GenuineIntel"; CPUID level 2

Intel-specific functions:
Version 000106ca:
Type 0 - Original OEM
Family 6 - Pentium Pro
Model 28 - Intel Atom processor, 45nm
Stepping 10
Reserved 0

Extended brand string: "         Intel(R) Atom(TM) CPU D525   @ 1.80GHz"
CLFLUSH instruction cache line size: 8
Initial APIC ID: 2
Hyper threading siblings: 4

Feature flags: bfebfbff:
FPU    Floating Point Unit
VME    Virtual 8086 Mode Enhancements
DE     Debugging Extensions
PSE    Page Size Extensions
TSC    Time Stamp Counter
MSR    Model Specific Registers
PAE    Physical Address Extension
MCE    Machine Check Exception
CX8    COMPXCHG8B Instruction
APIC   On-chip Advanced Programmable Interrupt Controller present and enabled
SEP    Fast System Call
MTRR   Memory Type Range Registers
PGE    PTE Global Flag
MCA    Machine Check Architecture
CMOV   Conditional Move and Compare Instructions
FGPAT  Page Attribute Table
PSE-36 36-bit Page Size Extension
CLFSH  CFLUSH instruction
DS     Debug store
ACPI   Thermal Monitor and Clock Ctrl
MMX    MMX instruction set
FXSR   Fast FP/MMX Streaming SIMD Extensions save/restore
SSE    Streaming SIMD Extensions instruction set
SSE2   SSE2 extensions
SS     Self Snoop
HT     Hyper Threading
TM     Thermal monitor
31     Pending Break Enable

Feature flags set 2: 0040e31d:
SSE3     SSE3 extensions
DTES64   64-bit debug store
MONITOR  MONITOR/MWAIT instructions
DS-CPL   CPL Qualified Debug Store
TM2      Thermal Monitor 2
SSSE3    Supplemental Streaming SIMD Extension 3
CX16     CMPXCHG16B
xTPR     Send Task Priority messages
PDCM     Perfmon and debug capability
MOVBE    MOVBE instruction

Extended feature flags: 20100000:
XD-bit    Execution Disable bit
EM64T     Intel Extended Memory 64 Technology

Extended feature flags set 2: 00000001:
LAHF      LAHF/SAHF available in IA-32e mode

TLB and cache info:
59: unknown TLB/cache descriptor
ba: unknown TLB/cache descriptor
4f: unknown TLB/cache descriptor
c0: unknown TLB/cache descriptor
80: unknown TLB/cache descriptor
30: 1st-level instruction cache: 32-KB, 8-way set associative, 64-byte line size
0e: unknown TLB/cache descriptor

# cpuid
Vendor ID: "GenuineIntel"; CPUID level 13

Intel-specific functions:
Version 000306a9:
Type 0 - Original OEM
Family 6 - Pentium Pro
Model 58 -
Stepping 9
Reserved 0

Extended brand string: "      Intel(R) Core(TM) i7-3612QE CPU @ 2.10GHz"
CLFLUSH instruction cache line size: 8
Initial APIC ID: 3
Hyper threading siblings: 16

Feature flags: bfebfbff:
FPU    Floating Point Unit
VME    Virtual 8086 Mode Enhancements
DE     Debugging Extensions
PSE    Page Size Extensions
TSC    Time Stamp Counter
MSR    Model Specific Registers
PAE    Physical Address Extension
MCE    Machine Check Exception
CX8    COMPXCHG8B Instruction
APIC   On-chip Advanced Programmable Interrupt Controller present and enabled
SEP    Fast System Call
MTRR   Memory Type Range Registers
PGE    PTE Global Flag
MCA    Machine Check Architecture
CMOV   Conditional Move and Compare Instructions
FGPAT  Page Attribute Table
PSE-36 36-bit Page Size Extension
CLFSH  CFLUSH instruction
DS     Debug store
ACPI   Thermal Monitor and Clock Ctrl
MMX    MMX instruction set
FXSR   Fast FP/MMX Streaming SIMD Extensions save/restore
SSE    Streaming SIMD Extensions instruction set
SSE2   SSE2 extensions
SS     Self Snoop
HT     Hyper Threading
TM     Thermal monitor
31     Pending Break Enable

Feature flags set 2: 7fbae3ff:
SSE3     SSE3 extensions
PCLMULDQ PCLMULDQ instruction
DTES64   64-bit debug store
MONITOR  MONITOR/MWAIT instructions
DS-CPL   CPL Qualified Debug Store
VMX      Virtual Machine Extensions
SMX      Safer Mode Extension
EST      Enhanced Intel SpeedStep Technology
TM2      Thermal Monitor 2
SSSE3    Supplemental Streaming SIMD Extension 3
CX16     CMPXCHG16B
xTPR     Send Task Priority messages
PDCM     Perfmon and debug capability
17 - unknown feature
SSE4.1   Streaming SIMD Extension 4.1
SSE4.2   Streaming SIMD Extension 4.2
x2APIC   Extended xAPIC support
POPCNT   POPCNT instruction
24 - unknown feature
AESNI    AES Instruction set
XSAVE    XSAVE/XSTOR states
OSXSAVE  OS-enabled extended state managerment
AVX      AVX extensions
29 - unknown feature
30 - unknown feature

Extended feature flags: 28100800:
SYSCALL   SYSCALL/SYSRET instructions
XD-bit    Execution Disable bit
RDTSCP    RDTSCP and IA32_TSC_AUX are available
EM64T     Intel Extended Memory 64 Technology

Extended feature flags set 2: 00000001:
LAHF      LAHF/SAHF available in IA-32e mode

TLB and cache info:
5a: Data TLB: 2MB or 4MB pages, 4-way set associative, 32 entries
03: Data TLB: 4KB pages, 4-way set assoc, 64 entries
76: unknown TLB/cache descriptor
ff: unknown TLB/cache descriptor
b2: Instruction TLB: 4-KB Pages, 4-way set associative, 64 entries
f0: 64-byte prefetching
ca: Shared 2nd-level TLB: 4-KB Pages, 4-way set associative, 512 entries
Processor serial: 0000-0000-0000-0000-0000-0000