make.conf
Jan Dušátko
jan at dusatko.org
Thu Jun 27 01:48:12 CEST 2013
>
> make.conf is generally meant for setting global parameters, not for
> things that are heavily parameterized and therefore differ from one
> situation to another.
>
> However, kernel and module builds do not use the CFLAGS variable but
> COPTFLAGS, and if the NO_CPU_COPTFLAGS variable is defined as well, the
> architecture-based settings for the specific processor are not added to
> COPTFLAGS automatically (so you can, or rather must, put them there
> yourself). This opens up the possibility of having separate flag
> settings for building the kernel and modules, which you put, including
> the processor settings, into COPTFLAGS, while the flags for building
> everything else are set the usual way.
>
> All of this applies only to building C/C++ sources. Assembler code and
> its build are not affected by CFLAGS or COPTFLAGS, nor by any other
> architecture setting or anything else. Assembler sources are simply
> built with no way to influence the options used.
>
> The compiler itself is determined by the CC variable, which you should
> always set for the build to whichever compiler you think is needed at
> that moment.
>
> > Alternatively, do you have any experience compiling the kernel with
> > gcc 4.9?
>
> No, but I remember that somewhere in the Handbook or thereabouts, using
> custom optimization settings when building the kernel is considered
> something you do "at your own risk". Unsuitable optimization during the
> build can introduce race conditions, and the kernel may then crash
> randomly or show other "strange" behavior.
>
> So I have never embarked on this adventure.
Hi,
after some experimenting I have settled, for now, on more or less manual
switching. I have two machines, one with an Atom D525 and the other with
an i7 (see the listings below). I tried to come up with a reasonable
kernel optimization that would let me enable some extended instruction
sets and possibly increase performance. Maybe some of you will find this
useful; in any case I would be interested in your ideas.
As Dan Lukes warned me, there may be compatibility problems between the
compiler and the kernel, which can ultimately end in a non-functional
system; that's life. In any case, I still have not worked out how to
switch the flags automatically (some kind of .if setting); scripting is
the most I can do. The OS is FreeBSD 9.1.
For compiling userland I used gcc49, and the CPUTYPE settings are taken
from
http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html. The
compiler for the kernel is 4.2.1, which is appropriate for preserving a
certain level of safety, timing and other matters, but every coin has two
sides, in this case limited support for newer instruction sets.
Flags for GCC
CC= /usr/local/bin/gcc49
CXX= /usr/local/bin/g++49
CPP= /usr/local/bin/cpp49
I ran into a problem compiling some packages, which simply fail when a
compiler other than the system one is used (not even configure passes),
and with packages that require the system compiler and the corresponding
CPUTYPE setting. So I have a rough procedure: compile with the CPU
optimization; if that fails, compile with the kernel definition; if that
fails, turn off GCC; and if that fails, download it from ports. This can
be scripted; it is dishonest and unsporting, but so far it works.
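The fallback procedure above could be scripted roughly as follows. This is only a minimal sketch: `try_in_order` is a hypothetical helper name, and the commands you pass to it (the four build variants) are up to you; nothing here is taken from an existing tool.

```shell
#!/bin/sh
# try_in_order CMD...  (hypothetical helper, not part of FreeBSD)
# Runs each command string in turn until one succeeds.
# Prints the 1-based index of the attempt that succeeded and returns 0;
# returns 1 if every attempt failed.
try_in_order() {
    n=0
    for attempt in "$@"; do
        n=$((n + 1))
        # Each attempt runs in its own shell; build output is discarded here.
        if sh -c "$attempt" >/dev/null 2>&1; then
            echo "$n"
            return 0
        fi
    done
    return 1
}

# Example call (the port directory is a placeholder, and the last two
# steps are left for you to fill in):
#   try_in_order \
#       "make -C /usr/ports/CATEGORY/PORT CPUTYPE=corei7-avx install clean" \
#       "make -C /usr/ports/CATEGORY/PORT CPUTYPE=core2 install clean" \
#       "...build with the system compiler..." \
#       "...download instead of building..."
```

The printed index tells you which variant finally worked, so it can be logged per port.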
CPUTYPE for the D525
#userland
CPUTYPE?= atom
#kernel, world and some packages
CPUTYPE?= nocona
CPUTYPE for the i7
#userland
CPUTYPE?= corei7-avx
#kernel, world and some packages
CPUTYPE?= core2
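Because the values are assigned with `?=`, they can be overridden from the make command line or the environment, so the choice could be scripted. A minimal sketch, assuming the model string comes from `sysctl -n hw.model`; `pick_cputype` is a hypothetical helper and covers only the two machines listed here:

```shell
#!/bin/sh
# pick_cputype MODEL TARGET  (hypothetical helper)
#   MODEL  - CPU model string, e.g. the output of `sysctl -n hw.model`
#   TARGET - "userland", or anything else for kernel, world and the
#            packages that need the system compiler
# Prints the CPUTYPE value from the table above; returns 1 for unknown CPUs.
pick_cputype() {
    model=$1
    target=$2
    case "$model" in
        *Atom*D525*)
            if [ "$target" = userland ]; then echo atom; else echo nocona; fi
            ;;
        *Core*i7*)
            if [ "$target" = userland ]; then echo corei7-avx; else echo core2; fi
            ;;
        *)
            return 1
            ;;
    esac
}

# Example (run from /usr/src):
#   make CPUTYPE="$(pick_cputype "$(sysctl -n hw.model)" kernel)" buildkernel
```

This only switches CPUTYPE; choosing between gcc49 and the system compiler would still have to be handled separately.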
Default flags. Originally I considered using the -march switch, but
again, a lot of ports misbehaved, and it did not solve the problem of
choosing the compiler for the kernel and the ports.
CFLAGS= -O2 -pipe -fno-strict-aliasing
COPTFLAGS= -O2 -pipe -funroll-loops -ffast-math -fno-strict-aliasing
So far I have only run rough tests, but using the i7's capabilities
definitely makes sense for VPNs and encryption in AES-CBC mode; the
difference is quite significant. Compilation does not help me there,
though; what matters more is loading the aesni module (of course only on
an i5, i7 or newer), either via kldload or in:
/boot/loader.conf
aesni_load="YES"
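Before loading the module by hand, it may be worth checking that the CPU actually reports the AESNI flag. A minimal sketch, assuming the flag shows up under the name AESNI in the text you feed it (for example /var/run/dmesg.boot or the cpuid output); `has_aesni` is a hypothetical helper:

```shell
#!/bin/sh
# has_aesni  (hypothetical helper)
# Reads CPU feature text on stdin and succeeds only if AESNI appears
# as a whole word.
has_aesni() {
    grep -qw AESNI
}

# Example (as root):
#   has_aesni < /var/run/dmesg.boot && kldload aesni
```

On the Atom D525 the check would fail, since its feature list does not include AESNI.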
Otherwise, by changing the compilation to a different CPU type I observed
a general reduction in response times under load (e.g. compiling all the
ports finishes roughly 10-15% faster). As for the Atom, I did not notice
any difference, so I will stay with compiling the kernel for nocona.
The only thing I unfortunately cannot measure is stability and security;
the fact that it works for me does not mean that everything is fine. As
someone once told me: "Just for that lovely moist feeling that I have it
0.0001% faster ...."
# openssl engine -c -tt
(cryptodev) BSD cryptodev engine
[RSA, DSA, DH, AES-128-CBC]
[ available ]
(dynamic) Dynamic engine loading support
[ unavailable ]
# openssl speed aes-128-cbc
To get the most accurate results, try to run this
program when this computer is idle.
Doing aes-128 cbc for 3s on 16 size blocks: 21866431 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 5708626 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1435293 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 361581 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 45242 aes-128 cbc's in 3.00s
OpenSSL 0.9.8y 5 Feb 2013
built on: date not available
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc     116507.29k   121780.37k   122464.86k   123397.48k   123518.22k
I am attaching the CPUID output
# cpuid
Vendor ID: "GenuineIntel"; CPUID level 2
Intel-specific functions:
Version 000106ca:
Type 0 - Original OEM
Family 6 - Pentium Pro
Model 28 - Intel Atom processor, 45nm
Stepping 10
Reserved 0
Extended brand string: " Intel(R) Atom(TM) CPU D525 @ 1.80GHz"
CLFLUSH instruction cache line size: 8
Initial APIC ID: 2
Hyper threading siblings: 4
Feature flags: bfebfbff:
FPU Floating Point Unit
VME Virtual 8086 Mode Enhancements
DE Debugging Extensions
PSE Page Size Extensions
TSC Time Stamp Counter
MSR Model Specific Registers
PAE Physical Address Extension
MCE Machine Check Exception
CX8 COMPXCHG8B Instruction
APIC On-chip Advanced Programmable Interrupt Controller present and enabled
SEP Fast System Call
MTRR Memory Type Range Registers
PGE PTE Global Flag
MCA Machine Check Architecture
CMOV Conditional Move and Compare Instructions
FGPAT Page Attribute Table
PSE-36 36-bit Page Size Extension
CLFSH CFLUSH instruction
DS Debug store
ACPI Thermal Monitor and Clock Ctrl
MMX MMX instruction set
FXSR Fast FP/MMX Streaming SIMD Extensions save/restore
SSE Streaming SIMD Extensions instruction set
SSE2 SSE2 extensions
SS Self Snoop
HT Hyper Threading
TM Thermal monitor
31 Pending Break Enable
Feature flags set 2: 0040e31d:
SSE3 SSE3 extensions
DTES64 64-bit debug store
MONITOR MONITOR/MWAIT instructions
DS-CPL CPL Qualified Debug Store
TM2 Thermal Monitor 2
SSSE3 Supplemental Streaming SIMD Extension 3
CX16 CMPXCHG16B
xTPR Send Task Priority messages
PDCM Perfmon and debug capability
MOVBE MOVBE instruction
Extended feature flags: 20100000:
XD-bit Execution Disable bit
EM64T Intel Extended Memory 64 Technology
Extended feature flags set 2: 00000001:
LAHF LAHF/SAHF available in IA-32e mode
TLB and cache info:
59: unknown TLB/cache descriptor
ba: unknown TLB/cache descriptor
4f: unknown TLB/cache descriptor
c0: unknown TLB/cache descriptor
80: unknown TLB/cache descriptor
30: 1st-level instruction cache: 32-KB, 8-way set associative, 64-byte line size
0e: unknown TLB/cache descriptor
# cpuid
Vendor ID: "GenuineIntel"; CPUID level 13
Intel-specific functions:
Version 000306a9:
Type 0 - Original OEM
Family 6 - Pentium Pro
Model 58 -
Stepping 9
Reserved 0
Extended brand string: " Intel(R) Core(TM) i7-3612QE CPU @ 2.10GHz"
CLFLUSH instruction cache line size: 8
Initial APIC ID: 3
Hyper threading siblings: 16
Feature flags: bfebfbff:
FPU Floating Point Unit
VME Virtual 8086 Mode Enhancements
DE Debugging Extensions
PSE Page Size Extensions
TSC Time Stamp Counter
MSR Model Specific Registers
PAE Physical Address Extension
MCE Machine Check Exception
CX8 COMPXCHG8B Instruction
APIC On-chip Advanced Programmable Interrupt Controller present and enabled
SEP Fast System Call
MTRR Memory Type Range Registers
PGE PTE Global Flag
MCA Machine Check Architecture
CMOV Conditional Move and Compare Instructions
FGPAT Page Attribute Table
PSE-36 36-bit Page Size Extension
CLFSH CFLUSH instruction
DS Debug store
ACPI Thermal Monitor and Clock Ctrl
MMX MMX instruction set
FXSR Fast FP/MMX Streaming SIMD Extensions save/restore
SSE Streaming SIMD Extensions instruction set
SSE2 SSE2 extensions
SS Self Snoop
HT Hyper Threading
TM Thermal monitor
31 Pending Break Enable
Feature flags set 2: 7fbae3ff:
SSE3 SSE3 extensions
PCLMULDQ PCLMULDQ instruction
DTES64 64-bit debug store
MONITOR MONITOR/MWAIT instructions
DS-CPL CPL Qualified Debug Store
VMX Virtual Machine Extensions
SMX Safer Mode Extension
EST Enhanced Intel SpeedStep Technology
TM2 Thermal Monitor 2
SSSE3 Supplemental Streaming SIMD Extension 3
CX16 CMPXCHG16B
xTPR Send Task Priority messages
PDCM Perfmon and debug capability
17 - unknown feature
SSE4.1 Streaming SIMD Extension 4.1
SSE4.2 Streaming SIMD Extension 4.2
x2APIC Extended xAPIC support
POPCNT POPCNT instruction
24 - unknown feature
AESNI AES Instruction set
XSAVE XSAVE/XSTOR states
OSXSAVE OS-enabled extended state managerment
AVX AVX extensions
29 - unknown feature
30 - unknown feature
Extended feature flags: 28100800:
SYSCALL SYSCALL/SYSRET instructions
XD-bit Execution Disable bit
RDTSCP RDTSCP and IA32_TSC_AUX are available
EM64T Intel Extended Memory 64 Technology
Extended feature flags set 2: 00000001:
LAHF LAHF/SAHF available in IA-32e mode
TLB and cache info:
5a: Data TLB: 2MB or 4MB pages, 4-way set associative, 32 entries
03: Data TLB: 4KB pages, 4-way set assoc, 64 entries
76: unknown TLB/cache descriptor
ff: unknown TLB/cache descriptor
b2: Instruction TLB: 4-KB Pages, 4-way set associative, 64 entries
f0: 64-byte prefetching
ca: Shared 2nd-level TLB: 4-KB Pages, 4-way set associative, 512 entries
Processor serial: 0000-0000-0000-0000-0000-0000