The proc filesystem

2023-01-31 02:01:23 FileSystem proc

------------------------------------------------------------------------------
CHAPTER 2: MODIFYING SYSTEM PARAMETERS
------------------------------------------------------------------------------

------------------------------------------------------------------------------
In This Chapter
------------------------------------------------------------------------------
* Modifying kernel parameters by writing into files found in /proc/sys
* Exploring the files which modify certain parameters
* Review of the /proc/sys file tree
------------------------------------------------------------------------------


A very interesting part of /proc is the directory /proc/sys. It is not only
a source of information; it also allows you to change parameters within the
kernel. Be very careful when attempting this. You can optimize your system,
but you can also cause it to crash. Never alter kernel parameters on a
production system. Set up a development machine and test to make sure that
everything works the way you want it to. You may have no alternative but to
reboot the machine once an error has been made.

To change  a  value,  simply  echo  the new value into the file. An example is
given below  in the section on the file system data. You need to be root to do
this. You  can  create  your  own  boot script to perform this every time your
system boots.
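
A boot script of this kind could look roughly like the sketch below. It is
only an illustration: the function name apply_settings, the one-setting-per-line
list format, and the example list path are assumptions made here, not something
the kernel defines. The base directory is passed as an argument so the logic
can be tried against a scratch directory before it is pointed at /proc/sys.

```shell
#!/bin/sh
# apply_settings BASE_DIR  (hypothetical helper, not a standard tool)
# Reads lines of the form "relative/path value" from standard input and
# writes each value into BASE_DIR/relative/path, as "echo value > file" would.
apply_settings() {
    base=$1
    while read -r name value; do
        # Skip blank lines and comment lines starting with '#'.
        case $name in
            ''|'#'*) continue ;;
        esac
        if [ -w "$base/$name" ]; then
            echo "$value" > "$base/$name"
        else
            echo "skipping $name: not writable" >&2
        fi
    done
}

# Example use at boot, as root (the list file name is an assumption):
#   apply_settings /proc/sys < /etc/proc-settings.list
```

Entries that do not exist or are not writable are skipped with a message
instead of aborting, so one stale line does not stop the remaining settings
from being applied.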

The files in /proc/sys can be used to fine-tune and monitor miscellaneous and
general things in the operation of the Linux kernel. Since some of the files
can inadvertently disrupt your system, it is advisable to read both
documentation and source before actually making adjustments. In any case, be
very careful when writing to any of these files. The entries in /proc may
change slightly between the 2.1.* and the 2.2 kernel, so if there is any doubt
review the kernel documentation in the directory /usr/src/linux/Documentation.
This chapter is heavily based on the documentation included in the pre-2.2
kernels, and became part of it in version 2.2.1 of the Linux kernel.

2.1 /proc/sys/fs - File system data
-----------------------------------

This subdirectory  contains  specific  file system, file handle, inode, dentry
and quota information.

Currently, these files are in /proc/sys/fs:

dentry-state
------------

Status of the directory cache. Since directory entries are dynamically
allocated and deallocated, this file indicates the current status. It holds
six values, of which the last two are not used and are always zero. The others
are listed in table 2-1.


Table 2-1: Status files of the directory cache
..............................................................................
 File       Content                                                           
 nr_dentry  Almost always zero                                                
 nr_unused  Number of unused cache entries                                    
 age_limit  Age in seconds after which the entry may be reclaimed, when
            memory is short
 want_pages Used internally
..............................................................................

dquot-nr and dquot-max
----------------------

The file dquot-max shows the maximum number of cached disk quota entries.

The file  dquot-nr  shows  the  number of allocated disk quota entries and the
number of free disk quota entries.

If the number of available cached disk quotas is very low and you have a large
number of simultaneous system users, you might want to raise the limit.

file-nr and file-max
--------------------

The kernel  allocates file handles dynamically, but doesn't free them again at
this time.

The value  in  file-max  denotes  the  maximum number of file handles that the
Linux kernel will allocate. When you get a lot of error messages about running
out of  file handles, you might want to raise this limit. The default value is
10% of  RAM in kilobytes.  To  change it, just  write the new number  into the
file:

  # cat /proc/sys/fs/file-max
  4096
  # echo 8192 > /proc/sys/fs/file-max
  # cat /proc/sys/fs/file-max
  8192


This method  of  revision  is  useful  for  all customizable parameters of the
kernel - simply echo the new value to the corresponding file.

Historically, the three values in file-nr denoted the number of allocated file
handles,  the number of  allocated but  unused file  handles, and  the maximum
number of file handles. Linux 2.6 always  reports 0 as the number of free file
handles -- this  is not an error,  it just means that the  number of allocated
file handles exactly matches the number of used file handles.
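
Reading the three fields can be scripted. The following is a minimal sketch,
with file_nr_summary being a name chosen here for illustration; it takes the
contents of file-nr as one string and prints the number of handles in use
(allocated minus unused) next to the maximum.

```shell
#!/bin/sh
# file_nr_summary "ALLOCATED UNUSED MAX"  (illustrative helper)
file_nr_summary() {
    # Rely on word splitting to break the line into its three fields.
    set -- $1
    allocated=$1
    unused=$2
    maximum=$3
    echo "in_use=$((allocated - unused)) max=$maximum"
}

# Typical call on a live system:
#   file_nr_summary "$(cat /proc/sys/fs/file-nr)"
```

On kernels that always report 0 in the second field, in_use simply equals the
allocated count.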

Attempts to allocate more file descriptors than file-max are reported with
printk; look for "VFS: file-max limit <number> reached".

inode-state and inode-nr
------------------------

The file inode-nr contains the first two items from inode-state, so we'll skip
to that file...

inode-state contains  two  actual numbers and five dummy values. The numbers
are nr_inodes and nr_free_inodes (in order of appearance).

nr_inodes
~~~~~~~~~

Denotes the  number  of  inodes the system has allocated. This number will
grow and shrink dynamically.

nr_free_inodes
~~~~~~~~~~~~~~

Represents the number of free inodes, i.e. the number of in-use inodes is
(nr_inodes - nr_free_inodes).

nr_open
-------

Denotes the maximum number of file handles a process can
allocate. The default value is 1024*1024 (1048576), which should be
enough for most machines. The actual limit depends on the RLIMIT_NOFILE
resource limit.

aio-nr and aio-max-nr
---------------------

aio-nr is the running total of the number of events specified on the
io_setup system call for all currently active aio contexts.  If aio-nr
reaches aio-max-nr then io_setup will fail with EAGAIN.  Note that
raising aio-max-nr does not result in the pre-allocation or re-sizing
of any kernel data structures.

2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
-----------------------------------------------------------

Besides these  files, there is the subdirectory /proc/sys/fs/binfmt_misc. This
handles the kernel support for miscellaneous binary formats.

Binfmt_misc provides the ability to register additional binary formats to the
kernel without compiling an additional module/kernel. Therefore, binfmt_misc
needs to know magic numbers at the beginning or the filename extension of the
binary.

It works by maintaining a linked list of structs that contain a description of
a binary  format,  including  a  magic  with size (or the filename extension),
offset and  mask,  and  the  interpreter name. On request it invokes the given
interpreter with  the  original  program  as  argument,  as  binfmt_java  and
binfmt_em86 and  binfmt_mz  do.  Since binfmt_misc does not define any default
binary-formats, you have to register an additional binary-format.

There are two general files in binfmt_misc and one file per registered format.
The two general files are register and status.

Registering a new binary format
-------------------------------

To register a new binary format you have to issue the command

  echo :name:type:offset:magic:mask:interpreter: > /proc/sys/fs/binfmt_misc/register



with an appropriate name (the name for the /proc-dir entry), offset (defaults
to 0 if omitted), magic, mask (which can be omitted, defaults to all 0xff) and,
last but not least, the interpreter that is to be invoked (for example,
/bin/echo for testing). Type can be M for usual magic matching or E for
filename extension matching (give the extension in place of magic).
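
Since the fields are positional and several may be empty, it can help to
assemble the string with a small helper. The sketch below is one way to do
that; binfmt_entry is a hypothetical function name, and it only prints the
string rather than writing to register itself.

```shell
#!/bin/sh
# binfmt_entry NAME TYPE OFFSET MAGIC MASK INTERPRETER  (hypothetical helper)
# Prints a registration string in the :name:type:offset:magic:mask:interpreter:
# layout. Pass '' for fields that should be left empty.
binfmt_entry() {
    if [ "$#" -ne 6 ]; then
        echo "usage: binfmt_entry name type offset magic mask interpreter" >&2
        return 1
    fi
    # %s copies each argument verbatim, so escapes such as \xca stay literal.
    printf ':%s:%s:%s:%s:%s:%s:\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# As root, the result could then be written to the register file:
#   binfmt_entry DEXE M '' '\x0eDEX' '' /usr/bin/dosexec \
#       > /proc/sys/fs/binfmt_misc/register
```

Requiring all six arguments makes an omitted field explicit (an empty string)
instead of silently shifting the remaining fields by one position.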

Check or reset the status of the binary format handler
------------------------------------------------------

If you  do a cat on the file /proc/sys/fs/binfmt_misc/status, you will get the
current status (enabled/disabled) of binfmt_misc. Change the status by echoing
0 (disables)  or  1  (enables)  or  -1  (caution:  this  clears all previously
registered binary  formats)  to status. For example echo 0 > status to disable
binfmt_misc (temporarily).

Status of a single handler
--------------------------

Each registered handler has an entry in /proc/sys/fs/binfmt_misc. These files
perform the same function as status, but their scope is limited to the actual
binary format. By running cat on such a file, you also receive all related
information about the interpreter/magic of the binfmt.

Example usage of binfmt_misc (emulate binfmt_java)
--------------------------------------------------

  cd /proc/sys/fs/binfmt_misc 
  echo ':Java:M::\xca\xfe\xba\xbe::/usr/local/java/bin/javawrapper:' > register 
  echo ':html:E::html::/usr/local/java/bin/appletviewer:' > register 
  echo ':Applet:M::<!--applet::/usr/local/java/bin/appletviewer:' > register
  echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register


These four  lines  add  support  for  Java  executables and Java applets (like
binfmt_java, additionally  recognizing the .html extension with no need to put
<!--applet> to  every  applet  file).  You  have  to  install  the jdk and the
shell-script /usr/local/java/bin/javawrapper  too.  It  works  around  the
brokenness of  the Java filename handling. To add a Java binary, just create a
link to the class-file somewhere in the path.

2.3 /proc/sys/kernel - general kernel parameters
------------------------------------------------

This directory  reflects  general  kernel  behaviors. As I've said before, the
contents depend  on  your  configuration.  Here you'll find the most important
files, along with descriptions of what they mean and how to use them.

acct
----

The file contains three values: highwater, lowwater, and frequency.

It exists only when BSD-style process accounting is enabled. These values
control its behavior. If the free space on the file system where the log lives
goes below the lowwater percentage, accounting is suspended. If it goes above
the highwater percentage, accounting resumes. Frequency determines how often
the amount of free space is checked (the value is in seconds). The default
settings are 4, 2, and 30. That is, accounting is suspended if less than 2
percent is free, resumed once 4 percent or more is free, and information about
the amount of free space is considered valid for 30 seconds.
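
The suspend/resume rule is a simple hysteresis, which the following sketch
spells out. The function name acct_next_state and its argument order are made
up for this illustration, and the exact comparisons (strictly below lowwater,
at or above highwater) are an assumption; the kernel's own checks may treat
the boundary values differently.

```shell
#!/bin/sh
# acct_next_state ACTIVE FREE_PCT HIGHWATER LOWWATER  (illustrative only)
# ACTIVE is 1 while accounting runs and 0 while it is suspended.
# Prints the state after one free-space check.
acct_next_state() {
    active=$1
    free=$2
    high=$3
    low=$4
    if [ "$active" -eq 1 ] && [ "$free" -lt "$low" ]; then
        echo 0            # free space fell below lowwater: suspend
    elif [ "$active" -eq 0 ] && [ "$free" -ge "$high" ]; then
        echo 1            # free space is back at highwater: resume
    else
        echo "$active"    # between the two marks: keep the current state
    fi
}

# With highwater 4 and lowwater 2:
#   acct_next_state 1 1 4 2    # prints 0
#   acct_next_state 0 3 4 2    # prints 0 (still suspended)
#   acct_next_state 0 4 4 2    # prints 1
```

The gap between the two marks keeps accounting from flapping on and off when
free space hovers around a single threshold.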

ctrl-alt-del
------------

When the value in this file is 0, ctrl-alt-del is trapped and sent to the init
program to handle a graceful restart. However, when the value is greater than
zero, Linux's reaction to this key combination will be an immediate reboot,
without syncing its dirty buffers.

[NOTE]
    When a  program  (like  dosemu)  has  the  keyboard  in  raw  mode,  the
    ctrl-alt-del is  intercepted  by  the  program  before it ever reaches the
    kernel tty  layer,  and  it is up to the program to decide what to do with
    it.

domainname and hostname
-----------------------

These files can be used to set the NIS domainname and hostname of your
box. For the classic darkstar.frop.org a simple:

  # echo "darkstar" > /proc/sys/kernel/hostname
  # echo "frop.org" > /proc/sys/kernel/domainname


would suffice to set your hostname and NIS domainname.
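
If the full name is only available as a single string, the two parts can be
separated at the first dot. The helper below is a small sketch of that split;
split_fqdn is a hypothetical name, and the function only prints the two parts
instead of writing them anywhere.

```shell
#!/bin/sh
# split_fqdn FQDN  (hypothetical helper)
# Prints the host part on the first line and the domain part on the second.
split_fqdn() {
    case $1 in
        *.*) ;;
        *) echo "split_fqdn: '$1' has no domain part" >&2; return 1 ;;
    esac
    echo "${1%%.*}"   # everything before the first dot
    echo "${1#*.}"    # everything after the first dot
}

# split_fqdn darkstar.frop.org prints "darkstar" and then "frop.org".
```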

osrelease, ostype and version
-----------------------------

The names make it pretty obvious what these fields contain:

  > cat /proc/sys/kernel/osrelease
  2.2.12
  
  > cat /proc/sys/kernel/ostype
  Linux
  
  > cat /proc/sys/kernel/version
  #4 Fri Oct 1 12:41:14 PDT 1999


The files  osrelease and ostype should be clear enough. Version needs a little
more clarification.  The  #4 means that this is the 4th kernel built from this
source base and the date after it indicates the time the kernel was built. The
only way to tune these values is to rebuild the kernel.

panic
-----

The value  in  this  file  represents  the  number of seconds the kernel waits
before rebooting  on  a  panic.  When  you  use  the  software  watchdog,  the
recommended setting  is  60. If set to 0, the auto reboot after a kernel panic
is disabled, which is the default setting.

printk
------

The four values in printk denote
* console_loglevel,
* default_message_loglevel,
* minimum_console_loglevel and
* default_console_loglevel
respectively.

These values  influence  printk()  behavior  when  printing  or  logging error
messages, which  come  from  inside  the  kernel.  See  syslog(2)  for  more
information on the different log levels.

console_loglevel
----------------

Messages with a higher priority than this will be printed to the console.
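
Because a lower number means a higher priority, the comparison is easy to get
backwards. A minimal sketch, assuming the numeric levels described in
syslog(2) and using a made-up function name, reads the first field of printk
and reports whether a message of a given level would reach the console:

```shell
#!/bin/sh
# reaches_console MSG_LEVEL "PRINTK_LINE"  (illustrative helper)
# PRINTK_LINE is the content of /proc/sys/kernel/printk (four numbers).
# Succeeds if a message at MSG_LEVEL has a higher priority, i.e. a smaller
# number, than console_loglevel.
reaches_console() {
    msg_level=$1
    set -- $2
    console_loglevel=$1
    [ "$msg_level" -lt "$console_loglevel" ]
}

# Typical call on a live system:
#   if reaches_console 3 "$(cat /proc/sys/kernel/printk)"; then
#       echo "level 3 messages are shown on the console"
#   fi
```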

default_message_loglevel
------------------------

Messages without an explicit priority will be printed with this priority.

minimum_console_loglevel
------------------------

Minimum (highest priority) value to which the console_loglevel can be set.

default_console_loglevel
------------------------

Default value for console_loglevel.

sg-big-buff
-----------

This file  shows  the size of the generic SCSI (sg) buffer. At this point, you
can't tune  it  yet,  but  you  can  change  it  at  compile  time  by editing
include/scsi/sg.h and changing the value of SG_BIG_BUFF.

If you use a scanner with SANE (Scanner Access Now Easy) you might want to set
this to a higher value. Refer to the SANE documentation on this issue.

modprobe
--------

The path of the modprobe binary. The kernel uses this program to load modules
on demand.

unknown_nmi_panic
-----------------

The value in this file affects how NMIs are handled. When the value is
non-zero, an unknown NMI is trapped and the kernel panics. At that time, kernel
debugging information is displayed on the console.

For example, the NMI switch found on most IA32 servers raises an unknown NMI.
If a system hangs, try pressing the NMI switch.

panic_on_unrecovered_nmi
------------------------

The default Linux behaviour on an NMI caused by either a memory error or an
unknown source is to continue operation. For many environments, such as
scientific computing, it is preferable that the box is taken out of service and
the error dealt with than that an uncorrected parity/ECC error gets propagated.

A small number of systems do generate NMIs for bizarre random reasons such as
power management, so the default is off. This sysctl works like the existing
panic controls already in that directory.

nmi_watchdog
------------

Enables/Disables the NMI watchdog on x86 systems.  When the value is non-zero
the NMI watchdog is enabled and will continuously test all online CPUs to
determine whether or not they are still functioning properly.

Because the NMI watchdog shares registers with oprofile, by disabling the NMI
watchdog, oprofile may have more registers to utilize.

msgmni
------

Maximum number of message queue ids on the system.
This value scales to the amount of lowmem. It is automatically recomputed
upon memory add/remove or ipc namespace creation/removal.
When a value is written into this file, msgmni's value becomes fixed, i.e. it
is not recomputed anymore when one of the above events occurs.
Use auto_msgmni to change this behavior.

auto_msgmni
-----------

Enables/Disables automatic recomputing of msgmni upon memory add/remove or
upon ipc namespace creation/removal (see the msgmni description above).
Echoing "1" into this file enables msgmni automatic recomputing.
Echoing "0" turns it off.
auto_msgmni default value is 1.

 
