General Description
Slink Implementations:
- Motorola MVME2604, LynxOS-3.0
- Pentium II PC, Linux-2.2.12
- RIO2 8061, LynxOS-2.5.1
The latency of transfers MUST be as low as possible. System calls are only allowed when the Slink devices need a set-up. Polling is used to read/write data from/to the S5933 FIFOs, since previous studies show that an ISR imposes too heavy a system overhead in high-rate DAQ systems. As a fallback, the released drivers also map the S5933 chip internally in case a privileged intervention is required.
The PCI S5933 chipset MUST be mapped into user space. Current Slink software packages show that a user-space Slink library can provide the same functionality as a driver-library version.
The data flow between the S5933 internal FIFOs and the system RAM MUST be performed using the S5933 DMA capabilities; the CPU is only used to move data from/to the FIFOs when maximum bandwidth performance is NOT required. A minimal sketch of such a polled DMA set-up is given after these requirements.
The RAM buffers for Slink transactions MUST also be mapped into user space. Therefore, no kernel-to-user/user-to-kernel copies are required to access the DMA data buffers.
The ROD management of physical devices such as RAM memory and PCI chipsets MUST coexist with the kernel's internal management of the same kinds of devices not related to the ROD.
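As an illustration of the DMA requirement, a minimal sketch of a polled S5933 bus-master write (AddOn FIFO to system RAM) is given below. The register offsets follow the operation-register map of the AMCC S5933 datasheet; the MCSR bit mask and the polling condition are assumptions that must be checked against the datasheet.

#define S5933_MWAR  0x24                 /* Master Write Address Register  */
#define S5933_MWTC  0x28                 /* Master Write Transfer Count    */
#define S5933_MCSR  0x3C                 /* Master Control/Status Register */
#define MCSR_WRITE_ENABLE  ( 1 << 10 )   /* assumed bit position */

/* Start a bus-master write of "bytes" bytes from the AddOn FIFO to the
   physical RAM buffer at "buf_phys", then poll until it completes. */
static void s5933_dma_write( volatile unsigned int *s5933,
                             unsigned int buf_phys, unsigned int bytes )
{
    s5933[ S5933_MWAR / 4 ] = buf_phys;            /* destination in RAM */
    s5933[ S5933_MWTC / 4 ] = bytes;               /* transfer count     */
    s5933[ S5933_MCSR / 4 ] |= MCSR_WRITE_ENABLE;  /* start the transfer */

    while( s5933[ S5933_MWTC / 4 ] != 0 )          /* no ISR: poll until
                                                      the count is zero  */
        ;
}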
From the point of view of Slink programmers and Slink users, a similar set of functions and data structures has been implemented/used in both systems (Motorola-RIO2/LynxOS, PC/Linux).
The kernel-user I/O is performed using typical driver calls: "ioctl", "open" and "close". "Read" and "write" calls operate over the complete PCI configuration device space (256 bytes), which is physically inaccessible from a mapped system. The user procedure must first "attach" the particular device; it then performs some "read"/"write" ioctl calls to configure the physical chipset and "allocs" the device size in the PCI driver. The "alloc" ioctl call returns the physical address of the PCI chipset. This parameter is used to map the chipset into the user context, using the typical LynxOS call smem_create.
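A minimal sketch of this attachment sequence under LynxOS could look as follows; the device node and the PCI_ALLOC ioctl command are placeholder names, and only open, ioctl and smem_create are the real system services involved:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <smem.h>                      /* LynxOS physical memory mapping */

#define PCI_ALLOC  0x1234              /* placeholder ioctl command */

static volatile unsigned int *attach_s5933( void )
{
    unsigned long phys = 0;
    int fd = open( "/dev/rod_pci", O_RDWR );   /* attach the device */

    ioctl( fd, PCI_ALLOC, &phys );     /* "alloc": returns the physical
                                          address of the PCI chipset   */

    /* Map the chipset registers into the user context. */
    return (volatile unsigned int *)
           smem_create( "s5933", (char *)phys, 0x1000, SM_READ | SM_WRITE );
}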
The RAM device has been managed using the original RIO2 driver uiodrvr provided by CES; the same strategy has been followed in the uiolib user library provided by the DAQ-1 project.
Slink libraries have been written on top of these two drivers, and together they cover the requirements listed at the beginning. A complete set of routines performs the typical functions over the user software object, such as open, close, reset or DMA reads. A software quality check has been carried out by building a user application, SLIDAStest. SLIDAStest estimates the performance of the SLink-to-PMC board when moving data between the SLIDAS and physical memory. SLIDAS emulates a data source generating 32-bit words at a maximum rate of 40 MHz, which means that up to 160 Mbytes/sec can be loaded onto our SLink PMC.
The hardware design of the Slink-to-PMC card forces a continuous monitoring of the kind of Slink word present at the AddOn FIFO (SLink FIFO) in order to keep a continuous DMA flow. A mismatch between the expected link word in OMB1 and the current link word reported in IMB4 suspends the DMA. Inside a DMA transfer the link word is of data type, and therefore the expected word must also be of data type to keep a continuous data flow from the AMCC S5933 FIFOs to RAM memory. When a CTRL word arrives at the external FIFO (header or trailer of the current packet), the DMA stops and waits until some external mechanism (software, and therefore the CPU) changes the expected word. After the values are updated, the DMA transaction flows again for the new packet. Our tests try to measure this intrinsic effect of the current Slink hardware on the bandwidth performance.
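One plausible shape of that software switch, assuming the mapped S5933 operation registers (OMB1 at offset 0x00, IMB4 at 0x1C per the datasheet) and purely illustrative word-type encodings, is sketched below:

#define S5933_OMB1  0x00   /* expected word type, read by the AddOn   */
#define S5933_IMB4  0x1C   /* current word type reported by the AddOn */
#define SLINK_DATA  0x0    /* illustrative encodings */
#define SLINK_CTRL  0x1

/* Poll the mailboxes: when the word at the AddOn FIFO no longer
   matches the expected type, a header/trailer has arrived and the
   DMA has stopped; switch the expected word so it can flow again. */
static void handle_ctrl_words( volatile unsigned int *s5933, int packets )
{
    while( packets > 0 ){
        if( s5933[ S5933_IMB4 / 4 ] != s5933[ S5933_OMB1 / 4 ] ){
            s5933[ S5933_OMB1 / 4 ] = SLINK_CTRL;  /* accept the ctrl word   */
            s5933[ S5933_OMB1 / 4 ] = SLINK_DATA;  /* re-arm for packet data */
            packets--;
        }
    }
}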
Following this, we have performed two kinds of test: Raw transfers, where SLIDAS supplies raw packets without CONTROL words, and Control transfers, where the packet is a normal SLINK packet (Header, data, data, ..., data, Trailer). In both cases the buffer size has been swept covering 13 Kwords. In the CTRL sweep, the curve sampling and the sweep size are constrained by the SLIDAS hardware. The Raw sweep has been oversampled in software to cross-check the CTRL test measurements. The data plots are shown below.
RAW DMA transfers (DMA buffering without CTRL words) can be as high as 89 Mbytes/sec with our current system, with an efficiency of 95% for a 2 Kword buffer size. The figure shows that the buffer size required to obtain a good efficiency is not too demanding (1/2 Kword yields 85%).

CTRL DMA transfers (DMA buffering with CTRL words) can reach up to 85 Mbytes/sec for 8 Kwords (the maximum packet size provided by SLIDAS). In terms of bandwidth, the penalty coming from the software CTRL word switch, which is clearly seen in the curves, is roughly 10% for 1 Kword.
Both experiments tend to the same plateau (P1 parameter) for a buffer size >= 8 Kwords, within a variance lower than 4%.
A good chi-square/ndf ratio has been obtained from the collected data, which indicates a good characterization.
The calculations use software calls to measure the elapsed time after M iterations for each packet size. We do not take into account the overheads coming from intermediate function calls; the procedure followed looks like the code below:

for( jj = 0; jj < STATISTICS; jj++ ){
    run_params.start = times( &timer );
    for( ii = 0; ii < loops; ii++ ){
        if( SLink_DMA_Read( dev, (char *)( &dma ), size, PCItoPHY_M ) != SUCCESS )
            break;
    }
    run_params.end = times( &timer );
    time[ jj ] = show_performance( &run_params );
    printf( "\n%d\t%.0f", jj, time[ jj ] );
    fprintf( data_file, "\n%d\t%.0f", jj, time[ jj ] );
}
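For reference, the conversion from times() ticks to bandwidth inside show_performance presumably looks like the sketch below; CLK_TCK comes from <time.h>, 4 bytes per 32-bit word, and the function name is ours:

#include <time.h>

/* Convert a timed run of "loops" transfers of "words" 32-bit words
   into Mbytes/sec; start/end are times() ticks. */
static double bandwidth_mb( clock_t start, clock_t end, int loops, int words )
{
    double elapsed = (double)( end - start ) / CLK_TCK;
    return ( (double)loops * words * 4.0 ) / ( elapsed * 1024.0 * 1024.0 );
}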
The S5933 chipset is designed for data widths of 32 bits, working at a maximum speed of 33 MHz, i.e. a theoretical ceiling of 132 Mbytes/sec. This means that the MVME2604 PCI bus, which can extract/push 64-bit data at the full speed of 33 MHz (264 Mbytes/sec), is underloaded.
The mmap driver entry does the PCI chipset/RAM remapping job using the remap_page_range kernel symbol. The trick to remove the kernel page-swapping management of system RAM comes from the X server strategy to access physical memory; a description of the method can be found in the Linux Lab Project documentation. Our modified version blocks the kernel mem_map array entries linked to the page range allocated by the __get_free_pages call.
/* Trick from the Linux Lab Project to use remap_page_range in the
 * mmap call when RAM is involved: set the PG_reserved bit.
 * Same strategy as XFree86. */
for( ii = MAP_NR( tmp->k_ptr );
     ii <= MAP_NR( tmp->k_ptr + ( (PAGE_SIZE - 1) * alloc.pages ) );
     ii++ ){
    mem_map_reserve( ii );
}
After this loop the pages are treated as PCI memory pages, and therefore are suitable to be user mapped using mmap, remap_page_range and VMAs.
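A sketch of the corresponding mmap driver entry under Linux 2.2 is given below; rod_mmap and the rod_k_ptr bookkeeping (the driver's tmp->k_ptr above) are illustrative names, while remap_page_range and virt_to_phys are the actual kernel symbols:

#include <linux/mm.h>
#include <linux/errno.h>
#include <asm/io.h>

static char *rod_k_ptr;   /* kernel virtual address of the reserved buffer */

/* Map the reserved RAM buffer (or the PCI registers) into the calling
   process; the pages were marked PG_reserved above, so the kernel will
   not touch them behind the DMA engine. */
static int rod_mmap( struct file *file, struct vm_area_struct *vma )
{
    unsigned long size = vma->vm_end - vma->vm_start;

    if( remap_page_range( vma->vm_start, virt_to_phys( rod_k_ptr ),
                          size, vma->vm_page_prot ) )
        return -EAGAIN;
    return 0;
}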
Driver entries "write" and "read" are implemented to access the PCI configuration space, which cannot be mapped. As in the LynxOS PCI driver, the Linux driver maps the PCI chipsets internally to be able to access the hardware whenever required, i.e. in the /proc/rod entry or if some interrupt service is designed.
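As an illustration, the "read" entry serving the 256-byte configuration space under Linux 2.2 could be sketched as follows; rod_read and the bus/devfn bookkeeping are illustrative, while pcibios_read_config_byte and put_user are the actual 2.2 kernel services:

#include <linux/fs.h>
#include <linux/pci.h>
#include <asm/uaccess.h>

static unsigned char rod_bus, rod_devfn;   /* filled at probe time */

/* Copy out a slice of the 256-byte PCI configuration space. */
static ssize_t rod_read( struct file *file, char *buf,
                         size_t count, loff_t *ppos )
{
    unsigned char val;
    size_t ii;

    if( *ppos >= 256 )                 /* config space is 256 bytes */
        return 0;
    if( *ppos + count > 256 )
        count = 256 - *ppos;

    for( ii = 0; ii < count; ii++ ){
        pcibios_read_config_byte( rod_bus, rod_devfn, *ppos + ii, &val );
        if( put_user( val, buf + ii ) )
            return -EFAULT;
    }
    *ppos += count;
    return count;
}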
As in the SLink LynxOS libraries case, a similar set of data types and procedures has been implemented on top of the ROD driver services. We have checked the package functionality by setting up a real Slink environment based on FiberChannel. LDC and LSC SLink/FiberChannel cards provide a maximum bandwidth of 103 Mbytes/sec over the maximum theoretical rate of 1 Gbit/sec given by the FiberChannel specification; further information about the hardware can be found at http://www.rmki.kfki.hu/detector/S-Link/.
The applications built measure the PC RAM-to-FIFO transfer bandwidth and the FiberChannel bandwidth. We are also interested in measuring the penalty coming from the CTRL word management. On the Linux side, the control word (Slink packet header or trailer) needs to be written by software, and therefore no DMA can be used for it.
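Since the header/trailer bypasses the DMA engine, the sender presumably pushes it with a single programmed-I/O store to the S5933 FIFO register (offset 0x20 in the datasheet map); a hedged one-liner:

#define S5933_FIFO  0x20   /* AddOn/PCI FIFO port */

/* Write one Slink control word (header or trailer) by programmed I/O;
   the data payload of the packet still moves by DMA. */
static void send_ctrl_word( volatile unsigned int *s5933, unsigned int word )
{
    s5933[ S5933_FIFO / 4 ] = word;
}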
A synchronization method between the sender application and the receiver software is necessary before the sweep test starts. We use the main data link and the SLink return lines to make the configuration setup between the two processes. Before the sweep test starts, a special packet is sent from the PC memory to the PowerPC memory. It contains the sweep parameters: the data buffer size to be transmitted, the number of SLink packets, the number of loops per sweep point and the sweep resolution. When the receiver is ready to make the sweep, it acknowledges its state by lowering the S-Link line LRL3, which is monitored at the sender side after the synchro packet is sent.
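To make the handshake concrete, a hedged sketch of the sender side is given below; the sweep_setup structure, the SLink_Write/SLink_Return_Line helpers, the SLink_Device type and the LRL3 constant are illustrative names, not the actual library API:

/* Hypothetical synchronization packet and sender-side handshake. */
struct sweep_setup {
    unsigned int buffer_size;   /* data buffer size to transmit (words) */
    unsigned int n_packets;     /* number of SLink packets              */
    unsigned int loops;         /* loops per sweep point                */
    unsigned int resolution;    /* sweep resolution (words)             */
};

static void sync_with_receiver( SLink_Device *dev )
{
    struct sweep_setup setup = { 8192, 1, 100, 512 };   /* example values */

    /* Send the synchro packet over the main data link ... */
    SLink_Write( dev, (char *)&setup, sizeof( setup ) );

    /* ... then poll the return line: the receiver lowers LRL3
       when it is ready to start the sweep. */
    while( SLink_Return_Line( dev, LRL3 ) != 0 )
        ;
}

The experimental results obtained are shown below: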
RAW DMA transfers (DMA buffering without CTRL words) can be as high as 42 Mbytes/sec, with an efficiency greater than 98% for an 8 Kword buffer size. The figure shows that the size-efficiency requirement is not too demanding (1/2 Kword yields 90%).

CTRL DMA transfers (DMA buffering with CTRL words) can reach up to 42.5 Mbytes/sec for 8 Kwords. In terms of bandwidth, the penalty coming from the software CTRL word management is roughly 10% for 1/2 Kword.
Both experiments tend to the same plateau (P1 parameter) for a buffer size >= 1 Kword, within a 5% margin.
As in the SLIDAS case, a good chi-square is obtained from the collected data.
The theoretical bandwidth of the PCI-to-SLink card is 65 Mbytes/sec; therefore, it is not possible to check the FiberChannel bandwidth. Erik van der Bij has suggested that such a test should be done using a SliTest board or a MicroEnable card, which provides a RAM-memory-to/from-PCI throughput of 110 Mbytes/sec. The Linux load has been set to the default: the X server and other "user applications", such as networking services, are running, and our SLink application does not have any special privileges. No special kernel scheduler set-up has been done.
RAW and CTRL DMA tests are performed for a PCI-SLink latency timer value of 96 (~23 usec per burst). For a fixed data buffer (8 Kwords), a latency sweep has been done covering the whole dynamic range of the latency timer. We see a flat response of 35 Mbytes/sec, with a step response up to the P1 parameter from a latency value of 26 to the end of the sweep.
TileCal ROD
Maintained by Juanba, IFIC - University of Valencia