Tuesday, November 4, 2008

Making the .NET Compact Framework work on a PXA320-based Windows Embedded CE 6.0 platform

I developed a BSP for a PXA320-based platform running Windows Embedded CE 6.0 R2, starting from the Zylonite Windows CE 5.0 BSP. I ran into a strange issue with .NET Compact Framework applications: they failed to run even though all the necessary .NET Compact Framework components had been included in the image.
I ran cgacutil.exe to get the version of the .NET Compact Framework and got a “Registration Failed” message in the console. I saw the same issue when I installed the .NET Compact Framework through the CAB installer. When I tried the same project file (.pbxml) with another platform, the .NET Compact Framework worked.
So the issue had to be related to the BSP. During installation of the .NET Compact Framework, IOCTL_HAL_GET_DEVICE_INFO is called through KernelIoControl() to get the device information used for registration. This IOCTL returns a set of system parameters, selected through SPI_* cases. One of these cases is SPI_GETPLATFORMNAME, which returns the platform name and is used by the .NET Compact Framework GAC implementation. This case was missing in my BSP because it was derived from the Zylonite Windows CE 5.0 BSP, and that is why the .NET Compact Framework was not working.
Workaround 1:
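a) Add the global declarations (the length is derived from the string itself, so the two cannot get out of step):
const WCHAR HALPlatNameStr[] = L"XXXPlatform";
const DWORD dwHALPlatNameStrLen = (sizeof(HALPlatNameStr) / sizeof(WCHAR)) - 1;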
b) Add the following case to the OALGetDeviceInfo() function in src\oal\oallib\ioctl.c:
case SPI_GETPLATFORMNAME:
    // Bytes needed for the name plus its terminating null
    len = (dwHALPlatNameStrLen + 1) * sizeof(WCHAR);
    // Always report the required size, even when the buffer is too small
    if (lpBytesReturned)
        *lpBytesReturned = len;
    if (nOutBufSize >= len && lpOutBuf)
    {
        memcpy(lpOutBuf, HALPlatNameStr, len);
        retval = TRUE;
    }
    else
    {
        NKSetLastError(ERROR_INSUFFICIENT_BUFFER);
    }
    break;
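The contract that this case implements (always report the required size, copy only when the buffer is big enough) can be tried out on a desktop machine. The following is a minimal host-side sketch in portable C, not BSP code: get_platform_name() and fetch_platform_name() are hypothetical stand-ins for the IOCTL handler and its caller, and wchar_t stands in for WCHAR (which is 2 bytes on Windows CE but may be 4 bytes on a host compiler).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical platform name; the real string comes from the BSP. */
static const wchar_t kPlatformName[] = L"XXXPlatform";

/* Character count derived from the array, so it cannot drift from the text. */
#define PLATFORM_NAME_LEN (sizeof(kPlatformName) / sizeof(kPlatformName[0]) - 1)

/* Stand-in for the SPI_GETPLATFORMNAME case: always report the required
   byte count, copy only when the caller's buffer is large enough.
   Returns 1 on success, 0 when the buffer is missing or too small. */
int get_platform_name(void *out, size_t out_size, size_t *bytes_returned)
{
    size_t len = (PLATFORM_NAME_LEN + 1) * sizeof(wchar_t); /* with terminator */

    if (bytes_returned != NULL)
        *bytes_returned = len;
    if (out == NULL || out_size < len)
        return 0;

    memcpy(out, kPlatformName, len);
    return 1;
}

/* Two-call pattern a caller might use: ask for the size, then fetch.
   Returns the number of bytes copied, or 0 if the buffer was too small. */
size_t fetch_platform_name(wchar_t *buf, size_t buf_bytes)
{
    size_t needed = 0;

    get_platform_name(NULL, 0, &needed);   /* first call only reports the size */
    assert(needed == (PLATFORM_NAME_LEN + 1) * sizeof(wchar_t));
    return get_platform_name(buf, buf_bytes, &needed) ? needed : 0;
}
```

With a hard-coded length that is shorter than the string, the copy would stop before the terminating null, which is why the sketch derives the length with sizeof.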


(OR)

Workaround 2:
Make the following change in the src\inc\ioctl_tab.h file:
{ IOCTL_HAL_GET_DEVICE_INFO, 0, OALIoCtlHalGetDeviceInfo },
//{ IOCTL_HAL_GET_DEVICE_INFO, 0, OALGetDeviceInfo },
This uses the common OAL IOCTL function OALIoCtlHalGetDeviceInfo() instead of OALGetDeviceInfo(). OALIoCtlHalGetDeviceInfo() ultimately uses the g_oalIoCtlPlatformType value defined in the ioctl.c file.

Monday, November 3, 2008

Workaround for data aborts on L2-cache-enabled PXA320 Windows CE 6.0 platforms

I developed a BSP for a PXA320-based platform running Windows Embedded CE 6.0, starting from the Zylonite Windows CE 5.0 BSP. I ran into an issue when enabling the PXA320 L2 cache: there were many data aborts at boot time from some drivers and from applications such as explorer.exe. I traced the call stack and found the source code causing the data aborts: the CACHE_SYNC_DISCARD case in OEMCacheRangeFlush().
This case is used when cache lines, or the whole cache, are to be written back and invalidated. The data abort occurred exactly when invalidating a range of cache lines. Whether the whole cache or only a range is invalidated is decided from the length and the address. If the length and address are both zero, or the length is greater than the size of the cache, the whole cache is flushed; otherwise only a particular set of cache lines is flushed, based on the address. The address passed as an argument is used to calculate the first cache line to flush. The L2 cache uses physical addresses for cache operations. The cache flushing function is implemented in assembly language and converts the virtual address to a physical address through the page table entry. During this conversion it sometimes reads invalid page table data, which causes the data abort. I don’t know the reason for the invalid page table entry, but I have a workaround.
The workaround is to use the function that flushes the entire cache instead of the function that flushes a range of lines.
Replace the function call
XScaleFlushDCacheLinesL2((LPVOID) dwNormalizedAddress, dwNormalizedLength, ARMCacheInfo.dwL2DCacheLineSize);
with
XScaleFlushDCacheL2(DCACHE_LINES, ARMCacheInfo.dwL2DCacheLineSize, (DWORD) gpvCacheFlushBaseMemoryAddress);
This stops the data aborts during L2 cache flushing in OEMCacheRangeFlush(). The trade-off is that every discard request now flushes the whole L2 cache, even when only a small range was asked for.
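The rule described above for choosing between a whole-cache flush and a line-range flush can be sketched in portable C. This is only an illustration of the decision and the line arithmetic, not the BSP's code: choose_flush(), first_line() and line_count() are hypothetical names, and the cache size and line size are passed in as parameters rather than read from ARMCacheInfo.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { FLUSH_WHOLE_CACHE, FLUSH_LINE_RANGE } flush_kind;

/* Decide how a discard request is served, following the rule in the text:
   address and length both zero, or a length larger than the cache,
   means the whole cache; anything else means a range of lines. */
flush_kind choose_flush(uint32_t addr, uint32_t len, uint32_t cache_size)
{
    if ((addr == 0 && len == 0) || len > cache_size)
        return FLUSH_WHOLE_CACHE;
    return FLUSH_LINE_RANGE;
}

/* Round the start address down to the beginning of its cache line.
   The line size must be a non-zero power of two. */
uint32_t first_line(uint32_t addr, uint32_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return addr & ~(line_size - 1);
}

/* Number of cache lines touched by the byte range [addr, addr + len). */
uint32_t line_count(uint32_t addr, uint32_t len, uint32_t line_size)
{
    uint32_t start = first_line(addr, line_size);
    uint32_t end   = addr + len;            /* one past the last byte */
    return (end - start + line_size - 1) / line_size;
}
```

For example, with an assumed 32-byte line, a 100-byte request starting at 0x80001010 begins at line 0x80001000 and touches four lines. The workaround above amounts to always taking the FLUSH_WHOLE_CACHE path, which avoids the per-line address translation where the abort occurred.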

Sharing Blocks of Memory between Kernel Mode Process and the User Mode Process in Windows Embedded CE 6.0

In some cases we need to share the same memory between a kernel-mode process and a user-mode process, instead of copying data from kernel memory to user memory or vice versa, in order to improve performance or to process the data in real time.
For example, consider a camera-based application. The camera sends frames at 30 frames per second, so the interval between two frames is approximately 33 ms. The camera driver stores the captured data (640*480*8 bits) within 5 ms for every frame, which leaves the remaining 28 ms for the application to process the data into the required output. The driver has to store all the required frames as blocks of memory: if the application needs 60 frames, the driver has to store those 60 frames contiguously in memory.
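The figures in the camera example can be checked with a few lines of portable C. The constants come from the text above; the function names are only illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Figures taken from the camera example above. */
enum {
    FRAME_WIDTH      = 640,
    FRAME_HEIGHT     = 480,
    BYTES_PER_PIXEL  = 1,   /* 8 bits per pixel */
    FRAMES_PER_SEC   = 30,
    CAPTURE_TIME_MS  = 5,   /* time the driver needs to store one frame */
    FRAMES_TO_BUFFER = 60   /* frames the application wants to keep */
};

/* Size of one captured frame in bytes. */
size_t frame_bytes(void)
{
    return (size_t)FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PIXEL;
}

/* Whole milliseconds between two frames (integer division). */
unsigned frame_interval_ms(void)
{
    return 1000u / FRAMES_PER_SEC;
}

/* Time left for the application after the driver has stored a frame. */
unsigned processing_budget_ms(void)
{
    assert(frame_interval_ms() >= CAPTURE_TIME_MS);
    return frame_interval_ms() - CAPTURE_TIME_MS;
}

/* Contiguous memory needed to hold every buffered frame. */
size_t buffer_bytes(void)
{
    return frame_bytes() * FRAMES_TO_BUFFER;
}
```

One frame is 307,200 bytes, so 60 frames need 18,432,000 bytes (about 17.6 MB). Keeping a second copy of that buffer for the application would double the cost.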
In this case, maintaining two such large buffers in an embedded product would make memory scarce, and it is not an efficient design. Unlike Windows CE 5.0, Windows CE 6.0 restricts user processes from directly accessing kernel-mode memory, to avoid security vulnerabilities. However, Windows Embedded CE 6.0 introduces a new set of APIs that allow memory blocks to be shared between a kernel-mode process and a user-mode process in a secure manner.
VirtualAllocEx, VirtualCopyEx and VirtualFreeEx are newly introduced APIs in Windows CE 6.0 that allow you to share memory between kernel-mode and user-mode processes. I will explain the implementation through a sample stream driver and a user application. The virtual address space is allocated in the process whose ID is given as an argument to VirtualAllocEx(). In our example, the virtual address space is allocated in the user (application) process. You can print the address to check its range; it should be below 2 GB (the user address space).
Allocating, sharing and freeing the memory are implemented through IOCTL calls. The following source code shows the implementation.
DWORD SMD_IOControl(DWORD dwOpen, DWORD dwCode, PDWORD pIn, DWORD dwIn,
                    PDWORD pOut, DWORD dwOut, DWORD *pdwBytesWritten)
{
    RETAILMSG(1, (TEXT("SMD_IOControl++ dwCode: 0x%x\r\n"), dwCode));
    switch (dwCode)
    {
    case IOCTL_SMD_ALLOC_ADDRESS:
        {
            // The caller must supply room for one DWORD
            if (pOut == NULL || dwOut < sizeof(DWORD))
                return FALSE;
            *pOut = (DWORD)GetVirtualAddress();
            if (pdwBytesWritten)
                *pdwBytesWritten = sizeof(DWORD);
            RETAILMSG(1, (TEXT("SMD_IOControl: Address=0x%x\r\n"), *pOut));
        }
        break;
    case IOCTL_SMD_FREE_ADDRESS:
        {
            void* pvProcess = (void*)GetCallerVMProcessId();
            if (gUserAddr != NULL)
            {
                // Release from the page-aligned base of the mapping
                VirtualFreeEx(pvProcess, (PVOID)((ULONG)gUserAddr & ~(ULONG)(PAGE_SIZE - 1)), 0, MEM_RELEASE);
                gUserAddr = NULL;
            }
        }
        break;
    case IOCTL_SMD_FILL_ADDRESS:
        {
            char* Temp;
            // Nothing to fill until IOCTL_SMD_ALLOC_ADDRESS has succeeded
            if (gUserAddr == NULL)
                return FALSE;
            // Write near the end of the 32 MB block (offset 31 MB)
            Temp = (char*)gUserAddr + 0x1F00000;
            NKDbgPrintfW(L"Address=0x%x\r\n", Temp);
            strcpy(Temp, " Hello World");
            // %S because Temp is an ANSI string in a wide format string
            NKDbgPrintfW(L"Written value =%S\r\n", Temp);
        }
        break;

    default:
        {
            DEBUGMSG(1, (TEXT("SMD_IOControl: unknown code %x\r\n"), dwCode));
        }
        return FALSE;
    }

    RETAILMSG(1, (TEXT("SMD_IOControl Finished--\r\n")));
    return TRUE;
}

LPVOID GetVirtualAddress()
{
    DWORD sDevPhysAddr = 0x81F00000;
    DWORD dwSize = 0x02000000;   // 32 MB
    LPVOID lpUserAddr;
    ULONG SourceSize;
    ULONGLONG SourcePhys;
    void* pvProcess = (void*)GetCallerVMProcessId();

    // Align the source address down to a page boundary and grow the size to match
    SourcePhys = sDevPhysAddr & ~(PAGE_SIZE - 1);
    SourceSize = dwSize + (sDevPhysAddr & (PAGE_SIZE - 1));

    // Reserve address space in the calling (user) process
    lpUserAddr = VirtualAllocEx(pvProcess, 0, SourceSize, MEM_RESERVE, PAGE_NOACCESS);
    if (lpUserAddr == NULL) {
        return NULL;
    }
    // Map the driver's memory into the reserved range
    if (!VirtualCopyEx(pvProcess, lpUserAddr, GetCurrentProcess(), (PVOID)(ULONG)SourcePhys,
                       SourceSize, PAGE_READWRITE | PAGE_NOCACHE)) {
        // Do not leak the reservation when the mapping fails
        VirtualFreeEx(pvProcess, lpUserAddr, 0, MEM_RELEASE);
        return NULL;
    }
    NKDbgPrintfW(L"Before round up lpUserAddr=0x%x\r\n", lpUserAddr);
    // Add back the offset of the source address within its page
    lpUserAddr = (LPVOID)((ULONG)lpUserAddr + (sDevPhysAddr & (PAGE_SIZE - 1)));
    gUserAddr = lpUserAddr;
    NKDbgPrintfW(L"After round up gUserAddr=0x%x\r\n", gUserAddr);
    return lpUserAddr;
}

IOCTL_SMD_ALLOC_ADDRESS: This IOCTL allocates the block of memory and returns its starting address to the user-mode process (the application).
IOCTL_SMD_FREE_ADDRESS: This IOCTL frees the allocated shared memory area using VirtualFreeEx().
IOCTL_SMD_FILL_ADDRESS: This IOCTL simply writes data near the end of the large block of memory. The application can print this data as a proof of concept of the shared memory implementation.
GetVirtualAddress(): This function sets up a large 32 MB block of memory that can be shared by the kernel-mode process and the user-mode process. It calls GetCallerVMProcessId() to get the caller's process ID, which is passed to VirtualAllocEx() and VirtualCopyEx(). In our case the caller is the application, that is, the user-mode process. VirtualAllocEx() takes the caller's process ID and reserves the address range in that process; VirtualCopyEx() takes both the caller's process ID (the destination) and the current process (the source) and maps the memory. VirtualAllocEx and VirtualCopyEx are used in much the same way as VirtualAlloc and VirtualCopy.
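The page-alignment arithmetic inside GetVirtualAddress() is easy to get wrong, so here is a small portable C sketch of just that part. It assumes a 4 KB page; plan_mapping(), user_pointer() and the mapping struct are hypothetical names used for illustration and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumed page size */

typedef struct {
    uint32_t base;    /* source address rounded down to a page boundary */
    uint32_t offset;  /* distance of the original address into that page */
    uint32_t size;    /* bytes to map so the whole region stays covered */
} mapping;

/* Split an address into a page-aligned base plus an in-page offset, and
   grow the size by that offset so the end of the region is still mapped. */
mapping plan_mapping(uint32_t addr, uint32_t size)
{
    mapping m;

    assert(size != 0);
    m.offset = addr & (PAGE_SIZE_BYTES - 1);
    m.base   = addr & ~(PAGE_SIZE_BYTES - 1);
    m.size   = size + m.offset;
    return m;
}

/* Address the caller should use: mapped base plus the in-page offset. */
uint32_t user_pointer(uint32_t mapped_base, const mapping *m)
{
    return mapped_base + m->offset;
}
```

For the values used in the driver (0x81F00000 and 0x02000000) the address is already page-aligned, so the offset is zero and the size is unchanged. The adjustment only matters for an unaligned source address such as 0x81F00123, where the base becomes 0x81F00000 and the size grows by 0x123.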
The following sample application accesses the memory allocated by the sample memory driver.
HANDLE SMDDrv;
DWORD Address = 0;        // receives the shared address from the driver
DWORD BytesWritten = 0;
// Open the sample memory driver
SMDDrv = CreateFile(L"SMD1:",                      // file name to be opened
                    GENERIC_WRITE | GENERIC_READ,  // open for reading and writing
                    0,                             // no share mode
                    NULL,                          // default security
                    OPEN_EXISTING,                 // opening an existing file
                    0,                             // non-overlapped mode
                    NULL                           // no template file
                    );
if (SMDDrv == INVALID_HANDLE_VALUE)
{
    printf(":SMD open failed\r\n");
}
else
{
    printf(":SMD open success\r\n");
    // Get the allocated shared pointer
    if (DeviceIoControl(SMDDrv, IOCTL_SMD_ALLOC_ADDRESS, NULL, 0, &Address, sizeof(DWORD), &BytesWritten, NULL)
        && Address != 0)
    {
        char* Data = (char*)Address;
        printf("Address:0x%x\r\n", (unsigned int)Address);
        // Request the driver to fill the data
        DeviceIoControl(SMDDrv, IOCTL_SMD_FILL_ADDRESS, NULL, 0, NULL, 0, NULL, NULL);
        // Move the pointer to the filled data area
        Data += 0x1F00000;
        // Just print the data for proof of concept
        printf("Data:%s\r\n", Data);
        // Free the memory
        DeviceIoControl(SMDDrv, IOCTL_SMD_FREE_ADDRESS, NULL, 0, NULL, 0, NULL, NULL);
    }
    // Close the driver handle
    CloseHandle(SMDDrv);
}
The simple application above opens the driver, gets the pointer to the large memory block, moves to the data area filled by the driver, prints the data, frees the memory and closes the driver.
I hope this post is useful for architects and developers who need to share large blocks of memory between user-mode and kernel-mode processes.

About Me:
I am currently working with e-con systems India, which offers product design services, BSP development and device driver development on Windows CE. We specialize in camera drivers.