Why does u-boot always mark a written block as bad? - c

I'm developing a NAND flash driver for u-boot. I think it works well, but the u-boot environment doesn't work properly. Here is what I have done for testing:
First, I erase the whole NAND flash with code written entirely by myself; this code is not related to u-boot. No bad blocks are found. (Is it possible for a NAND flash to have no bad blocks?) Here is the code:
void nand_erase(u32 addr)
{
    /* Erase addresses must be block-aligned. */
    if (addr & (BLOCK_SIZE - 1))
    {
        printf("not block align\n");
        return;
    }
    u32 row = addr / 2048;   /* row (page) address, 2048-byte pages */
    nand_select_chip();
    nand_cmd(0x60);          /* BLOCK ERASE setup */
    NFADDR = row & 0xFF;
    NFADDR = (row >> 8) & 0xFF;
    NFADDR = (row >> 16) & 0x07;
    nand_cmd(0xD0);          /* BLOCK ERASE confirm */
    nand_wait_ready();
    nand_cmd(0x70);          /* READ STATUS */
    u8 status = nand_read();
    if (status & 0x01)       /* bit 0 set: erase failed */
    {
        printf("block 0x%x is bad", addr);
    }
    nand_deselect_chip();
}
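For completeness, the whole-chip erase mentioned above is just a loop over this function. A minimal sketch, where FLASH_SIZE is a placeholder for the chip's total capacity:

/* Erase every block in the device; FLASH_SIZE is an assumed
 * constant matching the chip (erase proceeds block by block). */
void nand_erase_all(void)
{
    for (u32 addr = 0; addr < FLASH_SIZE; addr += BLOCK_SIZE)
        nand_erase(addr);
}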
Then I test the environment:
1. Start u-boot; it prints "bad CRC, using default environment".
2. Run "setenv test 100" and "printenv test"; they work well.
3. Run "saveenv"; it prints "OK" as well.
4. "nand bad" shows nothing.
5. Restart the board and u-boot.
6. Now it says "readenv() failed, using default environment".
7. "printenv test" fails. I then checked "nand bad", and it shows a bad block exactly at CONFIG_ENV_OFFSET.
Then I changed CONFIG_ENV_OFFSET to another value and repeated steps 1-7. A bad block shows up again at the new CONFIG_ENV_OFFSET.
I've checked my driver, and I think the write and read operations are good. The steps:
1. "nand dump 0" shows all 0xff.
2. "nand write 20000000 0 800" writes memory into the NAND flash.
3. "nand dump 0" now shows the same values as "md 20000000 100" does.
So, you can see that after saveenv, the block at CONFIG_ENV_OFFSET gets marked as bad, and I really don't know why.

Now I've figured it out.
I set ecc.mode = NAND_ECC_HW_SYNDROME, but the XXX_syndrome functions don't maintain the ECC layout; they simply write the ECC bytes right after the main data. Eventually this overwrites the first and second bytes of the OOB area in each page, and u-boot checks exactly those two bytes as the bad-block marker. So that's the answer.
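For context, this is the convention that got tripped: on large-page (2048-byte) NAND, u-boot and Linux treat the first OOB bytes of a block as the factory bad-block marker, and anything other than 0xFF there makes the block look bad. A minimal sketch of the check (nand_read_oob() is a hypothetical helper, not u-boot's actual API):

#include <stdint.h>

/* Hypothetical helper: reads 'len' OOB bytes of the given page. */
extern void nand_read_oob(uint32_t page, uint8_t *buf, int len);

/* A block is considered bad if either of the first two OOB bytes
 * of its first page is not 0xFF; those are exactly the bytes the
 * syndrome ECC path was overwriting. */
int block_is_bad(uint32_t first_page_of_block)
{
    uint8_t oob[2];
    nand_read_oob(first_page_of_block, oob, 2);
    return (oob[0] != 0xFF) || (oob[1] != 0xFF);
}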

Related

Problem utilizing SPI module of BeagleBone Black

I have the following weird problem.
I have set up the BBB to activate the spi1 module. The module is connected to an F-RAM chip (FM25CL64B). I have done all the necessary configuration: /dev/spidev1.0 is present, and I wrote a small program to write to and read from the chip by opening /dev/spidev1.0 and using ioctl with the SPI_IOC_MESSAGE macro command.
Using that program I managed to successfully write 32 bytes of text into the F-RAM chip. Reading also seemed to be successful... How do I know they both were successful? I used a logic analyzer with an SPI decoder activated to actually see what's going on over all four of the SPI lines. Monitoring all SPI lines I could see that the write and read operations generate the correct signals with the correct timings, and all signals are in sync. The CS enables the chip during the transaction, CLK clocks every byte in 8-bit words (as configured), and the data lines show the correct values, which I could see thanks to the SPI decoder, which shows the byte value right above each 8-bit signal sequence on both the MOSI and MISO lines.
The problem is that even though I can see that the correct information is sent over the MISO line during the read operation, the buffer I provide to ioctl(iSPIR, SPI_IOC_MESSAGE(2), xfer) is filled with zeros.
I intentionally initialized that buffer with other values, so I can see if the ioctl even writes into it. And it does. Zeros.
Now, the fact that I could see all the bytes sent over the MISO line during the read operation, proves that the writing operation not only looked right in the analyzer but actually wrote the intended data during the previous write operation.
I checked several times whether the MISO line is configured correctly, in the dts file (which I can rebuild and reinstall on demand). I checked if it's the correct pin, and if it's configured as input. Everything seemed to be correctly configured.
I ran the program as root in case there are permission issues - no difference.
I also implemented the SPI communication in GPIO mode. I.e. the SPI module is disabled and all lines are configured as GPIO: CE, CLK, and MOSI configured as outputs and MISO configured as input. This way I could implement the entire communication in software and have full control over the lines. Doing that, this time, I was able to successfully fill a buffer with the correct data from the F-RAM chip. I.e. the sequential read operation went fine from the F-RAM chip all the way to my user-space buffer, and I was able to print the data to the console. However, that worked way too slowly. I also find it inefficient to use a purely software implementation of SPI when there is a hardware module available.
In order to write my sample program I used the spi_test.c open source example available online.
I also built and ran spi_test.c itself with no modification, same result.
Here is a listing of my program (relevant snippets):
// SPI config ...
int InitSPIReadMode(const char* pstrDeviceF)
{
    int file;
    __u8 wr_mode = SPI_MODE_0, rd_mode = SPI_MODE_0, lsb = 0, bits = 8;
    __u32 speed = CLOCK_FREQ_HZ; // 500kHz

    if ((file = open(pstrDeviceF, O_RDWR)) < 0)
    {
        printf("Failed to open the bus.");
        /* ERROR HANDLING; you can check errno to see what went wrong */
        exit(1);
    }
    if (ioctl(file, SPI_IOC_RD_MODE, &rd_mode) < 0)
    {
        printf("SPI rd_mode\n");
        return -1;
    }
    if (ioctl(file, SPI_IOC_RD_LSB_FIRST, &lsb) < 0)
    {
        printf("SPI rd_lsb_first\n");
        return -1;
    }
    if (ioctl(file, SPI_IOC_RD_BITS_PER_WORD, &bits) < 0)
    {
        printf("SPI bits_per_word\n");
        return -1;
    }
    if (ioctl(file, SPI_IOC_RD_MAX_SPEED_HZ, &speed) < 0)
    {
        printf("SPI max_speed_hz\n");
        return -1;
    }

    printf("%s: spi wr-mode=%d, spi rd-mode=%d, %d bits per word, %s, %d Hz max\n",
           pstrDeviceF, wr_mode, rd_mode, bits, lsb ? "(lsb first) " : "(msb first)", speed);

    xfer[0].cs_change = 0;    /* Keep CS activated */
    xfer[0].delay_usecs = 0;  /* delay in us */
    xfer[0].speed_hz = CLOCK_FREQ_HZ;
    xfer[0].bits_per_word = 8;
    xfer[1].cs_change = 0;    /* Keep CS activated */
    xfer[1].delay_usecs = 0;
    xfer[1].speed_hz = CLOCK_FREQ_HZ;
    xfer[1].bits_per_word = 8;

    return file;
}
In the main function (as the logic analyzer shows, this code correctly sends the command and address, and clocks the 32 bytes of data afterwards):
int iSPIR = InitSPIReadMode("/dev/spidev1.0"); //open("/dev/spidev1.0", O_RDWR | O_SYNC);

char arrInstruct[3] = { OPCO_READ, 0x00, 0x00 };
char arrFRamData[512];
for (int pos = 0; pos < 512; pos++) arrFRamData[pos] = pos;

xfer[0].tx_buf = (unsigned long)arrInstruct;
xfer[0].len = 3;
xfer[1].rx_buf = (unsigned long)arrFRamData;
xfer[1].len = 32;

if (ioctl(iSPIR, SPI_IOC_MESSAGE(2), xfer) < 0)
    printf("ioctl write error %s.\n", strerror(errno));

// hex dumping of the arrFRamData buffer.
xfer is a global variable defined as:
struct spi_ioc_transfer xfer[2];
Thanks a lot in advance! :)
I found what the problem was with my setup. But even though the whole thing works now, the solution I applied raises more questions than answers.
So I found this solution in an article (https://elinux.org/BeagleBone_Black_Enable_SPIDEV) on the beaglebone wiki.
There I noticed that, in the device-tree overlay, they set the CLK as an input. Reading the entire article turned up nothing as to why the clock on the BBB's side has to be an input, even though it's the master. The article only explains how to build and install the new DTBO in order to activate the SPI1 module.
So, even though it didn't make any sense to me, I thought it wouldn't hurt to try changing the CLK line in my DTBO file from output to input to see what happens... And it worked! :O
Now, here is why it's so strange that the setup works like this. The BBB is supposed to be the SPI master, so its clock line should be an output in order to drive this synchronous communication. That means the FM25CL64B chip must act as SPI slave. I just checked the datasheet, and yes, the CLK on the chip's side is an input. So the CLK on the BBB must be an output. That's how I had configured the CLK pin on the BBB: as an output. And it didn't work.
This is why I'm so puzzled. So it works even though both ends of the CLK line are inputs?! Looking at the logic analyzer's output, I can clearly see that the CLK line is being driven correctly. But if both the master and slave have inputs on that line, there shouldn't be anything generating those impulses?! There is nothing else connected to that line...
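For reference, the pinmux fragment from that article looks roughly like this (offsets and flag values quoted from memory, so treat them as illustrative and check the article itself):

bb_spi1_pins: pinmux_bb_spi1_pins {
    pinctrl-single,pins = <
        0x190 0x33  /* mcasp0_aclkx.spi1_sclk, INPUT_PULLUP | MODE3 */
        0x194 0x33  /* mcasp0_fsx.spi1_d0,     INPUT_PULLUP | MODE3 */
        0x198 0x13  /* mcasp0_axr0.spi1_d1,    OUTPUT_PULLUP | MODE3 */
        0x19c 0x13  /* mcasp0_ahclkr.spi1_cs0, OUTPUT_PULLUP | MODE3 */
    >;
};

The usual explanation, as far as I can tell from the AM335x documentation, is that the pad's input-enable bit only turns on the receiver; it does not disable the output driver. The McSPI controller still drives SCLK as master, but it also samples its own clock back through the pad's receiver, which is why the overlay must mark spi1_sclk as an input.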

AVR internal EEPROM reading issue

I am using the internal EEPROM of an ATmega8A via avr-libc's EEPROM library. My code looks like this:
#include <avr/eeprom.h>
#include <util/delay.h>

#define EEPROM_ADDR 0x0A

uint8_t val;

int main(void)
{
    _delay_ms(2000);
    LED_Initialize();
    vBlink_Led(100, 2);
    //eeprom_write_byte((uint8_t*)EEPROM_ADDR, 8);
    val = eeprom_read_byte((uint8_t*)EEPROM_ADDR);
    while (1);
}
When I uncomment the line eeprom_write_byte((uint8_t*)EEPROM_ADDR, 8); and then read using val = eeprom_read_byte((uint8_t*)EEPROM_ADDR);, the correct value 8 is read. But when I comment the line out again and reflash the code, the value changes to 255.
Any suggestions?
Note: I have unchecked the box in avrdudes for erasing flash and EEPROM.
Normally, when a "Chip Erase" operation is performed, the EEPROM is cleared as well.
To prevent this, you have to program (i.e. set to zero) the EESAVE fuse bit, which is bit 3 of the high fuse byte (please refer to the datasheet, chapter 29.2 "Fuse Bits").
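For example, with avrdude on the command line (the programmer here is an assumption, and 0xD9 is the ATmega8A's factory-default high fuse; read back your actual value first and clear only bit 3):

avrdude -c usbasp -p m8 -U hfuse:r:-:h      # read the current high fuse
avrdude -c usbasp -p m8 -U hfuse:w:0xD1:m   # 0xD9 with bit 3 (EESAVE) programmed to 0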

Problem with USB CDC settings and libraries for STM32F4

I'm working on an embedded application for the STM32F401RBT6 and I'm trying to establish a connection with the PC (Windows 10), but the device is not recognized by the system. The code that I generated with STM32CubeMX and debugged with Atollic does not work. I have looked at and tried to reproduce several examples, but nothing works. In the code, I have all the libraries that I think are necessary.
I have these files generated by STM32CubeMX for the CDC communication, but I'm a newbie and I don't know what I have to modify in the code for the USB device to be recognized by the system. Can someone help me?
Besides the point from Soup (the failing malloc caused by a heap that is only 0x200 bytes by default), some Windows versions have a problem with the line coding in the example.
In usbd_cdc_if.c you should add:
/* USER CODE BEGIN PRIVATE_VARIABLES */
USBD_CDC_LineCodingTypeDef LineCoding =
{
    115200, /* baud rate */
    0x00,   /* stop bits: 1 */
    0x00,   /* parity: none */
    0x08    /* number of data bits: 8 */
};
And a little bit below, inside the command handler:

static int8_t CDC_Control_FS(uint8_t cmd, uint8_t* pbuf, uint16_t length)
...
    case CDC_SET_LINE_CODING:
        LineCoding.bitrate = (uint32_t)(pbuf[0] | (pbuf[1] << 8) |
                                        (pbuf[2] << 16) | (pbuf[3] << 24));
        LineCoding.format = pbuf[4];
        LineCoding.paritytype = pbuf[5];
        LineCoding.datatype = pbuf[6];
        break;

    case CDC_GET_LINE_CODING:
        pbuf[0] = (uint8_t)(LineCoding.bitrate);
        pbuf[1] = (uint8_t)(LineCoding.bitrate >> 8);
        pbuf[2] = (uint8_t)(LineCoding.bitrate >> 16);
        pbuf[3] = (uint8_t)(LineCoding.bitrate >> 24);
        pbuf[4] = LineCoding.format;
        pbuf[5] = LineCoding.paritytype;
        pbuf[6] = LineCoding.datatype;
        break;
That way the host won't get undefined data when it tries to set or query the line coding.
Inside the USBD_CDC_Init() function there is a malloc which allocates about 540 bytes on the heap.
CubeMX did not take this into account when the code was generated.
So, at the least, you must define the heap size taking these extra bytes into account to have the USB CDC port working.
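For example, in the GNU linker script of an Atollic/CubeMX-generated project the heap minimum is set by a single symbol. Doubling the default leaves room for the CDC allocation (0x400 is just a safe guess; anything comfortably above the extra ~540 bytes works):

_Min_Heap_Size = 0x400;  /* default is 0x200; USBD_CDC_Init() mallocs ~540 bytes */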

Keyboard interrupt handler not working in system ISO

I'm trying to write an operating system using the OSDev wiki and other resources. Right now I'm stuck on the keyboard interrupt handler. When I compile my OS and run the kernel using qemu-system-i386 -kernel kernel/myos.kernel, everything works perfectly. When I put everything into an ISO image and try to run it using qemu-system-i386 -cdrom myos.iso, it restarts when I press a key. I think it's caused by some problem in my interrupt handler or a bad IDT entry.
My keyboard handler (AT&T syntax):
.globl keyboard_handler
.align 4
keyboard_handler:
    pushal
    cld
    call keyboard_handler_main
    popal
    iret
My main handler in C:
void keyboard_handler_main(void) {
    unsigned char status;
    char keycode;

    /* write EOI */
    write_port(0x20, 0x20);

    status = read_port(KEYBOARD_STATUS_PORT);
    /* Lowest bit of status will be set if buffer is not empty */
    if (status & 0x01) {
        keycode = read_port(KEYBOARD_DATA_PORT);
        if (keycode < 0)
            return;
        if (keycode == ENTER_KEY_CODE) {
            printf("\n");
            return;
        }
        printf("%c", keyboard_map[(unsigned char) keycode]);
    }
}
The C function I use to load the IDT:
void idt_init(void)
{
    unsigned long keyboard_address;
    unsigned long idt_address;
    unsigned long idt_ptr[2];

    keyboard_address = (unsigned long)keyboard_handler;
    IDT[0x21].offset_lowerbits = keyboard_address & 0xffff;
    IDT[0x21].selector = KERNEL_CODE_SEGMENT_OFFSET;
    IDT[0x21].zero = 0;
    IDT[0x21].type_attr = INTERRUPT_GATE;
    IDT[0x21].offset_higherbits = (keyboard_address & 0xffff0000) >> 16;

    /* Ports
     *          PIC1   PIC2
     * Command  0x20   0xA0
     * Data     0x21   0xA1
     */
    write_port(0x20, 0x11);   /* ICW1: start initialization sequence */
    write_port(0xA0, 0x11);
    write_port(0x21, 0x20);   /* ICW2: master PIC vector offset 0x20 */
    write_port(0xA1, 0x28);   /* ICW2: slave PIC vector offset 0x28 */
    write_port(0x21, 0x00);   /* ICW3 */
    write_port(0xA1, 0x00);
    write_port(0x21, 0x01);   /* ICW4: 8086/88 mode */
    write_port(0xA1, 0x01);
    write_port(0x21, 0xff);   /* mask all interrupts on both PICs for now */
    write_port(0xA1, 0xff);

    idt_address = (unsigned long)IDT;
    idt_ptr[0] = (sizeof(struct IDT_entry) * IDT_SIZE) + ((idt_address & 0xffff) << 16);
    idt_ptr[1] = idt_address >> 16;

    load_idt(idt_ptr);
    printf("%s\n", "loaded");
}
Files are organized the same way as OSDev's Meaty Skeleton. I do have a different bootloader.
Based on experience I believed that this issue was related to a GDT not being set up. When someone says interrupts work with QEMU's -kernel option but not with a real version of GRUB, it is often because the kernel developer didn't create and load their own GDT. The Multiboot Specification says:
‘GDTR’
Even though the segment registers are set up as described above, the ‘GDTR’ may be invalid, so the OS image must not load any segment registers (even just reloading the same values!) until it sets up its own ‘GDT’.
When using QEMU with the -kernel option the GDTR is usually valid but it isn't guaranteed to be that way. When using a real version of GRUB (installed to a hard drive, virtual image, ISO etc) to boot you may discover that the GDTR isn't in fact valid. The first time you attempt to reload any segment register with a selector (even if it is the same value) it may fault. When using interrupts the code segment (CS) will be modified which may cause a triple fault and reboot.
As well the Multiboot Specification doesn't say which selectors point to code or data descriptors. Since the layout of the GDT entries is not known or guaranteed by the Multiboot specification it poses a problem for filling in the IDT entries. Each IDT entry needs to specify a specific selector that points to a code segment.
The Meaty Skeleton tutorial on OSDev doesn't set up interrupts, nor does it modify any of the segment registers, so that code likely works with QEMU's -kernel option and a real version of GRUB. Adding IDT and interrupt code on top of the base tutorial probably led to the failure you are seeing when booting with GRUB. That tutorial should probably make it clearer that you should set up your own GDT and not rely on the one set up by the Multiboot loader.
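For what it's worth, a minimal flat GDT along those lines could look like this (a sketch, not code from the tutorial; it assumes 32-bit protected mode and that KERNEL_CODE_SEGMENT_OFFSET is 0x08, so the IDT entry filled in by idt_init() ends up pointing at a valid code selector). You would call gdt_init() before idt_init():

#include <stdint.h>

struct gdt_entry {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t  base_mid;
    uint8_t  access;
    uint8_t  granularity;   /* limit bits 16-19 plus flags */
    uint8_t  base_high;
} __attribute__((packed));

struct gdt_ptr {
    uint16_t limit;
    uint32_t base;
} __attribute__((packed));

static struct gdt_entry gdt[3];
static struct gdt_ptr gp;

static void gdt_set(int i, uint32_t base, uint32_t limit,
                    uint8_t access, uint8_t gran)
{
    gdt[i].limit_low   = limit & 0xFFFF;
    gdt[i].base_low    = base & 0xFFFF;
    gdt[i].base_mid    = (base >> 16) & 0xFF;
    gdt[i].access      = access;
    gdt[i].granularity = ((limit >> 16) & 0x0F) | (gran & 0xF0);
    gdt[i].base_high   = (base >> 24) & 0xFF;
}

void gdt_init(void)
{
    gp.limit = sizeof(gdt) - 1;
    gp.base  = (uint32_t)gdt;

    gdt_set(0, 0, 0, 0, 0);              /* mandatory null descriptor     */
    gdt_set(1, 0, 0xFFFFF, 0x9A, 0xC0);  /* 0x08: ring-0 code, flat 4 GiB */
    gdt_set(2, 0, 0xFFFFF, 0x92, 0xC0);  /* 0x10: ring-0 data, flat 4 GiB */

    /* Load the new GDT, reload CS with a far jump, then reload the
     * data segment registers with the data selector. */
    asm volatile("lgdt %0\n\t"
                 "ljmp $0x08, $1f\n"
                 "1:\n\t"
                 "mov $0x10, %%ax\n\t"
                 "mov %%ax, %%ds\n\t"
                 "mov %%ax, %%es\n\t"
                 "mov %%ax, %%fs\n\t"
                 "mov %%ax, %%gs\n\t"
                 "mov %%ax, %%ss"
                 : : "m"(gp) : "eax", "memory");
}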

M95128-W EEPROM. First byte of each page not writing or reading correctly

I am working on a library for controlling the M95128-W EEPROM from an STM32 device. I have the library writing and reading back data; however, the first byte of each page is not as expected and seems to be fixed at 0x04.
For example, I write 128 bytes across two pages, starting at address 0x00, with value 0x80. When read back I get:
byte[0] = 0x04;
byte[1] = 0x80;
byte[2] = 0x80;
byte[3] = 0x80;
.......
byte[64] = 0x04;
byte[65] = 0x80;
byte[66] = 0x80;
byte[67] = 0x80;
I have debugged the SPI with a logic analyzer and confirmed the correct bytes are being sent. When using the logic analyzer on the read command, the mysterious 0x04 is transmitted from the EEPROM.
Here is my code:
void FLA::write(const void* data, unsigned int dataLength, uint16_t address)
{
    int pagePos = 0;
    int pageCount = (dataLength + 64 - 1) / 64;
    int bytePos = 0;
    int startAddress = address;

    while (pagePos < pageCount)
    {
        HAL_GPIO_WritePin(GPIOB, GPIO_PIN_2, GPIO_PIN_SET); // WP high

        chipSelect();
        _spi->transfer(INSTRUCTION_WREN);
        chipUnselect();

        uint8_t status = readRegister(INSTRUCTION_RDSR);

        chipSelect();
        _spi->transfer(INSTRUCTION_WRITE);
        uint8_t xlow = address & 0xff;
        uint8_t xhigh = (address >> 8);
        _spi->transfer(xhigh); // part 1 address MSB
        _spi->transfer(xlow);  // part 2 address LSB
        for (unsigned int i = 0; i < 64 && bytePos < dataLength; i++)
        {
            uint8_t byte = ((uint8_t*)data)[bytePos];
            _spi->transfer(byte);
            printConsole("Wrote byte to ");
            printConsoleInt(startAddress + bytePos);
            printConsole("with value ");
            printConsoleInt(byte);
            printConsole("\n");
            bytePos++;
        }
        _spi->transfer(INSTRUCTION_WRDI);
        chipUnselect();

        HAL_GPIO_WritePin(GPIOB, GPIO_PIN_2, GPIO_PIN_RESET); // WP low

        bool writeComplete = false;
        while (writeComplete == false)
        {
            uint8_t status = readRegister(INSTRUCTION_RDSR);
            if (status & (1 << 0))
            {
                printConsole("Waiting for write to complete....\n");
            }
            else
            {
                writeComplete = true;
                printConsole("Write complete to page ");
                printConsoleInt(pagePos);
                printConsole("# address ");
                printConsoleInt(bytePos);
                printConsole("\n");
            }
        }
        pagePos++;
        address = address + 64;
    }
    printConsole("Finished writing all pages total bytes ");
    printConsoleInt(bytePos);
    printConsole("\n");
}
void FLA::read(char* returndata, unsigned int dataLength, uint16_t address)
{
    chipSelect();
    _spi->transfer(INSTRUCTION_READ);
    uint8_t xlow = address & 0xff;
    uint8_t xhigh = (address >> 8);
    _spi->transfer(xhigh); // part 1 address
    _spi->transfer(xlow);  // part 2 address
    for (unsigned int i = 0; i < dataLength; i++)
        returndata[i] = _spi->transfer(0x00);
    chipUnselect();
}
Any suggestion or help appreciated.
UPDATES:
I have tried writing 255 sequentially increasing bytes to check for rollover. The results are as follows:
byte[0] = 4; // Incorrect Mystery Byte
byte[1] = 1;
byte[2] = 2;
byte[3] = 3;
.......
byte[63] = 63;
byte[64] = 4; // Incorrect Mystery Byte
byte[65] = 65;
byte[66] = 66;
.......
byte[127] = 127;
byte[128] = 4; // Incorrect Mystery Byte
byte[129] = 129;
The pattern continues. I have also tried writing just 8 bytes from address 0x00, and the same problem persists, so I think we can rule out rollover.
I have tried removing the debug printConsole calls, and it has had no effect.
Here is a SPI logic trace of the write command:
And a close up of the first byte that is not working correctly:
Code can be viewed on gitlab here:
https://gitlab.com/DanielBeyzade/stm32f107vc-home-control-master/blob/master/Src/flash.cpp
The SPI init code can be seen in MX_SPI_Init() here:
https://gitlab.com/DanielBeyzade/stm32f107vc-home-control-master/blob/master/Src/main.cpp
I have another device on the SPI bus (RFM69HW RF Module) which works as expected sending and receiving data.
The explanation was actually already given by Craig Estey in his answer: you do have a rollover. You write a full page and then, without cycling the CS pin, you send the INSTRUCTION_WRDI command. Guess what the binary code of that command is? If you guessed 4, you're absolutely right.
Check your code here:
chipSelect();
_spi->transfer(INSTRUCTION_WRITE);
uint8_t xlow = address & 0xff;
uint8_t xhigh = (address >> 8);
_spi->transfer(xhigh); // part 1 address MSB
_spi->transfer(xlow);  // part 2 address LSB
for (unsigned int i = 0; i < 64 && bytePos < dataLength; i++)
{
    uint8_t byte = ((uint8_t*)data)[bytePos];
    _spi->transfer(byte);
    // ...
    bytePos++;
}
_spi->transfer(INSTRUCTION_WRDI); // <-------------- ROLLOVER!
chipUnselect();
With these devices, each command MUST start with cycling CS. After CS goes low, the first byte is interpreted as a command. All remaining bytes, until CS is cycled again, are interpreted as data. So you cannot send multiple commands in a single "block" with CS constantly pulled low.
Another thing is that you don't need the WRDI command at all: after the write instruction is terminated (by CS going high), the WEL bit is automatically reset. See page 18 of the datasheet:
The Write Enable Latch (WEL) bit, in fact, becomes reset by any of the
following events:
• Power-up
• WRDI instruction execution
• WRSR instruction completion
• WRITE instruction completion.
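Put together, a corrected single-page write might look like this (a sketch against the question's own helpers, untested; the WP pin handling and the WREN status check are left out for brevity):

chipSelect();
_spi->transfer(INSTRUCTION_WREN);          // WREN gets its own CS cycle
chipUnselect();

chipSelect();
_spi->transfer(INSTRUCTION_WRITE);         // command byte
_spi->transfer(address >> 8);              // address MSB
_spi->transfer(address & 0xff);            // address LSB
for (unsigned int i = 0; i < 64 && bytePos < dataLength; i++)
    _spi->transfer(((uint8_t*)data)[bytePos++]);
chipUnselect();                            // CS high starts the write; WEL auto-clears, no WRDI needed

while (readRegister(INSTRUCTION_RDSR) & 0x01)
    ;                                      // poll the WIP bit until the internal write completes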
Caveat: I don't have a definitive solution, just some observations and suggestions [that would be too large for a comment].
From 6.6: Each time a new data byte is shifted in, the least significant bits of the internal address counter are incremented. If more bytes are sent than will fit up to the end of the page, a condition known as “roll-over” occurs. In case of roll-over, the bytes exceeding the page size are overwritten from location 0 of the same page.
So, in your write loop, you do for (i = 0; i < 64; i++). This is incorrect in the general case, when the LSB of the address (xlow) is non-zero; you'd need to do something like for (i = xlow % 64; i < 64; i++).
In other words, you might be getting the page boundary rollover. But, you mentioned that you're using address 0x0000, so it should work, even with the code as it exists.
I might remove the print statements from the loop as they could have an effect on the serialization timing.
I might try this with an incrementing data pattern: (e.g.) 0x01,0x02,0x03,... That way, you could see which byte is rolling over [if any].
Also, try writing a single page from address zero, and write less than the full page size (i.e. less than 64 bytes) to guarantee that you're not getting rollover.
Also, from figure 13 [the timing diagram for WRITE], it looks like once you assert chip select, the ROM wants a continuous bit stream clocked precisely, so you may have a race condition where you're not providing the data at precisely the clock edge(s) needed. You may want to use the logic analyzer to verify that the data appears exactly in sync with clock edge as required (i.e. at clock rising edge)
As you've probably already noticed, offset 0 and offset 64 are getting the 0x04. So, this adds to the notion of rollover.
Or, it could be that the first data byte of each page is being written "late" and the 0x04 is a result of that.
I don't know whether your output port has a SILO so you can send data as in a traditional serial I/O port, or whether you have to maintain precise bit-for-bit timing (which I presume _spi->transfer would do).
Another thing to try is to write a shorter pattern (e.g. 10 bytes) starting at a non-zero address (e.g. xhigh = 0; xlow = 4) and the incrementing pattern and see how things change.
UPDATE:
From your update, it appears to be the first byte of each page [obviously].
From the exploded view of the timing, I notice SCLK is not strictly uniform. The pulse width is slightly erratic. Since the write data is sampled on the clock rising edge, this shouldn't matter. But, I wonder where this comes from. That is, is SCLK asserted/deasserted by the software (i.e. transfer) and SCLK is connected to another GPIO pin? I'd be interested in seeing the source for the transfer function [or a disassembly].
I've just looked up SPI here: https://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bus and it answers my own question.
From that, here is a sample transfer function:
/*
 * Simultaneously transmit and receive a byte on the SPI.
 *
 * Polarity and phase are assumed to be both 0, i.e.:
 * - input data is captured on rising edge of SCLK.
 * - output data is propagated on falling edge of SCLK.
 *
 * Returns the received byte.
 */
uint8_t SPI_transfer_byte(uint8_t byte_out)
{
    uint8_t byte_in = 0;
    uint8_t bit;

    for (bit = 0x80; bit; bit >>= 1) {
        /* Shift-out a bit to the MOSI line */
        write_MOSI((byte_out & bit) ? HIGH : LOW);
        /* Delay for at least the peer's setup time */
        delay(SPI_SCLK_LOW_TIME);
        /* Pull the clock line high */
        write_SCLK(HIGH);
        /* Shift-in a bit from the MISO line */
        if (read_MISO() == HIGH)
            byte_in |= bit;
        /* Delay for at least the peer's hold time */
        delay(SPI_SCLK_HIGH_TIME);
        /* Pull the clock line low */
        write_SCLK(LOW);
    }
    return byte_in;
}
So, the delay times need to be at least the ones the ROM needs. Hopefully, you can verify that this is the case.
But, I also notice that on the problem byte, the first bit of the data appears to lag its rising clock edge. That is, I would want the data line to be stabilized before clock rising edge.
But, that assumes CPOL=0,CPHA=1. Your ROM can be programmed for that mode or CPOL=0,CPHA=0, which is the mode used by the sample code above.
This is what I see from the timing diagram. It implies that the transfer function does CPOL=0,CPHA=0, where the data line changes on the falling clock edge and is stable across the rising (sampling) edge:

SCLK        ____        ____
       ____|    |______|    |____
DATA   __X___________X___________
          stable at each rising edge

This is what I originally expected (CPOL=0,CPHA=1), based on something earlier in the ROM document, where the data line changes on the rising edge and is sampled on the falling edge:

SCLK        ____        ____
       ____|    |______|    |____
DATA   _______X___________X______
          stable at each falling edge
The ROM can be configured to use either CPOL=0,CPHA=0 or CPOL=1,CPHA=1. So, you may need to configure these values to match the transfer function (or vice versa), and verify that the transfer function's delay times are adequate for your ROM. The SDK may do all this for you, but since you're having trouble, double checking this may be worthwhile (e.g. see Table 18 et al. in the ROM document).
However, since the ROM seems to respond well for most byte locations, the timing may already be adequate.
One thing you might also try: since it's the first byte that is the problem (and here I mean the first byte after the LSB address byte), the memory might need some additional [and undocumented] setup time.
So, after the transfer(xlow), add a small spin loop before entering the data transfer loop, to give the ROM time to set up for the write burst [or read burst].
This could be confirmed by starting xlow at a non-zero value (e.g. 3) and shortening the transfer. If the problem byte tracks xlow, that's one way to verify that the setup time may be required. You'd need to use a different data value for each test to be sure you're not just reading back a stale value from a prior test.
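For instance, a throwaway test along those lines (the instance and buffer names are made up; only the fresh pattern and the non-zero start address matter):

uint8_t pattern[10];
char readback[10];
for (int i = 0; i < 10; i++)
    pattern[i] = 0x10 + i;                     // fresh, incrementing values each run
fla.write(pattern, sizeof(pattern), 0x0004);   // xhigh = 0, xlow = 4
fla.read(readback, sizeof(pattern), 0x0004);   // then compare byte-for-byte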
