Many things are done, and the core itself is almost done – five years in the making, built by probing everything on the real hardware.
Everything is now done!!! Just working on some upgrades!!!!
CPU design
- Full pipeline completed for standard MIPS instructions.
- Interlocks and Bypass functions were tested and confirmed to be working.
- Runs at up to 150 MHz without the TLB, FPU and cache cores. (The speed goal is 120 MHz for the other cores.)
- CP0 Core completed sending internal registers.
- 64-bit pipeline with 64-bit registers.
- 64-bit loads and stores fully working.
- TLB core completed; it uses a dual-clocked system to speed it up.
- FPU Core completed and tested working
- The FPU ALU will be separate from the main ALU pipeline to simplify the FPGA logic and make fault finding much easier.
- Cache memory with dual clocks. This is so we can start overclocking the CPU independently of the RCP core clock.
- The instruction cache is completed and working. With this, we can now run the CPU interface with a 64-bit wide data bus, so no bottlenecks will happen here anymore.
- Dual bus access: one for the instruction cache and one for the data cache. This lets the CPU fetch instructions and data without the two affecting each other. No more 32-bit muxed bus like the OG CPU uses, good for only 250 Mbyte/s transfers. So almost a full 500 Mbyte/s of access, x2.
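The bandwidth figures in the last bullet follow from bus width times clock. A minimal sketch, assuming one transfer per clock and that the bus runs at the 62.5 MHz master clock mentioned in the Rom Reader section; the function name is made up for illustration, and real buses lose cycles to addressing and turnaround, so these are peak numbers only:

```python
def peak_bandwidth_mbytes(bus_bits: int, clock_mhz: float) -> float:
    """Peak throughput in Mbyte/s, assuming one transfer per clock cycle."""
    return (bus_bits / 8) * clock_mhz


MASTER_CLOCK_MHZ = 62.5  # assumption: the bus is clocked at the master clock

muxed_32 = peak_bandwidth_mbytes(32, MASTER_CLOCK_MHZ)  # original muxed bus
wide_64 = peak_bandwidth_mbytes(64, MASTER_CLOCK_MHZ)   # each 64-bit cache bus

print(muxed_32, wide_64)
```

With two independent 64-bit buses, the instruction and data sides each get their own peak figure instead of sharing one.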
Bus Design
- 128-bit RAM access / DMA channel (64 bits for normal data and 64 bits for the Z-buffer and coverage/alpha bits; always renders at 32 bits – no more dither)
- 32-bit address and register access to devices
- 64-bit extended bit access for RDP/VI Z-buffer and Color alpha extended bits
- Job system where many cores can ask for RAM at the same time and the requests are queued to keep data flowing (this is a major issue with DDR3 RAM and refreshing)
- All cores can read and write independently, giving an internal memory throughput of 500 Mbyte/s full duplex (500 Mbyte/s read and 500 Mbyte/s write at the same time)
- Added direct RDP access to memory at 133 MHz
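The job system in the list above can be pictured as a set of per-core queues with an arbiter picking the next request. A minimal software sketch, assuming a round-robin policy and one FIFO per core; the policy, the class names and the core list are all assumptions for illustration and not how the FPGA logic is actually written:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    core: str       # which core asked, e.g. "CPU" or "RDP"
    address: int
    is_write: bool


class JobArbiter:
    """Round-robin arbiter over per-core FIFOs (the policy is an assumption)."""

    def __init__(self, cores):
        self.queues = {core: deque() for core in cores}
        self.order = list(cores)
        self.next_index = 0

    def submit(self, job: Job) -> None:
        self.queues[job.core].append(job)

    def next_job(self):
        """Return the next queued job, or None when every queue is empty."""
        for offset in range(len(self.order)):
            index = (self.next_index + offset) % len(self.order)
            queue = self.queues[self.order[index]]
            if queue:
                # Start the next search after the core that was just served.
                self.next_index = (index + 1) % len(self.order)
                return queue.popleft()
        return None


arbiter = JobArbiter(["CPU", "RSP", "RDP", "VI"])
arbiter.submit(Job("RDP", 0x1000, True))
arbiter.submit(Job("RDP", 0x1008, True))
arbiter.submit(Job("CPU", 0x2000, False))

served = []
while (job := arbiter.next_job()) is not None:
    served.append(job.core)
print(served)
```

Queuing requests this way means the RAM side always has work waiting, so a refresh or a slow access stalls one job rather than the whole system.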
MIPS Interface
- Standard access to registers and local memories (IMEM/DMEM and Rom access)
- Ram access is via the DMA Channel as this would be able to byte-mask data
- Interface for an original N64 CPU via the Nexys Video FMC port
- Confirmed all block and subblock access is done (this is important for the cache memory access).
- This will later be removed from the core, as we will run the CPU directly on the internal buses, allowing for up to 500 Mbyte/s and not the 250 Mbyte/s it is with this bus
- This is being removed as we no longer require the real CPU, but a simple switch between the two cores via a boot setup via a reg for debugging games.
- Thanks to ElectronAsh, we have the original CPU on a board that I can use to get the main RCP core and other external accesses done without having to wonder whether the CPU crashed due to my own bugs – Cannot wait to fix this.
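The byte-masked RAM access mentioned in the list above can be pictured as a write in which each mask bit enables one byte lane. A minimal sketch, assuming a 64-bit word and an 8-bit mask where bit 0 covers the least significant byte; the lane ordering and the function name are assumptions, not taken from the core:

```python
def masked_write(old_word: int, new_word: int, byte_mask: int, width_bytes: int = 8) -> int:
    """Merge new_word into old_word, replacing only the byte lanes enabled in byte_mask."""
    result = old_word
    for lane in range(width_bytes):
        if byte_mask & (1 << lane):
            lane_bits = 0xFF << (8 * lane)
            # Clear the lane in the old word, then copy the lane from the new word.
            result = (result & ~lane_bits) | (new_word & lane_bits)
    return result


old = 0x1111111111111111
new = 0xAABBCCDDEEFF0011
merged = masked_write(old, new, 0b00000011)  # only the two lowest bytes are written
print(hex(merged))
```

This is what lets a single-byte store go out on a wide bus without a read-modify-write on the requesting side.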
PIF/SI interface
- Have working controller inputs -Done
- Work on the memory paks and rumble pak -Done
- BRAM Interface with DMA controller -Done
- An internal CPU that would make the PIF ram look like the CIC seed is correct -Done
- A new PIF controller using a 6502 core accesses the BRAM and external devices. -Done
- DMA controller that both reads and writes 64 bytes at a time (not 64 bits). This would be an improvement -Done
- Custom BIOS for booting. -Done
- EEPROM access -Done
Rom Reader
- 32 and 64-bit reads and writes via the Register bus and DMA controller. -Done
- Changeable timing for rom reads -Done
- We need to make this independent of the master clock (62.5 MHz), so we can run at higher speeds and still keep the read latency time calculations. -Done
- We have found some unaligned transfers that can happen.
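Keeping the read latency calculations intact once the ROM reader leaves the 62.5 MHz master clock comes down to converting cycle counts by time. A sketch, assuming the timings are held as cycle counts at the master clock and are rounded up so the ROM never gets less time than it was given; the function name and the example clocks are hypothetical:

```python
MASTER_CLOCK_HZ = 62_500_000  # master clock named in the list above


def rescale_cycles(cycles_at_master: int, new_clock_hz: int) -> int:
    """Cycle count at new_clock_hz that covers at least the same wall-clock time."""
    # Integer ceiling division: round up so the ROM never gets less time.
    return (cycles_at_master * new_clock_hz + MASTER_CLOCK_HZ - 1) // MASTER_CLOCK_HZ


for clock_hz in (62_500_000, 100_000_000, 125_000_000):
    print(clock_hz, rescale_cycles(64, clock_hz))
```

Rounding up costs at most one cycle of extra wait at the faster clock, which is the safe direction for a ROM read.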
Ram controller
- Full register setup – Both the ram regs and the ram module regs are accessible
- Able to read and write at the same time to the MIG7 RAM controller
- Full 2.2 Gbit throughput – 500 Mbyte/s read and 500 Mbyte/s write for normal N64 systems, with more access for the RDP
- Multiple cores can read and write at the same time.
- Jobs are queued up so the latency of the DDR3 RAM can be mitigated. We will work on a more extensive caching system on the memory controller of about 128 Kbyte for the RDP to run on, where half will be for the Z-buffer and the other half for the colour buffer.
RSP Core – completed
- Entire DMA, Imem and Dmem are completed and working -Done
- We can get this core up to 90 MHz – Overclocking I hear -Done
- SU core tested and working -Done
- Interlock working in the pipeline process. This includes the VU core as well -Done
- The bypass is also sorted for the EXE and WB stages in the main core
- The main CPU core has been completely tested for ALU and load and stores -Done
- CP0 is fully working – Even the DMA 😉 -Done
- VU core has been built, and some normal ALU ops are tested and confirmed. Just some special ops need to be checked. -Done
- Just the divide core to complete. -Done
- Dual opcodes are implemented, except for loads/stores and MTC/MFC/CFC/CTC, as we cannot write to the reg file at the same time as a VU opcode is being processed. -Done
- Bug testing to be completed as well. -Done
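The dual-opcode rule in the list above can be written as a small predicate. A sketch, assuming opcodes are identified by mnemonic strings; the set of load/store mnemonics is a short illustrative sample rather than the full list, and the function name is hypothetical:

```python
# Illustrative sample of load/store mnemonics only; not the complete set.
MEMORY_OPS = {"LB", "LBU", "LH", "LHU", "LW", "SB", "SH", "SW", "LQV", "SQV"}

# Coprocessor moves named in the text, matched by prefix (e.g. MTC0, MFC2).
COP_MOVE_PREFIXES = ("MTC", "MFC", "CFC", "CTC")


def can_pair_with_vu(su_op: str) -> bool:
    """True if this scalar opcode may issue alongside a VU opcode."""
    mnemonic = su_op.upper()
    if mnemonic in MEMORY_OPS:
        return False
    if mnemonic.startswith(COP_MOVE_PREFIXES):
        return False
    return True


for op in ("ADDU", "LW", "MTC2", "SQV"):
    print(op, can_pair_with_vu(op))
```

Anything the predicate rejects would simply issue on its own, with the VU opcode following in the next slot.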
RDP Core – completed
- The pipeline has been designed and tested. But does it work? Will find out. (UPDATE) We do have fill commands working, but there is an issue with masking. -Done
- Need to build the memory interface that can handle Z-buffer and colour image transfers and reads, texture loads and stores, and masked stores, all byte aligned. This has been done, with some bugs in the TLUT calls -Done
- Fully working 18-bit Z-buffer and coverage checks
- Copy and Fill needs to be done directly from the memory controller -Done
- The memory controller will be done after the SU and VU in the RSP are tested and done. -Done
- Some Fill commands have been tested and confirmed working-Done
- Some caching stuff to do to help with DDR3 latency – working on
- Full 8x 4K buffers for texture, image and Zbuffer line buffers
Video Core
- Have a scaling unit working – some bugs on some interlaced games
- Got HDMI core working
- DMA fully working
- Need to work on some AA stuff
- Dithering can be selectable with 16-bit colour outputs
Audio Core
- Audio going through the HDMI port and a DAC controller to check
- Some work done on the custom frequencies that the audio core generates, but all output is 48 kHz, just polyphase resampled, so there is no popping on the audio output
- 32-bit output for the HDMI core
- Got this running via HDMI, so no more DAC is needed
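For readers unfamiliar with the polyphase approach mentioned above, here is a rough software sketch of resampling an arbitrary input rate to 48 kHz. It assumes a small bank of Hann-windowed sinc filters, one per fractional delay, and picks the nearest lower phase for each output sample. The filter length, phase count and function names are all assumptions, the sketch only suits input rates at or below the output rate, and it is not the filter used in the core:

```python
import math


def build_phases(num_phases: int, taps_per_phase: int):
    """Windowed-sinc filter bank: one short FIR per fractional delay."""
    half = taps_per_phase // 2
    bank = []
    for phase in range(num_phases):
        frac = phase / num_phases
        taps = []
        for k in range(taps_per_phase):
            # Distance from the output instant to the input sample this tap reads.
            x = (k - half + 1) - frac
            window = 0.5 + 0.5 * math.cos(math.pi * x / half)  # Hann over +/- half
            sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
            taps.append(sinc * window)
        gain = sum(taps)
        bank.append([t / gain for t in taps])  # unity gain at DC for every phase
    return bank


def resample(samples, rate_in: int, rate_out: int, bank):
    """Resample by choosing one filter phase per output sample."""
    num_phases = len(bank)
    half = len(bank[0]) // 2
    out = []
    for n in range(len(samples) * rate_out // rate_in):
        # Output position on the input timeline, in steps of 1/num_phases.
        pos = n * rate_in * num_phases // rate_out
        base, phase = divmod(pos, num_phases)
        acc = 0.0
        for k, tap in enumerate(bank[phase]):
            idx = base - half + 1 + k
            if 0 <= idx < len(samples):  # treat samples outside the buffer as silence
                acc += tap * samples[idx]
        out.append(acc)
    return out


bank = build_phases(num_phases=32, taps_per_phase=8)
steady = resample([1.0] * 100, 32_000, 48_000, bank)
print(len(steady))
```

Because every output sample comes from a smooth filter rather than a repeated or dropped input sample, rate changes do not produce the discontinuities that are heard as pops.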
Please also check N64Brew for updates on the hardware details I have updated. I support these guys and the Discord channel “N64Brew”
This has been five years in the making and written fully by myself. No leaks were used for the production of this core. Only emulators, reverse engineering and a lot of reading of patents. Coffee was overused as well.
Many thanks to the N64Brew team and Decompiler teams as well for testing and source code access to find all the bugs.
https://www.twitch.tv/ultrafp64
https://www.youtube.com/user/mazamars312/videos
Where am I up to right now….. Well, let’s just say I’m almost done 🙂 Finally!!!!
I have no issues with buying myself a coffee to help my coffee addiction 🙂