So back in February 2021 I completed the online book Ray Tracing in One Weekend. You can read my post (and review) on it here:
https://lptcp.blogspot.com/2021/02/ray-tracing-in-one-weekend-review.html
I originally wrote and ran the ray tracer on my previous computer, a 4-core Intel i7-4790K. As the program is single-threaded, it took 27 minutes to render a single 400x225 image with 100 samples per pixel. I attempted a 1K version, but after 6 hours I stopped the program.
Two years later, I sat back down and decided now was the time to finally multithread it and speed it up. I actually tried to after writing the first post; however, at the time, my programming experience was still a bit too limited and I just couldn't wrap my head around the concepts of multithreaded programming. With an extra two years of experience under my belt, I was able to get it done.
Code:
My attempt isn't perfect, but I'm happy with the result and I can improve on it from here.
I initially did some very quick "fun with multithreading" exercises in the console window:
I'll probably make a separate post on simple threading; the MSDN documentation is thorough, but it's not great for someone with little experience, and these are the types of examples I would've wanted 2 years ago (not lambdas operating on matrices).
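To give a flavour of what those console exercises looked like, here's a minimal sketch (not my exact code) in the same spirit: a handful of std::threads each print a message, with a mutex so the lines don't interleave.

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex print_mutex; // guards std::cout so the messages don't interleave

void say_hello(int id) {
    std::lock_guard<std::mutex> lock(print_mutex);
    std::cout << "Hello from thread " << id << '\n';
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back(say_hello, i);

    for (auto& t : threads)
        t.join(); // wait for every thread to finish before exiting
}

std::lock_guard releases the mutex automatically when it goes out of scope, which is the same locking pattern that comes up again below.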
Adding More Threads
So the main issue with just adding more threads to the ray tracer is the output method. It currently writes each pixel as a line of text, redirected into a file which is saved as a PPM. This means that cout << is accessing a single text file every time a pixel is calculated.
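For context, the original single-threaded output is roughly this shape (simplified; a gradient stands in for the actual ray tracing), with the program run with its output redirected into a .ppm file, e.g. raytracer > image.ppm:

#include <iostream>

int main() {
    const int width = 400, height = 225;

    std::cout << "P3\n" << width << ' ' << height << "\n255\n"; // plain-text PPM header

    for (int y = height - 1; y >= 0; --y) {
        for (int x = 0; x < width; ++x) {
            // The real tracer fires rays here and averages 100 samples per pixel;
            // a simple gradient stands in so the sketch runs on its own.
            const int r = 255 * x / (width - 1);
            const int g = 255 * y / (height - 1);
            const int b = 64;
            std::cout << r << ' ' << g << ' ' << b << '\n';
        }
    }
}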
cout is not thread safe, so if you add more threads they will fight with each other for access to the output and the pixel data will be overwritten or jumbled up.
A simple fix for this would be for a thread to lock access to the output while it's writing, then unlock when it's done. This would prevent corrupted data. However, threads spun up by a for loop don't run in any guaranteed order, so while all the pixels would be calculated correctly, the picture would still be a jumbled mess.
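That lock-and-unlock fix would look something like this sketch (a hypothetical write_pixel helper, not my actual code); the data stays intact, but nothing about the ordering is solved:

#include <iostream>
#include <mutex>

std::mutex output_mutex; // serialises access to the output stream

// Hypothetical helper: write one pixel's RGB triple as a line of the PPM.
void write_pixel(int r, int g, int b) {
    std::lock_guard<std::mutex> lock(output_mutex);
    std::cout << r << ' ' << g << ' ' << b << '\n';
    // The line itself is no longer corrupted, but the order in which the
    // threads reach this point is still unpredictable, so the pixels land
    // in the file in the wrong order.
}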
My solution was therefore to split the picture up into "tiles" or "chunks". I was inspired by Blender's Cycles renderer, which renders a tile at a time. If you choose to render with your CPU, it will use a core per tile and you can see it working on X cores at a time.
Using std::thread::hardware_concurrency(), I let the program spin up as many threads as the hardware supports. Each of these threads writes pixels into a fixed-size 2D array, and each thread only writes to its own portion of the array so they aren't overwriting each other. To visualise: if the image is 500x500 and there are 4 threads, thread 1 would render pixels x0-x249 and y0-y249 (the top-left quarter):
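Here's a simplified sketch of the idea (hypothetical names, and it splits the image into horizontal strips of rows rather than the square quarters described above, but the principle is the same):

#include <algorithm>
#include <thread>
#include <vector>

struct Colour { double r, g, b; };

// Placeholder for the real per-pixel work (firing rays, averaging samples);
// it just returns a gradient here so the sketch compiles on its own.
Colour render_pixel(int x, int y, int width, int height) {
    return { double(x) / width, double(y) / height, 0.25 };
}

int main() {
    const int width = 400, height = 225;

    // One slot per pixel. Each thread only writes to its own rows,
    // so no locking is needed while rendering.
    std::vector<std::vector<Colour>> image(height, std::vector<Colour>(width));

    const unsigned thread_count =
        std::max(1u, std::thread::hardware_concurrency());
    const int rows_per_thread =
        (height + int(thread_count) - 1) / int(thread_count);

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < thread_count; ++t) {
        const int y_begin = int(t) * rows_per_thread;
        const int y_end   = std::min(height, y_begin + rows_per_thread);
        threads.emplace_back([=, &image] {
            for (int y = y_begin; y < y_end; ++y)
                for (int x = 0; x < width; ++x)
                    image[y][x] = render_pixel(x, y, width, height);
        });
    }
    for (auto& t : threads)
        t.join(); // wait for every chunk to finish before writing the file
}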
Then, once all the threads have finished, we go through the array of pixels and write them out to a file to create the image.
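The write-out is then a single-threaded pass over that array; roughly something like this (assuming the vector-of-rows image from the sketch above, with channel values between 0.0 and 1.0):

#include <fstream>
#include <vector>

struct Colour { double r, g, b; };

// Convert a 0.0-1.0 channel value to the 0-255 integer a plain PPM expects.
int to_byte(double c) { return static_cast<int>(255.999 * c); }

void write_ppm(const std::vector<std::vector<Colour>>& image, const char* path) {
    const int height = static_cast<int>(image.size());
    const int width  = static_cast<int>(image[0].size());

    std::ofstream out(path);
    out << "P3\n" << width << ' ' << height << "\n255\n"; // plain-text PPM header

    // PPM rows run top to bottom; if the array ends up stored bottom-to-top,
    // walking it in reverse here is one easy place to fix the flip.
    for (int y = height - 1; y >= 0; --y)
        for (int x = 0; x < width; ++x)
            out << to_byte(image[y][x].r) << ' '
                << to_byte(image[y][x].g) << ' '
                << to_byte(image[y][x].b) << '\n';
}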
I ran this new code on my new PC, a 16-core AMD Ryzen 3950X, using the same width and samples per pixel as on the old PC, and it completed the image in 2 minutes (vs 27 on a single core). It outputs the image upside down, but that can be easily fixed:
A 1K version took just over 7 minutes; the power of multithreading!
My original solution to this used 4 hardcoded threads that each wrote to their own text file; once they were finished, I stitched the files together. However, I'd completely forgotten how PPM files work: a PPM has a single header and stores the whole image row by row, so my chunks wouldn't stitch together properly. Using an array bypassed this problem.
My next task will be to transfer the output to the screen using a GUI library like FLTK or SFML so you can see the tiles being rendered in real time.