This paper presents preliminary results from the design and implementation of SpaceLib, a library for shared-memory parallel applications. SpaceLib is written in C++ and is designed to make efficient use of caches by taking into account locality and the fact that caches are organized into fixed-size blocks. Before SpaceLib was implemented, restructuring a particle-based wind tunnel simulation called MP3D showed that a cache-sensitive approach has significant advantages. Memory-system simulation showed large reductions in cache misses. Run times on several architectures showed that sensitivity to caches is increasingly important on more recent designs, because processor speed is improving faster than DRAM (ordinary main memory) speed. MP3D has been re-implemented on top of SpaceLib with promising results; further work includes more detailed measurement and the implementation of other applications.
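Because a cache transfers whole fixed-size blocks, data layout relative to block boundaries matters. The following is a minimal C++ sketch of that idea only. It is not SpaceLib's API, and every name in it (`kCacheBlock`, `PaddedCounter`, `blocks_touched`, `total`) is hypothetical. It assumes a 64-byte cache block and C++17, which lets `std::vector` allocate over-aligned types. Each per-processor counter is padded to a full block, so processors updating different counters never write to the same block. `blocks_touched` counts how many blocks a byte range spans.

```cpp
#include <cstddef>
#include <vector>

// Assumed cache-block size in bytes (hypothetical; real code should use the
// target machine's actual block size).
constexpr std::size_t kCacheBlock = 64;

// Hypothetical per-processor counter padded and aligned to one whole cache
// block, so neighbouring counters never share a block.
struct alignas(kCacheBlock) PaddedCounter {
    long value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheBlock, "one counter per block");
static_assert(alignof(PaddedCounter) == kCacheBlock, "counter is block-aligned");

// Number of cache blocks spanned by `bytes` bytes starting at byte `offset`
// of a block-aligned region. An item that straddles a boundary costs two blocks.
constexpr std::size_t blocks_touched(std::size_t offset, std::size_t bytes) {
    if (bytes == 0) return 0;
    return (offset + bytes - 1) / kCacheBlock - offset / kCacheBlock + 1;
}

// Combine the per-processor counts after a parallel phase.
long total(const std::vector<PaddedCounter>& counters) {
    long sum = 0;
    for (const PaddedCounter& c : counters) sum += c.value;
    return sum;
}
```

Under these assumptions, an 8-byte item at offset 0 touches one block, but the same item at offset 60 touches two. The padding spends memory to avoid that kind of sharing. Whether the trade is worthwhile depends on how often the data is written by different processors.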