User-defined Search in RonPub publications

http://www.ronpub.com/publications/search.php?journal=ALL&author=Stefan+Werner&exactauthor=on&title=&abstract=&volume=&issue=&year1=&year2=&searchtype=advanced
This feed contains the results of a user-defined search in RonPub publications.

Sven Groppe, Thomas Kiencke, Stefan Werner, Dennis Heinrich, Marc Stelzner and Le Gruenwald: P-LUPOSDATE: Using Precomputed Bloom Filters to Speed Up SPARQL Processing in the Cloud, Open Journal of Semantic Web (OJSW), 1 (2), pages 25-55, URN: urn:nbn:de:101:1-201705194858, 2014
https://www.ronpub.com/ojsw/OJSW-v1i2n02_Groppe.html
http://nbn-resolving.de/urn:nbn:de:101:1-201705194858
Increasingly, data on the Web is stored in the form of Semantic Web data. Because of today's information overload, it has become very important to store and query these big datasets in a scalable way, and hence in a distributed fashion. Cloud Computing offers such a distributed environment, with dynamic reallocation of computing and storage resources based on demand. In this work, we introduce a scalable distributed Semantic Web database in the Cloud. To reduce the number of (unnecessary) intermediate results early, we apply Bloom filters. Instead of computing Bloom filters during query processing, a time-consuming task, as has traditionally been done, we precompute the Bloom filters as far as possible and store them in the indices alongside the data. Experimental results with datasets of up to 1 billion triples show that our approach speeds up query processing significantly, sometimes even reducing the processing time to less than half.
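The core idea, discarding intermediate results whose join value cannot occur on the other side of a join, can be illustrated with a generic Bloom filter. The following Python sketch is not P-LUPOSDATE code: the class BloomFilter, the helper prune_with_bloom, the bit-array size, the number of hash functions and the sample bindings are all hypothetical, and the filter is simply built before the join as a stand-in for one that is precomputed and stored in an index.

```python
import hashlib
from typing import Dict, List

Binding = Dict[str, str]  # one solution mapping: variable name -> RDF term


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative, not P-LUPOSDATE's implementation)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False: definitely absent. True: possibly present (false positives are allowed).
        return all(self.bits[pos] for pos in self._positions(item))


def prune_with_bloom(bindings: List[Binding], bloom: BloomFilter, var: str) -> List[Binding]:
    """Drop intermediate results whose join value cannot occur on the other join side."""
    return [b for b in bindings if bloom.might_contain(b[var])]


# Hypothetical results of two triple patterns that join on ?person.
left = [{"person": "ex:alice", "name": '"Alice"'}, {"person": "ex:bob", "name": '"Bob"'}]
right = [{"person": "ex:alice", "city": "ex:Luebeck"}, {"person": "ex:carol", "city": "ex:Berlin"}]

# Built ahead of the join here; stands in for a filter precomputed and stored in an index.
bloom_left = BloomFilter()
for b in left:
    bloom_left.add(b["person"])

pruned_right = prune_with_bloom(right, bloom_left, "person")
```

Because a Bloom filter never produces false negatives, this pruning is safe: the ex:alice binding always survives, while ex:carol is removed unless it happens to hit only set bits, in which case a subsequent join would still discard it.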
Sven Groppe, Johannes Blume, Dennis Heinrich and Stefan Werner: A Self-Optimizing Cloud Computing System for Distributed Storage and Processing of Semantic Web Data, Open Journal of Cloud Computing (OJCC), 1 (2), pages 1-14, URN: urn:nbn:de:101:1-201705194478, 2014
https://www.ronpub.com/ojcc/OJCC-v1i2n01_Groppe.html
http://nbn-resolving.de/urn:nbn:de:101:1-201705194478
Clouds are dynamic networks of common, off-the-shelf computers that form computation farms. The rapid growth of databases in the context of the Semantic Web requires efficient ways to store and process this data. Using cloud technology to store and process Semantic Web data is an obvious way to overcome the difficulties of storing and processing the enormous present and future datasets of the Semantic Web. This paper presents a new approach to storing Semantic Web data such that the operations evaluating Semantic Web queries are more likely to run on local data only, rather than requiring costly distributed operations. An experimental evaluation demonstrates performance improvements compared to a naive distribution of Semantic Web data.

Sven Groppe, Dennis Heinrich, Stefan Werner, Christopher Blochwitz and Thilo Pionteck: PatTrieSort - External String Sorting based on Patricia Tries, Open Journal of Databases (OJDB), 2 (1), pages 36-50, URN: urn:nbn:de:101:1-201705194627, 2015
https://www.ronpub.com/ojdb/OJDB_2015v2i1n03_Groppe.html
http://nbn-resolving.de/urn:nbn:de:101:1-201705194627
External merge sort is among the most efficient and widely used algorithms for sorting big data: as much data as fits into main memory is sorted there and then swapped out to external storage as a so-called initial run. After all the data has been sorted block-wise in this way, the initial runs are merged in a merging phase to obtain the final sorted run containing the completely sorted original data.
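The two phases described above, initial run generation and the merging phase, can be sketched in Python as follows. This is the traditional baseline, not PatTrieSort: runs are plain files of sorted strings rather than Patricia tries, "main memory" is simulated by a hypothetical memory_limit parameter counted in strings, and the strings are assumed not to contain newline characters.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_chunk: List[str], tmp_dir: str, index: int) -> str:
    # Each initial run is a file holding one sorted string per line.
    path = os.path.join(tmp_dir, f"run_{index}.txt")
    with open(path, "w", encoding="utf-8") as f:
        for s in sorted_chunk:
            f.write(s + "\n")
    return path


def generate_initial_runs(strings: Iterable[str], memory_limit: int, tmp_dir: str) -> List[str]:
    """Run generation: sort as many strings as fit in 'memory', then swap them out."""
    runs: List[str] = []
    chunk: List[str] = []
    for s in strings:
        chunk.append(s)
        if len(chunk) == memory_limit:  # simulated main memory is full
            runs.append(_write_run(sorted(chunk), tmp_dir, len(runs)))
            chunk = []
    if chunk:  # last, partially filled block
        runs.append(_write_run(sorted(chunk), tmp_dir, len(runs)))
    return runs


def _read_run(path: str) -> Iterator[str]:
    # Stream a run back from external storage one string at a time.
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def merge_runs(run_paths: List[str]) -> Iterator[str]:
    """Merging phase: k-way merge of all initial runs into one sorted stream."""
    return heapq.merge(*(_read_run(p) for p in run_paths))


def external_sort(strings: Iterable[str], memory_limit: int) -> List[str]:
    with tempfile.TemporaryDirectory() as tmp_dir:
        runs = generate_initial_runs(strings, memory_limit, tmp_dir)
        # Materialize the result before the temporary run files are deleted.
        return list(merge_runs(runs))
```

For example, external_sort(["pear", "apple", "fig", "banana", "cherry"], memory_limit=2) writes three initial runs and returns ["apple", "banana", "cherry", "fig", "pear"]. The number of runs, and hence the merge fan-in, shrinks as more strings fit into memory per run, which is the lever the Patricia-trie approach targets.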
Patricia tries are one of the most space-efficient ways to store strings, especially strings with common prefixes. Hence, we propose using Patricia tries for initial run generation in a variant of external merge sort, so that initial runs can become larger than in traditional external merge sort with the same main memory size. Furthermore, we store the initial runs as Patricia tries instead of lists of sorted strings. As we show in this paper, Patricia tries can be merged efficiently, with superior performance compared to merging runs of sorted strings. We complete our discussion with a complexity analysis as well as a comprehensive performance evaluation, in which our new approach outperforms traditional external merge sort by a factor of 4 when sorting over 4 billion strings of real-world data.

Sven Groppe, Dennis Heinrich and Stefan Werner: Distributed Join Approaches for W3C-Conform SPARQL Endpoints, Open Journal of Semantic Web (OJSW), 2 (1), pages 30-52, URN: urn:nbn:de:101:1-201705194910, 2015
https://www.ronpub.com/ojsw/OJSW_2015v2i1n04_Groppe.html
http://nbn-resolving.de/urn:nbn:de:101:1-201705194910
Currently, many SPARQL endpoints are freely available and accessible at no cost to users: anyone can submit SPARQL queries to them via a standardized protocol; the queries are processed on the endpoints' datasets, and the results are sent back to the user in a standardized format. As these distributed execution environments for semantic big data (the intersection of semantic data and big data) are freely accessible, the Semantic Web is an ideal playground for big data research. However, using these distributed execution environments raises questions about performance. In particular, when several datasets (local ones and those residing in SPARQL endpoints) must be combined, distributed joins have to be computed.
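To make the data flow of a distributed join concrete, the Python sketch below simulates two generic reduction strategies applied before a final local join: a semi-join, which ships the exact join values to the remote side, and a bitvector join, which ships only a hashed bit vector and tolerates false positives. It is an in-memory simulation with hypothetical function names and sample data; it does not contact a real SPARQL endpoint, does not model how a W3C-conform endpoint would receive the key set or bit vector (since such endpoints only accept standard SPARQL queries, this information would have to be expressed in the query itself), and does not reproduce the specific Bitvector-Join variants proposed in the paper.

```python
import hashlib
from typing import Dict, List, Set

Binding = Dict[str, str]  # one solution mapping: variable name -> RDF term


def _slot(value: str, size: int) -> int:
    # Deterministic hash so the local and remote side map a join value to the same bit.
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def build_bitvector(bindings: List[Binding], var: str, size: int) -> List[bool]:
    """Local side: set one bit per join value; this vector is what gets shipped."""
    bits = [False] * size
    for b in bindings:
        bits[_slot(b[var], size)] = True
    return bits


def reduce_by_bitvector(remote: List[Binding], var: str, bits: List[bool]) -> List[Binding]:
    """Remote side: keep bindings whose join value hits a set bit (false positives possible)."""
    return [b for b in remote if bits[_slot(b[var], len(bits))]]


def reduce_by_semijoin(remote: List[Binding], var: str, keys: Set[str]) -> List[Binding]:
    """Remote side of a semi-join: keep bindings whose join value is in the shipped key set."""
    return [b for b in remote if b[var] in keys]


def local_join(left: List[Binding], right: List[Binding], var: str) -> List[Binding]:
    """Final hash join on the local side; it also removes bitvector false positives."""
    index: Dict[str, List[Binding]] = {}
    for right_b in right:
        index.setdefault(right_b[var], []).append(right_b)
    return [{**left_b, **right_b} for left_b in left for right_b in index.get(left_b[var], [])]


# Hypothetical local data and remote endpoint data joining on ?s.
local = [{"s": "ex:a", "p1": "1"}, {"s": "ex:b", "p1": "2"}]
remote = [{"s": "ex:a", "p2": "x"}, {"s": "ex:c", "p2": "y"}, {"s": "ex:d", "p2": "z"}]

shipped_bv = reduce_by_bitvector(remote, "s", build_bitvector(local, "s", size=64))
shipped_sj = reduce_by_semijoin(remote, "s", {b["s"] for b in local})

result_bv = local_join(local, shipped_bv, "s")
result_sj = local_join(local, shipped_sj, "s")
```

Both strategies yield the same join result, [{"s": "ex:a", "p1": "1", "p2": "x"}]; they differ in what crosses the network. The semi-join ships exact keys and receives exactly the matching remote bindings, whereas the bitvector join ships a fixed-size vector, which can be much smaller than the key set when there are many join values, but may let some non-matching bindings (here possibly ex:c or ex:d) through for the local join to remove.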
In this work, we give an overview of the various possibilities for distributed join processing in SPARQL endpoints that follow the SPARQL specification and hence are "W3C conform". We also introduce new distributed join approaches: variants of the Bitvector-Join and a combination of the Semi- and Bitvector-Join. Finally, we compare all existing and newly proposed distributed join approaches for W3C-conform SPARQL endpoints in an extensive experimental evaluation.

Stefan Werner, Dennis Heinrich, Sven Groppe, Christopher Blochwitz and Thilo Pionteck: Runtime Adaptive Hybrid Query Engine based on FPGAs, Open Journal of Databases (OJDB), 3 (1), pages 21-41, URN: urn:nbn:de:101:1-201705194645, 2016
https://www.ronpub.com/ojdb/OJDB_2016v3i1n02_Werner.html
http://nbn-resolving.de/urn:nbn:de:101:1-201705194645
This paper presents a fully integrated hardware-accelerated query engine for large-scale datasets in the context of Semantic Web databases. As queries are typically unknown at design time, a static approach is neither feasible nor flexible enough to cover a wide range of queries at system runtime. Therefore, we introduce a runtime-reconfigurable accelerator based on a Field Programmable Gate Array (FPGA), which integrates transparently with the freely available Semantic Web database LUPOSDATE. At system runtime, the proposed approach dynamically generates an optimized hardware accelerator in the form of an FPGA configuration for each individual query and transparently retrieves the query result to be displayed to the user. During hardware-accelerated execution, the host supplies triple data to the FPGA and retrieves the results from it via the PCIe interface. The benefits and limitations are evaluated on large-scale synthetic datasets with up to 260 million triples as well as on the widely known Billion Triples Challenge dataset.