Wednesday, April 3, 2019
Determining Attributes to Maximize Visibility of Objects
A Critique on
Determining Attributes to Maximize Visibility of Objects
Muhammed Miah, Gautam Das, Vagelis Hristidis, Heikki Mannila

Vidisha H. Shah

1. Summary of the published work

Miah, Das, Hristidis, and Mannila (2009, p. 959) discuss the ranking functions and top-k retrieval algorithms that help users (potential buyers) search for the required product in the available catalog. The problem is how a user (potential seller) should select the attributes of a new tuple so that the product stands out from the other available products. There are several formulations, some developed by the authors and a few already in practice. According to the authors, to run a query (search), keywords are entered, on the basis of which the search is conducted (p. 959). The query answering system may return all the values that satisfy the query, which is called unranked or Boolean retrieval, or it can rank the answers and return the top k values, which is known as ranked or top-k retrieval. The example given by the authors is that objects can be ranked on an attribute, such as price, or on relevance.

The authors describe an example and a problem related to it. A user wants to post an ad to rent out an apartment in an online newspaper (p. 959). The given ad (tuple) has various attributes such as number of bedrooms, location, and so on. A cost factor is also involved in any ad, so the attributes that will provide better visibility should be selected. To find which attributes provide better visibility, we can rely on the recommendations of previous sellers (the traditional technique) or on an approach in which we examine the ranking function to understand which attributes will lead to a high ranking score.
For example, adding the attribute "swimming pool" can increase visibility, as can a catchy title or indexing keywords (for an article). Let D be the database of products that have already been advertised (the competition). The authors consider that the database can be a relational table or a collection of text documents (p. 960). If the database is a relational table, then each tuple in the table is a product and every column is an attribute of the product. If the database is a collection of text documents, then each document contains data about a specific product (ad). The set of queries or search conditions that users have executed in the past is denoted Q; Q is therefore the query log, or workload. The query log is the record of the queries that potential buyers have used in the past. A query can be an SQL query or a keyword-based query that returns tuples from the database D.

The problem posed by the authors is this: given a database D, a query log Q, a new tuple t, and an integer m, determine the best m attributes of t such that, when the shortened version of t with those m attributes is inserted into D, the number of queries from Q that retrieve t is maximized (p. 960). The paper also considers variants of m, that is, cases where m is given by the user and cases where it is not.

The authors consider several variants: Boolean (unranked retrieval) (p. 960), categorical, text, and numeric data, and conjunctive and disjunctive query semantics. Another variant is also considered in which, if m is not given, the aim of maximizing visibility is pursued with as few attributes as possible.
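To make the problem statement above concrete, here is a minimal Python sketch. It assumes Boolean attributes and conjunctive retrieval (a query retrieves the ad only if the ad has every attribute the query names), and it simply tries every choice of m attributes. The function names, the apartment attributes, and the query log are hypothetical and chosen only for illustration; this is not the authors' code.

```python
from itertools import combinations


def visibility(kept_attrs, query_log):
    """Number of queries in the log that retrieve a tuple holding kept_attrs.

    Assumption: conjunctive Boolean retrieval, so a query retrieves the
    tuple only when every attribute the query names is present.
    """
    kept = set(kept_attrs)
    return sum(1 for query in query_log if set(query) <= kept)


def best_compression(tuple_attrs, query_log, m):
    """Exhaustively try every choice of m attributes of the new tuple."""
    attrs = sorted(tuple_attrs)
    m = min(m, len(attrs))  # cannot keep more attributes than the tuple has
    best_attrs, best_score = set(), -1
    for subset in combinations(attrs, m):
        score = visibility(subset, query_log)
        if score > best_score:
            best_attrs, best_score = set(subset), score
    return best_attrs, best_score


# Hypothetical apartment ad and query log, for illustration only.
new_ad = {"pool", "garage", "balcony", "elevator"}
log = [{"pool"}, {"pool", "garage"}, {"garage", "balcony"},
       {"balcony"}, {"pool", "balcony"}]
print(best_compression(new_ad, log, 2))
```

With this toy log, keeping "pool" and "balcony" satisfies three of the five queries, which is the most any pair can reach. The number of subsets grows combinatorially with the number of attributes, so this only works for very small inputs.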
A no-budget variant is also considered, in which the value of m is not given and the only aim is to achieve maximum visibility of the object, so all possible attributes can be added.

In the preliminaries section, the authors describe the given database D as containing tuples t1, t2, ..., tm. Each tuple t has attributes a1, a2, ..., an. Each attribute of a tuple has value either 0 or 1: 0 means the feature is absent and 1 means it is available. Tuple domination means that one tuple dominates another if it has value 1 for every attribute on which the other has value 1. Tuple compression of t with m attributes retains the 1s in m attributes and converts all the remaining attributes to 0 (p. 961).

In the Conjunctive Boolean with Query Log (CB-QL) variant, the problem definition stated by the authors is: given a query log Q with conjunctive Boolean retrieval semantics, a new tuple t, and an integer m, compute the compressed tuple with m attributes that has maximum visibility (p. 961). For this problem the authors give NP-completeness results and derive the theorem that the decision version of the CB-QL problem is NP-hard.

The authors explain various algorithms for Conjunctive Boolean with Query Log (p. 961). The first is the optimal brute-force algorithm. Because CB-QL is NP-hard, an optimal algorithm is unlikely to run in polynomial time in the worst case. The problem can nevertheless be solved by a simple brute-force algorithm, called BruteForce-CB-QL, which considers all combinations of m attributes of the tuple t and picks the combination that achieves maximum visibility over Q.

In the optimal algorithm based on integer linear programming (ILP), CB-QL is described as follows: let the new tuple t be a Boolean vector with attributes a1, a2, ..., an, let Q be the query log, and let S be the total number of queries in the query log.
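One way such a program can be written is sketched below. This is an illustrative formulation under the conjunctive Boolean semantics described above, not necessarily the exact one given in the paper. The symbols are assumptions introduced here: x_j is 1 if attribute a_j is retained in the compressed tuple, y_i is 1 if query i retrieves the compressed tuple, t_j is the value of attribute a_j in the new tuple t, and A_i is the set of attribute indices that query i specifies.

```latex
\begin{aligned}
\text{maximize}\quad   & \sum_{i=1}^{S} y_i \\
\text{subject to}\quad & \sum_{j=1}^{n} x_j \le m, \\
                       & x_j \le t_j            && j = 1,\dots,n, \\
                       & y_i \le x_j            && i = 1,\dots,S,\ j \in A_i, \\
                       & x_j,\ y_i \in \{0,1\}.
\end{aligned}
```

The first constraint enforces the budget m, the second allows only attributes the tuple actually has, and the third lets a query count as satisfied only when every attribute it asks for has been retained.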
The task is then stated as an integer linear program. This integer linear formulation is attractive because, unlike general IP solvers, ILP solvers are usually more efficient (p. 962).

For the optimal algorithm based on maximal frequent item sets, the authors note that the ILP-based algorithm has certain limitations: it is impractical if there are more than a few hundred queries in the query log Q. The authors therefore developed an alternative approach that scales to large query logs very well (p. 963). This algorithm is called MaxFreqItemSets-CB-QL. For it the authors cover the frequent item set problem, complementing the query log, setting the threshold parameter, a random walk to compute maximal frequent item sets, the complexity analysis of a random walk sequence, the number of iterations, frequent item sets at level M - m, preprocessing opportunities, and the per-attribute variant.

Under greedy heuristics, the authors note that although the algorithm based on maximal frequent item sets scales better than the ILP-based algorithm, it still becomes slow for large query logs (p. 964). The authors therefore developed suboptimal greedy heuristics for solving CB-QL. ConsumeAttr-CB-QL computes the number of times each attribute appears in Q and uses these counts to select the top m attributes with the highest frequency. ConsumeAttrCumul-CB-QL first selects the attribute that occurs most often in the query log Q, then finds the attribute that occurs second most often, and so on. ConsumeQueries-CB-QL picks the query with the minimum number of attributes first and then selects all the attributes specified in that query.

In the next section the authors explain the problem variant for text data. A text database is a collection of documents, and each document holds the data of a particular ad (p. 965).
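For the greedy heuristics for CB-QL summarized above, a rough Python sketch of two of them may help. It follows the descriptions in this critique rather than the paper's pseudocode: the function names are hypothetical, ties are broken alphabetically as an arbitrary choice, and the rule in `consume_queries` of skipping a query whose attributes no longer fit the budget is an assumption made here.

```python
from collections import Counter


def consume_attr(tuple_attrs, query_log, m):
    """Keep the m attributes of the tuple that occur most often in the log."""
    counts = Counter(a for query in query_log for a in query
                     if a in tuple_attrs)
    # Highest frequency first; alphabetical order breaks ties (assumption).
    ranked = sorted(tuple_attrs, key=lambda a: (-counts[a], a))
    return set(ranked[:m])


def consume_queries(tuple_attrs, query_log, m):
    """Take queries from shortest to longest and keep all their attributes.

    Assumption: a query is skipped when adding its attributes would
    exceed the budget m, or when the tuple lacks one of its attributes.
    """
    available = set(tuple_attrs)
    candidates = [set(q) for q in query_log if set(q) <= available]
    kept = set()
    for query in sorted(candidates, key=lambda q: (len(q), sorted(q))):
        if len(kept | query) <= m:
            kept |= query
    return kept


# Hypothetical apartment ad and query log, for illustration only.
new_ad = {"pool", "garage", "balcony", "elevator"}
log = [{"pool"}, {"pool", "garage"}, {"garage", "balcony"},
       {"balcony"}, {"pool", "balcony"}]
print(consume_attr(new_ad, log, 2))
print(consume_queries(new_ad, log, 2))
```

Both functions run in time roughly linear in the size of the log (plus a sort), which is why heuristics of this kind stay fast on large logs, at the price of possibly missing the optimal attribute set.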
For text data, the problem definition is that a query is a set of keywords, the top-k documents are retrieved via query-specific scoring functions, and the aim is to make the document maximally visible. According to the authors, a text database can be mapped directly to a Boolean database (p. 965) by treating each keyword as a Boolean attribute, so the algorithms can be made similar to those for Boolean data. The authors state, however, that the attribute selection problem for text data is NP-complete. They also point out two issues with the algorithms for text data. First, when each text keyword is viewed as a Boolean attribute in the query log Q, none of the optimal algorithms is feasible for text data (p. 965). Second, the scoring functions used for text data take the document length into account, which decreases the score if a keyword has low frequency.

The next section describes the experiments that were conducted and their results. The system used for the experiments had the following configuration: a P4 with a 3.2-GHz processor, 1 GB of RAM, a 100 GB HDD, and Microsoft SQL Server 2000 as the RDBMS. The algorithms were implemented in C, with the RDBMS as the back end and connectivity through ADO. Two data sets were used for Boolean data, and publication titles were used for the text data experiments. A query log of 185 queries and 205 distinct keywords were created by other students for the experiments. The experiments worked well for the Boolean CB-QL problem, where the top m attributes were selected and gave maximum visibility over the 185 queries. Separate experiments measured the execution time of each CB-QL algorithm. The authors give various statistics showing how each algorithm performs under different workloads.
Similar experiments were carried out for the text data algorithms, and the authors give similar statistics for them (p. 965).

The next section considers other problem variants for Boolean, categorical, and numeric data. The authors first explain Conjunctive Boolean-Data (CB-D), giving its problem definition for maximum visibility given a database D, a query log Q, a new tuple t, and an integer m. They then give the complexity results and algorithms for CB-D (p. 967). The next variants considered are Top-k Global Ranking (Tk-GR) and Top-k Query-Specific Ranking (Tk-QR), for which the authors consider top-k retrieval using global and query-specific scoring functions. They state the problem definitions for Tk-GR and Tk-QR and then give their complexity and algorithms (p. 968). The next variant is Skyline Boolean (SB), for which skyline retrieval semantics are considered, followed by the problem definition, complexity, and algorithms for SB. The remaining variants, Conjunctive Boolean-Query Log-Negation (CB-QL-Negation), Maximize Query Coverage (MQC), and categorical and numeric data, are discussed in a similar way (p. 969).

In the conclusion, the authors describe how the best attributes for the problem can be selected from the data set given a query log. They present variants for many cases, such as Boolean, categorical, text, and numeric data (p. 972), and show that even though the problem is NP-complete, the optimal algorithms are feasible for small inputs. They also present greedy algorithms that can produce good approximation ratios.

2. My opinion on the published work

The use of the internet has increased tremendously, and with it the amount of data available online, but the main problem is converting information into knowledge, that is, finding data that is useful to the user amid the spam. The algorithms discussed by the authors can be used to improve the visibility of a document. The paper gives algorithms not just for Boolean data but also for text data and other variants, so the algorithms can be used for real-world data that comes in various forms.

The main focus of the authors is on the potential seller and on which attributes should be added to maximize the visibility of the advertisement or document on the web, so that potential buyers see that document among the first few results. But this can also be used the other way round to create spam: a fake document with various attributes that are not true but are added to gain maximum visibility, and which should not even be displayed in the given category.

The authors make assumptions about the competitors, that is, the other advertisers, and about the users' preferences as well.
The queries in the query log were written by random students and not according to what real users want, so there is no guarantee that the approach will work equally well in a real-world environment and will actually maximize visibility with real users on a real network.

Every problem definition in every variant assumes that the database D and the query log Q are given, but in many real applications neither D nor Q is available for analysis. The user therefore has to make assumptions about the competitors and about the needs of users (potential buyers), and then has to decide on the top-k attributes, from the set of all attributes, that will help achieve maximum visibility with the minimum number of attributes.

In the paper the authors give various variants through which the visibility of an object can be maximized in different cases, with various optimal algorithms and greedy algorithms. The optimal algorithms give optimal outputs but work well only for small inputs; as the size of the input increases, they no longer work well. The greedy algorithms produce approximate results, as can be seen from the authors' experiments with the various variants.

According to Ao-Jan Su, Y. Charlie Hu, Aleksandar Kuzmanovic, and Cheng-Kok Koh (2010, p. 55), the page rank of a document or advertisement depends not only on its attributes but also on the keywords in the host name, the keywords in the URL, and the HTML header. So along with selecting proper attributes in the document, the user also needs to keep a check on these factors to maximize the visibility of the object.

Angelica Caro gives a table of data quality and visibility rankings for Spanish university portals. The table gives the DQ ranking, the visibility ranking, partial visibility rankings in terms of site, links, and popularity, and the distance, where DQ means data quality and distance means the distance between the data quality and visibility rankings.
Teal numbers in the table indicate the portals that are relatively close in both rankings. The results show that there is no precise correspondence between the two: a site can rank 1 in data quality but 19 in visibility, because visibility also depends on other factors such as popularity, links, site, and distance. So even if a page's DQ is not very good, being popular or having many links can improve its overall ranking and thereby maximize its visibility. The site that ranks first in visibility has these figures: data quality 5, visibility 1, site 1, links 1, popularity 3, distance 4. It can be seen that to gain maximum visibility we cannot depend only on the attributes of the data, that is, on data quality alone; various other factors need to be considered to improve the visibility of an object, and these are not considered in the paper by the authors (2011, p. 46).

References

Ao-Jan Su, Hu, Y.C., Kuzmanovic, A., Cheng-Kok Koh (2010). How to Improve Your Google Ranking: Myths and Reality. 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 1, 50-57. doi: 10.1109/WI-IAT.2010.195

Caro, A., Calero, C., Moraga, M.A. (2011). Are Web Visibility and Data Quality Related Concepts? Internet Computing, IEEE, 15(2), 43-49. doi: 10.1109/MIC.2010.126

Miah, M., Das, G., Hristidis, V., Mannila, H. (2009). Determining Attributes to Maximize Visibility of Objects. Knowledge and Data Engineering, IEEE Transactions on, 21(7), 959-973. doi: 10.1109/TKDE.2009.72