Oct
13th

About IPv4, IPv6 and future hostnames

Files under WWW | Leave a Comment | 13 Saturday October 2007

In this post I shall talk about IPv4 and IPv6 addresses and the respective hostnames for both standards.

 

The current standard (IPv4) allows the allocation of a total of 2^32 = 4,294,967,296 addresses, or roughly 4.29 billion.

According to the latest Internet census study performed by the University of Southern California’s Information Sciences Institute (ISI), a total of 2.8 billion addresses have been probed with ICMP ping messages. The study states that, of a total of 4.29 billion addresses, 73% have been allocated and 1.16 billion addresses are reserved for future use. The unallocated address pool, however, is predicted to be exhausted by 2010/2011 (IPv4 Address Report).

 

In contrast, the new standard (IPv6) will allow 2^128 (about 3.4×10^38) addresses, or 5.0788×10^28 addresses for each of the 6.7 billion inhabitants of the Earth.
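The address-space figures can be checked with a few lines of Python. This is only a sketch: the variable names are mine, and the population figure is the 6.7 billion quoted in this post.

```python
import ipaddress

ipv4_total = 2 ** 32
ipv6_total = 2 ** 128
world_population = 6.7e9  # figure quoted in the post (assumption, not a census value)

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:.4e}")
print(f"IPv6 addresses per inhabitant: {ipv6_total / world_population:.4e}")

# Cross-check against the standard library's view of the full address spaces.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_total
assert ipaddress.ip_network("::/0").num_addresses == ipv6_total
```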

Future addresses will have the following format:

  • instead of 4 groups of up to three decimal digits (e.g. www.wordpress.org at 72.233.56.138), they have 8 groups of four hexadecimal digits (e.g. 2010:0db8:85a3:08d3:1319:8a2e:0370:7344)
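As a small illustration of the two formats, Python's standard `ipaddress` module parses both example addresses from this post. The variable names are mine.

```python
import ipaddress

v4 = ipaddress.ip_address("72.233.56.138")
v6 = ipaddress.ip_address("2010:0db8:85a3:08d3:1319:8a2e:0370:7344")

# Address width in bits: 4 bytes for IPv4, 16 bytes for IPv6.
print(v4.version, len(v4.packed) * 8)
print(v6.version, len(v6.packed) * 8)

# The eight groups of four hexadecimal digits.
groups = v6.exploded.split(":")
print(groups)

# The shorter canonical spelling drops leading zeros inside each group.
print(v6.compressed)
```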

 

Internet hostnames

Regarding Internet hostnames, we do not have an exhaustion problem comparable to that of the address pool.

Hostnames can contain up to a maximum of 255 characters. They are ASCII based and not UNICODE compliant, although some anecdotal exceptions are allowed, such as the Spanish ñ character for Spanish-speaking domains.

Each character can be either:

  • a plain ASCII letter, ‘a’ to ‘z’ (case insensitive), that is 26 letters,

  • a decimal digit, ‘0’ to ‘9’, that is 10 digits, or

  • the ‘-’ hyphen character

That adds up to a total of 37 ASCII-based options for each of the 255 characters:

37^255 ≈ 7.8×10^399 combinations, or almost infinite possibilities (although from a semantic or language perspective a lot of them will be nonsensical to a human being)
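With 37 options for each of 255 positions, the number of full-length strings is 37 raised to the power 255 (the alphabet size is the base and the length is the exponent). A quick sketch to check the order of magnitude follows. It uses the simplification made in this post and ignores further naming rules, so it is an upper bound rather than an exact count of valid hostnames.

```python
import math

ALPHABET_SIZE = 26 + 10 + 1   # letters, digits, hyphen
MAX_LENGTH = 255

# Strings of exactly 255 characters.
exact_length = ALPHABET_SIZE ** MAX_LENGTH

# Strings of any length from 1 to 255 characters.
up_to_length = sum(ALPHABET_SIZE ** n for n in range(1, MAX_LENGTH + 1))

print(len(str(exact_length)))                    # number of decimal digits
print(MAX_LENGTH * math.log10(ALPHABET_SIZE))    # base-10 exponent
```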

In other words, under the current standard we do not have a problem regarding the availability of hostnames to be mapped onto the almost unlimited future address pool.

Hostnames can be divided by periods; for instance, we can specify a fully qualified domain name such as:

  • myHost.myDomain.myTopLevelDomain (e.g. virgobrain.carlosfenguix.org) or

  • subdomain.myDomain.myTopLevelDomain (e.g. wordpress.carlosfenguix.org)
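The period-separated structure and the 37-character alphabet described above can be sketched as follows. The function names are hypothetical, and the check covers only the rules stated in this post (allowed characters and the 255-character limit), not every rule of the DNS standards.

```python
import string

# Lower-case letters, digits and the hyphen: the 37 options listed above.
ALLOWED = set(string.ascii_lowercase + string.digits + "-")

def split_labels(hostname):
    """Split a fully qualified name into its period-separated parts."""
    return hostname.lower().split(".")

def uses_allowed_characters(hostname):
    """True if the name is at most 255 characters and every part is non-empty
    and built only from the allowed alphabet."""
    return len(hostname) <= 255 and all(
        label and set(label) <= ALLOWED for label in split_labels(hostname)
    )

print(split_labels("virgobrain.carlosfenguix.org"))
print(uses_allowed_characters("wordpress.carlosfenguix.org"))
print(uses_allowed_characters("my_host.example"))
```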

 

Future hostnames

So why should individuals in the future not have their own personal hostnames and/or personal addresses?

We can suggest several straightforward possibilities for referencing the hostnames of single individuals:

  • alias.myHost.myDomain.myTld:

    • Example: cenguix.virgobrain.carlosfenguix.org

  • role.alias.myHost.myDomain.myTld:

    • Examples: research.cenguix.virgobrain.carlosfenguix.org or personal.cenguix.virgobrain.carlosfenguix.org

It would be interesting to have UNICODE-compliant hostnames for international languages in the future, together with aliases for translated hostnames. For instance, a Chinese-speaking person could have a personal hostname defined in Chinese and a series of aliases representing the respective translations in different languages.
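One existing mechanism in this direction is IDNA, which maps a Unicode name onto the ASCII alphabet described earlier. The sketch below uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules (RFC 3490). The name `españa.example` is a made-up illustration, not a real host.

```python
unicode_name = "españa.example"   # hypothetical name used only for illustration

# Encode to the ASCII-compatible form that fits the letters/digits/hyphen alphabet.
ascii_name = unicode_name.encode("idna")
print(ascii_name)

# Decoding recovers the original Unicode spelling.
round_trip = ascii_name.decode("idna")
print(round_trip)
```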


Hits for this Post: 2348 | Posted in WWW | Posted by VirgoBrain





Feb
25th

ACM Computing Reviews: Triggers over Nested Views of Relational Data

Files under WWW | Leave a Comment | 25 Sunday February 2007

After moving from Valencia, Spain to Sydney, Australia, where I was born, I send greetings to all the people who read my specialized Web blog.

In my Firestats statistics I have people from around the World:

  • JAPAN 20.76%
  • UNITED STATES 14.88%
  • REPUBLIC OF KOREA 13.84%
  • SPAIN 10.10%
  • CHINA 8.17%
  • INDIA 4.29%
  • BRAZIL 3.53%
  • RUSSIAN FEDERATION 2.77%
  • POLAND 2.21%
  • GERMANY 2.15%
  • AUSTRALIA 1.52%
  • FRANCE 1.45%
  • TAIWAN 1.38%
  • UNITED KINGDOM 1.04%
  • and 40 more countries

    Although we belong to very different cultural backgrounds there is something that unites Computer Scientists: knowledge and the communication of new ideas and thoughts! I hope you enjoy reading my Web Blog. Please feel free to leave comments and second thoughts. This time I am including one of my latest reviews at ACM Computing Reviews, a review about a very complex paper:

  • Shao, F., Novak, A., and Shanmugasundaram, J. 2006. Triggers over nested views of relational data. Trans. Database Syst. 31, 3 (Sep. 2006), 921-967. DOI= http://doi.acm.org/10.1145/1166074.1166080

    Available at: http://portal.acm.org/citation.cfm?id=1166080&dl=acm&coll=&CFID=15151515&CFTOKEN=6184618#

    This article is about one special kind of trigger and nothing else: triggers over nested virtual (non-materialized) views that represent XML on top of object-relational DBMSs. According to both the SQL99 and SQL2003 specifications, triggers, as schema objects, are only specified on base tables. Therefore neither specification contemplates the existence of triggers on views. Moreover, the SQL2003 ISO/IEC JTC 1/SC 32 draft standard on XML-Related Specifications (SQL/XML) does not include a single reference to triggers.

    We might have to wait until the ISO working group for database languages (WG3) schedules, in 2007, the development of the next ISO standard. It might include triggers on top of views, as presented in this paper, in the framework and foundation parts of the specification, and triggers on XML views in the XML-Related Specifications (SQL/XML) part.

    In summary the paper describes the entire round-trip of a middleware active triggering system of virtual views expressed on top of object-relational base tables. The round trip can be described as follows:

    1. An update occurs on base tables

    2. The system detects how updates on base tables affect triggers expressed on top of virtual views

    3. Activated triggers expressed on top of nested virtual views representing nested-relational unmaterialized views are translated into plain SQL triggers on top of object-relational base tables

    4. Base table triggers are executed and results are re-converted into nested relations passing as parameters to the affected virtual View triggers.
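To make the round trip concrete, here is a deliberately naive sketch using Python's `sqlite3` module. It is not the authors' technique: it recomputes the whole nested view before and after the update and compares the two, which is the brute-force approach the paper's translation into SQL triggers (step 3) is meant to avoid. The `dept` and `emp` tables, the view shape and all function names are hypothetical.

```python
import sqlite3

def nested_view(conn):
    """Build a hypothetical nested view: each department with its employees."""
    view = {}
    rows = conn.execute(
        "SELECT d.name, e.name FROM dept AS d "
        "LEFT JOIN emp AS e ON e.dept_id = d.id "
        "ORDER BY d.name, e.name"
    )
    for dept_name, emp_name in rows:
        members = view.setdefault(dept_name, [])
        if emp_name is not None:
            members.append(emp_name)
    return view

def update_with_view_trigger(conn, sql, params, trigger):
    """Run a base-table update, then fire `trigger` for each changed view node."""
    old_view = nested_view(conn)          # view state before the update
    conn.execute(sql, params)             # step 1: update on a base table
    new_view = nested_view(conn)          # view state after the update
    for dept in sorted(old_view.keys() | new_view.keys()):
        old, new = old_view.get(dept), new_view.get(dept)
        if old != new:                    # step 2: this node is affected
            trigger(dept, old, new)       # step 4: pass old and new nested values

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'db'), (2, 'web');
    INSERT INTO emp VALUES (1, 'ana', 1), (2, 'bo', 2);
""")

fired = []
update_with_view_trigger(
    conn, "UPDATE emp SET dept_id = ? WHERE name = ?", (1, "bo"),
    lambda dept, old, new: fired.append((dept, old, new)),
)
print(fired)
```

Moving one employee changes two nodes of the nested view, so the view-level trigger fires twice, once per affected department.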

    The authors indicate several important contributions in this long, 47-page journal article, among which we may cite the following:

    1. A systematic technique for the translation of triggers on top of NON-1NF nested-relational views into SQL triggers of flat tables

    2. An algorithm for detecting changes in nested views

    3. The definition and use of a special type of view, called “injective views”, capable of detecting old and new values in triggers in an optimized way with regard to transition tables and the trigger execution context.

    4. The use and adaptation of trigger grouping optimization techniques used in publish-subscribe systems for the definition of triggers on top of virtual views.

    From their experimental evaluation with synthetic data sets we conclude that their techniques do not scale well for materialized views or without grouping, whereas the adapted trigger grouping techniques on virtual views scale efficiently on standard commodity hardware (an already outdated 933 MHz PIII computer with 1 GB of RAM).

    I would add that the work of the authors has been presented at several VLDB conferences and that their contribution to the field is important and might be taken into account in the definition of the next ISO standards, especially with regard to views. According to some slides from Jim Melton, the next SQL 2007 standard will discuss the inclusion of materialized views, but apparently there is no prior agenda covering virtual views, triggers on top of relational views or triggers on top of XML views.

    I have only one criticism of the authors' work: it lacks a continuum of examples to prove the versatility and universality of their techniques and, more importantly, the decidability of trigger execution, which is a non-trivial and very complex problem.




    Jan
    21st

    The Web X.0 Ecological System

    Files under Semantic Web, WWW | Leave a Comment | 21 Sunday January 2007

    It has been a while since I posted an article on my Web blog, because I did not want to contaminate this specialized blog with superficial and non-transcendental content. Nevertheless I will include some of the thoughts I had during my stay at the hospital taking care of my mother.

    I have been spending about 14 or 15 hours per day taking care of my mother, who is a rather demanding patient. While “killing” the idle hours spent in the hospital, I thought it would be interesting to share my thoughts. I grabbed my laptop and started writing about the complexity involved in setting up and maintaining the duties of a typical, more or less important, provincial hospital.

    At night my hours were disturbed only by my beloved patient, my mother, and by the routine tasks performed by the auxiliary night-shift nurses. But in the morning the ecological system reaches its climax. The majority of the personnel are auxiliary nurses and cleaning personnel, then nurses, and very few doctors. It is a typical hierarchical ecological society.

    The early morning starts with personnel checking body temperature, blood pressure, breathing aids and so on. Then the catering personnel bring the welcome breakfast. Later the humble and very important cleaning personnel clean the room and the non-mobile patients. Finally the top layer of the hierarchy, the doctors, check the progress of patients by examining vital signs and the data available on each patient.

    So what does this have to do with the WWW? Obviously there are major differences, in that the WWW will be populated by automated software artifacts while the hospital is (for now) populated by organic creatures. But we can draw analogies from a consumer/producer perspective, in an ecology where consumers and producers are both organic creatures (human beings) and automated software/hardware artifacts.

    In the WWW arena, who will be the consumers? Human beings and automated software processes such as B2B processes and software agents. And the producers? The billions of users and systems that create information and push it into the WWW ecological system, and the automated processes that harvest, gather, retrieve, index and re-organize it into compact repositories such as search engine databases, subject directories and so on.

    For now this is a very simple ecological system formed mainly by producers and consumers. When will it compare to a complex and dynamic system such as a provincial hospital? We may forecast that both successive Web generations and the Semantic Web will provide a minimal infrastructure capable of generating complex dynamic systems such as the aforementioned hospital scenario.

    Both of these require a population of very specialized software components such as automated Web services, workflow engines, service match-makers, dynamic on-the-fly and static versatile repositories, event-oriented rule-based/trigger-based active distributed components, optimized push and pull replicated components, intelligent information aggregators and summarizers, ontologies and automated ontology constructors and so on.

    All these components should be configured to cope with uncertainty, to work with partial decidability and incomplete information, and to provide probabilistic hypotheses or rankings of decisions. This future will require many Web efforts such as the RDF/OWL-based Semantic Web, the Fuzzy/Data mining Web, the push and pull dynamic Web, the rule-based Web, the Web Services Web, the post-relational XML-based Web and so on.




    Oct
    28th

    About the I-Spy Meta Search Engine

    Files under WWW | Leave a Comment | 28 Saturday October 2006

    Submitted to October 2006


    A Review of the Paper:

    Smyth, B. and Balfe, E. 2006. Anonymous personalization in collaborative web search. Inf. Retr. 9, 2 (Mar. 2006), 165-190. DOI= http://dx.doi.org/10.1007/s10791-006-7148-z

    I remember that around 1998 meta-search engines such as MetaCrawler were very popular among students and the research community, until the arrival of Google revolutionized the concept of searching on the WWW. Meta-search engines are no longer very popular, given the dominance of the Google, Yahoo, AlltheWeb and Windows Live search engines. Also, not much has happened in the search engine arena from a technological perspective since Google’s PageRank algorithm. Some interesting alternatives have appeared, such as the Lexxe search engine based upon advanced natural language processing, domain-specific search engines such as the Kosmix Health search engine, Clusty the clustering search engine, or BrainBoost, the somewhat intelligent question-answering search engine.

    This paper describes in detail an innovative proof-of-concept meta-search engine called “I-Spy”. We may call it somewhat revolutionary in the sense that it is based upon hit matrices, that is, query (qi) × document (dj) matrices storing the number of hits in each cell (Hi,j), which remind us of the typical term × document matrices used in information retrieval algorithms. Hit matrices represent the search histories of previous community users. I-Spy uses case-based reasoning (CBR) technology for displaying search results for a given query. In this case, the basic philosophy of CBR is the reuse of successful previous searches to solve future queries that show a certain similarity. For each submitted query, I-Spy retrieves a preset maximum number of results per search engine and recombines them into a list ordered by normalized increasing overall scores (Rm).

    Next, I-Spy re-ranks this ordered list based upon the selection history of previous searches (Rm is converted into RT). Results that are relevant to the current query are promoted (re-ranked with a higher score) in the list. Relevant previous results that are included in the hit matrix (Hi,j) but not in RT are also added to RT and promoted.
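The promotion step can be sketched in a few lines of Python. This is a simplified illustration, not the system's code: the hit matrix is a nested dictionary, relevance is assumed to be the share of a query's recorded selections that went to a page, and only an exact match of the query is considered, whereas the paper also reuses similar queries. All names and example pages are hypothetical.

```python
def relevance(hits, query, page):
    """Share of the selections recorded for `query` that went to `page`."""
    row = hits.get(query, {})
    total = sum(row.values())
    return row.get(page, 0) / total if total else 0.0

def rerank(meta_results, hits, query):
    """Promote pages with community hits; keep the rest in meta-search order."""
    row = hits.get(query, {})
    promoted = sorted(
        (page for page in row if row[page] > 0),
        key=lambda page: relevance(hits, query, page),
        reverse=True,
    )
    # Pages known only from the hit matrix are included as well.
    rest = [page for page in meta_results if page not in promoted]
    return promoted + rest

# Hypothetical community hit matrix: query -> page -> number of selections.
hits = {"jaguar": {"cars.example": 6, "cats.example": 2}}
meta_results = ["wiki.example", "cats.example", "os.example"]

final_list = rerank(meta_results, hits, "jaguar")
print(final_list)
```

Here `cars.example` is promoted to the top even though the underlying search engines did not return it, because the community selected it most often for this query.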

    I-Spy maintains separate hit matrices for separate communities, which limits their growth as compared to term × document matrices. We can intuitively foresee matrices with a probably large number of rows (queries) but a limited number of columns (documents), given that users normally select only the most relevant results (documents) for a query (e.g. users select documents among the first 25 results).

    We might criticize the implementation for not taking advantage of parallel processing when users submit queries. While waiting for results from the search engines, it could determine a priori not only the hit-matrix entries but also the top-k most relevant communities, enabling naïve users to submit generic queries to all communities and letting the user choose which community is appropriate for his interests (i.e. community discovery).

    We conclude this review by stating that European universities, as compared to American universities, are not really competent in transferring technology to the industry sector, and so lose clear market opportunities. The European Community should promote not only Networks of Excellence (NoEs), Specific Targeted Research Projects (STREPs) or Integrated Projects (IPs) but also smaller innovative entrepreneur incubator projects. We believe there is a huge pool of proof-of-concept projects that can be transferred from academia to the entrepreneur sector, potentially generating alternative funds for universities in the form of royalties, licenses and company shares.

    Companies such as Eurekster are trying to fill the niche market of community-based search engines. Their beta proposal, named Swicki, learns the search behaviour of communities by letting community users promote or exclude web sites and pages and by focusing search on domain-specific sub-webs.

     

    NOTE: Formula (5) on page 174 has a small erratum: Relevance(pj,pi) should be Relevance(pj,qT)




    Oct
    20th

    Speculating about the Semantic Web: Where is the Data?

    Files under Semantic Web, WWW | Leave a Comment | 20 Friday October 2006
    Where Is the Data: Current Estimates
    Having characterised the current situation of the WWW, we are interested in knowing how much information, and particularly Semantic Web information, is available on the WWW as of October 2006. Since the publication of one of the very first works by Tim Berners-Lee explaining the Semantic Web [20] (circa 2001), not much has been achieved on the WWW. As of the end of 2006 the Web is still characterised by the predominance of HTML and its more advanced dialects such as XHTML.

    Sampling Google: the versus the
    We might be able to arrive at some simple conclusions by sampling the Web, querying one of the most popular and biggest search engines available on the WWW. As of September 2005 it was claimed that Google held roughly 24 billion entries in its index [21], returning 65% more results for a random query than its nearest rival, which was Yahoo at the time [22]. We believe these figures might be distorted, and we consider more realistic the size proposed by the study in [15], as indicated in (9):

    WWW size Jan 2005 × 1.14 increase = 13.11 billion pages as of Jan 2006 (9) (as stated in the previous post Speculating About Internet/WWW Demographics and Size)

    Sampling HTML resources
    It seems that the Google search engine does not allow a plain wildcard expression such as the following, which would give the number of indexed resources of a given file type (as far as the Google search help pages indicate):

    filetype:

    A way to overcome this limitation is to use a negative query, that is, a query that retrieves the entries that do not contain a given string. By entering a strange string combination we are able to simulate the wildcard query. The following queries give us a rough estimate of the number of entries as compared to plain HTML. Google, as of 10th October 2006 06:15 GMT+01 (Madrid), gives us the following responses:

    Sampling HTML Data in the

    Q1: html filetype:html returns 7,510,000,000 entries, or 7.51 billion entries. We might consider this query an upper bound. We are following the same procedure as in [23].

    Q2: negative query, giving the number of HTML files that do not contain an “impossible” string combination: -ñ^^ñ^^ñ^^ filetype:html returns 2,500,000,000 entries, or 2.5 billion entries.

    Please note that we believe the Google search engine does not interpret "^" here as raising to a power. We take this query as a possible approximation of the number of HTML entries in the Google index.

    Sampling RDF triples

    Q3: filetype:rdf returns 35,600,000 entries, or 35.6 million entries.

    We believe that in this case Google returns a super-optimistic upper bound, which was taken into account in [23]. We conclude that the estimates returned by Google when using the filetype operator are not consistent. Therefore, given the lack of information in the Google Web pages [25] and the lack of a consistent interpretation, we have to acknowledge that these are merely speculative estimates.

    Q4: rdf:RDF filetype:rdf returns only 34,300 entries, or 34.3 k entries.

    This query represents the number of entries with the rdf:RDF qualified namespace declaration. We may consider this query a very low lower bound.

    Q5: negative query, giving the number of RDF files that do not contain an "impossible" string combination (i.e. the qualified name reversed): -drf:DRF filetype:rdf returns 1,780,000 entries, or 1.78 million entries.

    We take this query as the most realistic scenario for RDF file-based repositories on the WWW.

    As we can see, the RDF files represent a minimal 0.0712% (1.78 million / 2.5 billion) of the HTML entries. Given that Google covers around 76.2% of the total size of the Web [15], and given our previous estimate of the Web according to world population connectivity as of January 2006 (9):

    WWW size Jan 2005 × 1.14 increase = 13.11 billion pages as of Jan 2006 (as stated in the previous post Speculating About Internet/WWW Demographics and Size)

    The possible index size of Google as of January 2006 would be: 13.11 billion pages × 0.762 = 9.98982 billion pages ≈ 10 billion pages (10)

    The percentage of RDF data, represented mainly by RDF files, in Google would possibly be the following:

    1,780,000 entries / 9.98982 billion Google entries = 0.0178% for the most realistic scenario (11)

    35,600,000 entries / 9.98982 billion Google entries = 0.356% for the most optimistic scenario (12)

    We acknowledge that all these estimates are somewhat speculative. Nevertheless we can discern that the current presence of the Semantic Web on the WWW is minimal or almost non-existent as compared to HTML and XML sources.

    For instance the following query:

    Q6: filetype:xml returns 480,000,000 entries, or 480 million entries.

    Even when considering both upper-bound queries, Q3 and Q6, the RDF data available on the WWW represents only a fraction (7.4%) of the XML data available on the WWW. In other words, as of October 2006 there might be around 13.5 times more XML data on the WWW than RDF data.
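The arithmetic behind estimates (9) to (12) and the ratios between the query counts can be reproduced with a short script. This is only a check of the figures quoted in this post: the 11.5 billion starting value is an assumption taken from the title of [15], and the variable names simply follow the query numbers.

```python
web_size_2005 = 11.5e9                      # pages; assumed from the title of [15]
web_size_2006 = web_size_2005 * 1.14        # estimate (9)
google_index = web_size_2006 * 0.762        # estimate (10), 76.2% coverage

# Result counts reported by Google for the queries in this post.
q2_negative_html = 2.5e9
q3_upper_bound = 35.6e6
q5_realistic = 1.78e6
q6_upper_bound = 480e6

def percent(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100 * part / whole

print(f"(9)  {web_size_2006 / 1e9:.2f} billion pages")
print(f"(10) {google_index / 1e9:.5f} billion pages")
print(f"Q5 vs Q2: {percent(q5_realistic, q2_negative_html):.4f} %")
print(f"(11) {percent(q5_realistic, google_index):.4f} %")
print(f"(12) {percent(q3_upper_bound, google_index):.3f} %")
print(f"Q3 vs Q6: {percent(q3_upper_bound, q6_upper_bound):.1f} %")
print(f"Q6 / Q3:  {q6_upper_bound / q3_upper_bound:.1f} times")
```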

    References
     [15] Gulli, A. and Signorini, A. 2005, "The indexable web is more than 11.5 billion pages", In Special Interest Tracks and Posters of the 14th International Conference on World Wide Web (Chiba, Japan, May 10 - 14, 2005). WWW ‘05. ACM Press, New York, NY, 902-903. DOI= http://doi.acm.org/10.1145/1062745.1062789

     [20] T. Berners-Lee, J. Hendler, and O. Lassila, "The Semantic Web", Scientific American, vol. 284, no. 5, 2001, pp. 34-43. Available at: http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21

     [21] Tristan Louis Blog, "Google has 24 billion items index, considers MSN search nearest competitor", September 27, 2005. Available at http://tnl.net/blog/2005/09/27/google-has-24-billion-items-index-considers-msn-search-nearest-competitor/

     [22] Matthew Cheney and Mike Perry, "A Comparison of the Size of the Yahoo! and Google Indices", 13 October 2006. Available at http://vburton.ncsa.uiuc.edu/indexsize.html

     [23] University of Maryland Baltimore County (UMBC) eBiquity Group Blog, "How many Semantic Web documents are on the Web?", 13 October 2006. Available at http://ebiquity.umbc.edu/blogger/how-many-semantic-web-documents-are-on-the-web/

     [25] Google Guide Web Site, "Using Search Operators (Advanced Operators)", 13 October 2006. Available at http://www.googleguide.com/advanced_operators.html




    Oct
    18th

    Mare Magnum or finding a needle in a haystack

    Files under WWW | Leave a Comment | 18 Wednesday October 2006
    According to Technorati, as of July 2005 there were over 14.2 million weblogs. They also stated that the number of blogs doubles every 5 months. Applying simple maths, we can roughly calculate that as of October 2006 there might hypothetically be around 2^(15 months / 5 months) × 14.2 = 113.6 million web blogs. Somebody else has forecast that the number of blogs worldwide will exceed 150 million.
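The doubling estimate is easy to reproduce. A minimal sketch, in which the function name and its parameters are mine and a constant doubling period is assumed:

```python
def projected_blogs(initial_millions, months_elapsed, doubling_months=5):
    """Project a count that doubles every `doubling_months` months."""
    return initial_millions * 2 ** (months_elapsed / doubling_months)

# July 2005 to October 2006 is 15 months, that is three doublings.
estimate = projected_blogs(14.2, 15)
print(f"{estimate:.1f} million blogs")
```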
    So what is the probability that somebody else will read our web blog? I suppose this will depend on your page rank, for instance, on how trendy your weblog is, on the social networks that you belong to and so on. I assume this could be formalised by some fantastic equation taking into account some of these premises:
     
    "We live in a Mare Magnum where people are trying to find a needle in a haystack!" 

    To be honest, the purpose of this web blog is just to record the thoughts of the day, the news of the world that has influenced my own consciousness and so on. Some time in the future I will re-read what I recorded in the past. We have to acknowledge that we are complex and evolving living creatures. In other words, I am not the same human being I was 10 years ago, as the environment and my own circumstances have influenced my consciousness, my software.

     

    Cheers, Carlos